PROJECTS
ASSIGNMENT
TWEET EMOTION CLASSIFIER
2021
PREDICTS THE SENTIMENT OF A GIVEN SET OF TWEETS
FINAL PROJECT
SCREENPLAY PARSER
2021
LABELS LINES IN A SCREENPLAY (SCENES, CHARACTER DIALOGUES, ETC.) USING REGULAR EXPRESSIONS
ASSIGNMENT
REAL-TIME NEWS WEBSITE
2022
REAL-TIME RESPONSIVE NEWS WEBSITE
ASSIGNMENT
CRUD WEB INTERFACE
2022
SYSTEM FOR TRACKING (TV) SERIES
FINAL PROJECT
ASYNC. BROWSER MULTIPLAYER GAME
2022
CONNECT FOUR
FINAL PROJECT
SEMANTIC Q&A SYSTEM
2023
WIKIDATA BASED INFORMATION RETRIEVAL SYSTEM FOR VIDEO-GAME RELATED QUESTIONS
BACHELOR THESIS
2024
GRADE: 8.0
UTILIZING LARGE LANGUAGE MODELS FOR QUALITY BASED SUMMARIZATION OF BIOMEDICAL LITERATURE
LARGE PROJECT
SHARED TASK SUBMISSION - SEMEVAL 2025
2025
SUBMISSION FOR THE MU-SHROOM LLM HALLUCINATION DETECTION TASK
FINAL PROJECT
ABUSIVE & OFFENSIVE LANGUAGE DETECTION
2025
DETECTION OF HARMFUL LANGUAGE USING SEVERAL MACHINE LEARNING STRATEGIES
MASTER THESIS
2026
GRADE: 7.0
ADAPTING A STATE-OF-THE-ART FINE-GRAINED ENTITY LINKING MODEL FOR DUTCH
PERSONAL PROJECT
MILKYWAY OF MUSIC
2026
WEB APP VISUALIZING MUSICAL ARTISTS AND GROUPS AS STARS IN A 3D GALAXY
PERSONAL PROJECT
WIP - KANBAN BOARD APP
2026
PRODUCTIVITY TOOL SIMILAR TO TRELLO
EDUCATION
BACHELOR - INFORMATION SCIENCE
UNIVERSITY OF GRONINGEN | 2020 - 2024
During my bachelor's, I built a broad, interdisciplinary foundation combining computer science, linguistics, and data analysis. I started by developing a solid understanding of core computer science principles such as algorithms, abstraction, data storage, networking, and logical operations. By learning to program in Python and becoming familiar with the command line and shell scripting, I gradually picked up key programming concepts and learned to write legible code. Alongside this, I took courses in linguistics, where I analyzed sentence structure and learned about different types of sentence constructions and patterns.
Following this, I learned more about performing data analysis and visualizing data. I also learned to properly interpret and present data-driven results by gaining a solid grasp of statistics and clear reporting practices.
Much of my bachelor's was centered around the principles of artificial intelligence and machine learning. This included building a solid grasp of the mathematical foundations underlying these methods and working with a variety of classical classification approaches such as logistic regression, k-nearest neighbors (KNN), support vector machines (SVM), decision trees, random forests, and Naïve Bayes. I also gained hands-on experience with neural networks, such as feed-forward neural networks (FFNNs) and recurrent neural networks (RNNs) for sequence processing, particularly for problems in the domain of Natural Language Processing (NLP). In doing so, I became familiar with widely used Python libraries like scikit-learn, Keras, and PyTorch.
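To give a concrete flavor of the kind of classical pipeline described above, here is a minimal scikit-learn sketch on toy data (an illustration of the approach, not actual coursework):

```python
# Toy sentiment classifier: TF-IDF features fed into logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible plot", "loved it",
         "awful acting", "wonderful film", "boring and bad"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["a wonderful film"]))
```

Swapping `LogisticRegression` for an SVM or Naïve Bayes classifier changes only one line, which is what makes scikit-learn convenient for comparing the classical approaches listed above.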
Additionally, I developed practical skills in web technology, focusing on creating responsive, modular, and intuitive designs. Projects included building a responsive news website and developing an asynchronous browser game (Connect Four). Through these projects I was introduced to key concepts such as MVC architecture, routing, API implementation, and database interaction.
Around this time, I also learned about database management and design. I learned how to design efficient database schemas, create and interpret ER diagrams, normalize databases using several normal forms, and implement CRUD operations. My studies also included computational linguistics and language technology, where I explored parsing techniques and the intersection of language and computation. Additionally, I took part in courses on digital humanities, information security and encryption, machine translation, and the ethics of AI and NLP, which broadened my understanding of the societal impact of AI tools and their ethical implications.
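The CRUD pattern mentioned above fits in a few lines; the schema below is a hypothetical one in the spirit of the series-tracking project, using Python's built-in SQLite bindings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (id INTEGER PRIMARY KEY, title TEXT NOT NULL, seasons INTEGER)")

# Create
conn.execute("INSERT INTO series (title, seasons) VALUES (?, ?)", ("Example Show", 3))
# Read
row = conn.execute("SELECT title, seasons FROM series WHERE title = ?", ("Example Show",)).fetchone()
# Update
conn.execute("UPDATE series SET seasons = ? WHERE title = ?", (4, "Example Show"))
# Delete
conn.execute("DELETE FROM series WHERE title = ?", ("Example Show",))
```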
BACHELOR THESIS
For my Bachelor thesis, I used several open-source Large Language Models to generate summaries of biomedical literature, utilizing various prompting techniques and settings (zero-shot, article-top, article-all, article-all+explanation). I then performed automated (BERTScore) and manual evaluation to assess the quality of the generated summaries. I concluded that one-shot prompting on average led to the most factual summaries, and that automated evaluation does not align with human judgement. More information can be found on GitHub.
MASTER - COMMUNICATION AND INFORMATION STUDIES
(TRACK INFORMATION SCIENCE)
UNIVERSITY OF GRONINGEN | 2024 - 2026
During my master's, I deepened my understanding of the principles behind artificial intelligence. In particular, I gained more insight into the inner workings of the Transformer architecture, learning about key mechanisms such as gradient descent, gating, and self-attention, primarily through hands-on coding exercises.
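As an illustration of one of those mechanisms, here is a minimal NumPy sketch of scaled dot-product self-attention (shapes and weights are arbitrary, purely for demonstration):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # similarity of each token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted mix of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                           # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                   # one (4, 8) output per token
```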
A central theme of my master's was working with (open-source) Large Language Models and improving the reliability of their outputs. I explored retrieval-augmented generation (RAG) techniques, focusing on efficient and factual information retrieval using public knowledge bases. As part of a semester-long research project, I participated in the SemEval-2025 shared task on hallucination detection (Mu-SHROOM), where I worked on developing an efficient RAG component for fact-checking statements generated by LLMs.
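As a simplified illustration of the retrieval step in such a RAG pipeline (a toy example of my own, not the actual Mu-SHROOM submission), candidate passages can be ranked by TF-IDF similarity to a claim before the best match is handed to an LLM as evidence:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "The Eiffel Tower is located in Paris, France.",
    "Python is a programming language created by Guido van Rossum.",
    "The Great Wall of China is visible from low Earth orbit.",
]
claim = "The Eiffel Tower stands in Paris."

vec = TfidfVectorizer().fit(passages + [claim])
sims = cosine_similarity(vec.transform([claim]), vec.transform(passages))[0]
best = passages[sims.argmax()]   # the passage passed to the LLM as evidence
print(best)
```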
Additionally, I participated in courses on semantic web technologies, learning how to represent and query structured knowledge using RDF triples and Turtle syntax, and how to work with graph-based data.
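The data model underneath those technologies is simple: knowledge is stored as (subject, predicate, object) triples, and queries match patterns against them. A toy Python stand-in (the entities and predicates here are invented, and `None` plays the role of a SPARQL variable):

```python
triples = {
    ("Zelda", "instanceOf", "VideoGameSeries"),
    ("Zelda", "publisher", "Nintendo"),
    ("Nintendo", "instanceOf", "Company"),
}

def match(pattern, store):
    """Return every triple in store consistent with the pattern."""
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What is Zelda's publisher?"
print(match(("Zelda", "publisher", None), triples))
```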
I also followed courses on the societal impact of computer-mediated communication. Here, I studied and analyzed social media discourse around sensitive topics such as discrimination and racism.
Furthermore, I explored computational semantics, including logic and inference, meaning representations, compositionality, and lexical semantics. This gave me an understanding of how meaning can be modeled and captured computationally. For example, as a final project, we trained a model to predict the relation between a pair of sentences: whether one entails or contradicts the other, or whether the relation is neutral.
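A toy stand-in for that task (the actual project used a trained model; this crude word-overlap-plus-negation heuristic only illustrates the input/output shape):

```python
NEGATIONS = {"not", "no", "never"}

def predict_relation(premise: str, hypothesis: str) -> str:
    """Classify a sentence pair as entailment, contradiction, or neutral."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    overlap = len(p & h) / len(h)
    negated = bool(NEGATIONS & (p ^ h))   # a negation on only one side
    if overlap > 0.6:
        return "contradiction" if negated else "entailment"
    return "neutral"

print(predict_relation("A man is walking", "A man is not walking"))
```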
The master's program also gave me the flexibility to explore additional interests. I engaged with topics such as UI/UX design and speech science, including the analysis of human speech and speech impairments. For example, I created a case study on Hypokinetic Dysarthria, in which I proposed an analysis using different acoustic measurements.
MASTER THESIS
For my thesis, I focused on the task of (Neural) Entity Linking (EL) for Dutch text. This task consists of two main steps: Named Entity Recognition (NER) and Named Entity Disambiguation (NED). NER involves recognizing named entities such as locations, persons, and organizations in raw input text and labeling them as entities. NED is the task of disambiguating these entities to the correct entry in a structured knowledge base (in my case, a corresponding Dutch Wikipedia page ID).

To achieve this, I adapted the code of the existing English SpEL Entity Linking model (Shavarani and Sarkar, 2023) and trained it on Dutch training data that I extracted from a Dutch Wikipedia dump, together with existing training data from the Dutch part of the MultiNERD dataset, which I also used for evaluation. The resulting model, which I dubbed NLSpEL, achieves performance competitive with state-of-the-art multilingual EL systems (mention detection F1 of 81.5%, entity linking F1 of 71.5%). The model as well as the training data can be accessed on GitHub.
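The two-step pipeline can be sketched as follows (a toy illustration: the gazetteer and page IDs below are invented, and NLSpEL itself uses a trained neural model rather than dictionary lookup):

```python
GAZETTEER = {          # mention -> hypothetical Dutch Wikipedia page ID
    "Groningen": 12790,
    "Nederland": 7160,
}

def link_entities(text: str):
    """Step 1 (NER): find mentions; step 2 (NED): map each to a page ID."""
    results = []
    for mention, page_id in GAZETTEER.items():
        pos = text.find(mention)                     # recognize the mention
        if pos != -1:
            results.append((mention, pos, page_id))  # link it to the KB entry
    return results

print(link_entities("De Universiteit van Groningen ligt in Nederland."))
```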