PROJECTS
ASSIGNMENT
TWEET EMOTION CLASSIFIER
2021
PREDICTS THE SENTIMENT OF A GIVEN SET OF TWEETS
FINAL PROJECT
SCREENPLAY PARSER
2021
LABELS LINES IN A SCREENPLAY (SCENES, CHARACTER DIALOGUES, ETC.) USING REGULAR EXPRESSIONS
ASSIGNMENT
REAL-TIME NEWS WEBSITE
2022
REAL-TIME RESPONSIVE NEWS WEBSITE
ASSIGNMENT
CRUD WEB INTERFACE
2022
SYSTEM FOR TRACKING (TV) SERIES
FINAL PROJECT
ASYNC. BROWSER MULTIPLAYER GAME
2022
CONNECT FOUR
FINAL PROJECT
SEMANTIC Q&A SYSTEM
2023
WIKIDATA BASED INFORMATION RETRIEVAL SYSTEM FOR VIDEO-GAME RELATED QUESTIONS
BACHELOR THESIS
2024
GRADE: 8.0
UTILIZING LARGE LANGUAGE MODELS FOR QUALITY BASED SUMMARIZATION OF BIOMEDICAL LITERATURE
LARGE PROJECT
SHARED TASK SUBMISSION - SEMEVAL 2025
2025
SUBMISSION FOR THE MU-SHROOM LLM HALLUCINATION DETECTION TASK
FINAL PROJECT
ABUSIVE & OFFENSIVE LANGUAGE DETECTION
2025
DETECTION OF HARMFUL LANGUAGE USING SEVERAL MACHINE LEARNING STRATEGIES
MASTER THESIS
2026
GRADE: 7.0
ADAPTING A STATE-OF-THE-ART FINE-GRAINED ENTITY LINKING MODEL FOR DUTCH
PERSONAL PROJECT
MILKYWAY OF MUSIC
2026
WEB APP VISUALIZING MUSICAL ARTISTS AND GROUPS AS STARS IN A 3D GALAXY
PERSONAL PROJECT
WIP - KANBAN BOARD APP
2026
PRODUCTIVITY TOOL SIMILAR TO TRELLO
EDUCATION
BACHELOR - INFORMATION SCIENCE
UNIVERSITY OF GRONINGEN | 2020 - 2024
During my bachelor's, I built a broad, interdisciplinary foundation combining computer science, linguistics, and data analysis. I started by developing a solid understanding of core computer science principles such as algorithms, abstraction, data storage, networking, and logical operations. By learning to program in Python and becoming familiar with the command line and shell scripting, I gradually picked up key programming concepts and learned to write legible code. Alongside this, I took courses in linguistics, where I analyzed sentence structure and learned about different types of sentence constructions and patterns.
Following this, I learned more about performing data analysis and visualizing data. I also learned to properly interpret and present data-driven results by gaining a solid grasp of statistics and clear reporting practices.
Much of my bachelor's was centered around the principles of artificial intelligence and machine learning. This included building a solid grasp of the mathematical foundations underlying these methods and working with a variety of classical classification approaches such as logistic regression, k-nearest neighbors (KNN), support vector machines (SVM), decision trees, random forests, and Naïve Bayes. I also gained hands-on experience with neural networks, such as feed-forward neural networks (FFNNs) and recurrent neural networks (RNNs) for sequence processing, particularly for problems in the domain of Natural Language Processing (NLP). In doing so, I became familiar with widely used Python libraries like scikit-learn, Keras, and PyTorch.
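To give a concrete flavor of the kind of classical pipeline described above, here is a minimal scikit-learn sketch on toy data (an illustration of the approach, not actual coursework):

```python
# Toy sentiment classifier: TF-IDF features fed into logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible plot", "loved it",
         "awful acting", "wonderful film", "boring and bad"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["a wonderful film"]))
```

Swapping `LogisticRegression` for an SVM or Naïve Bayes classifier changes only one line, which is what makes scikit-learn convenient for comparing the classical approaches listed above.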
Additionally, I developed practical skills in web technology, focusing on creating responsive, modular, and intuitive designs. Projects included building a responsive news website and developing an asynchronous browser game (Connect Four). Through these projects I was introduced to key concepts such as MVC architecture, routing, API implementation, and database interaction.
Around this time, I also learned about database management and design. I learned how to design efficient database schemas, create and interpret ER diagrams, normalize databases using several normal forms, and implement CRUD operations. My studies also included computational linguistics and language technology, where I explored parsing techniques and the intersection of language and computation. Additionally, I took part in courses on digital humanities, information security and encryption, machine translation, and the ethics of AI and NLP, which broadened my understanding of the societal impact of AI tools and their ethical implications.
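The CRUD pattern mentioned above fits in a few lines; the schema below is a hypothetical one in the spirit of the series-tracking project, using Python's built-in SQLite bindings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (id INTEGER PRIMARY KEY, title TEXT NOT NULL, seasons INTEGER)")

# Create
conn.execute("INSERT INTO series (title, seasons) VALUES (?, ?)", ("Example Show", 3))
# Read
row = conn.execute("SELECT title, seasons FROM series WHERE title = ?", ("Example Show",)).fetchone()
# Update
conn.execute("UPDATE series SET seasons = ? WHERE title = ?", (4, "Example Show"))
# Delete
conn.execute("DELETE FROM series WHERE title = ?", ("Example Show",))
```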
BACHELOR THESIS
For my Bachelor thesis, I used several open-source Large Language Models to generate summaries of biomedical literature, utilizing various prompting techniques and settings (zero-shot, article-top, article-all, article-all+explanation). I then performed automated (BERTScore) and manual evaluation to assess the quality of the generated summaries. I concluded that one-shot prompting on average led to the most factual summaries, and that automated evaluation does not align with human judgement. More information can be found on GitHub.
MASTER - COMMUNICATION AND INFORMATION STUDIES
(TRACK INFORMATION SCIENCE)
UNIVERSITY OF GRONINGEN | 2024 - 2026
During my master's, I deepened my understanding of the principles behind artificial intelligence. In particular, I gained more insight into the inner workings of the Transformer architecture, learning about key mechanisms such as gradient descent, gating, and self-attention, primarily through hands-on coding exercises.
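As an illustration of one of those mechanisms, here is a minimal NumPy sketch of scaled dot-product self-attention (shapes and weights are arbitrary, purely for demonstration):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # similarity of each token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted mix of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                           # 4 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                   # one (4, 8) output per token
```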
A central theme of my master's was working with (open-source) Large Language Models and improving the reliability of their outputs. I explored retrieval-augmented generation (RAG) techniques, focusing on efficient and factual information retrieval using public knowledge bases. As part of a semester-long research project, I participated in the SemEval-2025 shared task on hallucination detection (Mu-SHROOM), where I worked on developing an efficient RAG component for fact-checking statements generated by LLMs.
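As a simplified illustration of the retrieval step in such a RAG pipeline (a toy example of my own, not the actual Mu-SHROOM submission), candidate passages can be ranked by TF-IDF similarity to a claim before the best match is handed to an LLM as evidence:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "The Eiffel Tower is located in Paris, France.",
    "Python is a programming language created by Guido van Rossum.",
    "The Great Wall of China is visible from low Earth orbit.",
]
claim = "The Eiffel Tower stands in Paris."

vec = TfidfVectorizer().fit(passages + [claim])
sims = cosine_similarity(vec.transform([claim]), vec.transform(passages))[0]
best = passages[sims.argmax()]   # the passage passed to the LLM as evidence
print(best)
```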
Additionally, I participated in courses on semantic web technologies, learning how to represent and query structured knowledge using RDF triples and Turtle syntax, and how to work with graph-based data.
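The data model underneath those technologies is simple: knowledge is stored as (subject, predicate, object) triples, and queries match patterns against them. A toy Python stand-in (the entities and predicates here are invented, and `None` plays the role of a SPARQL variable):

```python
triples = {
    ("Zelda", "instanceOf", "VideoGameSeries"),
    ("Zelda", "publisher", "Nintendo"),
    ("Nintendo", "instanceOf", "Company"),
}

def match(pattern, store):
    """Return every triple in store consistent with the pattern."""
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# "What is Zelda's publisher?"
print(match(("Zelda", "publisher", None), triples))
```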
I also followed courses on the societal impact of computer-mediated communication. Here, I studied and analyzed social media discourse around sensitive topics such as discrimination and racism.
Furthermore, I explored computational semantics, including logic and inference, meaning representations, compositionality, and lexical semantics. This gave me an understanding of how meaning can be modeled and captured computationally. For example, as a final project, we trained a model to predict the relation between a pair of sentences: whether one entails or contradicts the other, or whether the relation is neutral.
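A toy stand-in for that task (the actual project used a trained model; this crude word-overlap-plus-negation heuristic only illustrates the input/output shape):

```python
NEGATIONS = {"not", "no", "never"}

def predict_relation(premise: str, hypothesis: str) -> str:
    """Classify a sentence pair as entailment, contradiction, or neutral."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    overlap = len(p & h) / len(h)
    negated = bool(NEGATIONS & (p ^ h))   # a negation on only one side
    if overlap > 0.6:
        return "contradiction" if negated else "entailment"
    return "neutral"

print(predict_relation("A man is walking", "A man is not walking"))
```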
The master's program also gave me the flexibility to explore additional interests. I engaged with topics such as UI/UX design and speech science, including the analysis of human speech and speech impairments. For example, I created a case study on Hypokinetic Dysarthria, in which I proposed an analysis using different acoustic measurements.
MASTER THESIS
For my thesis, I focused on the task of (Neural) Entity Linking (EL) for Dutch text. This task consists of two main steps: Named Entity Recognition (NER) and Named Entity Disambiguation (NED). NER involves recognizing named entities such as locations, persons, and organizations in raw input text and labeling them as entities. NED is the task of disambiguating these entities to the correct entry in a structured knowledge base (in my case, a corresponding Dutch Wikipedia page ID).

To achieve this, I adapted the code of the existing English SpEL Entity Linking model (Shavarani and Sarkar, 2023) and trained it on Dutch training data that I extracted from a Dutch Wikipedia dump, together with existing training data from the Dutch part of the MultiNERD dataset, which I also used for evaluation. The resulting model, which I dubbed NLSpEL, achieves performance competitive with state-of-the-art multilingual EL systems (mention detection F1 of 81.5%, entity linking F1 of 71.5%). The model as well as the training data can be accessed on GitHub.
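The two-step pipeline can be sketched as follows (a toy illustration: the gazetteer and page IDs below are invented, and NLSpEL itself uses a trained neural model rather than dictionary lookup):

```python
GAZETTEER = {          # mention -> hypothetical Dutch Wikipedia page ID
    "Groningen": 12790,
    "Nederland": 7160,
}

def link_entities(text: str):
    """Step 1 (NER): find mentions; step 2 (NED): map each to a page ID."""
    results = []
    for mention, page_id in GAZETTEER.items():
        pos = text.find(mention)                     # recognize the mention
        if pos != -1:
            results.append((mention, pos, page_id))  # link it to the KB entry
    return results

print(link_entities("De Universiteit van Groningen ligt in Nederland."))
```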