Research
My academic work sits at the boundary of NLP, cognitive science, and methodology. I use large-scale behavioral experiments and computational models to build better instruments for studying how people process language.
Peer-reviewed publications
2025
The Italian Crowdsourcing Project: Visual word recognition times for 130,495 Italian words
2023
Establishing semantic relatedness through ratings, reaction times, and semantic vectors: A database in Polish
2021
Which words do English non-native speakers know? New supernational levels based on yes/no decision
Affect across adulthood: Evidence from English, Dutch, and Spanish
2020
How do Spanish speakers read words? Insights from a crowdsourced lexical decision megastudy
When a second language hits a native language. What ERPs do and do not tell us about language retrieval difficulty in bilingual language production
2019
Recognition times for 62 thousand English words: Data from the English Crowdsourcing Project
Recognition times for 54 thousand Dutch words: Data from the Dutch Crowdsourcing Project
2018
Word prevalence norms for 62,000 English lemmas
SPALEX: A Spanish lexical decision database from a massive online data collection
The word frequency effect in word processing: An updated review
2017
Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting
2016
How many words do we know? Practical estimates of vocabulary size dependent on word definition, the degree of language input and the participant's age
The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2
2015
Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment
How useful are corpus-based methods for extrapolating psycholinguistic variables?
SUBTLEX-PL: Subtitle-based word frequency estimates for Polish
2014
A plea for more interactions between psycholinguistics and natural language processing research
SUBTLEX-UK: A new and improved word frequency database for British English
Books and chapters
2017
Corpus Linguistics
2014
Woordenkennis van Nederlanders en Vlamingen anno 2013
Talks, workshops, and posters
2019
Teaching machines to teach people
2016
Changes in the word frequency effect as a function of language exposure
2015
Predicting item-level effects of relatedness with models based on prediction and counting
Can reaction times from massive online experiments be used to study visual word recognition?
Snaut: Accessible Distributional Semantics
2014
How useful are extrapolated measures of psycholinguistic norms?
Continuous estimation of item difficulty in large-scale lexical decision studies
A field guide to processing subtitle frequencies
2013
Corpus-based extrapolation of age of acquisition norms
Extrapolating subjective ratings with topical information from Latent Dirichlet Allocation
Using Topic Models to Extrapolate Subjective Ratings for Psycholinguistic Variables
2012