Introducing a Polish Semantic Priming Dataset for Researchers

14 JUN 2023 BY Paweł Mandera

We are pleased to present our recent study, which introduces a Polish semantic priming dataset together with semantic similarity ratings collected from native Polish speakers. This resource is intended for researchers working on the Polish language and on linguistics more broadly.

Our study involved two experiments. The first experiment aimed to create and validate the dataset, which includes strongly related, weakly related, and semantically unrelated word pairs. The results confirmed that the three conditions could be distinguished by their semantic relatedness.

In the second experiment, we used a subset of the stimuli to investigate how priming affects lexical decision performance. Compared with unrelated pairs, strongly related word pairs showed a semantic priming effect, and weakly related pairs showed a smaller but still significant effect.
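For readers less familiar with the measure, a priming effect in a lexical decision task is commonly computed as the mean reaction time in the unrelated condition minus the mean reaction time in a related condition. The sketch below illustrates that arithmetic only. The reaction times are made-up numbers, and the condition labels and the `priming_effect` helper are hypothetical names, not taken from the dataset or from our analysis code.

```python
from statistics import mean

# Made-up reaction times in milliseconds, for illustration only.
# These values do NOT come from the dataset.
reaction_times = {
    "strong": [540, 560, 550],
    "weak": [570, 580, 575],
    "unrelated": [600, 590, 610],
}


def priming_effect(rts, condition, baseline="unrelated"):
    """Return mean RT in the baseline condition minus mean RT in `condition`.

    A positive value means faster responses after related primes.
    """
    return mean(rts[baseline]) - mean(rts[condition])


strong_effect = priming_effect(reaction_times, "strong")
weak_effect = priming_effect(reaction_times, "weak")

print(f"strongly related: {strong_effect} ms")
print(f"weakly related:   {weak_effect} ms")
```

A real analysis would also need to handle error trials, outliers, and by-participant and by-item variability, which this sketch leaves out.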

Based on the results of both experiments and on SimLex-999 for Polish, we also carried out a semantic model selection among existing and newly trained semantic spaces. By making this database of semantic vectors, semantic relatedness ratings, and collected behavioral data available, we aim to support researchers in their exploration of the Polish language and linguistics.

We believe this dataset will help researchers benchmark new semantic vectors and investigate the Polish language in more detail. It is the first freely available database for Polish that combines measures of semantic distance with human data, and we hope it will contribute to the ongoing study of the language and related fields.
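As a rough illustration of what benchmarking new vectors against human ratings can look like, the sketch below computes cosine similarity for a few word pairs and then the Spearman rank correlation between those similarities and human ratings. This is a minimal sketch under stated assumptions, not our evaluation pipeline: the two-dimensional vectors and the ratings are toy values invented for the example, and the dictionary layout is hypothetical rather than the format of the released files.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Toy vectors and ratings, invented for this example (not from the dataset).
vectors = {
    "kot": [1.0, 0.0],
    "pies": [0.9, 0.1],
    "stół": [0.0, 1.0],
    "krzesło": [0.2, 0.8],
}
human_ratings = {
    ("kot", "pies"): 8.0,
    ("stół", "krzesło"): 7.5,
    ("kot", "stół"): 1.0,
    ("pies", "krzesło"): 2.0,
}

pairs = list(human_ratings)
model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rating_scores = [human_ratings[p] for p in pairs]

rho = spearman(model_scores, rating_scores)
print(f"Spearman correlation: {rho:.3f}")
```

Rank correlation is used here because it only assumes that a better model orders pairs more like human raters do, without requiring cosine values and ratings to share a scale. With real data, the same comparison could be run for each candidate semantic space to see which one tracks the ratings most closely.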