Overview
snaut allows to measure distance between words or documents and explore distributional semantics models through a convenient interface. It was created primarily as a tool for psycholinguists that can be used to measure similarities between words.
Words and phrases
The interface allows to work with single words or with phrases composed of multiple words. In fact you can even think about a whole document as a very long phrese. If you enter a multi-word phrase to the input field it will be represented by the model as a sum of the vectors of all its words. If any of the words used in the phrase is not present in the loaded semantic space, snaut will not be able to compute the vector for this phrase and, as the result, it will ignore the whole phrase.
The phrases need to be entered as a list of words separated with spaces (interpuction must be removed).
Menus
Neighbours
This menu allows to look up nearest neighbours of a set of words.
For example, in order to check what are the words with a smallest distance to
brain and dinosaur, type in brain
and dinosaur
into the input box on
separate lines and press Calculate
. You can choose the metric that is used to
compute distance in the space.
You can also try to enter phrases composed of multiple words, for instance
compare behavior
, research
and behavior research
.
Matrix
If you need to obtain measurements for a large number of words, you can use the matrix menu.
The words for which you need the scores should be entered in the input form on the left. Each word or phrase should be entered in a separate line. Next, in the dropdown menu you can choose what kind of comparison do you want to make. The available options are:
- distances between all pairs of words in the list
- distances between the words in the list and all other words in the loaded semantic space
- distances between all pairs of words between the left input field and the right input field
When you click on Calculate
, snaut will compute the scores and, after this
is finished, it will initialize download of a file with the results. The file
is in a CSV format: it contains a table in a plain text with columns separated
with commas.
You can read in the list of words to the text field from a file on your disc by
clicking on the Load from a file
button below the target input field. The
Check availability
button can be used check whether all words specified in
the input field are present in the semantic space. Keep in mind that, if some
of the words are not in the space, snaut will ignore them when computing the
semantic distance measures.
Pairwise
You can use this menu to investigate distances between individual pairs of
words/documents. Each pair should be entered on a separate row in the input
field and elements of the pair should be speparated with a colon (':') . After
clicking on the Calculate
button, snaut will do the calculation and a
download of a CSV file with the result will be initialized.
For instance, in order to calculate the distance between pairs of words: home
and window
, car
and wheel
, fast car
and slow car
. You should enter:
home : window
car : wheel
car : cloud
fast car : slow car
Similarily to the Matrix interface you can load the list of pairs from a text file or check the availability of the words in the semantic space.
Analogy
snaut implements an offset method described by Mikolov, Yih, & Zweig (2013). The analogy interface allows you to perform algebraic operations using vector semantic space and capture some regularities in the language.
The classical example involves the computation king - man + woman which results in a vector very close to queen.
The computation can be performed by entering the words vectors of which you
want to have positive or negative contribution in the calculation. For
instance, in order to calculate king - man + woman you need to enter
king, woman
in the field positive vectors and man
in the field negative
vectors.