
Roughly 79 thousand SGNS (skip-gram with negative sampling) word embeddings from Pedrazzini and McGillivray (2022), trained on a corpus of 19th-century British newspapers divided into decades. The object is a list of 12 elements, each an embedding matrix for one decade, from the 1800s through the 1910s. Each matrix has 78,879 rows (word vectors) and 200 columns (dimensions). All matrices share the same vocabulary; a word that does not occur in a given decade is represented in that decade's matrix by a row containing only zeros.

Format

A list of 12 matrices (one per decade), each with 78,879 rows and 200 columns

Source

https://zenodo.org/records/7181682

References

Pedrazzini, Nilo & Barbara McGillivray. 2022. Diachronic word embeddings from 19th-century British newspapers [Data set]. Zenodo. doi:10.5281/zenodo.7181682

Examples


if (FALSE) {


## download the model (once per machine)
download_pretrained("vecs_sgns200_british_news")

## load the model each session
data("vecs_sgns200_british_news")

## check dims: 12 decades, each matrix 78,879 words by 200 dimensions
length(vecs_sgns200_british_news) == 12L
all(dim(vecs_sgns200_british_news[[1]]) == c(78879L, 200L))

}
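
Because words absent from a decade are stored as all-zero rows, it can help to filter them out before computing similarities. The following is a minimal sketch on toy data, not the package's own method: `toy_vecs`, `vocab`, and `is_attested()` are hypothetical names, and the sketch assumes the matrices carry the vocabulary as row names and that the list elements are named by decade, neither of which is stated above.

```r
## Toy stand-in for the real object: a list of matrices sharing one vocabulary.
## (Hypothetical data; the real list has 12 matrices with 200 columns each.)
vocab <- c("steam", "telegraph", "telephone")
toy_vecs <- list(
  "1800" = matrix(c(0.1, 0.2,
                    0,   0,
                    0,   0),
                  nrow = 3, byrow = TRUE, dimnames = list(vocab, NULL)),
  "1880" = matrix(c(0.3, 0.1,
                    0.5, 0.4,
                    0.2, 0.6),
                  nrow = 3, byrow = TRUE, dimnames = list(vocab, NULL))
)

## Treat a word as attested in a decade if its row is not all zeros.
is_attested <- function(mat) rowSums(mat != 0) > 0

## Logical matrix: words in rows, decades in columns.
attested <- sapply(toy_vecs, is_attested)
attested

## Keep only attested rows for one decade before further analysis.
vecs_1800 <- toy_vecs[["1800"]][attested[, "1800"], , drop = FALSE]
rownames(vecs_1800)
```

With the real data, the same `is_attested()` helper could be applied to `vecs_sgns200_british_news` in place of `toy_vecs`, provided the assumptions about row names hold.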