1 million English-language SGNS word embeddings trained on Google N-Grams
vecs_sgns300_googlengrams_kte_en.Rd
SGNS embeddings trained on Google Books N-Grams from 2000-2012, using 5-grams. The result is a matrix of 928,250 word vectors with 300 dimensions. All n-grams were lowercased "to increase the frequency of rare words."
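As a rough illustration of working with the matrix (a sketch only: it assumes the object is a base R numeric matrix with lowercased words as row names, which may differ from this package's actual interface), a word vector can be retrieved by row name and compared to another by cosine similarity:

```r
# Hypothetical usage: assumes `vecs_sgns300_googlengrams_kte_en` is a
# 928,250 x 300 numeric matrix whose row names are lowercased words.
m <- vecs_sgns300_googlengrams_kte_en

# Cosine similarity between two numeric vectors
cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Compare the vectors for two (lowercased) words
cos_sim(m["rich", ], m["wealthy", ])
```

Because all n-grams were lowercased during training, lookups should use lowercase forms.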
Details
Kozlowski et al. explain:
"For contemporary validation, we train an embedding model on Google Ngrams of publications dating from 2000 through 2012. We use this range of years because Google Ngrams do not include publications more recent than 2012, and this duration is similar to those used in our historical analyses."