vecs_sgns300_googlengrams_kte_en.Rd
SGNS (skip-gram with negative sampling) embeddings trained on Google Books N-Grams from 2000-2012 using 5-grams. The result is a matrix of 928,250 word vectors with 300 dimensions each. All n-grams were lowercased "to increase the frequency of rare words."
A matrix of 928,250 rows (one per word) and 300 columns (one per embedding dimension)
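A minimal sketch of how a word-by-dimension matrix like this one might be queried. It assumes the words are stored as row names, which this page does not state. The toy matrix `wv`, its values, the example words, and the helper `cos_sim` are all hypothetical stand-ins, not part of the dataset or of any package. Queries are lowercased first because the n-grams were lowercased before training.

```r
# Toy stand-in for the embedding matrix: rows are words, columns are dimensions.
# The values are made up, and 4 columns are used instead of 300 for brevity.
wv <- rbind(
  rich  = c( 1, 0,  2,  1),
  poor  = c(-1, 0, -2, -1),
  opera = c( 0, 3,  0,  0)
)

# Hypothetical helper: cosine similarity between two words, looked up by row name.
# Assumes both words are present in the row names of `m`.
cos_sim <- function(m, a, b) {
  # The training data was lowercased, so lowercase the queries as well.
  x <- m[tolower(a), ]
  y <- m[tolower(b), ]
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

dim(wv)                      # number of rows (words) and columns (dimensions)
cos_sim(wv, "Rich", "poor")  # similarity between two word vectors
```

With the real matrix, the same pattern would apply to the full 300 columns, provided the row-name assumption holds.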
https://github.com/KnowledgeLab/GeometryofCulture
Kozlowski et al. (2019) explain:
For contemporary validation, we train an embedding model on Google Ngrams of publications dating from 2000 through 2012. We use this range of years because Google Ngrams do not include publications more recent than 2012, and this duration is similar to those used in our historical analyses.
Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). "The geometry of culture: Analyzing the meanings of class through word embeddings." American Sociological Review, 84(5), 905-949.