Functions

Functions for downloading models

download_pretrained()
Download specified pretrained model
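
As a rough sketch of how this function might be used: the example below passes one of the model names listed on this page to download_pretrained() and then loads the object by name. It assumes the package is installed under the name text2map.pretrained, that the model only needs to be downloaded once per machine, and that the downloaded object can be loaded with data(). None of these details are stated on this page, so check the function's own help page before relying on them.

```r
# Name of one of the embedding models listed on this page.
model_name <- "vecs_glove300_wiki_gigaword"

# Assumption: the package is installed as "text2map.pretrained".
if (requireNamespace("text2map.pretrained", quietly = TRUE)) {
  # Download the model files (these can be large, so expect a wait).
  text2map.pretrained::download_pretrained(model_name)

  # Assumption: once downloaded, the object can be loaded by name with data().
  data(list = model_name, package = "text2map.pretrained")

  # Copy to a shorter name and drop the original to avoid holding two copies.
  wv <- get(model_name)
  rm(list = model_name)

  # Inspect the size of the loaded object.
  print(dim(wv))
}
```

Keeping the model name in a variable makes it easy to swap in any other name from the lists below.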

Topic Models

Available topic models

stm_envsoc
Pre-estimated STM Model for Environmental Sociology Abstracts
stm_fiction_cohort
Pre-estimated STM Model for Fiction-Author Cohort Study

Embedding Models

Available embedding models

vecs_fasttext300_commoncrawl
2 million English-language fastText word embeddings
vecs_fasttext300_wiki_news
1 million English-language fastText word embeddings
vecs_fasttext300_wiki_news_subword
1 million English-language fastText word embeddings, with subword information
vecs_glove300_wiki_gigaword
400k English-language GloVe word embeddings (300 dimensions)
vecs_sgns300_bnc_pos
160k English-language SGNS word embeddings trained on the British National Corpus
vecs_cbow300_googlenews
3 million English-language CBOW word embeddings trained on Google News corpus
vecs_sgns300_googlengrams_kte_en
1 million English-language SGNS word embeddings trained on Google N-Grams

Diachronic Embedding Models

Available diachronic embedding models

vecs_sgns300_coha_histwords
50k diachronic English-language SGNS word embeddings over 20 decades
vecs_sgns300_googlengrams_histwords
100k diachronic English-language SGNS word embeddings, 20 decades, Google Books corpus
vecs_sgns300_googlengrams_fic_histwords
100k diachronic English-language SGNS word embeddings, 20 decades, Google Books Fiction corpus
vecs_sgns300_googlengrams_histwords_fr
100k diachronic French-language SGNS word embeddings, 20 decades, Google Books corpus
vecs_sgns300_googlengrams_histwords_de
100k diachronic German-language SGNS word embeddings, 20 decades, Google Books corpus
vecs_sgns300_googlengrams_histwords_zh
30k diachronic Chinese-language SGNS word embeddings, 5 decades, Google Books corpus
vecs_svd300_googlengrams_histwords
75k diachronic English-language SVD word embeddings, 20 decades, Google Books corpus
vecs_sgns200_british_news
79k diachronic English-language SGNS word embeddings, 12 decades, British News corpus