This is an R package to download and load pretrained text analysis models. Some models are quite large and must be downloaded separately before they can be loaded. See also `text2map`.
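```r
# install remotes first if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
library(remotes)
install_gitlab("culturalcartography/text2map.pretrained")
library(text2map.pretrained)
```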
A few smaller topic models are included when the package is installed:
MODEL | N Docs |
---|---|
stm_envsoc | 817 |
stm_fiction_cohort | 1,000 |
These can be loaded directly with `data()`:

```r
data("stm_envsoc")
```
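To see what a loaded topic model contains, one common first step is listing the most probable terms per topic. The sketch below assumes, as the name suggests, that `stm_envsoc` is a fitted `stm` model, where `vocab` holds the terms and `beta$logbeta[[1]]` holds a topics-by-terms matrix of log probabilities. A small hypothetical `toy_stm` object stands in so the example runs on its own, and `top_terms()` is an illustrative helper, not part of this package.

```r
# Hypothetical stand-in mimicking the parts of an `stm` fit used below (assumed structure):
# `vocab` holds the terms, `beta$logbeta[[1]]` a K x V matrix of log topic-word probabilities.
toy_stm <- list(
  vocab = c("climate", "energy", "novel", "family", "carbon"),
  beta = list(logbeta = list(log(rbind(
    c(0.40, 0.30, 0.05, 0.05, 0.20),  # topic 1
    c(0.05, 0.05, 0.50, 0.35, 0.05)   # topic 2
  ))))
)

# Return a K x n character matrix holding the n most probable terms of each topic.
top_terms <- function(model, n = 3) {
  logbeta <- model$beta$logbeta[[1]]
  rows <- lapply(seq_len(nrow(logbeta)), function(k) {
    # order terms by descending log probability within topic k
    model$vocab[order(logbeta[k, ], decreasing = TRUE)[seq_len(n)]]
  })
  out <- do.call(rbind, rows)
  rownames(out) <- paste0("Topic ", seq_len(nrow(logbeta)))
  out
}

top_terms(toy_stm, n = 3)
```

For a real `stm` fit, `stm::labelTopics()` offers a similar summary with additional weighting schemes.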
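Word embedding models are much larger and must first be downloaded to your machine. They can then be loaded with `data()`:

```r
# ~1 million fastText word vectors
mod <- "vecs_fasttext300_wiki_news"

# download the model once per machine
download_pretrained(mod)

# load the model each session; `list =` is required because `mod` holds the name as a string
data(list = mod)

# check the dimensions (assumes the loaded object is named after the model)
dim(get(mod))
```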
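Once a model is loaded, a typical use is finding the terms closest to a given word by cosine similarity. The sketch below assumes the loaded model is a numeric matrix with one row per term and terms stored as row names; a tiny hypothetical `wv` matrix stands in for a real 300-dimensional model, and `nearest_terms()` is an illustrative helper, not a package function.

```r
# Hypothetical stand-in for a loaded embedding matrix: rows are terms, columns are dimensions.
wv <- rbind(
  king  = c(0.8, 0.6, 0.1),
  queen = c(0.7, 0.7, 0.2),
  apple = c(0.1, 0.2, 0.9),
  pear  = c(0.2, 0.1, 0.8)
)

# Return the n terms whose vectors have the highest cosine similarity to `term`.
nearest_terms <- function(wv, term, n = 3) {
  # scale every row to unit length so a dot product equals cosine similarity
  unit <- wv / sqrt(rowSums(wv^2))
  sims <- drop(unit %*% unit[term, ])
  sims <- sims[names(sims) != term]  # exclude the query term itself
  head(sort(sims, decreasing = TRUE), n)
}

nearest_terms(wv, "king", n = 2)
```

Because only one matrix-vector product is computed per query, this also works on the full-size models, although memory use is dominated by the matrix itself.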
Below are the currently available word embedding models.
MODEL | N TERMS | N DIMS | METHOD |
---|---|---|---|
vecs_fasttext300_wiki_news | 1,000,000 | 300 | fastText |
vecs_fasttext300_wiki_news_subword | 1,000,000 | 300 | fastText |
vecs_fasttext300_commoncrawl | 2,000,000 | 300 | fastText |
vecs_glove300_wiki_gigaword | 400,000 | 300 | GloVe |
vecs_sgns300_coha_histwords | 100,000 | 300 | SGNS |
vecs_sgns300_googlenews | 1,000,000 | 300 | SGNS |
vecs_sgns300_bnc_pos | 163,000 | 300 | SGNS |
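The models above differ in vocabulary size and tokenization, so before committing to one it can help to check what share of a corpus's terms it actually covers. The sketch below assumes, as before, that terms are stored as the matrix's row names; `vocab_coverage()` and the toy inputs are hypothetical. Matching is exact and case-sensitive, and models whose tokens carry part-of-speech tags (as the `_pos` suffix suggests) may need tokens formatted to match.

```r
# Report the share of unique corpus terms found in an embedding matrix `wv`,
# plus the terms that are missing (exact, case-sensitive matching).
vocab_coverage <- function(wv, terms) {
  terms <- unique(terms)
  found <- terms %in% rownames(wv)
  list(coverage = mean(found), missing = terms[!found])
}

# Hypothetical toy embedding matrix and corpus terms
wv <- matrix(0.1, nrow = 3, ncol = 4,
             dimnames = list(c("climate", "energy", "carbon"), NULL))
corpus_terms <- c("climate", "carbon", "permafrost", "climate")

vocab_coverage(wv, corpus_terms)
```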
There are four related packages hosted on GitLab:

- `text2map`: text analysis functions
- `text2map.corpora`: 13+ text datasets
- `text2map.dictionaries`: norm dictionaries and word frequency lists
- `text2map.theme`: changes `ggplot2` aesthetics and loads the `viridis` color scheme as the default

The above packages can be installed using the following:
```r
# text2map is available from CRAN
install.packages("text2map")

# the remaining packages install from GitLab
library(remotes)
install_gitlab("culturalcartography/text2map.theme")
install_gitlab("culturalcartography/text2map.corpora")
install_gitlab("culturalcartography/text2map.dictionaries")
```
We welcome new models. If you have an embedding model or topic model you would like to be easily available to other researchers, send us an email (maintainers [at] textmapping.com) or submit pull requests.
Please report any issues or bugs here: https://gitlab.com/culturalcartography/text2map.pretrained/-/issues