Pretrained Models for Text Analysis

This is an R package for downloading and loading pretrained text analysis models. Some models are quite large and must first be downloaded separately. See also the text2map package.

Installation

library(remotes)
install_gitlab("culturalcartography/text2map.pretrained")

library(text2map.pretrained)

Usage

A few smaller topic models are included when the package is installed:

Structural Topic Models
MODEL                 N DOCS
stm_envsoc               817
stm_fiction_cohort     1,000

These can be loaded directly with data():


data("stm_envsoc")

Word embedding models are much larger and must first be downloaded to your machine. They can then be loaded with data():


# ~1 million fastText word vectors
mod <- "vecs_fasttext300_wiki_news"

# download the model once per machine
download_pretrained(mod)

# load the model each session
data(list = mod)

# the model loads under its own name
dim(vecs_fasttext300_wiki_news)
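
After loading, the embedding matrix can be used directly. Below is a minimal sketch computing the cosine similarity between two terms in base R, assuming the matrix stores one term per row with terms as rownames (the example terms are illustrative):

wv <- vecs_fasttext300_wiki_news

# cosine similarity between two row vectors
cos_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

cos_sim(wv["science", ], wv["physics", ])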

Below are the currently available word embedding models.

Word Embedding Models
MODEL                                 N TERMS  N DIMS  METHOD
vecs_fasttext300_wiki_news          1,000,000     300  fastText
vecs_fasttext300_wiki_news_subword  1,000,000     300  fastText
vecs_fasttext300_commoncrawl        2,000,000     300  fastText
vecs_glove300_wiki_gigaword           400,000     300  GloVe
vecs_sgns300_coha_histwords           100,000     300  SGNS
vecs_sgns300_googlenews             1,000,000     300  SGNS
vecs_sgns300_bnc_pos                  163,000     300  SGNS

There are four related packages hosted on GitLab: text2map, text2map.theme, text2map.corpora, and text2map.dictionaries.

The above packages can be installed using the following:

# text2map is released on CRAN
install.packages("text2map")

# the others are installed from GitLab
library(remotes)
install_gitlab("culturalcartography/text2map.theme")
install_gitlab("culturalcartography/text2map.corpora")
install_gitlab("culturalcartography/text2map.dictionaries")

Contributions and Support

We welcome new models. If you have an embedding model or topic model that you would like to make easily available to other researchers, send us an email (maintainers [at] textmapping.com) or submit a merge request.

Please report any issues or bugs here: https://gitlab.com/culturalcartography/text2map.pretrained/-/issues