This R package provides several English-language dictionary datasets useful for text analysis. See also text2map.

Installation

Because this is primarily a dataset package, we do not plan to submit it to CRAN. You can install the latest version from GitLab:

# install.packages("remotes")  # run first if remotes is not yet installed
library(remotes)
install_gitlab("culturalcartography/text2map.dictionaries")

library(text2map.dictionaries)

Dictionaries

These English dictionaries contain hand-ranked and inferred word “norms” as well as frequency and rank information from various corpora.

  • sensorimotor: Lancaster Sensorimotor Norms, N = 40,000 (Lynott et al. 2020)
  • concreteness: Concreteness Ratings, N = 40,000 (Brysbaert et al. 2014)
  • nrc_vad: NRC Valence, Arousal, and Dominance (Mohammad et al. 2018)
  • wkb_vad: WKB Valence, Arousal, and Dominance (Warriner et al. 2013)
  • bootstrap_mrc: Bootstrapped MRC Psycholinguistic Features (Paetzold and Specia 2016)
  • english_freqs: Word Frequency/Rank Lists from Four Corpora
  • elp_lexical: English Lexicon Project (ELP) Dictionaries (Balota et al. 2007)
  • subtlexus_freqs: SUBTLEXus Word Frequency/Ranks (Brysbaert and New 2009)
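
As a minimal sketch of how one of these dictionaries might be loaded and inspected, the snippet below assumes each name in the list is exported as a dataset that base R's data() can find. The helper load_dictionary() is hypothetical and not part of the package, and no column names are assumed, so str() and head() are used to see what a dictionary actually contains.

```r
# Hypothetical helper (not part of text2map.dictionaries): load one
# dictionary by name, returning NULL if the package or dataset is missing.
load_dictionary <- function(name, pkg = "text2map.dictionaries") {
  # Return NULL instead of erroring when the package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package '", pkg, "' is not installed; see Installation.")
    return(NULL)
  }
  env <- new.env()
  # data() copies the named dataset into `env`; it warns if the name is unknown.
  utils::data(list = name, package = pkg, envir = env)
  if (!exists(name, envir = env, inherits = FALSE)) {
    return(NULL)
  }
  get(name, envir = env, inherits = FALSE)
}

# Assumption: "sensorimotor" is the dataset name, as in the list above.
sensorimotor <- load_dictionary("sensorimotor")
if (!is.null(sensorimotor)) {
  str(sensorimotor)   # column names and types
  head(sensorimotor)  # first few rows
}
```

Loading into a separate environment keeps the global workspace clean and makes it easy to tell whether the requested name was actually found.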

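There are four related packages hosted on GitLab:

  • text2map
  • text2map.theme
  • text2map.corpora
  • text2map.pretrained

The above packages can be installed using the following (text2map from CRAN, the others from GitLab):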

install.packages("text2map")

library(remotes)
install_gitlab("culturalcartography/text2map.theme")
install_gitlab("culturalcartography/text2map.corpora")
install_gitlab("culturalcartography/text2map.pretrained")
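
Before installing, it can be handy to check which of the related packages are already present. The following is only a sketch using base R's requireNamespace(); the variable names are illustrative, and the package names are taken from the install commands above.

```r
# Package names as they appear in the install commands above.
related <- c("text2map", "text2map.theme",
             "text2map.corpora", "text2map.pretrained")

# TRUE for each package that can be loaded, FALSE otherwise.
installed <- vapply(related, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Names of the packages that still need to be installed.
not_installed <- names(installed)[!installed]
if (length(not_installed) > 0) {
  message("Not yet installed: ", paste(not_installed, collapse = ", "))
}
```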

Contributions and Support

We welcome new dictionaries, especially old or rare ones! If you have a dictionary you would like to make easily available to other researchers, send us an email (maintainers [at] textmapping.com) or submit a pull request.

Please report any issues or bugs here: https://gitlab.com/culturalcartography/text2map.dictionaries/-/issues