An R package of datasets for text analysis, including word frequencies, ranks, and norms for several languages (English, Spanish, French, German, Italian, Portuguese). See also text2map.

Installation

Because this is primarily a dataset package, we do not plan to submit it to CRAN. You can install the latest version from GitLab:

library(remotes)
install_gitlab("culturalcartography/text2map.dictionaries")

library(text2map.dictionaries)

Core Dictionaries (Included)

Two dictionaries are installed with the package by default:

Dictionary    Description                                   Size
sensorimotor  Lancaster Sensorimotor Norms (39,707 terms)   ~4 MB
concreteness  Lancaster Concreteness Scores (39,954 terms)  ~0.4 MB

On-Demand Dictionaries (Downloaded)

All other dictionaries are downloaded on demand from the repository the first time you request them:

Dictionary        Description
bgb_pleasantness  Bellezza et al. Pleasantness ratings
bootstrap_mrc     Bootstrapped MRC psycholinguistic features
callsigns         US FCC broadcast station callsigns
chemicals         Chemical names and formulas
demonyms          Demonyms and adjectivals
diseases          MalaCards Human Disease Database
elp_lexical       English Lexicon Project data
emfd_norms        Extended Moral Foundations Dictionary
english_freqs     English word frequencies
french_freqs      French word frequencies
german_freqs      German word frequencies
global_surnames   Global surname prevalence
humor_norms       Humor ratings
iconicity         Iconicity ratings
italian_freqs     Italian word frequencies
kte_survey        Kozlowski et al. Cultural Associations
latin_phrases     Common Latin phrases
mft_anchors       Moral Foundations Theory anchors
nrc_vad           NRC Valence, Arousal, Dominance
organisms         Scientific organism names
portuguese_freqs  Portuguese word frequencies
spanish_freqs     Spanish word frequencies
subtlexus_freqs   SUBTLEXus frequencies
us_ssa_names      US Social Security baby names
us_ssa_surnames   US Social Security surnames
wkb_vad           Warriner et al. VAD scores

Usage

Loading Dictionaries

The get_dictionary() function loads any dictionary, automatically downloading it if needed:

# Load a core dictionary (already installed)
sensorimotor <- get_dictionary("sensorimotor")

# Load an on-demand dictionary (auto-downloads on first use)
global_surnames <- get_dictionary("global_surnames")
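
Once a dictionary is loaded, a common next step is to look up scores for the tokens in a document. The following is only a sketch of that lookup using base R. It assumes the dictionary behaves like a data frame with one row per term. The toy_dict object and its term and score columns are hypothetical stand-ins, so check the real column names (for example with names(sensorimotor)) before adapting it.

```r
# Toy stand-in for a loaded dictionary (hypothetical columns;
# the real dictionaries may name and structure theirs differently)
toy_dict <- data.frame(
  term  = c("apple", "justice", "thunder"),
  score = c(4.8, 1.5, 3.9),
  stringsAsFactors = FALSE
)

# Tokens from a hypothetical document
tokens <- c("thunder", "apple", "zebra")

# match() gives the row index of each token, or NA if the term is absent
idx <- match(tokens, toy_dict$term)
scores <- toy_dict$score[idx]
names(scores) <- tokens

# Average over the tokens that were found in the dictionary
mean_score <- mean(scores, na.rm = TRUE)
```

Tokens missing from the dictionary come back as NA rather than being dropped, which makes it easy to report coverage alongside the average.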

Downloading Dictionaries Explicitly

To download a dictionary before using it:

# Download in .qs2 format (default, smaller and faster)
download_dictionary("english_freqs")

# Download in .rda format
download_dictionary("organisms", format = "rda")

# Download to a custom location
download_dictionary("chemicals", path = "/my/data")
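
If you need several dictionaries (for example, before working offline), the calls above can be wrapped in a loop. This is a minimal sketch that uses only the first argument of download_dictionary() as shown above. The wanted vector is just an illustrative selection of names from the on-demand table, and the requireNamespace() guard is added here so the snippet does nothing when the package is not installed.

```r
# Illustrative selection of on-demand dictionary names
wanted <- c("english_freqs", "nrc_vad", "humor_norms")

if (requireNamespace("text2map.dictionaries", quietly = TRUE)) {
  for (name in wanted) {
    # Default format (.qs2) and default location
    text2map.dictionaries::download_dictionary(name)
  }
} else {
  message("text2map.dictionaries is not installed; nothing downloaded.")
}
```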

File Formats

The package supports two file formats:

  • .qs2 (default): Smaller file size, faster loading. Uses ZSTD compression (level 22, maximum).
  • .rda: Standard R data format for compatibility.
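
To make the difference between the two formats concrete, here is a small round-trip sketch on a toy data frame. It is not the package's internal code: it assumes .rda files are ordinary save()/load() files and that .qs2 files can be read with the separate qs2 package (qs_save() and qs_read()), which is used only if it is installed.

```r
# Toy data standing in for a dictionary
toy <- data.frame(term = c("a", "b"), freq = c(10L, 3L))

# .rda: base R; load() restores objects under their saved names
rda_file <- tempfile(fileext = ".rda")
save(toy, file = rda_file)
env <- new.env()
loaded_names <- load(rda_file, envir = env)
from_rda <- env[[loaded_names[1]]]

# .qs2: requires the qs2 package; skipped when it is not available
if (requireNamespace("qs2", quietly = TRUE)) {
  qs2_file <- tempfile(fileext = ".qs2")
  qs2::qs_save(toy, qs2_file, compress_level = 22)  # maximum ZSTD level
  from_qs2 <- qs2::qs_read(qs2_file)
}
```

Note that load() returns the names of the restored objects rather than the objects themselves, whereas qs_read() returns the object directly.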

Related Packages

There are four related packages hosted on GitLab:

  • text2map
  • text2map.theme
  • text2map.corpora
  • text2map.pretrained

These packages can be installed as follows:

install.packages("text2map")

library(remotes)
install_gitlab("culturalcartography/text2map.theme")
install_gitlab("culturalcartography/text2map.corpora")
install_gitlab("culturalcartography/text2map.pretrained")

Contributions and Support

We welcome new dictionaries, especially old or rare ones! If you have a dictionary you would like to make easily available to other researchers, send us an email (maintainers [at] textmapping.com) or submit a pull request.

Please report any issues or bugs here: https://gitlab.com/culturalcartography/text2map.dictionaries/-/issues