
text2map.dictionaries: Dictionaries for Text Analysis
This is an R package with datasets for text analysis, including word frequencies, ranks, and norms for several languages (English, Spanish, French, German, Italian, Portuguese). See also text2map.
Installation
This is primarily a dataset package, so we do not plan to submit it to CRAN. You can install the latest version from GitLab:
library(remotes)
install_gitlab("culturalcartography/text2map.dictionaries")
library(text2map.dictionaries)

Core Dictionaries (Included)
Two dictionaries are installed with the package by default:
| Dictionary | Description | Size |
|---|---|---|
| sensorimotor | Lancaster Sensorimotor Norms (39,707 terms) | ~4 MB |
| concreteness | Lancaster Concreteness Scores (39,954 terms) | ~0.4 MB |
On-Demand Dictionaries (Downloaded)
All other dictionaries are downloaded on demand from the repository the first time you request them:
| Dictionary | Description |
|---|---|
| bgb_pleasantness | Bellezza et al. Pleasantness ratings |
| bootstrap_mrc | Bootstrapped MRC psycholinguistic features |
| callsigns | US FCC broadcast station callsigns |
| chemicals | Chemical names and formulas |
| demonyms | Demonyms and adjectivals |
| diseases | MalaCards Human Disease Database |
| elp_lexical | English Lexicon Project data |
| emfd_norms | Extended Moral Foundations Dictionary |
| english_freqs | English word frequencies |
| french_freqs | French word frequencies |
| german_freqs | German word frequencies |
| global_surnames | Global surname prevalence |
| humor_norms | Humor ratings |
| iconicity | Iconicity ratings |
| italian_freqs | Italian word frequencies |
| kte_survey | Kozlowski et al. Cultural Associations |
| latin_phrases | Common Latin phrases |
| mft_anchors | Moral Foundations Theory anchors |
| nrc_vad | NRC Valence, Arousal, Dominance |
| organisms | Scientific organism names |
| portuguese_freqs | Portuguese word frequencies |
| spanish_freqs | Spanish word frequencies |
| subtlexus_freqs | SUBTLEXus frequencies |
| us_ssa_names | US Social Security baby names |
| us_ssa_surnames | US Social Security surnames |
| wkb_vad | Warriner et al. VAD scores |
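
The download-on-first-request behavior described above can be pictured as a simple "fetch once, then read from disk" pattern. The sketch below only illustrates that general idea and is not the package's actual implementation: load_cached(), fake_fetch(), the toy data, and the .rds cache format are all hypothetical stand-ins chosen so the example runs without the package or a network connection.

```r
# Hypothetical helper (not part of text2map.dictionaries): return a dataset
# from a local cache, calling `fetch_fun` only when no cached copy exists.
load_cached <- function(name, cache_dir, fetch_fun) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  cache_file <- file.path(cache_dir, paste0(name, ".rds"))

  if (!file.exists(cache_file)) {
    # First request: fetch the data and store it for later calls
    saveRDS(fetch_fun(name), cache_file)
  }

  readRDS(cache_file)
}

# Toy stand-in for a remote source; it counts how often it is contacted
n_fetches <- 0
fake_fetch <- function(name) {
  n_fetches <<- n_fetches + 1
  data.frame(term = c("apple", "river"), score = c(4.8, 4.9))
}

cache_dir <- file.path(tempdir(), "dictionary-cache-demo")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache

first  <- load_cached("toy_norms", cache_dir, fake_fetch)  # fetches, then caches
second <- load_cached("toy_norms", cache_dir, fake_fetch)  # reads the cached copy
```

In this sketch the second call never touches fake_fetch(), which mirrors why an on-demand dictionary should only need to be downloaded the first time it is requested.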
Usage
Loading Dictionaries
The get_dictionary() function loads any dictionary, automatically downloading it if needed:
# Load a core dictionary (already installed)
sensorimotor <- get_dictionary("sensorimotor")
# Load an on-demand dictionary (auto-downloads on first use)
global_surnames <- get_dictionary("global_surnames")

Downloading Dictionaries Explicitly
To download a dictionary before using it:
# Download in .qs2 format (default, smaller and faster)
download_dictionary("english_freqs")
# Download in .rda format
download_dictionary("organisms", format = "rda")
# Download to a custom location
download_dictionary("chemicals", path = "/my/data")

Related Packages
There are four related packages hosted on GitLab:
- text2map: text analysis functions
- text2map.corpora: 13+ text datasets
- text2map.pretrained: pretrained embeddings and topic models
- text2map.theme: changes ggplot2 aesthetics and loads the viridis color scheme as default
The above packages can be installed using the following:
install.packages("text2map")
library(remotes)
install_gitlab("culturalcartography/text2map.theme")
install_gitlab("culturalcartography/text2map.corpora")
install_gitlab("culturalcartography/text2map.pretrained")

Contributions and Support
We welcome new dictionaries, especially old or rare ones! If you have a dictionary you would like to make easily available to other researchers, send us an email (maintainers [at] textmapping.com) or submit a pull request.
Please report any issues or bugs here: https://gitlab.com/culturalcartography/text2map.dictionaries/-/issues