This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

A copy of the GNU General Public License is available at https://www.gnu.org/licenses/gpl-2.0.html.


Third-Party Data Sources and Attributions

This package incorporates data from the following sources, each with its own license terms:

| Dataset | Source | License |
| --- | --- | --- |
| sensorimotor | Lynott et al. (2020) | CC-BY 4.0 |
| concreteness | Brysbaert et al. (2014) | CC-BY 4.0 |
| nrc_vad | Mohammad et al. (2018) | NRC Non-Commercial Research License (no redistribution; citation required; IP retained by NRC Canada) |
| wkb_vad | Warriner, Kuperman, Brysbaert (2013) | CC-BY 4.0 |
| bootstrap_mrc | Paetzold and Specia (2016) | No explicit open-source license; available for academic research per authors; derived from MRC Psycholinguistic Database |
| english_freqs | Norvig (CC-BY 3.0), Wikipedia (CC-BY-SA 4.0), BNC (custom academic research), Kucera & Francis (custom/academic) | CC-BY 3.0 / CC-BY-SA 4.0 / custom |
| subtlexus_freqs | Brysbaert and New (2009) | Free for academic research; citation required; no explicit redistribution license |
| elp_lexical | Balota et al. (2007) | ELP License Agreement (non-commercial research/education; citation required; copyright Washington University in St. Louis) |
| kte_survey | Kozlowski, Taddy, Evans (2019) | Available for academic research/replication; no explicit open-source license |
| humor_norms | Engelthaler and Hills (2018) | CC-BY 4.0 |
| bgb_pleasantness | Bellezza, Greenwald, Banaji (1986) | No explicit license; publicly shared by authors for academic research; citation required |
| us_ssa_names | US Social Security Administration | Public domain (US government) |
| us_ssa_surnames | US Census Bureau | Public domain (US government) |
| global_surnames | SajjadPourali/locatefamily.com (GitHub) | MIT |
| demonyms | Wikipedia | CC-BY-SA 4.0 |
| organisms | ITIS (public domain, US federal) / UniProt (CC-BY 4.0) | Public domain / CC-BY 4.0 |
| chemicals | ZINC (free for research, attribution required) / Wikipedia (CC-BY-SA 4.0) | Custom / CC-BY-SA 4.0 |
| diseases | MalaCards (Weizmann Institute) | MalaCards terms (academic research; attribution required; no redistribution without permission) |
| callsigns | US FCC | Public domain (US government) |
| iconicity | Winter et al. (2022) | CC-BY 4.0 |
| french_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| german_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| spanish_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| italian_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| portuguese_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| mft_anchors | Graham, Haidt, Nosek (2009) | Available for academic research per MoralFoundations.org; no explicit redistribution license; all rights reserved |
| emfd_norms | Hopp et al. (2021) | CC-BY 4.0 |
| english_emoticons | lingo2word.com, via qdapDictionaries | GPL-2 |
| english_discourse_markers | Alemany (2005), via qdapDictionaries | GPL-2 |
| english_function_words | John & Muriel Higgins (ECLIPSE), via qdapDictionaries | GPL-2 |
| english_prepositions | Public domain, via qdapDictionaries | GPL-2 |
| english_syllables | Sejnowski & Rosenberg (1987), via qdapDictionaries | GPL-2 |
| english_action_verbs | Grady Ward / Moby Project, via qdapDictionaries | Public domain |
| english_adverbs | Grady Ward / Moby Project, via qdapDictionaries | Public domain |
| english_grady | Grady Ward / Moby Project, via qdapDictionaries | Public domain |
| english_normalization_rules | ECHNAE Project (MIT), qdapDictionaries (GPL-2), Wikipedia (CC-BY-SA 4.0), Google Books Ngrams (CC-BY 3.0) | GPL-2 (MIT and CC-BY/CC-BY-SA compatible with GPL-2) |
| unicode_normalization | Unicode Standard, W3C | Unicode License / W3C |
| english_contractions | qdapDictionaries (GPL-2) | GPL-2 |
| english_colors | W3C CSS4 | Public domain |
| english_numerics | Curated for text2map.dictionaries | MIT |
| english_irregular_verbs | Wikipedia (CC-BY-SA 4.0) | CC-BY-SA 4.0 |
| english_abbreviations | ECHNAE Project (MIT) | MIT |
| english_interjections | ECHNAE Project (MIT), qdapDictionaries (GPL-2) | GPL-2 (MIT-licensed entries compatible with GPL-2; combined work is GPL-2) |
| english_political_abbreviations | ISO 3166-1, USPS, Canada Post, AP Stylebook | Public domain / CC-BY-SA 4.0 |
| english_archaic | ECHNAE Project (MIT) | MIT |
| english_bigrams | Ted Underwood / DataMunging, derived from Google Books Ngrams (CC-BY) | CC-BY 3.0 |
| english_log_freq | Ted Underwood / DataMunging | CC-BY 3.0 |
| english_ocr_corrections | Ted Underwood / DataMunging, derived from Google Books Ngrams (CC-BY) | CC-BY 3.0 |
| english_variant_spellings | Ted Underwood / DataMunging | CC-BY 3.0 |
| english_syncope | Ted Underwood / DataMunging (CC-BY), ECHNAE Project (MIT) | CC-BY 3.0 / MIT |
| english_fusing_rules | Ted Underwood / DataMunging (CC-BY); curated supplement | CC-BY 3.0 |
| english_hyphen_rules | Ted Underwood / DataMunging | CC-BY 3.0 |
| english_personal_names | Ted Underwood / DataMunging | CC-BY 3.0 |
| english_place_names | Ted Underwood / DataMunging | CC-BY 3.0 |
| latin_phrases | Wikipedia (List of Latin phrases) | CC-BY-SA 4.0 |