This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
A copy of the GNU General Public License is available at https://www.gnu.org/licenses/gpl-2.0.html.
Third-Party Data Sources and Attributions
This package incorporates data from the following sources, each with its own license terms:
| Dataset | Source | License |
|---|---|---|
| sensorimotor | Lynott et al. (2020) | CC-BY 4.0 |
| concreteness | Brysbaert et al. (2014) | CC-BY 4.0 |
| nrc_vad | Mohammad (2018) | NRC Non-Commercial Research License (no redistribution; citation required; IP retained by NRC Canada) |
| wkb_vad | Warriner, Kuperman, Brysbaert (2013) | CC-BY 4.0 |
| bootstrap_mrc | Paetzold and Specia (2016) | No explicit open-source license; available for academic research per the authors; derived from the MRC Psycholinguistic Database |
| english_freqs | Norvig (CC-BY 3.0), Wikipedia (CC-BY-SA 4.0), BNC (custom academic research), Kucera & Francis (custom/academic) | CC-BY 3.0 / CC-BY-SA 4.0 / custom |
| subtlexus_freqs | Brysbaert and New (2009) | Free for academic research; citation required; no explicit redistribution license |
| elp_lexical | Balota et al. (2007) | ELP License Agreement (non-commercial research/education; citation required; copyright Washington University in St. Louis) |
| kte_survey | Kozlowski, Taddy, Evans (2019) | Available for academic research/replication; no explicit open-source license |
| humor_norms | Engelthaler and Hills (2018) | CC-BY 4.0 |
| bgb_pleasantness | Bellezza, Greenwald, Banaji (1986) | No explicit license; publicly shared by authors for academic research; citation required |
| us_ssa_names | US Social Security Administration | Public domain (US government) |
| us_ssa_surnames | US Census Bureau | Public domain (US government) |
| global_surnames | SajjadPourali/locatefamily.com (GitHub) | MIT |
| demonyms | Wikipedia | CC-BY-SA 4.0 |
| organisms | ITIS (public domain, US federal) / UniProt (CC-BY 4.0) | Public domain / CC-BY 4.0 |
| chemicals | ZINC (free for research, attribution required) / Wikipedia (CC-BY-SA 4.0) | Custom / CC-BY-SA 4.0 |
| diseases | MalaCards (Weizmann Institute) | MalaCards terms (academic research; attribution required; no redistribution without permission) |
| callsigns | US FCC | Public domain (US government) |
| iconicity | Winter et al. (2022) | CC-BY 4.0 |
| french_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| german_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| spanish_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| italian_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| portuguese_freqs | Leipzig Corpora / Wikipedia / OpenSubtitles | CC-BY-SA 4.0 |
| mft_anchors | Graham, Haidt, Nosek (2009) | Available for academic research per MoralFoundations.org; no explicit redistribution license; all rights reserved |
| emfd_norms | Hopp et al. (2021) | CC-BY 4.0 |
| english_emoticons | lingo2word.com, via qdapDictionaries | GPL-2 |
| english_discourse_markers | Alemany (2005), via qdapDictionaries | GPL-2 |
| english_function_words | John & Muriel Higgins (ECLIPSE), via qdapDictionaries | GPL-2 |
| english_prepositions | Public domain, via qdapDictionaries | GPL-2 |
| english_syllables | Sejnowski & Rosenberg (1987), via qdapDictionaries | GPL-2 |
| english_action_verbs | Grady Ward / Moby Project, via qdapDictionaries | Public domain |
| english_adverbs | Grady Ward / Moby Project, via qdapDictionaries | Public domain |
| english_grady | Grady Ward / Moby Project, via qdapDictionaries | Public domain |
| english_normalization_rules | ECHNAE Project (MIT), qdapDictionaries (GPL-2), Wikipedia (CC-BY-SA 4.0), Google Books Ngrams (CC-BY 3.0) | GPL-2 (MIT and CC-BY/CC-BY-SA compatible with GPL-2) |
| unicode_normalization | Unicode Standard, W3C | Unicode License / W3C |
| english_contractions | qdapDictionaries (GPL-2) | GPL-2 |
| english_colors | W3C CSS4 | Public domain |
| english_numerics | Curated for text2map.dictionaries | MIT |
| english_irregular_verbs | Wikipedia (CC-BY-SA 4.0) | CC-BY-SA 4.0 |
| english_abbreviations | ECHNAE Project (MIT) | MIT |
| english_interjections | ECHNAE Project (MIT), qdapDictionaries (GPL-2) | GPL-2 (MIT-licensed entries compatible with GPL-2; combined work is GPL-2) |
| english_political_abbreviations | ISO 3166-1, USPS, Canada Post, AP Stylebook | Public domain / CC-BY-SA 4.0 |
| english_archaic | ECHNAE Project (MIT) | MIT |
| english_bigrams | Ted Underwood / DataMunging, derived from Google Books Ngrams (CC-BY) | CC-BY 3.0 |
| english_log_freq | Ted Underwood / DataMunging | CC-BY 3.0 |
| english_ocr_corrections | Ted Underwood / DataMunging, derived from Google Books Ngrams (CC-BY) | CC-BY 3.0 |
| english_variant_spellings | Ted Underwood / DataMunging | CC-BY 3.0 |
| english_syncope | Ted Underwood / DataMunging (CC-BY), ECHNAE Project (MIT) | CC-BY 3.0 / MIT |
| english_fusing_rules | Ted Underwood / DataMunging (CC-BY); curated supplement | CC-BY 3.0 |
| english_hyphen_rules | Ted Underwood / DataMunging | CC-BY 3.0 |
| english_personal_names | Ted Underwood / DataMunging | CC-BY 3.0 |
| english_place_names | Ted Underwood / DataMunging | CC-BY 3.0 |
| latin_phrases | Wikipedia (List of Latin phrases) | CC-BY-SA 4.0 |
