A dataset containing roughly 40,000 words and 40,000 non-words. Variables include lexical characteristics and behavioral data, including recognition times and association tests. The dataset combines variables from several studies and is available at: https://elexicon.wustl.edu/
Variables
term. unique word (or non-word character string)
length. number of characters
freq_kf. frequency in the Kucera and Francis list
freq_hal. frequency in the HAL (Hyperspace Analogue to Language) list
log_freq_hal. Log HAL frequency
concreteness_rating. concreteness rating of the term (see Brysbaert et al. 2013)
semantic_neighborhood_density. density of a term's neighborhood in the HAL model (see Shaoul and Westbury 2010)
semantic_neighbors. term's neighbors in the HAL model (see Shaoul and Westbury 2010)
semantic_diversity. term's neighborhood diversity in the HAL model (see Shaoul and Westbury 2010)
age_of_acquisition. age (in years) at which respondents thought they had learned the term (see Kuperman et al. 2012)
body_object_interaction. ease with which the human body can interact with a term's referent (see Pexman et al. 2019)
emotional_valence. score for positive/negative (see VAD, Warriner et al., 2013)
emotional_arousal. score for active/calm (see VAD, Warriner et al., 2013)
emotional_dominance. score for powerful/weak (see VAD, Warriner et al., 2013)
assoc_freq_r1. times term is the "first associate" of other terms (see DeDeyne et al., 2018)
assoc_types_r1. number of unique terms that produce this term first (see DeDeyne et al., 2018)
assoc_freq_r123. times term is one of the first three associates of other terms (see DeDeyne et al., 2018)
assoc_types_r123. number of unique terms that produce this term in first three associates (see DeDeyne et al., 2018)
pron. pronunciation, General American standard. Uses codes based on SAMPA (http://www.phon.ucl.ac.uk/home/sampa)
nphon. number of phonemes
nsyll. number of syllables
pos. part of speech
i_mean_rt. mean reaction time in milliseconds for the lexical decision task in the ELP study
i_zscore. z-score of reaction time for the lexical decision task in the ELP study
i_sd. standard deviation of reaction time for the lexical decision task in the ELP study
obs. number of observations for that term for the lexical decision task in the ELP study
i_mean_accuracy. average accuracy for a term for the lexical decision task in the ELP study
i_nmg_mean_rt. mean reaction time in milliseconds for the naming task in the ELP study
i_nmg_zscore. z-score of reaction time for the naming task in the ELP study
i_nmg_sd. standard deviation of reaction time for the naming task in the ELP study
i_nmg_obs. number of observations for that term for the naming task in the ELP study
i_nmg_mean_accuracy. average accuracy for a term for the naming task in the ELP study
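As an illustration of how an item-level z-score such as i_zscore or i_nmg_zscore might be derived, the sketch below standardizes reaction times within each participant and then averages the standardized values by term. This procedure is an assumption about the general approach, not something stated in this documentation; the trial data and the function name item_zscores are hypothetical and are not part of the dataset.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical trial-level data: (participant, term, reaction time in ms).
# These values are made up for illustration only.
trials = [
    ("p1", "cat", 500.0),
    ("p1", "dog", 600.0),
    ("p1", "zebra", 700.0),
    ("p2", "cat", 800.0),
    ("p2", "dog", 900.0),
    ("p2", "zebra", 1000.0),
]


def item_zscores(trials):
    """Average, per term, the reaction times z-scored within participant."""
    # Collect each participant's reaction times.
    by_participant = defaultdict(list)
    for participant, _term, rt in trials:
        by_participant[participant].append(rt)

    # Per-participant mean and sample standard deviation.
    stats = {p: (mean(rts), stdev(rts)) for p, rts in by_participant.items()}

    # Standardize each trial against its own participant, grouped by term.
    by_term = defaultdict(list)
    for participant, term, rt in trials:
        mu, sigma = stats[participant]
        by_term[term].append((rt - mu) / sigma)

    # Item-level score: mean of the standardized values for that term.
    return {term: mean(zs) for term, zs in by_term.items()}


scores = item_zscores(trials)
for term, z in sorted(scores.items()):
    print(f"{term}: {z:+.2f}")
```

In this toy data the second participant is 300 ms slower overall, yet both participants contribute the same standardized value for each term, which is the reason for standardizing within participant before averaging.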
References
Balota, D.A., Yap, M.J., Hutchison, K.A. et al. (2007).
The English Lexicon Project.
Behavior Research Methods. 39, 445-459.
doi:10.3758/BF03193014

Shaoul, C., Westbury, C. (2010).
Exploring lexical co-occurrence space using HiDEx.
Behavior Research Methods. 42, 393-413.
doi:10.3758/BRM.42.2.393

Kuperman, V., Stadthagen-Gonzalez, H. & Brysbaert, M. (2012).
Age-of-acquisition ratings for 30,000 English words.
Behavior Research Methods. 44, 978-990.
doi:10.3758/s13428-012-0210-4

Pexman, P.M., Muraki, E., Sidhu, D.M. et al. (2019).
Quantifying sensorimotor experience: Body-object interaction
ratings for more than 9,000 English words.
Behavior Research Methods. 51, 453-466.
doi:10.3758/s13428-018-1171-z