A dataset containing roughly 40,000 words and 40,000 non-words. Variables include lexical characteristics and behavioral data, such as lexical decision and naming reaction times and word-association data. The dataset combines variables from several studies and is available at: https://elexicon.wustl.edu/
Variables
term. unique word (or non-word character string)
length. number of characters
freq_kf. frequency in the Kucera and Francis list
freq_hal. frequency in the HAL (Hyperspace Analog to Language) list
log_freq_hal. Log HAL frequency
concreteness_rating. rated concreteness of the term, i.e., how concrete rather than abstract it is judged to be (see Brysbaert et al. 2013)
semantic_neighborhood_density. density of a term's neighborhood in the HAL model (see Shaoul and Westbury 2010)
semantic_neighbors. number of the term's neighbors in the HAL model (see Shaoul and Westbury 2010)
semantic_diversity. term's neighborhood diversity in the HAL model (see Shaoul and Westbury 2010)
age_of_acquisition. age (in years) at which respondents thought they had learned the term (see Kuperman et al. 2012)
body_object_interaction. ease with which the human body can interact with a term's referent (see Pexman et al. 2019)
emotional_valence. rating from negative to positive (see the valence-arousal-dominance (VAD) norms, Warriner et al. 2013)
emotional_arousal. rating from calm to active (see VAD, Warriner et al. 2013)
emotional_dominance. rating from weak to powerful (see VAD, Warriner et al. 2013)
assoc_freq_r1. number of times the term is given as the first associate of other terms (see De Deyne et al. 2018)
assoc_types_r1. number of unique cue terms that produce this term as their first associate (see De Deyne et al. 2018)
assoc_freq_r123. number of times the term is given as one of the first three associates of other terms (see De Deyne et al. 2018)
assoc_types_r123. number of unique cue terms that produce this term among their first three associates (see De Deyne et al. 2018)
pron. pronunciation (General American), written with codes based on SAMPA (http://www.phon.ucl.ac.uk/home/sampa)
nphon. number of phonemes
nsyll. number of syllables
pos. part of speech
i_mean_rt. mean reaction time in milliseconds for the lexical decision task in the English Lexicon Project (ELP) study (see Balota et al. 2007)
i_zscore. z-score of reaction time for the lexical decision task in the ELP study
i_sd. standard deviation of reaction time for the lexical decision task in the ELP study
obs. number of observations for that term for the lexical decision task in the ELP study
i_mean_accuracy. average accuracy for a term for the lexical decision task in the ELP study
i_nmg_mean_rt. mean reaction time in milliseconds for the naming task in the ELP study
i_nmg_zscore. z-score of reaction time for the naming task in the ELP study
i_nmg_sd. standard deviation of reaction time for the naming task in the ELP study
i_nmg_obs. number of observations for that term for the naming task in the ELP study
i_nmg_mean_accuracy. average accuracy for a term for the naming task in the ELP study
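The variable list does not state exactly how log_freq_hal is derived from freq_hal (the log base, or whether an offset is added before taking the log). The minimal Python sketch below checks a few candidate transforms against (freq_hal, log_freq_hal) pairs taken from the data. The candidate list, the tolerance, and the function name infer_log_transform are illustrative assumptions, not the documented formula.

```python
import math

def infer_log_transform(pairs, tol=0.01):
    """Return the first candidate transform that reproduces every recorded log value.

    pairs: iterable of (freq_hal, log_freq_hal) tuples.
    The candidate transforms below are assumptions, not a documented formula.
    """
    candidates = {
        "ln(f)": lambda f: math.log(f),
        "log10(f)": lambda f: math.log10(f),
        "ln(f + 1)": lambda f: math.log(f + 1),
        "log10(f + 1)": lambda f: math.log10(f + 1),
    }
    # Skip missing values and zero counts, since log(0) is undefined.
    usable = [(f, lf) for f, lf in pairs if f is not None and lf is not None and f > 0]
    for name, transform in candidates.items():
        if usable and all(abs(transform(f) - lf) <= tol for f, lf in usable):
            return name
    return None  # no candidate matched, or no usable pairs

# Made-up example values, not rows from the dataset.
print(infer_log_transform([(1000, 6.9078), (3, 1.0986)]))  # prints ln(f)
```

Include some low-frequency terms in the sample: for large counts, log(f) and log(f + 1) are nearly identical, so only small counts separate the two candidates.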
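The four association variables (assoc_freq_r1, assoc_types_r1, assoc_freq_r123, assoc_types_r123) differ only in which response positions count and whether repeated cues are counted once. The sketch below computes all four from raw cue-response data, following the definitions above. It assumes a hypothetical layout in which each record is a cue plus one participant's first three associates in order; the toy data and the function name association_counts are illustrative, and this is not the De Deyne et al. (2018) processing pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (cue, [first, second, third associate]) per participant.
responses = [
    ("cat", ["dog", "mouse", "fur"]),
    ("cat", ["dog", "pet", "milk"]),
    ("puppy", ["dog", "cute", "small"]),
    ("bone", ["dog", "skeleton", "leg"]),
    ("leash", ["walk", "dog", "collar"]),
]

def association_counts(responses):
    """Count, for each response term, how often and from how many cues it is produced."""
    freq_r1, freq_r123 = Counter(), Counter()
    cues_r1, cues_r123 = defaultdict(set), defaultdict(set)
    for cue, associates in responses:
        first = associates[0]
        freq_r1[first] += 1          # every first-position response counts
        cues_r1[first].add(cue)      # each cue counts once per term
        for term in associates[:3]:
            freq_r123[term] += 1
            cues_r123[term].add(cue)
    return {
        term: {
            "assoc_freq_r1": freq_r1[term],
            "assoc_types_r1": len(cues_r1[term]),
            "assoc_freq_r123": freq_r123[term],
            "assoc_types_r123": len(cues_r123[term]),
        }
        for term in freq_r123
    }

counts = association_counts(responses)
print(counts["dog"])
# {'assoc_freq_r1': 4, 'assoc_types_r1': 3, 'assoc_freq_r123': 5, 'assoc_types_r123': 4}
```

Here "dog" is the first associate four times but from only three distinct cues, because "cat" produces it twice; that gap is exactly what separates the freq and types variables.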
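The reaction-time variables summarize trial-level data per term. One common way to build an item-level z-score, assumed here, is to standardize each participant's correct-trial RTs by that participant's own mean and standard deviation and then average those z-scores per term; this removes overall speed differences between participants. The sketch below follows that assumption with hypothetical trial data and a hypothetical function name (item_summaries). It omits any outlier trimming the ELP applied; see Balota et al. (2007) for the actual procedure. The same logic would apply to the naming variables (i_nmg_*).

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical trial-level data: (participant, term, rt_ms, correct).
trials = [
    ("p1", "cat", 500.0, True),
    ("p1", "dog", 600.0, True),
    ("p1", "blick", 700.0, True),
    ("p2", "cat", 600.0, True),
    ("p2", "dog", 700.0, True),
    ("p2", "blick", 800.0, True),
    ("p2", "flurp", 950.0, False),
]

def item_summaries(trials):
    # 1. Each participant's mean and SD over their correct trials (needs >= 2 per participant).
    by_participant = defaultdict(list)
    for participant, _term, rt, correct in trials:
        if correct:
            by_participant[participant].append(rt)
    norms = {p: (mean(rts), stdev(rts)) for p, rts in by_participant.items()}

    # 2. Collect raw RTs, participant-standardized RTs, and accuracy per term.
    rts, zs, acc = defaultdict(list), defaultdict(list), defaultdict(list)
    for participant, term, rt, correct in trials:
        acc[term].append(1.0 if correct else 0.0)
        if correct:
            m, s = norms[participant]
            rts[term].append(rt)
            zs[term].append((rt - m) / s)

    # 3. Average per term; terms with no correct trials get None for RT summaries.
    return {
        term: {
            "mean_rt": mean(rts[term]) if rts[term] else None,
            "zscore": mean(zs[term]) if zs[term] else None,
            "mean_accuracy": mean(acc[term]),
        }
        for term in acc
    }

summaries = item_summaries(trials)
print(summaries["cat"])  # {'mean_rt': 550.0, 'zscore': -1.0, 'mean_accuracy': 1.0}
```

In the toy data, p2 is uniformly 100 ms slower than p1, so "cat" gets the same z-score (-1.0) from both participants even though their raw RTs differ.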
References
Balota, D.A., Yap, M.J., Hutchison, K.A. et al. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445-459. doi:10.3758/BF03193014

Shaoul, C., & Westbury, C. (2010). Exploring lexical co-occurrence space using HiDEx. Behavior Research Methods, 42, 393-413. doi:10.3758/BRM.42.2.393

Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44, 978-990. doi:10.3758/s13428-012-0210-4

Pexman, P.M., Muraki, E., Sidhu, D.M. et al. (2019). Quantifying sensorimotor experience: Body–object interaction ratings for more than 9,000 English words. Behavior Research Methods, 51, 453-466. doi:10.3758/s13428-018-1171-z
