A dataset containing roughly 40,000 words and 40,000 non-words. Variables include lexical characteristics and behavioral data, including recognition times and association tests. The dataset combines variables from several studies and is available at: https://elexicon.wustl.edu/

elp_lexical

Format

A data frame with 79,672 rows and 32 variables.

Source

https://doi.org/10.3758/BF03193014

Variables

  • term. unique word (or non-word character string)

  • length. number of characters

  • freq_kf. frequency in the Kucera and Francis list

  • freq_hal. frequency in the HAL (Hyperspace Analogue to Language) list

  • log_freq_hal. Log HAL frequency

  • concreteness_rating. rated concreteness of the term (see Brysbaert et al. 2013)

  • semantic_neighborhood_density. density of a term's neighborhood in the HAL model (see Shaoul and Westbury 2010)

  • semantic_neighbors. term's neighbors in the HAL model (see Shaoul and Westbury 2010)

  • semantic_diversity. term's neighborhood diversity in the HAL model (see Shaoul and Westbury 2010)

  • age_of_acquisition. age (in years) at which respondents thought they had learned the term (see Kuperman et al. 2012)

  • body_object_interaction. ease with which the human body can interact with a term's referent (see Pexman et al. 2019)

  • emotional_valence. score for positive/negative (see VAD, Warriner et al., 2013)

  • emotional_arousal. score for active/calm (see VAD, Warriner et al., 2013)

  • emotional_dominance. score for powerful/weak (see VAD, Warriner et al., 2013)

  • assoc_freq_r1. number of times the term is the "first associate" of other terms (see De Deyne et al., 2018)

  • assoc_types_r1. number of unique terms that produce this term as their first associate (see De Deyne et al., 2018)

  • assoc_freq_r123. number of times the term is one of the first three associates of other terms (see De Deyne et al., 2018)

  • assoc_types_r123. number of unique terms that produce this term among their first three associates (see De Deyne et al., 2018)

  • pron. pronunciation, General American standard. Uses codes based on SAMPA (http://www.phon.ucl.ac.uk/home/sampa)

  • nphon. number of phonemes

  • nsyll. number of syllables

  • pos. part of speech

  • i_mean_rt. mean reaction time in milliseconds for the lexical decision task in the ELP study

  • i_zscore. z-score of reaction time for the lexical decision task in the ELP study

  • i_sd. standard deviation of reaction time for the lexical decision task in the ELP study

  • obs. number of observations for that term for the lexical decision task in the ELP study

  • i_mean_accuracy. average accuracy for a term for the lexical decision task in the ELP study

  • i_nmg_mean_rt. mean reaction time in milliseconds for the naming task in the ELP study

  • i_nmg_zscore. z-score of reaction time for the naming task in the ELP study

  • i_nmg_sd. standard deviation of reaction time for the naming task in the ELP study

  • i_nmg_obs. number of observations for that term for the naming task in the ELP study

  • i_nmg_mean_accuracy. average accuracy for a term for the naming task in the ELP study
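
The difference between the assoc_freq_* and assoc_types_* variables, and between the r1 and r123 variants, may be easier to see in a small sketch. The Python below is only an illustration under stated assumptions: the cue/response rows are invented, the function name association_counts is hypothetical, and the exact counting procedure used by De Deyne et al. (2018) is not described here. It simply applies the definitions given in the list above: "freq" counts every response set in which the term appears, while "types" counts the distinct cue terms that produced it.

```python
# Hypothetical word-association responses: (cue, [first, second, third]).
# These rows are invented for illustration; they are not taken from the
# De Deyne et al. (2018) data or from elp_lexical.
responses = [
    ("cat", ["dog", "pet", "fur"]),
    ("cat", ["dog", "mouse", "pet"]),
    ("bone", ["dog", "skeleton", "dry"]),
    ("leash", ["walk", "dog", "pet"]),
]


def association_counts(responses, target, positions):
    """Return (freq, types) for `target`.

    freq  = number of response sets where `target` is among the first
            `positions` associates of the cue
    types = number of distinct cues for which that happens
    """
    freq = 0
    cues = set()
    for cue, associates in responses:
        if target in associates[:positions]:
            freq += 1
            cues.add(cue)
    return freq, len(cues)


# positions=1 mirrors the *_r1 variables, positions=3 the *_r123 variables.
r1 = association_counts(responses, "dog", positions=1)
r123 = association_counts(responses, "dog", positions=3)
print("r1   (freq, types):", r1)
print("r123 (freq, types):", r123)
```

With these made-up rows, "dog" is the first associate in three response sets from two distinct cues, and appears among the first three associates in four response sets from three distinct cues.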

References

Balota, D.A., Yap, M.J., Hutchison, K.A. et al. (2007). The English Lexicon Project. Behavior Research Methods. 39, 445-459. doi:10.3758/BF03193014
Shaoul, C., Westbury, C. (2010). Exploring lexical co-occurrence space using HiDEx. Behavior Research Methods. 42, 393-413. doi:10.3758/BRM.42.2.393
Kuperman, V., Stadthagen-Gonzalez, H., Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods. 44, 978-990. doi:10.3758/s13428-012-0210-4
Pexman, P.M., Muraki, E., Sidhu, D.M. et al. (2019). Quantifying sensorimotor experience: Body-object interaction ratings for more than 9,000 English words. Behavior Research Methods. 51, 453-466. doi:10.3758/s13428-018-1171-z