A dataset containing semantic feature production norms and richness metrics for English words, reflecting the number and interconnectedness of semantic attributes associated with each concept.

Format

A data frame with 4436 rows and 18 variables.

Source

McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37, 547-559. doi:10.3758/BF03192726

Buchanan, E. M., Valentine, K. D., & Maxwell, N. P. (2019). English semantic feature production norms: An extended database of 4,436 concepts. Behavior Research Methods, 51, 1849-1863. doi:10.3758/s13428-018-1172-2

Details

Semantic density captures how richly interconnected a word's meaning is. High-density concepts have many features that are strongly correlated with one another (e.g., the features of "dog" tend to occur together and so reinforce each other), while low-density concepts have features that are largely independent.

This dictionary combines two sources:

  • McRae et al. (2005): 541 concrete concepts (living and nonliving things) with rich per-concept statistics including semantic density, distinctiveness, feature counts by type, and familiarity ratings.

  • Buchanan et al. (2019): 3,895 additional concepts (nouns, verbs, adjectives) with feature counts and production frequency. Licensed under GPL-3.0.

For the 541 McRae concepts, the McRae-derived columns are populated. For Buchanan-only concepts, only term, pos, num_features, mean_prod_freq, n_participants, and source are populated; the McRae-specific columns are NA.
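The population pattern described above can be checked mechanically. The following is a minimal sketch in Python, not part of the package: the two rows are invented, trimmed to a handful of columns, and use None to stand in for NA. The helper names (missing_columns, unexpected_gaps) are hypothetical.

```python
# Columns the text lists as populated for Buchanan-only concepts.
BUCHANAN_COLUMNS = {
    "term", "pos", "num_features",
    "mean_prod_freq", "n_participants", "source",
}

# Invented example rows (values are NOT taken from the dataset);
# None plays the role of NA.
rows = [
    {"term": "dog", "pos": "noun", "num_features": 15,
     "density": 120.5, "familiarity": 8.1, "source": "mcrae_2005"},
    {"term": "run", "pos": "verb", "num_features": 9,
     "density": None, "familiarity": None, "source": "buchanan_2019"},
]


def missing_columns(row):
    """Return the sorted names of columns whose value is missing (None)."""
    return sorted(name for name, value in row.items() if value is None)


def unexpected_gaps(row):
    """List columns that should be populated for a Buchanan-only row but are missing."""
    return sorted(c for c in missing_columns(row) if c in BUCHANAN_COLUMNS)


for row in rows:
    if row["source"] == "buchanan_2019":
        print(row["term"], "missing:", missing_columns(row),
              "unexpected:", unexpected_gaps(row))
```

In practice, any analysis of the McRae-specific columns would filter on source (or drop missing values) first, since most of the 4436 rows carry NA there.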

Variables

  • term. the concept word (lowercase)

  • pos. part of speech: "noun", "verb", "adjective", or "other"

  • num_features. number of distinct semantic features produced for the concept

  • num_distinguishing. number of distinguishing features unique to that concept (McRae only; NA for Buchanan)

  • disting_pct. percentage of features that are distinguishing (McRae only; NA for Buchanan)

  • mean_distinct. mean distinctiveness score across features (McRae only; NA for Buchanan)

  • mean_cv. mean cue validity across features (McRae only; NA for Buchanan)

  • density. semantic (intercorrelational) density score: the shared variance summed across the concept's significantly correlated feature pairs (McRae only; NA for Buchanan)

  • num_corred_pairs. number of significantly correlated feature pairs (McRae only; NA for Buchanan)

  • corred_pct. percentage of feature pairs that are significantly correlated (McRae only; NA for Buchanan)

  • num_functional. number of functional features (McRae only; NA for Buchanan)

  • num_visual_motor. number of visual-motion features (McRae only; NA for Buchanan)

  • num_encyclopedic. number of encyclopedic features (McRae only; NA for Buchanan)

  • num_taxonomic. number of taxonomic features (McRae only; NA for Buchanan)

  • familiarity. mean familiarity rating on a 1-9 scale (McRae only; NA for Buchanan)

  • mean_prod_freq. mean production frequency across features (Buchanan only; NA for McRae)

  • n_participants. number of participants who listed features (Buchanan only; NA for McRae)

  • source. data source attribution ("mcrae_2005" or "buchanan_2019")
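The percentage columns relate to the count columns in a simple way, which the sketch below illustrates in Python. It rests on two assumptions the documentation does not confirm: that disting_pct is 100 * num_distinguishing / num_features, and that corred_pct uses all n(n-1)/2 unordered feature pairs as its denominator. The original norms may restrict which features enter the correlation analysis, so values recomputed this way need not match the stored columns. The row is invented and the helper names are hypothetical.

```python
def pct(part, whole):
    """Percentage of `whole` made up by `part`; None if an input is missing or whole is 0."""
    if part is None or whole is None or whole == 0:
        return None
    return 100.0 * part / whole


def possible_pairs(n):
    """Number of unordered pairs among n features: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Invented row (NOT from the dataset), limited to the count columns.
row = {"num_features": 10, "num_distinguishing": 3, "num_corred_pairs": 9}

# Assumed derivation of disting_pct: 3 of 10 features.
disting = pct(row["num_distinguishing"], row["num_features"])

# Assumed derivation of corred_pct: 9 of the 45 possible pairs.
corred = pct(row["num_corred_pairs"], possible_pairs(row["num_features"]))

print(disting, corred)  # 30.0 20.0
```

For Buchanan-only rows the McRae counts are NA, so pct returns None rather than raising an error.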