A dataset containing semantic feature production norms and richness metrics for English words, reflecting the number and interconnectedness of semantic attributes associated with each concept.
Source
McRae, K., Cree, G. S., Seidenberg, M. S., & McNorgan, C. (2005). Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37, 547-559. doi:10.3758/BF03192726
Buchanan, E. M., Valentine, K. D., & Maxwell, N. P. (2019). English semantic feature production norms: An extended database of 4,436 concepts. Behavior Research Methods, 51, 1849-1863. doi:10.3758/s13428-018-1172-2
Details
Semantic density captures how richly interconnected a word's meaning is. High-density concepts have many features that are strongly correlated with one another across concepts (e.g., the features of "dog" tend to co-occur and so mutually reinforce each other), while the features of low-density concepts are largely independent of one another.
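As a rough illustration of how pair-level correlation statistics of this kind could be derived, the sketch below correlates toy feature vectors across four made-up concepts and summarises the pairs belonging to one concept. The function names, the toy data, and the 6.5% shared-variance cutoff are assumptions made for the example; the sketch is not the procedure used to build this dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy / sqrt(sxx * syy)


def correlated_pair_stats(feature_vectors, concept_features,
                          min_shared_variance=0.065):
    """Summarise correlated feature pairs for one concept.

    feature_vectors maps feature name -> values across all concepts.
    min_shared_variance is an ASSUMED cutoff on r squared.
    """
    pairs = list(combinations(sorted(concept_features), 2))
    shared = []
    for f, g in pairs:
        r = pearson(feature_vectors[f], feature_vectors[g])
        if r * r >= min_shared_variance:
            shared.append(r * r)
    return {
        "num_corred_pairs": len(shared),
        "corred_pct": 100.0 * len(shared) / len(pairs) if pairs else 0.0,
        "shared_variance_sum": 100.0 * sum(shared),
    }


# Toy feature vectors over four hypothetical concepts: dog, cat, car, bus.
toy_vectors = {
    "has_fur": [1, 1, 0, 0],
    "has_tail": [1, 1, 0, 0],
    "has_wheels": [0, 0, 1, 1],
    "is_loud": [1, 0, 1, 0],
}
dog_stats = correlated_pair_stats(toy_vectors, ["has_fur", "has_tail", "is_loud"])
print(dog_stats)
```

In this toy case only one of the three feature pairs for "dog" passes the cutoff, so the concept would count as fairly sparse; a concept whose features mostly co-occur would score higher on all three statistics.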
This dictionary combines two sources:
McRae et al. (2005): 541 concrete concepts (living and nonliving things) with rich per-concept statistics including semantic density, distinctiveness, feature counts by type, and familiarity ratings.
Buchanan et al. (2019): 3,895 additional concepts (nouns, verbs, adjectives) with feature counts and production frequency. Licensed under GPL-3.0.
For the 541 McRae concepts, all McRae-derived columns are populated; mean_prod_freq and n_participants are NA. For Buchanan-only concepts, only term, pos, num_features, mean_prod_freq, n_participants, and source are populated; the McRae-specific columns are NA.
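Because column coverage depends on the source, analyses usually need to filter on source or drop rows with missing values first. The following is a minimal sketch of that pattern, assuming the rows have been exported as plain dictionaries with None standing in for NA; the helper names and the toy values are hypothetical and are not taken from the dataset.

```python
def rows_from(rows, source):
    """Keep only rows attributed to the given source label."""
    return [row for row in rows if row["source"] == source]


def complete_cases(rows, columns):
    """Keep only rows where every listed column is populated (not None)."""
    return [row for row in rows
            if all(row.get(col) is not None for col in columns)]


# Toy rows with made-up values, using a subset of the documented columns.
toy_rows = [
    {"term": "dog", "pos": "noun", "num_features": 15,
     "density": 120.0, "source": "mcrae_2005"},
    {"term": "chair", "pos": "noun", "num_features": 12,
     "density": 40.0, "source": "mcrae_2005"},
    {"term": "run", "pos": "verb", "num_features": 8,
     "density": None, "source": "buchanan_2019"},
]

mcrae_rows = rows_from(toy_rows, "mcrae_2005")
with_density = complete_cases(toy_rows, ["density"])
mean_density = sum(row["density"] for row in with_density) / len(with_density)
print(len(mcrae_rows), mean_density)
```

Filtering on complete cases rather than on the source label keeps the analysis correct even if coverage differs from what is documented for a particular column.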
Variables
term. the concept word (lowercase)
pos. part of speech: "noun", "verb", "adjective", or "other"
num_features. number of distinct semantic features produced for the concept
num_distinguishing. number of distinguishing features unique to that concept (McRae only; NA for Buchanan)
disting_pct. percentage of features that are distinguishing (McRae only; NA for Buchanan)
mean_distinct. mean distinctiveness score across features (McRae only; NA for Buchanan)
mean_cv. mean cue validity across features (McRae only; NA for Buchanan)
density. semantic (intercorrelational) density score: the summed shared variance across the concept's significantly correlated feature pairs (McRae only; NA for Buchanan)
num_corred_pairs. number of correlated feature pairs (McRae only; NA for Buchanan)
corred_pct. percentage of feature pairs that are correlated (McRae only; NA for Buchanan)
num_functional. number of functional features (McRae only; NA for Buchanan)
num_visual_motor. number of visual-motion features, i.e. features describing how the concept moves (McRae only; NA for Buchanan)
num_encyclopedic. number of encyclopedic features (McRae only; NA for Buchanan)
num_taxonomic. number of taxonomic features (McRae only; NA for Buchanan)
familiarity. mean familiarity rating on a 1-9 scale (McRae only; NA for Buchanan)
mean_prod_freq. mean production frequency across features (Buchanan only; NA for McRae)
n_participants. number of participants who listed features (Buchanan only; NA for McRae)
source. data source attribution ("mcrae_2005" or "buchanan_2019")
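Some of the variables above are related by simple arithmetic, which gives a cheap consistency check after loading the data. The sketch below assumes that disting_pct is expressed relative to num_features; that relationship is inferred from the variable descriptions and is not confirmed by the source, and the function name is hypothetical.

```python
def expected_disting_pct(num_distinguishing, num_features):
    """Percentage of features that are distinguishing.

    ASSUMPTION: the denominator is num_features. Returns None when either
    input is missing (e.g. Buchanan-only rows) or num_features is zero.
    """
    if num_distinguishing is None or not num_features:
        return None
    return 100.0 * num_distinguishing / num_features


# Made-up values: 3 distinguishing features out of 12 in total.
print(expected_disting_pct(3, 12))
# Missing McRae-specific value, as in a Buchanan-only row.
print(expected_disting_pct(None, 8))
```

If the stored disting_pct differs noticeably from this value for McRae rows, the assumed denominator is probably wrong and the original norms should be consulted.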
