A dataset containing typicality ratings, production frequency, and response time data for English words within semantic categories. Covers both traditional Battig-style categories (concrete and abstract) and object categories from the THINGS database.
Source
Banks, B., & Connell, L. (2022). Category production norms for 117 concrete and abstract categories. Behavior Research Methods, 55, 1292-1313. doi:10.3758/s13428-021-01787-z
Stoinski, L. M., Perkuhn, J., & Hebart, M. N. (2023). THINGSplus: New norms and metadata for the THINGS database. Behavior Research Methods, 55, 2884-2901. doi:10.3758/s13428-023-02110-8
Details
Typicality indicates how representative a word is of its category (e.g., "robin" is a typical bird, "penguin" is atypical). Higher values indicate greater typicality within the category.
This dictionary combines two sources:
Banks & Connell (2022): 678 entries across 117 categories (67 concrete, 50 abstract) with rated typicality on a 1-5 Likert scale, production frequency, mean rank, and response latency. Includes abstract categories not typically found in object-only norms (e.g., "a crime", "a virtue").
THINGSplus (Stoinski et al., 2023): 2,355 entries across 53 object categories with typicality scores on a 0-1 scale and response times. Licensed under CC-BY 4.0.
Note: typicality scales differ between sources. Banks & Connell uses a 1-5 Likert scale; THINGSplus uses a 0-1 normalized score. These are not directly comparable without rescaling.
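One simple way to put both sources on a shared 0-1 range is a linear min-max rescale of the Likert ratings. The sketch below is only an illustration: the function name `rescale_typicality` is hypothetical and not part of this dataset, and it assumes the Likert endpoints 1 and 5 map to 0 and 1. A shared numeric range does not by itself make the two measures equivalent, since they were collected with different procedures.

```python
# Hypothetical helper (not shipped with the dataset): linear rescale of
# typicality onto 0-1, keyed on the documented `source` values.
def rescale_typicality(value, source):
    """Return typicality on a 0-1 range, or None if the value is missing."""
    if value is None:
        return None
    if source == "banks_connell_2022":
        # Assumption: 1-5 Likert endpoints map linearly to 0 and 1.
        return (value - 1.0) / 4.0
    if source == "thingsplus_2023":
        # Already on a 0-1 scale; returned unchanged.
        return float(value)
    raise ValueError(f"unknown source: {source!r}")


if __name__ == "__main__":
    print(rescale_typicality(3, "banks_connell_2022"))
    print(rescale_typicality(0.25, "thingsplus_2023"))
```

Within-source standardization (for example, z-scoring per source) may be a better fit when the goal is to compare relative typicality rather than absolute values.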
Variables
term. the category exemplar word (e.g., "robin", "penguin")
category. the superordinate semantic category (e.g., "bird", "fruit")
domain. category domain: "concrete", "abstract", or "object"
typicality. typicality rating (1-5 Likert for Banks & Connell; 0-1 for THINGSplus)
production_frequency. number of participants who produced this exemplar (Banks & Connell only; NA for THINGSplus)
rank. mean ordinal position in production (Banks & Connell only; NA for THINGSplus)
first_rank_freq. frequency of being produced first (Banks & Connell only; NA for THINGSplus)
response_time. mean response latency; units are source-dependent (seconds for Banks & Connell, milliseconds for THINGSplus)
source. data source attribution ("banks_connell_2022" or "thingsplus_2023")
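Because response_time uses different units per source, it helps to convert to a single unit before pooling rows. The following is a minimal sketch that assumes each row has been loaded as a dictionary keyed by the variable names above; the function name `response_time_ms` and the dictionary representation are illustrative assumptions, not part of the dataset.

```python
# Hypothetical helper: express response_time in milliseconds for any row.
# Assumes rows are dicts with the documented "source" and "response_time" keys.
def response_time_ms(row):
    """Return the row's response time in milliseconds, or None if missing."""
    rt = row.get("response_time")
    if rt is None:
        return None
    if row["source"] == "banks_connell_2022":
        # Documented as seconds for this source; convert to milliseconds.
        return rt * 1000.0
    if row["source"] == "thingsplus_2023":
        # Documented as milliseconds for this source; keep as is.
        return float(rt)
    raise ValueError(f"unknown source: {row['source']!r}")


if __name__ == "__main__":
    example = {"term": "robin", "source": "banks_connell_2022", "response_time": 2.5}
    print(response_time_ms(example))
```

The response_time value in the example row is made up for illustration and is not taken from the dataset.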
