A dataset of English numeric word forms covering cardinal numbers (0–19, tens, hundreds through trillions), zero variants (nought, oh), word-form ordinals (first through twentieth, tens, large), numeric-suffix ordinals (1st, 2nd, ... 21st, 30th, 40th, 50th, 100th), historical ordinal variants (2d, 3d, 1th, 2th, 6o, etc.), and exhaustive Roman numerals (I through MMMCMXCIX, i.e., 1–3999).
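As an illustration of the 1–3999 range, here is a minimal sketch of how the Roman numeral rows could be enumerated. The names `to_roman` and `roman_rows` are hypothetical, and this is not necessarily how the dataset itself was built.

```python
# Sketch: enumerate Roman numerals for 1..3999 (I through MMMCMXCIX).
# Value/symbol pairs in descending order, including subtractive forms.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Convert an integer in 1..3999 to its standard Roman numeral."""
    if not 1 <= n <= 3999:
        raise ValueError("n must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        # Take as many copies of this symbol as fit, keep the remainder.
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


# Hypothetical (form, replacement, category) tuples mirroring the dataset's columns.
roman_rows = [(to_roman(n), str(n), "roman_numeral") for n in range(1, 4000)]
```

The upper bound of 3999 follows from standard notation, which repeats a symbol at most three times, so MMMCMXCIX is the largest value expressible without extra conventions.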
Variables
form. the number word or numeral (e.g., "twenty", "1st", "2d", "XLII")
replacement. normalized equivalent as a string, given as digits or as a word form (e.g., "20" for "twenty", "first" for "1st", "second" for "2d", "42" for "XLII")
category. type of the form, one of: cardinal_small, cardinal_tens, cardinal_large, cardinal_zero, ordinal, ordinal_numeric, ordinal_historical, roman_numeral
source. data source attribution
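
The variables above lend themselves to a simple lookup from form to replacement. The sketch below uses a handful of rows taken from the examples in this description. The category assigned to each sample row is inferred from the category names, the source value is a placeholder, and the names `NumberForm`, `build_lookup`, and `normalize` are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class NumberForm(NamedTuple):
    """One row of the dataset, using the four documented variables."""
    form: str
    replacement: str
    category: str
    source: str


# Sample rows only; categories are inferred and "example" is a placeholder source.
SAMPLE_ROWS = [
    NumberForm("twenty", "20", "cardinal_tens", "example"),
    NumberForm("1st", "first", "ordinal_numeric", "example"),
    NumberForm("2d", "second", "ordinal_historical", "example"),
    NumberForm("XLII", "42", "roman_numeral", "example"),
]


def build_lookup(rows: Iterable[NumberForm]) -> Dict[str, NumberForm]:
    """Index rows by their exact form (case-sensitive)."""
    return {row.form: row for row in rows}


def normalize(token: str, lookup: Dict[str, NumberForm]) -> Optional[str]:
    """Return the replacement for a token, or None if the form is unknown."""
    row = lookup.get(token)
    return row.replacement if row is not None else None


lookup = build_lookup(SAMPLE_ROWS)
print(normalize("XLII", lookup))
print(normalize("2d", lookup))
```

Matching is kept case-sensitive in this sketch because lowercase strings such as "i" or "mix" are ordinary English words, so folding case before looking up Roman numerals would need extra care.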
