A skip-list of English words containing apostrophes that text
normalization pipelines should leave unchanged. It includes proper
nouns (e.g., "O'Brien"), standard contractions (e.g., "can't"),
possessives, and dialectal or archaic forms that are legitimate as
written. Checking tokens against this list prevents false-positive
"corrections" of apostrophe-containing words.
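As a rough illustration of how a skip-list like this can guard a normalization step, here is a minimal Python sketch. The three-entry `SKIP_FORMS` set, the function names, and the "strip every apostrophe" rule are all hypothetical stand-ins, not part of the dataset or of any particular pipeline. Matching is exact and case-sensitive here, which a real pipeline may want to relax.

```python
import re

# Hypothetical miniature skip-list; the real table has far more rows.
SKIP_FORMS = {"O'Brien", "can't", "Ass'n"}


def normalize_token(token, skip_forms=SKIP_FORMS):
    """Return skip-listed tokens unchanged; otherwise drop apostrophes.

    Dropping apostrophes is only a placeholder for whatever
    "correction" a real pipeline would apply.
    """
    if token in skip_forms:
        return token
    return token.replace("'", "")


def normalize_text(text, skip_forms=SKIP_FORMS):
    """Apply normalize_token to every run of letters and apostrophes."""
    return re.sub(
        r"[A-Za-z']+",
        lambda match: normalize_token(match.group(0), skip_forms),
        text,
    )


result = normalize_text("O'Brien can't find the rock'n'roll record")
print(result)  # O'Brien can't find the rocknroll record
```

The skip-listed forms pass through untouched, while the unlisted "rock'n'roll" is altered by the placeholder rule.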
A data frame with 2,644 rows and 3 variables.
Source
textnorm/ECHNAE, qdapDictionaries
Variables
form. the word as-is (e.g., "O'Brien", "can't", "Ass'n")
category. reason for skip-listing, with row counts: proper_noun (972),
dialect_reduction (1,547), standard_contraction (75),
possessive_contraction_ambiguous (22), apocopic (21), possessive (7)
source. data source attribution
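
The per-category counts listed under category can be checked against the stated row total with a short Python sketch. The `CATEGORY_COUNTS` dictionary is simply transcribed from the list above; the variable names are illustrative only.

```python
# Row counts per category, transcribed from the variable description.
CATEGORY_COUNTS = {
    "proper_noun": 972,
    "dialect_reduction": 1547,
    "standard_contraction": 75,
    "possessive_contraction_ambiguous": 22,
    "apocopic": 21,
    "possessive": 7,
}

# The counts should add up to the number of rows in the data frame.
total = sum(CATEGORY_COUNTS.values())
print(total)  # 2644

# Category with the most rows.
largest = max(CATEGORY_COUNTS, key=CATEGORY_COUNTS.get)
print(largest)  # dialect_reduction
```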