A skip-list of English words containing apostrophes that text normalization pipelines should not "correct". It includes proper nouns (e.g., "O'Brien"), standard contractions (e.g., "can't"), possessives, and dialectal or archaic forms that are legitimate as written. Checking tokens against this list prevents false-positive normalization.

Format

A data frame with 2,644 rows and 3 variables.

Source

textnorm/ECHNAE, qdapDictionaries

Variables

  • form. the word as-is (e.g., "O'Brien", "can't", "Ass'n")

  • category. reason for skip-listing: proper_noun (972), dialect_reduction (1,547), standard_contraction (75), possessive_contraction_ambiguous (22), apocopic (21), possessive (7)

  • source. data source attribution
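
A minimal sketch of how a skip-list with these variables might be used to protect tokens during normalization. The dataset's object name is not stated here, so `skip_list` below is a hypothetical two-row stand-in with made-up `source` values; `is_protected()` and `strip_apostrophes()` are also illustrative helpers, not functions from any package. Matching is assumed to be exact and case-sensitive, on straight apostrophes only.

```r
# Hypothetical stand-in for the dataset (same three columns, illustrative rows).
skip_list <- data.frame(
  form     = c("O'Brien", "can't"),
  category = c("proper_noun", "standard_contraction"),
  source   = c("illustrative", "illustrative"),
  stringsAsFactors = FALSE
)

# TRUE for tokens found in the skip-list (exact, case-sensitive match).
is_protected <- function(tokens, skip_forms) {
  tokens %in% skip_forms
}

# Toy normalizer used only for this example: drops straight apostrophes.
strip_apostrophes <- function(x) gsub("'", "", x, fixed = TRUE)

tokens     <- c("O'Brien", "can't", "dont'")
keep       <- is_protected(tokens, skip_list$form)
normalized <- ifelse(keep, tokens, strip_apostrophes(tokens))
print(normalized)

# Sanity check: the per-category counts listed above add up to the row count.
category_counts <- c(
  proper_noun = 972, dialect_reduction = 1547, standard_contraction = 75,
  possessive_contraction_ambiguous = 22, apocopic = 21, possessive = 7
)
print(sum(category_counts))  # 2644
```

With the real data, `skip_list$form` would be replaced by the `form` column of the dataset, and `category` could be used to protect only some classes (for example, keeping proper nouns but still normalizing dialect reductions).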