A dataset containing eight English stoplists. It is used with the get_stoplist() function.

stoplists

Format

A data frame with 1775 rows and 2 variables.

Details

The stoplists include:

  • "tiny2020": Tiny (2020) list of 33 words (Default)

  • "snowball2001": Snowball (2001) list of 127 words

  • "snowball2014": Updated Snowball (2014) list of 175 words

  • "van1979": van Rijsbergen's (1979) list of 250 words

  • "fox1990": Christopher Fox's (1990) list of 421 words

  • "smart1993": Original SMART (1993) list of 570 words

  • "onix2000": ONIX (2000) list of 196 words

  • "nltk2001": Python's NLTK (2009) list of 179 words

Tiny 2020 is a very small stoplist of the most frequent English conjunctions, articles, prepositions, and demonstratives (N=17). It also includes the 8 forms of the copular verb "to be" and the 8 most frequent personal pronouns (singular and plural, excluding gendered and possessive pronouns).

No contractions are included.

Variables

  • words: words to be stopped

  • source: source of the list
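
Examples

The two-column layout described above lends itself to simple subsetting by list. The following is a minimal sketch in base R, not the package's own code: toy_stoplists is a hypothetical stand-in with the same two columns (words and source), and its rows are illustrative only, not the actual contents of stoplists. With the package loaded, the same subsetting should apply to stoplists itself.

```r
# Toy stand-in for the packaged data: same two columns, but only a few
# illustrative rows (NOT the real contents of `stoplists`).
toy_stoplists <- data.frame(
  words  = c("the", "and", "is", "the", "and", "above"),
  source = c("tiny2020", "tiny2020", "tiny2020",
             "snowball2014", "snowball2014", "snowball2014"),
  stringsAsFactors = FALSE
)

# Count how many rows each list contributes.
counts <- table(toy_stoplists$source)
print(counts)

# Keep only the words that belong to one list.
tiny <- toy_stoplists$words[toy_stoplists$source == "tiny2020"]

# Drop those stop words from a tokenized sentence.
tokens <- c("the", "cat", "is", "above", "the", "mat")
kept <- tokens[!tokens %in% tiny]
print(kept)
```

Because a word may appear in more than one list, filtering on source before matching keeps the choice of stoplist explicit; "above" survives here only because the toy "tiny2020" rows do not contain it.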