A dataset of stoplists — stoplists • text2map

A dataset containing eight English stoplists. It is used with the get_stoplist() function.

stoplists

Format

A data frame with 1775 rows and 2 variables.

Details

The stoplists include:

"tiny2020": Tiny (2020) list of 33 words (Default)
"snowball2001": Snowball (2001) list of 127 words
"snowball2014": Updated Snowball (2014) list of 175 words
"van1979": van Rijsbergen's (1979) list of 250 words
"fox1990": Christopher Fox's (1990) list of 421 words
"smart1993": Original SMART (1993) list of 570 words
"onix2000": ONIX (2000) list of 196 words
"nltk2001": Python's NLTK (2009) list of 179 words

Tiny 2020 is a very small stoplist of the most frequent English conjunctions, articles, prepositions, and demonstratives (N=17). It also includes the 8 forms of the copular verb "to be" and the 8 most frequent personal pronouns (singular and plural, excluding gendered and possessive pronouns), for 33 words in total.

No contractions are included.

Variables

words. the words to be stopped
source. the stoplist each word belongs to
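
Examples

A minimal sketch of how a two-column data frame of this shape might be used. The `toy_stoplists` object and its rows are illustrative placeholders, not the actual contents of `stoplists`. The sketch also assumes that `source` holds the identifiers listed under Details (for example "tiny2020"). With text2map installed, the real data should be loadable with `data("stoplists", package = "text2map")`.

```r
# Toy stand-in with the same two columns as `stoplists`.
# These rows are made up for illustration; they are NOT the real data.
toy_stoplists <- data.frame(
  words  = c("the", "and", "of", "the", "because"),
  source = c("tiny2020", "tiny2020", "snowball2014",
             "snowball2014", "snowball2014"),
  stringsAsFactors = FALSE
)

# Count how many words each source contributes
table(toy_stoplists$source)

# Extract a single stoplist as a character vector
tiny <- toy_stoplists$words[toy_stoplists$source == "tiny2020"]

# Remove the stopped words from a tokenized text
tokens <- c("the", "cat", "and", "the", "hat")
kept <- tokens[!tokens %in% tiny]
kept
```

Because the same word can appear under more than one source, subset by `source` before matching rather than matching against the whole `words` column.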