A dataset containing eight English stoplists. It is used
with the get_stoplist() function.
Source
tiny2020: Stoltz and Taylor (2020)
snowball2001: Porter (2001) Snowball stemming algorithm
snowball2014: Porter (2014) updated Snowball stoplist
van1979: van Rijsbergen (1979) "Information Retrieval"
fox1990: Fox (1990) "A Stop List for General Text"
smart1993: Salton and Buckley (1993) SMART retrieval system
onix2000: ONIX (2000) Onix Text Retrieval Toolkit stoplist
nltk2009: Bird, Klein and Loper (2009) NLTK
Details
The stoplists include:
"tiny2020": Tiny (2020) list of 33 words (Default)
"snowball2001": Snowball (2001) list of 127 words
"snowball2014": Updated Snowball (2014) list of 175 words
"van1979": van Rijsbergen's (1979) list of 250 words
"fox1990": Christopher Fox's (1990) list of 421 words
"smart1993": Original SMART (1993) list of 570 words
"onix2000": ONIX (2000) list of 196 words
"nltk2009": Python's NLTK (2009) list of 179 words
Tiny 2020 is a very small stoplist of the most frequent English conjunctions, articles, prepositions, and demonstratives (N = 17). It also includes the 8 forms of the copular verb "to be" and the 8 most frequent personal pronouns, singular and plural (excluding gendered and possessive pronouns).
No contractions are included.
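
A minimal usage sketch of the lists above. This assumes the dataset and get_stoplist() ship with the text2map package (the package name and the character-vector return type are assumptions based on the authorship noted in Source, not confirmed by this page):

```r
library(text2map)  # assumed host package providing get_stoplist()

## default stoplist ("tiny2020"), documented above as 33 words
stops <- get_stoplist()
length(stops)

## request a larger list by its name from the Details list
stops_snowball <- get_stoplist("snowball2001")

## typical use: filter stop words out of a tokenized text
tokens <- c("the", "cat", "is", "on", "the", "mat")
tokens[!tokens %in% stops_snowball]
```

Because the named lists differ widely in size (33 to 570 words), the choice of list trades recall of stop words against the risk of discarding meaningful terms.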
