Function reference
-
corpus_senti_bench4k
- Subset of 6 Corpora for the SentiStrength Benchmark
-
corpus_annual_review
- Abstracts from the Annual Review of Sociology, 2020
-
corpus_atn_immigr
- Balanced Sample of Immigration-Related Articles from the All the News Corpus
-
corpus_beyonce
- Lyrics of Beyoncé's Songs
-
corpus_cmu_blogs100
- Sample of 100 Blogposts from the CMU 2008 Political Blog Corpus
-
corpus_envsociology
- Environmental Sociology Article Abstracts, 1990-2014
-
corpus_europarl_subset
- Sample from European Parliament Proceedings Parallel Corpus
-
corpus_finefoods10k
- Subset of Amazon Fine Food Reviews Corpus, 2011-2012
-
corpus_isot_fake_news2k
- Sample of 2,000 Articles from the ISOT Fake News Dataset
-
corpus_ittpr
- Immigration Think Tank Press Release (ITTPR) Corpus, 1998-2020
-
corpus_presidential
- U.S. Presidential Speeches, 1952-1996
-
corpus_reddit_aita10k
- Subset of Community Ethical Judgements on Real-Life Anecdotes Corpus
-
corpus_taylor_swift
- Lyrics of Taylor Swift's Songs
-
corpus_tng_season5
- Lines from Star Trek: The Next Generation, Season 5
-
corpus_usnss
- National Security Strategy of the United States, 1987-2017
-
corpus_senti_bench
- 6 Corpora for the SentiStrength Benchmark
-
corpus_disaster
- Figure Eight Disaster Tweets
-
corpus_enron
- Internal Emails from Enron Email Corpus
-
corpus_nytimes_covid
- New York Times Articles about COVID-19, 2020
-
corpus_web_dubois
- Lines from Three Books by W.E.B. Du Bois
-
corpus_isot_fake_news
- ISOT Fake News Dataset
-
corpus_dsj_vox
- DSJ VOX Articles Corpus, 2014-2017
-
corpus_pitchfork
- Pitchfork Reviews, 1999-2019
-
corpus_atn
- All The News (ATN) Corpus 1.0, 2015-2017
-
corpus_atn2
- All The News (ATN) Corpus 2.0, 2016-2020
-
corpus_finefoods
- Amazon Fine Food Reviews Corpus, 2011-2012
-
corpus_reddit_aita
- Community Ethical Judgements on Real-Life Anecdotes Corpus
-
corpus_black_mirror
- Lines from Black Mirror
-
corpus_scifi_pulp
- 20th Century Science Fiction Pulp Magazines
-
corpus_moral_stories
- Moral Stories
-
download_corpus()
- Download specified corpus
-
tweetids_covid
- Tweet IDs for 1,922 tweets using #Covid19 collected in 2021
-
tweetids_covid_geo
- Tweet IDs for 1,999 geo-tagged tweets using #Covid19 collected in 2021
-
tweetids_gme
- Tweet IDs of 15,594 tweets using the $GME (GameStop Ticker)
-
tweetids_stayhome
- Tweet IDs for 23,737 tweets using #StayHome collected in 2021