Displays a table of all corpora available in the
text2map.corpora package, along with their metadata.
Arguments
- type
Optional filter: "bundled" for corpora included with the package, "download" for corpora that must be downloaded, or NULL for all.
- category
Optional filter: "corpus" for text corpora, "tweetids" for tweet ID lists, or NULL for all.
- downloaded_only
If TRUE, show only corpora that are available locally: corpora that have already been downloaded, plus bundled corpora, which are always included.
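The three filters described above can be pictured with a small, self-contained sketch. It does not call the package: `index` is a hand-built data frame with three rows copied from the listing, and `filter_index()` is a hypothetical helper written only for illustration. It also assumes that multiple filters combine with a logical AND, which the documentation does not state explicitly.

```r
# Hypothetical stand-in for the corpus index; the rows are copied from the
# table printed by list_corpora(). This is not the package's internal data.
index <- data.frame(
  corpus     = c("corpus_usnss", "corpus_enron", "tweetids_gme"),
  type       = c("bundled", "download", "download"),
  category   = c("corpus", "corpus", "tweetids"),
  downloaded = c("Bundled", "Not downloaded", "Not downloaded"),
  stringsAsFactors = FALSE
)

# filter_index() is an illustrative helper, not part of text2map.corpora.
# NULL means "no filter", mirroring the argument descriptions.
# Assumption: filters are combined with AND.
filter_index <- function(index, type = NULL, category = NULL,
                         downloaded_only = FALSE) {
  keep <- rep(TRUE, nrow(index))
  if (!is.null(type)) keep <- keep & index$type == type
  if (!is.null(category)) keep <- keep & index$category == category
  # Bundled corpora count as locally available, so they are always kept.
  if (downloaded_only) keep <- keep & index$downloaded != "Not downloaded"
  index[keep, , drop = FALSE]
}

# Downloadable text corpora only
filter_index(index, type = "download", category = "corpus")$corpus

# Locally available corpora only
filter_index(index, downloaded_only = TRUE)$corpus
```

With this toy index, the first call keeps only `corpus_enron` and the second keeps only `corpus_usnss`, since nothing has been downloaded.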
Examples
list_corpora()
#> Available corpora:
#> corpus                   type      category   n_rows  n_cols  downloaded
#> corpus_annual_review     bundled   corpus         70       7  Bundled
#> corpus_tng_season5       bundled   corpus      10834       5  Bundled
#> corpus_usnss             bundled   corpus         18       2  Bundled
#> corpus_envsociology      bundled   corpus        817       8  Bundled
#> corpus_cmu_blogs100      bundled   corpus        100       6  Bundled
#> corpus_europarl_subset   bundled   corpus      10000       4  Bundled
#> corpus_beyonce           bundled   corpus         83      10  Bundled
#> corpus_taylor_swift      bundled   corpus        110      10  Bundled
#> corpus_presidential      bundled   corpus       2475      13  Bundled
#> corpus_senti_bench4k     bundled   corpus       4044       6  Bundled
#> corpus_isot_fake_news2k  bundled   corpus       2000       5  Bundled
#> corpus_finefoods10k      bundled   corpus       9999       9  Bundled
#> corpus_atn_immigr        bundled   corpus       3230       8  Bundled
#> corpus_ittpr             bundled   corpus        976       7  Bundled
#> corpus_reddit_aita10k    bundled   corpus      10157      18  Bundled
#> corpus_enron             download  corpus      30965       7  Not downloaded
#> corpus_nytimes_covid     download  corpus        982      28  Not downloaded
#> corpus_disaster          download  corpus      10860       3  Not downloaded
#> corpus_web_dubois        download  corpus      12757       5  Not downloaded
#> corpus_isot_fake_news    download  corpus      44244       5  Not downloaded
#> corpus_pitchfork         download  corpus      20783      13  Not downloaded
#> corpus_dsj_vox           download  corpus      22789       8  Not downloaded
#> corpus_atn               download  corpus     204135      13  Not downloaded
#> corpus_atn2              download  corpus    2688879      11  Not downloaded
#> corpus_finefoods         download  corpus      50000       9  Not downloaded
#> corpus_reddit_aita       download  corpus      32766      18  Not downloaded
#> corpus_senti_bench       download  corpus      11557       6  Not downloaded
#> corpus_black_mirror      download  corpus      18972       5  Not downloaded
#> corpus_scifi_pulp        download  corpus       2110      11  Not downloaded
#> corpus_moral_stories     download  corpus      24000      10  Not downloaded
#> tweetids_covid           download  tweetids     1922       1  Not downloaded
#> tweetids_covid_geo       download  tweetids     1999       1  Not downloaded
#> tweetids_stayhome        download  tweetids    23737       1  Not downloaded
#> tweetids_gme             download  tweetids    15594       1  Not downloaded
list_corpora(type = "bundled")
#> Corpus results:
#> corpus                   type      category   n_rows  n_cols  downloaded
#> corpus_annual_review     bundled   corpus         70       7  Bundled
#> corpus_tng_season5       bundled   corpus      10834       5  Bundled
#> corpus_usnss             bundled   corpus         18       2  Bundled
#> corpus_envsociology      bundled   corpus        817       8  Bundled
#> corpus_cmu_blogs100      bundled   corpus        100       6  Bundled
#> corpus_europarl_subset   bundled   corpus      10000       4  Bundled
#> corpus_beyonce           bundled   corpus         83      10  Bundled
#> corpus_taylor_swift      bundled   corpus        110      10  Bundled
#> corpus_presidential      bundled   corpus       2475      13  Bundled
#> corpus_senti_bench4k     bundled   corpus       4044       6  Bundled
#> corpus_isot_fake_news2k  bundled   corpus       2000       5  Bundled
#> corpus_finefoods10k      bundled   corpus       9999       9  Bundled
#> corpus_atn_immigr        bundled   corpus       3230       8  Bundled
#> corpus_ittpr             bundled   corpus        976       7  Bundled
#> corpus_reddit_aita10k    bundled   corpus      10157      18  Bundled
list_corpora(category = "tweetids")
#> Corpus results:
#> corpus                   type      category   n_rows  n_cols  downloaded
#> tweetids_covid           download  tweetids     1922       1  Not downloaded
#> tweetids_covid_geo       download  tweetids     1999       1  Not downloaded
#> tweetids_stayhome        download  tweetids    23737       1  Not downloaded
#> tweetids_gme             download  tweetids    15594       1  Not downloaded
list_corpora(downloaded_only = TRUE)
#> Corpus results:
#> corpus                   type      category   n_rows  n_cols  downloaded
#> corpus_annual_review     bundled   corpus         70       7  Bundled
#> corpus_tng_season5       bundled   corpus      10834       5  Bundled
#> corpus_usnss             bundled   corpus         18       2  Bundled
#> corpus_envsociology      bundled   corpus        817       8  Bundled
#> corpus_cmu_blogs100      bundled   corpus        100       6  Bundled
#> corpus_europarl_subset   bundled   corpus      10000       4  Bundled
#> corpus_beyonce           bundled   corpus         83      10  Bundled
#> corpus_taylor_swift      bundled   corpus        110      10  Bundled
#> corpus_presidential      bundled   corpus       2475      13  Bundled
#> corpus_senti_bench4k     bundled   corpus       4044       6  Bundled
#> corpus_isot_fake_news2k  bundled   corpus       2000       5  Bundled
#> corpus_finefoods10k      bundled   corpus       9999       9  Bundled
#> corpus_atn_immigr        bundled   corpus       3230       8  Bundled
#> corpus_ittpr             bundled   corpus        976       7  Bundled
#> corpus_reddit_aita10k    bundled   corpus      10157      18  Bundled
