Download specified corpus — download_corpus • text2map.corpora

Downloads a designated corpus available for the `text2map.corpora` package, hosted on GitLab. While some corpora are included when the package is installed, others need to be downloaded once per machine. Downloading may take a while. If no location is specified, the file is saved in the package's data folder, allowing the corpus to be loaded with `data()`. If a location other than the package's data folder is specified, the corpus can be loaded with `load()`.

Usage

download_corpus(
  corpus = c("corpus_enron", "corpus_nytimes_covid", "corpus_disaster",
    "corpus_web_dubois", "corpus_isot_fake_news", "corpus_pitchfork", "corpus_finefoods",
    "corpus_senti_bench", "corpus_reddit_aita", "corpus_atn", "corpus_atn2",
    "corpus_dsj_vox", "corpus_black_mirror", "corpus_scifi_pulp", "corpus_moral_stories",
    "tweetids_covid_geo", "tweetids_covid", "tweetids_gme", "tweetids_stayhome"),
  location = NULL,
  force = FALSE,
  quiet = FALSE
)

Arguments

corpus: Character string naming the corpus to be downloaded
location: Default is `NULL`, which saves the corpus in the R package data folder. If desired, specify a different location for saving the corpus (Note: if saved elsewhere, the corpus must be loaded with `load()`, not `data()`)
force: Logical (default `FALSE`). If the corpus already exists locally, the download will be stopped unless `TRUE`.
quiet: Logical (default `FALSE`) to mute messages
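
The two loading paths described above can be sketched as follows. This is a minimal illustration, not a tested recipe: `corpus_dir` is a hypothetical folder, the trailing separator on `location` is a defensive assumption, and because the name of the saved file is not documented here, the sketch looks the file up rather than guessing it. Both downloads require network access.

```r
library(text2map.corpora)

# Default location: the corpus is saved in the package data folder,
# so it can be loaded with data() afterwards.
download_corpus("corpus_black_mirror")
data("corpus_black_mirror", package = "text2map.corpora")

# Custom location: corpus_dir is a hypothetical folder chosen for illustration.
corpus_dir <- file.path(tempdir(), "corpora")
dir.create(corpus_dir, showWarnings = FALSE, recursive = TRUE)

# Assumption: a trailing separator is harmless and avoids ambiguity in
# how the function joins the folder and the file name.
download_corpus(
  "corpus_black_mirror",
  location = paste0(corpus_dir, "/"),
  quiet = TRUE
)

# The saved file name is not stated in this page, so find it by pattern.
saved <- list.files(corpus_dir, pattern = "corpus_black_mirror", full.names = TRUE)

# A corpus saved outside the package data folder is loaded with load().
if (length(saved) > 0) load(saved[1])
```

Calling `download_corpus()` again for a corpus that already exists locally should stop unless `force = TRUE` is supplied.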

Details

The package also includes curated lists of Tweet IDs which (in theory) can be "rehydrated" to rebuild a Tweet corpus:

- `tweetids_covid_geo`
- `tweetids_covid`
- `tweetids_gme`
- `tweetids_stayhome`

Author

Dustin Stoltz