Downloads a designated corpus for the text2map.corpora package from its host on GitLab. Some corpora are included when the package is installed; the others need to be downloaded only once per machine. Downloading may take a while. If no location is specified, the file is saved in the package's data folder, and the corpus can then be loaded with load_corpus(). If a location other than the package's data folder is specified, the corpus can be loaded with load_corpus(corpus, location).
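
As a minimal sketch of the workflow described above, the wrapper below downloads a corpus to a custom folder and then loads it from that same folder. The helper name get_corpus, the use of a temporary directory, and the assumption that location is a directory path are illustrative and not part of the package. The corpus name passed in must be one the package actually offers.

```r
# Hypothetical convenience wrapper (not part of text2map.corpora).
# Assumption: `location` is a directory path.
get_corpus <- function(corpus, location = file.path(tempdir(), "corpora")) {
  if (!requireNamespace("text2map.corpora", quietly = TRUE)) {
    stop("Package 'text2map.corpora' is required.")
  }

  # Create the target folder if it does not exist yet
  dir.create(location, recursive = TRUE, showWarnings = FALSE)

  # Download with the documented defaults for force and quiet
  text2map.corpora::download_corpus(
    corpus = corpus, location = location, force = FALSE, quiet = FALSE
  )

  # A custom location must also be passed when loading
  text2map.corpora::load_corpus(corpus, location)
}

# Usage (replace the placeholder with a real corpus name):
# get_corpus("<corpus name>")
```

Because force is left at FALSE here, a corpus that already exists locally is not downloaded again.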

Usage

download_corpus(corpus = NULL, location = NULL, force = FALSE, quiet = FALSE)

Arguments

corpus

Character string giving the name of the corpus to download

location

Default is NULL, which saves the corpus in the R package's data folder. Specify a location to save the corpus elsewhere (Note: if saved elsewhere, use load_corpus(corpus, location) instead of data())

force

Logical (default FALSE). If the corpus already exists locally, the download is stopped unless force = TRUE.

quiet

Logical (default FALSE). If TRUE, messages are muted.

Details

The function tries to download the corpus in one of the following formats, in order of priority:

  1. .qs2 - Fastest loading (recommended)

  2. .fst - Fast loading, data.frame only

  3. .rda - Standard R format with best compression
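
The priority order above can be illustrated with a small sketch. The helper pick_format is hypothetical and is not the package's internal code. It only shows the idea of choosing the first format in the priority list that is actually available.

```r
# Hypothetical helper illustrating a format-priority fallback.
# `available` is a character vector of file extensions on offer.
pick_format <- function(available, priority = c("qs2", "fst", "rda")) {
  # Keep only the available formats, preserving the priority order
  hits <- priority[priority %in% available]
  if (length(hits) == 0) {
    stop("None of the supported formats is available.")
  }
  hits[[1]]
}

pick_format(c("rda", "fst"))  # returns "fst"
pick_format("rda")            # returns "rda"
```

Filtering the priority vector, rather than the available one, keeps the result independent of the order in which the available formats are listed.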

The package also includes curated lists of Tweet IDs which (in theory) can be "rehydrated" to rebuild a Tweet corpus:

  • tweetids_covid_geo

  • tweetids_covid

  • tweetids_gme

  • tweetids_stayhome

Author

Dustin Stoltz