A dataset of 10,876 tweets that use disaster-related terms such as "aftershock," "twister," and "rescued." A team of coders tagged each tweet as "relevant" (actually about a disaster) or "not relevant" (for example, about a movie or a joke). The data were originally collected by the company Figure Eight, which Appen acquired in March 2019, and were also used in an NLP competition hosted on Kaggle.

data("corpus_disaster")

Format

A data frame with 10860 rows and 3 variables.

Source

https://www.kaggle.com/competitions/nlp-getting-started

Variables

  • doc_id. Unique ID for each tweet

  • text. Text of the tweet

  • relevant. Coder tag indicating whether the tweet is "Relevant" to a disaster or "Not Relevant"