A dataset containing 22,789 Vox articles. Vox Media provided all articles published before March 21, 2017, for the KDD 2017 Workshop on Data Science + Journalism (DSJ). The data were made available by Elena Zheleva on data.world. Ten articles were removed because they did not parse correctly.

Usage

data("corpus_dsj_vox")

Format

A data frame with 22,789 rows and 9 variables.

Source

https://data.world/elenadata/vox-articles

Variables

  • doc_id. Unique identifier for each article.

  • title. Title of the article.

  • author. Author of the article.

  • category. Category of the article (185 categories in total).

  • published_date. Date the article was first published.

  • updated_on. Date the article was most recently updated.

  • slug. URL slug of the article.

  • blurb. Blurb (short summary) of the article.

  • body. Full text of the article.
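A minimal sketch of how the variables above might be summarized in R. The helpers count_by_category() and body_word_count() are hypothetical and are not part of the package. They assume only that the data frame has category and body columns as described above, and the word count is a rough split on whitespace rather than proper tokenization.

```r
# Hypothetical helper (not part of the package): count articles per category,
# most frequent first. as.character() guards against category being a factor.
count_by_category <- function(articles) {
  counts <- sort(table(as.character(articles$category)), decreasing = TRUE)
  data.frame(
    category = names(counts),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: approximate word count of each article body,
# trimming leading/trailing whitespace and splitting on runs of whitespace.
body_word_count <- function(articles) {
  lengths(strsplit(trimws(as.character(articles$body)), "\\s+"))
}
```

After loading the data with data("corpus_dsj_vox") (which requires the package that ships this dataset to be attached), count_by_category(corpus_dsj_vox) lists the categories by number of articles, and summary(body_word_count(corpus_dsj_vox)) gives a rough distribution of article lengths.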