A dataset of 22,789 Vox articles. Vox Media provided all articles published before March 21, 2017, for the KDD 2017 Workshop on Data Science + Journalism (DSJ), and Elena Zheleva made them available on data.world. Ten articles were removed because they could not be parsed accurately.
Usage
data("corpus_dsj_vox")
Variables
doc_id. Unique identifier for each article
title. Title of the news article
author. Author of the news article
category. Category of the news article (185 categories in total)
published_date. Date article was first published
updated_on. Date article was most recently updated
slug. Article slug
blurb. Article blurb
body. Full text of the article
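Example
The variables above lend themselves to simple tabulation and date filtering. The following is a minimal sketch, not part of the dataset's own documentation: the helpers count_by_category() and published_between() are hypothetical names, the code assumes the object is a data frame whose published_date column can be coerced with as.Date(), and the toy rows are made up so the sketch runs without the real data.

```r
# Hypothetical helper: count articles per category, most frequent first.
# Assumes `articles` is a data frame with a `category` column.
count_by_category <- function(articles) {
  counts <- table(articles$category, useNA = "ifany")
  sort(counts, decreasing = TRUE)
}

# Hypothetical helper: keep articles first published within [from, to].
# Assumes `published_date` can be coerced with as.Date().
published_between <- function(articles, from, to) {
  published <- as.Date(articles$published_date)
  keep <- !is.na(published) &
    published >= as.Date(from) &
    published <= as.Date(to)
  articles[keep, , drop = FALSE]
}

# Made-up stand-in rows using a subset of the documented columns.
toy <- data.frame(
  doc_id = c("a1", "a2", "a3"),
  category = c("Politics", "Science", "Politics"),
  published_date = c("2016-11-09", "2017-01-15", "2017-03-01"),
  stringsAsFactors = FALSE
)

top_categories <- count_by_category(toy)
recent <- published_between(toy, "2017-01-01", "2017-03-21")
```

With the real data loaded via data("corpus_dsj_vox"), the same helpers could be called as count_by_category(corpus_dsj_vox), provided the object has the structure assumed here.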