The All The News (ATN) Corpus 1.0 contains 204,135 news articles from 15 news organizations, collected in 2017 by Andrew Thompson. According to the creator: "For each publication, I used archive.org to grab the past year-and-a-half of either home-page headlines or RSS feeds and ran those links through the scraper. That is, the articles are not the product of scraping an entire site, but rather their more prominently placed articles."
Usage
data("corpus_atn")
Details
News organizations include:
New York Times
CNN
Breitbart
Business Insider
The Atlantic
Fox News
Talking Points Memo
New York Post
Buzzfeed News
National Review
The Guardian
NPR
Reuters
Vox
The Washington Post
Variables
doc_id. Unique ID for each article
title. Title of the article
author. Author of the article (if provided)
date. Date of publication
content. Full text of the article
publication. News organization publishing the article
category.
section.
url. URL of article (many do not have URLs)
digital.
year. Year of publication
month. Month of publication
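Examples
A minimal sketch of how the variables above might be used to tabulate articles per publication and year. The helper count_articles() is hypothetical and not part of the package; it assumes only that the data frame has the publication and year columns listed above. The commented usage lines assume the package that ships corpus_atn is already attached.

```r
# Hypothetical helper (not part of the package): count articles per
# publication and year in a data frame that has the documented
# `publication` and `year` columns.
count_articles <- function(corpus) {
  stopifnot(all(c("publication", "year") %in% names(corpus)))

  # Cross-tabulate, then reshape the table into long format.
  counts <- as.data.frame(
    table(publication = corpus$publication, year = corpus$year),
    responseName = "n_articles",
    stringsAsFactors = FALSE
  )

  # Drop publication/year combinations with no articles.
  counts <- counts[counts$n_articles > 0, ]
  counts[order(counts$publication, counts$year), ]
}

# Usage, assuming the package providing the data set is attached:
# data("corpus_atn")
# head(count_articles(corpus_atn))
```

Because the corpus holds prominently placed articles rather than full site archives, counts like these describe the sample, not each organization's total output.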