Sample of 100 Blogposts from the CMU 2008 Political Blog Corpus — corpus_cmu

A dataset containing 100 posts randomly sampled from the CMU 2008 Political Blog Corpus.

Usage

data(corpus_cmu_blogs100)

Format

A data frame with 100 rows and 6 variables:

row_number. Row number in the original CMU dataset.
documents. Text of the blog posts.
docname. Unique document identifier.
rating. Factor variable giving the partisan affiliation of the blog (based on which presidential candidate the blog supported).
day. Day of the year (1 to 365). All entries are from 2008.
blog. Two- or three-letter character code identifying the blog: American Thinker (at), Digby (db), Hot Air (ha), Michelle Malkin (mm), Think Progress (tp), Talking Points Memo (tpm).

Source

http://reports-archive.adm.cs.cmu.edu/anon/ml2010/CMU-ML-10-101.pdf
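
Examples

The sketch below shows one way the blog and day variables might be decoded into readable values. It does not load the packaged data. The `toy` data frame is a hypothetical stand-in with made-up rows, and the names `blog_names`, `blog_name`, and `date` are introduced here for illustration only. It also assumes that day counts from 1 on January 1, 2008, which the documentation does not state explicitly.

```r
# Hypothetical stand-in rows using two of the documented columns.
# These values are invented for illustration; they are not corpus data.
toy <- data.frame(
  day  = c(1, 60, 309),
  blog = c("at", "tpm", "db"),
  stringsAsFactors = FALSE
)

# Lookup table built from the blog codes listed in the variable description.
blog_names <- c(
  at  = "American Thinker",
  db  = "Digby",
  ha  = "Hot Air",
  mm  = "Michelle Malkin",
  tp  = "Think Progress",
  tpm = "Talking Points Memo"
)

# Map each code to its full blog name.
toy$blog_name <- unname(blog_names[toy$blog])

# Convert day of year to a calendar date.
# Assumption: day 1 corresponds to 2008-01-01.
toy$date <- as.Date(toy$day - 1, origin = "2008-01-01")

print(toy)
```

With the package attached, the same two transformations could be applied to the data frame loaded by data(corpus_cmu_blogs100) in place of `toy`.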