A random sample of 100 posts from the CMU 2008 Political Blog Corpus

Usage

data(corpus_cmu_blogs100)

Format

A data frame with 100 rows and 6 variables.

Source

http://reports-archive.adm.cs.cmu.edu/anon/ml2010/CMU-ML-10-101.pdf

Variables

  • row_number. Row number in the original CMU dataset

  • documents. Text of the blog posts

  • docname. Unique document identifier

  • rating. Factor variable giving the partisan affiliation of the blog (based on which presidential candidate the blog supported)

  • day. Day of the year (1 to 366, since 2008 was a leap year). All entries are from 2008.

  • blog. Two- or three-letter character code identifying the blog: American Thinker (at), Digby (db), Hot Air (ha), Michelle Malkin (mm), Think Progress (tp), Talking Points Memo (tpm)
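
Examples

The variables described above can be explored with a few lines of base R. The following is a minimal sketch, not part of the package: day_to_date() and posts_by_blog() are hypothetical helper names introduced here for illustration, and the usage lines assume the package that provides the data is already attached. Because all entries are from 2008, a day-of-year value can be mapped to a calendar date by counting from 1 January 2008.

```r
# Hypothetical helper (not part of the package): convert a 2008
# day-of-year value to a calendar date. Day 1 maps to 2008-01-01.
day_to_date <- function(day) {
  as.Date(day - 1, origin = "2008-01-01")
}

# Hypothetical helper (not part of the package): count posts per blog,
# cross-tabulated with the partisan rating.
posts_by_blog <- function(df) {
  table(blog = df$blog, rating = df$rating)
}

# Usage, assuming the package that provides the data is attached:
# data(corpus_cmu_blogs100)
# dim(corpus_cmu_blogs100)   # documented as 100 rows and 6 variables
# posts_by_blog(corpus_cmu_blogs100)
# corpus_cmu_blogs100$date <- day_to_date(corpus_cmu_blogs100$day)
```

Keeping the helpers separate from the data-loading step means they can be checked on any data frame that has blog, rating, and day columns.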