A dataset containing 2,475 speeches given by U.S. presidential candidates. This is a subset of the Annenberg/Pew archive of presidential campaign discourse. Data were cleaned using code adapted from Diana Reddy (https://github.com/dsreddy80) and prepared on January 9th, 2021.

data(corpus_presidential)

Format

A data frame with 2475 rows and 13 variables.

Source

https://dss2.princeton.edu/data/95/

Variables

  • doc_id. Unique identifier for each speech

  • year. Year the speech was delivered

  • month. Month the speech was delivered

  • day. Day of the month the speech was delivered

  • candidate. Last name of the presidential candidate

  • party. Political party of the candidate

  • state. State in which the speech was delivered

  • city. City in which the speech was delivered

  • length. Length of the speech in words

  • subjects. Subjects discussed, as determined by the original corpus compilers

  • description. Brief description, as determined by the original corpus compilers
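
Example

As a rough sketch of how the documented columns might be used, the base R snippet below builds a date from year, month, and day, then summarizes speeches per candidate. To stay runnable without the package attached, it uses a tiny stand-in data frame (`toy`, a hypothetical name with invented values) that has a few of the documented column names. It also assumes that month is stored as a number, which the variable descriptions above do not state.

```r
# Hypothetical stand-in with a few of the documented columns.
# The values are invented for illustration; they are not from the dataset.
toy <- data.frame(
  doc_id    = c("a1", "a2", "a3"),
  year      = c(1996, 1996, 1992),
  month     = c(10, 11, 9),   # assumption: months stored as numbers
  day       = c(5, 1, 20),
  candidate = c("Clinton", "Dole", "Clinton"),
  length    = c(1200, 800, 1500),
  stringsAsFactors = FALSE
)

# Combine year, month, and day into a single Date column
toy$date <- as.Date(ISOdate(toy$year, toy$month, toy$day))

# Number of speeches per candidate
speech_counts <- table(toy$candidate)

# Mean speech length (in words) per candidate
mean_length <- tapply(toy$length, toy$candidate, mean)
```

With the real data, the same steps would presumably apply after calling data(corpus_presidential) and substituting corpus_presidential for toy, provided the column types match the assumptions above.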