A dataset of 30,965 emails drawn from the Enron Email Dataset. It includes only emails that were stored in "inbox" folders and that were internal (sent both from and to Enron email addresses). The original Enron Email Dataset was collected by the CALO Project (A Cognitive Assistant that Learns and Organizes) and contains about 500,000 emails.

data("corpus_enron")

Format

A data frame with 30,965 rows and 7 variables.

Source

https://www.cs.cmu.edu/~enron/

Details

  • doc_id. Unique identifier for each email

  • folder. Employee account (mailbox) in which the email was stored

  • from. Sender of the email

  • to. Recipients of the email (may contain multiple email addresses)

  • date_time. Date and time the email was sent

  • subject. Subject of the email

  • text. Main text of the email
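
Examples

The variables listed above can be checked after loading the data. This is a minimal sketch, not part of the original documentation: it assumes the package that provides corpus_enron is already attached, and it uses only base R functions.

```r
# Assumption: the package that ships corpus_enron has already been
# attached with library(); its name is not stated in this help page.
data("corpus_enron")

# Compare against the documented shape: 30,965 rows and 7 variables.
dim(corpus_enron)

# List the seven documented columns.
names(corpus_enron)

# Count emails per employee account using the documented `folder` column,
# and show the accounts with the most emails.
head(sort(table(corpus_enron$folder), decreasing = TRUE))
```

Because the to column may hold several addresses per email, it may need to be split before counting recipients; the separator used is not documented here.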