
A dataset consisting of a random sample of roughly 10,000 posts (10,157 rows) from the "Am I the Asshole" (AITA) Reddit forum.

Usage

data(corpus_reddit_aita10k)

Format

A data frame with 10157 rows and 18 variables.

Source

https://github.com/allenai/scruples#data

Variables

  • doc_id. Unique ID for each post

  • action_description. Summary from the post's title

  • action_pronormative_score. How many community members rated the author as not in the wrong

  • action_contranormative_score. How many community members rated the author as in the wrong

  • title. Title of the post

  • text. Text of the post

  • post_type. Historical (the author did it) or Hypothetical (author is considering doing it)

  • label_scores_author. Count for author is wrong

  • label_scores_other. Count for other person is wrong

  • label_scores_everybody. Count for everyone is wrong

  • label_scores_nobody. Count for nobody is wrong

  • label_scores_moreinfo. Count for more information is needed

  • label. Majority label

  • binarized_label_scores_right. Sum of votes for "other" and "nobody" (i.e. author is in the right)

  • binarized_label_scores_wrong. Sum of votes for "author" and "everybody" (i.e. author is in the wrong)

  • binarized_label. Majority binarized label

  • set. Whether the post is in the development set, the training set, or the test set
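
The relationship between the per-label vote counts and the binarized columns can be sketched in base R. This is a minimal illustration on a toy data frame that reuses the documented column names; the values are invented, and the label strings "RIGHT" and "WRONG" and the tie-breaking rule are assumptions, not confirmed properties of the dataset.

```r
# Toy rows that mimic the documented column names; the values are invented.
posts <- data.frame(
  doc_id = c("a1", "a2", "a3"),
  label_scores_author = c(5L, 0L, 2L),
  label_scores_other = c(1L, 7L, 2L),
  label_scores_everybody = c(2L, 0L, 1L),
  label_scores_nobody = c(0L, 3L, 0L),
  label_scores_moreinfo = c(1L, 0L, 4L),
  stringsAsFactors = FALSE
)

# Binarized counts as described in the variable list:
# "other" + "nobody" versus "author" + "everybody".
# Votes for "more information" enter neither sum.
posts$binarized_label_scores_right <-
  posts$label_scores_other + posts$label_scores_nobody
posts$binarized_label_scores_wrong <-
  posts$label_scores_author + posts$label_scores_everybody

# Majority binarized label. The strings "RIGHT" and "WRONG" and the
# rule that ties go to "WRONG" are assumptions made for illustration.
posts$binarized_label <- ifelse(
  posts$binarized_label_scores_right > posts$binarized_label_scores_wrong,
  "RIGHT", "WRONG"
)

print(posts[, c("doc_id", "binarized_label_scores_right",
                "binarized_label_scores_wrong", "binarized_label")])
```

On the real data, the same sums could be compared against the shipped binarized_label_scores_right and binarized_label_scores_wrong columns as a consistency check.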

References

Lourie, N., Le Bras, R. and Choi, Y., 2021, May. Scruples: A corpus of community ethical judgments on 32,000 real-life anecdotes. In Proceedings of the AAAI Conference on Artificial Intelligence.