Subset of Community Ethical Judgements on Real-Life Anecdotes Corpus — corpus_reddit

A dataset consisting of a random sample of about 10,000 posts (10157 rows, see Format) from the "Am I the Asshole" (AITA) Reddit forum.

Usage

data(corpus_reddit_aita10k)

Format

A data frame with 10157 rows and 18 variables.

Source

https://github.com/allenai/scruples#data

Variables

doc_id. Unique ID for each post
action_description. Summary from the post's title
action_pronormative_score. How many community members rated the author as not in the wrong
action_contranormative_score. How many community members rated the author as in the wrong
title. Title of the post
text. Text of the post
post_type. Historical (the author did it) or Hypothetical (author is considering doing it)
label_scores_author. Number of votes that the author is in the wrong
label_scores_other. Number of votes that the other person is in the wrong
label_scores_everybody. Number of votes that everybody is in the wrong
label_scores_nobody. Number of votes that nobody is in the wrong
label_scores_moreinfo. Number of votes that more information is needed
label. Majority label
binarized_label_scores_right. Sum of votes for "other" and "nobody" (i.e. author is in the right)
binarized_label_scores_wrong. Sum of votes for "author" and "everybody" (i.e. author is in the wrong)
binarized_label. Majority binarized label
set. Whether the post is in the development set, the training set, or the test set
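
The relationship between the label_scores_* columns and the binarized columns described above can be sketched in R. This is a minimal illustration on a toy data frame with invented vote counts, not the packaged data. The label values "RIGHT" and "WRONG" and the handling of ties are assumptions made for the example; check the actual values in the data before relying on them.

```r
# Toy data frame with invented vote counts; column names follow the variable list.
toy <- data.frame(
  doc_id = c("a", "b", "c"),
  label_scores_author = c(5, 1, 2),
  label_scores_other = c(1, 7, 2),
  label_scores_everybody = c(2, 0, 3),
  label_scores_nobody = c(0, 3, 1),
  label_scores_moreinfo = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# Binarized scores as described in the variable list:
# "other" + "nobody" means the author is in the right,
# "author" + "everybody" means the author is in the wrong.
toy$binarized_label_scores_right <-
  toy$label_scores_other + toy$label_scores_nobody
toy$binarized_label_scores_wrong <-
  toy$label_scores_author + toy$label_scores_everybody

# Majority binarized label. The label strings and the tie rule
# (ties fall to "WRONG") are assumptions for this sketch.
toy$binarized_label <- ifelse(
  toy$binarized_label_scores_right > toy$binarized_label_scores_wrong,
  "RIGHT", "WRONG"
)

print(toy[, c("doc_id", "binarized_label_scores_right",
              "binarized_label_scores_wrong", "binarized_label")])
```

Note that votes for "more information is needed" do not enter either binarized score.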

References

Lourie, N., Le Bras, R. and Choi, Y. (2021). Scruples: A corpus of community ethical judgments on 32,000 real-life anecdotes. In Proceedings of the AAAI Conference on Artificial Intelligence.