A random sample of 50,000 Amazon reviews of fine foods written between Oct. 2011 and Oct. 2012. For each review, the data record how many users rated its helpfulness and how many of them found it helpful (the denominator and numerator of the helpfulness fraction). We selected one year from the original dataset of 568,454 reviews, which spans 1999-2012, and then randomly sampled 50,000 reviews from that subset.
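The two-step selection described above (restrict to one year, then draw a simple random sample) can be sketched roughly as follows. This is an illustration only, not the code used to build the dataset: the toy data frame `full`, the exact window boundaries `start` and `end`, the seed, and the sample size of 100 are all assumptions made for the example.

```r
# Hypothetical stand-in for the full review collection (values are made up).
set.seed(42)  # assumed seed, only so the sketch is reproducible
full <- data.frame(
  review_id = seq_len(2000),
  datetime = seq(as.POSIXct("2010-01-01", tz = "UTC"),
                 as.POSIXct("2012-12-31", tz = "UTC"),
                 length.out = 2000)
)

# Step 1: keep a one-year window (boundary dates are assumed, not documented).
start <- as.POSIXct("2011-10-01", tz = "UTC")
end   <- as.POSIXct("2012-10-01", tz = "UTC")
one_year <- full[full$datetime >= start & full$datetime < end, ]

# Step 2: draw a simple random sample without replacement from that window.
n_sample <- 100  # the real dataset uses 50,000
sampled <- one_year[sample(nrow(one_year), n_sample), ]
```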

data(corpus_finefoods)

Format

A data frame with 50000 rows and 9 variables.

Source

http://snap.stanford.edu/data/web-FineFoods.html

Variables

  • review_id. Unique ID for the review

  • product_id. Unique ID for the product being reviewed

  • user_id. Unique ID for the reviewer

  • profile_name. Name of the reviewer

  • helpfulness_numerator. Number of users who found the review helpful

  • helpfulness_denominator. Total number of users rating helpfulness

  • score. Rating of the product by the reviewer

  • summary. Review summary

  • text. Text of the review

  • datetime. Time of the review (datetime, in UTC)
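Because helpfulness is stored as a separate numerator and denominator, the fraction has to be computed by the user. A minimal sketch, assuming a data frame with the column names listed above: the toy `reviews` object and its values are made up for illustration, and returning `NA` for reviews that nobody rated is one possible convention, not something the dataset prescribes.

```r
# Toy stand-in using the same column names as corpus_finefoods (made-up values).
reviews <- data.frame(
  review_id = 1:4,
  helpfulness_numerator = c(3, 0, 5, 1),
  helpfulness_denominator = c(4, 0, 5, 2),
  score = c(5, 1, 4, 3)
)

# Fraction of raters who found each review helpful.
# Reviews with no helpfulness ratings get NA rather than 0/0 (NaN).
reviews$helpfulness <- ifelse(
  reviews$helpfulness_denominator > 0,
  reviews$helpfulness_numerator / reviews$helpfulness_denominator,
  NA_real_
)

# Mean helpfulness among reviews that received at least one rating.
mean_helpfulness <- mean(reviews$helpfulness, na.rm = TRUE)
```

The same two lines should work on the real data after `data(corpus_finefoods)`, with `reviews` replaced by `corpus_finefoods`.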

References

J. McAuley and J. Leskovec (2013). From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. WWW.