A dataset containing the first 5,000 lines of the French text and the first 5,000 lines of its English translation, taken from the European Parliament Proceedings Parallel Corpus 1996-2011.

data(corpus_europarl_subset)

Format

A data frame with 10,000 rows and 4 variables.

Source

https://www.statmt.org/europarl/

Details

  • text. Text of speeches

  • language. Language of texts (French or English)

  • line. Line number of the speech

  • source. Original file name for text
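
Example

The variables above suggest one common use: putting each French line next to its English translation. The following is a minimal sketch, not part of the package. The helper pair_by_line() is hypothetical. It assumes that the line variable aligns the two languages, and that the language variable uses the labels "French" and "English"; check the actual coding with unique(corpus_europarl_subset$language) first. The toy data frame holds made-up values, not text from the corpus.

```r
# Hypothetical helper: pair French and English rows by line number.
# Assumes `line` aligns the two languages and that `language` uses the
# labels given in `fr` and `en` (both are assumptions, adjust as needed).
pair_by_line <- function(x, fr = "French", en = "English") {
  fr_rows <- x[x$language == fr, c("line", "text")]
  en_rows <- x[x$language == en, c("line", "text")]
  names(fr_rows)[2] <- "text_fr"
  names(en_rows)[2] <- "text_en"
  # Inner join on line number; merge() sorts the result by `line`
  merge(fr_rows, en_rows, by = "line")
}

# Toy stand-in with the four documented variables (made-up values)
toy <- data.frame(
  text = c("Bonjour", "Merci", "Hello", "Thank you"),
  language = c("French", "French", "English", "English"),
  line = c(1L, 2L, 1L, 2L),
  source = c("fr.txt", "fr.txt", "en.txt", "en.txt"),
  stringsAsFactors = FALSE
)

pairs <- pair_by_line(toy)
print(pairs)
```

With the real data, the same call would be pair_by_line(corpus_europarl_subset) after data(corpus_europarl_subset), passing different fr and en labels if the language variable is coded differently.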