R/data.R
corpus_europarl_subset.Rd
A dataset containing the first 5,000 lines of the French text and the first 5,000 lines of its English translation, taken from the European Parliament Proceedings Parallel Corpus 1996-2011.
data(corpus_europarl_subset)
A data frame with 10,000 rows and 4 variables.
https://www.statmt.org/europarl/
text. Text of speeches
language. Language of texts (French or English)
line. Line number of the speech
source. Original file name for text
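As a rough illustration of how the four variables fit together, the sketch below builds a tiny stand-in data frame with the same column names and pairs each French line with its English counterpart. The rows, the file names in `source`, and the exact `language` labels ("French", "English") are invented for the example and may not match the values stored in the real dataset. It also assumes that a given `line` number refers to the same passage in both languages.

```r
# Toy stand-in with the four documented columns.
# All values here are hypothetical and only illustrate the layout.
toy <- data.frame(
  text = c("Reprise de la session", "Resumption of the session",
           "Ordre des travaux", "Agenda"),
  language = c("French", "English", "French", "English"),
  line = c(1L, 1L, 2L, 2L),
  source = c("fr.txt", "en.txt", "fr.txt", "en.txt"),
  stringsAsFactors = FALSE
)

# Split by language, keeping only the join key and the text.
french  <- toy[toy$language == "French",  c("line", "text")]
english <- toy[toy$language == "English", c("line", "text")]

# Align the two languages on the shared line number.
pairs <- merge(french, english, by = "line", suffixes = c("_fr", "_en"))
print(pairs)
```

With the real data, the same subsetting and `merge()` steps could be applied to `corpus_europarl_subset` after `data(corpus_europarl_subset)`, provided the `language` values are checked first (for example with `unique()`).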