A dataset containing word frequencies and ranks from three corpora: news from the Leipzig Corpora Collection, Wikipedia, and subtitles from OpenSubtitles.
Variables
term. unique word (only unigrams)
freq_wiki. term count in the 2021 German Wikipedia corpus
wiki. logical, whether the word occurs in the 2021 German Wikipedia corpus
rank_wiki. term rank in the 2021 German Wikipedia corpus
freq_news. term count in the 2024 Leipzig news corpus
news. logical, whether the word occurs in the 2024 Leipzig news corpus
rank_news. term rank in the 2024 Leipzig news corpus
freq_subs. term count in the 2018 OpenSubtitles corpus
subtitles. logical, whether the word occurs in the 2018 OpenSubtitles corpus
rank_subs. term rank in the 2018 OpenSubtitles corpus
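
The relationship between the three kinds of columns (count, presence flag, rank) can be illustrated with a small sketch. It is not the code used to build the dataset. It assumes that a term missing from a corpus has no count (represented here as None), that the logical column is true exactly when a count exists, and that rank 1 goes to the highest count, with ties left in input order. The counts are made up for illustration, and the helper add_flag_and_rank is hypothetical.

```python
# Toy rows with made-up counts; None means the term is absent from that corpus.
rows = [
    {"term": "und", "freq_wiki": 900, "freq_news": 700},
    {"term": "haus", "freq_wiki": 50, "freq_news": None},
    {"term": "tor", "freq_wiki": None, "freq_news": 120},
    {"term": "der", "freq_wiki": 1200, "freq_news": 950},
]


def add_flag_and_rank(rows, freq_col, flag_col, rank_col):
    """Add a presence flag and a frequency rank (1 = most frequent) in place.

    Assumption: ranks are assigned by descending count; ties are not
    treated specially and keep their input order.
    """
    present = [r for r in rows if r[freq_col] is not None]
    present.sort(key=lambda r: r[freq_col], reverse=True)
    for r in rows:
        r[flag_col] = r[freq_col] is not None
        r[rank_col] = None  # stays None for terms absent from the corpus
    for position, r in enumerate(present, start=1):
        r[rank_col] = position
    return rows


add_flag_and_rank(rows, "freq_wiki", "wiki", "rank_wiki")
add_flag_and_rank(rows, "freq_news", "news", "rank_news")

# Terms attested in both corpora, ordered by their Wikipedia rank.
shared = sorted(
    (r for r in rows if r["wiki"] and r["news"]),
    key=lambda r: r["rank_wiki"],
)
for r in shared:
    print(r["term"], r["rank_wiki"], r["rank_news"])
```

Filtering on the logical columns first, as in the last step, avoids comparing ranks for terms that one of the corpora does not contain.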
References
Hermit Dave (2020). FrequencyWords. https://github.com/hermitdave/FrequencyWords
Goldhahn, D., Eckart, T., & Quasthoff, U. (2012). Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In: Proceedings of the 8th International Language Resources and Evaluation (LREC'12).