Converts a document-term matrix (DTM) into a data frame with three columns:
documents, terms, and frequency. Each row is a unique
document-term pair and its frequency. This is akin to the reshape2
package's melt function, but works on a sparse matrix.
The resulting data frame is also equivalent to the
tidytext triplet tibble.
Arguments
- dtm
Document-term matrix with terms as columns. Works with DTMs produced by any popular text analysis package, or with the dtm_builder() function.
Examples
# \donttest{
data(jfk_speech)
jfk_speech$sentence <- tolower(jfk_speech$sentence)
jfk_speech$sentence <- gsub("[[:punct:]]+", " ", jfk_speech$sentence)
dtm <- dtm_builder(jfk_speech, sentence, sentence_id)
dtm_melted <- dtm_melter(dtm)
head(dtm_melted)
#> 84 x 771 sparse Matrix of class "dgCMatrix", with 1795 entries
#> doc_id term freq
#> 1 sent_1 president 1
#> 2 sent_2 president 1
#> 3 sent_4 president 1
#> 4 sent_1 pitzer 1
#> 5 sent_1 mr 1
#> 6 sent_2 mr 1
# }
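
To make the idea of melting a sparse matrix concrete, here is a minimal sketch that builds the same three-column layout by hand using only the Matrix package. It is an illustration under stated assumptions, not necessarily how dtm_melter() is implemented; the toy matrix and the names toy_dtm, trip, and toy_melted are hypothetical.

```r
library(Matrix)

# Hypothetical toy DTM: 2 documents (rows) by 3 terms (columns).
toy_dtm <- sparseMatrix(
  i = c(1, 1, 2),
  j = c(1, 2, 3),
  x = c(2, 1, 4),
  dims = c(2, 3),
  dimnames = list(c("doc_1", "doc_2"), c("apple", "pear", "plum"))
)

# summary() on a sparse matrix returns only the stored nonzero cells
# as (i, j, x) triplets: row index, column index, value.
trip <- summary(toy_dtm)

# Map the indices back to document and term labels.
# Assumption: column names mirror the doc_id/term/freq output shown above.
toy_melted <- data.frame(
  doc_id = rownames(toy_dtm)[trip$i],
  term   = colnames(toy_dtm)[trip$j],
  freq   = trip$x
)

toy_melted
```

Only nonzero cells become rows, so the result has one row per document-term pair that actually occurs, and the dense matrix is never materialized.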
