
Converts a document-term matrix (DTM) into a data frame with three columns: documents, terms, and frequency. Each row holds the frequency for one unique document-term pair. This is akin to the melt() function in the reshape2 package, but it works on a sparse matrix. The resulting data frame is also equivalent to the tidytext triplet tibble.

Usage

dtm_melter(dtm)

Arguments

dtm

Document-term matrix with terms as columns. Works with DTMs produced by any popular text analysis package or by the dtm_builder() function.

Value

Returns a data frame with three columns: doc_id, term, and freq.
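
To make the shape of the returned data frame concrete, here is a minimal sketch of the same kind of triplet conversion using only the Matrix package. It is an illustration under stated assumptions, not necessarily how dtm_melter() is implemented: the toy matrix, its document and term names, and the object names trip and melted are all hypothetical.

```r
library(Matrix)

# Hypothetical toy DTM: 2 documents (rows) by 3 terms (columns)
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2),
  j = c(1, 2, 2, 3),
  x = c(2, 1, 1, 3),
  dims = c(2, 3),
  dimnames = list(c("doc_1", "doc_2"), c("apple", "banana", "cherry"))
)

# summary() on a sparse matrix returns one row per stored cell,
# with columns i (row index), j (column index), and x (value)
trip <- summary(dtm)

# Swap the integer indices for document and term names
# (assumption: this mirrors the doc_id / term / freq layout described above)
melted <- data.frame(
  doc_id = rownames(dtm)[trip$i],
  term = colnames(dtm)[trip$j],
  freq = trip$x,
  stringsAsFactors = FALSE
)

melted
```

In this sketch only stored (nonzero) cells become rows, so the four entries of the toy matrix yield four rows, which is what keeps the long format compact for a sparse DTM.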

Author

Dustin Stoltz

Examples

# \donttest{
data(jfk_speech)
jfk_speech$sentence <- tolower(jfk_speech$sentence)
jfk_speech$sentence <- gsub("[[:punct:]]+", " ", jfk_speech$sentence)
dtm <- dtm_builder(jfk_speech, sentence, sentence_id)

dtm_melted <- dtm_melter(dtm)
head(dtm_melted)
#> 84 x 771 sparse Matrix of class "dgCMatrix", with 1795 entries
#>   doc_id      term freq
#> 1 sent_1 president    1
#> 2 sent_2 president    1
#> 3 sent_4 president    1
#> 4 sent_1    pitzer    1
#> 5 sent_1        mr    1
#> 6 sent_2        mr    1
# }