A streamlined function that takes raw texts from a column of a data.frame and
produces a list of all the unique tokens. It tokenizes on a fixed, single
whitespace character and then extracts the unique tokens. The result can be
used as input to dtm_builder() to standardize the vocabulary (i.e. the columns)
across multiple DTMs. Prior to building the vocabulary, texts should have
whitespace trimmed and, if desired, punctuation removed and terms lowercased.
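The preprocessing described above could be sketched in base R as follows. This is only an illustration: the example data frame `docs` and its column name `text` are assumptions, and other cleaning steps may suit a given corpus better.

```r
# Hypothetical example data; the column name `text` is an assumption.
docs <- data.frame(
  text = c("  The cat sat.  ", "The   dog, too!"),
  stringsAsFactors = FALSE
)

# Lowercase, drop punctuation, collapse runs of whitespace, then trim the ends.
clean <- tolower(docs$text)
clean <- gsub("[[:punct:]]+", "", clean)
clean <- gsub("[[:space:]]+", " ", clean)
docs$text <- trimws(clean)
```

Collapsing runs of whitespace matters here because tokens are split on a single space, so repeated spaces would otherwise be split more than once.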
vocab_builder(data, text)
data: Data.frame with one column of texts
text: Name of the column with the documents' text
Returns a list of the unique terms in the corpus.
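To make the tokenization step concrete, here is a minimal base R sketch of the behaviour described: split on a fixed single space, then keep the unique tokens. The helper `sketch_vocab()` is hypothetical and is not the package's implementation. It takes the column name as a string and returns a character vector, both of which are assumptions made for this illustration.

```r
# Hypothetical stand-in for vocab_builder(); not the package's implementation.
sketch_vocab <- function(data, text) {
  # Split each document on a fixed single space (no regular expression).
  tokens <- unlist(strsplit(data[[text]], " ", fixed = TRUE), use.names = FALSE)
  # Keep each token once, in order of first appearance.
  unique(tokens)
}

# Hypothetical, already-cleaned example texts.
docs <- data.frame(
  text = c("the cat sat", "the dog sat too"),
  stringsAsFactors = FALSE
)
vocab <- sketch_vocab(docs, "text")
```

In this sketch, leading or repeated spaces would yield empty-string tokens, which is one reason to trim whitespace before building the vocabulary.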