Working on an encoding error in fastmatch which shows inconsistent behavior with non-ASCII characters. This dev version provides a temporary fix.

Improvements

  • Added functionality
    • doc_centrality calculates four graph-based centrality metrics using DTMs
    • doc_similarity calculates four document similarity measures using DTMs
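
To illustrate what a DTM-based document similarity measure looks like, here is a minimal Python sketch of cosine similarity between the rows of a small dense document-term matrix. Cosine is used only as a familiar example; this entry does not name the four measures, and `cosine_similarity` is a hypothetical helper, not the package's function.

```python
import math

def cosine_similarity(dtm):
    """Pairwise cosine similarity between rows of a dense DTM (hypothetical helper)."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in dtm]
    n = len(dtm)
    sims = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Empty documents have no direction, so leave their similarity at 0.
            if norms[i] == 0 or norms[j] == 0:
                continue
            dot = sum(a * b for a, b in zip(dtm[i], dtm[j]))
            sims[i][j] = dot / (norms[i] * norms[j])
    return sims

# Rows are documents, columns are terms (toy counts, made up for illustration).
dtm = [[2, 1, 0],
       [4, 2, 0],
       [0, 0, 3]]
sims = cosine_similarity(dtm)
```

In this toy matrix the first two documents use the same terms in the same proportions, and the third shares no terms with either.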

Improvements

  • Replaced dependency
    • get_regions now uses ClusterR instead of mlpack
    • get_regions uses only the Armadillo library's k-means algorithm (the choice of algorithm is no longer an option)
  • Added functionality
    • seq_builder creates a token-integer sequence representation
  • Added Shakespeare metadata for examples
  • Import Matrix package methods
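
As a rough illustration of a token-integer sequence representation like the one seq_builder is described as creating, the Python sketch below maps each token to an integer ID. The whitespace tokenizer, the first-appearance ID scheme, and the name `build_sequences` are all assumptions made for the example, not the package's behavior.

```python
def build_sequences(docs):
    """Map each document to a list of integer token IDs (hypothetical helper)."""
    vocab = {}
    seqs = []
    for doc in docs:
        seq = []
        for tok in doc.lower().split():
            # Assign IDs in order of first appearance; 0 is left free for padding.
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
            seq.append(vocab[tok])
        seqs.append(seq)
    return seqs, vocab

seqs, vocab = build_sequences(["to be or", "to not be"])
```

Unlike a DTM, this representation keeps word order, which is why it is commonly used as input to sequence models.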

Improvements

  • Added functionality
    • dtm_builder includes an option to return a dense base R matrix
    • dtm_stopper includes an option to remove terms based on a term's rank (e.g., top 10); stopping based on count and stopping based on proportion are now two separate options
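
The three stopping criteria mentioned for dtm_stopper (rank, count, proportion) can be sketched in Python as follows. The function `stop_terms`, its parameter names, and the exact threshold semantics (drop the top-ranked terms, or terms above a count or proportion ceiling) are assumptions for illustration only.

```python
def stop_terms(dtm, vocab, top_rank=None, max_count=None, max_prop=None):
    """Drop DTM columns by rank, count, or proportion (hypothetical helper)."""
    totals = [sum(row[j] for row in dtm) for j in range(len(vocab))]
    grand = sum(totals)
    drop = set()
    if top_rank is not None:
        # Rank terms by total frequency and drop the top_rank most frequent.
        ranked = sorted(range(len(vocab)), key=lambda j: -totals[j])
        drop.update(ranked[:top_rank])
    if max_count is not None:
        # Drop terms whose raw total count exceeds the ceiling.
        drop.update(j for j, t in enumerate(totals) if t > max_count)
    if max_prop is not None and grand > 0:
        # Drop terms whose share of all tokens exceeds the ceiling.
        drop.update(j for j, t in enumerate(totals) if t / grand > max_prop)
    keep = [j for j in range(len(vocab)) if j not in drop]
    return [[row[j] for j in keep] for row in dtm], [vocab[j] for j in keep]

vocab = ["the", "cat", "sat", "mat"]
dtm = [[3, 1, 1, 0],
       [2, 0, 1, 1]]
by_rank = stop_terms(dtm, vocab, top_rank=1)
by_count = stop_terms(dtm, vocab, max_count=1)
by_prop = stop_terms(dtm, vocab, max_prop=0.5)
```

Keeping count and proportion as separate arguments avoids ambiguity over whether a value such as 1 means "one occurrence" or "100 percent".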

Improvements

  • Added functionality to dtm_stopper() to stop words by document or term frequencies
    • Nomenclature changed: stop_freq is now stop_termfreq
  • Added functionality to dtm_resampler() to resample documents to a proportional length or a fixed length N
  • Added and clarified documentation
  • Added a NEWS.md file to track changes to the package.
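
The proportional and fixed-N resampling noted above for dtm_resampler() can be pictured with the Python sketch below, which redraws one document's tokens with replacement. The name `resample_row`, the parameters `alpha` and `n`, and the sampling scheme are hypothetical and chosen for illustration; they are not taken from the package.

```python
import random

def resample_row(counts, alpha=None, n=None, rng=None):
    """Resample one DTM row with replacement (hypothetical helper).

    alpha: target length as a proportion of the original token count.
    n: fixed target length; takes precedence over alpha when given.
    """
    rng = rng or random.Random()
    total = sum(counts)
    if n is None:
        n = round(total * (alpha if alpha is not None else 1.0))
    if total == 0 or n <= 0:
        return [0] * len(counts)
    # Draw term indices with probability proportional to their observed counts.
    draws = rng.choices(range(len(counts)), weights=counts, k=n)
    out = [0] * len(counts)
    for j in draws:
        out[j] += 1
    return out

row = [3, 0, 1]
half = resample_row(row, alpha=0.5, rng=random.Random(1))
fixed = resample_row(row, n=10, rng=random.Random(1))
```

Terms that never occur in the original document have zero weight, so they cannot appear in the resampled row.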