Load a Dictionary

Loads a dictionary from local storage. If the dictionary is not found locally and auto_download is TRUE (the default), it is downloaded automatically from the GitLab repository.

Usage

load_dictionary(
  name,
  format = c("qs2", "rda"),
  auto_download = TRUE,
  quiet = FALSE,
  path = NULL,
  cache = TRUE,
  force_rebuild = FALSE,
  unify = FALSE,
  large = FALSE
)

Arguments

name: Dictionary name (e.g., "global_surnames").
format: Preferred file format: "qs2" (default, smaller and faster) or "rda". If the preferred format is not available, the other format is tried.
auto_download: Automatically download if not found locally? Default TRUE.
quiet: Logical. Suppress informational messages? Default FALSE.
path: Custom path to look for dictionaries. If NULL, uses package data directories. Default NULL.
cache: Logical. Cache the dictionary for faster subsequent loads? Default TRUE. Cached copies are stored in tools::R_user_dir("text2map.dictionaries", "cache").
force_rebuild: Logical. Force re-caching even if a cached copy exists? Default FALSE. Useful after updating a dictionary.
unify: Logical. If TRUE, rename the primary identifier column to term using unify_dictionary(). Default FALSE. This gives a consistent column name across dictionaries that use different naming conventions (e.g., "word", "form", "name", "surname").
large: Logical. If TRUE, suppress the memory warning for large dictionaries. Default FALSE. The warning is issued for dictionaries with more than 1 million rows.
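
The effect of unify = TRUE can be pictured with a small stand-alone sketch. The helper rename_to_term() below is hypothetical and is not the package's unify_dictionary(); it assumes only what the argument description states, namely that one of the columns "word", "form", "name", or "surname" is renamed to term.

```r
# Hypothetical helper (NOT the package's unify_dictionary()): rename the
# first matching identifier column to "term", leaving other columns alone.
rename_to_term <- function(df,
                           candidates = c("word", "form", "name", "surname")) {
  hit <- intersect(candidates, names(df))
  # Only rename when a candidate exists and "term" is not already present.
  if (length(hit) > 0 && !"term" %in% names(df)) {
    names(df)[names(df) == hit[1]] <- "term"
  }
  df
}

# Toy data frame standing in for a dictionary keyed by "surname".
d <- data.frame(surname = c("Smith", "Garcia"), n = c(10, 7))
names(rename_to_term(d))  # "term" "n"
```

With a shared term column, dictionaries that originally used different identifier names can be joined or filtered with the same code.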

Value

A data frame containing the dictionary data.

Details

When cache = TRUE (the default), loaded dictionaries are cached in the user-level cache directory (via tools::R_user_dir()) for faster subsequent loads. Cached copies are stored as .qs2 files with no compression for maximum read speed. Use force_rebuild = TRUE to re-cache a dictionary, or clear_dictionary_cache() to remove all cached dictionaries.
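
To see where cached copies would live on a given machine, the cache directory can be inspected with base R alone. This is a minimal sketch that relies only on the tools::R_user_dir() call quoted in the cache argument; it does not call any function of this package.

```r
# Resolve the user-level cache directory named in the documentation.
cache_dir <- tools::R_user_dir("text2map.dictionaries", which = "cache")

# List cached .qs2 files; list.files() returns character(0) when the
# directory does not exist yet or holds no matching files.
cached <- list.files(cache_dir, pattern = "\\.qs2$", full.names = TRUE)

print(cache_dir)
print(basename(cached))
```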

Cache invalidation: if a dictionary's source file has been updated (newer VERSION or newer modification time), the cached copy is automatically invalidated and the dictionary is re-loaded from source. This ensures that package updates or re-downloaded dictionaries are always served fresh.
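
The modification-time half of this check can be sketched as follows. The helper is_cache_stale() is hypothetical and illustrates the idea only; the package's actual logic, including the VERSION comparison, is not shown in this page and is not reproduced here.

```r
# Hypothetical helper: TRUE when the cached copy is missing or older
# than the source file, i.e. when the cache should be rebuilt.
is_cache_stale <- function(source_file, cache_file) {
  if (!file.exists(cache_file)) return(TRUE)
  file.mtime(source_file) > file.mtime(cache_file)
}

# Temporary files standing in for a dictionary source and its cached copy.
src <- tempfile(fileext = ".rda")
cch <- tempfile(fileext = ".qs2")
writeLines("source", src)
writeLines("cache", cch)

# Make the cache one hour older than the source: it is stale.
Sys.setFileTime(cch, Sys.time() - 3600)
Sys.setFileTime(src, Sys.time())
stale_before <- is_cache_stale(src, cch)  # TRUE

# Make the cache newer than the source: it is fresh.
Sys.setFileTime(cch, Sys.time() + 3600)
stale_after <- is_cache_stale(src, cch)   # FALSE
```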

The function searches for dictionaries in the following order:

1. Cache directory (if cache = TRUE and a cached copy exists)
2. Custom path (if provided)
3. Package extdata/ directory for .qs2 files (core dictionaries)
4. Package data/ directory for .rda files (core dictionaries)
5. Package ondemand/ directory for any format (downloaded dictionaries)

If the preferred format is not found, the alternative format will be used with a message.
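
An ordered search of this kind amounts to returning the first candidate file that exists. The sketch below uses a hypothetical helper, find_first(), and temporary directories in place of the real package locations; the name.ext file naming is an assumption made for illustration.

```r
# Hypothetical helper: walk an ordered list of (directory, extension)
# pairs and return the first file that exists, or NULL if none does.
find_first <- function(name, locations) {
  for (loc in locations) {
    candidate <- file.path(loc$dir, paste0(name, ".", loc$ext))
    if (file.exists(candidate)) return(candidate)
  }
  NULL
}

# Temporary directories standing in for extdata/ and data/.
root <- tempfile("dict_demo_")
dir.create(file.path(root, "extdata"), recursive = TRUE)
dir.create(file.path(root, "data"))
file.create(file.path(root, "data", "my_dict.rda"))

# Earlier entries take priority; only the .rda file exists here.
locations <- list(
  list(dir = file.path(root, "extdata"), ext = "qs2"),
  list(dir = file.path(root, "data"),    ext = "rda")
)
found <- find_first("my_dict", locations)
basename(found)  # "my_dict.rda"
```

Because only the .rda file is present, the search falls through the .qs2 location and returns the alternative format, which mirrors the fallback described above.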

Examples

if (FALSE) { # \dontrun{
# Load a dictionary (auto-downloads and caches if needed)
data <- load_dictionary("global_surnames")

# Load with specific format
data <- load_dictionary("english_freqs", format = "rda")

# Load and unify column names for consistent analysis
data <- load_dictionary("english_abbreviations", unify = TRUE)

# Load only if already available locally; do not auto-download
data <- load_dictionary("organisms", auto_download = FALSE)

# Load from custom directory
data <- load_dictionary("my_dict", path = "/my/data")

# Disable caching for this load
data <- load_dictionary("sensorimotor", cache = FALSE)

# Force re-cache after an update
data <- load_dictionary("nrc_vad", force_rebuild = TRUE)

# Suppress memory warning for large dictionaries
data <- load_dictionary("global_surnames", large = TRUE)
} # }
