
Loads a dictionary from local storage. If the dictionary is not found locally and auto_download is TRUE (the default), it will be automatically downloaded from the GitLab repository.

Usage

load_dictionary(
  name,
  format = c("qs2", "rda"),
  auto_download = TRUE,
  quiet = FALSE,
  path = NULL,
  cache = TRUE,
  force_rebuild = FALSE,
  unify = FALSE,
  large = FALSE
)

Arguments

name

Dictionary name (e.g., "global_surnames").

format

Preferred file format: "qs2" (default, smaller and faster) or "rda". If the preferred format is not available, the other format is tried.

auto_download

Automatically download if not found locally? Default TRUE.

quiet

Logical. Suppress informational messages? Default FALSE.

path

Custom directory in which to look for dictionaries. If NULL, the package data directories are used. Default NULL.

cache

Logical. Cache the dictionary for faster subsequent loads? Default TRUE. Cached copies are stored in tools::R_user_dir("text2map.dictionaries", "cache").

force_rebuild

Logical. Force re-caching even if a cached copy exists? Default FALSE. Useful after updating a dictionary.

unify

Logical. If TRUE, rename the primary identifier column to term using unify_dictionary. Default FALSE. This provides a consistent column name across dictionaries that use different naming conventions (e.g., "word", "form", "name", "surname").

large

Logical. If TRUE, suppress the memory warning for large dictionaries. Default FALSE. The warning is issued for dictionaries with more than 1 million rows.

Value

A data frame containing the dictionary data.

Details

When cache = TRUE (the default), loaded dictionaries are cached in the user-level cache directory (via tools::R_user_dir()) for faster subsequent loads. Cached copies are stored as .qs2 files with no compression for maximum read speed. Use force_rebuild = TRUE to re-cache a dictionary, or clear_dictionary_cache to remove all cached dictionaries.
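To check where cached copies end up, the cache directory can be inspected with base R. This is a minimal sketch, assuming R 4.0 or later (where tools::R_user_dir() is available). The directory may not exist until a dictionary has been cached at least once, in which case the listing is simply empty.

```r
# Resolve the user-level cache directory named in the `cache` argument docs.
cache_dir <- tools::R_user_dir("text2map.dictionaries", which = "cache")

# List cached .qs2 files, if any. list.files() returns character(0)
# when the directory does not exist yet.
cached_files <- list.files(cache_dir, pattern = "\\.qs2$", full.names = TRUE)

print(cache_dir)
print(basename(cached_files))
```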

Cache invalidation: if a dictionary's source file has been updated (newer VERSION or newer modification time), the cached copy is automatically invalidated and the dictionary is re-loaded from source. This ensures that package updates or re-downloaded dictionaries are always served fresh.
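The modification-time half of this check can be sketched in base R. The helper is_cache_stale() below is hypothetical and is not part of the package. It illustrates only the timestamp comparison and ignores the VERSION comparison, whose details are not documented here.

```r
# Hypothetical helper, not part of the package: returns TRUE when the cached
# copy should be rebuilt from the source file.
is_cache_stale <- function(source_file, cache_file) {
  # No cached copy yet: it has to be built.
  if (!file.exists(cache_file)) {
    return(TRUE)
  }
  # Source modified after the cache was written: the cache is stale.
  file.mtime(source_file) > file.mtime(cache_file)
}

# Demonstration with temporary files standing in for a dictionary and its cache.
src <- tempfile(fileext = ".qs2")
cch <- tempfile(fileext = ".qs2")
writeLines("source", src)
writeLines("cache", cch)

# Make the cached copy look one hour older than the source.
Sys.setFileTime(cch, Sys.time() - 3600)

print(is_cache_stale(src, cch))
```

In practice force_rebuild = TRUE covers the cases that such a check cannot detect.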

The function searches for dictionaries in the following order:

  1. Cache directory (if cache = TRUE and a cached copy exists)

  2. Custom path (if provided)

  3. Package extdata/ directory for .qs2 files (core dictionaries)

  4. Package data/ directory for .rda files (core dictionaries)

  5. Package ondemand/ directory for any format (downloaded dictionaries)

If the preferred format is not found, the alternative format is used and a message is emitted.
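The search order and format fallback can be pictured as a first-match lookup over an ordered list of directories. The helper find_dictionary_file() below is hypothetical and simplified: it assumes files are named <name>.<format> and tries both formats in every directory, whereas the package restricts some directories to one format. It is a sketch of the idea, not the package's implementation.

```r
# Hypothetical helper, not part of the package: return the first existing
# file for `name`, trying directories in order and, within each directory,
# the preferred format before the alternative.
find_dictionary_file <- function(name, dirs, format = c("qs2", "rda")) {
  format <- match.arg(format)
  formats <- c(format, setdiff(c("qs2", "rda"), format))
  for (d in dirs) {
    for (ext in formats) {
      candidate <- file.path(d, paste0(name, ".", ext))
      if (file.exists(candidate)) {
        return(candidate)
      }
    }
  }
  NULL  # nothing found locally; this is where a download would be attempted
}

# Demonstration: two temporary directories stand in for a custom path and
# the on-demand directory; only the second holds the dictionary, as .rda.
d1 <- tempfile("custom")
d2 <- tempfile("ondemand")
dir.create(d1)
dir.create(d2)
file.create(file.path(d2, "my_dict.rda"))

hit <- find_dictionary_file("my_dict", dirs = c(d1, d2), format = "qs2")
print(basename(hit))
```

Returning NULL rather than raising an error leaves the caller free to decide whether to download, which mirrors the role of auto_download.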

See also

download_dictionary to pre-fetch on-demand dictionaries, list_dictionaries for available dictionaries, clear_dictionary_cache to remove all cached dictionaries, verify_dictionary to check integrity of a local dictionary, unify_dictionary for column name normalization.

Examples

if (FALSE) { # \dontrun{
# Load a dictionary (auto-downloads and caches if needed)
data <- load_dictionary("global_surnames")

# Load with specific format
data <- load_dictionary("english_freqs", format = "rda")

# Load and unify column names for consistent analysis
data <- load_dictionary("english_abbreviations", unify = TRUE)

# Load only if already available locally; do not auto-download
data <- load_dictionary("organisms", auto_download = FALSE)

# Load from custom directory
data <- load_dictionary("my_dict", path = "/my/data")

# Disable caching for this load
data <- load_dictionary("sensorimotor", cache = FALSE)

# Force re-cache after an update
data <- load_dictionary("nrc_vad", force_rebuild = TRUE)

# Suppress memory warning for large dictionaries
data <- load_dictionary("global_surnames", large = TRUE)
} # }