Loads a dictionary from local storage. If the dictionary is not found locally and auto_download is TRUE (the default), it will be automatically downloaded from the GitLab repository.
Usage
load_dictionary(
  name,
  format = c("qs2", "rda"),
  auto_download = TRUE,
  quiet = FALSE,
  path = NULL,
  cache = TRUE,
  force_rebuild = FALSE,
  unify = FALSE,
  large = FALSE
)
Arguments
- name
Dictionary name (e.g., "global_surnames").
- format
Preferred file format: "qs2" (default, smaller and faster) or "rda". If the preferred format is not available, the other format is tried.
- auto_download
Automatically download if not found locally? Default TRUE.
- quiet
Logical. Suppress informational messages? Default FALSE.
- path
Custom path to look for dictionaries. If NULL, uses package data directories. Default NULL.
- cache
Logical. Cache the dictionary for faster subsequent loads? Default TRUE. Cached copies are stored in tools::R_user_dir("text2map.dictionaries", "cache").
- force_rebuild
Logical. Force re-caching even if a cached copy exists? Default FALSE. Useful after updating a dictionary.
- unify
Logical. If TRUE, rename the primary identifier column to term using unify_dictionary. Default FALSE. This provides a consistent column name across dictionaries that use different naming conventions (e.g., "word", "form", "name", "surname").
- large
Logical. If TRUE, suppress the memory warning for large dictionaries. Default FALSE. The warning is issued for dictionaries with more than 1 million rows.
Details
When cache = TRUE (the default), loaded dictionaries are cached
in the user-level cache directory (via tools::R_user_dir()) for
faster subsequent loads. Cached copies are stored as .qs2 files with
no compression for maximum read speed. Use force_rebuild = TRUE
to re-cache a dictionary, or clear_dictionary_cache to
remove all cached dictionaries.
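To see where cached copies end up, the cache directory can be resolved directly with tools::R_user_dir(). The snippet below is a minimal sketch: it assumes cached .qs2 files sit directly inside that directory, which this page does not confirm, so treat the listing as illustrative only.

```r
# Resolve the user-level cache directory named in the Arguments section.
cache_dir <- tools::R_user_dir("text2map.dictionaries", which = "cache")

# List cached .qs2 files, if any.
# ASSUMPTION: cached copies are stored flat in cache_dir; the actual
# layout used by the package may differ.
cached <- if (dir.exists(cache_dir)) {
  list.files(cache_dir, pattern = "\\.qs2$", full.names = TRUE)
} else {
  character(0)  # nothing has been cached yet
}

print(cache_dir)
print(basename(cached))
```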
Cache invalidation: if a dictionary's source file has been updated (newer VERSION or newer modification time), the cached copy is automatically invalidated and the dictionary is re-loaded from source. This ensures that package updates or re-downloaded dictionaries are always served fresh.
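The modification-time half of that rule can be sketched as follows. is_cache_stale() is a hypothetical helper written for illustration, not a function exported by the package, and the VERSION comparison is left out because this page does not say how versions are recorded.

```r
# Hypothetical helper (not part of the package): returns TRUE when the
# cached copy is missing or older than the source file.
is_cache_stale <- function(source_file, cache_file) {
  # No cached copy means the dictionary must be loaded from source.
  if (!file.exists(cache_file)) {
    return(TRUE)
  }
  # Compare modification times; isTRUE() guards against NA when the
  # source file itself is missing.
  isTRUE(file.mtime(source_file) > file.mtime(cache_file))
}
```

A loader following this rule would read from source and rewrite the cache whenever the helper returns TRUE, which has the same effect as passing force_rebuild = TRUE by hand.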
The function searches for dictionaries in the following order:
1. Cache directory (if cache = TRUE and a cached copy exists)
2. Custom path (if provided)
3. Package extdata/ directory for .qs2 files (core dictionaries)
4. Package data/ directory for .rda files (core dictionaries)
5. Package ondemand/ directory for any format (downloaded dictionaries)
If the preferred format is not found, the alternative format will be used with a message.
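The search order and format fallback described above can be sketched as a first-match lookup over candidate paths. find_dictionary() and its cache_dir and pkg_dir arguments are hypothetical names used only for this illustration, and the file naming scheme (name plus extension) is an assumption. The package's own lookup may differ in detail.

```r
# Hypothetical lookup (not the package's implementation): returns the first
# existing candidate path, or NULL if the dictionary is not found locally.
find_dictionary <- function(name, format = c("qs2", "rda"), path = NULL,
                            cache_dir = NULL, pkg_dir = NULL) {
  format <- match.arg(format)
  # Preferred format first, then the alternative as a fallback.
  exts <- c(format, setdiff(c("qs2", "rda"), format))

  # ASSUMPTION: files are named "<name>.<ext>" in every location.
  candidates <- c(
    character(0),
    if (!is.null(cache_dir)) file.path(cache_dir, paste0(name, ".qs2")),
    if (!is.null(path)) file.path(path, paste0(name, ".", exts)),
    if (!is.null(pkg_dir)) c(
      file.path(pkg_dir, "extdata", paste0(name, ".qs2")),
      file.path(pkg_dir, "data", paste0(name, ".rda")),
      file.path(pkg_dir, "ondemand", paste0(name, ".", exts))
    )
  )

  hit <- candidates[file.exists(candidates)]
  if (length(hit) == 0L) {
    return(NULL)  # the caller would fall back to downloading, if allowed
  }
  hit[[1]]
}
```

Keeping the candidates in one ordered vector makes the precedence explicit: a cached copy wins over a custom path, which wins over the package directories.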
See also
download_dictionary to pre-fetch on-demand dictionaries,
list_dictionaries for available dictionaries,
clear_dictionary_cache to remove all cached dictionaries,
verify_dictionary to check integrity of a local dictionary,
unify_dictionary for column name normalization.
Examples
if (FALSE) { # \dontrun{
# Load a dictionary (auto-downloads and caches if needed)
data <- load_dictionary("global_surnames")
# Load with specific format
data <- load_dictionary("english_freqs", format = "rda")
# Load and unify column names for consistent analysis
data <- load_dictionary("english_abbreviations", unify = TRUE)
# Load only if already available locally; do not auto-download
data <- load_dictionary("organisms", auto_download = FALSE)
# Load from custom directory
data <- load_dictionary("my_dict", path = "/my/data")
# Disable caching for this load
data <- load_dictionary("sensorimotor", cache = FALSE)
# Force re-cache after an update
data <- load_dictionary("nrc_vad", force_rebuild = TRUE)
# Suppress memory warning for large dictionaries
data <- load_dictionary("global_surnames", large = TRUE)
} # }
