Loads a dictionary and returns a random sample of n rows. Useful for quickly inspecting a large dictionary: the full dataset is loaded during the call, but only the sampled rows are returned.

Usage

sample_dictionary(name, n, replace = FALSE, seed = NULL, ...)

Arguments

name

Character. Dictionary name (e.g., "global_surnames").

n

Integer. Number of rows to sample. Must be positive.

replace

Logical. Sample with replacement? Default FALSE. Set to TRUE to get exactly n rows when n exceeds the number of rows in the dictionary.

seed

Integer. Random seed for reproducibility. If NULL (default), no seed is set.

...

Additional arguments passed to load_dictionary, such as format, auto_download, path, quiet, cache, force_rebuild, unify, or large.

Value

A data frame with n rows (or fewer if the dictionary has fewer rows and replace = FALSE).

Details

The entire dictionary is loaded into memory, then a random sample is taken using sample.int. For very large dictionaries (e.g., global_surnames with 10.6M rows), this still requires loading the full dataset. Use large = TRUE to suppress the memory warning.
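The sampling step described above can be sketched on an ordinary data frame. This is a minimal illustration, not the package's source: sample_rows and the toy data frame are hypothetical names, and calling set.seed when seed is supplied is an assumption. Capping the sample size when replace = FALSE follows the behaviour stated under Value.

```r
# Hypothetical helper illustrating the sampling step; not the package's code.
sample_rows <- function(df, n, replace = FALSE, seed = NULL) {
  stopifnot(is.data.frame(df), length(n) == 1, n > 0)
  # Assumption: a non-NULL seed is applied with set.seed() before sampling.
  if (!is.null(seed)) set.seed(seed)
  total <- nrow(df)
  # Without replacement, at most `total` distinct rows can be drawn.
  size <- if (replace) n else min(n, total)
  idx <- sample.int(total, size = size, replace = replace)
  df[idx, , drop = FALSE]
}

# Toy stand-in for a loaded dictionary.
toy <- data.frame(id = 1:10, word = letters[1:10])

a <- sample_rows(toy, n = 3, seed = 42)
b <- sample_rows(toy, n = 3, seed = 42)
identical(a, b)                                 # TRUE: same seed, same rows
nrow(sample_rows(toy, n = 25))                  # 10: capped at the row count
nrow(sample_rows(toy, n = 25, replace = TRUE))  # 25: rows may repeat
```

Because indices are drawn from the full row count, the whole data frame has to exist in memory before sampling, which is why the cost of loading is not avoided.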

See also

load_dictionary to load the full dictionary; dictionary_info to get row counts and column information without loading the data.

Examples

if (FALSE) { # \dontrun{
# Preview 100 random rows from a large dictionary
sample_dictionary("global_surnames", n = 100)

# Reproducible sample
sample_dictionary("english_freqs", n = 50, seed = 42)

# Sample with unify for consistent column naming
sample_dictionary("english_abbreviations", n = 20, unify = TRUE, seed = 1)
} # }