Sample Rows from a Dictionary

Loads a dictionary and returns a random sample of n rows. Useful for quickly inspecting a large dictionary without working with the full dataset directly. Note that the dictionary is still loaded in full before the sample is drawn (see Details).

Usage

sample_dictionary(name, n, replace = FALSE, seed = NULL, ...)

Arguments

name: Character. Dictionary name (e.g., "global_surnames").
n: Integer. Number of rows to sample. Must be positive.
replace: Logical. Sample with replacement? Default FALSE. Useful when n is close to or exceeds the number of rows.
seed: Integer. Random seed for reproducibility. If NULL (default), no seed is set.
...: Additional arguments passed to load_dictionary, such as format, auto_download, path, quiet, cache, force_rebuild, unify, or large.

Value

A data frame with n rows (or fewer if the dictionary has fewer rows and replace = FALSE).

Details

The entire dictionary is loaded into memory, then a random sample is taken using sample.int. For very large dictionaries (e.g., global_surnames with 10.6M rows), this still requires loading the full dataset. Use large = TRUE to suppress the memory warning.
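The load-then-sample behavior described above can be sketched on an in-memory data frame. This is only an illustration under stated assumptions, not the package's actual implementation: sample_rows and the toy data frame are hypothetical names, and the capping of n when replace = FALSE is inferred from the Value section.

```r
# Hypothetical helper: approximates the described sampling step on a
# data frame that has already been loaded into memory.
sample_rows <- function(df, n, replace = FALSE, seed = NULL) {
  stopifnot(is.data.frame(df), length(n) == 1, n > 0)
  # Only set the seed when one is supplied, as documented for `seed`.
  if (!is.null(seed)) set.seed(seed)
  total <- nrow(df)
  # Without replacement, at most `total` rows can be drawn.
  size <- if (replace) n else min(n, total)
  idx <- sample.int(total, size = size, replace = replace)
  df[idx, , drop = FALSE]
}

# Toy stand-in for a loaded dictionary (not real package data).
toy <- data.frame(id = 1:5, term = c("a", "b", "c", "d", "e"))

nrow(sample_rows(toy, n = 3, seed = 42))        # 3
nrow(sample_rows(toy, n = 10))                  # 5: capped at nrow(toy)
nrow(sample_rows(toy, n = 10, replace = TRUE))  # 10
```

Because the index vector is drawn only after the full data frame exists, peak memory is governed by the size of the dictionary, not by n.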

Examples

if (FALSE) { # \dontrun{
# Preview 100 random rows from a large dictionary
sample_dictionary("global_surnames", n = 100)

# Reproducible sample
sample_dictionary("english_freqs", n = 50, seed = 42)

# Sample with unify for consistent column naming
sample_dictionary("english_abbreviations", n = 20, unify = TRUE, seed = 1)
} # }
