
Evaluate anchor sets in defining semantic relations
Source: R/utils-embedding-vectors.R
test_anchors.Rd
This function evaluates how well an anchor set defines a semantic relation using one of two methods: 'pairdir' (which only evaluates semantic directions) or 'relco' (which evaluates semantic directions, semantic centroids, and compound concepts). See Details.
Arguments
- anchors
A data frame or list of 'anchor' terms
- wv
Matrix of word embedding vectors (a.k.a. embedding model) with rows as terms.
- non_anchors
For 'relco', terms that are not anchors (random, unrelated, or distinctive terms).
- method
Which metric to use for evaluation, 'pairdir' or 'relco'
- all
Logical (default FALSE). Whether to evaluate all possible pairwise combinations of two sets of anchors. If FALSE, only the input pairs are used in evaluation and anchor sets must be of equal length.
- type
For 'relco', indicate which kind of relation: "direction", "centroid", or "compound"
- conf
For 'relco', confidence interval
- dir_method
For 'relco' and type = "direction", indicate the method for calculating direction ("paired", "pooled", "L2", "PCA"). See get_direction() for details.
- n_runs
For 'relco', number of runs
- null
For 'relco', null hypothesis, default is 0.
- alpha
For 'relco', significance level
- seed
For 'relco', set sampling seed
- order_non_anchors
Logical (default FALSE). For 'relco', whether the order of the non-anchor terms should be fixed between runs.
- summarize
Logical (default TRUE). Returns a data frame with AVERAGE scores for input pairs along with each pair's contribution. If summarize = FALSE, returns a list with each offset matrix, each contribution, and the average score.
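Because wv is described as a matrix with rows as terms, it can help to confirm that every anchor has a row before evaluating. The following is a minimal base R sketch under that assumption; toy_wv and toy_anchors are made-up illustrations, not objects from the package, and how test_anchors() itself treats out-of-vocabulary terms is not stated here.

```r
# Toy embedding matrix with made-up values; row names are the vocabulary
toy_wv <- matrix(
  seq_len(12) / 10,
  nrow = 4,
  dimnames = list(c("rest", "stay", "fast", "move"), NULL)
)

# Hypothetical anchors as a data frame, one column per pole
toy_anchors <- data.frame(
  a = c("rest", "stay", "stand"),
  z = c("fast", "move", "coming"),
  stringsAsFactors = FALSE
)

# Anchor terms that have no row in the embedding matrix
all_terms <- unique(unlist(toy_anchors, use.names = FALSE))
missing_terms <- setdiff(all_terms, rownames(toy_wv))
missing_terms
```

Here "stand" and "coming" are reported as missing, so they would need to be replaced or dropped before the anchor set is evaluated against this toy matrix.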
Details
PairDir evaluates how parallel two anchor sets are when used to define a semantic direction. According to Boutyline and Johnston (2023):
"We find that PairDir -- a measure of parallelism between the offset vectors (and thus of the internal reliability of the estimated relation) -- consistently outperforms other reliability metrics in explaining axis accuracy."
Boutyline and Johnston only consider analyst-specified pairs. However,
if all = TRUE, all pairwise combinations of terms between each set
are evaluated. This allows for anchor sets of unequal length, but it
increases computational cost considerably.
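The idea of "parallelism between offset vectors" and the difference between input pairs and all pairwise combinations can be sketched in base R. This is only an illustration under stated assumptions: toy_wv holds random values, offset_parallelism() is a hypothetical helper, and scoring each pair by its mean cosine similarity with the other offsets is one plausible reading of parallelism, not necessarily the exact computation used by test_anchors() or by Boutyline and Johnston.

```r
# Toy embeddings with made-up values; rows are terms
set.seed(1)
a <- c("rest", "rested", "stay", "stand")
z <- c("coming", "embarked", "fast", "move")
toy_wv <- matrix(rnorm(8 * 5), nrow = 8, dimnames = list(c(a, z), NULL))

# Hypothetical helper: mean cosine of each offset with all other offsets
offset_parallelism <- function(wv, from, to) {
  # One offset vector per pair: from[i] - to[i]
  offsets <- wv[from, , drop = FALSE] - wv[to, , drop = FALSE]
  # Scale each row to unit length, then take all pairwise cosines
  unit <- offsets / sqrt(rowSums(offsets^2))
  cos_sim <- tcrossprod(unit)
  # Ignore each offset's similarity with itself
  diag(cos_sim) <- NA
  scores <- rowMeans(cos_sim, na.rm = TRUE)
  names(scores) <- paste(from, to, sep = "-")
  scores
}

# Input pairs only (analogous to all = FALSE): a[i] is paired with z[i]
input_scores <- offset_parallelism(toy_wv, a, z)

# Every cross-set combination (analogous to all = TRUE)
grid <- expand.grid(a = a, z = z, stringsAsFactors = FALSE)
all_scores <- offset_parallelism(toy_wv, grid$a, grid$z)

length(input_scores)  # 4 pairs
length(all_scores)    # 4 * 4 = 16 pairs
mean(input_scores)    # overall average across input pairs
```

The number of pairs grows as the product of the two set sizes, and the similarity matrix grows with the square of the number of pairs, which is where the extra cost comes from.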
Relco (anchor reliability coefficient) evaluates how well individual anchors
index a given semantic relation in comparison to a set of non-anchor words.
This can be used on semantic directions, semantic centroids, or compound concepts.
See Taylor et al. (2025) for details; see also the CMDist() function.
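To make the anchor versus non-anchor comparison concrete, the sketch below contrasts how close anchors and non-anchors sit to a centroid built from the anchors. It is not the Relco statistic (which is simulation-based; see Taylor et al. for the actual procedure). The toy matrix holds random values, and the cosine() helper and the simple difference in means are assumptions made purely for illustration.

```r
# Toy embeddings with made-up values; rows are terms
set.seed(42)
anchors <- c("rest", "rested", "stay", "stand")
non_anchors <- c("writ", "alloys", "ills", "atlas")
terms <- c(anchors, non_anchors)
toy_wv <- matrix(rnorm(length(terms) * 6), nrow = length(terms),
                 dimnames = list(terms, NULL))

# Centroid: the average of the anchor vectors
centroid <- colMeans(toy_wv[anchors, , drop = FALSE])

# Hypothetical helper: cosine similarity between two vectors
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Similarity of every term to the anchor centroid
sims <- apply(toy_wv, 1, cosine, y = centroid)

# Crude contrast: how much closer anchors are than non-anchors on average
mean(sims[anchors]) - mean(sims[non_anchors])
```

With random vectors the anchors are close to the centroid only because they define it, so the contrast here illustrates the mechanics and carries no substantive meaning.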
References
Boutyline, Andrei, and Ethan Johnston. 2023. "Forging Better Axes: Evaluating and Improving the Measurement of Semantic Dimensions in Word Embeddings." doi:10.31235/osf.io/576h3
Taylor, Marshall, et al. 2025. "A Simulation-Based Slope Metric for Anchor List Reliability in Word Embedding Spaces." doi:10.31235/osf.io/sc2ub_v3
Examples
# load example word embeddings
data(ft_wv_sample)
df_anchors <- data.frame(
  a = c("rest", "rested", "stay", "stand"),
  z = c("coming", "embarked", "fast", "move")
)
# test pairdir
test_anchors(df_anchors, ft_wv_sample, method = "pairdir")
#>       anchor_pair   pair_dir
#> 1         AVERAGE 0.13890810
#> 2     rest-coming 0.18960552
#> 3 rested-embarked 0.18302837
#> 4       stay-fast 0.10699562
#> 5      stand-move 0.07600288
test_anchors(df_anchors, ft_wv_sample, method = "pairdir", all = TRUE)
#>        anchor_pair  pair_dir
#> 1          AVERAGE 0.2748587
#> 2      rest-coming 0.3153744
#> 3    rested-coming 0.2752213
#> 4      stay-coming 0.2356302
#> 5     stand-coming 0.2242636
#> 6    rest-embarked 0.3004799
#> 7  rested-embarked 0.3048728
#> 8    stay-embarked 0.2208549
#> 9   stand-embarked 0.2094862
#> 10       rest-fast 0.3272416
#> 11     rested-fast 0.3054702
#> 12       stay-fast 0.3019808
#> 13      stand-fast 0.2737485
#> 14       rest-move 0.3153754
#> 15     rested-move 0.2671968
#> 16       stay-move 0.2791464
#> 17      stand-move 0.2413955
# test relco
non_anchors <- c("writ", "alloys", "ills", "atlas", "saturn", "cape", "unfolds")
## centroid
test_anchors(df_anchors[, 1], ft_wv_sample, method = "relco",
             type = "centroid", non_anchors = non_anchors)
#> # Relation Type: centroid
#> # Global Reliability Coefficient: 0.5221
#> # 5 Highest Contributors: rest, rested, stay, stand
#> # Confidence Interval (two-tailed): 0.5285 to 0.5221 at 95%
#> # t-test: t = 160.9013, df = 1, p-value = 0
#> # Alternative Hypothesis: True global reliability coefficient > 0
#> # Term-Level Contributions:
#>   term    mean lower_ci upper_ci
#>   <chr>  <dbl>    <dbl>    <dbl>
#> 1 rest   0.282    0.274    0.289
#> 2 rested 0.275    0.267    0.282
#> 3 stay   0.253    0.243    0.262
#> 4 stand  0.249    0.241    0.257
#>
## compound
test_anchors(df_anchors$a, ft_wv_sample, method = "relco",
             type = "compound", non_anchors = non_anchors)
#> # Relation Type: compound
#> # Global Reliability Coefficient: 0.4994
#> # 5 Highest Contributors: rested, rest, stay, stand
#> # Confidence Interval (two-tailed): 0.5053 to 0.4994 at 95%
#> # t-test: t = 168.1983, df = 1, p-value = 0
#> # Alternative Hypothesis: True global reliability coefficient > 0
#> # Term-Level Contributions:
#>   term    mean lower_ci upper_ci
#>   <chr>  <dbl>    <dbl>    <dbl>
#> 1 rest   0.263    0.260    0.265
#> 2 rested 0.273    0.265    0.280
#> 3 stay   0.239    0.233    0.244
#> 4 stand  0.224    0.219    0.229
#>
## direction
test_anchors(df_anchors, ft_wv_sample, method = "relco",
             type = "direction", dir_method = "paired",
             non_anchors = non_anchors)
#> # Relation Type: direction
#> # Global Reliability Coefficient: 0.2133
#> # 5 Highest Contributors (Pole 1): rest, rested, stand, stay
#> # 5 Highest Contributors (Pole 2): fast, move, coming, embarked
#> # Confidence Interval (two-tailed): 0.2269 to 0.2133 at 95%
#> # t-test: t = 31.1532, df = 1, p-value = 0
#> # Alternative Hypothesis: True global reliability coefficient > 0
#> # Term-Level Contributions:
#>   term        mean lower_ci upper_ci pole
#>   <chr>      <dbl>    <dbl>    <dbl> <chr>
#> 1 rest      0.149    0.139   0.159   pole1
#> 2 rested    0.138    0.128   0.148   pole1
#> 3 stay      0.0636   0.0495  0.0778  pole1
#> 4 stand     0.0978   0.0833  0.112   pole1
#> 5 coming   -0.0287  -0.0337 -0.0236  pole2
#> 6 embarked -0.0132  -0.0231 -0.00332 pole2
#> 7 fast     -0.0353  -0.0404 -0.0302  pole2
#> 8 move     -0.0309  -0.0349 -0.0269  pole2
#>
test_anchors(df_anchors, ft_wv_sample, method = "relco",
             type = "direction", dir_method = "pooled",
             non_anchors = non_anchors)
#> # Relation Type: direction
#> # Global Reliability Coefficient: 0.208
#> # 5 Highest Contributors (Pole 1): rested, rest, stand, stay
#> # 5 Highest Contributors (Pole 2): fast, move, coming, embarked
#> # Confidence Interval (two-tailed): 0.2209 to 0.208 at 95%
#> # t-test: t = 31.9017, df = 1, p-value = 0
#> # Alternative Hypothesis: True global reliability coefficient > 0
#> # Term-Level Contributions:
#>   term         mean lower_ci  upper_ci pole
#>   <chr>       <dbl>    <dbl>     <dbl> <chr>
#> 1 rest      0.134     0.116   0.153    pole1
#> 2 rested    0.139     0.130   0.148    pole1
#> 3 stay      0.0524    0.0341  0.0706   pole1
#> 4 stand     0.0843    0.0675  0.101    pole1
#> 5 coming   -0.0286   -0.0329 -0.0244   pole2
#> 6 embarked -0.00662  -0.0129 -0.000290 pole2
#> 7 fast     -0.0318   -0.0351 -0.0285   pole2
#> 8 move     -0.0294   -0.0330 -0.0258   pole2
#>