Juxtaposing “Anchor” Terms

Word embeddings are commonly used to measure the extent to which a set of target terms is “biased” along a unidimensional semantic relation – a.k.a. dimension, axis, or direction – ranging from “masculine” to “feminine.” Generalizing from this “gender relation,” analysts now use the same basic procedure to measure all sorts of relations, such as old to young, big to small, liberal to conservative, and rich to poor.

While there are several ways one could derive a “dimension,” all procedures involve selecting terms to “anchor” the “poles” of the juxtaposition. For example, get_anchors() provides several anchor sets as starting points for defining relations:

get_anchors(relation = "purity")
   add         subtract     
 1 pure        impure       
 2 purity      impurity     
 3 cleanliness uncleanliness
 4 clean       dirty        
 5 pureness    impureness   
 6 stainless   stain        
 7 untainted   tainted      
 8 immaculate  filthy       
 9 purity      dirt         
10 fresh       stale        
11 sanitation  stain 
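
To make the idea of a “dimension” concrete, here is a minimal base R sketch of one simple way to build a direction from anchor pairs: average the offsets between each “add” term and its “subtract” term. The tiny two-dimensional vectors and the helper names direction_from_pairs() and project_on() are invented for illustration; they are not real embeddings or text2map functions.

```r
# Toy "embeddings": one row per word, two made-up dimensions.
toy_wv <- rbind(
  pure   = c(1.0, 0.2),
  impure = c(-1.0, 0.1),
  clean  = c(0.8, -0.1),
  dirty  = c(-0.9, 0.0),
  soap   = c(0.5, 0.4),
  mud    = c(-0.6, 0.3)
)

# Average the (add - subtract) offsets and scale the result to unit length.
direction_from_pairs <- function(wv, add, subtract) {
  offsets <- wv[add, , drop = FALSE] - wv[subtract, , drop = FALSE]
  direction <- colMeans(offsets)
  direction / sqrt(sum(direction^2))
}

# Cosine similarity between each word vector and a unit-length direction.
project_on <- function(wv, direction) {
  drop(wv %*% direction) / sqrt(rowSums(wv^2))
}

purity <- direction_from_pairs(toy_wv, c("pure", "clean"), c("impure", "dirty"))
scores <- project_on(toy_wv[c("soap", "mud"), ], purity)
scores
```

In this toy setup, words nearer the “add” pole get positive scores and words nearer the “subtract” pole get negative ones.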

Boutyline and Johnston (2023) demonstrate a few methods for determining how well each juxtaposed pair of anchor terms in a given set defines a relation. We implement one such method, which they call “PairDir”:

“We find that PairDir – a measure of parallelism between the offset vectors (and thus of the internal reliability of the estimated relation) – consistently outperforms other reliability metrics in explaining axis accuracy.”
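
To build intuition for what “parallelism between the offset vectors” means, the sketch below scores each pair by the mean cosine similarity between its offset and every other pair's offset. This is one simplified reading of the idea, written in base R with made-up vectors; pair_dir_sketch() is a hypothetical helper, and the exact computation in test_anchors() or in Boutyline and Johnston (2023) may differ.

```r
# Hypothetical helper: mean cosine similarity of each pair's offset
# with the offsets of all other pairs (an assumed, simplified scoring).
pair_dir_sketch <- function(a_vecs, z_vecs) {
  offsets <- a_vecs - z_vecs                       # one offset per row
  unit <- offsets / sqrt(rowSums(offsets^2))       # scale rows to unit length
  sims <- tcrossprod(unit)                         # pairwise cosine similarities
  diag(sims) <- NA                                 # ignore self-similarity
  rowMeans(sims, na.rm = TRUE)
}

# Made-up vectors for three anchor pairs (rows are pairs).
a_vecs <- rbind(c(2, 1), c(3, 1), c(1, 2))
z_vecs <- rbind(c(1, 1), c(1, 1), c(1, 1))

toy_scores <- pair_dir_sketch(a_vecs, z_vecs)
toy_scores
```

Here the first two offsets are parallel and the third is orthogonal to both, so the third pair receives the lowest score and would be the one to reconsider.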

Below, we walk through how to replicate a portion of Boutyline and Johnston (2023), namely Table 4.

Getting Started

We will need text2map (version 0.1.9):

library(text2map)

We will also need the well-known Google News word2vec embeddings. We can get these using text2map.pretrained:

# remotes::install_gitlab("culturalcartography/text2map.pretrained")
library(text2map.pretrained)

After loading, we need to download the model (once per machine) and then load it into the session (it is rather large, so it will take a minute or so to load).
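
A minimal sketch of those two steps, assuming the download_pretrained() helper from text2map.pretrained and the object name vecs_cbow300_googlenews used in the rest of this walkthrough:

```r
library(text2map.pretrained)

# Download once per machine (the file is large); uncomment on first use.
# download_pretrained("vecs_cbow300_googlenews")

# Load the embeddings into the current session.
data("vecs_cbow300_googlenews")
```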


Testing Anchor Sets

Boutyline and Johnston (2023) take anchors used by Kozlowski et al. (2019) to define a soft-to-hard relation. We can load these anchors into R as a data.frame:

df_anchors <- data.frame(
  a = c("soft", "supple", "delicate", "pliable", "fluffy", "mushy", "softer", "softest"),
  z = c("hard", "tough", "dense", "rigid", "firm", "solid", "harder", "hardest")
)

Then, we use the following function to test the quality of these pairs using the PairDir method:

test_anchors(df_anchors, vecs_cbow300_googlenews)
      anchor_pair     pair_dir
1         AVERAGE  0.168506001
2       soft-hard  0.296905021
3    supple-tough  0.192454715
4  delicate-dense -0.003922452
5   pliable-rigid  0.123870932
6     fluffy-firm  0.143672458
7     mushy-solid  0.110253793
8   softer-harder  0.229002074
9 softest-hardest  0.255811469
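
One way to read this output programmatically is to drop the summary row and sort by score, so the weakest pair surfaces first. The sketch below uses base R and simply retypes the scores printed above into a data.frame; the object names are illustrative.

```r
# Scores copied from the test_anchors() output above.
res <- data.frame(
  anchor_pair = c("AVERAGE", "soft-hard", "supple-tough", "delicate-dense",
                  "pliable-rigid", "fluffy-firm", "mushy-solid",
                  "softer-harder", "softest-hardest"),
  pair_dir = c(0.168506001, 0.296905021, 0.192454715, -0.003922452,
               0.123870932, 0.143672458, 0.110253793, 0.229002074,
               0.255811469)
)

# Drop the summary row, then sort so the weakest pair comes first.
pairs <- res[res$anchor_pair != "AVERAGE", ]
pairs <- pairs[order(pairs$pair_dir), ]
weakest <- as.character(pairs$anchor_pair[1])
weakest
```

In these results the AVERAGE row equals the mean of the eight pair scores, and “delicate-dense” is the only pair with a score near zero.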

Boutyline and Johnston (2023, 26) use these results to guide the selection of new anchor pairs:

After identifying “delicate” as the term we want to replace, we iteratively substitute it with each of the roughly 100,000 words in our embedding’s vocabulary (but not already in this anchor set) and calculate the resulting PairDir score for each substitution. We then take the 100 terms that yielded the highest PairDir scores and manually examine them as candidate replacements, looking for a term that, when contrasted with “dense”, best conceptually describes the latent cultural dimension this axis is meant to measure.

We have 3 million words in our embeddings – that’s too many for our demonstration! First, let’s remove words already in our anchor set. Second, let’s also remove terms longer than a unigram or those that include punctuation, and remove any with capital letters, as they tend to be proper nouns or acronyms.

candidates <- rownames(vecs_cbow300_googlenews)
candidates <- candidates[!candidates %in% unlist(df_anchors)]
candidates <- candidates[!grepl("_", candidates, fixed = TRUE)]
candidates <- candidates[!grepl("[[:punct:]]", candidates)]
candidates <- candidates[!grepl("[[:upper:]]", candidates)]
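
To see what each filter removes, here is the same sequence applied to a small made-up vocabulary (the words and the toy_ object names are invented for illustration):

```r
toy_anchors <- c("soft", "hard")
toy_vocab <- c("soft", "New_York", "U.S.", "NASA", "gentle", "co-op", "hard")

toy_cand <- toy_vocab[!toy_vocab %in% toy_anchors]           # drops "soft", "hard"
toy_cand <- toy_cand[!grepl("_", toy_cand, fixed = TRUE)]    # drops "New_York"
toy_cand <- toy_cand[!grepl("[[:punct:]]", toy_cand)]        # drops "U.S.", "co-op"
toy_cand <- toy_cand[!grepl("[[:upper:]]", toy_cand)]        # drops "NASA"
toy_cand
```

Only “gentle” survives. Note that the underscore is itself matched by [[:punct:]], so the separate underscore step is redundant but harmless.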

That is still a lot of words. Normally, we would select a set of candidate terms to test, but just as a demonstration, let’s randomly sample a manageable number from our vocabulary. We will put these in a data.frame, all juxtaposed against “dense.”

# randomly sample 100
idx_samp <- sample(length(candidates), 100)

# create data.frame
df_alts <- data.frame(
  a = candidates[idx_samp], 
  z = "dense"
)

Now, we’ll use a for-loop to add each candidate pair, one at a time, to our previous anchor set and grab the candidate’s PairDir score (this takes about 3-4 minutes with 100 candidate pairs):

ls_res <- list()
ptm <- proc.time()

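for(i in seq_len(nrow(df_alts))) {

  # append one candidate pair; row 10 of the output is that new pair
  # (row 1 is AVERAGE, rows 2-9 are the original eight pairs)
  ls_res[[i]] <- test_anchors(
    rbind(df_anchors, df_alts[i, ]),
    vecs_cbow300_googlenews
    )[10, ]

}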

proc.time() - ptm
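
The same pattern can be written with lapply(), taking the last row of each result rather than a hard-coded row number. The sketch below is base R and self-contained; toy_test_anchors() is a hypothetical stand-in that only mimics the shape of the test_anchors() output (its scores are arbitrary, not PairDir), and score_candidates() is likewise an illustrative helper.

```r
# Hypothetical stand-in for test_anchors(): returns an AVERAGE row followed by
# one row per pair. The "scores" are arbitrary (word-length based), not PairDir.
toy_test_anchors <- function(anchors, wv = NULL) {
  score <- nchar(anchors$a) / 10
  data.frame(
    anchor_pair = c("AVERAGE", paste(anchors$a, anchors$z, sep = "-")),
    pair_dir = c(mean(score), score)
  )
}

# Score each candidate pair after appending it to the existing anchor set.
score_candidates <- function(anchors, alts, wv, score_fun) {
  res <- lapply(seq_len(nrow(alts)), function(i) {
    out <- score_fun(rbind(anchors, alts[i, ]), wv)
    out[nrow(out), ]  # the appended candidate is always the last row
  })
  do.call(rbind, res)
}

toy_anchors <- data.frame(a = c("soft", "supple"), z = c("hard", "tough"))
toy_alts <- data.frame(a = c("silky", "gummi"), z = "dense")
scored <- score_candidates(toy_anchors, toy_alts, NULL, toy_test_anchors)
scored
```

Indexing with nrow(out) instead of a fixed row number keeps the code correct if the anchor set grows or shrinks.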

Now, we can check to see which have the highest PairDir scores:

library(dplyr) # provides bind_rows() and slice_max()
bind_rows(ls_res) |> slice_max(pair_dir, n = 10)
       anchor_pair   pair_dir
1   vomitous-dense 0.05675224
2    doosras-dense 0.05661879
3     lopper-dense 0.04916034
4    shinier-dense 0.04835953
5   tranquil-dense 0.04774737
6      gummi-dense 0.04327697
7  ponderous-dense 0.04104144
8  telephoto-dense 0.03877452
9  womanhood-dense 0.03816549
10    dowels-dense 0.03704921

None of these randomly constructed pairs is very good: the best scores only about 0.06, well below every original pair except “delicate-dense.” But we get a sense of how we could iterate through possible candidate pairs and test them using the PairDir method.


Boutyline, Andrei, and Ethan Johnston. 2023. “Forging Better Axes: Evaluating and Improving the Measurement of Semantic Dimensions in Word Embeddings.” SocArXiv, August.
Kozlowski, Austin C, Matt Taddy, and James A Evans. 2019. “The Geometry of Culture: Analyzing the Meanings of Class Through Word Embeddings.” American Sociological Review 84 (5): 905–49.