get_direction() outputs a vector corresponding to one pole of a "semantic direction" built from sets of antonyms or juxtaposed terms. The output can be used as an input to CMDist() and CoCA().

get_direction(anchors, wv, method = "paired", missing = "stop", n_dirs = 1L)



Two-column data frame of juxtaposed 'anchor' terms.


Matrix of word embedding vectors (a.k.a. an embedding model) with rows as terms.


Indicates the method used to generate the vector offset. Default is 'paired'. See details.


What action to take if terms are not in the embeddings. If missing = "stop" (default), the function stops and an error message states which terms are missing. If missing = "remove", missing terms or rows with missing terms are removed, and the missing terms are printed as a message.
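The check described above can be illustrated in base R. This is only a sketch of the idea under stated assumptions, not the package's own code: the toy matrix wv, the anchors data frame, and the helper names missing_terms, keep, and anchors_kept are all made up for illustration.

```r
# Toy embedding matrix with terms as row names (values are made up).
wv <- rbind(
  woman = c(1, 0),
  man   = c(0, 1),
  she   = c(2, 0)
)

# Anchor pairs; "he" is deliberately absent from the toy embeddings.
anchors <- data.frame(
  add      = c("woman", "she"),
  subtract = c("man", "he"),
  stringsAsFactors = FALSE
)

# Terms that appear in the anchors but not in the embeddings.
terms <- unique(unlist(anchors, use.names = FALSE))
missing_terms <- setdiff(terms, rownames(wv))

# "remove"-style handling (assumed here): keep only rows where both
# terms of the pair are present in the embeddings.
keep <- anchors$add %in% rownames(wv) & anchors$subtract %in% rownames(wv)
anchors_kept <- anchors[keep, , drop = FALSE]

# Report the missing terms as a message.
if (length(missing_terms) > 0) {
  message("Missing terms: ", paste(missing_terms, collapse = ", "))
}
```

Running a check like this before calling get_direction() shows which anchor pairs would be dropped under missing = "remove".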


If method = "PCA", an integer indicating how many directions to return. Default = 1L, indicating a single, bipolar, direction.


Returns a one-row matrix.


Semantic directions can be estimated using a few methods:

  • 'paired' (default): each individual term is subtracted from exactly one other paired term. There must be the same number of terms on each side of the direction (although one word may be used more than once).

  • 'pooled': terms corresponding to one side of a direction are first averaged, and then these averaged vectors are subtracted. A different number of terms can be used for each side of the direction.

  • 'L2': the vector is calculated the same way as with 'pooled' but is then divided by its L2 (Euclidean) norm.

  • 'PCA': vector offsets are calculated for each pair of terms, as with 'paired', and if n_dirs = 1L (the default) then the direction is the first principal component. Users can return more than one direction by increasing the n_dirs parameter.
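The four methods can be sketched in base R on a toy embedding matrix. This is a rough illustration of the descriptions above, not the function's actual implementation. The embedding values are invented, and two details are assumptions of this sketch: the 'paired' offsets are averaged into a single vector, and the offsets are not centered before the principal component analysis.

```r
# Toy embedding matrix: rows are terms, columns are dimensions (made-up values).
wv <- rbind(
  woman = c(1, 0, 2),
  man   = c(0, 1, 2),
  she   = c(2, 0, 1),
  he    = c(0, 2, 1)
)

# Two-column anchor data frame: first column is added, second is subtracted.
anchors <- data.frame(
  add      = c("woman", "she"),
  subtract = c("man", "he"),
  stringsAsFactors = FALSE
)

add_vecs <- wv[anchors$add, , drop = FALSE]
sub_vecs <- wv[anchors$subtract, , drop = FALSE]

# 'paired': one offset per pair; averaging the offsets is an assumption here.
offsets <- add_vecs - sub_vecs
dir_paired <- colMeans(offsets)

# 'pooled': average each side first, then subtract the averages.
dir_pooled <- colMeans(add_vecs) - colMeans(sub_vecs)

# 'L2': the pooled vector divided by its Euclidean norm.
dir_l2 <- dir_pooled / sqrt(sum(dir_pooled^2))

# 'PCA': first principal component of the pairwise offsets
# (center = FALSE is an assumption of this sketch).
dir_pca <- prcomp(offsets, center = FALSE)$rotation[, 1]
```

With an equal number of terms on each side, the averaged 'paired' offsets and the 'pooled' vector coincide, because the mean of differences equals the difference of means. The two diverge only when 'pooled' is given a different number of terms per side. Note that the sign of a principal component is arbitrary, so a PCA direction may point toward either pole.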


Bolukbasi, T., Chang, K. W., Zou, J., Saligrama, V., and Kalai, A. (2016). Quantifying and reducing stereotypes in word embeddings. arXiv preprint
Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai (2016). 'Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings.' Proceedings of the 30th International Conference on Neural Information Processing Systems. 4356-4364.
Taylor, Marshall A., and Dustin S. Stoltz. (2020) 'Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts.' Sociological Science 7:544-569. doi:10.15195/v7.a23
Taylor, Marshall A., and Dustin S. Stoltz. (2020) 'Integrating semantic directions with concept mover's distance to measure binary concept engagement.' Journal of Computational Social Science 1-12. doi:10.1007/s42001-020-00075-8
Kozlowski, Austin C., Matt Taddy, and James A. Evans. (2019). 'The geometry of culture: Analyzing the meanings of class through word embeddings.' American Sociological Review 84(5):905-949. doi:10.1177/0003122419877135
Arseniev-Koehler, Alina, and Jacob G. Foster. (2020). 'Machine learning as a model for cultural learning: Teaching an algorithm what it means to be fat.' arXiv preprint


Dustin Stoltz


# load example word embeddings
data(ft_wv_sample)

# create anchor list
gen <- data.frame(
  add = c("woman"),
  subtract = c("man")
)

# default 'paired' method
dir <- get_direction(anchors = gen, wv = ft_wv_sample)

# 'PCA' method, returning a single direction
dir <- get_direction(
  anchors = gen, wv = ft_wv_sample,
  method = "PCA", n_dirs = 1L
)