get_direction() outputs a vector corresponding to one pole of a "semantic direction" built from sets of antonyms or juxtaposed terms. The output can be used as an input to CMDist() and CoCA(). Anchors must be a two-column data.frame or a list of length == 2.
get_direction(anchors, wv, method = "paired", missing = "stop", n_dirs = 1L)
anchors: A data frame or list of juxtaposed 'anchor' terms.
wv: Matrix of word embedding vectors (a.k.a. an embedding model) with rows as terms.
method: Indicates the method used to generate the vector offset. Default is 'paired'. See details.
missing: What action to take if terms are not in the embeddings. If missing = "stop" (default), the function stops and an error message states which terms are missing. If missing = "remove", missing terms or rows with missing terms are removed. Missing terms will be printed as a message.
n_dirs: If method = "PCA", an integer indicating how many directions to return. Default = 1L, indicating a single, bipolar direction.
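The two documented behaviours for missing terms can be illustrated with a small base R sketch. The helper `check_anchors()` and the toy matrix below are hypothetical and are not part of the package; the sketch only mimics the description above for a two-column data.frame of anchors and makes no claim about how get_direction() is implemented internally.

```r
# Hypothetical helper (not a package function): mimics the documented
# "stop" and "remove" behaviours for a two-column data.frame of anchors.
check_anchors <- function(anchors, wv, missing = c("stop", "remove")) {
  missing <- match.arg(missing)
  # collect every anchor term and find those absent from the embedding rows
  terms <- unique(unlist(lapply(anchors, as.character)))
  absent <- setdiff(terms, rownames(wv))
  if (length(absent) == 0L) {
    return(anchors)
  }
  if (missing == "stop") {
    stop("Terms not in embeddings: ", paste(absent, collapse = ", "))
  }
  # "remove": report the missing terms, then drop any row containing one
  message("Removed missing terms: ", paste(absent, collapse = ", "))
  keep <- !(anchors[[1]] %in% absent | anchors[[2]] %in% absent)
  anchors[keep, , drop = FALSE]
}

# Toy embeddings with made-up values; rows are terms
wv <- rbind(woman = c(0.9, 0.1), man = c(0.1, 0.8))
anchors <- data.frame(
  add = c("woman", "queen"),
  subtract = c("man", "king")
)

# "queen" and "king" are not in the toy matrix, so that row is dropped
kept <- check_anchors(anchors, wv, missing = "remove")
```

With missing = "stop", the same call would raise an error naming "queen" and "king" instead of returning a reduced set of anchors.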
Returns a one-row matrix.
Semantic directions can be estimated using a few methods:
'paired' (default): each individual term is subtracted from exactly one other paired term. There must be the same number of terms for each side of the direction (although one word may be used more than once).
'pooled': terms corresponding to one side of a direction are first averaged, and then these averaged vectors are subtracted. A different number of terms can be used for each side of the direction.
'L2': the vector is calculated the same as with 'pooled' but is then divided by the L2 'Euclidean' norm.
'PCA': vector offsets are calculated for each pair of terms, as with 'paired', and if n_dirs = 1L (the default) then the direction is the first principal component. Users can return more than one direction by increasing the n_dirs parameter.
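The four methods can be sketched in base R on a toy matrix. This is an illustration under stated assumptions, not the package's implementation: the embedding values are made up, the 'paired' offsets are assumed to be averaged into a single vector (the text above does not say how they are combined), and the 'PCA' step assumes a centered stats::prcomp() on the offsets.

```r
# Toy embedding matrix with made-up values; rows are terms
wv <- rbind(
  woman = c(0.9, 0.1, 0.3),
  man   = c(0.1, 0.8, 0.3),
  she   = c(0.8, 0.2, 0.5),
  he    = c(0.2, 0.9, 0.4),
  her   = c(0.7, 0.3, 0.1),
  him   = c(0.3, 0.7, 0.2)
)
add <- c("woman", "she", "her")
subtract <- c("man", "he", "him")

# 'paired': one offset per pair; averaging them is an assumption here
offsets <- wv[add, , drop = FALSE] - wv[subtract, , drop = FALSE]
dir_paired <- colMeans(offsets)

# 'pooled': average each side first, then subtract the averages
dir_pooled <- colMeans(wv[add, , drop = FALSE]) -
  colMeans(wv[subtract, , drop = FALSE])

# 'L2': the pooled vector divided by its Euclidean norm
dir_l2 <- dir_pooled / sqrt(sum(dir_pooled^2))

# 'PCA': first principal component of the paired offsets
# (centering without scaling is an assumption)
pca <- stats::prcomp(offsets, center = TRUE, scale. = FALSE)
dir_pca <- pca$rotation[, 1]
```

Under the averaging assumption, 'paired' and 'pooled' coincide whenever both sides have the same number of terms, because the mean of the differences equals the difference of the means. 'pooled' differs in practice by allowing a different number of terms on each side.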
Bolukbasi, T., Chang, K. W., Zou, J., Saligrama, V., and Kalai, A. (2016).
Quantifying and reducing stereotypes in word embeddings. arXiv preprint
https://arxiv.org/abs/1606.06121v1.
Bolukbasi, Tolga, Kai-Wei Chang, James Zou, Venkatesh Saligrama,
Adam Kalai (2016). 'Man Is to Computer Programmer as Woman Is to Homemaker?
Debiasing Word Embeddings.' Proceedings of the 30th International Conference
on Neural Information Processing Systems. 4356-4364.
https://dl.acm.org/doi/10.5555/3157382.3157584.
Taylor, Marshall A., and Dustin S. Stoltz. (2020)
'Concept Class Analysis: A Method for Identifying Cultural
Schemas in Texts.' Sociological Science 7:544-569.
doi:10.15195/v7.a23.
Taylor, Marshall A., and Dustin S. Stoltz. (2020) 'Integrating semantic
directions with concept mover's distance to measure binary concept
engagement.' Journal of Computational Social Science 1-12.
doi:10.1007/s42001-020-00075-8.
Kozlowski, Austin C., Matt Taddy, and James A. Evans. (2019). 'The geometry
of culture: Analyzing the meanings of class through word embeddings.'
American Sociological Review 84(5):905-949.
doi:10.1177/0003122419877135.
Arseniev-Koehler, Alina, and Jacob G. Foster. (2020). 'Machine learning
as a model for cultural learning: Teaching an algorithm what it means to
be fat.' arXiv preprint https://arxiv.org/abs/2003.12133v2.
# load example word embeddings
data(ft_wv_sample)
# create anchor list
gen <- data.frame(
  add = c("woman"),
  subtract = c("man")
)
# default 'paired' method
dir <- get_direction(anchors = gen, wv = ft_wv_sample)
# 'PCA' method, spelling out the full n_dirs argument name
dir <- get_direction(
  anchors = gen, wv = ft_wv_sample,
  method = "PCA", n_dirs = 1L
)