Given a matrix, \(B\), of word embedding vectors (source) with terms as rows, this function finds a transformed matrix following a specified operation. These operations include centering (i.e., translation) and normalization (i.e., scaling). In the first, \(B\) is centered by subtracting its column means. In the second, \(B\) is normalized by the L2 norm. Both have been found to improve word embedding representations. The function can also transform \(B\) so that it approximately aligns with another matrix, \(A\), of word embedding vectors (reference), using a Procrustes transformation (see details). Finally, given a term-co-occurrence matrix built on a local corpus, the function can "retrofit" pretrained embeddings to better match the local corpus.

find_transformation(
  wv,
  ref = NULL,
  method = c("align", "norm", "center", "retrofit")
)
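
For example, assuming my_wv is a source embedding matrix with terms as rownames, and ref_wv is a reference matrix sharing those terms (both hypothetical objects used here for illustration):

  # center the source embeddings (translation)
  wv_centered <- find_transformation(my_wv, method = "center")
  # normalize the source embeddings by the L2 norm (scaling)
  wv_normed <- find_transformation(my_wv, method = "norm")
  # align the source embeddings to a reference matrix
  wv_aligned <- find_transformation(my_wv, ref = ref_wv, method = "align")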

Arguments

wv

Matrix of word embedding vectors (a.k.a. an embedding model) with terms as rows (the source matrix to be transformed).

ref

If method = "align", this is the reference matrix toward which the source matrix is to be aligned.

method

Character vector indicating the method to use for the transformation. Current methods include: "align", "norm", "center", and "retrofit" -- see details.

Value

A new word embedding matrix, transformed using the specified method.

Details

Aligning a source matrix of word embedding vectors, \(B\), to a reference matrix, \(A\), has primarily been used as a post-processing step for embeddings trained on longitudinal corpora for diachronic analysis, or for cross-lingual embeddings. Aligning preserves internal (cosine) distances while orienting the source embeddings to minimize the sum of squared distances from the reference (and is therefore a least squares problem). Alignment is accomplished with the following steps (sketched in base R after the list):

  • translation: centering by column means

  • scaling: normalizing by the L2 norm

  • rotation/reflection: rotating and reflecting to minimize the sum of squared differences, using singular value decomposition
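
The following is a minimal sketch of these three steps in base R, for illustration only; B_mat and A_mat are hypothetical stand-ins for the source and reference matrices, with the same terms in the same row order:

  # hypothetical source (B) and reference (A) matrices, rows as terms
  set.seed(1)
  B_mat <- matrix(rnorm(50), nrow = 10)
  A_mat <- matrix(rnorm(50), nrow = 10)

  # translation: center each matrix by its column means
  B_c <- sweep(B_mat, 2, colMeans(B_mat))
  A_c <- sweep(A_mat, 2, colMeans(A_mat))

  # scaling: normalize each matrix by its L2 (Frobenius) norm
  B_s <- B_c / norm(B_c, type = "F")
  A_s <- A_c / norm(A_c, type = "F")

  # rotation/reflection: the SVD of t(B) %*% A yields the orthogonal
  # matrix Q minimizing the sum of squared differences ||BQ - A||^2
  s <- svd(crossprod(B_s, A_s))
  Q <- s$u %*% t(s$v)
  B_aligned <- B_s %*% Q

Because Q is orthogonal, the final step rotates and reflects the space without altering the distances between the source vectors, which is why internal (cosine) distances are preserved.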

Alignment is asymmetrical, and only outputs the transformed source matrix, \(B\). Therefore, it is typically recommended to align \(B\) to \(A\), and then \(A\) to \(B\). However, simply centering and norming \(A\) afterward may be sufficient.
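
That simpler workflow might look like this (B_wv and A_wv are again hypothetical matrices):

  # align the source to the reference
  B_new <- find_transformation(B_wv, ref = A_wv, method = "align")
  # then center and norm the reference so the two are comparable
  A_new <- find_transformation(A_wv, method = "center")
  A_new <- find_transformation(A_new, method = "norm")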

References

Artetxe, Mikel, Gorka Labaka, and Eneko Agirre. (2018). 'A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings.' In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 789-798.
Artetxe, Mikel, Gorka Labaka, and Eneko Agirre. (2019). 'An effective approach to unsupervised machine translation.' In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 194-203.
Hamilton, William L., Jure Leskovec, and Dan Jurafsky. (2018). 'Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change.' https://arxiv.org/abs/1605.09096v6.
Lin, Zefeng, Xiaojun Wan, and Zongming Guo. (2019). 'Learning Diachronic Word Embeddings with Iterative Stable Information Alignment.' Natural Language Processing and Chinese Computing. 749-760. doi:10.1007/978-3-030-32233-5_58.
Schlechtweg et al. (2019). 'A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains.' https://arxiv.org/abs/1906.02979v1.
Shoemark et al. (2019). 'Room to Glo: A Systematic Comparison of Semantic Change Detection Approaches with Word Embeddings.' Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. 66-76. doi:10.18653/v1/D19-1007.
Borg and Groenen. (1997). Modern Multidimensional Scaling. New York: Springer. 340-342.