Given a matrix, \(B\), of word embedding vectors (source) with terms as rows, this function finds a transformed matrix following a specified operation. These include centering (i.e., translation) and normalization (i.e., scaling). In the first, \(B\) is centered by subtracting column means. In the second, \(B\) is normalized by the L2 norm. Both have been found to improve word embedding representations. The function also finds a transformed matrix that approximately aligns \(B\) with another matrix, \(A\), of word embedding vectors (reference), using a Procrustes transformation (see details). Finally, given a term-co-occurrence matrix built on a local corpus, the function can "retrofit" pretrained embeddings to better match the local corpus.
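As a concrete illustration of the first two operations, a minimal base R sketch (not the package's internal code) might look like the following; here \(B\) is assumed to be a plain numeric matrix with terms as rows, and normalization is assumed to be row-wise, a common convention for word vectors:

# Illustrative sketch only; `B` is a numeric matrix of word vectors with terms as rows.
B_centered <- sweep(B, 2, colMeans(B), "-")   # centering: subtract column means
B_normed   <- B / sqrt(rowSums(B^2))          # normalization: scale each row to unit L2 length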
find_transformation(
  wv,
  ref = NULL,
  method = c("align", "norm", "center", "retrofit")
)
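For illustration, a hedged example of how the function might be called; the inputs `my_wv`, `ref_wv`, and `my_tcm` are hypothetical objects, and only the signature above is taken as given:

# `my_wv`: hypothetical matrix of pretrained word vectors (terms as rows)
# `ref_wv`: hypothetical reference embedding matrix (for alignment)
# `my_tcm`: hypothetical term-co-occurrence matrix from a local corpus
wv_centered <- find_transformation(my_wv, method = "center")
wv_normed   <- find_transformation(my_wv, method = "norm")
wv_aligned  <- find_transformation(my_wv, ref = ref_wv, method = "align")
# passing the TCM through `ref` is an assumption based on the signature above
wv_refit    <- find_transformation(my_wv, ref = my_tcm, method = "retrofit")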
Matrix of word embedding vectors (a.k.a. embedding model) with rows as terms (the source matrix to be transformed).
If method = "align", this is the reference matrix toward which the source matrix is to be aligned.
Character vector indicating the method to use for the transformation. Current methods include: "align", "norm", "center", and "retrofit" -- see details.
A new word embedding matrix, transformed using the specified method.
Aligning a source matrix of word embedding vectors, \(B\), to a reference matrix, \(A\), has primarily been used as a post-processing step for embeddings trained on longitudinal corpora for diachronic analysis or for cross-lingual embeddings. Aligning preserves internal (cosine) distances, while orienting the source embeddings to minimize the sum of squared distances (and is therefore a least squares problem). Alignment is accomplished with the following steps:
translation: centers by subtracting column means
scaling: scales (normalizes) by the L2 norm
rotation/reflection: rotates and reflects to minimize the sum of squared differences, using singular value decomposition
Alignment is asymmetrical and only outputs the transformed source matrix, \(B\). Therefore, it is typically recommended to align \(B\) to \(A\), and then \(A\) to \(B\). However, simply centering and norming \(A\) afterward may be sufficient.
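The three steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's internal code: \(A\) and \(B\) are assumed to be plain numeric matrices with the same dimensions and matching row (term) order, and the L2 norm in the scaling step is interpreted here as the matrix (Frobenius) norm.

# Sketch of Procrustes alignment of source B toward reference A.
procrustes_align <- function(A, B) {
  # 1. translation: center each matrix by its column means
  A <- sweep(A, 2, colMeans(A), "-")
  B <- sweep(B, 2, colMeans(B), "-")
  # 2. scaling: normalize each matrix (Frobenius norm assumed here)
  A <- A / norm(A, type = "F")
  B <- B / norm(B, type = "F")
  # 3. rotation/reflection: solve min ||B %*% R - A||_F over orthogonal R
  #    via singular value decomposition of t(B) %*% A
  s <- svd(crossprod(B, A))
  R <- s$u %*% t(s$v)
  B %*% R  # the transformed (aligned) source matrix
}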
Artetxe, Mikel, Gorka Labaka, and Eneko Agirre. (2018). 'A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings.' In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 789-798.
Artetxe, Mikel, Gorka Labaka, and Eneko Agirre. (2019). 'An effective approach to unsupervised machine translation.' In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 194-203.
Hamilton, William L., Jure Leskovec, and Dan Jurafsky. (2018). 'Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change.' https://arxiv.org/abs/1605.09096v6.
Lin, Zefeng, Xiaojun Wan, and Zongming Guo. (2019). 'Learning Diachronic Word Embeddings with Iterative Stable Information Alignment.' Natural Language Processing and Chinese Computing. 749-760. doi:10.1007/978-3-030-32233-5_58.
Schlechtweg et al. (2019). 'A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains.' https://arxiv.org/abs/1906.02979v1.
Shoemark et al. (2019). 'Room to Glo: A Systematic Comparison of Semantic Change Detection Approaches with Word Embeddings.' Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. 66-76. doi:10.18653/v1/D19-1007.
Borg and Groenen. (1997). Modern Multidimensional Scaling. New York: Springer. 340-342.