Coming to Your Senses: on Controls and Evaluation Sets in Polysemy Research
Haim Dubossarsky, Eitan Grossman, Daphna Weinshall
June 2018

The point of departure of this article is the claim that sense-specific vectors provide an advantage over normal vectors due to the polysemy that they presumably represent. This claim is based on performance gains observed in gold-standard evaluation tests such as word similarity tasks. We demonstrate that this claim, at least as it is instantiated in prior art, is unfounded in two ways, and we provide empirical data and an analytic discussion that may account for the previously reported improved performance. First, we show that ground-truth polysemy degrades performance in word similarity tasks; word similarity tasks are therefore not suitable as an evaluation test for polysemy representation. Second, we show that random assignment of words to senses improves performance in the same task. This and additional results point to the conclusion that the performance gains reported in previous work may be an artifact of random sense assignment, which is equivalent to sub-sampling and multiple estimation of word vector representations. Theoretical analysis shows that this may on its own be beneficial for the estimation of word similarity, by reducing the bias in the estimation of the cosine distance.
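As an illustration of the random sense assignment control mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a pre-tokenized corpus and tags each occurrence of a target word with a pseudo-sense id drawn uniformly at random, so that each pseudo-sense sees a random sub-sample of the word's contexts. The function name `assign_random_senses`, the `word_k` tagging scheme, and the toy corpus are hypothetical choices made for this example.

```python
import random
from collections import Counter


def assign_random_senses(tokens, targets, n_senses=2, seed=0):
    """Tag each occurrence of a target word with a random pseudo-sense id.

    Hypothetical helper: the paper's actual procedure may differ.
    """
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    tagged = []
    for tok in tokens:
        if tok in targets:
            # The sense id is independent of context, so it carries no
            # information about the word's real meaning in this occurrence.
            tagged.append(f"{tok}_{rng.randrange(n_senses)}")
        else:
            tagged.append(tok)
    return tagged


# Toy corpus (made up for illustration).
corpus = (
    "the bank raised rates while the river bank flooded "
    "and the bank closed"
).split()

tagged = assign_random_senses(corpus, targets={"bank"}, n_senses=2, seed=0)

# How many occurrences ended up under each pseudo-sense.
counts = Counter(t for t in tagged if t.startswith("bank_"))
print(tagged)
print(counts)
```

Training any word embedding model on the tagged corpus would then yield several vectors per target word, each estimated from a random subset of its occurrences. Under the assumptions of this sketch, any gain such vectors show on a word similarity task cannot be attributed to polysemy, since the sense labels are random.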
Reference: lingbuzz/004301
(please use that when you cite this article)
Published in: Proceedings of EMNLP 2018
keywords: nlp, distributional semantics, polysemy, semantics