In this paper we analyze and compare different types of sentence similarity measures by using them as features for sentential paraphrase classification. We work with Russian, and all experiments are conducted on a Russian paraphrase corpus that we have collected, and are still collecting, from news headlines. Apart from the similarity features, we also analyze the corpus itself. Our results disprove the supposition that it is more difficult to distinguish between precise and loose paraphrases than between loose paraphrases and non-paraphrases. We also offer recommendations for applying different similarity measures to the classification of paraphrases derived from news texts.
Original language: English
Pages (from-to): 74-82
Journal: Proceedings of the IEEE
DOI
Status: Published - 2015
Externally published: Yes

ID: 5799587