In this paper we analyze and compare different types of sentence similarity measures by using them as features for sentential paraphrase classification. We work with Russian: all experiments are conducted on the Russian paraphrase corpus that we have collected, and are still collecting, from news headlines. Apart from the similarity features, we also analyze the corpus itself. As a result, we disprove the supposition that it is more difficult to distinguish between precise and loose paraphrases than between loose paraphrases and non-paraphrases. We also offer recommendations on applying different similarity measures to the classification of paraphrases derived from news texts.
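To illustrate what "similarity measures as features" can look like, here is a minimal sketch of two shallow measures computed for a pair of headlines: token-set Jaccard overlap and cosine similarity of term-frequency vectors (a basic vector space model). This is an assumption for illustration only, not the paper's feature set. The function names (tokenize, jaccard, cosine_bow, similarity_features) are hypothetical. No lemmatization is applied, which matters for a highly inflected language like Russian.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens (\\w+ also matches Cyrillic in Python 3)."""
    return re.findall(r"\w+", text.lower())


def jaccard(a: str, b: str) -> float:
    """Shallow similarity: size of the token-set intersection over the union."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of raw term-frequency vectors (simple vector space model)."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def similarity_features(a: str, b: str) -> list[float]:
    """Hypothetical feature vector for one headline pair, to be fed to a classifier."""
    return [jaccard(a, b), cosine_bow(a, b)]


if __name__ == "__main__":
    h1 = "Путин встретился с Меркель"
    h2 = "Меркель и Путин провели встречу"
    print(similarity_features(h1, h2))
```

For this pair, the two headlines share only the tokens "путин" and "меркель", so Jaccard is 2/7 and cosine is 1/√5. The inflected forms "встретился" and "встречу" do not match without lemmatization. In a classifier, each pair's feature vector would sit alongside any other similarity features being compared.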
Original language: English
Pages (from-to): 74-82
Journal: Proceedings of the IEEE
State: Published - 2015
Externally published: Yes

    Research areas

  • sentence similarity measure
  • shallow similarity
  • semantic similarity
  • dictionary-based similarity
  • distributional semantic similarity
  • vector space model
  • paraphrase identification
  • crowdsourcing technologies

ID: 5799587