The literature describes Semantic Textual Similarity (STS) as a fundamental part of many Natural Language Processing (NLP) tasks. STS approaches depend on the availability of lexical-semantic resources. There are several efforts to improve the lexical-semantic resources for the English language, and the state of the art reports a large number of applications for this language. Brazilian Portuguese linguistic resources, when compared with English ones, do not offer the same availability of relations and content, which causes a loss of precision in STS tasks. Therefore, the current work presents an approach that combines Brazilian Portuguese and English lexical-semantic ontology resources to exploit the full potential of both languages' linguistic relations and to generate a language-mixture model for measuring STS. We evaluated the proposed approach on a well-known and respected Brazilian Portuguese STS dataset, which brought to light some considerations about mixture models and their relation to ontology language semantics.

Translated title of the contribution: Семантическое сходство текстов на бразильском португальском языке: Подход, основанный на комбинировании нескольких языков
Original language: English
Pages (from-to): 235-244
Number of pages: 10
Journal: Vestnik Sankt-Peterburgskogo Universiteta, Prikladnaya Matematika, Informatika, Protsessy Upravleniya
Volume: 15
Issue number: 2
State: Published - 1 Jan 2019

    Scopus subject areas

  • Computer Science(all)
  • Control and Optimization
  • Applied Mathematics

    Research areas

  • computational linguistics, natural language processing, ontologies, semantic textual similarity

ID: 49087634