Semantic Textual Similarity on Brazilian Portuguese: An approach based on language-mixture models

Research output: Contribution to journal › Article › peer-review

A. Silva
A. Lozkins
L.R. Bertoldi
S. Rigo
V.M. Bure

The literature describes Semantic Textual Similarity (STS) as a fundamental part of many Natural Language Processing (NLP) tasks. STS approaches depend on the availability of lexical-semantic resources. There have been several efforts to improve lexical-semantic resources for English, and the state of the art reports a large number of applications for this language. Brazilian Portuguese linguistic resources, compared with English ones, do not offer the same coverage of relations and content, which causes a loss of precision in STS tasks. Therefore, the current work presents an approach that combines Brazilian Portuguese and English lexical-semantic ontology resources to exploit the full potential of the linguistic relations of both languages and to generate a language-mixture model for measuring STS. We evaluated the proposed approach on a well-known and respected Brazilian Portuguese STS dataset, which brought to light some considerations about mixture models and their relation to ontology language semantics.

Translated title of the contribution	Семантическое сходство текстов на бразильском португальском языке: Подход, основанный на комбинировании нескольких языков
Original language	English
Pages (from-to)	235-244
Number of pages	10
Journal	Vestnik Sankt-Peterburgskogo Universiteta, Prikladnaya Matematika, Informatika, Protsessy Upravleniya
Volume	15
Issue number	2
DOIs	https://doi.org/10.21638/11702/spbu10.2019.207
State	Published - 1 Jan 2019

Scopus subject areas

Computer Science(all)
Control and Optimization
Applied Mathematics

Research areas

computational linguistics, natural language processing, ontologies, semantic textual similarity

ID: 49087634