Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification

Research output: Contribution to journal › Article › peer-review

Faculty of Mathematics and Computer Science, St Petersburg University (SPbU)

DOI

https://doi.org/10.3233/JIFS-179908
Final published version

Anton Alekseev
Elena Tutubalina
Valentin Malykh
Sergey Nikolenko

Deep learning architectures based on self-attention have recently matched and surpassed state-of-the-art results in unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, the aspects they learn are less coherent when the models are applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering that improves the topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation, we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.
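
The filtering step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which probabilistic classifier, tokenizer, or cut-off was used, so the Naive Bayes model, the whitespace tokenizer, the `threshold` value, and all names below (`NaiveBayesDomainClassifier`, `filter_sentences`, the toy sentences) are assumptions made for the example. The sentences that survive filtering would then be passed to ABAE training, which is not shown.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Hypothetical tokenizer (an assumption): lowercase and split on whitespace.
    return sentence.lower().split()


class NaiveBayesDomainClassifier:
    """Two-class multinomial Naive Bayes with add-one smoothing.

    An assumed stand-in for the "probabilistic classifier" in the abstract.
    """

    def fit(self, in_domain, out_of_domain):
        # Token counts per class: "in" = target dataset, "out" = outer dataset.
        self.counts = {
            "in": Counter(t for s in in_domain for t in tokenize(s)),
            "out": Counter(t for s in out_of_domain for t in tokenize(s)),
        }
        self.totals = {k: sum(c.values()) for k, c in self.counts.items()}
        self.vocab = set(self.counts["in"]) | set(self.counts["out"])
        n = len(in_domain) + len(out_of_domain)
        self.log_prior = {
            "in": math.log(len(in_domain) / n),
            "out": math.log(len(out_of_domain) / n),
        }
        return self

    def _log_joint(self, tokens, label):
        v = len(self.vocab)
        score = self.log_prior[label]
        for t in tokens:
            if t in self.vocab:  # tokens unseen in training are skipped
                score += math.log((self.counts[label][t] + 1) / (self.totals[label] + v))
        return score

    def prob_in_domain(self, sentence):
        tokens = tokenize(sentence)
        log_in = self._log_joint(tokens, "in")
        log_out = self._log_joint(tokens, "out")
        # Normalize in log space to avoid underflow on long sentences.
        m = max(log_in, log_out)
        e_in, e_out = math.exp(log_in - m), math.exp(log_out - m)
        return e_in / (e_in + e_out)


def filter_sentences(sentences, classifier, threshold=0.5):
    # Keep only sentences whose in-domain probability reaches the threshold
    # (the threshold value is an assumption, not taken from the paper).
    return [s for s in sentences if classifier.prob_in_domain(s) >= threshold]


# Toy data, invented for illustration only.
target = ["the new gpu driver crashes on boot", "which graphics card fits this motherboard"]
outer = ["the hotel room was clean and quiet", "breakfast at the hotel was great"]

clf = NaiveBayesDomainClassifier().fit(target, outer)
candidates = ["my graphics card driver crashes", "the room was quiet"]
kept = filter_sentences(candidates, clf)
print(kept)
```

On this toy data only the hardware-related sentence is kept. In practice the threshold trades corpus size against domain purity, so it would likely need tuning against a coherence metric on the target dataset.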

Original language	English
Pages (from-to)	2487-2496
Number of pages	10
Journal	Journal of Intelligent and Fuzzy Systems
Volume	39
Issue number	2
DOI	https://doi.org/10.3233/JIFS-179908
State	Published - 2020

Research areas

natural language processing, machine learning, neural networks, aspect extraction, topic coherence

Scopus subject areas

Statistics and Probability
Engineering (all)
Artificial Intelligence

ID: 95167268
