Research output: Contribution to journal › Article › Peer review
Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification. / Alekseev, Anton; Tutubalina, Elena; Malykh, Valentin; Nikolenko, Sergey.
In: Journal of Intelligent and Fuzzy Systems, Vol. 39, No. 2, 2020, p. 2487-2496.
TY - JOUR
T1 - Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification
AU - Alekseev, Anton
AU - Tutubalina, Elena
AU - Malykh, Valentin
AU - Nikolenko, Sergey
N1 - Publisher Copyright: © 2020 - IOS Press and the authors. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Deep learning architectures based on self-attention have recently achieved and surpassed state-of-the-art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.
AB - Deep learning architectures based on self-attention have recently achieved and surpassed state-of-the-art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.
KW - Aspect extraction
KW - deep learning
KW - out-of-domain classification
KW - topic coherence
KW - topic models
KW - natural language processing
KW - machine learning
KW - neural networks
UR - http://www.scopus.com/inward/record.url?scp=85091103086&partnerID=8YFLogxK
U2 - 10.3233/JIFS-179908
DO - 10.3233/JIFS-179908
M3 - Article
AN - SCOPUS:85091103086
VL - 39
SP - 2487
EP - 2496
JO - Journal of Intelligent and Fuzzy Systems
JF - Journal of Intelligent and Fuzzy Systems
SN - 1064-1246
IS - 2
ER -
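The data-preparation step described in the abstract — score each sentence with a probabilistic in-domain/out-of-domain classifier, then drop low-probability sentences before training the aspect model — can be sketched as follows. This is a minimal stdlib-only illustration using a naive Bayes word model; the paper does not specify the classifier, and the function names, example sentences, and 0.5 threshold are assumptions for illustration, not details from the paper.

```python
import math
from collections import Counter

def train_in_domain_classifier(in_domain, out_domain):
    """Return a function estimating P(in-domain | sentence) via
    word-level naive Bayes with add-one smoothing and uniform priors."""
    def word_counts(sentences):
        counts = Counter()
        for s in sentences:
            counts.update(s.lower().split())
        return counts

    c_in, c_out = word_counts(in_domain), word_counts(out_domain)
    t_in, t_out = sum(c_in.values()), sum(c_out.values())
    vocab_size = len(set(c_in) | set(c_out))

    def prob_in(sentence):
        # Accumulate log-odds of in-domain vs out-of-domain per word.
        log_odds = 0.0
        for w in sentence.lower().split():
            log_odds += math.log((c_in[w] + 1) / (t_in + vocab_size))
            log_odds -= math.log((c_out[w] + 1) / (t_out + vocab_size))
        return 1.0 / (1.0 + math.exp(-log_odds))

    return prob_in

def filter_sentences(sentences, prob_in, threshold=0.5):
    """Keep only sentences likely to be in-domain; the threshold
    value is an assumption, not taken from the paper."""
    return [s for s in sentences if prob_in(s) >= threshold]
```

The surviving sentences would then be fed to the aspect extraction model (e.g. ABAE) in place of the unfiltered corpus.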