Standard

Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification. / Alekseev, Anton; Tutubalina, Elena; Malykh, Valentin; Nikolenko, Sergey.

In: Journal of Intelligent and Fuzzy Systems, Vol. 39, No. 2, 2020, p. 2487-2496.

Research output: Contribution to journal › Article › peer-review


Author

Alekseev, Anton ; Tutubalina, Elena ; Malykh, Valentin ; Nikolenko, Sergey. / Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification. In: Journal of Intelligent and Fuzzy Systems. 2020 ; Vol. 39, No. 2. pp. 2487-2496.

BibTeX

@article{b81209e5ce894d56a5bcec15179e69d9,
title = "Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification",
abstract = "Deep learning architectures based on self-attention have recently achieved and surpassed state of the art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts. ",
keywords = "Aspect extraction, deep learning, out-of-domain classification, topic coherence, topic models, natural language processing, machine learning, neural networks",
author = "Anton Alekseev and Elena Tutubalina and Valentin Malykh and Sergey Nikolenko",
note = "Publisher Copyright: {\textcopyright} 2020 - IOS Press and the authors. All rights reserved.",
year = "2020",
doi = "10.3233/JIFS-179908",
language = "English",
volume = "39",
pages = "2487--2496",
journal = "Journal of Intelligent and Fuzzy Systems",
issn = "1064-1246",
publisher = "IOS Press",
number = "2",
}
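The abstract describes a two-step procedure: train a probabilistic classifier to separate in-domain (target) from out-of-domain (outer) text, then discard sentences with low in-domain probability before training the aspect model (ABAE). The record does not specify which classifier the authors used; the sketch below assumes a Laplace-smoothed unigram Naive Bayes model, and the function names and the 0.5 threshold are illustrative, not the paper's actual implementation.

```python
import math
from collections import Counter

def train_domain_classifier(target_sents, outer_sents):
    """Fit unigram models for the target (in-domain) and outer
    (out-of-domain) corpora; return a function scoring P(in-domain)."""
    tgt = Counter(w for s in target_sents for w in s.lower().split())
    out = Counter(w for s in outer_sents for w in s.lower().split())
    vocab = set(tgt) | set(out)
    n_tgt, n_out, v = sum(tgt.values()), sum(out.values()), len(vocab)

    def in_domain_prob(sentence):
        # Log-likelihood of the sentence under each Laplace-smoothed
        # unigram model; the sigmoid of the log-odds gives a probability.
        lt = lo = 0.0
        for w in sentence.lower().split():
            lt += math.log((tgt[w] + 1) / (n_tgt + v))
            lo += math.log((out[w] + 1) / (n_out + v))
        return 1.0 / (1.0 + math.exp(lo - lt))

    return in_domain_prob

def filter_sentences(sentences, in_domain_prob, threshold=0.5):
    # Data-preparation step: keep only sentences the classifier
    # considers likely in-domain; the rest are filtered out before
    # the neural aspect model is trained.
    return [s for s in sentences if in_domain_prob(s) >= threshold]
```

In use, one would fit the classifier on the target and outer datasets, filter the target corpus's sentences, and pass only the surviving sentences to ABAE training.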

RIS

TY - JOUR

T1 - Improving unsupervised neural aspect extraction for online discussions using out-of-domain classification

AU - Alekseev, Anton

AU - Tutubalina, Elena

AU - Malykh, Valentin

AU - Nikolenko, Sergey

N1 - Publisher Copyright: © 2020 - IOS Press and the authors. All rights reserved.

PY - 2020

Y1 - 2020

N2 - Deep learning architectures based on self-attention have recently achieved and surpassed state of the art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.

AB - Deep learning architectures based on self-attention have recently achieved and surpassed state of the art results in the task of unsupervised aspect extraction and topic modeling. While models such as neural attention-based aspect extraction (ABAE) have been successfully applied to user-generated texts, they are less coherent when applied to traditional data sources such as news articles and newsgroup documents. In this work, we introduce a simple approach based on sentence filtering in order to improve topical aspects learned from newsgroups-based content without modifying the basic mechanism of ABAE. We train a probabilistic classifier to distinguish between out-of-domain texts (outer dataset) and in-domain texts (target dataset). Then, during data preparation we filter out sentences that have a low probability of being in-domain and train the neural model on the remaining sentences. The positive effect of sentence filtering on topic coherence is demonstrated in comparison to aspect extraction models trained on unfiltered texts.

KW - Aspect extraction

KW - deep learning

KW - out-of-domain classification

KW - topic coherence

KW - topic models

KW - natural language processing

KW - machine learning

KW - neural networks

UR - http://www.scopus.com/inward/record.url?scp=85091103086&partnerID=8YFLogxK

U2 - 10.3233/JIFS-179908

DO - 10.3233/JIFS-179908

M3 - Article

AN - SCOPUS:85091103086

VL - 39

SP - 2487

EP - 2496

JO - Journal of Intelligent and Fuzzy Systems

JF - Journal of Intelligent and Fuzzy Systems

SN - 1064-1246

IS - 2

ER -

ID: 95167268