Standard

Entropy-Based Approach for the Detection of Changes in Arabic Newspapers' Content. / Bernikova, Olga; Granichin, Oleg; Lemberg, Dan; Redkin, Oleg; Volkovich, Zeev.

In: Entropy, Vol. 22, No. 4, 441, 04.2020.

Research output: Contribution to journal › Article › peer-review

BibTeX

@article{48567b94d2c94a778be90c91b699c519,
title = "Entropy-Based Approach for the Detection of Changes in Arabic Newspapers' Content",
abstract = "A new method for the recognition of meaningful changes in social state based on transformations of the linguistic content in Arabic newspapers is suggested. The detected alterations of the linguistic material in Arabic newspapers play an indicator role. The currently proposed approach acts in an {"}online{"} fashion and uses pre-trained vector representations of Arabic words. After a pre-processing stage, the words in the issues' texts are substituted by vectors obtained within a word embedding methodology. The approach typifies the consistent linguistic template by the similarity of the embedded vectors. A change in the distributions of the issue-grounded samples indicates a difference in the underlying newspaper template. A two-step procedure implements the concept, where the first step compares the similarity distribution of the current issue versus the union of ones corresponding to several of its predecessors. A repeating under-sampling approach accompanied by a two-sample test stabilizes the sampling and returns a collection of the resultant p-values. In the second stage, the entropy of these sets is sequentially calculated, such that the change points of the time series obtained in this way indicate the changes in the newspaper content. Numerical experiments provided on the following issues of several Arabic newspapers published in the Arab Spring period demonstrate the high reliability of the method.",
keywords = "anomaly detection, publishing model modeling, word embedding",
author = "Olga Bernikova and Oleg Granichin and Dan Lemberg and Oleg Redkin and Zeev Volkovich",
note = "Publisher Copyright: {\textcopyright} 2020 by the authors. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.",
year = "2020",
month = apr,
doi = "10.3390/E22040441",
language = "English",
volume = "22",
journal = "Entropy",
issn = "1099-4300",
publisher = "MDPI AG",
number = "4",

}

RIS

TY - JOUR

T1 - Entropy-Based Approach for the Detection of Changes in Arabic Newspapers' Content

AU - Bernikova, Olga

AU - Granichin, Oleg

AU - Lemberg, Dan

AU - Redkin, Oleg

AU - Volkovich, Zeev

N1 - Publisher Copyright: © 2020 by the authors. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.

PY - 2020/4

Y1 - 2020/4

AB - A new method for the recognition of meaningful changes in social state based on transformations of the linguistic content in Arabic newspapers is suggested. The detected alterations of the linguistic material in Arabic newspapers play an indicator role. The currently proposed approach acts in an "online" fashion and uses pre-trained vector representations of Arabic words. After a pre-processing stage, the words in the issues' texts are substituted by vectors obtained within a word embedding methodology. The approach typifies the consistent linguistic template by the similarity of the embedded vectors. A change in the distributions of the issue-grounded samples indicates a difference in the underlying newspaper template. A two-step procedure implements the concept, where the first step compares the similarity distribution of the current issue versus the union of ones corresponding to several of its predecessors. A repeating under-sampling approach accompanied by a two-sample test stabilizes the sampling and returns a collection of the resultant p-values. In the second stage, the entropy of these sets is sequentially calculated, such that the change points of the time series obtained in this way indicate the changes in the newspaper content. Numerical experiments provided on the following issues of several Arabic newspapers published in the Arab Spring period demonstrate the high reliability of the method.

KW - anomaly detection

KW - publishing model modeling

KW - word embedding

UR - http://www.scopus.com/inward/record.url?scp=85084682538&partnerID=8YFLogxK

U2 - 10.3390/E22040441

DO - 10.3390/E22040441

M3 - Article

AN - SCOPUS:85084682538

VL - 22

JO - Entropy

JF - Entropy

SN - 1099-4300

IS - 4

M1 - 441

ER -

ID: 53062935