Research output: Contribution to journal › Article › peer-review
Literary writing style recognition via a minimal spanning tree-based approach. / Shalymov, Dmitry; Granichin, Oleg; Klebanov, Lev; Volkovich, Zeev.
In: Expert Systems with Applications, Vol. 61, 01.11.2016, p. 145-153.
TY - JOUR
T1 - Literary writing style recognition via a minimal spanning tree-based approach
AU - Shalymov, Dmitry
AU - Granichin, Oleg
AU - Klebanov, Lev
AU - Volkovich, Zeev
PY - 2016/11/1
Y1 - 2016/11/1
N2 - In this paper, we address the problem of literary writing style determination using a comparison of the randomness of two given texts. We attempt to comprehend if these texts are generated from distinct probability sources that can reveal a difference between the literary writing styles of the corresponding authors. We propose a new approach based on the incorporation of the known Friedman-Rafsky two-sample test into a multistage procedure with the aim of stabilizing the process. A sampling procedure constructed by applying the N-grams methodology is applied to simulate samples drawn from the pooled text with the aim of evaluating the null hypothesis distribution that appears after the writing styles coincide. Next, samples from different files are selected, and the p-values of the test statistics are calculated. An empirical distribution of these values is compared numerous times with the uniform one on the interval [0, 1], and the writing styles are recognized as different if the rejection fraction in this comparison's sequence is significantly greater than 0.5. The offered approach is language independent in the community of alphabetic languages and does not involve the use of linguistics. In comparison with most existing methods our approach does not deal with any authorship attribute determination. A text itself, more precisely speaking, the distribution of sequential text templates and their mutual occurrences essentially identifies the style. Experiments demonstrate the strong capability of the proposed method. (C) 2016 Elsevier Ltd. All rights reserved.
AB - In this paper, we address the problem of literary writing style determination using a comparison of the randomness of two given texts. We attempt to comprehend if these texts are generated from distinct probability sources that can reveal a difference between the literary writing styles of the corresponding authors. We propose a new approach based on the incorporation of the known Friedman-Rafsky two-sample test into a multistage procedure with the aim of stabilizing the process. A sampling procedure constructed by applying the N-grams methodology is applied to simulate samples drawn from the pooled text with the aim of evaluating the null hypothesis distribution that appears after the writing styles coincide. Next, samples from different files are selected, and the p-values of the test statistics are calculated. An empirical distribution of these values is compared numerous times with the uniform one on the interval [0, 1], and the writing styles are recognized as different if the rejection fraction in this comparison's sequence is significantly greater than 0.5. The offered approach is language independent in the community of alphabetic languages and does not involve the use of linguistics. In comparison with most existing methods our approach does not deal with any authorship attribute determination. A text itself, more precisely speaking, the distribution of sequential text templates and their mutual occurrences essentially identifies the style. Experiments demonstrate the strong capability of the proposed method. (C) 2016 Elsevier Ltd. All rights reserved.
KW - Writing style determination
KW - Two-sample spanning Tree-based test
KW - MULTIVARIATE 2-SAMPLE TEST
KW - AUTHORSHIP ATTRIBUTION
KW - FEDERALIST-PAPERS
KW - TESTS
KW - DISTRIBUTIONS
KW - TEXT
KW - UNMASKING
KW - MESSAGES
KW - WORDS
U2 - 10.1016/j.eswa.2016.05.032
DO - 10.1016/j.eswa.2016.05.032
M3 - Article
VL - 61
SP - 145
EP - 153
JO - Expert Systems with Applications
JF - Expert Systems with Applications
SN - 0957-4174
ER -
ID: 7568102
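
Illustrative note: the abstract builds on the Friedman-Rafsky two-sample test, which pools two samples, builds a minimum spanning tree over the pooled points, and counts the tree edges that join points from different samples. Unusually few such edges suggest that the samples come from different distributions. The sketch below shows only that basic test, not the multistage procedure or the N-gram sampling described in the paper. It assumes Euclidean distance on numeric feature vectors and uses a permutation p-value rather than the asymptotic normal approximation. The names `mst_edges`, `cross_edges` and `friedman_rafsky_pvalue` are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Edges of a Euclidean minimum spanning tree, built with Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # shortest known distance from each vertex to the tree
    best_from = [-1] * n         # tree vertex that realises that distance
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the vertex closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update distances of the remaining vertices to the grown tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_edges(edges, labels):
    """Number of tree edges whose endpoints carry different sample labels."""
    return sum(1 for a, b in edges if labels[a] != labels[b])


def friedman_rafsky_pvalue(sample_x, sample_y, n_perm=500, seed=0):
    """Return (observed cross-edge count, one-sided permutation p-value).

    Assumption: the p-value comes from permuting labels, which is a stand-in
    for the asymptotic distribution used in the classical test.
    """
    rng = random.Random(seed)
    pooled = list(sample_x) + list(sample_y)
    labels = [0] * len(sample_x) + [1] * len(sample_y)
    edges = mst_edges(pooled)    # the tree does not depend on the labels
    observed = cross_edges(edges, labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if cross_edges(edges, labels) <= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_perm)
```

In the setting of the abstract, the inputs would be numeric vectors derived from the two texts; how those vectors are built from N-grams is not specified here and is left to the paper.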