Literary writing style recognition via a minimal spanning tree-based approach

Standard

Literary writing style recognition via a minimal spanning tree-based approach. / Shalymov, Dmitry; Granichin, Oleg; Klebanov, Lev; Volkovich, Zeev.

In: Expert Systems with Applications, Vol. 61, 01.11.2016, p. 145-153.

Research output: Contribution to journal › Article › peer-review

Author

Shalymov, Dmitry ; Granichin, Oleg ; Klebanov, Lev ; Volkovich, Zeev. / Literary writing style recognition via a minimal spanning tree-based approach. In: Expert Systems with Applications. 2016 ; Vol. 61. pp. 145-153.

BibTeX

@article{d30825838da3405db60dec80f8bf6914,

title = "Literary writing style recognition via a minimal spanning tree-based approach",

abstract = "In this paper, we address the problem of literary writing style determination using a comparison of the randomness of two given texts. We attempt to comprehend if these texts are generated from distinct probability sources that can reveal a difference between the literary writing styles of the corresponding authors. We propose a new approach based on the incorporation of the known Friedman-Rafsky two-sample test into a multistage procedure with the aim of stabilizing the process. A sampling procedure constructed by applying the N-grams methodology is applied to simulate samples drawn from the pooled text with the aim of evaluating the null hypothesis distribution that appears after the writing styles coincide. Next, samples from different files are selected, and the p-values of the test statistics are calculated. An empirical distribution of these values is compared numerous times with the uniform one on the interval [0, 1], and the writing styles are recognized as different if the rejection fraction in this comparison's sequence is significantly greater than 0.5. The offered approach is language independent in the community of alphabetic languages and does not involve the use of linguistics. In comparison with most existing methods our approach does not deal with any authorship attribute determination. A text itself, more precisely speaking, the distribution of sequential text templates and their mutual occurrences essentially identifies the style. Experiments demonstrate the strong capability of the proposed method. (C) 2016 Elsevier Ltd. All rights reserved.",

keywords = "Writing style determination, Two-sample spanning Tree-based test, MULTIVARIATE 2-SAMPLE TEST, AUTHORSHIP ATTRIBUTION, FEDERALIST-PAPERS, TESTS, DISTRIBUTIONS, TEXT, UNMASKING, MESSAGES, WORDS",

author = "Dmitry Shalymov and Oleg Granichin and Lev Klebanov and Zeev Volkovich",

year = "2016",

month = nov,

day = "1",

doi = "10.1016/j.eswa.2016.05.032",

language = "English",

volume = "61",

pages = "145--153",

journal = "Expert Systems with Applications",

issn = "0957-4174",

publisher = "Elsevier",

}

RIS

TY - JOUR

T1 - Literary writing style recognition via a minimal spanning tree-based approach

AU - Shalymov, Dmitry

AU - Granichin, Oleg

AU - Klebanov, Lev

AU - Volkovich, Zeev

PY - 2016/11/1

Y1 - 2016/11/1

N2 - In this paper, we address the problem of literary writing style determination using a comparison of the randomness of two given texts. We attempt to comprehend if these texts are generated from distinct probability sources that can reveal a difference between the literary writing styles of the corresponding authors. We propose a new approach based on the incorporation of the known Friedman-Rafsky two-sample test into a multistage procedure with the aim of stabilizing the process. A sampling procedure constructed by applying the N-grams methodology is applied to simulate samples drawn from the pooled text with the aim of evaluating the null hypothesis distribution that appears after the writing styles coincide. Next, samples from different files are selected, and the p-values of the test statistics are calculated. An empirical distribution of these values is compared numerous times with the uniform one on the interval [0, 1], and the writing styles are recognized as different if the rejection fraction in this comparison's sequence is significantly greater than 0.5. The offered approach is language independent in the community of alphabetic languages and does not involve the use of linguistics. In comparison with most existing methods our approach does not deal with any authorship attribute determination. A text itself, more precisely speaking, the distribution of sequential text templates and their mutual occurrences essentially identifies the style. Experiments demonstrate the strong capability of the proposed method. (C) 2016 Elsevier Ltd. All rights reserved.

AB - In this paper, we address the problem of literary writing style determination using a comparison of the randomness of two given texts. We attempt to comprehend if these texts are generated from distinct probability sources that can reveal a difference between the literary writing styles of the corresponding authors. We propose a new approach based on the incorporation of the known Friedman-Rafsky two-sample test into a multistage procedure with the aim of stabilizing the process. A sampling procedure constructed by applying the N-grams methodology is applied to simulate samples drawn from the pooled text with the aim of evaluating the null hypothesis distribution that appears after the writing styles coincide. Next, samples from different files are selected, and the p-values of the test statistics are calculated. An empirical distribution of these values is compared numerous times with the uniform one on the interval [0, 1], and the writing styles are recognized as different if the rejection fraction in this comparison's sequence is significantly greater than 0.5. The offered approach is language independent in the community of alphabetic languages and does not involve the use of linguistics. In comparison with most existing methods our approach does not deal with any authorship attribute determination. A text itself, more precisely speaking, the distribution of sequential text templates and their mutual occurrences essentially identifies the style. Experiments demonstrate the strong capability of the proposed method. (C) 2016 Elsevier Ltd. All rights reserved.

KW - Writing style determination

KW - Two-sample spanning Tree-based test

KW - MULTIVARIATE 2-SAMPLE TEST

KW - AUTHORSHIP ATTRIBUTION

KW - FEDERALIST-PAPERS

KW - TESTS

KW - DISTRIBUTIONS

KW - TEXT

KW - UNMASKING

KW - MESSAGES

KW - WORDS

U2 - 10.1016/j.eswa.2016.05.032

DO - 10.1016/j.eswa.2016.05.032

M3 - Article

VL - 61

SP - 145

EP - 153

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

ER -

ID: 7568102
