This paper proposes a new methodology for patterning writing style evolution using dynamic similarity. We divide a text into sequential, disjoint portions (chunks) of equal size and use the Mean Dependence measure to model the writing process through the association between the current text chunk and its predecessors. To expose the evolution of a style, a new two-step clustering procedure is applied. In the first phase, a distance based on the Mean Dependence is evaluated between each pair of chunks. All document chunks in a pair are embedded in a high-dimensional space using a Kuratowski-type embedding procedure and clustered by means of the introduced distance. In the second phase, the rows of the binary cluster classification documents matrix are clustered with the hierarchical single-linkage clustering algorithm. In this way, the resulting classification tree, shown as a dendrogram, visualizes the inner stylistic structure of a collection of texts. The approach is applied to writing style evolution in the "Foundation Universe" by Isaac Asimov, the "Rama" series by Arthur C. Clarke, the "Forsyte Saga" by John Galsworthy, "The Lord of the Rings" by John Ronald Reuel Tolkien, and a collection of books ascribed to Romain Gary. The results demonstrate that the suggested methodology is capable of identifying style development over time. Additional numerical experiments on author determination and author verification tasks show that the method provides accurate solutions. © 2017 Elsevier Ltd. All rights reserved.
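As a rough illustration of two generic steps named in the abstract, the sketch below splits a token sequence into equal-size disjoint chunks and runs single-linkage agglomerative clustering over the rows of a small binary matrix. It is not the paper's implementation: the Mean Dependence distance and the Kuratowski-type embedding are not reproduced, the Hamming distance between rows is an assumed stand-in, and the names `split_into_chunks`, `hamming`, and `single_linkage` are hypothetical.

```python
from itertools import combinations


def split_into_chunks(tokens, chunk_size):
    """Split tokens into sequential, disjoint chunks of equal size.

    Assumption: a trailing remainder shorter than chunk_size is dropped.
    """
    n_chunks = len(tokens) // chunk_size
    return [tokens[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks)]


def hamming(a, b):
    """Number of positions where two equal-length rows differ.

    Assumption: used here only as a stand-in row distance.
    """
    return sum(x != y for x, y in zip(a, b))


def single_linkage(rows, distance=hamming):
    """Agglomerative single-linkage clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of row indices. The history is the
    information a dendrogram would display.
    """
    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(distance(rows[i], rows[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


# Toy data: one row per document, one column per cluster label (illustrative only).
membership = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
chunks = split_into_chunks("a b c d e f g".split(), 3)
merges = single_linkage(membership)
```

In this toy run the two identical rows merge first, and the row that shares no columns with the others joins last, which is the ordering a dendrogram of the merge history would show.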

Original language: English
Pages (from-to): 45-64
Number of pages: 20
Journal: Pattern Recognition
Volume: 77
State: Published - May 2018

    Scopus subject areas

  • Software
  • Artificial Intelligence
  • Signal Processing
  • Computer Vision and Pattern Recognition

    Research areas

  • Patterning, Writing style, Text mining, Dynamics, Authorship attribution, K-means, Recognition, Compression, Plagiarism, Algorithm, Models, Kernel

ID: 11875344