Writer identification based on letter frequency distribution

Polina Diurdeva, Elena Mikhailova, Dmitry Shalymov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

2 Scopus citations

Abstract

The problem of writer identification has recently gained relevance due to the large number of documents available in digital form. This work investigates an approach based on the frequencies of letter combinations for classifying documents by authorship. It examines and compares four distance measures between a text of unknown authorship and an author's profile: the L1 measure, the Kullback-Leibler divergence, the base metric of the Common N-gram (CNG) method [8], and a variation of the CNG dissimilarity measure proposed in [12]. The comparison identifies cases in which one metric outperforms the others under a specific combination of parameters. Experiments are conducted on several Russian and English corpora.
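To make the idea of comparing a text against an author profile concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the epsilon smoothing in the Kullback-Leibler divergence, the choice to drop non-letter characters, and the use of the original CNG dissimilarity form (rather than the variation from [12], which the abstract does not specify) are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_profile(text, n=1):
    """Relative frequencies of letter n-grams (assumption: non-letters are dropped)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    grams = ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def l1_distance(p, q):
    """Sum of absolute frequency differences over the union of n-grams."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def kl_divergence(p, q, eps=1e-9):
    """KL divergence D(p || q); eps replaces zero frequencies in q (assumed smoothing)."""
    return sum(pv * math.log(pv / max(q.get(k, 0.0), eps)) for k, pv in p.items())


def cng_dissimilarity(p, q):
    """Original CNG form: sum of squared differences normalised by the mean frequency."""
    total = 0.0
    for k in set(p) | set(q):
        a, b = p.get(k, 0.0), q.get(k, 0.0)
        total += ((a - b) / ((a + b) / 2)) ** 2
    return total


def attribute(text, author_profiles, distance=l1_distance, n=1):
    """Return the author whose profile is closest to the text under `distance`."""
    unknown = ngram_profile(text, n)
    return min(author_profiles, key=lambda a: distance(unknown, author_profiles[a]))
```

A possible usage, with toy strings standing in for real corpora: build one profile per author with `ngram_profile`, then call `attribute(unknown_text, profiles, distance=kl_divergence, n=2)`. The n-gram length and the choice of distance are the kind of parameter combination the comparison in the abstract refers to.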

Original language: English
Title of host publication: 19th Conference of Open Innovations Association, FRUCT 2016
Editors: Tatiana Tyutina, Sergey Balandin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 24-30
Number of pages: 7
ISBN (Electronic): 9789526839752
State: Published - 2016
Event: 19th Conference of Open Innovations Association, FRUCT 2016 - Jyvaskyla, Finland
Duration: 7 Nov 2016 to 11 Nov 2016

Conference

Conference: 19th Conference of Open Innovations Association, FRUCT 2016
Country: Finland
City: Jyvaskyla
Period: 7/11/16 to 11/11/16

Scopus subject areas

  • Computer Science(all)
  • Electrical and Electronic Engineering
