
Determining the author of a text through text analysis is a problem that has engaged researchers since the last century. With the growth of social networks and of platforms for publishing web posts and articles on the Internet, authorship identification has become even more pressing. Specialists in journalism and law are particularly interested in a more accurate approach for resolving disputes over texts of doubtful authorship. In this article, the authors compare the applicability of eight modern machine learning algorithms (Support Vector Machine, Naive Bayes, Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Multilayer Perceptron, and Gradient Boosting Classifier) to the classification of a collection of Russian web posts. The best results were achieved with Logistic Regression, Multilayer Perceptron, and Support Vector Machine with a linear kernel, using a combination of part-of-speech and word n-grams as features.
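As a rough illustration of the feature combination named in the abstract, the sketch below counts word n-grams and part-of-speech n-grams for one post and merges them into a single feature dictionary. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `ngrams` and `combined_features`, the choice of n = 2, the `word:`/`pos:` key prefixes, and the hand-tagged English sample are all hypothetical, and the abstract does not say which tagger or n-gram sizes were used.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def combined_features(words, pos_tags, n=2):
    """Count word n-grams and POS n-grams in one feature dictionary.

    Keys are prefixed ("word:" / "pos:") so the two feature families
    never collide. The prefixes are an illustrative convention only.
    """
    if len(words) != len(pos_tags):
        raise ValueError("words and pos_tags must have the same length")
    features = Counter()
    for gram in ngrams([w.lower() for w in words], n):
        features["word:" + " ".join(gram)] += 1
    for gram in ngrams(pos_tags, n):
        features["pos:" + " ".join(gram)] += 1
    return features


# Hypothetical hand-tagged sample; the tags are illustrative and would
# normally come from a part-of-speech tagger for the corpus language.
words = ["the", "cat", "saw", "the", "cat"]
pos_tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
features = combined_features(words, pos_tags, n=2)
```

A dictionary of this kind could then be vectorized and passed to any of the classifiers listed in the abstract. Prefixing the keys keeps the two families in one feature space without a word n-gram ever being mistaken for a tag n-gram.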

Original language: English
Title of host publication: Databases and Information Systems - 13th International Baltic Conference, DB and IS 2018, Proceedings
Editors: Olegas Vasilecas, Gintautas Dzemyda, Audrone Lupeikiene
Publisher: Springer Nature
Pages: 314-327
Number of pages: 14
ISBN (print): 9783319975702
DOI
Status: Published - 1 Jan 2018
Event: 13th International Baltic Conference on Databases and Information Systems, DB and IS 2018 - Trakai, Lithuania
Duration: 1 Jul 2018 to 4 Jul 2018

Publication series

Name: Communications in Computer and Information Science
Volume: 838
ISSN (print): 1865-0929

Conference

Conference: 13th International Baltic Conference on Databases and Information Systems, DB and IS 2018
Country/Territory: Lithuania
City: Trakai
Period: 1/07/18 to 4/07/18

    Scopus subject areas

  • Computer Science (all)
  • Mathematics (all)

ID: 38400560