Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Determining the author of a text through text analysis is a challenge that has engaged researchers since the last century. With the growth of social networks and platforms for publishing web posts and articles on the Internet, the task of identifying authorship has become even more pressing. Specialists in journalism and law are particularly interested in more accurate approaches for resolving disputes over texts of doubtful authorship. In this article, the authors compare the applicability of eight modern machine learning algorithms (Support Vector Machine, Naive Bayes, Logistic Regression, K-nearest Neighbors, Decision Tree, Random Forest, Multilayer Perceptron, and Gradient Boosting Classifier) for classifying a collection of Russian web posts. The best results were achieved with Logistic Regression, Multilayer Perceptron, and Support Vector Machine with a linear kernel, using a combination of part-of-speech and word n-grams as features.
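To make the general idea concrete, the following is a minimal sketch of authorship classification over word n-gram features, using only one of the eight algorithm families named above (Naive Bayes). It is not the authors' implementation: the helper `word_ngrams`, the class `NaiveBayesAuthorClassifier`, the whitespace tokenization, the add-one smoothing, and the toy English sentences are all assumptions made for illustration. Part-of-speech n-grams would additionally require a POS tagger, which is not shown here.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayesAuthorClassifier:
    """Hypothetical multinomial Naive Bayes with add-one smoothing over n-gram counts."""

    def fit(self, texts, authors):
        self.feature_counts = defaultdict(Counter)  # author -> n-gram counts
        self.doc_counts = Counter(authors)          # author -> number of texts
        self.vocabulary = set()
        for text, author in zip(texts, authors):
            grams = word_ngrams(text)
            self.feature_counts[author].update(grams)
            self.vocabulary.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_author, best_score = None, -math.inf
        for author, counts in self.feature_counts.items():
            total = sum(counts.values())
            # Log prior from the share of training texts by this author.
            score = math.log(self.doc_counts[author] / total_docs)
            for gram in word_ngrams(text):
                if gram in self.vocabulary:  # ignore n-grams never seen in training
                    score += math.log((counts[gram] + 1) / (total + vocab_size))
            if score > best_score:
                best_author, best_score = author, score
        return best_author


# Toy data, invented for illustration only (not from the paper's corpus).
train_texts = [
    "the quick brown fox jumps over the lazy dog",
    "the quick red fox runs past the sleepy dog",
    "markets fell sharply as investors sold shares",
    "investors sold bonds as markets fell again",
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

model = NaiveBayesAuthorClassifier().fit(train_texts, train_authors)
prediction = model.predict("the lazy fox jumps over the dog")
```

A comparison like the one the abstract describes would repeat this fit-and-predict step for each classifier on the same feature set and score the predictions on held-out texts.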
Original language | English |
---|---|
Title of host publication | Databases and Information Systems - 13th International Baltic Conference, DB and IS 2018, Proceedings |
Editors | Olegas Vasilecas, Gintautas Dzemyda, Audrone Lupeikiene |
Publisher | Springer Nature |
Pages | 314-327 |
Number of pages | 14 |
ISBN (Print) | 9783319975702 |
State | Published - 1 Jan 2018 |
Event | 13th International Baltic Conference on Databases and Information Systems, DB and IS 2018 - Trakai, Lithuania. Duration: 1 Jul 2018 → 4 Jul 2018 |
Name | Communications in Computer and Information Science |
---|---|
Volume | 838 |
ISSN (Print) | 1865-0929 |
Conference | 13th International Baltic Conference on Databases and Information Systems, DB and IS 2018 |
---|---|
Country/Territory | Lithuania |
City | Trakai |
Period | 1/07/18 → 4/07/18 |
ID: 38400560