Research output: Publications in books, reports, collections, conference proceedings › article in conference proceedings › scientific › peer-reviewed
Investigation of text attribution methods based on frequency author profile. / Diurdeva, Polina; Mikhailova, Elena.
Databases and Information Systems - 13th International Baltic Conference, DB and IS 2018, Proceedings. ed. / Olegas Vasilecas; Gintautas Dzemyda; Audrone Lupeikiene. Springer Nature, 2018. p. 314-327 (Communications in Computer and Information Science; Vol. 838).
TY - GEN
T1 - Investigation of text attribution methods based on frequency author profile
AU - Diurdeva, Polina
AU - Mikhailova, Elena
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Determining the author of a text through text analysis is a challenge whose solutions have engaged researchers since the last century. With the development of social networks and platforms for publishing web posts and articles on the Internet, the task of identifying authorship has become even more pressing. Specialists in journalism and law are particularly interested in finding a more accurate approach for resolving disputes over texts of dubious authorship. In this article, the authors compare the applicability of eight modern machine learning algorithms (Support Vector Machine, Naive Bayes, Logistic Regression, K-nearest Neighbors, Decision Tree, Random Forest, Multilayer Perceptron, and Gradient Boosting Classifier) to the classification of a collection of Russian web posts. The best results were achieved with Logistic Regression, Multilayer Perceptron, and Support Vector Machine with a linear kernel, using a combination of part-of-speech and word n-grams as features.
KW - Author attribution
KW - Frequency author profile
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=85052856289&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-97571-9_25
DO - 10.1007/978-3-319-97571-9_25
M3 - Conference contribution
AN - SCOPUS:85052856289
SN - 9783319975702
T3 - Communications in Computer and Information Science
SP - 314
EP - 327
BT - Databases and Information Systems - 13th International Baltic Conference, DB and IS 2018, Proceedings
A2 - Vasilecas, Olegas
A2 - Dzemyda, Gintautas
A2 - Lupeikiene, Audrone
PB - Springer Nature
T2 - 13th International Baltic Conference on Databases and Information Systems, DB and IS 2018
Y2 - 1 July 2018 through 4 July 2018
ER -
ID: 38400560