Detection of Toxic Language in Short Text Messages › Research at St Petersburg University

Standard

Detection of Toxic Language in Short Text Messages. / Makhnytkina, Olesia; Matveev, Anton; Bogoradnikova, Darya; Lizunova, Inna; Maltseva, Anna; Shilkina, Natalia.

Speech and Computer : 22nd International Conference, SPECOM 2020, Proceedings. ed. / Alexey Karpov; Rodmonga Potapova. Cham : Springer Nature, 2020. pp. 315-325 (Lecture Notes in Computer Science; Vol. 12335).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › Peer-review

Harvard

Makhnytkina, O, Matveev, A, Bogoradnikova, D, Lizunova, I, Maltseva, A & Shilkina, N 2020, Detection of Toxic Language in Short Text Messages. in A Karpov & R Potapova (eds), Speech and Computer : 22nd International Conference, SPECOM 2020, Proceedings. Lecture Notes in Computer Science, vol. 12335, Springer Nature, Cham, pp. 315-325, 22nd International Conference on Speech and Computer, St. Petersburg, Russian Federation, 7/10/20. https://doi.org/10.1007/978-3-030-60276-5_31

APA

Makhnytkina, O., Matveev, A., Bogoradnikova, D., Lizunova, I., Maltseva, A., & Shilkina, N. (2020). Detection of Toxic Language in Short Text Messages. In A. Karpov, & R. Potapova (Eds.), Speech and Computer : 22nd International Conference, SPECOM 2020, Proceedings (pp. 315-325). (Lecture Notes in Computer Science; Vol. 12335). Springer Nature. https://doi.org/10.1007/978-3-030-60276-5_31

Vancouver

Makhnytkina O, Matveev A, Bogoradnikova D, Lizunova I, Maltseva A, Shilkina N. Detection of Toxic Language in Short Text Messages. In Karpov A, Potapova R, editors, Speech and Computer : 22nd International Conference, SPECOM 2020, Proceedings. Cham: Springer Nature. 2020. p. 315-325. (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-030-60276-5_31

Author

Makhnytkina, Olesia ; Matveev, Anton ; Bogoradnikova, Darya ; Lizunova, Inna ; Maltseva, Anna ; Shilkina, Natalia. / Detection of Toxic Language in Short Text Messages. Speech and Computer : 22nd International Conference, SPECOM 2020, Proceedings. Editor / Alexey Karpov ; Rodmonga Potapova. Cham : Springer Nature, 2020. pp. 315-325 (Lecture Notes in Computer Science).

BibTeX

@inproceedings{baeb20f320b54032abee4ff77ebe63c2,

title = "Detection of Toxic Language in Short Text Messages",

abstract = "The ever-increasing online communication landscape lets people with significantly different views cross paths as never before. This leads to a rise of toxicity in online comments and discussions and makes the development of means to detect such phenomena critically important. The toxic language detection problem is fairly well researched, and some solutions produce highly accurate predictions when sufficiently large datasets are available for training. However, such datasets are not always available for various languages. In this paper, we review different approaches to the problem that transfer knowledge from one language to another: machine translation, multi-lingual models, and domain adaptation. We also analyze word embedding methods such as Word2Vec, FastText, GloVe, and BERT, and methods for classifying toxic comments: Na{\"i}ve Bayes, Random Forest, Logistic regression, Support Vector Machine, Majority vote, and Recurrent Neural Networks. We demonstrate that for small datasets in the Russian language, traditional machine-learning techniques produce highly competitive results on par with deep learning methods, and that machine translation of the dataset into English produces more accurate results than multi-lingual models.",

keywords = "Classification methods, Toxic language, Machine learning, Natural language processing, Multi-lingual models, Domain adaptation, Word embedding, Machine translation",

author = "Olesia Makhnytkina and Anton Matveev and Darya Bogoradnikova and Inna Lizunova and Anna Maltseva and Natalia Shilkina",

note = "Makhnytkina O., Matveev A., Bogoradnikova D., Lizunova I., Maltseva A., Shilkina N. (2020) Detection of Toxic Language in Short Text Messages. In: Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_31; 22nd International Conference on Speech and Computer, SPECOM 2020; Conference date: 07-10-2020 through 09-10-2020",

year = "2020",

month = oct,

doi = "10.1007/978-3-030-60276-5_31",

language = "English",

isbn = "9783030602758",

series = "Lecture Notes in Computer Science",

volume = "12335",

publisher = "Springer Nature",

pages = "315--325",

editor = "Alexey Karpov and Rodmonga Potapova",

booktitle = "Speech and Computer",

address = "Cham",

url = "http://specom.nw.ru/2020/program/SPECOM-ICR2020-Conference-Program-06102020.pdf",

}

RIS

TY - GEN

T1 - Detection of Toxic Language in Short Text Messages

AU - Makhnytkina, Olesia

AU - Matveev, Anton

AU - Bogoradnikova, Darya

AU - Lizunova, Inna

AU - Maltseva, Anna

AU - Shilkina, Natalia

N1 - Makhnytkina O., Matveev A., Bogoradnikova D., Lizunova I., Maltseva A., Shilkina N. (2020) Detection of Toxic Language in Short Text Messages. In: Karpov A., Potapova R. (eds) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science, vol 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_31

PY - 2020/10

Y1 - 2020/10

N2 - The ever-increasing online communication landscape lets people with significantly different views cross paths as never before. This leads to a rise of toxicity in online comments and discussions and makes the development of means to detect such phenomena critically important. The toxic language detection problem is fairly well researched, and some solutions produce highly accurate predictions when sufficiently large datasets are available for training. However, such datasets are not always available for various languages. In this paper, we review different approaches to the problem that transfer knowledge from one language to another: machine translation, multi-lingual models, and domain adaptation. We also analyze word embedding methods such as Word2Vec, FastText, GloVe, and BERT, and methods for classifying toxic comments: Naïve Bayes, Random Forest, Logistic regression, Support Vector Machine, Majority vote, and Recurrent Neural Networks. We demonstrate that for small datasets in the Russian language, traditional machine-learning techniques produce highly competitive results on par with deep learning methods, and that machine translation of the dataset into English produces more accurate results than multi-lingual models.

AB - The ever-increasing online communication landscape lets people with significantly different views cross paths as never before. This leads to a rise of toxicity in online comments and discussions and makes the development of means to detect such phenomena critically important. The toxic language detection problem is fairly well researched, and some solutions produce highly accurate predictions when sufficiently large datasets are available for training. However, such datasets are not always available for various languages. In this paper, we review different approaches to the problem that transfer knowledge from one language to another: machine translation, multi-lingual models, and domain adaptation. We also analyze word embedding methods such as Word2Vec, FastText, GloVe, and BERT, and methods for classifying toxic comments: Naïve Bayes, Random Forest, Logistic regression, Support Vector Machine, Majority vote, and Recurrent Neural Networks. We demonstrate that for small datasets in the Russian language, traditional machine-learning techniques produce highly competitive results on par with deep learning methods, and that machine translation of the dataset into English produces more accurate results than multi-lingual models.

KW - Classification methods

KW - Toxic language

KW - Machine learning

KW - Natural language processing

KW - Multi-lingual models

KW - Domain adaptation

KW - Word embedding

KW - Machine translation

UR - http://www.scopus.com/inward/record.url?scp=85092889926&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/d27a9d86-6b08-38dc-b9fb-77fbc58630de/

U2 - 10.1007/978-3-030-60276-5_31

DO - 10.1007/978-3-030-60276-5_31

M3 - Conference contribution

AN - SCOPUS:85092889926

SN - 9783030602758

T3 - Lecture Notes in Computer Science

VL - 12335

SP - 315

EP - 325

BT - Speech and Computer

A2 - Karpov, Alexey

A2 - Potapova, Rodmonga

PB - Springer Nature

CY - Cham

T2 - 22nd International Conference on Speech and Computer

Y2 - 7 October 2020 through 9 October 2020

ER -