SENTIMENT ANALYSIS IN ARABIC: LINGUISTIC ISSUES › Research at St Petersburg State University

Standard

SENTIMENT ANALYSIS IN ARABIC: LINGUISTIC ISSUES. / Bernikova, O.; Redkin, O.

5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018. STEF92 Technology Ltd., 2018. pp. 407-412.

Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Scientific › Peer-reviewed

Harvard

Bernikova, O & Redkin, O 2018, SENTIMENT ANALYSIS IN ARABIC: LINGUISTIC ISSUES. in 5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018. STEF92 Technology Ltd., pp. 407-412. https://doi.org/10.5593/sgemsocial2018H/31/S10.051

APA

Bernikova, O., & Redkin, O. (2018). SENTIMENT ANALYSIS IN ARABIC: LINGUISTIC ISSUES. In 5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018 (pp. 407-412). STEF92 Technology Ltd. https://doi.org/10.5593/sgemsocial2018H/31/S10.051

Vancouver

Bernikova O, Redkin O. SENTIMENT ANALYSIS IN ARABIC: LINGUISTIC ISSUES. In 5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018. STEF92 Technology Ltd. 2018. p. 407-412. https://doi.org/10.5593/sgemsocial2018H/31/S10.051

Author

Bernikova, O.; Redkin, O. / SENTIMENT ANALYSIS IN ARABIC: LINGUISTIC ISSUES. 5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018. STEF92 Technology Ltd., 2018. pp. 407-412

BibTeX

@inproceedings{c65961148f684934a0a662a7bc604a63,
  title = "SENTIMENT ANALYSIS IN ARABIC: LINGUISTIC ISSUES",
  abstract = "The paper deals with the linguistic peculiarities of sentiment analysis of documents in Arabic. Automatic definition of emotive component in a large corpus is highly relevant today. At the same time, the theory of emotions in linguistics has not been sufficiently developed yet, therefore there is an urgent need to improve the computer methods of sentiment analysis for carrying out sociolinguistic research. In the framework of this study, we try to determine the patterns inherent in the Arabic language, which must be taken into account when conducting Big Data processing. To implement this task, we used the high-frequency word list, developed on the basis of processing of the texts with a volume of 1 million uses. After that, 1000 units were analyzed within the onedimensional emotional space ({"}positive{"} - {"}negative{"}). As a result it was determined that percentage of emotional vocabulary towards neutral is about 18%; the most representative part of speech in the emotive dictionary is the verb (34%), in approximately equal proportion nouns and verbal nouns (masdars) are represented - 24- 25% respectively, while adjectives constitutes only 16%. It is often difficult to identify a particular sentiment, as its characteristic depends on the context (for example such words as {"}discipline{"} can be used in a variety of contexts, as well as the verb {"}to happen{"}). The third part of the analyzed emotive vocabulary has a negative characteristic (two-thirds - positive). The most often, “positive vocabulary” is expressed by adjectives. These conclusions may be useful for linguistic research in general, and for the development of automated data processing technologies, in particular.",
  keywords = "Arabic, computer linguistics, sentiment analysis, vocabulary",
  author = "O. Bernikova and O. Redkin",
  year = "2018",
  month = mar,
  doi = "10.5593/sgemsocial2018H/31/S10.051",
  language = "English",
  isbn = "978-619-7408-32-4",
  pages = "407--412",
  booktitle = "5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018",
  publisher = "STEF92 Technology Ltd.",
  address = "Bulgaria",
}

RIS

TY  - GEN
T1  - SENTIMENT ANALYSIS IN ARABIC: LINGUISTIC ISSUES
AU  - Bernikova, O.
AU  - Redkin, O.
PY  - 2018/3
Y1  - 2018/3
N2  - The paper deals with the linguistic peculiarities of sentiment analysis of documents in Arabic. Automatic definition of emotive component in a large corpus is highly relevant today. At the same time, the theory of emotions in linguistics has not been sufficiently developed yet, therefore there is an urgent need to improve the computer methods of sentiment analysis for carrying out sociolinguistic research. In the framework of this study, we try to determine the patterns inherent in the Arabic language, which must be taken into account when conducting Big Data processing. To implement this task, we used the high-frequency word list, developed on the basis of processing of the texts with a volume of 1 million uses. After that, 1000 units were analyzed within the onedimensional emotional space ("positive" - "negative"). As a result it was determined that percentage of emotional vocabulary towards neutral is about 18%; the most representative part of speech in the emotive dictionary is the verb (34%), in approximately equal proportion nouns and verbal nouns (masdars) are represented - 24- 25% respectively, while adjectives constitutes only 16%. It is often difficult to identify a particular sentiment, as its characteristic depends on the context (for example such words as "discipline" can be used in a variety of contexts, as well as the verb "to happen"). The third part of the analyzed emotive vocabulary has a negative characteristic (two-thirds - positive). The most often, “positive vocabulary” is expressed by adjectives. These conclusions may be useful for linguistic research in general, and for the development of automated data processing technologies, in particular.
AB  - The paper deals with the linguistic peculiarities of sentiment analysis of documents in Arabic. Automatic definition of emotive component in a large corpus is highly relevant today. At the same time, the theory of emotions in linguistics has not been sufficiently developed yet, therefore there is an urgent need to improve the computer methods of sentiment analysis for carrying out sociolinguistic research. In the framework of this study, we try to determine the patterns inherent in the Arabic language, which must be taken into account when conducting Big Data processing. To implement this task, we used the high-frequency word list, developed on the basis of processing of the texts with a volume of 1 million uses. After that, 1000 units were analyzed within the onedimensional emotional space ("positive" - "negative"). As a result it was determined that percentage of emotional vocabulary towards neutral is about 18%; the most representative part of speech in the emotive dictionary is the verb (34%), in approximately equal proportion nouns and verbal nouns (masdars) are represented - 24- 25% respectively, while adjectives constitutes only 16%. It is often difficult to identify a particular sentiment, as its characteristic depends on the context (for example such words as "discipline" can be used in a variety of contexts, as well as the verb "to happen"). The third part of the analyzed emotive vocabulary has a negative characteristic (two-thirds - positive). The most often, “positive vocabulary” is expressed by adjectives. These conclusions may be useful for linguistic research in general, and for the development of automated data processing technologies, in particular.
KW  - Arabic
KW  - computer linguistics
KW  - sentiment analysis
KW  - vocabulary
UR  - http://dx.doi.org/10.5593/sgemsocial2018H/31/S10.051
UR  - https://sgemworld.at/ssgemlib/spip.php?article5564
UR  - http://www.mendeley.com/research/sentiment-analysis-arabic-linguistic-issues
U2  - 10.5593/sgemsocial2018H/31/S10.051
DO  - 10.5593/sgemsocial2018H/31/S10.051
M3  - Conference contribution
SN  - 978-619-7408-32-4
SP  - 407
EP  - 412
BT  - 5th International Multidisciplinary Scientific Conference on Social Sciences and Arts SGEM 2018
PB  - STEF92 Technology Ltd.
ER  - 

ID: 29082919
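
The RIS record above stores one field per line. In the RIS layout, each line is a two-letter tag, two spaces, a hyphen, a space, then the value (for example `TY  - GEN`), and `ER` closes the record. As a hedged illustration of how such an export can be read back, here is a minimal Python sketch. `parse_ris` and `SAMPLE` are hypothetical names introduced here, not part of any library, and the sketch handles only a single record without wrapped continuation lines.

```python
from __future__ import annotations

from collections import defaultdict


def parse_ris(text: str) -> dict[str, list[str]]:
    """Read one RIS record into {tag: [values, ...]}.

    Assumes the tag-line layout "XX  - value": a two-letter tag, two
    spaces, a hyphen, then a space and the value. Repeated tags such as
    AU or KW keep all their values in order of appearance.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        # Skip blank lines and anything not shaped like a tag line.
        if len(line) < 5 or line[2:5] != "  -":
            continue
        tag, value = line[:2], line[5:].strip()
        if tag == "ER":  # end-of-record marker: stop reading
            break
        fields[tag].append(value)
    return dict(fields)


# Hypothetical sample: a subset of the fields exported above.
SAMPLE = """\
TY  - GEN
T1  - SENTIMENT ANALYSIS IN ARABIC: LINGUISTIC ISSUES
AU  - Bernikova, O.
AU  - Redkin, O.
PY  - 2018/3
DO  - 10.5593/sgemsocial2018H/31/S10.051
SP  - 407
EP  - 412
ER  -
"""

record = parse_ris(SAMPLE)
print(record["AU"])  # ['Bernikova, O.', 'Redkin, O.']
print(record["SP"][0], record["EP"][0])  # 407 412
```

Collecting every tag into a list, rather than a single string, keeps multi-valued fields such as the two authors or the four keywords intact.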