Legal corpus «CorRIDA» and lexical complexity assessment of Russian official texts

Standard

Legal corpus «CorRIDA» and lexical complexity assessment of Russian official texts. / Блинова, Ольга Владимировна ; Белов, Сергей Александрович.

2019. 42 Abstract from Contemporary Approaches to Legal Linguistics, Vienna, Austria.

Research output: Conference materials › Abstract › Peer-reviewed

BibTeX

@conference{390885eeb48a48d2900136a7f0022b05,

title = "Legal corpus «CorRIDA» and lexical complexity assessment of Russian official texts",

abstract = "The paper aims to study how difficult it is to understand and interpret Russian official documents. The study is based on the Corpus of Russian Local documents and Acts CorRIDA (subcorpus of the Healthcare domain – 617,000 tokens). The domain includes 6 classes of texts that differ in length, genre, macrostructure and linguistic features. Firstly, we use methods in which lexical complexity is assessed by means of general word frequency. This approach proceeds from the assumption that rare words are less activated in the recipient{\textquoteright}s mind. Secondly, we use methods that assess the ratio of the total number of types to the total number of tokens in the texts (lexical diversity calculated using measures such as ordinary TTR, Herdan{\textquoteright}s C, Guiraud's Root TTR, etc.). This approach is based on the assumption that high TTR indexes may be a manifestation of higher lexical complexity. Thirdly, the study uses coefficients of lexical density, which assess, in particular, the proportion of content words, nouns, etc. in the document. This approach proceeds from the assumption that the more concepts are used in a text, the more difficult the text is rated. Finally, we rank the texts according to the coefficients of lexical complexity obtained. The paper{\textquoteright}s objectives also include a more detailed analysis of the lexical phenomena that are evaluated as complex. The research is supported by the Russian Science Foundation, project #19-18-00525 “Understability of the Official Russian: Legal and Linguistic Issues”.",

author = "Блинова, {Ольга Владимировна} and Белов, {Сергей Александрович}",

year = "2019",

language = "English",

pages = "42",

note = "Conference date: 08-11-2019 Through 10-11-2019",

url = "http://oegrl.com/wp-content/uploads/2019/10/%C3%96GRL2019_ConferenceSchedule_Presentations_Updated26102019.pdf, http://oegrl.com/index.php/conference-2019/",

}

RIS

TY - CONF

T1 - Legal corpus «CorRIDA» and lexical complexity assessment of Russian official texts

AU - Блинова, Ольга Владимировна

AU - Белов, Сергей Александрович

N1 - Conference code: 1

PY - 2019

Y1 - 2019

N2 - The paper aims to study how difficult it is to understand and interpret Russian official documents. The study is based on the Corpus of Russian Local documents and Acts CorRIDA (subcorpus of the Healthcare domain – 617,000 tokens). The domain includes 6 classes of texts that differ in length, genre, macrostructure and linguistic features. Firstly, we use methods in which lexical complexity is assessed by means of general word frequency. This approach proceeds from the assumption that rare words are less activated in the recipient’s mind. Secondly, we use methods that assess the ratio of the total number of types to the total number of tokens in the texts (lexical diversity calculated using measures such as ordinary TTR, Herdan’s C, Guiraud's Root TTR, etc.). This approach is based on the assumption that high TTR indexes may be a manifestation of higher lexical complexity. Thirdly, the study uses coefficients of lexical density, which assess, in particular, the proportion of content words, nouns, etc. in the document. This approach proceeds from the assumption that the more concepts are used in a text, the more difficult the text is rated. Finally, we rank the texts according to the coefficients of lexical complexity obtained. The paper’s objectives also include a more detailed analysis of the lexical phenomena that are evaluated as complex. The research is supported by the Russian Science Foundation, project #19-18-00525 “Understability of the Official Russian: Legal and Linguistic Issues”.

AB - The paper aims to study how difficult it is to understand and interpret Russian official documents. The study is based on the Corpus of Russian Local documents and Acts CorRIDA (subcorpus of the Healthcare domain – 617,000 tokens). The domain includes 6 classes of texts that differ in length, genre, macrostructure and linguistic features. Firstly, we use methods in which lexical complexity is assessed by means of general word frequency. This approach proceeds from the assumption that rare words are less activated in the recipient’s mind. Secondly, we use methods that assess the ratio of the total number of types to the total number of tokens in the texts (lexical diversity calculated using measures such as ordinary TTR, Herdan’s C, Guiraud's Root TTR, etc.). This approach is based on the assumption that high TTR indexes may be a manifestation of higher lexical complexity. Thirdly, the study uses coefficients of lexical density, which assess, in particular, the proportion of content words, nouns, etc. in the document. This approach proceeds from the assumption that the more concepts are used in a text, the more difficult the text is rated. Finally, we rank the texts according to the coefficients of lexical complexity obtained. The paper’s objectives also include a more detailed analysis of the lexical phenomena that are evaluated as complex. The research is supported by the Russian Science Foundation, project #19-18-00525 “Understability of the Official Russian: Legal and Linguistic Issues”.

UR - https://linguistlist.org/issues/29/29-3819/

M3 - Abstract

SP - 42

Y2 - 8 November 2019 through 10 November 2019

ER -

ID: 49022338
