Legal corpus «CorRIDA» and lexical complexity assessment of Russian official texts

Research output: Contribution to conference › Abstract › peer-review


The paper studies how difficult Russian official documents are to understand and interpret. The study is based on the Corpus of Russian Local documents and Acts CorRIDA (the Healthcare domain subcorpus, 617,000 tokens). The domain includes 6 classes of texts that differ in length, genre, macrostructure and linguistic features.
First, we apply methods that assess lexical complexity using general word frequency. This approach assumes that rare words are activated less readily in the recipient’s mind.
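One simple way to operationalize a frequency-based measure is the share of tokens that fall below a frequency threshold in a general reference list. The sketch below is only an illustration and not the procedure used in the study: the function name `rare_word_share`, the `freq_ipm` mapping (lemma to instances per million words), the threshold of 1 ipm, and the toy frequency values are all assumptions made for the example.

```python
def rare_word_share(tokens, freq_ipm, threshold=1.0):
    """Return the share of tokens whose general frequency is below `threshold`.

    tokens    -- list of (ideally lemmatized) word forms from one document
    freq_ipm  -- hypothetical mapping: lemma -> frequency in instances per million
    threshold -- assumed cut-off; words missing from the list count as rare
    """
    if not tokens:
        raise ValueError("cannot score an empty document")
    rare = sum(1 for t in tokens if freq_ipm.get(t.lower(), 0.0) < threshold)
    return rare / len(tokens)


# Toy values invented for illustration; a real study would load a
# general-language frequency list here.
toy_freq = {"и": 30000.0, "приказ": 50.0}
share = rare_word_share(["приказ", "и", "диспансеризация"], toy_freq)
```

A higher share means that more of the document consists of words that are rare in general usage, which under the stated assumption corresponds to higher lexical complexity.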
Second, we apply methods that assess the ratio of the total number of types to the total number of tokens in a text (lexical diversity, calculated with measures such as the ordinary TTR, Herdan’s C, Guiraud's Root TTR, etc.). This approach assumes that high TTR values may indicate higher lexical complexity.
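The three measures named above have standard definitions in terms of the number of types V and the number of tokens N: TTR = V / N, Herdan's C = log V / log N, and Guiraud's Root TTR = V / √N. A minimal sketch follows; the function name `lexical_diversity` and the assumption that the input is an already tokenized (and, if desired, lemmatized) list are ours, not the paper's.

```python
import math


def lexical_diversity(tokens):
    """Compute TTR, Herdan's C and Guiraud's Root TTR for one document."""
    n = len(tokens)        # N: total number of tokens
    v = len(set(tokens))   # V: number of distinct types
    if n == 0:
        raise ValueError("cannot score an empty document")
    return {
        "ttr": v / n,
        # log N is 0 for a one-token text, so C is set to 1.0 by convention here
        "herdan_c": math.log(v) / math.log(n) if n > 1 else 1.0,
        "guiraud_r": v / math.sqrt(n),
    }


scores = lexical_diversity("a b a c".split())  # N = 4, V = 3
```

Plain TTR falls as texts get longer, which is why length-corrected variants such as Herdan's C and Guiraud's Root TTR are usually reported alongside it when documents differ in length.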
Third, the study uses coefficients of lexical density, which assess, in particular, the proportion of content words, nouns, etc. in the document. This approach assumes that the more concepts a text uses, the more difficult it is rated.
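Lexical density can be sketched as the share of tokens whose part of speech belongs to a chosen set. The example below assumes that the text has already been POS-tagged into (token, tag) pairs with Universal Dependencies tags; the function name `lexical_density` and the particular set of content-word tags are illustrative assumptions, since the abstract does not specify them.

```python
# Assumed set of content-word tags (Universal Dependencies tag names).
CONTENT_TAGS = frozenset({"NOUN", "PROPN", "VERB", "ADJ", "ADV"})


def lexical_density(tagged_tokens, tags=CONTENT_TAGS):
    """Return the share of tokens whose POS tag is in `tags`.

    tagged_tokens -- list of (token, pos_tag) pairs for one document
    tags          -- tags counted as "dense"; pass {"NOUN"} for a noun ratio
    """
    if not tagged_tokens:
        raise ValueError("cannot score an empty document")
    hits = sum(1 for _, tag in tagged_tokens if tag in tags)
    return hits / len(tagged_tokens)


# Hand-tagged toy phrase, for illustration only.
phrase = [("приказ", "NOUN"), ("о", "ADP"), ("порядке", "NOUN"),
          ("оказания", "NOUN"), ("медицинской", "ADJ"), ("помощи", "NOUN")]
content_share = lexical_density(phrase)            # 5 of 6 tokens
noun_share = lexical_density(phrase, {"NOUN"})     # 4 of 6 tokens
```

Passing a different tag set to the same function gives the other proportions mentioned above, such as the noun ratio.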
Finally, we rank the texts according to the lexical complexity coefficients obtained. The paper’s objectives also include a more detailed analysis of the lexical phenomena that are evaluated as complex.
The research is supported by the Russian Science Foundation, project #19-18-00525 “Understability of the Official Russian: Legal and Linguistic Issues”.


Conference: Contemporary Approaches to Legal Linguistics
Abbreviated title: ÖGRL2019

Scopus subject areas

  • Arts and Humanities (all)
