Research output: Contribution to journal › Article › peer-review
Internet data in the study of language change: A case study of alternations in Russian comparatives and a program to work with such data. / Magomedova, V. D.; Slioussar, N. A.
In: Komp'juternaja Lingvistika i Intellektual'nye Tehnologii, 2014, p. 379-390.
TY - JOUR
T1 - Internet data in the study of language change
T2 - A case study of alternations in Russian comparatives and a program to work with such data
AU - Magomedova, V. D.
AU - Slioussar, N. A.
PY - 2014
Y1 - 2014
N2 - The Internet is a unique source of non-standard forms, which gives us a novel opportunity to analyze the fine-grained dynamics of language change. We used this opportunity to study the decay of historical consonant alternations in Russian. In standard Russian, these alternations are present in some verb forms and in comparatives (e.g. suxoj 'dry' - sushe 'drier', ljubit' 'to love' - ljublju 'I love'), as well as before certain derivational suffixes. Verb forms were recently studied by Slioussar and Kholodilova (2013), so we looked at comparatives. Two groups of adjectives were selected: those that have normative comparatives with alternations and those that do not, although native speakers still try to generate such forms. In the first group, some adjectives like ubogij 'poky' have up to 30% of comparatives without alternations, but, unlike with verbs, no significant correlation with adjective frequency or other characteristics was found. The second group consisted primarily of compound adjectives ending in -gij, -kij, -xij. Here, the most important factor is whether the second part of the compound is used as an independent adjective. If it is not (e.g. as in dlinnorukij 'long-armed'), most comparatives lack alternations. Searching for forms on the Internet, we faced many problems: the counts provided by search engines are extremely inaccurate; only the first thousand results are shown; the results cannot be downloaded in a convenient format; and they contain many typos and other irrelevant data. We present a program called Lingui-Pingui that we developed to solve these and some other problems.
KW - Comparative
KW - Consonants
KW - Historical alternations
KW - Search optimization
UR - http://www.scopus.com/inward/record.url?scp=84904818552&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84904818552
SP - 379
EP - 390
JO - Компьютерная лингвистика и интеллектуальные технологии
JF - Компьютерная лингвистика и интеллектуальные технологии
SN - 2221-7932
ER -
ID: 9219036