The paper deals with a rapidly developing area of modern corpus linguistics: the creation of large corpora from web texts. This technology is known as WaCky or BootCaT. We discuss the problems of such corpora, namely the quality of web texts and the inadequate balance of web corpora. The latter is an obstacle for corpus creators and corpus users alike. We attempt to compare and describe the Russian web corpora from the Aranea family of comparable corpora.
Translated title of the contribution: EVALUATION OF INTERNET CORPORA OF RUSSIAN
Original language: Russian
Title of host publication: Корпусная лингвистика - 2015
Subtitle of host publication: Труды международной конференции
Place of Publication: СПб
Publisher: Издательство Санкт-Петербургского университета
Pages: 219-229
ISBN (Print): 978-5-8465-1498-0
State: Published - 2015
Event: Международная конференция "Корпусная лингвистика - 2015" - Санкт-Петербург, Russian Federation
Duration: 22 Jun 2015 to 26 Jun 2015

Conference

Conference: Международная конференция "Корпусная лингвистика - 2015"
Country/Territory: Russian Federation
City: Санкт-Петербург
Period: 22/06/15 to 26/06/15

ID: 4733587