The distributive and statistical analysis as a tool to automate the formation of semantic fields (on the example of the linguocultural concept of "empire")

Standard

The distributive and statistical analysis as a tool to automate the formation of semantic fields (on the example of the linguocultural concept of "empire"). / Zakharov, Victor.

2018 International Workshop on Computational Models in Language and Speech, CMLS 2018. Vol. 2303 RWTH Aachen University, 2018. p. 1-19 (CEUR Workshop Proceedings).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Harvard

Zakharov, V 2018, The distributive and statistical analysis as a tool to automate the formation of semantic fields (on the example of the linguocultural concept of "empire"). in 2018 International Workshop on Computational Models in Language and Speech, CMLS 2018. vol. 2303, CEUR Workshop Proceedings, RWTH Aachen University, pp. 1-19, 2018 International Workshop on Computational Models in Language and Speech, CMLS 2018, Kazan, Russian Federation, 1/11/18.

APA

Zakharov, V. (2018). The distributive and statistical analysis as a tool to automate the formation of semantic fields (on the example of the linguocultural concept of "empire"). In 2018 International Workshop on Computational Models in Language and Speech, CMLS 2018 (Vol. 2303, pp. 1-19). (CEUR Workshop Proceedings). RWTH Aachen University.

Vancouver

Zakharov V. The distributive and statistical analysis as a tool to automate the formation of semantic fields (on the example of the linguocultural concept of "empire"). In 2018 International Workshop on Computational Models in Language and Speech, CMLS 2018. Vol. 2303. RWTH Aachen University. 2018. p. 1-19. (CEUR Workshop Proceedings).

Author

Zakharov, Victor. / The distributive and statistical analysis as a tool to automate the formation of semantic fields (on the example of the linguocultural concept of "empire"). 2018 International Workshop on Computational Models in Language and Speech, CMLS 2018. Vol. 2303 RWTH Aachen University, 2018. pp. 1-19 (CEUR Workshop Proceedings).

BibTeX

@inproceedings{71746822a7ad4631ae01bf69fc1bb90e,

title = "The distributive and statistical analysis as a tool to automate the formation of semantic fields (on the example of the linguocultural concept of {"}empire{"})",

abstract = "The paper presents interim results on the automatic creation of a semantic field for «empire» in Russian, based on a distributional and statistical method applied to corpus data. A semantic field is a collection of content units that covers a certain area of human experience and forms a relatively autonomous microsystem with one or a few centers. The relations within it are mostly described as associations. The idea is to use distributional analysis techniques to extract, from data on syntagmatic collocability, a set of lexical units connected by paradigmatic semantic relations of varying strength. Today, large corpora and sophisticated algorithms make it possible to hope for reasonable results. The first goal of the study is to develop tools and a methodology for filling semantic fields with lexical units on the basis of morphologically tagged corpora and a special sketch grammar, then to measure the strength of relations between units and to evaluate the method. We used the Sketch Engine corpus system, which implements the method of distributional statistical analysis. The text material consisted of our own topical Russian corpora built from Russian texts of the 18th–20th centuries. In the course of the work we solved a number of tasks, obtained lists of items filling the semantic space around the concept of “empire”, and we evaluate the method as successful and promising. In conclusion, further steps are identified to clarify promising areas of work and to improve the results obtained.",

keywords = "Concept of Empire in Russian, Distributive and statistical analysis, Semantic field",

author = "Victor Zakharov",

year = "2018",

month = jan,

day = "1",

language = "English",

volume = "2303",

series = "CEUR Workshop Proceedings",

publisher = "RWTH Aachen University",

pages = "1--19",

booktitle = "2018 International Workshop on Computational Models in Language and Speech, CMLS 2018",

address = "Germany",

note = "2018 International Workshop on Computational Models in Language and Speech, CMLS 2018 ; Conference date: 01-11-2018",

}

RIS

TY - GEN

T1 - The distributive and statistical analysis as a tool to automate the formation of semantic fields (on the example of the linguocultural concept of "empire")

AU - Zakharov, Victor

PY - 2018/1/1

Y1 - 2018/1/1

N2 - The paper presents interim results on the automatic creation of a semantic field for «empire» in Russian, based on a distributional and statistical method applied to corpus data. A semantic field is a collection of content units that covers a certain area of human experience and forms a relatively autonomous microsystem with one or a few centers. The relations within it are mostly described as associations. The idea is to use distributional analysis techniques to extract, from data on syntagmatic collocability, a set of lexical units connected by paradigmatic semantic relations of varying strength. Today, large corpora and sophisticated algorithms make it possible to hope for reasonable results. The first goal of the study is to develop tools and a methodology for filling semantic fields with lexical units on the basis of morphologically tagged corpora and a special sketch grammar, then to measure the strength of relations between units and to evaluate the method. We used the Sketch Engine corpus system, which implements the method of distributional statistical analysis. The text material consisted of our own topical Russian corpora built from Russian texts of the 18th–20th centuries. In the course of the work we solved a number of tasks, obtained lists of items filling the semantic space around the concept of “empire”, and we evaluate the method as successful and promising. In conclusion, further steps are identified to clarify promising areas of work and to improve the results obtained.

AB - The paper presents interim results on the automatic creation of a semantic field for «empire» in Russian, based on a distributional and statistical method applied to corpus data. A semantic field is a collection of content units that covers a certain area of human experience and forms a relatively autonomous microsystem with one or a few centers. The relations within it are mostly described as associations. The idea is to use distributional analysis techniques to extract, from data on syntagmatic collocability, a set of lexical units connected by paradigmatic semantic relations of varying strength. Today, large corpora and sophisticated algorithms make it possible to hope for reasonable results. The first goal of the study is to develop tools and a methodology for filling semantic fields with lexical units on the basis of morphologically tagged corpora and a special sketch grammar, then to measure the strength of relations between units and to evaluate the method. We used the Sketch Engine corpus system, which implements the method of distributional statistical analysis. The text material consisted of our own topical Russian corpora built from Russian texts of the 18th–20th centuries. In the course of the work we solved a number of tasks, obtained lists of items filling the semantic space around the concept of “empire”, and we evaluate the method as successful and promising. In conclusion, further steps are identified to clarify promising areas of work and to improve the results obtained.

KW - Concept of Empire in Russian

KW - Distributive and statistical analysis

KW - Semantic field

UR - http://www.scopus.com/inward/record.url?scp=85060604160&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85060604160

VL - 2303

T3 - CEUR Workshop Proceedings

SP - 1

EP - 19

BT - 2018 International Workshop on Computational Models in Language and Speech, CMLS 2018

PB - RWTH Aachen University

T2 - 2018 International Workshop on Computational Models in Language and Speech, CMLS 2018

Y2 - 1 November 2018

ER -

ID: 38994252