Statistical parameterisation of text corpora

Standard

Statistical parameterisation of text corpora. / Martynenko, Gregory Y.; Sherstinova, Tatiana Y.

Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings. ed. / Petr Sojka; Ivan Kopecek; Karel Pala. Springer Nature, 2000. p. 99-102 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1902).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Martynenko, GY & Sherstinova, TY 2000, Statistical parameterisation of text corpora. in P Sojka, I Kopecek & K Pala (eds), Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1902, Springer Nature, pp. 99-102, 3rd International Workshop on Text, Speech and Dialogue, TSD 2000, Brno, Czech Republic, 13/09/00. https://doi.org/10.1007/3-540-45323-7_17

APA

Martynenko, G. Y., & Sherstinova, T. Y. (2000). Statistical parameterisation of text corpora. In P. Sojka, I. Kopecek, & K. Pala (Eds.), Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings (pp. 99-102). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1902). Springer Nature. https://doi.org/10.1007/3-540-45323-7_17

Vancouver

Martynenko GY, Sherstinova TY. Statistical parameterisation of text corpora. In Sojka P, Kopecek I, Pala K, editors, Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings. Springer Nature. 2000. p. 99-102. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1902). https://doi.org/10.1007/3-540-45323-7_17

Author

Martynenko, Gregory Y. ; Sherstinova, Tatiana Y. / Statistical parameterisation of text corpora. Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings. editor / Petr Sojka ; Ivan Kopecek ; Karel Pala. Springer Nature, 2000. pp. 99-102 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1902).

BibTeX

@inproceedings{dbf25e9768c5404f879593a8eb0f859f,

title = "Statistical parameterisation of text corpora",

abstract = "Statistical parameters, usually used for diagnostic procedures, in many cases cannot be considered to be consistent ones from the statistical point of view, being strongly dependent on sample size. It leads to considerable devaluation of diagnostic results. This paper concerns the problem of consistency verification of parameters in the initial (pre-classification) stage of research. A complete list of parameters, which may be useful for description of text lexicostatistical structure, was determined. Each of these parameters was exposed to the justifiability test. In the result, a number of consistent parameters have been selected, which represent a description tool for the system characteristics of any text and corpora. Having rapid speed of convergence to the limit values, they may effectively perform classification procedures on text data of the arbitrary size. The proposed model of approximation makes it possible as well to forecast the values of all parameters for any sample size.",

author = "Martynenko, {Gregory Y.} and Sherstinova, {Tatiana Y.}",

note = "Publisher Copyright: {\textcopyright} Springer-Verlag Berlin Heidelberg 2000.; 3rd International Workshop on Text, Speech and Dialogue, TSD 2000 ; Conference date: 13-09-2000 Through 16-09-2000",

year = "2000",

doi = "10.1007/3-540-45323-7_17",

language = "English",

isbn = "3540410422",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

volume = "1902",

publisher = "Springer Nature",

pages = "99--102",

editor = "Petr Sojka and Ivan Kopecek and Karel Pala",

booktitle = "Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings",

address = "Germany",

}

RIS

TY  - GEN
T1  - Statistical parameterisation of text corpora
AU  - Martynenko, Gregory Y.
AU  - Sherstinova, Tatiana Y.
N1  - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2000.
PY  - 2000
Y1  - 2000
N2  - Statistical parameters, usually used for diagnostic procedures, in many cases cannot be considered to be consistent ones from the statistical point of view, being strongly dependent on sample size. It leads to considerable devaluation of diagnostic results. This paper concerns the problem of consistency verification of parameters in the initial (pre-classification) stage of research. A complete list of parameters, which may be useful for description of text lexicostatistical structure, was determined. Each of these parameters was exposed to the justifiability test. In the result, a number of consistent parameters have been selected, which represent a description tool for the system characteristics of any text and corpora. Having rapid speed of convergence to the limit values, they may effectively perform classification procedures on text data of the arbitrary size. The proposed model of approximation makes it possible as well to forecast the values of all parameters for any sample size.
AB  - Statistical parameters, usually used for diagnostic procedures, in many cases cannot be considered to be consistent ones from the statistical point of view, being strongly dependent on sample size. It leads to considerable devaluation of diagnostic results. This paper concerns the problem of consistency verification of parameters in the initial (pre-classification) stage of research. A complete list of parameters, which may be useful for description of text lexicostatistical structure, was determined. Each of these parameters was exposed to the justifiability test. In the result, a number of consistent parameters have been selected, which represent a description tool for the system characteristics of any text and corpora. Having rapid speed of convergence to the limit values, they may effectively perform classification procedures on text data of the arbitrary size. The proposed model of approximation makes it possible as well to forecast the values of all parameters for any sample size.
UR  - http://www.scopus.com/inward/record.url?scp=84947550591&partnerID=8YFLogxK
U2  - 10.1007/3-540-45323-7_17
DO  - 10.1007/3-540-45323-7_17
M3  - Conference contribution
AN  - SCOPUS:84947550591
SN  - 3540410422
SN  - 9783540410423
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
VL  - 1902
SP  - 99
EP  - 102
BT  - Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings
A2  - Sojka, Petr
A2  - Kopecek, Ivan
A2  - Pala, Karel
PB  - Springer Nature
T2  - 3rd International Workshop on Text, Speech and Dialogue, TSD 2000
Y2  - 13 September 2000 through 16 September 2000
ER  -
