Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Statistical parameterisation of text corpora. / Martynenko, Gregory Y.; Sherstinova, Tatiana Y.
Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings. ed. / Petr Sojka; Ivan Kopecek; Karel Pala. Springer Nature, 2000. p. 99-102 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1902).
TY - GEN
T1 - Statistical parameterisation of text corpora
AU - Martynenko, Gregory Y.
AU - Sherstinova, Tatiana Y.
N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2000.
PY - 2000
Y1 - 2000
AB - Statistical parameters commonly used in diagnostic procedures often cannot be considered statistically consistent, because they depend strongly on sample size. This considerably devalues diagnostic results. This paper addresses the problem of verifying the consistency of parameters at the initial (pre-classification) stage of research. A complete list of parameters that may be useful for describing the lexicostatistical structure of a text was determined, and each parameter was subjected to a justifiability test. As a result, a number of consistent parameters were selected; these serve as a tool for describing the system characteristics of any text or corpus. Because they converge rapidly to their limit values, they can be used effectively in classification procedures on text data of arbitrary size. The proposed approximation model also makes it possible to forecast the values of all parameters for any sample size.
UR - http://www.scopus.com/inward/record.url?scp=84947550591&partnerID=8YFLogxK
U2 - 10.1007/3-540-45323-7_17
DO - 10.1007/3-540-45323-7_17
M3 - Conference contribution
AN - SCOPUS:84947550591
SN - 3540410422
SN - 9783540410423
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 99
EP - 102
BT - Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings
A2 - Sojka, Petr
A2 - Kopecek, Ivan
A2 - Pala, Karel
PB - Springer Nature
T2 - 3rd International Workshop on Text, Speech and Dialogue, TSD 2000
Y2 - 13 September 2000 through 16 September 2000
ER -
ID: 88462519