Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Statistical parameters commonly used in diagnostic procedures often cannot be regarded as statistically consistent, because they depend strongly on sample size. This considerably devalues diagnostic results. This paper addresses the verification of parameter consistency at the initial (pre-classification) stage of research. A complete list of parameters that may be useful for describing the lexicostatistical structure of a text was compiled, and each parameter was subjected to a justifiability test. As a result, a number of consistent parameters were selected; together they form a tool for describing the system characteristics of any text or corpus. Because they converge rapidly to their limit values, they can be used effectively in classification procedures on text data of arbitrary size. The proposed approximation model also makes it possible to forecast the values of all parameters for any sample size.
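The sample-size dependence described in the abstract can be illustrated with a minimal sketch. The abstract does not name the parameters it examines, so the type-token ratio used here is only an assumed example of a lexicostatistical measure, and the Zipf-like token stream is synthetic data generated for illustration. The function names (`type_token_ratio`, `zipf_tokens`, `ttr_by_sample_size`) are hypothetical and not taken from the paper.

```python
import random

def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def zipf_tokens(n, vocab_size=5000, seed=0):
    """Draw n synthetic tokens with Zipf-like weights (illustrative data only)."""
    rng = random.Random(seed)
    vocab = [f"w{i}" for i in range(1, vocab_size + 1)]
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    return rng.choices(vocab, weights=weights, k=n)

def ttr_by_sample_size(tokens, sizes):
    """Evaluate the ratio on growing prefixes of the same token stream."""
    return {n: type_token_ratio(tokens[:n]) for n in sizes}

# Assumed example: the same "text" measured at four sample sizes.
tokens = zipf_tokens(20000)
profile = ttr_by_sample_size(tokens, [100, 1000, 10000, 20000])
for n, value in profile.items():
    print(n, round(value, 3))
```

In this sketch the ratio falls steadily as the prefix grows, so a value computed on a short sample is not comparable with one computed on a long sample. This is the kind of behaviour a consistency check at the pre-classification stage would be expected to flag.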
Original language | English |
---|---|
Title of host publication | Text, Speech and Dialogue - 3rd International Workshop, TSD 2000, Proceedings |
Editors | Petr Sojka, Ivan Kopecek, Karel Pala |
Publisher | Springer Nature |
Pages | 99-102 |
Number of pages | 4 |
ISBN (Print) | 3540410422, 9783540410423 |
State | Published - 2000 |
Event | 3rd International Workshop on Text, Speech and Dialogue, TSD 2000 - Brno, Czech Republic. Duration: 13 Sep 2000 → 16 Sep 2000 |
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 1902 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference | 3rd International Workshop on Text, Speech and Dialogue, TSD 2000 |
---|---|
Country/Territory | Czech Republic |
City | Brno |
Period | 13/09/00 → 16/09/00 |