STOPKA: Unbalanced Corpora Classification by Bootstrapping

Standard

STOPKA: Unbalanced Corpora Classification by Bootstrapping. / Popov, Andrey; Rebrova, Polina; Adaskina, Yulia.

2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT). Institute of Electrical and Electronics Engineers Inc., 2015. p. 141-143.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Harvard

Popov, A, Rebrova, P & Adaskina, Y 2015, STOPKA: Unbalanced Corpora Classification by Bootstrapping. in 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT). Institute of Electrical and Electronics Engineers Inc., pp. 141-143, Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT 2015), St Petersburg, 8 November 2015.

APA

Popov, A., Rebrova, P., & Adaskina, Y. (2015). STOPKA: Unbalanced Corpora Classification by Bootstrapping. In 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT) (pp. 141-143). Institute of Electrical and Electronics Engineers Inc.

Vancouver

Popov A, Rebrova P, Adaskina Y. STOPKA: Unbalanced Corpora Classification by Bootstrapping. In 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT). Institute of Electrical and Electronics Engineers Inc. 2015. p. 141-143

Author

Popov, Andrey; Rebrova, Polina; Adaskina, Yulia. / STOPKA: Unbalanced Corpora Classification by Bootstrapping. 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT). Institute of Electrical and Electronics Engineers Inc., 2015. pp. 141-143

BibTeX

@inproceedings{6282a3f9f0194527a35ee976d39072c1,

title = "STOPKA: Unbalanced Corpora Classification by Bootstrapping",

abstract = "The paper describes a tool designed to help the expert to filter out irrelevant documents in cases where the data and classification criteria do not allow any automatic algorithm to be applied. The tool is based on a semi-automatic bootstrapping model that analyses the unlabeled corpus, gets the initial annotation information from the expert and uses it to rank documents according to their similarity to the class in question. Our experiments confirm that the method helps to achieve 0.9 Recall by only viewing around 23% of the corpora.",

keywords = "SVM, machine learning, syntax analysis, corpus annotation",

author = "Andrey Popov and Polina Rebrova and Yulia Adaskina",

year = "2015",

language = "English",

isbn = "9789526839707",

pages = "141--143",

booktitle = "2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

address = "United States",

note = "Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT 2015) ; Conference date: 08-11-2015 Through 13-11-2015",

}

RIS

TY - GEN

T1 - STOPKA: Unbalanced Corpora Classification by Bootstrapping

AU - Popov, Andrey

AU - Rebrova, Polina

AU - Adaskina, Yulia

PY - 2015

Y1 - 2015

N2 - The paper describes a tool designed to help the expert to filter out irrelevant documents in cases where the data and classification criteria do not allow any automatic algorithm to be applied. The tool is based on a semi-automatic bootstrapping model that analyses the unlabeled corpus, gets the initial annotation information from the expert and uses it to rank documents according to their similarity to the class in question. Our experiments confirm that the method helps to achieve 0.9 Recall by only viewing around 23% of the corpora.

KW - SVM

KW - machine learning

KW - syntax analysis

KW - corpus annotation

UR - https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7382978

M3 - Conference contribution

SN - 9789526839707

SP - 141

EP - 143

BT - 2015 Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT)

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - Artificial Intelligence and Natural Language and Information Extraction, Social Media and Web Search FRUCT Conference (AINL-ISMW FRUCT 2015)

Y2 - 8 November 2015 through 13 November 2015

ER -

ID: 4792400