The paper describes a process for clustering article abstracts, taken from MEDLINE, the largest bibliographic database of life sciences and biomedical information, into categories that correspond to types of medical interventions, that is, types of patient treatment. Experiments were carried out to evaluate clustering quality for the following algorithms: K-means, K-means++, hierarchical clustering, and SIB (Sequential Information Bottleneck), together with the LSA (Latent Semantic Analysis) and MI (Mutual Information) methods, which allow feature vectors to be selected. The best clustering results were achieved by K-means++ together with LSA when a 210-dimensional space was chosen: Purity = 0.5719, Entropy = 1.3841, Normalized Entropy = 0.6299.
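
The abstract reports Purity, Entropy, and Normalized Entropy but does not give their formulas. The following is a minimal sketch of how such scores are commonly computed from cluster assignments and reference categories. It assumes the usual textbook definitions: purity as the share of items that fall in the majority category of their cluster, entropy as the size-weighted average of per-cluster category entropy with natural logarithms, and normalized entropy as that value divided by the logarithm of the number of categories. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log


def contingency(clusters, labels):
    """Count, for each cluster id, how often each reference label occurs."""
    table = defaultdict(Counter)
    for cluster_id, label in zip(clusters, labels):
        table[cluster_id][label] += 1
    return table


def purity(clusters, labels):
    """Share of items that belong to the majority label of their cluster."""
    table = contingency(clusters, labels)
    return sum(max(counts.values()) for counts in table.values()) / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean of per-cluster label entropy (natural log, assumed)."""
    table = contingency(clusters, labels)
    n = len(labels)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((m / size) * log(m / size) for m in counts.values())
        total += (size / n) * h
    return total


def normalized_entropy(clusters, labels):
    """Entropy divided by log(number of reference labels), assumed convention."""
    n_classes = len(set(labels))
    if n_classes < 2:
        return 0.0
    return entropy(clusters, labels) / log(n_classes)


# Hypothetical toy data: six abstracts, two clusters, two intervention types.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["drug", "drug", "surgery", "surgery", "surgery", "surgery"]

print(purity(clusters, labels))
print(entropy(clusters, labels))
print(normalized_entropy(clusters, labels))
```

Higher purity and lower entropy indicate clusters that align more closely with the reference categories. Under these assumed definitions, the reported ratio 1.3841 / 0.6299 is close to ln 9, which would be consistent with nine categories, although the abstract does not state the number.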

Original language: English
Title of host publication: 2015 INTERNATIONAL CONFERENCE "STABILITY AND CONTROL PROCESSES" IN MEMORY OF V.I. ZUBOV (SCP)
Editors: LA Petrosyan, AP Zhabko
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 555-557
Number of pages: 3
ISBN (Print): 9781467376983
State: Published - 2015
Event: International Conference on "Stability and Control Processes" in Memory of V.I. Zubov, SCP 2015 - Peterhof, St. Petersburg, Russian Federation
Duration: 5 Oct 2015 to 9 Oct 2015
http://www.apmath.spbu.ru/scp2015/openconf.php

Conference

Conference: International Conference on "Stability and Control Processes" in Memory of V.I. Zubov, SCP 2015
Abbreviated title: SCP 2015
Country/Territory: Russian Federation
City: St. Petersburg
Period: 5/10/15 to 9/10/15

ID: 3983135