Keyword extraction from single Russian document

Mikhail Vadimovich Sandul, Elena Georgievna Mikhailova

Research output

Abstract

Automatic extraction of keywords and key phrases from text arises in many information retrieval and text mining tasks. The goal is to identify the terms that best describe the subject of a document. A great deal of research addresses this problem, but most algorithms have been developed for English texts, and their applicability to Russian texts has not been sufficiently investigated. RAKE is one of the best-known keyword extraction algorithms. This article examines the effectiveness of RAKE on Russian texts. It also applies a hybrid method that uses the Γ-index metric to weight the phrases obtained with RAKE. The article shows that this hybrid method is more accurate than RAKE while reducing the number of selected phrases.

Original language: English
Pages (from-to): 30-36
Number of pages: 7
Journal: CEUR Workshop Proceedings
Volume: 2135
Publication status: Published - 1 Jan 2018
Event: 3rd Conference on Software Engineering and Information Management, SEIM 2018 - Saint Petersburg
Duration: 14 Apr 2018 → …

Fingerprint

Information retrieval

Scopus subject areas

  • Computer Science(all)

Cite this

@article{749462373f2442b9abffa8f504288d6a,
title = "Keyword extraction from single Russian document",
abstract = "Automatic extraction of keywords and key phrases from text arises in many information retrieval and text mining tasks. The goal is to identify the terms that best describe the subject of a document. A great deal of research addresses this problem, but most algorithms have been developed for English texts, and their applicability to Russian texts has not been sufficiently investigated. RAKE is one of the best-known keyword extraction algorithms. This article examines the effectiveness of RAKE on Russian texts. It also applies a hybrid method that uses the Γ-index metric to weight the phrases obtained with RAKE. The article shows that this hybrid method is more accurate than RAKE while reducing the number of selected phrases.",
author = "Sandul, {Mikhail Vadimovich} and Mikhailova, {Elena Georgievna}",
year = "2018",
month = "1",
day = "1",
language = "English",
volume = "2135",
pages = "30--36",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "RWTH Aachen University",
}

Keyword extraction from single Russian document. / Sandul, Mikhail Vadimovich; Mikhailova, Elena Georgievna.

In: CEUR Workshop Proceedings, Vol. 2135, 01.01.2018, p. 30-36.

TY - JOUR

T1 - Keyword extraction from single Russian document

AU - Sandul, Mikhail Vadimovich

AU - Mikhailova, Elena Georgievna

PY - 2018/1/1

Y1 - 2018/1/1

N2 - Automatic extraction of keywords and key phrases from text arises in many information retrieval and text mining tasks. The goal is to identify the terms that best describe the subject of a document. A great deal of research addresses this problem, but most algorithms have been developed for English texts, and their applicability to Russian texts has not been sufficiently investigated. RAKE is one of the best-known keyword extraction algorithms. This article examines the effectiveness of RAKE on Russian texts. It also applies a hybrid method that uses the Γ-index metric to weight the phrases obtained with RAKE. The article shows that this hybrid method is more accurate than RAKE while reducing the number of selected phrases.

UR - http://www.scopus.com/inward/record.url?scp=85050482656&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:85050482656

VL - 2135

SP - 30

EP - 36

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -