Readability and Scientific Texts Quality for the Automatic Summarization. / Makarova, Olga; Yagunova, Elena.
2015. 28 Abstract from Multilingualism in Specialized Communication: Challenges and Opportunities in the Digital Age. 20th European Symposium on Languages for Special Purposes, Vienna, Austria.
Research output: Contribution to conference › Abstract
TY - CONF
T1 - Readability and Scientific Texts Quality for the Automatic Summarization
AU - Makarova, Olga
AU - Yagunova, Elena
PY - 2015
Y1 - 2015
N2 - Automatic summarization is a well-known problem in natural language processing with applications in many areas. Automatically generated summaries of scientific texts can be useful in bibliographic databases, professional and scientific communication, and education. Summarization systems for scientific texts only occasionally exploit style-specific features and academic writing conventions to improve quality. In this work we describe a flexible automatic summarization system for scientific papers written in Russian that chooses a strategy based on the “substyle” of the input text. We implemented three main approaches to extraction-based summarization: statistical (n-gram tf-idf), structural (text positions) and semantic (lexical chains). Using simple text characteristics, such as paragraph length, number of sections and number of paragraphs, together with integrative features (entropy and readability), the system can decide which combination of methods and weights will produce a better summary.
AB - Automatic summarization is a well-known problem in natural language processing with applications in many areas. Automatically generated summaries of scientific texts can be useful in bibliographic databases, professional and scientific communication, and education. Summarization systems for scientific texts only occasionally exploit style-specific features and academic writing conventions to improve quality. In this work we describe a flexible automatic summarization system for scientific papers written in Russian that chooses a strategy based on the “substyle” of the input text. We implemented three main approaches to extraction-based summarization: statistical (n-gram tf-idf), structural (text positions) and semantic (lexical chains). Using simple text characteristics, such as paragraph length, number of sections and number of paragraphs, together with integrative features (entropy and readability), the system can decide which combination of methods and weights will produce a better summary.
KW - automatic summarization
KW - text entropy
KW - readability
KW - scientific texts
KW - lexical chains
M3 - Abstract
SP - 28
Y2 - 7 July 2015 through 9 July 2015
ER -
ID: 6935423
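
The abstract above names entropy as one of the integrative text features and describes choosing a weighted combination of statistical, structural and semantic methods. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names (`word_entropy`, `combine_scores`, `choose_weights`), the whitespace tokenization, and all thresholds and weight values are assumptions made purely for illustration; the abstract does not state how the system actually computes or uses these quantities.

```python
import math
from collections import Counter


def word_entropy(text: str) -> float:
    """Shannon entropy (bits per token) over whitespace-separated tokens.

    Assumption: tokens are lowercased and split on whitespace.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def combine_scores(statistical, structural, semantic, weights):
    """Weighted sum of three per-sentence score lists of equal length."""
    w_stat, w_struct, w_sem = weights
    return [
        w_stat * a + w_struct * b + w_sem * c
        for a, b, c in zip(statistical, structural, semantic)
    ]


def choose_weights(entropy: float, n_sections: int) -> tuple:
    """Hypothetical selection rule; thresholds and weights are illustrative."""
    if n_sections >= 4:
        # Clearly sectioned text: lean on structural (position) scores.
        return (0.3, 0.5, 0.2)
    if entropy > 7.0:
        # Lexically diverse text: lean on semantic (lexical chain) scores.
        return (0.3, 0.2, 0.5)
    # Otherwise fall back to mostly statistical (tf-idf style) scores.
    return (0.6, 0.2, 0.2)
```

In such a sketch, the sentences with the highest combined scores would be extracted to form the summary; a readability measure could be added as another input to `choose_weights` in the same way.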