Research output: Contribution to journal › Article › Peer review
Prosodic Boundaries Prediction in Russian Using Morphological and Syntactic Features. / Кочаров, Даниил Александрович; Меньшикова, Алла Павловна.
In: Communications in Computer and Information Science, Vol. 1119, 2019, pp. 126-135.
TY - JOUR
T1 - Prosodic Boundaries Prediction in Russian Using Morphological and Syntactic Features
AU - Кочаров, Даниил Александрович
AU - Меньшикова, Алла Павловна
PY - 2019
Y1 - 2019
AB - The paper compares three approaches to prosodic boundary prediction in Russian text: a rule-based method, a statistical classifier, and a deep learning model. The methods aim to predict all possible prosodic boundaries in a text using morphological and syntactic information. All features are described in terms of the Universal Dependencies framework and were obtained with the SyntaxNet parser. The rule-based method works bottom-up, using information about the edges of syntactic groups and applying both data-driven and hand-written linguistic rules. For the machine learning methods, a conditional random fields (CRF) classifier and a bidirectional LSTM (BiLSTM) model were built, using features such as the part-of-speech tag, the syntactic dependency type, a syntactic relation embedding, and the presence of a syntactic link between the current word and the adjacent words. The experimental material comes from the CORPRES corpus, which contains over 30 hours of professionally read speech. Used separately, morphological features are slightly superior to syntactic ones, and combining them improves the results. BiLSTM achieves the highest F1 score, 90.4, compared with 88.8 for CRF and 83.1 for the rule-based method.
KW - BiLSTM
KW - CRF
KW - Phrasing
KW - Prosodic boundaries
KW - Syntax
KW - Universal dependencies
UR - http://www.scopus.com/inward/record.url?scp=85076223340&partnerID=8YFLogxK
UR - http://www.mendeley.com/research/prosodic-boundaries-prediction-russian-using-morphological-syntactic-features
U2 - 10.1007/978-3-030-34518-1_9
DO - 10.1007/978-3-030-34518-1_9
M3 - Article
VL - 1119
SP - 126
EP - 135
JO - Communications in Computer and Information Science
JF - Communications in Computer and Information Science
SN - 1865-0929
ER -
ID: 48695701
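The word-level features named in the abstract can be illustrated with a short sketch. This is not the authors' code: the `Token` record, the `linked`, `word_features` and `boundary_f1` helpers, and the reading of "syntactic link" as a direct dependency arc between neighbouring words are all hypothetical choices made for illustration. The syntactic relation embedding and the CRF and BiLSTM models themselves are not shown; the per-word feature dictionaries are simply the kind of input such a sequence labeller could consume. The F1 helper assumes the score is computed over the boundary class only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    """Hypothetical minimal subset of a Universal Dependencies (CoNLL-U) word line."""
    idx: int     # 1-based position of the word in the sentence
    form: str    # surface word form
    upos: str    # UD part-of-speech tag
    head: int    # index of the syntactic head (0 = root)
    deprel: str  # UD dependency relation label


def linked(a: Token, b: Token) -> bool:
    """Assumption: a 'syntactic link' means one word is the direct head of the other."""
    return a.head == b.idx or b.head == a.idx


def word_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Per-word features: POS tag, dependency type, and whether the word is
    directly linked to its left and right neighbours."""
    feats = []
    for i, tok in enumerate(sent):
        prev_tok: Optional[Token] = sent[i - 1] if i > 0 else None
        next_tok: Optional[Token] = sent[i + 1] if i + 1 < len(sent) else None
        feats.append({
            "upos": tok.upos,
            "deprel": tok.deprel,
            "link_prev": prev_tok is not None and linked(prev_tok, tok),
            "link_next": next_tok is not None and linked(tok, next_tok),
        })
    return feats


def boundary_f1(gold: List[int], pred: List[int]) -> float:
    """F1 over the boundary class; 1 marks a prosodic boundary after a word."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For a hand-built parse of «Вчера мама мыла раму» ("Yesterday mother washed the frame"), where all three other words attach to the verb *мыла*, the first word gets `link_next=False`: neither *Вчера* nor *мама* is the other's head. The absence of a direct link between neighbours is one plausible cue for a boundary, which is why such a feature would be passed to the classifier alongside the POS tag and dependency type.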