Prosodic Boundaries Prediction in Russian Using Morphological and Syntactic Features

Standard

Prosodic Boundaries Prediction in Russian Using Morphological and Syntactic Features. / Кочаров, Даниил Александрович; Меньшикова, Алла Павловна.

In: Communications in Computer and Information Science, Vol. 1119, 2019, pp. 126-135.

Research output: Contribution to journal › Article › Peer-reviewed

Author

Кочаров, Даниил Александрович; Меньшикова, Алла Павловна. / Prosodic Boundaries Prediction in Russian Using Morphological and Syntactic Features. In: Communications in Computer and Information Science. 2019; Vol. 1119. pp. 126-135.

BibTeX

@article{979a56aea34342f297420f9e99682645,

title = "Prosodic Boundaries Prediction in Russian Using Morphological and Syntactic Features",

abstract = "The paper presents a comparison between three approaches towards prosodic boundary prediction in Russian text, namely a rule-governed method and methods involving statistical classifier and deep learning technique. The methods aim to predict all possible prosodic boundaries in text applying morphological and syntactic information. All used features were described in terms of Universal Dependencies framework by means of SyntaxNet parser. The rule-governed method runs in a bottom-up fashion, using the information about syntax group edges and applying data-driven and hand-written linguistic rules. For machine learning methods, conditional random fields classifier and bidirectional LSTM model were built, with such features as part-of-speech tag, syntactic dependency type, syntactic relation embedding and presence of syntactic link between the current and adjacent words. As experimental material, we used the data of CORPRES corpus, containing over 30 hours of professionally read speech. Used separately, morphological features are slightly superior to syntactic ones, and their combination improves the results. BiLSTM yields the highest F1 measure value of 90.4, as compared to 88.8 for CRF and 83.1 for rule-based method.",

keywords = "BiLSTM, CRF, Phrasing, Prosodic boundaries, Syntax, Universal dependencies",

author = "Кочаров, {Даниил Александрович} and Меньшикова, {Алла Павловна}",

year = "2019",

doi = "10.1007/978-3-030-34518-1_9",

language = "English",

volume = "1119",

pages = "126--135",

journal = "Communications in Computer and Information Science",

issn = "1865-0929",

publisher = "Springer Nature",

}

RIS

TY - JOUR

T1 - Prosodic Boundaries Prediction in Russian Using Morphological and Syntactic Features

AU - Кочаров, Даниил Александрович

AU - Меньшикова, Алла Павловна

PY - 2019

Y1 - 2019

N2 - The paper presents a comparison between three approaches towards prosodic boundary prediction in Russian text, namely a rule-governed method and methods involving statistical classifier and deep learning technique. The methods aim to predict all possible prosodic boundaries in text applying morphological and syntactic information. All used features were described in terms of Universal Dependencies framework by means of SyntaxNet parser. The rule-governed method runs in a bottom-up fashion, using the information about syntax group edges and applying data-driven and hand-written linguistic rules. For machine learning methods, conditional random fields classifier and bidirectional LSTM model were built, with such features as part-of-speech tag, syntactic dependency type, syntactic relation embedding and presence of syntactic link between the current and adjacent words. As experimental material, we used the data of CORPRES corpus, containing over 30 hours of professionally read speech. Used separately, morphological features are slightly superior to syntactic ones, and their combination improves the results. BiLSTM yields the highest F1 measure value of 90.4, as compared to 88.8 for CRF and 83.1 for rule-based method.

AB - The paper presents a comparison between three approaches towards prosodic boundary prediction in Russian text, namely a rule-governed method and methods involving statistical classifier and deep learning technique. The methods aim to predict all possible prosodic boundaries in text applying morphological and syntactic information. All used features were described in terms of Universal Dependencies framework by means of SyntaxNet parser. The rule-governed method runs in a bottom-up fashion, using the information about syntax group edges and applying data-driven and hand-written linguistic rules. For machine learning methods, conditional random fields classifier and bidirectional LSTM model were built, with such features as part-of-speech tag, syntactic dependency type, syntactic relation embedding and presence of syntactic link between the current and adjacent words. As experimental material, we used the data of CORPRES corpus, containing over 30 hours of professionally read speech. Used separately, morphological features are slightly superior to syntactic ones, and their combination improves the results. BiLSTM yields the highest F1 measure value of 90.4, as compared to 88.8 for CRF and 83.1 for rule-based method.

KW - BiLSTM

KW - CRF

KW - Phrasing

KW - Prosodic boundaries

KW - Syntax

KW - Universal dependencies

UR - http://www.scopus.com/inward/record.url?scp=85076223340&partnerID=8YFLogxK

UR - http://www.mendeley.com/research/prosodic-boundaries-prediction-russian-using-morphological-syntactic-features

U2 - 10.1007/978-3-030-34518-1_9

DO - 10.1007/978-3-030-34518-1_9

M3 - Article

VL - 1119

SP - 126

EP - 135

JO - Communications in Computer and Information Science

JF - Communications in Computer and Information Science

SN - 1865-0929

ER -

ID: 48695701
