Research output: Scientific publications in periodicals › Article › Peer-reviewed
Audio-Visual Multi-modal Meeting Recording System. / Yang, Wenfeng; Li, Pengyi; Yang, Wei; Liu, Yuxing; Petrosian, Ovanes; Ли, Инь.
In: Lecture Notes in Networks and Systems, No. 776, 21.09.2023, pp. 168-178.
TY - JOUR
T1 - Audio-Visual Multi-modal Meeting Recording System
AU - Yang, Wenfeng
AU - Li, Pengyi
AU - Yang, Wei
AU - Liu, Yuxing
AU - Petrosian, Ovanes
AU - Ли, Инь
PY - 2023/9/21
Y1 - 2023/9/21
AB - Meeting recording systems recognize speakers in one of two ways, through hardware or through software, but neither approach is applicable enough in real meetings, and the cost of the hardware approach is too high. The main contribution of this paper is to train the model using domain generalization theory and to improve its transfer learning ability with contrastive learning. The paper also builds a speaker recognition and meeting transcription system based on a deep-learning audio-visual speech recognition (AVSR) model and a speaker recognition (SPR) model. The system needs only a microphone and a camera: it recognizes the current speaker, and its audio-visual speech recognition and speaker recognition modules are used to transcribe the meeting content.
KW - Audio-Visual Speech Recognition
KW - Multi-modal
KW - Speaker recognition
UR - https://www.mendeley.com/catalogue/2c223cec-3647-362a-aef5-235fc92a03aa/
U2 - 10.1007/978-3-031-43789-2_15
DO - 10.1007/978-3-031-43789-2_15
M3 - Article
SP - 168
EP - 178
JO - Lecture Notes in Networks and Systems
JF - Lecture Notes in Networks and Systems
SN - 2367-3389
IS - 776
ER -
ID: 114434424