Standard

Audio-Visual Multi-modal Meeting Recording System. / Yang, Wenfeng; Li, Pengyi; Yang, Wei; Liu, Yuxing; Petrosian, Ovanes; Ли, Инь.

In: Lecture Notes in Networks and Systems, No. 776, 21.09.2023, pp. 168-178.

Research output: Scientific publication in a periodical › Article › Peer-reviewed

Harvard

Yang, W, Li, P, Yang, W, Liu, Y, Petrosian, O & Ли, И 2023, 'Audio-Visual Multi-modal Meeting Recording System', Lecture Notes in Networks and Systems, no. 776, pp. 168-178. https://doi.org/10.1007/978-3-031-43789-2_15

APA

Yang, W., Li, P., Yang, W., Liu, Y., Petrosian, O., & Ли, И. (2023). Audio-Visual Multi-modal Meeting Recording System. Lecture Notes in Networks and Systems, (776), 168-178. https://doi.org/10.1007/978-3-031-43789-2_15

Vancouver

Yang W, Li P, Yang W, Liu Y, Petrosian O, Ли И. Audio-Visual Multi-modal Meeting Recording System. Lecture Notes in Networks and Systems. 2023 Sep 21;(776):168-178. https://doi.org/10.1007/978-3-031-43789-2_15

Author

Yang, Wenfeng ; Li, Pengyi ; Yang, Wei ; Liu, Yuxing ; Petrosian, Ovanes ; Ли, Инь. / Audio-Visual Multi-modal Meeting Recording System. In: Lecture Notes in Networks and Systems. 2023 ; No. 776. pp. 168-178.

BibTeX

@article{670641eee6fa4b168c0ba8a96feda4f1,
title = "Audio-Visual Multi-modal Meeting Recording System",
abstract = "Meeting recording systems use two forms of speaker recognition: hardware recognition and software recognition. However, neither approach is sufficiently applicable to real meetings, and the hardware cost is too high. The main contribution of this paper is to train the model using domain generalization theory and to use contrastive learning to improve the model{\textquoteright}s transfer learning ability. In addition, this paper constructs a speaker recognition and meeting content transcription system based on a deep learning audiovisual speech recognition (AVSR) model and a speaker recognition (SPR) model. The system needs only a microphone and a camera to recognize the current speaker, and it uses its audiovisual speech recognition and speaker recognition modules to transcribe the meeting content.",
keywords = "Audio-Visual Speech Recognition, Multi-modal, Speaker recognition",
author = "Wenfeng Yang and Pengyi Li and Wei Yang and Yuxing Liu and Ovanes Petrosian and Инь Ли",
year = "2023",
month = sep,
day = "21",
doi = "10.1007/978-3-031-43789-2_15",
language = "English",
pages = "168--178",
journal = "Lecture Notes in Networks and Systems",
issn = "2367-3389",
publisher = "Springer Nature",
number = "776",

}

RIS

TY  - JOUR

T1  - Audio-Visual Multi-modal Meeting Recording System

AU  - Yang, Wenfeng

AU  - Li, Pengyi

AU  - Yang, Wei

AU  - Liu, Yuxing

AU  - Petrosian, Ovanes

AU  - Ли, Инь

PY  - 2023/9/21

Y1  - 2023/9/21

N2  - Meeting recording systems use two forms of speaker recognition: hardware recognition and software recognition. However, neither approach is sufficiently applicable to real meetings, and the hardware cost is too high. The main contribution of this paper is to train the model using domain generalization theory and to use contrastive learning to improve the model’s transfer learning ability. In addition, this paper constructs a speaker recognition and meeting content transcription system based on a deep learning audiovisual speech recognition (AVSR) model and a speaker recognition (SPR) model. The system needs only a microphone and a camera to recognize the current speaker, and it uses its audiovisual speech recognition and speaker recognition modules to transcribe the meeting content.

AB  - Meeting recording systems use two forms of speaker recognition: hardware recognition and software recognition. However, neither approach is sufficiently applicable to real meetings, and the hardware cost is too high. The main contribution of this paper is to train the model using domain generalization theory and to use contrastive learning to improve the model’s transfer learning ability. In addition, this paper constructs a speaker recognition and meeting content transcription system based on a deep learning audiovisual speech recognition (AVSR) model and a speaker recognition (SPR) model. The system needs only a microphone and a camera to recognize the current speaker, and it uses its audiovisual speech recognition and speaker recognition modules to transcribe the meeting content.

KW  - Audio-Visual Speech Recognition

KW  - Multi-modal

KW  - Speaker recognition

UR  - https://www.mendeley.com/catalogue/2c223cec-3647-362a-aef5-235fc92a03aa/

U2  - 10.1007/978-3-031-43789-2_15

DO  - 10.1007/978-3-031-43789-2_15

M3  - Article

SP  - 168

EP  - 178

JO  - Lecture Notes in Networks and Systems

JF  - Lecture Notes in Networks and Systems

SN  - 2367-3389

IS  - 776

ER  - 

ID: 114434424