Audio-Visual Multi-modal Meeting Recording System

Research output: Contribution to journal › Article › Peer-reviewed

Department of Mathematical Modelling of Energy Systems

DOI

https://doi.org/10.1007/978-3-031-43789-2_15
Final published version

Wenfeng Yang
Pengyi Li
Wei Yang
Yuxing Liu
Ovanes Petrosian
Yin Li

Meeting recording systems identify speakers in one of two ways, hardware recognition or software recognition, but neither approach works well enough in real meetings, and the hardware cost is too high. The main contribution of this paper is to train the model using the theory of domain generalization and to use contrastive learning to improve the model's transfer learning ability. The paper also constructs a speaker recognition and meeting content transcription system based on a deep learning audio-visual speech recognition (AVSR) model and a speaker recognition (SPR) model. The system needs only a microphone and a camera to recognize the current speaker, and it uses its audio-visual speech recognition module to transcribe the meeting content.

Original language	English
Pages (from-to)	168-178
Number of pages	11
Journal	Lecture Notes in Networks and Systems
Issue number	776
DOI	https://doi.org/10.1007/978-3-031-43789-2_15
Publication status	Published - 21 Sep 2023

ID: 114434424
