DOI

Meeting recording systems recognize speakers in one of two ways, by hardware or by software, but neither approach is sufficiently applicable to real meetings, and the hardware cost is too high. The main contribution of this paper is to train the model using the theory of domain generalization and to use contrastive learning to improve the model's transfer learning ability. The paper also constructs a speaker recognition and meeting content transcription system based on a deep learning audiovisual speech recognition (AVSR) model and a speaker recognition (SPR) model. The system needs only a microphone and a camera to recognize the current speaker and transcribe the meeting content.
Original language: English
Pages (from-to): 168-178
Number of pages: 11
Journal: Lecture Notes in Networks and Systems
Issue number: 776
Status: Published - 21 Sep 2023

ID: 114434424