Speaker recognition in meeting recording systems takes two forms, hardware-based and software-based, but neither approach is sufficiently applicable to real meetings, and the hardware cost is too high. The main contribution of this paper is to train the model using domain generalization theory and to use contrastive learning to improve the model's transfer learning ability. The paper also constructs a speaker recognition and meeting content transcription system based on a deep learning audiovisual speech recognition (AVSR) model and a speaker recognition (SPR) model. The system needs only a microphone and a camera to recognize the current speaker and to transcribe the meeting content, using its audiovisual speech recognition and speaker recognition modules.
Original language: English
Pages (from-to): 168-178
Number of pages: 11
Journal: Lecture Notes in Networks and Systems
Issue number: 776
State: Published - 21 Sep 2023

    Research areas

  • Audio-Visual Speech Recognition, Multi-modal, Speaker recognition

ID: 114434424