Speech is central to how teachers communicate, so the choice of communication strategy directly affects how quickly they achieve their goals. To capture students’ attention and carry out that strategy successfully, teachers often rely on particular multiword expressions. In contemporary research on pedagogical discourse, these expressions are typically analyzed manually. This research is motivated by the growing need to combine linguistic research methods with artificial intelligence techniques in the study of pedagogical discourse. Analyzing multiword expressions can help identify the elements of language and communication that carry significant informational weight and show how to use them properly. This study aims to identify multiword expressions in a corpus of teachers’ speech consisting of transcripts of recorded lessons given by secondary school teachers in the Russian Federation. The corpus contains lessons from both more effective and less effective teachers; more effective teachers meet specific criteria, such as working in non-selective schools with diverse student populations and achieving above-average results in the State Final Certification (9th grade). Using statistical metrics, contextualized vector models, and clustering algorithms, we detect and describe the distinctive vocabulary used by teachers. The findings show that the speech of more effective teachers is distinguished by specific lexical markers related to interaction with students and lesson structuring. These results could be useful to speech technology specialists developing voice assistants for teachers and to linguists building Russian-language speech corpora.