As the visual representation of information grows increasingly popular, studies of so-called “polycode” texts, which integrate verbal and non-verbal information, are a high priority. This study investigates how readers integrate text-figure information when reading multimodal texts. Using eye tracking, we compared the processing of different multimodal and verbal texts in three experiments. In Exp. 1, native speakers of Russian read infographics (graphic visual representations of information). In Exp. 2, we studied how native speakers of Russian process different types of visual notes containing handwritten text and drawings (‘path’ (trajectory), linear, and radial sketchnotes). In Exp. 3, Chinese students learning Russian as a foreign language examined infographics and verbal texts. We measured the total dwell time, the total fixation count, and the average fixation duration for each verbal and non-verbal zone of the texts. Text comprehension was assessed with off-line methods (subjective scaling, answers to post-reading questions, and keywords). We identified specific features of verbal and polycode text perception and proposed a number of recommendations for creating effective polycode texts. Overall, the results show that readers process information better and faster when reading a multimodal text of any format than when reading a verbal text. Reading patterns in polycode text processing are text-directed. The influence of text type becomes crucial only for experienced readers: the better the reading skills, the greater the influence of the “text type” factor.