This article discusses approaches to collecting data from Web 2.0 platforms such as social networks and messengers. The authors propose a flexible architecture for a focused web crawler that collects data on user discussions in the social network Facebook and the Telegram messenger. The proposed crawler is based on interaction with a platform’s API, GET/POST requests, and simulation of user actions in a browser. The authors set up an experiment comparing implementations of the proposed data crawling approaches. Data on the COVID-19 virus was collected from Facebook and Telegram using RUM Extractor for Facebook and an open-source Telegram crawler. When processing a user discussion on Facebook, the developed focused crawler reached a speed of 15 participants per second and 12 posts per second without the account being blocked. The Telegram crawler reached 200 participants per second and 300 posts per second without being blocked.
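
The architecture described in the abstract, one crawler that can use several collection approaches, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `CollectionStrategy`, `ApiStrategy`, `HttpRequestStrategy`, and `FocusedCrawler` are hypothetical, the strategies return canned data instead of contacting Facebook or Telegram, and the keyword filter is only one simple way to make a crawler "focused". It is not the authors' implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Post:
    author: str
    text: str


class CollectionStrategy(ABC):
    """Hypothetical interface: one way of fetching posts from a platform."""

    @abstractmethod
    def fetch_posts(self, discussion_id: str) -> List[Post]:
        ...


class ApiStrategy(CollectionStrategy):
    """Stand-in for a platform API client; always fails to show the fallback."""

    def fetch_posts(self, discussion_id: str) -> List[Post]:
        raise ConnectionError("API unavailable (simulated)")


class HttpRequestStrategy(CollectionStrategy):
    """Stand-in for raw GET/POST requests; returns canned data, no network."""

    def fetch_posts(self, discussion_id: str) -> List[Post]:
        return [
            Post("user_a", f"COVID-19 update in {discussion_id}"),
            Post("user_b", "unrelated message"),
        ]


class FocusedCrawler:
    """Tries each strategy in order and keeps only posts matching a keyword."""

    def __init__(self, strategies: List[CollectionStrategy], keyword: str):
        self.strategies = strategies
        self.keyword = keyword.lower()

    def collect(self, discussion_id: str) -> List[Post]:
        for strategy in self.strategies:
            try:
                posts = strategy.fetch_posts(discussion_id)
            except ConnectionError:
                continue  # this approach failed, fall back to the next one
            return [p for p in posts if self.keyword in p.text.lower()]
        return []  # every strategy failed


if __name__ == "__main__":
    crawler = FocusedCrawler([ApiStrategy(), HttpRequestStrategy()], "covid")
    print([p.author for p in crawler.collect("thread-1")])
```

Keeping each collection approach behind a common interface means a browser-simulation strategy could be added as a third class without changing `FocusedCrawler`.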

Original language: English
Title of host publication: Stability and Control Processes
Subtitle of host publication: Proceedings of the 4th International Conference Dedicated to the Memory of Professor Vladimir Zubov
Publisher: Springer Nature
Number of pages: 8
ISBN (electronic): 978-3-030-87966-2
ISBN (print): 978-3-030-87965-5
Publication status: Published - 2022
Event: Stability and Control Processes: International Conference Dedicated to the Memory of Professor Vladimir Zubov - Saint Petersburg State University, Saint Petersburg, Russian Federation
Duration: 5 Oct 2020 - 9 Oct 2020
Conference number: 4

Publication series

Publisher: Springer Nature
ISSN (print): 2522-5383


Conference: Stability and Control Processes
Abbreviated title: SCP
Country/Territory: Russian Federation
City: Saint Petersburg

ID: 100561977