This article discusses approaches to collecting data from Web 2.0 platforms such as social networks and messengers. The authors propose a flexible architecture for a focused web crawler that collects data on user discussions in the Facebook social network and the Telegram messenger. The proposed crawler is based on interaction with a platform’s API, GET/POST requests, and simulation of actions in a browser. The authors set up an experiment comparing implementations of the proposed data-crawling approaches. Data on COVID-19 were collected from Facebook and Telegram using RUM Extractor for Facebook and a large number of open-source Telegram crawlers. The developed focused crawler reached a speed of 15 participants per second and 12 posts per second when processing a user discussion on Facebook, without the account being blocked. The Telegram crawler reached 200 participants per second and 300 posts per second without being blocked.

Original language: English
Title of host publication: Stability and Control Processes
Subtitle of host publication: Proceedings of the 4th International Conference Dedicated to the Memory of Professor Vladimir Zubov
Publisher: Springer Nature
Pages: 793–800
Number of pages: 8
ISBN (electronic): 978-3-030-87966-2
ISBN (print): 978-3-030-87965-5
Publication status: Published - 2022
Event: Stability and Control Processes: International Conference Dedicated to the Memory of Professor Vladimir Zubov - Saint Petersburg State University, Saint Petersburg, Russian Federation
Duration: 5 Oct 2020 to 9 Oct 2020
Conference number: 4
http://www.apmath.spbu.ru/scp2020/
http://www.apmath.spbu.ru/scp2020/ru/main/
http://www.apmath.spbu.ru/scp2020/eng/program/#schedule
https://link.springer.com/conference/scp

Publication series

Name: LNCOINSPRO
Publisher: Springer Nature
ISSN (print): 2522-5383

Conference

Conference: Stability and Control Processes
Abbreviated title: SCP
Country/Territory: Russian Federation
City: Saint Petersburg
Period: 5/10/20 to 9/10/20
ID: 100561977