This article discusses approaches to collecting data from Web 2.0 platforms such as social networks and messengers. The authors propose a flexible architecture for a focused web crawler that collects data on user discussions in the social network Facebook and the Telegram messenger. The proposed crawler is based on interaction with a platform's API, GET/POST requests, and simulation of user actions in a browser. The authors set up an experiment comparing implementations of the proposed data crawling approaches. Data on the COVID-19 virus was collected from the Facebook social network and the Telegram messenger using RUM Extractor for Facebook and large number open-source Telegram crawler. The developed focused crawler reached a speed of 15 participants per second and 12 posts per second without the account being blocked when processing a user discussion on Facebook. The Telegram crawler reached a speed of 200 participants per second and 300 posts per second without blocking.
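The abstract names three collection approaches (platform API, GET/POST requests, browser simulation) behind one flexible crawler. The following is a minimal Python sketch of how such a pluggable design might look; it is not the authors' implementation. The names `Post`, `FocusedCrawler`, `participants`, the keyword filter, and the `max_per_second` pacing parameter are all hypothetical, and the fetchers are stand-in callables rather than real Facebook or Telegram clients.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Post:
    author: str
    text: str


# Hypothetical fetcher type: takes a discussion identifier, yields posts.
# One fetcher per approach (API, GET/POST requests, browser simulation).
Fetcher = Callable[[str], Iterable[Post]]


class FocusedCrawler:
    """Illustrative crawler with interchangeable collection methods."""

    def __init__(
        self,
        fetchers: Dict[str, Fetcher],
        keyword: str,
        max_per_second: float,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.fetchers = fetchers
        self.keyword = keyword.lower()
        # Assumed pacing rule: never pull posts faster than this rate.
        self.min_interval = 1.0 / max_per_second
        self.sleep = sleep

    def crawl(self, method: str, discussion_id: str) -> List[Post]:
        fetch = self.fetchers[method]  # raises KeyError for unknown methods
        kept: List[Post] = []
        last = None
        for post in fetch(discussion_id):
            now = time.monotonic()
            if last is not None and now - last < self.min_interval:
                # Pace how quickly items are pulled from the fetcher.
                self.sleep(self.min_interval - (now - last))
            last = time.monotonic()
            # "Focused": keep only posts that mention the topic keyword.
            if self.keyword in post.text.lower():
                kept.append(post)
        return kept


def participants(posts: Iterable[Post]) -> Set[str]:
    """Distinct authors among the collected posts."""
    return {p.author for p in posts}
```

In this sketch, switching between approaches amounts to registering another fetcher under a new key, while the topic filter and pacing stay shared. A real deployment would choose `max_per_second` empirically, since the rate a platform tolerates without blocking an account is not something this sketch can determine.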

Original language: English
Title of host publication: Stability and Control Processes
Subtitle of host publication: Proceedings of the 4th International Conference Dedicated to the Memory of Professor Vladimir Zubov
Publisher: Springer Nature
Pages: 793–800
Number of pages: 8
ISBN (Electronic): 978-3-030-87966-2
ISBN (Print): 978-3-030-87965-5
State: Published - 2022
Event: Stability and Control Processes: International Conference Dedicated to the Memory of Professor Vladimir Zubov - Saint Petersburg State University, Saint Petersburg, Russian Federation
Duration: 5 Oct 2020 to 9 Oct 2020
Conference number: 4
http://www.apmath.spbu.ru/scp2020/
http://www.apmath.spbu.ru/scp2020/ru/main/
http://www.apmath.spbu.ru/scp2020/eng/program/#schedule
https://link.springer.com/conference/scp

Publication series

Name: LNCOINSPRO
Publisher: Springer Nature
ISSN (Print): 2522-5383

Conference

Conference: Stability and Control Processes: International Conference Dedicated to the Memory of Professor Vladimir Zubov
Abbreviated title: SCP2020
Country/Territory: Russian Federation
City: Saint Petersburg
Period: 5/10/20 to 9/10/20

ID: 100561977