Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › Peer-reviewed
Data Crawling Approaches for User Discussion Analysis on Web 2.0 Platforms. / Nepiyushchikh, Dmitry; Blekanov, Ivan.
Stability and Control Processes: Proceedings of the 4th International Conference Dedicated to the Memory of Professor Vladimir Zubov. Springer Nature, 2022. p. 793–800 (LNCOINSPRO).
TY - GEN
T1 - Data Crawling Approaches for User Discussion Analysis on Web 2.0 Platforms
AU - Nepiyushchikh, Dmitry
AU - Blekanov, Ivan
N1 - Conference code: 4
PY - 2022
Y1 - 2022
N2 - This article discusses approaches to collecting data from Web 2.0 platforms such as social networks and messengers. The authors propose a flexible architecture for a focused web crawler that collects data on users’ discussions in the Facebook social network and the Telegram messenger. The proposed crawler is based on interaction with a platform’s API, GET/POST requests, and simulating user actions in a browser. The authors set up an experiment comparing implementations of the proposed data crawling approaches. Data on the COVID-19 virus was collected from the Facebook social network and the Telegram messenger using RUM Extractor for Facebook and large number open-source Telegram crawler. The developed focused crawler reached a speed of 15 participants per second and 12 posts per second, without the account being blocked, when processing a user discussion on Facebook. The Telegram crawler reached a speed of 200 participants per second and 300 posts per second without being blocked.
AB - This article discusses approaches to collecting data from Web 2.0 platforms such as social networks and messengers. The authors propose a flexible architecture for a focused web crawler that collects data on users’ discussions in the Facebook social network and the Telegram messenger. The proposed crawler is based on interaction with a platform’s API, GET/POST requests, and simulating user actions in a browser. The authors set up an experiment comparing implementations of the proposed data crawling approaches. Data on the COVID-19 virus was collected from the Facebook social network and the Telegram messenger using RUM Extractor for Facebook and large number open-source Telegram crawler. The developed focused crawler reached a speed of 15 participants per second and 12 posts per second, without the account being blocked, when processing a user discussion on Facebook. The Telegram crawler reached a speed of 200 participants per second and 300 posts per second without being blocked.
UR - https://www.mendeley.com/catalogue/c32d06ad-c011-3df0-a5eb-cb85bf2872e9/
U2 - 10.1007/978-3-030-87966-2_91
DO - 10.1007/978-3-030-87966-2_91
M3 - Conference contribution
SN - 978-3-030-87965-5
T3 - LNCOINSPRO
SP - 793
EP - 800
BT - Stability and Control Processes
PB - Springer Nature
T2 - Stability and Control Processes: International Conference Dedicated to the Memory of Professor Vladimir Zubov
Y2 - 5 October 2020 through 9 October 2020
ER -
ID: 100561977