Rising user activity on online social media (OSM) platforms such as VK drives cross-disciplinary research (psychology, cybersecurity, etc.), in which avatars serve as key digital footprints. Although ML is widely used for such analysis, adapting universal tools to dataset-specific properties remains challenging. This study focuses on optimizing the clustering of avatar datasets. An intensive computational experiment was conducted to identify the cluster structure of such a dataset and the UMAP parameter values that yield good clustering with respect to several clustering quality indices. Our pipeline combines CLIP embeddings, UMAP dimensionality reduction, and five clustering algorithms (ranging from K-means to HDBSCAN and GMM). Hyperparameters were tuned via Grid Search and Bayesian optimization and evaluated on 9,000 VK avatars using four metrics (SI, DBI, CHI, DI). We demonstrate that these parameters lead to avatar clusters whose user groups differ in their mean Big Five scale scores. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
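Of the four metrics named in the abstract, SI, DBI and CHI presumably denote the silhouette, Davies-Bouldin and Calinski-Harabasz indices, which scikit-learn provides as `silhouette_score`, `davies_bouldin_score` and `calinski_harabasz_score`. DI presumably denotes the Dunn index, which scikit-learn does not ship. The following is a minimal sketch of that index, assuming the classic variant (smallest distance between points of different clusters divided by the largest cluster diameter) and Euclidean distance; the abstract does not say which variant or distance the authors used, and the function name `dunn_index` is hypothetical.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Classic Dunn index: min inter-cluster distance / max cluster diameter.

    Assumption: Euclidean distance and the single-linkage separation /
    complete-diameter variant. Higher values mean compact, well-separated
    clusters.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )

    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )

    if max_diameter == 0.0:
        # Every cluster is a single point (or identical points).
        return math.inf
    return min_separation / max_diameter


# Toy 2-D data standing in for UMAP-reduced embeddings.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = dunn_index(toy_points, [0, 0, 1, 1])  # left pair vs right pair
bad = dunn_index(toy_points, [0, 1, 0, 1])   # clusters straddle the gap
print(good, bad)
```

The pairwise loops make this quadratic in the number of points, which is acceptable for a sketch but would likely need a vectorized implementation for a dataset of several thousand avatars.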
Original language: English
Title of host publication: Proceedings of the Ninth International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’25), Volume 1
Publisher: Springer Nature
Pages: 485-496
Number of pages: 12
ISBN (print): 9783032136145
DOI
Status: Published - 2026
Event: Ninth International Scientific Conference on Intelligent Information Technologies for Industry - Sochi, Russian Federation
Duration: 5 Nov 2025 to 7 Nov 2025

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 1762 LNNS

Conference

Conference: Ninth International Scientific Conference on Intelligent Information Technologies for Industry
Abbreviated title: IITI 2025
Country/Territory: Russian Federation
City: Sochi
Period: 5/11/25 to 7/11/25

ID: 151441454