Rising user activity on online social media (OSM) platforms such as VK drives cross-disciplinary research (psychology, cybersecurity, etc.) in which avatars serve as key digital footprints. Although ML is widely used for such analysis, adapting universal tools to dataset-specific properties remains challenging. This study focuses on optimizing the clustering of avatar datasets. An intensive computational experiment was conducted to identify the cluster structure of such a dataset and the UMAP parameter values that yield good clustering with respect to several clustering quality indices. Our pipeline combines CLIP embeddings, UMAP dimensionality reduction, and five clustering algorithms (from K-means to HDBSCAN and GMM). Hyperparameters were tuned via Grid Search and Bayesian optimization and evaluated on 9,000 VK avatars using four metrics (SI, DBI, CHI, DI). We demonstrate that these parameters lead to avatar clusters whose user groups differ in mean Big Five scale scores. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
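The tuning loop described in the abstract (search a hyperparameter grid, score each clustering with a quality index, keep the best) can be illustrated with a minimal standard-library sketch. It is not the authors' pipeline: synthetic 2-D points stand in for CLIP embeddings after UMAP reduction, only a plain K-means is run, the grid covers the number of clusters and a random seed rather than UMAP parameters, and the only index computed is the silhouette index (assumed here to be what SI abbreviates). All function names are hypothetical.

```python
import itertools
import math
import random


def init_centers(points, k, rng):
    """Farthest-point initialisation: random first center, then the point farthest from all chosen centers."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, seed=0, iters=50):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = init_centers(points, k, random.Random(seed))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster became empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette index in [-1, 1]; higher means tighter, better separated clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # the index is undefined for a single cluster
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def grid_search(points, ks, seeds):
    """Exhaustive search over (k, seed); returns (best score, best k, best seed)."""
    best = None
    for k, seed in itertools.product(ks, seeds):
        score = silhouette(points, kmeans(points, k, seed=seed))
        if best is None or score > best[0]:
            best = (score, k, seed)
    return best


def make_blobs(centers, n_per_blob, spread, seed=0):
    """Synthetic stand-in for reduced embeddings: Gaussian blobs around the given centers."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centers for _ in range(n_per_blob)]


points = make_blobs([(0, 0), (10, 0), (0, 10)], n_per_blob=30, spread=0.5)
best_score, best_k, best_seed = grid_search(points, ks=[2, 3, 4, 5], seeds=[0, 1, 2])
print(best_k, round(best_score, 3))
```

In a setting closer to the abstract, the grid would range over UMAP parameters and the score would combine several indices; Bayesian optimization would replace the exhaustive loop with a model-guided choice of the next configuration to try.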
Original language: English
Title of host publication: Proceedings of the Ninth International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’25), Volume 1
Publisher: Springer Nature
Pages: 485-496
Number of pages: 12
ISBN (Print): 9783032136145
State: Published - 2026
Event: Ninth International Scientific Conference on Intelligent Information Technologies for Industry - Sochi, Russian Federation
Duration: 5 Nov 2025 to 7 Nov 2025

Publication series

Name: Lecture Notes in Networks and Systems
Volume: 1762 LNNS

Conference

Conference: Ninth International Scientific Conference on Intelligent Information Technologies for Industry
Abbreviated title: IITI 2025
Country/Territory: Russian Federation
City: Sochi
Period: 5/11/25 to 7/11/25

    Research areas

  • CLIP, Clustering, Dimensionality Reduction, Graphical Digital Footprints, Hyperparameter Tuning, Online Social Media, Personality Computing, Barium compounds, Bayesian networks, Computer graphics, Dimensionality reduction, K-means clustering, Reduction, Social sciences computing, Clusterings, Clusterization, Comparatives studies, Graphical digital footprint, Hyper-parameter, Hyperparameter tuning, Online social medias, Personality computing, Social networking (online)

ID: 151441454