Detection of Hidden Communities in Twitter Discussions of Varying Volumes

Standard

Detection of Hidden Communities in Twitter Discussions of Varying Volumes. / Blekanov, Ivan; Bodrunova, Svetlana S.; Akhmetov, Askar.

In: Future Internet, Vol. 13, No. 11, 11, 20.11.2021, p. 295-311.

Research output: Contribution to journal › Article › peer-review

BibTeX

@article{9f5fb92276d1401f8580ffcb6b7292c8,

title = "Detection of Hidden Communities in Twitter Discussions of Varying Volumes",

abstract = "The community-based structure of communication on social networking sites has long been a focus of scholarly attention. However, the discovery and description of hidden communities, including defining the proper level of user aggregation, remains an important unresolved problem. Studies of online communities have clear social implications, as they allow for assessment of preference-based user grouping and the detection of socially hazardous groups. The aim of this study is to comparatively assess the algorithms that effectively analyze large user networks and extract hidden user communities from them. The results we have obtained show the most suitable algorithms for Twitter datasets of different volumes (tens of thousands, hundreds of thousands, and millions of tweets). We show that the Infomap and Leiden algorithms provide the best results overall, and we advise testing a combination of these algorithms for detecting discursive communities based on user traits or views. We also show that the generalized K-means algorithm does not apply to big datasets, while a range of other algorithms tend to prioritize the detection of just one big community instead of many that would better mirror reality. For isolating overlapping communities, the GANXiS algorithm should be used, while OSLOM is not advised.",

keywords = "Clustering, GANXiS, Hidden community detection, Infomap, Leiden, Social networks, User discussions, User web-graph",

author = "Blekanov, Ivan and Bodrunova, {Svetlana S.} and Akhmetov, Askar",

note = "Publisher Copyright: {\textcopyright} 2021 by the authors. Licensee MDPI, Basel, Switzerland.",

year = "2021",

month = nov,

day = "20",

doi = "10.3390/fi13110295",

language = "English",

volume = "13",

pages = "295--311",

journal = "Future Internet",

issn = "1999-5903",

publisher = "MDPI AG",

number = "11",

}

RIS

TY - JOUR

T1 - Detection of Hidden Communities in Twitter Discussions of Varying Volumes

AU - Blekanov, Ivan

AU - Bodrunova, Svetlana S.

AU - Akhmetov, Askar

PY - 2021/11/20

Y1 - 2021/11/20

N2 - The community-based structure of communication on social networking sites has long been a focus of scholarly attention. However, the discovery and description of hidden communities, including defining the proper level of user aggregation, remains an important unresolved problem. Studies of online communities have clear social implications, as they allow for assessment of preference-based user grouping and the detection of socially hazardous groups. The aim of this study is to comparatively assess the algorithms that effectively analyze large user networks and extract hidden user communities from them. The results we have obtained show the most suitable algorithms for Twitter datasets of different volumes (tens of thousands, hundreds of thousands, and millions of tweets). We show that the Infomap and Leiden algorithms provide the best results overall, and we advise testing a combination of these algorithms for detecting discursive communities based on user traits or views. We also show that the generalized K-means algorithm does not apply to big datasets, while a range of other algorithms tend to prioritize the detection of just one big community instead of many that would better mirror reality. For isolating overlapping communities, the GANXiS algorithm should be used, while OSLOM is not advised.

KW - Clustering

KW - GANXiS

KW - Hidden community detection

KW - Infomap

KW - Leiden

KW - Social networks

KW - User discussions

KW - User web-graph

UR - http://www.scopus.com/inward/record.url?scp=85122918698&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/7f6b57d9-7a63-37b5-86ff-49f7454eceb0/

U2 - 10.3390/fi13110295

DO - 10.3390/fi13110295

M3 - Article

AN - SCOPUS:85122918698

VL - 13

SP - 295

EP - 311

JO - Future Internet

JF - Future Internet

SN - 1999-5903

IS - 11

M1 - 11

ER -

ID: 89364443