Standard

Data Encoding for Social Media: Comparing Twitter, Reddit, and Telegram. / Блеканов, Иван Станиславович; Тарасов, Никита Андреевич; Непиющих, Дмитрий Викторович; Бодрунова, Светлана Сергеевна.

Networks in the Global World VI. NetGloW 2022. Springer Nature, 2023. p. 114–122 (Lecture Notes in Networks and Systems; Vol. 663 LNNS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Блеканов, ИС, Тарасов, НА, Непиющих, ДВ & Бодрунова, СС 2023, Data Encoding for Social Media: Comparing Twitter, Reddit, and Telegram. in Networks in the Global World VI. NetGloW 2022. Lecture Notes in Networks and Systems, vol. 663 LNNS, Springer Nature, pp. 114–122, Networks in the Global World 2022, Saint Petersburg, Russian Federation, 22/06/22. https://doi.org/10.1007/978-3-031-29408-2_8

APA

Блеканов, И. С., Тарасов, Н. А., Непиющих, Д. В., & Бодрунова, С. С. (2023). Data Encoding for Social Media: Comparing Twitter, Reddit, and Telegram. In Networks in the Global World VI. NetGloW 2022 (pp. 114–122). (Lecture Notes in Networks and Systems; Vol. 663 LNNS). Springer Nature. https://doi.org/10.1007/978-3-031-29408-2_8

BibTeX

@inproceedings{62860d0cace34bdcb82aa2dc19324111,
title = "Data Encoding for Social Media: Comparing Twitter, Reddit, and Telegram",
abstract = "Social networking platforms have become a major source of data for most textual machine learning models. Applications of encodings in earlier language models, as well as advancements in model reuse, have opened new possibilities for case studies with limited or unsupervised data. In this paper, the authors test whether the semantic similarity of large-scale data from three platforms allows for applying the same transfer-learning language models to data from various social media. For this, the authors perform a comparative case study to outline linguistic differences and measure similarity of deep neural encodings for the case data. In particular, semantic similarity is evaluated using traditional text similarity metrics, structure metrics of the corpora, and RUBERT encodings that provide general semantic characteristics of the text data in the three datasets. We show that, by both linguistic metrics and semantic encodings, the platforms are semantically similar enough for transfer learning models to be applied. We also demonstrate, however, that, despite the difference in average text length, Twitter is more similar to Reddit than to Telegram by linguistic metrics, which hints at the idea of {\textquoteleft}platformization{\textquoteright} of social media speech. We conclude by stating the speech factors that may lead to platform dissimilarity.",
keywords = "Linguistic metrics, RUBERT, Reddit, Semantic neural encodings, Social network analysis, Telegram, Text similarity assessment, Twitter",
author = "Блеканов, {Иван Станиславович} and Тарасов, {Никита Андреевич} and Непиющих, {Дмитрий Викторович} and Бодрунова, {Светлана Сергеевна}",
year = "2023",
doi = "10.1007/978-3-031-29408-2_8",
language = "English",
isbn = "978-3-031-29407-5",
series = "Lecture Notes in Networks and Systems",
publisher = "Springer Nature",
pages = "114–122",
booktitle = "Networks in the Global World VI. NetGloW 2022",
address = "Germany",
note = "Networks in the Global World 2022 ; Conference date: 22-06-2022 Through 24-06-2022",
url = "http://ngw.spbu.ru/",
}

RIS

TY - GEN

T1 - Data Encoding for Social Media: Comparing Twitter, Reddit, and Telegram

AU - Блеканов, Иван Станиславович

AU - Тарасов, Никита Андреевич

AU - Непиющих, Дмитрий Викторович

AU - Бодрунова, Светлана Сергеевна

PY - 2023

Y1 - 2023

N2 - Social networking platforms have become a major source of data for most textual machine learning models. Applications of encodings in earlier language models, as well as advancements in model reuse, have opened new possibilities for case studies with limited or unsupervised data. In this paper, the authors test whether the semantic similarity of large-scale data from three platforms allows for applying the same transfer-learning language models to data from various social media. For this, the authors perform a comparative case study to outline linguistic differences and measure similarity of deep neural encodings for the case data. In particular, semantic similarity is evaluated using traditional text similarity metrics, structure metrics of the corpora, and RUBERT encodings that provide general semantic characteristics of the text data in the three datasets. We show that, by both linguistic metrics and semantic encodings, the platforms are semantically similar enough for transfer learning models to be applied. We also demonstrate, however, that, despite the difference in average text length, Twitter is more similar to Reddit than to Telegram by linguistic metrics, which hints at the idea of ‘platformization’ of social media speech. We conclude by stating the speech factors that may lead to platform dissimilarity.

AB - Social networking platforms have become a major source of data for most textual machine learning models. Applications of encodings in earlier language models, as well as advancements in model reuse, have opened new possibilities for case studies with limited or unsupervised data. In this paper, the authors test whether the semantic similarity of large-scale data from three platforms allows for applying the same transfer-learning language models to data from various social media. For this, the authors perform a comparative case study to outline linguistic differences and measure similarity of deep neural encodings for the case data. In particular, semantic similarity is evaluated using traditional text similarity metrics, structure metrics of the corpora, and RUBERT encodings that provide general semantic characteristics of the text data in the three datasets. We show that, by both linguistic metrics and semantic encodings, the platforms are semantically similar enough for transfer learning models to be applied. We also demonstrate, however, that, despite the difference in average text length, Twitter is more similar to Reddit than to Telegram by linguistic metrics, which hints at the idea of ‘platformization’ of social media speech. We conclude by stating the speech factors that may lead to platform dissimilarity.

KW - Linguistic metrics

KW - RUBERT

KW - Reddit

KW - Semantic neural encodings

KW - Social network analysis

KW - Telegram

KW - Text similarity assessment

KW - Twitter

UR - https://link.springer.com/chapter/10.1007/978-3-031-29408-2_8

UR - https://www.mendeley.com/catalogue/bd283c29-4eec-39a0-b826-95ac142c586b/

U2 - 10.1007/978-3-031-29408-2_8

DO - 10.1007/978-3-031-29408-2_8

M3 - Conference contribution

SN - 978-3-031-29407-5

T3 - Lecture Notes in Networks and Systems

SP - 114

EP - 122

BT - Networks in the Global World VI. NetGloW 2022

PB - Springer Nature

Y2 - 22 June 2022 through 24 June 2022

ER -
