
It has previously been suggested that the “persona-driven” approach can help produce sufficiently diverse synthetic training data for Large Language Models (LLMs), which are currently about to run out of real natural language texts. In our paper, we explore whether personas evoked from LLMs through HCI-style descriptions can indeed imitate human-like differences in authorship. To this end, we ran an associative experiment with 50 human participants and four artificial personas evoked from popular LLM-based services: GPT-4(o) and YandexGPT Pro. For each of five stimulus words selected from the homepages of university websites, we asked both groups of subjects to come up with 10 short associations (in Russian). We then used cosine similarity and Mahalanobis distance to measure how far apart the association lists produced by different humans and personas were. While the differences in similarity were significant between individual human associators, as well as between gender and age groups, this was not the case for the different personas evoked from ChatGPT or YandexGPT. Our findings suggest that LLM-based services so far fall short of imitating the associative thesauri of different authors. The outcome of our study may be of interest to computational linguists, as well as AI/ML scientists and prompt engineers.
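The sketch below illustrates how the two measures named in the abstract (cosine similarity and Mahalanobis distance) could be computed over association lists, assuming each association word has already been mapped to a dense embedding vector and each list is represented by the mean of its word embeddings; the function names and the NumPy/SciPy implementation are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np
from scipy.spatial.distance import cosine, mahalanobis


def list_embedding(word_vectors):
    """Represent an association list by the mean of its word embeddings
    (mean pooling is an assumption, not necessarily the paper's choice)."""
    return np.mean(np.asarray(word_vectors), axis=0)


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two list-level embeddings."""
    return 1.0 - cosine(vec_a, vec_b)  # scipy's cosine() returns a distance


def mahalanobis_from_group(vec, reference_vectors):
    """Mahalanobis distance of one list embedding from a reference group
    (e.g. all human-produced lists), using the group's inverse covariance."""
    reference = np.asarray(reference_vectors)
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    return mahalanobis(vec, mean, inv_cov)
```

With such helpers, one could compare every pair of human and persona association lists (cosine) and score how far each persona's list lies from the cloud of human lists (Mahalanobis), which is one plausible reading of the comparison described in the abstract.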
Original language: English
Article number: 46
Journal: Big Data and Cognitive Computing
Volume: 9
Issue number: 2
DOI
Status: Published - 18 Feb 2025

ID: 132344285