Authorship attribution of source code: A language-agnostic approach and applicability in software engineering

Egor Bogomolov, Vladimir Kovalenko, Yurii Rebryk, Alberto Bacchelli, Timofey Bryksin

Research output: Publications in books, reports, collections, conference proceedings › Article in conference proceedings › Peer-reviewed

2 Citations (Scopus)

Abstract

Authorship attribution (i.e., determining who is the author of a piece of source code) is an established research topic. State-of-the-art results for the authorship attribution problem look promising for the software engineering field, where they could be applied to detect plagiarized code and prevent legal issues. With this article, we first introduce a new language-agnostic approach to authorship attribution of source code. Then, we discuss limitations of existing synthetic datasets for authorship attribution, and propose a data collection approach that delivers datasets that better reflect aspects important for potential practical use in software engineering. Finally, we demonstrate that high accuracy of authorship attribution models on existing datasets drastically drops when they are evaluated on more realistic data. We outline next steps for the design and evaluation of authorship attribution models that could bring the research efforts closer to practical use for software engineering.

Original language: English
Title of host publication: ESEC/FSE 2021
Subtitle of host publication: Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
Editors: Diomidis Spinellis
Publisher: Association for Computing Machinery
Pages: 932-944
ISBN (electronic): 9781450385626
Status: Published - 20 Aug 2021
Event: 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 - Virtual, Online, Greece
Duration: 23 Aug 2021 to 28 Aug 2021

Conference

Conference: 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Country/Territory: Greece
City: Virtual, Online
Period: 23/08/21 to 28/08/21

Scopus subject areas

  • Artificial Intelligence
  • Software
