Standard

Authorship attribution of source code: A language-agnostic approach and applicability in software engineering. / Bogomolov, Egor; Kovalenko, Vladimir; Rebryk, Yurii; Bacchelli, Alberto; Bryksin, Timofey.

ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ed. / Diomidis Spinellis. Association for Computing Machinery, 2021. p. 932-944.

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Research; peer-review

Harvard

Bogomolov, E, Kovalenko, V, Rebryk, Y, Bacchelli, A & Bryksin, T 2021, Authorship attribution of source code: A language-agnostic approach and applicability in software engineering. in D Spinellis (ed.), ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery, pp. 932-944, 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021, Virtual, Online, Greece, 23/08/21. https://doi.org/10.1145/3468264.3468606

APA

Bogomolov, E., Kovalenko, V., Rebryk, Y., Bacchelli, A., & Bryksin, T. (2021). Authorship attribution of source code: A language-agnostic approach and applicability in software engineering. In D. Spinellis (Ed.), ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 932-944). Association for Computing Machinery. https://doi.org/10.1145/3468264.3468606

Vancouver

Bogomolov E, Kovalenko V, Rebryk Y, Bacchelli A, Bryksin T. Authorship attribution of source code: A language-agnostic approach and applicability in software engineering. In Spinellis D, editor, ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Association for Computing Machinery. 2021. p. 932-944. https://doi.org/10.1145/3468264.3468606

Author

Bogomolov, Egor; Kovalenko, Vladimir; Rebryk, Yurii; Bacchelli, Alberto; Bryksin, Timofey. / Authorship attribution of source code: A language-agnostic approach and applicability in software engineering. ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering. editor / Diomidis Spinellis. Association for Computing Machinery, 2021. pp. 932-944

BibTeX

@inproceedings{8405f15d73fa4faca187f5845199b805,
title = "Authorship attribution of source code: A language-agnostic approach and applicability in software engineering",
abstract = "Authorship attribution (i.e., determining who is the author of a piece of source code) is an established research topic. State-of-the-art results for the authorship attribution problem look promising for the software engineering field, where they could be applied to detect plagiarized code and prevent legal issues. With this article, we first introduce a new language-agnostic approach to authorship attribution of source code. Then, we discuss limitations of existing synthetic datasets for authorship attribution, and propose a data collection approach that delivers datasets that better reflect aspects important for potential practical use in software engineering. Finally, we demonstrate that high accuracy of authorship attribution models on existing datasets drastically drops when they are evaluated on more realistic data. We outline next steps for the design and evaluation of authorship attribution models that could bring the research efforts closer to practical use for software engineering.",
keywords = "Copyrights, Machine learning, Methods of data collection, Security, Software maintenance, Software process",
author = "Egor Bogomolov and Vladimir Kovalenko and Yurii Rebryk and Alberto Bacchelli and Timofey Bryksin",
note = "Publisher Copyright: {\textcopyright} 2021 ACM.; 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 ; Conference date: 23-08-2021 Through 28-08-2021",
year = "2021",
month = aug,
day = "20",
doi = "10.1145/3468264.3468606",
language = "English",
pages = "932--944",
editor = "Diomidis Spinellis",
booktitle = "ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering",
publisher = "Association for Computing Machinery",
address = "United States",
}

RIS

TY - GEN

T1 - Authorship attribution of source code

T2 - 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021

AU - Bogomolov, Egor

AU - Kovalenko, Vladimir

AU - Rebryk, Yurii

AU - Bacchelli, Alberto

AU - Bryksin, Timofey

N1 - Publisher Copyright: © 2021 ACM.

PY - 2021/8/20

Y1 - 2021/8/20

AB - Authorship attribution (i.e., determining who is the author of a piece of source code) is an established research topic. State-of-the-art results for the authorship attribution problem look promising for the software engineering field, where they could be applied to detect plagiarized code and prevent legal issues. With this article, we first introduce a new language-agnostic approach to authorship attribution of source code. Then, we discuss limitations of existing synthetic datasets for authorship attribution, and propose a data collection approach that delivers datasets that better reflect aspects important for potential practical use in software engineering. Finally, we demonstrate that high accuracy of authorship attribution models on existing datasets drastically drops when they are evaluated on more realistic data. We outline next steps for the design and evaluation of authorship attribution models that could bring the research efforts closer to practical use for software engineering.

KW - Copyrights

KW - Machine learning

KW - Methods of data collection

KW - Security

KW - Software maintenance

KW - Software process

UR - http://www.scopus.com/inward/record.url?scp=85116219662&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/0989a8f9-3975-32a6-9fda-236b1d4f1142/

U2 - 10.1145/3468264.3468606

DO - 10.1145/3468264.3468606

M3 - Conference contribution

AN - SCOPUS:85116219662

SP - 932

EP - 944

BT - ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering

A2 - Spinellis, Diomidis

PB - Association for Computing Machinery

Y2 - 23 August 2021 through 28 August 2021

ER -
