Standard

A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub. / Golubev, Yaroslav; Eliseeva, Maria; Povarov, Nikita; Bryksin, Timofey.

2020. 54-64.

Research output: Contribution to conference › Paper › peer-review

BibTeX

@conference{05ca02b91f6b4f50b49850526eae9043,
title = "A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub",
abstract = "With an ever-increasing amount of open source software, the popularity of services like GitHub that facilitate code reuse, and common misconceptions about the licensing of open source software, the problem of license violations in the code is getting more and more prominent. In this study, we compile an extensive corpus of popular Java projects from GitHub, search it for code clones and perform an original analysis of possible code borrowing and license violations on the level of code fragments. We chose Java as a language because of its popularity in industry, where the plagiarism problem is especially relevant because of possible legal action. We analyze and discuss distribution of 94 different discovered and manually evaluated licenses in files and projects, differences in the licensing of files, distribution of potential code borrowing between licenses, various types of possible license violations, most violated licenses, etc. Studying possible license violations in specific blocks of code, we have discovered that 29.6% of them might be involved in potential code borrowing and 9.4% of them could potentially violate original licenses.",
author = "Yaroslav Golubev and Maria Eliseeva and Nikita Povarov and Timofey Bryksin",
year = "2020",
month = jun,
day = "29",
doi = "10.1145/3379597.3387455",
language = "English",
pages = "54--64",

}

RIS

TY - CONF

T1 - A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub

AU - Golubev, Yaroslav

AU - Eliseeva, Maria

AU - Povarov, Nikita

AU - Bryksin, Timofey

PY - 2020/6/29

Y1 - 2020/6/29

N2 - With an ever-increasing amount of open source software, the popularity of services like GitHub that facilitate code reuse, and common misconceptions about the licensing of open source software, the problem of license violations in the code is getting more and more prominent. In this study, we compile an extensive corpus of popular Java projects from GitHub, search it for code clones and perform an original analysis of possible code borrowing and license violations on the level of code fragments. We chose Java as a language because of its popularity in industry, where the plagiarism problem is especially relevant because of possible legal action. We analyze and discuss distribution of 94 different discovered and manually evaluated licenses in files and projects, differences in the licensing of files, distribution of potential code borrowing between licenses, various types of possible license violations, most violated licenses, etc. Studying possible license violations in specific blocks of code, we have discovered that 29.6% of them might be involved in potential code borrowing and 9.4% of them could potentially violate original licenses.

UR - https://www.mendeley.com/catalogue/2f3100a7-c9c4-3a6e-b187-0588eccad44b/

UR - https://arxiv.org/abs/2002.05237

UR - http://www.scopus.com/inward/record.url?scp=85093658918&partnerID=8YFLogxK

U2 - 10.1145/3379597.3387455

DO - 10.1145/3379597.3387455

M3 - Paper

SP - 54

EP - 64

ER -

ID: 64762105