Standard

Multi-threshold token-based code clone detection. / Golubev, Yaroslav; Poletansky, Viktor; Povarov, Nikita; Bryksin, Timofey.

Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 496-500 9426013.

Research output: Publications in books, reports, collections, conference proceedings > conference paper > research > peer-reviewed

Harvard

Golubev, Y, Poletansky, V, Povarov, N & Bryksin, T 2021, Multi-threshold token-based code clone detection. in Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021, 9426013, Institute of Electrical and Electronics Engineers Inc., pp. 496-500, 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021, Virtual, Honolulu, United States, 9/03/21. https://doi.org/10.1109/SANER50967.2021.00053

APA

Golubev, Y., Poletansky, V., Povarov, N., & Bryksin, T. (2021). Multi-threshold token-based code clone detection. In Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021 (pp. 496-500). [9426013] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SANER50967.2021.00053

Vancouver

Golubev Y, Poletansky V, Povarov N, Bryksin T. Multi-threshold token-based code clone detection. In Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021. Institute of Electrical and Electronics Engineers Inc. 2021. pp. 496-500. 9426013 https://doi.org/10.1109/SANER50967.2021.00053

Author

Golubev, Yaroslav ; Poletansky, Viktor ; Povarov, Nikita ; Bryksin, Timofey. / Multi-threshold token-based code clone detection. Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 496-500

BibTeX

@inproceedings{bd600fdcdfd64af1ab498810728a15cb,
title = "Multi-threshold token-based code clone detection",
abstract = "Clone detection plays an important role in software engineering. Finding clones within a single project introduces possible refactoring opportunities, and between different projects it could be used for detecting code reuse or possible licensing violations. In this paper, we propose a modification to bag-of-tokens based clone detection that allows detecting more clone pairs of greater diversity without losing precision by implementing a multi-threshold search, i.e. conducting the search several times, aimed at different groups of clones. To combat the increase in operation time that this approach brings about, we propose an optimization that allows to significantly decrease the overlap in detected clones between the searches. We evaluate the method for two different popular clone detection tools on two datasets of different sizes. The implementation of the technique allows to increase the number of detected clones by 40.5-56.6% for different datasets. BigCloneBench evaluation also shows that the recall of detecting Strongly Type-3 clones increases from 37.5% to 59.6%.",
keywords = "clone detection, similarity threshold, token based clone detection, token length threshold",
author = "Yaroslav Golubev and Viktor Poletansky and Nikita Povarov and Timofey Bryksin",
note = "Publisher Copyright: {\textcopyright} 2021 IEEE. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.; 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021 ; Conference date: 09-03-2021 Through 12-03-2021",
year = "2021",
month = mar,
doi = "10.1109/SANER50967.2021.00053",
language = "English",
pages = "496--500",
booktitle = "Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

RIS

TY - GEN

T1 - Multi-threshold token-based code clone detection

AU - Golubev, Yaroslav

AU - Poletansky, Viktor

AU - Povarov, Nikita

AU - Bryksin, Timofey

N1 - Publisher Copyright: © 2021 IEEE. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.

PY - 2021/3

Y1 - 2021/3

N2 - Clone detection plays an important role in software engineering. Finding clones within a single project introduces possible refactoring opportunities, and between different projects it could be used for detecting code reuse or possible licensing violations. In this paper, we propose a modification to bag-of-tokens based clone detection that allows detecting more clone pairs of greater diversity without losing precision by implementing a multi-threshold search, i.e. conducting the search several times, aimed at different groups of clones. To combat the increase in operation time that this approach brings about, we propose an optimization that allows to significantly decrease the overlap in detected clones between the searches. We evaluate the method for two different popular clone detection tools on two datasets of different sizes. The implementation of the technique allows to increase the number of detected clones by 40.5-56.6% for different datasets. BigCloneBench evaluation also shows that the recall of detecting Strongly Type-3 clones increases from 37.5% to 59.6%.

AB - Clone detection plays an important role in software engineering. Finding clones within a single project introduces possible refactoring opportunities, and between different projects it could be used for detecting code reuse or possible licensing violations. In this paper, we propose a modification to bag-of-tokens based clone detection that allows detecting more clone pairs of greater diversity without losing precision by implementing a multi-threshold search, i.e. conducting the search several times, aimed at different groups of clones. To combat the increase in operation time that this approach brings about, we propose an optimization that allows to significantly decrease the overlap in detected clones between the searches. We evaluate the method for two different popular clone detection tools on two datasets of different sizes. The implementation of the technique allows to increase the number of detected clones by 40.5-56.6% for different datasets. BigCloneBench evaluation also shows that the recall of detecting Strongly Type-3 clones increases from 37.5% to 59.6%.

KW - clone detection

KW - similarity threshold

KW - token based clone detection

KW - token length threshold

UR - http://www.scopus.com/inward/record.url?scp=85106620484&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/d21ff67e-5953-364c-b0e1-942ee3ab8f48/

U2 - 10.1109/SANER50967.2021.00053

DO - 10.1109/SANER50967.2021.00053

M3 - Conference contribution

AN - SCOPUS:85106620484

SP - 496

EP - 500

BT - Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021

Y2 - 9 March 2021 through 12 March 2021

ER -
