Research output: Publications in books, reports, collections, conference proceedings › article in conference proceedings › scientific › peer-reviewed
Multi-threshold token-based code clone detection. / Golubev, Yaroslav; Poletansky, Viktor; Povarov, Nikita; Bryksin, Timofey.
Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021. Institute of Electrical and Electronics Engineers Inc., 2021. pp. 496-500 9426013.
TY - GEN
T1 - Multi-threshold token-based code clone detection
AU - Golubev, Yaroslav
AU - Poletansky, Viktor
AU - Povarov, Nikita
AU - Bryksin, Timofey
N1 - Publisher Copyright: © 2021 IEEE. Copyright: Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2021/3
Y1 - 2021/3
N2 - Clone detection plays an important role in software engineering. Finding clones within a single project reveals possible refactoring opportunities, and finding them between different projects can be used to detect code reuse or possible licensing violations. In this paper, we propose a modification to bag-of-tokens based clone detection that detects more clone pairs of greater diversity without losing precision by implementing a multi-threshold search, i.e. conducting the search several times, each aimed at a different group of clones. To combat the increase in operation time that this approach brings about, we propose an optimization that significantly decreases the overlap in detected clones between the searches. We evaluate the method for two different popular clone detection tools on two datasets of different sizes. The implementation of the technique increases the number of detected clones by 40.5-56.6% for different datasets. BigCloneBench evaluation also shows that the recall of detecting Strongly Type-3 clones increases from 37.5% to 59.6%.
KW - clone detection
KW - similarity threshold
KW - token based clone detection
KW - token length threshold
UR - http://www.scopus.com/inward/record.url?scp=85106620484&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/d21ff67e-5953-364c-b0e1-942ee3ab8f48/
U2 - 10.1109/SANER50967.2021.00053
DO - 10.1109/SANER50967.2021.00053
M3 - Conference contribution
AN - SCOPUS:85106620484
SP - 496
EP - 500
BT - Proceedings - 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021
Y2 - 9 March 2021 through 12 March 2021
ER -
ID: 78246110
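
Illustrative sketch (not the authors' implementation): the abstract describes running a bag-of-tokens search several times, each run aimed at a different group of clones. The Python code below shows one way such a multi-threshold search could look. Everything in it is an assumption made for illustration: the function names, the similarity measure (shared tokens divided by the size of the larger bag), the shape of a pass as a (minimum token length, minimum similarity) pair, and the concrete threshold values. Skipping pairs that an earlier pass already reported is only a naive stand-in for the overlap-reducing optimization the abstract mentions, whose details are not given in this record.

```python
from collections import Counter
from itertools import combinations


def bag_similarity(a: Counter, b: Counter) -> float:
    """Shared tokens (multiset intersection) divided by the larger bag size."""
    larger = max(sum(a.values()), sum(b.values()))
    if larger == 0:
        return 0.0
    return sum((a & b).values()) / larger


def multi_threshold_search(fragments, passes):
    """Run one search per (min_tokens, min_similarity) pass and merge the results.

    fragments: dict mapping a fragment name to its list of tokens.
    passes: hypothetical configuration, e.g. a strict threshold for all
            fragments followed by a looser one for longer fragments only.
    """
    bags = {name: Counter(tokens) for name, tokens in fragments.items()}
    clones = set()
    for min_tokens, min_similarity in passes:
        # Only fragments that are long enough take part in this pass.
        group = sorted(n for n, bag in bags.items() if sum(bag.values()) >= min_tokens)
        for left, right in combinations(group, 2):
            if (left, right) in clones:
                continue  # already reported by an earlier pass
            if bag_similarity(bags[left], bags[right]) >= min_similarity:
                clones.add((left, right))
    return clones


fragments = {
    "a": "int x = 0 ;".split(),
    "b": "int y = 0 ;".split(),
    "c": "for ( int i = 0 ; i < n ; i ++ ) sum += a [ i ] ;".split(),
    "d": "for ( int j = 0 ; j < n ; j ++ ) total += a [ j ] ;".split(),
}

single = multi_threshold_search(fragments, [(1, 0.9)])
multi = multi_threshold_search(fragments, [(1, 0.9), (8, 0.7)])
print(single)  # set()
print(multi)   # {('c', 'd')}
```

In this toy input, the strict pass alone finds nothing, while the added looser pass restricted to longer fragments reports the renamed loop pair c/d. The short pair a/b (similarity 0.8) stays unreported, which illustrates why a single loose threshold for all fragment lengths may be undesirable.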