Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
TraceSim: A method for calculating stack trace similarity. / Vasiliev, Roman; Koznov, Dmitrij; Chernishev, George; Khvorov, Aleksandr; Luciv, Dmitry; Povarov, Nikita.
MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020. ed. / Foutse Khomh; Pasquale Salza; Gemma Catolino. Association for Computing Machinery, 2020. p. 25-30 (MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020).
TY - GEN
T1 - TraceSim
T2 - 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, MaLTeSQuE 2020, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020
AU - Vasiliev, Roman
AU - Koznov, Dmitrij
AU - Chernishev, George
AU - Khvorov, Aleksandr
AU - Luciv, Dmitry
AU - Povarov, Nikita
N1 - Publisher Copyright: © 2020 ACM. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/11/13
Y1 - 2020/11/13
N2 - Many contemporary software products have subsystems for automatic crash reporting. However, it is well known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging, however, becomes infeasible for products with large user bases, which has motivated many different approaches to automating this task. Moreover, because a large volume of reports must be processed correctly, improving the quality of triaging is important. Therefore, even a relatively small improvement could play a significant role in the overall accuracy of report bucketing. The majority of existing studies use some kind of stack trace similarity metric, based either on information retrieval techniques or on string matching methods. However, it should be stressed that the quality of triaging is still insufficient. In this paper, we describe TraceSim, a novel approach to this problem that combines TF-IDF, Levenshtein distance, and machine learning to construct a similarity metric. Our metric has been implemented inside an industrial-grade report triaging system. Evaluation on a manually labeled dataset shows significantly better results compared to baseline approaches.
AB - Many contemporary software products have subsystems for automatic crash reporting. However, it is well known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging, however, becomes infeasible for products with large user bases, which has motivated many different approaches to automating this task. Moreover, because a large volume of reports must be processed correctly, improving the quality of triaging is important. Therefore, even a relatively small improvement could play a significant role in the overall accuracy of report bucketing. The majority of existing studies use some kind of stack trace similarity metric, based either on information retrieval techniques or on string matching methods. However, it should be stressed that the quality of triaging is still insufficient. In this paper, we describe TraceSim, a novel approach to this problem that combines TF-IDF, Levenshtein distance, and machine learning to construct a similarity metric. Our metric has been implemented inside an industrial-grade report triaging system. Evaluation on a manually labeled dataset shows significantly better results compared to baseline approaches.
KW - Automatic Crash Reporting
KW - Automatic Problem Reporting Tools
KW - Crash Report Deduplication
KW - Crash Reports
KW - Crash Stack
KW - Deduplication
KW - Duplicate Bug Report
KW - Duplicate Crash Report
KW - Information Retrieval
KW - Software Engineering
KW - Software Repositories
KW - Stack Trace
UR - http://www.scopus.com/inward/record.url?scp=85097287752&partnerID=8YFLogxK
U2 - 10.1145/3416505.3423561
DO - 10.1145/3416505.3423561
M3 - Conference contribution
AN - SCOPUS:85097287752
T3 - MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020
SP - 25
EP - 30
BT - MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020
A2 - Khomh, Foutse
A2 - Salza, Pasquale
A2 - Catolino, Gemma
PB - Association for Computing Machinery
Y2 - 13 November 2020
ER -
ID: 76331749