TraceSim › SPbU Researchers Portal

Standard

TraceSim: A method for calculating stack trace similarity. / Vasiliev, Roman; Koznov, Dmitrij; Chernishev, George; Khvorov, Aleksandr; Luciv, Dmitry; Povarov, Nikita.

MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020. ed. / Foutse Khomh; Pasquale Salza; Gemma Catolino. Association for Computing Machinery, 2020. p. 25-30.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Vasiliev, R, Koznov, D, Chernishev, G, Khvorov, A, Luciv, D & Povarov, N 2020, TraceSim: A method for calculating stack trace similarity. in F Khomh, P Salza & G Catolino (eds), MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020. Association for Computing Machinery, pp. 25-30, 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, MaLTeSQuE 2020, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020, Virtual, Online, United States, 13/11/20. https://doi.org/10.1145/3416505.3423561

APA

Vasiliev, R., Koznov, D., Chernishev, G., Khvorov, A., Luciv, D., & Povarov, N. (2020). TraceSim: A method for calculating stack trace similarity. In F. Khomh, P. Salza, & G. Catolino (Eds.), MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020 (pp. 25-30). Association for Computing Machinery. https://doi.org/10.1145/3416505.3423561

Vancouver

Vasiliev R, Koznov D, Chernishev G, Khvorov A, Luciv D, Povarov N. TraceSim: A method for calculating stack trace similarity. In Khomh F, Salza P, Catolino G, editors, MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020. Association for Computing Machinery. 2020. p. 25-30. https://doi.org/10.1145/3416505.3423561

Author

Vasiliev, Roman ; Koznov, Dmitrij ; Chernishev, George ; Khvorov, Aleksandr ; Luciv, Dmitry ; Povarov, Nikita. / TraceSim: A method for calculating stack trace similarity. MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020. editor / Foutse Khomh ; Pasquale Salza ; Gemma Catolino. Association for Computing Machinery, 2020. pp. 25-30.

BibTeX

@inproceedings{287afa4f32d84b939c30a997b5be010f,

title = "{TraceSim}: A method for calculating stack trace similarity",

abstract = "Many contemporary software products have subsystems for automatic crash reporting. However, it is well known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging, however, becomes infeasible for products that have large user bases, which is the reason for many different approaches to automating this task. Moreover, it is important to improve the quality of triaging due to the large volume of reports that need to be processed properly. Therefore, even a relatively small improvement could play a significant role in the overall accuracy of report bucketing. The majority of existing studies use some kind of stack trace similarity metric, either based on information retrieval techniques or string matching methods. However, it should be stressed that the quality of triaging is still insufficient. In this paper, we describe TraceSim, a novel approach to this problem which combines TF-IDF, Levenshtein distance, and machine learning to construct a similarity metric. Our metric has been implemented inside an industrial-grade report triaging system. The evaluation on a manually labeled dataset shows significantly better results compared to baseline approaches.",

keywords = "Automatic Crash Reporting, Automatic Problem Reporting Tools, Crash Report Deduplication, Crash Reports, Crash Stack, Deduplication, Duplicate Bug Report, Duplicate Crash Report, Information Retrieval, Software Engineering, Software Repositories, Stack Trace",

author = "Roman Vasiliev and Dmitrij Koznov and George Chernishev and Aleksandr Khvorov and Dmitry Luciv and Nikita Povarov",

note = "Publisher Copyright: {\textcopyright} 2020 ACM. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.; 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, MaLTeSQuE 2020, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020 ; Conference date: 13-11-2020",

year = "2020",

month = nov,

day = "13",

doi = "10.1145/3416505.3423561",

language = "English",

series = "MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020",

publisher = "Association for Computing Machinery",

pages = "25--30",

editor = "Foutse Khomh and Pasquale Salza and Gemma Catolino",

booktitle = "MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020",

address = "United States",

}
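The abstract states that TraceSim combines TF-IDF, Levenshtein distance, and machine learning into a stack trace similarity metric, but this record does not give the exact formulas. The Python block below is a hypothetical, simplified sketch of that general idea, not the authors' implementation. Assumptions: a trace is a list of frame names with the top frame first; each frame's weight is a smoothed IDF computed over a corpus of traces, divided by `(position + 1) ** alpha` so that top frames count more; inserting or deleting a frame costs its weight, substituting costs the sum of both weights, and matching identical frames is free; the distance is normalized by the total weight. All function names and the parameter `alpha` are illustrative inventions, and the machine-learning step that the abstract mentions (presumably fitting parameters to labeled data) is not modeled here.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Trace = Sequence[str]  # hypothetical representation: frame names, top frame first


def smoothed_idf(corpus: List[Trace]) -> Dict[str, float]:
    """Smoothed IDF per frame: ln((1 + N) / (1 + df)) + 1, always >= 1."""
    n = len(corpus)
    df = Counter()
    for trace in corpus:
        df.update(set(trace))  # count each frame at most once per trace
    return {frame: math.log((n + 1) / (count + 1)) + 1.0 for frame, count in df.items()}


def frame_weights(trace: Trace, idf: Dict[str, float], n_docs: int,
                  alpha: float = 1.0) -> List[float]:
    """Assumed weighting: IDF (global rarity) times positional decay (top frames matter more)."""
    unseen = math.log(n_docs + 1) + 1.0  # smoothed IDF of a frame absent from the corpus
    return [idf.get(f, unseen) / (pos + 1) ** alpha for pos, f in enumerate(trace)]


def weighted_levenshtein(a: Trace, b: Trace, wa: List[float], wb: List[float]) -> float:
    """Edit distance in which inserting or deleting a frame costs that frame's weight."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + wa[i - 1]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + wb[j - 1]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Identical frames match for free; otherwise substitution = delete + insert.
            sub = 0.0 if a[i - 1] == b[j - 1] else wa[i - 1] + wb[j - 1]
            d[i][j] = min(d[i - 1][j] + wa[i - 1],   # delete a[i-1]
                          d[i][j - 1] + wb[j - 1],   # insert b[j-1]
                          d[i - 1][j - 1] + sub)     # match or substitute
    return d[m][n]


def trace_similarity(a: Trace, b: Trace, corpus: List[Trace], alpha: float = 1.0) -> float:
    """Similarity in [0, 1]: 1 minus the weighted distance divided by the total weight."""
    idf = smoothed_idf(corpus)
    wa = frame_weights(a, idf, len(corpus), alpha)
    wb = frame_weights(b, idf, len(corpus), alpha)
    total = sum(wa) + sum(wb)
    if total == 0.0:
        return 1.0  # both traces are empty
    return 1.0 - weighted_levenshtein(a, b, wa, wb) / total


if __name__ == "__main__":
    corpus = [["X", "Y", "Z"], ["X", "P", "Q"], ["R", "S", "Z"]]
    query = corpus[0]
    print(trace_similarity(query, corpus[1], corpus))  # shares the top frame
    print(trace_similarity(query, corpus[2], corpus))  # shares only the bottom frame
```

Because substitution costs the same as a deletion plus an insertion, the worst-case distance equals the total weight of both traces, so the normalized similarity stays in [0, 1]: 1 for identical traces and 0 for traces with no frames in common. With the positional decay, a shared top frame raises the score more than a shared bottom frame.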

RIS

TY  - GEN

T1  - TraceSim: A method for calculating stack trace similarity

T2  - 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, MaLTeSQuE 2020, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020

AU  - Vasiliev, Roman

AU  - Koznov, Dmitrij

AU  - Chernishev, George

AU  - Khvorov, Aleksandr

AU  - Luciv, Dmitry

AU  - Povarov, Nikita

PY  - 2020/11/13

Y1  - 2020/11/13

N2  - Many contemporary software products have subsystems for automatic crash reporting. However, it is well known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging, however, becomes infeasible for products that have large user bases, which is the reason for many different approaches to automating this task. Moreover, it is important to improve the quality of triaging due to the large volume of reports that need to be processed properly. Therefore, even a relatively small improvement could play a significant role in the overall accuracy of report bucketing. The majority of existing studies use some kind of stack trace similarity metric, either based on information retrieval techniques or string matching methods. However, it should be stressed that the quality of triaging is still insufficient. In this paper, we describe TraceSim, a novel approach to this problem which combines TF-IDF, Levenshtein distance, and machine learning to construct a similarity metric. Our metric has been implemented inside an industrial-grade report triaging system. The evaluation on a manually labeled dataset shows significantly better results compared to baseline approaches.

AB  - Many contemporary software products have subsystems for automatic crash reporting. However, it is well known that the same bug can produce slightly different reports. To manage this problem, reports are usually grouped, often manually by developers. Manual triaging, however, becomes infeasible for products that have large user bases, which is the reason for many different approaches to automating this task. Moreover, it is important to improve the quality of triaging due to the large volume of reports that need to be processed properly. Therefore, even a relatively small improvement could play a significant role in the overall accuracy of report bucketing. The majority of existing studies use some kind of stack trace similarity metric, either based on information retrieval techniques or string matching methods. However, it should be stressed that the quality of triaging is still insufficient. In this paper, we describe TraceSim, a novel approach to this problem which combines TF-IDF, Levenshtein distance, and machine learning to construct a similarity metric. Our metric has been implemented inside an industrial-grade report triaging system. The evaluation on a manually labeled dataset shows significantly better results compared to baseline approaches.

KW  - Automatic Crash Reporting

KW  - Automatic Problem Reporting Tools

KW  - Crash Report Deduplication

KW  - Crash Reports

KW  - Crash Stack

KW  - Deduplication

KW  - Duplicate Bug Report

KW  - Duplicate Crash Report

KW  - Information Retrieval

KW  - Software Engineering

KW  - Software Repositories

KW  - Stack Trace

UR  - http://www.scopus.com/inward/record.url?scp=85097287752&partnerID=8YFLogxK

U2  - 10.1145/3416505.3423561

DO  - 10.1145/3416505.3423561

M3  - Conference contribution

AN  - SCOPUS:85097287752

T3  - MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020

SP  - 25

EP  - 30

BT  - MaLTeSQuE 2020 - Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation, Co-located with ESEC/FSE 2020

A2  - Khomh, Foutse

A2  - Salza, Pasquale

A2  - Catolino, Gemma

PB  - Association for Computing Machinery

Y2  - 13 November 2020

ER  -

ID: 76331749