Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler

Standard

Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler. / Bryksin, Timofey; Petukhov, Victor; Alexin, Ilya; Prikhodko, Stanislav; Shpilman, Alexey; Kovalenko, Vladimir; Povarov, Nikita.

2020. 455-465.

Research output: Contribution to conference › Paper › peer-review

BibTeX

@conference{f7b209e21a004a2096ed978baddc028b,
title = "Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler",
abstract = "In this work, we apply anomaly detection to source code and bytecode to facilitate the development of a programming language and its compiler. We define anomaly as a code fragment that is different from typical code written in a particular programming language. Identifying such code fragments is beneficial to both language developers and end users, since anomalies may indicate potential issues with the compiler or with runtime performance. Moreover, anomalies could correspond to problems in language design. For this study, we choose Kotlin as the target programming language. We outline and discuss approaches to obtaining vector representations of source code and bytecode and to the detection of anomalies across vectorized code snippets. The paper presents a method that aims to detect two types of anomalies: syntax tree anomalies and so-called compiler-induced anomalies that arise only in the compiled bytecode. We describe several experiments that employ different combinations of vectorization and anomaly detection techniques and discuss types of detected anomalies and their usefulness for language developers. We demonstrate that the extracted anomalies and the underlying extraction technique provide additional value for language development.",
author = "Timofey Bryksin and Victor Petukhov and Ilya Alexin and Stanislav Prikhodko and Alexey Shpilman and Vladimir Kovalenko and Nikita Povarov",
year = "2020",
month = jun,
day = "29",
doi = "10.1145/3379597.3387447",
language = "English",
pages = "455--465",
}

RIS

TY - CONF
T1 - Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler
AU - Bryksin, Timofey
AU - Petukhov, Victor
AU - Alexin, Ilya
AU - Prikhodko, Stanislav
AU - Shpilman, Alexey
AU - Kovalenko, Vladimir
AU - Povarov, Nikita
PY - 2020/6/29
Y1 - 2020/6/29

AB - In this work, we apply anomaly detection to source code and bytecode to facilitate the development of a programming language and its compiler. We define anomaly as a code fragment that is different from typical code written in a particular programming language. Identifying such code fragments is beneficial to both language developers and end users, since anomalies may indicate potential issues with the compiler or with runtime performance. Moreover, anomalies could correspond to problems in language design. For this study, we choose Kotlin as the target programming language. We outline and discuss approaches to obtaining vector representations of source code and bytecode and to the detection of anomalies across vectorized code snippets. The paper presents a method that aims to detect two types of anomalies: syntax tree anomalies and so-called compiler-induced anomalies that arise only in the compiled bytecode. We describe several experiments that employ different combinations of vectorization and anomaly detection techniques and discuss types of detected anomalies and their usefulness for language developers. We demonstrate that the extracted anomalies and the underlying extraction technique provide additional value for language development.

UR - https://www.mendeley.com/catalogue/883ecf03-b516-32b1-ac07-34e60e9d1a49/
UR - https://arxiv.org/abs/2004.01618
UR - http://www.scopus.com/inward/record.url?scp=85093646487&partnerID=8YFLogxK
U2 - 10.1145/3379597.3387447
DO - 10.1145/3379597.3387447
M3 - Paper
SP - 455
EP - 465
ER -
