PathMiner: a library for mining of path-based representations of code

Vladimir Kovalenko, Egor Bogomolov, Timofey Bryksin, Alberto Bacchelli

Research output

Abstract

One recent, significant advance in modeling source code for machine learning algorithms has been the introduction of path-based representations, an approach that represents a snippet of code as a collection of paths from its syntax tree. Such a representation efficiently captures the structure of code, which in turn carries its semantics and other information. Building a path-based representation involves parsing the code and extracting the paths from its syntax tree; together, these steps amount to a substantial technical effort. With no common, reusable toolkit available for this task, the burden of mining diverts researchers' focus from their essential work and hinders newcomers to the field of machine learning on code. In this paper, we present PathMiner, an open-source library for mining path-based representations of code. PathMiner is fast, flexible, well-tested, and easily extensible to support input code in any common programming language.
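The two steps the abstract describes, parsing code and extracting paths from its syntax tree, can be illustrated with a minimal sketch. It is not PathMiner's API or implementation: it uses Python's standard `ast` module as a stand-in parser, and the `Node` class, the `from_python_ast`, `leaves_with_ancestry` and `extract_path_contexts` functions, and the `max_length` limit are hypothetical names chosen for illustration. Each extracted item is a triple (start token, path of node types going up from one leaf to the lowest common ancestor and down to another leaf, end token), in the style of the path-contexts popularized by code2vec.

```python
import ast
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical language-agnostic syntax tree node (illustration only)."""
    type_label: str
    token: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def from_python_ast(node: ast.AST) -> Node:
    """Convert a Python `ast` node into a Node, dropping Load/Store context markers."""
    children = [
        from_python_ast(child)
        for child in ast.iter_child_nodes(node)
        if not isinstance(child, ast.expr_context)
    ]
    if isinstance(node, ast.Name):
        token = node.id
    elif isinstance(node, ast.Constant):
        token = repr(node.value)
    elif not children:
        token = type(node).__name__  # e.g. operator leaves such as Add
    else:
        token = None
    return Node(type(node).__name__, token, children)


def leaves_with_ancestry(root: Node) -> List[Tuple[Node, List[Node]]]:
    """Return each leaf with its root-to-leaf node list, in left-to-right order."""
    result = []
    stack = [(root, [root])]
    while stack:
        node, ancestry = stack.pop()
        if not node.children:
            result.append((node, ancestry))
        else:
            # Push children in reverse so the leftmost child is popped first.
            for child in reversed(node.children):
                stack.append((child, ancestry + [child]))
    return result


def extract_path_contexts(root: Node, max_length: int = 8) -> List[Tuple[str, str, str]]:
    """Build (start token, path, end token) triples for every pair of leaves.

    Paths with more than `max_length` edges are skipped (hypothetical limit).
    """
    contexts = []
    leaves = leaves_with_ancestry(root)
    for (start, start_anc), (end, end_anc) in combinations(leaves, 2):
        # Length of the shared root prefix; its last node is the lowest common ancestor.
        shared = 0
        limit = min(len(start_anc), len(end_anc))
        while shared < limit and start_anc[shared] is end_anc[shared]:
            shared += 1
        up = start_anc[shared - 1:][::-1]  # start leaf ... common ancestor
        down = end_anc[shared:]            # child of common ancestor ... end leaf
        if len(up) - 1 + len(down) > max_length:
            continue
        path = "↑".join(n.type_label for n in up)
        path += "".join("↓" + n.type_label for n in down)
        contexts.append((start.token, path, end.token))
    return contexts


tree = from_python_ast(ast.parse("x = y + 1"))
contexts = extract_path_contexts(tree)
for context in contexts:
    print(context)  # first line: ('x', 'Name↑Assign↓BinOp↓Name', 'y')
```

For `x = y + 1` the sketch yields six path-contexts, one per pair of the four leaves (`x`, `y`, `Add`, `1`). A length cap such as `max_length` matters in practice because the number of leaf pairs grows quadratically with the number of leaves; with `max_length=2`, only the three short paths inside the `BinOp` subtree remain.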
Original language: English
Title of host publication: Proceedings of the 16th International Conference on Mining Software Repositories
Publisher: IEEE Computer Society
Pages: 13-17
Volume: 2019
ISBN (Print): 9781728134123
Publication status: Published - May 2019
Event: 16th International Conference on Mining Software Repositories - Montreal
Duration: 26 May 2019 to 27 May 2019

Conference

Conference: 16th International Conference on Mining Software Repositories
Abbreviated title: MSR 2019
Country: Canada
City: Montreal
Period: 26/05/19 to 27/05/19

Cite this

Kovalenko, V., Bogomolov, E., Bryksin, T., & Bacchelli, A. (2019). PathMiner: a library for mining of path-based representations of code. In Proceedings of the 16th International Conference on Mining Software Repositories (Vol. 2019, pp. 13-17). [8816777] IEEE Computer Society.
Kovalenko, Vladimir ; Bogomolov, Egor ; Bryksin, Timofey ; Bacchelli, Alberto. / PathMiner: a library for mining of path-based representations of code. Proceedings of the 16th International Conference on Mining Software Repositories. Vol. 2019. IEEE Computer Society, 2019. pp. 13-17
@inbook{5c6d0892dd7948f6978db97f1eac4a4f,
title = "PathMiner: a library for mining of path-based representations of code",
author = "Vladimir Kovalenko and Egor Bogomolov and Timofey Bryksin and Alberto Bacchelli",
year = "2019",
month = "5",
language = "English",
isbn = "9781728134123",
volume = "2019",
pages = "13--17",
booktitle = "Proceedings of the 16th International Conference on Mining Software Repositories",
publisher = "IEEE Computer Society",
address = "United States",

}

Kovalenko, V, Bogomolov, E, Bryksin, T & Bacchelli, A 2019, PathMiner: a library for mining of path-based representations of code. in Proceedings of the 16th International Conference on Mining Software Repositories. vol. 2019, 8816777, IEEE Computer Society, pp. 13-17, Montreal, 26/05/19.

PathMiner: a library for mining of path-based representations of code. / Kovalenko, Vladimir; Bogomolov, Egor; Bryksin, Timofey; Bacchelli, Alberto.

Proceedings of the 16th International Conference on Mining Software Repositories. Vol. 2019. IEEE Computer Society, 2019. p. 13-17 8816777.

TY  - CHAP
T1  - PathMiner: a library for mining of path-based representations of code
AU  - Kovalenko, Vladimir
AU  - Bogomolov, Egor
AU  - Bryksin, Timofey
AU  - Bacchelli, Alberto
PY  - 2019/5
Y1  - 2019/5
UR  - https://2019.msrconf.org/details/msr-2019-papers/38/PathMiner-A-Library-for-Mining-of-Path-Based-Representations-of-Code
M3  - Article in an anthology
SN  - 9781728134123
VL  - 2019
SP  - 13
EP  - 17
BT  - Proceedings of the 16th International Conference on Mining Software Repositories
PB  - IEEE Computer Society
ER  -

Kovalenko V, Bogomolov E, Bryksin T, Bacchelli A. PathMiner: a library for mining of path-based representations of code. In Proceedings of the 16th International Conference on Mining Software Repositories. Vol. 2019. IEEE Computer Society. 2019. p. 13-17. 8816777