Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review
Unsupervised learning of general-purpose embeddings for code changes. / Pravilov, Mikhail; Bogomolov, Egor; Golubev, Yaroslav; Bryksin, Timofey.
MaLTESQuE 2021 - Proceedings of the 5th International Workshop on Machine Learning Techniques for Software Quality Evolution, co-located with ESEC/FSE 2021. ed. / Apostolos Ampatzoglou; Daniel Feitosa; Gemma Catolino; Valentina Lenarduzzi. Association for Computing Machinery, 2021. p. 7-12.
TY - GEN
T1 - Unsupervised learning of general-purpose embeddings for code changes
AU - Pravilov, Mikhail
AU - Bogomolov, Egor
AU - Golubev, Yaroslav
AU - Bryksin, Timofey
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/8/23
Y1 - 2021/8/23
N2 - Applying machine learning to tasks that operate with code changes requires a numerical representation of those changes. In this work, we propose an approach for obtaining such representations during pre-training and evaluate them on two downstream tasks: applying changes to code and commit message generation. During pre-training, the model learns to correctly apply a given code change. This task requires only the code changes themselves, which makes it unsupervised. In the task of applying code changes, our model outperforms baseline models by 5.9 percentage points in accuracy. In commit message generation, our model performed on par with supervised models trained specifically for this task, which indicates that it encodes code changes well and could be improved further by pre-training on a larger dataset of easily gathered code changes.
AB - Applying machine learning to tasks that operate with code changes requires a numerical representation of those changes. In this work, we propose an approach for obtaining such representations during pre-training and evaluate them on two downstream tasks: applying changes to code and commit message generation. During pre-training, the model learns to correctly apply a given code change. This task requires only the code changes themselves, which makes it unsupervised. In the task of applying code changes, our model outperforms baseline models by 5.9 percentage points in accuracy. In commit message generation, our model performed on par with supervised models trained specifically for this task, which indicates that it encodes code changes well and could be improved further by pre-training on a larger dataset of easily gathered code changes.
KW - Code changes
KW - Commit message generation
KW - Unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85113878057&partnerID=8YFLogxK
UR - https://www.mendeley.com/catalogue/39d303e5-f613-3b94-b7c1-fb3fb9b616be/
U2 - 10.1145/3472674.3473979
DO - 10.1145/3472674.3473979
M3 - Conference contribution
AN - SCOPUS:85113878057
SP - 7
EP - 12
BT - MaLTESQuE 2021 - Proceedings of the 5th International Workshop on Machine Learning Techniques for Software Quality Evolution, co-located with ESEC/FSE 2021
A2 - Ampatzoglou, Apostolos
A2 - Feitosa, Daniel
A2 - Catolino, Gemma
A2 - Lenarduzzi, Valentina
PB - Association for Computing Machinery
T2 - 5th International Workshop on Machine Learning Techniques for Software Quality Evolution, MaLTESQuE 2021, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Y2 - 23 August 2021
ER -