Applying machine learning to tasks that operate on code changes requires a numerical representation of those changes. In this work, we propose an approach for obtaining such representations during pre-training and evaluate them on two downstream tasks: applying changes to code and commit message generation. During pre-training, the model learns to apply a given code change correctly. This task requires only the code changes themselves, which makes it unsupervised. On the task of applying code changes, our model outperforms baseline models by 5.9 percentage points in accuracy. On commit message generation, our model matches the results of supervised models trained specifically for that task, which indicates that it encodes code changes well and could be improved further by pre-training on a larger dataset of easily gathered code changes.

Original language: English
Title of host publication: MaLTESQuE 2021 - Proceedings of the 5th International Workshop on Machine Learning Techniques for Software Quality Evolution, co-located with ESEC/FSE 2021
Editors: Apostolos Ampatzoglou, Daniel Feitosa, Gemma Catolino, Valentina Lenarduzzi
Publisher: Association for Computing Machinery
Pages: 7-12
Number of pages: 6
ISBN (Electronic): 9781450386258
State: Published - 23 Aug 2021
Event: 5th International Workshop on Machine Learning Techniques for Software Quality Evolution, MaLTESQuE 2021, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 - Virtual, Online, Greece
Duration: 23 Aug 2021 → …

Conference

Conference: 5th International Workshop on Machine Learning Techniques for Software Quality Evolution, MaLTESQuE 2021, co-located with the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021
Country/Territory: Greece
City: Virtual, Online
Period: 23/08/21 → …

    Research areas

  • Code changes, Commit message generation, Unsupervised learning

    Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Software
  • Safety, Risk, Reliability and Quality

ID: 87612403