Improving question answering in programming domain with pretrained language model finetuning using structured diverse online forum data

Standard

Improving question answering in programming domain with pretrained language model finetuning using structured diverse online forum data. / Gorbatovski, A.V.; Razin, A.D.; Aliev, A.A.; Kovalchuk, S.V.

In: Scientific and Technical Journal of Information Technologies, Mechanics and Optics, Vol. 24, No. 6, 2024, p. 1024-1034.

Research output: Contribution to journal › Article › peer-review

Author

Gorbatovski, A.V. ; Razin, A.D. ; Aliev, A.A. ; Kovalchuk, S.V. / Improving question answering in programming domain with pretrained language model finetuning using structured diverse online forum data. In: Scientific and Technical Journal of Information Technologies, Mechanics and Optics. 2024 ; Vol. 24, No. 6. pp. 1024-1034.

BibTeX

@article{5249bd9570ca449b9fe5ef9c5a4a4df5,

title = "Improving question answering in programming domain with pretrained language model finetuning using structured diverse online forum data",

abstract = "Today, Community Question Answering (CQA) forums such as Stack Overflow are becoming an irreplaceable tool for software developers, providing fast and efficient solution search and prompt community response. Although modern Pretrained Language Models (PLMs), including those trained on data from such forums, have the potential to automate answering of domain-specific questions, they often show significant limitations in complex domains such as programming due to the heterogeneity of the domain and the variety of contexts in which questions are asked. In our study, we propose an approach to this problem based on structuring data in a complex domain. The first stage decomposes the available forum data into thematic subsets. Next, for individual topics, models are finetuned using Reinforcement Learning with Human Feedback (RLHF), with the votes available in the forum data serving as the feedback signal. Finally, to manage the ensemble of finetuned models, question classification is used to select the appropriate model for each question. Experimental studies were conducted on a subset of Python-related questions from Stack Overflow, using the Llama 7B model as the base PLM. The results showed that classifying questions improves model performance by up to 22.5 % on the ROUGE metric. Moreover, the inclusion of RLHF resulted in an additional improvement of up to 11.2 %. To validate these results, we performed a human evaluation of the generated responses, which confirmed the effectiveness of our approach. This study shows that by structuring community data and processing implicit feedback, we can significantly improve PLM performance in CQA tasks in complex domains characterized by high heterogeneity, such as software development. {\textcopyright} 2024 Elsevier B.V., All rights reserved.",

keywords = "finetuning, large language models, natural language generation, natural language processing, pretrained language models, question answering, software development",

author = "A.V. Gorbatovski and A.D. Razin and A.A. Aliev and S.V. Kovalchuk",

note = "Export Date: 01 November 2025; Cited By: 0; Correspondence Address: S.V. Kovalchuk; ITMO University, Saint Petersburg, 197101, Russian Federation; email: kovalchuk@itmo.ru",

year = "2024",

doi = "10.17586/2226-1494-2024-24-6-1024-1034",

language = "English",

volume = "24",

pages = "1024--1034",

journal = "Scientific and Technical Journal of Information Technologies, Mechanics and Optics",

issn = "2226-1494",

publisher = "ITMO University",

number = "6",

}

RIS

TY - JOUR

T1 - Improving question answering in programming domain with pretrained language model finetuning using structured diverse online forum data

AU - Gorbatovski, A.V.

AU - Razin, A.D.

AU - Aliev, A.A.

AU - Kovalchuk, S.V.

N1 - Export Date: 01 November 2025; Cited By: 0; Correspondence Address: S.V. Kovalchuk; ITMO University, Saint Petersburg, 197101, Russian Federation; email: kovalchuk@itmo.ru

PY - 2024

Y1 - 2024

N2 - Today, Community Question Answering (CQA) forums such as Stack Overflow are becoming an irreplaceable tool for software developers, providing fast and efficient solution search and prompt community response. Although modern Pretrained Language Models (PLMs), including those trained on data from such forums, have the potential to automate answering of domain-specific questions, they often show significant limitations in complex domains such as programming due to the heterogeneity of the domain and the variety of contexts in which questions are asked. In our study, we propose an approach to this problem based on structuring data in a complex domain. The first stage decomposes the available forum data into thematic subsets. Next, for individual topics, models are finetuned using Reinforcement Learning with Human Feedback (RLHF), with the votes available in the forum data serving as the feedback signal. Finally, to manage the ensemble of finetuned models, question classification is used to select the appropriate model for each question. Experimental studies were conducted on a subset of Python-related questions from Stack Overflow, using the Llama 7B model as the base PLM. The results showed that classifying questions improves model performance by up to 22.5 % on the ROUGE metric. Moreover, the inclusion of RLHF resulted in an additional improvement of up to 11.2 %. To validate these results, we performed a human evaluation of the generated responses, which confirmed the effectiveness of our approach. This study shows that by structuring community data and processing implicit feedback, we can significantly improve PLM performance in CQA tasks in complex domains characterized by high heterogeneity, such as software development. © 2024 Elsevier B.V., All rights reserved.

AB - Today, Community Question Answering (CQA) forums such as Stack Overflow are becoming an irreplaceable tool for software developers, providing fast and efficient solution search and prompt community response. Although modern Pretrained Language Models (PLMs), including those trained on data from such forums, have the potential to automate answering of domain-specific questions, they often show significant limitations in complex domains such as programming due to the heterogeneity of the domain and the variety of contexts in which questions are asked. In our study, we propose an approach to this problem based on structuring data in a complex domain. The first stage decomposes the available forum data into thematic subsets. Next, for individual topics, models are finetuned using Reinforcement Learning with Human Feedback (RLHF), with the votes available in the forum data serving as the feedback signal. Finally, to manage the ensemble of finetuned models, question classification is used to select the appropriate model for each question. Experimental studies were conducted on a subset of Python-related questions from Stack Overflow, using the Llama 7B model as the base PLM. The results showed that classifying questions improves model performance by up to 22.5 % on the ROUGE metric. Moreover, the inclusion of RLHF resulted in an additional improvement of up to 11.2 %. To validate these results, we performed a human evaluation of the generated responses, which confirmed the effectiveness of our approach. This study shows that by structuring community data and processing implicit feedback, we can significantly improve PLM performance in CQA tasks in complex domains characterized by high heterogeneity, such as software development. © 2024 Elsevier B.V., All rights reserved.

KW - finetuning

KW - large language models

KW - natural language generation

KW - natural language processing

KW - pretrained language models

KW - question answering

KW - software development

U2 - 10.17586/2226-1494-2024-24-6-1024-1034

DO - 10.17586/2226-1494-2024-24-6-1024-1034

M3 - Article

VL - 24

SP - 1024

EP - 1034

JO - Scientific and Technical Journal of Information Technologies, Mechanics and Optics

JF - Scientific and Technical Journal of Information Technologies, Mechanics and Optics

SN - 2226-1494

IS - 6

ER -

ID: 143413251
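The final stage described in the abstract, classifying an incoming question and dispatching it to the model finetuned for that topic, can be illustrated with a minimal Python sketch. Everything here is hypothetical: the topic labels, the keyword classifier, and the `RoutedEnsemble` class are illustrative stand-ins, and the paper's actual question classifier and finetuned Llama 7B models are replaced by plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stand-in for a topic-specific finetuned model: any callable
# that maps a question string to a generated answer string.
AnswerModel = Callable[[str], str]


@dataclass
class RoutedEnsemble:
    """Route each question to the model finetuned on its predicted topic."""
    classify: Callable[[str], str]   # question -> topic label
    models: Dict[str, AnswerModel]   # topic label -> finetuned model
    fallback: AnswerModel            # used when no topic model exists

    def answer(self, question: str) -> str:
        topic = self.classify(question)
        # Pick the topic-specific model, or the fallback for unseen topics.
        model = self.models.get(topic, self.fallback)
        return model(question)


def keyword_classifier(question: str) -> str:
    """Toy keyword-based classifier (an assumption, not the paper's classifier)."""
    q = question.lower()
    if any(w in q for w in ("dataframe", "numpy", "pandas")):
        return "data_science"
    if any(w in q for w in ("django", "flask", "http")):
        return "web"
    return "other"


if __name__ == "__main__":
    ensemble = RoutedEnsemble(
        classify=keyword_classifier,
        models={
            "data_science": lambda q: "[data_science model] " + q,
            "web": lambda q: "[web model] " + q,
        },
        fallback=lambda q: "[base model] " + q,
    )
    print(ensemble.answer("How do I merge two pandas DataFrames?"))
```

Keeping a fallback model means a question outside every thematic subset still receives an answer; whether the authors use such a fallback is not stated in this record.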