Research output: Contribution to journal › Article › peer-review
Improving question answering in programming domain with pretrained language model finetuning using structured diverse online forum data. / Gorbatovski, A.V.; Razin, A.D.; Aliev, A.A.; Kovalchuk, S.V.
In: Scientific and Technical Journal of Information Technologies, Mechanics and Optics, Vol. 24, No. 6, 2024, p. 1024-1034.
TY - JOUR
T1 - Improving question answering in programming domain with pretrained language model finetuning using structured diverse online forum data
AU - Gorbatovski, A.V.
AU - Razin, A.D.
AU - Aliev, A.A.
AU - Kovalchuk, S.V.
N1 - Export Date: 01 November 2025; Cited By: 0; Correspondence Address: S.V. Kovalchuk; ITMO University, Saint Petersburg, 197101, Russian Federation; email: kovalchuk@itmo.ru
PY - 2024
Y1 - 2024
AB - Today, Community Question Answering (CQA) forums such as Stack Overflow are becoming an indispensable tool for software developers, offering fast and efficient solution search and prompt community response. Although modern Pretrained Language Models (PLMs), whose training data also includes such forums, have the potential to automate answering domain-specific questions, they often show significant limitations in complex domains such as programming because of the heterogeneity of the domain and the variety of contexts in which questions are asked. In this study, we propose an approach to this problem based on structuring data in a complex domain. In the first stage, the available forum data is decomposed into thematic subsets. Next, a model is finetuned for each topic using Reinforcement Learning with Human Feedback (RLHF), with the votes available in the forum data serving as the feedback signal. Finally, to manage the ensemble of finetuned models, each question is classified and then routed to the appropriate model. Experiments were conducted on a subset of Python-related questions from Stack Overflow, using the Llama 7B model as the base PLM. The results showed that classifying questions improves model performance by up to +22.5 % on the ROUGE metric. Moreover, including RLHF yielded an additional improvement of up to +11.2 %. To validate these results, we performed a human evaluation of the generated responses, which confirmed the effectiveness of our approach. This study shows that structuring community data and processing implicit feedback can significantly improve PLM performance on CQA tasks in complex, highly heterogeneous domains such as software development. © 2024 Elsevier B.V., All rights reserved.
KW - finetuning
KW - large language models
KW - natural language generation
KW - natural language processing
KW - pretrained language models
KW - question answering
KW - software development
U2 - 10.17586/2226-1494-2024-24-6-1024-1034
DO - 10.17586/2226-1494-2024-24-6-1024-1034
M3 - Article
VL - 24
SP - 1024
EP - 1034
JO - Scientific and Technical Journal of Information Technologies, Mechanics and Optics
JF - Scientific and Technical Journal of Information Technologies, Mechanics and Optics
SN - 2226-1494
IS - 6
ER -
ID: 143413251