Interactive Near Duplicate Search in Software Documentation

Various software features such as classes, methods, requirements, and tests often have similar functionality. This can lead to the emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder documentation maintenance. Therefore, the task of duplicate detection in software documentation is important. Solving it enables planned reuse, as well as the creation and use of templates for unifying and automatically generating documentation. In this paper, we present an approach for interactive detection of near duplicates that involves the user in order to conduct a meaningful search. It includes a new formal definition of a near duplicate, a pattern-based algorithm, and the proof of its completeness. Moreover, we demonstrate the results of experiments on a collection of documents from several industrial projects.

Original language	Russian
Pages (from-to)	346-355
Journal	Programming and Computer Software
Volume	45
Issue number	6
State	Published - Nov 2019

ID: 49216378