Various software features, such as classes, methods, requirements, and tests, often have similar functionality. This can lead to the emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder documentation maintenance. Therefore, detecting duplicates in software documentation is an important task. Solving it enables planned reuse, as well as the creation and use of templates for unifying and automatically generating documentation. In this paper, we present an approach for interactive detection of near duplicates that involves the user in order to conduct a meaningful search. It includes a new formal definition of a near duplicate, a pattern-based search algorithm, and a proof of its completeness. Moreover, we present the results of experiments on a collection of documents from several industrial projects.
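The abstract does not spell out the paper's formal definition of a near duplicate or its pattern-based algorithm. As a rough illustration of the general idea of interactive search, where the user selects a fragment and the tool looks for similar fragments elsewhere, the following Python sketch slides a window over a document's words and reports windows whose similarity to the user's pattern reaches a threshold. The function name `find_near_duplicates`, the lowercase word-level tokenization, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.8 default threshold are all assumptions made for illustration. This is not the authors' method.

```python
from difflib import SequenceMatcher


def find_near_duplicates(document: str, pattern: str, threshold: float = 0.8):
    """Return (start, end, score) word spans of `document` similar to `pattern`.

    Hypothetical illustration only: similarity is difflib's ratio over
    lowercase word tokens, not the paper's formal near-duplicate definition.
    """
    words = document.lower().split()
    target = pattern.lower().split()
    n = len(target)
    if n == 0:
        return []  # an empty pattern would otherwise match everywhere
    hits = []
    start = 0
    while start + n <= len(words):
        window = words[start:start + n]
        score = SequenceMatcher(None, target, window).ratio()
        if score >= threshold:
            hits.append((start, start + n, score))
            start += n  # jump past the match so reported hits do not overlap
        else:
            start += 1
    return hits


doc = ("The method returns the user name. Later, the method returns "
       "the account name. Unrelated text follows here.")
print(find_near_duplicates(doc, "the method returns the user name."))
```

Here the exact copy (word span 0 to 6, score 1.0) and the variant "the method returns the account name." (word span 7 to 13, score 5/6) are both reported, while the shifted window starting at "Later," scores only 2/3 and is skipped. A real interactive tool would additionally let the user refine the pattern and threshold between searches.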
| Field | Value |
|---|---|
| Journal | Programming and Computer Software |
| Publication status | Published - Nov 2019 |