Abstract

Various software features such as classes, methods, requirements, and tests often have similar functionality. This can lead to the emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder the process of documentation maintenance. Therefore, the task of duplicate detection in software documentation is important. Solving it makes planned reuse possible, as well as the creation and use of templates for unification and automatic generation of documentation. In this paper, we present an approach for interactive detection of near duplicates that involves the user in order to conduct a meaningful search. It includes a new formal definition of a near duplicate, a pattern-based search algorithm, and the proof of its completeness. Moreover, we demonstrate the results of experiments on a collection of documents from several industrial projects.
Original language: Russian
Pages (from-to): 346-355
Journal: Programming and Computer Software
Volume: 45
Issue number: 6
Publication status: Published - Nov 2019


@article{7489af839203418f80405a2c2effbdad,
title = "Interactive Near Duplicate Search in Software Documentation",
abstract = "Various software features such as classes, methods, requirements, and tests often have similar functionality. This can lead to the emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder the process of documentation maintenance. Therefore, the task of duplicate detection in software documentation is important. Solving it makes planned reuse possible, as well as the creation and use of templates for unification and automatic generation of documentation. In this paper, we present an approach for interactive detection of near duplicates that involves the user in order to conduct a meaningful search. It includes a new formal definition of a near duplicate, a pattern-based search algorithm, and the proof of its completeness. Moreover, we demonstrate the results of experiments on a collection of documents from several industrial projects.",
author = "Luciv, {D. V.} and Koznov, {D. V.} and Shelikhovskii, {A. A.} and Romanovsky, {K. Yu.} and Chernishev, {G. A.} and Terekhov, {A. N.} and Grigoriev, {D. A.} and Smirnova, {A. N.} and Borovkov, {D. V.} and Vasenina, {A. I.}",
year = "2019",
month = "11",
language = "Russian",
volume = "45",
pages = "346--355",
journal = "Programming and Computer Software",
issn = "0361-7688",
publisher = "МАИК {"}Наука/Интерпериодика{"}",
number = "6",

}

TY - JOUR

T1 - Interactive Near Duplicate Search in Software Documentation

AU - Luciv, D. V.

AU - Koznov, D. V.

AU - Shelikhovskii, A. A.

AU - Romanovsky, K. Yu.

AU - Chernishev, G. A.

AU - Terekhov, A. N.

AU - Grigoriev, D. A.

AU - Smirnova, A. N.

AU - Borovkov, D. V.

AU - Vasenina, A. I.

PY - 2019/11

Y1 - 2019/11

N2 - Various software features such as classes, methods, requirements, and tests often have similar functionality. This can lead to the emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder the process of documentation maintenance. Therefore, the task of duplicate detection in software documentation is important. Solving it makes planned reuse possible, as well as the creation and use of templates for unification and automatic generation of documentation. In this paper, we present an approach for interactive detection of near duplicates that involves the user in order to conduct a meaningful search. It includes a new formal definition of a near duplicate, a pattern-based search algorithm, and the proof of its completeness. Moreover, we demonstrate the results of experiments on a collection of documents from several industrial projects.

AB - Various software features such as classes, methods, requirements, and tests often have similar functionality. This can lead to the emergence of duplicates in their descriptive documentation. Uncontrolled duplicates created via copy/paste hinder the process of documentation maintenance. Therefore, the task of duplicate detection in software documentation is important. Solving it makes planned reuse possible, as well as the creation and use of templates for unification and automatic generation of documentation. In this paper, we present an approach for interactive detection of near duplicates that involves the user in order to conduct a meaningful search. It includes a new formal definition of a near duplicate, a pattern-based search algorithm, and the proof of its completeness. Moreover, we demonstrate the results of experiments on a collection of documents from several industrial projects.

M3 - Article

VL - 45

SP - 346

EP - 355

JO - Programming and Computer Software

JF - Programming and Computer Software

SN - 0361-7688

IS - 6

ER -