Extraction of Archetype from Near Duplicates in Software Documentation

Standard

Extraction of Archetype from Near Duplicates in Software Documentation. / Луцив, Дмитрий Вадимович ; Кознов, Дмитрий Владимирович ; Чернышев, Георгий Алексеевич ; Григорьев, Дмитрий Алексеевич.

2019 Actual Problems of Systems and Software Engineering (APSSE). 2019. p. 126-130.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Harvard

Луцив, ДВ, Кознов, ДВ, Чернышев, ГА & Григорьев, ДА 2019, Extraction of Archetype from Near Duplicates in Software Documentation. in 2019 Actual Problems of Systems and Software Engineering (APSSE). pp. 126-130, 6th International Conference Actual Problems of System and Software Engineering, APSSE 2019, Moscow, Russian Federation, 12/11/19. https://doi.org/10.1109/APSSE47353.2019.00023

BibTeX

@inproceedings{0286a7f97b22420ba5e64e8cb7f651f0,

title = "Extraction of Archetype from Near Duplicates in Software Documentation",

abstract = "Software documentation contains a large amount of duplicate text, which is often comprised of near duplicates: repetitions of the same text with slight differences. They emerge due to numerous copy-pastes that have been slightly modified. Uncontrolled near duplicates complicate documentation support to a significant degree. There are some research papers on detection and management of duplicates in software documentation, but only the Duplicate Finder approach addresses the problem of near duplicates. Nevertheless, Duplicate Finder{\textquoteright}s search algorithms do not provide extraction of archetype (common text) for detected groups of near duplicates (a set of near duplicates belongs to one group if they have a lot of commonalities). The archetype of a group can be used in visualization of the common text and differences of duplicates for manual analysis, as well as for reuse of documentation. In this paper, we present an algorithm for archetype extraction and results of experiments on documentation of several well-known open source Java projects: JUnit, Mockito, and SLF4J.",

author = "Луцив, {Дмитрий Вадимович} and Кознов, {Дмитрий Владимирович} and Чернышев, {Георгий Алексеевич} and Григорьев, {Дмитрий Алексеевич}",

year = "2019",

month = nov,

doi = "10.1109/APSSE47353.2019.00023",

language = "English",

pages = "126--130",

booktitle = "2019 Actual Problems of Systems and Software Engineering (APSSE)",

note = "6th International Conference Actual Problems of System and Software Engineering, APSSE 2019; Conference date: 12-11-2019 through 14-11-2019",

url = "https://apspe.hse.ru/en/2019/",

}

RIS

TY - GEN

T1 - Extraction of Archetype from Near Duplicates in Software Documentation

AU - Луцив, Дмитрий Вадимович

AU - Кознов, Дмитрий Владимирович

AU - Чернышев, Георгий Алексеевич

AU - Григорьев, Дмитрий Алексеевич

N1 - Conference code: 2019

PY - 2019/11

Y1 - 2019/11

N2 - Software documentation contains a large amount of duplicate text, which is often comprised of near duplicates: repetitions of the same text with slight differences. They emerge due to numerous copy-pastes that have been slightly modified. Uncontrolled near duplicates complicate documentation support to a significant degree. There are some research papers on detection and management of duplicates in software documentation, but only the Duplicate Finder approach addresses the problem of near duplicates. Nevertheless, Duplicate Finder’s search algorithms do not provide extraction of archetype (common text) for detected groups of near duplicates (a set of near duplicates belongs to one group if they have a lot of commonalities). The archetype of a group can be used in visualization of the common text and differences of duplicates for manual analysis, as well as for reuse of documentation. In this paper, we present an algorithm for archetype extraction and results of experiments on documentation of several well-known open source Java projects: JUnit, Mockito, and SLF4J.

AB - Software documentation contains a large amount of duplicate text, which is often comprised of near duplicates: repetitions of the same text with slight differences. They emerge due to numerous copy-pastes that have been slightly modified. Uncontrolled near duplicates complicate documentation support to a significant degree. There are some research papers on detection and management of duplicates in software documentation, but only the Duplicate Finder approach addresses the problem of near duplicates. Nevertheless, Duplicate Finder’s search algorithms do not provide extraction of archetype (common text) for detected groups of near duplicates (a set of near duplicates belongs to one group if they have a lot of commonalities). The archetype of a group can be used in visualization of the common text and differences of duplicates for manual analysis, as well as for reuse of documentation. In this paper, we present an algorithm for archetype extraction and results of experiments on documentation of several well-known open source Java projects: JUnit, Mockito, and SLF4J.

U2 - 10.1109/APSSE47353.2019.00023

DO - 10.1109/APSSE47353.2019.00023

M3 - Conference contribution

SP - 126

EP - 130

BT - 2019 Actual Problems of Systems and Software Engineering (APSSE)

T2 - 6th International Conference Actual Problems of System and Software Engineering, APSSE 2019

Y2 - 12 November 2019 through 14 November 2019

ER -
