Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data

Standard

Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data. / Гаврилов, Никита Олегович ; Корхов, Владимир Владиславович.

Computational Science and Its Applications – ICCSA 2025 Workshops. 2025. p. 219–230 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15894).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Research › peer-review

Harvard

Гаврилов, НО & Корхов, ВВ 2025, Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data. in Computational Science and Its Applications – ICCSA 2025 Workshops. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 15894, pp. 219–230, Computational Science and Its Applications – ICCSA 2025 Workshops, Istanbul, Turkey, 30/06/25. https://doi.org/10.1007/978-3-031-97648-3_15

APA

Гаврилов, Н. О., & Корхов, В. В. (2025). Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data. In Computational Science and Its Applications – ICCSA 2025 Workshops (pp. 219–230). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15894). https://doi.org/10.1007/978-3-031-97648-3_15

Vancouver

Гаврилов НО, Корхов ВВ. Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data. In Computational Science and Its Applications – ICCSA 2025 Workshops. 2025. p. 219–230. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-031-97648-3_15

Author

Гаврилов, Никита Олегович ; Корхов, Владимир Владиславович. / Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data. Computational Science and Its Applications – ICCSA 2025 Workshops. 2025. pp. 219–230 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX

@inproceedings{619d78afdbe04fe89c66cb0c195146b7,

title = "Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data",

abstract = "This paper presents a method for constructing a knowledge graph based on patent data, which facilitates the identification of hidden relationships between patents and the organization of information for subsequent analysis. The method involves extracting key textual fields from patent documents and vectorizing them using state-of-the-art transformer models, and building a graph where the nodes represent individual documents, and the edges reflect their semantic proximity. A clustering algorithm is employed to group the patents, ensuring high internal coherence within clusters and reducing the original graph to a compact representation. The resulting clusters are summarized using language models, enabling automatic extraction of significant terms for cluster descriptions. Experimental research conducted on a large corpus of patent data demonstrates the efficacy of the proposed approach, which is confirmed by the relevant partitioning quality metrics. The proposed method improves the interpretation of patent information, facilitating the identification of implicit relationships and structural patterns, which is of great importance for analyzing scientific achievements and managing intellectual property.",

keywords = "Clustering, Knowledge Graph, Patent Data, Text Vectorization",

author = "Гаврилов, {Никита Олегович} and Корхов, {Владимир Владиславович}",

year = "2025",

month = jun,

day = "28",

doi = "10.1007/978-3-031-97648-3_15",

language = "English",

isbn = "9783031976476",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Nature",

pages = "219--230",

booktitle = "Computational Science and Its Applications – ICCSA 2025 Workshops",

note = "Computational Science and Its Applications – ICCSA 2025 Workshops ; Conference date: 30-06-2025 Through 03-07-2025",

url = "http://iccsa.org",

}

RIS

TY - GEN

T1 - Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data

AU - Гаврилов, Никита Олегович

AU - Корхов, Владимир Владиславович

PY - 2025/6/28

Y1 - 2025/6/28

N2 - This paper presents a method for constructing a knowledge graph based on patent data, which facilitates the identification of hidden relationships between patents and the organization of information for subsequent analysis. The method involves extracting key textual fields from patent documents and vectorizing them using state-of-the-art transformer models, and building a graph where the nodes represent individual documents, and the edges reflect their semantic proximity. A clustering algorithm is employed to group the patents, ensuring high internal coherence within clusters and reducing the original graph to a compact representation. The resulting clusters are summarized using language models, enabling automatic extraction of significant terms for cluster descriptions. Experimental research conducted on a large corpus of patent data demonstrates the efficacy of the proposed approach, which is confirmed by the relevant partitioning quality metrics. The proposed method improves the interpretation of patent information, facilitating the identification of implicit relationships and structural patterns, which is of great importance for analyzing scientific achievements and managing intellectual property.

KW - Clustering

KW - Knowledge Graph

KW - Patent Data

KW - Text Vectorization

UR - https://www.mendeley.com/catalogue/949ef5aa-3301-3017-88ae-01d23772908a/

U2 - 10.1007/978-3-031-97648-3_15

DO - 10.1007/978-3-031-97648-3_15

M3 - Conference contribution

SN - 9783031976476

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 219

EP - 230

BT - Computational Science and Its Applications – ICCSA 2025 Workshops

T2 - Computational Science and Its Applications – ICCSA 2025 Workshops

Y2 - 30 June 2025 through 3 July 2025

ER -

ID: 138833426