Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data

DOI

https://doi.org/10.1007/978-3-031-97648-3_15
Final published version

This paper presents a method for constructing a knowledge graph from patent data, which helps identify hidden relationships between patents and organize information for subsequent analysis. The method extracts key textual fields from patent documents, vectorizes them using state-of-the-art transformer models, and builds a graph in which nodes represent individual documents and edges reflect their semantic proximity. A clustering algorithm groups the patents, ensuring high internal coherence within clusters and reducing the original graph to a compact representation. The resulting clusters are summarized using language models, enabling automatic extraction of significant terms for cluster descriptions. Experiments on a large corpus of patent data demonstrate the efficacy of the proposed approach, as confirmed by partitioning quality metrics. The proposed method improves the interpretation of patent information by exposing implicit relationships and structural patterns, which is important for analyzing scientific achievements and managing intellectual property.
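
The graph-building and grouping steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the transformer model, the proximity measure, the edge criterion, or the clustering algorithm. The sketch assumes cosine similarity as the proximity measure, a hypothetical similarity threshold of 0.8 for creating edges, and connected components as a simple stand-in for the clustering step. The three-dimensional vectors are made-up placeholders for transformer embeddings, and the language-model summarization step is omitted.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_graph(vectors, threshold):
    """Link documents whose cosine similarity is at least `threshold`.

    `threshold` is an assumed parameter, not a value taken from the paper.
    """
    ids = sorted(vectors)
    edges = defaultdict(set)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                edges[a].add(b)
                edges[b].add(a)
    return edges


def connected_components(ids, edges):
    """Group nodes into connected components (a stand-in for clustering)."""
    seen, clusters = set(), []
    for start in sorted(ids):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Toy vectors standing in for transformer embeddings of patent text (made up).
vectors = {
    "P1": [0.9, 0.1, 0.0],
    "P2": [0.8, 0.2, 0.1],
    "P3": [0.0, 0.1, 0.9],
    "P4": [0.1, 0.0, 0.8],
}
graph = build_graph(vectors, threshold=0.8)
clusters = connected_components(vectors, graph)
print(clusters)  # [['P1', 'P2'], ['P3', 'P4']]
```

In this toy example each cluster becomes one node of the compact representation. A real pipeline would replace the placeholder vectors with model embeddings and would likely use a dedicated community-detection or clustering method rather than connected components.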

Original language	English
Title of host publication	Computational Science and Its Applications – ICCSA 2025 Workshops
Pages	219–230
Number of pages	12
DOIs	https://doi.org/10.1007/978-3-031-97648-3_15
State	Published - 28 Jun 2025
Event	Computational Science and Its Applications – ICCSA 2025 Workshops - Istanbul, Turkey Duration: 30 Jun 2025 → 3 Jul 2025 http://iccsa.org

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher	Springer Nature
Volume	15894
ISSN (Print)	0302-9743

Conference

Conference	Computational Science and Its Applications – ICCSA 2025 Workshops
Abbreviated title	ICCSA
Country/Territory	Turkey
City	Istanbul
Period	30/06/25 → 3/07/25
Internet address	http://iccsa.org

Research areas

Clustering, Knowledge Graph, Patent Data, Text Vectorization
