Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review
Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data. / Гаврилов, Никита Олегович; Корхов, Владимир Владиславович.
Computational Science and Its Applications – ICCSA 2025 Workshops. 2025. p. 219–230 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 15894).
TY - GEN
T1 - Language Model-Based Algorithm for Constructing Knowledge Graphs from Patent Data
AU - Гаврилов, Никита Олегович
AU - Корхов, Владимир Владиславович
PY - 2025/6/28
Y1 - 2025/6/28
N2 - This paper presents a method for constructing a knowledge graph from patent data that helps identify hidden relationships between patents and organize information for subsequent analysis. The method extracts key textual fields from patent documents, vectorizes them using state-of-the-art transformer models, and builds a graph in which nodes represent individual documents and edges reflect their semantic proximity. A clustering algorithm groups the patents, ensuring high internal coherence within clusters and reducing the original graph to a compact representation. The resulting clusters are summarized using language models, enabling automatic extraction of significant terms for cluster descriptions. Experiments on a large corpus of patent data demonstrate the efficacy of the proposed approach, as confirmed by relevant partitioning quality metrics. The proposed method improves the interpretation of patent information and facilitates the identification of implicit relationships and structural patterns, which is important for analyzing scientific achievements and managing intellectual property.
AB - This paper presents a method for constructing a knowledge graph from patent data that helps identify hidden relationships between patents and organize information for subsequent analysis. The method extracts key textual fields from patent documents, vectorizes them using state-of-the-art transformer models, and builds a graph in which nodes represent individual documents and edges reflect their semantic proximity. A clustering algorithm groups the patents, ensuring high internal coherence within clusters and reducing the original graph to a compact representation. The resulting clusters are summarized using language models, enabling automatic extraction of significant terms for cluster descriptions. Experiments on a large corpus of patent data demonstrate the efficacy of the proposed approach, as confirmed by relevant partitioning quality metrics. The proposed method improves the interpretation of patent information and facilitates the identification of implicit relationships and structural patterns, which is important for analyzing scientific achievements and managing intellectual property.
KW - Clustering
KW - Knowledge Graph
KW - Patent Data
KW - Text Vectorization
UR - https://www.mendeley.com/catalogue/949ef5aa-3301-3017-88ae-01d23772908a/
U2 - 10.1007/978-3-031-97648-3_15
DO - 10.1007/978-3-031-97648-3_15
M3 - Conference contribution
SN - 9783031976476
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 219
EP - 230
BT - Computational Science and Its Applications – ICCSA 2025 Workshops
T2 - Computational Science and Its Applications – ICCSA 2025 Workshops
Y2 - 30 June 2025 through 3 July 2025
ER -
ID: 138833426
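The pipeline summarized in the abstract (vectorize documents, link semantically close ones, cluster, assess the partition) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the record names neither the transformer model, the edge rule, the clustering algorithm, nor the quality metrics. Here toy 2-D vectors stand in for transformer embeddings, an edge is kept when cosine similarity reaches a hypothetical `threshold`, connected components of that graph are swapped in as the clustering step, and Newman-Girvan modularity serves as one example of a partition quality score. All helper names (`build_similarity_graph`, `connected_component_labels`, `group_by_cluster`, `modularity`) are hypothetical.

```python
# Hypothetical sketch: the paper's actual model, edge rule and clustering
# algorithm are not specified in this record; stand-ins are used throughout.
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def build_similarity_graph(vectors, threshold):
    """Nodes are document indices; keep an undirected edge (i, j, sim) whenever
    cosine similarity is at least `threshold` (an assumed edge rule)."""
    edges = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            s = cosine(vectors[i], vectors[j])
            if s >= threshold:
                edges.append((i, j, s))
    return edges


def connected_component_labels(n, edges):
    """Stand-in clustering: union-find over the edges, one cluster per
    connected component. Returns node index -> cluster id (0, 1, ...)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, _ in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    ids = {}  # component root -> cluster id, numbered by first appearance
    return [ids.setdefault(find(x), len(ids)) for x in range(n)]


def group_by_cluster(labels):
    """Compact view of the graph: cluster id -> ascending list of member indices."""
    groups = defaultdict(list)
    for node, c in enumerate(labels):
        groups[c].append(node)
    return dict(groups)


def modularity(edges, labels):
    """Weighted Newman-Girvan modularity: Q = sum_c [L_c / m - (d_c / 2m)^2],
    where m is the total edge weight, L_c the edge weight inside cluster c
    and d_c the summed node strength of cluster c."""
    m = sum(w for _, _, w in edges)
    if m == 0:
        return 0.0
    inner, strength = defaultdict(float), defaultdict(float)
    for i, j, w in edges:
        strength[labels[i]] += w
        strength[labels[j]] += w
        if labels[i] == labels[j]:
            inner[labels[i]] += w
    return sum(inner[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for transformer embeddings of four patents (assumption).
    vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    edges = build_similarity_graph(vectors, threshold=0.8)
    labels = connected_component_labels(len(vectors), edges)
    print(group_by_cluster(labels))              # {0: [0, 1], 1: [2, 3]}
    print(round(modularity(edges, labels), 3))   # 0.5
```

Connected components are the simplest possible stand-in: every edge ends up inside a cluster, so the threshold alone decides the partition. A community-detection method such as Louvain or Leiden, which optimizes modularity directly, can split densely linked regions that a single similarity threshold would merge.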