The healthcare industry faces significant challenges in leveraging patient data across institutions while maintaining privacy, particularly when third-party organizations such as insurance companies and banks require medical information for risk assessment. The rapid advancement of large-scale multimodal models, such as Contrastive Language-Image Pre-training (CLIP), holds immense potential for medical applications by enabling cross-modal alignment of visual and textual data. This paper presents a novel framework that combines vertical federated learning with CLIP to enable privacy-preserving medical image analysis across institutional boundaries. The framework allows secure analysis of distributed medical data without raw data sharing, while adapting CLIP to medical applications through Context Optimization (CoOp) prompt tuning. Experimental validation on a dataset of 7023 brain MRI scans demonstrates the framework's effectiveness, achieving 93.1% accuracy in classifying four brain conditions (glioma, meningioma, pituitary tumor, and no tumor), a substantial improvement over the original pre-trained CLIP model's 26.3% accuracy. These results establish a practical solution for secure, cross-institutional medical data analysis that maintains patient privacy while enabling critical business decisions in the healthcare, insurance, and financial sectors.
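The paper's implementation is not reproduced in this record, but the Context Optimization idea the abstract refers to can be sketched in PyTorch. In CoOp-style prompt tuning, a small set of learnable context vectors is prepended to the (frozen) embeddings of each class name, and classification is done by cosine similarity between image features and the encoded class prompts; only the context vectors are trained. The encoders below are randomly initialized stand-ins for CLIP's frozen text and image encoders, and all dimensions and module choices are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptLearner(nn.Module):
    """CoOp-style prompts: n_ctx shared learnable context vectors
    prepended to a frozen per-class name embedding (assumption: one
    embedding token per class, for brevity)."""

    def __init__(self, classnames, n_ctx=4, dim=64):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # trainable
        self.cls_emb = nn.Parameter(
            torch.randn(len(classnames), 1, dim), requires_grad=False
        )  # frozen class-name embeddings (stand-in for CLIP's tokenizer/embedder)

    def forward(self):
        n_cls = self.cls_emb.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)   # (n_cls, n_ctx, dim)
        return torch.cat([ctx, self.cls_emb], dim=1)         # (n_cls, n_ctx+1, dim)


class CoOpClassifier(nn.Module):
    """Zero-shot-style classifier: cosine similarity between image
    features and encoded class prompts. Both encoders are frozen
    stand-ins for CLIP's encoders (assumption)."""

    def __init__(self, classnames, dim=64):
        super().__init__()
        self.prompts = PromptLearner(classnames, dim=dim)
        self.text_enc = nn.GRU(dim, dim, batch_first=True)   # stand-in text encoder
        self.img_enc = nn.Linear(dim, dim)                   # stand-in image encoder
        for p in list(self.text_enc.parameters()) + list(self.img_enc.parameters()):
            p.requires_grad = False                          # freeze, as in CoOp
        self.logit_scale = nn.Parameter(torch.tensor(4.0), requires_grad=False)

    def forward(self, image_feats):
        _, h = self.text_enc(self.prompts())                 # encode each class prompt
        t = F.normalize(h.squeeze(0), dim=-1)                # (n_cls, dim)
        v = F.normalize(self.img_enc(image_feats), dim=-1)   # (batch, dim)
        return self.logit_scale.exp() * v @ t.t()            # similarity logits


classes = ["glioma", "meningioma", "pituitary tumor", "no tumor"]
model = CoOpClassifier(classes)
logits = model(torch.randn(8, 64))  # a batch of 8 dummy image feature vectors
print(logits.shape)                 # (8, 4): one logit per class
```

Training would update only `prompts.ctx` with a standard cross-entropy loss; in the vertical-federated setting described by the abstract, the image features and the prompt/label side would live at different institutions, with only intermediate activations exchanged rather than raw data.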
Original language: English
Title of host publication: Computational Science and Its Applications – ICCSA 2025. Lecture Notes in Computer Science
Publisher: Springer Nature
Pages: 320-331
Number of pages: 12
ISBN (Electronic): 978-3-031-96997-3
ISBN (Print): 978-3-031-96996-6
DOIs
State: Published - 2025
Event: Computational Science and Its Applications – ICCSA 2025 Workshops - Istanbul, Turkey
Duration: 30 Jun 2025 – 3 Jul 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15649

Conference

Conference: Computational Science and Its Applications – ICCSA 2025 Workshops
Country/Territory: Turkey
City: Istanbul
Period: 30/06/25 – 3/07/25

Research areas

Cross-institutional analysis, Prompt tuning, Vertical federated learning, Vision-language model
