BDIViz: An Interactive Visualization System for Biomedical Schema Matching with LLM-Powered Validation
First, install the required dependencies:
npm i .
Then, start the server:
To run locally with Gemini-2.5-flash, run:
npm run build && npm run start
To run locally with GPT-4.1-mini, run:
npm run build && LLM_PROVIDER=openai npm run start

- User manual: Read the Docs
- Published paper: IEEE Computer Society Digital Library
- Repository docs and bundled assets: docs/
  - TVCG camera-ready paper PDF: docs/2025_VIS_BDIViz_Camera_Ready.pdf
  - TVCG demo video: docs/bdiviz_video.mp4
  - SIGMOD Demo 2026 paper PDF: docs/_SIGMOD_2026_Demo__BDIViz.pdf
  - SIGMOD Demo 2026 video: docs/bdiviz_sigmod_demo_2026_w_sub.mp4
  - SIGMOD Demo 2026 video mirror: Google Drive
BDIViz is an interactive web-based application developed as part of the ARPA-H ASKEM project to support schema matching and value mapping tasks in biomedical data integration. It provides a rich visual interface (heatmaps, explanations, and value comparisons) that streamlines the process of aligning raw biomedical datasets with standardized data schemas such as the Genomic Data Commons (GDC) and Proteomic Data Commons (PDC).
BDIViz is model-agnostic: it can be used with any schema matching model. It is designed to work with BDI-Kit, a Python library that provides tools for schema matching and value mapping. BDI-Kit includes a variety of schema matching algorithms, both supervised and unsupervised, as well as tools for data preprocessing and feature extraction.
- 🔍 Interactive Heatmap for exploring source-target column match candidates
- 📊 Value Comparison Table using fuzzy matching on raw values
- 🤖 LLM-Powered Agent Panel for dynamic match explanations and feedback
- ⏪ Timeline View to trace user actions (accept, reject, discard)
- 🎯 Control Panel for adjusting similarity threshold and navigating source columns
- 📤 Export Curated Mappings as JSON or CSV for downstream use
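
To make the similarity threshold and export features more concrete, here is a minimal TypeScript sketch of the general idea: drop match candidates that score below a threshold, then serialize what remains as CSV or JSON. It is not BDIViz's actual code. The `MatchCandidate` shape, the function names, the CSV column headers, and the sample column names and scores are all assumptions made for illustration; the real data model and export format may differ.

```typescript
// Hypothetical shape of a match candidate (an assumption, not BDIViz's data model).
interface MatchCandidate {
  sourceColumn: string;
  targetColumn: string;
  score: number; // similarity score, assumed to lie in [0, 1]
}

// Keep only the candidates whose score is at or above the threshold.
function filterByThreshold(
  candidates: MatchCandidate[],
  threshold: number
): MatchCandidate[] {
  return candidates.filter((c) => c.score >= threshold);
}

// Quote a CSV field if it contains a comma, a double quote, or a newline.
function csvField(value: string): string {
  return /[",\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Serialize candidates as CSV with a header row (illustrative column names).
function toCsv(candidates: MatchCandidate[]): string {
  const header = "source_column,target_column,score";
  const rows = candidates.map((c) =>
    [csvField(c.sourceColumn), csvField(c.targetColumn), String(c.score)].join(",")
  );
  return [header, ...rows].join("\n");
}

// Illustrative sample data only; names and scores are made up.
const candidates: MatchCandidate[] = [
  { sourceColumn: "Tumor_Stage", targetColumn: "ajcc_pathologic_stage", score: 0.82 },
  { sourceColumn: "Tumor_Stage", targetColumn: "tumor_grade", score: 0.41 },
  { sourceColumn: "Gender", targetColumn: "gender", score: 0.97 },
];

const kept = filterByThreshold(candidates, 0.7);
console.log(toCsv(kept));                  // CSV export of the retained candidates
console.log(JSON.stringify(kept, null, 2)); // JSON export of the same candidates
```

With a threshold of 0.7, the second candidate (score 0.41) is dropped and the other two are exported. Raising the threshold narrows the set a curator has to review; lowering it surfaces weaker candidates.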
Video demo of the SIGMOD demo paper: docs/bdiviz_sigmod_demo_2026_w_sub.mp4
Live Demo: https://bdiviz.users.hsrn.nyu.edu/dashboard/
If you use BDIViz in academic work, please cite the following paper:
@ARTICLE{wu2026bdiviz,
author={Wu, Eden and Turakhia, Dishita G and Wu, Guande and Koutras, Christos and Keegan, Sarah and Liu, Wenke and Szeitz, Beata and Fenyo, David and Silva, Claudio T. and Freire, Juliana},
journal={IEEE Transactions on Visualization \& Computer Graphics},
title={{BDIViz: An Interactive Visualization System for Biomedical Schema Matching with LLM-Powered Validation}},
year={2026},
volume={32},
number={01},
ISSN={1941-0506},
pages={1208-1218},
abstract={ Biomedical data harmonization is essential for enabling exploratory analyses and meta-studies, but the process of schema matching-identifying semantic correspondences between elements of disparate datasets (schemas)-remains a labor-intensive and error-prone task. Even state-of-the-art automated methods often yield low accuracy when applied to biomedical schemas due to the large number of attributes and nuanced semantic differences between them. We present BDIViz, a novel visual analytics system designed to streamline the schema matching process for biomedical data. Through formative studies with domain experts, we identified key requirements for an effective solution and developed interactive visualization techniques that address both scalability challenges and semantic ambiguity. BDIViz employs an ensemble approach that combines multiple matching methods with LLM-based validation, summarizes matches through interactive heatmaps, and provides coordinated views that enable users to quickly compare attributes and their values. Our method-agnostic design allows the system to integrate various schema matching algorithms and adapt to application-specific needs. Through two biomedical case studies and a within-subject user study with domain experts, we demonstrate that BDIViz significantly improves matching accuracy while reducing cognitive load and curation time compared to baseline approaches. },
keywords={Semantics;Data visualization;Bioinformatics;Accuracy;Visual analytics;Scalability;Graphical user interfaces;Cancer;Space heating;User centered design},
doi={10.1109/TVCG.2025.3634843},
url = {https://doi.ieeecomputersociety.org/10.1109/TVCG.2025.3634843},
publisher={IEEE Computer Society},
address={Los Alamitos, CA, USA},
month=jan}
