April 2, 2026

Harmony Thrive


Accelerating AI innovation in healthcare: real-world clinical research applications on the Mayo Clinic Platform


Platform Architecture Overview

The Mayo Clinic Platform (MCP) is a secure, cloud-based data science environment designed to accelerate research and innovation by providing access to large-scale, de-identified, standardized clinical data and integrated analytical tools. Its architecture is built for scalability, privacy, and accessibility for researchers across diverse disciplines.

Extensive De-identified and Standardized Data Resources: The MCP applies a de-identification and standardization process to data from more than 15.1 million patients. To safeguard patient privacy, the platform uses a multilayered de-identification strategy that combines rule-based heuristics with deep learning models to identify and replace personally identifiable information [30]. These measures ensure full compliance with HIPAA and institutional governance policies. The platform also standardizes the data extensively, mapping EHR data to standard medical terminologies and common data models. This rich, multimodal dataset supports a wide range of research applications, including AI model training, real-world evidence generation, and clinical insight discovery.
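To make the "rule-based heuristics" layer more concrete, here is a minimal sketch of pattern-based replacement of identifiers with surrogate tags. It is not MCP's implementation: the patterns, the tag names, and the `redact` helper are hypothetical and far from exhaustive, and the deep learning layer that the platform combines with such rules is not shown.

```python
import re

# Hypothetical rule set: each regex is paired with the surrogate tag that replaces it.
# Real de-identification rule sets are much larger and are paired with learned models.
RULES = [
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),                 # e.g. 507-555-0142
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),              # e.g. 03/14/2024
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),         # medical record number
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),           # simple e-mail shape
]


def redact(text: str) -> str:
    """Apply each rule in order, replacing every match with its surrogate tag."""
    for pattern, tag in RULES:
        text = pattern.sub(tag, text)
    return text


if __name__ == "__main__":
    note = "Seen 03/14/2024, MRN: 4821937. Call 507-555-0142 or jdoe@example.org."
    print(redact(note))  # Seen [DATE], [MRN]. Call [PHONE] or [EMAIL].
```

Rules like these are precise for rigidly formatted identifiers but miss free-text names and addresses, which is one reason a layered design also uses learned models.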

Integrated Research Tools: The MCP provides a comprehensive suite of research tools that streamline the entire data-to-discovery workflow. These tools enable secure data access, exploration, and analytics within a unified platform. Designed for scalability and ease of use, the MCP tool ecosystem supports both technical and non-technical users, promoting efficient, reproducible, and collaborative research across diverse data types while maintaining rigorous standards for privacy, governance, and compliance.

Dedicated Data Science Environment: Researchers access MCP through a secure, cloud-hosted data science environment tailored to their needs. This environment integrates the MCP research tools and comes preconfigured with open-source languages and frameworks such as Python, R, and TensorFlow. It offers controlled, compliant access to de-identified data and high-performance computing resources, so that models can be trained and evaluated within a managed, privacy-preserving infrastructure.

This architecture establishes MCP as a scalable, privacy-preserving, and AI-ready research environment that enables investigators to generate actionable insights from de-identified real-world data while maintaining the highest standards of security and compliance.

Real-world observational data in MCP

MCP provides access to extensive, high-quality clinical data, including standardized structured data (e.g., diagnoses, lab results, medications) and unstructured data (e.g., clinical notes, images). These de-identified data span diverse demographics and capture patient journeys over time. Currently, MCP’s datasets include over 15.1 million patient records, 12 billion radiology images, 3.2 billion lab results, and 1.65 billion clinical notes, all accessible through a secure data science environment. In addition to the Mayo-specific standardized EHR, MCP provides EHR data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) format, which enhances interoperability and lets users apply analytic pipelines and tools developed within the Observational Health Data Sciences and Informatics (OHDSI) ecosystem.
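The interoperability benefit of the OMOP CDM comes from fixed table and column names: a query written against one OMOP instance can, in principle, run unchanged against another. The sketch below illustrates this with Python's built-in `sqlite3` and a tiny hand-made subset of two CDM tables (`person`, `condition_occurrence`). The sample rows and `TARGET_CONCEPT_ID` are hypothetical placeholders, and the projects described here queried MCP with Spark SQL rather than SQLite.

```python
import sqlite3

# Minimal subset of two OMOP CDM tables; the real CDM tables have many more columns.
DDL = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
"""

# Hypothetical placeholder: look up the real standard concept ID in the OMOP vocabulary.
TARGET_CONCEPT_ID = 1000001

# One row per patient with the target condition; the earliest diagnosis is the index date.
COHORT_SQL = """
SELECT p.person_id, p.year_of_birth, MIN(co.condition_start_date) AS index_date
FROM person AS p
JOIN condition_occurrence AS co ON co.person_id = p.person_id
WHERE co.condition_concept_id = ?
GROUP BY p.person_id, p.year_of_birth
ORDER BY p.person_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 8507, 1950), (2, 8532, 1962), (3, 8532, 1971)],
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        [
            (10, 1, TARGET_CONCEPT_ID, "2019-05-02"),
            (11, 1, TARGET_CONCEPT_ID, "2018-11-20"),
            (12, 2, 2000002, "2020-01-15"),  # a different (placeholder) condition
            (13, 3, TARGET_CONCEPT_ID, "2021-07-30"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for row in conn.execute(COHORT_SQL, (TARGET_CONCEPT_ID,)):
        print(row)  # (1, 1950, '2018-11-20') then (3, 1971, '2021-07-30')
```

ISO-formatted date strings sort lexicographically in date order, so `MIN` on the text column returns the earliest diagnosis.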

MCP tools used in this study

MCP partners with nference, inc. [31] to provide a range of tools for different research needs. Because this study used only structured EHR data within MCP, we used the following tools.

Cohort Visualizer supports the rapid creation, characterization, and comparison of patient cohorts for hypothesis testing and analysis using EHR data. It handles both structured and unstructured data and offers code-free analytics and intuitive visualizations. In the cohort builder, users can load existing cohorts or create new ones and analyze them in graphical or tabular form. Its user-friendly navigation lets users of any technical background explore large clinical datasets by standard clinical codes or keywords, helping to accelerate clinical research and address unmet needs in translational medicine. For more detailed downstream analysis, it also provides SQL code for retrieving the corresponding data from the EHR database. Figure 2A shows the user interface of the MCP Cohort Builder, where users define and filter patient cohorts using structured and unstructured EHR data. Figure 2B shows the Cohort Comparison interface, which lets users visualize and compare cohort characteristics through graphical summaries.
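The graphical summaries in a cohort comparison view rest on simple descriptive statistics. The sketch below computes a few of them for two toy cohorts. The `Patient` class, the chosen fields, and the use of the standardized mean difference (SMD) as a balance measure are our assumptions for illustration and do not describe how the Cohort Visualizer works internally.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass
class Patient:
    age: float
    sex: str  # "F" or "M" in this toy example


def summarize(cohort: list[Patient]) -> dict:
    """Cohort size, mean age, and percentage female."""
    ages = [p.age for p in cohort]
    return {
        "n": len(cohort),
        "mean_age": mean(ages),
        "pct_female": 100 * sum(p.sex == "F" for p in cohort) / len(cohort),
    }


def smd_age(a: list[Patient], b: list[Patient]) -> float:
    """Standardized mean difference in age, using the pooled-SD form.

    stdev() needs at least two patients per cohort.
    """
    xa, xb = [p.age for p in a], [p.age for p in b]
    pooled_sd = sqrt((stdev(xa) ** 2 + stdev(xb) ** 2) / 2)
    return (mean(xa) - mean(xb)) / pooled_sd


if __name__ == "__main__":
    cohort_a = [Patient(60, "F"), Patient(70, "M"), Patient(80, "F")]
    cohort_b = [Patient(55, "M"), Patient(65, "M"), Patient(75, "F")]
    print(summarize(cohort_a), summarize(cohort_b))
    print(smd_age(cohort_a, cohort_b))  # 0.5
```

An absolute SMD near zero means the cohorts are similar on that characteristic. Values above about 0.1 are often treated as a sign of meaningful imbalance.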

Fig. 2: Interface of the MCP Tools.

A, B: Interface of the Cohort Visualizer, showing the patient cohort creation (A) and comparison (B) views. C: Interface of the Schema Visualizer, illustrating exploration of the data schema and the relationships between tables. D, E: Interface of the MCP Workspace, showing coding environments (D; e.g., JupyterLab and RStudio) and integrated computational tools for data analysis and AI model development (E).

Schema Visualizer provides an interactive interface for exploring the data dictionary and schema within MCP. It offers detailed information on tables, columns, and their relationships, along with query code examples for downstream data collection (Fig. 2C). Additionally, it features an advanced search tool that enables users to efficiently locate specific tables, columns, or values within the data schema.
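The Schema Visualizer is a graphical tool. The same kind of information it presents (tables, columns, relationships, keyword search) can also be pulled programmatically from a SQL engine's catalog. The sketch below does this against a toy SQLite schema using SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`. The table definitions and the helper names `describe_schema` and `find_columns` are hypothetical, and MCP's own data dictionary is not accessed this way.

```python
import sqlite3

# Toy schema standing in for a real data dictionary (OMOP-style names, heavily trimmed).
TOY_DDL = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    year_of_birth INTEGER
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    person_id INTEGER REFERENCES person(person_id),
    measurement_concept_id INTEGER,
    value_as_number REAL
);
"""


def describe_schema(conn: sqlite3.Connection) -> dict:
    """Map each table to its columns (name, declared type) and foreign keys."""
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Table names come from sqlite_master, not user input, so interpolation is safe here.
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = [(r[3], r[2], r[4]) for r in conn.execute(f"PRAGMA foreign_key_list({table})")]
        schema[table] = {"columns": columns, "foreign_keys": fks}
    return schema


def find_columns(schema: dict, keyword: str) -> list:
    """Case-insensitive keyword search over column names."""
    keyword = keyword.lower()
    return [
        (table, name)
        for table, info in schema.items()
        for name, _type in info["columns"]
        if keyword in name.lower()
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(TOY_DDL)
    schema = describe_schema(conn)
    print(schema["measurement"]["foreign_keys"])  # [('person_id', 'person', 'person_id')]
    print(find_columns(schema, "concept"))        # [('measurement', 'measurement_concept_id')]
```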

Workspaces in MCP offer a comprehensive environment for accessing data and computing resources, supporting advanced analytics and data science workflows. The platform provides scalable computational resources tailored to a variety of research needs. For an individual researcher, the maximum available configuration is 208 CPU cores, 1872 GB of RAM, and 8 NVIDIA H100 80 GB GPUs, which provides capacity for resource-intensive tasks such as data mining, machine learning, and deep learning. Workspaces also provide up-to-date open-source tools, packages, and libraries for cloud-based computation, with integrated support for JupyterLab, VSCode, and RStudio to accommodate diverse coding needs. This all-in-one environment streamlines data collection, processing, and analysis. Workspaces further offer code-level guidance for applications such as data extraction, large language model (LLM) execution, and medical image processing, and users can use Git within Workspaces to manage and collaborate on their GitHub repositories. Figures 2D and 2E show the MCP Workspace interface.

Research projects conducted on MCP

To comprehensively showcase MCP’s capabilities across various clinical research scenarios, we designed four distinct projects. Figure 3 illustrates the aims of these projects within their respective clinical research contexts. Detailed descriptions of each project are provided below.

Fig. 3: Aims of the four clinical research projects.

Project 1. Simulating drug efficacy randomized controlled trials (RCTs) for heart failure (HF) patients using real-world observational clinical data. This project leverages the rich retrospective data available on MCP to simulate the conditions of traditional RCTs, enabling high-quality research while sidestepping the usual costs and ethical concerns associated with them. Specifically, we developed methodologies to simulate RCTs evaluating drug efficacy in HF patients using real-world observational data. Key objectives include identifying suitable candidate RCTs for simulation and using EHR data to replicate HF drug efficacy trials, thereby enabling robust comparative effectiveness research in the absence of traditional RCTs. The project also explores the Cohort Visualizer, a code-free analytical tool designed for researchers without a data science background, as a means of accessible and efficient cohort analysis.

Project 2. Impact of antihypertensive medications (AHMs) on Alzheimer’s Disease and Related Dementias (ADRD) risk in hypertensive patients with mild cognitive impairment (MCI). This study aims to validate findings from a prior study [32] suggesting that AHM use may be associated with a reduced risk of ADRD in hypertensive patients with MCI. The primary objective is to use real-world observational data to perform survival analysis of the relationship between AHM use and ADRD progression. The study also investigates potential drug-drug interactions between AHMs, statins, and metformin within the target cohort, providing further insight into pharmacological influences on dementia risk. This project serves as a simulation of traditional clinical research, using statistical analysis to evaluate real-world evidence.
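The project description does not say which survival method was used. As one standard example of the kind of analysis involved, the sketch below implements the Kaplan-Meier estimator, which handles patients whose follow-up ends before any ADRD diagnosis (right-censoring). The function name and the toy follow-up data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time per patient (e.g., years from MCI diagnosis)
    events -- 1 if the event (e.g., ADRD diagnosis) occurred at that time, 0 if censored
    Returns (time, estimated survival probability) at each distinct event time.
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # patients leaving the risk set at each time (events + censored)
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(exits):
        d = event_counts.get(t, 0)
        if d:
            # Patients censored at time t are still counted as at risk at t.
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy data: six patients, two of them censored (event = 0).
    print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
    # approximately [(1, 0.833), (2, 0.667), (3, 0.444), (5, 0.0)]
```

To compare AHM users with non-users, one would fit a curve per group and test the difference, for example with a log-rank test, or adjust for covariates with a Cox proportional hazards model (neither is shown here).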

Project 3. Building a Mild Cognitive Impairment (MCI) to Alzheimer’s Disease (AD) progression prediction model using EHR data and a deep learning method. This project focuses on training and validating a deep learning model [33] that uses longitudinal EHR data to predict progression to AD from MCI, which is considered a prodrome to dementia [34]. Specifically, it employs a Bidirectional Gated Recurrent Unit (BiGRU) model to forecast MCI progression at varying time intervals, up to five years after diagnosis. The study also aims to validate the model’s generalizability across diverse datasets and healthcare systems, to support its applicability in real-world clinical settings.
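For reference, a BiGRU can be written as below, using the standard GRU update (Cho et al., 2014). Here $x_t$ is the feature vector for the $t$-th step of a patient's longitudinal record (for example, one visit), which is our assumption about the input encoding. The logistic output head $\hat{y}$ on the final forward and backward states is also a common choice we assume for illustration, not a detail given in the source. Libraries differ in convention: PyTorch, for instance, writes the last line with the roles of $z_t$ and $1 - z_t$ swapped.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t \\[4pt]
\overrightarrow{h}_t &= \mathrm{GRU}_{\rightarrow}\left(x_t, \overrightarrow{h}_{t-1}\right), \qquad
\overleftarrow{h}_t = \mathrm{GRU}_{\leftarrow}\left(x_t, \overleftarrow{h}_{t+1}\right) \\
h^{\mathrm{bi}}_t &= \left[\overrightarrow{h}_t \,;\, \overleftarrow{h}_t\right] \\
\hat{y} &= \sigma\left(w^{\top}\left[\overrightarrow{h}_T \,;\, \overleftarrow{h}_1\right] + b\right)
\end{aligned}
```

The backward pass reads the record from the most recent step to the earliest, so every position's representation draws on both earlier and later history.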

Project 4. Developing a Deep Learning Model to Predict Major Adverse Cardiovascular Events (MACE) After Liver Transplantation (LT). This project uses longitudinal EHR data to develop deep learning models on the MCP that predict MACE following LT, and compares their performance with our previously developed model based on medical claims data [35]. By identifying high-risk candidates, the model helps clinicians stratify risk and informs management strategies to improve transplant outcomes. The model also highlights key predictive features, enabling physicians to implement targeted preventive measures that reduce the likelihood of adverse cardiovascular events. This study demonstrates MCP’s capability to support deep learning model development for clinical research.
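The text does not name the metric used to compare the EHR-based and claims-based models. The area under the ROC curve (AUROC) is a common choice for binary risk prediction, and the sketch below computes it from its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The labels, the scores, and the `auroc` helper are all made-up illustrations.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores (O(P*N))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]                    # 1 = MACE observed (toy data)
    model_a = [0.9, 0.8, 0.3, 0.4, 0.7, 0.2]       # hypothetical model A risk scores
    model_b = [0.6, 0.4, 0.5, 0.3, 0.7, 0.6]       # hypothetical model B risk scores
    print(auroc(labels, model_a), auroc(labels, model_b))  # 1.0 and about 0.722
```

For real cohorts, a sort-based implementation or a library routine is faster. Whether one model's AUROC is significantly higher than another's on the same patients is usually assessed with a paired method such as DeLong's test.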

Data collection and analysis approach

The MCP tools facilitated these projects by providing a unified platform for cohort development, data extraction, and analysis. Specifically, Project 1 used the Cohort Visualizer to identify RCT candidates. All projects then used Jupyter Notebook to run Spark SQL API queries that extracted EHR data from the MCP database. Finally, data analysis, including statistical evaluation and deep learning modeling, was conducted within the Workspace in either R or Python.

Platform accessibility and reusability

The MCP is a subscription-based, cloud-hosted research environment accessible to external users after registration and approval. Researchers, healthcare organizations, and industry partners can gain access to MCP’s de-identified datasets and integrated tools by completing the required onboarding process. Once registered, users have access to the same standardized data, analytical tools, and secure computing environments described in this paper. The platform combines open-source and proprietary components: users can work with open-source tools (e.g., Python, R, TensorFlow, PyTorch) within the MCP Workspaces, which supports flexibility and reproducibility. This hybrid model promotes collaboration, scalability, and replicable research while maintaining robust privacy and security protections.
