By Kathleen Gavin, Karin Eisinger
In the first half of 2023, the White House, the FDA, and the NIH all highlighted the importance of advanced data ecosystems for accelerating major advances in human healthcare and research. Their publications signal federal recognition that we have reached a critical tipping point for data connectivity. In its National Strategy to Advance Privacy-Preserving Data Sharing and Analytics, the White House laid out a path for advancing privacy-preserving data sharing and analytics (PPDSA), with the overarching goal of catalyzing American innovation and creativity by facilitating data linkage. Similarly, President Biden's Bold Goals for U.S. Biotechnology and Biomanufacturing describes a data initiative intended to ensure high-quality, wide-ranging, easily accessible, and secure biological datasets that drive breakthroughs for the U.S. bioeconomy.
The White House is not alone in recognizing the critical importance of linked datasets. In March, the FDA released draft guidance, Clinical Trial Considerations to Support Accelerated Approval of Oncology Therapeutics, acknowledging that post-marketing confirmatory studies required to verify clinical benefit are often submitted late, or not at all. In addition to recommending randomized controlled trials (RCTs) as the preferred approach to support an accelerated approval application, the agency states that RCT participants should undergo long-term follow-up to verify clinical benefit. This is a direct example of where data connectivity, in this case between a clinical trial and real-world health data, could play a central role in fulfilling long-term follow-up requirements: by automating data collection, it could reduce participant and site burden, minimize study costs, and limit attrition, ultimately facilitating advances and shortening timelines for regulatory submissions.
A Framework for Operationalizing Healthcare Data
As this example shows, operationalizing data linkage and interoperability while preserving privacy is a critical next step in maximizing the use of healthcare data.
Currently, multiple federally funded programs are developing data ecosystems by therapeutic area. For example, the Cancer Research Data Commons (CRDC) is a cloud-based data science infrastructure, supported by the National Cancer Institute (NCI), that facilitates data connectivity to drive discovery, surveillance, and clinical care in oncology. The CRDC, like other NIH-sponsored data repositories, is a reliable central hub for research-grade datasets such as proteomic, genomic, imaging, and clinical trial data. Even so, the NCI recognizes that a more robust data ecosystem is necessary to meet the ambitious goals of the Cancer Moonshot, and in April it published its National Cancer Plan. The plan provides a framework for collaboration across government and society and establishes eight goals that must be achieved for the Cancer Moonshot to succeed. Maximizing data utility is goal number seven, which aspires to a future where "secure sharing of privacy-protected health data is standard practice throughout research, and researchers share and use available data to achieve rapid progress against cancer." Growing the Cancer Moonshot Data Ecosystem to this potential, so that it drives scientific discovery and speeds the translation of precision medicine into clinical practice, will require data connectivity across both research and real-world data (RWD) sources, all while protecting patient privacy so the data can be broadly used.
Privacy Preserving Record Linkage Integration
Integrating a privacy-preserving record linkage (PPRL) tool is well positioned to help achieve these data-sharing goals. PPRL links records belonging to the same individual across different datasets without exposing that individual's identifying information. This approach could accelerate connectivity among historically unlinkable datasets and make data more available to the broader research and clinical community. Precedent for PPRL in federally funded programs was set by the NCATS National COVID Cohort Collaborative (N3C), which created the largest national, publicly available patient-level limited dataset in U.S. history by harmonizing electronic health record (EHR) data from hundreds of health systems across the U.S. The N3C has yielded numerous important insights into COVID-19 and illustrates what similar data interoperability efforts could achieve in other therapeutic areas.
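To make the linkage step concrete, here is a minimal sketch of one common PPRL pattern, deterministic keyed-hash tokens, under simplifying assumptions. Each data holder normalizes a few identifying fields, hashes them locally with HMAC-SHA256 under a shared secret key, and shares only the resulting token plus non-identifying data; records from two sources are then joined on matching tokens. The chosen fields, the normalization rules, and `SECRET_KEY` are hypothetical, and this is not a description of Datavant's or N3C's actual method. Production PPRL systems typically add safeguards such as managed keys and techniques for tolerating typos (for example, Bloom-filter encodings).

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared secret for illustration only; real keys are managed, never hard-coded.
SECRET_KEY = b"example-shared-secret"


def normalize(value: str) -> str:
    """Strip accents, uppercase, and drop non-alphanumeric characters."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.upper() if ch.isalnum())


def make_token(first: str, last: str, dob: str, sex: str, key: bytes = SECRET_KEY) -> str:
    """Derive a keyed-hash (HMAC-SHA256) token from normalized identifying fields."""
    message = "|".join(normalize(field) for field in (first, last, dob, sex))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records):
    """Replace identifying fields with a token; keep only the non-identifying payload.

    Assumes one record per person; a duplicate token would overwrite the earlier payload.
    """
    return {
        make_token(r["first"], r["last"], r["dob"], r["sex"]): r["data"]
        for r in records
    }


def link(a, b):
    """Join two tokenized datasets on the tokens they share."""
    return {token: (a[token], b[token]) for token in a.keys() & b.keys()}


if __name__ == "__main__":
    trial = [
        {"first": "José", "last": "García", "dob": "1970-01-02", "sex": "M", "data": {"arm": "A"}},
        {"first": "Ann", "last": "Lee", "dob": "1985-05-05", "sex": "F", "data": {"arm": "B"}},
    ]
    ehr = [
        {"first": "JOSE", "last": "garcia", "dob": "1970-01-02", "sex": "m", "data": {"dx": "C34"}},
        {"first": "Bob", "last": "Ray", "dob": "1990-03-04", "sex": "M", "data": {"dx": "E11"}},
    ]
    # Only the tokens and payloads leave each data holder; names never do.
    print(list(link(tokenize(trial), tokenize(ehr)).values()))
```

Because the token depends only on the normalized inputs and the key, "José García" in a trial dataset and "JOSE garcia" in an EHR extract produce the same token and link, while neither side ever sees the other's names. Exact matching is also this scheme's main weakness: a single misspelling yields a different token, which is why fuzzier encodings exist.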
On January 25, 2023, the NIH officially established a data-sharing requirement for NIH-funded research. The intention of this policy, like the missions described above, is to accelerate discovery and promote data reuse in future research studies, an exciting step for clinical research. However, unless data are shared in a way that allows them to be reused and linked reliably, repeatably, and securely, most clinical research data will remain siloed in repositories with no way to realize their full value.
Implementing PPRL as part of the data-sharing process for federally funded clinical research would activate research data in an entirely new way. Patients and study participants can (and, when possible, should) still give informed consent to share their anonymized data through PPRL; this keeps the public informed about the important use cases for data sharing and, ideally, builds trust in its value. Under this model, clinical trial data and basic biology studies could for the first time be linked with RWD sources such as social determinants of health (SDOH) data, environmental, lifestyle, and other phenotypic data, EHRs, claims, pharmacy data, and digital wearable or remote patient monitoring data, generating truly novel health insights and spurring discovery. Enabling clinically actionable patient classification, diagnosis, and therapy; discovering new personalized health biomarkers and therapeutic strategies; demonstrating their safety and efficacy; and implementing them in clinical practice quickly and cost-effectively all require linkage to RWD. PPRL is the most secure and reliable approach to meet this need.
It has never been more evident to physicians, scientists, and the federal government that data connectivity will play an important role in accelerating federally funded human health initiatives. The challenge now is to unite behind a single data initiative that truly addresses the goals and disparate needs of federally funded programs and maximizes the use of the rapidly expanding health data ecosystem.
If you would like to learn more about partnering with Datavant on data linkage to advance public health initiatives, contact us.