By S. Robicheau and T.S.K. Eisinger-Mathason
When Bailey Harris was diagnosed with rhabdomyosarcoma, a rare but aggressive pediatric cancer, her family was nearly paralyzed with fear and confusion. They were immediately thrown into a new world of doctors and treatments with no respite from the urgency of tackling a life-threatening disease while trying to preserve Bailey’s quality of life.
Patients and caregivers coping with a new diagnosis are often connected with patient advocacy groups. In addition to offering significant emotional and educational support, these organizations often establish repositories of patient data, better known as patient registries.
Registries are ambitious undertakings, with the overarching goal of illuminating the natural history of diseases. They are critical resources of clinical information, and also one of the most important ways that patients like Bailey Harris can make a difference in the fight to find cures. Registries collect detailed information about the patient experience including medical histories, test results, and outcomes. In order for registries to fulfill their purpose to improve patient care and ultimately save lives, the data they contain must be optimized, linked with additional information, and shared with physicians and scientists.
Increased attention on patient-centered research, together with recent FDA guidance, is bringing new focus to the role of registries in regulatory decision making. Today we are seeing an expansion of patient registries, often resulting in multiple registries for a single disease or related diseases. Though more data is certainly necessary, the number and diversity of registries being created might actually be exacerbating the problem of healthcare data fragmentation, rather than alleviating it.
For example, the Pulmonary Fibrosis Foundation (PFF) has had a patient registry since 2016, which collects electronic health record data directly from over 60 centers of excellence nationwide. The foundation announced in 2022 that it was starting a second registry, the PFF Community Registry, to complement its existing registry. The original PFF Patient Registry is an example of a “site-led” registry, meaning data comes from the clinical sites that provide patients with their routine care. In contrast, the newer PFF Community Registry is a “patient-powered” or “direct-to-patient” registry, a decentralized way to collect information from a broad range of individuals without needing to work with a health system directly.
The dual-registry paradigm is common among patient advocacy organizations. Another example is the Cure Mito Foundation, which also has two patient registries for Leigh Syndrome: one collects information directly from the patient to understand treatment patterns and quality of life. The second registry collects the complementary medical record information.
Leigh Syndrome is an extremely rare neurometabolic disorder affecting children. To access the electronic medical record information for this type of registry, patients nationwide consent to having their medical records collected, an approach that can be considered a “site-less” registry. Given the small number of impacted patients, it would not be possible to partner with enough health systems to gather sufficient data.
Employing multiple registries helps assemble a cumulative, disease-specific profile that can have maximum utility for researchers if linked correctly.
Medical record information provides detail about the treatment course, medications, lab tests, and clinical evaluation, and the patient-reported data captures nuanced symptoms, outcomes, and quality of life assessments. More information is collected from diverse sources and therefore more hypotheses can be tested by end users.
However, more often than not, these disparate registries cannot be analyzed together, because they are not linked and were not designed for linkage. This technical issue significantly limits the value of what patients and physicians contribute to the registry. The data in these related but distinct datasets should be linked for maximum utility.
The data contained in complete and linked registries could:
The challenge of linking registries needs to be addressed, or these data-rich registries will remain siloed and unused.
Today, patient registries are underutilized by the research community. Academic data scientists, clinical researchers, and pharmaceutical companies need reliable patient data that consistently captures the same information across a wide swath of patient backgrounds. Linking all the individual databases supported by an organization, or linking registries from separate groups, improves both the quantity and quality of the data. However, linking separate registries can be challenging for several reasons:
Linking registries and improving their utility requires data usage rights that encourage collaboration, collection of sufficient information about registry participants, and linkage-ready study protocols.
Technology solutions now exist to link data on a large scale while protecting personally identifiable information (PII). These solutions are called privacy-preserving record linkage (PPRL), also known as “tokenization”.
A token is created from personally identifiable information like first name, last name, date of birth, gender, cell phone number, etc. The token can be used to identify the same individual across two disparate datasets and subsequently connect those data without revealing PII. By creating patient-specific tokens in multiple disparate datasets, patient records can be matched without sharing the underlying PII.
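The idea can be illustrated with a minimal Python sketch. It is not any vendor's actual method: it assumes a token is a keyed hash (HMAC-SHA256) over a few normalized fields, and the secret key, the field choice, the normalization rules, and the names `normalize` and `make_token` are all hypothetical. The example person is fictional.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret key for illustration only. A real deployment would
# manage keys securely and would not hard-code them.
SECRET_KEY = b"example-key-for-illustration-only"


def normalize(value: str) -> str:
    """Lowercase the value, strip accents, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first_name: str, last_name: str, date_of_birth: str,
               gender: str, key: bytes = SECRET_KEY) -> str:
    """Derive a token from PII using a keyed hash.

    Assumes date_of_birth is always supplied in the same format
    (for example YYYY-MM-DD) by every dataset.
    """
    fields = [normalize(first_name), normalize(last_name),
              normalize(date_of_birth), normalize(gender)]
    message = "|".join(fields).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


# Two registries record the same fictional person with different formatting.
token_site = make_token("José", "O'Neil", "2010-04-02", "M")
token_patient = make_token(" jose ", "ONEIL", "2010-04-02", "m")

# The tokens match, so the records can be connected without exchanging
# the name or date of birth themselves.
records_match = token_site == token_patient
```

Because both datasets apply the same normalization and the same key, the same person yields the same token in each, while the token on its own does not expose the name or birth date. Real PPRL systems add further safeguards that this sketch leaves out.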
To take advantage of PPRL and create the tokens necessary to link records, registries should be collecting, at a minimum:
Patients may be reluctant to share this information, or some registries might feel they would get a higher response rate to their surveys if the information is collected anonymously; however, it is short-sighted to create a registry with data that will forever be siloed. Those data will never be accessible to researchers without a strategy to link those registries.
Perhaps the single most challenging obstacle to registry linkage is complex data ownership. Before data can be shared, its owner must be identified, because only the owning individual or organization can agree to share it. Registry formation involves supplying patient information to a third-party platform for storage, analysis, and access control.
As a result, data ownership is not always clear, and the owner of the data may actually be the platform company! Some patient advocacy groups have been surprised to learn that they don’t actually own their data. Imagine that a unique collaboration opportunity presents itself, and the group finds it cannot access the raw data of its own registry to share with another group, or even a single academic researcher, because of pre-existing data ownership rules.
Expecting all collaborators to work on the same data platform isn’t feasible. However, it is important to impress upon data platforms the need to craft business terms that allow and encourage collaboration. Registry founders should also understand their rights to the underlying data and how they can facilitate data sharing. It’s critical to investigate how a platform governs data sharing, and whether a privacy-preserving record linkage methodology, such as tokenization, is being used to link data. If the registry data platform doesn’t provide tokenization as part of its core capabilities, then it is important to know whether the platform allows data to be tokenized by another party.
When founding a registry and working with an Institutional Review Board, it is a good rule of thumb to include language that will:
IRB protocols at “site-led” registries should clearly state that data will be linked when itemizing the patient data to be collected. Security protocols should be adequately described and data sharing goals should be included. IRB protocols for patient-powered registries should also clearly state why certain PII is being collected and state the use of tokenization software to link to complementary registries or other data sources.
These upfront administrative tasks can have a large impact on the opportunities for the registry over the many years of data collection and dispersal.
There are multiple strategies for creating effective registries, but data linkage is central to all of them. Registries need to be built on collaborative platforms, be linkable to external data (especially to other registries), and initiated with the appropriate oversight to allow this type of data sharing collaboration.
Many disease registries are well-established and already provide important sources of support and knowledge. Changes to these existing databases may seem daunting, but linked registry data offers substantial advances over inaccessible and isolated data silos. Ideally, by linking registries, patient data will be more complete and accurate and represent a more diverse set of patients, while also identifying potential patient duplications.
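As a rough illustration of how linkage makes records more complete and surfaces duplicates, the following Python sketch merges records from two registries on a shared token. It assumes each record already carries a token derived from PII as described earlier. The token values, field names, and the functions `link_by_token` and `find_duplicates` are hypothetical, and the sample records are invented.

```python
from collections import Counter


def link_by_token(site_records, patient_records):
    """Merge records that share a token into one profile per token.

    Each record is a dict with a 'token' key. If two records supply the
    same field, the later record's value wins in this simple sketch.
    """
    profiles = {}
    for record in site_records + patient_records:
        profile = profiles.setdefault(record["token"], {})
        for field, value in record.items():
            if field != "token":
                profile[field] = value
    return profiles


def find_duplicates(records):
    """Return tokens that appear more than once within a single registry."""
    counts = Counter(record["token"] for record in records)
    return sorted(token for token, n in counts.items() if n > 1)


# Invented records: a site-led registry and a patient-powered registry.
site_records = [
    {"token": "tok_a", "medication": "drug X"},
    {"token": "tok_b", "medication": "drug Y"},
    {"token": "tok_a", "lab_result": "normal"},  # same person entered twice
]
patient_records = [
    {"token": "tok_a", "quality_of_life": 7},
    {"token": "tok_c", "quality_of_life": 5},
]

linked = link_by_token(site_records, patient_records)
duplicates = find_duplicates(site_records)
```

Here the profile for `tok_a` ends up holding medication, lab, and quality-of-life fields drawn from both registries, and `tok_a` is flagged as appearing twice in the site-led registry. A production pipeline would need explicit rules for conflicting values rather than letting the last record win.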
High-quality linked data that preserves patient privacy can be funneled to scientists in both academic medical centers and pharmaceutical companies and put to use.
By analyzing data from linked patient registries, researchers may be able to develop better treatment strategies for children like Bailey that maximize drug efficacy while minimizing side effects.
Registry linkage is a key strategy for improving the way we do research and the impact of that research. The information currently locked away behind siloed registry doors should instead flow into the research ecosystem where it can benefit everyone.
AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its SDOH datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).
AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.
“Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.
“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance.”
As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.
As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences organizations can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.
Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.
Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:
Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.
Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH, partnering with Papa, to combat loneliness and isolation in older adults, families, and other vulnerable populations. CDPHP aimed to address:
By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.
Value-based care organizations face challenges in fully understanding their patient panels. SDOH data helps providers address these challenges and improve patient care. Here are some examples of how:
By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.
While accessing SDOH data offers significant advantages, challenges can arise from:
To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.
With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.