Creating Usable Linked Patient Registries

Datavant

December 5, 2022

By S. Robicheau and T.S.K. Eisinger-Mathason

When Bailey Harris was diagnosed with rhabdomyosarcoma, a rare but aggressive pediatric cancer, her family was nearly paralyzed with fear and confusion. They were immediately thrown into a new world of doctors and treatments, with no respite from the urgency of tackling a life-threatening disease while trying to preserve Bailey’s quality of life.

Patients and caregivers coping with a new diagnosis are often connected with patient advocacy groups. In addition to offering significant emotional and educational support, these organizations often establish repositories of patient data, better known as patient registries.

Registries are ambitious undertakings, with the overarching goal of illuminating the natural history of diseases. They are critical sources of clinical information and one of the most important ways that patients like Bailey Harris can make a difference in the fight to find cures. Registries collect detailed information about the patient experience, including medical histories, test results, and outcomes. For registries to fulfill their purpose of improving patient care and ultimately saving lives, the data they contain must be optimized, linked with additional information, and shared with physicians and scientists.

Increased attention to patient-centered research, together with recent FDA guidance, is bringing new focus to the role of registries in regulatory decision-making. Today we are seeing an expansion of patient registries, often resulting in multiple registries for a single disease or for related diseases. Although more data is certainly needed, the number and diversity of registries being created might actually be exacerbating healthcare data fragmentation rather than alleviating it.

Example: Pulmonary Fibrosis Foundation

For example, the Pulmonary Fibrosis Foundation (PFF) has maintained a patient registry since 2016 that collects electronic health record data directly from over 60 centers of excellence nationwide. In 2022, the foundation announced a second registry, the PFF Community Registry, to complement the existing one. The original PFF Patient Registry is an example of a “site-led” registry, meaning its data comes from the clinical sites that provide patients with their routine care. In contrast, the newer PFF Community Registry is a “patient-powered” or “direct-to-patient” registry: a decentralized way to collect information from a broad range of individuals without working with a health system directly.

Example: Cure Mito Foundation

The dual-registry paradigm is common among patient advocacy organizations. The Cure Mito Foundation, for example, also has two patient registries for Leigh Syndrome: one collects information directly from patients to understand treatment patterns and quality of life, and the other collects complementary medical record information.

Leigh Syndrome is an extremely rare neurometabolic disorder affecting children. Because so few patients are affected, it would not be possible to partner with enough health systems to gather sufficient data. Instead, to populate this registry with electronic medical record information, patients nationwide consent to having their medical records collected, an approach that can be considered a “site-less” registry.

Linking Patient Registries Enhances Utility

Employing multiple registries helps assemble a cumulative, disease-specific profile that can have maximum utility for researchers if the registries are linked correctly.

Medical record information provides detail about the treatment course, medications, lab tests, and clinical evaluations, while patient-reported data captures nuanced symptoms, outcomes, and quality-of-life assessments. Because more information is collected from diverse sources, end users can test more hypotheses.

However, more often than not, these disparate registries cannot be analyzed together because they are not linked and were not designed for linkage. This technical issue significantly limits the value of what patients and physicians contribute to the registry. The data in these related but distinct datasets should be linked for maximum utility.

The data contained in complete and linked registries could:

Improve treatment and diagnostic strategies for the future
Increase the diversity of the cohort and enhance the depth of the data collected
Determine whether to invest in a particular disease area
Identify medical tests that reveal new information about a disease and its treatment and should therefore be administered routinely

The challenge of linking registries needs to be addressed, or these data-rich registries will remain siloed and unused.

Figure 1: Linking a patient-powered registry to a site-led registry and other real-world data can lead to an improved understanding of the natural history of disease and even better treatments

Challenges To Linking Patient Registries

Today, patient registries are underutilized by the research community. Academic data scientists, clinical researchers, and pharmaceutical companies need reliable patient data in which the same information is curated consistently across a wide swath of patient backgrounds. Linking all the individual databases supported by one organization, or linking registries from separate groups, improves both the quantity and the quality of registry data. However, linking separate registries can be challenging for several reasons:

Patient demographic information: Registries may intentionally omit the patient-identifying information needed to link data across different registries.
Data ownership: Registries are built on different data platforms: for-profit, academic, or not-for-profit. Each platform has different data ownership agreements in place for the use of raw data. When patients and advocacy organizations lose ownership of their data, it can stifle data sharing and innovation.
IRB protocols: Data collected from clinical research sites may be governed by an institutional review board (IRB) protocol. These protocols should include language stating why patient-identifying information will be collected and that the data may be linked in the future using privacy-preserving record linkage.

To link registries and improve their utility, organizations must ensure that data usage rights encourage collaboration, collect sufficient information about registry participants, and implement linkage-ready study protocols.

Patient Demographic Information

Technology solutions now exist to link data on a large scale while protecting personally identifiable information (PII). These solutions are called privacy-preserving record linkage (PPRL), also known as “tokenization”.

A token is created from personally identifiable information such as first name, last name, date of birth, gender, and cell phone number. When tokens are created from the same patient’s PII in two disparate datasets, they can be used to identify that individual in both datasets and connect the records, without revealing or sharing the underlying PII.
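To make the idea concrete, here is a minimal Python sketch of deterministic tokenization and linkage. It is not Datavant’s algorithm; it assumes a keyed hash (HMAC-SHA256) over normalized first name, last name, date of birth, and gender, and the key, field names, and sample records are all hypothetical. Production PPRL tools typically layer on safeguards, such as managed keys and controlled token exchange, that this sketch omits.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical shared key. In practice the key would be managed by the
# tokenization provider, never hard-coded or shared with data recipients.
TOKEN_KEY = b"hypothetical-shared-key"


def normalize(value: str) -> str:
    """Strip accents, uppercase, and keep only letters and digits."""
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    return "".join(ch for ch in ascii_text.upper() if ch.isalnum())


def make_token(first: str, last: str, dob: str, gender: str) -> str:
    """Derive a keyed, one-way token from normalized identifiers."""
    message = "|".join(normalize(v) for v in (first, last, dob, gender))
    return hmac.new(TOKEN_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records: list[dict]) -> dict[str, dict]:
    """Replace the PII in each record with a token; keep only the data payload."""
    return {
        make_token(r["first"], r["last"], r["dob"], r["gender"]): r["data"]
        for r in records
    }


# Hypothetical records for one fictional patient, formatted differently in each registry.
site_led = tokenize([
    {"first": "Jane", "last": "Doe", "dob": "2015-03-02", "gender": "F",
     "data": {"diagnosis": "rhabdomyosarcoma"}},
])
patient_powered = tokenize([
    {"first": " jane ", "last": "DOE", "dob": "2015-03-02", "gender": "f",
     "data": {"quality_of_life_score": 7}},
])

# Records with matching tokens are treated as the same person; neither
# tokenized dataset contains names, dates of birth, or gender.
linked = {
    token: {**site_led[token], **patient_powered[token]}
    for token in site_led.keys() & patient_powered.keys()
}
print(list(linked.values()))
# [{'diagnosis': 'rhabdomyosarcoma', 'quality_of_life_score': 7}]
```

Normalizing before hashing is what lets “ jane ” and “Jane” produce the same token. Keying each dataset by token also means a patient enrolled twice in one registry collapses into a single entry; a real pipeline would handle such duplicates explicitly.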

To take advantage of PPRL and create the tokens necessary to link records, registries should be collecting, at a minimum:

First name
Last name
Date of birth
Zip code
Gender
Address
Cell phone
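Collecting several identifiers lets a linkage process build more than one token per person, so a match can still be found when one field is missing or has changed (for example, a new phone number or a nickname on a survey). The Python sketch below assumes three hypothetical token types built from different field combinations and reports which token types agree between two records. The combinations, key, and records are illustrative assumptions, not a published standard; real PPRL tools apply their own token definitions and matching rules.

```python
import hashlib
import hmac

# Hypothetical key; a real deployment would manage this securely.
TOKEN_KEY = b"hypothetical-shared-key"

# Illustrative field combinations; each combination yields one token type.
TOKEN_SCHEMES = {
    "name_dob_gender": ("first", "last", "dob", "gender"),
    "last_dob_zip": ("last", "dob", "zip"),
    "phone": ("cell_phone",),
}


def clean(value: str) -> str:
    """Uppercase and keep only letters and digits."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def make_tokens(record: dict) -> dict[str, str]:
    """Build every token type whose fields are all present in the record."""
    tokens = {}
    for name, fields in TOKEN_SCHEMES.items():
        values = [clean(record.get(f) or "") for f in fields]
        if all(values):  # skip a token type if any of its fields is missing
            message = name + "|" + "|".join(values)
            tokens[name] = hmac.new(
                TOKEN_KEY, message.encode("utf-8"), hashlib.sha256
            ).hexdigest()
    return tokens


def candidate_match(a: dict, b: dict) -> list[str]:
    """Return the token types on which two records agree."""
    ta, tb = make_tokens(a), make_tokens(b)
    return sorted(name for name in ta.keys() & tb.keys() if ta[name] == tb[name])


# Hypothetical records: the registry lacks a phone number and the survey
# respondent used a nickname, but last name, DOB, and zip code still agree.
registry = {"first": "Jane", "last": "Doe", "dob": "2015-03-02",
            "gender": "F", "zip": "02139"}
survey = {"first": "Janie", "last": "Doe", "dob": "2015-03-02",
          "gender": "F", "zip": "02139", "cell_phone": "555-0100"}

print(candidate_match(registry, survey))  # ['last_dob_zip']
```

With only name, date of birth, and gender, these two records would not link at all; the extra zip code field is what recovers the match.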

Patients may be reluctant to share this information, and some registries may expect a higher survey response rate if information is collected anonymously. However, it is short-sighted to create a registry whose data will be siloed forever: without a strategy for linking registries, those data will never be accessible to researchers.

Data Ownership

Perhaps the single most challenging obstacle to registry linkage is complex data ownership. Before data can be shared, its owner must be identified, because only that individual or organization can agree to share it. Forming a registry typically involves supplying patient information to a third-party platform for storage, analysis, and access control.

As a result, data ownership is not always clear, and the owner of the data may actually be the platform company! Some patient advocacy groups have been surprised to learn that they don’t actually own their data. Imagine that a unique collaboration opportunity presents itself, and the group finds it cannot access the raw data in its own registry to share with another group, or even with a single academic researcher, because of pre-existing data ownership rules.

Expecting all collaborators to work on the same data platform isn’t feasible. However, it is important to impress upon data platforms the need to craft business terms that allow and encourage collaboration. Registry founders should also understand their rights to the underlying data and how they can facilitate data sharing. It’s critical to investigate how a platform governs data sharing and whether a privacy-preserving record linkage methodology, such as tokenization, is being used to link data. If the registry data platform doesn’t provide tokenization as part of its core capabilities, it is important to know whether the platform allows the data to be tokenized by another party.

IRB Protocol Updates And Informed Consent

When founding a registry and working with an Institutional Review Board, it is a good rule of thumb to include language that will:

Permit the collection of personally identifiable information (PII), and
Further allow the records of individual participants within a registry to be tokenized using best-in-class privacy-preserving record linkage tools.

IRB protocols for “site-led” registries should clearly state, when itemizing the patient data to be collected, that the data will be linked. Security protocols should be adequately described, and data sharing goals should be included. IRB protocols for patient-powered registries should also clearly state why certain PII is being collected and disclose the use of tokenization software to link to complementary registries or other data sources.

These upfront administrative tasks can have a large impact on the registry’s opportunities over its many years of data collection and dissemination.

The Benefits Of Registry Linkage

There are multiple strategies for creating effective registries, but data linkage is central to all of them. Registries need to be built on collaborative platforms, be linkable to external data (especially to other registries), and be initiated with the appropriate oversight to allow this kind of data sharing collaboration.

Many disease registries are well-established and already provide important sources of support and knowledge. Changes to these existing databases may seem daunting, but linked registry data offers substantial advantages over inaccessible, isolated data silos. Ideally, linking registries makes patient data more complete and accurate and more representative of a diverse set of patients, while also identifying potential duplicate patient records.
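Once records carry tokens, spotting likely duplicates largely becomes a counting exercise. This hypothetical Python sketch assumes each registry has already been reduced to a list of token strings (the short placeholders stand in for real hashed tokens). It counts distinct patients across registries, flags tokens that appear more than once within a single registry, and lists patients enrolled in more than one registry so they are not double-counted when the data is pooled.

```python
from collections import Counter

# Hypothetical placeholder tokens standing in for real hashed tokens.
registries = {
    "site_led": ["tok_a", "tok_b", "tok_c", "tok_b"],
    "patient_powered": ["tok_b", "tok_d", "tok_a"],
}


def duplicate_report(registries: dict[str, list[str]]) -> dict:
    """Summarize within-registry duplicates and cross-registry overlap."""
    within = {
        name: sorted(t for t, n in Counter(tokens).items() if n > 1)
        for name, tokens in registries.items()
    }
    # Count how many registries each token appears in (once per registry).
    presence = Counter(t for tokens in registries.values() for t in set(tokens))
    overlap = sorted(t for t, n in presence.items() if n > 1)
    return {
        "total_records": sum(len(tokens) for tokens in registries.values()),
        "unique_patients": len(presence),
        "within_registry_duplicates": within,
        "in_multiple_registries": overlap,
    }


report = duplicate_report(registries)
print(report["total_records"], report["unique_patients"])  # 7 4
print(report["in_multiple_registries"])  # ['tok_a', 'tok_b']
```

Seven records shrink to four distinct patients, and the two patients found in both registries are the ones whose site-led and patient-reported data can be combined.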

High-quality linked data that preserves patient privacy can be funneled to scientists in both academic medical centers and pharmaceutical companies to be put to use.

Industry partners engaged in clinical trial work can use linked registries to collect critical information about their enrolled patients and study how patient histories and characteristics might affect their response to experimental treatments. They can also link patient-reported outcomes (PROs) contained in registries to health insurance claims datasets to further profile disease progression and response to their products.
Academic scientists can use the same patient datasets to generate hypotheses about the root causes of certain diseases and the environmental factors that limit or exacerbate symptoms in a given disease context.
Physicians can use these data to optimize treatment for patients like Bailey Harris.

By analyzing data from linked patient registries, researchers may be able to develop better treatment strategies for children like Bailey that maximize drug efficacy while minimizing side effects.

Registry linkage is a key strategy for improving the way we do research and the impact of that research. The information currently locked away behind siloed registry doors should instead flow into the research ecosystem where it can benefit everyone.

Spotlight on AnalyticsIQ: Privacy Leadership in State De-Identification

AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its social determinants of health (SDOH) datasets. By undergoing this privacy analysis before linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).

AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.

“Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.

“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance.”

Building Trust in Privacy-Preserving Data Ecosystems

As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.

As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences companies can gain a more comprehensive view of patient populations, uncover social factors that affect health outcomes, and ultimately guide clinical research that improves health.

The Power of SDOH Data with Providers and Payers to Close Gaps in Care

Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.

Payers Deploy Targeted Care Using SDOH Data

Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:

Tailored Member Programs: Payers develop specialized initiatives like nutrition delivery services and transportation to and from medical appointments.
Identifying Care Gaps: SDOH data helps payers identify gaps in care for underserved communities, enabling strategic in-home assessments and interventions.
Future Risk Adjustment Models: The Centers for Medicare & Medicaid Services (CMS) plans to incorporate SDOH-related Z codes into risk adjustment models, recognizing the significance of SDOH data in assessing healthcare needs.
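As a simple illustration of working with these codes, the Python sketch below flags diagnosis codes in the ICD-10-CM Z55-Z65 categories, which cover persons with potential health hazards related to socioeconomic and psychosocial circumstances. The claims structure and member IDs are hypothetical, and this is only a screening filter, not a CMS risk adjustment model.

```python
import re

# ICD-10-CM categories Z55 through Z65 cover potential health hazards related
# to socioeconomic and psychosocial circumstances (SDOH-related Z codes).
SDOH_Z_CODE = re.compile(r"^Z(5[5-9]|6[0-5])")


def sdoh_codes(diagnoses: list[str]) -> list[str]:
    """Return the diagnosis codes that fall in the Z55-Z65 categories."""
    return [
        code for code in diagnoses
        if SDOH_Z_CODE.match(code.replace(".", "").strip().upper())
    ]


# Hypothetical claims keyed by made-up member IDs.
claims = {
    "member_001": ["E11.9", "Z56.0"],
    "member_002": ["I10", "Z00.00"],
    "member_003": ["z65.9"],
}

# Keep only members with at least one SDOH-related Z code.
flagged = {member: sdoh_codes(dx) for member, dx in claims.items() if sdoh_codes(dx)}
print(flagged)  # {'member_001': ['Z56.0'], 'member_003': ['z65.9']}
```

Matching on the three-character category, after removing the dot and normalizing case, keeps the filter robust to how different claims feeds format the same code.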

Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.

Example: CDPHP supports physical and mental wellbeing with non-medical assistance

Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH into its programs, partnering with Papa to combat loneliness and isolation in older adults, families, and other vulnerable populations. CDPHP aimed to address:

Social isolation
Loneliness
Transportation barriers
Gaps in care

By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.

Providers Optimize Value-Based Care Using SDOH Data

Value-based care organizations face challenges in fully understanding their patient panels. SDOH data can significantly help providers address these challenges and improve patient care. Here are some examples of how:

Onboard Patients Into Care Programs: Providers use SDOH data to identify patients who require additional support and connect them with appropriate resources.
Stratify Patients by Risk: SDOH data combined with clinical information identifies high-risk patients, enabling targeted interventions and resource allocation.
Manage Transition of Care: SDOH data informs post-discharge plans, considering social factors to support smoother transitions and reduce readmissions.

By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.

While accessing SDOH data offers significant advantages, challenges can arise from:

Lack of Interoperability and Uniformity: Data exists in fragmented sources like electronic health records (EHRs), public health databases, social service systems, and proprietary databases. Integrating and securing data while ensuring data integrity and confidentiality can be complex, resource-intensive, and risky.
Lag in Payer Claims Data: Payers can take weeks or months to release claims data. This delays informed decision-making, care improvement, analysis, and performance evaluation.
Incomplete Data Sets in Health Information Exchanges (HIEs): Not all healthcare providers or organizations participate in HIEs. This reduces the available data pool. Moreover, varying data sharing policies result in data gaps or inconsistencies.

To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.

SDOH data holds immense potential in transforming healthcare and addressing health disparities.

With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.