By S. Robicheau and T.S.K. Eisinger-Mathason
When Bailey Harris was diagnosed with rhabdomyosarcoma, a rare but aggressive pediatric cancer, her family was nearly paralyzed with fear and confusion. They were immediately thrown into a new world of doctors and treatments, with no respite from the urgency of tackling a life-threatening disease while trying to preserve Bailey’s quality of life.
Patients and caregivers coping with a new diagnosis are often connected with patient advocacy groups. In addition to offering significant emotional and educational support, these organizations often establish repositories of patient data, better known as patient registries.
Registries are ambitious undertakings, with the overarching goal of illuminating the natural history of diseases. They are critical resources of clinical information, and also one of the most important ways that patients like Bailey Harris can make a difference in the fight to find cures. Registries collect detailed information about the patient experience including medical histories, test results, and outcomes. In order for registries to fulfill their purpose to improve patient care and ultimately save lives, the data they contain must be optimized, linked with additional information, and shared with physicians and scientists.
Increased attention on patient-centered research, together with recent FDA guidance, is bringing new focus to the role of registries in regulatory decision making. Today we are seeing an expansion of patient registries, often resulting in multiple registries for a single disease or related diseases. Though more data is certainly necessary, the number and diversity of registries being created might actually be exacerbating the problem of healthcare data fragmentation, rather than alleviating it.
For example, the Pulmonary Fibrosis Foundation (PFF) has had a patient registry since 2016, which collects electronic health record data directly from over 60 centers of excellence nationwide. This foundation announced in 2022 that they were starting a second registry, a PFF Community Registry, that would complement their existing registry. The original PFF Patient Registry is an example of a “site-led” registry, meaning data comes from the clinical sites that provide patients with their routine care. In contrast, the newer PFF Community Registry is a “patient-powered” registry or “direct-to-patient” registry, a decentralized way to collect information from a broad range of individuals without needing to work with a health system directly.
The dual-registry paradigm is common among patient advocacy organizations. Another example is the Cure Mito Foundation, which also has two patient registries for Leigh Syndrome. The first collects information directly from the patient to understand treatment patterns and quality of life. The second collects the complementary medical record information.
Leigh Syndrome is an extremely rare neurometabolic disorder affecting children. To access electronic medical record information for this type of registry, patients nationwide consent to having their medical records collected, an approach that can be considered a “site-less” registry. Given the small number of affected patients, it would not be possible to partner with enough health systems to gather sufficient data.
Employing multiple registries helps assemble a cumulative, disease-specific profile that can have maximum utility for researchers if linked correctly.
Medical record information provides detail about the treatment course, medications, lab tests, and clinical evaluation, and the patient-reported data captures nuanced symptoms, outcomes, and quality of life assessments. More information is collected from diverse sources and therefore more hypotheses can be tested by end users.
However, more often than not, these disparate registries cannot be analyzed together, because they are not linked and were not designed for linkage. This technical issue significantly limits the value of what patients and physicians contribute to the registry. The data in these related but distinct datasets should be linked for maximum utility.
The data contained in complete and linked registries could:
The challenge of linking registries needs to be addressed, or these data-rich registries will remain siloed and unused.
Today, patient registries are underutilized by the research community. Academic data scientists, clinical researchers, and pharmaceutical companies need reliable patient data in which the same information is curated consistently across a wide swath of patient backgrounds. Optimizing the data in patient registries, either by linking all the individual databases supported by an organization or by linking registries from separate groups, improves both the quantity and the quality of data. However, linking separate registries can be challenging for several reasons:
Linking registries and improving their utility requires data usage rights that encourage collaboration, collection of sufficient information about registry participants, and linkage-ready study protocols.
Technology solutions now exist to link data on a large scale while protecting personally identifiable information (PII). These solutions are called privacy-preserving record linkage (PPRL), also known as “tokenization.”
A token is created from personally identifiable information like first name, last name, date of birth, gender, cell phone number, etc. The token can be used to identify the same individual across two disparate datasets and subsequently connect those data without revealing PII. By creating patient-specific tokens in multiple disparate datasets, patient records can be matched without sharing the underlying PII.
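The idea can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) over normalized identifiers and a secret key shared by the parties doing the linkage. This is one common way to build such tokens, not a description of any particular vendor's method, and the function names, field names, key, and sample people below are all hypothetical.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret for illustration only; a real deployment would
# generate, store, and rotate keys under strict access control.
SECRET_KEY = b"example-key-not-for-production"


def normalize(value: str) -> str:
    """Lowercase the value and drop accents, spaces, and punctuation,
    so trivial formatting differences do not change the token."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first_name, last_name, date_of_birth, gender, key=SECRET_KEY):
    """Derive a token from PII with a keyed hash (an assumed scheme).
    The token cannot be read back into the original identifiers."""
    fields = [first_name, last_name, date_of_birth, gender]
    message = "|".join(normalize(f) for f in fields).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def link_by_token(registry_a, registry_b):
    """Pair up records from two datasets that carry the same token."""
    index_b = {rec["token"]: rec for rec in registry_b}
    return [(a, index_b[a["token"]]) for a in registry_a if a["token"] in index_b]


# Each registry tokenizes its own records and shares only tokens plus
# de-identified fields. The people and values here are fictional.
site_registry = [
    {"token": make_token("Zoë", "O'Brien", "2010-04-01", "F"), "diagnosis_year": 2021},
]
patient_registry = [
    {"token": make_token("zoe", "OBrien", "2010/04/01", "f"), "quality_of_life": 7},
    {"token": make_token("Sam", "Lee", "2012-09-15", "M"), "quality_of_life": 5},
]
linked = link_by_token(site_registry, patient_registry)
```

In this sketch the two spellings of the same fictional patient produce the same token, so their clinical and patient-reported records are paired without either registry exchanging names or birth dates. Exact matching on a hash is brittle to typos and name changes, which is why production systems typically generate several tokens per person from different field combinations.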
To take advantage of PPRL and create the tokens necessary to link records, registries should be collecting, at a minimum:
Patients may be reluctant to share this information, or some registries might feel they would get a higher response rate to their surveys if the information is collected anonymously; however, it is short-sighted to create a registry with data that will forever be siloed. Those data will never be accessible to researchers without a strategy to link those registries.
Perhaps the single most challenging obstacle to registry linkage is complex data ownership. Before data can be shared, its owner must be identified, because only the owning individual or organization can agree to share it. Registry formation involves supplying patient information to a third-party platform for storage, analysis, and access control.
As a result, data ownership is not always clear, and the owner of the data may actually be the platform company! Some patient advocacy groups have been surprised to learn that they don’t actually own their data. Imagine that a unique collaboration opportunity presents itself, and the group finds it cannot access the raw data of its own registry to share with another group, or even a single academic researcher, because of pre-existing data ownership rules.
Expecting all collaborators to work on the same data platform isn’t feasible. However, it is important to impress upon data platforms the need to craft business terms that allow and encourage collaboration. Registry founders should also understand their rights to the underlying data, and how they can facilitate data sharing. It’s critical to investigate how a platform governs data sharing, and whether a privacy-preserving record linkage methodology, such as tokenization, is being used to link data. If the registry data platform doesn’t provide tokenization as part of its core capabilities, then it is important to know whether the platform allows data to be tokenized by another party.
When founding a registry and working with an Institutional Review Board, it is a good rule of thumb to include language that will:
IRB protocols at “site-led” registries should clearly state that data will be linked when itemizing the patient data to be collected. Security protocols should be adequately described and data sharing goals should be included. IRB protocols for patient-powered registries should also clearly state why certain PII is being collected and state the use of tokenization software to link to complementary registries or other data sources.
These upfront administrative tasks can have a large impact on the opportunities for the registry over the many years of data collection and dispersal.
There are multiple strategies for creating effective registries, but data linkage is central to all of them. Registries need to be built on collaborative platforms, be linkable to external data (especially to other registries), and be initiated with the appropriate oversight to allow this type of data-sharing collaboration.
Many disease registries are well-established and already provide important sources of support and knowledge. Changes to these existing databases may seem daunting, but linked registry data offers substantial advances over inaccessible and isolated data silos. Ideally, by linking registries, patient data will be more complete and accurate and represent a more diverse set of patients, while also identifying potential patient duplications.
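As a rough illustration of how linkage yields more complete profiles and surfaces possible duplicates, the sketch below assumes each record already carries a token produced by a PPRL process. The `token` field and the other field names, the function names, and the sample records are hypothetical.

```python
from collections import defaultdict


def build_linked_profiles(site_records, patient_records):
    """Group records from a site-led and a patient-powered registry by
    token, so each profile holds both kinds of data for one individual."""
    profiles = defaultdict(lambda: {"clinical": [], "patient_reported": []})
    for rec in site_records:
        profiles[rec["token"]]["clinical"].append(rec)
    for rec in patient_records:
        profiles[rec["token"]]["patient_reported"].append(rec)
    return dict(profiles)


def find_possible_duplicates(profiles):
    """Return tokens that appear more than once within the same registry,
    which suggests the same patient may have been enrolled twice."""
    return sorted(
        token
        for token, profile in profiles.items()
        if len(profile["clinical"]) > 1 or len(profile["patient_reported"]) > 1
    )


# Fictional, already-tokenized records (tokens shortened for readability).
site_records = [
    {"token": "a1", "medication": "drug X"},
    {"token": "a1", "medication": "drug X"},  # same patient seen at two sites
    {"token": "b2", "medication": "drug Y"},
]
patient_records = [
    {"token": "a1", "fatigue_score": 3},
    {"token": "c3", "fatigue_score": 6},
]

profiles = build_linked_profiles(site_records, patient_records)
possible_duplicates = find_possible_duplicates(profiles)
```

Here the profile for token `a1` combines clinical and patient-reported data, and the same token is flagged for review because it appears twice in the site-led registry. A flagged token is only a prompt for review, since whether two records are true duplicates depends on each registry's enrollment rules.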
High-quality linked data that preserves patient privacy can be funneled to scientists in both academic medical centers and pharmaceutical companies to be put to use.
By analyzing data from linked patient registries, researchers may be able to develop better treatment strategies for children like Bailey that maximize drug efficacy while minimizing side effects.
Registry linkage is a key strategy for improving the way we do research and the impact of that research. The information currently locked away behind siloed registry doors should instead flow into the research ecosystem where it can benefit everyone.