Datavant is proud to announce the integration of a new referential dataset into Datavant Match. Datavant Match is the industry-leading privacy-preserving record linkage (PPRL) solution for identity resolution. It addresses data fragmentation with end-to-end privacy technology and advanced machine learning algorithms. Whether you’re running a long-term safety study, public health research on health equity, or a medication adherence analysis, highly accurate matching is critical. Referential data improves both the quality and the number of matches in a linked dataset, and it enables reliable identity resolution between disparate datasets across the enterprise.
Real-world data is highly variable: data is collected and standardized in different ways, and personal identifiers naturally change due to life events. These challenges are important to solve because the quality of data linkage can determine the success or failure of entire studies. Organizations often struggle to build a comprehensive matching strategy. Simple algorithms produce many false-positive and false-negative matches, while complex strategies are difficult to scale. On top of that, records must be linked in a way that preserves privacy.
Datavant has built expertise in data standardization, high-accuracy matching, and patient privacy to solve these problems. Our customers can use the highly stable identifier that Match generates, the Datavant ID (DVID), as the source of truth that defines which records across disparate datasets correspond to the same individual. The DVID simplifies enterprise-wide identity resolution by tagging each record with a person-level identifier, so records that belong to the same individual share the same DVID.
DVIDs are optimized to remain stable, so customers do not have to update them frequently, which reduces manual work. Whenever Match encounters an individual it has already detected, it assigns that individual the same DVID. After a one-time setup, Match runs automatically when it sees new records, assigns DVIDs, and distributes those DVIDs at standard intervals.
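The stability property described above can be pictured with a small sketch. This is not Datavant's implementation: the `StableIdRegistry` class, the `identity_key` argument, and the `id-` prefix are all hypothetical names chosen for illustration, and the sketch assumes that some upstream matching step has already resolved each record to an identity key.

```python
import uuid


class StableIdRegistry:
    """Toy registry (hypothetical): an identity keeps the ID it was first given."""

    def __init__(self):
        self._ids = {}  # identity_key -> previously assigned identifier

    def assign(self, identity_key):
        # Reuse the existing identifier if this identity was seen before;
        # otherwise mint a new one and remember it for later runs.
        if identity_key not in self._ids:
            self._ids[identity_key] = "id-" + uuid.uuid4().hex[:12]
        return self._ids[identity_key]


registry = StableIdRegistry()
first_run = registry.assign("identity-001")
later_run = registry.assign("identity-001")   # same individual, new batch
other_person = registry.assign("identity-002")
```

In this sketch `first_run` and `later_run` are the same value, while `other_person` receives a different identifier, which is the behavior that lets downstream datasets avoid re-keying.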
Referential data is a term used to describe a high-quality, comprehensive dataset that can be used to make inferences about other datasets. In this case, Match leverages a dataset with 30+ years of historical information, covering the entire United States population and containing billions of records. Using public records, online data, and proprietary sources alongside rigorous data hygiene practices, this curated dataset tracks changes in personal identifiers for individuals across time. Datavant Match maps tokenized records to tokenized identities in the referential dataset, linking records and assigning DVIDs with high accuracy.
Most matching strategies struggle to link records when the underlying personally identifiable information (PII) is completely different (e.g., a last name changes through marriage, an address changes, or a recorded gender changes). With referential data, however, Match can make probabilistic inferences to accurately link records even when the underlying PII has changed. Previously, high-precision matching may have meant far fewer matches. Referential data mitigates this trade-off by tracking personal identifiers over time, so that as many records as possible can be linked and cohorts are maximized. The result is that Match uncovers more matches at high accuracy.
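To make the idea concrete, here is a minimal sketch of how a referential index can link two records whose PII differs. It is an illustration under stated assumptions, not Datavant's tokenization or matching logic: the `tokenize` and `link` functions, the `SECRET_KEY`, the choice of fields, and the sample people are all hypothetical, and the lookup is a deterministic exact match rather than the probabilistic inference that Match performs.

```python
import hashlib
import hmac

# Hypothetical demo key; a real system would manage keys securely.
SECRET_KEY = b"demo-key-not-for-production"


def tokenize(first, last, dob):
    """Return a keyed hash of normalized identifiers (illustrative only)."""
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical referential index: every historical token for a person
# points to the same identity, so old and new names resolve together.
referential_index = {
    tokenize("Maria", "Lopez", "1985-03-14"): "identity-001",   # earlier surname
    tokenize("Maria", "Nguyen", "1985-03-14"): "identity-001",  # later surname
    tokenize("James", "Carter", "1970-11-02"): "identity-002",
}


def link(record):
    """Look up a record's token; return the mapped identity, or None."""
    return referential_index.get(tokenize(*record))


claims_record = ("MARIA", "Lopez ", "1985-03-14")   # older dataset
ehr_record = ("maria", "nguyen", "1985-03-14")      # newer dataset
unknown_record = ("Alex", "Smith", "1990-01-01")    # not in the index

same_person = link(claims_record) == link(ehr_record)
```

Only tokens are compared here, never the raw names, and the two records resolve to the same identity because the index holds both historical variants. A direct comparison of the two records' tokens would not match.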
Match drives high accuracy in data linkage, achieving 99% precision and 95% recall in internal studies. Use cases such as external control arm development and real-world evidence generation for drug efficacy depend on this level of accuracy: false-positive matches result in inaccurate patient tracking, jeopardizing the quality of the study. Achieving both high precision and high recall means there is less of a trade-off between match quality and match volume. With Match, pharmaceutical companies are better positioned to find an adequate number of patients for rare disease studies; providers can more accurately track the effect of interventions on their patient populations; and researchers can confidently conduct longitudinal studies, mitigating the risk of dropping patients due to the variable, fragmented nature of real-world data. Meanwhile, organizations simplify their datasets by using the DVID as the source-of-truth identifier for individuals. Leveraging the referential dataset, Datavant Match drives high-quality patient matching and innovation across siloed verticals in healthcare.
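For readers less familiar with these metrics: precision is the share of reported matches that are correct, TP / (TP + FP), and recall is the share of true matches that were found, TP / (TP + FN). The short sketch below shows the arithmetic. The counts are made-up round numbers chosen only so that the result lands on 99% and 95%; they are not figures from Datavant's internal studies, and the function name is our own.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute (precision, recall) from match counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


# Hypothetical counts for illustration only.
true_pos = 9405   # record pairs correctly linked
false_pos = 95    # pairs linked that belong to different people
false_neg = 495   # true pairs that the matcher missed

precision, recall = precision_recall(true_pos, false_pos, false_neg)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

With these illustrative counts, 95 of 9,500 reported links are wrong and 495 of 9,900 true links are missed, which is the scale of error the two percentages describe.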
Datavant Match goes beyond record matching for isolated use cases: it also enables reliable identity resolution between disparate datasets across the enterprise. For organizations with many identified datasets, sharing across the enterprise is often not possible due to privacy and security regulations. Even when data can be shared, creating an identified matching strategy is difficult if certain datasets have incomplete or disjointed PII fields. The result is data silos within organizations. By tokenizing and matching these data through Match, organizations remove the ambiguity from matching and create unified enterprise data assets. Using a highly stable identifier (the DVID) across all datasets leads to a more thorough understanding of populations of interest, unlocking new trial participants, opportunities for targeted marketing campaigns, longitudinal tracking, and cross-disciplinary research.
With referential data, advanced machine learning, and proprietary data standardization and cleaning methods, Datavant Match immediately adds value by surfacing more, higher-quality matches. Datavant Match is used by top pharma companies, data analytics companies, employers, non-profit and academic institutions, payers, and providers to compliantly match patient records across a range of use cases.
To explore how Match could power your use case or enterprise-wide data interoperability goals, contact our team today.