18 JANUARY 2022

Health Data Trends Part II: New Data Types in the Datavant Ecosystem


In my first post, I reviewed the most common questions clients attempt to answer with health data. This time, I’ll review new data available through Datavant’s ecosystem partners and its value in answering specific questions. As partners continue to join our ecosystem, there are more opportunities for health systems, insurers and biopharma companies to connect data that completes the picture of patient health.

1. Electronic Health Records (EHRs)

After health insurance claims, EHR data accounts for the largest number of de-identified patient records in the Datavant ecosystem. Last year, we added 17 EHR data partners. As a result, clients can now connect to EHR data on over 300 million de-identified patients. The quality of EHR data has also improved over the last several years. While most EHR partners standardize structured data from the EHR system, some also abstract tailored concepts from clinical notes for CMS reporting and FDA submissions. Some EHRs focus on specific disease areas such as oncology, rheumatology, mental health, women’s health, and dermatology. They capture disease-specific variables that make them more relevant for answering questions related to those conditions.

2. Health Systems

In addition to new EHR data partners, the Datavant ecosystem added thousands of health system relationships last year, enabling full medical record retrieval with patient consent across 2,000 hospitals and 15,000 clinics. Providers and health insurers already use this capability for compliant health data exchange. We see new use cases in the life sciences community for complete medical records access, which we believe will transform first-party clinical research by vastly expanding the amount of data available on each trial participant. For instance, a sponsor could conduct long-term follow-up after a trial or supplement trial data to understand adverse events, super-responders or non-responders. Full medical records differ from traditional EHR data sets, which are typically a structured subset of data fields from the EHR. Medical record retrieval includes all of the EHR’s unstructured data, which accounts for 80% of all the information contained in the EHR [1]. I will give a more comprehensive overview of chart retrieval and associated use cases in a future blog post.

Several provider groups (four health systems and two research consortia) have also begun using Datavant technology to de-identify external real-world data and connect it to their existing patient data for health system research and care performance analysis.

3. Registry Data

Five registry organizations joined the ecosystem, covering disease areas such as immunology, cerebral palsy, ophthalmology and several rare diseases, including pulmonary arterial hypertension (PAH), hemophilia and phenylketonuria (PKU). Registries capture very specific variables related to each disease. Registry data is validated and quality checked to a much higher standard than typical EHR data. The quality of this data makes it an ideal source to link to a biopharma company’s clinical trial data. These organizations also offer study services teams that can prepare data for regulatory submissions.

4. Specialty Pharmacy (SP) Data

Specialty drugs account for 75% of prescription drugs in development [2]. Pharma companies have a view of their own specialty drug distribution but limited visibility into the full patient journey before and after patients are on their treatment. By connecting SP data with patient hub data and claims data, they can understand lines of therapy, patient adherence and the effectiveness of hub services. Four new sources of SP data joined the ecosystem last year, bringing Datavant’s coverage of the SP space to dozens of players.

5. Genomics, Digital Pathology and Specialized Lab Data

Eight new genomics and diagnostics data partners joined the Datavant ecosystem last year. One is a biobank associated with a large academic medical center, with millions of genetic sequencing test results, many of which are linked to EHRs. Another is a genomics testing collaborative, and a third provides specialized liquid biopsy testing to classify risk in lung cancer. Lastly, we added a provider of COVID testing and associated variant sequencing data.

6. Consumer Data

Consumer data includes demographics like age, gender and race; social determinants like employment, income, and education; and propensity scores for behavior, lifestyle and purchasing patterns. Five new consumer data partners joined the Datavant ecosystem in 2021. Consumer data supports many use cases, including:

  • Comparison of outcomes by gender, race, age, income, education and employment
  • Identification of barriers to accessing high-quality care
  • Evaluation of how lifestyle and purchase decisions influence overall health

7. Wearables and Digital Health

Wearables, digital health interventions, remote monitoring apps and condition-specific social networks are improving health and wellness. We added five wearable and digital health companies last year. These technologies create deeply engaging patient experiences and collect continuous data on various biometrics such as sleep duration and quality, heart rate, and activity levels. Many of these companies combine disease-specific devices and mobile apps, such as implantable continuous glucose monitors (CGM) for diabetes or wearable sensors for musculoskeletal conditions. Linking this data offers continuous, real-time insight into patient health.

8. Mortality Data with Cause of Death

Mortality is a key endpoint in many studies, yet it is often not captured in EHRs/EMRs. Data partners in the Datavant ecosystem aggregate mortality data that covers more than 85% of U.S. death events. In 2021, we added one new source that includes cause of death, which is particularly valuable in clinical research. Mortality data should be linked to every trial with a mortality endpoint to maximize data completeness. Health systems should link it to measure care effectiveness, understand their active patient population and identify underserved populations. We’ve even seen payers use it to detect fraudulent claims.

9. Weather

On the cutting edge of health data is the integration of weather and environmental conditions data. A large source of this data joined the Datavant ecosystem last year. Weather, air quality and climate increasingly influence health. Weather data is being used to manage supply chains, predict pandemic spread, estimate flu prevalence and gauge the severity of allergy season, to name a few use cases.

10. First-Party Data

In 2021, Datavant tokenized 100+ first-party data sets for life sciences clients, representing a variety of proprietary data including clinical trial data, patient registries, hub data, and sponsored genetic testing data. Tokenizing proprietary first-party data and linking it to third-party commercial data supports many use cases, including understanding clinical and economic outcomes of patients using a certain device, building lookalike models to reach rare disease patients, assessing the effectiveness of patient services, and following up with trial patients over the long term. Linking proprietary first-party data is an important part of enterprise data strategy.
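
To make the idea of tokenizing and linking more concrete, here is a minimal Python sketch. It is not Datavant's actual tokenization method, which this post does not describe. It assumes a generic approach in which a keyed hash (HMAC-SHA256) of normalized identifiers stands in for a token, and all field names, the key and the sample records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; real systems manage keys securely.
SECRET_KEY = b"example-key-for-illustration-only"

# Hypothetical identifying fields that are replaced by a token.
ID_FIELDS = ("first_name", "last_name", "dob", "sex")


def make_token(record, key=SECRET_KEY):
    """Derive a keyed hash from normalized identifiers (illustrative only)."""
    normalized = "|".join(record[f].strip().lower() for f in ID_FIELDS)
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(records):
    """Drop the identifying fields and keep a token plus the other fields."""
    tokenized = []
    for rec in records:
        rest = {k: v for k, v in rec.items() if k not in ID_FIELDS}
        tokenized.append({"token": make_token(rec), **rest})
    return tokenized


def link(first_party, third_party):
    """Inner-join two tokenized data sets on the shared token."""
    by_token = {}
    for rec in third_party:
        by_token.setdefault(rec["token"], []).append(rec)
    linked = []
    for rec in first_party:
        for match in by_token.get(rec["token"], []):
            linked.append({**match, **rec})
    return linked


# Made-up first-party trial records and third-party claims records.
trial = tokenize([
    {"first_name": "Ana", "last_name": "Diaz", "dob": "1980-02-14",
     "sex": "F", "trial_arm": "treatment"},
    {"first_name": "Raj", "last_name": "Patel", "dob": "1975-09-30",
     "sex": "M", "trial_arm": "placebo"},
])
claims = tokenize([
    {"first_name": " ana ", "last_name": "DIAZ", "dob": "1980-02-14",
     "sex": "f", "claim_code": "E11.9"},
])

linked = link(trial, claims)
```

In this sketch, the two records for the same person link even though their names are formatted differently, because the identifiers are normalized before hashing. The linked record carries the trial arm and the claim code but none of the identifying fields.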


Real-world data is becoming more granular and finely tuned to answer questions across specific diseases and patient types. We are seeing up-and-coming data sources enter the ecosystem earlier in their life cycle and seek partnerships to inform their data strategy. Many are building flexible data platforms that allow data users to link their first-party data to the platform. 

I’ll be back with more updates on health data trends throughout the year. I would love your feedback, through this short two-question survey, on the health data types you want to learn more about. You can always read more about health data and data infrastructure on the Datavant blog. Our President and Co-founder, Travis May, wrote a particularly detailed post about real-world data infrastructure and how it is being used to fight COVID. Email me anytime at su@datavant.com if you have questions or comments!

  1. Kong, Hyoun-Joong. Managing Unstructured Big Data in Healthcare System. Healthc Inform Res. 2019; 25(1): 1–2. doi:10.4258/hir.2019.25.1.1
  2. Top tech predictions for the future of specialty pharmacy. BioPharma Dive. July 22, 2021. Accessed January 9, 2022.