At Datavant, we’re focused on the vision of connecting the world’s health data to improve patient care and speed the development of new treatments. As part of this, we put together an “ecosystem map,” outlining how data flows across healthcare today.
***
Updates (September 2019 & September 2021): Since publication, we’ve seen copies of this ecosystem map show up at conferences, in classrooms, and in boardroom discussions – and have had hundreds of requests for the underlying graphic and lists. Given the traction, we feel an obligation to make sure that the map doesn’t go stale. To that end, we’ve prepared a new and expanded health data ecosystem map below. The size of the health data ecosystem has grown significantly in the past year. There are many new logos and the amount of health data in existence has more than doubled.
***
The healthcare system generates approximately a zettabyte (a trillion gigabytes) of data annually, and this amount is doubling every two years. The scale and distributed nature of this data presents an enormous challenge for those seeking to understand health data. Yet as the scope of this challenge keeps increasing, so does the potential to use data to define and deliver value for patients across the healthcare system.
The ecosystem around this data is complex, with thousands of institutions involved in the collection, transfer, and use of healthcare information about patients. This post gives an overview of how data flows through the healthcare system in three sections.
[A follow-up post will talk about how the data is ultimately used to help improve patient care, as well as a deeper discussion how data is protected throughout the process.]
Each interaction that a patient has with the healthcare system generates many data points, which are likely held by several different institutions. Data from everyday activities also provide valuable information about a patient’s health. Regardless of where the data is generated, institutions usually collect data to meet an operational need rather than to power analytics. As a result, data is fragmented (and sometimes duplicated) across many institutions and data capture systems.
To illustrate the data that is generated by a single patient visit, let us follow the hypothetical case of patient Jane Doe. Jane has been seeing her primary care doctor for years and goes in for a routine check-up. Over the course of the 8—12 minutes of face time that she spends with her doctor, a number of different data elements are generated. Jane’s doctor:
The EHR’s billing module then prepares an insurance claim for Jane’s insurer. The draft claim is sent to a clearinghouse that checks the draft for errors, reformats the claim to match the insurer’s standards, and then sends the claim to the insurer.
Only a small subset of a patient’s health is determined by clinical care. The other drivers are behavioral, social, economic and environmental. Some of this data is captured in medical records. For example, knowing Jane’s address provides some socioeconomic detail, and could be used to gather information on proximity of services, as well as air and water quality. However, most non-medical data is not captured as part of health analysis. Relevant information might include:
This year, Jane is due for a mammogram. Her doctor sends her to a local radiology clinic. The radiologist:
After receiving her diagnosis, Jane proceeds to see a number of specialists for further analysis and treatment, each visit generating its own record in that doctor’s (again, separate) EHR system. If Jane is fortunate, her doctors will be able to exchange information with one another through an interoperability platform. More likely, her information will be shared via faxes and print-outs.
One oncologist sends a prescription for a targeted therapy to a specialty pharmacy, which dispenses complex medications and provides services to educate patients and help them with side effects. The oncologist also prescribes medication to manage side effects; those prescriptions are sent to Jane’s local retail pharmacy. Each pharmacy separately generates data on pick-ups and refills, and compiles insurance claims for the pharmacy benefit manager (PBM) that administers the prescription drug component of Jane’s health plan. The PBM processes and pays the claims.
Unfortunately, Jane’s condition progresses. Her oncologist advises her that she may be eligible to enter a clinical trial for a new therapy. Jane elects to do so and is able to participate after clearing the inclusion and exclusion criteria for the trial. Now, when Jane visits her oncologist, staff enter Jane’s data – notes from her visits, labs, images – into two separate systems: the oncologist’s EHR, and a clinical trial management system (CTMS) as a structured Case Report Form (CRF). Jane’s data will later be used in analyses detailed in the trial protocol that will inform whether the new therapy will be approved for sale.
As patient health data has proliferated at an increasing rate, industry players have focused on harnessing this data through analytics to improve treatment options, enhance patient outcomes, streamline operations, and move toward a value-based care paradigm.
The first step toward performing valuable analytics is ensuring that the right data ends up in the right hands at the right time. Current aggregation strategies address aspects of this challenge, but as the data ecosystem becomes increasingly complex, new approaches are needed.
As we saw in Jane’s story, systems used to generate health data are designed for operations, not to organize data effectively for research or analytics. To facilitate analysis, companies and organizations have pursued several strategies to aggregate different data sources and data types:
Analytics meant to measure or improve the care received by someone like Jane require visibility into multiple data types generated across a number of different touchpoints with the healthcare system. Each of these aggregation strategies does the hard work of bringing together data from some part of the complex landscape of healthcare data. New strategies to link datasets together will likely emerge as the volume and variety of data relevant to health continue to increase and analytics groups look to incorporate these data into their use cases.
This large ecosystem of data stewards, aggregators, and service providers strives to make the healthcare system better – informing timely interventions, developing innovative medical products, and reducing operational and financial friction. To empower this ecosystem in the face of its complexity, the industry will need to take steps to ensure that the right data ends up in the right hands at the right time.
“Right data” means the specific data elements about Jane’s health from all relevant sources (not just those immediately available) are linked together to power a given application. As demonstrated, healthcare data is generated in many forms among many entities, through both clinical care and everyday life. Each analytical solution will need to curate a different combination of elements to work effectively, linking data from across the care continuum.
“Right hands” means that Jane’s data can be securely transmitted from each source to the authorized analyst while maintaining Jane’s privacy. The complexity and sheer volume of the data ecosystem mean that patient privacy and data security are among the most pressing issues facing organizations trying to bring healthcare data together. From a legal perspective, HIPAA, GDPR, the new California Consumer Privacy Act, the FTC’s data privacy enforcements, and similar legislation and regulatory bodies provide much of the regulatory framework for how data is used. For research purposes, IRBs are also instrumental in determining the appropriate use of data. From a first principles perspective, institutions throughout the health data ecosystem must ensure the patient value of each data use case, the security of data throughout the system, patient consent and access, permissions for secondary use, and avoid risks of data leakage.
“Right time” means that this data exchange can happen with minimal friction, so that the data can be assembled and queried in time to make a difference. It should be easy for an internal analytics group, consultant, or startup to collect the data it needs to power a use case. Mechanisms that connect different datasets and give visibility into how they overlap will need to be built in order to make this a reality.
As the amount of data continues to expand and the sophistication of analytics becomes richer and richer, we expect the complexity of the ecosystem (and the stakes around some of the issues facing the ecosystem) to continue growing exponentially in coming years.
***
Primarily Authored by Jacob Stern. Special thanks to Andy Coravos, Sebastian Caliri, Samuel Bjork, Colby Davis, Eric Perakslis, and Shahir Kassam-Adams for reviewing early drafts.
Editor’s note: This post has been updated on October 18, 202 for accuracy and comprehensiveness.
AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its SDOH datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).
AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.
"Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.
“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance."
As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.
As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.
Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.
Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:
Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.
Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH, partnering with Papa, to combat loneliness and isolation in older adults, families, and other vulnerable populations. CDPHP aimed to address:
By integrating SDOH data, CDPHP enhanced their services to deliver comprehensive care for its Medicare Advantage members.
Value-based care organizations face challenges in fully understanding their patient panels. SDOH data significantly assists providers to address these challenges and improve patient care. Here are some examples of how:
By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.
While accessing SDOH data offers significant advantages, challenges can arise from:
To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.
With Datavant, healthcare organizations are securely accessing SDOH data, and further enhancing the efficiency of their datasets through state de-identification capabilities - empowering stakeholders across the industry to make data-driven decisions that drive care forward.
Explore how Datavant can be your health data logistics partner.
Contact us