At Datavant, we’re focused on the vision of connecting the world’s health data to improve patient care and speed the development of new treatments. As part of this, we put together an “ecosystem map,” outlining how data flows across healthcare today.
***
Updates (September 2019 & September 2021): Since publication, we’ve seen copies of this ecosystem map show up at conferences, in classrooms, and in boardroom discussions – and we have received hundreds of requests for the underlying graphic and lists. Given the traction, we feel an obligation to make sure that the map doesn’t go stale. To that end, we’ve prepared a new and expanded health data ecosystem map below. The size of the health data ecosystem has grown significantly in the past year. There are many new logos, and the amount of health data in existence has more than doubled.
***
The healthcare system generates approximately a zettabyte (a trillion gigabytes) of data annually, and this amount is doubling every two years. The scale and distributed nature of this data present an enormous challenge for those seeking to understand health data. Yet as the scope of this challenge keeps increasing, so does the potential to use data to define and deliver value for patients across the healthcare system.
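To make the growth claim concrete, here is the arithmetic it implies, as a tiny Python sketch. It assumes the two-year doubling rate holds steady and starts from the roughly one zettabyte per year cited above; `projected_volume_zb` is a hypothetical helper, and its outputs are back-of-the-envelope projections, not measurements.

```python
# Back-of-the-envelope sketch: assumes annual data volume keeps doubling every two years.
# Units are decimal (SI): 1 zettabyte = 10^21 bytes = a trillion gigabytes.


def projected_volume_zb(start_zb: float, years: float, doubling_period_years: float = 2.0) -> float:
    """Annual volume after `years`, if it doubles every `doubling_period_years` years."""
    return start_zb * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (2, 4, 6, 10):
        zb = projected_volume_zb(1.0, years)  # start from ~1 ZB/year, per the text
        print(f"after {years:>2} years: ~{zb:.0f} ZB/year")
```

Run as a script, this prints roughly 2, 4, 8, and 32 ZB per year after 2, 4, 6, and 10 years: at a two-year doubling period, a decade multiplies the annual volume by 2^5 = 32.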
The ecosystem around this data is complex, with thousands of institutions involved in the collection, transfer, and use of healthcare information about patients. This post gives an overview of how data flows through the healthcare system in three sections.
[A follow-up post will talk about how the data is ultimately used to help improve patient care, as well as a deeper discussion of how data is protected throughout the process.]
Each interaction that a patient has with the healthcare system generates many data points, which are likely held by several different institutions. Data from everyday activities also provide valuable information about a patient’s health. Regardless of where the data is generated, institutions usually collect data to meet an operational need rather than to power analytics. As a result, data is fragmented (and sometimes duplicated) across many institutions and data capture systems.
To illustrate the data generated by a single patient visit, let us follow the hypothetical case of patient Jane Doe. Jane has been seeing her primary care doctor for years and goes in for a routine check-up. Over the 8–12 minutes of face time that she spends with her doctor, a number of different data elements are generated. Jane’s doctor:
The billing module of the doctor’s electronic health record (EHR) system then prepares an insurance claim for Jane’s insurer. The draft claim is sent to a clearinghouse that checks the draft for errors, reformats the claim to match the insurer’s standards, and then sends the claim to the insurer.
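The clearinghouse hop described above can be sketched as a two-step pipeline: scrub the draft for errors, then reshape it into the layout a given insurer expects. Everything here (`DraftClaim`, `PAYER_FORMATS`, the field names, and the `send` callback) is hypothetical and heavily simplified. Real electronic claims in the US are exchanged in the ANSI X12 837 format, which this sketch does not attempt to model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DraftClaim:
    # Hypothetical, heavily simplified fields; real claims carry far more detail.
    patient_id: str
    payer_id: str
    procedure_codes: list
    diagnosis_codes: list
    billed_amount: float


# Hypothetical per-insurer layouts: each payer expects its own field names.
PAYER_FORMATS = {
    "PAYER_A": {
        "patient_id": "member_id",
        "procedure_codes": "proc",
        "diagnosis_codes": "dx",
        "billed_amount": "charge",
    },
}


def scrub(claim: DraftClaim) -> list:
    """Step 1: check the draft for errors; an empty list means it passed."""
    errors = []
    if not claim.patient_id:
        errors.append("missing patient_id")
    if not claim.procedure_codes:
        errors.append("no procedure codes")
    if not claim.diagnosis_codes:
        errors.append("no diagnosis codes")
    if claim.billed_amount <= 0:
        errors.append("billed_amount must be positive")
    if claim.payer_id not in PAYER_FORMATS:
        errors.append(f"unknown payer {claim.payer_id!r}")
    return errors


def reformat_for_payer(claim: DraftClaim) -> dict:
    """Step 2: rename fields to match the insurer's expected layout."""
    mapping = PAYER_FORMATS[claim.payer_id]
    return {payer_field: getattr(claim, field) for field, payer_field in mapping.items()}


def route_claim(claim: DraftClaim, send: Callable[[str, dict], None]) -> list:
    """Scrub, reformat, and forward via `send` (a stand-in for the insurer connection).

    Returns the scrub errors; the claim is forwarded only when there are none.
    """
    errors = scrub(claim)
    if not errors:
        send(claim.payer_id, reformat_for_payer(claim))
    return errors
```

A draft with errors comes back with its list of problems (typically to the provider’s billing staff for correction) instead of being forwarded to the insurer.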
Only a small subset of a patient’s health is determined by clinical care. The other drivers are behavioral, social, economic, and environmental. Some of this data is captured in medical records. For example, knowing Jane’s address provides some socioeconomic detail, and could be used to gather information on proximity of services, as well as air and water quality. However, most non-medical data is not captured for health analysis. Relevant information might include:
This year, Jane is due for a mammogram. Her doctor sends her to a local radiology clinic. The radiologist:
After receiving her diagnosis, Jane proceeds to see a number of specialists for further analysis and treatment, each visit generating its own record in that doctor’s (again, separate) EHR system. If Jane is fortunate, her doctors will be able to exchange information with one another through an interoperability platform. More likely, her information will be shared via faxes and print-outs.
One oncologist sends a prescription for a targeted therapy to a specialty pharmacy, which dispenses complex medications and provides services to educate patients and help them with side effects. The oncologist also prescribes medication to manage side effects; those prescriptions are sent to Jane’s local retail pharmacy. Each pharmacy separately generates data on pick-ups and refills, and compiles insurance claims for the pharmacy benefit manager (PBM) that administers the prescription drug component of Jane’s health plan. The PBM processes and pays the claims.
Unfortunately, Jane’s condition progresses. Her oncologist advises her that she may be eligible to enter a clinical trial for a new therapy. Jane elects to do so and is able to participate after meeting the trial’s inclusion criteria and clearing its exclusion criteria. Now, when Jane visits her oncologist, staff enter Jane’s data – notes from her visits, labs, images – into two separate systems: the oncologist’s EHR, and a clinical trial management system (CTMS) as a structured Case Report Form (CRF). Jane’s data will later be used in the analyses specified in the trial protocol, which will inform whether the new therapy is approved for sale.
As patient health data has proliferated at an increasing rate, industry players have focused on harnessing this data through analytics to improve treatment options, enhance patient outcomes, streamline operations, and move toward a value-based care paradigm.
The first step toward performing valuable analytics is ensuring that the right data ends up in the right hands at the right time. Current aggregation strategies address aspects of this challenge, but as the data ecosystem becomes increasingly complex, new approaches are needed.
As we saw in Jane’s story, systems used to generate health data are designed for operations, not to organize data effectively for research or analytics. To facilitate analysis, companies and organizations have pursued several strategies to aggregate different data sources and data types:
Analytics meant to measure or improve the care received by someone like Jane require visibility into multiple data types generated across a number of different touchpoints with the healthcare system. Each of these aggregation strategies does the hard work of bringing together data from some part of the complex landscape of healthcare data. New strategies to link datasets together will likely emerge as the volume and variety of data relevant to health continue to increase and analytics groups look to incorporate these data into their use cases.
This large ecosystem of data stewards, aggregators, and service providers strives to make the healthcare system better – informing timely interventions, developing innovative medical products, and reducing operational and financial friction. To empower this ecosystem in the face of its complexity, the industry will need to take steps to ensure that the right data ends up in the right hands at the right time.
“Right data” means that the specific data elements about Jane’s health from all relevant sources (not just those immediately available) are linked together to power a given application. As demonstrated, healthcare data is generated in many forms among many entities, through both clinical care and everyday life. Each analytical solution will need to curate a different combination of elements to work effectively, linking data from across the care continuum.
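One common way to link records about the same person across sources is to derive a matching key from identifiers each source already holds. The sketch below assumes exact matches on first name, last name, and date of birth, and uses a keyed hash (HMAC-SHA256) so the sources can be joined on the key rather than on raw identifiers. The names `normalize`, `link_token`, `link_records`, and `DEMO_KEY` are hypothetical; this illustrates the general technique and is not a description of how any particular product works.

```python
import hashlib
import hmac
import unicodedata
from collections import defaultdict

# Hypothetical demo key; a real system would keep linkage keys secret and managed.
DEMO_KEY = b"demo-key-not-for-production"


def normalize(text: str) -> str:
    """Lowercase and drop accents, spaces, and punctuation so 'JANE ' and 'Jane' match."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed.lower() if ch.isalnum())


def link_token(first: str, last: str, dob: str, key: bytes = DEMO_KEY) -> str:
    """Keyed hash (HMAC-SHA256) of normalized identifiers; equal inputs give equal tokens."""
    material = "|".join([normalize(first), normalize(last), dob.strip()])
    return hmac.new(key, material.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(*sources: list) -> dict:
    """Group records from several sources under a shared token, dropping the raw identifiers."""
    linked = defaultdict(list)
    for source in sources:
        for rec in source:
            token = link_token(rec["first"], rec["last"], rec["dob"])
            payload = {k: v for k, v in rec.items() if k not in ("first", "last", "dob")}
            linked[token].append(payload)
    return dict(linked)
```

Exact-match keys like this miss typos, nicknames, and name changes, and three fields are rarely enough to tell people apart at scale; production record linkage generally draws on more identifiers and handles those cases explicitly.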
“Right hands” means that Jane’s data can be securely transmitted from each source to the authorized analyst while maintaining Jane’s privacy. The complexity and sheer volume of the data ecosystem mean that patient privacy and data security are among the most pressing issues facing organizations trying to bring healthcare data together. From a legal perspective, laws such as HIPAA, GDPR, and the California Consumer Privacy Act, along with regulators such as the FTC through its data privacy enforcement actions, provide much of the framework for how data is used. For research purposes, institutional review boards (IRBs) also play a key role in determining the appropriate use of data. From a first-principles perspective, institutions throughout the health data ecosystem must ensure the patient value of each data use case, the security of data throughout the system, patient consent and access, and permissions for secondary use, while avoiding the risk of data leakage.
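As one small, concrete piece of maintaining Jane’s privacy, a source can strip or coarsen identifying fields before a record leaves its hands. The sketch below is loosely inspired by HIPAA’s Safe Harbor de-identification method, which lists 18 categories of identifiers to remove. It handles only a few of them, skips Safe Harbor’s finer conditions (such as the population threshold for three-digit ZIP codes and the special handling of ages over 89), and is in no way a compliant implementation. The field names and `deidentify` are hypothetical.

```python
# Hypothetical field names; covers only a few of Safe Harbor's 18 identifier categories.
DIRECT_IDENTIFIERS = {"name", "street_address", "phone", "email", "ssn", "mrn"}


def deidentify(record: dict) -> dict:
    """Return a copy with direct identifiers dropped and quasi-identifiers coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in out:
        # Keep the first three digits only (the Safe Harbor population test is NOT modeled).
        out["zip3"] = str(out.pop("zip"))[:3]
    if "birth_date" in out:
        # Keep the year only, assuming ISO 'YYYY-MM-DD' strings (ages over 89 NOT handled).
        out["birth_year"] = str(out.pop("birth_date"))[:4]
    return out
```

The original record is left untouched, so the source keeps its full copy for operational use while only the coarsened version is shared.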
“Right time” means that this data exchange can happen with minimal friction, so that the data can be assembled and queried in time to make a difference. It should be easy for an internal analytics group, consultant, or startup to collect the data it needs to power a use case. Mechanisms that connect different datasets and give visibility into how they overlap will need to be built in order to make this a reality.
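Visibility into how datasets overlap can start with something as simple as counting shared patients. The sketch below assumes each dataset has already been reduced to a set of de-identified patient tokens (a hypothetical input; how those tokens are produced is out of scope here) and reports how many patients appear in one dataset, the other, or both, plus the Jaccard similarity of the two sets. `overlap_report` and the sample tokens are hypothetical.

```python
def overlap_report(a: set, b: set) -> dict:
    """Summarize how two sets of patient tokens overlap, without exposing record-level data."""
    shared = a & b
    union = a | b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(shared),
        # Jaccard similarity: shared patients as a fraction of all distinct patients.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    claims_tokens = {"t1", "t2", "t3"}
    ehr_tokens = {"t2", "t3", "t4", "t5"}
    print(overlap_report(claims_tokens, ehr_tokens))
```

In this toy example, two patients appear in both datasets and the Jaccard similarity is 2/5 = 0.4, which tells an analyst how much combining the two sources would add before any record-level data changes hands.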
As the amount of data continues to expand and analytics grow more sophisticated, we expect the complexity of the ecosystem (and the stakes around some of the issues it faces) to continue growing exponentially in the coming years.
***
Primarily authored by Jacob Stern. Special thanks to Andy Coravos, Sebastian Caliri, Samuel Bjork, Colby Davis, Eric Perakslis, and Shahir Kassam-Adams for reviewing early drafts.
Editor’s note: This post was updated on October 18, 202 for accuracy and comprehensiveness.