This summer, Datavant convened an inaugural Product Council — composed of thought leaders and health data super users within our partner ecosystem — to discuss the biggest challenges in health data exchange and share ideas about how we can work together to build solutions. This post highlights key insights from those conversations, which are shaping the way Datavant thinks about collaboration across the health data ecosystem.
The real-world data ecosystem is continuing to expand — and as it does, the rate of fragmentation is outpacing aggregation
Every day, more and more patient data becomes available within the healthcare ecosystem. The increased availability of data has enabled the healthcare industry to meaningfully improve the patient experience and care by better understanding disease outcomes, more accurately assessing a patient’s real-world experience with care delivery, and more. This novel and emerging data, however, is fragmented — available across a disparate and disconnected set of sources — and the pace of fragmentation is accelerating too.
As the data ecosystem’s expansion and fragmentation continue to accelerate, the increased complexity creates a new set of challenges. Each of these emerging challenges is unique, and every solution will require collaboration from across the industry.
The evolution of novel datasets and the tools to leverage them is bringing better insights within reach. It would be a great loss not to work together to deploy these capabilities and drive solutions that improve human health.
Informed by a series of conversations with Datavant’s Product Council, we reflect on the biggest bottlenecks in health data exchange today, and highlight the opportunities that exist for our industry to work together and address them.
Build tools and expertise to help navigate the expanding landscape of data
Expansion and fragmentation are driving two currents in the health data ecosystem. The first is that expanding data availability is creating demand for more data, and for specialized data. As novel data sources become more readily available, it becomes possible to deliver more comprehensive and impactful analyses.
Fifteen years ago, the limit of a health data analysis might have been using prescription data to understand which providers were prescribing a therapeutic. Today, the broader availability of claims, clinical, and social determinants data means that companies can assess which patients are adhering to the treatment, and why. More and more, the most innovative data users intent on delivering industry-leading analytics and solutions need to leverage unique sources of data.
Simultaneously, greater fragmentation means that these unique data sources are available through specialized providers, who may not be easy to find or be operating at scale. More data, and more specialized data, is becoming vital for key use cases. The right data, however, is more difficult to find.
The result is that just finding the right partner presents an immediate bottleneck to exchange, and equipping everyone in the healthcare ecosystem with the tools to navigate the landscape becomes foundational to sharing data.
The right solution starts with showing data users where to find, for example, lab test records that they need to better understand the rate of rare disease diagnosis and misdiagnosis. The solution could also enable data users to make standardized comparisons across data providers — between two “third party” data providers, and between a third party and their own internal “first party” data — as well as help data providers vet potential users.
Technology is part of the answer: assessment tools which facilitate dataset exploration, overlap comparison, and segmentation. The complete solution also has a place for “expert services”: subject matter experts who deeply understand the landscape of available data and can offer consulting “horizontally” across the industry, without being constrained to a specific organization.
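To make "overlap comparison" concrete, here is a minimal sketch of what such an assessment tool might compute. It assumes each dataset can be represented as a list of de-identified patient tokens that are comparable across sources. The token values and the function name overlap_report are hypothetical, chosen only for illustration, and do not describe any specific product.

```python
def overlap_report(tokens_a, tokens_b):
    """Summarize how many de-identified patient tokens two datasets share."""
    a, b = set(tokens_a), set(tokens_b)
    shared = a & b
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(shared),
        # Fraction of dataset A's patients that also appear in dataset B.
        "pct_of_a_in_b": len(shared) / len(a) if a else 0.0,
    }


# Hypothetical tokens: a "first party" dataset and a "third party" dataset.
first_party = ["t1", "t2", "t3", "t4"]
third_party = ["t3", "t4", "t5"]
report = overlap_report(first_party, third_party)
```

A data user could run the same comparison between two third-party providers to see which one adds more patients they do not already have.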
Create technologies to enable transparent, controlled, and easy analysis on connected data
Finding and evaluating data providers is the first but not the last challenge. Solving complex health problems increasingly requires combining disparate datasets in novel and complicated ways.
As the quantity of data skyrockets and introduces new risks for patient privacy, responsible and ethical data governance demands that data combination be done in a way that protects patient privacy and creates transparency and accountability for all the parties involved. Enabling the appropriate management, governance, and application of data will take an ecosystem of exceptional technology companies.
The unmet need is for data sharing technologies that give every partner control over their records and confidence in the stewardship of their data. Businesses that collaborate to solve one problem may later compete to solve another, and that knowledge informs their willingness to share data today.
Data providers want to keep their data independent of their peers and competitors, to maintain competitive advantage and to minimize the risk of inadvertent or unauthorized combinations. Data users want to be able to easily analyze connected data without being constrained to a particular provider. Everyone wants transparency into how the data is being processed and applied.
One solution is connecting health data through a trusted third party. These data stewards ought to be neutral enclaves — they don’t buy or sell data — and be built with security and privacy as the highest priority.
Other solutions lie in applying novel privacy-preserving technologies to health data exchange. Federated technologies, such as Multi-Party Computation (MPC), provide a mechanism to exchange information about a dataset without that dataset ever needing to leave its original environment. Emerging technologies like synthetic data enable the creation of datasets which contain no real patients’ information, but in aggregate produce the same analytical results — enabling insights without requiring the underlying data to be shared.
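The idea behind MPC can be illustrated with additive secret sharing, one common building block. The sketch below is a toy, single-process simulation under simplifying assumptions (honest parties, no network, no collusion), not a production protocol. The function names, the modulus, and the example counts are all hypothetical.

```python
import secrets

# Public modulus (an assumption for this sketch); all arithmetic is done mod this value.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_counts):
    """Compute the total of the parties' private counts from shares alone."""
    n = len(private_counts)
    # Each party splits its own count and would send one share to every party.
    all_shares = [make_shares(count, n) for count in private_counts]
    # Party j adds the shares it received (column j); one share alone reveals nothing.
    partial_sums = [sum(row[j] for row in all_shares) % MODULUS for j in range(n)]
    # Only the partial sums are published and combined into the final total.
    return sum(partial_sums) % MODULUS


# Hypothetical example: three data providers each hold a private patient count.
counts = [120, 45, 310]
total = secure_sum(counts)
```

Each provider learns the combined total, but no provider sees another's individual count, and no raw records leave their original environment.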
However these solutions are combined, cross-industry collaboration is key. The data stewards and technology providers will need to be able to partner seamlessly across the health data ecosystem, with not only any data provider and any data user, but with any other technology partner providing additional tools for analysis, privacy, governance, or linking.
Drive data standardization, quality, and transparency at scale
Real world data is frequently non-standard and incomplete. This “messiness” characterizes the real world data that is available today as well as the new data types that are emerging. Solving the resulting challenges will require leading companies to work together.
Standardization is the first challenge. Today, data is collected in nearly as many formats as there are points of collection. Standard data models exist, but there are many of them and they apply to different data types. On top of that, large swathes of the industry lack the understanding or the right incentives to comply. The end result: Data recipients spend precious resources standardizing different datasets before records can even be combined and an analysis begun.
The second is data quality. Real world data is incomplete, and this “missingness” inhibits the accuracy of downstream insights. When a dataset has high missingness, one can’t be certain that it captures the full patient journey.
The patient outcomes that one aims to better understand may not have occurred — or they may not have been captured in the dataset. Missingness also limits a dataset’s linkability through the omission of elements necessary to reliably connect with other records, further impeding the ability to capture the full patient journey.
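One simple way to surface missingness before an analysis begins is to measure, field by field, how often a value is absent. The sketch below assumes records arrive as plain dictionaries and treats None, empty strings, and absent keys as missing. The field names and sample records are invented for illustration.

```python
def missingness_by_field(records, fields):
    """Return the fraction of records in which each field is absent or empty."""
    total = len(records)
    rates = {}
    for field in fields:
        # A missing key, None, or an empty string all count as missing.
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


# Invented sample records; real datasets will have different fields.
records = [
    {"dob": "1980-01-01", "zip3": "100", "diagnosis": "E11"},
    {"dob": None, "zip3": "941", "diagnosis": ""},
    {"dob": "1975-06-30", "zip3": "", "diagnosis": "I10"},
    {"dob": "1990-12-12", "diagnosis": "J45"},
]
rates = missingness_by_field(records, ["dob", "zip3", "diagnosis"])
```

High rates on fields used for record linkage, such as date of birth or geography in this example, signal reduced linkability as well as gaps in the patient journey.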
By the time data gets to an end user, the reason for any missingness is also often obscured. Obscurity creates a third challenge because it impedes reliability — insights are only as strong or as weak as the data that powered them. In particular, transparency and provenance challenges are a critical impediment to broader acceptance of real-world data for purposes like regulatory approval.
Without clear knowledge of a dataset’s strengths and weaknesses, it’s difficult to anticipate the blind spots in analysis, be that patient populations that may be underrepresented or outcomes that it may be predisposed to miss.
These challenges are costly for companies in terms of time and dollars. The cost to patients can mean big differences in health outcomes.
Solving the problem at scale requires many actors. There is a role for industry associations, trade groups, and other stakeholders to align to a more uniform standard within specific data types.
Regulators can advance standardization by implementing the right incentives. Data users can influence the equation by applying buy-side pressure. As achieving greater data transparency and quality becomes a competitive differentiator, data providers will rise to the expectations of data users and regulatory bodies governing use of RWD.
The clearest unmet need, however, is in technology. This is a particular opportunity for partnership across a constellation of exceptional technology companies. As the health data ecosystem expands and brings more and more “messy” data online, there is a rapidly expanding opportunity for specialized companies to collaborate on different pieces of the answer: from data cleaning, to harmonization, to imputation.
Capture the value and insight within unstructured and emerging patient data
Unstructured data is a growing opportunity for exchange and analysis. The real-world data that powered the first sea change in data-based decision-making has been dominated by structured data (data that has some standardized format or data model and follows a persistent order). As the health data ecosystem continues to expand, however, much of the novel data that is coming online — from clinicians’ notes to pathology images to patient reported outcomes and genomic sequencing data — is unstructured data.
The potential value in unstructured data is enormous. Clinical endpoints that illustrate the success of a treatment program, such as the size and stage of a cancerous tumor, may only be available in patients’ imaging data. Unstructured clinician notes may describe a patient’s quality of life or even medical history beyond the structured data entry. A patient’s genomic sequencing may provide critical insight into the likely benefit provided by a precision therapeutic.
Extracting that value is challenging. Acquiring the right data presents an immediate obstacle, as it’s difficult to determine from among a collection of notes or images which correspond to the right patients and which contain information relevant to the use case.
De-identification, when necessary, is also a tough problem to solve since identifiers could be a stray word or genomic detail. Finally, deriving insight from the data is a challenging process, requiring either complex programs that can process the unstructured data or tools that can first structure the information for easier utilization.
There are learnings to draw from existing solutions for structured data exchange. Tools exist that can locate specific patients’ records. Expert statisticians can assess a dataset’s identification risk and recommend steps to de-identify it. Analytics programs can derive critical insights from even massive tables of data.
Many of these tools, however, rely on structured data inputs. There is a great unmet need for partner technologies that can translate these capabilities and apply them to unstructured datasets.
Success requires cooperation
Solving these challenges is an opportunity to unlock a step change in how health data is shared to power better insights and improve patient outcomes. The ambition is a future of healthcare where the right data can be easily found, combined, and applied to generate trustworthy insights and drive rigorous, data-based decision making.
Achieving that ambition will require technology, expertise, and collaboration from across the health data ecosystem. Getting it right means committing to work alongside a broad constellation of partners who bring unique and complementary strengths — and relentlessly pursuing the “win-win-win” opportunities that allow everyone to play to them.
At Datavant, we’ve made the commitment to collaborate across the health data ecosystem, solve this next wave of challenges, and achieve a data-driven future of healthcare. If you share that vision, let’s get in touch. Reach out to us at product-council@datavant.com.
Prepared by Quinn Johns, Vera Mucaj, and Su Huang
Editor’s note: This post was updated in December 2022 for accuracy and comprehensiveness.
AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its SDOH datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).
AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.
“Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.
“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance.”
As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.
As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences organizations can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.
Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.
Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:
Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.
Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH, partnering with Papa, to combat loneliness and isolation in older adults, families, and other vulnerable populations. CDPHP aimed to address:
By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.
Value-based care organizations face challenges in fully understanding their patient panels. SDOH data helps providers address these challenges and improve patient care. Here are some examples of how:
By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.
While accessing SDOH data offers significant advantages, challenges can arise from:
To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.
With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.