Hackathon Preview: Synthetic Data Within the Ecosystem of Healthcare Innovation

August 30, 2022

More innovation, fewer guardrails

Image credit: Maxim Berg, courtesy of Unsplash

From September 8–11, 2022, Datavant will host our first annual Future of Healthcare Hackathon. To get participants more deeply engaged with some of the cutting-edge technologies currently making waves in healthcare, we are collaborating with several industry partners to provide datasets for use in projects. In addition to price transparency data from Turquoise and a variety of datasets related to provider/payer information sharing provided by Datavant, Syntegra is providing synthetic datasets.

We had a conversation with Syntegra’s Head of Growth, Carter Prince, about synthetic data’s role within the healthcare ecosystem and how it reduces privacy risk while increasing data utility to improve research and drive innovation.

We’re very excited to have this opportunity to collaborate with Syntegra, and to share synthetic data sets as part of the Future of Healthcare Hackathon. Can you summarize what synthetic data is within the world of healthcare and how it can be useful?

Synthetic data looks and acts just like real data, maintaining all of the statistical accuracy of the original data but containing fake (synthetic) patients. This protects patient privacy even beyond the regulatory guardrails of HIPAA or GDPR because the patients being statistically represented don’t actually exist. As a result, healthcare data can be used and shared much more easily and quickly without facing the typical privacy and administrative barriers.

Beyond the element of privacy, synthetic data greatly increases the utility of healthcare data. It allows for the use of more granular, patient-level data, which can also be augmented and customized, such as by increasing population size or addressing areas of bias. Its use cases include improving the accuracy of algorithms by providing the volume and type of data needed for model development and testing, expanding rare or small cohorts to improve precision medicine research, enabling access to hard-to-obtain EU or rare disease data, and much more.

Synthetic data is driving innovation, especially in the digital health space. Synthetic data has the unique capability of providing access to large amounts of diverse, representative, patient-level data, addressing a great need in the development and testing of AI/ML models to improve their accuracy in real-world settings. Digital health and health tech companies, especially those in their earlier stages, often struggle to find the right data, a process that can take months or well over a year, if they can access it at all. And when real data is accessible, it is often stripped of important fields in order to maintain patient privacy, which greatly reduces its utility. Syntegra’s synthetic datasets give immediate access to the data these companies need to build and test new products, significantly accelerating deployment of these tools. Through our partnership with Tuva Health, we also provide an analytics-ready format of both our EHR and claims datasets, removing a lot of the guesswork and time it takes to process healthcare data from its raw form, making it more usable for analytics and AI/ML.

Outside of digital health, we are also working with pharmaceutical organizations to leverage the rapid access and flexibility of synthetic data to help internal teams, such as real-world evidence teams, better explore datasets for feasibility and study design before conducting a final analysis on actual patient data.

How has the use of synthetic data evolved within the healthcare industry?

The use of synthetic data in the healthcare industry is still relatively new, as previous approaches to synthetic data generation have been largely unsuccessful, limiting its adoption. Early methods used rules-based approaches and suffered from low accuracy. More recently, generative adversarial networks have been used with simple, tabular data, but they fail to capture the full complexity of healthcare data.

Syntegra uses a groundbreaking machine learning approach, transformer-based language models, to generate synthetic data, allowing us to create complex, longitudinal healthcare data and work with all types of structured data in any format. Our model, the Syntegra Medical Mind, learns the underlying distribution of real healthcare data (such as EHR, claims, genomics, and more) represented as a temporal sequence of medical events, then uses the learned distribution to generate completely new (synthetic) patient records. Learn more about our approach and the challenges with early methods on the Syntegra blog.
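
As a rough illustration of the idea described above (learning a distribution over temporal sequences of medical events, then sampling new records from it), the Python sketch below substitutes a simple first-order Markov chain for a transformer language model. It is not Syntegra's model or code. The event names, the toy records, and the fit_transitions and sample_record helpers are all hypothetical and exist only for this example.

```python
import random
from collections import Counter, defaultdict

START, END = "<start>", "<end>"

# Hypothetical toy "real" records: each is a time-ordered list of medical events.
REAL_RECORDS = [
    ["office_visit", "lab_a1c", "dx_type2_diabetes", "rx_metformin"],
    ["office_visit", "lab_a1c", "office_visit"],
    ["er_visit", "dx_chest_pain", "lab_troponin", "discharge"],
    ["office_visit", "dx_hypertension", "rx_lisinopril", "office_visit"],
]


def fit_transitions(records):
    """Count how often each event follows another (a first-order Markov model)."""
    counts = defaultdict(Counter)
    for record in records:
        path = [START, *record, END]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts


def sample_record(counts, rng, max_events=20):
    """Generate one synthetic record by sampling from the learned transitions."""
    record, current = [], START
    while len(record) < max_events:
        options = counts[current]
        following = rng.choices(list(options), weights=list(options.values()))[0]
        if following == END:
            break
        record.append(following)
        current = following
    return record


model = fit_transitions(REAL_RECORDS)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
synthetic_records = [sample_record(model, rng) for _ in range(5)]
for rec in synthetic_records:
    print(rec)
```

A realistic generator would condition on far longer patient histories than the single previous event used here. Note also that a model this small can reproduce a training record verbatim, which is the kind of risk that separate privacy checks on synthetic data are meant to catch.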

Trust in and use of synthetic data are growing as its fidelity and utility continue to improve and its capabilities and potential become better known. Syntegra’s language model approach allows us to work with longitudinal healthcare data, capture full-scale, dense medical histories, and maintain multivariate accuracy. We’ve also developed a set of metrics for validating both the fidelity and privacy of synthetic data to ensure a high level of accuracy and privacy preservation.

Syntegra and Datavant are working in complementary roles with regard to making healthcare data more widely accessible. Given such partnerships, what do you imagine the healthcare data landscape will look like in the future?

We believe high-fidelity synthetic data will become part of the “healthcare data stack” for all healthcare organizations. There will always be a need to work with real patient data with a traceable provenance, an area in which we see Datavant as a current and growing leader, but the use of this data can be complemented and informed by the use of synthetic data. Open-ended exploration, for example, is often impossible with de-identified real data due to well-founded patient privacy restrictions. Synthetic data, however, can be used in this way, presenting an opportunity for teams to be more data-driven in the early stages of hypothesis and study design, or product testing and development. Insights at this early stage can then be taken further in real datasets, and synthetic data can serve as a complement to the real data by filling any gaps where necessary. We recently worked with a global pharma company to use synthetic data to directly access an EU dataset they otherwise wouldn’t have been able to access. This enabled them to gain a deep understanding of the underlying data structure and statistics, improving and accelerating future real-world evidence studies with this data.

About the Future of Healthcare Hackathon:

Datavant has hosted several hackathons over the past few years. One major highlight was the 2020 Pandemic Response Hackathon, which drew over 1,600 participants and 230 submissions and involved 30+ co-partners. Have a look at the 2020 project showcase to see some especially impressive submissions.

The Future of Healthcare Hackathon is a virtual event taking place from Sept. 8 to Sept. 11, 2022. Submissions will be reviewed by our judging panel, which includes David Shulkin, former U.S. Secretary of Veterans Affairs; Niall Brennan, Chief Analytics and Privacy Officer at Clarify (formerly at the Health Care Cost Institute); Clare Bernard, Ph.D., Senior Director, Data Sciences Platform at Broad Institute; and more.

Winners can bring their projects to life by leveraging our prize pool, which includes cash prizes and the opportunity to travel to Washington, D.C., to present at the annual Future of Health Data Summit on Sept. 15. Presenters at this conference will include former and current heads of the FDA, the former U.S. Secretary of Veterans Affairs, the Chief Data Officer of Broad Institute, and the Federal CIO. About 250 high-profile leaders in healthcare, tech, and policy will be in attendance, as well as members of the press.

Authored by Carter Prince (Syntegra) and Nicholas DeMaison (Datavant).

Considering joining the team? Check out our careers page and see us listed on the 2022 Forbes top startup employers in America. We’re currently hiring remotely across teams and would love to speak with any new potential Datavanters who are nice, smart, and get things done.

Spotlight on AnalyticsIQ: Privacy Leadership in State De-Identification

AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its social determinants of health (SDOH) datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).

AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.

“Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.

“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance.”

Building Trust in Privacy-Preserving Data Ecosystems

As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.

As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences organizations can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.

The Power of SDOH Data with Providers and Payers to Close Gaps in Care

Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.

Payers Deploy Targeted Care Using SDOH Data

Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:

Tailored Member Programs: Payers develop specialized initiatives like nutrition delivery services and transportation to and from medical appointments.
Identifying Care Gaps: SDOH data helps payers identify gaps in care for underserved communities, enabling strategic in-home assessments and interventions.
Future Risk Adjustment Models: The Centers for Medicare & Medicaid Services (CMS) plans to incorporate SDOH-related Z codes into risk adjustment models, recognizing the significance of SDOH data in assessing healthcare needs.

Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.

Example: CDPHP supports physical and mental wellbeing with non-medical assistance

Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH data and partnered with Papa to combat loneliness and isolation in older adults, families, and other vulnerable populations. CDPHP aimed to address:

Social isolation
Loneliness
Transportation barriers
Gaps in care

By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.

Providers Optimize Value-Based Care Using SDOH Data

Value-based care organizations face challenges in fully understanding their patient panels. SDOH data helps providers address these challenges and improve patient care. Here are some examples:

Onboard Patients Into Care Programs: Providers use SDOH data to identify patients who require additional support and connect them with appropriate resources.
Stratify Patients by Risk: SDOH data combined with clinical information identifies high-risk patients, enabling targeted interventions and resource allocation.
Manage Transition of Care: SDOH data informs post-discharge plans, considering social factors to support smoother transitions and reduce readmissions.

By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.

While accessing SDOH data offers significant advantages, challenges can arise from:

Lack of Interoperability and Uniformity: Data exists in fragmented sources like electronic health records (EHRs), public health databases, social service systems, and proprietary databases. Integrating and securing data while ensuring data integrity and confidentiality can be complex, resource-intensive and risky.
Lag in Payer Claims Data: Payers can take weeks or months to release claims data. This delays informed decision-making, care improvement, analysis, and performance evaluation.
Incomplete Data Sets in Health Information Exchanges (HIEs): Not all healthcare providers or organizations participate in HIEs. This reduces the available data pool. Moreover, varying data sharing policies result in data gaps or inconsistencies.

To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.

SDOH data holds immense potential in transforming healthcare and addressing health disparities.

With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.
