Transforming Cancer Care through Multimodal Data: An Interview with COTA CEO Miruna Sasu

Datavant

December 11, 2023

Miruna Sasu, Chief Executive Officer at COTA

In our Ecosystem Explorer Series, we interview leaders from organizations that are advancing access to health data. Today’s interview dives into the world of multimodal data with Miruna Sasu, CEO at COTA.

Miruna Sasu is the Chief Executive Officer at COTA. She has held leadership positions at Johnson & Johnson and Bristol Myers Squibb, where she revolutionized company-wide digital innovation and advanced analytics across enterprise drug portfolios, from drug discovery to value access and post-marketing.

Miruna holds a PhD in biology and statistics from Penn State University and an MBA from Temple University.

COTA was founded in 2011 by doctors, engineers, and data scientists to create clarity from fragmented and often inaccessible real-world data (RWD). By using our proprietary technology, advanced analytics and deep expertise to organize complex data, we provide a comprehensive picture of cancer that can be used to advance care and research. We believe that everyone touched by cancer deserves a clear path to care. Together, we can make that vision a reality.

Introduction to multimodal data

Miruna, welcome to our Ecosystem Explorer interview series! For readers unfamiliar with multimodal data, let’s start with the basics. What is multimodal data, and why is it important to healthcare researchers?

Multimodal data, or multisource data, is data that has its origins in different areas of the healthcare continuum. These data sources may include EHR data, claims, laboratory and pharmacy records, molecular profiles, medical device data, and patient-reported data on surveys, questionnaires, or even social media.

Multimodal data is important because humans are multidimensional. When we capture different data types that reveal different aspects of a person’s healthcare experience, it results in a fuller and more accurate picture.

What is COTA’s connection with multimodal data? How does it fit in with your company’s mission?

COTA was founded to make sense of fragmented EHR data, in order to give a longitudinal view of the patient journey. Doctors and drug developers have so much data at their fingertips to help them fight cancer. They have the potential to query information on millions of clinical outcomes to find the best course of treatment for a patient, or identify the groups that’ll benefit most from a new medicine. Our medical records — after being de-identified to protect our privacy — hold the keys to countless discoveries about how cancer is diagnosed and treated. So what’s holding us back from using them right now to cure someone’s cancer?

If we’re serious about helping patients with cancer live longer, healthier, cancer-free lives, we need to get serious about improving the quality of cancer data. This means making sure our cancer datasets represent diverse groups of people and treatment settings and data sources. When we use multimodal data, we can portray patients’ journeys completely and accurately, and include the data points that matter most to answer tough cancer questions.

When it comes to answering those tough cancer questions, are there special applications or benefits of multimodal data to oncology research?

Multisource clinicogenomics data can assist with monitoring the long-term effects of a therapy as part of Phase 4 safety studies, identify any long-term secondary impacts of the product, and help to appropriately position the product in the market to ensure maximum benefit to both the developer and to patients.

These approaches are already producing benefits for patients. For example, many people cannot tolerate statin therapy for high cholesterol, despite the fact that statins are the gold standard treatment for this condition.

When researchers examined genetic data to understand the mechanisms behind high cholesterol, they found that certain mutations in the PCSK9 gene (proprotein convertase subtilisin/kexin type 9) played a role in development of the disease. They used these insights to develop an alternative therapy using monoclonal antibodies that target the PCSK9 protein. This led to the launch of two groundbreaking drugs, evolocumab and alirocumab, that provide suitable alternatives to statin therapy.

This is a classic example of genetic evidence playing a crucial role in drug approval. With the rich clinicogenomic RWD that is available today, we are likely to see more success stories like this one alongside reduced costs and faster development of life-saving drugs for patients.

That’s a great example. Are there other real-world applications that come to mind that illustrate how multimodal data has been used to improve cancer diagnosis, treatment, or research outcomes?

While cancer affects roughly the same number of women as men, similar numbers of Black and White Americans, and people living in urban and rural parts of the country, a group of factors known as social drivers of health (SDoH) may mean one person’s cancer outcomes are worse than their neighbor’s. SDoH includes things like where a person lives, the level of education they’ve received, and how much money they make — things that, though not directly related to medicine, can impact health. If you live 45 minutes from the nearest medical center and don’t have a job that offers paid time off for doctor’s appointments, it’s unlikely you’ll be able to participate in clinical trials for novel treatments. It can be a matter of getting a life-changing treatment or not.

Multimodal datasets reflect the real lives of the real patients we treat. When data used to answer important questions about cancer represent only a small sliver of the population, the results may be skewed or biased. That’s not useful for drawing conclusions about how cancer affects Americans more broadly. For example, we can’t use data from affluent, majority-white patients treated at an academic medical center in a big city to understand the ways people of color in a low-income community are treated in local hospitals — the results simply wouldn’t match up.

The challenges with multimodal data

Collecting, curating, and connecting health data can be complex, particularly with data from multiple sources and types. What do you see as the major challenges with multimodal data?

Bringing all of this data together into a curated, fit-for-purpose dataset to support research and development can be a challenge, because these data sources weren’t originally designed to fit together neatly. Instead, researchers and analysts must consider a number of different issues when synthesizing multimodal data, including how to accurately identify a single patient via tokenization or other means.

Data scientists must carefully balance the volume of data (a higher number of data sources could create a richer, more complete portrait of an individual) with the data integrity and governance issues of merging many sources together.

The challenge becomes even more complicated as we start to explore other datasets to augment our current core sources of RWD. For example, we are increasingly integrating imaging data, personal device data, SDoH data, and patient experience data into the RWD ecosystem. These sources tend to be even “messier” than other data types due to poor standardization and high amounts of unstructured information.
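As a rough illustration of the tokenization idea mentioned in this answer, the sketch below derives a keyed hash from normalized identifiers so that records from two sources can be matched without exchanging names or birth dates directly. It is not COTA's or Datavant's actual method: the field choice, the normalization rule, the `SECRET_KEY` value, and the sample records are all assumptions made up for illustration, and production systems involve considerably more (key management, handling of typos and name changes, and formal review of re-identification risk).

```python
import hashlib
import hmac

# Hypothetical shared secret; a real deployment would load this from a key-management system.
SECRET_KEY = b"example-key-for-illustration-only"


def normalize(value: str) -> str:
    """Lowercase and remove whitespace so trivial formatting differences do not change the token."""
    return "".join(value.lower().split())


def make_token(first_name: str, last_name: str, birth_date: str) -> str:
    """Return a keyed SHA-256 digest of the normalized identifiers (hypothetical field choice)."""
    message = "|".join(normalize(v) for v in (first_name, last_name, birth_date))
    return hmac.new(SECRET_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(source_a: list[dict], source_b: list[dict]) -> list[tuple[dict, dict]]:
    """Pair up records from two sources that carry the same token."""
    by_token = {rec["token"]: rec for rec in source_b}
    return [(rec, by_token[rec["token"]]) for rec in source_a if rec["token"] in by_token]


# Made-up sample data: each source tokenizes locally and shares only the token
# plus non-identifying fields.
ehr = [{"token": make_token("Ana", "Diaz", "1970-01-31"), "diagnosis": "C50.9"}]
claims = [{"token": make_token(" ana ", "DIAZ", "1970-01-31"), "drug": "tamoxifen"}]
linked = link_records(ehr, claims)
```

Using a keyed hash (HMAC) rather than a plain hash means someone without the key cannot rebuild tokens by hashing guessed names and birth dates. Exact matching on normalized fields is the simplest possible linkage rule, and it will miss records with misspellings or changed surnames.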

How does COTA address this data messiness?

In cancer, this means working directly with oncologists and high-quality cancer data to make sure the data is accurate, useful, and well-positioned to drive better outcomes for patients. With the triad of strong tech, great data, and oncology expertise, we can make multimodal data the best it can be for giving useful cancer insights at scale.

As you know, data security and patient privacy are critical for organizations working with connected health data. Does multimodal data present unique challenges in those areas, and how do you address them?

Trust is the operative word when it comes to bringing together multiple data sources to create new assets for use in the life sciences environment. As we start looking at bringing multimodal data together for regulatory and decision-making purposes, we need to connect these disparate silos in a compliant and secure manner.

We need to access and share de-identified data in a HIPAA-compliant manner while preserving the provenance and governance of the data. But more importantly, we need to protect the spirit of the law around patient privacy and respect the fact that these are very sensitive data elements.

By employing technologies that can identify, aggregate, and synthesize data in a trusted and neutral manner, the health system can start to unlock previously unused data to augment the traditional clinical trial and post-market surveillance ecosystem.

Innovations and opportunities

Let’s look to the future: What are the current trends or innovations with multimodal data?

Pharma can use high-quality data on real patients’ outcomes to de-risk cancer drug development. They can test hypotheses about how new drugs or molecules would affect patients in the data, before they recruit for a clinical trial, allowing drugmakers to predict which types of patients are most likely to benefit from their medicine. By using external control arms, they can replace the non-treatment arm of a clinical study with real-world patient data. This ensures all patients in the trial receive the novel therapy, rather than joining a trial only to represent standard of care. And when planning a clinical trial, they can combine this RWD with AI to model whether or not their trial will be able to demonstrate the true potential of their new treatment. Based on the outcome, they can decide whether to proceed with the trial as is, adjust its methodology to improve its chances of success, or scrap it altogether.

By testing hypotheses and trial feasibility virtually ahead of a full-scale clinical project, pharma teams don’t spin their wheels on efforts that aren’t likely to succeed. Instead, their resources are then better spent on research into other promising trials or medicines. And, critically, patients who participate in trials are protected from taking therapies that aren’t likely to treat their disease.

Much has been said about the advancements of AI/ML and its potential to transform healthcare, from predictive modeling and precision medicine to clinical decision support. How important is multimodal data to these applications of AI/ML?

It’s an exciting time to be in tech. Large language models and generative AI are top of mind everywhere. In healthcare, I’m increasingly inspired by our potential to elevate data science with AI, driving even faster answers for patients, doctors, and drug manufacturers.

AI will not work without high-quality multimodal data. The promise for generative AI as a breakthrough tool to transform cancer therapy is immense: sophisticated algorithms to help doctors and researchers find insights that will uncover treatments to transform cancer care and help millions of people live healthy lives. So much is riding on it. The engine is ready to run. But there isn’t enough fuel to make it roar. Health data is notoriously incomplete, contains errors and bias, and is housed in many incompatible formats, as well as siloed in heavily protected and regulated systems across thousands of institutions. As much as 70% of data generated in sophisticated computer systems still needs to be qualified by human experts at great cost in time and resources. And there is already evidence that generative AI systems can amplify gaps, errors, and biases. Quality data, essential for AI’s clinical insights, isn’t produced fast enough. AI hasn’t passed the quality test.

High-quality data is the key ingredient to realizing the full potential of AI. By meshing automation and supervision of high-quality RWD, COTA is providing useful information at a scale that will begin to change the dynamics of cancer care. The average handle time of COTA’s product has decreased year-over-year by 10%. Instead of 70% of data qualified by human experts alone, COTA is moving towards a day when 70% of insights can be validated using AI with the same caliber of expert human oversight. The impact on patients will make the excitement around AI meaningful.

What do you see as the most exciting opportunities for health organizations working with multimodal data in the coming years?

As the cancer-treatment landscape becomes ever more complex and the disease is treated as chronic rather than a death sentence, oncologists and drug developers need a “Waze for cancer” to navigate the winding roads of treatment and trials. There’s work to be done to get there; the technology isn’t quite ready to give us the guidance we need to treat patients with precision and speed. The main roadblock lies in the cancer data landscape. Today, we are not processing data at a scale and speed necessary to power a Waze for cancer research. The information is housed in diverse data sources and must be processed before it can be used, which makes it hard to draw conclusions and find the right path.

On the road to a Waze for cancer, we must first improve the quality and quantity of cancer data that powers AI models and infuses medical knowledge into cancer datasets, a big task but not an insurmountable one. Thanks to leaders in cancer data, efforts to standardize the way cancer data is recorded and to work with oncologists to verify its accuracy are already underway. Part of this effort includes validating data and AI-driven conclusions in real time. Similar to how Waze users are asked to verify whether another user’s report of a car accident or speed trap on a route is accurate, oncologists assess the data to make sure AI models are reaching the right conclusions.

Thanks for the interview, Miruna! Do you have any recommended resources for readers who want to learn more about multimodal data and its applications in oncology?

Follow COTA on LinkedIn to learn more about the future of multimodal data in cancer care.

This interview is part of our Ecosystem Explorer Series, in which we interview leaders from partner organizations who are improving access to health data. If you’re interested in participating in this series, send us an email at info@datavant.com.

Spotlight on AnalyticsIQ: Privacy Leadership in State De-Identification

AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its SDOH datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).

AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.

“Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.

“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance.”

Building Trust in Privacy-Preserving Data Ecosystems

As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.

As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.

The Power of SDOH Data with Providers and Payers to Close Gaps in Care

Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.

Payers Deploy Targeted Care Using SDOH Data

Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:

Tailored Member Programs: Payers develop specialized initiatives like nutrition delivery services and transportation to and from medical appointments.
Identifying Care Gaps: SDOH data helps payers identify gaps in care for underserved communities, enabling strategic in-home assessments and interventions.
Future Risk Adjustment Models: The Centers for Medicare & Medicaid Services (CMS) plans to incorporate SDOH-related Z codes into risk adjustment models, recognizing the significance of SDOH data in assessing healthcare needs.

Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.

Example: CDPHP supports physical and mental wellbeing with non-medical assistance

Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH, partnering with Papa, to combat loneliness and isolation in older adults, families, and other vulnerable populations. CDPHP aimed to address:

Social isolation
Loneliness
Transportation barriers
Gaps in care

By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.

Providers Optimize Value-Based Care Using SDOH Data

Value-based care organizations face challenges in fully understanding their patient panels. SDOH data helps providers address these challenges and improve patient care. Here are some examples of how:

Onboard Patients Into Care Programs: Providers use SDOH data to identify patients who require additional support and connect them with appropriate resources.
Stratify Patients by Risk: SDOH data combined with clinical information identifies high-risk patients, enabling targeted interventions and resource allocation.
Manage Transition of Care: SDOH data informs post-discharge plans, considering social factors to support smoother transitions and reduce readmissions.

By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.

While accessing SDOH data offers significant advantages, challenges can arise from:

Lack of Interoperability and Uniformity: Data exists in fragmented sources like electronic health records (EHRs), public health databases, social service systems, and proprietary databases. Integrating and securing data while ensuring data integrity and confidentiality can be complex, resource-intensive and risky.
Lag in Payer Claims Data: Payers can take weeks or months to release claims data. This delays informed decision-making, care improvement, analysis, and performance evaluation.
Incomplete Data Sets in Health Information Exchanges (HIEs): Not all healthcare providers or organizations participate in HIEs. This reduces the available data pool. Moreover, varying data sharing policies result in data gaps or inconsistencies.

To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.

SDOH data holds immense potential in transforming healthcare and addressing health disparities.

With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.