On May 14, 2024, Datavant held our annual Real-World Data Connect event in Boston, convening more than 130 RWD experts and practitioners from the Life Sciences industry. Our goals were to explore the value of real-world data across clinical and commercial use cases, identify best practices and opportunities for collaboration, and exchange insights to drive innovation at organizations across the health data ecosystem.
We’re excited to share our top takeaways from the day’s panel discussions and look forward to continuing the conversations!
Linking new types of health data enables a comprehensive view of the patient journey
In the past, researchers would typically just connect clinical and claims data, providing insights into patient care and health outcomes based on documented medical histories and billing information. Today, the landscape has expanded dramatically with the accessibility of genomics, imaging, diagnostic, and lab data, as well as social determinants of health (SDoH) data, driving new applications and deeper, more personalized insights into patient health and treatment efficacy.
Pairing multimodal data — ‘omics, imaging, and other unstructured RWD — with an effective data strategy can reveal powerful new insights.
Panelists noted that linking multimodal data (data derived from multiple sources and types) can create rich datasets and power new research. In one example, a Life Sciences speaker described how his organization is leveraging DNA/RNA sequencing and clinical data for organoid simulations, enabling researchers to explore signals and treatment pathways from tissue-derived models.
In another example, a behavioral health company developed a digital screening device that records consenting patients at their homes. AI/ML models trained on real-world data then analyzed these recordings to detect signals and behavioral patterns indicative of early autism, reducing the time to diagnosis.
Leveraging multimodal data can be complex, and speakers emphasized the importance of robust data strategies, architectures, and governance frameworks. Cloud providers can help manage the size and scale of multimodal data and reduce spend on duplicative RWD, while also making data sharing secure and efficient.
Patient-consented clinical data are powering R&D workflows.
Patient-consented, HIPAA-compliant clinical data are increasingly being used to power large-scale R&D projects. The greater depth, breadth, and utility of robust EHR data make it well suited to supplement traditional RWD, particularly in therapeutic areas with rare or hard-to-find cohorts such as pediatrics and rare diseases.
One case involved a cancer research registry that faced challenges in retrieving complete patient histories for a clinical study. By obtaining full medical records from consenting patients, including pathology notes, progress notes, radiology reports, and lab tests, the registry enriched its data model to include cancer staging, biomarker details, and comorbidities.
In another example, a speaker described a study design that combined identified and de-identified workflows to enrich study data and deduplicate an external control arm for a respiratory therapy trial. Tokenization linked relevant trial EDC and PRO data with multiple RWD sources, including claims, EMR, and study drug distribution data, in a privacy-preserving manner.
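To make the linking step concrete, here is a minimal sketch of token-based record linkage. It assumes a keyed hash (HMAC-SHA256) over normalized identifiers, which is an illustrative stand-in and not the tokenization method used in the study or by any particular vendor. The key, field names, and helper functions (`normalize`, `make_token`, `link_records`) are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; real deployments need managed keys.
SECRET_KEY = b"example-key-not-for-production"


def normalize(value: str) -> str:
    """Lowercase and remove whitespace so formatting differences do not change the token."""
    return "".join(value.lower().split())


def make_token(first_name: str, last_name: str, dob: str, key: bytes = SECRET_KEY) -> str:
    """Derive a deterministic token from normalized identifiers using HMAC-SHA256."""
    message = "|".join(normalize(v) for v in (first_name, last_name, dob))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(trial_rows, rwd_rows):
    """Join two lists of dicts on their 'token' field and return the merged records."""
    rwd_by_token = {}
    for row in rwd_rows:
        rwd_by_token.setdefault(row["token"], []).append(row)
    linked = []
    for row in trial_rows:
        for match in rwd_by_token.get(row["token"], []):
            linked.append({**row, **match})
    return linked


# Each party tokenizes its own records, then only tokenized rows are joined.
trial = [{"token": make_token("Ada", "Lovelace", "1815-12-10"), "pro_score": 7}]
claims = [{"token": make_token(" ada ", "LOVELACE", "1815-12-10"), "claim_count": 3}]
linked = link_records(trial, claims)
```

In this sketch the same person yields the same token in both datasets despite formatting differences, so the trial and claims rows can be joined without either side exposing names or birth dates. A hash alone does not make data de-identified; that determination depends on the full dataset and the applicable privacy review.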
RWD is reshaping how pharmaceutical products are developed, marketed, and delivered
RWD is supporting regulatory approval for medical devices.
Historically, medical device data lacked the granularity needed for market access and pricing, but several panelists noted that this has changed.
One speaker from a top pharmaceutical company described connecting RWD to pursue label expansions for cardiac medical devices. By linking medical device data with multiple EHR datasets to assess patient outcomes, the pharmaceutical company generated evidence on device safety. This innovative approach led to the FDA's first-ever approval for label expansion based solely on a comparative real-world evidence (RWE) study using EHR databases. This milestone has set the stage for 20 additional test cases funded by the FDA through the National Evaluation System for Health Technology.
In another example, a Life Sciences organization linked data to generate evidence on a wearable device to treat essential tremor, a condition that has limited treatment options. Researchers combined proprietary claims data, longitudinal patient-reported outcomes data, and device data to measure the device's impact on patient health, healthcare utilization, and costs. This evidence supported the device's 510(k) premarket clearance, demonstrating the transformative potential of RWD in advancing medical device innovation.
RWD can improve modeling, engagement, and measurement for commercial teams.
The landscape of healthcare marketing is rapidly evolving, and new technologies are enabling more personalized engagement with patients and providers. In particular, linking de-identified RWD gives Life Sciences organizations and their technology partners insights that can improve audience modeling and campaign measurement accuracy. This, in turn, leads to better patient engagement, with one speaker describing a 30x lift in acquisition for patients with an autoimmune disease.
New partnerships and technologies are making data connectivity and exchange more efficient, controlled, and privacy-centric.
Designing and implementing a linkable data strategy is essential for unlocking new data types and powering innovative clinical and commercial use cases. Efficiently tokenizing and connecting first- and third-party data lets organizations maximize the utility of their data, reduce duplicative spending, and give researchers access to the data they need when they need it.
However, this is just one part of the equation. Effective data strategies also depend on the ability to quickly and securely connect fit-for-purpose data. Panelists emphasized that “fit for purpose” involves taking the time up front to determine the specific research questions, define the types and attributes of data needed to answer those questions, and identify the right partnerships to obtain that data.
RWD partnerships go deeper than price and data quality.
Panelists noted that there are multiple factors to consider when evaluating RWD, including ease and speed of access, transparency, reliability, and relevance. However, as research needs evolve, the degree to which a dataset is fit for purpose changes over time, and attendees generally agreed that the ideal state is where researchers can quickly and easily navigate the landscape of RWD partners to answer their specific research questions.
Clean rooms and federated data models have emerged as privacy-enhancing solutions for secure, efficient collaboration.
RWD practitioners are increasingly using cloud-based clean rooms as a trusted method for data exchange. Commonly used in advertising and marketing, clean rooms are secure environments that enable multiple parties to collaborate without exchanging data, allowing joint analyses on multiple datasets while adhering to privacy controls.
Speakers noted that the advent of clean rooms has encouraged organizations that historically have shied away from sharing data to provide access to de-identified data for health research. But they’re not only valuable for collaborating with external organizations — clean rooms also empower internal teams within an organization to share data in a secure, controlled environment. Bringing tokenization and data assessment tools to the cloud will further enhance the value of clean rooms, enabling organizations to efficiently de-identify, link, and match data without ever moving the data out of their environment.
Federated data models are another method of secure data collaboration, enabling data access and analysis across multiple distributed databases without centralizing the data. One panelist described success with using a federated data model to power oncology studies for multiple cancer diagnoses and Alzheimer’s studies. Multiple research sites ran local AI models on federated datasets, accessing rich clinical data without actually exchanging it. This privacy-centric collaboration model allowed researchers to examine disease progression over time while maintaining strict data privacy and security standards.
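The core idea of a federated analysis can be sketched in a few lines. This example swaps in a simple pooled mean for the AI models the panelist described, and it assumes each site is permitted to share only an aggregate summary. The `SiteSummary` type and both functions are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate shared by one site; no patient-level rows leave the site."""
    n: int
    total: float


def local_summary(values: Iterable[float]) -> SiteSummary:
    """Runs inside a site's own environment on its own data."""
    vals = list(values)
    return SiteSummary(n=len(vals), total=float(sum(vals)))


def pooled_mean(summaries: List[SiteSummary]) -> float:
    """Runs at the coordinator, which only ever sees per-site aggregates."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s.total for s in summaries) / n


# Two sites summarize locally; the coordinator combines the summaries.
site_a = local_summary([1.0, 2.0, 3.0])
site_b = local_summary([5.0])
overall = pooled_mean([site_a, site_b])  # 2.75
```

The pooled result matches what a centralized calculation would give, yet the coordinator never receives individual measurements. Real federated deployments typically add safeguards this sketch omits, such as minimum cell sizes before a summary can be released.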
Guarded optimism on AI and Large Language Models (LLMs)
AI for predictive modeling is mature and widespread across Life Sciences organizations that have governance frameworks in place for model development and management. In the future, proprietary data will be the largest differentiator between the strength and quality of predictive AI models. To advance AI and build better models, organizations must be willing to provide access to high-quality data, whether in a shared setting where first- and third-party datasets are being connected or in a federated fashion.
The type and quality of data used for training will also differentiate LLMs, which could transform how we approach data management and analytics. One panelist was optimistic that LLMs will reduce the need to structure data in the future: Researchers will simply be able to ask the model questions and generate analysis in real-time, accelerating the speed to insight.
However, there are still questions regarding proper governance with LLMs. Black box models can be problematic, noted one speaker, and Life Sciences organizations will need to clarify what data sources were used for training and what steps were taken to eliminate biases.
Privacy First and Always
As the volume, modalities, and applications of RWD continue to expand, it’s increasingly important to treat privacy as an essential component of and enabler for data strategy. Experts agreed that privacy should be built into the fabric of the data strategy, at both the enterprise level and project level.
The foundational principles of patient privacy remain unchanged, and speakers noted that researchers must master the essentials. To sustain patient trust and serve as good stewards of health data, organizations must continue to invest in clear governance frameworks, proper data hygiene, data minimization practices, transparency, and organizational literacy on privacy and compliance.
Numerous privacy tools and tactics are available to researchers. From clean rooms and federated data models to contractual controls like data use agreements and new technologies such as unstructured data redaction, researchers should be equipped with the resources they need to connect data securely and compliantly.
Final thoughts
This gathering demonstrated that RWD is becoming a linchpin in the machinery of Life Sciences, contributing to breakthroughs in therapies, more personalized care, and better patient outcomes. The promise of RWD to transform Life Sciences and patient care is profound, and we left RWD Connect inspired by the progress being made through new technologies, applications, and collaborations.
Thank you to all who attended and contributed their perspectives! For those unable to attend this year, RWD Connect 2025 will take place in Philadelphia in the spring of 2025. Stay tuned for more information!
Want to join the discussion? Reach out to learn how to join the nation’s largest RWD ecosystem.