Introduction
Real-world data comes from numerous sources, and it provides value in healthcare research that cannot be overstated. This data has valuable applications in proving efficacy, enhancing clinical trial design, and improving lifecycle management by reducing the resources required for post-market surveillance studies.
Real-world data (RWD) and real-world evidence (RWE) have the potential to influence the prevention, diagnosis, and treatment of various conditions. As the future of real-world data expands and leads healthcare to a new frontier, the use of health data presents a variety of challenges.
Chapter 1
Real-World Data Overview
RWD and RWE both have many valuable uses in healthcare, including patient recruitment for clinical trials, comparing drug efficacy, and monitoring drug safety. However, understanding the key differences between these two concepts is important.
What Is Real-World Data?
Clinical trials seek to answer specific questions in a controlled environment to earn regulatory approval of an investigational drug or device. The information gathered from these trials lacks data from a real-world environment. Therefore, post-market studies are often critical to understanding patient adherence and clinical efficacy in the real world, outside of a controlled clinical study.
The pharmaceutical industry has traditionally used randomized controlled trials when seeking approval of therapies, but the U.S. Food and Drug Administration (FDA) has been developing a framework and issuing guidance to support the use of real-world evidence, enabled by RWD, in regulatory decision-making. Additionally, technological improvements that protect patient privacy have expanded the possible sources of real-world data available to researchers.
RWD comes from a variety of sources outside of traditional clinical trials. Researchers routinely collect patient data from sources such as:
- Claims and billing activity
- Electronic health records (EHRs)
- Patient-reported outcomes
- Disease or product registries
- Biometric monitoring sources including pedometers and smart watches
Health data from these sources can provide a more comprehensive picture of the patient journey and experience. It can even provide an overview of population health. Although RWD can enable clinical evidence, it’s essential that the data is fit-for-purpose (i.e., relevant, valid, and reliable), which is only determined based on the research question it’s answering.
How Does Real-World Data Differ from Real-World Evidence?
The terms real-world data and real-world evidence are often used interchangeably, but they are two different concepts. RWE is derived from the analysis of RWD and can provide valuable information about the risks, benefits, and use of a therapy. Real-world evidence helps accelerate the approval of new therapies, especially in oncology.
What Is the Value of Real-World Data and Real-World Evidence?
By using sources of patient health data such as those listed above, researchers can evaluate therapies in a larger population, in real-world conditions, and at a lower cost than with typical clinical trials.
RWD has the potential to provide information about a more diverse population than the typical populations that participate in clinical trials. Therefore, researchers can get valuable efficacy and safety information on a more representative population than they can from a randomized clinical trial.
RWE provides a more comprehensive view of how a therapy will work in a real-world setting. Researchers can evaluate the therapy while factoring in other variables such as comorbidities, demographic groups, and age groups, among other parameters. Most importantly, RWE helps researchers develop a better understanding of the long-term use of the therapy beyond the clinical trial period.

Example Use Cases of RWD and RWE
Some ideal use cases for RWD and RWE include supporting regulatory requirements and deciding on a treatment plan for patients.
RWE can help support regulatory requirements to expand on a therapy’s indication without performing a full additional clinical trial. For example, if a product is often prescribed for off-label conditions, companies may use RWD to study patient outcomes and therapy safety and then submit this information to the U.S. Food and Drug Administration (FDA) and/or European Medicines Agency (EMA).
Healthcare providers can use RWD and RWE to better inform a patient’s treatment plan, procedures, tests, and prescriptions. This data may also help develop practice guidelines. For instance, at the beginning of the COVID-19 pandemic, public health officials needed to rapidly evaluate and share information on the prevention and treatment of COVID-19. Much of the information gathered during this time leveraged RWD.
Chapter 2
Real-World Data Ecosystem
Numerous types of patient data gathered from multiple sources can be useful for analyzing RWE. To develop a deeper understanding of how RWD can be used and its value in healthcare, examining key data types and sources proves critical.
Real-World Data Types
Health data can be pulled from numerous sources to provide valuable insights into the patient journey.
Claims Data
Claims data results from processing a healthcare claim. There are two types of claims data: open and closed. Open claims datasets come from claims clearinghouses or providers’ revenue cycle management systems. They cover a large number of patient lives but may not represent complete claims coverage for a given patient.
Closed claims come from health insurance plans or self-insured employer groups. They tend to cover a smaller scale of patient lives, but represent complete claims coverage for a given patient during the time that patient was on the insurance plan or worked at the employer. Claims data is longitudinal in nature and captures a long period of the patient journey, but it does not have as much depth of clinical detail about a particular medical encounter as other data types.
Because closed claims datasets are very comprehensive, they prove ideal for health economics and outcomes research (HEOR) that considers a patient’s journey, resource utilization, and the economic burden of their condition. Open claims datasets prove less useful for HEOR, due to their incompleteness for a given patient. Given the large scale of patients covered as well as lower data latency, open claims datasets can prove useful for marketing use cases.
Claims data is even more powerful when used in tandem with clinical metrics such as lab data, EHR data, or patient-reported outcomes. The combination of these data sets can provide deeper insight into symptoms, disease progression, and clinical outcomes.
Laboratory and Genomics Data
Lab testing data proves valuable for a variety of use cases in healthcare analytics from market sizing to monitoring disease progression to finding biomarker signals of patients eligible for certain therapeutics. Lab data can provide a deep point-in-time clinical and biochemistry profile of a patient but isn’t as longitudinal as claims data.
Genomics data is a specialty area of lab testing currently growing in popularity within healthcare analytics given the increase in biomarker-targeted therapeutics. Genomics data proves useful for both clinical development use cases where scientists may employ genomics data to inform biomarker selection or in commercial use cases where genomic results may provide input towards building a predictive model to find patients eligible for a biomarker-targeted therapy.
Pharmacy Data
Pharmacy data provides information about which therapies patients use and how the therapies change over time. Pharmacy data proves extremely useful for specialty drugs, which now account for approximately 75 percent of prescription drugs in development. A network of specialty pharmacies, contracted by pharmaceutical manufacturers, typically distributes specialty drugs. The manufacturer will then aggregate real-world data from specialty pharmacies to understand real-world prescribing, dispensing, and medication adherence patterns.
Electronic Health Records (EHR)
As patients move throughout the health system, valuable real-world data is collected as part of their electronic health records (EHRs). EHR data contains richer clinical detail than claims data, but it may not be as longitudinal because patients could visit many doctors across different care settings over the course of a year, and those doctors may use different EHR systems. EHRs contain data on appointments, medical history, diagnoses, symptoms, medications prescribed, labs, and chart notes. This data is important for gaining a more granular understanding of clinical patient outcomes.
Even though EHR data is valuable, it requires significant data curation and cleaning because much of the valuable information may reside in unstructured physician notes fields.
As with all real-world data, biases may exist. For instance, EHR data can have ascertainment bias because the recorded information depends on why the patient came in for a hospital visit, who cared for them, and the intended use of the information. Patients in the emergency room for a broken arm won’t have their insomnia noted in their health records, but if they go to a neurologist, it’s more likely to be noted.
Generally, the information recorded as part of a patient’s EHR—whether they’re in an inpatient setting, outpatient setting, or a specific therapeutic area—includes:
- Procedures performed
- Diagnosis
- Vital Signs
- Laboratory results
- Medication orders
- Medications administered
- Patient surveys or questionnaires
- Surgical care information
- Symptoms
Note that electronic health records on their own may not contain all of the necessary RWD, so researchers may be required to seek multiple sources of data.
Oncology Data
RWD plays an important role in answering a variety of research questions surrounding cancer. One of the top priorities in research is generating accurate evidence on the efficacy of cancer prevention, diagnosis, and treatment in a real-world setting.
Researchers often study cancer treatments in a select population in a clinical trial setting. However, researchers can also collect and analyze real-world oncology data to provide RWE on the efficacy and tolerability of new treatment methods in the real world. The main sources of real-world oncology data include:
- Registries
- Claims
- EHRs
- Specialty data providers and networks
Each state legally mandates central cancer registries, thus providing a census of all the patients who have cancer within a defined geographic area. Because of this, and because they capture detailed exposure information such as diet or physical activity as well as patient-reported outcomes, these registries provide unique information: it comes from an unselected, population-wide group of people rather than a sample.
Limitations of cancer registry data include a lack of information on outcomes other than survival as well as long-term treatment. Addressing these limitations requires new initiatives such as linking registry data with data from other organizations. The new initiatives, as well as real-time access to pathology reports, provide opportunities to supplement the understanding of therapeutic advances and impact outside of clinical trials.
Consumer Data
Recently, researchers have increased the demand for consumer data. This information can provide additional context about a patient population, such as:
- Jobs
- Socioeconomic status
- Interests
- Health
- Race
- Ethnicity
- Languages spoken
This data comes from consumer data companies and has traditionally been used for targeted marketing. Note that consumer data companies only have data on adult consumers.
Social Determinants of Health (SDOH)
Social determinants of health (SDOH) are the conditions in which people are born, work, live, play, age, and worship. These have a large impact on people’s health, quality of life, and functioning. Some examples of social determinants of health include:
- Racism
- Polluted water and air
- Access to healthy, nutritious foods
- Physical activity opportunities
Social determinants of health contribute to health inequities and disparities. For example, those who don’t have access to grocery stores that carry healthy foods may not have good nutrition, which can lead to obesity, diabetes, and heart disease. Data on SDOH and various groups of people provides a key understanding of health disparities as well as ways to improve conditions in people’s environments.

Real-World Data Sources
Different types of data providers are relevant for different situations. Three types of data provider categories exist:
Data Platforms
Data platforms provide a technology platform that has intuitive user interfaces (UI) for analyzing data within the platform. These companies have data science and data engineering teams that clean and standardize continuous streams of data coming into the platform. Combined with the UI layer, this data can be considered user-ready.
In many cases, the platform provides limited ability to export data for use. Working with a data platform is best for companies without data analytics or data engineering capabilities.
Data Aggregators
Data aggregators offer cleaned and standardized data that has been aggregated from many underlying sources. Typically, a technology platform or user interface overlay doesn’t exist, and the data is available to license as a one-time or continuous data feed.
This data is analytics-ready. Companies working with a data aggregator need data analytics or business intelligence analysts who can turn the data into analyses, but they do not need sophisticated data engineering to clean and standardize the data.
Data Originators
Data originators are closest to the source. They have the most granular and detailed data, but they do not clean it. This data requires the application of sophisticated data engineering capabilities before it can be analytics-ready.
Real-World Data Solutions Providers
The RWD ecosystem includes both real-world sources, as described above, as well as solutions providers that have built analytic and workflow solutions on top of real-world data. Many platform companies are also solutions companies, having built specific data views and analytic tools that provide solutions for specific use cases.
Common commercial analytics and clinical development solutions built on top of real-world data include:
Commercial Solutions
- Specialty pharmacy aggregation: These companies aggregate specialty pharmacy data on behalf of pharmaceutical manufacturers to monitor therapy launches. Specialty drug data is proprietary data of pharma companies that may link to other real-world data such as claims for a longitudinal view of the patient journey.
- Outcomes and patient journey: These companies enable outcome studies and patient journey research. Many of these companies build their solution on top of aggregated and linked claims data to enable a comprehensive view of patients as they move through the healthcare system.
- Commercial triggers: These companies provide triggers to commercial teams at pharmaceutical companies to alert them when a patient eligible for a specific therapy sees their provider, so sales teams can be deployed to the provider’s office for education on the relevant disease or therapy. Use of this solution is especially common in rare diseases since providers are often unaware of the rare disease and patients can be hard to diagnose.
- Digital marketing: These companies identify relevant patients and providers and then serve up digital advertising to educate them on a disease or therapy.
- Commercial analytics and insights: These companies provide aggregated data, usually claims or EHR, to help commercial teams with strategy and insights before and post-launch.
Clinical Solutions
- Trial recruitment: These companies use aggregated data—typically EHR, lab, and claims data—to identify the ideal clinical trial sites that have sizable populations of patients who would meet inclusion/exclusion criteria for a trial.
- Synthetic control arms: Synthetic control arms, also known as external control arms, are where real-world data is utilized as the control arm rather than enrolling actual patients into a control arm where a placebo or standard of care (SOC) is utilized. This is popular in disease states where patient populations are increasingly sub-stratified by biomarker status (e.g., oncology, rare disease) given the challenges of recruiting enough patients and ethical considerations of placing patients on placebo or standard of care (SOC). Companies that provide these solutions often have deep and highly curated clinical and genomic data to conduct synthetic control arms. Synthetic control arms lower trial costs, increase efficiency, and increase the speed of therapies to market.
- Decentralized clinical trials: These companies provide technology infrastructure to collect data and support decentralized clinical trials (DCTs). DCTs are trials where patient communication and data collection has been decentralized away from a traditional clinical trial site. Instead, remote and digital technologies communicate with study participants and collect their data.
Chapter 3
Major Use Cases for Real-World Data in the Healthcare Industry
RWD brings a lot of value to different organizations in the healthcare industry, from life sciences to payers to public health agencies. In this section, we examine the key use cases for RWD.
Life Sciences
Biopharmaceutical organizations can use RWD across the entire drug development lifecycle, from preclinical to clinical development to commercial planning and post-marketing monitoring.
In the pre-clinical and clinical development settings, organizations can use RWD for:
- Biomarker selection (preclinical)
- External control arms (clinical development)
During the commercial phase, organizations can use RWD for:
- Market access strategy
- Sales force planning
- Monitoring launch effectiveness
- Patient recruitment for clinical trials
- Drug efficacy comparisons
- Evidence generation to support reimbursement
- Commercial targeting precision improvements
Payers
Payers can use RWD to:
- Assess and validate value-based contracts
- Improve risk adjustment calculations
- Develop a holistic, longitudinal view of applicants
Payers often use RWE to inform comparative efficacy in a real-world setting after a drug has launched to validate coverage. According to recent research, approximately 84.9 percent of pharmacy administrator respondents used RWE to make formulary decisions in oncology for comparative efficacy when clinical trial data wasn’t available.
Providers
At the individual patient level, real-world data and real-world evidence are often used for decisions on:
- Procedures
- Test orders
- Prescriptions for patients
RWD and RWE help healthcare providers create targeted treatment plans for patients. Within the larger hospital system, providers may use them to inform the creation of practice guidelines and the further adoption of these guidelines.
Data and Analytics
Healthcare is becoming more digital due to innovative technologies and the demand for real-world data. Organizations increasingly depend on RWD and RWE to develop analytics, machine learning, and artificial intelligence (AI) applications.
For example, RWD can:
- Train AI models and predict populations at risk of a particular disease
- Identify better treatments
- Understand patient prioritization
- Improve marketing precision
- Understand patient behavior
Clinical Research Networks
Clinical research enables the creation and approval of new therapies. For greater impact, clinical research networks can integrate supplemental RWD. RWD can enhance patient recruitment for clinical trials and provide a comprehensive view of the patient before, during, and after the trial.
RWD can prove especially beneficial for oncology studies. In addition to optimizing the study design, RWD can improve operational efficiency and provide greater insights to achieve better patient outcomes. Combining the clinical trial platform with RWD opens the opportunity for reduced healthcare costs and faster research in addition to better patient outcomes.
Government
In government applications, RWD and RWE provide benefits for regulatory agencies such as the FDA. RWD and RWE can be employed in addition to randomized clinical trial evidence for post-market safety monitoring, adverse events, and marketing authorization.
In 2008, the FDA adopted the Sentinel Initiative, which monitors product safety by integrating nationwide registry, claims, and EHR data. The FDA is the primary user of this system, but the system has also provided valuable information to researchers and biopharmaceutical companies.
In addition to monitoring side effects, RWD provides value in the regulatory approval process. For example, physicians can prescribe therapies off-label in the U.S., but the regulatory label of the therapy in question can determine coverage decisions and even how many patients will be able to receive treatment.
Due to RWD, the FDA is increasingly expanding regulatory labels to allow more patients to receive treatment. For example, a therapy only approved for women with ER+/HER2- breast cancer was also approved for use in men in 2019 because of patient outcome data reported in that patient population’s
Chapter 4
Challenges of Real-World Data
As important as RWD is, it also presents a number of challenges. A plethora of patient data exists, but before researchers can use and analyze it, the data must be de-identified for patient privacy. Additionally, because RWD must be fit-for-purpose, finding the right, relevant data for the applicable use case can prove challenging.
Navigation of the Expanding Data Landscape
The expanding availability of data is creating the demand for additional data, especially specialized data. As more data becomes available, it opens up the possibility of more comprehensive analyses.
The availability of more specialized data doesn’t mean it’s the right data. The right data has become increasingly difficult to find, and finding the right data partner presents a bottleneck to data-sharing.
It’s essential to provide partners in the healthcare ecosystem with the necessary data-sharing tools. Then, it’s essential to show data users where to find the necessary data for their particular use. Assessment tools that better facilitate data exploration, segmentation, and overlap comparison help with analyzing and sharing data.
Data-Sharing Technologies
As data continues to be generated, it also introduces new patient privacy risks, resulting in demands for data that protects patient privacy while maintaining transparency and accountability. The key? An ecosystem of technology companies that allow data management, governance, and data application.
Data-sharing technologies are an unmet need in real-world data. Ideally, every data partner should have control over their records as well as confidence in the integrity of their data. Data providers and users each want something different. Providers want to keep their competitive advantage and keep data independent of their competitors and peers, while users want to easily analyze data without being tied to a specific provider. Both providers and users want transparency.
The solution is a trusted third-party data enclave that doesn’t buy or sell data and has security and privacy as the highest priorities. Collaboration is key to seamless partnership across the RWD ecosystem.
Data Standardization and Quality
RWD is often incomplete and non-standard. Data is collected in many different formats. Many standard data models exist, but they apply to different types of data. This results in data recipients spending many resources to standardize the datasets before they can be analyzed.
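As a rough illustration of the standardization work described above, the sketch below maps records from two sources onto one shared layout. The source names, field names, and date formats are all hypothetical and chosen only for this example; they are not taken from any real data model or vendor feed.

```python
from datetime import datetime

# Hypothetical per-source field names mapped to a shared schema.
FIELD_MAPS = {
    "source_a": {"pt_id": "patient_id", "dx": "diagnosis_code", "svc_dt": "service_date"},
    "source_b": {"PatientID": "patient_id", "DiagnosisCode": "diagnosis_code",
                 "DateOfService": "service_date"},
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {"source_a": "%m/%d/%Y", "source_b": "%Y-%m-%d"}


def standardize(record, source):
    """Rename fields to the shared schema and normalize the date to ISO 8601."""
    mapping = FIELD_MAPS[source]
    # Fields missing from the incoming record are kept as None so gaps stay visible.
    out = {common: record.get(raw) for raw, common in mapping.items()}
    raw_date = out["service_date"]
    if raw_date is not None:
        parsed = datetime.strptime(raw_date, DATE_FORMATS[source])
        out["service_date"] = parsed.date().isoformat()
    return out


if __name__ == "__main__":
    print(standardize({"pt_id": "A1", "dx": "E11.9", "svc_dt": "03/05/2021"}, "source_a"))
    print(standardize({"PatientID": "B7", "DiagnosisCode": "I10"}, "source_b"))
```

Keeping missing fields as explicit empty values, rather than dropping them, makes it easier to see later where a source simply did not capture something.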
Real-world data is often incomplete, which affects the accuracy later on because recipients can’t know that the data reflects the entire patient journey. It’s unclear if the patient outcomes didn’t occur at all, or if the data simply didn’t capture them.
However, as more and more data is generated, the opportunity exists for companies to collaborate on data cleaning, harmonization, and imputation.

Unstructured Data
Data used for analysis is generally structured: it has a standardized format and follows a defined order. As the health data ecosystem expands, however, much of the data coming online is unstructured.
Unstructured data has enormous potential. For example, clinician notes may better describe a patient’s medical history or quality of life. Genomic sequencing may provide better insights into the benefits of precision medicine.
Acquiring the right data proves challenging because of the difficulty in determining which information is relevant to the use case. Additionally, deriving insights from unstructured data can prove difficult because it requires complex programs to process. An unmet need exists for technologies with the capability to apply data inputs to unstructured data.
Patient Privacy and Data Utility
Patient privacy is essential to the use of RWD, but ensuring data utility may prove challenging. The key? Finding a balance between not compromising privacy and maintaining utility. The choice between de-identifying patient data through safe harbor versus expert determination depends on research objectives, patient privacy, and business needs.
Expert determination is generally ideal when striking a balance between utility and privacy because of the flexibility it offers. For example, experts may recommend redaction, removal, or modification of identifying data elements in the data set, whereas safe harbor removes a fixed set of 18 types of identifiers.
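To make the rule-based nature of safe harbor more concrete, here is a simplified sketch of that style of processing. It is not a compliant implementation. The field names are hypothetical, only a small subset of identifier types is handled, and the population check that applies to three-digit ZIP codes is skipped.

```python
# Illustrative subset of direct identifiers; the field names are hypothetical.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone", "street_address"}


def generalize_record(record):
    """Drop direct identifiers and coarsen dates, ZIP codes, and high ages."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "service_date" in out:
        # Keep only the year; assumes a 'YYYY-MM-DD' string.
        out["service_year"] = out.pop("service_date")[:4]
    if "zip" in out:
        # Keep only the first three digits (population check omitted here).
        out["zip3"] = out.pop("zip")[:3]
    if "age" in out and out["age"] > 89:
        # Ages over 89 are grouped into a single category.
        out["age"] = "90+"
    return out


if __name__ == "__main__":
    row = {"name": "Jane Doe", "ssn": "000-00-0000", "zip": "94110",
           "service_date": "2021-03-05", "age": 93, "diagnosis_code": "E11.9"}
    print(generalize_record(row))
```

Expert determination cannot be reduced to a fixed rule list like this one, which is the source of both its flexibility and its longer timelines.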
De-identification’s primary disadvantage is its time-intensive nature. It can often take months to complete, but the right expertise and technology can greatly accelerate the process while ensuring the data is fit-for-purpose.
Chapter 5
The Future of Real-World Data
Expanding technology and changes in regulations provide plenty of opportunities for the expansion of RWD and quicker de-identification for patient privacy.
New Real-World Data Use Cases
A future use case of real-world health data involves advancements in genomics testing. Radiographic features can determine the genomics of a tumor in oncology. Clinicians can use this insight in real time to correctly diagnose and start the patient on the appropriate treatment.
This data advancement improves precision medicine and helps deliver more meaningful insights. However, the need to maintain patient privacy remains. Using advanced methods such as deploying synthetic data could meet patient privacy needs. Genomics data doesn’t just benefit oncology. It could lead to advancements in treatments for rare diseases and other areas of health as well.
New Types of Data Available
New policies, scientific discoveries, and the advancement of healthcare technologies have increased the variety and volume of available health data.
Genomic sequencing combined with the increase of biomarker-specific drugs has led to increased genetic testing. Wearable technology and health apps collect data about users’ heart rates, steps taken, geo-locations, and more. Air quality, climate, and weather are even becoming more influential data points. For example, weather can predict the severity of an allergy season, pandemic spread, and even flu prevalence.
All of these factors have led to the rapid growth of RWD. In fact, healthcare data is growing faster than data from any other industry.

Recent Trends with RWD
The increase in health data has brought about the increased use of RWD and RWE in healthcare. Let’s take a look at some recent health data trends.
Demographics and SDOH
Government, health systems, and life science researchers are striving to understand the reasons for disparities and worse patient outcomes in vulnerable populations.
Growing Use of Genomics Data
Data science, machine learning, and artificial intelligence have empowered scientists to tackle questions once left unanswered—such as whether genetic alterations cause health conditions and whether lifestyle or demographic considerations are relevant to a disease. The ability to derive meaningful insights and patterns from genetic sequencing data and other health data is opening new paths for future progress in the field.
Growing Acceptance of Real-World Data
FDA real-world data guidance documents continue to be released, pointing to the growing acceptance of RWD for regulatory decision-making.
Disease Registries
Registries capture specific variables related to various conditions, which are then validated to a higher standard than EHR data. This makes registry data ideal for clinical trials and for preparing data for regulatory submissions.
Patient-Reported Outcomes Data
Data on patient-reported outcomes better informs the understanding of the patient experience.
Data on COVID-19 Vaccinations and Variants
Since the beginning of the pandemic, information on long COVID-19 and the lingering impacts of COVID-19 on a person’s health has been in high demand.
Rare Disease Data
Patients with rare conditions often see numerous specialists and receive specialty drugs, so the patient journey may be fragmented and data spread amongst many partners.
Linking Proprietary Data with Real-World Data
To gain a comprehensive understanding of patient health, a recent trend in healthcare is to link proprietary pharma company data (such as clinical trials, aggregated specialty drug data, and disease or device registries) with real-world data offered by commercial data providers.
Tokenization
Tokenizing every clinical trial and health economics and outcomes research study can enable expedited partner identification for multiple studies at the same time.
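The guide does not describe how tokens are produced. One common general approach is to replace identifying fields with a keyed hash, so the same person yields the same token in every dataset without the identifiers themselves being shared. The sketch below assumes that approach. The function names, input fields, and key handling are hypothetical, and this is not a description of any particular vendor's method.

```python
import hashlib
import hmac


def make_token(first_name, last_name, birth_date, secret_key):
    """Derive a stable token from normalized identifying fields using HMAC-SHA256."""
    normalized = "|".join(p.strip().lower() for p in (first_name, last_name, birth_date))
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_by_token(trial_rows, claims_rows):
    """Return, for each trial row's token, the claims rows that share it."""
    claims_by_token = {}
    for row in claims_rows:
        claims_by_token.setdefault(row["token"], []).append(row)
    return {row["token"]: claims_by_token.get(row["token"], []) for row in trial_rows}


if __name__ == "__main__":
    key = b"example-key-for-illustration-only"
    t = make_token("Jane", "Doe", "1980-01-02", key)
    trial = [{"token": t, "arm": "treatment"}]
    claims = [{"token": t, "diagnosis_code": "E11.9"}]
    print(link_by_token(trial, claims))
```

Because both datasets carry only the token, the overlap between a study population and a candidate data partner can be assessed before any identifiable data changes hands.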