In our Ecosystem Explorer Series, we interview leaders from partner organizations who are improving access to real-world data. Today’s interview is with Jason LaBonte, CEO at Veritas Data Research.
Jason LaBonte is the chief executive officer of Veritas Data Research, where he is responsible for the overall management of the company and its operations. He is an executive with over 15 years of experience in leading healthcare information and technology companies, most recently as chief strategy officer at Datavant. Jason received his Ph.D. in virology from Harvard University, and his A.B. in molecular biology from Princeton University.
Founded by experts in the data analytics industry, Veritas Data Research uses cutting-edge technology and efficient workflow design to collect, curate, and distribute foundational reference datasets. Veritas makes critical information accessible to data and analytics teams across the healthcare vertical, as well as customers in the financial and insurance sectors.
Jason, welcome to the Ecosystem Explorer interview series! To start off, can you give us a quick overview of what mortality data is and why it’s important to researchers?
Mortality is a critical endpoint in health analytics: whether a patient survives their disease (or procedure) or succumbs to it should be one of the basic measures of treatment efficacy, public health policy, and protocol design. Unfortunately, unless a patient dies in a healthcare facility, this event is not well captured in the clinical datasets normally used in real-world data analytics, such as insurance claims or electronic health records.
Therefore, to determine the vital status of the patients in a study cohort, it is necessary to augment clinical real-world data (RWD) with a mortality dataset like Veritas’s Fact of Death offering. Very simply, this dataset has a record for every deceased individual in the United States that we can find, going back to 1935. For each record, we report who died (where each person is represented as a Datavant token set when deidentified), when they died, and where they died (at the zip code level).
By linking mortality data to clinical RWD, researchers can then determine which patients are alive or deceased, allowing them to build more accurate survival curves, measure mortality as an endpoint in health economics and outcomes research (HEOR) studies and in pragmatic trials, and better build synthetic control arms for use in interventional clinical trials.
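To make that linkage concrete, here is a minimal sketch, in Python, of how a deidentified cohort might be joined to a Fact of Death-style file on a shared token and used to build a survival curve. The column names, token values, study cutoff, and the use of the lifelines library are illustrative assumptions, not Veritas’s or Datavant’s actual schema or tooling.

```python
# Minimal sketch: link a deidentified cohort to a mortality file and fit a
# survival curve. All field names and values are hypothetical.
import pandas as pd
from lifelines import KaplanMeierFitter

# Hypothetical deidentified study cohort: one row per patient.
cohort = pd.DataFrame({
    "token": ["tkA", "tkB", "tkC"],
    "index_date": pd.to_datetime(["2018-03-01", "2019-07-15", "2020-01-10"]),
})

# Hypothetical mortality file: one row per deceased individual.
mortality = pd.DataFrame({
    "token": ["tkB"],
    "death_date": pd.to_datetime(["2021-02-20"]),
    "death_zip": ["12203"],
})

# Link on the shared token and flag which patients are deceased.
linked = cohort.merge(mortality, on="token", how="left")
linked["deceased"] = linked["death_date"].notna()

# Follow-up time: death date for deceased patients, study cutoff otherwise.
cutoff = pd.Timestamp("2023-12-31")
end = linked["death_date"].fillna(cutoff)
linked["days"] = (end - linked["index_date"]).dt.days

# Kaplan-Meier survival curve with deaths as events and the rest censored.
kmf = KaplanMeierFitter()
kmf.fit(durations=linked["days"], event_observed=linked["deceased"])
print(kmf.survival_function_.head())
```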
And how is Veritas involved with mortality data? Tell us a little bit about your company and its mission.
Veritas was founded to make critical reference datasets much more accessible and, in so doing, to increase the utility of all clinical data. We believe that a dedicated focus on creating these datasets will result in higher quality, higher coverage data than what is often available as the “exhaust” from systems designed for other purposes. We also believe that vulnerable populations are often under-represented in the datasets used today, and part of our mission is to fill that data gap as well.
We started with mortality data because it is a vital endpoint that we felt was just too difficult to access. The data available to analysts had low coverage, came with a number of restrictions, and lacked timeliness. Through Datavant, analysts could at least aggregate multiple mortality datasets to create something with coverage that was good enough to use, but they still had to do a lot of de-duplication and data cleaning.
At Veritas, we thought we could do a lot better with a focused effort. By sourcing, collating, and indexing mortality data from over 40,000 public, private, and government sources, Veritas has now built the most complete and timely mortality dataset on the market. And all of those records are delivered in a single dataset, so the user doesn’t need to do any aggregation or de-duplication work.
Intuitively, it feels like it should be very easy in this day and age to find out whether someone has died. Why is this so hard?
Every death is recorded by the states on a death certificate, and those records are all aggregated by the CDC, so the data is out there. However, these government sources don’t allow access to individual-level records for commercial use cases. Even for research applications, these data sources are very hard to access, sometimes taking years to obtain. Unfortunately, even governmental agencies struggle to access this data for their work.
You mentioned the CDC aggregates mortality data — that’s a reference to the National Death Index (NDI), a centralized database of death record information compiled from state vital statistics offices. Could you talk more about the NDI’s constraints that would drive organizations to acquire other sources of mortality data?
Most use cases that are of interest to pharmaceutical companies, payers, and even providers are not allowed under the NDI’s charter. Of those that are allowed, we’ve been told the CDC prefers that the NDI data not leave its systems, often requiring that a researcher’s data be sent to the CDC for linkage to the NDI and analysis. With these constraints, most folks need to acquire mortality data outside of the NDI.
Let’s talk about the other sources of mortality data beyond the CDC. What are those, and are there any challenges associated with collecting and managing large volumes of mortality data?
Mortality data is available in a number of public places, from obituaries to cemetery listings. However, these sources are numerous and fragmented, meaning it is a large effort to scour them all. Veritas, for example, examines ~40,000 sources across the United States to find mortality events.
Timely data is critical for many of the use cases we serve, so we need to find mortality events as fast as we can. That means our collection processes refresh the dataset every week, which required building a lot of automation into our data processing workflow.
And the data our system collects is raw and unstructured, so we have built an entire data extraction, cleaning, and standardization workflow that takes the mortality information we find and turns it into an analytics-ready dataset.
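As a rough illustration of what that extraction step involves, the sketch below pulls a name, date of death, and location out of an obituary-style sentence. The patterns, field names, and example text are assumptions for illustration; Veritas’s actual extraction workflow is far more extensive and is not described here.

```python
# Illustrative only: turn an unstructured obituary-style notice into a
# structured record. Real sources vary widely in format.
import re
from dateutil import parser as dateparser

def extract_record(text: str) -> dict:
    """Pull a name, date of death, and location out of free text."""
    # e.g. "Jane Q. Doe, 84, of Albany, NY, died on March 3, 2024."
    name_match = re.search(r"^([A-Z][\w.'-]*(?: [A-Z][\w.'-]*)+),", text)
    date_match = re.search(r"(?:died|passed away) on ([A-Za-z]+ \d{1,2}, \d{4})", text)
    loc_match = re.search(r"of ([A-Za-z .'-]+, [A-Z]{2})", text)
    return {
        "name": name_match.group(1) if name_match else None,
        "death_date": dateparser.parse(date_match.group(1)).date() if date_match else None,
        "location": loc_match.group(1) if loc_match else None,
    }

print(extract_record("Jane Q. Doe, 84, of Albany, NY, died on March 3, 2024."))
```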
The curation process — turning raw data into structured, usable data — must be quite challenging, especially if you’re pulling from tens of thousands of sources. How do you approach curation to make mortality data useful for health organizations?
During our data curation process, we try to remove a lot of the work that researchers and our other customers would typically need to do. For instance, we work to standardize the data as much as we can using reference datasets. We remove special characters from first and last names, and then validate names against a names database to make sure only records with a real name are included in our deliverable. We validate locations against the USPS reference database and report the standardized USPS values for city, state, and ZIP code.
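The snippet below is a simplified sketch of that kind of standardization and validation. The tiny in-line reference tables stand in for the full names and USPS reference datasets mentioned above, and the field names are assumptions.

```python
# Simplified standardization sketch: clean names, validate against reference
# data, and report USPS-standard location values. Reference tables are toy
# stand-ins.
import re

VALID_FIRST_NAMES = {"JANE", "JOHN", "MARIA"}   # stand-in names reference
USPS_ZIP_LOOKUP = {"12203": ("ALBANY", "NY")}   # stand-in USPS reference

def clean_name(raw: str) -> str:
    """Uppercase and strip special characters from a name field."""
    return re.sub(r"[^A-Z' -]", "", raw.upper()).strip()

def standardize(record: dict) -> dict | None:
    """Return a standardized record, or None if it fails validation."""
    first = clean_name(record["first_name"])
    last = clean_name(record["last_name"])
    if first not in VALID_FIRST_NAMES:
        return None                              # drop records without a real name
    zip_code = record["zip"]
    if zip_code not in USPS_ZIP_LOOKUP:
        return None                              # drop records with an unknown ZIP
    city, state = USPS_ZIP_LOOKUP[zip_code]      # report the USPS-standard values
    return {"first_name": first, "last_name": last,
            "city": city, "state": state, "zip": zip_code}

print(standardize({"first_name": "Ja*ne", "last_name": "Doe!", "zip": "12203"}))
```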
And because we source data from so many different places, we will generally find a mortality record for the same person in multiple places. We have algorithms in place to de-duplicate those records, consolidating them into a single mortality record. Where we can, we use the multiple sources to fill gaps and create the most complete mortality record possible, generating a confidence score for each record in the process.
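Here is an illustrative sketch of the de-duplication and consolidation idea: group candidate records that appear to refer to the same person, merge their fields to fill gaps, and score confidence by how many independent sources agree. The match key and scoring rule are toy assumptions, not Veritas’s actual algorithms.

```python
# Toy de-duplication sketch: group records by a match key, merge fields,
# and compute a simple source-count confidence score.
from collections import defaultdict

records = [
    {"first": "JANE", "last": "DOE", "dob": "1939-05-02",
     "death_date": "2024-03-03", "zip": None, "source": "obituary"},
    {"first": "JANE", "last": "DOE", "dob": "1939-05-02",
     "death_date": "2024-03-03", "zip": "12203", "source": "funeral_home"},
]

def consolidate(matches: list[dict]) -> dict:
    """Merge duplicate records, filling gaps from any source that has the value."""
    merged = {}
    for rec in matches:
        for field, value in rec.items():
            if field != "source" and value is not None and field not in merged:
                merged[field] = value
    merged["n_sources"] = len({r["source"] for r in matches})
    merged["confidence"] = min(1.0, 0.5 + 0.25 * merged["n_sources"])  # toy scoring rule
    return merged

# Group on a simple match key (name + date of birth), then consolidate each group.
groups = defaultdict(list)
for rec in records:
    groups[(rec["first"], rec["last"], rec["dob"])].append(rec)

deduped = [consolidate(matches) for matches in groups.values()]
print(deduped)
```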
Can you share some of Veritas’s data sources?
Some of Veritas’s sources are online obituary announcements, funeral home notices, military & veterans cemetery listings, and the Social Security Administration’s Limited Access Death Master File (LADMF). We are continuously sourcing and adding incremental mortality data, and have been increasing our coverage rates every month.
It sounds like most of your data sources are open-source. Does mortality data have any unique challenges with data privacy and security?
Mortality data gathered from public sources is not considered protected health information (PHI), nor is it subject to consumer data regulations like GDPR or the California Consumer Privacy Act (CCPA). Instead, this form of mortality data would be categorized as personally identifiable information (PII). That said, our health customers in particular often need to link our mortality data with PHI, so we are well-versed in the process of deidentification and token-based linkage. In partnership with Datavant, our mortality data can be joined with any customer’s health data in a privacy-preserving manner.
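Datavant’s tokenization technology is proprietary, so the following is only a conceptual sketch of what token-based, privacy-preserving linkage looks like in general: each party derives the same opaque token from normalized identifiers using shared key material, and records are then joined on tokens rather than on raw PII. The key, fields, and normalization here are hypothetical.

```python
# Conceptual sketch of token-based linkage (not Datavant's actual method):
# both parties derive the same keyed, non-reversible token from normalized
# identifiers, then join on the token instead of exchanging PII.
import hashlib
import hmac

SHARED_KEY = b"site-specific-secret"   # hypothetical key material

def make_token(first: str, last: str, dob: str) -> str:
    """Derive a deterministic, non-reversible token from normalized PII."""
    normalized = f"{first.strip().upper()}|{last.strip().upper()}|{dob}"
    return hmac.new(SHARED_KEY, normalized.encode(), hashlib.sha256).hexdigest()

# The mortality vendor and the health data holder each tokenize locally ...
mortality_token = make_token("Jane", "Doe", "1939-05-02")
clinical_token = make_token("JANE ", "doe", "1939-05-02")

# ... and records can then be linked without either side sharing raw PII.
print(mortality_token == clinical_token)   # True: same person, same token
```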
We take data security very seriously. Our predominant workflow is that our data is delivered to the customer, who uses it within their environment. We support whatever method of file transfer they prefer, whether that is Secure File Transfer Protocol (SFTP) or data sharing within cloud providers like Snowflake or Databricks.
We’ve talked about the opportunities and challenges of mortality data. Now let’s look to the future. How do you believe greater access to mortality data will improve healthcare?
Having access to mortality data will allow researchers to better document long-term survival statistics for clinical and longitudinal research. They will be able to more accurately measure the efficacy of new drugs or treatment protocols in real-world settings. They will be able to better model and identify high-risk patient populations and intervene earlier with preventative care. And because our mortality dataset has better representation of vulnerable populations, these analyses will be more accurate for traditionally underrepresented groups.
What do you see as the most exciting opportunities for researchers and organizations working with mortality data in the coming years?
We are excited to extend our mortality data to cover the cause of death, and potentially the social factors associated with death. With the addition of cause of death, researchers will be able to tease apart death events that are related to the condition they are studying, and those that should not be included (e.g. removing patients who die in a car accident from a cancer survival curve). With social data, researchers can augment their analyses of mortality outcomes with the non-clinical factors that should be part of a risk or outcomes assessment — what could be considered the “social determinants of death”.
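As a simple illustration of how a future cause-of-death field might be used, the sketch below keeps only deaths whose cause code falls in the ICD-10 neoplasm chapter when building a cancer survival cohort, treating the rest as censored rather than as events. The column names and example codes are assumptions.

```python
# Illustrative filter: exclude deaths unrelated to the studied condition
# (e.g. a traffic accident) from a cancer-specific survival analysis.
import pandas as pd

deaths = pd.DataFrame({
    "token": ["tkA", "tkB", "tkC"],
    "cause_code": ["C50.9",   # malignant neoplasm of breast
                   "V43.5",   # car occupant injured in a traffic collision
                   "C34.9"],  # malignant neoplasm of lung
})

# For a cancer survival curve, keep only deaths whose cause falls in the
# neoplasm chapter (codes beginning with C); treat the rest as censored.
cancer_related = deaths[deaths["cause_code"].str.startswith("C")]
print(cancer_related)
```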
Are there any other innovations in this space that you are particularly excited about?
We are excited by the innovations surrounding the use of RWD in clinical trial settings, including pragmatic (RWD-only) studies, building synthetic control arms for interventional trials, and long-term monitoring of trial patients. We think mortality data should be a key component of each of these efforts, and we’ve worked hard to build our data with maximum transparency and traceability to comply with FDA’s emerging real-world evidence (RWE) guidance around data provenance.
Jason, thanks very much for the interview! Final question: If our readers want to learn more about mortality data, do you have any recommended resources or links?
Absolutely! For research studies using mortality data, check out the COVID-19 Research Database. Additionally, here is a comprehensive overview of the Veritas Fact of Death Index.
For Datavant customers who want to learn more about Veritas’s mortality data, our data is tokenized and available for exploration on the Datavant Portal. Interested organizations can conduct an overlap analysis with our data whenever they would like.
For more detailed questions, you can reach out to us directly at Sales@veritasdataresearch.com.
This interview is part of our Ecosystem Explorer Series, in which we interview leaders from partner organizations who are improving access to health data. Contact us if you’re interested in participating in this series.
AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its social determinants of health (SDOH) datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).
AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.
“Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.
“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance.”
As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.
As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences organizations can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.
Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.
Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery. This focus underscores their commitment to delivering targeted care and addressing disparities for vulnerable populations.
Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH data, partnering with Papa, to combat loneliness and isolation among older adults, families, and other vulnerable populations. By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.
Value-based care organizations face challenges in fully understanding their patient panels. SDOH data helps providers address these challenges and improve patient care.
By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.
While accessing SDOH data offers significant advantages, challenges can arise from fragmented sources, inconsistent data standards, and delays in obtaining up-to-date information. To overcome these challenges, providers need robust data integration strategies, standardization efforts, and access to health data ecosystems that ensure comprehensive and timely access to SDOH data.
With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.
Explore how Datavant can be your health data logistics partner.