
From Glass Slides to Pixels: The Power of Digital Pathology Data

Datavant

November 16, 2023


Ross Cantor, MT(ASCP), Vice President, Data Strategy & Partnerships at Proscia

In our Ecosystem Explorer Series, we interview leaders from partner organizations who are improving access to real-world data. Today’s interview is with Ross Cantor, MT(ASCP), Vice President, Data Strategy & Partnerships at Proscia.

At Proscia, Ross draws on his extensive clinical and data expertise to build collaborations with leading laboratories and biopharma organizations, expanding access to high-quality real-world data and helping drive scientific breakthroughs. Over his 18-year career in the healthcare industry, Ross has held pivotal roles, beginning at the Hospital of the University of Pennsylvania, contributing significantly at Genzyme Diagnostics, and most recently serving as VP of Strategy & Business Development at Lifepoint Informatics. Throughout, he has prioritized digital health and promoted interoperability across healthcare systems, provider networks, and clinical platforms to drive innovation in the field.

Proscia is accelerating pathology’s transformation to digital and using data to reshape our understanding of diseases like cancer. Its Concentriq enterprise pathology platform and powerful AI applications are unlocking new insights that accelerate R&D, better inform treatment decisions, and advance the quest for precision medicine. Fourteen of the top 20 pharmaceutical companies and leading diagnostic laboratories rely on Proscia’s software every day.

Introduction to pathology data and why it’s important

Ross, welcome to the Ecosystem Explorer interview series! To start off, can you give us an overview of what pathology data is, how it is used, and why it is important to healthcare researchers?

Certainly. Keeping in mind that pathology is the study of disease, pathology data has long played a vital role in the diagnosis of cancer and other diseases, informing up to 70% of clinical decisions. With the rise of digitization and precision medicine, pathology data also increasingly serves as a bridge between clinical practice and drug discovery.

Pathology data has traditionally consisted of tissue biopsy samples mounted on glass slides, which pathologists (trained medical doctors) examine to make a diagnosis. We also consider pathology data to include the unstructured reports in which pathologists interpret and characterize tissue to inform the clinicians creating treatment plans.

Pathology’s shift to digital has generated a new real-world data asset: whole slide images. These high-resolution images of tissue biopsies are emerging as an incredibly rich resource for understanding a patient’s disease. Each of the 1 billion whole slide images created every year is made up of over 1 billion pixels (~1GB+) and contains far more information than the eye can see.
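To put the "over 1 billion pixels (~1GB+)" figure in perspective, here is a back-of-the-envelope sketch of the raw pixel payload of one such image. It assumes 8-bit RGB color (3 bytes per pixel) and a hypothetical 40,000 × 25,000-pixel slide; actual dimensions, color depth, and compression vary by scanner and file format, so the numbers are illustrative only.

```python
# Back-of-the-envelope size of a whole slide image (illustrative assumptions only).
BYTES_PER_PIXEL = 3  # assumption: 8-bit RGB, one byte per color channel


def uncompressed_bytes(width_px: int, height_px: int) -> int:
    """Raw pixel payload of a single full-resolution image, in bytes."""
    return width_px * height_px * BYTES_PER_PIXEL


def implied_compression_ratio(raw_bytes: int, stored_bytes: int) -> float:
    """How many times smaller the stored file is than the raw pixel data."""
    return raw_bytes / stored_bytes


if __name__ == "__main__":
    # Hypothetical slide dimensions: 40,000 x 25,000 = 1 billion pixels.
    width, height = 40_000, 25_000
    raw = uncompressed_bytes(width, height)
    print(f"{width * height:,} pixels -> {raw / 1e9:.1f} GB uncompressed")
    # If the stored file is ~1 GB, the file is ~3x smaller than the raw pixels.
    print(f"implied compression: {implied_compression_ratio(raw, 1_000_000_000):.1f}x")
```

Under these assumptions the raw pixels come to about 3 GB, so a file of roughly 1 GB implies about 3× compression.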

The central role that pathology data plays in understanding a patient’s condition, as well as the wealth of information that whole slide images include, are what make it so valuable to R&D teams today.

What inspired the founding of Proscia, and how has the company’s focus evolved since its inception?

At Proscia, we believe that pathology deserves great technology. Pathologists are on the front lines of fighting some of humanity’s biggest challenges, like cancer. Despite the impact that software has had on almost every healthcare domain — and the world more generally — it had not begun making its mark on pathology until about five years ago. The practice had remained largely unchanged in its 150-year history, still depending on the microscope and glass slides.

Proscia was founded to deliver that great technology and to use the data it generates to reshape the way we understand disease. We began by driving pathology’s digital transformation across the life sciences. As adoption took off on the clinical side, in part due to the pandemic, we also started growing our base of diagnostic laboratories. Fourteen of the top 20 pharmaceutical companies and leading diagnostic laboratories now rely on our Concentriq enterprise pathology platform to conduct routine operations.

But this is just part of our story. In working to reshape our understanding of disease, we are helping pathology to play an even bigger role in the data-driven precision medicine paradigm. We have emerged as more than a software provider and work closely with our customers to elevate the role of pathology in the 21st century. This is where our focus on real-world data and our partnership with Datavant become especially exciting.

How do you source pathology data?

We work with a network of leading academic medical centers and commercial laboratories at the forefront of pathology’s digital transformation. With them, we are assembling one of the largest collections of diverse, labeled whole slide images of tissue biopsies from solid tumors and rare diseases, including images from over 2 million patients and counting.

These whole slide images are paired with corresponding multi-modal data, including pathology reports, biomarker results, next-generation sequencing (NGS) and molecular testing, and other clinically relevant laboratory data captured as part of the standard of care.

Let’s talk about how pathology data fits with other types of clinical and real-world data. As you know, Datavant enables organizations to link disparate datasets at the patient level. We often see organizations linking their first-party data to third-party claims and EHR data, for example, to fill in gaps in patient records and better understand disease progression. Where does pathology data fit in, and in what circumstances would researchers get more value by linking their data with whole slide images and pathology data?

Whole slide images are among the best representatives of disease. They capture the cellular and tissue-level details and patterns that determine a diagnosis. The power of this data is amplified when it is incorporated as part of a multi-modal approach.

The impact of such an approach is especially clear when it comes to identifying new precision therapeutics. Consider the drug pembrolizumab, commonly known as Keytruda. A landmark 2016 study found that patients with non-small cell lung cancer (NSCLC) whose tumors expressed PD-L1 on at least 50% of tumor cells showed longer survival when treated with pembrolizumab than with platinum-based chemotherapy. Pathology data factored heavily into this assessment of PD-L1 expression, and pembrolizumab is now the preferred treatment for these patients.
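PD-L1 expression in this setting is commonly reported as a tumor proportion score (TPS): the percentage of viable tumor cells showing membrane PD-L1 staining. As an illustration of the arithmetic only, the sketch below computes a TPS from hypothetical cell counts and applies a 50% cutoff. In practice, scoring follows the validated assay's interpretation guidelines, whether a pathologist reads the slide or an algorithm assists.

```python
def tumor_proportion_score(pdl1_positive_tumor_cells: int, viable_tumor_cells: int) -> float:
    """Percentage of viable tumor cells with PD-L1 membrane staining."""
    if viable_tumor_cells <= 0:
        raise ValueError("need at least one viable tumor cell to score")
    if not 0 <= pdl1_positive_tumor_cells <= viable_tumor_cells:
        raise ValueError("positive count must be between 0 and the viable tumor cell count")
    return 100.0 * pdl1_positive_tumor_cells / viable_tumor_cells


def is_high_expression(tps: float, cutoff: float = 50.0) -> bool:
    """True when the TPS meets the high-expression cutoff (>= 50% by default)."""
    return tps >= cutoff


# Hypothetical counts from a single tumor region.
tps = tumor_proportion_score(pdl1_positive_tumor_cells=620, viable_tumor_cells=1000)
print(f"TPS = {tps:.0f}%, high expression: {is_high_expression(tps)}")
```

The counting itself is the hard part: deciding which cells are viable tumor cells and which are stained is exactly where pathologists, and increasingly image-analysis algorithms, do the work.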

The development of precision therapeutics is just one of the many applications of pathology data in a multi-modal context. Scientists can now generate a complete longitudinal record to assist with post-market surveillance. Other use cases include novel biomarker identification and validation, patient stratification for clinical trials, and indication expansion.

Those are exciting applications, and it’s great to hear pathology data is already having an impact on clinical R&D and patient care. Are there any other success stories you’d like to share that highlight the impact of connecting pathology data to real-world data?

A top pharmaceutical company is using multi-modal data that we curated, including whole slide images linked to other clinically relevant laboratory data, to develop an AI application. The application aims to enable scientists to predict a known biomarker for lung cancer by identifying patterns, or signals, within the pixels of the whole slide images.

This biomarker is critical to ensuring effective, targeted treatment for patients; however, it is currently identified only through expensive, time-consuming molecular tests. As a result, only some patients have access to this testing. Waiting for results can also delay the start of treatment, which can have an especially large effect on outcomes for patients with advanced disease.

As a long-term goal, the pharmaceutical company hopes to pave the way for the development of more personalized treatments by identifying the biomarker and then analyzing the patient’s response to therapy.
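The internals of that application are not described here. A common general pattern for predicting a slide-level biomarker from a whole slide image is to split the slide into tiles, score each tile with a trained model, and pool the tile scores into one slide-level prediction. The sketch below shows only the tiling and pooling steps; `toy_score_tile` is a hypothetical stand-in for a trained model, and mean pooling is just one of several aggregation choices (attention-based pooling is another).

```python
from typing import Callable, Iterator, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in full-resolution pixel coordinates


def tile_grid(slide_width: int, slide_height: int, tile_size: int = 512) -> Iterator[Tile]:
    """Yield non-overlapping tiles covering the slide; edge tiles are clipped."""
    for y in range(0, slide_height, tile_size):
        for x in range(0, slide_width, tile_size):
            yield (x, y, min(tile_size, slide_width - x), min(tile_size, slide_height - y))


def predict_slide(slide_width: int, slide_height: int,
                  score_tile: Callable[[Tile], float], tile_size: int = 512) -> float:
    """Mean-pool per-tile biomarker probabilities into one slide-level score."""
    scores = [score_tile(t) for t in tile_grid(slide_width, slide_height, tile_size)]
    if not scores:
        raise ValueError("slide has no tiles")
    return sum(scores) / len(scores)


# Hypothetical "model": pretends tiles in the left half of the slide look biomarker-positive.
def toy_score_tile(tile: Tile) -> float:
    x, _, _, _ = tile
    return 0.9 if x < 1024 else 0.1


print(predict_slide(2048, 1024, toy_score_tile))
```

A real pipeline would also read pixels from the slide file, skip background tiles with no tissue, and usually work at a reduced magnification level to keep the tile count manageable.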

The constraints and challenges with pathology data

Let’s discuss some of the challenges associated with pathology data. Are there difficulties with collecting and managing large volumes of pathology data? You mentioned that whole slide images are quite large.

When it comes to collecting data, pathology reports consist of unstructured text, and the corresponding whole slide images are not organized in a way that makes them easy to search. (This becomes especially challenging for AI development and training.) Whole slide images also come in many different file formats and live in many systems. To add to the complexity, diagnostic laboratories keep much of the clinically relevant data that corresponds to whole slide images in a separate laboratory information system (LIS), which is often siloed as well.
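One way to picture the silo problem: image files and laboratory information system records often share only an identifier such as a case accession number, and that identifier may have to be recovered from file names or metadata before the two can be joined. The sketch below assumes a hypothetical file-naming scheme and record layout (no specific scanner or LIS is implied) and reports which images could not be matched.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical naming convention: "<accession>_<block>_<stain>.<ext>", e.g. "S23-01234_A1_HE.svs".
ACCESSION_RE = re.compile(r"^(?P<accession>[A-Z]\d{2}-\d{5})_")


def accession_from_filename(filename: str) -> Optional[str]:
    """Pull the case accession number out of an image file name, if present."""
    match = ACCESSION_RE.match(filename)
    return match.group("accession") if match else None


def link_images_to_lis(filenames: List[str],
                       lis_records: Dict[str, dict]) -> Tuple[Dict[str, dict], List[str]]:
    """Join image files to LIS records by accession; return (linked, unmatched)."""
    linked: Dict[str, dict] = {}
    unmatched: List[str] = []
    for name in filenames:
        accession = accession_from_filename(name)
        if accession is not None and accession in lis_records:
            linked[name] = lis_records[accession]
        else:
            unmatched.append(name)  # needs manual review or metadata lookup
    return linked, unmatched


# Hypothetical example data, with files in two different slide formats.
lis = {"S23-01234": {"diagnosis": "adenocarcinoma", "specimen": "lung, right upper lobe"}}
files = ["S23-01234_A1_HE.svs", "S23-01234_A1_PDL1.ndpi", "scan_0042.tiff"]
linked, unmatched = link_images_to_lis(files, lis)
print(len(linked), unmatched)  # 2 ['scan_0042.tiff']
```

The unmatched list is the interesting output: in real archives it is often the largest source of manual curation work.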

While file format compatibility can make managing data in one central system difficult, the biggest issue with data management is almost always the sheer size of whole slide images. They are massive — up to 1GB each — which is 2 to 10 times larger than an average radiology image. Laboratories and R&D teams are not used to working with such large files, and their systems may not scale to support them.

We built our Concentriq enterprise pathology platform to overcome many of these challenges by unifying teams, data, and applications. It serves as the system of record for whole slide images in a variety of file formats and is designed to integrate with the laboratory information system to seamlessly incorporate other pathology data in one central location. Concentriq is also incredibly scalable to account for massive volumes of data.

As a result, diagnostic laboratories can manage all of their data in a way that makes it easy to collect, while R&D teams can centralize their data in a way that makes it easy to incorporate into studies. R&D teams can also use Concentriq’s developer platform to develop their own AI models against very large datasets and deploy them into routine operations.

How do you address concerns about data privacy and security, especially given that pathology data is unstructured?

Data privacy and security are top priorities for us, and we take careful measures to address them. Our Concentriq platform is HIPAA-compliant. All PHI and PII stay in the laboratory’s environment, where the data is de-identified before it is made accessible to the research organization. Our team also carries out a rigorous validation process to ensure data accuracy.
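As a simplified illustration of de-identifying a record before it leaves the laboratory's environment, the sketch below drops direct identifiers and reduces dates to the year. The field names are hypothetical, and this is far from a complete HIPAA de-identification: the Safe Harbor method covers 18 categories of identifiers, and identifiers embedded in free-text reports need separate handling.

```python
from datetime import date
from typing import Any, Dict

# Hypothetical field names; a real implementation must cover every identifier category.
DIRECT_IDENTIFIERS = {"patient_name", "mrn", "ssn", "address", "phone", "email"}
DATE_FIELDS = {"birth_date", "collection_date"}


def deidentify(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with direct identifiers removed and dates reduced to the year."""
    clean: Dict[str, Any] = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop the identifier entirely
        if field in DATE_FIELDS and isinstance(value, date):
            clean[field + "_year"] = value.year  # keep only the year
        else:
            clean[field] = value
    return clean


# Hypothetical record.
record = {
    "patient_name": "Jane Doe",
    "mrn": "000123",
    "birth_date": date(1961, 4, 2),
    "collection_date": date(2023, 3, 14),
    "diagnosis": "adenocarcinoma",
}
print(deidentify(record))
```

Using an allow-list (copy only fields known to be safe) instead of a block-list is often the more conservative design, since a new identifying field then cannot slip through unnoticed.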

Datavant’s tokenization and de-identification process adds another layer of protection to all that we do, further supporting regulatory compliance, patient anonymity, and privacy. This is just one of the many synergies of our partnership.
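Datavant's tokenization process is proprietary and is not described here. As a generic illustration of the underlying idea, privacy-preserving record linkage, the sketch below normalizes a few identifying fields and hashes them with keyed HMAC-SHA256, so two datasets that share the key can be joined on the resulting token without exchanging names or birth dates. The choice of fields, the normalization rules, and the key are all assumptions for illustration.

```python
import hashlib
import hmac
from typing import Dict, List

SECRET_KEY = b"hypothetical-shared-key"  # assumption: real keys are managed and protected


def make_token(first: str, last: str, birth_date: str, sex: str) -> str:
    """Derive a patient token from normalized identifiers with HMAC-SHA256."""
    normalized = "|".join(part.strip().upper() for part in (first, last, birth_date, sex))
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link(dataset_a: List[Dict], dataset_b: List[Dict]) -> List[Dict]:
    """Join two tokenized datasets on the token; raw identifiers are never shared."""
    by_token = {row["token"]: row for row in dataset_b}
    return [{**a, **by_token[a["token"]]} for a in dataset_a if a["token"] in by_token]


# Each source tokenizes locally, then shares only the token plus non-identifying data.
pathology = [{"token": make_token("Jane", "Doe", "1961-04-02", "F"), "pdl1_tps": 62}]
claims = [{"token": make_token(" jane ", "DOE", "1961-04-02", "f"), "nsclc_dx": True}]
print(link(pathology, claims))
```

Normalization matters as much as hashing: without trimming and case-folding, the two spellings of the same patient above would produce different tokens and the link would be missed.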

How does Proscia process and curate pathology data to make it useful for researchers and organizations?

After data is de-identified, our team uses AI and machine learning to extract relevant details from unstructured pathology reports and other data types, following the FAIR (Findable, Accessible, Interoperable, Reusable) principles to assemble large, complex data cohorts quickly. Pathologists then further curate the data by annotating whole slide images to identify the most relevant images for a given case and to mark regions of interest. These steps enable us to make fully distilled, clean, and ready-to-use data available to scientists on Concentriq for both their R&D activities and AI model development.
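Proscia describes using AI and machine learning for this extraction step. As a much simpler stand-in that shows what "extracting relevant details" from an unstructured report means in practice, the rule-based sketch below pulls a few fields out of free text into a structured record. The report wording, field names, and patterns are hypothetical; real reports vary far more than regular expressions can reliably handle, which is one reason learned models are used instead.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for a few fields that often appear in pathology reports.
PATTERNS = {
    "diagnosis": re.compile(r"DIAGNOSIS:\s*(?P<value>[^\n.]+)", re.IGNORECASE),
    "pdl1_tps": re.compile(r"PD-L1.*?TPS\)?\s*[:=]?\s*(?P<value>\d{1,3})\s*%", re.IGNORECASE),
    "grade": re.compile(r"\bgrade\s*(?P<value>[1-3])\b", re.IGNORECASE),
}


def extract_fields(report_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when it is not mentioned."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report_text)
        fields[name] = match.group("value").strip() if match else None
    return fields


# Hypothetical report text.
report = """FINAL DIAGNOSIS: Invasive adenocarcinoma, moderately differentiated.
Histologic grade 2.
PD-L1 tumor proportion score (TPS): 60%."""
print(extract_fields(report))
```

Returning None for missing fields, rather than guessing, keeps the structured output honest and makes gaps visible when assembling cohorts.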

Future opportunities for pathology data

Now that we’ve covered the challenges, let’s look at the opportunities with pathology data. How do you think digital pathology is improving the field of medicine, particularly in cancer research and diagnostics?

On the research side, we’ve covered many of the ways that digital pathology and pathology data are advancing precision medicine. It’s also worth noting that they are making an impact on day-to-day operations. A survey of major pharmaceutical companies and contract research organizations conducted earlier this year found that 70% of respondents had already invested in digital pathology. Of those, 83% adopted it to improve collaboration, since sharing whole slide images is much more efficient than transporting glass slides. In turn, they can build networks of internal and external collaborators around the world to best carry out their studies.

On the diagnostic side, digital pathology is helping to drive meaningful efficiency gains to overcome the growing shortage of pathologists and rising cancer burden. It is also resulting in improved accuracy; AI applications are able to unlock new, clinically impactful insights to aid pathologists in making a diagnosis.

What’s perhaps most exciting is that we are seeing a flywheel effect as innovations from the life sciences increasingly make their way into diagnostic laboratories. These innovations, like PD-L1 detection algorithms, are helping to accelerate adoption among diagnostic laboratories, generating more real-world data to fuel research breakthroughs.

What do you see as the most exciting opportunities for researchers and organizations working with pathology data in the coming years?

The consumerization of AI, and specifically of advanced techniques like generative AI, is expanding the potential of pathology data as never before and lowering the barrier to entry for developing AI applications. This is driving unprecedented demand for both high-quality data and data scientists who can tap into its full potential. In fact, we’re seeing some pharmaceutical companies rapidly expand their data science teams to build generative AI solutions for a wide range of use cases, including identifying and validating new biomarkers, assessing target compounds, and predicting drug toxicity.

It goes without saying that we’re in the first inning of realizing the promise of generative AI, and the opportunities for pathology data will only become even more impactful.

Are there any other innovations in this space that you are particularly excited about?

There are so many exciting innovations that we could highlight. One of them, vision transformers, is at the center of so much AI development in pathology that it’s especially worth discussing. A vision transformer is a type of AI model that processes an image by splitting it into small patches and analyzing those patches with a transformer architecture. Meta’s Segment Anything Model (SAM), which shows promise for accelerating pathology image segmentation, is among the best-known examples and has sparked a significant wave of AI R&D. Innovations like SAM mean that it will be easier than ever to gather data for projects that require pixel-level annotations. SAM has made such an impact in part because it is also a foundation model that can be used for many downstream tasks.
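To make the patch idea concrete: a vision transformer treats each fixed-size square patch of an image as one input token. The arithmetic below assumes the common 16 × 16-pixel patch size and, for the slide, hypothetical 25,000 × 40,000-pixel dimensions matching the 1-billion-pixel figure quoted earlier. It shows why whole slide images are normally split into tiles before they reach such a model.

```python
def num_patch_tokens(height_px: int, width_px: int, patch_px: int = 16) -> int:
    """Number of non-overlapping patch tokens a ViT-style model would see.

    Assumes the image is cropped to a whole number of patches in each dimension.
    """
    return (height_px // patch_px) * (width_px // patch_px)


# A standard 224 x 224 input: 14 x 14 = 196 patch tokens.
print(num_patch_tokens(224, 224))
# A hypothetical 1-billion-pixel slide (25,000 x 40,000) at full resolution.
print(f"{num_patch_tokens(25_000, 40_000):,}")
```

The slide works out to about 3.9 million tokens versus 196 for a standard input, and because standard self-attention cost grows with the square of the token count, processing a full slide in one pass is impractical.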

Beyond SAM, other foundation models are driving the creation of new pathology-related applications. Existing vision transformers are already making it faster and easier to build new AI models based solely on whole slide images.

These vision transformers are also bringing digital pathology into the world of multi-modality. Vision-language models like CONCH are enabling a wide range of previously unavailable functionalities like searching pathology images via text and automatically generating image captions. This interaction with images through text is just the beginning; it’s not a far stretch to envision combining pathology images and text with other modalities to reshape how we interact with and leverage the relationships among data modalities.
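Text-based image search with a vision-language model generally works by embedding images and text queries into a shared vector space and ranking images by their similarity to the query. The sketch below shows only that ranking step, using cosine similarity. The tiny vectors are made-up placeholders, not outputs of CONCH or any real model, whose embeddings have hundreds of dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_embedding: Sequence[float],
           image_embeddings: Dict[str, Sequence[float]],
           top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank image IDs by similarity to a text query embedding."""
    scored = [(image_id, cosine_similarity(query_embedding, emb))
              for image_id, emb in image_embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Made-up 3-dimensional embeddings standing in for real model outputs.
images = {
    "slide_a_tile_17": [0.9, 0.1, 0.0],
    "slide_b_tile_03": [0.1, 0.9, 0.2],
    "slide_c_tile_88": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # pretend embedding of a text query such as "tumor-infiltrating lymphocytes"
for image_id, score in search(query, images):
    print(image_id, round(score, 3))
```

Caption generation runs in the other direction, from image to text, but relies on the same shared representation.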

Such a model already exists in the natural image space (e.g., ImageBind), and pathology is now following close behind. Future foundation models may similarly incorporate other data modalities and flexibly associate -omics data with histological patterns. All of these examples point to one clear trend: more and more data is becoming usable to fuel AI development, unlocking insights that bring personalized therapies to patients faster and advance precision medicine.

Ross, thank you again for the deep dive on pathology data and sharing your excitement for the advancements in this space. Do you have any recommendations for our readers if they want to learn more?

Thanks for the opportunity! Here are some helpful links:

Visit the Digital Pathology Association for helpful resources on all things digital pathology.
Learn more about Proscia’s real-world data offerings.
Watch our chief strategy officer present on “Maximizing the Value of Pathology Data in the Life Sciences: Mastering Digitization, Data Curation, and AI-Driven Innovation,” at a recent Amazon Web Services event.
Read Signify Research’s perspective on how technology vendors can advance the real-world data opportunity by improving access to pathology data.

This interview is part of our Ecosystem Explorer Series, in which we interview leaders from partner organizations who are improving access to health data. Contact us if you’re interested in participating in this series.

Spotlight on AnalyticsIQ: Privacy Leadership in State De-Identification

AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its SDOH datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).
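The details of Datavant's state de-identification analysis are not described here. One common building block in re-identification risk analysis is to count how many records share each combination of quasi-identifiers (the idea behind k-anonymity) and to flag rare combinations for generalization or suppression. The sketch below is a generic illustration with hypothetical fields, records, and threshold, not Datavant's methodology.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def equivalence_class_sizes(records: Iterable[Dict],
                            quasi_identifiers: Tuple[str, ...]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def risky_combinations(records: List[Dict], quasi_identifiers: Tuple[str, ...],
                       k: int = 5) -> List[tuple]:
    """Return combinations shared by fewer than k records (candidates for generalization)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [combo for combo, count in sizes.items() if count < k]


# Hypothetical SDOH-style records: 3-digit ZIP prefix, age band, household income band.
records = (
    [{"zip3": "100", "age_band": "40-49", "income": "50-75k"}] * 6
    + [{"zip3": "100", "age_band": "80-89", "income": ">200k"}] * 1
)
print(risky_combinations(records, ("zip3", "age_band", "income"), k=5))
```

Which fields count as quasi-identifiers, and what threshold is acceptable, are judgment calls that depend on the data and on the applicable federal and state rules.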

AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.

"Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.

“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance."

Building Trust in Privacy-Preserving Data Ecosystems

As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.

As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences organizations can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.

The Power of SDOH Data with Providers and Payers to Close Gaps in Care

Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.

Payers Deploy Targeted Care Using SDOH Data

Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:

Tailored Member Programs: Payers develop specialized initiatives like nutrition delivery services and transportation to and from medical appointments.
Identifying Care Gaps: SDOH data helps payers identify gaps in care for underserved communities, enabling strategic in-home assessments and interventions.
Future Risk Adjustment Models: The Centers for Medicare & Medicaid Services (CMS) plans to incorporate SDOH-related Z codes into risk adjustment models, recognizing the significance of SDOH data in assessing healthcare needs.
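In ICD-10-CM, the SDOH-related Z codes are generally taken to be categories Z55 through Z65 (for example, Z59 covers problems related to housing and economic circumstances). As a minimal sketch of how a payer might find members with documented social needs, the code below flags claims carrying one of these categories. The claim records are hypothetical, and how CMS will ultimately use these codes in risk adjustment is not modeled.

```python
from typing import Dict, List, Set

# ICD-10-CM categories Z55-Z65: potential health hazards related to
# socioeconomic and psychosocial circumstances.
SDOH_CATEGORIES = {f"Z{n}" for n in range(55, 66)}


def is_sdoh_code(icd10_code: str) -> bool:
    """True if the code falls in categories Z55-Z65 (dots and case are ignored)."""
    code = icd10_code.replace(".", "").upper()
    return code[:3] in SDOH_CATEGORIES


def members_with_sdoh(claims: List[Dict]) -> Dict[str, Set[str]]:
    """Map each member ID to the SDOH Z codes found on their claims."""
    found: Dict[str, Set[str]] = {}
    for claim in claims:
        for code in claim["diagnosis_codes"]:
            if is_sdoh_code(code):
                found.setdefault(claim["member_id"], set()).add(code)
    return found


# Hypothetical claims; codes are shown only as examples of each category.
claims = [
    {"member_id": "M1", "diagnosis_codes": ["E11.9", "Z59.0"]},
    {"member_id": "M2", "diagnosis_codes": ["I10"]},
    {"member_id": "M1", "diagnosis_codes": ["Z60.2"]},
]
print(members_with_sdoh(claims))
```

Because Z codes are under-reported on claims, a flag like this finds only members whose needs were documented; the absence of a code does not mean the absence of a need.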

Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.

Example: CDPHP supports physical and mental wellbeing with non-medical assistance

Capital District Physicians’ Health Plan (CDPHP) partnered with Papa to incorporate SDOH into its services and combat loneliness and isolation among older adults, families, and other vulnerable populations. CDPHP aimed to address:

Social isolation
Loneliness
Transportation barriers
Gaps in care

By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.

Providers Optimize Value-Based Care Using SDOH Data

Value-based care organizations face challenges in fully understanding their patient panels. SDOH data can significantly help providers address these challenges and improve patient care. Here are some examples:

Onboard Patients Into Care Programs: Providers use SDOH data to identify patients who require additional support and connect them with appropriate resources.
Stratify Patients by Risk: SDOH data combined with clinical information identifies high-risk patients, enabling targeted interventions and resource allocation.
Manage Transition of Care: SDOH data informs post-discharge plans, considering social factors to support smoother transitions and reduce readmissions.

By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.

While accessing SDOH data offers significant advantages, challenges can arise from:

Lack of Interoperability and Uniformity: Data exists in fragmented sources like electronic health records (EHRs), public health databases, social service systems, and proprietary databases. Integrating and securing data while ensuring data integrity and confidentiality can be complex, resource-intensive and risky.
Lag in Payer Claims Data: Payers can take weeks or months to release claims data. This delays informed decision-making, care improvement, analysis, and performance evaluation.
Incomplete Data Sets in Health Information Exchanges (HIEs): Not all healthcare providers or organizations participate in HIEs. This reduces the available data pool. Moreover, varying data sharing policies result in data gaps or inconsistencies.

To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.

SDOH data holds immense potential in transforming healthcare and addressing health disparities.

With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.