
Data Science for Product Development

Author: Datavant
Publish Date: August 8, 2022

At Datavant, our goal is to connect the world’s healthcare data by building tools that let healthcare data be shared in a way that is secure, compliant, and preserves the data’s value. The data science team at Datavant is responsible for meaningful pieces of product development, and we have worked to build our org structure accordingly.

To give an example, one product the Data Science team works on is Match, our tool for reconciling patient IDs underlying disparate patient records. A critical component of the ID assignment process is the machine learning model that determines whether a pair of similar records (for example, a pair of records with the same last name and date of birth but different zip codes) belongs to the same individual. We have grappled with how to divide ownership between the development of the model and its deployment to production, and more broadly how to structure our Data Science team within our broader product development function.
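As a rough illustration of the kind of input such a pairwise model might consume, the sketch below compares two records field by field. The `PatientRecord` class, its fields, and both helper functions are hypothetical names chosen for this example; they are not Datavant's actual schema, features, or model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical, simplified record; real records carry many more fields."""
    last_name: str
    date_of_birth: str  # assumed ISO format, "YYYY-MM-DD"
    zip_code: str


def pair_features(a: PatientRecord, b: PatientRecord) -> Dict[str, int]:
    """Return 0/1 agreement indicators for each compared field."""
    return {
        # Names are compared case-insensitively, ignoring surrounding spaces.
        "same_last_name": int(
            a.last_name.strip().lower() == b.last_name.strip().lower()
        ),
        "same_dob": int(a.date_of_birth == b.date_of_birth),
        "same_zip": int(a.zip_code == b.zip_code),
    }


def is_candidate_pair(a: PatientRecord, b: PatientRecord) -> bool:
    """Flag pairs that agree on last name and date of birth.

    This is only an illustrative pre-filter. Deciding whether such a pair
    is truly the same individual is the job of the trained model.
    """
    features = pair_features(a, b)
    return features["same_last_name"] == 1 and features["same_dob"] == 1
```

For the example in the text, two records with the same last name and date of birth but different zip codes would yield agreement on the first two indicators and disagreement on the third, which is exactly the ambiguous case a trained model would need to resolve.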

What we’re solving for

We felt strongly about having a model development process that is tightly integrated with production systems, in order to avoid a slew of challenges that can otherwise arise. The most tangible hurdle is aligning the technical requirements of the model with those of the production environment, which limits velocity if not managed properly. A less visible but equally impactful risk is the potential left unrealized when production requirements are treated as black-box constraints rather than as frameworks that can flex around certain nonnegotiable principles. Moreover, a lack of coordination between model development and deployment can reverberate beyond engineering: business and product stakeholders are liable to receive differing timelines from different groups if there is no single technical owner.

Although none of these issues are insurmountable with the right coordination and culture, at Datavant we have actively sought to mitigate them through our org structure. More precisely, we aimed to optimize development efficiency, visibility within and outside of engineering, and personal growth.

Our approach

To do this, we built a custom objective function from these three variables and modeled these variables as stochastic processes with initial conditions determined by various team alignments. Just kidding. To do this, we put our data science team within our broader engineering org. This means that data scientists go through the same technical onboarding as software engineers (setting up environments, getting access to resources, etc.), which helps remove collaboration bottlenecks.

Going one step further, data scientists are part of our engineering product pod structure. Product pods are led by a product manager and consist of a mix of data scientists and software engineers, depending on the product needs (there are several pods without data scientists). Returning to the example of Datavant Match, this product is owned by Datavant’s “Identity” pod, which is responsible for solutions around patient identities. The pod contains a mix of data scientists and software engineers, but operates as a single group. For example, there is a single roadmap that incorporates both model development and deployment, and daily standups and weekly open-ended discussions include the full pod as a single team.

Tradeoffs from our approach

We have found several tangible advantages from this structural approach.

More efficient development

  • Less overhead coordinating model development and deployment.
  • Empowered and smarter decision making as a result of more context. For example, we want to avoid a data scientist feeling “we can’t do X because engineering won’t support it”.
  • Testing: there is a shared understanding of which functionality should be tested within the scope of model development, and which should be part of production testing.
  • There are no surprises when it comes to dependencies.
  • We can design an optimal process for model productionization without having to compromise functionality for the sake of a simpler framework.

Joint sequencing, prioritization, and accountability

  • We can sequence our work so that infrastructure necessary to support models is not a bottleneck. For example, we can build a Python microservice to support more sophisticated model functionality concurrently with the development of such functionality.
  • There is a single owner of prioritization, leading to more efficient planning and clearer visibility.
  • If certain areas of the roadmap are behind schedule, we can allocate resources from a broader pool since there is shared context between software engineers and data scientists.

Opportunity for growth

  • Data scientists and software engineers are exposed to and have the opportunity to take on a broader range of technical work.

There will be tradeoffs to any organizational structure, and there are drawbacks with our model, including:

Agile development for data science

  • Biweekly sprints with tightly scoped tickets are not the most natural framework for managing some of the exploration and experimentation required for model development, both for data scientists and stakeholders. We have sought to address this with clear designations and documentation around the type of work that each ticket entails.

Data science identity

  • With our horizontal team structure, there is less natural interaction within the data science team. We have mitigated this with weekly, cross-pod check-ins for the data science team, in addition to biweekly meetings for team members to present on an aspect of what they’re working on, or recent developments in data science.

Redundant DS tooling

  • Since data scientists are building tools in support of a particular product area, we are more likely to build capabilities with overlapping functionality rather than a single, cleaner data science toolbox built for multiple use cases.

These downsides would be more naturally mitigated if our Data Science team were structured as an atomic unit that played a consultative role to other functions in the organization rather than our horizontal model of team members embedded within these functions.

We can represent some of these tradeoffs graphically:

Broader context

The responsibilities of data science teams can vary widely across companies, ranging from forecasting revenue, to building dashboards, to policy research, and we certainly wouldn’t expect our org structure to work in every situation.

The modularity of our product pods has also enabled this embedded model to succeed. Each engineering product pod has end-to-end ownership of the product, including implementation of company-level compliance and security measures (erring on the side of ownership is consistent with our broader cultural value of more responsibility, fewer rules). Some of the utility of our embedded model would be diminished if these constraints were managed entirely by a separate team.

As our engineering team scales from 70 to several hundred in the coming years, we will need to continue to evaluate our approach so the data science team can continue to meet the needs of the business.

Spotlight on AnalyticsIQ: Privacy Leadership in State De-Identification

AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its SDOH datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).

AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.

“Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.

“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance.”

Building Trust in Privacy-Preserving Data Ecosystems

As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.

As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences organizations can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.

The Power of SDOH Data with Providers and Payers to Close Gaps in Care

Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.

Payers Deploy Targeted Care Using SDOH Data

Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:

  • Tailored Member Programs: Payers develop specialized initiatives like nutrition delivery services and transportation to and from medical appointments.
  • Identifying Care Gaps: SDOH data helps payers identify gaps in care for underserved communities, enabling strategic in-home assessments and interventions.
  • Future Risk Adjustment Models: The Centers for Medicare & Medicaid Services (CMS) plans to incorporate SDOH-related Z codes into risk adjustment models, recognizing the significance of SDOH data in assessing healthcare needs.

Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.

Example: CDPHP supports physical and mental wellbeing with non-medical assistance

Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH data and partnered with Papa to combat loneliness and isolation in older adults, families, and other vulnerable populations. CDPHP aimed to address:

  • Social isolation
  • Loneliness
  • Transportation barriers
  • Gaps in care

By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.

Providers Optimize Value-Based Care Using SDOH Data

Value-based care organizations face challenges in fully understanding their patient panels. SDOH data helps providers address these challenges and improve patient care. Here are some examples of how:

  • Onboard Patients Into Care Programs: Providers use SDOH data to identify patients who require additional support and connect them with appropriate resources.
  • Stratify Patients by Risk: SDOH data combined with clinical information identifies high-risk patients, enabling targeted interventions and resource allocation.
  • Manage Transition of Care: SDOH data informs post-discharge plans, considering social factors to support smoother transitions and reduce readmissions.

By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.

While accessing SDOH data offers significant advantages, challenges can arise from:

  • Lack of Interoperability and Uniformity: Data exists in fragmented sources like electronic health records (EHRs), public health databases, social service systems, and proprietary databases. Integrating and securing data while ensuring data integrity and confidentiality can be complex, resource-intensive, and risky.
  • Lag in Payer Claims Data: Payers can take weeks or months to release claims data. This delays informed decision-making, care improvement, analysis, and performance evaluation.
  • Incomplete Data Sets in Health Information Exchanges (HIEs): Not all healthcare providers or organizations participate in HIEs. This reduces the available data pool. Moreover, varying data sharing policies result in data gaps or inconsistencies.

To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.

SDOH data holds immense potential for transforming healthcare and addressing health disparities.

With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.
