Further Evidence that COVID-19 Disproportionately Impacts African American, Hispanic, and Low-Income Populations

Publish Date: June 18, 2020
Read Time: 5 Minutes

Over the past several months, studies have indicated that COVID-19 has disproportionately impacted African American and Hispanic populations, as well as socioeconomically disadvantaged groups. A recent Sutter Health study showed that African Americans diagnosed with COVID-19 were 2.7 times more likely to be hospitalized compared to white patients [1]. Using data from the COVID Racial Tracker, an NPR study has also indicated that Hispanics are contracting COVID-19 at rates higher than would be expected given their population share, yet it is African Americans who are dying of COVID-19 at nearly twice the expected rate [2]. The racial disparities in disease and mortality outcomes are incompletely understood and warrant further study, as immediate efforts are needed to narrow this gap.

One of the hurdles to understanding these COVID-19 outcome disparities has been a lack of timely and complete data. The federal government has required that as of August 1st, diagnostic labs must report data on race and ethnicity to state and local health authorities. This requirement will increase the quality and availability of data but does not solve the immediate need for data to understand racial disparities in outcomes. In the interim, researchers can utilize already available real-world data to examine racial inequalities. Using the COVID-19 Research Database, we have confirmed previous findings posted by the NPR and Sutter Health teams. We leveraged three de-identified or limited data sets, which were connected using privacy-preserving record linkage to combine data elements necessary to understand disease impact by race and income:

  • Mortality Data, which contains curated data from government agencies, online newspapers, funeral homes, online memorials, direct submissions, and other sources that collectively cover over 80% of annual U.S. deaths.
  • Medical Claims Data, which contains patient demographics, diagnoses, procedures, service dates, and billing information (e.g., doctor, facility, and insurance carrier). The data set used contains >3 billion claims on >100 million unique patients over the last 7 years. The data set indicates whether a patient was diagnosed as COVID-19 positive by a physician. Source: Office Ally.
  • Consumer Data, which includes a variety of consumer attributes on >240 million individuals across >120 million households. The attributes are drawn from publicly available data, surveys, and custom algorithms informed by cognitive psychology. The attributes are organized in three primary categories: People (e.g., demographics, finance, credit, housing, jobs, lifestyle), Behaviors (e.g., interests, charitable giving, healthy lifestyle), and Predictors (e.g., motivator, travel, auto, in-market, and economic stats). Only demographic and income data were assessed in this study. Source: AnalyticsIQ.

Study Findings on the Joined Dataset

The key elements utilized in the study were (i) fact of death, which was pulled from the mortality data set, (ii) the physician diagnosis information from the medical claims data, and (iii) the racial and income attributes drawn from the consumer data. Overall, there were 27,037,294 patients present in both the claims and consumer datasets. We narrowed our analysis to claims in 2020, resulting in 5,731,415 patients. Of these, 10,673 individuals had a positive COVID-19 diagnosis. In later analysis, we refer to this cohort as COVID-19 cases. Among these cases, 629 individuals had both a positive COVID-19 diagnosis and a fact of death, which we subsequently refer to as COVID-19 deaths. As explained in further detail under Methodology and Limitations, although we refer to this group as COVID-19 deaths, knowing that a COVID-19 diagnosed patient died (the “fact of death”) does not conclusively tell us the cause of death.
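
To make the size of each cohort concrete, the following minimal Python sketch computes what share of each stage survives into the next. It uses only the four counts reported above; the `share` helper and the `funnel` dictionary are illustrative names, not part of the study.

```python
# Counts reported in the text above.
linked_patients = 27_037_294   # present in both claims and consumer data
patients_2020 = 5_731_415      # patients with claims in 2020
covid_cases = 10_673           # positive COVID-19 diagnosis
covid_deaths = 629             # positive diagnosis and fact of death


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


funnel = {
    "2020 patients / linked patients": share(patients_2020, linked_patients),
    "COVID-19 cases / 2020 patients": share(covid_cases, patients_2020),
    "COVID-19 deaths / COVID-19 cases": share(covid_deaths, covid_cases),
}

for label, pct in funnel.items():
    print(f"{label}: {pct:.2f}%")
```

The last ratio should not be read as a case fatality rate: fact of death does not establish cause of death, and both claims and mortality records arrive with a reporting lag.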

Diagnosis and Death Distributions by Race

In the combined dataset, we observed similar trends to those identified in the Sutter Health and NPR studies. By comparing the baseline distribution of race in our dataset to the distributions of COVID-19 cases and COVID-19 deaths, we see that African Americans and Hispanics make up a larger than expected share of both cases and deaths.
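
The comparison described here can be sketched as a ratio of shares: a group's share of a cohort (cases or deaths) divided by its share of the baseline dataset. A ratio above 1 means the group is over-represented relative to the baseline. The group labels and counts below are hypothetical placeholders chosen only to show the arithmetic; they are not figures from the study.

```python
def distribution(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw counts per group into shares that sum to 1."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def representation_ratio(
    baseline: dict[str, int], cohort: dict[str, int]
) -> dict[str, float]:
    """Share of each group in the cohort divided by its baseline share."""
    base = distribution(baseline)
    observed = distribution({g: cohort.get(g, 0) for g in baseline})
    return {g: observed[g] / base[g] for g in baseline}


# Hypothetical counts, for illustration only.
baseline_counts = {"group_a": 600, "group_b": 250, "group_c": 150}
case_counts = {"group_a": 40, "group_b": 35, "group_c": 25}

ratios = representation_ratio(baseline_counts, case_counts)
for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}x baseline share")
```

Comparing against the dataset's own baseline, rather than against U.S. population shares, matches the approach described under Limitations, since the joined data is not a representative sample.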

Diagnosis and Death Distributions by Income

The effect of income on health outcomes has been the focus of a number of studies over the years [3]. Inferring causation is challenging, but potential factors that might lead to disparate health outcomes by income include access to care and preventive medicine, improved living conditions, and access to education on healthy behaviors. With regard to COVID-19 in particular, it has been shown that low-income individuals are 10% more likely to have a serious chronic health condition, which can make COVID-19 up to 10 times as deadly [4]. Preventive measures, such as avoiding public transportation and working from home, are also far more accessible to the rich, reducing their risk of a coronavirus diagnosis [5].

We observe these same trends in our analysis. Those in the lowest annual household income quartile (<$49k) show higher than expected representation in both cases and deaths, while those in higher income brackets show lower than expected representation, particularly in deaths.

Diagnosis and Death Distributions by Race and Income

Unfortunately, in the United States, there is a strong correlation between income level and race, particularly for African American and Hispanic populations. According to the US Census Bureau, these populations have significantly lower median income levels than the overall US population [6]. After looking at the race and income data independently, we hypothesized that disparate impacts of COVID-19 would be stronger among low-income populations in vulnerable groups with social determinant factors that negatively impact health [7,8,9].

By analyzing racial groups within each income quartile, we can observe how race and income interact. Unsurprisingly, disparities by race are amplified in the lower income cohort. However, even in the higher income group, we still see higher case rates among African Americans and Hispanics, and lower case rates among Caucasians. Interestingly, we don’t observe the same trend for COVID-19 deaths. Further research is needed to determine whether these results are replicable across datasets or are a result of high variance from a small cohort of COVID-19 deaths.
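
To illustrate why a small death cohort produces noisy subgroup estimates, the sketch below computes a Wilson score interval for a proportion. This is a standard interval chosen here for illustration; the study does not state which, if any, interval method was used, and the cell sizes in the example are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Same observed share (30%) in a small and a large hypothetical cell.
small_cell = wilson_interval(6, 20)
large_cell = wilson_interval(600, 2000)

for label, (low, high) in (("n=20", small_cell), ("n=2000", large_cell)):
    print(f"{label}: {low:.3f} to {high:.3f}")
```

With 629 deaths split across four income quartiles and several racial groups, individual cells can be small enough that intervals like the first one overlap heavily, which is consistent with the caution above.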

Methodology and Limitations

Methodology

Our approach to this study had several key components:

  • Privacy-preserving linking: Data was joined with privacy-preserving linkage methods in a manner certified by a third-party expert as de-identified under HIPAA regulations. As part of that process, some elements which were present in the individual datasets were removed to maintain sufficient de-identification for the combined set. To preserve privacy, income levels were capped at $200K and age was capped at 80 (e.g., an 81 year old patient would be represented as 80).
  • Matching: When joining the data, patients were considered the same if their identifiers, which represented unique combinations of last name, first name Soundex, date of birth, and sex, were identical. Some cleaning and standardization was applied to these fields. When one of these identifiers mapped to multiple races, death records, income groups, or education levels, that identifier and its corresponding data were removed. Such cases occurred in approximately 10% of the data.
  • COVID-19 Diagnosis and Fact of Death: Any claim with a U07.1 ICD10 code was considered a positive COVID-19 diagnosis. While the combined dataset includes the fact of death, it does not always contain the cause of death. Our analysis therefore cannot conclude that a positive COVID-19 diagnosis combined with a fact of death indicates a death caused by COVID-19.
  • Modeled Consumer Attributes: The annual household income and race fields from the consumer data are partially modeled through proprietary methods based on surveys, demographic data, and other variables. This is a common practice in consumer datasets, but in this context is a limitation on the data compared to the reported demographic information that the government will begin mandating in August.
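
The matching and capping steps above can be sketched roughly as follows. This is an illustrative approximation, not the production linkage: the actual process uses privacy-preserving tokens rather than readable fields, and the source does not specify its Soundex variant or cleaning rules. Here we assume standard American Soundex and simple trim-and-uppercase cleaning; `match_key`, `drop_ambiguous`, `top_code`, and the sample records are all hypothetical.

```python
from collections import defaultdict

# Standard American Soundex digit groups.
_SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code ("" if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (encoded + "000")[:4]


def match_key(last_name: str, first_name: str, dob: str, sex: str) -> tuple:
    """Build the identifier: last name, first-name Soundex, DOB, sex."""
    # Assumed cleaning: trim whitespace and uppercase.
    return (last_name.strip().upper(), soundex(first_name), dob.strip(), sex.strip().upper())


def drop_ambiguous(records: list[dict], fields: tuple = ("race", "income_group")) -> list[dict]:
    """Remove every record whose key maps to more than one value in any field."""
    values = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for field in fields:
            values[rec["key"]][field].add(rec[field])
    ambiguous = {
        key for key, by_field in values.items()
        if any(len(seen) > 1 for seen in by_field.values())
    }
    return [rec for rec in records if rec["key"] not in ambiguous]


def top_code(value: float, cap: float) -> float:
    """Cap a value, e.g. age at 80 or income at 200,000."""
    return min(value, cap)


# Made-up records: "Robert" and "Rupert" share a Soundex code, so the first
# two rows collide on the same key while disagreeing on race.
records = [
    {"key": match_key("Smith", "Robert", "1950-01-02", "m"), "race": "A", "income_group": "Q1"},
    {"key": match_key("SMITH ", "Rupert", "1950-01-02", "M"), "race": "B", "income_group": "Q1"},
    {"key": match_key("Jones", "Ann", "1938-07-15", "F"), "race": "A", "income_group": "Q3"},
]
kept = drop_ambiguous(records)
print(len(kept), "record(s) kept;", "age 81 stored as", top_code(81, 80))
```

The Robert/Rupert collision shows why removing conflicting identifiers is a sensible safeguard: a phonetic first-name code trades some precision for tolerance of spelling variation.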

Limitations

Representativeness of Sample

The joined data is not a representative sample of the U.S. population, which is why we compare the percentage of COVID-19 diagnoses and deaths to the baseline percentages in the underlying dataset rather than to the U.S. population. Each dataset covers only a subset of the U.S. population, and coverage varies across age, sex, geography, and other variables.

To sanity check the data used, we compared certain baseline percentages in our data with data published by the CDC (comparison conducted on June 4, 2020). While we observed significant alignment, this does not imply representativeness.

Geographically, the medical claims data we used in this analysis has better coverage along the West Coast (CA, OR, WA), both by proportion of those states’ populations and proportion of our dataset. California alone makes up a fourth of our sample. Per our mortality source, the coverage in our most represented areas is approximately 75% of all deaths, and coverage may also vary across income levels.

Matching

When matching, patient information discrepancies across datasets (e.g., a date of birth in the consumer data mismatching the date of birth on a medical claim) lead to false negatives, and collisions in information, though unlikely, lead to false positives. These mismatches are not randomly distributed throughout the population. While it is difficult to verify matching accuracy due to the de-identified nature of the data, previous internal evaluations on comparable data sets suggest a precision of 95% and a recall of 90%.
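
To put the quoted precision and recall in concrete terms, the sketch below converts them into expected counts of correct, missed, and false links for a given number of true matching pairs. The figure of 1,000 true pairs is hypothetical, and `expected_link_errors` is an illustrative helper, not part of the evaluation described above.

```python
def expected_link_errors(true_pairs: float, precision: float, recall: float) -> dict[str, float]:
    """Expected link counts implied by a given precision and recall.

    recall    = true_links / (true_links + missed_links)
    precision = true_links / (true_links + false_links)
    """
    if not (0 < precision <= 1 and 0 < recall <= 1):
        raise ValueError("precision and recall must be in (0, 1]")
    true_links = true_pairs * recall
    missed_links = true_pairs - true_links
    false_links = true_links * (1 - precision) / precision
    return {
        "true_links": true_links,
        "missed_links": missed_links,
        "false_links": false_links,
    }


# Hypothetical: 1,000 people who truly appear in both datasets.
result = expected_link_errors(1000, precision=0.95, recall=0.90)
for name, count in result.items():
    print(f"{name}: {count:.1f}")
```

Under these assumptions, roughly one in ten true matches would be missed and about one in twenty reported links would be wrong. Because mismatches are not randomly distributed, these errors may fall unevenly across groups.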

Latency

Reporting latency for claims and death data also varies. Both may be received weeks after an encounter. For a particular service, it may take two months for 80% of claims to be processed and appear in the data. The mortality data, which covers over 80% of annual U.S. deaths, takes approximately a month to capture 90% of the deaths that will eventually be included.
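
As a rough illustration of what these completeness figures imply, the sketch below scales an observed count by the fraction of records expected to have arrived. No such adjustment was applied in this study; the function name and the observed counts are hypothetical, and only the 80% and 90% completeness figures come from the text above.

```python
def estimated_eventual_count(observed: int, completeness: float) -> float:
    """Scale an observed count by the fraction of records received so far."""
    if not 0 < completeness <= 1:
        raise ValueError("completeness must be in (0, 1]")
    return observed / completeness


# Hypothetical observed counts; completeness fractions are from the text.
deaths_after_one_month = estimated_eventual_count(90, 0.90)
claims_after_two_months = estimated_eventual_count(800, 0.80)

print(f"deaths eventually captured: about {deaths_after_one_month:.0f}")
print(f"claims eventually processed: about {claims_after_two_months:.0f}")
```

A simple scaling like this assumes late-arriving records resemble early ones, which may not hold if latency differs by geography, payer, or income level.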

Additional Limitations

Omission from mortality data does not imply that an individual is alive, only that the fact of death was not captured in that dataset. Similarly, because we do not consider diagnoses outside of the U07.1 ICD10 code in the medical claims, the lack of a positive COVID-19 diagnosis in the medical claims data does not imply that the individual does not have COVID-19.

Consumer data may have bias related to survey and modeling methods. Most of the consumer data we used was at least partially modeled.

Each of these factors imparts a bias into the dataset, and generalization of results would be inappropriate without replication of these findings and further analysis.

Next Steps

As next steps, we’re interested in diving more deeply into income and race conditional on location, age, sex, and comorbidities, as well as conducting more advanced survival analysis on these subgroups.

This joined dataset is available as part of the COVID-19 Research Database. We encourage researchers to submit proposals to the database to verify and expand upon our findings, as well as to conduct further investigation into both this topic and others.

Contributors

The COVID-19 Research Database is a pro-bono, public-private, cross-industry collaboration composed of institutions donating technology services, healthcare expertise, and limited and de-identified data. The database is a public-private consortium organized by Datavant, Health Care Cost Institute, Medidata, Mirador Analytics, Veradigm, Change Healthcare, Symphony Health, Snowflake and many others.

AnalyticsIQ is a leading predictive data and analytics innovator that leverages a blend of publicly available data and custom algorithms informed by cognitive psychology concepts to describe consumers across three areas – People, Behaviors, and Predictors. Headquartered in Atlanta and recently named one of Georgia’s Top 10 most innovative companies, AnalyticsIQ’s team of data analysts, scientists, and cognitive psychologists has over 100 years of collective analytical experience and expertise.

Office Ally is a full-service medical claims clearinghouse that provides numerous software products to empower both doctors and patients.

Citations

[1] https://www.healthaffairs.org/doi/full/10.1377/hlthaff.2020.00598

[2] https://www.npr.org/sections/health-shots/2020/05/30/865413079/what-do-coronavirus-racial-disparities-look-like-state-by-state

[3] https://www.npr.org/sections/health-shots/2019/06/28/736938334/the-gap-between-rich-and-poor-americans-health-is-widening, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2690209/

[4] https://www.nytimes.com/2020/03/15/world/europe/coronavirus-inequality.html

[5] https://qz.com/1818862/coronavirus-prevention-is-far-more-accessible-for-the-rich/

[6] https://www.census.gov/content/dam/Census/library/visualizations/2018/demo/p60-263/figure1.pdf

[7] French DD, Wang A, Prager AJ, Margo CE. Association of the Robert Wood Johnson Foundations’ Social Determinants of Health and Medicare Ocular Hospitalizations: A Cross Sectional Data Analysis. Ophthalmol Ther. 2019 Dec;8(4):611-622. doi: 10.1007/s40123-019-00220-1. Epub 2019 Nov 1.

[8] Wang A, Kho AN, French DD. Association of the Robert Wood Johnson Foundations’ Social Determinants of Health and Medicare Hospitalisations for Ischaemic Strokes: A Cross-Sectional Data Analysis. Open Heart. 2020 Jan 8;7(1):e001189. doi: 10.1136/openhrt-2019-001189. eCollection 2020.

[9] French DD, Behrens JJ, Jackson KL, Kho AN, Walunas TL, Evans CT, Mbagwu M, Margo CE, Bryar PJ. Payment Reform Needed to Address Health Disparities of Undiagnosed Diabetic Retinopathy in the City of Chicago. Ophthalmol Ther. 2017 Jun;6(1):123-131. doi: 10.1007/s40123-016-0072-4. Epub 2016 Nov 24.

Further Evidence that COVID-19 Disproportionately Impacts African American, Hispanic, and Low-Income Populations was originally published on Medium, where people are continuing the conversation by highlighting and responding to this story.

Spotlight on AnalyticsIQ: Privacy Leadership in State De-Identification

AnalyticsIQ, a marketing data and analytics company, recently adopted Datavant’s state de-identification process to enhance the privacy of its SDOH datasets. By undergoing this privacy analysis prior to linking its data with other datasets, AnalyticsIQ has taken an extra step that could contribute to a more efficient Expert Determination (which is required when its data is linked with others in Datavant’s ecosystem).

AnalyticsIQ’s decision to adopt state de-identification standards underscores the importance of privacy in the data ecosystem. By addressing privacy challenges head-on, AnalyticsIQ and similar partners are poised to lead clinical research forward, providing datasets that are not only compliant with privacy requirements, but also ready for seamless integration into larger datasets.

“Stakeholders across the industry are seeking swift, secure access to high-quality, privacy-compliant SDOH data to drive efficiencies and improve patient outcomes,” says Christine Lee, head of health strategy and partnerships at AnalyticsIQ.

“By collaborating with Datavant to proactively perform state de-identification and Expert Determination on our consumer dataset, we help minimize potentially time-consuming steps upfront and enable partners to leverage actionable insights when they need them most. This approach underscores our commitment to supporting healthcare innovation while upholding the highest standards of privacy and compliance.”

Building Trust in Privacy-Preserving Data Ecosystems

As the regulatory landscape continues to evolve, Datavant’s state de-identification product offers an innovative tool for privacy officers and data custodians alike. By addressing both state-specific and HIPAA requirements, companies can stay ahead of regulatory demands and build trust across data partners and end-users. For life sciences organizations, this can lead to faster, more reliable access to the datasets they need to drive research and innovation while supporting high privacy standards.

As life sciences companies increasingly rely on SDOH data to drive insights, the need for privacy-preserving solutions grows. Data ecosystems like Datavant’s, which link real-world datasets while safeguarding privacy, are critical to driving innovation in healthcare. By integrating state de-identified SDOH data, life sciences organizations can gain a more comprehensive view of patient populations, uncover social factors that impact health outcomes, and ultimately guide clinical research that improves health.

The Power of SDOH Data with Providers and Payers to Close Gaps in Care

Both payers and providers are increasingly utilizing SDOH data to enhance care delivery and improve health equity. By incorporating SDOH data into their strategies, both groups aim to deliver more personalized care, address disparities, and better understand the social factors affecting patient outcomes.

Payers Deploy Targeted Care Using SDOH Data

Payers increasingly leverage SDOH data to meet health equity requirements and enhance care delivery:

  • Tailored Member Programs: Payers develop specialized initiatives like nutrition delivery services and transportation to and from medical appointments.
  • Identifying Care Gaps: SDOH data helps payers identify gaps in care for underserved communities, enabling strategic in-home assessments and interventions.
  • Future Risk Adjustment Models: The Centers for Medicare & Medicaid Services (CMS) plans to incorporate SDOH-related Z codes into risk adjustment models, recognizing the significance of SDOH data in assessing healthcare needs.

Payers’ consideration of SDOH underscores their commitment to improving health equity, delivering targeted care, and addressing disparities for vulnerable populations.

Example: CDPHP supports physical and mental wellbeing with non-medical assistance

Capital District Physicians’ Health Plan (CDPHP) incorporated SDOH, partnering with Papa, to combat loneliness and isolation in older adults, families, and other vulnerable populations. CDPHP aimed to address:

  • Social isolation
  • Loneliness
  • Transportation barriers
  • Gaps in care

By integrating SDOH data, CDPHP enhanced its services to deliver comprehensive care for its Medicare Advantage members.

Providers Optimize Value-Based Care Using SDOH Data

Value-based care organizations face challenges in fully understanding their patient panels. SDOH data can help providers address these challenges and improve patient care. Here are some examples of how:

  • Onboard Patients Into Care Programs: Providers use SDOH data to identify patients who require additional support and connect them with appropriate resources.
  • Stratify Patients by Risk: SDOH data combined with clinical information identifies high-risk patients, enabling targeted interventions and resource allocation.
  • Manage Transition of Care: SDOH data informs post-discharge plans, considering social factors to support smoother transitions and reduce readmissions.

By leveraging SDOH data, providers gain a more comprehensive understanding of their patient population, leading to more targeted and personalized care interventions.

While accessing SDOH data offers significant advantages, challenges can arise from:

  • Lack of Interoperability and Uniformity: Data exists in fragmented sources like electronic health records (EHRs), public health databases, social service systems, and proprietary databases. Integrating and securing data while ensuring data integrity and confidentiality can be complex, resource-intensive and risky.
  • Lag in Payer Claims Data: Payers can take weeks or months to release claims data. This delays informed decision-making, care improvement, analysis, and performance evaluation.
  • Incomplete Data Sets in Health Information Exchanges (HIEs): Not all healthcare providers or organizations participate in HIEs. This reduces the available data pool. Moreover, varying data sharing policies result in data gaps or inconsistencies.

To overcome these challenges, providers must have robust data integration strategies, standardization efforts, and access to health data ecosystems to ensure comprehensive and timely access to SDOH data.

SDOH data holds immense potential in transforming healthcare and addressing health disparities. 

With Datavant, healthcare organizations are securely accessing SDOH data and further enhancing the efficiency of their datasets through state de-identification capabilities, empowering stakeholders across the industry to make data-driven decisions that drive care forward.
