More than three years into the pandemic, our understanding of the extent of COVID-19’s impact on Black and brown communities remains incomplete due to a lack of federal data on race and ethnicity.

The Centers for Disease Control and Prevention has reported over 104 million COVID-19 cases in the United States, but only 66% of those cases include race or ethnicity information. For comparison, the same dataset is missing sex information for only 2% of cases.

Demographic data like race and ethnicity tells agencies and policymakers who’s vulnerable at different stages of the pandemic and helps them decide how and where to allocate resources like vaccines and testing. Researchers have been raising the alarm for years, and many are concerned that the gaps could have serious implications in the years and even decades to come.

“Although race and ethnicity for testing and cases have become more complete since reporting began, a significant portion of both testing and case data continue to lack racial and ethnic information,” a CDC spokesperson said in a statement. 

But why is it missing in the first place? 

Testing overload

It might come down to how the data was collected. In the early months of the pandemic, the surge of patients filling hospitals, clinics, and makeshift testing sites overwhelmed healthcare systems, making data collection on patients difficult to complete, let alone enforce.

Sarita Shah, an infectious disease epidemiologist at Emory University, says the main issue early on was that the COVID-19 case data health departments received came largely from the labs processing testing samples. Labs were quicker and more consistent than clinics and hospitals at reporting information to local health departments.

But unlike clinics and hospitals, labs don’t usually have much information about the patients they’re testing to pass on to health departments.

“The form does not typically include race and ethnicity since this is not crucial for the lab to conduct the test,” Shah told the Amsterdam News. Labs actually get very little information: name, date of birth, address, and ordering physician.

This isn’t normally a problem for most diseases on an epidemic scale, Shah said. With other diseases like Ebola or tuberculosis, health departments would have more time to report on them, gathering data from multiple sources to fill in gaps. COVID-19 was different, she said.

“We didn’t really have that time,” said Shah. “And the systems that we have in place for data reporting for public health purposes are huge operations and they’re not very easy to modify, adapt, pivot. They’re certainly not fast enough for the pace that COVID-19 needed.” 

In March of 2020, New York City’s health department was trying to connect data from labs and hospital systems together. Celia Quinn, the deputy commissioner of disease control at the Department of Health and Mental Hygiene (DOHMH), said their best method for getting demographic data was actually to investigate each case, “which is a huge burden on us,” she said. “People don’t always want to talk to us.” 

The frantic rush in the first few months of the pandemic might explain early data gaps, but it does not explain why race and ethnicity data is still missing, or why the share of missing data is rising in some states, including New York.

“It is quite concerning to see that this remains a persistent barrier and maybe even a worsening barrier,” Shah said.

In forms we distrust

The CDC uses COVID-19 data to track outbreaks and trends across the country and to assess whether public health actions are working. While the agency mandated that state and local health departments report data on the virus’s spread and impact, each state had its own way of collecting that data. 

When it comes to turning over data, the CDC has no enforcement mechanism in place to make sure that health departments report this information. Testing forms asking for name, age, sex, race, and ethnicity often remained incomplete or partially complete. Doctors noted that from the start of the pandemic it was these forms required by the CDC that went unfilled by patients and providers.

“I think we were trying to put in place Band-Aids, I guess, on a system that couldn’t really handle the stress but needed to have this information,” Shah said. 

It’s not clear why these problems persisted in each state, but some researchers cite issues with community trust. Historical mistrust of medical institutions among Black and brown Americans contributed to hesitancy about sharing race and ethnicity with healthcare providers, contact tracers and healthcare systems collecting COVID-19 patient information.

As a result, the communities most impacted by the virus are usually the least likely to identify themselves in health surveys or fill in the race and ethnicity questions.

One study found that 28% of surveyed patients were uncomfortable providing their race and ethnicity to healthcare providers, while another points to greater reluctance among those who’ve experienced discrimination.

Emil Coman, a statistician and researcher for the Health Disparities Institute at the University of Connecticut, says that collecting race and ethnicity data is becoming more important in the U.S. than it is in other cultures and countries.

“Racial and ethnic minorities have inherited a set of burdens that makes the study of their health status compared to white U.S. residents somewhat more challenging,” he said. 

Coman understands the skepticism people feel. Personal information that is collected by agencies, like a person’s name, birth date, and address, can feel less sensitive to share than race and ethnicity. While data systems commonly collect information about us like name and address, race and ethnicity are often excluded or requested less frequently.

“That’s the issue,” he says. “Race is a very stable characteristic of a person, and once it’s recorded in one place it should be available.” 

To reduce missing data, public health agencies and providers need to restore some of that trust by giving patients the information they need to understand why those details are being asked of them in the first place.

Quinn said the burden is on providers to collect that information accurately and to do it in a way that’s sensitive and encourages people to report instead of simply checking “not interested” or “not applicable.” 

“That’s a really high burden,” she added. 

Quinn said the city’s health department is working on improving the way it gets information so that it can get as complete a picture of the virus as possible. “That’s something that we’ve worked on with providers and with labs to make sure that we have the infrastructure to receive that information,” she said.

The more passive the process for gathering this vital information, the more complete race and ethnicity data becomes, Quinn says. Testing systems at pharmacies like CVS or Walgreens make collecting demographic data easier.

“That’s one way that we kind of work with the provider and the testing side to make sure that that information is collected so that it can come to us,” she said. 

Filling in the gaps

Some researchers have tried to fill in the gaps. Chicago’s public health department teamed up with DePaul University to try to identify race and ethnicity in records where it was previously unknown. A team at Emory University, including Shah, worked with Fulton County in Georgia to show a fuller picture of disparities by accounting for statistical biases.

“I think the best way forward is to have the actual data collected… that’s our first choice,” Shah said. “In the absence of that, though, these are tools we can use to fill in some of those gaps.” 

It’s not perfect, she says, “but we do have statistical methods we can use.”

“I don’t think COVID is the [main] reason for missing race and ethnicity, I think the way it’s being collected in this country is flawed,” said Ofer Harel, professor of statistics and associate dean for research at the University of Connecticut. He added that there are a number of models that statisticians use to answer questions from data, but many researchers without this background may not know how best to apply them.

Research about missing data has been going on since the 1970s, says Harel, who researches methods for handling incomplete data. As science evolves, we may be able to develop better methods. “We’re getting better, but we don’t know everything yet,” he says.

Some states are missing more race and ethnicity data than others. According to April 2022 data from the Kaiser Family Foundation, places with sizable Black and Hispanic/Latinx populations like Louisiana (22% of cases with unknown race or ethnicity), District of Columbia (22% unknown) and Massachusetts (28% unknown) had less missing data than most. (As of April 2023, California has 18% of cases with unknown race or ethnicity.) Other states like Maryland (39% unknown), Texas (50% unknown), New York (51% unknown) and Connecticut (69% unknown) saw an increase in missing data as the pandemic rolled on.

Delaware saw both. In July 2021, the state was missing race and ethnicity data for a huge share of cases: 78%. That number dropped to 39% almost a year later.

Delaware Health and Social Services said that the department issued an order in March of 2020 requiring labs reporting to the state to include race and ethnicity data in their results. They credited contact tracing for filling in the gaps and manually updated information when it was missing.

New York state’s percentage of missing race and ethnicity case data was dropping until around fall 2021, when it started to increase again. By April 2022, the New York State Department of Health (NYSDOH), which depended on labs to report race and ethnicity, was missing 51% of race and ethnicity data.

According to the DOHMH, New York City started out missing 60% of race and ethnicity data for cases, 17% for hospitalizations and 10% for deaths. The amount of missing race data for new COVID-19 cases is now down to 20% and Quinn credits this to contact tracing.

Quinn attributes the uptick in missing data in the fall of 2021 to the city’s decision to stop doing test and trace in the spring of 2021.

“We stopped calling people to ask them their demographics,” she said. “We never got 100% completion even on that demographic question because I think it was put at the end of the interview just to make sure that we got some of the more critical pieces of, you know, who were your contacts so that we could do the contact tracing piece of it.”

It was at that time, Quinn said, that the department started focusing its case investigations on cases in congregate settings or other high-risk locations.

Data on pandemic-related hospitalizations and deaths tends to be more complete because of how those events are documented. Race and ethnicity data is least complete for cases, and collecting it keeps getting harder as at-home testing becomes more commonplace. The only people who get counted are those who come to a public health laboratory and then end up in the reporting system, Shah says.

What happens to the data when the COVID-19 emergency ends?

The public health emergency ends this May and so will the way that state and local health departments monitor COVID-19 as the federal government won’t require reporting of testing data. Without a federal requirement, the ways that the virus is monitored by cities and states will likely vary even more.

Persistent gaps in data can lead to inequitable distribution of resources. As policymakers in New York City navigate decisions to transition the city “back to normal,” incomplete data will continue to inform those decisions. 

Quinn says that the demographic data for COVID-19, like the data for the 80-some maladies the city health department tracks, helps the agency know what’s happening with the disease, whether the disease requires an emergency response, who is being most impacted, and how to communicate with the most impacted communities to keep them safe.

Dr. Coman of the Health Disparities Institute said that governments need to design better systems that talk to each other and surveys that are uniform and represent people’s identifiers well. Where data like race and ethnicity are unknown, he recommends a sensitivity analysis, a statistical approach that measures how big a difference complete data would have made. Doing so, he says, would help health departments convey a greater sense of the scope of the burden on people of different races and ethnicities.
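
To make the idea concrete, here is a minimal sketch of one simple form of sensitivity analysis: bounding each group’s share of cases when some records carry no race or ethnicity. It is not the method Coman or any health department uses. The function name `share_bounds`, the group labels and every count are hypothetical, chosen only for illustration.

```python
def share_bounds(known_counts, unknown):
    """Bound each group's share of all cases when some cases lack race data.

    known_counts: dict mapping group label -> number of cases with that label
    unknown: number of cases with no race or ethnicity recorded
    """
    known_total = sum(known_counts.values())
    if known_total <= 0 or unknown < 0:
        raise ValueError("need at least one known case and a non-negative unknown count")
    total = known_total + unknown
    bounds = {}
    for group, count in known_counts.items():
        bounds[group] = {
            # Assumes none of the unknown cases belong to this group.
            "lower": count / total,
            # Assumes unknown cases are distributed like the known ones.
            "complete_case": count / known_total,
            # Assumes every unknown case belongs to this group.
            "upper": (count + unknown) / total,
        }
    return bounds


# Hypothetical counts for illustration only; not drawn from any real dataset.
known = {"Group A": 300, "Group B": 200}
bounds = share_bounds(known, unknown=500)
for group, b in bounds.items():
    print(f"{group}: between {b['lower']:.0%} and {b['upper']:.0%} "
          f"(complete-case estimate {b['complete_case']:.0%})")
```

With half of the records missing race data, as in this made-up example, the plausible range for each group spans 50 percentage points. A wide range like that is a signal that conclusions drawn only from the complete records should be treated with caution.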

Coman said that, in an attempt to be broad and general, federal demographic survey questions miss out on the complexity and nuances of who we are and our lived experience. “Trying to create categories that are very general and that apply everywhere equally gets you into this trouble,” he said.

There might be some federal movement to address this issue, not specifically for COVID-19 surveillance but more widely for how the federal government collects demographic data for public responses. The federal government announced in late January that it had begun seeking proposals for revising the federal race and ethnicity standards.

But as people and governments start to move on from the pandemic, federal and local health agencies will be rolling back how often and how much they report data on COVID-19. What’s more, states won’t be required to report cases, and the CDC will stop tracking community-level spread, focusing instead on deaths and hospitalizations.

Less data, along with missing race and ethnicity data for COVID-19, will continue to hinder our ability to understand the true impact of the pandemic on different communities. Luisa Borrell, a Distinguished Professor of Epidemiology and Biostatistics at CUNY’s Graduate School of Public Health and Health Policy, says obtaining better data is crucial to effectively addressing racial and ethnic inequality.

“So I think we have to put our money where our mouth is,” she says. “We have to definitely get better data.”
