Scientific Racism and the Validity of Racial Categorization

This post is brought to you courtesy of Patreon. If you want to support more work like this, you can sign up here.

The following is based partly on posts previously published in 2008 and 2012. It’s been consolidated and revised to be accessible to audiences new to this discussion. Images used in the post are mine taken at the Science Museum of Minnesota’s exhibit on race, developed by the American Anthropological Association, first mounted in 2008, and revived in 2017.

“Scientific racism” refers to the attempt to justify racist beliefs and policies through the use of scientific studies. Proponents of scientific racism, unsurprisingly, prefer to use less-loaded terms like “race realism” or “human biodiversity”, but even “scientific racism” can lend an unfortunate patina of validity to the arguments they make. It suggests–reinforced by many of those sympathetic to the claims involved–that the arguments we’re all having over race are merely political. It says those of us who fight scientific racism object to the policies promoted, to the racism, but that the whole enterprise is sadly, regrettably, unavoidably scientific.

However, this simply isn’t true. While scientific racism is decidedly racist and particularly anti-Black, it fails to be scientific. How badly does it fail? The very concept of “race” as used by scientific racists isn’t scientifically valid.

Photo of fake street signs at the intersection of Privilege Place and Race Road.
This isn’t to say that race is never a valid scientific concept. You’ll sometimes see people say, “Race is a social construct”, as though this invalidated the concept. It doesn’t. Race is a social construct, but it’s been a durable and very powerful construct. We have scads of documentation of how race has been defined, redefined, and enforced over time. People have lived and died en masse over race. Governments have been organized around the concept of race. Race is not only valid but critical in studying social and societal dynamics.

That isn’t how scientific racists use “race”, though. A socially constructed conception of “race” doesn’t support their arguments. In fact, it’s more likely to expose their racism than justify it.

No, scientific racists are using a more essentialist concept of race. Given the age of much of the research they cite, this may not explicitly be a concept based in genetics. Even their modern research often doesn’t refer directly to genetics (studies that do tend to fall out of favor for reasons I’ll get to), but the implication is that the qualities studied are fixed and inherent to the racial categories being used. We know this requires a shared genetics even if this is never said.

The problem for scientific racism is that race is not a valid concept within human population genetics. This isn’t to say you’ll never hear a scientist working in population genetics endorse human races as a concept. You will. Some are more cautious. Some have more obvious political biases than others. The reasons they give, though, won’t make a case for genetics-based scientific categories. They’ll make a case for social categories.

The difference between the two types of categories is that we assign social categories for social reasons. The taco-sandwich wars that periodically rage on social media provide a great example of this. We could call a taco a sandwich based on the definitions we elicit when we ask people to tell us what a sandwich is. We don’t do this in practice unless we’re asking for an argument, because a taco just isn’t what we mean when we say we want a sandwich. Certain things are sandwiches because we agree they are, not because we have an underlying, objective, replicable set of criteria for what constitutes a sandwich.

We don’t, however, assign scientific categories for social reasons. At least we don’t unless we’re studying social categorization. For research that isn’t studying subjectivity, we need more. We need our categories to be objective if we want to claim our research based on them is. We need these categories to accurately describe underlying qualities. We need to be able to replicate categorization between researchers.

Now it does happen that something or someone defies easy categorization by having qualities of multiple categories. Platypuses and echidnas are the only mammals that lay eggs instead of bearing live young, but we call them mammals because they meet all the other requirements of the classification. “Mammal” tells us more about them than it obscures.

Still, we don’t continue to call Pluto a planet just because we’re used to doing things that way and people got upset when astronomers announced the change. Social pressure–including political pressure–does not a scientific category make. Even if schoolchildren and people raised in another era call all planets, dwarf planets, and planetoids “planets”, astronomers do not.

That said, scientific racists do make arguments for why race should be considered a valid scientific category for these purposes. The problem is that these arguments don’t meet the threshold for scientific validity. This post summarizes some of the most common of these arguments and their problems.


I’ve been using a couple of words pretty heavily already that I want to stop to define. These get to be important in any argument like this, especially for defining the burden these arguments have to carry and for avoiding accusations of ad hominem attacks. These are the relevant definitions from the Webster’s Third New International Dictionary of the English Language, Unabridged.

Racism: the assumption that psychocultural traits and capacities are determined by biological race and that races differ decisively from one another, which is usually coupled with a belief in the inherent superiority of a particular race and its right to domination over others. [emphasis mine]

Valid: based on distinctive characteristics of recognized importance: founded on an adequate basis of classification.

I emphasized “usually” in the definition of racism because belief in racial superiority is not necessarily part of racism, although it’s important in explaining racism’s negative effects. It is, however, a necessary component of “race realism”, in which the “realism” refers in part to the assertion that certain abilities and achievements are just not to be expected of certain races.

The Argument for Race

There are four main arguments commonly made for the biological validity of race:

  1. Genetic testing allows for grouping by continent of ancestor origin.
  2. Race may not predict the things it’s been used to predict in the past, but it’s an important proxy for genetics in medicine.
  3. Yes, assignment of humans to racial categories is an arbitrary procedure, but we use arbitrary names for parts of other continua. Why not race?
  4. You’re just being PC, Marxist wankers.

From a scientific perspective, we can ignore #4, but I’ll still touch on it briefly after dealing with the more substantive arguments. It’s important in dealing with the implicit criticism that opponents of scientific racism are making a political argument.

Where to Start

Most scientific racists start with the assumption of human races and attempt to place the burden of proof on those who disagree. That’s not how the null hypothesis works. Subspecies aren’t a given within a species. They require reproductive isolation in order to form and to be maintained, just as species do. If we can’t show the effects of this isolation, we don’t have subspecies.

Once we’ve identified a species of bird, we don’t assume that each group of birds we see that are slightly darker than their fellows or are short a tail feather is a new subspecies. If we want to claim that they represent a group distinct from another group, we have to define the boundaries of the group and demonstrate their validity. If we fail to do that, we haven’t demonstrated the existence of subspecies.

Within any species, we expect genetic variation. That alone isn’t enough to start classifying subspecies. We expect changes in the frequency of alleles across types of environments, across distances from the point of origin of a mutation. Human history is in part a history of trade and conquest, so we also expect changes in frequency across trade routes and distances from trade routes, across distances from imperial centers.

The key word is “across.” As noted before, people trade and fight and have sex and produce offspring with their neighbors. Consider a metaphorical game of genetic telephone. What you hear at any two widely separated points may be distinct, but there is a chain of changes between them. We can divide up our players into groups, but why would we prefer to do that rather than focusing on the process of change? And why would we put our dividing lines in any particular place?

The burden falls on the person wanting to impose categorization to show that their categories are valid–both accurate and useful. The existence of a distinct genetic population of humans is not an impossibility, but what we know of human social and political history makes it unlikely.

Genetics of Origin

Scientific racists like to point to studies that show genetic data can be used to determine what region a person’s ancestors originated from. These are real studies, reasonably done for the most part. There are problems yet with our samples and with our ability to differentiate between particular subgroups of people. There are also issues with pinpointing the origin of people whose ancestors came from several regions, and diasporas complicate the picture. But for the most part, yes, using enough genetic markers, we can determine someone’s regional ancestry.

This still doesn’t mean people are reasonably classified into races. There are three major problems with arguing that it does.

The first problem is that the argument for races is not a simple argument that human genetics vary regionally. No one suggests that they don’t. Both those who argue for the validity of race and those who argue against it agree that people are more likely to be genetically similar to other people whose ancestors lived closer to their ancestors. There’s no dispute there.

The question is whether genetic variations happen more or less smoothly over distance or group together in ways that are useful for scientific study. That’s not an easy question to answer, for reasons that include the fact that we haven’t agreed on how we should measure genetic variation. This disagreement encompasses Lewontin’s famous statements about genetic variability occurring mostly within “races” rather than between them and all the argument that has followed over whether that’s relevant to sorting humans into races.

You can see discussion on that question here. Based in no small part on the lack of agreement about which genetic variation is relevant, what our “distinctive characteristics of recognized importance” are, the authors conclude that these genetic studies don’t provide evidence for or against the concept of genetically determined races. The evidence we have is neutral, giving us no reason to reject our null hypothesis. They conclude “using biological theory to ground race is a pernicious reification.” [emphasis theirs]

The technical tools provided by biological theory are diverse, but the social house we build is of our own choosing. It seems time to accept honestly our burden of conceptualizing “race,” and not allow ourselves to be imprisoned by formal abstractions. Biological theory and data do not force the “race” concept upon us; we force it upon ourselves, to our own detriment.

When we do use genetic variation to sort humans into “clusters” based on similarity, we run into several issues with the ideas behind scientific racism. The first is that the output of cluster analysis programs is, of course, determined by the programming and the assumptions fed into them.

What assumptions are we talking about? The major one is the number of clusters (K) the data should be split into. The most-cited study on the topic, published by Rosenberg et al. in 2002, showed the most likely results of clustering analyses using between two and six clusters.

The results of this study already mostly fail to map to historical racial classifications. The three-group model comes closest, though like all the models, it shows significant overlap between groups that can be used to argue against using discrete categories. But on top of that, these analyses give us very little reason to choose a number of clusters. As Deborah Bolnick explained:

Thus, the fact that structure identified 6 genetic clusters is not significant in and of itself–the program also identified 2, 5, 10, and 20 genetic clusters using the same set of data. As noted above, structure will identify as many clusters as the user tells it to identify. […]

In other words, no single value of K clearly maximized the probability of the observed data. Probabilities increased sharply from K = 1 to K = 4 but were fairly similar for values of K ranging from 4 to 20 (N. Rosenberg, personal communication). The probability of the observed data was higher for K = 6 than for smaller values of K, but not as high as for some replicates with larger values of K (N. Rosenberg, personal communication). […] Consequently, it is uncertain what number of genetic clusters best fits this data set, but there is no clear evidence that K = 6 is the best estimate.

Bolnick provides a great discussion of the difficulties of choosing a number of clusters for these analyses. When we can’t do that consistently based on criteria better than our own intuitions or prior, socially determined notions of race, the argument for the scientific validity of genetics-based races is at best very weak. It too gives us no reason to reject our null hypothesis.
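The point that a clustering program returns as many clusters as it is asked for can be seen in a toy sketch. This is not the structure program or real genetic data. It swaps in plain one-dimensional k-means, and the `gradient` data, the function names, and the starting centers are all assumptions made for illustration. The data are 100 hypothetical individuals spaced evenly along a single gradient with no gaps at all.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Lloyd's k-means on 1-D data, started from evenly spread centers."""
    low, high = min(values), max(values)
    centers = [low + (high - low) * (i + 0.5) / k for i in range(k)]
    groups = []
    for _ in range(max_iterations):
        # Assign every value to its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        updated = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return groups


def within_group_spread(groups):
    """Sum of squared distances from each value to its group mean."""
    total = 0.0
    for g in groups:
        if g:
            mean = sum(g) / len(g)
            total += sum((v - mean) ** 2 for v in g)
    return total


# Hypothetical data: 100 individuals spaced evenly along one gradient, no gaps.
gradient = [i / 99 for i in range(100)]

results = {k: kmeans_1d(gradient, k) for k in range(2, 7)}
for k, groups in results.items():
    sizes = [len(g) for g in groups]
    print(k, sizes, round(within_group_spread(groups), 3))
```

Even though this data contains no groups by construction, every requested K comes back with K non-empty clusters, and the fit keeps improving as K grows. Nothing in the output marks one value of K as the natural one.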

Finally, there are also weaknesses in the datasets that have been used for this clustering analysis. Some of these weaknesses are fatal to the proposition that the studies involved preferentially support the existence of human races.

For example, when Bamshad et al. published a study in 2003 on using genetic variation to sort people by continent of origin, they used “a heterogeneous sample of >500 individuals from sub-Saharan Africa, East Asia, southern Asia, and Europe.” Their research produced some fairly dramatic genetic separation between the groups measured, but it can’t address the question of the overall separation we would see between human groups.

It can’t do that because it didn’t sample humanity. It sampled what we would already expect to be quite discrete groups if genetic variation happened smoothly across distance. Looking at a map, you can already see that there are populations between sub-Saharan Africa and Europe that weren’t sampled, but the artificial separation is greater than even that description suggests. The European sample consists of French, Polish, and northern European people. Southern European populations are excluded, as are western European groups that would come closer to the southern Asian populations sampled.

Scientific racists like to point to the results of studies like these to demonstrate the separation between races. That separation, however, is imposed by data sampling. It can’t speak to separations in all of humanity because it was chosen not to represent all of us.

Even where such artificial restrictions aren’t imposed on the data used for genetic analysis, there are still restrictions imposed by convenience. We’re building larger datasets to work with and from, but much of what we have oversamples locations with research facilities (mostly universities and larger medical centers) and undersamples everything else. In some parts of the world, that causes significant discontinuities in our samples.

If we want to be able to differentiate between genetic separation between populations caused by underlying discontinuities and that caused by poor data sampling, we have to get into these areas and get samples. Luckily, there are researchers working on that.

Xing et al. filled in gaps in 2010 by collecting genetic data from 13 groups that hadn’t previously been included in these studies. They concluded:

Patterns of human genetic variation are influenced by mating patterns, and the latter are in turn influenced by geographic and cultural factors (e.g., mountain ranges, language, religious practices). Consequently, it is not surprising that human genetic variation, while correlated with geographic location, is not perfectly clinal [37–39]. However, between-population differences can be seriously exaggerated if human populations are sparsely sampled.

Consistent with previous studies [37,39,40], our analyses demonstrate that differentiation among human populations decreases substantially and genetic diversity is distributed in a more clinal pattern when more geographically intermediate populations are sampled.

Better data is less consistent with the assignment of genetic races. It supports the null hypothesis more than it supports human races.
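A small sketch can show why sparse sampling exaggerates separation. It assumes a single hypothetical allele whose frequency rises in a straight line across a region; the frequencies, the number of sampling sites, and the function names are invented for illustration and are not taken from Xing et al. or any other study. Differentiation is measured with Wright's F_ST for one biallelic locus and equally weighted populations: the variance of the allele frequency across populations divided by p̄(1 − p̄).

```python
def cline_frequency(position, low=0.1, high=0.9):
    """Allele frequency at a position between 0 and 1 on a smooth, linear cline."""
    return low + (high - low) * position


def sample_sites(n_sites):
    """Allele frequencies at n_sites evenly spaced locations along the cline."""
    return [cline_frequency(i / (n_sites - 1)) for i in range(n_sites)]


def wright_fst(freqs):
    """F_ST for one biallelic locus: variance of p over mean_p * (1 - mean_p)."""
    mean = sum(freqs) / len(freqs)
    variance = sum((p - mean) ** 2 for p in freqs) / len(freqs)
    return variance / (mean * (1 - mean))


def largest_neighbor_gap(freqs):
    """Biggest jump in frequency between geographically adjacent samples."""
    return max(abs(b - a) for a, b in zip(freqs, freqs[1:]))


sparse = sample_sites(2)   # only the two ends of the region
dense = sample_sites(11)   # the same region with intermediate sites filled in

print("sparse:", wright_fst(sparse), largest_neighbor_gap(sparse))
print("dense: ", wright_fst(dense), largest_neighbor_gap(dense))
```

The underlying gradient is identical in both cases. Sampling only the ends makes the two samples look sharply separated, while filling in the intermediate sites lowers the measured differentiation and shrinks the jump between neighbors.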

It’s also worth noting in all this that the scientists conducting these clustering studies generally do not claim their work supports the idea of human races. Some of them, like Rosenberg, have spoken out against the use of their work by scientific racists.

Race for Medicine’s Sake

One of the more seemingly benign arguments for clinging to the concept of race is that it can provide a clue to underlying genetics that can be useful in diagnosing or treating disease. After all, we know that Ashkenazi Jews get Tay-Sachs and that Africans get sickle-cell anemia. High blood pressure and diabetes are linked to race.

Again, there are multiple problems with this argument.

Photo of plaque titled "High blood pressure: your race or racism?" Subtitle: "Studies suggest that the stress of racism contributes to higher rates of hypertension among African Americans than among European Americans." Map graphic shows high blood pressure of U.S. blacks and low blood pressure of related Africans.
“It’s not genetic”

Both Tay-Sachs and sickle-cell anemia are genetic disorders with well-defined mechanisms, but environmental factors play a role in many diseases. Racism accounts for several of those environmental factors. Social constructs can have biological effects. From the stress of the ostracism and maltreatment that we classically recognize as racism to medical racism that results in substandard treatment to unequal access to relevant education, resources, and basic necessities, racism has a direct effect on health. Some disorders attributed to race, such as hypertension, are better attributed to racism. Race, in these cases, is again relevant only as a social phenomenon.

Map of world sun exposure. Q&As below the map point out that skin cancer happens too late in life to affect reproductive success and that some near-polar populations have darker skin because of reflected sunlight.
Some diseases are linked to the genetics of race, but only in as much as we use certain genetically determined traits as proxies for racial classification. Skin cancer, for example, is linked to skin pigmentation, though it’s not determined by it. Scientific racists suggest this provides a justification for racial categorization. However, direct measurements of skin pigmentation provide a much better measure of risk for skin cancer than racial categories do. Limiting diagnosis and treatment advice based on race in these cases is a bad idea that can increase skin cancer deaths. It would be a step backward, a step toward less knowledge, to use race as a proxy for risk.

Even among known genetic disorders, inheritance is not based on race as scientific racists use the term. Risk may have some correlation with the races they attempt to assign us to, but no race is defined, even imperfectly, as a group that gets a disease that no one else gets. The gene for sickle-cell anemia is adaptive in areas with high rates of malaria. This means that there are areas of Africa with almost no instance of the gene and areas of South America and Asia with fairly high rates. The group that is at high risk of sickle-cell anemia is geographic in origin in a way that is not racial.

Tay-Sachs is prevalent among one population of Jews but not others. It is also prevalent among Cajuns. French Canadians also have a higher than average prevalence, but the underlying genetics among Quebecers is different than among Cajuns, who share a mutation with the Ashkenazi Jews. As in the studies on population genetics, single genetic markers show possible founder effects but very little correlation with proposed genetic race or continent of ancestor origin.

Continua and Complexity

“Erm, okay,” say the racists, finally. “Maybe the underlying genetics do vary smoothly. That doesn’t mean they don’t vary. We use arbitrary names for other continua, like color. Why not for races?”

While this once again starts from the assumption that race has validity–that it already measures something–and is thus the wrong question to ask when trying to demonstrate validity, it does raise a couple of issues that are worth talking about. Color, the example used in the argument, is provocative. The underlying continuum is smooth, but we do use divisions of it, frequently and successfully, in communicating with each other. Why can’t we do the same with race?

The main answer is complexity. To the extent that we agree on a name, saying that something is a particular color tells us roughly what wavelength(s) it reflects or emits. Color varies on one continuum. We can operationalize our categories for study based on wavelength and repeat those studies using the same wavelengths each time.

What does race tell us? What is the underlying continuum that we’re measuring?

Plaque with three maps showing frequency of blood types. Lots of regional variation with very little correspondence with continental "races".
If we were Japanese, we might try assigning race based on blood type. Lots of luck to us.

Of course, we can’t reduce humans to a single continuum of variability. We even have difficulty finding traits once considered to be racial traits that vary with geography in ways a racial model would predict. Skin color varies with average sun exposure as much as it does with any known pattern of migration. Analysis of skeletal remains, now and in the past, does not reliably indicate group identity. Facial features and body proportions are both too variable and too consistent across groups.

The continua that race tries to measure are not single, smooth gradations around the world. Racialized traits/continua don’t always follow the same paths, so that if we overlaid one trait on another, the resulting map would look somewhat like plaid but much fuzzier. A third overlay, accounting for just one more trait, would produce an even more muddled map.
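The plaid effect can be sketched with a toy grid. Everything here is hypothetical: a 10-by-10 grid of locations and three invented traits, each varying smoothly in a different direction and each split into "low" and "high" near its midpoint. No real trait data is involved.

```python
# Hypothetical 10 x 10 grid of locations, indexed by (i, j).
GRID = [(i, j) for i in range(10) for j in range(10)]

# Each hypothetical trait is split into "low" (True) and "high" (False)
# near its midpoint.
TRAITS = [
    lambda i, j: i < 5,        # trait 1 varies along one axis of the grid
    lambda i, j: j < 5,        # trait 2 varies along the other axis
    lambda i, j: i + j < 9,    # trait 3 varies along the diagonal
]


def count_groups(traits):
    """Number of distinct trait combinations found across the grid."""
    return len({tuple(t(i, j) for t in traits) for i, j in GRID})


def agreement(trait_a, trait_b):
    """Share of locations where two traits put a location on the same side."""
    same = sum(1 for i, j in GRID if trait_a(i, j) == trait_b(i, j))
    return same / len(GRID)


group_counts = [count_groups(TRAITS[:n]) for n in (1, 2, 3)]
split_agreement = agreement(TRAITS[0], TRAITS[1])

print(group_counts, split_agreement)
```

Each added trait carves the same grid into more and smaller groups, and the dividing line drawn by one trait agrees with the line drawn by another only about as often as a coin flip would.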

Where do we stop and still see anything that looks like groups without the groups being larger than an extended family–or even an individual? We would have to reduce the number of traits to the point that we would be measuring trivial differences between us (though racialized traits do tend to be trivial in themselves). If we don’t do that, we’re back to the big map of genetic distances and trying to determine where to impose borders for reasons that aren’t social. And we simply don’t have good reasons to do that.

Cultural Marxism and Status Quo Warriors

This brings us down to the “argument” that we’re only denying the validity of racial categorization because of politics and that our politics are bad. The first time I addressed these arguments, I didn’t bother with this one. However, there are a couple of points I want to make about it.

The first is that “Marxism” and particularly “cultural Marxism” as terms used to invalidate argument have a very specific history that is itself quite racist. They refer to a conspiracy theory about a supposed Jewish plot to radically change society. It’s possible, even probable given the anti-intellectualism of modern racist movements, that some of the people using this term don’t know its origins. It’s still no coincidence that it’s tied to racist ideologies and justifications.

On top of that, calling anti-racists names that highlight their attempts to change the world is making a political argument in itself. If there’s something wrong with us because we want change, even radical change, this has to be because the person doing the name calling sees maintenance of the status quo as an inherent good. This is a political argument that has no place in scientific discourse, which is all the more ironic.

The Failure of Race

The question that those who want us to study race must answer is “What does race tell us?” Sociologists and others who study the social categorization of race can point to good answers. They can point to well-documented histories of legal and de facto discrimination, to racial prejudice, to discrepancies in opportunities and living conditions. Even as social definitions have changed over time, we can point to media and government agents defining race.

Scientific racists aren’t doing as well. If they want to claim race has biological validity, race not only has to be based on biological/genetic measurements that distinguish the categories used, which current racial classifications are not. It also has to tell us something nontrivial about the biology or genetics of race. Importance is a critical part of the definition of validity.

Looking at the claims scientific racists have made, we can see what race does not tell us. I’m still waiting to be shown what it does. I’m in good company.

Want to see more work like this? Support me on Patreon.


One thought on “Scientific Racism and the Validity of Racial Categorization”

  1.

    Three recent reports emphasize the fluidity of the concept of race for me. One is the finding of DNA markers from Neanderthal people in studies of DNA in many modern folk. Way to blur the line, guys!

    Second, the discovery of a very early “Brit” (since nationalism is even more fluid, I’ll use that term) whose genetic markers indicate blue eyes and black skin. No, not just darkening attributed to the effects of tannins in peat bogs.

    Third, the DNA studies which show the combining of various Asian immigrants across the Bering land bridge, penned in by lengthy glaciation to a narrow location leading to extensive interbreeding, which produced a genetically new group of Native Americans, unique to these 2 American continents.

    Yeah, evolution is pretty cool, eh? But what the heck is race, really? I got my DNA tested, and the majority actually confirmed the family stories. Interesting, to me. The minority findings merely confirmed how much humans have traveled and scattered their genes throughout history. I’m sure if we could pick out even finer minority bits, we’d all be a little bit of everything.
    I’ll take it.
