This is a continuation of my last post.
The P-Value of Command
Looking deeper at Special Forces battalion commanders offers an example of the pitfalls of quotas, but also of the challenges of bias. I don’t have access to the raw data for this, so we’re going to have to do some McKinsey-style ‘back-of-the-envelope math’. But I’ll be clear where I’m making assumptions, and I’ll also scale the odds to skew the math toward a best-case scenario.
1st Special Forces Group was reactivated in 1984, over 40 years ago. Each year two of its battalions open up for command: 1st and 2nd in even years, 3rd and 4th in odd ones. If we assume two years in command, we get about 60 battalion commanders, 20 each for 1st, 2nd, and 3rd Battalions. 4th Battalion was only stood up in 2011, so we’ll use those six commands to help pad against any errors in the other three, in case someone got fired or a command ran long.
The latest data I can scrounge up on the military’s demographics is the DoD’s 2023 Demographics: Profile of the Military Community. It lists the officer corps as 74.5% white, with the remaining 25.5% spread across six other ethnic groups. If we distributed our commands on a quota system, then one out of every four battalion commanders would be non-white. But the Army doesn’t do quotas for command, nor do I think it should.
It might be hard to visualize this as a bunch of numbers, so let’s DnD (Dungeons and Dragons) this.1 The math above is like rolling a four-sided die, a d4 in the lingo, which looks like a pyramid. You’d expect to roll a two, three, or four most of the time, but roughly one time in four you’d roll a 1.
That DoD report puts the army’s breakdown at 29.0% non-white, but that can still vary a lot from unit to unit. I definitely don’t have access to the ethnic breakdown for officers in 1st Special Forces Group going back to 1984, so we’ll take a conservative estimate: cut the overall DoD’s 25% in half and assume only 12.5% of the officers in 1st Group over the last forty years have been minorities.2 DnD has another die perfect for this, the d8, which looks like two pyramids stacked base to base.
We’d struggle to run a quota-based CSL selection for the three BNs, since thirds don’t divide neatly into eighths. If it were 1/9th, we could cycle in one minority command every three command cycles. With 1/8th, though, you’d have to check that three commands get picked out of every 24. At three commands per two-year cycle, that spreads over 16 years, which just isn’t realistic. This is just one example of how quotas can be counterproductive.
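The quota arithmetic above comes down to a least common multiple. A quota only works out to whole commands once the window is both a whole number of command cycles and a multiple of the quota’s denominator. This is a minimal sketch that assumes a hypothetical ‘cycle’ in which each of the three battalions changes command once every two years. The function name `quota_window` is illustrative only.

```python
from fractions import Fraction
from math import lcm


def quota_window(quota: Fraction, commands_per_cycle: int = 3, years_per_cycle: int = 2) -> tuple[int, int]:
    """Smallest (commands, years) window in which `quota` comes out to whole commands.

    Assumes a hypothetical cycle of three commands (one per battalion) every two years.
    """
    # The window must be a whole number of cycles AND a multiple of the quota's
    # denominator (Fraction always stores the fraction in lowest terms).
    commands = lcm(commands_per_cycle, quota.denominator)
    cycles = commands // commands_per_cycle
    return commands, cycles * years_per_cycle


for quota in (Fraction(1, 9), Fraction(1, 8)):
    commands, years = quota_window(quota)
    minority_commands = int(quota * commands)
    print(f"quota {quota.numerator}/{quota.denominator}: "
          f"{minority_commands} of every {commands} commands, over {years} years")
```

Under these assumptions, 1/9 fits into a 6-year window (one command in nine), while 1/8 needs a 24-command, 16-year window.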
Putting it back into our DnD analogy, if we rolled three d8s eight times each, we’d expect about three 1s. Across twenty rolls each, the odds say we’d roll 1s seven or eight times (technically 7.5, but you can’t half-roll a die). And if you look at the history of 1st Special Forces Group, you’ll see the number of minority battalion commanders we’ve had is… three.
Now, sixty commanders across three battalions doesn’t make for a big data set. Small numbers can be tricky, which is why statistics works best with larger data sets. Let’s try a bigger one.
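The expected counts above are just dice times rolls divided by sides. The natural follow-up is how surprising three really is. This is a minimal sketch that assumes each command is an independent roll of a fair d8 (the conservative 12.5% case above), which is a big simplification of how command boards actually work. It computes the binomial probability of three or fewer 1s in sixty rolls. Under those assumptions, that comes out just under 5%.

```python
from math import comb


def expected_ones(num_dice: int, rolls_each: int, sides: int = 8) -> float:
    """Expected number of 1s when num_dice fair dice are each rolled rolls_each times."""
    return num_dice * rolls_each / sides


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p): the chance of k or fewer 1s in n rolls."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


print(expected_ones(3, 8))   # 3.0
print(expected_ones(3, 20))  # 7.5

# 1st Group as three d8s rolled twenty times each (assumed independent rolls).
n_rolls, p_one, observed = 3 * 20, 1 / 8, 3
print(f"P({observed} or fewer 1s in {n_rolls} rolls) = {prob_at_most(observed, n_rolls, p_one):.3f}")
```

Setting `p_one` to 1/4 or 1/6 shows how much smaller that tail gets under the d4 and d6 assumptions.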
We’ll take all five of the Special Forces Groups, continuing to ignore the 4th battalions (sorry, Ron) and just using them to round out the gaps. 3rd Group stood up later, in 1991, but 5th Group and 10th Group have longer histories. So we’ll assume 15 d8s now (three BNs each for five Groups) and roll each of them 20 times, for 300 battalion commanders.
I don’t have the officer demographics for the whole SF Regiment either, so I’ll try a different tack. We know those d8s should roll each number 1/8th of the time, but that only holds over large numbers of rolls. In reality, some of those dice are going to roll some numbers more often than the odds say, and others less. On average, though, about half the dice will roll 1s at a below-average rate, and about half at an above-average one.3 This is the sort of thing casinos watch for, where people try tricks like shaving dice to cheat the odds.
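‘About half’ can be pinned down. This is a minimal sketch that assumes each battalion’s twenty commands are independent rolls of a fair d8. It sums the binomial probabilities for zero, one, or two 1s, which is the same ‘low’ cutoff the simulation code in the footnotes uses. The result puts a single die at roughly 53.5% ‘low’ and 46.5% ‘high’: close to a coin flip, but not quite.

```python
from math import comb

ROLLS_PER_DIE = 20  # twenty commanders per battalion
P_ONE = 1 / 8       # the conservative d8 assumption
expected = ROLLS_PER_DIE * P_ONE  # 2.5 ones per die on average

# You can't roll half a 1, so "low" means 0, 1, or 2 ones and "high" means 3 or more.
# This matches the footnoted simulation, which flags count_ones > num_rolls / 8 as "high".
p_low = sum(
    comb(ROLLS_PER_DIE, k) * P_ONE**k * (1 - P_ONE) ** (ROLLS_PER_DIE - k)
    for k in range(int(expected) + 1)
)
p_high = 1 - p_low

print(f"Expected 1s per die: {expected}")
print(f"P(low)  = {p_low:.3f}")
print(f"P(high) = {p_high:.3f}")
```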
In reality, only two of our 15 SF battalions have had three or more minority officers selected for command in twenty ‘rolls’. Better than none, but I think that’s low enough to warrant another look. Over a million simulations, I found the odds of only 2 dice rolling 1s at such a low rate converge on about 0.1609% of the time, or 1 in 621.5. That’s not exactly Avengers: Endgame math, but it’s low enough to catch my attention.
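As a cross-check on the million-run simulation, the same probabilities can be computed exactly from the binomial distribution. The sketch uses the same assumptions: fifteen independent dice, twenty rolls each, a 1-in-8 chance of a 1, and ‘low’ meaning two or fewer 1s. It first computes the event the footnoted simulation counts, two or fewer of the fifteen dice flagged ‘low’, which should land right around the 0.16% figure. A single die comes up ‘low’ a bit more than half the time, so the mirror-image event is not equally rare. That event is two or fewer dice flagged ‘high’, meaning thirteen or more tables running low. The sketch computes it too, and under the same assumptions it comes out closer to 0.8%, roughly 1 in 126. The helper `at_most_k_of_n` is a hypothetical name used only for this illustration.

```python
from math import comb

ROLLS, P_ONE, NUM_DICE = 20, 1 / 8, 15

# Chance a single die is "low": two or fewer 1s in twenty rolls.
p_low = sum(comb(ROLLS, i) * P_ONE**i * (1 - P_ONE) ** (ROLLS - i) for i in range(3))
p_high = 1 - p_low


def at_most_k_of_n(k: int, n: int, p: float) -> float:
    """Probability that k or fewer of n independent dice land in an outcome with chance p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


# Event counted by the footnoted simulation: at most 2 of the 15 dice flagged "low".
p_two_or_fewer_low = at_most_k_of_n(2, NUM_DICE, p_low)
# Mirror event: at most 2 dice flagged "high", i.e. 13 or more dice running low.
p_two_or_fewer_high = at_most_k_of_n(2, NUM_DICE, p_high)

print(f"2 or fewer 'low' dice:  {p_two_or_fewer_low:.4%}")
print(f"2 or fewer 'high' dice: {p_two_or_fewer_high:.4%}")
```

Both figures sit well below the 5% threshold discussed later.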
I’m not ‘woke’, but I am data literate. Statistics say our data set is unusual. Let’s assume I didn’t join the army but instead decided to become a Vegas pit boss. 13 of my 15 craps tables just started throwing dice this skewed. I'm not going to start breaking my dealers’ kneecaps yet, but I’m damn sure going to take a closer look. At a minimum, I’d have thrown away the dice.
The Baby and the Bathwater
I’m not arguing that SF battalion command selection is racist. Plenty of things can produce racial and gender disparities, and some of the effect might just be random sorting.
There are a lot of reasons that could lead to ‘unfair’ outcomes, and some of them compound. Things like who gets mentorship, and who your mentor is, can define your career. It did for me. But did I subconsciously look for a familiar face in my mentors? Did they? Would I have been comfortable mentoring female officers if I hadn’t read Athena Rising? What if I hadn’t? Was it easier for me to self-recruit into Special Forces because I could so easily see myself wearing a green beret?
Those 20 dice rolls represent forty years of commands, so there are problems with lumping today’s stats in with those from decades ago. Two things in particular have changed since the ’80s and ’90s, when SF started picking its commanders. First, we started paying attention. The army only started acknowledging the science of cognitive bias in the last decade and has been talking to commanders about it, particularly at pre-command courses. Second, we transitioned to CAP, a more data-based means of picking our commanders.4
Two of the three non-white SF battalion commanders in 1st Group were selected after attending CAP. All three were selected within the last six years, as the Army was deciding to remove photos from our selection boards. Correlation isn’t causation, and there isn’t nearly enough data here to prove either. Statistics just helps you know when you might want to start paying attention.
Training on cognitive bias is not DEI, and it has been a needed addition to command prep courses. We need commanders to be aware of their own predispositions, and blithely deleting bias training as ‘DEI’ is a step backwards. Over the last four years I sat in pre-command courses where multiple colonels angrily refused to believe bias was even real. One even demanded we put photos back on soldiers’ files so he knew who to hire. Those weren’t random soldiers. They were CSL-selected colonels on their way to commanding brigade-level headquarters. Our jobs need people who merit them, but right now, the math suggests we’re precluding some incredibly qualified candidates from even getting a chance.
There are also a non-zero number of people who believe minority servicemembers are inferior to their white male peers. I know because they’ve told me. Directly.
One consequence of being the army’s default model is that people will comfortably say things in front of me they wouldn’t say in mixed company. As a recent example, right after hearing about Secretary Hegseth’s nomination last fall, one fellow lieutenant colonel was all too happy to tell me every sexist thought in his head.
The army has a long track record of getting diversity and inclusion wrong. We fought desegregation. When black soldiers and airmen exceeded the meritocratic standards, they were actively erased from our history. COL Paris Davis, a member of my own regiment, saw his Medal of Honor packet spiked on two different occasions. Purging every unit’s social media feed will never make those facts untrue.
Next, we fought women being allowed into the ranks, and later into combat jobs. Women who could meet the standards were called ‘freaks’. The army also fought allowing patriotic LGBTQ soldiers to serve their nation, a fight it looks set to resume.
And yet we’ve been proven wrong every single time. We are wrong when we presume a race or gender is going to bring some secret sauce to our formations, but we’ve been wrong way more often by arguing those same groups shouldn’t be on our teams, that they don’t have merit.
There are and have been transgender Navy SEALs and Special Forces operators. They earned their tridents and berets on their own through blood and sweat. It is difficult for me to hear their merit questioned by fellow soldiers who never even earned the tab. It is even harder to sit by and just listen as their honor, humility, integrity, and discipline are questioned by civilians who never served in their nation’s armed forces.
‘Without data, you're just another person with an opinion’5
In statistics, a p-value under 5% is conventionally considered significant: if nothing unusual were going on, you’d see a result at least that extreme less than 1 time in 20, the same odds as rolling a critical fail on a d20. It doesn’t prove causation, but 5% is the usual bar for rejecting the null hypothesis. If we think back to those three lone non-white commanders at 1st Special Forces Group, they probably should have been rolling d4s or d6s. Even in our pessimistic scenario, they would have been rolling d8s. But in reality, they were rolling d20s.
I’m not in favor of quotas, but I’m also data literate. So, what’s the answer?
I think we need to find a point on the number line between quotas and p-values that triggers more introspection. I’m not sure where the Lagrange point is, but there’s certainly a lot of room between 0.1609% and 25%. There are a couple of orders of magnitude in there to choose from.
In the meantime, I need to ensure I ask for more candidates than just the ones my team gives me. I need to take time to check my own biases and challenge them. And perhaps most importantly, I need to do more to make sure every soldier I meet is getting the mentorship they need and deserve.
I also think you should be wary of anyone who claims to be in favor of meritocracy but doesn’t bring metrics. We’re all in favor of ‘meritocracy’ as long as it means ‘well, of course, me’. One place you can see this recently is with CAP. ‘I got fucked by peers at CAP’ is the new ‘I got fucked on push-ups at Ranger School’.6
We should promote the soldiers with the most potential on merit, but we owe our servicemembers clarity and transparency on what those standards are.
And any AI Tech-Bro who claims there’s no such thing as bias is fucking lying to you. You might want to start asking why.
My editor said I didn’t need to spell out DnD, and given how nerdy this Substack is, she’s almost definitely right.
1st Group, which is aligned to Asian languages, is likely one of the two most diverse groups. The other would be 7th Group, which is aligned toward Central and South America.
Here’s a block of Python code if you want to try it out for yourself:
import random


class Die:
    """One battalion: a d8 rolled once per command, where a 1 stands in for a minority commander."""

    def __init__(self):
        self.rolls = []
        self.count_ones = 0
        self.flag = ""

    def roll(self, num_rolls):
        self.rolls = [random.randint(1, 8) for _ in range(num_rolls)]
        self.count_ones = self.rolls.count(1)
        # "high" = more 1s than the expected num_rolls / 8; otherwise "low".
        self.flag = "high" if self.count_ones > num_rolls / 8 else "low"


def simulate_generations(num_dice, num_rolls, num_generations):
    # Count the generations in which 2 or fewer of the dice were flagged "low".
    occurrences_low_2_or_less = 0
    for _ in range(num_generations):
        dice = [Die() for _ in range(num_dice)]
        low = 0
        for die in dice:
            die.roll(num_rolls)
            if die.flag == "low":
                low += 1
        if low <= 2:
            occurrences_low_2_or_less += 1
    return occurrences_low_2_or_less


def main():
    num_dice = 15  # Number of dice / BNs to simulate
    num_rolls = 20  # Number of rolls / commanders per die
    num_generations = 1000000  # Number of generations / times to run the simulation
    # A million generations is roughly 300 million rolls, so this is slow in pure Python.
    occurrences_low_2_or_less = simulate_generations(num_dice, num_rolls, num_generations)
    print("Summary:")
    print(f"Number of generations: {num_generations}")
    print(f"Number of generations with 2 or fewer 'low' dice: {occurrences_low_2_or_less}")
    print(f"Percentage: {occurrences_low_2_or_less / num_generations * 100}%")


if __name__ == "__main__":
    main()
Command Assessment Program. An assessment run in the fall where the Army evaluates the next generation of the Command Select List (CSL) leaders among its Lieutenant Colonels, Colonels, and Sergeants Major.
Quote is from W. Edwards Deming.
Years ago, when Ranger School did a data pull on why people were failing, they found the highest attrition event was the push-up. It turned out this was in large part because it was the first event you could fail.
Many people who get Ranger School slots don’t actually want them. But rejecting one typically takes the integrity to stand in front of your commander and tell them to their face you don’t want to go. In my whole career, I’ve only met one officer who had the courage to do that. Most just go down to Georgia, half-ass the first event, and then spend the rest of their careers telling everyone the reason they didn’t get a Ranger tab was because they ‘got fucked on the push-ups’.