Predicting the NCAA Men’s Basketball Field — and Discovering the Selection Committee’s Biases

A group of academics sheds light on the March Madness selection process


The NCAA men’s basketball selection committee is biased toward bigger conferences and better-known teams. Those biases are surprisingly consistent and quantifiable. If you don’t believe that, just ask these guys.

Hardly a March goes by without the NCAA Selection Committee getting an earful from college basketball fans about teams getting shafted and others miraculously awarded bids to the men’s basketball tournament.

(MORE: Why is College Basketball Scoring in Decline? One Expert’s Take)

The process by which the selection committee chooses its 37 at-large bids has always felt like some version of a papal conclave, where a bunch of gray-hairs convene to deem who’s worthy. Instead of white smoke and a pope, we’re given Selection Sunday and a glorious bracket.

The only thing we seem to know about the process itself is that we don’t really know anything about it. Bits of information come out each year during the annual CBS sit-down with the committee chair, but most of us are often left with 68 teams and a bunch of questions about how they got there.

For many fans, it often seems that major conferences historically get the nod over mid-majors. While 11 mid-majors got at-large bids last year, the number of Gonzagas and Butlers that make it is generally in the single digits. And research done by professors Jay Coleman, Allen Lynch and Mike DuMond backs that up.

Exposing the committee’s biases wasn’t the goal of Coleman, a professor at the University of North Florida, back in the 1990s when he stumbled across the now well-known basketball metric RPI, a ranking based on win-loss records and strength of schedule. But Coleman soon started using that data to predict which teams the selection committee would invite to the NCAA tourney. Along with Lynch of Mercer University and, later, DuMond of Florida State University, he built a model using SAS software called the “Dance Card” that forecasts the committee’s at-large picks, often with accuracy rates near 95%.

That may not seem terribly impressive, since a majority of the at-large teams every year are absolute locks to make the tournament. But predicting those dozen or so “bubble teams” is a difficult task. If it weren’t, Joe Lunardi would be out of a job.

The analytics guys use a combination of the RPI, the Jeff Sagarin ratings, conference record, road record, neutral court wins, wins against teams in the RPI, the conference ranking and whether a team won the regular season conference championship. But until 2008, something about the model seemed off.
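The actual Dance Card was built in SAS, and its exact specification isn’t public, but the basic idea can be sketched in a few lines of Python. The column names and toy numbers below are hypothetical stand-ins for the factors listed above, not the researchers’ real data or model.

    # A minimal sketch of a selection-prediction model in the spirit of the Dance Card.
    # Column names and values are hypothetical illustrations only.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    teams = pd.DataFrame({
        "rpi_rank":       [25, 48, 52, 61, 70],
        "sagarin_rank":   [30, 45, 58, 55, 72],
        "conf_wins":      [12, 10,  9, 11,  8],
        "road_wins":      [ 7,  6,  4,  5,  3],
        "neutral_wins":   [ 2,  1,  2,  0,  1],
        "wins_vs_ranked": [ 5,  3,  2,  3,  1],  # wins against highly rated opponents
        "conf_rank":      [ 3,  6,  8,  7, 12],  # ranking of the team's conference
        "won_reg_season": [ 1,  0,  0,  1,  0],  # regular-season conference champion
    })
    # Whether the committee actually handed each team an at-large bid (toy labels).
    got_bid = [1, 1, 0, 1, 0]

    model = LogisticRegression()
    model.fit(teams, got_bid)

    # Estimated probability that each team receives an at-large bid.
    print(model.predict_proba(teams)[:, 1])

Fit on a decade of past selections instead of toy rows, a model like this spits out a probability of inclusion for every team on the bubble, which is essentially what the Dance Card publishes each March.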

“Each year, there would be teams that were mispredicted,” says DuMond. “The question to me was, What’s unique about those teams? It appeared from a casual inspection that teams that were getting in that the model suggested shouldn’t be in also had some sort of friends on the selection committee, be it an athletic director or a conference commissioner.”

According to the model, Pac-12 teams appeared to be getting in more consistently than they should have. So were Big 12 teams. The mid-major Missouri Valley Conference, on the other hand, was underrepresented.

(MORE: Five Lessons to Learn From Gonzaga Basketball)

“We said, ‘Wait a second. We’re starting to see patterns in the error term,’” says Lynch, referring to the model’s failed predictions. “We’re starting to see a pattern or systematic process with the teams that are being missed. And it was through that examination that we were able to identify these biases.”

So they changed their model, adding metrics to test for the committee’s biases. After all, they weren’t trying to predict who should be in the tournament; they were trying to predict the teams the selection committee would pick. In 2008, they started including whether a team has representation on the committee through a conference commissioner or athletic director, whether a team is from a mid-major or major conference, and how many other teams from its conference get in (the committee may want to limit the number of teams from a single conference).
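Continuing the toy sketch above, testing for those biases amounts to bolting a few extra columns onto the same model. The indicator variables here are hypothetical illustrations of the idea, not the actual Dance Card inputs.

    # Hypothetical bias factors added to the toy model above (values are made up).
    # 1 if the team's athletic director or conference commissioner sits on the committee:
    teams["committee_rep"] = [0, 1, 0, 0, 1]
    # 1 for a major conference, 0 for a mid-major:
    teams["major_conf"]    = [1, 1, 0, 1, 0]
    # How many other teams from the same conference are projected to get in:
    teams["conf_bids"]     = [5, 6, 2, 4, 1]

    biased_model = LogisticRegression()
    biased_model.fit(teams, got_bid)

    # The coefficients on the new columns estimate how much the committee's
    # decisions shift for otherwise identical resumes.
    print(dict(zip(teams.columns, biased_model.coef_[0])))

If those extra coefficients come out near zero, the committee is behaving as its stated criteria suggest; if they don’t, the model has found a thumb on the scale.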

Looking back on the data from 1999 to 2008, they discovered some shocking stats. When they compared the more objective metrics with what the committee decided each year, being in the Pac-12 (the Pac-10 at the time) was equivalent to being 31 spots higher in the RPI. Being in the Big 12 was worth 17 spots. Being in the Missouri Valley, meanwhile, was like being 32 spots lower.
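Those “spots in the RPI” figures are a way of making a regression coefficient readable: divide the estimated conference effect by the effect of a single spot of RPI rank. The coefficient values below are made up purely to show the arithmetic, chosen so the result matches the Pac-10 figure reported above.

    # Translating a conference effect into "RPI spots" (hypothetical coefficients).
    beta_pac10    = 0.62    # made-up boost for Pac-10 membership
    beta_rpi_rank = -0.02   # made-up penalty per one spot worse in the RPI

    equivalent_spots = beta_pac10 / abs(beta_rpi_rank)
    print(equivalent_spots)  # 31.0 -- i.e., "like being 31 spots higher in the RPI"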

Last year, the bias-adjusted Dance Card predicted 35 of the 37 at-large bids. But a funny thing happened in 2012: the unbiased version was a better predictor of the committee’s actions, correctly picking 36 of 37 for a 97% accuracy rate.

“Maybe last year they were more cognizant of that and perhaps the make-up of the committee was such that they didn’t follow the same sort of biases,” says Coleman.

So this year, the analytics guys are going with the unbiased version. According to Coleman, no team is moving up or down the rankings this year thanks to the bias factors. But in their biased version, Wichita State from the Missouri Valley would be six spots lower, St. Mary’s (West Coast Conference) would be eight spots higher because their commissioner is on the committee, Oregon (Pac-12) would be seven spots higher, Colorado (Pac-12) would be 10 spots higher and Oklahoma (Big 12) would be six spots higher.

Indiana, Duke, Kansas and Louisville make up the top four, and several notable programs are near the bubble, including Kentucky, Virginia, Iowa and Tennessee.

While Coleman, Lynch and DuMond seem to have figured out the formula for determining the tournament field, can any of their metrics be used to fill out what really matters: the tourney bracket itself?

“If we could develop a model that could accurately predict who’s going to win games, I don’t think they’d call it March Madness,” says Lynch. “That’s what makes it fun. As a fan, I’m glad we haven’t developed a successful model for that.”