the Polo Grounds


OT: Critical Problems Facing College Football Analysis

by suave_andrew, Sunday, February 06, 2011, 11:38

I was recently browsing an old 2005 issue of the Journal of Quantitative Analysis in Sports (great bedtime material...) and came across the brief article linked below by Aaron Schatz of Football Outsiders. Schatz follows the approach of David Hilbert, who posed a series of critical problems facing mathematics at the turn of the 20th century. The article turns a critical eye to issues facing football analytics:

I thought it'd be interesting to go through a similar exercise for college football. I think that you need to first separate out the issues facing both the NFL and college game, and keep it college-specific. In my opinion, the NFL provides a far better way of analyzing things that are general to the game of football, and Schatz already provides a good list of the main issues affecting the general analysis of football and data collection.

College Football-Specific Issues - Analytical areas that need to be addressed and/or better understood that are specific to the game of college football for the most part. These are not necessarily in order of importance, but rather the order in which they popped into my head:

1) Recruiting - This is probably the most visible input into college football programs and also probably the most poorly analyzed 'predictor' of a team/coach's future success. More analysis needs to be done on the proper utilization and value of recruiting service data. Relative recruiting - how talented a team is in comparison to its actual opponents - is not even looked at as far as I can tell, yet is probably the only thing that should really matter from a usefulness standpoint. There is basically no recruiting data for lower division teams, and service academy recruiting data is very poor. Finally, bias is an issue with recruiting services, be it regional, for marketing purposes, or for revenue generation (e.g. you must attend our camp to get rated). It's also conceivable that a player's family or representative(s) can buy the player coverage from recruiting services.

2) Player Development - Development is mentioned a lot, but little has been done to try and quantify it. I personally think that comparing recruiting performance with the number of players a program puts into the NFL is a good starting place, but it's far from perfect. Because it appears that recruiting has diminishing marginal returns, it's arguable that player development becomes an increasingly critical factor of top level program differentiation.

3) Strength of Schedule - I don't think there's an adequate way of looking at this yet. This sort of goes hand-in-hand with identifying objective ways of ranking programs relative to one another. There are a million ways to try and compare programs and use their difficulty of schedule as a 'deflator' for this purpose, and while some methods may be more popular than others, I don't think there's truly one best way of doing it. Like with recruiting, I think that strength of schedule should be completely relative. The only thing that should matter is how much of an uphill battle a team faced against its opponents. How much of a talent difference was there? Was the team a visitor to an extremely tough environment? What was the weather like? What was the margin of victory relative to all of this? None of this stuff is really factored in. Sagarin, for example, uses a base home field advantage, but there is absolutely not some 'standard' home field advantage across every program in college football.

4) Objective/Fair Ranking - Linked closely to the SOS issues above. The Massey consensus rankings are probably the best and most objective approach at this point until SOS is addressed better. The AP and Coaches' polls are probably the worst and least objective approaches that we have.

5) Home Field Advantage - As mentioned in the SOS portion, I refuse to believe that a standard home field advantage can be applied to every home team across college football. The atmosphere, noise, and fan involvement at places like Eastern Michigan or Temple are simply different from what you will find at LSU or Penn State. Scheduling becomes involved because top-tier programs play more home games on average than bottom-tier programs. Top-tier teams have more bargaining power and thus have an advantage in scheduling that multiplies the home field advantage they're already getting from good venues. There is hardly any way to really quantify most of this at this point, however.

6) Offensive Style/Tempo - There is virtually no quantitative analysis done on the different types of offenses that you see in the college game. Part of it is because formations aren't listed in play-by-play data, so it's virtually impossible to analyze unless you intend to sit down and chart every game after it's played. Variations in offensive tempo look to be the next major trend coming down the pipeline, thanks to Oregon's run to the national championship game this year. Dividing the number of plays a team runs by that team's time of possession yields some insight into tempo, but doesn't tell the full story - special teams plays, running out the clock, etc. jumble up time of possession.

7) Program Growth/Regression/Cycles - What factors cause some programs to experience growth in yearly performance over time while others experience regression? Are there measurable performance cycles that just about all programs go through? At its core, this is sort of the macroeconomics of college football, and virtually nobody has looked at it. TCU would be a good case study of program growth; Miami of regression.

8) Effect of Player Experience (years in program) - Is there a predictable way to measure how a team will perform relative to the amount of experience of the players in its starting lineup? This is being looked at somewhat, but nobody really knows for sure. I personally think that experience is tied closely to player development and there are limits to how well an experienced but underdeveloped or not very talented player will perform. Further, is it better to evenly distribute the number of players you bring in each year, or overload (over sign?) some years in order to maximize the amount of experience of the starting lineup every few years for championship runs?

9) Effect of Program Revenue - There's been a bit of work on this in the field of sports economics, specifically in analyzing the BCS and whether or not it's affecting the competitive balance of college football. Many public schools disclose how much revenue their football programs bring in on a yearly basis, as well as their expenditures on their coaching staffs. But little is known about how much money schools devote to recruiting, to their training programs, to academic tutors, to year-to-year facility improvements, etc. Football programs at private schools don't need to disclose anything that they don't want to. Knowing this information would make it possible to develop a 'production function' for different programs, but the data is not available or, if it is, nobody has taken the time to build a database of it.

10) Transfers/Player Movement - Every year hundreds of players transfer to different schools, leave for the NFL early, 'retire' from football due to injuries, or simply drop football (maybe even school) altogether. There is no analysis into player movement beyond the anecdotal news or blog writeups that may or may not occur when this happens. There is definitely no real macro analysis looking at trends in any of this stuff, other than maybe oversigning - but even that is not overly analytical. A good start would simply be some kind of database that tracks player transfers/movement for each team on a yearly basis.

11) How to compare/incorporate lower football divisions - Even hardcore college football fans probably have a hard time telling you which division of football schools like James Madison and Northern Colorado are in. Besides the Sagarin rating system, there really isn't any work on evaluating lower division football teams, let alone figuring out how to compare them to upper level teams. This throws a monkey wrench into any kind of analysis on strength of schedule, ranking, and weighting a team's performance when teams play lower division teams. Nobody really knows how to approach it, yet just about every upper division football team is playing at least one game against a lower division team now.
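Two of the simpler measures described in the list, relative recruiting (item 1) and plays per unit of possession time (item 6), can be roughed out as below. This is only a sketch under assumptions: the function names, the idea of a single numeric recruiting rating per team, and the sample numbers are all made up for illustration and don't come from any recruiting service or play-by-play feed.

```python
from statistics import mean


def relative_talent(team_rating, opponent_ratings):
    """Item 1: a team's recruiting rating minus the average rating
    of the opponents it actually played (hypothetical rating scale)."""
    return team_rating - mean(opponent_ratings)


def plays_per_minute(plays, possession_seconds):
    """Item 6: offensive plays divided by time of possession,
    expressed as plays per minute of possession."""
    if possession_seconds <= 0:
        raise ValueError("possession_seconds must be positive")
    return plays / (possession_seconds / 60)


# Made-up numbers, for illustration only.
print(relative_talent(85.0, [80.0, 90.0, 70.0]))  # 85 minus an opponent average of 80
print(plays_per_minute(78, 26 * 60))               # 78 plays in 26 minutes of possession
```

As item 6 notes, the tempo number is still noisy, since special teams plays and clock-killing drives are mixed into time of possession; filtering those out would need play-level data.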

I'm interested to hear other people's thoughts on the subject.



Good questions and issues

by LaFortune Teller ⌂ @, South Bend, Sunday, February 06, 2011, 12:44 @ suave_andrew

I think everything here boils down to data that is available but not adequately collected or measured and/or data that is difficult/impossible to measure. I can't remember the source, but I was reading something recently that talked about the future of data collection/analysis in a variety of fields. Imagine a world in which every player on the football field (and the ball, perhaps) had a chip in it and the precise geo-positioning of all of those elements was instantly recorded for every moment of every game. The technology for that currently exists -- how long might it be before it is a reality? And what could teams do with that data if they had access to it?

The other thing I'll add is that I wonder what "problem" we're trying to solve. Are we trying to arrive at a conclusion of each team's true overall value? Are we trying to estimate the likelihood of victory in a particular game, or the likelihood of success of a particular drive, or the likelihood of success of a particular play, or to distinguish precisely how much more effective a particular running back is at hitting the hole on a particular play late in a game? How precise our questions are changes how difficult the problem really is. Or at least changes how precise we can expect our solutions to be.


I've thought about the tracking for a while

by Pat, Right behind you, Monday, February 07, 2011, 10:33 @ LaFortune Teller

It seems to me it would be most useful for film review. Show the receivers the routes they ran superimposed on a graphic of how the route was supposed to be run. Highlight if a player consistently does something wrong on a particular route.

Likewise, cornerbacks can see how the opposing receivers run (I imagine getting opposition data would be far harder) and see if they cheat on certain routes.

For the rest of the players it might be harder to find a repeatable use. I suppose you could track how much linemen run given the type of offense/defense they are facing. Could you show that a spread east-west type offense makes D-linemen run a statistically significant distance farther than those facing a more traditional smash-mouth team? At what point does the additional distance have the same impact as having a fullback slam into you?
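If position data like that existed, the distance part would be easy to compute. Here's a minimal sketch, assuming each play gives you a list of (x, y) samples for one player in yards; the function names and the sample traces are hypothetical, and real tracking data would need smoothing before you trusted the totals.

```python
import math


def distance_covered(positions):
    """Sum the straight-line distances between consecutive (x, y)
    samples, in whatever units the samples use."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))


def mean_distance_per_play(plays):
    """Average distance covered per play; `plays` is a list of
    position traces, one trace per play."""
    if not plays:
        raise ValueError("need at least one play")
    return sum(distance_covered(p) for p in plays) / len(plays)


# Made-up traces for one lineman on two plays.
trace_a = [(0, 0), (3, 4), (3, 10)]
trace_b = [(0, 0), (0, 1)]
print(distance_covered(trace_a))
print(mean_distance_per_play([trace_a, trace_b]))
```

Comparing per-play averages against spread and smash-mouth opponents would still need a proper significance test on top of this; the sketch only covers the distance arithmetic.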

Tracking would probably be helpful for special teams kick and punt coverage units in some fashion.


I wonder, is there a point at which too much analysis,

by APND02 ⌂ @, Winston-Salem, NC, Sunday, February 06, 2011, 15:01 @ LaFortune Teller

information and knowledge takes away some of the unknowns that make the game what it is? Right now it seems like strategy and execution are a huge part of the game. But if everything is known, or very nearly known, does it merely come down to which team has the better players? I guess not, at least not necessarily, since coaching still involves teaching and imparting knowledge to the players. It seems like player development will always be a big part of the game. But I do think knowing all these things will change the game. And I'm not sure it would be for the better.



Two notes

by suave_andrew, Sunday, February 06, 2011, 13:25 @ LaFortune Teller

On your first point about tracking physical player positions, that is something I've been thinking about. I was actually supposed to meet with an assistant GM for the Minnesota Timberwolves a couple of years ago to propose some ideas I had about competitive advantages through statistical analysis. One of the suggestions I was going to make was a way to track/record the real-time position of their players on the court and then use that data to analyze the team's own statistical tendencies in terms of player movement and positioning. I think the guy got fired or lost interest, because he only returned one phone call after our initial conversation in person. Given how the Wolves performed the last few years, I wouldn't be surprised if he was fired.

As for what the overall purpose of sorting out/refining all of these issues would be, I'd say to gain a better understanding of the game and at least to try and get to the level that baseball is at currently. College football especially is an interesting system to look at because of all the different moving parts involved: recruiting is sort of its own world, player development is its own world (physical fitness, psychology, etc), coaching strategy is its own world, and so on. It's an interesting challenge to try and figure out how they all interact with each other.


there's a company doing that

by Jay ⌂, San Diego, Monday, February 07, 2011, 09:57 @ suave_andrew

damned if I can remember the name. A friend of a friend was working for this company that had contracts with the Philadelphia Eagles and a few other teams to attach RFID chips (or some similar motion-capture technology) to jerseys or helmets and be able to virtually recreate an entire game. This was a few years ago and my memory is fuzzy on the details.


lots of good stuff to chew on

by Jay ⌂, San Diego, Sunday, February 06, 2011, 12:01 @ suave_andrew

Related to #6, there are a number of in-game statistics that could be tracked in college football, but aren't, and could reveal whole pools of valuable analyses: plays per player, packages, formations, types of plays, passes dropped, passes broken up, blitzes by player, defensive formations, penalties by type, and so on.


I think several of those things you mention are tracked, they

by APND02 ⌂ @, Winston-Salem, NC, Sunday, February 06, 2011, 15:03 @ Jay

just aren't available to the general public. ESPN alludes to several of them on a frequent basis.

