Proof of Concept Test: Team Profiles (2005-2009)

by suave_andrew, Monday, July 26, 2010, 23:35 (5797 days ago)
edited by suave_andrew, Monday, July 26, 2010, 23:40

I just wanted to run this by the board again to see if it conceptually makes sense. As I mentioned the last time I posted a few of these, there is some statistical meat behind these rankings - for simplicity I'm holding off on posting everything that goes into this right now. So bear with me there.

These profiles are rankings based on certain important 'team factors' that I believe help provide some explanatory insight. These profiles are not yearly, but rather represent the programs for the entire five year period. Right now, I'm just asking people to review the actual team rankings and let me know if they seem out of whack.

R = Recruiting Ranking
D = Development Ranking
S = Ease of Schedule Ranking (lowest being the easiest)
C/L = Clutch or luck Ranking (to take into account Pythagorean expectations vs reality)
H = Home Field Advantage Ranking
G = Game Day Ranking (Taking into account execution, game management, and other game day factors)
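
For anyone unfamiliar with the Pythagorean piece of C/L, here is a rough sketch of the usual calculation. The exponent of 2.37 is a value commonly quoted for football and is an assumption here, as are the function names and the sample point totals; the actual formula behind these rankings has not been posted.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected win % from points scored and allowed (exponent is an assumed value)."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def clutch_or_luck(actual_win_pct, points_for, points_against):
    """Positive = won more than the scoring margin suggests; negative = won less."""
    return actual_win_pct - pythagorean_win_pct(points_for, points_against)


# Hypothetical team: 9-4 record, 390 points scored, 300 allowed.
gap = clutch_or_luck(9 / 13, 390, 300)
print(f"clutch/luck gap: {gap:+.3f}")
```

Ranking teams on this gap would be one way to arrive at a C/L column, though that step is a guess.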


Championship Team Profiles (Teams that played in a BCS championship game from 2005-2009)
[image]

First, the easiest trend to spot is that recruiting seems to be the most important factor for championship teams. It appears you need to be in the top 11 in terms of five-year talent to really have a shot at getting to the championship game. That's not a groundbreaking revelation or anything, but it does set a bar for where ND needs to get.

Second, just to explain Florida's poor development ranking: I have a pretty good formula to project how many NFL players a school should produce based on its recruited talent, and while Florida did produce 24 NFL players from 2005-2009, my formula has them at an expected 32 players.
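
To make the Florida example concrete, the arithmetic can be sketched as below. The function name is hypothetical, and how the gap is actually turned into a development ranking is not stated in the post; this only shows the actual-versus-expected comparison.

```python
def development_gap(actual_nfl_players, expected_nfl_players):
    """Return (difference, ratio) of actual vs. expected NFL players produced."""
    difference = actual_nfl_players - expected_nfl_players
    ratio = actual_nfl_players / expected_nfl_players
    return difference, ratio


# Florida, 2005-2009: 24 actual vs. 32 expected (figures from the post).
diff, ratio = development_gap(24, 32)
print(diff, ratio)  # -8 0.75
```

So Florida comes in 8 players short, or at 75% of its projection.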

Third, ease of schedule does not seem to be a huge issue: many of the teams that have appeared in the title game appear to have played some of the more difficult schedules in the country.

Fourth, the clutch or luck statistic does not appear to be a big determinant, and it may be slightly misleading in this case because these teams likely blew away the competition for the most part throughout the years and so didn't play in a ton of close games.

Fifth, home field advantage does seem to be very important for a team: the more home games a team can play, combined with a good home field win percentage, the better.

And finally, the game day ranking does seem to tell a story: for teams that don't develop talent as well as others or don't have the best home field advantage, good game day execution, planning, etc. are needed. Conversely, if a team develops talent well and can fully utilize its huge home field advantage, playcalling is not as big of a deal. Certainly, having watched Ohio State and LSU struggle sometimes with playcalling over the last few years, it seems to make sense that they have much lower game day rankings.


Looking at ND and Cincy
[image]

It looks like ND definitely had the talent from 2005-2009 to compete for a national championship, given the bar that we previously set. However, while Florida was able to overcome its development issues with a huge home field advantage and excellent gameday execution, Notre Dame struggled with these areas relative to a championship caliber team.

Using this data, one can rank Kelly's priorities:
1) Maintain ND's minimum top 11 talent
2) Either find a way to turn Notre Dame Stadium into a real home field advantage or improve development and execution. From Cincy's profile (which does leak a little into the Dantonio regime), it appears that we should expect a development and execution improvement at the very least.


Again, I'll go into how the rankings truly came about in an article, but for now do they at least appear to echo reality for the most part?

I think the concepts generally work for me.

by LaFortune Teller ⌂ @, South Bend, Tuesday, July 27, 2010, 09:44 (5797 days ago) @ suave_andrew

And this is a really cool project.

I guess I'll wait for the article to understand the methodology more completely. For instance, the recruiting and schedule ratings are easily grasped and generally well-accepted measures.

Home field advantage ought to be relatively easily understood as well -- depending on the methodology. Personally, I haven't been convinced that the quantifiable differences between best and worst home field advantages are all that dramatic.

The development, game day, and clutch/luck measures make sense, but are also going to be heavily scrutinized depending on the methodology.

Here are a few big questions I have:

1. Are you trying to make a "complete" picture of a team? Or a coach? It seems like the approach you are taking with at least a few of these items is drawing a connection between performance and expectations -- that there are teams/coaches that get more out of their raw materials in development and game day performance than they ought to. If this is where you're taking it, are you trying to actually identify areas teams/coaches could concentrate on for improvement, or are you trying to identify the status of a particular team/coach as a way to see through other basic data that might be clouding our judgment? And are there clear distinctions between some of these things, or are there overlaps (between, say, clutch/luck and game day)? And if it's supposed to be complete, is there anything missing?

2. Are the scales important? Are you doing the analysis a disservice by assigning a 1-120 rank for both rather than using the ratings themselves? The difference between #1 and #15 in recruiting, for instance, might be equal to the difference between #1 and #120 in home field advantage. Or it might not. But without that nuance, it makes understanding the complete picture more challenging.

3. I like the use of 5-year comprehensive data for a picture of a program. But I think it is a fairer picture of a program taken at the end of the 5-year period. In other words, when you are making comparisons and drawing conclusions, I think you should be careful about grouping teams that might have had success at opposite ends of that period. Alabama in 2009 was certainly a function of its 2005-09 recruiting, development, game day, etc. But in the same table, is it right to talk about USC in 2005 as a function of its 2005-09 recruiting, development, game day, etc? (Maybe this isn't what you're doing, and maybe that would make this far more challenging to do, but I think 2005 teams should be evaluated on 2001-05 data, 2006 teams on 2002-06 data, etc).


by suave_andrew, Tuesday, July 27, 2010, 18:46 (5797 days ago) @ LaFortune Teller

My projects tend to be relatively theoretical, so I realize that there are always going to be some issues with methods.

And this is a really cool project.

I guess I'll wait for the article to understand the methodology more completely. For instance, the recruiting and schedule ratings are easily grasped and generally well-accepted measures.

The only thing I've changed with recruiting is that I'm going by average Rivals stars rather than the Rivals ranking - the average stars tend to be more correlated with win % than the ranking is (likely due to some bias in the ranking).

Home field advantage ought to be relatively easily understood as well -- depending on the methodology. Personally, I haven't been convinced that the quantifiable differences between best and worst home field advantages are all that dramatic.

Home field advantage is basically the difference between the actual home field win % and the team's expected total win %. The expected comes from the team's performance at home (home win%) plus the number of total wins that came from home games. This produces a pretty accurate picture in my opinion, because it takes into account the number of games played at home as well as a team's reliance on their home field versus their performance.
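
The exact formula isn't spelled out above, so the following is only one possible reading of the ingredients mentioned: home win %, overall win %, and the share of total wins that came at home. The function name, the dictionary keys, and the sample record are all hypothetical.

```python
def home_field_summary(home_wins, home_games, total_wins, total_games):
    """Collect the pieces mentioned in the post; how they combine is an assumption."""
    home_win_pct = home_wins / home_games
    total_win_pct = total_wins / total_games
    return {
        "home_win_pct": home_win_pct,
        "total_win_pct": total_win_pct,
        # Simple gap: how much better the team is at home than overall.
        "home_edge": home_win_pct - total_win_pct,
        # Reliance on home games for wins.
        "share_of_wins_at_home": home_wins / total_wins if total_wins else 0.0,
    }


# Hypothetical five-year totals: 28-7 at home, 45-19 overall.
summary = home_field_summary(28, 35, 45, 64)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A team with a large home edge and a large share of its wins at home would presumably land near the top of the H column, but that weighting is a guess.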

The development, game day, and clutch/luck measures make sense, but are also going to be heavily scrutinized depending on the methodology.

The development one is pretty solid. I would argue that using the NFL draft as a barometer of a team's player development makes sense when you figure that a team which produces a higher than expected number of prospects is a program where you'd expect to find players that are more fundamentally sound all around.

Here are a few big questions I have:

1. Are you trying to make a "complete" picture of a team? Or a coach? It seems like the approach you are taking with at least a few of these items is drawing a connection between performance and expectations -- that there are teams/coaches that get more out of their raw materials in development and game day performance than they ought to.

My intent with the profiles is to take a five-year snapshot of a program and say, "from 2005-2009, the program's total winning percentage for the period (or simply performance) can be explained by these rankings." So if a team recruited well, developed its talent well, and had a relatively easy schedule, but only won something like half its games, then you would expect to see poor rankings in the other areas, especially game day performance.

If this is where you're taking it, are you trying to actually identify areas teams/coaches could concentrate on for improvement, or are you trying to identify the status of a particular team/coach as a way to see through other basic data that might be clouding our judgment? And are there clear distinctions between some of these things, or are there overlaps (between, say, clutch/luck and game day)? And if it's supposed to be complete, is there anything missing?

The way the game day one works is that it's basically a catchall statistic given all the other factors. So it's not actually derived from any on-field statistics, but rather says "given all of the other factors, the difference between actual and expected win % is best explained by what occurred on the field." Which means Clutch or Luck is factored out, as are talent, schedule, and everything else. It's definitely the most theoretical part of this.

It helps explain why a team like Southern Miss (ranked 11th in game day performance), which had only average recruiting, terrible player development, no luck, and really no home field advantage, was able to post a 57% winning percentage for the period (better than Notre Dame's). That and a relatively easy schedule.
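
The catchall idea can be sketched as a residual: whatever win % is left over after the other factors have made their prediction. In this sketch the expected win % is simply supplied as an input standing in for the model's output, and the team names and numbers are hypothetical rather than taken from the actual rankings.

```python
def game_day_ranking(teams):
    """teams: {name: (actual_win_pct, expected_win_pct)}.

    Returns team names ordered from largest positive residual to most negative.
    """
    residuals = {
        name: actual - expected for name, (actual, expected) in teams.items()
    }
    return sorted(residuals, key=residuals.get, reverse=True)


# Hypothetical teams; expected_win_pct stands in for a prediction from the
# other factors (recruiting, development, schedule, luck, home field).
teams = {
    "Team A": (0.57, 0.48),
    "Team B": (0.70, 0.74),
    "Team C": (0.50, 0.50),
}
print(game_day_ranking(teams))  # ['Team A', 'Team C', 'Team B']
```

Team A wins less often than Team B but outperforms its expectation by the most, which is the kind of case Southern Miss represents above.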

2. Are the scales important? Are you doing the analysis a disservice by assigning a 1-120 rank for both rather than using the ratings themselves? The difference between #1 and #15 in recruiting, for instance, might be equal to the difference between #1 and #120 in home field advantage. Or it might not. But without that nuance, it makes understanding the complete picture more challenging.

Using rankings rather than relative scales provides an easier-to-understand way to communicate the profiles to people who are not familiar with the process and don't really want to take the time to figure it out. Also, the rankings did end up making the model more explanatory, and the ranking variables were more significant - I'm not sure what that's a result of; maybe just that the relative scale isn't all that important?
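
The trade-off raised in the question can be illustrated with a small sketch. The ratings below are made up purely for illustration and the helper name is hypothetical; the point is only that two factors with very different spacing collapse to identical ranks.

```python
def to_ranks(ratings):
    """Map {team: rating} to {team: rank}, with 1 = highest rating."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {team: position for position, team in enumerate(ordered, start=1)}


# Hypothetical ratings: a wide gap at the top of recruiting,
# nearly even spacing in home field advantage.
recruiting = {"Team A": 3.9, "Team B": 3.1, "Team C": 3.0}
home_field = {"Team A": 0.12, "Team B": 0.11, "Team C": 0.10}

print(to_ranks(recruiting))
print(to_ranks(home_field))
```

Both calls give the same 1-2-3 ordering, so the ranks are easier to read but no longer show how far apart the teams really are.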

3. I like the use of 5-year comprehensive data for a picture of a program. But I think it is a fairer picture of a program taken at the end of the 5-year period. In other words, when you are making comparisons and drawing conclusions, I think you should be careful about grouping teams that might have had success at opposite ends of that period. Alabama in 2009 was certainly a function of its 2005-09 recruiting, development, game day, etc. But in the same table, is it right to talk about USC in 2005 as a function of its 2005-09 recruiting, development, game day, etc? (Maybe this isn't what you're doing, and maybe that would make this far more challenging to do, but I think 2005 teams should be evaluated on 2001-05 data, 2006 teams on 2002-06 data, etc).

An eventual goal is to be able to turn my profile system into a yearly team profile and maybe even try to use it for predictive purposes. But right now I think the easiest and least time-consuming approach is to just use five years' worth of data, because it dampens a lot of the randomness you get from year-to-year data. It obviously does not give a full description of teams that have undergone coaching changes (e.g. Alabama), but the profile still does describe that five-year period in total.
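
If the profiles do move to a yearly basis, the trailing windows suggested in the question (2005 teams on 2001-05 data, 2006 teams on 2002-06 data, and so on) could be generated along these lines. The function name and the default window length are assumptions for illustration.

```python
def trailing_windows(first_season, last_season, length=5):
    """Yield (season, (window_start, window_end)) with each window ending at that season."""
    for season in range(first_season, last_season + 1):
        yield season, (season - length + 1, season)


for season, (start, end) in trailing_windows(2005, 2009):
    print(f"{season} team -> {start}-{end} data")
```

Each season's profile would then be built only from data available through that season, which is also what a predictive version would need.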

Thanks.

by LaFortune Teller ⌂ @, South Bend, Wednesday, July 28, 2010, 00:45 (5796 days ago) @ suave_andrew

I'm still a little foggy on parts of it, but I'm looking forward to reading the summary when it's all put together.

Really interesting stuff.

by APND02 ⌂ @, Winston-Salem, NC, Tuesday, July 27, 2010, 05:44 (5797 days ago) @ suave_andrew

- No text -
