Friday, September 22, 2006


The science of hope

It's the question that haunts us all this time of year, but never more so than when your team is 4-0, looks overwhelming with every possible arrangement of players, and has just spanked its chief rival using a bunch of kids against veterans.

Do the standings in the preseason really mean anything?

If you're like me your operating assumption has always been that they don't—and that, in fact, a good preseason record might even be a bad omen. I strongly suspect this is true, for instance, in football, where smart coaches are hiding strategies and keeping star players out of danger for the exhibition games. I've never noticed any strong connection in baseball between the teams that clean up in the preseason and the ones that do well when the games count. (Kansas City always seems to devastate the rest of the Cactus League every spring.)

But hockey might be another story. After the Oilers won the Battle of Alberta preview last night, I decided to take a look at the '05-'06 preseason standings and see if they told us anything we should have been paying attention to. The top six teams league-wide in that preseason, with their records, were:

San Jose (7-0-0)
Ottawa (7-1-0)
Philadelphia (6-1-0)
Colorado (4-0-2)
Buffalo (6-2-0)
Carolina (5-2-1)

Hard to know what to make of that, but the list does contain five 99-point teams, including the eventual champion. The bottom six were:

NY Rangers (2-3-1)
Washington (2-4-2)
Tampa Bay (2-4-1)
Pittsburgh (2-5-2)
Columbus (3-6-0)
Phoenix (2-6-0)

Four awful teams, one that limped into the playoffs, and one that pulled Henrik Lundqvist out of its hindquarters. This was convincing enough to make me choose a broader statistical test for comparing preseason performance to regular-season performance.

What I found was pretty startling. Reaching into my meager quant toolbox, I pulled out Spearman's rank correlation test, which does what it says on the box--it tells you whether there is a strong correlation between paired sets of ordinal, or ranked, data. I don't know that this is the most appropriate instrument for the job, but it has the advantage of being a nonparametric test, which in English means that it doesn't depend on too many assumptions about the underlying data and is good for making thumbnail judgments of this sort. Moreover, a sample size of 30 is generally considered large enough for the Spearman test to be trustworthy.

So I made a big table of preseason rankings and regular-season rankings for all the teams: Philadelphia, for instance, finished 3rd in the league in the preseason and 8.5th in the league during the regular season. 8.5th? Yeah, they tied with the Devils for the 8th and 9th spots, so to fit them into a table of Spearman pairs you split the difference and say both teams finished 8.5th. Just go with me on this.
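For anyone who wants to see the 8.5th business spelled out, here is a minimal sketch of tie-averaged ranking. The function name and the point totals are made up for illustration; they are not the real standings.

```python
def average_ranks(points):
    """Rank teams from best (1) to worst; tied teams get the mean of the spots they share."""
    # Team indices ordered by points, highest first (sorted() is stable for ties).
    order = sorted(range(len(points)), key=lambda i: -points[i])
    ranks = [0.0] * len(points)
    pos = 0
    while pos < len(order):
        # Extend `end` across the run of teams tied with the team at `pos`.
        end = pos
        while end + 1 < len(order) and points[order[end + 1]] == points[order[pos]]:
            end += 1
        # The run occupies 1-based spots pos+1 .. end+1; every team in it gets the mean.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical point totals: the two teams on 101 share spots 3 and 4.
toy_points = [110, 101, 103, 101, 95]
print(average_ranks(toy_points))  # [1.0, 3.5, 2.0, 3.5, 5.0]
```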

The test ends up giving you a number, representing the correlation between the sets of pairs, symbolized by the Greek letter rho (ρ). The number can be anywhere from -1 to 1. If it's -0.5 or less, you've found a strong negative relationship between the two sets of rankings; bad in one column would be good in the other, and vice versa. If it's 0.5 or greater, you've found a strong positive relationship: one ranking can be used to predict the other.

The number for the '05-'06 preseason and regular season is, if my math is right, 0.56. Is that high? It's high enough that on its own this data could, according to the ordinary standards of social science, persuade you to reject the null hypothesis of no relationship between the two sets of standings (technical note: the critical value of ρ yielding 95% confidence for a two-tailed test with n=30 is much lower, about 0.36). In other words it's a pretty persuasive demonstration that the preseason is not meaningless, and that success in exhibition games does predict success in the regular season.
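One rough way to sanity-check the significance claim is the standard large-sample shortcut, which converts ρ into a t statistic with n-2 degrees of freedom. This is only a sketch of that approximation, not necessarily how the figure above was checked; the function name is made up, and the cutoff of 2.048 is the usual textbook value for a two-tailed 5% test with 28 degrees of freedom.

```python
import math


def spearman_t(rho, n):
    """Approximate t statistic (n - 2 degrees of freedom) for a rank correlation."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


# Values taken from the post: rho = 0.56 across n = 30 teams.
t_stat = spearman_t(0.56, 30)

# Textbook two-tailed 5% cutoff for 28 degrees of freedom.
T_CRITICAL_DF28 = 2.048

print(round(t_stat, 2))
print(t_stat > T_CRITICAL_DF28)  # True means "reject no relationship" at the 5% level
```

The approximation gets shakier for small samples or lots of ties, so treat it as a thumbnail check rather than an exact test.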

So, Oiler fans: are you excited yet? If the lads actually go on to finish 7-1-0 or 6-0-2 or something like that through the end of September, you probably should be. (Better-trained math people are encouraged to check my work, extend it to other NHL preseasons, and engage in vicious personal attacks on my grooming and ancestry.)


"(Better-trained math people are encouraged to check my work, extend it to other NHL preseasons, and engage in vicious personal attacks on my grooming and ancestry.)"

Heh. This immediately reminded me of the comments to this blog post:


More like the science of turning pig iron into gold.

Is this what happens when bench clearing brawls are outlawed? I think so.


Jesus. And all this time I figured there was nothing to write about. I better come back with a vengeance. Great post. I'm now absolutely convinced we are going to win the Cup.

Might want to remember:

Regular season standings mean very little when it comes to the playoffs (ask the best teams from last year).

Oilers should be happy to make the playoffs and repeat as underdogs rather than worry about the regular season standings.

Interesting use of numbers tho, good job.

What about point percentage? I imagine the trend is fairly close, but I wonder exactly how much so. That'd be what I'd test, anyway.

I think you lost most Oiler fans at 'The....'
I kid I kid.

No shit you kid. Go over to a Flames board and say anything at all that has reasoning based in numbers. They recoil like vampires from sunlight.

"In other words it's a pretty persuasive demonstration that the preseason is not meaningless, and that success in exhibition games does predict success in the regular season."

Nice post, CC. I've been saying this for years, but had no definitive proof. Preseason standings were the main reason I thought Carolina and Buffalo would do well prior to last season, even though everyone thought I went off the deep end.

You lost me -- not with the math, but with the opening paragraph. (Otherwise, good stuff. And it looks like that kid Horcoff might be a player.)

I'm not too sure how significantly this problem benefits from Spearman's rank (try Spearman vs. point % regression). I'm curious what the p-value is (not rho). Also, I'm assuming you scaled out games played in your ranking...

Scatterplot #1
Scatterplot #2 [That is estimation of winning % based on GF^2/(GF^2+GA^2)].
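For reference, the winning-percentage estimate named in the bracket is a one-liner. A minimal sketch, with a made-up function name and made-up goal totals:

```python
def pythagorean_win_pct(goals_for, goals_against):
    """Estimate winning % as GF^2 / (GF^2 + GA^2)."""
    gf_sq = goals_for ** 2
    ga_sq = goals_against ** 2
    return gf_sq / (gf_sq + ga_sq)


# Hypothetical preseason totals: 30 goals for, 20 against.
print(round(pythagorean_win_pct(30, 20), 3))  # 0.692
```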

In a data set this small looking at how the "regressions" are formed is critical. One team can change things a lot (which is probably why the Spearman test works better).

What you can see in the first graph is 6 teams that the regression is chasing on the right. And a bunch of teams that are "missed" in the bottom left...

I think you can learn a lot more about your team in the last 4 games in comparison to the first 4, but I think you have to focus on individuals in the pre-season to predict the regular season. Winning an AHL style preseason game isn't worth that much. For example WHO are the goals against coming against...

That all being said, you can also see in the graphs that there are few teams that did really well in the pre-season but poorly in the regular season, although St. Louis is a candidate...

I wonder if this correlation holds up as well if you compare the results of the first month of the regular season with the whole season? I remember the Kings looked like they were going to win the Cup before Remembrance day last year. The Canucks were also pretty strong? I'm guessing your correlation in the preseason is stronger than the same correlation for the first four weeks of the regular season.

Javageek: it's cool that you showed up with your golf clubs. I was hoping someone would. Yes, the preseason numbers are normalized for games played.

I think the crux of your first scatterplot is right: it's not worth getting juiced unless your team is at the extremes. With your second scatterplot the problem is that you're trying to knock the hat off a nonparametric test with a parametric one. That's like bringing a knife to a gunfight. If large differences in the preseason led to marginal ones in the regular season, that would actually make the preseason data all the more valuable, but it would generate an uninteresting-looking regular-season plot.

Like you I'm curious what the exact p-value is; it would require a Monte Carlo test, so all I can say with my training is that it's much lower than 0.05.
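A Monte Carlo test of that sort can be sketched in a few lines: shuffle one ranking many times and count how often a random pairing produces a correlation at least as extreme as the observed one. This is a minimal sketch, assuming no tied ranks (so the shortcut formula applies); the function names and the eight-team rankings are invented for illustration and are not the 2005-06 data.

```python
import random


def spearman_rho_no_ties(x_ranks, y_ranks):
    """Shortcut formula 1 - 6*sum(d^2) / (n*(n^2 - 1)); valid only without ties."""
    n = len(x_ranks)
    d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def permutation_p_value(x_ranks, y_ranks, trials=10000, seed=0):
    """Two-tailed Monte Carlo p-value for the observed rank correlation."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    observed = abs(spearman_rho_no_ties(x_ranks, y_ranks))
    shuffled = list(y_ranks)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(spearman_rho_no_ties(x_ranks, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (trials + 1)


# Hypothetical rankings for an eight-team league.
preseason = [1, 2, 3, 4, 5, 6, 7, 8]
regular = [2, 1, 4, 3, 6, 5, 8, 7]
print(spearman_rho_no_ties(preseason, regular))
print(permutation_p_value(preseason, regular))
```

With real standings that contain ties, the shortcut formula would need to be swapped for a Pearson correlation on the averaged ranks.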

This is an interesting theory but there are myriad theories in the universe supported by data seemingly created to prop up the argument.

Call me a pigheaded traditionalist, but there are far too many variables in rosters, especially in the first week of NHL exhibition games, to get an idea of which team will be better. The record often shows the depth of an organization, and most fans knew the Oilers had plenty of depth (especially at forward) before camps started.

What the games do say is that teams like the Oilers, which have quite a few jobs open for newcomers, benefit in the preseason from the competition for those jobs. Most of the guys mentioned as possible Oilers this season are having at least reasonable (and some like Thoresen, excellent) training camps, from what I read. Teams that have only one job available for newcomers probably have easier camps because the vets are concentrating on getting into game shape instead of keeping their jobs.

I'd also beware of those MSM writeups in camp — reporters have to pump up the guys they write about to justify to their editors that the story is worthwhile to put in the paper. Mikhnov's camp would most likely be a lot better if he knew English well enough for an interview.

To say the Oilers are going to excel in the regular season because their exhibition record resembles 05-06 Carolina and Buffalo, well, the Oilers haven't finished the exhibition season yet. Good start but they could lose three games and make all this regression analysis moot until next exhibition season.

Cool post and good comments. Clearly there is something to it for 05/06.

A really simple way to do this, Colby, is just to cut and paste the preseason and regular season data into Excel, sort by team name, use the =RANK() function to establish the ranks, and then use the =CORREL() (Pearson) function on the two rank lists, which gives you the Spearman correlation.

You'd have to do about five minutes of manual labour to change the rankings to your system here, or write a tiny script to do it. Either way it's not much work.
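The same recipe can be sketched outside a spreadsheet. This is an illustration under stated assumptions, not anyone's actual script: the function names, the four-team league, and the numbers are all hypothetical. Excel's RANK hands tied values the same top spot, so the "manual labour" step is modelled here as shifting ties to the middle of the spots they share.

```python
import math


def excel_style_rank(values):
    """Like =RANK(): 1 + count of strictly larger values, so ties share the best spot."""
    return [1 + sum(other > v for other in values) for v in values]


def averaged_rank(values):
    """Move tied entries to the middle of their shared spots (2, 2 becomes 2.5, 2.5)."""
    base = excel_style_rank(values)
    return [r + (values.count(v) - 1) / 2 for r, v in zip(base, values)]


def pearson(xs, ys):
    """Plain Pearson correlation, the calculation behind =CORREL()."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_from_points(preseason, regular):
    """Rank both columns, then correlate the ranks."""
    teams = sorted(preseason)  # the "sort by team name" step
    pre = averaged_rank([preseason[t] for t in teams])
    reg = averaged_rank([regular[t] for t in teams])
    return pearson(pre, reg)


# Hypothetical league: preseason points per game vs. regular-season points.
toy_preseason = {"A": 1.75, "B": 1.5, "C": 1.0, "D": 0.5}
toy_regular = {"A": 100, "B": 110, "C": 80, "D": 70}
print(round(spearman_from_points(toy_preseason, toy_regular), 2))  # 0.8
```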

The reason I mention this is because I doubt that the relationship between the preseason and regular season is like this for other years, and this would make it easy to check without eating up time.

I checked for 03/04, the only other season with the preseason results on the Yahoo link you provided. And there wasn't much in it at all. Looks like the strongest correlation by a shade is "points-per-game in the preseason" to "regular season points".

Bear in mind that it has been close to two decades since I took a stats course; hopefully Java will correct me if I'm wrong.

In short, I think I agree with 'bigleaguer' on the whole, though as you've shown, for 05/06 it was a different ball of wax for some reason(s).

I suspect that if you ran the correlations for the past ten seasons before that at least nine would show a slight positive correlation, in the "just noise" range. Just my feeling, I'd be happy to be proven wrong. If someone knows where the preseason results are laid out in a nice, clean format ... if they provide the link I'll run the numbers.

Isn't there a BoA game tonight? Man, this site is slipping...

"Isn't there a BoA game tonight? Man, this site is slipping..."

It's past noon on Saturday, Andy. Shouldn't you be out stalking an Oiler prospect by now? :P

I should also add: try doing the same analysis with the previous year performances and see how it compares. I did it with a normal regression and it is a much better variable than the pre-season.

Yeah, but I don't think that's a surprise to anybody.

Blackhawks fans are excited too! We'll see how that pans out.

Results after 41 games do not predict final standings with enough reliability that I can trade off it and beat the transactions costs (8% on Tradesports, last I checked in a brutally illiquid market). I think you'd eventually come out ahead, but you could destroy the RRSP in the meantime.

I found that performance after 41 games predicted final standings with a standard deviation of 5.3. In other words, if I projected a 6th place finish after 41 games, based on record alone, the team would finish between 1st and 12th in the league with 68% probability. It's pretty useless. Predictions based on goal differential are worse.

If 41 regular season games mean nothing, then 5 pre-season games mean even less.

Don't get too excited. The Flames are going to be sipping Big Rock out of the cup this season.



