Friday, September 22, 2006

 

The science of hope

It's the question that haunts us all this time of year, but never more so than when your team is 4-0, looks overwhelming with every possible arrangement of players, and has just spanked its chief rival using a bunch of kids against veterans.

Do the standings in the preseason really mean anything?

If you're like me your operating assumption has always been that they don't—and that, in fact, a good preseason record might even be a bad omen. I strongly suspect this is true, for instance, in football, where smart coaches are hiding strategies and keeping star players out of danger for the exhibition games. I've never noticed any strong connection in baseball between the teams that clean up in the preseason and the ones that do well when the games count. (Kansas City always seems to devastate the rest of the Cactus League every spring.)

But hockey might be another story. After the Oilers won the Battle of Alberta preview last night, I decided to take a look at the '05-'06 preseason standings and see if they told us anything we should have been paying attention to. The top six teams league-wide in that preseason, with their records, were:

San Jose (7-0-0)
Ottawa (7-1-0)
Philadelphia (6-1-0)
Colorado (4-0-2)
Buffalo (6-2-0)
Carolina (5-2-1)

Hard to know what to make of that, but the list does contain five teams that finished with 99 points or more, including the eventual champion. The bottom six were:

NY Rangers (2-3-1)
Washington (2-4-2)
Tampa Bay (2-4-1)
Pittsburgh (2-5-2)
Columbus (3-6-0)
Phoenix (2-6-0)

Four awful teams, one that limped into the playoffs, and one that pulled Henrik Lundqvist out of its hindquarters. This was convincing enough to make me choose a broader statistical test for comparing preseason performance to regular-season performance.

What I found was pretty startling. Reaching into my meager quant toolbox, I pulled out Spearman's rank correlation test, which does what it says on the box: it tells you whether there is a strong correlation between paired sets of ordinal, or ranked, data. I don't know that this is the most appropriate instrument for the job, but it has the advantage of being a nonparametric test, which in English means that it doesn't depend on too many assumptions about the underlying data and is good for making thumbnail judgments of this sort. Moreover, a sample size of 30 is commonly cited as large enough for the Spearman test to give a trustworthy answer.

So I made a big table of preseason rankings and regular-season rankings for all the teams: Philadelphia, for instance, finished 3rd in the league in the preseason and 8.5th in the league during the regular season. 8.5th? Yeah, they tied with the Devils for the 8th and 9th spots, so to fit them into a table of Spearman pairs you split the difference and say both teams finished 8.5th. Just go with me on this.
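For anyone who wants to see the 8.5th business spelled out, here is a minimal sketch of tie-averaged ranking. The function name and the four-team point totals are made up for illustration; they are not the real standings.

```python
def average_ranks(points):
    """Rank teams from best (1) to worst; tied teams share the mean of the spots they occupy."""
    # Team names sorted by points, highest first.
    ordered = sorted(points, key=points.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover the whole run of teams tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and points[ordered[j + 1]] == points[ordered[i]]:
            j += 1
        # The run occupies positions i+1 through j+1; each team gets their average.
        shared = ((i + 1) + (j + 1)) / 2
        for team in ordered[i:j + 1]:
            ranks[team] = shared
        i = j + 1
    return ranks


# Hypothetical point totals, for illustration only.
points = {"A": 110, "B": 101, "C": 101, "D": 95}
ranks = average_ranks(points)
print(ranks)  # B and C split 2nd and 3rd, so both get 2.5
```

The same logic is what turns a tie for the 8th and 9th spots into two 8.5s.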

The test ends up giving you a number, representing the correlation between the sets of pairs, symbolized by the Greek letter rho (ρ). The number can be anywhere from -1 to 1. If it's -0.5 or less, you've found a strong negative relationship between the two sets of rankings; bad in one column would be good in the other, and vice versa. If it's 0.5 or greater, you've found a strong positive relationship: one ranking can be used to predict the other.
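A rough sketch of how that number can be computed: Spearman's ρ is just the ordinary (Pearson) correlation applied to the two columns of ranks. The helper name and the five-team rankings below are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical rankings for five teams (not real data).
preseason_rank = [1, 2, 3, 4, 5]
regular_rank = [2, 1, 4, 3, 5]

# Feeding ranks (rather than raw points) into Pearson gives Spearman's rho.
rho = pearson(preseason_rank, regular_rank)
print(round(rho, 2))  # 0.8: the two orderings mostly agree
```

Reversing one of the lists would give -1, and two unrelated orderings would land somewhere near 0.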

The number for the '05-'06 preseason and regular season is, if my math is right, 0.56. Is that high? It's high enough that on its own this data could, according to the ordinary standards of social science, persuade you to reject the null hypothesis of no relationship between the two sets of standings (technical note: the critical value of ρ yielding 95% confidence for a two-tailed test with n=30 is much lower, about 0.36). In other words it's a pretty persuasive demonstration that the preseason is not meaningless, and that success in exhibition games does predict success in the regular season.
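One way to sanity-check that conclusion without a table of ρ values is the standard large-sample approximation, which converts ρ into a t statistic with n-2 degrees of freedom. This is a sketch, not necessarily the route taken above; the constant 2.048 is the tabulated two-tailed 5% critical value of Student's t with 28 degrees of freedom, and the function name is made up.

```python
import math


def rho_to_t(rho, n):
    """Approximate t statistic (n - 2 degrees of freedom) for a rank correlation rho."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


# Tabulated two-tailed 5% critical value of Student's t with 28 degrees of freedom.
T_CRIT_28DF = 2.048

t_stat = rho_to_t(0.56, 30)
significant = abs(t_stat) > T_CRIT_28DF
print(round(t_stat, 2), significant)  # roughly 3.58, comfortably past the cutoff
```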

So, Oiler fans: are you excited yet? If the lads actually go on to finish 7-1-0 or 6-0-2 or something like that through the end of September, you probably should be. (Better-trained math people are encouraged to check my work, extend it to other NHL preseasons, and engage in vicious personal attacks on my grooming and ancestry.)

Comments:

"(Better-trained math people are encouraged to check my work, extend it to other NHL preseasons, and engage in vicious personal attacks on my grooming and ancestry.)"

Heh. This immediately reminded me of the comments to this blog post:

http://thestar.blogs.com/azerb/2006/08/the_right_stuff.html
 


More like the science of turning pig iron into gold.

Is this what happens when bench clearing brawls are outlawed? I think so.

hunter1909
 


Jesus. And all this time I figured there was nothing to write about. I better come back with a vengeance. Great post. I'm now absolutely convinced we are going to win the Cup.
 


Might want to remember:

Regular season standings mean very little when it comes to the playoffs (ask the best teams from last year).

The Oilers should be happy to make the playoffs and repeat as underdogs rather than worry about the regular season standings.

Interesting use of numbers tho, good job.
 


What about point percentage? I imagine the trend is fairly close, but I wonder exactly how much so. That'd be what I'd test, anyway.
 


I think you lost most Oiler fans at 'The....'
I kid I kid.
 


No shit you kid. Go over to a Flames board and say anything at all that has reasoning based in numbers. They recoil like vampires from sunlight.
 


In other words it's a pretty persuasive demonstration that the preseason is not meaningless, and that success in exhibition games does predict success in the regular season.

Nice post, CC. I've been saying this for years, but had no definitive proof. Preseason standings were the main reason I thought Carolina and Buffalo would do well prior to last season, even though everyone thought I went off the deep end.
 


You lost me -- not with the math, but with the opening paragraph. (Otherwise, good stuff. And it looks like that kid Horcoff might be a player.)
 


I'm not too sure how significantly this problem benefits from Spearman's rank (try Spearman vs. point % regression). I'm curious what the p-value is (not rho). Also, I'm assuming you scaled out games played in your ranking...

Scatterplot #1
Scatterplot #2 [That is estimation of winning % based on GF^2/(GF^2+GA^2)].
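The estimate in brackets can be written out as a small function. This is only a sketch of that formula; the function name and the goal totals are hypothetical.

```python
def pythagorean_pct(goals_for, goals_against, exponent=2):
    """Estimated winning percentage: GF^2 / (GF^2 + GA^2) when exponent is 2."""
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    return gf / (gf + ga)


# Hypothetical preseason totals: 30 goals for, 20 against.
print(round(pythagorean_pct(30, 20), 3))  # 900 / 1300
```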

In a data set this small looking at how the "regressions" are formed is critical. One team can change things a lot (which is probably why the Spearman test works better).

What you can see in the first graph is 6 teams that the regression is chasing on the right. And a bunch of teams that are "missed" in the bottom left...

I think you can learn a lot more about your team in the last 4 games in comparison to the first 4, but I think you have to focus on individuals in the pre-season to predict the regular season. Winning an AHL-style preseason game isn't worth that much. For example, WHO are the goals against coming against...

That all being said, you can also see in the graphs that there are few teams that did really well in the pre-season, but poorly in the regular. Although St. Louis is a candidate...
 


I wonder if this correlation holds up as well if you compare the results of the first month of the regular season with the whole season? I remember the Kings looked like they were going to win the Cup before Remembrance day last year. The Canucks were also pretty strong? I'm guessing your correlation in the preseason is stronger than the same correlation for the first four weeks of the regular season.
 


Javageek: it's cool that you showed up with your golf clubs. I was hoping someone would. Yes, the preseason numbers are normalized for games played.

I think the crux of your first scatterplot is right: it's not worth getting juiced unless your team is at the extremes. With your second scatterplot the problem is that you're trying to knock the hat off a nonparametric test with a parametric one. That's like bringing a knife to a gunfight. If large differences in the preseason led to marginal ones in the regular season, that would actually make the preseason data all the more valuable, but it would generate an uninteresting-looking regular-season plot.

Like you I'm curious what the exact p-value is; it would require a Monte Carlo test, so all I can say with my training is that it's much lower than 0.05.
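A Monte Carlo test of that kind could look roughly like the sketch below: shuffle one column of ranks many times and count how often chance alone produces a correlation at least as strong as the observed one. The function names, the ten-team rankings, the trial count and the seed are all assumptions for illustration; the real 30-team table is not reproduced here.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; applied to ranks it equals Spearman's rho."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def permutation_p_value(x_ranks, y_ranks, trials=5000, seed=0):
    """Two-tailed Monte Carlo p-value for the rank correlation of two lists."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson(x_ranks, y_ranks))
    shuffled = list(y_ranks)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Count shuffles whose correlation is at least as extreme as the observed one.
        if abs(pearson(x_ranks, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return (hits + 1) / (trials + 1)


# Hypothetical rankings for ten teams (not real data).
preseason_rank = list(range(1, 11))
regular_rank = [2, 1, 4, 3, 5, 7, 6, 10, 8, 9]

p_value = permutation_p_value(preseason_rank, regular_rank)
print(p_value)
```

With the actual preseason and regular-season ranks plugged in, the same loop would give an estimate of the p-value being asked about.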
 


This is an interesting theory but there's myriad theories in the universe supported by data seemingly created to prop up the argument.

Call me a pigheaded traditionalist, but there are far too many variables in rosters, especially in the first week of NHL exhibition games, to get an idea of which team will be better. The preseason often shows the depth of an organization, and most fans knew the Oilers had plenty of depth (especially at forward) before camps started.

What the games do say is that teams like the Oilers, which have quite a few jobs open for newcomers, benefit in the preseason from the competition for those jobs. Most of the guys mentioned as possible Oilers this season are having at least reasonable (and some like Thoresen, excellent) training camps, from what I read. Teams that have only one job available for newcomers probably have easier camps because the vets are concentrating on getting into game shape instead of keeping their jobs.

I'd also beware of those MSM writeups in camp — reporters have to pump up the guys they write about to justify to their editors that the story is worthwhile to put in the paper. Mikhnov's camp would most likely be a lot better if he knew English well enough for an interview.

To say the Oilers are going to excel in the regular season because their exhibition record resembles 05-06 Carolina and Buffalo, well, the Oilers haven't finished the exhibition season yet. Good start but they could lose three games and make all this regression analysis moot until next exhibition season.
 


Cool post and good comments. Clearly there is something to it for 05/06.

A really simple way to do this, Colby, is just to cut and paste the preseason and regular season data into Excel, sort by team name, use the =RANK() function to establish the ranks, and then use the =CORREL() (Pearson) function on the two rank lists, which gives you the Spearman correlation.

You'd have to do about five minutes of manual labour to change the rankings to your system here, or write a tiny script to do it. Either way it's not much work.

The reason I mention this is because I doubt that the relationship between the preseason and regular season is like this for other years, and this would make it easy to check without eating up time.

I checked for 03/04, the only other season with the preseason results on the Yahoo link you provided. And there wasn't much in it at all. Looks like the strongest correlation by a shade is "points-per-game in the preseason" to "regular season points".

Bear in mind that it has been close to two decades since I took a stats course, hopefully Java will correct me if I'm wrong.

In short, I think I agree with 'bigleaguer' on the whole, though as you've shown, for 05/06 it was a different ball of wax for some reason(s).

I suspect that if you ran the correlations for the past ten seasons before that at least nine would show a slight positive correlation, in the "just noise" range. Just my feeling, I'd be happy to be proven wrong. If someone knows where the preseason results are laid out in a nice, clean format ... if they provide the link I'll run the numbers.
 


Isn't there a BoA game tonight? Man, this site is slipping...
 


Isn't there a BoA game tonight? Man, this site is slipping...

It's past noon on Saturday, Andy. Shouldn't you be out stalking an Oiler prospect by now? :P
 


I should also add: try doing the same analysis with the previous year performances and see how it compares. I did it with a normal regression and it is a much better variable than the pre-season.
 


Yeah, but I don't think that's a surprise to anybody.
 


Blackhawks fans are excited too! We'll see how that pans out.
 


Results after 41 games do not predict final standings with enough reliability that I can trade off it and beat the transaction costs (8% on Tradesports, last I checked in a brutally illiquid market). I think you'd eventually come out ahead, but you could destroy the RRSP in the meantime.

I found that performance after 41 games predicted final standings with a standard deviation of 5.3. In other words, if I projected a 6th place finish after 41 games, based on record alone, the team would finish between 1st and 12th in the league with 68% probability. It's pretty useless. Predictions based on goal differential are worse.

If 41 regular season games mean nothing, then 5 pre-season games mean even less.
 


Don't get too excited. The Flames are going to be sipping Big Rock out of the cup this season.
 

