Friday, December 5, 2008

A little math goes a long way

Over here, Colin Wyers, who is generally a smart guy, studies whether a run saved is just as valuable as a run scored. He takes this equivalence to be an implication of the Pythagorean winning percentage formula (Win% = RS^2/(RS^2 + RA^2)). However, looking at teams with matched run differentials but widely differing total runs (so, for instance, a team with 900 runs scored and 800 allowed gets matched with a 750/650 team), he finds that the lower-scoring teams do slightly, but statistically significantly, better. He takes this to imply some slight deficiency in the Pythagorean formula.
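To make the matched-pair comparison concrete, here is a minimal Python sketch that plugs the two example teams into the plain exponent-2 Pythagorean formula. The function name `pythag_win_pct` is just an illustrative choice. Note that the formula by itself already rates the lower-scoring team higher, even though both teams are +100.

```python
def pythag_win_pct(rs: float, ra: float, exponent: float = 2.0) -> float:
    """Pythagorean winning percentage: RS^k / (RS^k + RA^k)."""
    num = rs ** exponent
    return num / (num + ra ** exponent)


# Both teams outscore their opponents by exactly 100 runs.
high_scoring = pythag_win_pct(900, 800)
low_scoring = pythag_win_pct(750, 650)

print(f"900/800: {high_scoring:.3f}")  # 900/800: 0.559
print(f"750/650: {low_scoring:.3f}")   # 750/650: 0.571
```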

The problem is that he is completely wrong in his interpretation of the Pythagorean formula. In fact, as I showed in one of my first posts here, according to the Pythagorean formula a run saved is more valuable when the team is above average, and a run scored is more valuable when it is below average. Thus, if there are more teams above .500 in his sample than below--which, if you look at the article, is indeed the case--the Pythagorean formula would predict that a run saved is worth more than a run scored. In other words, completely contrary to the article's claim, the Pythagorean formula correctly predicts the data!
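The asymmetry falls out of the formula's partial derivatives. With W = RS^2/(RS^2 + RA^2), an extra run scored is worth dW/dRS = 2*RS*RA^2/(RS^2 + RA^2)^2, and a run saved is worth -dW/dRA = 2*RS^2*RA/(RS^2 + RA^2)^2. The ratio of the second to the first is simply RS/RA, so a saved run is worth more exactly when RS > RA. The Python sketch below just evaluates those two expressions; the function name is hypothetical, and it assumes the exponent-2 version of the formula.

```python
def marginal_values(rs: float, ra: float) -> tuple[float, float]:
    """Return (win% gained per extra run scored, win% gained per run saved)
    for the exponent-2 Pythagorean formula, via its partial derivatives."""
    denom = (rs ** 2 + ra ** 2) ** 2
    per_run_scored = 2 * rs * ra ** 2 / denom
    per_run_saved = 2 * rs ** 2 * ra / denom  # equals -dW/dRA
    return per_run_scored, per_run_saved


# Above-average, exactly average, and below-average teams.
for rs, ra in [(800, 700), (700, 700), (700, 800)]:
    scored, saved = marginal_values(rs, ra)
    print(f"{rs}/{ra}: saved/scored = {saved / scored:.3f}")
# 800/700: saved/scored = 1.143
# 700/700: saved/scored = 1.000
# 700/800: saved/scored = 0.875
```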

To be fair, Colin does later present some additional evidence that Pythagorean estimators are less accurate for low-scoring teams. That, however, does not make up for the earlier error.

2 comments:

Jack Klompus said...

A little qualitative thinking also goes a long way. Blackadder's point is pretty obvious, albeit in an obvious-once-I'm-told sense.

For winning teams with the same run differential, more overall runs means the (positive) run differential is a smaller proportion of overall runs. So that team is worse than a team with the same run differential in a lower-scoring run environment. Thus winning teams should prefer preventing a run to scoring one.

What makes this obvious? A team with 162/0 RS/RA will be undefeated. A team with 1,162/1,000 will be much worse. Both have the same run differential. Way different outcome.

All I've done is simply apply the Pythagorean formula. Win% = 162 squared / (162 squared + 0 squared) = 100%.
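A quick Python check of the two extreme teams above, assuming the exponent-2 formula (the helper name is only illustrative): both are +162, but the high-scoring one projects to well under .600.

```python
def pythag_win_pct(rs: float, ra: float) -> float:
    """Exponent-2 Pythagorean winning percentage."""
    return rs ** 2 / (rs ** 2 + ra ** 2)


# Same +162 run differential, very different run environments.
for rs, ra in [(162, 0), (1162, 1000)]:
    print(f"{rs}/{ra}: {pythag_win_pct(rs, ra):.3f}")
# 162/0: 1.000
# 1162/1000: 0.575
```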

The reverse holds for losing teams -- they should prefer high-run environments. For losers, holding the run differential fixed, the Pythagorean formula approaches 0 as total runs decrease.
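The losing-team case can be sketched the same way, again assuming the exponent-2 formula with an illustrative helper name: hold the differential at -50 and shrink the run environment.

```python
def pythag_win_pct(rs: float, ra: float) -> float:
    """Exponent-2 Pythagorean winning percentage."""
    return rs ** 2 / (rs ** 2 + ra ** 2)


# Run differential fixed at -50; total runs shrink from top to bottom.
for ra in [1000, 700, 400, 100, 50]:
    rs = ra - 50
    print(f"{rs}/{ra}: {pythag_win_pct(rs, ra):.3f}")
```

The projected percentage falls steadily, from about .474 at 950/1000 to exactly 0 at 0/50.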

All this, I confess, is how Blackadder originally put his point to my sorry, math-allergic ass. But once you explain it this way, don't the t-tests look silly?

Blackadder said...

That's a good point, and raises another one: I don't think any important insight sabermetrics has provided requires mathematical sophistication to appreciate. When coming up with new theories, of course mathematical and statistical ability of at least a modest degree is quite important. However, I think the subject is sufficiently basic that any insight that comes out of that work should be explicable in non-mathematical terms. Maybe those who play around with regressions and run estimators are more likely than a casual fan to figure out that, say, OBP is a far better stat than BA, but the arguments can then be recast in completely non-mathematical terms. I would be skeptical of any sabermetric result for which this was not the case.