Topics Probability

Question I have a question about probability (and baseball). Say that a hitter has consistently hit .300 for many years. Now, suppose that he begins a new season in a slump, and hits only .200 for the first half; should we infer that he will hit well above .300 for the second half (and so finish with the year-end .300 average we have reason to expect of him), or would this be an instance of the gambler's fallacy?

Accepted:
May 2, 2007

Comments

If the hitter were just a batting machine that averages .300 in the long run, then it would indeed be an instance of the gambler's fallacy to think he would end up with his normal .300 even though he's .200 halfway through the season (just as it would be a fallacy to suppose a fair coin that has come down heads 5 times in a row is more likely to come down tails than heads over the next 5 throws).
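To make the arithmetic behind this concrete, here is a minimal sketch. It assumes the two halves of the season have the same number of at-bats (250 each, a made-up figure) and that the "batting machine" gets a hit with a fixed probability of .300 on every at-bat; the function names are purely illustrative.

```python
from fractions import Fraction

# Assumptions for illustration only: 250 at-bats per half, fixed true rate of .300.
AT_BATS_PER_HALF = 250
TRUE_RATE = Fraction(3, 10)
FIRST_HALF_HITS = 50  # a .200 first half: 50 hits in 250 at-bats


def expected_season_average(first_hits, first_at_bats, second_at_bats, true_rate):
    """Expected year-end average if the second half simply runs at the true rate."""
    expected_second_hits = true_rate * second_at_bats
    return (first_hits + expected_second_hits) / (first_at_bats + second_at_bats)


def required_second_half_average(first_hits, first_at_bats, second_at_bats, target):
    """Second-half average needed to finish the whole season at the target."""
    needed_hits = target * (first_at_bats + second_at_bats) - first_hits
    return needed_hits / second_at_bats


machine_season = expected_season_average(
    FIRST_HALF_HITS, AT_BATS_PER_HALF, AT_BATS_PER_HALF, TRUE_RATE
)
needed_to_recover = required_second_half_average(
    FIRST_HALF_HITS, AT_BATS_PER_HALF, AT_BATS_PER_HALF, TRUE_RATE
)

print(f"Expected season average for the machine: {float(machine_season):.3f}")
print(f"Second-half average needed to finish at .300: {float(needed_to_recover):.3f}")
```

Under these assumptions the machine should be expected to finish around .250, not .300. Finishing at .300 would require a .400 second half, and nothing about the slump makes that more likely for a machine whose at-bats are independent.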

But our hitter isn't a batting machine. One respect in which this may matter is that he may try harder in the second half of the season so as to keep up his record of hitting .300 each season, and this may itself lead him to hit well above .300 for the second half.

What we have here is an instance of the 'reference class problem'. Should we consider the hitter's second-half performance as an instance of (a) the class of all his half-season performances (in which case we should expect him to average only .300), or should we consider it as an instance of (b) the class of his-second-half-season-performances-when-he's-averaged-only-.200-in-the-first-half (in which case the statistics may indicate that he will hit much higher than .300)?

The standard view is that we should estimate the probability of single events by considering them as instances of the most restrictive class we know them to be members of--so here it would be (b) rather than (a).

The trouble is that we may have fewer past statistics for such a more restrictive class--for example, if he's never had a slump like this before, then we won't have any indication of how he will respond.

Even so, I'd still say that it's the probability in this restrictive class that matters. Even if it's hard for us to find out, that's what we need to know to estimate his second-half performance. To the extent we don't know it, we simply won't have a good basis for predicting this performance.
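The difference between the two reference classes, and the data shortage that comes with the narrower one, can be sketched in a few lines. Everything here is hypothetical: the season records are invented, the .220 cutoff for what counts as a "slump" is an arbitrary choice, and the function names are only for illustration.

```python
from fractions import Fraction

# Invented records: each season is ((first_half_hits, first_half_at_bats),
#                                   (second_half_hits, second_half_at_bats)).
PAST_SEASONS = [
    ((78, 250), (74, 250)),
    ((76, 250), (75, 250)),
    ((50, 250), (95, 250)),  # the one earlier season that opened with a slump
    ((75, 250), (77, 250)),
]
SLUMP_THRESHOLD = Fraction(220, 1000)  # arbitrary cutoff, an assumption


def pooled_average(halves):
    """Batting average over a collection of (hits, at_bats) pairs; None if empty."""
    hits = sum(h for h, _ in halves)
    at_bats = sum(ab for _, ab in halves)
    return Fraction(hits, at_bats) if at_bats else None


def broad_class_estimate(seasons):
    """Class (a): every half-season he has played."""
    return pooled_average([half for season in seasons for half in season])


def restrictive_class_estimate(seasons, slump_threshold):
    """Class (b): only second halves that followed a slump in the first half."""
    after_slump = [
        second
        for first, second in seasons
        if Fraction(first[0], first[1]) <= slump_threshold
    ]
    return pooled_average(after_slump)


estimate_a = broad_class_estimate(PAST_SEASONS)
estimate_b = restrictive_class_estimate(PAST_SEASONS, SLUMP_THRESHOLD)

# A player who has never slumped gives class (b) nothing to work with.
never_slumped = [s for s in PAST_SEASONS if Fraction(s[0][0], s[0][1]) > SLUMP_THRESHOLD]
estimate_b_no_data = restrictive_class_estimate(never_slumped, SLUMP_THRESHOLD)

print("Class (a) estimate:", float(estimate_a))
print("Class (b) estimate:", float(estimate_b))
print("Class (b) with no prior slump:", estimate_b_no_data)
```

With these invented records, class (a) gives .300 while class (b) gives .380, but the latter rests on a single earlier season. For a hitter with no prior slump the function returns `None`, which is the code's way of saying we have no basis for the estimate.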

Since you are obviously interested in probability and baseball, here's a fun question for you to think about. How can it happen that player A has a higher batting average than player B in the first half of the season, and A also has a higher batting average than B in the second half of the season, but B has a higher overall season batting average than A? (Yes, this can indeed happen. It is a form of "Simpson's Paradox.")
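For anyone who would like to check that this is possible before working out why, here is one set of made-up numbers that does the trick. The hit and at-bat counts are invented for illustration; the point to notice is how unevenly each player's at-bats are split between the two halves.

```python
from fractions import Fraction

# Invented (hits, at_bats) for each half of the season.
PLAYER_A = {"first_half": (4, 10), "second_half": (25, 100)}
PLAYER_B = {"first_half": (35, 100), "second_half": (2, 10)}


def half_average(player, half):
    """Batting average for one half of the season."""
    hits, at_bats = player[half]
    return Fraction(hits, at_bats)


def season_average(player):
    """Batting average over the whole season (total hits over total at-bats)."""
    hits = sum(h for h, _ in player.values())
    at_bats = sum(ab for _, ab in player.values())
    return Fraction(hits, at_bats)


for half in ("first_half", "second_half"):
    a, b = half_average(PLAYER_A, half), half_average(PLAYER_B, half)
    print(f"{half}: A = {float(a):.3f}, B = {float(b):.3f}")

a_season, b_season = season_average(PLAYER_A), season_average(PLAYER_B)
print(f"season: A = {float(a_season):.3f}, B = {float(b_season):.3f}")
```

A leads in each half (.400 to .350, then .250 to .200), yet B leads over the season (roughly .336 to .264). The season average weights each half by its at-bats, and B took most of his at-bats in the half where both players hit better.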