Cybermetrics: Rabbit Maranville, Mr. RBI

Why him? The following Bill James formula predicts his RBIs better than it predicts those of any other player:

RBI = (TB/4) + HRs

It predicts he would have had 883.75 RBIs; he actually had 884. Per 700 PAs, or about a full season, that is off by only +.016. That is the most accurate prediction for any player with 5,000+ PAs from 1876-2012 (I used Baseball Reference, and RBIs might not be available for all pre-1900 years). Click here to see the rankings. The rankings are ordered by how far each player's actual RBIs came in over or under the prediction.
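As a rough check on the arithmetic, here is a minimal Python sketch of the formula and the per-700-PA differential. The function names are my own, and the career inputs for Maranville (3,423 total bases, 28 home runs, 11,256 plate appearances) are assumptions chosen to be consistent with the 883.75 figure above; they should be checked against Baseball Reference.

```python
def predicted_rbi(total_bases, home_runs):
    """Bill James estimate quoted above: RBI = (TB / 4) + HR."""
    return total_bases / 4 + home_runs


def diff_per_700_pa(actual_rbi, predicted, plate_appearances):
    """Actual minus predicted RBIs, scaled to 700 plate appearances."""
    return (actual_rbi - predicted) * 700 / plate_appearances


# ASSUMED career inputs for Maranville (verify against Baseball Reference):
# 3,423 total bases, 28 home runs, 11,256 plate appearances.
maranville_pred = predicted_rbi(3423, 28)
maranville_diff = diff_per_700_pa(884, maranville_pred, 11256)

print(maranville_pred)            # 883.75
print(round(maranville_diff, 3))  # 0.016
```

A positive differential means the player drove in more runs than the formula expected; a negative one means fewer.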

Cap Anson was predicted to have 77.24 RBIs per 700 PAs while he actually had about 130. So he gets +52.76. Of course, this does not mean he was necessarily a great clutch hitter (although he could have been: he led the league in RBIs 8 times according to Baseball Reference, and if you notice, he is 7 RBIs ahead of the next best guy, so he looks like a bit of an outlier). But his team led the league in OBP several times back then and in other years was often near the top.

So what might be going on with Anson? For one, he did not hit many HRs (just 97). But no one did back then so you had low HR guys batting in the middle of the order, where you would get more than the average number of RBI opportunities. Second, he might have played in some years when the league OBP was high. Third, more players reached on errors back then, creating even more opportunities.

Over the last 10 years, the formula has predicted about 20 more RBIs per team each year in the AL than teams actually got. In the NL, it is about 25 more. So the prediction is coming in around 3% too high. Again, we are in a low-error period, so not as many runners are reaching on errors as in other periods in baseball history.

In fact, there is a high correlation between how often runners reach (by whatever means) and the size of the prediction error for a whole league in any given year. I added the OBP each year to the error rate (ERATE) each year (ERATE is 1 - fielding percentage). That sum was then correlated with how big the prediction error was per team (per team, because the more teams a league has, the bigger its total error might be). For all of NL history, that correlation is .87 and for the AL it is .85. So in years when an entire league had more RBIs than predicted, it most likely had a lot more baserunners than normal, by hits, walks, HBP and errors.
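The calculation described above can be sketched in Python roughly as follows. This is only an illustration: the helper names are mine, the league-season rows are HYPOTHETICAL placeholders rather than real data, and I am assuming a plain Pearson correlation with the error measured as actual minus predicted RBIs per team.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def reach_rate(obp, fielding_pct):
    """OBP plus ERATE, where ERATE = 1 - fielding percentage."""
    return obp + (1 - fielding_pct)


# HYPOTHETICAL league-seasons, not real data:
# (league OBP, league fielding pct, actual minus predicted RBIs per team)
seasons = [
    (0.310, 0.985, -25.0),
    (0.325, 0.980, -10.0),
    (0.340, 0.972, 15.0),
    (0.355, 0.965, 40.0),
]

x = [reach_rate(obp, fpct) for obp, fpct, _ in seasons]
y = [err for _, _, err in seasons]
print(round(pearson(x, y), 2))
```

With real data, each row would be one league-season, and the AL and NL would be run separately to get the two correlations.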

Now getting back to Maranville, he tended to bat leadoff, 2nd or 7th. Hardly great RBI slots. So you might expect him to get fewer RBIs than the formula predicts. But he did play mainly in the 1920s and 30s, when OBPs were high and the ERATE was higher. He also hit well with runners on base. Retrosheet only has about 1300 of his 8800 career ABs broken down for this. But with none on he batted .277, with runners on .317, and with runners in scoring position .324. Click here to see his splits.

If you look at the rankings from the first link, you can see that many of the batters who had the biggest negative differentials (meaning they got fewer RBIs than expected) were leadoff men.

This formula may apply best to power hitters who bat in the middle of the order. So I also looked at how well the formula predicted for all players with 300+ career HRs. Click here to see that link. The guy that jumps out there is Al Simmons. He got about 25 more RBIs than expected per season and the next highest is Greenberg at about 18.

Now Simmons batted 4th most of his career (especially with the A's) and had Max Bishop leading off a lot of that time. Bishop had a career OBP of .423. The 2-3 hitters probably averaged around .365. So he had a lot of opportunities. But Retrosheet has an even smaller number of on-base splits for Simmons, so it is hard to tell if he was a clutch hitter.

I was surprised to see Willie Mays so far down. He had 15 fewer RBIs than predicted per season, yet he hit well with runners on. Click here to see his splits. Maybe he got intentionally walked a lot with runners in scoring position. Mantle and Barry Bonds are also near the bottom and the same thing could have happened to them. Alfonso Soriano has batted leadoff in nearly half of his PAs, so that may be why he is last.

Bill James discusses this formula in his book Solid Fool's Gold: Detours on the Way to Conventional Wisdom.

I have published two articles about RBI prediction:

RBIs, Opportunities and Power Hitting

Do Hitters Get Their Expected RBIs?


Wednesday, March 13, 2013
