Cybermetrics: March 2013

Why Rabbit Maranville? The following Bill James formula predicts his RBIs better than it predicts those of any other player:

RBI = (TB/4) + HRs

It predicts that he would have 883.75 RBIs; he actually had 884. Per 700 PAs, or about a full season, that is off by only +.016. That is the most accurate prediction for any player with 5,000+ PAs from 1876-2012 (I used Baseball Reference, and RBIs might not be available for all pre-1900 years). Click here to see the rankings. They are sorted by how far each player's actual RBIs were above or below the prediction.
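A minimal Python sketch of the formula and the per-700-PA differential used in the rankings. The function names are hypothetical. Maranville's inputs (3,423 total bases and 28 HRs) come from Baseball Reference rather than from this post, and they reproduce the 883.75 figure. Anson's numbers are already per 700 PAs, so PA is set to 700 for him.

```python
def predicted_rbi(total_bases: float, home_runs: float) -> float:
    """Bill James's estimate as given in the post: RBI = TB/4 + HR."""
    return total_bases / 4 + home_runs


def diff_per_700_pa(actual_rbi: float, predicted: float, plate_appearances: float) -> float:
    """Actual minus predicted RBIs, scaled to 700 PAs (roughly one full season).

    A positive value means the player drove in more runs than the formula expects.
    """
    return (actual_rbi - predicted) * 700 / plate_appearances


# Maranville's career TB and HR (assumed from Baseball Reference, not from the post).
print(predicted_rbi(3423, 28))  # 883.75

# Anson: the post's figures are already per 700 PAs.
print(round(diff_per_700_pa(130, 77.24, 700), 2))  # 52.76
```

To rank every player, you would compute `diff_per_700_pa` with each player's career PAs and sort on the result.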

Cap Anson was predicted to have 77.24 RBIs per 700 PAs, while he actually had about 130. So he gets +52.76. Of course, this does not necessarily mean he was a great clutch hitter (although he could have been: he led the league in RBIs 8 times according to Baseball Reference, and he is 7 RBIs ahead of the next best player, so he looks like a bit of an outlier). But his team led the league in OBP several times back then and in other years was often near the top.

So what might be going on with Anson? First, he did not hit many HRs (just 97). But no one did back then, so low-HR hitters batted in the middle of the order, where they got more than the average number of RBI opportunities. Second, he might have played in some years when the league OBP was high. Third, more batters reached on errors back then, creating even more opportunities.

Over the last 10 years, the formula has predicted about 20 more RBIs per team each year in the AL than teams actually got. In the NL, it is about 25 more. So the prediction is coming in around 3% too high. Again, we are in a low-error period, so fewer runners reach on errors than in other periods of baseball history.

In fact, there is a high correlation between how often runners reach base (by whatever means) and the size of the prediction error for a whole league in any given year. For each league-season, I added the league OBP to the error rate (ERATE, which is 1 - fielding percentage). I then correlated that sum with the prediction error per team (dividing by the number of teams, since more teams means a bigger league-wide error). For all of NL history, that correlation is .87, and for the AL it is .85. So in years when an entire league had more RBIs than predicted, it most likely had a lot more baserunners than normal, through hits, walks, HBP and errors.
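Here is a rough Python sketch of that league-level check, assuming one row per league-season with league OBP, fielding percentage, total actual and predicted RBIs, and number of teams. The helper names are hypothetical, the Pearson correlation is written out by hand, and the sample seasons are invented numbers, not real league data.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def baserunner_index(obp: float, fielding_pct: float) -> float:
    """League OBP plus ERATE, where ERATE = 1 - fielding percentage."""
    return obp + (1 - fielding_pct)


def error_per_team(actual_rbi: float, predicted_rbi: float, teams: int) -> float:
    """League-wide (actual - predicted) RBIs divided by the number of teams."""
    return (actual_rbi - predicted_rbi) / teams


# Invented league-seasons: (OBP, fielding pct, actual RBI, predicted RBI, teams)
seasons = [
    (0.310, 0.980, 9000, 9300, 14),
    (0.330, 0.975, 9600, 9550, 14),
    (0.350, 0.960, 10100, 9800, 14),
    (0.320, 0.978, 9200, 9350, 14),
]
xs = [baserunner_index(obp, fpct) for obp, fpct, *_ in seasons]
ys = [error_per_team(act, pred, teams) for _, _, act, pred, teams in seasons]
print(round(pearson(xs, ys), 2))
```

A positive error per team means the league drove in more runs than the formula predicted, so a high positive correlation matches the pattern described above.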

Now, getting back to Maranville: he tended to bat leadoff, 2nd or 7th. Those are hardly great RBI slots, so you might expect him to get fewer RBIs than the formula predicts. But he played mainly in the 1920s and 30s, when OBPs were high and the ERATE was higher than it is today. He also hit well with runners on base. Retrosheet has only about 1,300 of his 8,800 career ABs broken down this way. But in those ABs, he batted .277 with none on, .317 with runners on and .324 with runners in scoring position. Click here to see his splits.

If you look at the rankings from the first link, you can see that many of the batters with the biggest negative differentials (meaning they got fewer RBIs than predicted) were leadoff men.

This formula may apply best to power hitters who bat in the middle of the order. So I also looked at how well the formula predicted RBIs for all players with 300+ career HRs. Click here to see those rankings. The player who jumps out is Al Simmons. He got about 25 more RBIs than predicted per season, while the next highest, Hank Greenberg, was at about 18.

Simmons batted 4th for most of his career (especially with the A's) and had Max Bishop leading off for much of that time. Bishop had a career OBP of .423, and the No. 2 and No. 3 hitters probably averaged around .365. So Simmons had a lot of opportunities. But Retrosheet has even fewer on-base splits for Simmons than for Maranville, so it is hard to tell whether he was a clutch hitter.

I was surprised to see Willie Mays so far down. He had 15 fewer RBIs than predicted per season, yet he hit well with runners on. Click here to see his splits. Maybe he was intentionally walked a lot with runners in scoring position. Mantle and Barry Bonds are also near the bottom, and the same thing could have happened to them. Alfonso Soriano has batted leadoff in nearly half of his PAs, so that may be why he is last.

Bill James discusses this formula in his book Solid Fool's Gold: Detours on the Way to Conventional Wisdom.

I have published two articles about RBI prediction:

RBIs, Opportunities and Power Hitting

Do Hitter’s Get Their Expected RBIs?

Below is something I posted in June 2011. Gossage is again talking about his workload being tougher, and Tango posted on the issue yesterday in response (see Mo v Goose). Now, my post from 2011.

...“I wasn’t a closer, I was a relief pitcher,” Gossage said. He made a great point that he was not just the closer, but the seventh and eighth inning man. He pointed out that he came on with inherited runners in the seventh or eighth inning many times. Some of those situations required that he keep the ball out of play.

Gossage went on to say that “Mariano doesn’t come in with inherited runners. He gets to start out the ninth with nobody on… Easy? It is a piece of cake compared to what we use to do.”

From Baseball Think Factory, quoting an article by Mike Silva.

Yes, relievers were used differently in Gossage's time. From 1977-1985, the period I will look at for Gossage, most of the top 50 seasons in both saves and games finished were by pitchers who threw over 100 innings (and only a couple of those seasons included even 1 game started). From 1997-2005, the period I will look at for Rivera, there were no 100+ IP seasons, and even 90+ IP seasons were rare (fewer than 5 in each of the two top-50 lists).

So I want to compare both Gossage and Rivera to the average relievers of their times. I picked Gossage's 1977-1985 seasons since those seem to be his prime years, and he was very good throughout the period. This does leave out his great 1975 season as a reliever (he was a starter in 1976). For Rivera, I look at his first 9 years as a closer, 1997-2005 (which leaves out a very good 1996 season). The fact that Rivera has continued to pitch very well since then is a plus in his favor, although Gossage supporters might say that Rivera's relatively low IP totals have helped his longevity. Gossage was just average after 1985.

The average relief pitcher from 1977-1985 had an ERA of 3.68, while Gossage had 2.10. If we turn that into a winning pct. using the Pythagorean formula Bill James created to estimate team winning pct. from runs scored and runs allowed (treating the average reliever's ERA as runs scored and Gossage's ERA as runs allowed), we get .754. From 1997-2005, Rivera's years, he had an ERA of 2.04 while the league average was 4.31. That gets us a pct. of .817. So Rivera edges Gossage .817-.754. (I checked park factors for each pitcher, and the simple average of their teams' pitching park factors was the same, 97.56, meaning that they each got a little help from their parks, which were about 2.5% below average in scoring.) All the data I use here is from Baseball Reference or The Lee Sinins Complete Baseball Encyclopedia.
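The Pythagorean step can be sketched in Python as below, treating the baseline ERA as "runs scored" and the pitcher's ERA as "runs allowed". The function name is hypothetical, and the exponent of 2 is the classic Bill James version; it is an assumption here, but it reproduces the .754 and .817 figures above.

```python
def pythagorean_pct(baseline_era: float, pitcher_era: float, exponent: float = 2.0) -> float:
    """Bill James's Pythagorean winning pct: baseline^k / (baseline^k + pitcher^k).

    The baseline ERA plays the role of runs scored and the pitcher's ERA the
    role of runs allowed, so a pitcher better than the baseline scores above .500.
    """
    b = baseline_era ** exponent
    p = pitcher_era ** exponent
    return b / (b + p)


print(round(pythagorean_pct(3.68, 2.10), 3))  # Gossage vs. average reliever, 1977-85: 0.754
print(round(pythagorean_pct(4.31, 2.04), 3))  # Rivera vs. average reliever, 1997-2005: 0.817
```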

I also found the top 10 pitchers in saves in each era and then calculated the combined ERA of the other 9 (leaving out Gossage and Rivera). The other 9 in Gossage's years had a 2.87 ERA; using that as the baseline instead of the league average gives Gossage a .651 pct. The other 9 in Rivera's years had 3.07, which gives Rivera a .694 pct. Again, the edge goes to Rivera.
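Written out, the .651 and .694 figures use the same Pythagorean formula (exponent 2), with the other 9 pitchers' ERA as the baseline in place of the league average:

```latex
\text{Gossage: } \frac{2.87^2}{2.87^2 + 2.10^2} = \frac{8.2369}{12.6469} \approx .651
\qquad
\text{Rivera: } \frac{3.07^2}{3.07^2 + 2.04^2} = \frac{9.4249}{13.5865} \approx .694
```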

So far, when compared to contemporaries with a similar role, Rivera is ahead. But ERA can be misleading, since fielders play a role in it (and ERA may not be the best way to judge relievers who are supposed to come in and put out fires).

To avoid this problem, I am going to look at how each pitcher compared to his peers in the fielding independent stats (HRs, BBs, SOs). Then I will convert that into a run value using the values below:

HR: 1.40
BB: .33
SO: -.22

Those are the values used in "Fielding Independent ERA" formulas. The table below shows how each pitcher compared to the average reliever of his time in these stats per 9 IP. For example, Gossage allowed .508 HRs per 9 IP, while the average reliever allowed .724, so he was .216 better. Multiplying that by 1.4 gives .3024 (it is negative in the table, showing how far below average Gossage was). The same is done for the other stats and for Rivera. The last line shows the combined run value by which each pitcher was below average across all three stats.
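A minimal Python sketch of this fielding-independent comparison, using the run values listed above. The function name is hypothetical. Only Gossage's HR rates (.508 vs. .724 per 9 IP) come from the post; the BB and SO rates from the table would be passed in the same way.

```python
from typing import Mapping

# Run values per event, as listed in the post.
RUN_VALUES = {"HR": 1.40, "BB": 0.33, "SO": -0.22}


def runs_vs_average(
    pitcher_rates: Mapping[str, float],
    average_rates: Mapping[str, float],
    weights: Mapping[str, float] = RUN_VALUES,
) -> float:
    """Runs per 9 IP above (+) or below (-) the average reliever.

    For each stat present in pitcher_rates, multiply (pitcher rate - average
    rate) by that stat's run value and add up the results. Rates are per 9 IP.
    """
    return sum(
        (pitcher_rates[stat] - average_rates[stat]) * weights[stat]
        for stat in pitcher_rates
    )


# Gossage's HR component only: (.508 - .724) * 1.40
print(round(runs_vs_average({"HR": 0.508}, {"HR": 0.724}), 4))  # -0.3024
```

Because the SO run value is negative, striking out more batters than average also pushes the total below zero, which is the direction the table uses for "better than average."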

So Rivera is farther below average than Gossage. Subtracting those run values from the average reliever ERA of each period, Gossage gets 2.49 (3.68 - 1.19) and Rivera gets 2.76 (4.31 - 1.55). The Pythagorean winning pct for Gossage is then .686, and for Rivera it is .709. The next table does the same comparison, but only against the other 9 pitchers in the top 10 in saves in each period.
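The last step, turning the combined run value into an estimated ERA and then a Pythagorean pct, can be sketched as follows. The function name is hypothetical; the -1.19 and -1.55 inputs are the combined run values from the post, and the squared (exponent 2) Pythagorean form is an assumption that reproduces the .686 and .709 figures.

```python
def fip_style_pct(average_era: float, runs_vs_avg: float) -> float:
    """Pythagorean pct of an estimated ERA against the period's average reliever ERA.

    runs_vs_avg is the combined HR/BB/SO run value per 9 IP; it is negative
    for a pitcher who was better than average.
    """
    estimated_era = average_era + runs_vs_avg
    return average_era ** 2 / (average_era ** 2 + estimated_era ** 2)


print(round(fip_style_pct(3.68, -1.19), 3))  # Gossage, 1977-85: 0.686
print(round(fip_style_pct(4.31, -1.55), 3))  # Rivera, 1997-2005: 0.709
```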

Going right to the bottom line, we can see that they are almost even: Gossage would get a Pythagorean pct of .620 and Rivera would get .611. Very close. Gossage may have been better than Rivera, but I think the evidence shows that he should not belittle Rivera's greatness. Rivera seems to be at least close to Gossage as measured by how good each was relative to his peers.

One weakness of comparing each pitcher to the other 9 in the top 10 is that park effects and fielders might play a big role, since those 9 don't represent the entire league. It is possible that the other 9 pitchers Rivera is compared to pitched in great hitters' parks, so they look weak next to him. Or maybe Rivera had much better fielders behind him. I have not checked that. Also, the top 10 included both leagues, whereas the league averages came only from the league each pitcher was in (for Gossage, the NL in 1977 and 1984-85 and the AL from 1978-83).


Wednesday, March 13, 2013

Rabbit Maranville, Mr. RBI

Monday, March 11, 2013

Rich Gossage vs. Mariano Rivera