Friday, April 27, 2018

Shortstops with at least 120 OPS+ thru age 22

Minimum 1000 PAs, with at least 50% of games played at SS. Data from the Baseball Reference Play Index.

Rogers Hornsby    150
Carlos Correa    138
Arky Vaughan    137
Alex Rodriguez    132
Vern Stephens    127
Cal Ripken    126
Jim Fregosi    122

By the way, Hornsby was 2nd in the NL in defensive WAR in both 1917 and 1918 while being almost always a SS (he also played 3 games in the OF in 1918).

Sunday, April 22, 2018

Did a cold and related eye twitch prevent Joe DiMaggio from batting .400 in 1939?

See DiMaggio’s Mysterious Plunge by John B. Holway.

DiMaggio's average was .409 through Sept. 9th but he hit only .233 in his last 73 ABs to finish at .381. One report said he had a virus that had sidelined pitcher Red Ruffing and "Years later Joe would say the cold he had developed on the 9th had caused a painful eye twitch."

But manager Joe McCarthy would not give him a couple of games off to recuperate.
"Can we believe that McCarthy didn’t know that his greatest star — the brightest star in America — was in agony? If he did know, did he really make DiMag stay in the game and suffer? Why???!!! McCarthy was an alcoholic, who kept a brown bag with him on the end of the bench, where he sat alone while the coaches apparently ran the team. He was sometimes referred to as a “push button manager,” the closest anyone came to whispering the problem. Eleven years later he would be found literally in the gutter and was swiftly and quietly whisked home to Buffalo and oblivion. No one then, or now, uttered the word “alcoholism.”"
If the coaches ran the team, it seems like they could have kept DiMaggio out of the lineup, or pulled him after just one AB, to get him some rest.

DiMaggio finished with 462 ABs. That means he needed 185 hits to reach .400. He was 17 for his last 73, and finished with a total of 176 hits. So he was 9 hits short of .400. If he had gone 26 for his last 73, he would have made it. That is still a .356 average and even a guy at .400 has no guarantee of reaching that level for a short spurt.
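The arithmetic in the paragraph above can be checked in a few lines of Python. This is only a sketch of that calculation: the variable names are made up for illustration, and the totals through Sept. 9 (389 AB, 159 hits) are derived by subtraction from the figures quoted above rather than taken from a separate source.

```python
import math

season_ab = 462      # DiMaggio's 1939 at-bats
season_hits = 176    # his actual hit total
last_ab = 73         # at-bats from Sept. 10 on
last_hits = 17       # hits in that stretch

# Hits needed for a .400 season, rounded up to a whole hit.
hits_needed = math.ceil(0.400 * season_ab)

# Totals through Sept. 9, derived by subtraction (an inference, not a quoted figure).
early_ab = season_ab - last_ab
early_hits = season_hits - last_hits

# Hits he needed over the final 73 AB to finish at .400.
required_last_hits = hits_needed - early_hits

print("hits needed for .400:", hits_needed)
print("hits short:", hits_needed - season_hits)
print("average through Sept. 9:", round(early_hits / early_ab, 3))
print("needed in last 73 AB:", required_last_hits,
      "or", round(required_last_hits / last_ab, 3))
```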

Using the binomial distribution function in Excel, I get that there is a 15% chance of DiMaggio getting 25 or fewer hits in his last 73 ABs (assuming he was truly a .409 hitter).

But was he truly a .409 hitter that year? Or could he have been expected to be one in Sept.? Players usually hit worse in Sept., perhaps due to colder weather. The Yankees as a team batted .270 in Sept. while it was .287 for the whole season (it was .290 thru Aug.). Using the Baseball Reference Play Index, it looks like the Yankee regulars were still used as much as usual in Sept. Maybe with a big lead, they did not quite have their usual incentive. They never led by less than 11.5 games the whole month. Anyway, if we reduce DiMaggio by 20 points then we expect only a .389 average from Sept. 10th on.

DiMaggio himself batted .319 in Sept/Oct games in his career while it was .325 overall (.326 thru Aug. for his career). So we had to expect some drop off. Also, in his last 19 games, only 2 were on the road and Yankee Stadium was not a good place for righties like DiMaggio (it was 467 feet to left center). In his career, he batted .315 at home and .334 on the road. So maybe we reduce him .008 for this. He is down to .381.

DiMaggio also had 78 of his 109 ABs in Sept. against righties. That is 71.6%. It was 66.7% for the whole season. Maybe that is 5 ABs more than usual against righties. In his career, DiMaggio batted .342 vs. lefties and .316 vs. righties. This would be a slight effect. Maybe .001. So he is down to .380.

That would still get him a .400 AVG for the season (he would finish at .403 or .404). But what are the odds that he falls short of the .356 he needed if we assume he was a true .380 hitter? Using the binomial distribution again, I get a 30% chance that he gets 25 or fewer hits in his last 73 ABs if he is a true .380 hitter. That is a high enough chance that we can't assume it was just his illness, eye twitch and lack of rest due to a bad manager that cost him his .400 season.

Now I might have penalized him too much for a lower Sept. average by the Yankees as a whole. If I cut that penalty in half and we make him a true .390 hitter, there is still a 24% chance he does not get the 26 hits he needed in those last 19 games to finish at .400.
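For anyone who wants to reproduce the binomial numbers without Excel, here is a minimal Python sketch. It assumes, as the analysis above does, that each AB is an independent trial with the same hit probability. The function name and the scenario labels are made up for illustration. The adjustments are kept in points of batting average (thousandths) so the subtraction stays exact.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Adjustments in points of batting average, as argued in the post.
BASE = 409           # average through Sept. 9
SEPT_PENALTY = 20    # team-wide September drop
HOME_PENALTY = 8     # 17 of the last 19 games at Yankee Stadium
RIGHTY_PENALTY = 1   # a few extra ABs against right-handers

scenarios = {
    "no adjustment": BASE,
    "half September penalty": BASE - SEPT_PENALTY // 2 - HOME_PENALTY - RIGHTY_PENALTY,
    "full September penalty": BASE - SEPT_PENALTY - HOME_PENALTY - RIGHTY_PENALTY,
}

# Chance of 25 or fewer hits in the last 73 AB, i.e. falling short of .400.
shortfall = {name: binom_cdf(25, 73, pts / 1000) for name, pts in scenarios.items()}

for name, pts in scenarios.items():
    print(f"{name}: true average .{pts}, P(25 or fewer hits) = {shortfall[name]:.2f}")
```

The three probabilities should land near the 15%, 24% and 30% figures quoted in the post.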

Friday, April 6, 2018

Explaining Bob Gibson’s 1968 Season


(This was originally published at Beyond the Box Score in 2006)

Most fans know that Gibson had an incredibly low ERA of 1.12 in 1968. Even considering that the league ERA was just 2.98 that year, what he did is still great. What explains this performance? Did Gibson have a great “stuff” year? Was he lucky? Or was it a combination of luck and skill? If so, how much of each?

Luck is sometimes a factor in baseball. Some years a guy hits well with runners in scoring position (RISP), some years he doesn’t. In 2004, A.J. Pierzynski, for example, hit .272 overall but .307 with RISP. In 2005, he hit .257 overall but only .236 with RISP. It’s not likely he forgot how to hit with RISP all of a sudden. For pitchers, the batting average they allow on balls in play may be out of their hands (as pointed out by Voros McCracken). A pitcher might get lucky on balls in play one year, with more getting caught than normal (or his fielders might be especially good one year). Is this what happened to Gibson?

To test this, I ran a regression in which a pitcher’s ERA was the dependent variable and his strikeouts, walks and HRs allowed per 9 IP were the independent variables. I used the regression equation to predict each pitcher’s ERA then found out how much it differed from his actual ERA. If a pitcher had an ERA lower than what his strikeouts, walks and HRs allowed per 9 IP predicted, he most likely gave up fewer hits on balls in play than average. Here is the regression equation

(1) ERA = 2.19 + 1.436*HR - .159*SO + .303*BB

Again, all stats are per 9 IP. BB includes both walks and HBP. The data includes all pitchers who qualified for the ERA title from 1963-68 (I used this period since it was an especially low scoring period). There were 420 pitchers. The r-squared was .548, meaning that 54.8% of the variation in ERA across pitchers is explained by the equation. The standard error is .469.

Plugging Gibson’s 1968 data into equation (1) gives a predicted ERA of 2.02. That is a very large 0.90 above his actual ERA of 1.12. So it appears that he must have done especially well on balls in play (more on this later). The table below shows the leaders in how far their actual ERA fell below their predicted ERA.

Rank  Pitcher         Year  ERA   Pred  Diff
1     Rickey Clark    1967  2.59  3.81  -1.22
2     Joe Horlen      1968  2.37  3.48  -1.11
3     Carl Willey     1963  3.10  4.18  -1.08
4     Lee Stange      1963  2.62  3.68  -1.06
5     Jim Perry       1965  2.63  3.64  -1.01
6     Dave McNally    1968  1.95  2.92  -0.97
7     Tommy John      1968  1.98  2.91  -0.93
8     Bob Gibson      1968  1.12  2.02  -0.90
9     Bob Veale       1968  2.06  2.95  -0.89
10    Joe Horlen      1967  2.06  2.93  -0.87
11    Vern Law        1965  2.16  3.01  -0.85
12    Sonny Siebert   1967  2.38  3.21  -0.83
13    Pete Richert    1965  2.60  3.41  -0.81
14    Joe Horlen      1964  1.88  2.69  -0.81
15    Phil Niekro     1967  1.87  2.68  -0.81
16    Tracy Stallard  1965  3.39  4.19  -0.80
17    Bobby Bolin     1966  2.89  3.69  -0.80
18    Denny McLain    1968  1.96  2.75  -0.79
19    Eddie Fisher    1965  2.40  3.17  -0.77
20    Jim Perry       1966  2.54  3.29  -0.75
21    Luis Tiant      1968  1.60  2.34  -0.74
22    Jim Bouton      1963  2.53  3.26  -0.73
23    Jerry Koosman   1968  2.08  2.80  -0.72
24    Milt Pappas     1965  2.61  3.33  -0.72
25    Steve Blass     1968  2.13  2.85  -0.72
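To make the use of equation (1) concrete, here is a small Python sketch that plugs one stat line into it. The coefficients come from equation (1). The function names are made up for illustration. Gibson's 1968 counting stats (304 2/3 IP, 11 HR, 268 SO, 62 BB, 7 HBP) are not quoted in this post, so treat them as assumed inputs from his standard stat line.

```python
def per9(count, innings):
    """Convert a season total to a per-9-innings rate."""
    return 9 * count / innings

def predicted_era_eq1(hr9, so9, bb9):
    """Equation (1): ERA from per-9 HR, SO and BB (BB includes HBP)."""
    return 2.19 + 1.436 * hr9 - 0.159 * so9 + 0.303 * bb9

# Assumed inputs: Gibson's 1968 line, not taken from this post.
ip = 304 + 2 / 3
hr, so, bb, hbp = 11, 268, 62, 7

pred = predicted_era_eq1(per9(hr, ip), per9(so, ip), per9(bb + hbp, ip))
actual = 1.12

print("predicted ERA:", round(pred, 2))
print("predicted minus actual:", round(pred - actual, 2))
```

With these inputs the prediction comes out at roughly 2.02, in line with the figure in the table.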

Gibson is not first in “Diff” but he did have a big one at #8 (something interesting may be going on with the White Sox, with John, Horlen and Fisher all being up there). The table below shows the 25 lowest predicted ERAs.

Rank  Pitcher        Year  ERA   Pred
1     Bob Gibson     1968  1.12  2.02
2     Sandy Koufax   1963  1.88  2.06
3     Sandy Koufax   1964  1.74  2.16
4     Sandy Koufax   1965  2.04  2.18
5     Sandy Koufax   1966  1.73  2.20
6     Bob Moose      1968  2.74  2.22
7     Bob Bruce      1964  2.76  2.23
8     Bob Veale      1965  2.84  2.24
9     Bill Singer    1967  2.65  2.24
10    Don Sutton     1968  2.60  2.25
11    Mike Cuellar   1966  2.22  2.28
12    Chris Short    1964  2.20  2.28
13    Sam McDowell   1965  2.18  2.29
14    Gaylord Perry  1966  2.99  2.30
15    Dean Chance    1964  1.65  2.31
16    Luis Tiant     1968  1.60  2.34
17    Jim O'Toole    1964  2.66  2.35
18    Whitey Ford    1964  2.13  2.37
19    Gaylord Perry  1968  2.44  2.37
20    Tom Seaver     1968  2.20  2.38
21    Dean Chance    1968  2.53  2.39
22    Bob Gibson     1967  2.98  2.40
23    Don Drysdale   1964  2.19  2.40
24    Gary Peters    1963  2.33  2.41
25    Mike Cuellar   1968  2.74  2.42


Notice that Gibson’s 1968 season, although the best, does not dominate the way his actual ERA dominates. The next table shows the lowest 25 actual ERAs from the period.

Rank  Pitcher         Year  ERA   Pred
1     Bob Gibson      1968  1.12  2.02
2     Luis Tiant      1968  1.60  2.34
3     Dean Chance     1964  1.65  2.31
4     Sandy Koufax    1966  1.73  2.20
5     Sandy Koufax    1964  1.74  2.16
6     Sam McDowell    1968  1.81  2.53
7     Phil Niekro     1967  1.87  2.68
8     Sandy Koufax    1963  1.88  2.06
9     Joe Horlen      1964  1.88  2.69
10    Dave McNally    1968  1.95  2.92
11    Denny McLain    1968  1.96  2.75
12    Bobby Bolin     1968  1.98  2.60
13    Gary Peters     1966  1.98  2.62
14    Tommy John      1968  1.98  2.91
15    Sandy Koufax    1965  2.04  2.18
16    Stan Bahnsen    1968  2.05  2.71
17    Joe Horlen      1967  2.06  2.93
18    Bob Veale       1968  2.06  2.95
19    Jerry Koosman   1968  2.08  2.80
20    Dick Ellsworth  1963  2.10  2.62
21    Whitey Ford     1964  2.13  2.37
22    Steve Blass     1968  2.13  2.85
23    Juan Marichal   1965  2.14  2.67
24    Don Drysdale    1968  2.15  2.63
25    Vern Law        1965  2.16  3.01

In predicted ERA, there are 24 pitchers within .40 of Gibson. But in actual ERA, nobody is that close: the nearest, Luis Tiant at 1.60, is .48 behind. So using only pitcher-determined outcomes (strikeouts, walks and HRs allowed) brings Gibson back down to earth. He is the leader, but he is not so far away from the rest of the pitchers.

Now that we have seen these results, let’s check to see if Gibson did indeed have a low batting average allowed on balls in play (BABIP) in 1968. The table below shows Gibson’s BABIP for each year of his career along with the BABIP of the entire Cardinal staff (including Gibson). Notice that both his lowest BABIP and his biggest gap below the Cards’ staff came in 1968.

Year  BABIP-Gibson  BABIP-Cards  Diff
1961  0.283         0.276        0.007
1962  0.249         0.272        -0.023
1963  0.271         0.267        0.004
1964  0.272         0.275        -0.002
1965  0.256         0.273        -0.016
1966  0.240         0.267        -0.027
1967  0.280         0.268        0.012
1968  0.230         0.264        -0.034
1969  0.270         0.269        0.002
1970  0.299         0.293        0.005
1971  0.270         0.291        -0.021
1972  0.263         0.277        -0.014
1973  0.255         0.271        -0.015
1974  0.272         0.274        -0.002
1975  0.303         0.283        0.020
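Subtracting the two rounded BABIP columns does not always reproduce the published Diff column, presumably because Diff was computed from unrounded values. The Python sketch below checks this using only the numbers in the table, stored as whole thousandths so the subtraction is exact. The variable names are made up for illustration.

```python
# (year, Gibson BABIP, Cards BABIP, published Diff), all in thousandths
rows = [
    (1961, 283, 276, 7), (1962, 249, 272, -23), (1963, 271, 267, 4),
    (1964, 272, 275, -2), (1965, 256, 273, -16), (1966, 240, 267, -27),
    (1967, 280, 268, 12), (1968, 230, 264, -34), (1969, 270, 269, 2),
    (1970, 299, 293, 5), (1971, 270, 291, -21), (1972, 263, 277, -14),
    (1973, 255, 271, -15), (1974, 272, 274, -2), (1975, 303, 283, 20),
]

# Years where the rounded columns disagree with the published Diff.
mismatches = [(year, gib - cards, diff)
              for year, gib, cards, diff in rows
              if gib - cards != diff]

# Season with the largest gap below the staff, using the published Diff.
best_year = min(rows, key=lambda r: r[3])[0]

for year, recomputed, published in mismatches:
    print(year, "recomputed:", recomputed, "published:", published)
print("largest gap below staff:", best_year)
```

Every mismatch is a single point (.001), which is what rounding alone would produce.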

In some years Gibson had a lower BABIP than the Cards staff, in other years, higher. But he definitely had a low BABIP in 1968 (some of the numbers in the “Diff” column may look slightly wrong due to rounding). The next table shows the lowest 25 BABIPs of the period.

Rank  Pitcher        Year  BABIP
1     Dave McNally   1968  0.202
2     Joe Horlen     1967  0.214
3     Wally Bunker   1964  0.214
4     Joe Horlen     1964  0.216
5     Luis Tiant     1968  0.216
6     Phil Ortega    1966  0.221
7     Denny McLain   1966  0.221
8     Juan Marichal  1966  0.221
9     Bobby Bolin    1966  0.222
10    Dick Hughes    1967  0.224
11    Carl Willey    1963  0.224
12    Sonny Siebert  1967  0.225
13    George Brunet  1968  0.225
14    Sonny Siebert  1968  0.225
15    Jim Bouton     1964  0.226
16    Ernie Broglio  1963  0.229
17    Lew Krausse    1968  0.229
18    Bob Gibson     1968  0.230
19    Rickey Clark   1967  0.231
20    Pete Richert   1966  0.231
21    Denny McLain   1968  0.231
22    Jim Bouton     1963  0.231
23    Ken McBride    1963  0.233
24    Bobby Bolin    1968  0.233
25    Moe Drabowsky  1963  0.233


Gibson did not have the lowest BABIP, but he was #18.

If we try to predict Gibson’s ERA using regression analysis and also include hits on balls in play, we will be able to predict his ERA much more accurately. I ran a regression in which a pitcher’s ERA was the dependent variable and his non-HR hits, walks and HRs allowed per 9 IP were the independent variables (since I am using what happens on balls in play here, it is not necessary to put strikeouts in: every strikeout means one less chance for a hit, and the number of hits is already accounted for in the model).

(2) ERA = -2.17 + 1.397*HR + .466*NONHR + .310*BB

The r-squared was .811, meaning that 81.1% of the variation in ERA across pitchers is explained by the equation. The standard error is .303. I then predicted each pitcher’s ERA using equation (2) and found how much that differed from their actual ERA. It predicted Gibson to have a 1.49 ERA. This is only .37 above his actual ERA, much more accurate than equation (1), which was off by .90. But the point here is not to find which equation is most accurate. The point is that once you include what happens on balls in play, we get a much more accurate picture of Gibson’s performance. And in this case Gibson was off by just 1.22 standard errors (.37/.303) while he was off by 1.92 standard errors with equation (1) (.90/.469). This supports the thesis that Gibson was helped quite a bit by his low BABIP.
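Equation (2) and the standard-error comparison can be sketched in Python as follows. The coefficients, misses and standard errors are the ones quoted above. The function names are made up for illustration. Gibson's 1968 counting stats (304 2/3 IP, 198 H, 11 HR, 62 BB, 7 HBP) are assumed inputs that do not appear in this post.

```python
def per9(count, innings):
    """Convert a season total to a per-9-innings rate."""
    return 9 * count / innings

def predicted_era_eq2(hr9, nonhr9, bb9):
    """Equation (2): ERA from per-9 HR, non-HR hits and BB (BB includes HBP)."""
    return -2.17 + 1.397 * hr9 + 0.466 * nonhr9 + 0.310 * bb9

# Assumed inputs: Gibson's 1968 line, not taken from this post.
ip = 304 + 2 / 3
hits, hr, bb, hbp = 198, 11, 62, 7

pred2 = predicted_era_eq2(per9(hr, ip), per9(hits - hr, ip), per9(bb + hbp, ip))

# Size of each equation's miss on Gibson, in standard errors of that regression.
miss_eq1 = 0.90 / 0.469
miss_eq2 = 0.37 / 0.303

print("equation (2) prediction:", round(pred2, 2))
print("miss in standard errors, eq (1):", round(miss_eq1, 2))
print("miss in standard errors, eq (2):", round(miss_eq2, 2))
```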

A few weeks ago I posted an article about the best seasons in something called “Fielding Independent ERA” or FIP ERA (see sources at the end of this article). In that article I used a more sophisticated approach than I used here. The lowest 25 FIP ERAs of this period are in the table below.

Rank  Pitcher        Year  FIP ERA
1     Sam McDowell   1965  1.96
2     Bob Gibson     1968  2.21
3     Sandy Koufax   1963  2.23
4     Al Downing     1963  2.25
5     Sandy Koufax   1966  2.28
6     Bob Veale      1965  2.30
7     Gary Peters    1963  2.32
8     Sonny Siebert  1965  2.32
9     Gaylord Perry  1966  2.32
10    Sandy Koufax   1964  2.38
11    Dick Radatz    1964  2.39
12    Luis Tiant     1968  2.39
13    Steve Hargan   1966  2.43
14    Dean Chance    1964  2.46
15    Mike Cuellar   1966  2.49
16    Chris Short    1964  2.50
17    Whitey Ford    1964  2.51
18    Bill Singer    1967  2.53
19    Sam McDowell   1968  2.55
20    Sam McDowell   1966  2.56
21    Bob Bruce      1964  2.57
22    Bob Moose      1968  2.60
23    Dean Chance    1968  2.62
24    Jim O'Toole    1964  2.63
25    Jim Maloney    1963  2.67

Notice that Gibson is only second (the FIP ERAs do not completely correspond to the predicted or actual ERAs mentioned earlier since all ERAs in the FIP ERA study are normalized to a league with an ERA of about 3.70). The FIP ERAs here are also different because HRs allowed were adjusted for park effects, something not done for the above analysis.

We can tell from the following stats that Gibson’s performance in 1968 was not as far above his other seasons as ERAs alone would indicate. The table below shows his strikeouts, walks and HRs allowed per batter faced for each year in his career with 100 or more IP. 1968 is clearly his best, but some other seasons rival it. In 1970, for example, his strikeout and HR rates are very close to 1968. In 1967, his strikeout rate and BB rate were similar to that of 1968. But 1970 was a much higher scoring season than 1968 with the NL ERA being 4.05. Gibson’s fielding independent stats in 1970 are almost as good as they were in 1968.

Year  SO/BFP  BB/BFP  HR/BFP
1961  0.181   0.136   0.014
1962  0.215   0.109   0.016
1963  0.188   0.100   0.017
1964  0.206   0.080   0.021
1965  0.219   0.092   0.028
1966  0.201   0.074   0.018
1967  0.209   0.061   0.014
1968  0.231   0.059   0.009
1969  0.212   0.083   0.009
1970  0.226   0.076   0.011
1971  0.180   0.081   0.014
1972  0.186   0.081   0.013
1973  0.180   0.076   0.015
1974  0.124   0.105   0.023
1975  0.120   0.132   0.020

The next table shows his FIP ERAs for the years 1961-1974 (seasons in which he had at least 150 IP).


Year  FIP ERA
1961  3.08
1962  2.56
1963  3.31
1964  3.15
1965  3.47
1966  3.08
1967  2.69
1968  2.21
1969  2.46
1970  1.96
1971  3.03
1972  2.90
1973  3.12
1974  4.87

As mentioned earlier, these FIP ERAs are all normalized for a league with about a 3.70 ERA. According to this, Gibson was better in 1970 than in 1968. That is, taking park effects into account to adjust HRs, using only pitcher-controlled stats and comparing to the league average shows his 1970 season to be even better. This, too, suggests that 1968 was helped quite a bit by a very low BABIP. The big difference between the two seasons was his BABIP of .230 in 1968 and his .299 BABIP in 1970. The 1968 BABIP was far below the team BABIP while his 1970 BABIP was above the team BABIP.

I also broke down his performance into RISP and non-RISP situations to try to understand how his ERA could have been so low. He allowed a batting average of .184 overall but just .141 with RISP (and .193 in non-RISP situations). To see if this made any difference, I ran a regression with ERA as the dependent variable and on-base percentage (OBP) and slugging percentage (SLG) as the independent variables. That predicted Gibson to have an ERA of 1.25 (just using pitchers from 1968). Then I broke down OBP and SLG into RISP and non-RISP situations. The resulting regression equation predicted Gibson to have an ERA of 1.13. So his RISP performance also helped a little in making his ERA so low, since the regression that took RISP into account was more accurate. His career average allowed overall was .228 while with RISP it was .219. Those two are pretty close, indicating that Gibson probably did not have any special ability with RISP. He just happened to do very well in those situations in 1968.

I also took the natural log of ERA in one of the regressions in case ERA had a non-linear relationship with the other stats. Doing this did not improve the results.

There is one other issue with the fielding in 1968. It appears that Gibson got a little lucky in 1968 with more balls in play than average being turned into outs. Perhaps the fielders were playing better or trying harder behind Gibson that year than they were for the other Cardinal pitchers. But Gibson actually gave up more unearned runs than would be expected as compared to the entire Cardinal staff. The staff ERA was 2.49, or 1.37 above Gibson. But if we include all runs, Gibson gave up 1.45 runs per 9 IP while it was 2.87 for the whole staff, or 1.42 higher than Gibson. So by adding in unearned runs, the difference between Gibson and the team grows. Gibson’s runs per 9 IP was 29.46% above his ERA while for the whole staff it was 15.26% higher. So Gibson was hurt more by unearned runs than the whole team, indicating that his fielders hurt him. Perhaps they were hustling more and simply got to more balls, leading to more errors. Maybe official scorers called more errors than expected to protect Gibson’s ERA. (It is true that Gibson’s runs per 9 IP is .33 higher than his ERA while it is .38 for the whole staff, possibly showing that Gibson was hurt less by unearned runs. But with fewer base runners allowed, any given error should have hurt him less, so a .33 increase is proportionally worse for him than a .38 increase was for the whole team.)
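The unearned-run comparison above comes down to a few subtractions and ratios. The Python sketch below lays them out using only the four rates quoted in the paragraph. The variable names are made up for illustration.

```python
# Rates quoted in the post: ERA and total runs allowed per 9 IP (RA9).
gibson_era, gibson_ra9 = 1.12, 1.45
staff_era, staff_ra9 = 2.49, 2.87

# Gap between the staff and Gibson, first on earned runs and then on all runs.
gap_era = staff_era - gibson_era
gap_ra9 = staff_ra9 - gibson_ra9

# Unearned runs per 9 IP in absolute terms.
gibson_unearned = gibson_ra9 - gibson_era
staff_unearned = staff_ra9 - staff_era

# Unearned runs as a percentage increase over ERA.
gibson_pct = (gibson_ra9 / gibson_era - 1) * 100
staff_pct = (staff_ra9 / staff_era - 1) * 100

print("gap on ERA:", round(gap_era, 2), "gap on RA9:", round(gap_ra9, 2))
print("unearned per 9:", round(gibson_unearned, 2), "vs", round(staff_unearned, 2))
print("percent above ERA:", round(gibson_pct, 2), "vs", round(staff_pct, 2))
```

The absolute and proportional comparisons point in opposite directions, which is the tension the parenthetical note above is wrestling with.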

Sources:

The Complete Baseball Encyclopedia from Lee Sinins

The Best Fielding Independent Pitching Seasons From 1920-2005