Sunday, November 30, 2014

Would the “wide arc” of DiMaggio’s swing have made him more vulnerable to strikeouts against the higher velocity of pitchers in today’s game, as John Thorn suggests?

Yankees great Joe DiMaggio was overrated, says MLB historian. Excerpt:
"Perhaps most remarkably, especially when compared to the current era in baseball when hitters strike out more than ever, DiMaggio never struck out more than 39 times in a season. In 1941, the year of his famous 56-game hitting streak, DiMaggio struck out a total of 13 times.

By comparison, 2014 AL MVP Mike Trout struck out 184 times, the highest total in the majors.

Yet Thorn makes the case that the “wide arc” of DiMaggio’s swing would have made him more vulnerable to strikeouts against the higher velocity of pitchers in today’s game."
This might be true, but all players would have to deal with the faster pitch speeds.

It wasn't just that DiMaggio had low strikeout totals. His HR-to-strikeout ratio was also astronomical, especially considering that he was a right-handed batter in Yankee Stadium. See my post Which Players Had The Best HR-To-Strikeout Ratios?

DiMaggio hit 2.77 HRs for every one that the average player hit while he only struck out 59% as often (for a ratio of 4.69).
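The 4.69 figure is just the relative HR rate divided by the relative strikeout rate. Here is a quick check in Python, using only the rounded numbers quoted above (the variable names are mine, for illustration):

```python
# Rounded figures quoted in the post; variable names are illustrative.
hr_rel = 2.77   # DiMaggio's HR rate as a multiple of the league average
so_rel = 0.59   # DiMaggio's strikeout rate as a share of the league average

# Relative HR-to-strikeout ratio: how many times better than average.
relative_ratio = hr_rel / so_rel
print(round(relative_ratio, 2))
```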

In fact, the only player to have a higher HR-to-strikeout ratio relative to the league average was Ken Williams of the St. Louis Browns. His home field, Sportsman's Park, was a great hitter's park.

DiMaggio hit only 41% of his HRs at home in his career while Williams hit 72%. So it is likely the case that DiMaggio would rank first, and probably by a wide margin, if HRs were park adjusted.

DiMaggio faced Bob Feller 138 times. He hit 6 HRs while striking out only 7 times. Feller struck out 16.9% of the batters he faced from 1938-51. He allowed a HR% of 1.3%. DiMaggio struck out much less than average against Feller and hit HRs more frequently. So it looks like he could adapt to fast pitchers.
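To put the Feller matchup in rate terms, here is a small sketch in Python. It only uses the counts and percentages quoted above; the variable names are made up for illustration, and with just 138 plate appearances the rates are rough.

```python
# Counts quoted above for DiMaggio vs. Feller; names are illustrative.
pa, hr, so = 138, 6, 7

# Feller's rates against all batters, 1938-51, as quoted above.
feller_so_rate = 0.169
feller_hr_rate = 0.013

dimaggio_so_rate = so / pa   # about 5.1% of PAs
dimaggio_hr_rate = hr / pa   # about 4.3% of PAs

# DiMaggio relative to the typical batter facing Feller.
so_relative = dimaggio_so_rate / feller_so_rate   # roughly 0.3 times as often
hr_relative = dimaggio_hr_rate / feller_hr_rate   # roughly 3.3 times as often
print(round(dimaggio_so_rate, 3), round(dimaggio_hr_rate, 3))
print(round(so_relative, 2), round(hr_relative, 2))
```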

DiMaggio faced Hal Newhouser 60 times (Newhouser was 3rd behind Feller and Tommy Bridges in strikeouts per 9 IP from 1938-51 in the AL). He had 6 HRs and just one strikeout. He faced Bridges 7 times with 1 HR and no strikeouts.

Data from Baseball Reference and the Baseball Reference Play Index.

Joe DiMaggio Led MLB In Road Slugging Percentage, 1936-51

Minimum 2500 PAs. Here is the top 10

Joe DiMaggio 0.610
Ted Williams 0.607
Stan Musial 0.581
Jimmie Foxx 0.528
Johnny Mize 0.528
Hank Greenberg 0.526
Ralph Kiner 0.525
Jeff Heath 0.514
Walker Cooper 0.511
Charlie Keller 0.510

Here is the top 10 in all games

Ted Williams 0.633
Hank Greenberg 0.619
Stan Musial 0.584
Ralph Kiner 0.582
Joe DiMaggio 0.579
Jimmie Foxx 0.573
Johnny Mize 0.568
Earl Averill 0.526
Hal Trosky 0.518
Charlie Keller 0.518

From 1939-51, here are the AVG-OBP-SLG for both DiMaggio and Williams in neutral parks (with Fenway and Yankee Stadium taken out)

DiMaggio) .335-.417-.605
Williams) .333-.469-.617

Yes, Williams beats DiMaggio in SLG. But it is fairly close, much closer than their career numbers. So under pretty much the same circumstances, DiMaggio slugged close to what Williams slugged. The big edge is OBP for Williams.

Now only looking at neutral parks leaves a lot of PAs out of the analysis. But DiMaggio's stats put him almost on the level of the guy many say was the greatest hitter ever.

Data from Baseball Reference and the Baseball Reference Play Index.

Friday, November 28, 2014

Should Joe DiMaggio's Offensive Value Be Estimated Upwards Because Of Yankee Stadium?

His road stats were much better than his home stats. In those days, it was over 400 feet to left-center field (I think 407). And players normally hit better at home than on the road. So I tried to estimate what his career stats might have been in light of this if he had played in a fair park and then estimate how many runs this would add to an average team.

The table below shows his splits. Data from the Baseball Reference Play Index

Split AVG OBP SLG
Home 0.315 0.391 0.546
Away 0.333 0.405 0.610

Now the league splits from 1936-51

Split AVG OBP SLG
Home 0.273 0.350 0.394
Away 0.261 0.335 0.373

So players normally had an OBP that was .015 higher at home and a SLG that was .021 higher. What if DiMaggio had played in a fair park his whole career and he had these same differentials?

His home OBP and SLG would be .420 and .631. If those are averaged with his road numbers of .405 and .610, he would have a career OBP of .413 and a career SLG of .621.

That is better than his actual numbers of .398 & .579. So his OBP goes up .015 and his SLG goes up .042. That would raise a team's OBP and SLG by 0.0016 & 0.0046, respectively (assuming he has one ninth of a team's ABs and PAs).
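The fair-park adjustment can be sketched in a few lines of Python. This is only a rough check built from the rounded splits quoted in this post; the dictionary and variable names are mine, made up for illustration.

```python
# Rounded splits quoted above (OBP, SLG); names are illustrative only.
league_home = {"OBP": 0.350, "SLG": 0.394}
league_away = {"OBP": 0.335, "SLG": 0.373}
dimaggio_away = {"OBP": 0.405, "SLG": 0.610}
dimaggio_actual = {"OBP": 0.398, "SLG": 0.579}

fair_park = {}
for stat in ("OBP", "SLG"):
    # League-wide home advantage for this stat (.015 OBP, .021 SLG).
    home_edge = league_home[stat] - league_away[stat]
    # Hypothetical home number: his road number plus the normal home edge.
    fair_home = dimaggio_away[stat] + home_edge
    # Career number if home and road are weighted equally.
    fair_career = (fair_home + dimaggio_away[stat]) / 2
    # Gain over his actual career number, and the share of that gain
    # one hitter in nine adds to the team total.
    gain = fair_career - dimaggio_actual[stat]
    fair_park[stat] = {"home": fair_home, "career": fair_career,
                       "gain": gain, "team_gain": gain / 9}
    print(stat, round(fair_home, 3), round(fair_career, 4), round(gain / 9, 4))
```

Before rounding, the career gains work out to about .0145 in OBP and .0415 in SLG. Dividing those by nine gives the 0.0016 and 0.0046 team-level changes.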

How many extra runs would this mean? I ran a regression with runs per game as the dependent variable and OBP & SLG as the independent variables for all MLB teams from 1936-51. Here is the equation

R/G = 11.19*SLG + 19.20*OBP - 6.17

Plugging in the 0.0016 & 0.0046 changes in team OBP and SLG, we get 0.0825 more runs per game or 12.7 per 154-game season. That is about one extra win per season.

DiMaggio played 1736 games. That is 11.27 154-game seasons. That times 12.7 is 143 runs, which adds about 14 wins to his career value.
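Here is a rough Python sketch of the run and win arithmetic. It feeds the unrounded fair-park gains into the regression equation above. The 10 runs per win conversion is a common rule of thumb that I am assuming here; it does not come from the regression, and the names are illustrative.

```python
# Coefficients from the 1936-51 regression equation above.
SLG_COEF, OBP_COEF = 11.19, 19.20

# Fair-park career numbers minus actual career numbers, before rounding.
obp_gain = (0.420 + 0.405) / 2 - 0.398   # about .0145
slg_gain = (0.631 + 0.610) / 2 - 0.579   # about .0415

# One hitter supplies roughly one ninth of a team's plate appearances.
runs_per_game = SLG_COEF * slg_gain / 9 + OBP_COEF * obp_gain / 9
runs_per_season = runs_per_game * 154
career_runs = runs_per_season * (1736 / 154)   # 1736 career games

RUNS_PER_WIN = 10  # assumption: rule of thumb, not estimated here
career_wins = career_runs / RUNS_PER_WIN

print(round(runs_per_game, 4), round(runs_per_season, 1))
print(round(career_runs), round(career_wins, 1))
```

Plugging in the already rounded 0.0016 and 0.0046 instead gives a slightly lower figure of about 0.082 runs per game.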

He has 78.2 career WAR, good for 41st among position players. This adjustment would give him 92.8, putting him at 28th.

Wednesday, November 19, 2014

Is The Run Value Of Stealing A Base Different Than The Run Value Of Allowing A Stolen Base?

Maybe this is just a statistical artifact or something quirky is going on. But I ran regressions with runs scored per game and runs allowed per game as the dependent variables and OBP, SLG, SB, CS, GDP, and ROE (reached on errors) as the independent variables (the last four were all per game). I used all teams from 2005-14 and the data was from the Baseball Reference Play Index.

Here is the regression for runs scored per game

R/G = 9.8*SLG + 17.17*OBP - 0.308*GDP - 0.394*CS + 0.143*SB + 0.54*ROE - 5.09

Now the regression for runs allowed per game

RA/G = 9.4*SLG + 17.57*OBP - 0.188*GDP - 0.446*CS + 0.302*SB + 0.86*ROE - 5.35

So the value of stealing a base is .143 runs while the cost of allowing one is .302 runs (it seems like there are big differences in GDP and ROE as well). I can't think of any reason why there would be a big difference here.

I started looking at this because when I added variables like SB differential, etc. to my regressions estimating winning pct based on OPS differential, the value of the SB differential seemed too high.

If we look at a team like the 2010 Red Sox, they allowed 1.04 SBs per game while having 0.259 CSs per game. If I use the coefficient values for RA/G, they allowed about .2 runs per game from stealing. That would be about 32 runs per season or about 3 wins.

If I used the values from the R/G regression, they would have allowed about .047 runs per game from stealing or 7.59 per season. That is not even one win. So the difference between the two methods is about 2.5 wins.
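The two estimates for the 2010 Red Sox can be reproduced with a short Python sketch. It uses only the SB and CS coefficients from the two regressions above and the per-game rates quoted for the team. The function name and the 10 runs per win conversion are my assumptions.

```python
GAMES = 162
RUNS_PER_WIN = 10  # assumption: rule of thumb for converting runs to wins

# 2010 Red Sox opponents: steals and caught stealings per game.
sb_pg, cs_pg = 1.04, 0.259

def steal_runs_per_game(sb_coef, cs_coef):
    """Net runs per game implied by a regression's SB and CS coefficients."""
    return sb_coef * sb_pg + cs_coef * cs_pg

ra_based = steal_runs_per_game(0.302, -0.446)   # runs-allowed regression
rs_based = steal_runs_per_game(0.143, -0.394)   # runs-scored regression

gap_runs = (ra_based - rs_based) * GAMES
gap_wins = gap_runs / RUNS_PER_WIN

print(round(ra_based, 3), round(ra_based * GAMES, 1))
print(round(rs_based, 3), round(rs_based * GAMES, 1))
print(round(gap_wins, 1))
```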

In a recent post, I found that over the last 5 years, the Red Sox seemed to underperform based on their OPS differential. See The Relationship Between OPS Differential And Winning Percentage Using 5 Year Averages. They won 5.23 fewer games on average each season than their OPS differential would predict (only the Rockies were worse at 5.4 fewer wins).

The 2011 Red Sox were similar, with 0.963 SBs per game and 0.309 CSs per game. The Red Sox have 4 teams in the top 21 in SB allowed per game from 2010-14. The Rockies, however, don't seem to have been that bad at allowing SBs, having only one year in the top 50 in SB allowed per game. So their underperformance must be due to something else.

Sunday, November 9, 2014

Which Teams Exceeded Their Win Total By The Most According To OPS Differential?

I have done some related posts recently, looking at the teams with the best OPS differentials since 1914 using data from Baseball Reference's Play Index. I also did a regression and the equation for winning pct was

Pct = 1.396*OPSDIFF + .500

Then I estimated each team's pct and calculated how many more games per 162 that they won than this equation would predict. Here are the top 10:

Team Year OPSDIFF W-L% Pred Diff Per-162
BSN 1914 0.002 0.614 0.503 0.111 17.96
STL 1987 -0.017 0.586 0.476 0.110 17.88
LAA 2008 0.014 0.617 0.519 0.098 15.85
CIN 1973 0.010 0.611 0.514 0.097 15.66
STL 1931 0.045 0.656 0.562 0.094 15.21
NYM 1969 0.017 0.617 0.524 0.093 15.07
NYY 2013 -0.048 0.525 0.433 0.092 14.86
MIN 1994 -0.088 0.469 0.378 0.091 14.81
PHA 1948 -0.032 0.545 0.455 0.090 14.59
BAL 1977 0.011 0.602 0.516 0.086 13.97

So the Braves in 1914 should have had a pct of .503 based on their OPS differential of .002. But it was actually .614. That gives them 17.96 more wins per 162 games than we might have expected. Maybe that is why they are called the "Miracle Braves!"
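Here is a minimal Python sketch of how the "per 162" column is built from the regression equation. The function names are mine. Since the table shows rounded inputs, the results land close to, but not exactly on, the listed values.

```python
def predicted_pct(ops_diff):
    """Winning pct predicted by the regression: 1.396*OPSDIFF + .500."""
    return 1.396 * ops_diff + 0.500

def extra_wins_per_162(actual_pct, ops_diff):
    """Wins per 162 games above what the OPS differential predicts."""
    return (actual_pct - predicted_pct(ops_diff)) * 162

# 1914 Braves: OPS differential .002, actual pct .614.
braves_pred = predicted_pct(0.002)
braves_extra = extra_wins_per_162(0.614, 0.002)

# 1987 Cardinals: OPS differential -.017, actual pct .586.
cards_extra = extra_wins_per_162(0.586, -0.017)

print(round(braves_pred, 3), round(braves_extra, 1), round(cards_extra, 1))
```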

I also found a regression equation for each decade but the list of the top overachieving teams was similar to this one.

We don't have splits for things like RISP, runners on base and close and late situations for 1914. So we can't tell if the Braves did especially well in those cases, which would explain a lot. The Braves were 33-20. But that is not much different than their overall pct.

They had what seems to be good fielding. Their fielding pct was .961 (the league average was .958). Their defensive efficiency rating was .701 and the league average was .698. That probably helped a bit, but I don't think that would explain their .614 pct. They stole 139 bases, as did their opponents. They turned 143 DPs and the next highest team had 119.

The 1987 Cardinals hit a lot better when it counted, as this table shows

Situation AVG OBP SLG OPS
High Lvrge 0.275 0.344 0.399 0.743
Medium Lvrge 0.267 0.350 0.389 0.739
Low Lvrge 0.254 0.328 0.357 0.686

It looks like their pitchers did a bit better when it counted

Situation AVG OBP SLG OPS
High Lvrge 0.261 0.344 0.397 0.740
Medium Lvrge 0.260 0.320 0.395 0.715
Low Lvrge 0.274 0.334 0.418 0.753

They stole 248 bases while their opponents stole just 100. The Cards hit into 16 fewer DPs and reached on errors 22 more times. Some of that probably helped them overachieve. They turned 172 DPs, 2nd most in the league. The league average was 146.

Sunday, November 2, 2014

Should The 1927 Yankees Have Won Even More than 110 Games? Like 118?

I recently estimated that their winning percentage could have been around .770 based on their OPS differential. They had a .872 OPS while allowing a .676 OPS, for a differential of .196, easily the highest since 1914. See The Statistical Dominance Of The 1927 Yankees.

A .770 pct would give them 118.5 wins in a 154 game season.

One reason that they did not win more is that they may have scored fewer runs than expected based on their OBP and SLG. Here is the regression generated equation for runs per game during the 1920s for all teams:

R/G = 11.29*SLG + 18.04*OBP - 5.92

The Yanks had a .489 SLG and a .384 OBP. That predicts 6.53 runs per game while they actually had 6.29. Over the whole season, that is about 37 fewer runs than expected. Out of the 160 teams in the decade, the '27 Yanks were 10th in underscoring.

So maybe that accounts for about 3.7 wins. That still leaves about 4.8 wins.
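Here is a rough Python sketch of that chain of arithmetic, from the regression to the missing wins. The 10 runs per win conversion is my assumption (it matches the 37 runs to 3.7 wins step above), and the variable names are illustrative.

```python
GAMES = 154
RUNS_PER_WIN = 10  # assumption: rule of thumb, consistent with 37 runs ~ 3.7 wins

# 1920s runs-scored regression: R/G = 11.29*SLG + 18.04*OBP - 5.92
slg, obp = 0.489, 0.384
predicted_rpg = 11.29 * slg + 18.04 * obp - 5.92
actual_rpg = 6.29

# Runs the offense fell short of the prediction over a full season.
shortfall_runs = (predicted_rpg - actual_rpg) * GAMES
shortfall_wins = shortfall_runs / RUNS_PER_WIN

# Wins implied by a .770 pct versus the 110 they actually won.
expected_wins = 0.770 * GAMES
unexplained_wins = expected_wins - 110 - shortfall_wins

print(round(predicted_rpg, 2), round(shortfall_runs, 1))
print(round(shortfall_wins, 1), round(unexplained_wins, 1))
```

Carrying the unrounded values through leaves roughly 4.9 unexplained wins, in line with the 4.8 figure above once rounding is accounted for.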

But why did they not score more runs? They stole 90 bases, just a bit below the league average of 99. Their success rate was 58.4%, just a bit below the league average of 60.7%. This probably does not matter much.

Maybe they had too many sacrifice bunts. They had 107 according to Retrosheet. But the other 7 teams averaged 148. So I doubt they lost a lot of big innings bunting too much.

Their pitchers allowed an SLG of .356 and an OBP of .320. The regression equation for runs allowed was:

R/G = 11.25*SLG + 18.68*OBP - 6.12

That predicts they would allow about 3.86 runs per game, which matches their actual average. So they did not win fewer games than expected due to the pitchers giving up more runs than expected.
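The same kind of check for the pitching side, as a short Python sketch (names are illustrative):

```python
# 1920s runs-allowed regression: R/G = 11.25*SLG + 18.68*OBP - 6.12
slg_allowed, obp_allowed = 0.356, 0.320
predicted_ra_pg = 11.25 * slg_allowed + 18.68 * obp_allowed - 6.12
actual_ra_pg = 3.86   # actual runs allowed per game, as quoted above

# Residual over a 154-game season; close to zero means no surprise here.
residual_runs = (actual_ra_pg - predicted_ra_pg) * 154
print(round(predicted_ra_pg, 2), round(residual_runs, 1))
```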

Here are the splits for the Yankee hitters and pitchers followed by the league splits. Nothing jumps out as to why they won fewer games than expected. Maybe it is that they did not hit better with runners on or with RISP like the league generally did. That might account for them not scoring as many runs as expected.

I don't see anything in their close and late performance that explains anything. It even looks like their pitchers did a very good job then compared to what the league did.

They were also 24-19 in 1-run games for a .558 pct. That means they were .775 in all other games. (If they had done as well in 1-run games as they did at other times, it would mean 9 more wins.)
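The 1-run game arithmetic can be checked with a few lines of Python. It uses the 24-19 record above together with the team's overall 110-44 record; the variable names are mine.

```python
# 1927 Yankees: overall record and record in 1-run games.
wins, losses = 110, 44
one_run_w, one_run_l = 24, 19

one_run_games = one_run_w + one_run_l
one_run_pct = one_run_w / one_run_games

# Record in all games decided by more than one run.
other_w, other_l = wins - one_run_w, losses - one_run_l
other_pct = other_w / (other_w + other_l)

# Extra wins if they had played 1-run games at their other-games pace.
extra_wins = other_pct * one_run_games - one_run_w

print(round(one_run_pct, 3), round(other_pct, 3), round(extra_wins, 1))
```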

Situation-Hit AVG OBP SLG
Total 0.307 0.384 0.489
None On 0.307 0.380 0.487
Men On 0.308 0.376 0.492
RISP 0.301 0.376 0.479
Close & Late 0.287 0.370 0.460

Situation-Pit AVG OBP SLG
Total 0.265 0.320 0.356
None On 0.257 0.312 0.342
Men On 0.275 0.321 0.374
RISP 0.264 0.320 0.356
Close & Late 0.224 0.275 0.288

Situation-Lg AVG OBP SLG
Total 0.286 0.352 0.399
None On 0.276 0.342 0.387
Men On 0.298 0.352 0.413
RISP 0.292 0.353 0.407
Close & Late 0.266 0.332 0.365

The Yankee pitchers had an AVG allowed when it was close and late that was 41 points below their total AVG allowed. For the league as a whole, the drop was 20 points. It seems like they would have done well in 1-run games because of that. Their hitters did about what you would expect in close and late situations when you look at the league stats.

So the fewer runs scored explains a good chunk of the missing wins, but it is not clear what explains the rest.

Saturday, November 1, 2014

OPS Wins Baseball Games

A study done by STATS, Inc., in their 1998 “Baseball Scoreboard” book showed that the team with the higher OPS at the end of a game had a winning percentage of .852. The study covered the years 1993-1997. Here are the stats they looked at. First the stat, then the winning percentage for the team that had the higher stat in each game:

OPS .852
OBP .824
SLG .820
AVG .804
fewest errors .669
SB per 9 offensive innings .653
HR per plate appearance .653
BB per plate appearance .623
SB% .576
Most strikeouts per 9 defensive innings .543

Now a lot of things could be going on here. But this at least suggests that maybe the most important thing to do is to “out OPS” your opponent. Now you can do this with better hitters or with better pitchers who hold down the opponents (or good fielders who take away hits). OPS comes out higher than AVG. Perhaps there is something to it. Notice it is much higher than any of the SB winning percentages.