Wednesday, November 19, 2014

Is The Run Value Of Stealing A Base Different Than The Run Value Of Allowing A Stolen Base?

Maybe this is just a statistical artifact or something quirky is going on. But I ran regressions with runs scored per game and runs allowed per game as the dependent variables and OBP, SLG, SB, CS, GDP, and ROE (reached on errors) as the independent variables (the last four were all per game). I used all teams from 2005-14 and the data was from the Baseball Reference Play Index.

Here is the regression for runs scored per game:

R/G = 9.8*SLG + 17.17*OBP - 0.308*GDP - 0.394*CS + 0.143*SB + 0.54*ROE - 5.09

Now the regression for runs allowed per game:

RA/G = 9.4*SLG + 17.57*OBP - 0.188*GDP - 0.446*CS + 0.302*SB + 0.86*ROE - 5.35

So the estimated value of a stolen base is .143 runs for the team stealing it but .302 runs for the team allowing it (there seem to be big differences in the GDP and ROE coefficients as well). I can't think of any reason why there would be a big difference here.

I started looking at this because when I added variables like SB differential, etc. to my regressions estimating winning pct based on OPS differential, the value of the SB differential seemed too high.

If we look at a team like the 2010 Red Sox, they allowed 1.04 SBs per game while having 0.259 CSs per game. If I use the coefficient values for RA/G, they allowed about .2 runs per game from stealing. That would be about 32 runs per season or about 3 wins.

If I used the values from the R/G regression, they would have allowed about .047 runs per game from stealing or 7.59 per season. That is not even one win. So the difference between the two methods is about 2.5 wins.
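The arithmetic for the 2010 Red Sox can be sketched in a few lines of Python. The coefficients and per-game rates are the ones quoted above; the 10 runs per win conversion is my assumption (a common rule of thumb), and the function and variable names are just for illustration. Because the inputs are rounded, the totals can differ slightly from the figures in the text.

```python
# Coefficients copied from the two regressions above (runs per SB, runs per CS).
RUNS_SCORED_COEF = {"SB": 0.143, "CS": -0.394}   # from the R/G equation
RUNS_ALLOWED_COEF = {"SB": 0.302, "CS": -0.446}  # from the RA/G equation

GAMES = 162
RUNS_PER_WIN = 10  # assumption: rule of thumb, not a value from the regressions


def steal_runs_per_game(sb_per_game, cs_per_game, coef):
    """Net runs per game attributed to steals and caught stealings."""
    return coef["SB"] * sb_per_game + coef["CS"] * cs_per_game


# 2010 Red Sox: stolen bases allowed and runners caught stealing, per game.
sb, cs = 1.04, 0.259

for label, coef in (("RA/G", RUNS_ALLOWED_COEF), ("R/G", RUNS_SCORED_COEF)):
    per_game = steal_runs_per_game(sb, cs, coef)
    per_season = per_game * GAMES
    print(f"{label} coefficients: {per_game:.3f} runs/game, "
          f"{per_season:.1f} runs/season, "
          f"{per_season / RUNS_PER_WIN:.1f} wins")
```

The gap between the two season totals, divided by the runs-per-win figure, is the roughly 2.5 win difference mentioned above.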

In a recent post, I found that over the last 5 years, the Red Sox seemed to underperform based on their OPS differential. See The Relationship Between OPS Differential And Winning Percentage Using 5 Year Averages. They won 5.23 fewer games on average each season than their OPS differential would predict (only the Rockies were worse at 5.4 fewer wins).

The 2011 Red Sox were similar, allowing 0.963 SBs per game with 0.309 CSs per game. The Red Sox have 4 teams in the top 21 in SB allowed per game from 2010-14. The Rockies, however, don't seem to have been that bad at allowing SBs, having only one year in the top 50 in SB allowed per game. So their underperformance must be due to something else.

Sunday, November 9, 2014

Which Teams Exceeded Their Win Total By The Most According To OPS Differential?

I have done some related posts recently, looking at the teams with the best OPS differentials since 1914 using data from Baseball Reference's Play Index. I also did a regression and the equation for winning pct was

Pct = 1.396*OPSDIFF + .500

Then I estimated each team's pct and calculated how many more games per 162 that they won than this equation would predict. Here are the top 10:

Team Year OPSDIFF W-L% Pred Diff Per 162
BSN 1914 0.002 0.614 0.503 0.111 17.96
STL 1987 -0.017 0.586 0.476 0.110 17.88
LAA 2008 0.014 0.617 0.519 0.098 15.85
CIN 1973 0.010 0.611 0.514 0.097 15.66
STL 1931 0.045 0.656 0.562 0.094 15.21
NYM 1969 0.017 0.617 0.524 0.093 15.07
NYY 2013 -0.048 0.525 0.433 0.092 14.86
MIN 1994 -0.088 0.469 0.378 0.091 14.81
PHA 1948 -0.032 0.545 0.455 0.090 14.59
BAL 1977 0.011 0.602 0.516 0.086 13.97

So the Braves in 1914 should have had a pct of .503 based on their OPS differential of .002. But it was actually .614. That gives them 17.96 more wins per 162 games than we might have expected. Maybe that is why they are called the "Miracle Braves!"
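Here is a small sketch of how the "Per 162" column can be computed from the equation. The slope and intercept come from the regression above; the function names are hypothetical. The table was presumably built from unrounded values, so plugging in the rounded numbers gives about 18.0 for the 1914 Braves instead of exactly 17.96.

```python
# Regression from the post: Pct = 1.396*OPSDIFF + .500
SLOPE = 1.396
INTERCEPT = 0.500


def predicted_pct(ops_diff):
    """Winning pct predicted from a team's OPS differential."""
    return SLOPE * ops_diff + INTERCEPT


def extra_wins_per_162(actual_pct, ops_diff):
    """Wins above the regression's prediction, scaled to 162 games."""
    return (actual_pct - predicted_pct(ops_diff)) * 162


# 1914 Braves: OPS differential of .002, actual pct of .614
braves_pred = predicted_pct(0.002)
braves_extra = extra_wins_per_162(0.614, 0.002)
print(f"predicted pct: {braves_pred:.3f}, extra wins per 162: {braves_extra:.1f}")
```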

I also found a regression equation for each decade but the list of the top overachieving teams was similar to this one.

We don't have splits for things like RISP, runners on base and close and late situations for 1914. So we can't tell if the Braves did especially well in those cases, which would explain a lot. The Braves were 33-20. But that is not much different than their overall pct.

They had what seems to be good fielding. Their fielding pct was .961 (the league average was .958). Their defensive efficiency rating was .701 and the league average was .698. That probably helped a bit, but I don't think that would explain their .614 pct. They stole 139 bases, as did their opponents. They turned 143 DPs and the next highest team had 119.

The 1987 Cardinals hit a lot better when it counted, as this table shows:

Split BA OBP SLG OPS
High Lvrge 0.275 0.344 0.399 0.743
Medium Lvrge 0.267 0.350 0.389 0.739
Low Lvrge 0.254 0.328 0.357 0.686

It looks like their pitchers did a bit better when it counted:

Split BA OBP SLG OPS
High Lvrge 0.261 0.344 0.397 0.740
Medium Lvrge 0.260 0.320 0.395 0.715
Low Lvrge 0.274 0.334 0.418 0.753

They stole 248 bases while their opponents stole just 100. The Cards hit into 16 fewer DPs and reached on errors 22 more times. Some of that probably helped them overachieve. They turned 172 DPs, 2nd most in the league. The league average was 146.

Sunday, November 2, 2014

Should The 1927 Yankees Have Won Even More than 110 Games? Like 118?

I recently estimated that their winning percentage could have been around .770 based on their OPS differential. They had a .872 OPS while allowing a .676 OPS, for a differential of .196, easily the highest since 1914. See The Statistical Dominance Of The 1927 Yankees.

A .770 pct would give them 118.5 wins in a 154 game season.

One reason that they did not win more is that they may have scored fewer runs than expected based on their OBP and SLG. Here is the regression generated equation for runs per game during the 1920s for all teams:

R/G = 11.29*SLG + 18.04*OBP - 5.92

The Yanks had a .489 SLG and a .384 OBP. That predicts 6.53 runs per game while they actually had 6.29. Over the whole season, that is about 37 runs fewer than expected. Out of the 160 teams in the decade, the 27 Yanks were 10th in underscoring.

So maybe that accounts for about 3.7 wins. That still leaves about 4.8 wins.
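The run and win accounting above can be checked with a short sketch. The equation, the OBP and SLG, the actual 6.29 runs per game, the 110 wins and the 118.5 expected wins are all from this post. The 154-game schedule and the 10 runs per win conversion are my assumptions, and the variable names are only illustrative.

```python
# 1920s regression from the post: R/G = 11.29*SLG + 18.04*OBP - 5.92
def predicted_runs_per_game(slg, obp):
    return 11.29 * slg + 18.04 * obp - 5.92


GAMES = 154          # assumption: scheduled games in a 1927 season
RUNS_PER_WIN = 10    # assumption: rule-of-thumb conversion

predicted = predicted_runs_per_game(slg=0.489, obp=0.384)
actual = 6.29
shortfall_runs = (predicted - actual) * GAMES
shortfall_wins = shortfall_runs / RUNS_PER_WIN

expected_wins = 118.5   # from the .770 projected pct above
actual_wins = 110
unexplained_wins = expected_wins - actual_wins - shortfall_wins

print(f"predicted R/G: {predicted:.2f}")
print(f"runs below expectation: {shortfall_runs:.1f}")
print(f"wins explained by the run shortfall: {shortfall_wins:.1f}")
print(f"wins still unexplained: {unexplained_wins:.1f}")
```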

But why did they not score more runs? They stole 90 bases, just a bit below the league average of 99. Their success rate was 58.4%, just a bit below the league average of 60.7%. This probably does not matter much.

Maybe they had too many sacrifice bunts. They had 107 according to Retrosheet. But the other 7 teams averaged 148. So I doubt they lost a lot of big innings bunting too much.

Their pitchers allowed an SLG of .356 and an OBP of .320. The regression equation for runs allowed was:

RA/G = 11.25*SLG + 18.68*OBP - 6.12

That predicts they would allow about 3.86 runs per game, which matches what they actually allowed. So they did not win fewer games than expected due to the pitchers giving up more runs than expected.

Here are the splits for the Yankee hitters and pitchers followed by the league splits. Nothing jumps out as to why they won fewer games than expected. Maybe it is that they did not hit better with runners on or with RISP like the league generally did. That might account for them not scoring as many runs as expected.

I don't see anything in their close and late performance that explains anything. It even looks like their pitchers did a very good job then compared to what the league did.

They were also 24-19 in 1-run games for a .558 pct. That means they were .775 in all other games. (If they had done as well in 1-run games as they did at other times, it would mean 9 more wins.)
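The 1-run game numbers work out as follows. The 24-19 record is from the text above; the overall 110-44 record is my assumption, inferred from the 110 wins and the 154 game season mentioned in this post.

```python
def win_pct(wins, losses):
    return wins / (wins + losses)


# Assumption: overall record of 110-44 (110 wins in a 154 game season).
total_wins, total_losses = 110, 44
one_run_wins, one_run_losses = 24, 19

# Record in every game not decided by one run.
other_wins = total_wins - one_run_wins
other_losses = total_losses - one_run_losses
other_pct = win_pct(other_wins, other_losses)

# Wins they would have had in 1-run games at that same rate.
one_run_games = one_run_wins + one_run_losses
extra_wins = other_pct * one_run_games - one_run_wins

print(f"1-run games: {win_pct(one_run_wins, one_run_losses):.3f}")
print(f"all other games: {other_wins}-{other_losses}, {other_pct:.3f}")
print(f"extra wins at the other-games rate: {extra_wins:.1f}")
```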


Situation-Hit AVG OBP SLG
Total 0.307 0.384 0.489
None On 0.307 0.380 0.487
Men On 0.308 0.376 0.492
RISP 0.301 0.376 0.479
Close & Late 0.287 0.370 0.460

Situation-Pit AVG OBP SLG
Total 0.265 0.320 0.356
None On 0.257 0.312 0.342
Men On 0.275 0.321 0.374
RISP 0.264 0.320 0.356
Close & Late 0.224 0.275 0.288

Situation-Lg AVG OBP SLG
Total 0.286 0.352 0.399
None On 0.276 0.342 0.387
Men On 0.298 0.352 0.413
RISP 0.292 0.353 0.407
Close & Late 0.266 0.332 0.365

The Yankee pitchers had an AVG allowed when it was close and late that was 41 points below their total AVG allowed. For the league as a whole, the gap was 20 points. It seems like they would have done well in 1-run games because of that. Their hitters did about what you would expect in close and late situations when you look at the league stats.

So the fewer runs scored explains a good chunk of the missing wins, but it is not clear what explains the rest.

Saturday, November 1, 2014

OPS Wins Baseball Games

A study done by STATS, Inc., in its 1998 “Baseball Scoreboard” book showed that the team with the higher OPS at the end of a game had a winning percentage of .852. The study covered the years 1993-1997. Here are the stats they looked at. First the stat, then the winning percentage for the team that had the higher stat in each game:

OPS .852
OBP .824
SLG .820
AVG .804
fewest errors .669
SB per 9 offensive innings .653
HR per plate appearance .653
BB per plate appearance .623
SB% .576
Most strikeouts per 9 defensive innings .543

Now a lot of things could be going on here. But this at least suggests that maybe the most important thing to do is to “out OPS” your opponent. Now you can do this with better hitters or with better pitchers who hold down the opponents (or good fielders who take away hits). OPS comes out higher than AVG. Perhaps there is something to it. Notice it is much higher than any of the SB winning percentages.

Tuesday, October 28, 2014

The Relationship Between OPS Differential And Winning Percentage Using 5 Year Averages

See a recent post called The Relationship Between Team OPS Differential And Winning Percentage, By Decades. I used regression analysis to see how big the impact of OPS differential was on winning.

Here, instead of using individual years, I used the average OPS differential and average winning pct for all 30 teams over the last 5 years.

The regression equation from using individual years was

Pct = 1.325*OPSDIFF + .5

The r-squared was .827 and the standard error was .029. Over 162 games, that is 4.639 wins.

The regression equation from using the 5 year average was

Pct = 1.3465*OPSDIFF + .5

The r-squared was .869 and the standard error was .017. Over 162 games, that is 2.72 wins. That is a big drop from the first regression. In a given year, luck will play a role. But the more seasons and data that are used, the more accurate the estimated relationship becomes. By combining the years, some of the good and bad luck evens out.

The table below shows the prediction for each team. It seems strange that the 6 most extreme teams are all pretty far from the rest of the pack. The Orioles were predicted to have a .476 pct but it was actually .505. That means they won 4.762 more games per season than their OPS differential would estimate.


Team OPSDIFF W-L% Pred Diff Per 162
BAL  -0.018 0.505 0.476 0.029 4.762
PHI  -0.001 0.526 0.498 0.028 4.532
NYY  0.026 0.563 0.535 0.028 4.473
ATL  0.027 0.554 0.536 0.018 2.949
CLE  -0.021 0.487 0.472 0.014 2.349
MIN  -0.053 0.443 0.429 0.014 2.266
SFG  0.018 0.538 0.525 0.013 2.157
CIN  0.017 0.535 0.523 0.012 1.912
SDP  -0.023 0.481 0.470 0.012 1.899
PIT  -0.018 0.481 0.476 0.006 0.939
NYM  -0.024 0.473 0.467 0.006 0.931
KCR  -0.023 0.475 0.470 0.006 0.894
ARI  -0.020 0.475 0.473 0.002 0.371
STL  0.041 0.557 0.555 0.002 0.360
TOR  -0.009 0.489 0.487 0.002 0.251
LAA  0.023 0.532 0.531 0.001 0.149
WSN  0.024 0.530 0.533 -0.003 -0.415
SEA  -0.037 0.446 0.450 -0.004 -0.640
TBR  0.040 0.550 0.554 -0.004 -0.687
LAD  0.030 0.536 0.541 -0.004 -0.709
OAK  0.029 0.535 0.539 -0.005 -0.738
TEX  0.035 0.539 0.547 -0.008 -1.247
CHW  -0.008 0.479 0.489 -0.010 -1.642
MIL  0.016 0.509 0.522 -0.013 -2.115
HOU  -0.078 0.380 0.395 -0.015 -2.379
DET  0.050 0.552 0.567 -0.015 -2.446
FLA  -0.029 0.444 0.461 -0.016 -2.612
CHC  -0.032 0.427 0.458 -0.030 -4.918
BOS  0.034 0.514 0.546 -0.032 -5.231
COL  -0.017 0.444 0.478 -0.033 -5.404
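The 5 year regression can be roughly reproduced from the rounded values in the table above with an ordinary least squares fit. This is only a sketch in plain Python, with a hypothetical helper name; since the table values are rounded to three decimals, the slope comes out close to, but not exactly, the 1.3465 reported above.

```python
# (team, OPS differential, W-L%) copied from the table above.
TEAMS = [
    ("BAL", -0.018, 0.505), ("PHI", -0.001, 0.526), ("NYY", 0.026, 0.563),
    ("ATL", 0.027, 0.554), ("CLE", -0.021, 0.487), ("MIN", -0.053, 0.443),
    ("SFG", 0.018, 0.538), ("CIN", 0.017, 0.535), ("SDP", -0.023, 0.481),
    ("PIT", -0.018, 0.481), ("NYM", -0.024, 0.473), ("KCR", -0.023, 0.475),
    ("ARI", -0.020, 0.475), ("STL", 0.041, 0.557), ("TOR", -0.009, 0.489),
    ("LAA", 0.023, 0.532), ("WSN", 0.024, 0.530), ("SEA", -0.037, 0.446),
    ("TBR", 0.040, 0.550), ("LAD", 0.030, 0.536), ("OAK", 0.029, 0.535),
    ("TEX", 0.035, 0.539), ("CHW", -0.008, 0.479), ("MIL", 0.016, 0.509),
    ("HOU", -0.078, 0.380), ("DET", 0.050, 0.552), ("FLA", -0.029, 0.444),
    ("CHC", -0.032, 0.427), ("BOS", 0.034, 0.514), ("COL", -0.017, 0.444),
]


def fit_line(xs, ys):
    """Simple least squares fit of y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


ops_diffs = [row[1] for row in TEAMS]
pcts = [row[2] for row in TEAMS]
slope, intercept = fit_line(ops_diffs, pcts)
print(f"Pct = {slope:.3f}*OPSDIFF + {intercept:.3f}")

# Residuals scaled to 162 games, largest overachievers first.
residuals = sorted(
    ((team, (pct - (slope * diff + intercept)) * 162) for team, diff, pct in TEAMS),
    key=lambda item: item[1],
    reverse=True,
)
print(residuals[0], residuals[-1])
```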

Here is a graph of the relationship

Monday, October 27, 2014

The Statistical Dominance Of The 1927 Yankees

I recently listed The 25 Highest And Lowest Team OPS Differentials From 1914-2014. The 27 Yanks were number 1 by a good margin. The data came from the Baseball Reference Play Index and Retrosheet. I also regressed winning pct against OPS differential and got the following equation:

Pct = 1.396*OPSDIFF + .500

Then I estimated every team's pct. Here are the top 10 projected records:


Team Year DIFF Pred
NYY 1927 0.196 0.773
NYY 1939 0.158 0.720
ATL 1998 0.139 0.694
BAL 1969 0.136 0.690
NYY 1936 0.131 0.683
STL 1944 0.130 0.682
STL 1942 0.127 0.677
CLE 1948 0.127 0.677
NYY 1998 0.126 0.676
SEA 2001 0.126 0.676

Now the 27 Yanks actually had a .714 pct (why they did not reach .773 might be a good topic for a future post). But notice how big their lead is and how closely teams bunch up after the 1939 Yanks. The 27 Yanks would have an 8 game advantage over their 1939 counterparts in a 154 game season (although they would play each other so it might be a bit lower).

I also did the regression by decades. See The Relationship Between Team OPS Differential And Winning Percentage, By Decades. In some decades the impact of the differential was greater than others. But the 27 Yanks still dominate. Here is that top 10


Team Year DIFF Pred
NYY 1927 0.196 0.769
NYY 1939 0.158 0.728
BAL 1969 0.136 0.698
STL 1944 0.130 0.697
STL 1942 0.127 0.692
CLE 1948 0.127 0.692
NYY 1936 0.131 0.689
NYY 1937 0.121 0.675
ATL 1998 0.139 0.673
STL 1943 0.114 0.672


Now OPS weights OBP and SLG equally. What if we give more weight to OBP? I used 1.7*OBP + SLG. Then I divided that by 3 since this approximates wOBA, a stat from Tangotiger. The regression equation in this case was

Pct = 3.34*wOBADIFF + 0.5

The 27 Yanks had a projected pct of .769, the 39 Yanks had .718, and the 69 Orioles had .694. The percentages slowly fall after that.
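As a quick check, here is a sketch of the approximate wOBA calculation for the 27 Yanks. The formula and regression are the ones just described; the OBP and SLG values (.384/.489 for the hitters, .320/.356 allowed) are the ones quoted in my post on whether they should have won more than 110 games. The function names are hypothetical.

```python
def approx_woba(obp, slg):
    """Rough wOBA stand-in used here: (1.7*OBP + SLG) / 3."""
    return (1.7 * obp + slg) / 3


def predicted_pct(woba_diff):
    """Regression from the post: Pct = 3.34*wOBADIFF + 0.5"""
    return 3.34 * woba_diff + 0.5


# 1927 Yankees hitters and pitchers (OBP, SLG).
woba_for = approx_woba(obp=0.384, slg=0.489)
woba_against = approx_woba(obp=0.320, slg=0.356)
woba_diff = woba_for - woba_against

print(f"wOBA for: {woba_for:.3f}, against: {woba_against:.3f}")
print(f"projected pct: {predicted_pct(woba_diff):.3f}")
```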

Now we don't have teams from 1901-13 since we don't know OPS allowed. But I did estimate pct using the differentials for the following 3 stats: HRs, Walks and non-HR hits. I was curious to see where the 1906 Cubs rank.

I also compared the estimated winning percentages for the 1914-19 teams from this method and the OPS differential method to see if they gave similar estimates. If they did, then it might be reasonable to project what the OPS differential would say for the 1901-13 teams based on the projection using these other 3 stats.

The good news is that the correlation between the percentages estimated by the two methods for the 1914-19 teams is .96. But the bad news is that there was one team for which the estimates differed by .048. That is pretty big.

But we can still get somewhere. The highest predicted winning pct for the 1901-13 teams was the 1902 Pirates with about .746. The 1906 Cubs were at .690 (why they actually had a .763 pct might make a good post, too).

For the Pirates to reach the .769 of the 27 Yanks, their estimate would have to go up about .023. But only 13 of the 96 teams from 1914-19 had their estimate from the OPS method exceed the 3 stat method by as much as .023. So it seems unlikely that the Pirates would catch the Yankees.

Also, of the 10 best actual winning percentages from 1901-13, only 1 other team had a prediction over .700, the 1905 Giants at .716.

Furthermore, of the 10 best actual 1914-19 teams, only 2 had their OPS differential prediction exceed their 3 stat prediction by at least .023. So it is unusual for a very good team to be off by much.

So it looks like only one team, the 1902 Pirates, MIGHT come close to the 1927 Yankees. And that seems unlikely.

Tuesday, October 21, 2014

The Relationship Between Team OPS Differential And Winning Percentage, By Decades

I learned on Oct 21 that there are some discrepancies between Baseball Reference and Retrosheet, so I can't be sure of these results. If I learn more, I will report it.


Oct 24. Here are the corrected numbers:


Period DIFF INT r squared Std error Per 162 Games
1914-19 1.866 0.498 0.833 0.038 6.207
1920-29 1.375 0.500 0.866 0.033 5.390
1930-39 1.442 0.500 0.851 0.038 6.157
1940-49 1.515 0.500 0.854 0.036 5.754
1950-59 1.452 0.500 0.874 0.032 5.165
1960-69 1.458 0.500 0.816 0.035 5.590
1970-79 1.361 0.500 0.811 0.032 5.165
1980-89 1.352 0.500 0.745 0.033 5.399
1990-99 1.249 0.500 0.780 0.032 5.109
2000-09 1.293 0.500 0.809 0.032 5.120
2010-14 1.325 0.500 0.827 0.029 4.639

Data from the Baseball Reference Play Index and Retrosheet.

DIFF is the value of the coefficient on OPS differential in the regression. INT is the intercept. Std error is the standard error. Per 162 games is the standard error times 162.
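To show how the corrected table can be used, here is a sketch that looks up the coefficient and intercept for a given season and estimates a winning pct. The numbers are copied from the corrected table above; the function name and the lookup structure are just one way to do it. The two examples use OPS differentials from my post on the 1927 Yankees.

```python
# (first year, last year, DIFF coefficient, intercept) from the corrected table.
DECADE_FITS = [
    (1914, 1919, 1.866, 0.498),
    (1920, 1929, 1.375, 0.500),
    (1930, 1939, 1.442, 0.500),
    (1940, 1949, 1.515, 0.500),
    (1950, 1959, 1.452, 0.500),
    (1960, 1969, 1.458, 0.500),
    (1970, 1979, 1.361, 0.500),
    (1980, 1989, 1.352, 0.500),
    (1990, 1999, 1.249, 0.500),
    (2000, 2009, 1.293, 0.500),
    (2010, 2014, 1.325, 0.500),
]


def predicted_pct(year, ops_diff):
    """Estimate winning pct using the regression for the season's period."""
    for first, last, coef, intercept in DECADE_FITS:
        if first <= year <= last:
            return coef * ops_diff + intercept
    raise ValueError(f"no regression available for {year}")


# 1927 Yankees (OPS differential .196) and 1998 Braves (.139).
print(f"NYY 1927: {predicted_pct(1927, 0.196):.4f}")
print(f"ATL 1998: {predicted_pct(1998, 0.139):.4f}")
```

The same differential is worth less in a period with a smaller DIFF coefficient, which is why the order of teams can change when the regression is done by decade.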

It seems like the relationship has gotten slightly stronger over time if you look at the standard errors, although the DIFF coefficient does not seem to be as strong as it used to be.

Also, for some reason, before the 1960s, the intercept was below .500 (see the table below). You might expect a team with a .000 OPS differential to have a .500 record. But that was not the case for some time. Not sure why. Maybe greater imbalance in talent levels across teams (like those great Yankee teams) meant that if you were just "average" you lost a lot more than you would expect when you played those top teams.


Period DIFF INT r squared Std error Per 162 Games
1914-19 1.898 0.429 0.802 0.042 6.759
1920-29 1.366 0.441 0.803 0.040 6.542
1930-39 1.548 0.423 0.822 0.042 6.807
1940-49 1.537 0.467 0.794 0.042 6.837
1950-59 1.486 0.494 0.858 0.034 5.490
1960-69 1.458 0.500 0.816 0.035 5.590
1970-79 1.361 0.500 0.811 0.032 5.165
1980-89 1.352 0.500 0.745 0.033 5.399
1990-99 1.249 0.500 0.780 0.032 5.109
2000-09 1.293 0.500 0.809 0.032 5.120
2010-14 1.325 0.500 0.827 0.029 4.639