Tuesday, October 28, 2014

The Relationship Between OPS Differential And Winning Percentage Using 5 Year Averages

See a recent post called The Relationship Between Team OPS Differential And Winning Percentage, By Decades. I used regression analysis to see how big the impact of OPS differential was on winning.

Here, instead of using individual years, I used the average OPS differential and average winning pct for all 30 teams over the last 5 years.

The regression equation from using individual years was

Pct = 1.325*OPSDIFF + .5

The r-squared was .827 and the standard error was .029. Over 162 games, that is 4.639 wins.

The regression equation from using the 5 year average was

Pct = 1.3465*OPSDIFF + .5

The r-squared was .869 and the standard error was .017. Over 162 games, that is 2.72 wins. That is a big drop from the first regression. In a given year, luck will play a role. But the more seasons of data that are used, the more accurate the relationship becomes. By combining the years, some of the good and bad luck evens out.

The table below shows the prediction for each team. It seems strange that the 6 most extreme teams (the top 3 and the bottom 3 in the Diff column) are all pretty far from the rest of the pack. The Orioles were predicted to have a .476 pct but it was actually .505. That means they won 4.762 more games per season than their OPS differential would estimate.


Team OPSDIFF W-L% Pred Diff Per 162
BAL  -0.018 0.505 0.476 0.029 4.762
PHI  -0.001 0.526 0.498 0.028 4.532
NYY  0.026 0.563 0.535 0.028 4.473
ATL  0.027 0.554 0.536 0.018 2.949
CLE  -0.021 0.487 0.472 0.014 2.349
MIN  -0.053 0.443 0.429 0.014 2.266
SFG  0.018 0.538 0.525 0.013 2.157
CIN  0.017 0.535 0.523 0.012 1.912
SDP  -0.023 0.481 0.470 0.012 1.899
PIT  -0.018 0.481 0.476 0.006 0.939
NYM  -0.024 0.473 0.467 0.006 0.931
KCR  -0.023 0.475 0.470 0.006 0.894
ARI  -0.020 0.475 0.473 0.002 0.371
STL  0.041 0.557 0.555 0.002 0.360
TOR  -0.009 0.489 0.487 0.002 0.251
LAA  0.023 0.532 0.531 0.001 0.149
WSN  0.024 0.530 0.533 -0.003 -0.415
SEA  -0.037 0.446 0.450 -0.004 -0.640
TBR  0.040 0.550 0.554 -0.004 -0.687
LAD  0.030 0.536 0.541 -0.004 -0.709
OAK  0.029 0.535 0.539 -0.005 -0.738
TEX  0.035 0.539 0.547 -0.008 -1.247
CHW  -0.008 0.479 0.489 -0.010 -1.642
MIL  0.016 0.509 0.522 -0.013 -2.115
HOU  -0.078 0.380 0.395 -0.015 -2.379
DET  0.050 0.552 0.567 -0.015 -2.446
FLA  -0.029 0.444 0.461 -0.016 -2.612
CHC  -0.032 0.427 0.458 -0.030 -4.918
BOS  0.034 0.514 0.546 -0.032 -5.231
COL  -0.017 0.444 0.478 -0.033 -5.404
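As a rough check on the table, here is a minimal sketch that plugs a few of the listed teams into the 5 year equation. The function and constant names are my own, and the inputs are the rounded values printed in the table, so the results only approximately match the Pred, Diff and Per 162 columns (which presumably came from unrounded data).

```python
# Coefficients from the 5-year-average regression quoted in the post.
SLOPE = 1.3465
INTERCEPT = 0.5
GAMES = 162


def predicted_pct(ops_diff):
    """Winning pct predicted from OPS differential."""
    return SLOPE * ops_diff + INTERCEPT


def wins_above_prediction(ops_diff, actual_pct, games=GAMES):
    """Actual minus predicted pct, scaled to a full season of games."""
    return (actual_pct - predicted_pct(ops_diff)) * games


# (team, OPSDIFF, W-L%) rows copied from the table above (rounded values).
rows = [("BAL", -0.018, 0.505), ("STL", 0.041, 0.557), ("COL", -0.017, 0.444)]

for team, diff, pct in rows:
    print(team, round(predicted_pct(diff), 3),
          round(wins_above_prediction(diff, pct), 2))
```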

Here is a graph of the relationship

Monday, October 27, 2014

The Statistical Dominance Of The 1927 Yankees

I recently listed The 25 Highest And Lowest Team OPS Differentials From 1914-2014. The 27 Yanks were number 1 by a good margin. Data from the Baseball Reference Play Index and Retrosheet. I also regressed winning pct against OPS differential and got the following equation

Pct = 1.396*OPSDIFF + .500

Then I estimated every team's pct. Here are the top 10 projected records


Team Year DIFF Pred
NYY 1927 0.196 0.773
NYY 1939 0.158 0.720
ATL 1998 0.139 0.694
BAL 1969 0.136 0.690
NYY 1936 0.131 0.683
STL 1944 0.130 0.682
STL 1942 0.127 0.677
CLE 1948 0.127 0.677
NYY 1998 0.126 0.676
SEA 2001 0.126 0.676

Now the 27 Yanks actually had a .714 pct (why they did not reach .773 might be a good topic for a future post). But notice how big their lead is and how closely teams bunch up after the 1939 Yanks. The 27 Yanks would have an 8 game advantage over their 1939 counterparts in a 154 game season (although they would play each other so it might be a bit lower).
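The 8 game figure is just the gap in predicted pct times the length of the season. A small sketch of that arithmetic, using the two Pred values from the table (the function name is my own):

```python
def games_advantage(pct_a, pct_b, games=154):
    """Difference in expected wins between two predicted winning pcts."""
    return (pct_a - pct_b) * games


# Predicted pcts for the 1927 and 1939 Yankees from the table above.
adv = games_advantage(0.773, 0.720)
print(round(adv, 2))
```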

I also did the regression by decades. See The Relationship Between Team OPS Differential And Winning Percentage, By Decades. In some decades the impact of the differential was greater than in others. But the 27 Yanks still dominate. Here is that top 10


Team Year DIFF Pred
NYY 1927 0.196 0.769
NYY 1939 0.158 0.728
BAL 1969 0.136 0.698
STL 1944 0.130 0.697
STL 1942 0.127 0.692
CLE 1948 0.127 0.692
NYY 1936 0.131 0.689
NYY 1937 0.121 0.675
ATL 1998 0.139 0.673
STL 1943 0.114 0.672


Now OPS weights OBP and SLG equally. What if we give more weight to OBP? I used 1.7*OBP + SLG. Then I divided that by 3 since this approximates wOBA, a stat from Tangotiger. The regression equation in this case was

Pct = 3.34*wOBADIFF + 0.5

The 27 Yanks had a projected pct of .769, the 39 Yanks had .718, and the 69 Orioles had .694 and then the percentages slowly fall after that.
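To make the weighting concrete, here is a minimal sketch of the approximation and the equation above. The OBP and SLG inputs are made-up numbers for illustration only, not any real team's stats, and the function names are my own.

```python
def approx_woba(obp, slg):
    """Rough wOBA stand-in from the post: (1.7*OBP + SLG) / 3."""
    return (1.7 * obp + slg) / 3


def predicted_pct(woba_for, woba_against, slope=3.34, intercept=0.5):
    """Winning pct from the wOBA-differential regression in the post."""
    return intercept + slope * (woba_for - woba_against)


# Hypothetical team: hits .340 OBP / .420 SLG, allows .320 OBP / .400 SLG.
team = approx_woba(0.340, 0.420)
opponents = approx_woba(0.320, 0.400)
print(round(team, 3), round(opponents, 3),
      round(predicted_pct(team, opponents), 3))
```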

Now we don't have teams from 1901-13 since we don't know OPS allowed. But I did estimate pct using the differentials for the following 3 stats: HRs, Walks and non-HR hits. I was curious to see where the 1906 Cubs rank.

I also compared the estimated winning percentages for the 1914-19 teams from this method and the OPS differential method to see if they gave similar estimates. If they did, then it might be reasonable to project what the OPS differential would say for the 1901-13 teams based on the projection using these other 3 stats.

The good news is that the correlation between the percentages estimated by the two methods for the 1914-19 teams is .96. But the bad news is that there was one team for which the estimates differed by .048. That is pretty big.

But we can still get somewhere. The highest predicted winning pct for the 1901-13 teams was the 1902 Pirates with about .746. The 1906 Cubs were at .690 (why they actually had a .763 pct might make a good post, too).

For the Pirates to reach the .769 of the 27 Yanks, their estimate would have to go up about .023. But only 13 of the 96 teams from 1914-19 had their estimate from the OPS method exceed the 3 stat method by as much as .023. So it seems unlikely that the Pirates would catch the Yankees.

Also, of the 10 best actual winning percentages from 1901-13, only 1 other team had a prediction over .700, the 1905 Giants at .716.

Furthermore, of the 10 best actual 1914-19 teams, only 2 had their OPS differential prediction exceed their 3 stat prediction by at least .023. So it is unusual for a very good team to be off by much.

So it looks like only one team, the 1902 Pirates MIGHT come close to the 1927 Yankees. And that seems unlikely.

Tuesday, October 21, 2014

The Relationship Between Team OPS Differential And Winning Percentage, By Decades

I learned on Oct 21 that there are some discrepancies between Baseball Reference and Retrosheet, so I can't be sure of these results. If I learn more, I will report it.


Oct 24. Here are the corrected numbers:


Period DIFF INT r squared Std error Per 162 Games
1914-19 1.866 0.498 0.833 0.038 6.207
1920-29 1.375 0.500 0.866 0.033 5.390
1930-39 1.442 0.500 0.851 0.038 6.157
1940-49 1.515 0.500 0.854 0.036 5.754
1950-59 1.452 0.500 0.874 0.032 5.165
1960-69 1.458 0.500 0.816 0.035 5.590
1970-79 1.361 0.500 0.811 0.032 5.165
1980-89 1.352 0.500 0.745 0.033 5.399
1990-99 1.249 0.500 0.780 0.032 5.109
2000-09 1.293 0.500 0.809 0.032 5.120
2010-14 1.325 0.500 0.827 0.029 4.639

Data from the Baseball Reference Play Index and Retrosheet.

DIFF is the value of the coefficient on OPS differential in the regression. INT is the intercept. Std error is the standard error. Per 162 games is the standard error times 162.
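For anyone who wants to reproduce columns like these, here is a sketch of a simple one-variable least squares fit. It assumes that "Std error" means the standard error of the regression (the residual standard deviation with n-2 degrees of freedom, which is what a spreadsheet regression tool typically reports); the post does not say so explicitly. The sample data are hypothetical team-seasons, not the real data set.

```python
import math


def regression_summary(xs, ys, games=162):
    """Fit ys = INT + DIFF*xs by ordinary least squares and summarize."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    std_error = math.sqrt(sse / (n - 2))
    return {
        "DIFF": slope,
        "INT": intercept,
        "r squared": 1 - sse / sst,
        "Std error": std_error,
        "Per 162 Games": std_error * games,
    }


# Hypothetical (OPS differential, winning pct) pairs for illustration.
ops_diffs = [-0.050, -0.020, 0.000, 0.030, 0.060]
win_pcts = [0.440, 0.470, 0.505, 0.535, 0.580]
for name, value in regression_summary(ops_diffs, win_pcts).items():
    print(name, round(value, 3))
```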

It seems like the relationship has gotten slightly tighter over time if you look at the standard errors, although the DIFF coefficient is not as large as it used to be.

Also, for some reason, before the 1960s, the intercept was below .500 (this refers to the original, pre-correction numbers in the table below). You might expect a team with a .000 OPS differential to have a .500 record. But that was not the case for some time. Not sure why. Maybe greater imbalance in talent levels across teams (like those great Yankee teams) meant that if you were just "average" you lost a lot more than you would expect when you played those top teams.


Period DIFF INT r squared Std error Per 162 Games
1914-19 1.898 0.429 0.802 0.042 6.759
1920-29 1.366 0.441 0.803 0.040 6.542
1930-39 1.548 0.423 0.822 0.042 6.807
1940-49 1.537 0.467 0.794 0.042 6.837
1950-59 1.486 0.494 0.858 0.034 5.490
1960-69 1.458 0.500 0.816 0.035 5.590
1970-79 1.361 0.500 0.811 0.032 5.165
1980-89 1.352 0.500 0.745 0.033 5.399
1990-99 1.249 0.500 0.780 0.032 5.109
2000-09 1.293 0.500 0.809 0.032 5.120
2010-14 1.325 0.500 0.827 0.029 4.639

Saturday, October 18, 2014

The 25 Highest And Lowest Team OPS Differentials From 1914-2014

I learned on Oct 21 that there are some discrepancies between Baseball Reference and Retrosheet, so I can't be sure of these results. If I learn more, I will report it. 

Compiled from the Baseball Reference Play Index and Retrosheet

Oct. 24. Here are the corrected numbers:


Team Year OPS OPSA DIFF
NYY 1927 0.872 0.676 0.196
NYY 1939 0.825 0.667 0.158
ATL 1998 0.795 0.656 0.139
BAL 1969 0.756 0.620 0.136
NYY 1936 0.864 0.733 0.131
STL 1944 0.745 0.615 0.130
STL 1942 0.717 0.590 0.127
CLE 1948 0.792 0.665 0.127
NYY 1998 0.825 0.699 0.126
SEA 2001 0.805 0.679 0.126
PHA 1929 0.816 0.692 0.124
NYY 1937 0.825 0.704 0.121
CLE 1995 0.839 0.718 0.121
BRO 1953 0.840 0.722 0.118
CLE 1954 0.744 0.626 0.118
NYY 1932 0.830 0.714 0.116
NYY 1931 0.840 0.726 0.114
PHA 1928 0.799 0.685 0.114
ATL 1997 0.769 0.655 0.114
STL 1943 0.725 0.611 0.114
NYY 1921 0.838 0.725 0.113
BRO 1941 0.752 0.641 0.111
LAD 1974 0.743 0.633 0.110
PHA 1931 0.789 0.680 0.109
BOS 2003 0.851 0.742 0.109

Now the lowest


Team Year OPS OPSA DIFF
BOS 1927 0.677 0.796 -0.119
PHI 1945 0.633 0.752 -0.119
PIT 2010 0.678 0.797 -0.119
TOR 1979 0.673 0.793 -0.120
PIT 1952 0.631 0.752 -0.121
NYM 1965 0.604 0.728 -0.124
PIT 1953 0.676 0.803 -0.127
BOS 1932 0.665 0.792 -0.127
PHA 1919 0.634 0.761 -0.127
SLB 1937 0.747 0.875 -0.128
PHI 1939 0.669 0.797 -0.128
PHA 1920 0.642 0.771 -0.129
PHA 1915 0.615 0.745 -0.130
SLB 1951 0.674 0.804 -0.130
DET 1996 0.743 0.875 -0.132
PHA 1936 0.711 0.843 -0.132
SDP 1974 0.632 0.764 -0.132
FLA 1998 0.690 0.824 -0.134
DET 2003 0.675 0.813 -0.138
OAK 1979 0.648 0.786 -0.138
NYM 1963 0.600 0.739 -0.139
SLB 1939 0.720 0.860 -0.140
PHI 1928 0.716 0.857 -0.141
BSN 1924 0.633 0.776 -0.143
PHA 1954 0.648 0.804 -0.156


Here are the highest in the original numbers, from before the Oct. 24 correction


Team Year OPS OPSA DIFF
NYY 1927 0.872 0.636 0.236
NYY 1939 0.825 0.638 0.187
NYY 1936 0.864 0.691 0.173
NYY 1931 0.840 0.673 0.167
NYY 1937 0.825 0.661 0.164
PHA 1929 0.816 0.655 0.161
PHA 1928 0.799 0.644 0.155
NYY 1921 0.838 0.683 0.155
NYY 1932 0.830 0.675 0.155
NYY 1930 0.872 0.719 0.153
SLB 1922 0.823 0.673 0.150
STL 1944 0.745 0.596 0.149
STL 1942 0.717 0.570 0.147
STL 1939 0.785 0.641 0.144
NYY 1926 0.806 0.663 0.143
NYY 1934 0.782 0.639 0.143
PHA 1931 0.789 0.648 0.141
NYY 1928 0.816 0.677 0.139
PHA 1930 0.821 0.682 0.139
ATL 1998 0.795 0.657 0.138
BAL 1969 0.756 0.620 0.136
NYY 1933 0.809 0.674 0.135
CLE 1920 0.793 0.659 0.134
STL 1943 0.725 0.592 0.133
WSH 1930 0.795 0.663 0.132

Now the lowest



Team Year OPS OPSA DIFF
SEA 1978 0.673 0.778 -0.105
SEA 1980 0.664 0.769 -0.105
KCA 1955 0.703 0.809 -0.106
KCR 2004 0.720 0.828 -0.108
KCR 2005 0.716 0.825 -0.109
MIN 2011 0.666 0.775 -0.109
SLB 1951 0.674 0.784 -0.110
NYM 1966 0.643 0.755 -0.112
KCA 1956 0.686 0.799 -0.113
SDP 1969 0.614 0.730 -0.116
TOR 1978 0.667 0.783 -0.116
TBD 2002 0.704 0.820 -0.116
NYM 1962 0.679 0.797 -0.118
HOU 2013 0.674 0.792 -0.118
DET 2002 0.679 0.798 -0.119
TOR 1979 0.673 0.793 -0.120
PIT 2010 0.678 0.798 -0.120
NYM 1965 0.604 0.728 -0.124
SDP 1974 0.632 0.764 -0.132
DET 1996 0.743 0.875 -0.132
FLA 1998 0.690 0.825 -0.135
DET 2003 0.675 0.813 -0.138
OAK 1979 0.648 0.786 -0.138
NYM 1963 0.600 0.740 -0.140
PHA 1954 0.648 0.803 -0.155

Thursday, October 16, 2014

Team Winning Percentage As A Function Of OPS Differentials In High, Medium And Low Leverage Situations

I recently posted a regression generated equation where team winning pct was a function of overall OPS differential. It was based on the years 2010-14. All data from Baseball Reference's Play Index. Here it is

Pct = .5 + 1.3246*OPSDIFF

The r-squared was .827 and the standard error was .0286, which works out to 4.64 wins per season. I was interested in seeing how many more games the Royals won than their OPS differential of just .003 would indicate. It was 7.36.

Using the same years, here is the equation when breaking things down by leverage

Pct = .5 + .306*LOW +.420*MED + .564*HIGH

Where LOW, MED and HIGH are the OPS differentials in the three cases

The r-squared was .906 and the standard error was .0212, which works out to 3.44 wins per season. So a better estimate than just overall OPS differential.

Here are the PA percentages for each case in MLB in 2014

High) .205
Med) .365
Low) .430

So even though the high leverage situations are only around 20% of the total, they still have the biggest impact. Those are generally the cases where the game is closer and later than normal, usually with runners on base.

Here are the OPS and OPS allowed by the Royals for the three cases this year:

High) .713, .630
Med)  .713, .700
Low) .659, .700

Using those numbers to get the Royals' differentials and plugging them into the 2nd equation, we get a .540 pct, just a bit lower than their actual pct of .549. A .540 pct would give them 87.5 wins, or just 1.5 fewer than they actually won. So their performance in high leverage situations for the most part explains how well they did this year. The estimate moves 5.86 wins closer to their actual total (the gap shrinks from 7.36 wins to about 1.5) when leverage is taken into account.
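The Royals calculation can be sketched as follows, using the leverage equation and the three OPS / OPS allowed pairs listed above. The variable and function names are my own, and since the inputs are rounded to three decimals the result lands near, not exactly on, the .540 and 87.5 figures.

```python
def leverage_pct(low, med, high):
    """Winning pct from OPS differentials in low, medium and high leverage."""
    return 0.5 + 0.306 * low + 0.420 * med + 0.564 * high


# (OPS, OPS allowed) for the 2014 Royals in each leverage group.
royals = {
    "high": (0.713, 0.630),
    "med": (0.713, 0.700),
    "low": (0.659, 0.700),
}

# Differential = OPS minus OPS allowed in each group.
diffs = {k: ops - allowed for k, (ops, allowed) in royals.items()}

pct = leverage_pct(diffs["low"], diffs["med"], diffs["high"])
wins = pct * 162
print(round(pct, 3), round(wins, 1), round(89 - wins, 1))
```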

See About WPA and Leverage.

Fangraphs on Leverage

Sunday, October 12, 2014

How Have The Royals Won 7.36 More Games Than Their OPS Differential Would Indicate?

They had a .690 OPS during the season and allowed .687. That should give them a .50397 winning pct or 81.64 wins. They actually won 89 games. I had a regression about a week ago that had pct as

Pct = .5 + 1.3246*OPSDIFF
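Here is a short sketch of the arithmetic behind the 7.36 figure, using the equation above with the Royals' .690 OPS, .687 OPS allowed and 89 actual wins. The names are my own.

```python
def predicted_pct(ops, ops_allowed, slope=1.3246, intercept=0.5):
    """Winning pct from overall OPS differential (2010-14 regression)."""
    return intercept + slope * (ops - ops_allowed)


pct = predicted_pct(0.690, 0.687)
expected_wins = pct * 162          # wins over a 162 game season
gap = 89 - expected_wins           # actual wins minus expected wins
print(round(pct, 5), round(expected_wins, 2), round(gap, 2))
```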

The tables below show what the Royals hit and allowed this year. Their big advantages are with RISP and when it is Late & Close. They had differentials of .052 and .057 in those two cases.


Royals BA OBP SLG OPS
Totals 0.263 0.314 0.376 0.690
None on 0.258 0.308 0.373 0.680
Men On 0.268 0.321 0.381 0.701
RISP 0.271 0.332 0.399 0.732
Late & Close 0.245 0.310 0.340 0.650


Royals Opponents BA OBP SLG OPS
Totals 0.250 0.310 0.377 0.687
None on 0.249 0.304 0.378 0.682
Men On 0.252 0.317 0.375 0.692
RISP 0.246 0.311 0.369 0.680
Late & Close 0.221 0.292 0.300 0.593

If I use some research I did a few years ago, Does Team Clutch Matter in Baseball?, where I estimate pct by breaking things down into RISP & NONRISP and Late&Close & NONLate&Close (the OPS and OPS allowed in each case), I get some slightly higher estimates for the Royals winning pct.

Using the Late&Close regression, they would have about a .520 winning pct and using the RISP regression, they would have about a .525 pct. There probably is a bit of an overlap between the two situations (maybe 4.165%, because RISP is usually about 25% of PAs and L&C is about 16.66%; multiplying .25*.1666 gets about .04165).
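The overlap guess is a product of the two shares, which is only right if a PA being Late & Close tells you nothing about whether it comes with RISP. A tiny sketch, with that independence assumption stated in the code:

```python
# Approximate shares of all PAs, as quoted in the post.
RISP_SHARE = 0.25
LATE_CLOSE_SHARE = 0.1666

# Assumes the two situations are independent of each other, which may not
# hold in practice; under that assumption the overlap is the product.
overlap = RISP_SHARE * LATE_CLOSE_SHARE
print(round(overlap, 5))
```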

But perhaps combining the two together would get us to about a .540 winning pct. That would be 87.5 wins and that is pretty close to the 89 they actually got.

Major League Situational Stats, 2010-2014

Compiled using the Baseball Reference Play Index.

Split PA BA OBP SLG OPS
Total 923779 0.254 0.319 0.398 0.717
None On 520158 0.249 0.310 0.393 0.702
Men On 403621 0.261 0.332 0.405 0.737
RISP 238074 0.255 0.339 0.394 0.733
Late & Close 153559 0.240 0.316 0.365 0.681

Here is what I have for the years 1991-2000. The relative differences between the splits now are not much different from what they used to be.