Tuesday, October 28, 2014

The Relationship Between OPS Differential And Winning Percentage Using 5 Year Averages

See a recent post called The Relationship Between Team OPS Differential And Winning Percentage, By Decades. I used regression analysis to see how big the impact of OPS differential was on winning.

Here, instead of using individual years, I used the average OPS differential and average winning pct for all 30 teams over the last 5 years (2010-14).

The regression equation from using individual years was

Pct = 1.325*OPSDIFF + .5

The r-squared was .827 and the standard error was .029. Over 162 games, that is 4.639 wins.

The regression equation from using the 5 year average was

Pct = 1.3465*OPSDIFF + .5

The r-squared was .869 and the standard error was .017. Over 162 games, that is 2.72 wins. That is a big drop from the first regression. In a given year, luck will play a role, but the more seasons of data that are used, the more accurate the relationship. By combining the years, some of the good and bad luck evens out.

The table below shows the prediction for each team. Oddly, the 6 most extreme teams (3 at each end) are all pretty far from the rest of the pack. The Orioles were predicted to have a .476 pct but actually had .505. That means they won 4.762 more games per season than their OPS differential would estimate.

Team  OPSDIFF  W-L%   Pred   Diff    Per 162
BAL   -0.018   0.505  0.476   0.029   4.762
PHI   -0.001   0.526  0.498   0.028   4.532
NYY    0.026   0.563  0.535   0.028   4.473
ATL    0.027   0.554  0.536   0.018   2.949
CLE   -0.021   0.487  0.472   0.014   2.349
MIN   -0.053   0.443  0.429   0.014   2.266
SFG    0.018   0.538  0.525   0.013   2.157
CIN    0.017   0.535  0.523   0.012   1.912
SDP   -0.023   0.481  0.470   0.012   1.899
PIT   -0.018   0.481  0.476   0.006   0.939
NYM   -0.024   0.473  0.467   0.006   0.931
KCR   -0.023   0.475  0.470   0.006   0.894
ARI   -0.020   0.475  0.473   0.002   0.371
STL    0.041   0.557  0.555   0.002   0.360
TOR   -0.009   0.489  0.487   0.002   0.251
LAA    0.023   0.532  0.531   0.001   0.149
WSN    0.024   0.530  0.533  -0.003  -0.415
SEA   -0.037   0.446  0.450  -0.004  -0.640
TBR    0.040   0.550  0.554  -0.004  -0.687
LAD    0.030   0.536  0.541  -0.004  -0.709
OAK    0.029   0.535  0.539  -0.005  -0.738
TEX    0.035   0.539  0.547  -0.008  -1.247
CHW   -0.008   0.479  0.489  -0.010  -1.642
MIL    0.016   0.509  0.522  -0.013  -2.115
HOU   -0.078   0.380  0.395  -0.015  -2.379
DET    0.050   0.552  0.567  -0.015  -2.446
FLA   -0.029   0.444  0.461  -0.016  -2.612
CHC   -0.032   0.427  0.458  -0.030  -4.918
BOS    0.034   0.514  0.546  -0.032  -5.231
COL   -0.017   0.444  0.478  -0.033  -5.404
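This minimal Python sketch shows how the Pred and Per 162 columns can be reproduced from the 5-year equation. The function names are hypothetical. The inputs are the rounded values from the table, so the Orioles come out at about 4.74 wins rather than the 4.762 shown. The table was presumably built from unrounded averages.

```python
# Sketch: reproduce Pred, Diff and Per 162 from the 5-year-average equation
# Pct = 1.3465*OPSDIFF + .5 (rounded table values used as inputs).
SLOPE = 1.3465
INTERCEPT = 0.5


def predicted_pct(ops_diff: float) -> float:
    """Winning pct predicted from a team's average OPS differential."""
    return SLOPE * ops_diff + INTERCEPT


def wins_above_prediction(ops_diff: float, actual_pct: float, games: int = 162) -> float:
    """Actual minus predicted pct, expressed as wins per season."""
    return (actual_pct - predicted_pct(ops_diff)) * games


# Orioles row: OPSDIFF -0.018, actual W-L% .505
bal_pred = predicted_pct(-0.018)
bal_wins = wins_above_prediction(-0.018, 0.505)
print(f"BAL: predicted {bal_pred:.3f}, {bal_wins:+.2f} wins per 162")
```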

Here is a graph of the relationship.

Monday, October 27, 2014

The Statistical Dominance Of The 1927 Yankees

I recently listed The 25 Highest And Lowest Team OPS Differentials From 1914-2014. The 27 Yanks were number 1 by a good margin. Data from the Baseball Reference Play Index and Retrosheet. I also regressed winning pct against OPS differential and got the following equation:

Pct = 1.396*OPSDIFF + .500

Then I estimated every team's pct. Here are the top 10 projected records:

Team  Year  DIFF   Pred
NYY   1927  0.196  0.773
NYY   1939  0.158  0.720
ATL   1998  0.139  0.694
BAL   1969  0.136  0.690
NYY   1936  0.131  0.683
STL   1944  0.130  0.682
STL   1942  0.127  0.677
CLE   1948  0.127  0.677
NYY   1998  0.126  0.676
SEA   2001  0.126  0.676

Now the 27 Yanks actually had a .714 pct (why they did not reach .773 might be a good topic for a future post). But notice how big their lead is and how closely the teams bunch up after the 1939 Yanks. Over a 154-game season, the 27 Yanks would have an 8-game advantage over their 1939 counterparts (although the two teams would play each other, so the gap might be a bit lower).
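Here is a quick Python check of the 8-game figure. It is a sketch that applies Pct = 1.396*OPSDIFF + .500 to the rounded DIFF values from the table. The rounded inputs give .774 and .721 rather than .773 and .720, but the gap still works out to about 8 games over 154.

```python
# Sketch: projected pct from Pct = 1.396*OPSDIFF + .500, then the gap between
# two teams converted to games over a 154-game season.
def projected_pct(ops_diff: float, slope: float = 1.396, intercept: float = 0.500) -> float:
    return slope * ops_diff + intercept


nyy_1927 = projected_pct(0.196)
nyy_1939 = projected_pct(0.158)
game_gap = (nyy_1927 - nyy_1939) * 154  # about 8 games
print(f"1927: {nyy_1927:.3f}, 1939: {nyy_1939:.3f}, gap: {game_gap:.1f} games")
```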

I also did the regression by decades (see The Relationship Between Team OPS Differential And Winning Percentage, By Decades). In some decades the impact of the differential was greater than in others, but the 27 Yanks still dominate. Here is the top 10 using the decade regressions:

Team  Year  DIFF   Pred
NYY   1927  0.196  0.769
NYY   1939  0.158  0.728
BAL   1969  0.136  0.698
STL   1944  0.130  0.697
STL   1942  0.127  0.692
CLE   1948  0.127  0.692
NYY   1936  0.131  0.689
NYY   1937  0.121  0.675
ATL   1998  0.139  0.673
STL   1943  0.114  0.672

Now OPS weights OBP and SLG equally. What if we give more weight to OBP? I used 1.7*OBP + SLG. Then I divided that by 3 since this approximates wOBA, a stat from Tangotiger. The regression equation in this case was

The 27 Yanks had a projected pct of .769, the 39 Yanks .718 and the 69 Orioles .694. The percentages fall off slowly after that.
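The weighted measure can be sketched in Python as shown below. The regression coefficients for this version are not given above, so the sketch only computes the measure and a team's differential in it. The function names are hypothetical. The example uses the 2014 Royals' OBP and SLG, and those allowed, from the Royals tables further down, purely as illustrative inputs.

```python
# Sketch of the OBP-weighted measure: (1.7*OBP + SLG) / 3.
def weighted_obp_slg(obp: float, slg: float) -> float:
    return (1.7 * obp + slg) / 3


def weighted_diff(obp: float, slg: float, obp_allowed: float, slg_allowed: float) -> float:
    """Team differential in the weighted measure (own minus allowed)."""
    return weighted_obp_slg(obp, slg) - weighted_obp_slg(obp_allowed, slg_allowed)


# Illustrative inputs: 2014 Royals (OBP .314, SLG .376) and opponents (.310, .377)
royals = weighted_obp_slg(0.314, 0.376)
royals_diff = weighted_diff(0.314, 0.376, 0.310, 0.377)
print(f"Royals: {royals:.3f}, differential: {royals_diff:+.4f}")
```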

Now we don't have teams from 1901-13 because we don't know their OPS allowed. But I did estimate pct using the differentials in 3 other stats: HRs, walks and non-HR hits. I was curious to see where the 1906 Cubs would rank.

I also compared the estimated winning percentages for the 1914-19 teams from this method and the OPS differential method to see if they gave similar estimates. If they did, then it might be reasonable to project what the OPS differential would say for the 1901-13 teams based on the projection using these other 3 stats.

The good news is that the correlation between the percentages estimated by the two methods for the 1914-19 teams is .96. But the bad news is that there was one team for which the estimates differed by .048. That is pretty big.

But we can still get somewhere. The highest predicted winning pct for the 1901-13 teams was the 1902 Pirates with about .746. The 1906 Cubs were at .690 (why they actually had a .763 pct might make a good post, too).

For the Pirates to reach the .769 of the 27 Yanks, their estimate would have to go up about .023. But only 13 of the 96 teams from 1914-19 had their OPS-method estimate exceed their 3-stat estimate by as much as .023. So it seems unlikely that the Pirates would catch the Yankees.
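Here is the arithmetic behind that argument as a short Python sketch. The variable names are only illustrative.

```python
# Sketch of the 1902 Pirates argument.
yankees_1927 = 0.769   # 1927 Yankees projection
pirates_1902 = 0.746   # 1902 Pirates estimate from the 3-stat method
needed_gain = yankees_1927 - pirates_1902

# 1914-19 teams whose OPS-method estimate beat the 3-stat estimate by >= .023
teams_with_gap, total_teams = 13, 96
share = teams_with_gap / total_teams
print(f"needed gain: {needed_gain:.3f}; share of 1914-19 teams: {share:.1%}")
```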

Also, of the 10 best actual winning percentages from 1901-13, only 1 other team had a prediction over .700, the 1905 Giants at .716.

Furthermore, of the 10 best actual 1914-19 teams, only 2 had their OPS differential prediction exceed their 3 stat prediction by at least .023. So it is unusual for a very good team to be off by much.

So it looks like only one team, the 1902 Pirates, MIGHT come close to the 1927 Yankees. And even that seems unlikely.

Tuesday, October 21, 2014

The Relationship Between Team OPS Differential And Winning Percentage, By Decades

I learned on Oct 21 that there are some discrepancies between Baseball Reference and Retrosheet, so I can't be sure of these results. If I learn more, I will report it.

Oct 24. Here are the corrected numbers:

Period   DIFF   INT    r-squared  Std error  Per 162 Games
1914-19  1.866  0.498  0.833      0.038      6.207
1920-29  1.375  0.500  0.866      0.033      5.390
1930-39  1.442  0.500  0.851      0.038      6.157
1940-49  1.515  0.500  0.854      0.036      5.754
1950-59  1.452  0.500  0.874      0.032      5.165
1960-69  1.458  0.500  0.816      0.035      5.590
1970-79  1.361  0.500  0.811      0.032      5.165
1980-89  1.352  0.500  0.745      0.033      5.399
1990-99  1.249  0.500  0.780      0.032      5.109
2000-09  1.293  0.500  0.809      0.032      5.120
2010-14  1.325  0.500  0.827      0.029      4.639

Data from the Baseball Reference Play Index and Retrosheet.

DIFF is the coefficient on OPS differential in the regression. INT is the intercept. Std error is the standard error. Per 162 games is the standard error times 162, which puts it in wins per season.
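This Python sketch computes the Per 162 Games column for a few rows of the corrected table. The standard errors in the table are rounded to three decimals, so multiplying them by 162 can miss the published column by up to .0005*162 = .081 wins. The published values were presumably computed from unrounded errors.

```python
# Sketch: Per 162 Games = standard error (in pct) * 162.
# Rows copied from the corrected table: period -> (std error, published per 162)
rows = {
    "1914-19": (0.038, 6.207),
    "1950-59": (0.032, 5.165),
    "2010-14": (0.029, 4.639),
}

per_162 = {period: se * 162 for period, (se, _) in rows.items()}

for period, (se, published) in rows.items():
    # Rounded std errors can shift the product by at most .0005 * 162 = .081 wins
    print(f"{period}: {per_162[period]:.3f} wins (published {published})")
```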

If you look at the standard errors, the relationship seems to have gotten slightly stronger over time (the errors have shrunk), although the DIFF coefficient is not as large as it used to be.

Also, for some reason, in the original numbers (the table below) the intercept was below .500 before the 1960s. You might expect a team with a .000 OPS differential to have a .500 record, but that was not the case for some time. Not sure why. Maybe greater imbalance in talent levels across teams (like those great Yankee teams) meant that if you were just "average" you lost a lot more than you would expect when you played those top teams.

Original numbers, before the Oct 24 correction:

Period   DIFF   INT    r-squared  Std error  Per 162 Games
1914-19  1.898  0.429  0.802      0.042      6.759
1920-29  1.366  0.441  0.803      0.040      6.542
1930-39  1.548  0.423  0.822      0.042      6.807
1940-49  1.537  0.467  0.794      0.042      6.837
1950-59  1.486  0.494  0.858      0.034      5.490
1960-69  1.458  0.500  0.816      0.035      5.590
1970-79  1.361  0.500  0.811      0.032      5.165
1980-89  1.352  0.500  0.745      0.033      5.399
1990-99  1.249  0.500  0.780      0.032      5.109
2000-09  1.293  0.500  0.809      0.032      5.120
2010-14  1.325  0.500  0.827      0.029      4.639

Saturday, October 18, 2014

The 25 Highest And Lowest Team OPS Differentials From 1914-2014

I learned on Oct 21 that there are some discrepancies between Baseball Reference and Retrosheet, so I can't be sure of these results. If I learn more, I will report it.

Compiled from the Baseball Reference Play Index and Retrosheet.

Oct. 24. Here are the corrected numbers:

Team  Year  OPS    OPSA   DIFF
NYY   1927  0.872  0.676  0.196
NYY   1939  0.825  0.667  0.158
ATL   1998  0.795  0.656  0.139
BAL   1969  0.756  0.620  0.136
NYY   1936  0.864  0.733  0.131
STL   1944  0.745  0.615  0.130
STL   1942  0.717  0.590  0.127
CLE   1948  0.792  0.665  0.127
NYY   1998  0.825  0.699  0.126
SEA   2001  0.805  0.679  0.126
PHA   1929  0.816  0.692  0.124
NYY   1937  0.825  0.704  0.121
CLE   1995  0.839  0.718  0.121
BRO   1953  0.840  0.722  0.118
CLE   1954  0.744  0.626  0.118
NYY   1932  0.830  0.714  0.116
NYY   1931  0.840  0.726  0.114
PHA   1928  0.799  0.685  0.114
ATL   1997  0.769  0.655  0.114
STL   1943  0.725  0.611  0.114
NYY   1921  0.838  0.725  0.113
BRO   1941  0.752  0.641  0.111
LAD   1974  0.743  0.633  0.110
PHA   1931  0.789  0.680  0.109
BOS   2003  0.851  0.742  0.109

Now the lowest:

Team  Year  OPS    OPSA   DIFF
BOS   1927  0.677  0.796  -0.119
PHI   1945  0.633  0.752  -0.119
PIT   2010  0.678  0.797  -0.119
TOR   1979  0.673  0.793  -0.120
PIT   1952  0.631  0.752  -0.121
NYM   1965  0.604  0.728  -0.124
PIT   1953  0.676  0.803  -0.127
BOS   1932  0.665  0.792  -0.127
PHA   1919  0.634  0.761  -0.127
SLB   1937  0.747  0.875  -0.128
PHI   1939  0.669  0.797  -0.128
PHA   1920  0.642  0.771  -0.129
PHA   1915  0.615  0.745  -0.130
SLB   1951  0.674  0.804  -0.130
DET   1996  0.743  0.875  -0.132
PHA   1936  0.711  0.843  -0.132
SDP   1974  0.632  0.764  -0.132
FLA   1998  0.690  0.824  -0.134
DET   2003  0.675  0.813  -0.138
OAK   1979  0.648  0.786  -0.138
NYM   1963  0.600  0.739  -0.139
SLB   1939  0.720  0.860  -0.140
PHI   1928  0.716  0.857  -0.141
BSN   1924  0.633  0.776  -0.143
PHA   1954  0.648  0.804  -0.156
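DIFF is simply OPS minus OPSA. The Python sketch below recomputes and ranks it for four teams taken from the corrected lists above. The row layout is just an illustrative choice.

```python
# Sketch: DIFF = OPS - OPSA, then rank teams from best to worst.
teams = [
    # (team, year, OPS, OPSA), rows from the corrected lists
    ("NYY", 1927, 0.872, 0.676),
    ("BAL", 1969, 0.756, 0.620),
    ("PHA", 1954, 0.648, 0.804),
    ("NYM", 1963, 0.600, 0.739),
]

ranked = sorted(
    ((team, year, round(ops - opsa, 3)) for team, year, ops, opsa in teams),
    key=lambda row: row[2],
    reverse=True,
)

for team, year, diff in ranked:
    print(f"{team} {year} {diff:+.3f}")
```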

Here are the original (pre-correction) highest:

Team  Year  OPS    OPSA   DIFF
NYY   1927  0.872  0.636  0.236
NYY   1939  0.825  0.638  0.187
NYY   1936  0.864  0.691  0.173
NYY   1931  0.840  0.673  0.167
NYY   1937  0.825  0.661  0.164
PHA   1929  0.816  0.655  0.161
PHA   1928  0.799  0.644  0.155
NYY   1921  0.838  0.683  0.155
NYY   1932  0.830  0.675  0.155
NYY   1930  0.872  0.719  0.153
SLB   1922  0.823  0.673  0.150
STL   1944  0.745  0.596  0.149
STL   1942  0.717  0.570  0.147
STL   1939  0.785  0.641  0.144
NYY   1926  0.806  0.663  0.143
NYY   1934  0.782  0.639  0.143
PHA   1931  0.789  0.648  0.141
NYY   1928  0.816  0.677  0.139
PHA   1930  0.821  0.682  0.139
ATL   1998  0.795  0.657  0.138
BAL   1969  0.756  0.620  0.136
NYY   1933  0.809  0.674  0.135
CLE   1920  0.793  0.659  0.134
STL   1943  0.725  0.592  0.133
WSH   1930  0.795  0.663  0.132

Now the original lowest:

Team  Year  OPS    OPSA   DIFF
SEA   1978  0.673  0.778  -0.105
SEA   1980  0.664  0.769  -0.105
KCA   1955  0.703  0.809  -0.106
KCR   2004  0.720  0.828  -0.108
KCR   2005  0.716  0.825  -0.109
MIN   2011  0.666  0.775  -0.109
SLB   1951  0.674  0.784  -0.110
NYM   1966  0.643  0.755  -0.112
KCA   1956  0.686  0.799  -0.113
SDP   1969  0.614  0.730  -0.116
TOR   1978  0.667  0.783  -0.116
TBD   2002  0.704  0.820  -0.116
NYM   1962  0.679  0.797  -0.118
HOU   2013  0.674  0.792  -0.118
DET   2002  0.679  0.798  -0.119
TOR   1979  0.673  0.793  -0.120
PIT   2010  0.678  0.798  -0.120
NYM   1965  0.604  0.728  -0.124
SDP   1974  0.632  0.764  -0.132
DET   1996  0.743  0.875  -0.132
FLA   1998  0.690  0.825  -0.135
DET   2003  0.675  0.813  -0.138
OAK   1979  0.648  0.786  -0.138
NYM   1963  0.600  0.740  -0.140
PHA   1954  0.648  0.803  -0.155

Thursday, October 16, 2014

Team Winning Percentage As A Function Of OPS Differentials In High, Medium And Low Leverage Situations

I recently posted a regression-generated equation where team winning pct was a function of overall OPS differential. It was based on the years 2010-14. All data from Baseball Reference's Play Index. Here it is:

Pct = .5 + 1.3246*OPSDIFF

The r-squared was .827 and the standard error was .0286, which works out to 4.64 wins per season. I wanted to see how many more games the Royals won than their OPS differential of just .003 would indicate. It was 7.36.
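The Royals figure can be checked with a short Python sketch of the overall equation. The .690 OPS and .687 OPS allowed come from the Royals post below. `expected_wins` is a hypothetical helper name.

```python
# Sketch: expected wins from Pct = .5 + 1.3246*OPSDIFF (2010-14 regression).
def expected_wins(ops_diff: float, games: int = 162,
                  slope: float = 1.3246, intercept: float = 0.5) -> float:
    return (intercept + slope * ops_diff) * games


royals_expected = expected_wins(0.690 - 0.687)   # about 81.64
royals_surplus = 89 - royals_expected             # about 7.36
print(f"expected {royals_expected:.2f}, actual 89, surplus {royals_surplus:.2f}")
```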

Using the same years, here is the equation when breaking things down by leverage:

Pct = .5 + .306*LOW + .420*MED + .564*HIGH

where LOW, MED and HIGH are the OPS differentials in low, medium and high leverage situations.

The r-squared was .906 and the standard error was .0212, which works out to 3.44 wins per season. So this gives a better estimate than overall OPS differential alone.

Here are the shares of PAs for each case in MLB in 2014:

High) .205
Med) .365
Low) .430

So even though the high leverage situations are only around 20% of the total, they still have the biggest impact. Those are generally the cases where the game is closer and later than normal, usually with runners on base.

Here are the OPS and OPS allowed by the Royals for the three cases this year:

High) .713, .630
Med)  .713, .700
Low) .659, .700

Using those numbers to get the Royals' differentials and plugging them into the 2nd equation, we get a .540 pct, just a bit lower than their actual pct of .549. A .540 pct would give them 87.5 wins, just 1.5 fewer than they actually won. So their performance in high leverage situations for the most part explains how well they did this year. They move 5.86 wins closer to their actual total when leverage is taken into account.
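Here is a Python sketch of the leverage calculation, using the Royals' OPS and OPS allowed from the three cases above. Carrying full precision gives about .5397, or 87.4 wins. The 87.5 above comes from rounding to .540 first.

```python
# Sketch: Pct = .5 + .306*LOW + .420*MED + .564*HIGH, where each term is the
# OPS differential (OPS minus OPS allowed) in that leverage case.
COEFFS = {"low": 0.306, "med": 0.420, "high": 0.564}

royals_2014 = {
    # case: (OPS, OPS allowed)
    "high": (0.713, 0.630),
    "med": (0.713, 0.700),
    "low": (0.659, 0.700),
}

pct = 0.5 + sum(COEFFS[case] * (ops - ops_allowed)
                for case, (ops, ops_allowed) in royals_2014.items())
wins = pct * 162
print(f"pct {pct:.4f}, wins {wins:.1f}")
```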

Fangraphs on Leverage

Sunday, October 12, 2014

How Have The Royals Won 7.36 More Games Than Their OPS Differential Would Indicate?

They had a .690 OPS during the season and allowed .687. That should give them a .50397 winning pct, or 81.64 wins, but they actually won 89 games. The regression I ran about a week ago gives pct as:

Pct = .5 + 1.3246*OPSDIFF

The tables below show what the Royals hit and allowed this year. Their big advantages are with RISP and when it is Late & Close. They had differentials of .052 and .057 in those two cases.

Royals        BA     OBP    SLG    OPS
Totals        0.263  0.314  0.376  0.690
None on       0.258  0.308  0.373  0.680
Men On        0.268  0.321  0.381  0.701
RISP          0.271  0.332  0.399  0.732
Late & Close  0.245  0.310  0.340  0.650

Royals Opponents  BA     OBP    SLG    OPS
Totals            0.250  0.310  0.377  0.687
None on           0.249  0.304  0.378  0.682
Men On            0.252  0.317  0.375  0.692
RISP              0.246  0.311  0.369  0.680
Late & Close      0.221  0.292  0.300  0.593
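The .052 and .057 differentials are just the Royals' OPS minus the opponents' OPS in each split. Here is a small Python sketch over the two tables.

```python
# Sketch: OPS differential by split = Royals OPS minus opponents' OPS.
royals_ops = {"Totals": 0.690, "None on": 0.680, "Men On": 0.701,
              "RISP": 0.732, "Late & Close": 0.650}
opponents_ops = {"Totals": 0.687, "None on": 0.682, "Men On": 0.692,
                 "RISP": 0.680, "Late & Close": 0.593}

diffs = {split: round(royals_ops[split] - opponents_ops[split], 3)
         for split in royals_ops}

for split, diff in diffs.items():
    print(f"{split:<13} {diff:+.3f}")
```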

If I use some research I did a few years ago, Does Team Clutch Matter in Baseball?, where I estimated pct by breaking things down into RISP vs. non-RISP and Late & Close vs. non-Late & Close (using the OPS and OPS allowed in each case), I get slightly higher estimates for the Royals' winning pct.

Using the Late & Close regression, they would have about a .520 winning pct, and using the RISP regression, about a .525 pct. There probably is a bit of overlap between the two situations, maybe 4.165% of PAs. RISP is usually about 25% of PAs and L&C about 16.66%, and multiplying .25*.1666 gives about .04165 (which treats the two situations as independent).
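Here is the overlap estimate as a Python sketch. Multiplying the two shares assumes that RISP and Late & Close PAs occur independently. That is an approximation, not something the data confirm.

```python
# Sketch: overlap of RISP and Late & Close PAs, assuming the two are independent
# so the joint share is the product of the separate shares.
risp_share = 0.25          # RISP is usually about 25% of PAs
late_close_share = 0.1666  # Late & Close is about 16.66% of PAs
overlap = risp_share * late_close_share
print(f"estimated overlap: {overlap:.5f} ({overlap:.3%} of PAs)")
```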

But perhaps combining the two would get us to about a .540 winning pct. That would be 87.5 wins, pretty close to the 89 they actually got.

Major League Situational Stats, 2010-2014

Compiled using the Baseball Reference Play Index.

Split         PA      BA     OBP    SLG    OPS
Total         923779  0.254  0.319  0.398  0.717
None On       520158  0.249  0.310  0.393  0.702
Men On        403621  0.261  0.332  0.405  0.737
RISP          238074  0.255  0.339  0.394  0.733
Late & Close  153559  0.240  0.316  0.365  0.681
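This Python sketch computes each split's share of all PAs from the table above. It also checks that None On and Men On add up to the total. Assuming independence, it then estimates the RISP and Late & Close overlap from these counts. The result is about 4.3%, a bit above the rough 4.165% figure in the Royals post.

```python
# Sketch: each split's share of all PAs, 2010-2014 (counts from the table).
pa = {"Total": 923779, "None On": 520158, "Men On": 403621,
      "RISP": 238074, "Late & Close": 153559}

shares = {split: n / pa["Total"] for split, n in pa.items() if split != "Total"}

# None On and Men On should partition all PAs
assert pa["None On"] + pa["Men On"] == pa["Total"]

for split, share in shares.items():
    print(f"{split:<13} {share:.3f}")

# Overlap estimate, assuming RISP and Late & Close are independent
overlap = shares["RISP"] * shares["Late & Close"]
print(f"RISP and Late & Close overlap: {overlap:.3f}")
```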

Here is what I have for the years 1991-2000. The relative differences have not changed much since then.

Friday, October 3, 2014

Is it hard to quantify the success of the Giants?

One of the announcers on the Fox broadcast said so today (around the time they mentioned the Giants' low SB total of 55). They did win 2.67 more games than their OPS differential would predict, but that is not a very large difference. Even if we drop them from 88 to 85 wins, they still make the playoffs.

I looked at all teams from 2010-2014. In a regression, team winning pct was the dependent variable and OPS differential was the independent variable. Here is the equation

Pct = .5 + 1.3246*OPSDIFF

The Giants had a .699 OPS and allowed a .679 OPS for a .020 differential. Plugging that into the equation gives a pct of about .52649 while they actually had .54321. So their actual pct was about .0165 higher than predicted. Over 162 games that is 2.67 wins.
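Here is a Python sketch of the Giants calculation. With the rounded .699 and .679 OPS figures it gives about 2.71 wins rather than 2.67. The small gap likely comes from rounding in the OPS values.

```python
# Sketch: Giants 2014 with Pct = .5 + 1.3246*OPSDIFF.
slope, intercept = 1.3246, 0.5
ops, ops_allowed = 0.699, 0.679
predicted = intercept + slope * (ops - ops_allowed)   # about .5265
actual = 88 / 162                                     # 88-74, about .5432
extra_wins = (actual - predicted) * 162
print(f"predicted {predicted:.5f}, actual {actual:.5f}, extra wins {extra_wins:.2f}")
```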

Thursday, October 2, 2014

Could The Tigers Hit 200 HRs If They Played In Camden Yards?

They just said so on the TBS broadcast. Here are the road HR totals for all AL teams this year:

Team          Road HRs
Baltimore     104
LA Angels      82
Chicago Sox    81
Detroit        79
Toronto        79
Boston         74
Houston        73
Oakland        72
Cleveland      70
Tampa Bay      66
Seattle        63
Minnesota      61
Texas          60
NY Yankees     59
Kansas City    52

So the Tigers would have to hit 121 HRs in Camden. Here are the home HR totals for all AL teams this year:

Team          Home HRs
Baltimore     107
Toronto        98
Houston        90
NY Yankees     88
Detroit        76
Chicago Sox    74
Oakland        74
LA Angels      73
Seattle        73
Cleveland      72
Minnesota      67
Texas          51
Tampa Bay      51
Boston         49
Kansas City    43

So the Orioles outhomered the Tigers in road games 104-79. Why would the Tigers outhomer the Orioles 121-107 in Camden? They probably would not.

Also, over the 2011-13 seasons, Balt has had a 122 park index for HRs while Detroit has had 99 (just about average). So Balt allows about 22% more HRs than average. But that applies to only half the games. So we could increase the Tigers' HR total this year (155) by 11%, and we would get 17 more HRs, or 172. That is far short of 200.
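Here is a Python sketch of the park adjustment described above. Following the text, it treats Detroit's 99 index as average. It applies half of Baltimore's 22% boost because only half the games would be at Camden. The variable names are illustrative.

```python
# Sketch: rough Tigers HR projection for a season with Camden as home park.
tigers_road_hr, tigers_home_hr = 79, 76        # 2014, from the tables above
tigers_total = tigers_road_hr + tigers_home_hr  # 155

camden_hr_index = 122   # 2011-13 park index; 100 = average
# Detroit's own index (99) is treated as average, as in the text
home_share = 0.5        # only half the games would be at Camden
boost = (camden_hr_index - 100) / 100 * home_share   # 11%

projected = tigers_total * (1 + boost)          # about 172
needed_at_camden = 200 - tigers_road_hr         # 121 home HRs needed for 200
print(f"projected {projected:.0f} HRs; {needed_at_camden} needed at home for 200")
```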