Friday, May 31, 2013

How On-Base Percentage and Slugging Percentage Affect Winning

This is something that I posted at Beyond the Boxscore in 2006.

It's probably obvious that if a team increases its on-base percentage (OBP) or slugging percentage (SLG), its winning percentage will go up. Get more runners on and hit for more power, you win more games. But how many more? If OBP goes up by as much as SLG, will they both lead to the same increase in wins? What does it mean for OBP to go up by as much as SLG? The same number of points? By the same percentage? Or should we look at something slightly more sophisticated, like a one standard deviation increase for each one? And what about reducing the OBP and SLG of your opponents? How many wins will that bring?

To try to get a handle on this, I used linear regression to find an equation for team winning percentage (I looked at all teams from 1989-2002). This is what I got:

PCT = .493 + 2.01*OBP + .858*SLG - 2.06*OPPOBP - .806*OPPSLG

OPPOBP and OPPSLG are, respectively, the OBP and SLG teams allow their opponents. Given this relationship, how many more games will a team win if it increases its OBP and SLG (or reduces its opponents' OBP and SLG)? Table 1 shows the various increases in wins for a given change in performance.

For example, if team OBP goes up by .010, wins over a 162-game season will increase by 3.26 (2.01*.01*162 = 3.26). For team SLG, the same .010 increase adds 1.39 wins (.858*.01*162 = 1.39). The next column shows that the OBP increase is 2.35 times as important as the SLG increase. Lowering your opponents' OBP and SLG has about the same effect and relationship.
The average team OBP was about .331. So a 10% increase would be about .033. The average team SLG was about .411, so a 10% increase would be about .041. The numbers were the same for OPPOBP and OPPSLG. A 10% increase in OBP adds 10.79 wins while a 10% increase in SLG adds 5.71. This makes OBP 1.89 times as important as SLG. The changes are about the same on the pitching side.
Standard deviation (SD) is a measure of spread or dispersion. The SD of OBP was .0149. That increase would add 4.85 wins. The SD for SLG was .0311. That increase would add 4.32 wins. In this case OBP is 1.12 times as important as SLG. On the pitching side, the SDs were about the same, so the results are similar.
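The three framings above (same number of points, same percentage, one SD) all use the same arithmetic: coefficient times change times 162. Here is a minimal Python sketch of that calculation, using only the coefficients, averages and SDs quoted in the text. The function and variable names are my own, and because the averages are rounded to three places, the last digit can differ slightly from the figures above.

```python
GAMES = 162  # games in a full season

# Coefficients from the winning-percentage regression quoted above.
COEF = {"OBP": 2.01, "SLG": 0.858}

# League figures quoted in the text (1989-2002), rounded as given there.
MEAN = {"OBP": 0.331, "SLG": 0.411}
SD = {"OBP": 0.0149, "SLG": 0.0311}


def added_wins(stat, change, games=GAMES):
    """Wins added over a season by raising `stat` by `change`."""
    return COEF[stat] * change * games


def scenario(changes):
    """Return wins added per stat and the OBP-to-SLG ratio for one framing."""
    wins = {stat: added_wins(stat, delta) for stat, delta in changes.items()}
    return wins, wins["OBP"] / wins["SLG"]


scenarios = {
    "10 points": {"OBP": 0.010, "SLG": 0.010},
    "10 percent": {stat: 0.10 * MEAN[stat] for stat in MEAN},
    "1 SD": dict(SD),
}

for label, changes in scenarios.items():
    wins, ratio = scenario(changes)
    print(f"{label:>10}: OBP {wins['OBP']:.2f}  SLG {wins['SLG']:.2f}  ratio {ratio:.2f}")
```

The same functions work for the pitching side if you swap in the OPPOBP and OPPSLG coefficients and read the result as wins gained from lowering those numbers.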
So the relative win value of OBP and SLG can depend on how you frame the question or what kind of change you are looking at. In regressions with team runs per game as the dependent variable instead of winning percentage, the coefficient value on OBP is usually about 1.5 or 1.6 times that of SLG. It is more than double here (2.01/.858 is about 2.34), and I am not sure why.
I also did the analysis with isolated power (ISO) instead of SLG. ISO is SLG minus batting average (AVG) and is a better measure of power hitting than SLG, since a guy could get a single every time up and have an SLG of 1.000 with no extra-base power. In this case, the regression equation was:
PCT = .499 + 2.52*OBP + .962*ISO - 2.54*OPPOBP - .923*OPPISO
Table 2 shows the various win increases. I won't discuss those results since it would just repeat the previous discussion. The numbers mean the same things they meant in Table 1. The average ISO was .147 and the SD of ISO was .0227. Those were about the same on the pitching side.
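As a rough illustration of the ISO version, the sketch below applies the same coefficient-times-change-times-162 arithmetic to the second equation. Table 2 is not reproduced here, so these values are recomputed from the quoted coefficients rather than copied from it. The names are my own, and I am assuming the OBP SD of .0149 given earlier applies here too, since it is the same set of teams.

```python
GAMES = 162  # games in a full season

# Coefficients from the second regression (OBP and ISO).
COEF = {"OBP": 2.52, "ISO": 0.962}

# SDs quoted in the text; the OBP value is the one given for the first
# regression (assumed unchanged, since the sample of teams is the same).
SD = {"OBP": 0.0149, "ISO": 0.0227}


def iso(slg, avg):
    """Isolated power: slugging percentage minus batting average."""
    return slg - avg


def added_wins(stat, change, games=GAMES):
    """Wins added over a season by raising `stat` by `change`."""
    return COEF[stat] * change * games


# A hitter who singles every time up: AVG 1.000, SLG 1.000, no extra-base power.
print("all-singles ISO:", iso(1.000, 1.000))

for stat in ("OBP", "ISO"):
    by_points = added_wins(stat, 0.010)
    by_sd = added_wins(stat, SD[stat])
    print(f"{stat}: +.010 adds {by_points:.2f} wins, +1 SD adds {by_sd:.2f} wins")
```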
Technical notes: The r-squared for the first regression was .817, meaning that 81.7% of the variation in team winning percentage is explained by the equation. For the second regression it was .818 (81.8%). In both regressions, the standard error was .0297, which is about 4.8 wins a season, and all of the independent variables were statistically significant, with all T-values above 8 or less than -8. There were 394 teams.
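Two quick checks on these numbers, sketched in Python with names of my own choosing. The first converts the standard error from winning percentage to wins. The second plugs the league averages quoted earlier (.331 OBP and .411 SLG on both sides) into the first equation; a team that is average in everything should come out close to .500.

```python
GAMES = 162  # games in a full season
STANDARD_ERROR = 0.0297  # standard error quoted for both regressions


def predicted_pct(obp, slg, opp_obp, opp_slg):
    """Winning percentage predicted by the first regression equation."""
    return 0.493 + 2.01 * obp + 0.858 * slg - 2.06 * opp_obp - 0.806 * opp_slg


def pct_to_wins(pct, games=GAMES):
    """Convert a winning-percentage amount into wins over a season."""
    return pct * games


print("standard error in wins:", round(pct_to_wins(STANDARD_ERROR), 1))

# A team that hits and pitches at the (rounded) league averages from the text.
average_team = predicted_pct(0.331, 0.411, 0.331, 0.411)
print("average team PCT:", round(average_team, 3))
```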
Now the comments. One reader asked:


How much do OBP and SLG correlate with each other? If there is a high correlation between the two, OBP might be sucking up some of the effect of SLG. Isolated power might help reduce some of that but not all. Is there something you could use that breaks OBP into its component parts, walks and hits?


The correlation between OBP & SLG was .777. For OPPOBP & OPPSLG, it was .838. For ISO & OBP it was .616. For OPPOBP & OPPISO it was .703. Those seem high, so collinearity may be a problem. But I had low standard errors for the coefficient estimates, which is usually an indication that collinearity is not a problem.
Another way to check for multicollinearity is to run regressions in which each independent variable (IV) in turn is regressed on all of the other IVs. In the first model with OBP and SLG, the r-squared was about .5 when OBP was the dependent variable and the other variables (SLG, OPPOBP, OPPSLG) were the independent variables. There is a stat called the "variance inflation factor" or VIF. It is 1/(1 - r-squared). So if r-squared was .5, 1 - .5 = .5. Then 1/.5 = 2. A couple of sources I looked at suggested that if the VIF is under 10, multicollinearity is not a problem. So in this case, the VIF is only about 2. For the other 3 cases, VIF only got as high as 4. I did come across one source that said there is no rule about the value of VIF and multicollinearity.
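The VIF arithmetic above is simple enough to sketch in a few lines of Python. The function names are my own. The second function just inverts the formula, to show what r-squared in the auxiliary regression corresponds to a given VIF.

```python
def vif(r_squared):
    """Variance inflation factor from the r-squared of an auxiliary regression."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_for_vif(target_vif):
    """Invert the formula: the auxiliary r-squared that produces a given VIF."""
    if target_vif < 1.0:
        raise ValueError("VIF cannot be below 1")
    return 1.0 - 1.0 / target_vif


print("VIF at r-squared .5:", vif(0.5))
print("r-squared at VIF 4:", r_squared_for_vif(4))
print("r-squared at VIF 10:", r_squared_for_vif(10))
```

So a VIF of 4 corresponds to an auxiliary r-squared of .75, and the rule-of-thumb cutoff of 10 corresponds to .9.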
But I did run the following regression based on your suggestions:
PCT = .491 + 1.04*EXB + 2.72*H + 2.53*W - 1*OPPEXB - 2.7*OPPH - 2.54*OPPW
EXB is extra bases/PA (PA = walks + ABs)
H is hits/PA
W is walks/PA
So there looks to be a pretty big difference between getting on base, whether by a hit or a walk, and hitting for power. Here are the win changes for a 1 SD improvement:
EXB 3.38
W 4.64
H 4.43
OPPH 5.19
OPPW 4.15
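For anyone who wants to try the component equation on a team, here is a rough Python sketch of how EXB, H and W could be built from season totals and plugged in. Two things are assumptions on my part: "extra bases" is taken to mean doubles + 2*triples + 3*homers (total bases minus hits), and the season totals in the example are made up for illustration, not taken from any real team.

```python
def rates(ab, bb, hits, doubles, triples, homers):
    """Per-PA rates used in the component regression."""
    pa = ab + bb  # PA as defined in the post: walks + ABs
    # Assumption: extra bases = bases beyond first on each hit.
    extra_bases = doubles + 2 * triples + 3 * homers
    return {"EXB": extra_bases / pa, "H": hits / pa, "W": bb / pa}


def predicted_pct(team, opp):
    """Winning percentage from the component equation quoted above."""
    return (0.491
            + 1.04 * team["EXB"] + 2.72 * team["H"] + 2.53 * team["W"]
            - 1.00 * opp["EXB"] - 2.70 * opp["H"] - 2.54 * opp["W"])


# Hypothetical season totals, for illustration only.
offense = rates(ab=5500, bb=550, hits=1450, doubles=280, triples=30, homers=160)

# Same team, but its pitchers allow 50 fewer walks than the hitters draw.
allowed = rates(ab=5500, bb=500, hits=1450, doubles=280, triples=30, homers=160)

print({name: round(value, 3) for name, value in offense.items()})
print("PCT vs. identical opponent:", round(predicted_pct(offense, offense), 3))
print("PCT with 50 fewer walks allowed:", round(predicted_pct(offense, allowed), 3))
```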
