Friday, May 31, 2013

How On-Base Percentage and Slugging Percentage Affect Winning

This is something that I posted at Beyond the Boxscore in 2006.

It's probably obvious that if a team increases its on-base percentage (OBP) or slugging percentage (SLG), its winning percentage will go up. Get more runners on and hit for more power, you win more games. But how many more? If OBP goes up by as much as SLG, will they both lead to the same increase in wins? What does it mean for OBP to go up by as much as SLG? The same number of points? By the same percentage? Or should we look at something slightly more sophisticated, like a one standard deviation increase for each one? And what about reducing the OBP and SLG of your opponents? How many wins will that bring?

To try to get a handle on this, I used linear regression to find an equation for team winning percentage (I looked at all teams from 1989-2002). This is what I got:

PCT = .493 + 2.01*OBP + .858*SLG - 2.06*OPPOBP - .806*OPPSLG

OPPOBP and OPPSLG are, respectively, the OBP and SLG teams allow their opponents. Given this relationship, how many more games will a team win if it increases its OBP and SLG (or reduces its opponents' OBP and SLG)? Table 1 shows the various increases in wins for a given change in performance.
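As a rough illustration, the equation can be written as a small function. This is only a sketch using the rounded coefficients printed above. The function name and the example team are made up for illustration, and the league averages (.331 OBP, .411 SLG) are the ones quoted later in the post.

```python
# Coefficients copied from the OBP/SLG regression quoted above (rounded values).
INTERCEPT = 0.493
COEF_OBP, COEF_SLG = 2.01, 0.858
COEF_OPP_OBP, COEF_OPP_SLG = -2.06, -0.806


def predicted_pct(obp, slg, opp_obp, opp_slg):
    """Predicted team winning percentage from the fitted equation."""
    return (INTERCEPT
            + COEF_OBP * obp
            + COEF_SLG * slg
            + COEF_OPP_OBP * opp_obp
            + COEF_OPP_SLG * opp_slg)


# Sanity check: a team that is league average on both sides of the ball
# (.331 OBP and .411 SLG) should come out close to .500.
average_team = predicted_pct(0.331, 0.411, 0.331, 0.411)
print(f"average team: {average_team:.3f}")

# Hypothetical team: average pitching, hitters 10 points better in OBP.
better_obp = predicted_pct(0.341, 0.411, 0.331, 0.411)
print(f"wins gained over 162 games: {(better_obp - average_team) * 162:.2f}")
```

With the rounded coefficients, the average team comes out at about .498, which is a reasonable check that the equation is centered near .500.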


For example, if team OBP goes up by .010, wins over a 162-game season will increase by 3.26 (2.01*.01*162 = 3.26). For team SLG, it will go up 1.39 wins. The next column shows that the OBP increase is 2.35 times as important as the SLG increase. Lowering your opponents' OBP and SLG has about the same effect and relationship.
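The arithmetic behind those win totals is just coefficient times change times games. A minimal sketch, using the rounded coefficients from the equation (the helper name is hypothetical):

```python
GAMES = 162  # length of a full season


def wins_added(coefficient, change, games=GAMES):
    """Extra wins per season implied by a regression coefficient."""
    return coefficient * change * games


obp_wins = wins_added(2.01, 0.010)   # OBP up 10 points
slg_wins = wins_added(0.858, 0.010)  # SLG up 10 points
print(f"OBP +.010: {obp_wins:.2f} wins")
print(f"SLG +.010: {slg_wins:.2f} wins")
print(f"ratio: {obp_wins / slg_wins:.2f}")
```

With the rounded coefficients the ratio comes out near 2.34 rather than 2.35. The small gap is presumably due to rounding.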
 
The average team OBP was about .331. So a 10% increase would be about .033. The average team SLG was about .411, so a 10% increase would be about .041. The numbers were the same for OPPOBP and OPPSLG. A 10% increase in OBP adds 10.79 wins while a 10% increase in SLG adds 5.71. This makes OBP 1.89 times as important as SLG. The changes are about the same on the pitching side.
 
Standard deviation (SD) is a measure of spread or dispersion. The SD of OBP was .0149. That increase would add 4.85 wins. The SD for SLG was .0311. That increase would add 4.32 wins. In this case OBP is 1.12 times as important as SLG. On the pitching side, the SDs were about the same, so the results are similar.
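The three framings (same number of points, same percentage, one SD) can be put side by side. This is a sketch that only reuses the rounded coefficients, averages and SDs quoted above, so the win totals can differ from the ones in the text in the second decimal place.

```python
GAMES = 162
COEF = {"OBP": 2.01, "SLG": 0.858}    # regression coefficients
MEAN = {"OBP": 0.331, "SLG": 0.411}   # league averages from the post
SD = {"OBP": 0.0149, "SLG": 0.0311}   # standard deviations from the post

# Size of the change in each stat under each way of framing "the same increase".
framings = {
    "same points (+.010)": {stat: 0.010 for stat in COEF},
    "same percentage (+10%)": {stat: 0.10 * MEAN[stat] for stat in COEF},
    "one standard deviation": dict(SD),
}

ratios = {}
for name, change in framings.items():
    wins = {stat: COEF[stat] * change[stat] * GAMES for stat in COEF}
    ratios[name] = wins["OBP"] / wins["SLG"]
    print(f"{name}: OBP {wins['OBP']:.2f}, SLG {wins['SLG']:.2f}, "
          f"ratio {ratios[name]:.2f}")
```

The OBP advantage shrinks from more than 2 to 1, to about 1.9 to 1, to about 1.1 to 1 as the framing moves from equal points to equal percentages to equal SDs.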
 
So the relative win value of OBP and SLG can depend on how you frame the question or what kind of change you are looking at. In regressions with team runs per game as the dependent variable instead of winning percentage, the coefficient value on OBP is usually about 1.5 or 1.6 times that of SLG. It is more than double here for some reason. I am not sure why.
 
I also did the analysis with isolated power (ISO) instead of SLG. ISO is SLG minus AVG and is a better measure of power hitting than SLG, since a guy could get a single every time up and have an SLG of 1.000 with no extra base power. In this case, the regression equation was:
 
PCT = .499 + 2.52*OBP + .962*ISO - 2.54*OPPOBP - .923*OPPISO
 
Table 2 shows the various win increases. I won't discuss those results since it would just repeat the previous discussion. The numbers mean the same things they meant in Table 1. The average ISO was .147 and the SD of ISO was .0227. Those were about the same on the pitching side.
 
 
Technical notes: The r-squared for the first regression was .817, meaning that 81.7% of the variation in team winning percentage is explained by the equation. The r-squared for the second regression was .818, meaning that 81.8% of the variation is explained. In both regressions the standard error was .0297, which is about 4.8 wins a season (.0297*162 = 4.8), and all of the independent variables were statistically significant, with all T-values above 8 or less than -8. There were 394 teams.
 
Now the comments
 

Correlation

How much do OBP and SLG correlate with each other? If there is a high correlation between the two, OBP might be sucking up some of the effect of SLG%. Isolated power might help reduce some of that but not all. Is there something you could use that breaks OBP into its component parts, walks and hits?
 

Correlation

The correlation between OBP & SLG was .777. For OPPOBP & OPPSLG, it was .838. For ISO & OBP it was .616. For OPPOBP & OPPISO it was .703. Those seem high, so collinearity may be a problem. But I had low standard errors for the coefficient estimates, which is usually an indication that collinearity is not a problem.
 
Another way to check for multicollinearity is to run regressions in which one IV is treated as a function of all of the other IVs. In the first model with OBP and SLG, the r-squared was about .5 when OBP was the dependent variable and the other variables (SLG, OPPOBP, OPPSLG) were the independent variables. There is a stat called the "variance inflation factor" or VIF. It is 1/(1 - r-squared). So if r-squared was .5, 1 - .5 = .5. Then 1/.5 = 2. A couple of sources I looked at suggested that if the VIF is under 10, multicollinearity is not a problem. So in this case, the VIF is only about 2. For the other 3 cases, VIF only got as high as 4. I did come across one source that said there is no rule about the value of VIF and multicollinearity.
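The VIF arithmetic can be sketched in a few lines. The function names are hypothetical. The only inputs are the r-squared of about .5 and the VIF values of 4 and 10 mentioned above.

```python
def vif(r_squared):
    """Variance inflation factor for an auxiliary-regression r-squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("r-squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_for_vif(target_vif):
    """Auxiliary r-squared that would produce a given VIF."""
    return 1.0 - 1.0 / target_vif


print(vif(0.5))               # the OBP case described above
print(r_squared_for_vif(4))   # r-squared implied by the largest VIF reported
print(r_squared_for_vif(10))  # r-squared at the rule-of-thumb cutoff
```

Read the other way around, a VIF of 4 corresponds to an auxiliary r-squared of .75, and the rule-of-thumb cutoff of 10 corresponds to an r-squared of .90.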
 
But I did run the following regression based on your suggestions
 
PCT = .491 + 1.04*EXB + 2.72*H + 2.53*W - 1*OPPEXB - 2.7*OPPH - 2.54*OPPW
 
EXB = extra bases/PA (PA = walks + ABs)
H = hits/PA
W = walks/PA
 
So it looks like there is a pretty big difference between getting on base (with hits or walks) and hitting for power. Here are the win changes for a 1 SD improvement:
 
EXB 3.38
W 4.64
H 4.43
OPPEXB 3.06
OPPH 5.19
OPPW 4.15
 

Sunday, May 26, 2013

Players Who Had A Line Drive Percentage Of At Least 30%

Sean Forman compiled this list for me. I noticed the other day that Miguel Cabrera had 30% so far this year. I wondered what the record was and Sean was kind enough to come up with an answer. Interesting that about 90% of them are between 1996 and 2002 yet the stat goes back to 1988. Not sure why no one has done it the last 10 years. It is the % of all balls put in play that are line drives.