Thursday, September 26, 2013

How Well Do OBP And SLG Explain WPA?

I took all the players with 5000+ PAs since 1945 and found their WPA per PA using the data from Baseball Reference.

Baseball Reference has WPA, or Win Probability Added, going back only to 1945. The idea is that every hit, walk or out a player makes either increases or decreases his team's probability of winning. Players who hit well with runners on base or in the late innings of close games will tend to score higher.
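To make the idea concrete, here is a minimal Python sketch of how WPA adds up over a game. The win probabilities and events are invented for illustration; they are not taken from Baseball Reference, and the variable names are my own.

```python
# Hypothetical win probabilities before and after each of one batter's
# plate appearances in a single game (invented numbers, illustration only).
plate_appearances = [
    {"event": "strikeout, 1st inning", "wp_before": 0.50, "wp_after": 0.48},
    {"event": "walk, 4th inning", "wp_before": 0.45, "wp_after": 0.48},
    {"event": "2-run single, 9th inning", "wp_before": 0.30, "wp_after": 0.85},
]

# WPA for a play is the change in the batting team's win probability.
wpa_by_play = [pa["wp_after"] - pa["wp_before"] for pa in plate_appearances]

# A player's WPA is the sum over his plays; WPA/PA divides by the number of PAs.
game_wpa = sum(wpa_by_play)
wpa_per_pa = game_wpa / len(plate_appearances)

for pa, wpa in zip(plate_appearances, wpa_by_play):
    print(f"{pa['event']}: {wpa:+.2f}")
print(f"Game WPA: {game_wpa:+.2f}, WPA/PA: {wpa_per_pa:+.3f}")
```

In this made-up game the late single in a close situation swings far more win probability than the early strikeout or walk, which is why clutch hitting shows up in WPA.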

I ran a regression with WPA/PA as the dependent variable and a player's OBP and SLG relative to the league average (RELOBP and RELSLG, where 100 is league average) as the independent variables. Here is the equation:

WPA/PA = 0.00008686*RELSLG + 0.00016*RELOBP - 0.0245

Notice that the coefficient for OBP divided by the one for SLG is 1.85. That is pretty close to the 1.73 that Tom Tango shows at Blast From the Past: Proof of the Modified OPS.

The r-squared was .915 and the standard error was 0.0006686 per PA, or .468 wins over 700 PAs.
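The arithmetic behind the coefficient ratio and the per-season standard error can be checked with a short Python sketch. It uses only the rounded values printed in this post, and the variable names are my own. Because the printed OBP coefficient is rounded, the ratio computed here comes out slightly under the 1.85 quoted above.

```python
# Coefficients and standard error as printed in the post (rounded values).
COEF_RELSLG = 0.00008686
COEF_RELOBP = 0.00016
STD_ERROR_PER_PA = 0.0006686
PA_PER_SEASON = 700  # the full-season workload used throughout the post

# Relative weight of OBP to SLG implied by the regression coefficients.
obp_to_slg_ratio = COEF_RELOBP / COEF_RELSLG

# Standard error scaled from a single PA up to a 700-PA season.
std_error_per_season = STD_ERROR_PER_PA * PA_PER_SEASON

print(f"OBP/SLG coefficient ratio: {obp_to_slg_ratio:.2f}")
print(f"Standard error per 700 PA: {std_error_per_season:.3f}")
```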

I also checked to see how each guy was predicted by the equation. Then I calculated how many more or fewer wins they had than predicted. Click here to see the list.

Sandy Alomar, Sr. had the biggest positive differential. I plugged in his RELOBP of 76 and RELSLG of 91 (both are below 100 and therefore below average) and it predicted a WPA/PA of -.0034, but he actually had -.0015. That means he hit better the more important the situation: the closer and later the game was, the more outs there were and the more men were on base.

Over 700 PAs, that is 1.33 wins. That means his teams won 1.33 more games per season than we might have thought based only on his OBP and SLG.
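The conversion from a WPA/PA gap to wins per season is just the gap times 700. Here is a small Python sketch using Alomar's figures as quoted above; the function name is my own.

```python
PA_PER_SEASON = 700  # the full-season workload used throughout the post


def wins_vs_prediction(actual_wpa_per_pa, predicted_wpa_per_pa, pa=PA_PER_SEASON):
    """Wins above (positive) or below (negative) what OBP and SLG predict."""
    return (actual_wpa_per_pa - predicted_wpa_per_pa) * pa


# Sandy Alomar, Sr.: actual and predicted WPA/PA as quoted in the post.
alomar = wins_vs_prediction(actual_wpa_per_pa=-0.0015, predicted_wpa_per_pa=-0.0034)
print(f"Alomar: {alomar:+.2f} wins per 700 PA")
```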

Click here to see Alomar's career splits at Baseball Reference. He batted .297 in high leverage situations with a .341 OBP. In medium leverage, he had .240 & .286. In low leverage, .232 & .275. Maybe being a switch hitter helped so that teams could not get the platoon advantage on him. He is followed by Eric Davis, McCovey and Berra.

But Ted Williams had the biggest negative differential. Click here to see his splits. If you go to the list I have and scroll down, he is at the bottom. The Red Sox won 1.96 fewer games per 700 PAs than we might expect based on his OBP & SLG. The table below shows how Williams hit in the various situations:

Split              BA      OBP     SLG
High Leverage      0.323   0.468   0.576
Medium Leverage    0.349   0.485   0.638
Low Leverage       0.333   0.475   0.633

His numbers are all down in high leverage situations (this probably only includes his post-war data). Maybe because he was a lefty, he faced more lefties in high leverage situations than he normally did. If I used the Baseball Reference Play Index correctly, he normally faced lefties 23.8% of the time but in high leverage cases it was 26.4%.

But his overall OPS over the years they have play by play data was 1.158 vs. righties and .915 vs. lefties. In high leverage cases, it was 1.104 and .884. So he declined against both lefties and righties, and more in absolute terms against righties.
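One rough way to see how much of the drop could come from facing more lefties is to blend the OPS figures by the share of PAs against each hand. The Python sketch below does that with the numbers quoted above. Weighting OPS by PA share is only an approximation (OBP and SLG have different denominators), and the function and variable names are my own. On these numbers, the extra lefties account for only a small part of the decline.

```python
# Figures quoted in the post for Ted Williams (play-by-play years only).
OPS_OVERALL = {"vs_rhp": 1.158, "vs_lhp": 0.915}
OPS_HIGH_LEV = {"vs_rhp": 1.104, "vs_lhp": 0.884}
LHP_SHARE_OVERALL = 0.238
LHP_SHARE_HIGH_LEV = 0.264


def blended_ops(ops, lhp_share):
    """Rough PA-share-weighted OPS; an approximation, not a true OPS."""
    return (1 - lhp_share) * ops["vs_rhp"] + lhp_share * ops["vs_lhp"]


baseline = blended_ops(OPS_OVERALL, LHP_SHARE_OVERALL)
# Overall OPS against each hand, but with the high leverage pitcher mix.
mix_only = blended_ops(OPS_OVERALL, LHP_SHARE_HIGH_LEV)
# High leverage OPS against each hand, with the high leverage pitcher mix.
high_lev = blended_ops(OPS_HIGH_LEV, LHP_SHARE_HIGH_LEV)

drop_from_mix = baseline - mix_only          # due to seeing more lefties
drop_from_performance = mix_only - high_lev  # due to hitting worse vs. each hand

print(f"Drop from pitcher mix:  {drop_from_mix:.3f}")
print(f"Drop from performance:  {drop_from_performance:.3f}")
```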

But Williams is not the only great player near the bottom of the list. The bottom 11 includes three other Hall of Famers: Musial, Boggs and Kiner. Robinson Cano is there, too.

Update 9-28: I broke down all the guys into groups of .1 (wins per 700 PAs) and then made a distribution chart. Here it is. Actually, I cut everyone down to 1 decimal point and then did a frequency distribution. If there is a better or easier way to do this with Excel, please let me know.



Wins per 700 PA    Count
1.3 2
1.2 1
1.1 2
1 4
0.9 10
0.8 11
0.7 20
0.6 27
0.5 31
0.4 37
0.3 50
0.2 47
0.1 51
0 48
-0.1 47
-0.2 44
-0.3 38
-0.4 31
-0.5 26
-0.6 20
-0.7 11
-0.8 6
-0.9 8
-1 9
-1.1 1
-1.2 1
-1.3 1
-1.4 2
-2 1
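For anyone who would rather do the grouping outside Excel, here is a minimal Python sketch of a frequency distribution in bins of .1. It rounds each value to the nearest tenth, which is an assumption on my part; cutting off the extra decimals instead would need a different binning rule. The function name and the sample values are hypothetical (only 1.33 and -1.96 come from this post).

```python
from collections import Counter


def frequency_table(values, bin_width=0.1):
    """Count values per bin, rounding each to the nearest multiple of bin_width.

    Bins are keyed by a whole number of bin widths to avoid float-key problems.
    """
    counts = Counter(round(v / bin_width) for v in values)
    # Highest bin first, matching the layout of the table in the post.
    return [(k * bin_width, counts[k]) for k in sorted(counts, reverse=True)]


# Illustrative differentials in wins per 700 PA (mostly made up).
sample = [1.33, 0.42, 0.38, 0.04, -0.03, -0.27, -0.31, -1.96]

for bin_value, count in frequency_table(sample):
    print(f"{bin_value:+.1f}\t{count}")
```

Feeding the full list of player differentials into `frequency_table` in place of `sample` should give a table in the same layout as the one above.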

Here is a slightly different version

3 comments:

Cliff Blau said...

Off hand, this looks like a pretty normal distribution. Any reason to think the differences are due to anything other than chance?

Cyril Morong said...

Cliff

Thanks for reading and commenting. It does seem normal and my guess is that it is random. When I look at who is near the top and bottom, I don't see any particular kind of pattern.

Cy

Cyril Morong said...

I broke down all the guys into groups of .1 and then made a distribution chart. Actually, I cut everyone down to 1 decimal point and then did a frequency distribution. If there is a better or easier way to do this with Excel, please let me know.