This is a follow-up to today's earlier post on clutch hitting (which was a follow-up to yesterday's post).
Again, I use Ed Oswalt’s measure, “player’s win value” (PWV), which is similar to Win Probability Added (WPA).
"WPA quantifies the percent change in a team's chances of winning from one event to the next. It does so by measuring the importance of a given plate appearance."
I ran a linear regression in which PWV/PA was the dependent variable and relative OBP and SLG were the independent variables. It used all 284 players with 5000 or more plate appearances from 1972-2002. The regression equation was
PWV/PA = -.0246 + .000149*OBP + .000097*SLG
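To make the equation concrete, here is a minimal sketch that plugs values into it. It assumes that "relative" OBP and SLG are on a scale where 100 is league average; the post does not say so explicitly, but the coefficients are consistent with that reading, since a 100/100 hitter comes out at essentially zero. The function name and the 110/120 example hitter are made up for illustration.

```python
# Coefficients copied from the regression equation above.
INTERCEPT = -0.0246
COEF_OBP = 0.000149
COEF_SLG = 0.000097


def predicted_pwv_per_pa(rel_obp, rel_slg):
    """Predicted PWV per plate appearance.

    Assumption: rel_obp and rel_slg are scaled so that 100 = league average.
    """
    return INTERCEPT + COEF_OBP * rel_obp + COEF_SLG * rel_slg


# A league-average hitter under the 100-scale assumption.
average_hitter = predicted_pwv_per_pa(100, 100)

# A hypothetical hitter 10% better in OBP and 20% better in SLG.
good_hitter = predicted_pwv_per_pa(110, 120)

print(f"average hitter: {average_hitter:.5f} PWV/PA")
print(f"good hitter:    {good_hitter:.5f} PWV/PA")
print(f"good hitter over 700 PA: {good_hitter * 700:.2f} wins")
```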
The r-squared was .935, meaning that 93.5% of the variation in PWV/PA is explained by the model. The standard error is .00056 per PA, or about .39 wins over a 700 PA season. The correlation between OBP and SLG is about .52. Each of those has a correlation of over .8 with PWV/PA. So again, two very simple stats explain most of what is going on with the much more complex clutch stat.
About 82% of the players were within .5 PWV or wins of what the equation predicts. Only 5 were more than 1 win better or worse. So at most, the best clutch hitter can add about 1.38 wins a season above what you would expect them to.
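The conversion between per-PA figures and season win totals is just multiplication by 700. The small sketch below spells it out. The helper names are invented here, and treating the 1.38 wins as a figure for a 700 PA season is my assumption.

```python
SEASON_PA = 700  # season length used in the text


def per_pa_to_season(value_per_pa, pa=SEASON_PA):
    """Scale a per-plate-appearance figure up to a season of `pa` PAs."""
    return value_per_pa * pa


def season_to_per_pa(wins, pa=SEASON_PA):
    """Scale a season win total back down to a per-plate-appearance figure."""
    return wins / pa


# Standard error of the regression, expressed in wins per season.
std_error_wins = per_pa_to_season(0.00056)

# The largest gap mentioned above, expressed per PA
# (assumes the 1.38 wins refers to a 700 PA season).
best_gap_per_pa = season_to_per_pa(1.38)

print(f"standard error: {std_error_wins:.3f} wins per {SEASON_PA} PA")
print(f"largest gap:    {best_gap_per_pa:.5f} PWV/PA")
```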
Now another stat, the Game State Victories (GSV) from the Rhoids Sports Analysis website, shows the same tendencies.
I have a data set with 191 players who have 900 or more at bats over the years 2001-2003 (this is not all of them; I'll explain later). So I ran a regression in which each hitter's GSV for the three seasons was the dependent variable and their cumulative totals for various other stats were the independent variables. The r-squared I got was .91, meaning that 91% of the variation in GSV across hitters is explained by regular counting stats that are not at all context dependent (well, not quite; again, I will explain later, it concerns SACs). That is, these independent variables have nothing to do with the score or the inning. Yet they explain almost all of the variation in the context dependent variable. The coefficients were:
SAC = -.021
SF = -.049
GIDP = -.1098
CS = -.059
SB = .0299
BB = .053
HR = .106
3B = .099
2B = .0877
1B = .0603
OUTS = -.01537
Intercept = -1.47
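As a rough illustration of how these coefficients turn a stat line into a predicted GSV, here is a short sketch. The coefficients are the ones listed above; the function name and the three-season stat line are invented for the example and do not belong to any real player. How OUTS was counted (for example, whether it overlaps with GIDP or CS) is not stated in the post, so the example just takes a total as given.

```python
# Coefficients as listed above (three-season GSV on three-season totals).
GSV_COEFS = {
    "SAC": -0.021,
    "SF": -0.049,
    "GIDP": -0.1098,
    "CS": -0.059,
    "SB": 0.0299,
    "BB": 0.053,
    "HR": 0.106,
    "3B": 0.099,
    "2B": 0.0877,
    "1B": 0.0603,
    "OUTS": -0.01537,
}
GSV_INTERCEPT = -1.47


def predicted_gsv(totals):
    """Predicted GSV from cumulative counting stats.

    `totals` maps stat names to counts; a missing stat is treated as zero.
    """
    return GSV_INTERCEPT + sum(
        coef * totals.get(stat, 0) for stat, coef in GSV_COEFS.items()
    )


# Hypothetical three-season totals, made up for illustration only.
hypothetical_hitter = {
    "1B": 300, "2B": 90, "3B": 6, "HR": 75, "BB": 200,
    "SB": 15, "CS": 6, "GIDP": 40, "SF": 15, "SAC": 0, "OUTS": 1100,
}

print(f"predicted GSV: {predicted_gsv(hypothetical_hitter):.2f}")
```

Keeping the coefficients in a dictionary keyed by stat name makes it easy to check each one against the list above.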
Now the data. I chose 900 at bats as the cutoff because the data I was able to download from the Rhoids website listed other stats but not walks, so I could not compute plate appearances from it. I wanted a convenient cutoff for which players to count, and I chose 300 at bats for individual years, figuring anyone who gets 400 plate appearances probably has at least 300 at bats. Over three years that is 900. I also used the data from Doug Steele's website to get walks, GIDP, etc.
The data problems. In some years, players who obviously did very well had zero for their GSV. Very often they were rookies who also had a zero listed for their salary and the Rhoids people wanted to do something with runs per dollar. Maybe that is why a zero is given, since you cannot divide by zero. Also some players simply had an "NA" listed. Others clearly had the wrong number, like Juan Sosa getting the same GSV as Sammy Sosa one year. Some other players were just not listed. I did not see Edgar Martinez in the "data dump" for this year. I could not always tell which Alex Gonzalez I was looking at, and when I could, he was not listed for all years. Some player names were not spelled the same way each year (I went through and made the necessary corrections to allow for doing subtotals in Excel). So I did not have all of the players with 900 or more at bats from this period. I think about 30 got left out.
The correlation between GSV per plate appearance for players with 300 or more at bats in both 2001 and 2002 was .5. But the correlation between OPS in 2001 and GSV per PA in 2002 was actually higher, at .519. So if you wanted to predict a player's GSV per PA in 2002, his OPS in 2001 would do a slightly better job than his GSV per PA in 2001.
I also calculated a predicted GSV per PA for these players in both 2001 and 2002 using the coefficient values from the regression which used 1Bs, 2Bs, 3Bs, HRs, BBs, SBs, CSs, Outs and GIDPs. Then I calculated the difference between the actual and the predicted GSV per PA in each year. The correlation between these differences, or residuals, for the two years was .081. That seems very low. I think this means that players who were especially good in the clutch in 2001 (who had a higher GSV per PA than predicted) were not especially likely to have a higher GSV per PA than predicted again in 2002. (I think this is the kind of analysis that Dick Cramer performed on the Player Win Average of the Mills brothers.)
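The residual check can be sketched in a few lines. This is only an outline of the procedure described above, not the actual spreadsheet work: the function names are mine, and the GSV per PA numbers are invented placeholders, not real player data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / sqrt(sxx * syy)


def residuals(actual, predicted):
    """Actual minus predicted, player by player."""
    return [a - p for a, p in zip(actual, predicted)]


# Invented GSV per PA values for the same five players in two seasons.
actual_2001 = [0.012, 0.004, -0.003, 0.009, 0.001]
predicted_2001 = [0.010, 0.005, -0.001, 0.008, 0.002]
actual_2002 = [0.008, 0.006, -0.002, 0.011, 0.000]
predicted_2002 = [0.009, 0.004, -0.002, 0.010, 0.001]

resid_2001 = residuals(actual_2001, predicted_2001)
resid_2002 = residuals(actual_2002, predicted_2002)

# A value near zero would mean that beating the prediction in one
# year says little about beating it in the next.
print(f"year-to-year residual correlation: {pearson(resid_2001, resid_2002):.3f}")
```

The same `pearson` helper covers the simpler comparison of 2001 OPS or 2001 GSV per PA against 2002 GSV per PA.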
Sources
These are the sources that I listed 20 years ago. Some links might no longer work.
"Ballpark Figures to Bet On," Nov. 21, UPFRONT section BusinessWeek magazine. Author was Brian Hindo.
“What's a Ball Player Worth?” can be found at:
http://www.businessweek.com/print/bwdaily/dnflash/nov2003/nf2003115_2313_db016.htm?db
Player Win Averages by Eldon G. and Harlan D. Mills. 1970. A.S. Barnes, publisher.
Curve Ball: Baseball, Statistics, and the Role of Chance in the Game by Jim Albert and Jay Bennett. Revised 2003. Copernicus Books.
Rhoids Sports Analysis: http://www.rhoids.com/
Ed Oswalt’s site is at: http://www.livewild.org/bb/playervalues/index.html
The Nov. 7, 2004 NY Times article is at
http://query.nytimes.com/gst/abstract.html?res=F30A1EFA39580C748CDDA80994DC404482
But you will probably have to pay to read all of it.
Other sites where you might find it are
http://www.iht.com/articles/2004/11/07/sports/base.html
http://redsox.mostvaluablenetwork.com/wp-content/sites/schwarzWRAP.html