Wednesday, March 25, 2026

The Problem With “Total Clutch” Hitting Statistics, Part 2

This is a follow-up to today's earlier post on clutch hitting (which was itself a follow-up to yesterday's post).

Again, I use Ed Oswalt’s measure, “player’s win value” (or PWV), which is similar to WPA, or Win Probability Added.

"WPA quantifies the percent change in a team's chances of winning from one event to the next. It does so by measuring the importance of a given plate appearance." 

From MLB.COM

I ran a linear regression in which PWV/PA was the dependent variable and relative OBP and SLG were the independent variables. It used all 284 players with 5000 or more plate appearances from 1972-2002.  The regression equation was

PWV/PA = -.0246 + .000149*OBP + .000097*SLG

The r-squared was .935, meaning that 93.5% of the variation in PWV/PA is explained by the model.  The standard error is .00056, or about .39 wins over a 700 PA season.  The correlation between OBP and SLG is about .52.  Each of them has a correlation of over .8 with PWV/PA. So again, two very simple stats explain most of what is going on with the much more complex clutch stat.
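As a rough illustration of how the equation above is used, here is a minimal Python sketch. It assumes relative OBP and SLG are on a scale where 100 is league average (the same scale used for relative OPS in Part 1); the function names and the sample hitter are hypothetical.

```python
# Coefficients as reported in the regression above.
INTERCEPT = -0.0246
B_OBP = 0.000149
B_SLG = 0.000097
STD_ERROR_PER_PA = 0.00056

def predicted_pwv_per_pa(rel_obp, rel_slg):
    """Predicted PWV/PA from relative OBP and SLG (assumed scale: 100 = league average)."""
    return INTERCEPT + B_OBP * rel_obp + B_SLG * rel_slg

def per_season(value_per_pa, pa=700):
    """Convert a per-PA value into wins over a season of `pa` plate appearances."""
    return value_per_pa * pa

# A league-average hitter comes out at essentially zero wins above average.
league_average = predicted_pwv_per_pa(100, 100)

# Hypothetical hitter: OBP 10% and SLG 20% better than league average.
good_hitter = predicted_pwv_per_pa(110, 120)

print(per_season(league_average))
print(per_season(good_hitter))
print(per_season(STD_ERROR_PER_PA))  # the standard error expressed in wins per 700 PA
```

Note that the intercept and the two coefficients cancel out at 100 and 100, which is what you would expect if an average hitter adds no wins.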

About 82% of the players were within .5 PWV or wins of what the equation predicts. Only 5 were more than 1 win better or worse. So at most, the best clutch hitter can add about 1.38 wins a season above what you would expect them to. 

Now another stat, the Game State Victories (GSV) from the Rhoids Sports Analysis website, shows the same tendencies.

I have a data set with 191 players who have 900 or more at bats over the years 2001-2003 (this is not all of them; I'll explain later).  I ran a regression in which each hitter's GSV for the three seasons was the dependent variable and his cumulative totals for various other stats were the independent variables.  The r-squared I got was .91, meaning that 91% of the variation in GSV across hitters is explained by regular counting stats that are not at all context dependent (well, not quite; again, I will explain later, and it concerns SACs).  That is, these independent variables have nothing to do with the score or the inning.  Yet they explain almost all of the variation in the context dependent variable. 

Here are the coefficient estimates

SAC = -.021
SF = -.049
GIDP = -.1098
CS = -.059
SB = .0299
BB = .053
HR = .106
3B = .099
2B = .0877
1B = .0603
OUTS = -.01537
Intercept = -1.47 

Outs means outs not counting the GIDP. BB includes hit by pitch.  The only variables that are not statistically significant (absolute t-value less than 1.96) are SAC, SF and CS. That is a surprise for CS.  I believe that GSV, or anything like it, is a clutch stat.  Yet non-clutch data does a very good job of explaining the variation in GSV. It is possible that the unexplained variation is due to chance and not to any kind of clutch hitting ability.
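To make the coefficients concrete, here is a small Python sketch that turns a set of three-season counting totals into a predicted GSV. The coefficients are the ones listed above; the function name and the sample stat line are made up for illustration and do not belong to any real player.

```python
# Coefficient estimates from the GSV regression above.
GSV_COEFFICIENTS = {
    "SAC": -0.021, "SF": -0.049, "GIDP": -0.1098, "CS": -0.059,
    "SB": 0.0299, "BB": 0.053, "HR": 0.106, "3B": 0.099,
    "2B": 0.0877, "1B": 0.0603, "OUTS": -0.01537,
}
GSV_INTERCEPT = -1.47

def predicted_gsv(totals):
    """Predicted GSV from cumulative counting stats.

    `totals` maps stat names to counts. OUTS excludes GIDP and BB includes
    hit by pitch, as in the post. Missing stats are treated as zero.
    """
    return GSV_INTERCEPT + sum(
        coef * totals.get(stat, 0) for stat, coef in GSV_COEFFICIENTS.items()
    )

# Hypothetical three-season totals (not a real player).
sample_hitter = {
    "1B": 300, "2B": 90, "3B": 9, "HR": 75, "BB": 200,
    "SB": 30, "CS": 12, "GIDP": 40, "SF": 15, "SAC": 3, "OUTS": 1100,
}

print(predicted_gsv(sample_hitter))
```

The difference between a hitter's actual GSV and this predicted value is the residual, which is the only place any clutch ability could show up.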

Now the data.  I chose 900 at bats because the data I was able to download from the Rhoids website listed other stats, but not walks.  I wanted a convenient cutoff for which players to count, so I chose 300 at bats for individual years, figuring anyone who gets 400 plate appearances probably has at least 300 at bats.  Over three years that is 900. I also used the data from Doug Steele's website to get walks, GIDP, etc. 

The data problems.  In some years, players who obviously did very well had zero for their GSV.  Very often they were rookies who also had a zero listed for their salary and the Rhoids people wanted to do something with runs per dollar.  Maybe that is why a zero is given, since you cannot divide by zero. Also some players simply had an "NA" listed.  Others clearly had the wrong number, like Juan Sosa getting the same GSV as Sammy Sosa one year.  Some other players were just not listed.  I did not see Edgar Martinez in the "data dump" for this year.  I could not always tell which Alex Gonzalez I was looking at or if I did they were not listed for all years.  Some player names were not spelled the same way each year (I went through and made the necessary corrections to allow for doing subtotals in excel).  So I did not have all of the players with 900 or more at bats from this period.  I think about 30 got left out.  

The correlation between GSV per plate appearance for players with 300 or more at bats in both 2001 and 2002 was .5.  But the correlation between OPS in 2001 and GSV per PA in 2002 was actually higher, at .519.  So if you wanted to predict a player's GSV per PA in 2002, his OPS in 2001 would do a slightly better job than his GSV per PA in 2001.

I also calculated a predicted GSV per PA for these players in both 2001 and 2002 using the coefficient values from the regression which used 1Bs, 2Bs, 3Bs, HRs, BBs, SBs, CSs, Outs and GIDPs. Then I calculated the difference between the actual GSV per PA and the predicted GSV per PA in each year.  The correlation between these differences, or residuals, for the two years was .081.  That seems very low. I think this means that players who were especially good in the clutch in 2001 (who had a higher GSV per PA than predicted) were not likely to have a higher GSV per PA than predicted again in 2002. (I think this is the kind of analysis that Dick Cramer performed on the Player Win Average of the Mills brothers.)
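The residual test described above can be sketched in a few lines of Python. This is only an outline of the idea: the helper names are hypothetical and the numbers are made-up values for four imaginary players, not the real 2001 and 2002 data.

```python
from math import sqrt

def residuals(actual, predicted):
    """Actual minus predicted GSV per PA, player by player."""
    return [a - p for a, p in zip(actual, predicted)]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up GSV per PA values, same players in the same order in both years.
actual_2001 = [0.0060, 0.0031, 0.0012, -0.0008]
predicted_2001 = [0.0055, 0.0035, 0.0010, -0.0002]
actual_2002 = [0.0048, 0.0040, 0.0005, -0.0010]
predicted_2002 = [0.0050, 0.0036, 0.0011, -0.0012]

res_2001 = residuals(actual_2001, predicted_2001)
res_2002 = residuals(actual_2002, predicted_2002)

# A value near zero means beating the prediction one year says little about the next.
print(pearson(res_2001, res_2002))
```

If clutch hitting were a persistent skill, the same players would tend to have positive residuals in both years and this correlation would be well above zero.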

Sources

These are the sources that I listed 20 years ago. Some links might no longer work. 

"Ballpark Figures to Bet On," Nov. 21, UPFRONT section BusinessWeek magazine. Author was Brian Hindo. 

“What's a Ball Player Worth?” can be found at:

http://www.businessweek.com/print/bwdaily/dnflash/nov2003/nf2003115_2313_db016.htm?db 

Player Win Averages by Eldon G. and Harlan D. Mills.  1970. A.S. Barnes, publisher. 

Curve Ball: Baseball, Statistics, and the Role of Chance in the Game by Jim Albert and Jay Bennett. Revised 2003. Copernicus Books.

Rhoids Sports Analysis: http://www.rhoids.com/ 

Ed Oswalt’s site is at: http://www.livewild.org/bb/playervalues/index.html

The Nov. 7, 2004 NY Times article is at 

http://query.nytimes.com/gst/abstract.html?res=F30A1EFA39580C748CDDA80994DC404482 

But you will probably have to pay to read all of it. 

Other sites where you might find it are

http://www.iht.com/articles/2004/11/07/sports/base.html  

http://redsox.mostvaluablenetwork.com/wp-content/sites/schwarzWRAP.html

The Problem With “Total Clutch” Hitting Statistics, Part 1

This is the follow up to yesterday's post on clutch hitting that I mentioned. It is also from articles I posted about 20 years ago. I will probably do a Part 2 very soon.

Introduction 

A recent article (late 2003) in BusinessWeek magazine, “Ball Park Figures You Can Bet On,” described a “new statistic” developed by Benjamin Polak and Brian Lonergan of Yale University that measures “wins contributed” by major league hitters. (Another article on their work appeared in the NY Times in Nov. 2004; see sources below.) From the online version: 

“Here's how their method works: Let's say the home team is down by two runs in the bottom of the fifth inning, with no outs and a runner on second base. At that moment, the home team has a 39% chance (or 0.39 probability) that it will win. If the batter grounds out, and the runner at second fails to advance, the team's chance of winning falls to 33%. The difference between the two, -0.06, is assigned to the batter who just grounded out.” 

They add this up over every plate appearance for the whole season and get wins contributed for each player (see link in sources). 

The problem with this approach is that it is not new and that it really tells us nothing about what a ball player is worth, since in the long run this “total clutch” stat is highly correlated with normal hitting statistics, as I will demonstrate. (My critique is not new either. See the book "The Hidden Game of Baseball" by John Thorn and Pete Palmer, which discusses what Dick Cramer had to say about the Mills brothers.)  By “total clutch” stat, I mean one that takes into account every plate appearance, with each one weighted by its importance according to the score and inning.  Hits when the game is late and close will count for more than hits when the game is early and the score is lopsided. 
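The bookkeeping behind a stat like this can be sketched in Python as follows. The first event uses the 39% and 33% figures from the quoted example; the other two events and the function names are hypothetical.

```python
def win_probability_change(before, after):
    """Credit (or blame) for one plate appearance: the change in win probability."""
    return after - before

def wins_contributed(events):
    """Sum the win probability changes over all of a hitter's plate appearances."""
    return sum(win_probability_change(before, after) for before, after in events)

# (win probability before, win probability after) for each plate appearance.
events = [
    (0.39, 0.33),  # the groundout from the quoted example: -0.06
    (0.45, 0.52),  # hypothetical single
    (0.60, 0.58),  # hypothetical flyout
]

print(win_probability_change(0.39, 0.33))
print(wins_contributed(events))
```

The same hit or out produces a bigger change when the game is late and close, which is how the score and inning get weighted in.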

History 

This is definitely not a new stat.  It goes back at least as far as 1970, when Eldon G. and Harlan D. Mills published their book Player Win Averages.  Polak and Lonergan’s “wins contributed” stat is similar. So is the “Player Game Percentage” in the book Curve Ball by Jim Albert and Jay Bennett.  So is the “Game State Victories” (or Wins) stat found at the Rhoids Sports Analysis website (see sources).  So is “player's win value” by Ed Oswalt (his link is also in sources).  So what Lonergan and Polak have done is definitely not new. 

Analysis

Let’s start with Ed Oswalt’s measure, “player’s win value” (or PWV), since he uses thirty years of data, covering the years 1972-2002. The best hitters on his list will not surprise you, and his stat divided by plate appearances (or PA) is highly correlated with stats like on-base percentage (OBP) and slugging percentage (SLG) as well as OPS (OBP + SLG). 

First, I looked at the top 100 players in plate appearances from 1972-2002.  I then correlated relative OPS (relative to the league average for each player) with Oswalt’s PWV/PA.  The correlation was 0.948.  This is very close to a perfect linear relationship.  If you square this (called r-squared), you get 0.898, meaning that 89.8% of the variation across hitters in PWV/PA is explained by relative OPS.  This is important because it shows that a very simple, non-clutch, non-situational, non-context stat like OPS pretty much explains a much more complex context dependent stat that is supposed to tell us the value of hitters. 

The linear regression equation is PWV/PA = .00022*OPS - .022 
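As a quick check on the equation, here is a short Python sketch that applies it to the highest and lowest relative OPS values in the table (137 for Bonds and 87 for Bowa). It reads the intercept as -.022, which is the value that reproduces Bonds's listed PWV/PA of about .008; the function name is hypothetical.

```python
SLOPE = 0.00022
INTERCEPT = -0.022  # assumed reading of the intercept in the equation above

def predicted_pwv_per_pa(relative_ops):
    """Predicted PWV/PA from relative OPS (100 = league average)."""
    return SLOPE * relative_ops + INTERCEPT

# (relative OPS, actual PWV/PA) taken from the table.
players = {"Barry Bonds": (137, 0.008), "Larry Bowa": (87, -0.0035)}

for name, (rel_ops, actual) in players.items():
    predicted = predicted_pwv_per_pa(rel_ops)
    print(name, round(predicted, 5), actual)

# r-squared is the square of the reported correlation.
r = 0.948
print(r ** 2)
```

A league-average hitter (relative OPS of 100) comes out at zero, again what you would expect if an average hitter adds no wins.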

In the figure below, you can see the relationship where PWV/PA is a function of relative OPS.

 

In the table below, you can see each player’s PWV/PA and his relative OPS.  Bonds, for example, has 137, which means his OPS was 37% higher than the league average for the 1972-2002 period.  The top ten or twenty hitters will not surprise you.

Rank  Player             PWV/PA   Relative OPS
1     Barry Bonds        0.008    137
2     Mark McGwire       0.0068   131
3     Jeff Bagwell       0.006    127
5     Mike Schmidt       0.0049   126
6     Ken Griffey Jr.    0.0048   124
7     George Brett       0.0045   119
13    Fred McGriff       0.004    119
14    Rafael Palmeiro    0.0039   119
4     Will Clark         0.005    118
8     Rod Carew          0.0045   118
10    Jack Clark         0.0041   118
11    Reggie Jackson     0.004    118
26    Jim Rice           0.0032   118
49    Sammy Sosa         0.0023   118
19    Dwight Evans       0.0037   117
23    Fred Lynn          0.0035   117
32    Ellis Burks        0.003    117
12    John Olerud        0.004    116
20    Wade Boggs         0.0036   116
9     Tony Gwynn         0.0042   115
25    Jose Canseco       0.0033   115
55    Bobby Bonilla      0.002    115
15    Kirby Puckett      0.0038   114
17    Rickey Henderson   0.0038   114
18    Eddie Murray       0.0037   114
36    Dave Winfield      0.0027   114
42    Dale Murphy        0.0024   114
30    Don Mattingly      0.003    113
33    Andres Galarraga   0.0029   113
27    Dave Parker        0.0031   112
43    Al Oliver          0.0024   112
53    Bobby Grich        0.0021   112
16    Mark Grace         0.0038   111
21    Ken Singleton      0.0036   111
28    Harold Baines      0.0031   111
34    Tim Raines         0.0028   111
37    Cecil Cooper       0.0026   111
50    Wally Joyner       0.0023   111
51    Paul O'Neill       0.0022   111
56    Ron Cey            0.0019   111
58    Carlton Fisk       0.0019   111
61    Andre Dawson       0.0018   111
22    Keith Hernandez    0.0036   110
24    Darrell Evans      0.0033   110
39    Paul Molitor       0.0025   110
41    Ted Simmons        0.0025   110
44    Roberto Alomar     0.0024   110
45    Brian Downing      0.0024   110
47    Craig Biggio       0.0023   110
52    Barry Larkin       0.0022   110
59    Ryne Sandberg      0.0019   110
72    Chet Lemon         0.0009   110
29    Ken Griffey Sr.    0.0031   109
48    Dusty Baker        0.0023   109
54    Steve Garvey       0.002    109
31    Toby Harrah        0.003    108
35    Lou Whitaker       0.0027   108
40    Gary Matthews      0.0025   108
62    Chili Davis        0.0018   108
69    George Hendrick    0.0012   108
71    Don Baylor         0.0011   108
38    Pete Rose          0.0026   107
46    Jose Cruz          0.0024   107
60    Robin Ventura      0.0018   107
65    Robin Yount        0.0014   107
73    Gary Carter        0.0008   107
66    Cal Ripken         0.0014   106
67    Julio Franco       0.0013   106
63    Alan Trammell      0.0016   105
64    Chris Chambliss    0.0014   105
68    Graig Nettles      0.0013   105
70    Brett Butler       0.0011   104
74    Brady Anderson     0.0008   104
76    Ruben Sierra       0.0007   104
77    Carney Lansford    0.0007   104
78    Buddy Bell         0.0004   104
79    Joe Carter         0.0004   104
82    Todd Zeile         0.0003   103
85    Steve Finley       0        103
93    Lance Parrish      -0.0009  103
84    Jay Bell           0.0001   102
57    Tony Phillips      0.0019   101
75    Bill Buckner       0.0007   101
81    Tony Fernandez     0.0004   101
88    Tim Wallach        -0.0005  101
80    Willie Randolph    0.0004   100
86    Willie McGee       0        100
87    Gary Gaetti        -0.0003  100
83    B.J. Surhoff       0.0001   99
91    Devon White        -0.0007  99
89    Terry Pendleton    -0.0005  97
90    Dave Concepcion    -0.0006  96
92    Steve Sax          -0.0009  96
95    Willie Wilson      -0.0013  96
97    Garry Templeton    -0.0016  93
98    Frank White        -0.0018  93
94    Omar Vizquel       -0.0012  92
96    Ozzie Smith        -0.0015  92
99    Bob Boone          -0.0025  91
100   Larry Bowa         -0.0035  87

Sources

These are the sources that I listed 20 years ago. Some links might no longer work. 

"Ballpark Figures to Bet On," Nov. 21, UPFRONT section BusinessWeek magazine. Author was Brian Hindo. 

“What's a Ball Player Worth?” can be found at:

http://www.businessweek.com/print/bwdaily/dnflash/nov2003/nf2003115_2313_db016.htm?db 

Player Win Averages by Eldon G. and Harlan D. Mills.  1970. A.S. Barnes, publisher. 

Curve Ball: Baseball, Statistics, and the Role of Chance in the Game by Jim Albert and Jay Bennett. Revised 2003. Copernicus Books.

Rhoids Sports Analysis: http://www.rhoids.com/ 

Ed Oswalt’s site is at: http://www.livewild.org/bb/playervalues/index.html

The Nov. 7, 2004 NY Times article is at 

http://query.nytimes.com/gst/abstract.html?res=F30A1EFA39580C748CDDA80994DC404482 

But you will probably have to pay to read all of it. 

Other sites where you might find it are

http://www.iht.com/articles/2004/11/07/sports/base.html

http://redsox.mostvaluablenetwork.com/wp-content/sites/schwarzWRAP.html