I took all the Hall of Fame votes from 1966-2014 from Baseball Reference. On those pages, BR shows the vote% each player got as well as their career WAR (it also shows the combined WAR of their seven best seasons, plus Jay Jaffe's "JAWS" stat, which combines career WAR and the seven best seasons).
Five players were tossed out of the analysis: Barry Bonds, Rafael Palmeiro, Mark McGwire, Sammy Sosa and Pete Rose. The voters have severely penalized the first four for possible PED use, not because they underrated them. Something similar applies to Rose, who had a cloud of scandal over him when he first came up because of betting on baseball.
One thing I wanted to do was find a trend line for the vote. I could not find one that made sense using career WAR or JAWS. Any trend line had too many ups and downs. Vote% should not go down as WAR goes up. But once you look at the trend line I used for WAR7 (the combined WAR of the seven best seasons), you will see how nonlinear the data is.
So when I had Excel put in trend lines, the only one that made reasonable sense was a sixth degree polynomial with WAR7 as the independent variable and vote% as the dependent variable. It does have some ups and downs where I really don't want them, but they are not too severe.
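One way to check a candidate trend line for those unwanted ups and downs is to look at where its slope turns negative. The sketch below is only an illustration: the cubic is a made-up curve, not the sixth-degree fit from the graph, and the function names are invented for this example.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with coefficients ordered highest power first (Horner's rule)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def derivative(coeffs):
    """Coefficients of the derivative, highest power first."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


def decreasing_stretches(coeffs, lo, hi, steps=1000):
    """Return (start, end) x-intervals within [lo, hi] where the curve slopes downward."""
    d = derivative(coeffs)
    width = (hi - lo) / steps
    stretches = []
    start = None
    for i in range(steps + 1):
        x = lo + i * width
        falling = poly_eval(d, x) < 0
        if falling and start is None:
            start = x
        elif not falling and start is not None:
            stretches.append((start, x))
            start = None
    if start is not None:
        stretches.append((start, hi))
    return stretches


# Hypothetical cubic, NOT the fitted curve from the post:
# y = x^3 - 3x slopes downward between x = -1 and x = 1.
example = [1.0, 0.0, -3.0, 0.0]
dips = decreasing_stretches(example, -2.0, 2.0, steps=400)
for start, end in dips:
    print(f"curve goes down from x = {start:.2f} to x = {end:.2f}")
```

Run against the real coefficients over the range of WAR7 in the data, an empty list would mean the predicted vote% never falls as WAR7 rises.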
Click here to see the graph.
So I hope you can see that trying to fit a trend line to the data has problems. This seems like the best I could do.
Using the regression equation, I then calculated each player's predicted 1st-year vote% (the equation you see in the graph probably does not show enough decimal places for the coefficient values; x in the graph is WAR7). That prediction was then subtracted from their actual 1st-year vote% to get a difference. I then ranked them all from the biggest negative difference to the biggest positive difference.
The player with the biggest negative differential, whom we might say was the most underrated, was Ron Santo. He got only 3.9% of the vote in his 1st year but if he was right on the trend line, it would have been 75.4%.
The most overrated player was Lou Brock. He got 79.7% of the vote while the model predicts he would get 6.7%. It helps to reach a milestone like 3000 hits, retire as the all-time SB leader and perform very well in three 7-game World Series. Click here to see my research that supports this. As for Santo, click here to see my post that explains he got about the vote% we would expect, given the general preferences of the voters.
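The differential and the ranking can be sketched in a few lines of Python. Only the two players quoted above are included, with the actual and predicted figures taken from this post. The names `differential` and `ranked` are invented for this example.

```python
# First-year vote% figures quoted in the post: (player, actual, predicted by the trend line).
players = [
    ("Lou Brock", 79.7, 6.7),
    ("Ron Santo", 3.9, 75.4),
]


def differential(actual, predicted):
    """Actual minus predicted: negative means the vote fell short of the trend line."""
    return round(actual - predicted, 1)


# Rank from the biggest negative difference (most underrated)
# to the biggest positive difference (most overrated).
ranked = sorted(
    ((name, differential(actual, predicted)) for name, actual, predicted in players),
    key=lambda row: row[1],
)

for name, diff in ranked:
    print(f"{name}: {diff:+.1f}")
```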
Click here to see the complete rankings
Wednesday, January 15, 2014
Sunday, January 12, 2014
Eddie Robinson's Great Homerun-To-Strikeout Ratio
Update Jan. 13: He is actually not in the top 25, but he is pretty close. One thing I forgot to take into account when using the Lee Sinins Complete Baseball Encyclopedia is what the league average is based on. I just looked at his page and what he and the league average had. But it needed to be consistent with my earlier research. Click here to see the new, complete list.
I used all guys from 1920-2012 who had 4500+ PAs (the first study excluded Robinson because he had 4891). I set the Sinins database to compare players to non-pitchers and used each guy's rate of HRs & SOs on a per-plate-appearance basis (you could choose to go with all players including pitchers and use ABs instead of PAs, for example). Robinson is 40th out of 884 players, so that puts him in the top 5%. When you look at the list, you will see him ahead of some great hitters.
**********************************************************
Eddie Robinson played for several teams in the 1940s and '50s, including the Indians, White Sox and Yankees. He was the regular first baseman on the 1948 world champion Indians. Click here to go to his Baseball Reference page. He batted .348 in 23 World Series at-bats.
Here is a link to his SABR biography written by C. Paul Rogers III. One thing it says is
"He was the seventh player and first White Sox to hit a ball over the roof at old Comiskey Park (in 1951); the first six were Hall of Famers Babe Ruth, Lou Gehrig, Jimmy Foxx, Hank Greenberg, Ted Williams and Mickey Mantle."
He is also the author (along with C. Paul Rogers III) of the 2011 book titled Lucky Me: My Sixty-five Years in Baseball.
Robinson struck out less than the league average (356 vs. 489). But his HR rate relative to his strikeout rate was outstanding. He hit 172 HRs while the league-average player would have hit 85. So his relative HR rate was 202.35 (172/85 times 100). His relative strikeout rate was 73.868 (356/489 times 100). Then 202.35/73.868 = 2.739. That would put him in the top 25 all-time. But when I did this analysis I only included guys with 5000+ PAs. He just missed with 4891. See my post Which Players Had The Best HR-To-Strikeout Ratios?
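For anyone who wants to repeat the arithmetic, here is a small Python sketch of the ratio. The function names are made up for this example, and the strikeout index is taken as the 73.868 quoted above rather than recomputed.

```python
def relative_rate(player_total, league_average_total):
    """Index a player's total against the league-average total (100 = league average)."""
    return player_total / league_average_total * 100


def hr_to_so_ratio(relative_hr, relative_so):
    """Relative HR rate divided by relative strikeout rate; higher is better."""
    return relative_hr / relative_so


# Robinson's 172 HRs against the 85 a league-average player would have hit.
relative_hr = relative_rate(172, 85)

# The strikeout index is used as quoted in the post, not recomputed here.
relative_so = 73.868

ratio = hr_to_so_ratio(relative_hr, relative_so)
print(f"relative HR rate: {relative_hr:.2f}")
print(f"HR-to-SO ratio:   {ratio:.3f}")
```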
Wednesday, January 8, 2014
What Made Maddux So Unique?
It was a highly unusual combination of being able to prevent HRs without walking many batters and without striking many out. I created an index to measure this and he is far ahead of anyone else. See Who Was More "Magical" Than Greg Maddux? (Or Pitcher's HR/BB/SO Rating) from December 2, 2009.
Tuesday, January 7, 2014
Has Albert Pujols Been Getting More MVP Votes Than Expected Based On WAR?
This is based on a couple of posts you can probably see below, so read them for explanations and technical details. Those posts have been discussed over at Baseball Think Factory.
I did the analysis for each year of Albert Pujols' career. In each year I tried to find a polynomial trend line that best fit the voting that year. Who was included in the analysis each year? Anyone who had at least as many ABs as the lowest AB total for anyone who got votes. Sometimes players don't get any votes but have a pretty good WAR and I don't think they should be left out of the analysis. So there had to be some way to decide who got included. So it is a different number of players each year.
I usually went with the highest r-squared among 2nd-degree, 3rd-degree, etc. polynomials. But they had to make sense. Sometimes the line goes up and down a lot, and I preferred lines like the ones in the graphs I used already. Logs and exponential functions would not work because of zero values for WAR and MVP shares. Sometimes even negative WAR values came up.
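Comparing r-squared across polynomial degrees can be sketched in plain Python. This is a generic least-squares fit through the normal equations, not Excel's own routine, and the season below is made-up data (including a zero and a negative WAR) used only to show the mechanics.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build an augmented copy so the caller's lists are not modified.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] for y = c0 + c1*x + c2*x^2 + ..."""
    size = degree + 1
    normal = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    target = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(normal, target)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** power for power, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    """Share of the variance in ys explained by the fitted curve."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical season, not real voting data: WAR values and MVP shares.
war = [-0.5, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
shares = [0.0, 0.0, 0.0, 0.01, 0.03, 0.08, 0.15, 0.30, 0.55, 0.90]

fits = {degree: fit_polynomial(war, shares, degree) for degree in (1, 2, 3)}
scores = {degree: r_squared(war, shares, coeffs) for degree, coeffs in fits.items()}
best_degree = max(scores, key=scores.get)

for degree in sorted(scores):
    print(f"degree {degree}: r-squared = {scores[degree]:.4f}")
```

Since each higher degree can only match or improve the in-sample r-squared, choosing on r-squared alone tends to favor the wigglier curve, which is why the "had to make sense" check matters.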
So I got a predicted value for each year of his career (including 2013 when he got no votes and had only 391 ABs, so I included everyone in the AL who had 391+ ABs).
In 11 of his 13 seasons his share was higher than predicted. Adding up all of the differences between his predicted share and actual share I got 1.73. So, although his rank in the MVP vote is about right based on his rank in WAR, his vote total is still higher than expected based on the overall pattern of the vote by the writers.
Now this 1.73 is lower than the 3.78 I currently have for him. But to know where that ranks I will have to go through every season since 1931 for each league one by one and get a total for all players. That will take some time.
Monday, January 6, 2014
Was Willie Mays The Most Underrated Player In History? Or Was It Wade Boggs? Is Albert Pujols The Most Overrated? (Revised)
Click here to see the original post from a few days ago. The idea was to see the relationship between MVP shares and WAR.
If you read that, you will see that the regression line estimate was a 2nd degree polynomial. But to calculate the predicted MVP shares, I used the equation that appears when I ask Excel to graph it. That equation only goes to a few decimal places. In this case, it matters because the values of the WAR variable can be very large.
So I had Excel do the line estimate and I had the coefficient values go out several more decimal places. The new equation is
MVPShares = 0.000256845*WARSquared + 0.010979681*WAR - 0.11979
Squaring Willie Mays' WAR (155.9) gives us about 24,000. With the rounded coefficient, .0003*24,304.81 = 7.291. But if I have
0.000256845*24,304.81 = 6.24
That alone lowers Mays' predicted MVP shares by about 1 (again, using the regression estimate). Mays actually slips from the most underrated player to the second most underrated player. Lou Whitaker, who did very poorly in the Hall of Fame vote (unjustly so), is now number 1.
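The effect of the extra decimal places can be reproduced directly. This sketch uses the two sets of coefficients from the posts, plus the career WAR and MVP shares listed for Mays and Whitaker in the original post. The function and variable names are just for illustration.

```python
def predict_shares(war, a, b, c):
    """Second-degree trend line: a*WAR^2 + b*WAR + c."""
    return a * war ** 2 + b * war + c


ROUNDED = (0.0003, 0.011, -0.1198)              # as displayed on the Excel chart
PRECISE = (0.000256845, 0.010979681, -0.11979)  # with more decimal places

# Career WAR and actual MVP shares as listed in the original post.
players = {"Willie Mays": (155.9, 5.94), "Lou Whitaker": (74.8, 0.21)}

for name, (war, actual) in players.items():
    old = predict_shares(war, *ROUNDED)
    new = predict_shares(war, *PRECISE)
    print(f"{name}: rounded pred {old:.2f} (diff {actual - old:+.2f}), "
          f"precise pred {new:.2f} (diff {actual - new:+.2f})")
```

With the rounded coefficients Mays has the larger shortfall. With the longer ones Whitaker's shortfall is slightly larger, which matches the swap at the top of the revised list.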
Click here to see the revised list. It does not look like players moved very much. Pujols was still the most overrated by this measure.
The new equation for the case where I used only each player's seven best seasons of WAR is
MVPShares = 0.001807713*WAR7Squared - 0.041037828*WAR7 + 0.268264829
The original post had a + in front of the 2nd coefficient. It should have been minus and has been corrected. Wade Boggs was still the most underrated here and Pujols was still the most overrated.
Click here to see the revised results.
Sunday, January 5, 2014
Friday, January 3, 2014
Was Willie Mays The Most Underrated Player In History? Or Was It Wade Boggs? Is Albert Pujols The Most Overrated?
Click here to see the revised version of this. Mays slips from #1 to #2. Lou Whitaker is now number 1.
To get a handle on these questions, I first compared career WAR and career MVP shares for a large group of players. In the first case I used WAR from Baseball Reference. In the second case I used WAR from Fangraphs (and only each player's seven best seasons).
The group started with everyone who had 5000+ PAs since 1931 (I excluded anyone who played more than half a season before 1931 because that was the first year of the baseball writers' MVP award).
Then I included everyone who was in the top 200 in MVP shares (only position players). The lowest career WAR of anyone in that group was 17.3, belonging to Cecil Fielder. So I then also added in everyone who had 17.3+ WAR since 1931. The total was 810 players.
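The way the group was put together can be sketched as a union of three sets. This is one reading of the steps above, under the assumption that the 17.3+ WAR players were added regardless of plate appearances. The `Player` class, `build_sample` and the records below are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    plate_appearances: int
    career_war: float
    mvp_shares: float


def build_sample(players, min_pa=5000, top_n=200):
    """Union of PA qualifiers, the top_n in MVP shares, and anyone at or above that group's lowest WAR."""
    by_pa = {p for p in players if p.plate_appearances >= min_pa}
    top_shares = set(sorted(players, key=lambda p: p.mvp_shares, reverse=True)[:top_n])
    war_floor = min(p.career_war for p in top_shares)
    by_war = {p for p in players if p.career_war >= war_floor}
    return by_pa | top_shares | by_war, war_floor


# Made-up records for illustration only; none of these are real players.
pool = [
    Player("Long Career", 9000, 40.0, 0.50),
    Player("Short Star", 4200, 25.0, 1.80),
    Player("Vote Getter", 4800, 17.3, 1.67),
    Player("Quiet Value", 4900, 30.0, 0.00),
    Player("Bench Bat", 3000, 5.0, 0.00),
]

# top_n is shrunk to 2 so the tiny pool behaves like the real top-200 cut.
sample, floor = build_sample(pool, min_pa=5000, top_n=2)
print(f"WAR floor: {floor}")
print(sorted(p.name for p in sample))
```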
Then I regressed MVP shares on WAR. A second-order polynomial was a better fit than a straight-line regression. Click here to see the scatter plot with trend line. Here is the regression equation:
MVPShares = 0.0003*WARSquared + 0.011*WAR - 0.1198
Then I estimated each player's predicted MVP shares and found the difference (actual minus predicted). Click here to see the entire results. The 20 players with the most negative differences are listed below. Willie Mays had a career WAR of 155.9. So he was predicted by the equation to have 8.89 MVP shares but he only had 5.94. So his differential is -2.95.
| Rank | Player | Career WAR | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 1 | Willie Mays | 155.9 | 5.94 | 8.89 | -2.95 |
| 2 | Rickey Henderson | 110.6 | 2.46 | 4.77 | -2.31 |
| 3 | Lou Whitaker | 74.8 | 0.21 | 2.38 | -2.17 |
| 4 | Wade Boggs | 91 | 1.2 | 3.36 | -2.16 |
| 5 | Eddie Mathews | 96.1 | 1.61 | 3.71 | -2.10 |
| 6 | Hank Aaron | 142.3 | 5.45 | 7.52 | -2.07 |
| 7 | Willie Randolph | 65.6 | 0.04 | 1.89 | -1.85 |
| 8 | Ozzie Smith | 76.5 | 0.65 | 2.48 | -1.83 |
| 9 | Bobby Grich | 71 | 0.43 | 2.17 | -1.74 |
| 10 | Buddy Bell | 66 | 0.18 | 1.91 | -1.73 |
| 11 | Willie Davis | 60.8 | 0.1 | 1.66 | -1.56 |
| 12 | Scott Rolen | 69.9 | 0.57 | 2.11 | -1.54 |
| 13 | Bobby Abreu | 60.5 | 0.17 | 1.64 | -1.47 |
| 14 | Carl Yastrzemski | 96 | 2.23 | 3.70 | -1.47 |
| 15 | Graig Nettles | 67.9 | 0.56 | 2.01 | -1.45 |
| 16 | Kenny Lofton | 67.9 | 0.58 | 2.01 | -1.43 |
| 17 | Chet Lemon | 55.2 | 0 | 1.40 | -1.40 |
| 18 | Johnny Damon | 56.4 | 0.07 | 1.45 | -1.38 |
| 19 | Darrell Evans | 58.5 | 0.17 | 1.55 | -1.38 |
| 20 | Cal Ripken | 95.5 | 2.31 | 3.67 | -1.36 |

This approach is not perfect. Some players might have long careers and so they compile a high career WAR. But if they never have any great seasons, they might not get many MVP votes. Plus, it helps to play on contenders. But Mays had plenty of great seasons and played on many contenders. There is also the possibility that if there are other great players around compiling high WAR seasons, you won't do as well in the voting.
Now here are the players who got more MVP Shares than predicted.
| Rank | Player | Career WAR | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 791 | Cecil Fielder | 17.3 | 1.67 | 0.16 | 1.51 |
| 792 | Mike Piazza | 59.2 | 3.16 | 1.58 | 1.58 |
| 793 | Albert Belle | 39.8 | 2.38 | 0.79 | 1.59 |
| 794 | Harmon Killebrew | 60.4 | 3.23 | 1.64 | 1.59 |
| 795 | David Ortiz | 44 | 2.6 | 0.94 | 1.66 |
| 796 | Pedro Guerrero | 34.4 | 2.3 | 0.61 | 1.69 |
| 797 | George Bell | 20.2 | 1.92 | 0.22 | 1.70 |
| 798 | Steve Garvey | 37.5 | 2.46 | 0.71 | 1.75 |
| 799 | Willie Stargell | 57.3 | 3.3 | 1.49 | 1.81 |
| 800 | Roy Campanella | 34.2 | 2.52 | 0.61 | 1.91 |
| 801 | Juan Gonzalez | 38.5 | 2.76 | 0.75 | 2.01 |
| 802 | Jim Rice | 47.3 | 3.15 | 1.07 | 2.08 |
| 803 | Hank Greenberg | 57.6 | 3.69 | 1.51 | 2.18 |
| 804 | Ryan Howard | 18.9 | 2.49 | 0.19 | 2.30 |
| 805 | Yogi Berra | 59.3 | 3.98 | 1.59 | 2.39 |
| 806 | Dave Parker | 39.9 | 3.19 | 0.80 | 2.39 |
| 807 | Frank Thomas | 73.6 | 4.79 | 2.31 | 2.48 |
| 808 | Joe DiMaggio | 78.3 | 5.45 | 2.58 | 2.87 |
| 809 | Miguel Cabrera | 54.7 | 4.25 | 1.38 | 2.87 |
| 810 | Albert Pujols | 92.9 | 6.9 | 3.49 | 3.41 |

Pujols got 3.41 more shares than predicted (6.9 actual vs. 3.49 predicted), making him the most overrated player by this measure.
Then, using data from Fangraphs, I found all the players who had 4000+ PAs since 1931 and found their WAR from their seven best seasons combined (each player's WAR in 1981 was increased by 50% while it was 40% for 1994; this is due to player strikes). Total players: 931. Click here to see the scatter plot and trend line. Again, a second-degree polynomial was better than a straight-line regression (if you look closely, the line slopes downward for very low WAR players, which should not make sense, but this is avoided with a sixth-degree polynomial whose results are essentially the same, so I used the simpler one here). Here is the equation:
MVPShares = 0.0018*WAR7Squared - 0.041*WAR7 + 0.2683
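As a rough Python sketch of the WAR7 step and the trend line: the career below is made up, the helper names are invented, and applying the strike-year boost before picking the seven best seasons is an assumption, since the post does not say which came first. The Boggs figures are the WAR7 and award shares from this post's results.

```python
STRIKE_BOOST = {1981: 1.50, 1994: 1.40}  # adjustments described in the post


def war7(seasons):
    """Sum of the seven best single-season WAR values after the strike-year boost.

    `seasons` maps year -> WAR. Boosting before picking the best seven
    is an assumption made for this sketch.
    """
    adjusted = [war * STRIKE_BOOST.get(year, 1.0) for year, war in seasons.items()]
    return sum(sorted(adjusted, reverse=True)[:7])


def predict_shares(war7_value):
    """Trend line from the post: 0.0018*WAR7^2 - 0.041*WAR7 + 0.2683."""
    return 0.0018 * war7_value ** 2 - 0.041 * war7_value + 0.2683


# Made-up career for illustration; not a real player.
career = {1979: 5.0, 1980: 6.0, 1981: 4.0, 1982: 7.0, 1983: 3.0,
          1984: 6.5, 1985: 2.0, 1986: 5.5, 1987: 1.0}
print(f"hypothetical WAR7: {war7(career):.1f}")

# Wade Boggs: WAR7 of 56.1 and 1.2 award shares, as listed in the post.
boggs_pred = predict_shares(56.1)
print(f"Boggs predicted {boggs_pred:.2f}, diff {1.2 - boggs_pred:+.2f}")

# A parabola bottoms out at -b / (2a), so the curve slopes downward below this WAR7.
low_point = 0.041 / (2 * 0.0018)
print(f"trend line turns upward at WAR7 = {low_point:.1f}")
```

The low point works out to a WAR7 of about 11.4, so the downward slope only affects players well below the ones who draw MVP votes.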
Here are the most underrated players. Boggs was actually number 1 in WAR 3 straight years while his team came in first twice. He was second in WAR 3 times. He reached the postseason a total of six times. But the best he ever finished in the MVP voting was fourth. Mays was 117th here. Click here to see the entire results.
| Rank | Name | WAR7 | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 1 | Wade Boggs | 56.1 | 1.2 | 3.63 | -2.43 |
| 2 | Ron Santo | 52.9 | 1.23 | 3.14 | -1.91 |
| 3 | Eddie Mathews | 55.3 | 1.61 | 3.51 | -1.90 |
| 4 | Bobby Grich | 46.15 | 0.43 | 2.21 | -1.78 |
| 5 | Rickey Henderson | 59.55 | 2.46 | 4.21 | -1.75 |
| 6 | Chase Utley | 46.7 | 0.73 | 2.28 | -1.55 |
| 7 | Arky Vaughan | 50.2 | 1.23 | 2.75 | -1.52 |
| 8 | Scott Rolen | 44.7 | 0.57 | 2.03 | -1.46 |
| 9 | Bobby Abreu | 40.9 | 0.17 | 1.60 | -1.43 |
| 10 | Buddy Bell | 40.65 | 0.18 | 1.58 | -1.40 |
| 11 | Brian Giles | 40.4 | 0.2 | 1.55 | -1.35 |
| 12 | Andruw Jones | 47.8 | 1.1 | 2.42 | -1.32 |
| 13 | Robin Ventura | 39.9 | 0.26 | 1.50 | -1.24 |
| 14 | Chet Lemon | 36.95 | 0 | 1.21 | -1.21 |
| 15 | Ron Cey | 39.35 | 0.25 | 1.44 | -1.19 |
| 16 | Darrell Evans | 38.3 | 0.17 | 1.34 | -1.17 |
| 17 | Jim Edmonds | 45 | 0.9 | 2.07 | -1.17 |
| 18 | Kenny Lofton | 42.24 | 0.58 | 1.75 | -1.17 |
| 19 | Carlos Beltran | 43.8 | 0.76 | 1.93 | -1.17 |
| 20 | Graig Nettles | 41.6 | 0.56 | 1.68 | -1.12 |

Now the players who got more MVP shares than predicted.
| Rank | Name | WAR7 | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 912 | Barry Bonds | 76.7 | 9.3 | 7.71 | 1.59 |
| 913 | Brooks Robinson | 45.1 | 3.69 | 2.08 | 1.61 |
| 914 | Eddie Murray | 41.1 | 3.33 | 1.62 | 1.71 |
| 915 | Willie Stargell | 40.7 | 3.3 | 1.58 | 1.72 |
| 916 | George Bell | 20.8 | 1.92 | 0.19 | 1.73 |
| 917 | Hank Aaron | 56.4 | 5.45 | 3.68 | 1.77 |
| 918 | Pete Rose | 43.4 | 3.68 | 1.88 | 1.80 |
| 919 | Stan Musial | 64.7 | 6.96 | 5.15 | 1.81 |
| 920 | Jim Rice | 38.2 | 3.15 | 1.33 | 1.82 |
| 921 | David Ortiz | 31.7 | 2.6 | 0.78 | 1.82 |
| 922 | Steve Garvey | 28.3 | 2.46 | 0.55 | 1.91 |
| 923 | Dave Parker | 36.9 | 3.19 | 1.21 | 1.98 |
| 924 | Frank Robinson | 50.8 | 4.84 | 2.83 | 2.01 |
| 925 | Joe DiMaggio | 54.6 | 5.45 | 3.40 | 2.05 |
| 926 | Frank Thomas | 49.7 | 4.79 | 2.68 | 2.11 |
| 927 | Juan Gonzalez | 27.2 | 2.76 | 0.48 | 2.28 |
| 928 | Miguel Cabrera | 44.1 | 4.25 | 1.96 | 2.29 |
| 929 | Ryan Howard | 20.6 | 2.49 | 0.19 | 2.30 |
| 930 | Yogi Berra | 40 | 3.98 | 1.51 | 2.47 |
| 931 | Albert Pujols | 58.9 | 6.9 | 4.10 | 2.80 |
