I took all the Hall of Fame votes from 1966-2014 from Baseball Reference. On those pages, BR shows the vote% each player got and their career WAR. It also shows WAR7 (the combined WAR of their seven best seasons) and Jay Jaffe's JAWS stat, which combines career WAR and WAR7.
Five players were tossed out of the analysis: Barry Bonds, Rafael Palmeiro, Mark McGwire, Sammy Sosa and Pete Rose. The voters have severely penalized the first four for possible PED use, not because they underrated them. Something similar applies to Rose, who was under a cloud of scandal for betting on baseball when he first came up.
One thing I wanted to do was find a trend line for the vote. I could not find one that made sense using career WAR or Jaws. Any trend line had too many ups and downs. Vote% should not go down as WAR goes up. But once you look at the trend line I used for WAR7, you will see how non-linear the data is.
So when I had Excel put in trend lines, the only one that made reasonable sense was a sixth degree polynomial with WAR7 as the independent variable and vote% as the dependent variable. It does have some ups and downs where I really don't want them, but they are not too severe.
Click here to see the graph.
So I hope you can see that trying to fit a trend line to the data has problems. This seems like the best I could do.
Using the regression equation, I then calculated each player's predicted 1st year vote% (the equation you see in the graph probably does not show enough decimal places for the coefficient values; x in the graph is WAR7). I subtracted that prediction from each player's actual 1st year vote% to get a difference. I then ranked them all from the biggest negative difference to the biggest positive difference.
The player with the biggest negative differential, whom we might say was the most underrated, was Ron Santo. He got only 3.9% of the vote in his 1st year but if he was right on the trend line, it would have been 75.4%.
The most overrated player was Lou Brock. He got 79.7% of the vote while the model predicts he would get 6.7%. It helps to reach a milestone like 3000 hits, retire as the all-time SB leader and perform very well in three 7-game World Series. Click here to see my research that supports this. As for Santo, click here to see my post that explains he got about the vote% we would expect, given the general preferences of the voters.
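Here is a minimal Python sketch of the differential step, using only the two players whose numbers are quoted above. The predicted values come from the trend line as reported in this post; the dictionary layout and function name are my own illustration, not the spreadsheet I actually used.

```python
# First-year vote% (actual) and trend-line prediction, as quoted in the post.
players = {
    "Ron Santo": {"actual": 3.9, "predicted": 75.4},
    "Lou Brock": {"actual": 79.7, "predicted": 6.7},
}


def differential(actual, predicted):
    """Actual vote% minus predicted vote%; negative means underrated."""
    return actual - predicted


diffs = {
    name: round(differential(v["actual"], v["predicted"]), 1)
    for name, v in players.items()
}

# Rank from the biggest negative difference to the biggest positive one.
ranked = sorted(diffs.items(), key=lambda item: item[1])
for name, diff in ranked:
    print(f"{name}: {diff:+.1f}")
```

Santo comes out at -71.5 percentage points and Brock at +73.0, which is why they sit at opposite ends of the ranking.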
Click here to see the complete rankings
Wednesday, January 15, 2014
Sunday, January 12, 2014
Eddie Robinson's Great Homerun-To-Strikeout Ratio
Update Jan. 13: He is actually not in the top 25, but he is pretty close. One thing I forgot to take into account when using the Lee Sinins Complete Baseball Encyclopedia is what the league average is based on. I just looked at his page and what he and the league average had. But it needed to be consistent with my earlier research. Click here to see the new, complete list.
I used all guys from 1920-2012 who had 4500+ PAs (the first study excluded Robinson because he had 4891). I set the Sinins database to compare players to non-pitchers and used each guy's rate of HRs & SOs on a per plate appearance basis (you could choose to go with all players including pitchers and use ABs instead of PAs, for example). Robinson is 40th out of 884 players, so that puts him in the top 5%. When you look at the list, you will see him ahead of some great hitters.
**********************************************************
Eddie Robinson played for several teams in the 1940s and 50s, including the Indians, White Sox and Yankees. He was the regular first baseman on the 1948 world champion Indians. Click here to go to his Baseball Reference page. He batted .348 in 23 World Series at-bats.
Here is a link to his SABR biography written by C. Paul Rogers III. One thing it says is
"He was the seventh player and first White Sox to hit a ball over the roof at old Comiskey Park (in 1951); the first six were Hall of Famers Babe Ruth, Lou Gehrig, Jimmy Foxx, Hank Greenberg, Ted Williams and Mickey Mantle."
He is also author (along with C. Paul Rogers III) of the 2011 book titled Lucky Me: My Sixty-five Years in Baseball.
Robinson struck out less than the league average (356 vs. 489). But his HR rate relative to his strikeout rate was outstanding. He hit 172 HRs while the league-average player would have hit 85. So his relative HR rate was 202.35 (172/85 times 100). His relative strikeout rate was 73.868 (356/489 times 100). Then 202.35/73.868 = 2.739. That would put him in the top 25 all-time. But when I did this analysis I only included guys with 5000+ PAs. He just missed with 4891. See my post Which Players Had The Best HR-To-Strikeout Ratios?
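The ratio above can be sketched in a few lines of Python. The function name is hypothetical. One caution: dividing the rounded totals 356 by 489 directly gives about 72.8, not the quoted 73.868, so I am assuming the quoted figure came from unrounded league totals and I use it as given.

```python
def relative_index(player_total, league_total):
    """Player total as a percentage of the league-average total (100 = average)."""
    return player_total / league_total * 100


hr_index = relative_index(172, 85)           # relative HR rate, about 202.35
so_index_quoted = 73.868                     # relative SO rate as quoted in the post
so_index_direct = relative_index(356, 489)   # about 72.8 from the rounded totals

# HR-to-strikeout rating: relative HR rate divided by relative SO rate.
ratio = hr_index / so_index_quoted
print(round(hr_index, 2), round(so_index_direct, 1), round(ratio, 3))
```

A rating above 1 means the player hit more HRs than his strikeout total would suggest, relative to the league.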
Wednesday, January 8, 2014
What Made Maddux So Unique?
It was a highly unusual combination of being able to prevent HRs without walking many batters and without striking many out. I created an index to measure this and he is far ahead of anyone else. See Who Was More "Magical" Than Greg Maddux? (Or Pitcher's HR/BB/SO Rating) from December 2, 2009.
Tuesday, January 7, 2014
Has Albert Pujols Been Getting More MVP Votes Than Expected Based On WAR?
This is based on a couple posts you can probably see below. So read them for explanations and technical details. Those posts have been discussed over at Baseball Think Factory.
I did the analysis for each year of Albert Pujols' career. In each year I tried to find a polynomial trend line that best fit the voting that year. Who was included in the analysis each year? Anyone who had at least as many ABs as the lowest AB total for anyone who got votes. Sometimes players don't get any votes but have a pretty good WAR and I don't think they should be left out of the analysis. So there had to be some way to decide who got included. So it is a different number of players each year.
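The inclusion rule can be sketched like this in Python. The player names and numbers are made up purely for illustration; only the rule itself (keep everyone with at least as many ABs as the lowest AB total among vote-getters) comes from the post.

```python
# Hypothetical season records: (name, at_bats, war, mvp_share). All values are made up.
season = [
    ("Player A", 610, 8.1, 0.95),
    ("Player B", 540, 5.2, 0.10),
    ("Player C", 455, 4.8, 0.00),  # no votes, but enough ABs to be included
    ("Player D", 430, 2.0, 0.02),  # lowest AB total among vote-getters
    ("Player E", 300, 1.1, 0.00),  # below the cutoff, so excluded
]


def analysis_pool(records):
    """Keep every player whose ABs meet the lowest AB total of any vote-getter."""
    vote_getters = [r for r in records if r[3] > 0]
    min_ab = min(r[1] for r in vote_getters)
    return [r for r in records if r[1] >= min_ab]


pool = analysis_pool(season)
print([name for name, *_ in pool])
```

Player C gets no votes but stays in the pool, which is the point of the rule: a good WAR season with no votes still counts against the trend line.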
I usually went with the highest r-squared among 2nd, 3rd degree, etc. polynomials. But they had to make sense. Sometimes the line goes up and down a lot and I preferred lines like the ones in the graphs I used already. Logs and exponential functions would not work because of zero values for WAR and MVP shares. Sometimes even negative WAR values came up.
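For anyone who wants to try the degree comparison outside of Excel, here is a rough, self-contained Python sketch. It fits polynomials by ordinary least squares (normal equations solved with Gaussian elimination) and reports r-squared for each degree. The data are synthetic, generated from a made-up quadratic, and the function names are my own; this is not the Excel trend line routine.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Synthetic single-season data (made up): share = 0.01 * WAR^2, including a negative WAR.
war = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
share = [0.01 * w ** 2 for w in war]

fits = {d: polyfit(war, share, d) for d in (1, 2, 3)}
r2 = {d: r_squared(war, share, fits[d]) for d in fits}
for d in sorted(r2):
    print(f"degree {d}: r-squared = {r2[d]:.6f}")
```

Because polynomials handle zero and negative WAR without trouble, they avoid the problem with logs mentioned above. The highest r-squared is only a starting point; the fitted line still has to be checked by eye for ups and downs that make no sense.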
So I got a predicted value for each year of his career (including 2013 when he got no votes and had only 391 ABs, so I included everyone in the AL who had 391+ ABs).
In 11 of his 13 seasons his share was higher than predicted. Adding up all of the differences between his predicted share and actual share I got 1.73. So, although his rank in the MVP vote is about right based on his rank in WAR, his vote total is still higher than expected based on the overall pattern of the vote by the writers.
Now this 1.73 is lower than the 3.78 I currently have for him. But to know where that ranks I will have to go through every season since 1931 for each league one by one and get a total for all players. That will take some time.
Monday, January 6, 2014
Was Willie Mays The Most Underrated Player In History? Or Was It Wade Boggs? Is Albert Pujols The Most Overrated? (Revised)
Click here to see the original post from a few days ago. The idea was to see the relationship between MVP shares and WAR.
If you read that, you will see that the regression line estimate was a 2nd degree polynomial. But to calculate the predicted MVP shares, I used the equation that appears when I ask Excel to graph it. That equation only goes to a few decimal places. In this case, it matters because the values of the WAR variable can be very large.
So I had Excel do the line estimate and I had the coefficient values go out several more decimal places. The new equation is
MVPShares = 0.000256845*WARSquared + 0.010979681*WAR - 0.11979
Squaring Willie Mays' career WAR of 155.9 gives 24,304.81. With the rounded coefficient, 0.0003*24,304.81 = 7.291. But if I have
0.000256845*24,304.81 = 6.24
That alone lowers Mays' predicted MVP shares by about 1 (again, using the regression estimate). Mays actually slips from the most underrated player to the second most underrated player. Lou Whitaker, who did very poorly in the Hall of Fame vote (unjustly so), is now number 1.
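To see how much the extra decimal places matter, here is a short Python sketch that evaluates both versions of the equation. The rounded coefficients (0.0003, 0.011, -0.1198) are the ones from the original post; the WAR and MVP share figures for Mays and Whitaker are from the original post's table. The function name is just for illustration.

```python
def predicted_shares(war, a, b, c):
    """Second-degree trend line: a*WAR^2 + b*WAR + c."""
    return a * war ** 2 + b * war + c


ROUNDED = (0.0003, 0.011, -0.1198)             # coefficients as shown on the Excel graph
PRECISE = (0.000256845, 0.010979681, -0.11979)  # coefficients with more decimal places

# (career WAR, actual MVP shares) from the original post
mays = (155.9, 5.94)
whitaker = (74.8, 0.21)

mays_rounded = predicted_shares(mays[0], *ROUNDED)
mays_precise = predicted_shares(mays[0], *PRECISE)
whitaker_precise = predicted_shares(whitaker[0], *PRECISE)

mays_diff = mays[1] - mays_precise
whitaker_diff = whitaker[1] - whitaker_precise

print(f"Mays predicted (rounded): {mays_rounded:.2f}")
print(f"Mays predicted (precise): {mays_precise:.2f}")
print(f"Mays diff: {mays_diff:.2f}, Whitaker diff: {whitaker_diff:.2f}")
```

With the precise coefficients Mays' predicted total drops from about 8.89 to about 7.83, and Whitaker's differential ends up slightly more negative than Mays', which matches the change in the rankings.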
Click here to see the revised list. It does not look like players moved very much. Pujols was still the most overrated by this measure.
The new equation for the case where I used only each player's seven best seasons of WAR is
MVPShares = 0.001807713*WAR7Squared - 0.041037828*WAR7 + 0.268264829
The original post had a + in front of the 2nd coefficient. It should have been minus and has been corrected. Wade Boggs was still the most underrated here and Pujols was still the most overrated.
Click here to see the revised results.
Sunday, January 5, 2014
Friday, January 3, 2014
Was Willie Mays The Most Underrated Player In History? Or Was It Wade Boggs? Is Albert Pujols The Most Overrated?
Click here to see the revised version of this. Mays slips from #1 to #2. Lou Whitaker is now #1.
To get a handle on these questions, I first compared career WAR and career MVP shares for a large group of players. In the first case I used WAR from Baseball Reference. In the second case I used WAR from Fangraphs (and only each player's seven best seasons).
The group started with everyone who had 5000+ PAs since 1931 (I excluded anyone who played more than half a season before 1931 because that was the first year of the baseball writers' MVP award).
Then I included everyone who was in the top 200 in MVP shares (only position players). The lowest career WAR of anyone in that group was 17.3, belonging to Cecil Fielder. So I then also added in everyone who had 17.3+ WAR since 1931. The total was 810 players.
Then I regressed MVP shares on WAR. A second order polynomial was a better fit than a straight line regression. Click here to see the scatter plot with trend line. Here is the regression equation
MVPShares = 0.0003*WARSquared + 0.011*WAR - 0.1198
Then I estimated each player's predicted MVP shares and found the difference. Click here to see the entire results. The 20 players with the most negative differences are listed below. Willie Mays had a career WAR of 155.9. So he was predicted by the equation to have 8.89 MVP shares but he only had 5.94. So his differential is a -2.95.
| Rank | Player | Career WAR | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 1 | Willie Mays | 155.9 | 5.94 | 8.89 | -2.95 |
| 2 | Rickey Henderson | 110.6 | 2.46 | 4.77 | -2.31 |
| 3 | Lou Whitaker | 74.8 | 0.21 | 2.38 | -2.17 |
| 4 | Wade Boggs | 91 | 1.2 | 3.36 | -2.16 |
| 5 | Eddie Mathews | 96.1 | 1.61 | 3.71 | -2.10 |
| 6 | Hank Aaron | 142.3 | 5.45 | 7.52 | -2.07 |
| 7 | Willie Randolph | 65.6 | 0.04 | 1.89 | -1.85 |
| 8 | Ozzie Smith | 76.5 | 0.65 | 2.48 | -1.83 |
| 9 | Bobby Grich | 71 | 0.43 | 2.17 | -1.74 |
| 10 | Buddy Bell | 66 | 0.18 | 1.91 | -1.73 |
| 11 | Willie Davis | 60.8 | 0.1 | 1.66 | -1.56 |
| 12 | Scott Rolen | 69.9 | 0.57 | 2.11 | -1.54 |
| 13 | Bobby Abreu | 60.5 | 0.17 | 1.64 | -1.47 |
| 14 | Carl Yastrzemski | 96 | 2.23 | 3.70 | -1.47 |
| 15 | Graig Nettles | 67.9 | 0.56 | 2.01 | -1.45 |
| 16 | Kenny Lofton | 67.9 | 0.58 | 2.01 | -1.43 |
| 17 | Chet Lemon | 55.2 | 0 | 1.40 | -1.40 |
| 18 | Johnny Damon | 56.4 | 0.07 | 1.45 | -1.38 |
| 19 | Darrell Evans | 58.5 | 0.17 | 1.55 | -1.38 |
| 20 | Cal Ripken | 95.5 | 2.31 | 3.67 | -1.36 |
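A few rows of this list can be reproduced with a short Python sketch. It plugs career WAR into the rounded equation from this post and sorts by the difference. The WAR and award share figures are the ones in this post (Pujols' come from the overrated list); the variable and function names are only for illustration.

```python
def predicted_shares(war):
    """Rounded trend line from this post: 0.0003*WAR^2 + 0.011*WAR - 0.1198."""
    return 0.0003 * war ** 2 + 0.011 * war - 0.1198


# (player, career WAR, actual MVP award shares) as listed in this post
players = [
    ("Albert Pujols", 92.9, 6.9),
    ("Lou Whitaker", 74.8, 0.21),
    ("Willie Mays", 155.9, 5.94),
    ("Rickey Henderson", 110.6, 2.46),
]

rows = []
for name, war, actual in players:
    pred = predicted_shares(war)
    rows.append((name, round(pred, 2), round(actual - pred, 2)))

# Most underrated first (most negative difference).
rows.sort(key=lambda row: row[2])
for name, pred, diff in rows:
    print(f"{name:18s} pred={pred:5.2f} diff={diff:+.2f}")
```

The order that comes out (Mays, Henderson, Whitaker, then Pujols at the other end) matches the rankings in the lists.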
This approach is not perfect. Some players might have long careers and so they compile a high career WAR. But if they never have any great seasons, they might not get many MVP votes. Plus, it helps to play on contenders. But Mays had plenty of great seasons and played on many contenders. There is also the possibility that if there are other great players around compiling high WAR seasons, you won't do as well in the voting.
Now here are the players who got more MVP Shares than predicted.
| Rank | Player | Career WAR | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 791 | Cecil Fielder | 17.3 | 1.67 | 0.16 | 1.51 |
| 792 | Mike Piazza | 59.2 | 3.16 | 1.58 | 1.58 |
| 793 | Albert Belle | 39.8 | 2.38 | 0.79 | 1.59 |
| 794 | Harmon Killebrew | 60.4 | 3.23 | 1.64 | 1.59 |
| 795 | David Ortiz | 44 | 2.6 | 0.94 | 1.66 |
| 796 | Pedro Guerrero | 34.4 | 2.3 | 0.61 | 1.69 |
| 797 | George Bell | 20.2 | 1.92 | 0.22 | 1.70 |
| 798 | Steve Garvey | 37.5 | 2.46 | 0.71 | 1.75 |
| 799 | Willie Stargell | 57.3 | 3.3 | 1.49 | 1.81 |
| 800 | Roy Campanella | 34.2 | 2.52 | 0.61 | 1.91 |
| 801 | Juan Gonzalez | 38.5 | 2.76 | 0.75 | 2.01 |
| 802 | Jim Rice | 47.3 | 3.15 | 1.07 | 2.08 |
| 803 | Hank Greenberg | 57.6 | 3.69 | 1.51 | 2.18 |
| 804 | Ryan Howard | 18.9 | 2.49 | 0.19 | 2.30 |
| 805 | Yogi Berra | 59.3 | 3.98 | 1.59 | 2.39 |
| 806 | Dave Parker | 39.9 | 3.19 | 0.80 | 2.39 |
| 807 | Frank Thomas | 73.6 | 4.79 | 2.31 | 2.48 |
| 808 | Joe DiMaggio | 78.3 | 5.45 | 2.58 | 2.87 |
| 809 | Miguel Cabrera | 54.7 | 4.25 | 1.38 | 2.87 |
| 810 | Albert Pujols | 92.9 | 6.9 | 3.49 | 3.41 |
Pujols got 3.41 more shares than predicted (6.9 actual vs. 3.49 predicted), making him the most overrated player by this measure.
Then, using data from Fangraphs, I found all the players who had 4000+ PAs since 1931 and found their WAR from their seven best seasons combined (each player's WAR in 1981 was increased by 50% while it was 40% for 1994; this is due to the player strikes). Total players, 931. Click here to see the scatter plot and trend line. Again, a second degree polynomial was better than a straight line regression. If you look closely, the line slopes downward for very low WAR players, which does not make sense. This is avoided with a sixth degree polynomial, but its results are essentially the same, so I used the simpler one here. Here is the equation
MVPShares = 0.0018*WAR7Squared - 0.041*WAR7 + 0.2683
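Here is a rough Python sketch of the WAR7 steps. The strike multipliers and the equation are the ones given above, and the Boggs and Pujols figures are from the lists in this post. The sample career passed to `war7` is entirely made up, and the function names are my own.

```python
STRIKE_ADJUSTMENT = {1981: 1.5, 1994: 1.4}  # +50% for 1981, +40% for 1994


def war7(season_war):
    """Sum of the seven best seasons after scaling the strike-shortened years.

    season_war maps year -> WAR for that season.
    """
    adjusted = [war * STRIKE_ADJUSTMENT.get(year, 1.0) for year, war in season_war.items()]
    return sum(sorted(adjusted, reverse=True)[:7])


def predicted_shares(w7):
    """Rounded trend line from this post: 0.0018*WAR7^2 - 0.041*WAR7 + 0.2683."""
    return 0.0018 * w7 ** 2 - 0.041 * w7 + 0.2683


# Made-up career, only to show the adjustment: the 1981 season counts as 4.0 * 1.5 = 6.0.
example_career = {1979: 4.0, 1980: 5.0, 1981: 4.0, 1982: 6.0, 1983: 3.0,
                  1984: 2.0, 1985: 5.5, 1986: 1.0, 1987: 4.5}
example_war7 = war7(example_career)

# (player, WAR7, actual MVP shares) from the lists in this post
results = {}
for name, w7, actual in [("Wade Boggs", 56.1, 1.2), ("Albert Pujols", 58.9, 6.9)]:
    pred = predicted_shares(w7)
    results[name] = (round(pred, 2), round(actual - pred, 2))

# The parabola bottoms out at WAR7 = b / (2a); below that the line slopes downward.
turning_point = 0.041 / (2 * 0.0018)

print(example_war7, results, round(turning_point, 1))
```

The turning point works out to a WAR7 of about 11.4, so the odd downward slope only affects players with very low totals.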
Here are the most underrated players. Boggs was actually number 1 in WAR 3 straight years while his team came in first twice. He was second in WAR 3 times. He reached the post season a total of six times. But the best he ever finished in the MVP voting was fourth. Mays was 117th here. Click here to see the entire results.
| Rank | Name | WAR7 | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 1 | Wade Boggs | 56.1 | 1.2 | 3.63 | -2.43 |
| 2 | Ron Santo | 52.9 | 1.23 | 3.14 | -1.91 |
| 3 | Eddie Mathews | 55.3 | 1.61 | 3.51 | -1.90 |
| 4 | Bobby Grich | 46.15 | 0.43 | 2.21 | -1.78 |
| 5 | Rickey Henderson | 59.55 | 2.46 | 4.21 | -1.75 |
| 6 | Chase Utley | 46.7 | 0.73 | 2.28 | -1.55 |
| 7 | Arky Vaughan | 50.2 | 1.23 | 2.75 | -1.52 |
| 8 | Scott Rolen | 44.7 | 0.57 | 2.03 | -1.46 |
| 9 | Bobby Abreu | 40.9 | 0.17 | 1.60 | -1.43 |
| 10 | Buddy Bell | 40.65 | 0.18 | 1.58 | -1.40 |
| 11 | Brian Giles | 40.4 | 0.2 | 1.55 | -1.35 |
| 12 | Andruw Jones | 47.8 | 1.1 | 2.42 | -1.32 |
| 13 | Robin Ventura | 39.9 | 0.26 | 1.50 | -1.24 |
| 14 | Chet Lemon | 36.95 | 0 | 1.21 | -1.21 |
| 15 | Ron Cey | 39.35 | 0.25 | 1.44 | -1.19 |
| 16 | Darrell Evans | 38.3 | 0.17 | 1.34 | -1.17 |
| 17 | Jim Edmonds | 45 | 0.9 | 2.07 | -1.17 |
| 18 | Kenny Lofton | 42.24 | 0.58 | 1.75 | -1.17 |
| 19 | Carlos Beltran | 43.8 | 0.76 | 1.93 | -1.17 |
| 20 | Graig Nettles | 41.6 | 0.56 | 1.68 | -1.12 |
Now the players who got more MVP shares than predicted
| Rank | Name | WAR7 | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 912 | Barry Bonds | 76.7 | 9.3 | 7.71 | 1.59 |
| 913 | Brooks Robinson | 45.1 | 3.69 | 2.08 | 1.61 |
| 914 | Eddie Murray | 41.1 | 3.33 | 1.62 | 1.71 |
| 915 | Willie Stargell | 40.7 | 3.3 | 1.58 | 1.72 |
| 916 | George Bell | 20.8 | 1.92 | 0.19 | 1.73 |
| 917 | Hank Aaron | 56.4 | 5.45 | 3.68 | 1.77 |
| 918 | Pete Rose | 43.4 | 3.68 | 1.88 | 1.80 |
| 919 | Stan Musial | 64.7 | 6.96 | 5.15 | 1.81 |
| 920 | Jim Rice | 38.2 | 3.15 | 1.33 | 1.82 |
| 921 | David Ortiz | 31.7 | 2.6 | 0.78 | 1.82 |
| 922 | Steve Garvey | 28.3 | 2.46 | 0.55 | 1.91 |
| 923 | Dave Parker | 36.9 | 3.19 | 1.21 | 1.98 |
| 924 | Frank Robinson | 50.8 | 4.84 | 2.83 | 2.01 |
| 925 | Joe DiMaggio | 54.6 | 5.45 | 3.40 | 2.05 |
| 926 | Frank Thomas | 49.7 | 4.79 | 2.68 | 2.11 |
| 927 | Juan Gonzalez | 27.2 | 2.76 | 0.48 | 2.28 |
| 928 | Miguel Cabrera | 44.1 | 4.25 | 1.96 | 2.29 |
| 929 | Ryan Howard | 20.6 | 2.49 | 0.19 | 2.30 |
| 930 | Yogi Berra | 40 | 3.98 | 1.51 | 2.47 |
| 931 | Albert Pujols | 58.9 | 6.9 | 4.10 | 2.80 |