Wednesday, January 15, 2014

Using A Player's WAR To Predict First Year Hall Of Fame Vote Percentage (and possibly estimate "underratedness")

I took all the Hall of Fame votes from 1966-2014 from Baseball Reference. On those pages, BR shows the vote% each player got and also their career WAR (it also shows the combined WAR of their seven best seasons, as well as the Jay Jaffe stat "Jaws," which combines career WAR and the seven best seasons).

Five players were tossed out of the analysis: Barry Bonds, Rafael Palmeiro, Mark McGwire, Sammy Sosa and Pete Rose. The voters have severely penalized the first four for possible PED use, not because they underrated them. Something similar applies to Rose: there was a cloud of scandal over him when he first came up because of betting on baseball.

One thing I wanted to do was find a trend line for the vote. I could not find one that made sense using career WAR or Jaws. Any trend line had too many ups and downs. Vote% should not go down as WAR goes up. But once you look at the trend line I used for WAR7, you will see how non-linear the data is.

So when I had Excel put in trend lines, the only one that made reasonable sense was a sixth degree polynomial with WAR7 as the independent variable and vote% as the dependent variable. It does have some ups and downs where I really don't want them, but they are not too severe.

Click here to see the graph.

So I hope you can see that trying to fit a trend line to the data has problems. This seems like the best I could do.

Using the regression equation, I then calculated each player's predicted 1st year vote% (the equation you see in the graph probably does not show enough decimal places for the coefficient values; x in the graph is WAR7). That prediction was then subtracted from the player's actual 1st year vote% to get a difference. I then ranked all the players from the biggest negative difference to the biggest positive difference.

The player with the biggest negative differential, whom we might say was the most underrated, was Ron Santo. He got only 3.9% of the vote in his 1st year but if he was right on the trend line, it would have been 75.4%.

The most overrated player was Lou Brock. He got 79.7% of the vote while the model predicts he would get 6.7%. It helps to reach a milestone like 3000 hits, retire as the all-time SB leader and perform very well in three 7-game world series. Click here to see my research that supports this. As for Santo, click here to see my post that explains he got about the vote% we would expect, given the general preferences of the voters.
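The differential and the ranking can be sketched in a few lines of Python. This is only an illustration using the two sets of figures quoted in this post; the names `players`, `differential` and `ranking` are made up for the example and are not from the original spreadsheet.

```python
# Actual and predicted first-year vote percentages, as quoted in this post.
players = {
    "Ron Santo": {"actual": 3.9, "predicted": 75.4},
    "Lou Brock": {"actual": 79.7, "predicted": 6.7},
}


def differential(actual, predicted):
    """Actual minus predicted vote%; a negative value suggests 'underrated'."""
    return actual - predicted


diffs = {
    name: round(differential(p["actual"], p["predicted"]), 1)
    for name, p in players.items()
}

# Rank from the biggest negative difference to the biggest positive one.
ranking = sorted(diffs, key=diffs.get)

for name in ranking:
    print(f"{name}: {diffs[name]:+.1f}")
```

With the full data set, the same sort would put the most underrated player first and the most overrated player last.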

Click here to see the complete rankings

Sunday, January 12, 2014

Eddie Robinson's Great Homerun-To-Strikeout Ratio

Update Jan. 13: He is actually not in the top 25, but he is pretty close. One thing I forgot to take into account when using the Lee Sinins Complete Baseball Encyclopedia is what the league average is based on. I just looked at his page and what he and the league average had. But it needed to be consistent with my earlier research. Click here to see the new, complete list.

I used all guys from 1920-2012 who had 4500+ PAs (the first study excluded Robinson because he had 4891). I set the Sinins database to compare players to non-pitchers and used each guy's rate of HRs & SOs on a per plate appearance basis (you could choose to go with all players including pitchers and use ABs instead of PAs, for example). Robinson is 40th out of 884 players, so that puts him in the top 5%. When you look at the list, you will see him ahead of some great hitters.

**********************************************************

Eddie Robinson played for several teams in the 1940s and 50s, including the Indians, White Sox and Yankees. He was the regular first baseman on the 1948 world champion Indians. Click here to go to his Baseball Reference page. He batted .348 in 23 World Series at-bats.

Here is a link to his SABR biography written by C. Paul Rogers III. One thing it says is

"He was the seventh player and first White Sox to hit a ball over the roof at old Comiskey Park (in 1951); the first six were Hall of Famers Babe Ruth, Lou Gehrig, Jimmy Foxx, Hank Greenberg, Ted Williams and Mickey Mantle."

He is also author (along with C. Paul Rogers III) of the 2011 book titled Lucky Me: My Sixty-five Years in Baseball.

Robinson struck out less than the league average (356 vs. 489). But his HR rate relative to his strikeout rate was outstanding. He hit 172 HRs while the league average player would have hit 85. So his relative HR rate was 202.35. His relative strikeout rate was 73.868 (356/489 times 100). Then 202.35/73.868 = 2.739. That would put him in the top 25 all-time. But when I did this analysis I only included guys with 5000+ PAs. He just missed with 4891. See my post Which Players Had The Best HR-To-Strikeout Ratios?
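For anyone who wants to see the ratio as a calculation, here is a rough Python sketch. The function name `relative_rate` is hypothetical, and the strikeout index is simply taken as stated in the post rather than recomputed.

```python
def relative_rate(player_total, league_average_total):
    """Index where 100 means exactly league average."""
    return player_total / league_average_total * 100


# Home runs: 172 for Robinson vs. 85 for a league-average player (from the post).
relative_hr = relative_rate(172, 85)

# Assumption: the relative strikeout rate is used as stated in the post.
relative_so = 73.868

# Higher is better: lots of home runs relative to strikeouts.
hr_to_so = relative_hr / relative_so

print(round(relative_hr, 2), round(hr_to_so, 3))
```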

Wednesday, January 8, 2014

What Made Maddux So Unique?

It was a highly unusual combination of being able to prevent HRs without walking many batters and without striking many out. I created an index to measure this and he is far ahead of anyone else. See Who Was More "Magical" Than Greg Maddux? (Or Pitcher's HR/BB/SO Rating) from December 2, 2009.

Tuesday, January 7, 2014

Has Albert Pujols Been Getting More MVP Votes Than Expected Based On WAR?

This is based on a couple posts you can probably see below. So read them for explanations and technical details. Those posts have been discussed over at Baseball Think Factory.

I did the analysis for each year of Albert Pujols' career. In each year I tried to find a polynomial trend line that best fit the voting that year. Who was included in the analysis each year? Anyone who had at least as many ABs as the lowest AB total for anyone who got votes. Sometimes players don't get any votes but have a pretty good WAR and I don't think they should be left out of the analysis. So there had to be some way to decide who got included. So it is a different number of players each year.
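The inclusion rule can be written out as a small Python sketch. The players, at-bat totals and vote points below are entirely hypothetical; only the rule itself (everyone with at least as many ABs as the lowest-AB vote getter) comes from the post.

```python
# Hypothetical season data: (player, at-bats, MVP vote points).
season = [
    ("Player A", 610, 320),
    ("Player B", 545, 12),
    ("Player C", 391, 0),
    ("Player D", 580, 0),
    ("Player E", 402, 3),
    ("Player F", 250, 0),
]

# Lowest AB total among players who received at least one vote.
min_ab = min(ab for _, ab, votes in season if votes > 0)

# Everyone at or above that threshold is included, votes or not.
included = [name for name, ab, _ in season if ab >= min_ab]

print(min_ab, included)
```

Here Player D gets into the analysis despite having no votes, while Players C and F fall below the cutoff.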

I usually went with the highest r-squared among 2nd, 3rd degree, etc. polynomials. But they had to make sense. Sometimes the line goes up and down a lot and I preferred lines like the ones in the graphs I used already. Logs and exponential functions would not work because of zero values for WAR and MVP shares. Sometimes even negative WAR values came up.
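Comparing r-squared across polynomial degrees can be sketched in plain Python. The post used Excel trend lines, so this is a stand-in and not the method actually used: it fits by ordinary least squares via the normal equations, the data are made up, and `polyfit`, `predict` and `r_squared` are names invented for the example.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, where X[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** power for power, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical season: WAR values and MVP shares that follow an exact parabola.
war = [0, 1, 2, 3, 4, 5, 6, 7, 8]
shares = [0.01 * w ** 2 for w in war]

fits = {d: polyfit(war, shares, d) for d in (1, 2, 3)}
r2 = {d: r_squared(war, shares, c) for d, c in fits.items()}

for d in sorted(r2):
    print(d, round(r2[d], 4))
```

One thing to keep in mind: in a least-squares fit, adding a higher power can never lower r-squared, so the highest r-squared alone tends to favor the wiggliest line. That is why the "had to make sense" check matters.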

So I got a predicted value for each year of his career (including 2013 when he got no votes and had only 391 ABs, so I included everyone in the AL who had 391+ ABs).

In 11 of his 13 seasons his share was higher than predicted. Adding up all of the differences between his predicted share and actual share I got 1.73. So, although his rank in the MVP vote is about right based on his rank in WAR, his vote total is still higher than expected based on the overall pattern of the vote by the writers.

Now this 1.73 is lower than the 3.78 I currently have for him. But to know where that ranks I will have to go through every season since 1931 for each league one by one and get a total for all players. That will take some time.

Monday, January 6, 2014

Was Willie Mays The Most Underrated Player In History? Or Was It Wade Boggs? Is Albert Pujols The Most Overrated? (Revised)

Click here to see the original post from a few days ago. The idea was to see the relationship between MVP shares and WAR.

If you read that, you will see that the regression line estimate was a 2nd degree polynomial. But to calculate the predicted MVP shares, I used the equation that appears when I ask Excel to graph it. That equation only goes to a few decimal places. In this case, it matters because the values of the WAR variable can be very large.

So I had Excel do the line estimate and I had the coefficient values go out several more decimal places. The new equation is

MVPShares = 0.000256845*WARSquared + 0.010979681*WAR - 0.11979

Squaring Willie Mays' WAR gives us about 24,000. Now .0003*24,304.81 = 7.291. But if I have

0.000256845*24,304.81 = 6.24

That alone lowers Mays' predicted MVP shares by about 1 (again, using the regression estimate). Mays actually slips from the most underrated player to the second most underrated player. Lou Whitaker, who did very poorly in the Hall of Fame vote (unjustly so), is now number 1.
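The effect of the extra decimal places can be checked with a short Python sketch. The coefficients are the two sets given in these posts and 155.9 is Mays' career WAR from the original post; the function name `mvp_shares` is just for illustration.

```python
def mvp_shares(war, a, b, c):
    """Second-degree trend line: a*WAR^2 + b*WAR + c."""
    return a * war ** 2 + b * war + c


MAYS_WAR = 155.9

# Coefficients as displayed on the Excel graph (few decimal places).
rounded = mvp_shares(MAYS_WAR, 0.0003, 0.011, -0.1198)

# Coefficients carried out to more decimal places.
full = mvp_shares(MAYS_WAR, 0.000256845, 0.010979681, -0.11979)

# Gap that comes from the squared term alone.
quad_gap = (0.0003 - 0.000256845) * MAYS_WAR ** 2

print(round(rounded, 2), round(full, 2), round(quad_gap, 2))
```

The rounded equation predicts about 8.89 shares and the fuller one about 7.83, with nearly all of the gap (about 1.05) coming from the squared term.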

Click here to see the revised list. It does not look like players moved very much. Pujols was still the most overrated by this measure.

The new equation for the case where I used only each player's seven best seasons of WAR is

MVPShares = 0.001807713*WAR7Squared - 0.041037828*WAR7 + 0.268264829

The original post had a + in front of the 2nd coefficient. It should have been minus and has been corrected. Wade Boggs was still the most underrated here and Pujols was still the most overrated.

Click here to see the revised results.

Friday, January 3, 2014

Was Willie Mays The Most Underrated Player In History? Or Was It Wade Boggs? Is Albert Pujols The Most Overrated?

Click here to see the revised version of this. Mays slips from #1 to #2. Lou Whitaker is now #1.

To get a handle on these questions, I first compared career WAR and career MVP shares for a large group of players. In the first case I used WAR from Baseball Reference. In the second case I used WAR from Fangraphs (and only each player's seven best seasons).

The group was everyone who had 5000+ PAs since 1931 (I excluded anyone who played more than half a season before 1931 because that was the first year of the baseball writers' MVP award).

Then I included everyone who was in the top 200 in MVP shares (only position players). The lowest career WAR of anyone in that group was 17.3, belonging to Cecil Fielder. So I then also added in everyone who had 17.3+ WAR since 1931. Total players was 810.
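The way the sample was widened can be sketched in Python. The four careers below are invented and the cutoff is shrunk to a top 2 so the example stays small; in the post the cutoff was the top 200 and the resulting WAR floor was 17.3.

```python
# Hypothetical careers: name -> (career WAR, career MVP shares).
careers = {
    "Slugger": (17.3, 1.67),
    "Star": (90.0, 3.00),
    "Glove": (60.0, 0.10),
    "Bench": (5.0, 0.00),
}

TOP_N = 2  # the post used 200

# Step 1: the top N players by career MVP shares.
top_by_shares = sorted(careers, key=lambda n: careers[n][1], reverse=True)[:TOP_N]

# Step 2: the lowest career WAR within that group becomes the floor.
war_floor = min(careers[n][0] for n in top_by_shares)

# Step 3: add everyone at or above the floor, whatever their MVP shares.
sample = set(top_by_shares) | {n for n in careers if careers[n][0] >= war_floor}

print(war_floor, sorted(sample))
```

The point of step 3 is that high-WAR players with few or no MVP shares (like "Glove" here) are not left out of the regression.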

Then I regressed MVP shares on WAR. A second order polynomial was a better fit than a straight line regression. Click here to see the scatter plot with trend line. Here is the regression equation

MVPShares = 0.0003*WARSquared + 0.011*WAR - 0.1198

Then I estimated each player's predicted MVP shares and found the difference. Click here to see the entire results. The 20 players with the most negative differences are listed below. Willie Mays had a career WAR of 155.9. So he was predicted by the equation to have 8.89 MVP shares but he only had 5.94. So his differential is a -2.95.

| Rank | Player | Career WAR | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 1 | Willie Mays | 155.9 | 5.94 | 8.89 | -2.95 |
| 2 | Rickey Henderson | 110.6 | 2.46 | 4.77 | -2.31 |
| 3 | Lou Whitaker | 74.8 | 0.21 | 2.38 | -2.17 |
| 4 | Wade Boggs | 91 | 1.2 | 3.36 | -2.16 |
| 5 | Eddie Mathews | 96.1 | 1.61 | 3.71 | -2.10 |
| 6 | Hank Aaron | 142.3 | 5.45 | 7.52 | -2.07 |
| 7 | Willie Randolph | 65.6 | 0.04 | 1.89 | -1.85 |
| 8 | Ozzie Smith | 76.5 | 0.65 | 2.48 | -1.83 |
| 9 | Bobby Grich | 71 | 0.43 | 2.17 | -1.74 |
| 10 | Buddy Bell | 66 | 0.18 | 1.91 | -1.73 |
| 11 | Willie Davis | 60.8 | 0.1 | 1.66 | -1.56 |
| 12 | Scott Rolen | 69.9 | 0.57 | 2.11 | -1.54 |
| 13 | Bobby Abreu | 60.5 | 0.17 | 1.64 | -1.47 |
| 14 | Carl Yastrzemski | 96 | 2.23 | 3.70 | -1.47 |
| 15 | Graig Nettles | 67.9 | 0.56 | 2.01 | -1.45 |
| 16 | Kenny Lofton | 67.9 | 0.58 | 2.01 | -1.43 |
| 17 | Chet Lemon | 55.2 | 0 | 1.40 | -1.40 |
| 18 | Johnny Damon | 56.4 | 0.07 | 1.45 | -1.38 |
| 19 | Darrell Evans | 58.5 | 0.17 | 1.55 | -1.38 |
| 20 | Cal Ripken | 95.5 | 2.31 | 3.67 | -1.36 |


This approach is not perfect. Some players might have long careers and so they compile a high career WAR. But if they never have any great seasons, they might not get many MVP votes. Plus, it helps to play on contenders. But Mays had plenty of great seasons and played on many contenders. There is also the possibility that if there are other great players around compiling high WAR seasons, you won't do as well in the voting.

Now here are the players who got more MVP Shares than predicted.


| Rank | Player | Career WAR | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 791 | Cecil Fielder | 17.3 | 1.67 | 0.16 | 1.51 |
| 792 | Mike Piazza | 59.2 | 3.16 | 1.58 | 1.58 |
| 793 | Albert Belle | 39.8 | 2.38 | 0.79 | 1.59 |
| 794 | Harmon Killebrew | 60.4 | 3.23 | 1.64 | 1.59 |
| 795 | David Ortiz | 44 | 2.6 | 0.94 | 1.66 |
| 796 | Pedro Guerrero | 34.4 | 2.3 | 0.61 | 1.69 |
| 797 | George Bell | 20.2 | 1.92 | 0.22 | 1.70 |
| 798 | Steve Garvey | 37.5 | 2.46 | 0.71 | 1.75 |
| 799 | Willie Stargell | 57.3 | 3.3 | 1.49 | 1.81 |
| 800 | Roy Campanella | 34.2 | 2.52 | 0.61 | 1.91 |
| 801 | Juan Gonzalez | 38.5 | 2.76 | 0.75 | 2.01 |
| 802 | Jim Rice | 47.3 | 3.15 | 1.07 | 2.08 |
| 803 | Hank Greenberg | 57.6 | 3.69 | 1.51 | 2.18 |
| 804 | Ryan Howard | 18.9 | 2.49 | 0.19 | 2.30 |
| 805 | Yogi Berra | 59.3 | 3.98 | 1.59 | 2.39 |
| 806 | Dave Parker | 39.9 | 3.19 | 0.80 | 2.39 |
| 807 | Frank Thomas | 73.6 | 4.79 | 2.31 | 2.48 |
| 808 | Joe DiMaggio | 78.3 | 5.45 | 2.58 | 2.87 |
| 809 | Miguel Cabrera | 54.7 | 4.25 | 1.38 | 2.87 |
| 810 | Albert Pujols | 92.9 | 6.9 | 3.49 | 3.41 |


Pujols got 3.41 more shares than predicted (6.9 actual vs. 3.49 predicted), making him the most overrated player by this measure.

Then, using data from Fangraphs, I found all the players who had 4000+ PAs since 1931 and found their WAR from their seven best seasons combined (each player's WAR in 1981 was increased by 50%, and by 40% for 1994, because of the player strikes). Total players: 931. Click here to see the scatter plot and trend line. Again, a second degree polynomial was better than a straight line regression. If you look closely, the line slopes downward for very low WAR players, which does not make sense. That is avoided with a sixth degree polynomial, but its results are essentially the same, so I used the simpler one here. Here is the equation

MVPShares = 0.0018*WAR7Squared - 0.041*WAR7 + 0.2683
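Here is a rough Python sketch of the WAR7 total with the strike-year adjustment, plus the trend line above. The season-by-season numbers are hypothetical and the names `war7` and `predicted_shares` are invented for the example; the multipliers, the equation and Boggs' WAR7 of 56.1 come from this post.

```python
# Multipliers for strike-shortened seasons, as described in the post.
STRIKE_ADJUSTMENT = {1981: 1.5, 1994: 1.4}


def war7(season_war):
    """Sum of the seven best seasons after scaling the strike years."""
    adjusted = [
        war * STRIKE_ADJUSTMENT.get(year, 1.0)
        for year, war in season_war.items()
    ]
    return sum(sorted(adjusted, reverse=True)[:7])


def predicted_shares(war7_total):
    """The second-degree trend line for WAR7 given in the post."""
    return 0.0018 * war7_total ** 2 - 0.041 * war7_total + 0.2683


# Hypothetical season-by-season WAR, for illustration only.
example = {1979: 5.0, 1980: 6.0, 1981: 4.0, 1982: 7.0, 1983: 5.5,
           1984: 3.0, 1985: 6.5, 1986: 2.0, 1994: 5.0}
example_war7 = war7(example)

# Wade Boggs' WAR7 of 56.1 is taken from the table in this post.
boggs_pred = predicted_shares(56.1)

# A parabola a*x^2 + b*x + c bottoms out at x = -b / (2a).
turning_point = 0.041 / (2 * 0.0018)

print(round(example_war7, 2), round(boggs_pred, 2), round(turning_point, 1))
```

The turning point works out to a WAR7 of about 11.4, which is where the downward slope for very low WAR players ends and the line starts rising.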

Here are the most underrated players. Boggs was actually number 1 in WAR 3 straight years while his team came in first twice. He was second in WAR 3 times. He reached the post season a total of six times. But the best he ever finished in the MVP voting was fourth. Mays was 117th here. Click here to see the entire results.


| Rank | Name | WAR7 | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 1 | Wade Boggs | 56.1 | 1.2 | 3.63 | -2.43 |
| 2 | Ron Santo | 52.9 | 1.23 | 3.14 | -1.91 |
| 3 | Eddie Mathews | 55.3 | 1.61 | 3.51 | -1.90 |
| 4 | Bobby Grich | 46.15 | 0.43 | 2.21 | -1.78 |
| 5 | Rickey Henderson | 59.55 | 2.46 | 4.21 | -1.75 |
| 6 | Chase Utley | 46.7 | 0.73 | 2.28 | -1.55 |
| 7 | Arky Vaughan | 50.2 | 1.23 | 2.75 | -1.52 |
| 8 | Scott Rolen | 44.7 | 0.57 | 2.03 | -1.46 |
| 9 | Bobby Abreu | 40.9 | 0.17 | 1.60 | -1.43 |
| 10 | Buddy Bell | 40.65 | 0.18 | 1.58 | -1.40 |
| 11 | Brian Giles | 40.4 | 0.2 | 1.55 | -1.35 |
| 12 | Andruw Jones | 47.8 | 1.1 | 2.42 | -1.32 |
| 13 | Robin Ventura | 39.9 | 0.26 | 1.50 | -1.24 |
| 14 | Chet Lemon | 36.95 | 0 | 1.21 | -1.21 |
| 15 | Ron Cey | 39.35 | 0.25 | 1.44 | -1.19 |
| 16 | Darrell Evans | 38.3 | 0.17 | 1.34 | -1.17 |
| 17 | Jim Edmonds | 45 | 0.9 | 2.07 | -1.17 |
| 18 | Kenny Lofton | 42.24 | 0.58 | 1.75 | -1.17 |
| 19 | Carlos Beltran | 43.8 | 0.76 | 1.93 | -1.17 |
| 20 | Graig Nettles | 41.6 | 0.56 | 1.68 | -1.12 |

Now the players who got more MVP shares than predicted

| Rank | Name | WAR7 | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 912 | Barry Bonds | 76.7 | 9.3 | 7.71 | 1.59 |
| 913 | Brooks Robinson | 45.1 | 3.69 | 2.08 | 1.61 |
| 914 | Eddie Murray | 41.1 | 3.33 | 1.62 | 1.71 |
| 915 | Willie Stargell | 40.7 | 3.3 | 1.58 | 1.72 |
| 916 | George Bell | 20.8 | 1.92 | 0.19 | 1.73 |
| 917 | Hank Aaron | 56.4 | 5.45 | 3.68 | 1.77 |
| 918 | Pete Rose | 43.4 | 3.68 | 1.88 | 1.80 |
| 919 | Stan Musial | 64.7 | 6.96 | 5.15 | 1.81 |
| 920 | Jim Rice | 38.2 | 3.15 | 1.33 | 1.82 |
| 921 | David Ortiz | 31.7 | 2.6 | 0.78 | 1.82 |
| 922 | Steve Garvey | 28.3 | 2.46 | 0.55 | 1.91 |
| 923 | Dave Parker | 36.9 | 3.19 | 1.21 | 1.98 |
| 924 | Frank Robinson | 50.8 | 4.84 | 2.83 | 2.01 |
| 925 | Joe DiMaggio | 54.6 | 5.45 | 3.40 | 2.05 |
| 926 | Frank Thomas | 49.7 | 4.79 | 2.68 | 2.11 |
| 927 | Juan Gonzalez | 27.2 | 2.76 | 0.48 | 2.28 |
| 928 | Miguel Cabrera | 44.1 | 4.25 | 1.96 | 2.29 |
| 929 | Ryan Howard | 20.6 | 2.49 | 0.19 | 2.30 |
| 930 | Yogi Berra | 40 | 3.98 | 1.51 | 2.47 |
| 931 | Albert Pujols | 58.9 | 6.9 | 4.10 | 2.80 |