## Wednesday, January 15, 2014

### Using A Player's WAR To Predict First Year Hall Of Fame Vote Percentage (and possibly estimate "underratedness")

I took all the Hall of Fame voting results from 1966-2014 from Baseball Reference. Those pages show the vote% each player got along with his career WAR, the combined WAR of his seven best seasons (WAR7), and Jay Jaffe's JAWS stat, which combines career WAR and WAR7.

Five players were tossed out of the analysis: Barry Bonds, Rafael Palmeiro, Mark McGwire, Sammy Sosa and Pete Rose. The voters have severely penalized the first four for possible PED use, not because they underrated them. Something similar applies to Rose, who was under a cloud of scandal when he first came up because of betting on baseball.

One thing I wanted to do was find a trend line for the vote. I could not find one that made sense using career WAR or Jaws. Any trend line had too many ups and downs. Vote% should not go down as WAR goes up. But once you look at the trend line I used for WAR7, you will see how non-linear the data is.

So when I had Excel put in trend lines, the only one that made reasonable sense was a sixth degree polynomial with WAR7 as the independent variable and vote% as the dependent variable. It does have some ups and downs where I really don't want them, but they are not too severe.

So I hope you can see that trying to fit a trend line to the data has problems. This seems like the best I could do.
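One way to make "too many ups and downs" concrete is to scan a fitted polynomial over the range of the data and report where it decreases. This is only a minimal Python sketch: the function names and the sample coefficients are illustrative assumptions, not the fitted values from the graph.

```python
def poly_value(coeffs, x):
    """Evaluate a polynomial; coeffs run from the constant term upward."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total


def decreasing_stretches(coeffs, lo, hi, steps=1000):
    """Return (start, end) x-intervals where the polynomial goes down."""
    stretches = []
    start = None
    width = (hi - lo) / steps
    prev = poly_value(coeffs, lo)
    for i in range(1, steps + 1):
        x = lo + i * width
        cur = poly_value(coeffs, x)
        if cur < prev:
            if start is None:
                start = x - width  # the drop began at the previous grid point
        elif start is not None:
            stretches.append((start, x - width))
            start = None
        prev = cur
    if start is not None:
        stretches.append((start, hi))
    return stretches


# Hypothetical quadratic 0.02*x^2 - 0.4*x + 5, which dips until x = 10.
example = [5.0, -0.4, 0.02]
print(decreasing_stretches(example, 0.0, 80.0))
```

A trend line with no decreasing stretches over the observed WAR range would return an empty list.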

Using the regression equation, I then calculated each player's predicted 1st year vote% (the equation you see in the graph probably does not show enough decimal places for the coefficient values; x in the graph is WAR7). Then I subtracted that from their actual 1st year vote% to get a difference. I then ranked them all from the biggest negative difference to the biggest positive difference.

The player with the biggest negative differential, whom we might say was the most underrated, was Ron Santo. He got only 3.9% of the vote in his 1st year but if he was right on the trend line, it would have been 75.4%.

The most overrated player was Lou Brock. He got 79.7% of the vote while the model predicts he would get 6.7%. It helps to reach a milestone like 3000 hits, retire as the all-time SB leader and perform very well in three 7-game world series. Click here to see my research that supports this. As for Santo, click here to see my post that explains he got about the vote% we would expect, given the general preferences of the voters.
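The subtract-and-rank step can be sketched in a few lines of Python. The predicted percentages are entered directly from the text, since the polynomial's coefficients are not reproduced here; the function name is just an illustrative choice.

```python
def rank_by_difference(players):
    """players: list of (name, actual_pct, predicted_pct) tuples.

    Returns (name, difference) pairs, most negative difference first,
    where difference = actual - predicted.
    """
    diffs = [(name, round(actual - predicted, 1))
             for name, actual, predicted in players]
    return sorted(diffs, key=lambda pair: pair[1])


# Actual and predicted 1st year vote% as quoted in the post.
players = [
    ("Lou Brock", 79.7, 6.7),
    ("Ron Santo", 3.9, 75.4),
]
ranking = rank_by_difference(players)
for name, diff in ranking:
    print(name, diff)
```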

Click here to see the complete rankings

## Sunday, January 12, 2014

### Eddie Robinson's Great Homerun-To-Strikeout Ratio

Update Jan. 13: He is actually not in the top 25, but he is pretty close. One thing I forgot to take into account when using the Lee Sinins Complete Baseball Encyclopedia is what the league average is based on. I just looked at his page and what he and the league average had. But it needed to be consistent with my earlier research. Click here to see the new, complete list.

I used all guys from 1920-2012 who had 4500+ PAs (the first study excluded Robinson because he had 4891). I set the Sinins database to compare players to non-pitchers and used each guy's rate of HRs & SOs on a per plate appearance basis (you could choose to go with all players including pitchers and use ABs instead of PAs, for example). Robinson is 40th out of 884 players, so that puts him in the top 5%. When you look at the list, you will see him ahead of some great hitters.

**********************************************************

Eddie Robinson played for several teams in the 1940s and 50s, including the Indians, White Sox and Yankees. He was the regular first baseman on the 1948 world champion Indians. Click here to go to his Baseball Reference page. He batted .348 in 23 World Series at-bats.

Here is a link to his SABR biography written by C. Paul Rogers III. One thing it says is

"He was the seventh player and first White Sox to hit a ball over the roof at old Comiskey Park (in 1951); the first six were Hall of Famers Babe Ruth, Lou Gehrig, Jimmy Foxx, Hank Greenberg, Ted Williams and Mickey Mantle."

He is also author (along with C. Paul Rogers III) of the 2011 book titled Lucky Me: My Sixty-five Years in Baseball.

Robinson struck out less than the league average (356 vs. 489). But his HR rate relative to his strikeout rate was outstanding. He hit 172 HRs while the league average player would have hit 85. So his relative HR rate was 202.35 (172/85 times 100). His relative strikeout rate was 73.868 (356/489 times 100). Then 202.35/73.868 = 2.739. That would put him in the top 25 all-time. But when I did this analysis I only included guys with 5000+ PAs. He just missed with 4891. See my post Which Players Had The Best HR-To-Strikeout Ratios?
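The index is a ratio of two league-relative rates, which can be sketched in Python as below (the function names are illustrative). One caution: with the counts quoted above, 356/489 times 100 evaluates to about 72.8 rather than 73.868, so the sketch returns a slightly higher index than 2.739. The published figure was probably computed from slightly different inputs, but which ones cannot be determined from the numbers shown.

```python
def relative_rate(player_total, league_total):
    """Player total as a percentage of the league-average total."""
    return player_total / league_total * 100


def hr_to_so_index(player_hr, league_hr, player_so, league_so):
    """Relative HR rate divided by relative strikeout rate."""
    return (relative_rate(player_hr, league_hr)
            / relative_rate(player_so, league_so))


rel_hr = relative_rate(172, 85)    # HRs vs. league-average HRs
rel_so = relative_rate(356, 489)   # SOs vs. league-average SOs
index = hr_to_so_index(172, 85, 356, 489)
print(round(rel_hr, 2), round(rel_so, 2), round(index, 3))
```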

## Wednesday, January 8, 2014

It was a highly unusual combination of being able to prevent HRs without walking many batters and without striking many out. I created an index to measure this and he is far ahead of anyone else. See Who Was More "Magical" Than Greg Maddux? (Or Pitcher's HR/BB/SO Rating) from December 2, 2009.

## Tuesday, January 7, 2014

### Has Albert Pujols Been Getting More MVP Votes Than Expected Based On WAR?

This is based on a couple posts you can probably see below. So read them for explanations and technical details. Those posts have been discussed over at Baseball Think Factory.

I did the analysis for each year of Albert Pujols' career. In each year I tried to find a polynomial trend line that best fit the voting that year. Who was included in the analysis each year? Anyone who had at least as many ABs as the lowest AB total for anyone who got votes. Sometimes players don't get any votes but have a pretty good WAR and I don't think they should be left out of the analysis. So there had to be some way to decide who got included. So it is a different number of players each year.
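The inclusion rule can be sketched in Python as a simple filter. The season data below is made up for illustration, and the field names are assumptions rather than anything from the actual spreadsheet.

```python
def eligible_players(season):
    """Keep everyone with at least as many ABs as the lowest-AB vote getter.

    season: list of dicts with 'name', 'ab' and 'mvp_share' keys.
    Raises ValueError if nobody in the list received votes.
    """
    vote_getter_abs = [p["ab"] for p in season if p["mvp_share"] > 0]
    cutoff = min(vote_getter_abs)
    return [p for p in season if p["ab"] >= cutoff]


# Made-up season, for illustration only.
season = [
    {"name": "Player A", "ab": 600, "mvp_share": 0.85},
    {"name": "Player B", "ab": 410, "mvp_share": 0.02},
    {"name": "Player C", "ab": 550, "mvp_share": 0.0},
    {"name": "Player D", "ab": 300, "mvp_share": 0.0},
]
pool = eligible_players(season)
print([p["name"] for p in pool])
```

Player C gets no votes but stays in the pool because his AB total clears the cutoff set by Player B.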

I usually went with the highest r-squared among 2nd, 3rd degree, etc. polynomials. But they had to make sense. Sometimes the line goes up and down a lot, and I preferred lines like the ones in the graphs I used already. Logs and exponential functions would not work because of zero values for WAR and MVP shares. Sometimes even negative WAR values came up.
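For anyone who wants to reproduce the comparison outside Excel, here is a rough Python sketch of a least-squares polynomial fit and its r-squared, using only the standard library. The data values are made up, and the helper names are illustrative. Note that on the same data a higher-degree fit never has a lower r-squared, which is why the "had to make sense" check matters.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit via the normal equations.

    Returns coefficients from the constant term upward.
    """
    n = degree + 1
    # Build the n-by-n normal-equation system a * coeffs = b.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def r_squared(xs, ys, coeffs):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    predicted = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up WAR and MVP-share values, for illustration only.
war = [-0.5, 0.0, 1.2, 2.5, 3.8, 5.1, 6.4, 7.9, 9.3]
shares = [0.0, 0.0, 0.0, 0.01, 0.05, 0.15, 0.35, 0.70, 0.95]
fits = {d: r_squared(war, shares, fit_polynomial(war, shares, d))
        for d in (1, 2, 3)}
for degree, r2 in fits.items():
    print(degree, round(r2, 4))
```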

So I got a predicted value for each year of his career (including 2013 when he got no votes and had only 391 ABs, so I included everyone in the AL who had 391+ ABs).

In 11 of his 13 seasons his share was higher than predicted. Adding up all of the differences between his predicted share and actual share I got 1.73. So, although his rank in the MVP vote is about right based on his rank in WAR, his vote total is still higher than expected based on the overall pattern of the vote by the writers.

Now this 1.73 is lower than the 3.78 I currently have for him. But to know where that ranks I will have to go through every season since 1931 for each league one by one and get a total for all players. That will take some time.

## Monday, January 6, 2014

### Was Willie Mays The Most Underrated Player In History? Or Was It Wade Boggs? Is Albert Pujols The Most Overrated? (Revised)

Click here to see the original post from a few days ago. The idea was to see the relationship between MVP shares and WAR.

If you read that, you will see that the regression line estimate was a 2nd degree polynomial. But to calculate the predicted MVP shares, I used the equation that appears when I ask Excel to graph it. That equation only goes to a few decimal places. In this case, it matters because the values of the WAR variable can be very large.

So I had Excel do the line estimate and I had the coefficient values go out several more decimal places. The new equation is

MVPShares = 0.000256845*WARSquared + 0.010979681*WAR - 0.11979

Squaring Willie Mays' WAR gives us about 24,000 (24,304.81, to be exact). With the rounded coefficient, .0003*24,304.81 = 7.291. But if I have

0.000256845*24,304.81 = 6.24

That alone lowers Mays' predicted MVP shares by about 1 (again, using the regression estimate). Mays actually slips from the most underrated player to the second most underrated player. Lou Whitaker, who did very poorly in the Hall of Fame vote (unjustly so), is now number 1.
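The effect of the extra decimal places on Mays' full prediction can be checked with a short Python sketch. Both sets of coefficients come from the posts; the function name is an illustrative choice.

```python
def predicted_mvp_shares(war, a, b, c):
    """Second degree polynomial: a*WAR^2 + b*WAR + c."""
    return a * war ** 2 + b * war + c


mays_war = 155.9

# Coefficients as displayed on the Excel graph (rounded).
rounded = predicted_mvp_shares(mays_war, 0.0003, 0.011, -0.1198)

# Coefficients carried out to more decimal places.
precise = predicted_mvp_shares(mays_war, 0.000256845, 0.010979681, -0.11979)

print(round(rounded, 2), round(precise, 2), round(rounded - precise, 2))
```

Nearly all of the gap comes from the squared term, since that is where the rounded coefficient gets multiplied by the largest number.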

Click here to see the revised list. It does not look like players moved very much. Pujols was still the most overrated by this measure.

The new equation for the case where I used only each player's seven best seasons of WAR is

MVPShares = 0.001807713*WAR7Squared - 0.041037828*WAR7 + 0.268264829

The original post had a + in front of the 2nd coefficient. It should have been minus and has been corrected. Wade Boggs was still the most underrated here and Pujols was still the most overrated.

Click here to see the revised results.

## Friday, January 3, 2014

### Was Willie Mays The Most Underrated Player In History? Or Was It Wade Boggs? Is Albert Pujols The Most Overrated?

Click here to see the revised version of this. Mays slips from #1 to #2. Lou Whitaker is now #1.

To get a handle on these questions, I first compared career WAR and career MVP shares for a large group of players. In the first case I used WAR from Baseball Reference. In the second case I used WAR from Fangraphs (and only each player's seven best seasons).

The group started with everyone who had 5000+ PAs since 1931 (I excluded anyone who played more than half a season before 1931 because that was the first year of the baseball writers' MVP award).

Then I included everyone who was in the top 200 in MVP shares (only position players). The lowest career WAR of anyone in that group was 17.3, belonging to Cecil Fielder. So I then also added in everyone who had 17.3+ WAR since 1931. That made 810 players in total.

Then I regressed MVP shares on WAR. A second order polynomial was a better fit than a straight line regression. Click here to see the scatter plot with trend line. Here is the regression equation

MVPShares = 0.0003*WARSquared + 0.011*WAR - 0.1198

Then I estimated each player's predicted MVP shares and found the difference. Click here to see the entire results. The 20 players with the most negative differences are listed below. Willie Mays had a career WAR of 155.9. So he was predicted by the equation to have 8.89 MVP shares but he only had 5.94. So his differential is a -2.95.

| Rank | Player | Career WAR | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 1 | Willie Mays | 155.9 | 5.94 | 8.89 | -2.95 |
| 2 | Rickey Henderson | 110.6 | 2.46 | 4.77 | -2.31 |
| 3 | Lou Whitaker | 74.8 | 0.21 | 2.38 | -2.17 |
| 4 | Wade Boggs | 91 | 1.2 | 3.36 | -2.16 |
| 5 | Eddie Mathews | 96.1 | 1.61 | 3.71 | -2.10 |
| 6 | Hank Aaron | 142.3 | 5.45 | 7.52 | -2.07 |
| 7 | Willie Randolph | 65.6 | 0.04 | 1.89 | -1.85 |
| 8 | Ozzie Smith | 76.5 | 0.65 | 2.48 | -1.83 |
| 9 | Bobby Grich | 71 | 0.43 | 2.17 | -1.74 |
| 10 | Buddy Bell | 66 | 0.18 | 1.91 | -1.73 |
| 11 | Willie Davis | 60.8 | 0.1 | 1.66 | -1.56 |
| 12 | Scott Rolen | 69.9 | 0.57 | 2.11 | -1.54 |
| 13 | Bobby Abreu | 60.5 | 0.17 | 1.64 | -1.47 |
| 14 | Carl Yastrzemski | 96 | 2.23 | 3.70 | -1.47 |
| 15 | Graig Nettles | 67.9 | 0.56 | 2.01 | -1.45 |
| 16 | Kenny Lofton | 67.9 | 0.58 | 2.01 | -1.43 |
| 17 | Chet Lemon | 55.2 | 0 | 1.40 | -1.40 |
| 18 | Johnny Damon | 56.4 | 0.07 | 1.45 | -1.38 |
| 19 | Darrell Evans | 58.5 | 0.17 | 1.55 | -1.38 |
| 20 | Cal Ripken | 95.5 | 2.31 | 3.67 | -1.36 |
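The Pred and Diff columns follow directly from the regression equation given earlier in this post. A short Python sketch, using a few rows of career WAR and award shares from the results; the function and variable names are illustrative.

```python
def predicted_shares(war):
    """Regression equation from this post, with the rounded coefficients."""
    return 0.0003 * war ** 2 + 0.011 * war - 0.1198


# (player, career WAR, actual MVP award shares) taken from the results.
rows = [
    ("Willie Mays", 155.9, 5.94),
    ("Rickey Henderson", 110.6, 2.46),
    ("Albert Pujols", 92.9, 6.9),
]

results = {}
for name, war, actual in rows:
    pred = predicted_shares(war)
    # Diff is actual minus predicted; negative means fewer shares than expected.
    results[name] = (round(pred, 2), round(actual - pred, 2))
    print(name, results[name])
```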

This approach is not perfect. Some players might have long careers and so they compile a high career WAR. But if they never have any great seasons, they might not get many MVP votes. Plus, it helps to play on contenders. But Mays had plenty of great seasons and played on many contenders. There is also the possibility that if there are other great players around compiling high WAR seasons, you won't do as well in the voting.

Now here are the players who got more MVP Shares than predicted.

| Rank | Player | Career WAR | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 791 | Cecil Fielder | 17.3 | 1.67 | 0.16 | 1.51 |
| 792 | Mike Piazza | 59.2 | 3.16 | 1.58 | 1.58 |
| 793 | Albert Belle | 39.8 | 2.38 | 0.79 | 1.59 |
| 794 | Harmon Killebrew | 60.4 | 3.23 | 1.64 | 1.59 |
| 795 | David Ortiz | 44 | 2.6 | 0.94 | 1.66 |
| 796 | Pedro Guerrero | 34.4 | 2.3 | 0.61 | 1.69 |
| 797 | George Bell | 20.2 | 1.92 | 0.22 | 1.70 |
| 798 | Steve Garvey | 37.5 | 2.46 | 0.71 | 1.75 |
| 799 | Willie Stargell | 57.3 | 3.3 | 1.49 | 1.81 |
| 800 | Roy Campanella | 34.2 | 2.52 | 0.61 | 1.91 |
| 801 | Juan Gonzalez | 38.5 | 2.76 | 0.75 | 2.01 |
| 802 | Jim Rice | 47.3 | 3.15 | 1.07 | 2.08 |
| 803 | Hank Greenberg | 57.6 | 3.69 | 1.51 | 2.18 |
| 804 | Ryan Howard | 18.9 | 2.49 | 0.19 | 2.30 |
| 805 | Yogi Berra | 59.3 | 3.98 | 1.59 | 2.39 |
| 806 | Dave Parker | 39.9 | 3.19 | 0.80 | 2.39 |
| 807 | Frank Thomas | 73.6 | 4.79 | 2.31 | 2.48 |
| 808 | Joe DiMaggio | 78.3 | 5.45 | 2.58 | 2.87 |
| 809 | Miguel Cabrera | 54.7 | 4.25 | 1.38 | 2.87 |
| 810 | Albert Pujols | 92.9 | 6.9 | 3.49 | 3.41 |

Pujols got 3.41 more shares than predicted (6.9 actual vs. 3.49 predicted), making him the most overrated player by this measure.

Then, using data from Fangraphs, I found all the players who had 4000+ PAs since 1931 and found their WAR from their seven best seasons combined (each player's WAR in 1981 was increased by 50%, and by 40% for 1994, because of the player strikes). Total players: 931. Click here to see the scatter plot and trend line. Again, a second degree polynomial was better than a straight line regression. If you look closely, the line slopes downward for very low WAR players, which does not make sense. A sixth degree polynomial avoids this, but its results are essentially the same, so I used the simpler one here. Here is the equation

MVPShares = 0.0018*WAR7Squared -  0.041*WAR7 + 0.2683
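The WAR7 input itself, with the strike-year adjustments, can be sketched in Python as follows. The multipliers come from the description above; the sample career is hypothetical and the names are illustrative.

```python
# Strike-year multipliers as described in the post: +50% for 1981, +40% for 1994.
STRIKE_ADJUSTMENT = {1981: 1.5, 1994: 1.4}


def war7(season_war):
    """Sum of the seven best seasons after adjusting strike years.

    season_war: dict mapping year -> WAR for that season.
    """
    adjusted = [war * STRIKE_ADJUSTMENT.get(year, 1.0)
                for year, war in season_war.items()]
    return sum(sorted(adjusted, reverse=True)[:7])


# Hypothetical career, for illustration only.
career = {
    1979: 3.0, 1980: 5.0, 1981: 4.0, 1982: 6.0, 1983: 2.0,
    1984: 5.5, 1985: 1.0, 1986: 4.5, 1987: 3.5,
}
total = war7(career)
print(total)
```

In this made-up career the 1981 season counts as 6.0 after the adjustment, so it ties for the best season instead of ranking fifth.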

Here are the most underrated players. Boggs was actually number 1 in WAR 3 straight years while his team came in first twice. He was second in WAR 3 times. He reached the post season a total of six times. But the best he ever finished in the MVP voting was fourth. Mays was 117th here. Click here to see the entire results.

| Rank | Name | WAR7 | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 1 | Wade Boggs | 56.1 | 1.2 | 3.63 | -2.43 |
| 2 | Ron Santo | 52.9 | 1.23 | 3.14 | -1.91 |
| 3 | Eddie Mathews | 55.3 | 1.61 | 3.51 | -1.90 |
| 4 | Bobby Grich | 46.15 | 0.43 | 2.21 | -1.78 |
| 5 | Rickey Henderson | 59.55 | 2.46 | 4.21 | -1.75 |
| 6 | Chase Utley | 46.7 | 0.73 | 2.28 | -1.55 |
| 7 | Arky Vaughan | 50.2 | 1.23 | 2.75 | -1.52 |
| 8 | Scott Rolen | 44.7 | 0.57 | 2.03 | -1.46 |
| 9 | Bobby Abreu | 40.9 | 0.17 | 1.60 | -1.43 |
| 10 | Buddy Bell | 40.65 | 0.18 | 1.58 | -1.40 |
| 11 | Brian Giles | 40.4 | 0.2 | 1.55 | -1.35 |
| 12 | Andruw Jones | 47.8 | 1.1 | 2.42 | -1.32 |
| 13 | Robin Ventura | 39.9 | 0.26 | 1.50 | -1.24 |
| 14 | Chet Lemon | 36.95 | 0 | 1.21 | -1.21 |
| 15 | Ron Cey | 39.35 | 0.25 | 1.44 | -1.19 |
| 16 | Darrell Evans | 38.3 | 0.17 | 1.34 | -1.17 |
| 17 | Jim Edmonds | 45 | 0.9 | 2.07 | -1.17 |
| 18 | Kenny Lofton | 42.24 | 0.58 | 1.75 | -1.17 |
| 19 | Carlos Beltran | 43.8 | 0.76 | 1.93 | -1.17 |
| 20 | Graig Nettles | 41.6 | 0.56 | 1.68 | -1.12 |

Now the players who got more MVP shares than predicted

| Rank | Name | WAR7 | Award Shares | Pred | Diff |
|---|---|---|---|---|---|
| 912 | Barry Bonds | 76.7 | 9.3 | 7.71 | 1.59 |
| 913 | Brooks Robinson | 45.1 | 3.69 | 2.08 | 1.61 |
| 914 | Eddie Murray | 41.1 | 3.33 | 1.62 | 1.71 |
| 915 | Willie Stargell | 40.7 | 3.3 | 1.58 | 1.72 |
| 916 | George Bell | 20.8 | 1.92 | 0.19 | 1.73 |
| 917 | Hank Aaron | 56.4 | 5.45 | 3.68 | 1.77 |
| 918 | Pete Rose | 43.4 | 3.68 | 1.88 | 1.80 |
| 919 | Stan Musial | 64.7 | 6.96 | 5.15 | 1.81 |
| 920 | Jim Rice | 38.2 | 3.15 | 1.33 | 1.82 |
| 921 | David Ortiz | 31.7 | 2.6 | 0.78 | 1.82 |
| 922 | Steve Garvey | 28.3 | 2.46 | 0.55 | 1.91 |
| 923 | Dave Parker | 36.9 | 3.19 | 1.21 | 1.98 |
| 924 | Frank Robinson | 50.8 | 4.84 | 2.83 | 2.01 |
| 925 | Joe DiMaggio | 54.6 | 5.45 | 3.40 | 2.05 |
| 926 | Frank Thomas | 49.7 | 4.79 | 2.68 | 2.11 |
| 927 | Juan Gonzalez | 27.2 | 2.76 | 0.48 | 2.28 |
| 928 | Miguel Cabrera | 44.1 | 4.25 | 1.96 | 2.29 |
| 929 | Ryan Howard | 20.6 | 2.49 | 0.19 | 2.30 |
| 930 | Yogi Berra | 40 | 3.98 | 1.51 | 2.47 |
| 931 | Albert Pujols | 58.9 | 6.9 | 4.10 | 2.80 |