Wednesday, January 15, 2014

Using A Player's WAR To Predict First Year Hall Of Fame Vote Percentage (and possibly estimate "underratedness")

I took all the Hall of Fame votes from 1966-2014 from Baseball Reference. On those pages, BR shows the vote% each player got along with their career WAR. It also shows the combined WAR of their seven best seasons (WAR7) and Jay Jaffe's "Jaws" stat, which combines career WAR and the seven best seasons.

Five players were tossed out of the analysis: Barry Bonds, Rafael Palmeiro, Mark McGwire, Sammy Sosa and Pete Rose. The voters have severely penalized the first four for possible PED use, so their low vote totals do not mean the voters underrated them. Rose's case is similar: there was a cloud of scandal over him when he first came up because of betting on baseball.

One thing I wanted to do was find a trend line for the vote. I could not find one that made sense using career WAR or Jaws. Any trend line had too many ups and downs. Vote% should not go down as WAR goes up. But once you look at the trend line I used for WAR7, you will see how non-linear the data is.

So when I had Excel put in trend lines, the only one that made reasonable sense was a sixth degree polynomial with WAR7 as the independent variable and vote% as the dependent variable. It does have some ups and downs where I really don't want them, but they are not too severe.
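For anyone who wants to try this kind of fit outside Excel, here is a rough sketch in Python using NumPy. The (WAR7, vote%) pairs below are made-up placeholders, not the real ballot data, and the helper names `fit_trend` and `count_dips` are just for illustration. It fits a least-squares polynomial and then counts how often the fitted curve turns downward, which is one way to put a number on the "ups and downs" problem.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Made-up (WAR7, first-year vote%) pairs for illustration only.
# These are NOT the Baseball Reference numbers used in the post.
war7 = np.array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65], dtype=float)
vote_pct = np.array([0.5, 1.0, 2.0, 4.0, 10.0, 20.0, 35.0, 55.0, 70.0, 85.0, 92.0, 97.0])


def fit_trend(x, y, deg=6):
    """Least-squares polynomial fit of y on x (x is rescaled internally for stability)."""
    return Polynomial.fit(x, y, deg)


def count_dips(poly, lo, hi, n=200):
    """Count grid steps between lo and hi where the fitted curve goes down."""
    grid = np.linspace(lo, hi, n)
    return int(np.sum(np.diff(poly(grid)) < 0))


trend = fit_trend(war7, vote_pct, deg=6)
dips = count_dips(trend, war7.min(), war7.max())

# Coefficients in terms of raw WAR7, lowest power first, at full precision.
print(trend.convert().coef)
print("downward steps on the fitted curve:", dips)
```

A count of zero would mean the curve never falls as WAR7 rises over the range of the data. Trying the same check with career WAR or Jaws as `x` would show whether those fits are as bumpy as they looked in Excel.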

Click here to see the graph.

So I hope you can see that trying to fit a trend line to the data has problems. This seems like the best I could do.

Using the regression equation, I then calculated each player's predicted 1st year vote%. (The equation you see in the graph probably does not show enough decimal places for the coefficient values; x in the graph is WAR7.) That prediction was then subtracted from their actual 1st year vote% to get a difference. I then ranked all the players from the biggest negative difference to the biggest positive difference.
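To see why the decimal places matter so much, here is a small back-of-the-envelope sketch. The numbers are assumptions chosen for illustration: a coefficient displayed to four decimal places (so it could be off by up to 0.00005) and a player with a WAR7 of 50. Neither comes from the actual graph.

```python
def term_error(coef_error, x, power):
    """Error in one polynomial term when its coefficient is off by coef_error."""
    return coef_error * x ** power


# Assumption: the chart label rounds each coefficient to 4 decimal places,
# so the true value could differ by as much as half of 0.0001.
max_rounding_error = 0.5 * 10 ** -4

# Assumption: a hypothetical player with WAR7 = 50.
error_in_x6_term = term_error(max_rounding_error, 50, 6)
error_in_x1_term = term_error(max_rounding_error, 50, 1)

print(error_in_x6_term)  # hundreds of thousands of percentage points
print(error_in_x1_term)  # a tiny fraction of a percentage point
```

The same rounding that is harmless on the x term can swamp the whole prediction on the x^6 term, because 50^6 is over 15 billion. So the predictions need the full-precision coefficients, not the ones printed on the chart.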

The player with the biggest negative differential, whom we might say was the most underrated, was Ron Santo. He got only 3.9% of the vote in his 1st year, but had he been right on the trend line, it would have been 75.4%.

The most overrated player was Lou Brock. He got 79.7% of the vote while the model predicts he would get 6.7%. It helps to reach a milestone like 3000 hits, retire as the all-time SB leader and perform very well in three 7-game World Series. Click here to see my research that supports this. As for Santo, click here to see my post that explains he got about the vote% we would expect, given the general preferences of the voters.
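The subtract-and-rank step can be sketched in a few lines of Python, using only the two sets of figures quoted above for Santo and Brock. The function name `rank_by_difference` is just an illustrative choice, and the full list would of course have a row for every player on the ballots.

```python
# (player, actual 1st year vote%, model-predicted 1st year vote%), as quoted in this post
players = [
    ("Lou Brock", 79.7, 6.7),
    ("Ron Santo", 3.9, 75.4),
]


def rank_by_difference(rows):
    """Return (player, actual - predicted) pairs, most negative difference first."""
    diffs = [(name, round(actual - predicted, 1)) for name, actual, predicted in rows]
    return sorted(diffs, key=lambda pair: pair[1])


ranking = rank_by_difference(players)
for name, diff in ranking:
    print(f"{name}: {diff:+.1f}")
```

Santo comes out at about 71.5 points below the trend line and Brock at about 73 points above it, which is why they sit at opposite ends of the rankings.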

Click here to see the complete rankings
