It seems like it might be for the reasons I have seen people give over the last week or so: no post-season exposure, a somewhat short career (he did not reach 10,000 PAs), a lack of milestones like 3000 hits or 500 HRs, and a lack of MVP awards.

Last year and earlier this year I posted some regression-generated equations that tried to explain the percentage of the Hall of Fame vote players got in their first year of eligibility (and also their highest percentage). The model I came up with was based on some trial and error. That seemed unavoidable, since it is hard to have priors on what exactly the voters are thinking. The model looked at all players who became eligible for the first time from 1980-2009.

The model uses the following data to explain vote percentage:

Reaching 10,000 PAs

500 HRs

3000 hits

500 SBs

Gold Gloves

All-Star games

World Series performance

MVP awards

Gold Gloves and All-Star games got capped at certain levels which were then squared. The idea was that those things have an exponential effect which tapers off. There were also interaction terms for World Series performance, Gold Gloves and All-Star games. The idea there was that getting lots of Gold Gloves and playing in lots of All-Star games has more than an additive effect (after I discuss what the model predicted for Santo, technical details like regression results and variable descriptions will be covered).

Santo's first-year percentage was 3.9%. Normally, he would no longer be eligible in the writers' voting. But he and some other players were reinstated in 1985, when he got 13.4%. The model predicted that he would get 17.65%. The standard error was .08. So even if we add 8 percentage points to his actual 13.4%, that only brings him up to 21.4%. That is still a pretty low total for a first year (Billy Williams got 23.4% in his first year in 1982 and steadily increased until he got 85.7% in 1987).

Santo's highest percentage was 43%. The model predicted it would be 30%, so he actually did better than predicted. The standard error was .117. So he was predicted to be about 4 standard errors below the 75% needed for induction, and his actual highest percentage was still about 3 standard errors below 75%. Billy Williams's highest predicted percentage was 29.6% while it was actually 85.7%. That differential of 56.1 percentage points is the highest positive differential. Why Williams is in and Santo isn't is an interesting question.
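The standard-error comparisons are simple arithmetic. Here is a rough sketch using only the figures quoted above. The helper name is made up for illustration.

```python
# How many standard errors is a vote share below the 75% induction threshold?
# All numbers are the ones quoted in the post; the helper name is illustrative.

THRESHOLD = 0.75   # share of the vote needed for induction
STD_ERROR = 0.117  # standard error of the highest-percentage regression


def std_errors_below(share, threshold=THRESHOLD, se=STD_ERROR):
    """Return the gap between threshold and share, measured in standard errors."""
    return (threshold - share) / se


predicted_gap = std_errors_below(0.30)  # Santo's predicted highest share
actual_gap = std_errors_below(0.43)     # Santo's actual highest share

print(f"predicted: {predicted_gap:.2f} standard errors below 75%")
print(f"actual:    {actual_gap:.2f} standard errors below 75%")
```

The two gaps come out a little under 4 and a little under 3, which is where the "about 4" and "about 3" come from.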

Here was the equation where the player's first-year vote percentage was the dependent variable:

PCT = -.010 + .00086(WSAS) + .048(GGAS) + .070(MVP) + .404(3000 HIT) + .280(500 HR) + .002(ASSQ10) - .00089(GGSQ7) + .071(500SB) - .006(WSIMPSQ50) + .100(10000PA)

The adjusted r-squared was .898. The standard error was .08.
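To show how the first-year equation turns a player's record into a predicted vote share, here is a minimal sketch. The coefficients are copied from the equation above. The dictionary keys, the function name and the example player are assumptions for illustration, and the interaction and World Series inputs would need to be scaled the same way they were in the original regression.

```python
# Coefficients copied from the first-year equation in the post.
# The dictionary keys and the example player below are illustrative assumptions.
INTERCEPT = -0.010
COEFFICIENTS = {
    "WSAS": 0.00086,
    "GGAS": 0.048,
    "MVP": 0.070,
    "3000HIT": 0.404,
    "500HR": 0.280,
    "ASSQ10": 0.002,
    "GGSQ7": -0.00089,
    "500SB": 0.071,
    "WSIMPSQ50": -0.006,
    "10000PA": 0.100,
}


def predict_first_year_pct(player):
    """Return the predicted first-year vote share (0 to 1) for a dict of variables.

    Variables missing from `player` are treated as 0.
    """
    return INTERCEPT + sum(
        coef * player.get(name, 0) for name, coef in COEFFICIENTS.items()
    )


# Hypothetical player: one MVP award, 3000 hits, 10,000 PAs, nothing else.
example = {"MVP": 1, "3000HIT": 1, "10000PA": 1}
print(round(predict_first_year_pct(example), 3))
```

For this made-up player the prediction is the intercept plus the MVP, 3000-hit and 10,000-PA coefficients, since every other variable is zero.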

Here was the equation where the player's highest vote percentage was the dependent variable:

PCT = -.014 + .00037(WSAS/1000) + .025(GGAS/1000) + .067(MVP) + .257(3000 HIT) + .201(500 HR) + .0048(ASSQ10) - .0013(GGSQ7) + .071(500SB) - .00167(WSIMPSQ50/1000) + .137(10000PA)

The adjusted r-squared was .861. The standard error was .117.

MVP is the number of MVP awards won. 3000 HIT is a dummy variable (1 if a player reached it, 0 otherwise). 500 HR is also a dummy variable, as are 500SB and 10000PA (if you made it to 10,000 career plate appearances, you get a 1, 0 otherwise). I used all the voting data from 1990-2009.

What is ASSQ10? It is the square of the number of All-Star games played in. But AS games played is maxed out at 10. The assumption here is that being an All-Star has a positive exponential effect but only up to a point where no more games helps (I have a graph below to help explain this). The GGSQ7 is the same thing for Gold Gloves.
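A minimal sketch of the cap-then-square construction may make this concrete. The function names are made up for illustration, and the Gold Glove cap of 7 is inferred from the variable name GGSQ7.

```python
def capped_square(count, cap):
    """Cap a count at `cap`, then square the capped value."""
    return min(count, cap) ** 2


def assq10(all_star_games):
    """All-Star games, maxed out at 10, then squared."""
    return capped_square(all_star_games, 10)


def ggsq7(gold_gloves):
    """Gold Gloves, maxed out at 7 (assumed from the name), then squared."""
    return capped_square(gold_gloves, 7)


# The variable grows faster and faster up to the cap, then stays flat.
for games in (0, 5, 10, 15):
    print(games, assq10(games))
# prints:
# 0 0
# 5 25
# 10 100
# 15 100
```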

WSIMPSQ50 involves World Series play. First, WSIMP is World Series PAs times OPS. The idea here is that the more you play in the World Series the more votes you would get, but by multiplying it by OPS, it also includes how well you played (or just hit). This gets maxed out at 50 and is squared, for the same reason as All-Star games (yes, Reggie Jackson is first here and way ahead of everyone else at 141, with Dave Justice and Lonnie Smith tied for 2nd at 101).
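The World Series variable can be sketched the same way. The function names and the 40 PA, .750 OPS player are hypothetical. Only the 141 figure comes from the text.

```python
def wsimp(ws_plate_appearances, ws_ops):
    """World Series plate appearances times World Series OPS."""
    return ws_plate_appearances * ws_ops


def wsimpsq50_from_wsimp(wsimp_value, cap=50):
    """Cap WSIMP at 50, then square it."""
    return min(wsimp_value, cap) ** 2


# Hypothetical player: 40 World Series PAs with a .750 OPS.
modest = wsimpsq50_from_wsimp(wsimp(40, 0.75))

# A WSIMP of 141 (the top value quoted in the post) is capped at 50 first.
capped = wsimpsq50_from_wsimp(141)

print(modest, capped)
```

Any player at or above a WSIMP of 50 ends up with the same value, so a very large World Series record earns no more in this variable than a WSIMP of exactly 50.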

The last two variables are interaction variables. GGAS is the Gold Glove variable multiplied by the All-Star variable and WSAS is the World Series variable times the All-Star game variable. It looks strange that the coefficient values on GGSQ7 and WSIMPSQ50 are negative. But you might notice that they are positive on the interaction variables. I think this is like when a regression uses both X and X-squared because the phenomenon is non-linear (an inverted parabola, for example). The coefficient on X ends up being positive while the X-squared coefficient is negative. The reason I put in these interaction variables was to see if players who were strong in both got an extra boost, as if there was some synergy going on. It seems like they did get an extra boost.
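Here is one way the interaction variables could be built. This sketch assumes that "the Gold Glove variable", "the All-Star variable" and "the World Series variable" mean the capped-and-squared versions (GGSQ7, ASSQ10, WSIMPSQ50). The post does not spell that out, so treat it as a guess. The function names and the example player are also made up.

```python
def capped_square(count, cap):
    """Cap a count at `cap`, then square the capped value."""
    return min(count, cap) ** 2


def interaction_terms(all_star_games, gold_gloves, wsimp):
    """Build GGAS and WSAS as products of the capped-and-squared variables.

    Assumption: the interactions multiply ASSQ10 by GGSQ7 and by WSIMPSQ50.
    """
    assq10 = capped_square(all_star_games, 10)
    ggsq7 = capped_square(gold_gloves, 7)
    wsimpsq50 = capped_square(wsimp, 50)
    return {"GGAS": ggsq7 * assq10, "WSAS": wsimpsq50 * assq10}


# Hypothetical player: 8 All-Star games, 5 Gold Gloves, WSIMP of 20.
terms = interaction_terms(all_star_games=8, gold_gloves=5, wsimp=20)
print(terms)

# A player with no All-Star games gets nothing from either interaction.
print(interaction_terms(all_star_games=0, gold_gloves=7, wsimp=50))
```

Because each term is a product, it is only large when a player is strong in both categories, which is the "synergy" idea.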

Since the dependent variable can only go from 0 to 100, the coefficients would be very low. So I divided these three variables (WSAS, GGAS and WSIMPSQ50) by 1000 (my stat package was showing coefficient values of .00000 before I did this).
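Dividing a variable by 1000 does not change the fit. It only multiplies that variable's coefficient by 1000, which makes it readable. Here is a small sketch with a one-variable least-squares fit. The data are made up purely for illustration and are not from the study.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up data: a large-valued regressor and a vote share between 0 and 1.
xs = [0, 2500, 10000, 40000, 90000]
ys = [0.02, 0.05, 0.10, 0.30, 0.55]

slope_raw = ols_slope(xs, ys)                           # tiny coefficient
slope_scaled = ols_slope([x / 1000 for x in xs], ys)    # 1000 times larger

print(f"raw slope:    {slope_raw:.8f}")
print(f"scaled slope: {slope_scaled:.5f}")
```

The scaled slope times the scaled variable gives the same contribution as the raw slope times the raw variable, so predictions are unaffected.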


## 11 comments:

Very interesting, Cy. Great work.

One other thing that I think holds him back is that I've gotten the impression he wasn't well liked at the time he played and perhaps even less liked after he started campaigning to get into the Hall of Fame, though that campaigning was largely the fans themselves.

Thanks for the comments. So you think the fans hurt his cause? Did Billy Williams campaign? Did the fans campaign on his behalf? Why is he in and not Santo?

Cyril,

Stick to economics.

Hey Cy,

I've got a paper under review right now modeling the same sort of thing. Interestingly enough, at least based on what the BBWAA likes, Ron Santo had none of it. And I didn't actually bother to include playoff experience in the mix.

Without taking the era into account (which, of course, should be done), Santo is comparable offensively to Chili Davis (they're almost like twins if you ask me). By my model, his career gave him about an 11.3% chance of getting into the Hall, just ahead of Steve Garvey, Brett Butler, and Bernie Williams. In fact, according to the model, Robb Nen seems to have a career more in line with what the BBWAA likes to vote for.

Now, this says nothing about them being right, but rather tells us a bit how they think. As you say, the shorter career seems to have done him in.

Millsy

Thanks. Interesting that we both have similar conclusions. Good luck on your paper and in grad school.

Cy

I think the fans did hurt his cause, Cy. I recall reading on more than one occasion that the Veterans Committee was irritated with the campaign. That doesn't say anything as to why he was not elected before the VC had a chance to do so. I think these numbers you post here pretty much explain that, but in recent years, yes, the fans' campaign hurt his chances in my opinion.

As for why he wasn't elected initially, I think it's partly the numbers you've posted here and I do think there is reason to believe that silly heel tap frustrated the BBWAA.

All of that said, your work here further proves to me we need an improved system of electing Hall of Famers. I was never outspoken about Santo getting into the Hall of Fame because I really don't care. The HOF is important to the individual and while I would have liked to have seen Santo get in, there's a detailed record of everything he accomplished and all of the hardships he suffered along the way. That will never be forgotten. I think a lot of fans think it will, but just as I look at stats pages of some of the greats from the 1910s or 1920s and read more about them, many fans will do the same with regards to Santo in the future. Those who want to know will and those who don't care won't. In other words, nothing will change except that he does in fact deserve to be in Cooperstown. Then again, there are others who deserve to be and are not.

dwill66, this is a thoughtful article and it's well written. I don't believe he is stating his opinion one way or the other, but rather showing why he was not in the HOF. This is exactly the kind of thing that keeps me and many others coming here.

I'm not sure if I'd call Santo's vote totals low. He exhausted his 15 years of eligibility on the BBWAA ballot and peaked at 43.2 percent of the vote his final year. I wrote a piece on Don Mattingly awhile back where I noted that, historically, the Veterans Committee has had a better than 50 percent induction rate on players who peak between even 20 and 30 percent of the vote. My guess is Santo gets in somewhere within the next 10-15 years.

I admittedly did not read your post before submitting my first comment. Having now read it, I would be interested to see what it would show for Gil Hodges.

Good work!

Graham

Thanks for dropping by. I guess I meant low percentages as in they were not enough to get him elected. On Hodges, I am not sure if he was part of my study, since he started getting votes before 1980. I wanted to have some cutoff. I will check when I get home. One problem I wanted to avoid was going back too far in time. As you do that, who the voters are changes and I think any results are less meaningful.

Cy

mb21

Thanks for the vote of confidence.

Cy

This is a couple months late and may have already been answered, but a good cutoff for looking back on Hall of Fame voting could be somewhere around 1967. That's about the time modern voting procedures like 15 years of eligibility were implemented.
