I did not check to see if this was mentioned anywhere, but the Rangers have a team isolated power (ISO) of .217 through 48 games (a .485 SLG and .268 AVG). The all-time single-season record for a team is .204 by the 1997 Mariners. The Yankees have .205 so far this year. The Phillies have .199, equal to the NL record set by the 2000 Astros. The Rangers have 80 HRs in 48 games, which is a pace to hit 270, more than the record of 264 by the 1997 Mariners. The Rangers are on a pace to get 651 extra-base hits. The 2003 Red Sox hold the record of 649.
The Rangers have 80 HRs in 1,671 ABs for a HR% of 4.79%. That is higher than the 4.70% record of the 1997 Mariners.
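The arithmetic behind those figures can be checked with a short sketch. It uses only the numbers quoted above and assumes a 162-game season for the pace calculation; the variable names are just for illustration.

```python
# Figures quoted in the post (Rangers through 48 games of 2009).
SEASON_GAMES = 162  # assumed full-season length for the pace calculation
games, home_runs, at_bats = 48, 80, 1671
slg, avg = 0.485, 0.268

iso = slg - avg                            # isolated power = SLG - AVG
hr_pace = home_runs / games * SEASON_GAMES  # HRs projected over a full season
hr_pct = home_runs / at_bats * 100          # HRs per 100 at-bats

print(f"ISO: {iso:.3f}")
print(f"HR pace: {hr_pace:.0f}")
print(f"HR%: {hr_pct:.2f}%")
```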
The Rangers only have a team OBP of .327 (the AL avg is .338). The power is probably why they are above average in runs per game (5.27 vs. 4.88). Just imagine if Josh Hamilton were hitting.
Saturday, May 30, 2009
Wednesday, May 27, 2009
Some Baseball Pioneers Are "Heroes Of Capitalism"
There is a great blog called Heroes of Capitalism. What is the blog about? The authors say "What is a hero of capitalism? Someone who used private property* to produce wealth."
*Private property includes the tangible (like land) and intangible (like ideas).
Here are links to their posts on baseball people they have honored:
John A. "Bud" Hillerich
Bill Veeck
Branch Rickey
Billy Beane
Hal Richman (inventor of the table top baseball game Strat-o-matic, which simulates the performance of actual major league players)
Daniel Okrent, Robert Sklar, Steve Wulf and Glen Waggoner (creators of Rotisserie League baseball, the forerunner of fantasy baseball)
Okrent has also written baseball books, including Nine Innings: The Anatomy of a Baseball Game, in which you can learn a lot about baseball and its history through his analysis of a single game, between the Brewers and Orioles on June 10, 1982.
Monday, May 25, 2009
Why Isn't Steve Garvey In The Hall Of Fame?
It seems like he would have been elected based on the voters' preferences in recent years (I have been analyzing voting patterns and what I write below is based on that; scroll down to see these studies). But first I want to briefly discuss the sabermetric case for or against.
Garvey had 279 career "Win Shares" (WS), the Bill James stat which incorporates all phases of the game. That tied him for 222nd place all-time among all players and pitchers through 2001. Not bad, since about 200 guys are in the Hall. But this is marginal.
His career TPR or "total player rating," from Pete Palmer, editor of the Baseball Encyclopedia was actually -6.1. That means that if an average first baseman had played instead of Garvey, those teams would have won 6.1 more games during his career. Most of his seasons were negative and his best was only +1.2.
But the baseball writers who vote don't necessarily take sabermetric stats into account. The analysis I have posted recently used more conventional stats. In one model I used logit analysis to predict the probability of any player getting elected. That model had career AVG, seasons with 100+ RBIs, ALLSTAR games, career plate appearances (PAs), MVP awards, a variable for world series performance, being in the 3000 hit club and a positional adjustment for being a catcher. That model gave Garvey a 94.7% probability of being elected to the Hall of Fame. The model itself was 98.9% accurate.
Another logit model (also 98.9% accurate) had the following variables:
Career HRs
2B
SS
3B
CF
C
WSIMP
ALLSTAR
MVPSH
500SB
Career NON-HRs
3000 HIT
The five variables right after Career HRs (2B, SS, 3B, CF and C) are positional adjustments. The WSIMP is for world series play. This model had Garvey's probability at 64.8%. Tony Perez has 52.6% and Jim Rice has 10.6% and both are in the Hall.
Another model simply predicted the % of votes received in the first year of eligibility. This model took into account MVP awards, a variable for world series performance, being in the 3000 hit club, ALLSTAR games, being in the 500 HR club, being in the 500 SB club, Gold Glove awards and career PAs. It predicted that Garvey would be named on 48.9% of the ballots in his first year while he actually got 41.6%. His predicted 48.9% is more than what was predicted for the following players who did eventually make it:
Ryne Sandberg: 0.460
Kirby Puckett: 0.418
Gary Carter: 0.380
Carlton Fisk: 0.375
Tony Perez: 0.299
Jim Rice: 0.219
And Garvey's actual first year % of 41.6 is higher than that of Rice (29.8%) and very close to Carter's 42.3%.
So from 3 different regressions, it looks like Garvey had the stats or qualifications to make it in, based on what the voters seem to like.
It is also very easy to find some impressive achievements that would go on Garvey's plaque, if he ever made it. They include:
- 5 seasons with 100+ RBIs
- .294 career AVG
- batted over .300 7 times
- had 200 or more hits in a season 6 times
- 1974 NL MVP
- batted .319 in 5 World Series
- batted .356 in 5 league championship series
- batted .393 in 10 all-star games
- won 4 Gold Glove awards
- 2-time MVP of the all-star game
- 2-time MVP of the league championship series
- finished in the top 5 in total bases 7 times
- set an NL record by playing 193 straight games without committing an error
- set an ML record with his .996 fielding percentage at first base
- played in 1,207 consecutive games, an NL record and 4th longest overall
He also has 2.46 Career MVP Shares, which is the 55th best total. An MVP share is the % of the total possible points a player got in the voting in a given year. A first-place vote is worth 14 points, 2nd place 9, 3rd place 8, etc. A guy might come in 5th, but if he had 40 points out of a maximum of, say, 400, he gets a .10. Garvey's high rank here means the voters liked him when he played, a lot more than they liked other players. He finished in the top 6 in MVP voting 5 times.
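As a small illustration of the MVP share idea, here is a sketch of the single-season calculation using the 40-out-of-400 example from the paragraph above. The function name is made up for this example; a career total would simply be the sum of the season shares.

```python
def mvp_share(points, max_points):
    """Share of the maximum possible MVP voting points a player received in one season."""
    return points / max_points

# Example from the text: 40 points out of a possible 400.
example_share = mvp_share(40, 400)
print(f"{example_share:.2f}")
```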
Baseball Reference lists the Hall of Fame Monitor, for which they say:
"This is another Jamesian creation. It attempts to assess how likely (not how deserving) an active player is to make the Hall of Fame. It's rough scale is 100 means a good possibility and 130 is a virtual cinch. It isn't hard and fast, but it does a pretty good job. Here are the batting rules."
Garvey gets a 130 which is 104th best among position players. Now this is a very complicated point system with so many points for this or that. But this shows that Garvey fits the statistical profile of the kind of player the voters very much like to put in the Hall.
So why isn't he in? I found some theories.
The Baseball Page said, among other things, the following:
"In the 1980s it became clear that "Mr. Dodger" was far from wholesome. Several paternity suits and a tell all book from his ex-wife tarnished his image irreparably. Where he once was considered a candidate for state or even national office, Garvey became a leper, destined to host game shows and infomercials (really).
He had the reputation as a selfish, egotistical player. The media didn't like him as much as it seemed. His "Mr. Dodger" persona was created by Dodger PR and a few well-placed friends in the press. More than a few teammates quickly tired of Garvey's habit of staying in front of the camera or microphone.
In August of 1978, Garvey took offense to a comment made by teammate Don Sutton and the two men ended up wrestling their way across the visitors' clubhouse in Shea Stadium. The fight cemented a bitter feud between the two men and it damaged Garvey's reputation in the league.
He aged quickly. By the time he was 31-32, his skills were rapidly diminishing. He would have benefited from a day off here and there, but he didn't do it."
Chris Jaffe over at the Hardball Times had an interesting article called Hitler. Stalin. Garvey. Here is an excerpt:
"There was always a sense he was a fake. With the Dodgers, he got in a big fistfight in the clubhouse with teammate Don Sutton. He had a nasty divorce in the early 1980s. When he started to get hit with paternity suits, though, his reputation was shattered.
In some ways, though, it's even deeper than that. Our society can forgive—or at least cease baiting—a hypocrite, provided he asks for some degree of atonement. Jim Bakker wrote his book, I Was Wrong, for instance.
Garvey hasn't done that."
Jeff Sackmann also has an interesting article called Steve Garvey Gets No Respect
Update Dec. 3, 2010: I did a follow up post in July, 2010. Click here to read it.
Friday, May 15, 2009
The Marginal Impact Statistics Have on Hall of Fame Voting
A couple of weeks ago I presented a binary (logit) model of Hall of Fame voting. I looked at all players whose first year of eligibility was from 1990-2009 (except for Pete Rose). The model's equation estimates the probability that a player would be elected to the Hall of Fame (though not necessarily in his first year of eligibility). Of course, there are coefficient estimates and what I do here is give the change in the probability of being elected due to a change in a hypothetical player's stats.
I made up a player who had a 0.290 career AVG, had 6 100 RBI seasons, won 1 MVP award, played in 8 all-star games and had 8,894 career plate appearances. This player was not a catcher and did not achieve 3,000 hits. He had a world series impact of 18 (that is, his world series PAs times his world series OPS). It is like he had a .750 OPS in 24 world series PAs.
The first line in the table below shows that his probability of being elected was .50 or 50%. The rest of the table shows what his probability would be if his stats changed. For instance, if his career average had been .300 instead of .290 (with nothing else changing), his probability (or PR) rises to .663. Another 10 point gain in average brings it to .795. But if he had only hit .280, the PR is just .337. Of course, other things might have changed if his average had changed. Maybe 100 RBI seasons or all-star games played would have been different. But I will have to assume that they did not. The different cases for AVG are in red. More discussion follows the table.
Adding one 100 RBI season raises PR to .638. Taking one away lowers it to .362. So the number of 100 RBI seasons has a powerful impact, like AVG. Adding an MVP award raises PR to .59. But all-star games played in (AS) might have the strongest effect. Going from 8 to 9 all-star games raises the PR to .80. If all-star games falls to 7, PR is just .20.
Having 3000 hits, not surprisingly, raises the PR to 1.00 or 100% (actually it's a little less but it is above 99.99%). If the player had been a catcher (see the 1 in the C column), it jumps to .973. The WS or world series impact is very slight. Adding or subtracting 500 career PAs matters a lot. Adding 500 PAs bumps PR up to .657 and taking 500 away drops it to .343.
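These marginal effects can be reproduced with a short sketch. It assumes the coefficients printed in equation (2) of the May 3 logit post and starts from a baseline of Z = 0, which is the same thing as a probability of exactly .50. The function and dictionary names are made up for illustration.

```python
import math

# Coefficients as printed in equation (2) of the logit post (rounded values).
COEF = {
    "CAVG": 67.81, "100RBI": 0.5658, "ALLSTAR": 1.386, "PA": 0.0013,
    "MVP": 0.3645, "WSIMP": 0.0001, "3000HIT": 14.93, "C": 3.586,
}

def logistic(z):
    """Equation (1): P = 1 / (1 + exp(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))

def shifted_pr(changes, base_z=0.0):
    """Probability after changing some stats, starting from base_z (0 means PR = .50)."""
    delta = sum(COEF[name] * change for name, change in changes.items())
    return logistic(base_z + delta)

cases = {
    "AVG +.010": {"CAVG": 0.010},
    "AVG -.010": {"CAVG": -0.010},
    "one more 100 RBI season": {"100RBI": 1},
    "one more MVP award": {"MVP": 1},
    "one more all-star game": {"ALLSTAR": 1},
    "500 more PAs": {"PA": 500},
    "catcher": {"C": 1},
}
for label, changes in cases.items():
    print(f"{label}: {shifted_pr(changes):.3f}")
```

Because the model is non-linear, the same change in a stat moves the probability less for a player who starts far from .50.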
I also did the same test for an actual player, Jim Rice. His predicted PR was the closest of any of the players in the study to .50 at .595. The next table has the marginal changes like the one above. Maybe the biggest impact for Rice is all-star games. If he had just one fewer, his PR falls to .269. In general, his table looks similar to that of the hypothetical player.
Friday, May 8, 2009
Does The "High" April Slugging Percentage Mean Anything?
(note: I will have another post on Hall of Fame voting next weekend)
It sure seemed like there was lots of slugging going on in April. The SLG for MLB was .420 for the month, much higher than the .402 and .401 for the two previous seasons. The table below shows the SLG for April and post-April for each of the last 10 years (5 years had some March games, so I included that data in April for those years).
The average April SLG over the last 10 years is just about .420. So it may not be that high, although it beats 5 of the last 7 seasons.
The graph below shows the relationship between April SLG (horizontal axis) and non-April SLG (vertical axis).
The average non-April SLG is about .426. I also ran a regression with non-April SLG as the dependent variable and April SLG as the independent variable. Here is the equation:
non-April SLG = 0.335*(April SLG) + 0.2856
The r-squared was .68 and the standard error was .0037, which seems pretty low. The equation predicts all non-April SLGs within +/- .006. So I guess we can expect SLG the rest of the year to be between .414 and .426 and it may be a good bet for it to be between .416 and .424. But so far in May it is only .410, so who knows. Maybe the weather matters and maybe this April was unusually warm.
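For anyone who wants to plug numbers into the regression, here is a minimal sketch using the slope, intercept, standard error and +/- .006 figure quoted above. The printed coefficients are rounded, so treat the output as approximate. With an April SLG of .420, the equation as printed gives a point estimate of about .426, which is close to the 10-year non-April average and a bit above the middle of the range I quoted.

```python
SLOPE, INTERCEPT = 0.335, 0.2856  # regression equation as printed above
STD_ERROR = 0.0037                # standard error reported for the regression
MAX_MISS = 0.006                  # largest miss over the 10 seasons, per the text

def predict_non_april_slg(april_slg):
    """Predicted rest-of-season SLG from April SLG."""
    return SLOPE * april_slg + INTERCEPT

pred = predict_non_april_slg(0.420)
print(f"point estimate: {pred:.3f}")
print(f"+/- one standard error: {pred - STD_ERROR:.3f} to {pred + STD_ERROR:.3f}")
print(f"+/- largest miss: {pred - MAX_MISS:.3f} to {pred + MAX_MISS:.3f}")
```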
Sunday, May 3, 2009
Predicting Who Makes The Hall Of Fame Using A Logit Model
The last two weeks I presented regression results on first year Hall of Fame vote percentage. I used a linear regression. This week I use a logit model where 1 means the player has made it in and 0 means not. The probability that a player was voted in is
(1) P = 1/(1 + exp(-Z))
where exp is the exponential function with base e, Euler's number (approximately 2.718). The -Z is the following equation times -1 for each player. This is the estimated regression equation:
(2) Z = -46.1955 + 67.81*CAVG + .5658*100RBI + 1.386*ALLSTAR + .0013*PA + .3645*MVP + .0001*WSIMP + 14.93*3000HIT + 3.586*C
CAVG is a player's career batting average, 100RBI is the number of seasons with 100+ RBIs, ALLSTAR is the number of all-star games played in, PA is career plate appearances, MVP is the number of MVP awards won, WSIMP is world series PAs times world series OPS (this is world series impact, a combination of quantity and quality), 3000HIT is a dummy variable (1 or 0) for reaching that milestone and C is the same if the player was a catcher. So all the data gets plugged in for each player and "Z" is calculated. Then the negative of that is plugged in to equation (1) to get the probability of each player being elected into the Hall.
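Here is a minimal sketch of equations (1) and (2) in code, using the coefficients as printed. The example player is the made-up one described in the May 15 post (a .290 hitter with 6 100 RBI seasons, 8 all-star games, 8,894 PAs, 1 MVP and a world series impact of 18). The function and variable names are only for illustration. With these rounded coefficients his probability comes out a little under .50, presumably because of rounding (the PA coefficient in particular is shown to only two significant digits).

```python
import math

INTERCEPT = -46.1955
# Coefficients as printed in equation (2).
COEF = {
    "CAVG": 67.81, "100RBI": 0.5658, "ALLSTAR": 1.386, "PA": 0.0013,
    "MVP": 0.3645, "WSIMP": 0.0001, "3000HIT": 14.93, "C": 3.586,
}

def hall_probability(player):
    """Compute Z from equation (2), then P = 1 / (1 + exp(-Z)) from equation (1)."""
    z = INTERCEPT + sum(COEF[name] * player[name] for name in COEF)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical player, not a real one.
hypothetical = {
    "CAVG": 0.290, "100RBI": 6, "ALLSTAR": 8, "PA": 8894,
    "MVP": 1, "WSIMP": 18, "3000HIT": 0, "C": 0,
}
p = hall_probability(hypothetical)
print(f"P = {p:.3f}")
```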
The statistical results can be viewed at logit results. The data for each player and their calculated probability can be seen at logit probabilities.
The results page shows something called a "classification table." It says that if a probability is .5 or greater, the player will be in the Hall. My data includes all players whose first year of eligibility was 1990 or later (except for Pete Rose). The classification table says that all 20 players who actually have made it in should be (all 20 have a probability of .5 or higher). Of the other 161 players it predicts that 159 would not make it. The two who are predicted to make it are Steve Garvey and Andre Dawson. Garvey's 15 years of eligibility to be voted in by the writers have ended. But Dawson could still make it. Garvey had a probability of 95.7% according to the model. Dawson had 60.8%. The model is 98.9% correct.
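The 98.9% figure follows directly from the counts in the classification table. A quick sketch, using only the numbers given above:

```python
# Counts from the classification table described above.
in_hall_predicted_in = 20    # actual Hall of Famers with P >= .5
in_hall_predicted_out = 0    # actual Hall of Famers with P < .5
not_in_predicted_in = 2      # Garvey and Dawson
not_in_predicted_out = 159   # correctly predicted to miss

total = (in_hall_predicted_in + in_hall_predicted_out
         + not_in_predicted_in + not_in_predicted_out)
accuracy = (in_hall_predicted_in + not_in_predicted_out) / total
print(f"{accuracy:.1%} of {total} players classified correctly")
```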
I tried lots of variables and this model works the best in terms of getting all the actual Hall of Famers right and the overall correct %. Also, some models might have done a bit better, but a variable that should not be negative (like career HRs) would be negative. Some models had many more variables. My Acastat program sometimes could not complete a regression no matter how long I waited. That might have been when I had more than 12 variables. So another model might be a bit better. I just don't know.
Now it is hard to see exactly what the impact of each variable is since this is a non-linear model. So I will just mention a few players and how their probability (P) would change if their data changed.
Yount-If he does not have 3000 hits, his P falls to about 1% from 99%! Molitor would fall from 99% to 45%.
Ozzie Smith-If he falls from 14 all-star games to 10, his P falls from 99% to 36%
Fisk-If he is not a catcher, his P falls from 96% to 46%. Gary Carter falls from 65% to 36%.
Carew-If he does not have 3000 hits, his P falls only about 1% from 100% to 99%! (Boggs, Gwynn and Brett have the same thing)
Eddie Murray-If he does not have 3000 hits, his P falls to about 83% from 99%.
Puckett-If his CAVG falls from .318 to .301, his P falls from 75% to 49% (a NO). If you don't change his CAVG but take his All-Star games from 10 to 9, his P falls to 43%. Incidentally, he finished his career with 280 Win Shares and if he could have played 5 more years he could have easily gotten to 360 Win Shares, what Bill James says is almost a lock for the Hall (but Tim Raines has 392 and might not make it; if he went from 7 to 10 all-star games his P would be about 75%).
Tony Perez-If you take his All-Star games from 7 to 6, his P falls to 29% from 62%.
McGwire-Maybe his not getting in is why 500 HRs was not working very well in the model. The only others were Murray, Jackson and Schmidt. But if he had 9200 PAs, his P value jumps to .5. Of course, he has the steroid scandal. It would be nice to quantify scandals, but that may be impossible and I have already taken Rose out of the model. Back to McGwire, if he had 11 all-star games instead of 9, his P would go up to 69%.
Ryne Sandberg-Take away his MVP award and his P falls from 63% to 54% (do that and take away 1 all-star game, he falls to 23%). For some other guys, it makes almost no difference. Take all 3 of Schmidt's awards away and he still has a 99% P. Take away 1 of Morgan's awards and he falls from 66% to 58%. Take away the other and he falls to 48%.
Al Oliver would go from a P of 10% to over 50% if he had 6 100 RBI seasons instead of 2. Same for Ted Simmons if he jumped from 3 to 7 100 RBI seasons. Same for Harold Baines.
So all-star games and 3000 hits matter a lot (and 100 RBI seasons, too). If Dave Parker had 8 all-star games instead of 6, he goes from a P of 8% to over 60%. Harold Baines would have a P of 99% if he had 3000 hits. The 3 strike-shortened seasons of 1981, 1994 and 1995 might have cost him 3000 hits. Will Clark and Keith Hernandez would have P's of 99% if they made 3000 hits.
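These what-if cases can be roughly reproduced without the full data set: convert a player's P back to Z, add the coefficient times the change in the stat, and convert back to a probability. The sketch below assumes the coefficients from equation (2) and starts from the rounded percentages quoted above, so the results only match to within about a percentage point. The function names are made up for illustration.

```python
import math

def logit(p):
    """Inverse of equation (1): recover Z from a probability."""
    return math.log(p / (1.0 - p))

def logistic(z):
    """Equation (1): P = 1 / (1 + exp(-Z))."""
    return 1.0 / (1.0 + math.exp(-z))

def what_if(p, coefficient, change):
    """New probability after one stat changes by `change`, all else held fixed."""
    return logistic(logit(p) + coefficient * change)

# Coefficients from equation (2); starting probabilities are the rounded ones in the text.
puckett_avg = what_if(0.75, 67.81, 0.301 - 0.318)  # CAVG drops from .318 to .301
puckett_as = what_if(0.75, 1.386, -1)              # one fewer all-star game
perez_as = what_if(0.62, 1.386, -1)                # one fewer all-star game
sandberg_mvp = what_if(0.63, 0.3645, -1)           # no MVP award

for name, value in [("Puckett AVG", puckett_avg), ("Puckett AS", puckett_as),
                    ("Perez AS", perez_as), ("Sandberg MVP", sandberg_mvp)]:
    print(f"{name}: {value:.2f}")
```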
I can email my spreadsheet to anyone who wants to play around with these kinds of possibilities.