Thursday, December 31, 2009

Was Ron Santo The Best Player In the National League From 1964-68?

The table below shows how many BFW (batting wins plus fielding wins from Pete Palmer, using the linear weights method) and how many Win Shares (WS), the Bill James stat, various players had in the NL during this period. The two lists are not necessarily the top 25 in either stat. I simply found all the players who had at least 1 season in the top 5 in BFW in these years and at least 1 season in the top 12 in WS, found their total for the 5 years and then ranked them. The BFW totals come from Retrosheet and the WS totals come from the electronic version of the book.



Santo completely dominates in BFW. The age number is given for players who are in the Hall of Fame and it is their age in 1964, with a June 30 cutoff. It may be cherry picking to make these comparisons since I am taking Santo's best 5 year period. But that is why I put the age of the other Hall of Famers in. Many of them were at an age where they could still be performing well. Also, below, I show the best 5 year periods for these other Hall of Famers (except Aaron and Mays) and Santo still does extremely well. I did not include Frank Robinson because he was traded to the AL prior to the 1966 season.

Santo is only 3rd in WS. But he only trails the leader, Mays, by 6. Santo had the following ranks in BFW in these years: 1, 2, 1, 1, 4. For WS, they were 3, 4, 4, 1, 10. Those ranks for both stats are only among position players. I guess not too many people noticed how well he did since his ranks in the MVP voting, among position players, were 8, 13, 9, 4, 17. Maybe he was hurt by the Cubs not winning any pennants.

I also tried to find the leaders in Wins Above Replacement (WAR) during those years from Sean Smith's site. Below are what are probably the top 5. I checked a few other players, but none of them came close to these five.

Mays 40.8
Santo 39.6
Clemente 35.5
Aaron 35.1
Allen 33.1

So only Willie Mays was better. Now the next table shows the best 5 year stretches for what I think are all of the Hall of Famers who played in the NL during these years, not counting Aaron and Mays. This is not limited to the 1964-68 years. It is the highest 5 consecutive years for each one of these players.



Santo is only topped in BFW by Joe Morgan. He is only tied for 3rd in WS. But that still means that he beats 6 Hall of Famers he played with and/or against. Three of them, Willie Stargell, Willie McCovey, and Lou Brock, were elected in their first year of eligibility. Clemente would have been as well, had he not died and been elected right away under special circumstances.

Sunday, December 27, 2009

Ron Santo vs. Brooks Robinson And Hall Of Fame Voting

How these two players compare in various stats is summarized in the table below. Discussion follows the table.



Robinson was an overwhelming choice to make the Hall in his first year of eligibility while Santo got very little support (normally a player needs 5% to remain on the ballot but somehow Santo returned to the ballot in 1985 after the low vote in 1980, as shown on his Baseball Reference page). And then Santo never got higher than 43.1%.

The predicted 1st year % comes from my model of voting (see Estimating Hall Of Fame Vote Percentages For The 1980s). Santo was 16.4% below the prediction (20.3 - 3.9 = 16.4). Robinson was 17.4% above the prediction. As much as both of those predictions are off, they both still pretty much predict that Robinson would make it and Santo wouldn't. Since a player needs 75% to get in, Robinson is just about there (according to the model) and very few guys get to the 60% level without eventually making it. For Santo, very few players start out very low and eventually make it. He also got just 13.4% in 1985.

My model is based mainly on how many all-star games and gold gloves a player has gotten, plus MVP awards and milestones like 3000 hits. World Series performance matters, too. Robinson beat Santo in all-star games 18-8 and 15-5 in Gold Gloves. Santo never played in a World Series, while Robinson played in 4. So it is not a surprise that Robinson did so much better in the voting.

The stats WS, BFW and WAR are all composite stats that attempt to value players using all phases of the game. WS is Bill James' stat. Robinson beats Santo by 32 here, but that is not a lot, actually. James says that a season with 15 WS is an average season. So all that separates the two is a couple of extra average seasons by Robinson. James says that 20 WS makes an all-star season while 30 is an MVP type season.

Robinson does beat Santo 10-8 in all-star seasons, but it is very close. Then Santo beats him very easily in MVP type seasons and 3 best straight seasons. So he definitely had a higher peak value than Robinson while coming very close to him in career value.

BFW is batting plus fielding wins from Pete Palmer and the data came from Retrosheet. Santo totally outclasses Robinson in both career value and peak value here.

WAR or Wins Above Replacement is from Sean Smith's site and the numbers in parentheses are their respective ranks among position players in career value. Robinson just barely wins the career fight but Santo is way ahead in peak value.

RCAA is runs created above average, which is park adjusted since it is from the Lee Sinins Complete Baseball Encyclopedia. Santo beats Robinson by a wide margin, but Santo's career ended at 34. Robinson played until he was 40. RCAA can be negative, so if a player is still active at older ages his career RCAA can go down. But through age 34, to equalize things, Santo is still ahead 253-85. For Robinson to be better, he would have to be ahead in fielding runs by 168. Maybe he was, but that is a lot. Then Santo also has a big edge in peak value.

Looking at PAs, we can see that Santo was able to come close to Robinson's career value while having about 2,400 fewer career PAs. I think the sabermetric evidence is in Santo's favor, yet he has gotten much less support. Perhaps my voting model explains the writers' preferences, but Santo can still get in through the Veterans Committee. I hope they look at him again.

Saturday, December 26, 2009

Mark Buehrle Earns Rare Honor For Baseball Player And Becomes A Crossover Star By Appearing On The Cover Of...

It wasn't AROD. Or Albert Pujols. Or Clemens, or Bonds, or Maddux or Pedro Martinez. For the first time in at least 20 years (and maybe the first time ever), a baseball player's picture appears on the cover of The World Almanac and Book of Facts. Buehrle not only pitched a perfect game, but he retired 45 straight batters over the course of 3 games to set a new record.

This will no doubt lead to a huge jolt in his popularity and make him a mega-star. Endorsements and guest appearances will probably come pouring in. People magazine might be next. I checked the covers of past almanacs at amazon.com and google images and could not find one with a picture of a baseball player on it. It looks like in recent years both Tiger Woods and Michael Phelps have made the cover. I hope scandal is not about to hit Mr. Buehrle anytime soon.

And Buehrle was not the only White Sox left-hander to appear on the cover. Click here to see the other guy. He is even more famous.

Monday, December 21, 2009

Estimating Hall Of Fame Vote Percentages For The 1980s

This is a follow-up to my last post and it comes from a suggestion at Baseball Think Factory. You might have to read the previous post to understand this one.

I plugged the values from 1980-89 into the model. There were 115 guys who had their first year on the ballot. The 1990-2009 model (the one without Rose, McGwire and Puckett) predicted 101 of them within .10 of their actual total. So that was 87.8% of them. 89 were predicted to within .05, or 77.4% of them.

In the 1990-2009 group, 88.8% were predicted to within .10 and 74.3% were predicted to within .05. So the 1990-2009 model seems to predict the 1980-1989 results fairly well. The two predictions that are off the most are for Willie McCovey and Willie Stargell. McCovey got 81.4% while the model predicted 38.95%, so he got 42.45% more than expected. The difference was even bigger for Stargell. He got 82.4% while the prediction was 23.3%. So he got 59.1% more. But then the next biggest positive differential was Bench, who got 96.4% while the prediction said 69.3%, for a difference of 27.1%. Then no one else had a positive differential of even 20% (next highest was 17.5%).
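The differentials above are just actual minus predicted, and the "within .10" shares are counts divided by the 115 first-timers. Here is a minimal Python sketch of that arithmetic, using only numbers quoted in this post (the function names are made up for illustration):

```python
# First-ballot results quoted above: (actual %, predicted %).
results = {
    "McCovey": (81.4, 38.95),
    "Stargell": (82.4, 23.3),
    "Bench": (96.4, 69.3),
}

def differential(actual, predicted):
    """Actual minus predicted, in percentage points."""
    return round(actual - predicted, 2)

def pct_of_group(count, group_size):
    """Share of the group, as a percentage rounded to one decimal."""
    return round(100 * count / group_size, 1)

diffs = {name: differential(a, p) for name, (a, p) in results.items()}
within_10 = pct_of_group(101, 115)  # players predicted to within .10
within_05 = pct_of_group(89, 115)   # players predicted to within .05
```

A positive differential means the player got more votes than the model expected.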

The biggest positive differential in the 1990-2009 study was Fisk, who got 35.6% more than expected. Then the next biggest one was about 23%.

Back to the 1980-1989 period, the biggest negative differential belongs to Aaron. The model said he should get about 131% but he got "only" 97.8%, for a difference of -32.8%. That is not too much higher than the biggest negative differential from 1990-2009, which belongs to Fred Lynn, of -26.4. Lynn was the only guy to have -20 or bigger (well, bigger in absolute terms among the negative differentials). In the 1980-1989 period, only 2 more guys were -20 or more. I think Aaron's big negative differential is understandable. He scores high on everything and reached almost every milestone. Great World Series performer, gold gloves, MVP, 500HR, 3000Hits, 10,000 PAs, etc. All those all-star games. I think some of the voters would give Aaron more than 100% if they could (or the equivalent of more than one vote). I bet most would say that there is a lot bigger difference, say, between Aaron and Brooks Robinson, than the 97.8-92=5.8 difference shows. That implies Aaron was only 6.3% better. But I think most people would say it is a bigger difference.

The equation for the 1990-2009 period without Rose, McGwire and Puckett was

PCT = -.0165 + .00077*(WSAS/1000) + .04659*(GGAS/1000) + .0475*MVP + .44741*3000HIT + .25953*500HR + .00267*ASSQ10 -.00103*GGSQ7 + .06416*500SB - .0092*(WSIMPSQ50/1000) + .09891*10000PA

Thursday, December 17, 2009

My Predictions For The Hall Of Fame Vote

I base my predictions on regression analysis of the voting from 1990-2009. I looked at voting in the first year of eligibility only. Here is the regression equation:

PCT = .04824*MVP + .45177*3000H + .16754*500HR + .00216*ASSQ10 - .00122*GGSQ7 + .04901*500SB
- .0119*WSIMPSQ50 + .09928*10000PA + .00112*WSAS + .06242*GGAS - .01282

I will explain what the variables mean below. The adjusted r-squared was .923. So 92.3% of the difference across players is explained by the equation. The standard error was .072. There were 181 players, all of those who came up for the first time from 1990-2009, except for Pete Rose. MVP is the number of MVP awards won, 3000H is a dummy variable (1 if a player reached it, 0 otherwise). 500HR is also a dummy variable, as are 500SB and 10000PA (if you made it to 10,000 career plate appearances, you get a 1, 0 otherwise). I used all the voting data from 1990-2009.

What is ASSQ10? It is the number of All-star games played in, squared. But AS games played is maxed out at 10. The assumption here is that being an all-star has a positive exponential effect but only up to a point where no more games helps (I have a graph in a post from last summer to help explain this; link below). The GGSQ7 is the same thing for Gold Gloves.

WSIMPSQ50 involves World Series play. First, WSIMP is World Series PAs times OPS. The idea here is that the more you play in the World Series the more votes you would get, but by multiplying it by OPS, it also includes how well you played (or just hit). This gets maxed out at 50 and is squared, for the same reason as all-star games (yes, Reggie Jackson is first here and way ahead of everyone else at 141, with Dave Justice and Lonnie Smith tied for 2nd at 101).

The last two variables are interaction variables. GGAS is the gold glove variable multiplied by the all-star variable and WSAS is the world series variable times the all-star game variable. It looks strange that the coefficient values on GGSQ7 and WSIMPSQ50 are negative. But you might notice that they are positive on the interaction variables. I think this is like when a regression uses both X and X-squared because the phenomenon is non-linear (an inverted parabola, for example). The coefficient on X ends up being positive while the X-squared coefficient is negative. The reason I put in these interaction variables was to see if players who were strong in both got an extra boost, as if there was some synergy going on. It seems like they did get an extra boost. My results in terms of r-squared and the standard error are better than what I got without the interaction variables last summer (links below).

All of the variables were significant at the 10% level except for WSIMP, which came close with a p-value of .13. The other variables all had p-values of under .05 with 7 under .01. I also divided the following variables by 1000 since the regression at first gave them a very low coefficient (due to them being very large numbers): WSIMPSQ50, WSAS and GGAS. With WSIMP going as high as 50 (then squared to get 2500) and AS going as high as 10 (then squared to get 100), the interaction term could be 250,000. Since the dependent variable can only go from 0 to 100, the coefficient would be very low (even though the variables were significant). So I divided these three variables by 1000 (my stat package was showing coefficient values of .00000 before I did this).
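To make the variable construction concrete, here is a rough Python sketch of how a prediction could be computed from the equation above. It rests on my reading of the descriptions, so treat these as assumptions: each interaction term is taken to be the product of the two capped-and-squared variables, the three large variables are divided by 1000 before the coefficients are applied, and the result is a fraction (0.80 means 80%). The function and parameter names are invented for this example.

```python
def capped_square(value, cap):
    """Cap the raw value, then square it (e.g. ASSQ10 = min(AS, 10) squared)."""
    return min(value, cap) ** 2

def predicted_pct(mvp=0, hits3000=0, hr500=0, sb500=0, pa10000=0,
                  all_star_games=0, gold_gloves=0, ws_pa=0, ws_ops=0.0):
    """Predicted first-year vote share as a fraction, per the equation above."""
    assq10 = capped_square(all_star_games, 10)
    ggsq7 = capped_square(gold_gloves, 7)
    wsimpsq50 = capped_square(ws_pa * ws_ops, 50)  # WSIMP = WS PAs times OPS

    # Assumption: interactions are products of the capped, squared variables.
    ggas = ggsq7 * assq10 / 1000
    wsas = wsimpsq50 * assq10 / 1000

    return (0.04824 * mvp + 0.45177 * hits3000 + 0.16754 * hr500
            + 0.00216 * assq10 - 0.00122 * ggsq7 + 0.04901 * sb500
            - 0.0119 * (wsimpsq50 / 1000) + 0.09928 * pa10000
            + 0.00112 * wsas + 0.06242 * ggas - 0.01282)

# Hypothetical player: 1 MVP, 3000 hits, 10,000 PAs, 10+ all-star games.
example = predicted_pct(mvp=1, hits3000=1, pa10000=1, all_star_games=12)
```

With every input at zero the function returns just the intercept, which is why a player with no credentials at all comes out slightly negative.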

So, what percentages does this equation predict for the first time eligibles once I plug in their own values? The table below shows this.



Prediction1 is based on the above regression equation. I did another regression with the same variables but I took out Kirby Puckett and Mark McGwire. Puckett retired relatively early due to his eye problems and McGwire has the steroid scandal. Puckett got 82.1% of the vote in his first year of eligibility while the model predicts he would get 63.5%, for a positive differential of 18.6%. McGwire got 23.5% of the vote his first time through while the model predicted him to get 40.3%, for a negative differential of -16.8%. Puckett had one of the biggest positive differentials while McGwire had one of the biggest on the negative side. I don't think any of the first-timers for 2010 are like Puckett or McGwire, so it might be reasonable to take them out. The predictions based on the model without those two guys are in the last column of the table above (and the standard error for that regression was .068). Things don't change much. But Alomar does slip below getting in. But that is still a high percentage and if he does not make it in the first time he probably will eventually.

I know that some predictions are negative. That is a drawback of this approach. The intercept is not terribly negative (just -1.28%). So that is not a big problem. But GGSQ7 and WSIMPSQ50 do both have negative coefficients. So it is possible that a player might have gotten high scores there but if they could not get into any all-star games, those high scores would actually hurt them since the interaction variables would be zero and could not offset the negatives of the straight variables. But anyone with zero all-star games is probably not a Hall-of-Famer.
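Here is a small Python sketch of that offsetting effect for Gold Gloves alone. It uses the GGSQ7 and GGAS coefficients from the equation above and assumes, as my reading of the variable descriptions, that GGAS is GGSQ7 times ASSQ10 divided by 1000. The function name is hypothetical.

```python
GGSQ7_COEF = -0.00122   # coefficient on the capped, squared Gold Glove count
GGAS_COEF = 0.06242     # coefficient on the Gold Glove x all-star interaction

def gold_glove_effect(gold_gloves, all_star_games):
    """Net contribution of Gold Gloves to the predicted vote share (a fraction)."""
    ggsq7 = min(gold_gloves, 7) ** 2
    assq10 = min(all_star_games, 10) ** 2
    ggas = ggsq7 * assq10 / 1000  # assumed construction of the interaction term
    return GGSQ7_COEF * ggsq7 + GGAS_COEF * ggas

no_all_star = gold_glove_effect(7, 0)     # interaction is zero, so this is negative
many_all_star = gold_glove_effect(7, 10)  # interaction outweighs the negative term
```

Under these assumptions, seven Gold Gloves with no all-star games would cost a player about 6 points, while the same seven with ten all-star games would add about 25.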


What Determines Vote Percentage In The First Year Of Hall Of Fame Eligibility? (Part 2)

What Determines Vote Percentage In The First Year Of Hall Of Fame Eligibility?

Saturday, December 12, 2009

Peak Value And Hall Of Fame Worthiness

In my last post, I discussed some eligible players who have not made it in yet who I believe have good sabermetric credentials. I mostly discussed their career rankings by various measures. Here I look at how those players ranked in their best 3-straight seasons by BFW.

Retrosheet has a link with each year called ML Players By Positions. It lists BFW or batting plus fielding wins, which comes from Pete Palmer's linear weights. I took all of that data from Retrosheet then tried to find the best 3-year periods in BFW. This link has the top 500. The Cons3 is their BFW over the 3-year period. The year is the last year in the period. So for Bonds it was 2001-3.

The table below has the highest ranked 3-year period for all the players I looked at in my last post. They are in order starting with the best. So Ron Santo's 1965-67 BFW of 21.1 is tied for the 24th best all-time and is the best among the group I have been looking at. Now Santo's 1964-66 is 37th best, but that is not listed here since it is his 2nd best 3-year period. Players with a * are eligible for the first time in 2010.
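For anyone who wants to repeat this, the search for a best 3-year period is a rolling sum over consecutive seasons. Below is a minimal Python sketch of the idea. The season values are made up, and the function name is hypothetical; as in my tables, the period is labeled by its last year.

```python
def best_consecutive(seasons, window=3):
    """Return (last_year, total) for the best run of `window` consecutive years.

    `seasons` maps year -> BFW. Runs with a missing year are skipped.
    """
    best = None
    for year in sorted(seasons):
        years = range(year - window + 1, year + 1)
        if all(y in seasons for y in years):
            total = round(sum(seasons[y] for y in years), 1)
            if best is None or total > best[1]:
                best = (year, total)
    return best

# Hypothetical player, not real data.
bfw_by_year = {2001: 3.0, 2002: 5.5, 2003: 6.0, 2004: 4.5, 2005: 2.0}
peak = best_consecutive(bfw_by_year)
```

Because every starting year produces its own period, a great player's overlapping periods each get their own place in the rankings.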



I found about 20,000 3-year periods. So anything in the top 200 is in the top 1%. I really don't know how high a player needs to rank here to be considered impressive. There are, of course, overlapping periods for players. So the greatest players often appear several times early on in the rankings. That can push many players down quite a bit. But consider this. The players on the list below are the only players ever to have a better 3-year period than Ron Santo.

Babe Ruth
Barry Bonds
Cal Ripken
Honus Wagner
Joe Morgan
Mickey Mantle
Nap Lajoie
Rogers Hornsby
Ted Williams

So these are the only guys Santo takes a back seat to. Just 9 players.

Three players, Grich, Bell and Dawson, all had very good years in 1981, a strike-shortened season when only about 2/3 of the games were played. Here are their BFWs that year:

Andre Dawson 8.4 (T140)
Buddy Bell 5.2 (T69)
Dwight Evans 3.7 (1017)

If you increase each of those by 50%, and then re-calculate their 3-year totals, their ranks would change. Those are the numbers in parentheses. I don't know if that kind of adjustment needs to be done, but some guys might have just been having their best years ever in 1981. Maybe they get short changed (I think that Fred McGriff would have made 500 HRs and Harold Baines might have made 3000 hits without work stoppages and games lost).

One other thing about adjusting for lost time due to strikes. Mike Schmidt had a BFW of 7.2 in 1981. Increasing that to 10.8 would give him 24.2 from 1980-82. Then the only guys better than that are Ruth, Hornsby and Bonds.
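The Schmidt arithmetic can be checked with a few lines of Python. This is only a sketch of the 50% bump described above; the function name is made up, and the unadjusted 3-year total is backed out from the two figures given in the text.

```python
def scale_strike_season(bfw, factor=1.5):
    """Scale a strike-shortened season's BFW up by `factor` (50% by default)."""
    return round(bfw * factor, 1)

schmidt_1981 = scale_strike_season(7.2)  # 10.8

# The adjusted 1980-82 total is 24.2, so the unadjusted total is 24.2
# minus the extra 3.6 wins added by the adjustment.
unadjusted_total = round(24.2 - (schmidt_1981 - 7.2), 1)  # 20.6
```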

I also checked to see how these guys did in Win Shares (WS) during their given 3-year periods. The next table shows this. The numbers in parentheses are their rank in that given year. The players in red were in the top 10 all three years. That seems pretty impressive.



The guys that really stick out are Santo, Dick Allen and Tim Raines. They each had 30+ WS in their years, a level Bill James says is MVP caliber. Raines was clearly the best in the NL for three years running (and I think he has more WS than anyone from the AL, too). Two guys from my last post that I was high on, Barry Larkin and Keith Hernandez just barely miss being in red. They each had two top 10 finishes and an 11th.

But the three guys who were in the top 200 BFW from the first table who also were in the top 10 all three years in WS are Santo, Bobby Grich and Darrell Evans. Both measures, WS and BFW, confirm their elite status for a 3-year period. And this is a very tough test to pass because a player might have had a very high 3-year period in WS that does not match up with their best 3-year BFW. But I required that.

Some additional observations: Roberto Alomar also had a 1st and a 5th in WS in 1992-3, plus a 3rd in 1996. Larkin had a 2nd in 1995. Dick Allen followed his 3-year run with a 7th and then a 4th. Then another number 1 in 1972 as the AL MVP. The players from the early 1900s like Jimmy Sheckard & Sherry Magee also had more pitchers in the top dozen or so than later players. Their ranks could be higher.

I also noticed some guys like Dwight Evans, Reggie Smith and Bill Dahlen who had many seasons well over 20 WS but few, if any, 30+ seasons. Bill James said that 20 is an all-star season. So these guys consistently exceeded that level but maybe really never stood out for very long.

Here are the other top 5 finishes in WS for the players discussed here (outside of their 3-year BFW period)

Magee-1, 3T, 5
Sheckard-5
Hack-3
Santo-3
Bonds-5
Wynn-4
Grich-2, 4T
Hernandez-4T
Trammell-2T, 3T
DW Evans-3T
Raines-5
W. Clark-3
McGwire-3, 4, 5T

Now ALL of the top 5 finishes in BFW for this group of players.

Dahlen-1, 2, 2, 4
Leach-3, 3
Magee-3, 3, 4
Sheckard-1, 1
Hack-2, 4, 5
Johnson-3
Cash-1
Santo-1, 1, 1, 2, 4, 4
Allen-1, 2, 2, 4
DA Evans-2, 4, 4
Bonds-5
Wynn-4
Grich-1, 1, 2, 2, 3, 4, 4T
Randolph-4
Dawson-2
Bell-2, 3
Hernandez-2, 3
Trammell-2, 4
Raines-2, 3, 3, 4
Larkin-1, 4, 4, 4, 4
W. Clark-2
Martinez-1, 2, 3, 4, 4
McGwire-1, 4, 4
Alomar-1, 1, 3

In some cases my accounting may not be clear. Santo had 3 top 5 finishes in WS and 6 in BFW. Dick Allen had 5 & 4. Grich has 3 & 7. Raines had 4 & 4. Alomar had 5 & 3. McGwire had 4 & 3. Magee had 4 & 3. I think these are all the guys that had at least 3 seasons in the top 5 in both WS and BFW. Seems like they were often among the best players in their league. Add that to their career values established in the last post, and we have good cases for the Hall of Fame.

Wednesday, December 9, 2009

Some Players With A Good Sabermetric Case For The Hall Of Fame

To come up with this list, I looked for players who were:

1. in the top 150 in wins above replacement (WARP) from Sean Smith's site
2. in the top 120 in MVP win shares as listed at baseball reference
3. in the top 200 in career Win Shares from Bill James' book
4. in the top 150 in batting plus fielding wins or BFW (from Pete Palmer and the Baseball Encyclopedia)

For fans unfamiliar with some of these terms, I attempt to explain them at the end of this post. I went with the top 120 for MVP shares because there were not many MVP awards given out before 1931. I went with the top 200 in Win Shares (WS) because about 40 of them are pitchers (through 2001). Some players had some WS after that and their totals were adjusted slightly but I still used the ranks through 2001. I also have only looked at players who are eligible. Pete Rose and Joe Jackson are not included.

The table below shows all the players I could find who fit at least 3 of the 4 criteria (plus the last two guys Tommy Leach & Jimmy Sheckard because they just missed making criteria #1 and did extremely well in WS). You can click on the table to see a larger version. Alomar has a career WARP of 63.6 at Sean Smith's site, ranking him 85th all-time. According to Baseball Reference, he had 1.91 MVP shares, ranking him 98th. He had 375 WS ranking him 53rd. He had 35.8 BFW, ranking him 79th.



The first three players met all 4 criteria. So Alomar will be an interesting case. McGwire has the PED scandal. But Hernandez is surprising. He ranks so highly by different measures including MVP voting. So the voters saw something in him all those years while he was compiling a stellar sabermetric resume. Anyone have an explanation for why he is not in? I think all of the guys who meet 3 or 4 of the criteria deserve serious consideration. The players with a * will be eligible for the first time in this current vote. MVP stands for MVP shares. This list is not meant to be complete-only to show players who pass some major hurdles.

Dick Allen just misses making the top 120 in MVP shares. If he had made that, he would meet all 4 criteria. Andre Dawson looks very good except for his low BFW rating. The only place where Bobby Grich falls down is in MVP shares. Sherry Magee meets 3 of the criteria. The only one missing is MVP shares and they had very little of that in his day. We can say the same thing for Bill Dahlen, whose rankings are very high. Perhaps the most astounding MVP share is for Willie Randolph, .004, for a rank of 1157th yet his rankings in the sabermetric stats are great.

Now a lot of career value is all well and good. But what about peak performance? (I tried to bring that in with the MVP shares). The table below shows how many seasons these guys had with 20+ WS, 25+ WS and 30+ WS. Bill James said that 20 WS is an all-star type season while 30 is an MVP type season. So I assume that 25 is an all-star/MVP type season.
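Counting those seasons is simple enough to sketch in Python. The career below is invented just to show the mechanics, and the function name is hypothetical.

```python
def seasons_at_or_above(win_shares, thresholds=(20, 25, 30)):
    """Count seasons at or above each Win Shares threshold."""
    return {t: sum(ws >= t for ws in win_shares) for t in thresholds}

# Hypothetical career, one Win Shares total per season (not real data).
career = [12, 21, 26, 31, 33, 24, 19, 30]
counts = seasons_at_or_above(career)
```

Note that the counts overlap: a 30+ season is also counted as a 25+ and a 20+ season.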



I guess everyone will have to judge for themselves if any of these guys had enough all-star or MVP type seasons. But 10 or more all-star type seasons sounds pretty impressive. That helps Hernandez. 10 all-star seasons with his other high ranks should put him in. The following players all had 4 or 5 seasons with 30+ WS

Roberto Alomar*
Dick Allen
Bobby Bonds
Tim Raines
Ron Santo
Jimmy Wynn

Tim Raines should be in. It is a no brainer.

In one way this might not be fair. Players before 1961 or 1962 only played 154 games, so that can knock your WS down 5% (then there were 1900-1903, and 1917-1918 when they played less than 154 games). Then there are strike years. Andre Dawson had 25 WS in 1981. If you play the full season, maybe he gets 37. But I made no adjustment on this account.
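For reference, here is what a simple proration would look like in Python. It is only a sketch of the adjustment I did not make. The function name is made up, and the 108 games for 1981 is an assumption standing in for "about 2/3 of a 162-game season".

```python
def prorate(win_shares, games_played, full_schedule=162):
    """Scale a season's Win Shares up to a full schedule."""
    return win_shares * full_schedule / games_played

# Share of a 162-game season lost by playing only 154 games, in percent.
short_schedule_loss = round((1 - 154 / 162) * 100)

# Dawson's 25 WS in 1981, assuming 108 games (two thirds of 162).
dawson_full_season = prorate(25, 108)
```

This treats a player's rate of production as constant over the missing games, which is a strong assumption for any one season.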

I also used BFW to find peak seasons. The table below shows how many seasons each player had a BFW of 2+, 3+, 4+, and 5+. I think a 2+ season is starting to put you into all-star territory while 5+ is MVP territory. This list is in alphabetical order. To see how the table works, look at Dick Allen. He had 10 seasons with 2+ BFW. That ties him for 41st all-time in such seasons (this was through 2005). He had 3 seasons with a BFW of 5+, good for a tie for 33rd.



Many of these players rank very highly here. Look at Bobby Grich. He is tied for 16th in BFW5 seasons. Think about that. Only 15 players ever had more of these kinds of seasons than he did. I think many of these players do well here, showing that they achieved a very high peak value, not just a high career value. Blank spaces mean they ranked too low to be worth mentioning. The next table might explain this. I think anyone who had 2 or more BFW5 seasons who also has high career total ranks should be in because only 91 players make this cut.



In many cases, as you might guess, there are many players tied for a certain position. The table above shows how many times each case occurred. For example, there was one player who had 19 seasons with a 2+ in BFW (I think it was Hank Aaron). There is a total of 194 players who had 6+ seasons in BFW2. For BFW5, only 91 players had 2 or more such seasons. If a player in the previous table has a blank, it means that he did not make the minimum number of occurrences in a given case. Norm Cash is blank for BFW3, meaning he had fewer than 4 such seasons. Dick Allen is tied with 17 other guys for 41st place in BFW2.
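The tied ranks work the usual way: a player's rank is one more than the number of players with strictly more such seasons. Here is a small Python sketch with made-up players and counts (the function name is hypothetical):

```python
def tied_rank(counts, player):
    """Return (rank, number of other players tied at that rank)."""
    mine = counts[player]
    ahead = sum(c > mine for c in counts.values())
    tied_with = sum(c == mine for c in counts.values()) - 1
    return ahead + 1, tied_with

# Hypothetical season counts, not real data.
bfw5_seasons = {"Player A": 5, "Player B": 3, "Player C": 3, "Player D": 2}
rank_b = tied_rank(bfw5_seasons, "Player B")
rank_d = tied_rank(bfw5_seasons, "Player D")
```

So a player tied for 16th has exactly 15 players ahead of him, no matter how many others share that spot.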

WARP-the idea here is how many more games your team wins with player A as opposed to the next best thing, an easily available or nearly free replacement player. If WARP = 4, then your team won 4 more games with player A than with a replacement. Sean Smith includes hitting, fielding and baserunning in this rating.

Win Shares also includes all phases of the game in its rating. Bill James takes team wins and multiplies it by 3. Then he divides up those WS among the players and pitchers.

BFW rates players compared to the average. So it is like WARP except that 0 is average rather than replacement level. A replacement player usually comes in around -2 on this scale. BFW also takes all phases of the game into account.

MVP shares-This is a Bill James idea. Players get points in the voting: 14 for a 1st place vote, 9 for a 2nd, 8 for a 3rd. So if there is a maximum of 392 points, and a player got 196, he gets a .5 share. If he got 392 points, he gets a 1. Then you add up all those shares from each season to get a career total.
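In code, the MVP share calculation described above looks like this. It is a minimal Python sketch; the function names and the two-season career are made up for illustration.

```python
def mvp_share(points, max_points):
    """One season's share: points received divided by the maximum possible."""
    return points / max_points

def career_mvp_shares(seasons):
    """Sum the shares over (points, max_points) pairs, one pair per season."""
    return sum(mvp_share(points, max_points) for points, max_points in seasons)

half_share = mvp_share(196, 392)
unanimous = mvp_share(392, 392)

# Hypothetical career with two seasons of MVP votes.
career = career_mvp_shares([(196, 392), (98, 392)])
```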

Wednesday, December 2, 2009

Who Was More "Magical" Than Greg Maddux? (Or Pitcher's HR/BB/SO Rating)

Greg Maddux was great at preventing HRs and great at not walking batters. It must be tough for a pitcher to achieve that combination because you are putting the ball in the strike zone a lot where the batters can hit it. To rate pitchers on this combination, I used data from the Lee Sinins Complete Baseball Encyclopedia which tells us how much better or worse than the league average a pitcher was in various stats.

Maddux, for example, gave up 49% fewer HRs than the league average (what the 1.49 means in the table below) while walking 86% fewer batters. My HRBB rating multiplies these two numbers together. The table below shows the top 25 among pitchers with 2000+ IP from 1946-2009. Maddux has a pretty clear edge over the competition.



But then I realized that Maddux was not a great strikeout pitcher. He was not preventing batters from hitting HRs by overpowering them. I decided to then multiply each pitcher's HR rate, BB rate and SO rate. But first, I inverted the rate of strikeouts. Maddux struck out 94% as many batters as the average pitcher. Being below average in strikeouts increases the difficulty in achieving a high HRBB rate because there is a positive correlation between not allowing HRs and striking batters out (about .15). So for Maddux, 1/.94 = 1.06, indicating that he was 6% worse at striking out batters than average. So his HRBBSO rating would be 1.49*1.86*1.06 = 2.95.
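Here is the Maddux calculation as a short Python sketch. The function names are made up. The inputs are the rates quoted above, with the strikeout figure entered as the raw 94% so that the inversion happens inside the function.

```python
def hrbb(hr_rate, bb_rate):
    """HRBB rating: the HR rate times the BB rate."""
    return hr_rate * bb_rate

def hrbbso(hr_rate, bb_rate, so_relative):
    """HRBBSO rating: HRBB times the inverted strikeout rate.

    `so_relative` is strikeouts relative to the average pitcher (0.94 = 94%).
    """
    return hrbb(hr_rate, bb_rate) * (1 / so_relative)

maddux_hrbb = round(hrbb(1.49, 1.86), 2)
maddux_hrbbso = round(hrbbso(1.49, 1.86, 0.94), 2)  # 2.95, as in the text
```

Inverting the strikeout rate means a low-strikeout pitcher gets a higher rating, which is the point: it rewards doing the first two things without overpowering hitters.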



But notice in the table that he finishes 2nd behind Lew Burdette, who somehow managed to give up 7% fewer HRs than average and walk 76% fewer batters while striking out 61% fewer batters. Pitching the bulk of his career for the Braves in County Stadium may have helped. The simple average of the HR park factors from 1952-1962 (which includes the last year the Braves played in Boston) has Burdette with a 75. So he pitched in parks that only allowed about 75% as many HRs as average. For Maddux, from 1987-2003, his parks allowed about 109% of the league average (HR park factors from various Bill James books).

I tried to adjust for this (for these two guys). Assuming that a pitcher pitches half his innings at home, and that he allowed 7% fewer HRs than average (for a rate of 1.07) and that his park has a 75 rating, I thought it best to multiply the 1.07 by .875 (which is half way between 1 and .75). That left Burdette with a HR rate of .936 (which now means he allowed 6.4% more HRs than average). For Maddux, I multiplied his 1.49 by 1.045 (half way between 1 and 1.09). That gives him an adjusted HR rate of 1.56. Then recalculating the HRBBSO rate, Burdette ends up with 2.66 while Maddux ends up with 3.09.
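The park adjustment can be written as a one-line formula. Here it is as a Python sketch, with a made-up function name and the half-home, half-road assumption stated above built in.

```python
def park_adjusted_hr_rate(hr_rate, park_factor):
    """Multiply the HR rate by the midpoint between a neutral park (1.0)
    and the home park (park_factor / 100), assuming half the innings at home."""
    half_home = (1 + park_factor / 100) / 2
    return hr_rate * half_home

burdette = park_adjusted_hr_rate(1.07, 75)  # 1.07 * .875, about .936
maddux = park_adjusted_hr_rate(1.49, 109)   # 1.49 * 1.045, about 1.56
```

A park factor of 100 leaves the rate unchanged, so only pitchers in unusual parks move much.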

Thursday, November 26, 2009

Baseball's "300" Hitters

What players have had both 300+ times reaching base (RB) and 300+ total bases (TB) in the same season? Not many. RB includes hits, walks and HBP. To see a list of all such occurrences, go to 300RBTB. It is in chronological order.

When you get there, the list on the left has all the players who did it. The list on the right shows some near misses, guys who had 280+ in each stat but did not make it. The tables also show each player's offensive winning percentage and RCAA or runs created above average, which is park adjusted since it is from the Lee Sinins Complete Baseball Encyclopedia. The most recent occurrence was Pujols in 2009, with 310 RB & 374 TB. The table below shows the leaders in such seasons.
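The two lists come from a simple test on each player-season. Here is a rough Python sketch of it; the function names and labels are my own for this example, and the only real season used is the Pujols one quoted above.

```python
def times_reached_base(hits, walks, hbp):
    """RB as defined in this post: hits plus walks plus hit-by-pitch."""
    return hits + walks + hbp

def classify(rb, tb):
    """Sort a season into the 300/300 list, the 280+ near-miss list, or neither."""
    if rb >= 300 and tb >= 300:
        return "300 club"
    if rb >= 280 and tb >= 280:
        return "near miss"
    return "neither"

pujols_2009 = classify(310, 374)
hypothetical_miss = classify(305, 290)  # made-up season: enough RB, not enough TB
```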



The next table shows the very near misses, guys who had 297+ in both stats but not 300+ in both. Frank Thomas has one other very near miss. In 1995, when the season was only 144 or 145 games, he had 294 RB & 299 TB.



Now the breakdown by decade:

1890s 2
1910s 2
1920s 19
1930s 20
1940s 9
1950s 6
1960s 3
1970s 2
1980s 3
1990s 21
2000s 24

Sunday, November 22, 2009

Did The Yankees Buy A World Championship In 2009?

That was the subject of a recent Wall Street Journal article by economist Andrew Zimbalist titled The Yankees Didn't Buy the World Series. On the surface, it would seem that they did. They usually have the highest payroll and they signed three big free agents in the off season, 1B man Mark Teixeira and pitchers C. C. Sabathia & A. J. Burnett. Teixeira led the American League in home runs and runs batted in while the two pitchers both finished in the top 20 in earned run average and the top 11 in innings pitched.

But Zimbalist said:

"It's a little surprising, but the statistical relationship between a team's winning percentage and its payroll is not very high. When I plot payroll and win percentage on the same graph, the two variables don't always move together. In other words, knowing a team's payroll does not enable one to know a team's win percentage.

More precisely, depending on the year, I find somewhere between 15% and 30% of the variance in team win percentage can be explained by the variance in team payroll. That means between 70% and 85% of a team's on-field success is explained by factors other than payroll. Those factors can include front office smarts, good team chemistry, player health, effective drafting and player development, intelligent trades, a manager's in-game decision-making, luck, and more."

Some readers, however, disagreed, making some good points in the letters to the editor a few days later (see In Baseball’s World Series, Money Loads the Bases). The best point may have been made by Ira H. Malis who mentioned that the top 4 teams in salaries in the American League make up, on average, 60% of the teams that make the playoffs.

Economist T. Norman Van Cott makes a good point in support of Zimbalist, that long before the period of free agency, when players can sell their services to the highest bidder (with certain limits), the Yankees dominated baseball. But we need to recall that before 1965, a player coming out of high school or college could sign a contract with any team (but then the reserve clause kept them tied to that team forever). The Yankees had a money advantage over the other teams in getting good players in the first place. They could offer bigger bonuses and the promise of often getting a World Series check and making business connections in New York.

Zimbalist also only analyzed the salary and win relationship one year at a time. I did something different last year, using team averages over many years. I found a stronger relationship between salaries and wins than Zimbalist did: almost 50% of the variance in team win percentage can be explained by the variance in team payroll. Here is that post (Another look at salaries and wins).

A lot of people have looked at this. But I started thinking about it again after I came across some data at JC Bradbury's site. You can view that data here. The data shows how many games, on average, that teams won each year from 1986-2005. It also shows how much above or below the league average in total salary each team paid in percentage terms. Again, it shows yearly averages. Suppose a team was 10% above average one year and 30% above average another year, they would get 20 (if it were just over two years).

What I did was to run a regression with average wins per year as the dependent variable and the average salary (SAL, the % above or below the league average) as the independent variable.

Here is the regression equation

Wins = 0.157*SAL + 80.22

The r-squared was .489 and the standard error was 3.89 wins. The T-value for SAL was 5.17. The .157 means that if you spent 10% more on salaries than the average team, you win 1.57 more games than the average team. A zero for SAL would mean that a team spent the average amount on salaries. A negative number means the team spent below the average salary level. The table below summarizes each team.



Tampa Bay, for example, on average, had a payroll that was 38.87% below the league average. They were predicted to win 74.12 games a year but actually won only 64.33. If a team were to spend 100% more than average, it should win about 96 games a year. The Yankees had the highest payroll above average. They spent about 70% more than the average team. They were predicted to win 91.26 games a year but actually only won 90.24.
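Here is a minimal sketch of how the regression equation turns a payroll figure into predicted wins. The slope and intercept are the ones reported above; the function names are just labels I picked for the example.

```python
SLOPE = 0.157      # extra wins per percentage point of payroll above average
INTERCEPT = 80.22  # predicted wins for a team with an exactly average payroll


def predicted_wins(sal_pct):
    """sal_pct is the team's payroll as a % above (+) or below (-) the league average."""
    return SLOPE * sal_pct + INTERCEPT


def residual(actual_wins, sal_pct):
    """Actual minus predicted wins; negative means the team underperformed its payroll."""
    return actual_wins - predicted_wins(sal_pct)


print("Tampa Bay predicted:", round(predicted_wins(-38.87), 2))
print("Tampa Bay residual:", round(residual(64.33, -38.87), 2))
print("Payroll 100% above average:", round(predicted_wins(100), 2))
```

The residual is the number behind the "predicted to within 3 wins" comparison for each team.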

I think the results are fairly strong. 16 of the 30 teams were predicted to within 3 or fewer wins. Only 3 were off by 6 or more wins. I think what I did differently than JC Bradbury was to use the average annual values for each team, instead of each team's data for each year. By using the averages, I think the randomness from year-to-year is eliminated. A team can sign a big free agent and maybe one year he does not do well. Or you get lucky and some non-arbitration eligible young players do very well. So by averaging, some of the good and bad luck gets flushed out.

The graph below also summarizes the results. You can see that the relationship is strong.

Tuesday, November 17, 2009

Age And Performance Of Outfielders And First Basemen Who Had Long Careers

I found all the players who were primarily 1Bmen and/or OFers who had 15+ seasons with 400+ PAs and found their average RCAA at every age from 21-39. RCAA is runs created above average. It comes from the Lee Sinins Complete Baseball Encyclopedia. Here is how he defines it: “It’s the difference between a player’s RC total and the total for an average player who used the same amount of his team’s outs. A negative RCAA indicates a below average player in this category.” It is also park adjusted.

The graph below shows the averages at each age. It surprised me to see that there is no peak but a plateau from 25-29. I sure don't know why it would be like that for this group.



The table below gives the average for each age as well as the number of players at each age. Seems like a pretty stable number of players from 24-36.

Sunday, November 15, 2009

Aging Patterns And Full-Time Players

In 2006 I wrote an article for Beyond the Boxscore called Player Aging Patterns Over Time. One thing I showed there was what percentage of the full-time players in any decade were a given age (I used 400+ PAs for full-time). Below is a graph of the distribution for 1991-2000. The trend line is the two year moving average. Usually 27 is the highest, with around 10% of the full-time players being that age.



Now for the 1961-1970 period. No reason why I picked these decades. Just wanted to show a sample.



The next graph has all the decades starting with 1901-10.



The table below shows the average of each decade's percentage for each age (it was just a simple average, so for age 27, for example, I just added its percentage from each decade and divided by 11, the number of decades I used although 2001-05 was only a half decade).

Tuesday, November 10, 2009

Should Andy Pettitte Make The Hall Of Fame?

This got discussed recently at Baseball Think Factory after Sean Forman wrote Pettitte Falls Short for the Hall of Fame for the NY Times. So here is my take on it.

I first looked at where he ranked all time in RSAA. That stat is from the Lee Sinins Complete Baseball Encyclopedia. It is "RSAA--Runs saved against average. It's the amount of runs that a pitcher saved vs. what an average pitcher would have allowed," including park adjustments. Pettitte now has 204 RSAA. That ranks him 77th all-time. Seems like too low of a rank to make the Hall. But he is 18th among lefties. Maybe left-handed pitchers have a tougher time than righties, so maybe the bar should be a little lower for them. It is not anyone's fault if they are left-handed. They could not have simply worked hard to become a righty. Of course, it also is the case that lefties simply have less value since there are many more right-handed batters. And maybe the Hall has to recognize how much value a pitcher had. But I will continue to show where Pettitte ranks among lefties.

Next I found the RSAA per IP for all pitchers with 2000+ career IP. Pettitte had .0697 (or .63 runs per 9 IP). That was good enough for 58th. But among lefties, he was 13th. Then I found each pitcher's expected winning percentage using the Bill James pythagorean formula and assumed a league average of 4.5 runs per game. Each pitcher was given a number of games equal to his IP/9. That was multiplied by the expected winning percentage to get projected wins. I then subtracted from that the number of wins a replacement pitcher would have won. For that, I assumed a .400 winning pct. This process predicted that Pettitte would win 186.8 games while the replacement would win 130.06. So that gives him 56.74 WARP or wins above replacement pitcher. He ranks 87th in this WARP measure but is 20th among lefties.
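The WARP steps above can be sketched as follows. Two things here are assumptions on my part: the pythagorean exponent of 2 (the classic Bill James version) and the innings total, which I backed out of the 130.06 replacement wins rather than looking up.

```python
LEAGUE_RUNS_PER_GAME = 4.5   # league average assumed in the post
REPLACEMENT_WIN_PCT = 0.400  # replacement-level winning pct assumed in the post


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2):
    """Expected winning pct; exponent=2 is an assumption, not stated in the post."""
    rs = runs_scored ** exponent
    return rs / (rs + runs_allowed ** exponent)


def warp(rsaa_per_ip, innings):
    """Return (projected wins, replacement wins, wins above replacement)."""
    games = innings / 9
    runs_allowed = LEAGUE_RUNS_PER_GAME - rsaa_per_ip * 9
    projected = games * pythagorean_win_pct(LEAGUE_RUNS_PER_GAME, runs_allowed)
    replacement = games * REPLACEMENT_WIN_PCT
    return projected, replacement, projected - replacement


# Innings are an assumption, backed out of the replacement win total quoted above.
innings = 130.06 / REPLACEMENT_WIN_PCT * 9
projected, replacement, wins_above = warp(0.0697, innings)
print(round(projected, 1), round(replacement, 2), round(wins_above, 2))
```

With those assumptions the projected wins come out very close to the 186.8 quoted above, which suggests the exponent of 2 is at least consistent with the numbers in the post.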

But runs saved is partly determined by the fielders. So I created a simple fielding independent ERA. I looked at all pitchers with 2000+ career IP and used the following stats, all relative to the league average: ERA, HR, SO and BB. 100 is average. A number over 100 means better than average. I ran a regression with ERA as the dependent variable and the others as the independent variables. Here is the equation

ERA = 37.96 + .187*BB + .262*SO + .202*HR

Here are Pettitte's numbers:

BB 122
SO 103
HR 144

So, for example, he gave up 44% fewer HRs than the average pitcher (this comes from Lee Sinins Complete Baseball Encyclopedia). Plugging these numbers into the equation, Pettitte gets 116.85, meaning his projected ERA based on fielding independent stats is 16.85% better than the league average. But his actual ERA is 17% better, so he just happens to project well. Anyway, he ranks 69th overall but is 20th among lefties.
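Plugging Pettitte's numbers into the equation is simple enough to check in code. The coefficients and inputs are the ones given above; only the function name is mine.

```python
def projected_relative_era(bb, so, hr):
    """Fielding-independent ERA index from the regression above (100 = league average)."""
    return 37.96 + 0.187 * bb + 0.262 * so + 0.202 * hr


pettitte = projected_relative_era(bb=122, so=103, hr=144)
print(round(pettitte, 2))

# A pitcher who is exactly average in all three stats projects close to 100.
print(round(projected_relative_era(100, 100, 100), 2))
```

The second line is a quick sanity check: an all-average pitcher should land near 100, and the equation puts him within about three points of that.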

I also computed a WARP using this predicted ERA in the manner described above. Pettitte ranks 89th while being 21st among lefties with 57.62.

The biggest thing in his favor is ranking 13th in RSAA/IP for lefties. But some of his other ranks are pretty low. I think the Hall of Fame has about 219 players, of whom 71 are pitchers or 32.4%. If a team has 25 players and pitchers are 40% of the team, then the Hall should have that same share of pitchers (about 87). But if all the position players are deserving (not likely, but I will play along anyway), then about 38 more pitchers need to be in (109/257 is about .4). Only 15 of the pitchers were lefties and Pettitte does rank fairly high among lefties. And if there should be 109 pitchers, he seems to be in the top 109 all-time. Even if there should be 87, even his worst rank that I found is close to that. Of course, all of this assumes that there are no undeserving players or pitchers in right now.

A couple of other things. I thought maybe Pettitte got an advantage pitching at Yankee stadium above the normal park adjustments since he is a lefty and might face a lot of righties there where they have a harder time hitting HRs. But from Retrosheet, he gave up a HR% (based on batters faced) at home of 1.94%. On the road it was 2.01%. That does not seem too out of the ordinary. But his HR% (based on ABs) vs. righties has been 2.18% while vs. lefties it has been 2.18% as well. It seems like it should be higher against righties because over the last three years in MLB left-handed pitchers have allowed a HR% of about .5 percentage points higher against right-handed batters. That points to him getting an advantage from Yankee Stadium, but then his home HR% does not seem to give him much of an edge. So I don't know what to conclude from that.

I also once created what I called the Pitcher’s Homerun/Walk Rating. It combined a pitcher's ability to prevent both HRs and BBs into one index rating. Pettitte was 23rd among pitchers with 2000+ IP from 1920-2006. Now it looks like he has slipped to 32nd. But that is out of 277 pitchers. Pretty darn good.

Wednesday, November 4, 2009

Starting Pitchers As Relievers Over Time

Many fans know that starters were often also used as relievers in the past. Lefty Grove, for example, only started 30 games the year he won 31 games (in 1931). He came in 11 times as a reliever. In 1930, he won 28 games while starting 32 and coming in to relieve 18 times.

On May 23, 1911, Christy Mathewson pitched a complete game victory giving up only 1 earned run. Then on May 26, he pitched the last 1 and 2/3 innings to get a win. When he came in in the 8th, the Phillies had two men on and had just scored 2 runs to tie the game. Then he got a double play. The Giants scored 2 in the bottom of the 8th and Mathewson pitched the 9th for the win, giving up no hits. The next day he pitched a complete game shutout.

But how often did starters pitch in relief in the past and how has this changed over time? I looked at the percentage of games pitched in relief by starters each decade starting with 1900-09. In each decade I found this % for the season leaders in games started. The number of pitchers in the leaders were 3 for each team in each year. I figured that each team would have at least 3 guys who started fairly often. But I also looked at the % for all pitchers who started at least 31 games (and at least 33 beginning in 1960). So the table below shows these percentages:



The first column shows the % of games pitched in relief by the leaders in starts. That would be the top 480 in games started in a season for the 1920s, for example. So in that group, 19.5% of their games were in relief. The next column shows the % of games pitched in relief by pitchers who started at least 31 games (up to the 1950s) or 33 games since the 1960s. The trends are pretty clear.
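As a small sketch of the percentage being computed, here is the relief share for a group of pitcher-seasons. I am assuming the group figure pools all games before dividing (rather than averaging each pitcher's own percentage); the two Lefty Grove seasons mentioned earlier serve as the sample.

```python
def relief_share(seasons):
    """seasons: list of (games_started, relief_appearances) pairs.

    Pools all games in the group before dividing; that pooling is an assumption.
    """
    starts = sum(gs for gs, _ in seasons)
    relief = sum(rel for _, rel in seasons)
    return relief / (starts + relief)


grove = [(30, 11), (32, 18)]  # 1931 and 1930, from the post
print(f"1931 only: {relief_share(grove[:1]):.1%}")
print(f"1930-31 pooled: {relief_share(grove):.1%}")
```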

The graph below shows the percentages over time.

Wednesday, October 28, 2009

What Does The Past 3 Years Tell Us About The World Series? (updated)

I tried to use a tool of Tangotiger's called Marcel. There is a good chance I did not apply it correctly (I think I did, that is why it is updated) but I attempted to measure the skill level of the players using the last 3 years of performance with more recent years being weighted more and using a regression to the mean. Maybe in a day or two I will go through the numbers.

First I tried to generate an OPS relative to the league average for the 8 position players on each team. Then I took the simple average of that. For the Yankees, it was 10.13% above the league average. For the Phillies, it was 7.54% better. If we assume a league average of .750, then the Yankees would be at .826 while the Phillies would be at .807. I did not make any park adjustments and this might hurt Ibanez for his two years in Seattle.

For pitchers, I did the same thing using the FIP ERA from Fangraphs. Here are their ratios to the league average for the top 3 starters

Sabathia 0.751
Lee 0.829
Burnett 0.945
Martinez 1.052
Pettitte 0.907
Hamels 0.881


The Yankees have a big edge in the first and second matchups while the Phillies have the edge in the 3rd one. But Fangraphs has the same league average each year for both leagues. This may not be right, and if not, it would probably mean that the Yankees have an edge in all 3 slots. Then, as I mentioned yesterday, the Yankees were much better against lefties this year than the Phillies.

As I understand the Marcel method, the last 3 years have a weight of 5, 4, and 3. So the total is 12. Then the weight is

5/12 = .417
4/12 = .333
3/12 = .25

Now I think that assumes that the player has an equal number of PAs in each year. But I think those weights should be changed if PAs are not equal in each year. Let's take Jeter. Here are his PAs from 2007-9

695
648
706

The total is 2049. Here is each year's share of the total:

.339
.316
.345

Now how does that change the weight of 5, 4, 3? In 2009, he had a larger than expected pct (which is .333). The pct was .345/.333 = 1.036 times the expected value. So instead of using .417 for 2009, I used .417*1.036 = .432. Something similar was done for the other two years (giving .316 for 2008 and .254 for 2007) and for all the other players. Here are Jeter's OPS divided by the league average from 2007-9

1.11
1.02
1.14

So what is his ratio for the 3 years? Using the adjusted weights, 1.14*.432 + 1.02*.316 + 1.11*.254 = 1.097. Now the regression to the mean. First multiply the PAs from the 3 recent years (going backwards) by 5, 4 and 3. That gives 8207. But the regression to the mean involves two seasons worth of league average hitting. Each year is 600 PAs, so 1200 in total. So we have a denominator of 9407 (8207 + 1200). So Jeter's OPS relative to the league average is

(8207/9407)*1.097 + (1200/9407)*1 = 1.0846

Jeter's skill level means his OPS is 8.46% better than average. So I did this for each player on each team. I added their relative to the league average and then divided by 8. I did something similar for the starting pitchers.
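The steps above, as I applied them, can be sketched like this. It is only my reading of the Marcel idea, not Tangotiger's actual implementation, and the function names are made up for the example. The results can differ from the figures in the text by about .001 because the text rounds intermediate values.

```python
YEAR_WEIGHTS = (5, 4, 3)   # most recent season first
REGRESSION_PA = 2 * 600    # two seasons of league-average hitting


def blended_ratio(pas, ratios):
    """Weight each year's OPS ratio by 5/4/3, scaled by its share of PAs vs. an equal split."""
    total_pa = sum(pas)
    expected_share = 1 / len(pas)
    weight_sum = sum(YEAR_WEIGHTS)
    blended = 0.0
    for pa, ratio, w in zip(pas, ratios, YEAR_WEIGHTS):
        adjusted_weight = (w / weight_sum) * ((pa / total_pa) / expected_share)
        blended += adjusted_weight * ratio
    return blended


def regress_to_mean(pas, ratio, league_ratio=1.0):
    """Blend the player's ratio with 1200 PAs of league-average hitting."""
    weighted_pa = sum(w * pa for w, pa in zip(YEAR_WEIGHTS, pas))
    total = weighted_pa + REGRESSION_PA
    return (weighted_pa / total) * ratio + (REGRESSION_PA / total) * league_ratio


jeter_pa = (706, 648, 695)       # 2009, 2008, 2007
jeter_ops = (1.14, 1.02, 1.11)   # OPS divided by league average, same order

raw = blended_ratio(jeter_pa, jeter_ops)
skill = regress_to_mean(jeter_pa, raw)
print(round(raw, 3), round(skill, 4))
```

The team figure is then just the simple average of `skill` over the 8 position players.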

Monday, October 26, 2009

Yankees vs. Phillies: Can OPS Tell Us Anything?

The table below shows the team OPS for both the Yankees and Phillies as well as the OPS their pitchers allowed.



So the Yankees have an overall advantage of .081 in OPS. I once found that winning pct = 1.26*OPSDIFF + .5. A team with that big of an OPS differential wins about 60% of their games. So that might be the Yankees probability of winning (although it may not be that simple-I actually came up with about a 72% chance for them to win if they have a 60% chance of winning each game). Of course, the Phillies did not have Lee or Martinez in their rotation all year. So the differences may not be that great in the series and we need to take that into account.
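Here is a rough sketch of going from the OPS differential to a per-game winning percentage and then to a best-of-seven probability. The series part assumes every game is independent with the same win probability and ignores home field and pitching matchups, so it lands near, not exactly on, the series figure mentioned above.

```python
from math import comb


def win_pct_from_ops_diff(ops_diff):
    """The relationship quoted above: winning pct = 1.26*OPSDIFF + .5."""
    return 1.26 * ops_diff + 0.5


def series_win_prob(p, wins_needed=4):
    """Chance of winning a series, assuming independent games with constant win prob p."""
    q = 1 - p
    return sum(
        comb(wins_needed - 1 + losses, losses) * p ** wins_needed * q ** losses
        for losses in range(wins_needed)
    )


p_game = win_pct_from_ops_diff(0.081)
print(f"per game: {p_game:.3f}")
print(f"best of seven: {series_win_prob(p_game):.3f}")
```

Each term in the sum is the chance of winning the series while losing exactly that many games along the way.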

The next table shows the OPS of each pitcher in the rotation for the two teams, in what looks like will be the order for the series. Each pitcher's OPS is compared to the league average.



The next table shows which pitcher has the advantage in each matchup.



The Yankees have a big advantage in each of the first three games. My guess is that it will only be in game 4 that the Phillies have the advantage. Then the rotation starts up again. The Yankee hitters also had an OPS that was .076 better than the league average while the Phillies were only .042 better.

The next table shows some other breakdowns. It shows both hitting and pitching OPS for both teams, home and road and also the league averages for those respective stats.



The Yankees outhit their opponents at home by .129 in OPS. For the Phillies, it is only .037. On the road, these two stats are .082 and .012. So when in Yankee Stadium, the Yankees have an advantage of .117 (.129 - .012). Even in Philadelphia, the Yankees advantage is .045 (.082 - .037).

My guess is that park effects are not a big deal here. The simple average of the OPS in Yankee stadium was about 1.6% higher than in Yankee road games (I simply added what the Yankees hit and allowed at home and divided by 2, then did the same for road games and then the home number was divided by the road number-that is all probably not quite right since Yankee pitchers have more innings at home than Yankee hitters since they don't bat a lot of the time at home in the bottom of the 9th). For the Phillies, this was 2.2% higher in home games. So, overall, not much going on with park effects.

Also notice that the Yankees hit .081 better than the league against lefties this year and they get to face 3 lefty starters.

The next table shows how the two bullpens fared compared to the league average bullpens.



So even here, the Yankees have an advantage.

Also, I once calculated that the team with home field advantage wins 51.52% of the time, if the two teams are of equal strength. The Yankees played in the tougher league (the AL has been winning most of the interleague games the past few years). And the other 4 teams in the AL East combined to finish 6 games over .500 outside their division this year. In the NL East, it was 28 games under. So it looks like the Yankees played in a much tougher division, too.

Friday, October 23, 2009

Will Barry Larkin Get Elected To The Hall Of Fame?

This is being discussed at Baseball Think Factory now. Click on Red Reporter: JinAZ: A HOF Case for Barry Larkin. I sure hope he gets elected. Sean Smith's Wins Above Replacement Rankings have him at 58th all time. Seems like a no brainer.

But what do the voters like? I created two models earlier this year. One is called Predicting Who Makes The Hall Of Fame Using A Logit Model. It gives him a probability of only about 17% of making it. The model took into account career average, number of 100 RBI seasons, all-star games, PAs, MVP awards, world series performance, getting 3000 hits and being a catcher.

The other model was called What Determines Vote Percentage In The First Year Of Hall Of Fame Eligibility? (Part 2). It said it would be 34.6% for Larkin. It took into account the same things as above plus getting 500 HRs, getting 500 SBs, gold gloves (but not being a catcher).

I sure hope my models are wrong. But this analysis was based on what the voters did from 1990-2009.

Tuesday, October 20, 2009

Some Very Old Sabermetric Classics That Are Online

Goodby To Some Old Baseball Ideas (from LIFE magazine 1954-contains some fairly advanced formulas)

If you really want to blow your mind, read this article from Fortune magazine in 1935 about a very early and very sophisticated

The Base in Baseball By Travis Hoke

Why the System of Batting Averages Should Be Changed (by FC Lane around the year 1917-has linear weights values-decades ahead of its time-the man was a trained scientist)

Then his analysis of the value of walks is at

The Base on Balls

And links to more Baseball Magazine articles are at

Cyril Morong's Sabermetric Research

I posted the following at BTB a few years ago

The post below is the few pages from FC Lane's book called "Batting" that dealt with the batting order. Whether or not it matches up with some of the recent analysis on lineups I will leave up to readers. One expert mentioned that it was a good idea to bat Cy Williams 2nd. FC Lane was a great baseball writer and editor of Baseball Magazine in the early part of the 20th century. It comes to you through the miracle of scanning (well, it was a miracle that I figured out how to use the scanner-actually my wife who is a computer programmer showed me how-the miracle is that she stays married to me)

How the Batting Order "Colors" Batting

FC Lane on the Batting Order

Monday, October 19, 2009

Does Jimmy Rollins Have More Pop As A Left-Handed Batter?

One of the announcers last night, I think it was Buck Martinez, said that Rollins did. If I recall correctly, it was because he hit more HRs as a lefty this past season. He did hit 14 HRs vs. righties (when he bats left-handed) and 7 vs. lefties. But, as many fans know, he also faced righties a lot more. Here are his HR%'s vs. lefties and righties this year:

vs. lefties (as a right-handed batter) 4.02%
vs. righties (as a left-handed batter) 2.81%

Now for his entire career.

vs. lefties (as a right-handed batter) 2.83%
vs. righties (as a left-handed batter) 2.33%

So it looks like he actually has more HR power as a right-handed batter, although it is pretty close for his entire career. Another way to look at "pop" is to use isolated power or SLG - AVG. Here are his 2009 figures:

vs. lefties (as a right-handed batter) .195
vs. righties (as a left-handed batter) .165

Now for his whole career.

vs. lefties (as a right-handed batter) .168
vs. righties (as a left-handed batter) .163

So, it looks like he has more power as a right-handed batter (but again, the edge is slight for his whole career). I don't think this is sabermetrics. I think it is just arithmetic. But it seems like announcers make this mistake a lot when talking about lefty/righty stats. They look at raw totals instead of percentages, forgetting that there are fewer lefty pitchers than righties.
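The raw-total vs. percentage point can be shown in a few lines. The 2009 HR totals are from above; the at-bat splits are my assumption, backed out of the quoted percentages, so treat them as approximate.

```python
def hr_pct(home_runs, at_bats):
    """Home runs as a percentage of at-bats."""
    return 100 * home_runs / at_bats


def isolated_power(slg, avg):
    """ISO as defined above: SLG - AVG."""
    return slg - avg


# (HR, AB) for 2009. HR totals are from the post; AB splits are assumed,
# backed out of the HR%'s quoted above.
splits_2009 = {
    "vs. righties (as a left-handed batter)": (14, 498),
    "vs. lefties (as a right-handed batter)": (7, 174),
}

for label, (hr, ab) in splits_2009.items():
    print(f"{label}: {hr} HR, {hr_pct(hr, ab):.2f}%")
```

The side with twice as many HRs has the lower rate once the at-bats are taken into account.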

Friday, October 16, 2009

Even If There Really Are Clutch Hitters And We Can Tell Who They Are, Does It Significantly Affect Winning Or Affect Personnel Decisions?

There were some recent posts around the blogosphere on clutch hitting. As many times before, the discussion was mainly about whether or not it exists. Here are the links:

Overestimating the Fog by JC Bradbury at Sabernomics. This article got discussed at Baseball Think Factory. JC Bradbury also posted two other studies: Does Clutch Pitching Exist? and A Little Clutch Hitting Study. Phil Birnbaum had Doesn't "The Book" study pretty much settle the clutch hitting question?.

Bradbury's "Fog" article refers to an article from a few years ago by Bill James (JC has a link to it). Bill James suggested that our statistical methods might not be able to detect clutch hitting. JC presents a different view.

Phil makes a reference to "The Book" by Tom Tango, Mitchel Lichtman and Andrew Dolphin. Their basic finding was that there is clutch ability but it is very slight.

Now getting back to Bill James. He wrote an article a couple of years ago called Mr. Clutch: Big Papi, Chipper, Pujols come through when it counts. James said:

""Clutch" is a complicated concept, containing at least seven elements:

1. The score,
2. The runners on base,
3. The outs,
4. The inning,
5. The opposition,
6. The standings,
7. The calendar."

Then he showed how certain players did a lot better in these cases than they normally do. But what he does not say in this article (it may be elsewhere), is how much differently all players hit in these situations than they normally do (that is, the league average differential). This information is necessary to see which players' clutch performance is statistically significant. I made a crude attempt at analyzing James' new measure of clutch in this post: Is David Ortiz A Clutch Hitter?. For differences from normal performance, I used those in close and late situations. It looked like his clutch performance was not significant. But I have not seen James post the clutch data for all players, so a complete analysis has not been done (maybe he has posted this on his site but I have not signed up to pay for it).

I did a study a few years ago called How Many Games Do Clutch Hitters Really Win?. I had two methods of seeing how many wins clutch hitters added above their normal hitting. In one method, only about 10% of the hitters I looked at were able to generate as many as .5 more wins a season than expected by hitting better in the clutch than they normally do. That assumes that this was their true clutch ability. In the other method, only 3 out of 71 players added as many as .5 wins (that table is partly cut off now at the link).

Getting back to what "The Book" says, they show that the biggest clutch hitting skill of any player over the 2000-2004 period was .0018 on their wOBA stat (based on another formula they mention, I estimate that is about .004 in OPS). Their clutch situation was the 8th inning or later and the batting team is down 1-3 runs. I don't know what percentage of all plate appearances are made up by these situations, but for close and late situations (CL) it is 15%. My study, mentioned in the previous paragraph, found only a small number of hitters making much difference by their clutch performance and I made no "regression to the mean" adjustment to their clutch stats like "The Book" people did.

I assumed that if a guy's OPS was .050 higher in the clutch than otherwise, that was his true clutch ability. If "The Book's" clutch situation is also about 15% of the PAs (like CL), then I have to assume that their methods say that players add many fewer wins from their clutch performance than my method since my method has a top differential of .117 for Tino Martinez. That is, his OPS was that much higher in the clutch than otherwise. They have a biggest difference of about .004 in OPS, which probably creates very few extra wins. And that is the best they found.

Phil Birnbaum's post also mentioned how different kinds of hitters, like power hitters vs. singles hitters, hit differently in the clutch and whether or not it was due to a change in their approach with the game on the line. I did a study once called Do Power Hitters Choke in the Clutch?. It was inspired by a study by Andrew Dolphin (one of "The Book" people-I have a link to it at this study). I found mixed results, but maybe power hitters did do a little worse in the clutch than other hitters.

Finally, if clutch hitters are real, do teams make trades to get them? Do they offer those free agents more money? I would love to know if teams have ever done this. There is a study on this called Are Players Paid for "Clutch" Performance? by Jahn K. Hakes and Raymond D. Sauer. My guess is that teams never consider any clutch data when making personnel decisions. If that is the case, then effectively clutch is a non-issue.

Thursday, October 8, 2009

The Percentage Of Batters Faced By Relief Pitchers Since 1953

The data came from Retrosheet. The graph below shows the % faced in the AL.



Now for the NL.



Now for both leagues in the same graph. The AL is the red line and the NL is the blue line.



This last graph shows the difference between the two leagues (NL - AL). In the first year, 1953, the NL had 0.288 while the AL had 0.259 for a difference of about .029. Then the next year the NL was .04 higher. It is interesting to see that there was one trend to about 1970 of the NL edge falling (actually turning negative in 1960 and staying there until 1970, except for 1962). Then there is a trend for at least 10 years of the NL rising relative to the AL. Then it generally declines until about 2000 and then it starts rising again. Maybe the DH plays some role here but it can't explain all of it.

Monday, October 5, 2009

Did The Increased Use Of Relief Pitching Cause A Decline In Clutch Hitting?

This is mainly an elaboration on last week's post called Clutch Hitting Over Time (1952-2008). What I found was a correlation between the fall in percentage of games completed and clutch hitting (as measured by the difference between non-close and late (NCL) situations and close and late (CL) situations). Here, I just turn things around and make the measure of clutch CL - NCL (the two stats I used were AVG and isolated power or ISO).

The table below shows the AL AVG in both CL and NCL for the given periods. I broke things down by 3 year periods because there was a lot of volatility from year to year (the Retrosheet data on this in the AL starts in 1953 and 1952 for the NL). The period averages are simple averages. The DIFF column is just the first minus the second and the last column is the percentage of games not completed.



You can see that the difference has generally gotten more negative over time as the percentage of games not completed has increased (a proxy for the use of relief pitching). I was surprised to find that there were years when the AVG in CL situations was higher than in NCL situations. The next graph shows the relationship between the last two columns from the table above.



The r-squared in the graph refers to the percentage of variation in clutch hitting (CL - NCL) explained by the percentage of games not completed (%NCG). It was 71.94%. Now the same two tables for the NL.
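For anyone who wants to reproduce this kind of r-squared, here is a minimal sketch for a one-variable regression, where r-squared is just the squared correlation. The numbers in the example are placeholders I made up to show the mechanics; they are NOT the values from the table.

```python
def r_squared(x, y):
    """Squared correlation between x and y (equals r-squared for a one-variable regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Placeholder values only, not the data from the post.
pct_games_not_completed = [0.60, 0.70, 0.80, 0.90]
cl_minus_ncl = [0.002, -0.003, -0.006, -0.012]

print(round(r_squared(pct_games_not_completed, cl_minus_ncl), 4))
```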





Interesting that the r-squared is so much lower in the NL. No reason comes to mind.

The next set of graphs does the same thing for ISO in the AL.





The .8652 seems very high. 86.52% of the variation in clutch is explained by the change in games not completed. Now for the NL.



Saturday, October 3, 2009

Pujols wins triple crown

Okay, he has won the triple crown covering the years 2001-2008 in the NL. Here are the top 10 in AVG, HRs, RBIs with a 2000 PA minimum. Once we extend it to 2009, he will still lead in all 3. Maybe some other hitters have done this over a 9 year stretch or longer. Hornsby did for his entire NL career! Ted Williams did it for his entire career! So did Stan Musial! Anybody know who else had a long span triple crown? I will look at obvious choices when I get a chance. Data from the Lee Sinins Complete Baseball Encyclopedia.

AVERAGE
1 Albert Pujols .334
2 Todd Helton .326
3 Barry Bonds .325
4 Matt Holliday .319
5 Chipper Jones .317
6 Larry Walker .316
7 Miguel Cabrera .313
8 David Wright .309
9 Hanley Ramirez .308
10 Moises Alou .304

HOMERUNS
1 Albert Pujols 319
2 Adam Dunn 278
3 Barry Bonds 268
4 Lance Berkman 263
5 Andruw Jones 255
6 Aramis Ramirez 237
7 Pat Burrell 233
T8 Chipper Jones 219
T8 Jim Edmonds 219
10 Derrek Lee 207

RBI
1 Albert Pujols 977
2 Lance Berkman 879
3 Aramis Ramirez 815
4 Andruw Jones 770
T5 Pat Burrell 748
T5 Todd Helton 748
7 Chipper Jones 739
8 Jeff Kent 725
9 Adam Dunn 672
10 Luis Gonzalez 664
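Checking for a long-span triple crown is just a matter of seeing whether the same name tops all three lists. Here is a small sketch using the top three from each list above; the function name is mine, and ties at the top are not handled specially.

```python
def span_triple_crown(leaderboards):
    """Return the player who leads every category, or None if no one does."""
    leaders = {stat: max(totals, key=totals.get) for stat, totals in leaderboards.items()}
    names = set(leaders.values())
    return names.pop() if len(names) == 1 else None


# Top three in each category for the NL, 2001-2008, from the lists above.
nl_2001_2008 = {
    "AVG": {"Albert Pujols": 0.334, "Todd Helton": 0.326, "Barry Bonds": 0.325},
    "HR": {"Albert Pujols": 319, "Adam Dunn": 278, "Barry Bonds": 268},
    "RBI": {"Albert Pujols": 977, "Lance Berkman": 879, "Aramis Ramirez": 815},
}

print(span_triple_crown(nl_2001_2008))
```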

Hornsby had a career triple crown while in the NL. He led the NL in all 3 stats (even with just a 1000 PA minimum) for his entire NL career, from 1915-33. So from 1915-1933, Hornsby led in AVG, HRs, and RBIs. Maybe someone has said this before but I have not seen it. Here are the top 10

AVG
1 Rogers Hornsby .359 (.35936)
2 Chuck Klein .359 (.35907)
3 Lefty O'Doul .355
4 Paul Waner .346
5 Bill Terry .341
6 Riggs Stephenson .339
7 Babe Herman .332
8 Lloyd Waner .332
9 Kiki Cuyler .330
10 Spud Davis .330

HR
1 Rogers Hornsby 298
2 Cy Williams 247
3 Hack Wilson 238
4 Jim Bottomley 194
5 Chuck Klein 191
6 Mel Ott 176
7 Gabby Hartnett 154
8 George Kelly 148
9 Babe Herman 143
10 Bill Terry 138

RBI
1 Rogers Hornsby 1555
2 Jim Bottomley 1188
3 Pie Traynor 1176
4 Frankie Frisch 1084
5 Hack Wilson 1033
6 George Kelly 1020
7 Charlie Grimm 1015
8 Cy Williams 967
9 Bill Terry 892
10 Edd Roush 891

Now for Ted Williams

AVG
1 Ted Williams .344
2 Joe DiMaggio .322
3 Jimmie Foxx .315
4 Harvey Kuenn .313
5 Dale Mitchell .312
6 Barney McCosky .312
7 Luke Appling .310
8 Hank Greenberg .309
9 Bob Dillinger .308
10 Taffy Wright .308

HR
1 Ted Williams 521
2 Mickey Mantle 320
3 Yogi Berra 318
4 Joe DiMaggio 254
5 Larry Doby 253
T6 Vic Wertz 247
T6 Vern Stephens 247
8 Roy Sievers 243
9 Gus Zernial 237
10 Joe Gordon 228

RBI
1 Ted Williams 1839
2 Yogi Berra 1306
3 Mickey Vernon 1296
4 Vern Stephens 1174
5 Bobby Doerr 1153
6 Joe DiMaggio 1105
7 Vic Wertz 1092
8 Larry Doby 970
9 Mickey Mantle 935
10 Rudy York 922

Now Musial

AVG
1 Stan Musial .331
2 Hank Aaron .320
3 Willie Mays .315
4 Tommy Davis .313
5 Dixie Walker .312
6 Jackie Robinson .311
7 Orlando Cepeda .310
8 Vada Pinson .309
9 Richie Ashburn .308
10 Joe Medwick .305

HR
1 Stan Musial 475
2 Eddie Mathews 422
3 Willie Mays 406
4 Duke Snider 403
5 Gil Hodges 370
6 Ernie Banks 353
7 Ralph Kiner 351
8 Hank Aaron 342
9 Hank Sauer 288
10 Del Ennis 286

RBI
1 Stan Musial 1951
2 Duke Snider 1316
3 Del Ennis 1277
4 Gil Hodges 1274
5 Willie Mays 1179
6 Eddie Mathews 1166
7 Hank Aaron 1121
8 Carl Furillo 1058
9 Bob Elliott 1051
10 Ernie Banks 1026

The following sites also discussed these issues:

http://www.philly.com/philly/sports/phillies/20090923_High___Inside__NL_Notes.html

http://www.baseballthinkfactory.org/files/newsstand/discussion/goold_albert_pujols_claim_to_a_triple_crown_or_two/

http://www.stltoday.com/blogzone/bird-land/bird-land/2009/01/albert-pujols-could-be-close-to-claiming-a-triple-crown-or-two/

Thursday, October 1, 2009

Yes, We Should Have Kept An Eye On The Rockies

On June 8th, I asked "Should We Keep An Eye On The Rockies?" It was right after they swept the Cardinals in St. Louis, scoring a lot of runs in a combination of blowouts and not-so-close games. Given that the Cards were (and still are) a very good team, I thought the sweep was an indicator of how good the Rockies might be.

But I sure got some other predictions wrong. Like "Albert Pujols Has A Good Chance To Win The Triple Crown." He led the league in HRs and RBIs on July 4th while trailing Hanley Ramirez in average by only .008. I thought his better track record (including 2nd half hitting) gave him a good shot to lead in AVG over Ramirez and the other top hitters. But he may not even lead in RBIs.

And then there was "Is Ryan Howard The New Mickey Vernon? (Or Is His Career Really In Decline?)" His offensive winning percentage (OWP) had declined the last 2 years. Here is his OWP, with his age in parentheses:

.777 (26)
.675 (27)
.582 (28)

So those declines were .102 & .093. If I had limited the study to declines of .093 or more, there were only 8 guys. The only one whose decline started before age 30 was Vernon. Here is what happened to Vernon:

.759 (28)
.465 (29)
.284 (30)
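
The declines quoted above are easy to check. Here is a short Python sketch that computes the year-to-year drops in OWP for both players and tests them against the .093 cutoff. The function and variable names are just for illustration; the OWP values are the ones listed above.

```python
def yearly_declines(values):
    """Drop from each season to the next, rounded to 3 decimals like OWP."""
    return [round(prev - cur, 3) for prev, cur in zip(values, values[1:])]


# OWP by season, copied from the lists above.
howard_owp = [0.777, 0.675, 0.582]   # ages 26, 27, 28
vernon_owp = [0.759, 0.465, 0.284]   # ages 28, 29, 30

THRESHOLD = 0.093  # the cutoff mentioned above: two straight declines this big

for name, owp in (("Howard", howard_owp), ("Vernon", vernon_owp)):
    drops = yearly_declines(owp)
    qualifies = all(d >= THRESHOLD for d in drops)
    print(name, drops, "two straight big declines:", qualifies)
```

This reproduces the .102 and .093 for Howard, and shows that Vernon's two drops (.294 and .181) were much steeper.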

But Vernon bounced back at age 31 with .579. And Howard, too, has bounced back. I don't have his OWP for this year, but his adjusted OPS figures for the last 4 years, including this year, have been (from Baseball Reference):

167
144
124
136

Vernon's 4 years were

160
99
73
113

So I guess I was right: Howard is the new Vernon. Any player under 30 with a decline of .093 or more in OWP 2 straight years will bounce back the next year with a better season. :)