Part 1 was Explaining The 1959 White Sox. There I showed that the White Sox had a much higher winning percentage than their underlying stats would indicate. Here I try to quantify how much their clutch performance helped. They had an actual winning percentage of .610.
But according to my study from a few years ago called Does Team Clutch Matter in Baseball?, they should have only had a pct of .522 based on their OPS differential. The Sox only had an OPS differential of about .020 since their hitters had an OPS of .691 while their pitchers allowed an OPS of .671. My team clutch study had a regression equation for winning pct of
(1) PCT = 0.49 + 1.27*OPS - 1.26*OPPOPS
That projects a team with an OPS differential of .020 to have a pct of .522. So how did the Sox end up with a .610 pct? Another regression equation was
(2) PCT = 0.501 + 0.918*NONCLOPS + 0.345*CLOPS - 0.845*OPPNONCLOPS - 0.421*OPPCLOPS
That is, OPS by the hitters and OPS allowed by the pitchers was broken down into "close and late" (CL) situations and non-CL situations. Here is where we can start to see how the Sox had such a good winning pct. Their hitters had a CL OPS of .742 and a non-CL OPS of .680. For the pitchers, those numbers were .623 and .682, respectively. Plugging those numbers into equation (2), the Sox would have a pct of .543. So it is possible that their superior CL performance (especially compared to nonCL), added about .021 to their pct. That is about 3.2 wins over 154 games.
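The arithmetic above can be sketched in a few lines of Python. This is only a rough check, assuming the coefficients of equations (1) and (2) exactly as printed here and the rounded OPS figures quoted in this post. The function and variable names are illustrative labels, not anything from the original study.

```python
# Sketch of equations (1) and (2) applied to the 1959 White Sox.
# Coefficients are copied from the post; names are illustrative only.

def pct_eq1(ops, opp_ops):
    """Equation (1): winning pct from overall OPS and OPS allowed."""
    return 0.49 + 1.27 * ops - 1.26 * opp_ops


def pct_eq2(noncl_ops, cl_ops, opp_noncl_ops, opp_cl_ops):
    """Equation (2): winning pct with OPS split into CL and non-CL."""
    return (0.501 + 0.918 * noncl_ops + 0.345 * cl_ops
            - 0.845 * opp_noncl_ops - 0.421 * opp_cl_ops)


GAMES = 154  # length of the 1959 schedule used in the post

base = pct_eq1(0.691, 0.671)                    # overall OPS, OPS allowed
with_cl = pct_eq2(0.680, 0.742, 0.682, 0.623)   # non-CL/CL splits
cl_gain = with_cl - base
cl_wins = cl_gain * GAMES

print(f"equation (1) pct: {base:.3f}")
print(f"equation (2) pct: {with_cl:.3f}")
print(f"gain from CL split: {cl_gain:.3f} ({cl_wins:.1f} wins)")
```

Since the inputs are rounded to three decimals, the last digit of any result can move a little depending on where the rounding is done.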
The Sox also did extremely well with runners in scoring position (RISP). Their hitters had a RISP OPS of .742 and a non-RISP OPS of .673. For the pitchers, those numbers were .639 & .680. How did this RISP performance affect their winning pct? Equation (3) estimates that. It is like equation (2), but broken down by RISP and non-RISP performance.
(3) PCT = 0.501 + 0.848*NONRISPOPS + 0.432*RISPOPS - 0.799*OPPNONRISPOPS - 0.462*OPPRISPOPS
It projects the Sox to have a pct of .553, .031 better than the .522 estimated by equation (1). Over 154 games, that is about 4.77 wins.
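Here is the same kind of rough Python check for equation (3), assuming the coefficients as printed and the rounded RISP splits quoted above. The names are illustrative, and the .522 baseline is simply the equation (1) figure from this post. Because the inputs are rounded, the result can differ from the .553 above in the third decimal.

```python
# Sketch of equation (3) applied to the 1959 White Sox RISP splits.
# Coefficients are copied from the post; names are illustrative only.

def pct_eq3(nonrisp_ops, risp_ops, opp_nonrisp_ops, opp_risp_ops):
    """Equation (3): winning pct with OPS split into RISP and non-RISP."""
    return (0.501 + 0.848 * nonrisp_ops + 0.432 * risp_ops
            - 0.799 * opp_nonrisp_ops - 0.462 * opp_risp_ops)


GAMES = 154
BASE_PCT = 0.522  # equation (1) projection quoted in the post

with_risp = pct_eq3(0.673, 0.742, 0.680, 0.639)
risp_gain = with_risp - BASE_PCT
risp_wins = risp_gain * GAMES

print(f"equation (3) pct: {with_risp:.3f}")
print(f"gain from RISP split: {risp_gain:.3f} ({risp_wins:.2f} wins)")
```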
It is not clear how to combine the information generated by equations (2) and (3). I am not sure if we can simply add the .021 to the .031 to get .052 and then say that their clutch play added that much to their pct. There is going to be an overlap between the CL and RISP situations. Usually about 25% of plate appearances (PAs) are with RISP and about 15% are CL. Just multiplying .25*.15 gives about .0375, meaning that about 3.75% of PAs are both CL and RISP (if the two kinds of situations are independent of each other). So maybe there would not be much overlap and summing the .021 & .031 is okay. Maybe not. I'm just not sure.
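The overlap estimate can be written out as a short Python sketch. It assumes, as the multiplication above does, that CL and RISP situations are statistically independent, which may not hold in real games. The 25% and 15% shares are the rough figures from this post, and the variable names are illustrative.

```python
# Sketch of the CL/RISP overlap estimate under an independence assumption.

p_risp = 0.25   # rough share of PAs with runners in scoring position
p_cl = 0.15     # rough share of PAs that are close and late

# If the two situations were independent, the share that is both:
p_both = p_risp * p_cl

# Share of PAs that are CL or RISP (or both), by inclusion-exclusion:
p_either = p_risp + p_cl - p_both

# Upper bound on the combined gain if the two effects simply add:
cl_gain = 0.021
risp_gain = 0.031
combined_gain = cl_gain + risp_gain

print(f"both CL and RISP: {p_both:.4f}")
print(f"CL or RISP:       {p_either:.4f}")
print(f"summed pct gain:  {combined_gain:.3f}")
```

The summed gain is best read as a ceiling, since any PA that is both CL and RISP is counted in both regressions.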
But it could be that the combined RISP-CL situations are extremely important and maybe the Sox did well in those cases, further adding to their pct. Anyway, if we could add the two gains together, it would explain more than half the unexplained gap from equation (1), which was .088 (.610 - .522). Whatever the case, the Sox had no advantage over their opponents in non-CL and non-RISP situations but totally dominated when it was CL or RISP. This is probably the reason for their success. Since no one seems to know how to consistently perform well in the clutch, we have to view the 1959 White Sox as having been very lucky.
I also found that their starting pitchers allowed an OBP of .306, not counting sacrifice hits and IBBs. The starters allowed HRs to 2.26% of hitters. These numbers for the relievers were .306 and 2%. It could be argued that the White Sox had a great bullpen and that this explains the great pitching performance in CL situations. But that is not the case, since the relievers' numbers were about the same as the starters'.
Sunday, August 30, 2009
Wednesday, August 26, 2009
Konerko Breaks The Tie With Mark McGwire, Catches Mike Lowell!
At the start of the season, the three of them were tied for the record for the fewest triples among players with 5000+ ABs, with 6 each. Lowell got one on April 27th. But tonight Paul Konerko hit one in the first inning in Boston (Lowell was not in the starting lineup). Cecil Fielder had 7. One more and Konerko can tie Mike Piazza. Only 302 more to go to catch Sam Crawford.
Tuesday, August 25, 2009
Did Drug Use Keep Dave Parker Out Of The Hall Of Fame?
This issue came up in Alan Robinson's article titled Parker wonders if drug use keeps him out of HOF. It compares Parker to recent inductee Jim Rice and others in an attempt to show that Parker might be Hall-worthy. It generally relies on conventional stats. I don't think Parker has an especially strong case. But it is not bad, either.
Here are some things I came up with. Parker ranks 308th in wins above replacement level among position players with 37.9 at Sean Smith's site Top 500 Position Players. Rice is ranked 257th with 41.5. Neither of those ranks seems impressive.
I have a site which ranks players by Win Shares per plate appearance. Through 2001, Rice ranked 197th among outfielders with 20.17 WS per 648 PA. Parker was 176th with 20.81. Again, not really impressive for either guy.
Baseball Reference has a couple of measures of who is Hall-worthy. One is the "Hall of Fame Monitor." BR says "This is another [Bill] Jamesian creation. It attempts to assess how likely (not how deserving) an active player is to make the Hall of Fame. It's rough scale is 100 means a good possibility and 130 is a virtual cinch. It isn't hard and fast, but it does a pretty good job." Click here to see the rankings. Parker has a score of 124 which ranks 110th while a "Likely HOFer ≈ 100." Rice ranks 90th with a score of 144. So this suggests that both had careers that normally get you in, based on the voters' preferences. Maybe the drug use could be affecting Parker here.
One issue here is that over the years players have gotten in by the BBWAA or the Veterans Committee. The latter has changed its procedures in recent years, so it may not be clear if these patterns will hold in the future, and the committee probably did not always agree with the BBWAA. It would be interesting to have separate formulas for each. Anyway, if you don't get in by the BBWAA, it usually takes a while before you are eligible for the Veterans Committee.
BR also has its Hall of Fame Standards Batting and says "It is used to measure the overall quality of a player's career as opposed to singular brilliance (peak value)." Parker has a score of 41 which ranks him 135th while the "Average HOFer ≈ 50." Rice is ranked 116th with a score of 43. So both guys fall a little short here. But they are close. Parker could argue he is close enough to Rice to get in.
A few months ago, I came up with my own regression based formulas for estimating the % of the votes a player might get in his first year of eligibility or his probability of getting in at all based on voting patterns and the apparent preferences of the voters.
One model was an OLS regression. You can read about that at What Determines Vote Percentage In The First Year Of Hall Of Fame Eligibility? (Part 2). This one estimated first year eligibility vote %. It took into account the following variables: MVP awards, getting 3000 hits, getting 500 HRs, all-star games, Gold Gloves, getting 500 stolen bases, world series performance and career plate appearances. The model estimated that Parker would get 28.8% while he actually got 17.5%. So a little less than expected but if you read the link you will see that others did even worse.
The other study I did was Predicting Who Makes The Hall Of Fame Using A Logit Model. That model took into account career AVG, number of seasons with 100 RBIs, all-star games, career plate appearances, MVP awards, world series performance, getting 3000 hits and being a catcher. Parker did not do well here but I concluded "If Dave Parker had 8 all-star game instead of 6, he goes from a P of 8% to over 60%." Maybe without the drug use he would have made more all-star games and it would put him in the Hall of Fame. Rice's probability of making the Hall was about 59.5%.
So I don't really see any strong case for Parker. But some evidence supports him.
Friday, August 21, 2009
Why Are The Angels Winning More Games Than Their Pythagorean Projection?
This is being discussed at Beyond The Boxscore and Baseball Think Factory. With 685 runs scored and 605 allowed, their Pyth pct is .561, good for 91 wins in a season. But they actually have a .613 pct, good for about 99 wins. So the gap is projected to be about 8 (this comes from the Bill James idea that a team's winning percentage is going to be close to their runs scored squared/(runs scored squared + runs allowed squared)).
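As a rough illustration, the Pythagorean calculation can be sketched in Python. This assumes the basic version with an exponent of exactly 2 and the run totals quoted above. The function name is illustrative. With these inputs the result can differ from the .561 quoted above in the third decimal.

```python
# Sketch of the basic Bill James Pythagorean projection (exponent of 2).

def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning pct from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


GAMES = 162
ACTUAL_PCT = 0.613  # the Angels' actual pct quoted in the post

pyth = pythagorean_pct(685, 605)
pyth_wins = pyth * GAMES
actual_wins = ACTUAL_PCT * GAMES

print(f"Pythagorean pct: {pyth:.3f}")
print(f"projected wins: {pyth_wins:.0f}, pace from actual pct: {actual_wins:.0f}")
print(f"gap: {actual_wins - pyth_wins:.1f} wins")
```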
The big reason why the Angels are doing better than expected is how well they are doing in close and late situations (situations "in the 7th or later with the batting team tied, ahead by one, or the tying run at least on deck" according to Baseball Reference, where I got my data from along with ESPN). Then they also hit extremely well with runners on base (as I explain in Part 2). Part 3 discusses how well the Angels are doing by a sophisticated measure called Win Probability Added.
Part 1
I did a study a few years ago called Does Team Clutch Matter in Baseball?. The equation for pct was
(1) PCT = 0.49 + 1.27*OPS - 1.26*OPPOPS
Where OPPOPS is the OPS allowed by a team's pitchers. Using only walks and hits in OBP, the Angels have an .804 OPS and a .787 OPS allowed. Equation (1) predicts they will have a pct of .519.
But if you break it down by close and late situations and non close and late situations it was
(2) PCT = 0.501 + 0.918*NONCLOPS + 0.345*CLOPS - 0.845*OPPNONCLOPS - 0.421*OPPCLOPS
For hitting, their OPS was .796 in NONCL and .852 in CL. For pitching, it was .799 & .707. So they get predicted to have a .553 pct. That .033 gain over 162 games is 5.41 wins.
With 685 runs scored and 605 allowed, their Pyth pct is .561, good for 91 wins in a season. But they actually have a .613 pct, good for about 99 wins. So the gap is projected to be about 8. But 5.4 (or about 2/3) of that is due to their close and late performance. And as the next part shows, some of the rest of the gap will be explained by how well they hit with runners on base (they don't pitch any better with runners on base than they do normally).
But the more important comparison is between the .519 predicted by equation (1) and the Angels' actual pct of .613. The .519 says they should win about 84 games. So the gap is 15. The 5.4 explains about 1/3 of it. Still, a big chunk.
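The Part 1 numbers can be checked with a short Python sketch. It assumes the coefficients of equations (1) and (2) as printed and the rounded Angels OPS splits quoted above, and the names are illustrative only.

```python
# Sketch of equations (1) and (2) applied to the 2009 Angels.
# Coefficients and inputs are copied from the post; names are illustrative.

def pct_eq1(ops, opp_ops):
    """Equation (1): winning pct from overall OPS and OPS allowed."""
    return 0.49 + 1.27 * ops - 1.26 * opp_ops


def pct_eq2(noncl_ops, cl_ops, opp_noncl_ops, opp_cl_ops):
    """Equation (2): winning pct with OPS split into CL and non-CL."""
    return (0.501 + 0.918 * noncl_ops + 0.345 * cl_ops
            - 0.845 * opp_noncl_ops - 0.421 * opp_cl_ops)


GAMES = 162
ACTUAL_PCT = 0.613

base = pct_eq1(0.804, 0.787)
with_cl = pct_eq2(0.796, 0.852, 0.799, 0.707)

cl_wins = (with_cl - base) * GAMES        # wins credited to the CL split
total_gap = (ACTUAL_PCT - base) * GAMES   # actual pace vs. equation (1)

print(f"equation (1) pct: {base:.3f}")
print(f"equation (2) pct: {with_cl:.3f}")
print(f"CL wins: {cl_wins:.1f} of a {total_gap:.1f}-win gap "
      f"({cl_wins / total_gap:.0%})")
```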
One more thing, to make another long story short: using formulas from my team clutch study linked above, taking RISP performance into account would add another .023 to the Angels' pct (similar to what I did with equation (2) and close and late situations). That amounts to 3.7 additional wins. It is probably not that much, since some RISP situations happen when it is close and late and I already did a calculation that took that into account. But if about 25% of PAs are with RISP and about 15% when it is C&L, then maybe 3.75% are both. Not sure if it is that simple, but I think most of that 3.7 could be added to the 5.4 I got before and get us close to 9 wins, 60% of the differential of 15.
Part 2
The Angels have scored about .25 more runs per game than expected. From a regression, the runs per game in the AL this year can be estimated by
R/G = 5.96*SLG + 24.96*OBP - 6.01
It says the Angels should score 5.50 R/G but they actually have 5.76. So over the whole season, that is about an extra 40 runs. But they are doing it because of their phenomenal hitting with runners on base (ROB). Their overall AVG-OBP-SLG this year are 0.290-0.354-0.451. But with ROB, they are 0.307-0.378-0.464. So their differentials in all three with ROB are .019-.024-.013.
The AL league averages for AVG-OBP-SLG are .266-.335-.430. With ROB, they are .270-.345-.430. The differences are .004-.010-.000. So the Angels ramp it up a lot more with ROB than most other teams.
Then I ran a regression with SLG and OBP broken down by ROB & NONROB. The equation was
R/G = 6.94*NONROBOBP + 4.46*NONROBSLG + 17.18*ROBOBP + 1.72*ROBSLG - 5.98
This predicts that the Angels would score about 5.63 runs per game. Over a whole season, it means they are scoring about 21 more runs than expected. So taking their ROB hitting into account, we reduce their differential by about half. When I did OBP for both regressions, I only included walks and hits. So the OBPs used are slightly different than what I report (from ESPN). Now there will be some overlap between close and late situations and ROB situations, but my guess is that the gap between actual and predicted wins from Part 1 will be even smaller once ROB is taken into account.
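A small Python sketch of the run arithmetic in this part. It assumes the simple regression coefficients as printed and, as a stand-in, the ESPN OBP and SLG quoted above. The regression itself used an OBP with only walks and hits, so this will not exactly reproduce the 5.50 figure. The names are illustrative.

```python
# Sketch of the simple R/G regression and the season run surpluses.

def rpg_simple(obp, slg):
    """Simple AL regression from the post: R/G from OBP and SLG."""
    return 5.96 * slg + 24.96 * obp - 6.01


GAMES = 162
ACTUAL_RPG = 5.76
PRED_SIMPLE = 5.50   # prediction quoted in the post (simple model)
PRED_SPLIT = 5.63    # prediction quoted in the post (ROB/non-ROB model)

# Stand-in check using ESPN's OBP/SLG, not the walks-and-hits OBP:
approx_rpg = rpg_simple(obp=0.354, slg=0.451)

surplus_simple = (ACTUAL_RPG - PRED_SIMPLE) * GAMES
surplus_split = (ACTUAL_RPG - PRED_SPLIT) * GAMES

print(f"simple model with ESPN inputs: {approx_rpg:.2f} R/G")
print(f"season surplus vs simple model: {surplus_simple:.0f} runs")
print(f"season surplus vs ROB model:    {surplus_split:.0f} runs")
```

The split model is not evaluated here because the non-ROB OBP and SLG it needs are not listed in the post.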
Part 3
The Fangraphs website has a more sophisticated measure. It uses WPA, or Win Probability Added. It is a stat which credits a player with how much he changes his team's probability of winning after a given plate appearance. A guy might get a hit, raising the prob. by .1 or make an out lowering it by .1. These probabilities are based on historical data of how often teams win games in certain situations (leading by 2 in the seventh inning, trailing by 4 in the 8th, etc.) It also takes into account how the base-out situation changes, which affects runs and the chances of winning.
So far this year, Fangraphs has the Angels as +4.79 clutch for their hitters and +4.18 for their pitchers. So that is 8.97 extra wins due to clutch performance (that is, doing better in clutch situations than they normally do). What it means is that the Angels, whether the batters or pitchers, are really coming through the closer and later the game and the more runners on base there are. You can see this data at
Fangraphs Win Probabilities for batters
Fangraphs Win Probabilities for pitchers
Wednesday, August 19, 2009
Have The Texas (Power) Rangers Discovered OBP?
I have written a few posts this season on how the Rangers are a great power hitting team but they are just about average in scoring because their team OBP is well below average. Click here to read about that.
But so far in August, the Rangers have a team OBP of .351 (overall this year it is .321). The league average for the year is .336. So the Rangers are well behind that. But the league average for August is .344, so now the Rangers are getting on base more than most teams.
Unfortunately, they are only averaging 4.53 runs per game in August while the league average is 5.19. They also have an above average SLG this month (.459 vs. .451). So it is still probably a good sign that their OBP is coming up. Their below average runs per game this month is probably just bad luck. If they keep that OBP high, combined with their power, they should stay in the wild card chase.
Tuesday, August 18, 2009
Are The White Sox Really A Bunch Of Underachievers?
By now you have probably heard that Sox GM Kenny Williams called them that. Is it true? To see if it is, I compared how individual players and pitchers are doing to what they were projected to do by Bill James in this year's Handbook.
The first table has OPS and predicted OPS for all the guys who have 200+ ABs on the Sox this year. Some guys did not get projections. Overall it does not look too bad. Some guys are doing better than expected, some worse. I calculated the average differential at -.010. Not huge.
Now pitchers and their ERAs. Things don't look too bad here, either. There is one more table after this that uses WHIP per 9 IP.
Now the WHIP table. Does not look so terrible.
Monday, August 17, 2009
When Buying a Bullpen, It’s Better to Go Cheap
That was the title of an article in the Wall Street Journal on Aug. 4. I have not seen much comment on it in the blogosphere, so I thought I would pass it along. Here is the link: When Buying a Bullpen, It’s Better to Go Cheap. Here is the key passage:
"Among the 54 pitchers who have been used regularly in high-leverage situations this season, the correlation between salary and performance is just .07 (where zero is no relation and one is a perfect bond)."
It would be interesting to see how this shakes out over time. One year can have a lot of fluctuations. Maybe using a 3-4 year period would get a higher correlation. Also, I wonder how this compares to correlations between performance and salary at other positions. Then there is the issue of whether or not a given pitcher was eligible for free agency or arbitration. But still, it does seem like a pretty low correlation.
Saturday, August 15, 2009
Maybe Stat Zombies Ignored The Marlins But Others Did, Too
The Marlins are actually doing just a bit better than this stat zombie would expect.
From a regression I ran a few years ago,
Winning Pct = .5 + 1.21*OPSDifferential
The Marlins have a .008 differential this year for a .50968 predicted winning percentage. For 115 games, that would be 58.61 wins. They have 61. So 2.39 more wins than expected. Does not seem like a big deal. The standard error of the regression was 1.54 wins per season. So the Marlins are about one win beyond that. We should also wait until the season is over to judge them. My regression equation is from the article An OPS Question.
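The Marlins calculation is simple enough to sketch in Python. It assumes the regression exactly as printed above and the .008 differential, 115 games and 61 wins quoted in this post. The names are illustrative.

```python
# Sketch of the OPS-differential regression applied to the 2009 Marlins.

def pct_from_ops_diff(ops_diff):
    """Winning Pct = .5 + 1.21 * OPS differential (from the post)."""
    return 0.5 + 1.21 * ops_diff


GAMES_PLAYED = 115
ACTUAL_WINS = 61
STD_ERROR_WINS = 1.54  # standard error quoted in the post

pred_pct = pct_from_ops_diff(0.008)
expected_wins = pred_pct * GAMES_PLAYED
surplus = ACTUAL_WINS - expected_wins

print(f"predicted pct: {pred_pct:.5f}")
print(f"expected wins: {expected_wins:.2f}")
print(f"wins above expectation: {surplus:.2f}")
print(f"amount beyond the standard error: {surplus - STD_ERROR_WINS:.2f}")
```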
But this issue comes from a Paul Lebowitz article called Why the stat zombies ignore the Marlins. (Hat Tip: Baseball Think Factory) Here is the relevant paragraph:
"Most of the stat zombies predicted them as winning between 66 and 74 games. But it doesn't matter whether they win 80, 85 or 90 games; whether or not they make the playoffs or fade out at the end. They've built an organization that should be admired in the way that Michael Lewis's creative non-fiction Moneyball canonized the Billy Beane A's. Lewis tried to create an "age of enlightenment" in baseball inserting the Ivy League-educated genius into the game at the expense of those who can look at an athlete and find his talent regardless of what his stat sheet says. The numbers don't fit into that kind of analysis, so it's best to ignore it and hope it goes away; but it's not going away."
So some stat zombies did not correctly predict the Marlins' win total. Mr. Lebowitz does not say who they were or provide any links. Also, I don't know if any stat zombies claim they have perfect foresight. There will always be things no one can predict. For example, who could have predicted that both Jim Rice and Fred Lynn would have had such outstanding rookie seasons in 1975 to propel the Red Sox to the AL East title?
But as I hinted at earlier, stat zombies were not the only ones to "ignore" the Marlins. The following links show they were not highly regarded before the season by non-zombies:
Chris Bahr's Predictions at the Sporting News (he predicted they would be 4th)
Sporting News Power Poll (the Marlins at #20 in MLB, 11th in the NL)
CNN/SI MLB 2009 Preseason Predictions
The last one had 13 experts and none predicted the Marlins to win the division or be the wild card.
One last thing. One stat zombie, Tom Tippett, of Diamond Mind baseball fame, has done a pretty good job of making predictions. You can read about that at 2005 Predictions -- Keeping Score. Look for the 7 year composite rankings towards the end. He did a very good job of predicting standings using his simulations from the Diamond Mind game. He was also such a good stat zombie that he got hired by the Red Sox.
Friday, August 14, 2009
Has The Dodgers Defense Been Good This Year And How Much Difference Has It Made?
The Dodgers lead the NL in both fielding percentage and defensive efficiency rating (DER). DER simply says what % of balls in play are turned into outs. The fielding stats are at mlb.com under Sortable Team Stats.
It seems unusual that a team would lead its league in both categories. A team (or player) can have a high fielding percentage but not get to that many balls. But that is not the case with the Dodgers. They must be getting to a lot of balls and fielding them cleanly. So how many runs are they saving with all these balls they catch that other teams don't and all these errors they don't make?
Using data from ESPN, the Dodgers have allowed 452 runs this year with 422 of them being earned. So they have 30 unearned runs. The league average is 40. So that makes the Dodgers 10 runs better than average.
Then ESPN shows that the Dodgers' DIPS% is 107, meaning that their pitchers would have an ERA that is 7% higher than it actually is if they allowed a league average rate of hits on balls in play (they are, of course, better than average). With their actual ERA being 3.61, their DIPS ERA is 3.86. So here their fielders save .25 runs per game (that is, if the pitchers have nothing to do with batting average on balls in play). The Dodgers have played 115 games, so this is an additional 28.75 runs saved. Adding in the 10 from fewer unearned runs gives us 38.75 runs. Since it usually takes about 10 runs to win one game, a rough estimate is that the Dodgers have won close to 4 games this year with their fielding.
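Here is a minimal Python sketch of that arithmetic. The function name and its parameters are my own labels, not anything from ESPN, and it follows the same simplifications as the paragraph above: the ERA gap is treated as runs per game, and 10 runs are assumed to equal one win.

```python
def fielding_runs_saved(era, dips_pct, games, unearned, league_unearned,
                        runs_per_win=10.0):
    """Rough fielding value: balls-in-play runs plus unearned-run savings."""
    # DIPS% of 107 means the DIPS ERA is 7% higher than the actual ERA.
    dips_era = era * dips_pct / 100.0
    # Treat the ERA gap as runs saved per game by the fielders.
    runs_on_balls_in_play = (dips_era - era) * games
    # Fewer unearned runs than the league average counts as runs saved too.
    runs_from_errors = league_unearned - unearned
    total_runs = runs_on_balls_in_play + runs_from_errors
    return dips_era, total_runs, total_runs / runs_per_win


# Dodgers through 115 games: 452 runs allowed, 422 earned, league average of 40 unearned.
dips_era, runs, wins = fielding_runs_saved(3.61, 107, 115, 452 - 422, 40)
print(round(dips_era, 2), round(runs, 1), round(wins, 1))
```

Without rounding the ERA gap to .25, the total comes out at about 39 runs instead of 38.75. Either way it is close to 4 wins.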
On the other hand, Fan Graphs Team Fielding shows them to have just about average defense using more advanced metrics.
Thursday, August 13, 2009
Pedro Martinez Has A 2.16 DIPS ERA In His First Start
That is according to the ESPN rankings. They say they use the DIPS 2.0 formula. Although he allowed 3 earned runs in 5 IP (for a 5.40 ERA), he struck out 5 while only walking 1 and allowed no HRs.
Wednesday, August 12, 2009
Explaining The 1959 White Sox
They won the pennant but their underlying stats were not that good. The reason they came in first is that they performed remarkably well in "clutch" situations.
First, let's look at their underlying stats. Yesterday I presented a formula that estimates a team's winning percentage. I found each team's HRs, BBs, and non-HR hits per game for both their pitchers and hitters and then converted this to a differential. Then I came up with the following formula for winning percentage using regression analysis:
Pct = .5 + .071*NONHR + .047*BB + .157*HR
(some technical notes on this at the end). This formula predicted that the 1959 White Sox would have a winning percentage of .512, while it was actually .610. So they exceeded their predicted pct. by .099 (not .098 due to rounding). This was the third highest positive differential since 1920. The highest belonged to the 1931 Cardinals. But Retrosheet does not have situational splits for them. Next is the 2007 Diamondbacks. But they did not make it to the World Series.
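The formula is easy to put into code. This is only a sketch: the function names are mine, and the sample differentials are made-up numbers for illustration, not the actual 1959 White Sox figures.

```python
def per_game_diff(team_total, opp_total, games):
    """Per-game differential: what the hitters did minus what the pitchers allowed."""
    return (team_total - opp_total) / games


def predicted_pct(nonhr_diff, bb_diff, hr_diff):
    """Pct = .5 + .071*NONHR + .047*BB + .157*HR, all inputs per-game differentials."""
    return 0.5 + 0.071 * nonhr_diff + 0.047 * bb_diff + 0.157 * hr_diff


# Hypothetical team: +0.30 non-HR hits, +0.20 walks and -0.10 HRs per game.
example = predicted_pct(0.30, 0.20, -0.10)
print(round(example, 3))
```

A team with all three differentials at zero comes out at exactly .500, which is what the intercept implies.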
The table below shows the standings for the 1959 AL using the predicted pct. The White Sox are 4th.
The White Sox were actually outhomered by their opponents (with the biggest negative differential). But they were third in both walk differential and nonHR differential. Now let's look at how they did in "clutch" situations vs. other situations. The table below shows how the Sox hitters did in various situations.
Now what the Sox pitchers did.
Now the differentials followed by some discussion.
The total line, of course, refers to all plate appearances. The Sox had modest differentials here. They batted .250 while the Sox pitchers held their opponents to a .242 AVG. With no runners on base, the differentials are even lower. But now look at their differentials with men on. For AVG, it is .013, much higher than the .005 with none on. For OBP, the differential jumps from .009 to .023. SLG goes from .001 to .010.
With runners in scoring position (RISP), they had a .040 differential in AVG! It was actually negative in nonRISP situations. Sox pitchers held opposing batters to a .221 AVG with RISP. Their OBP differential jumped from .007 to .040 while SLG jumped from -.014 to .063. Incredible. Their hitters' SLG went up .032 with RISP while the pitchers lowered it by .044.
Moving to close and late situations, the Sox outhit their opponents by .024 while it was only .005 in nonCL situations. The OBP differential rose from .010 to .043 while for SLG it went from -.012 to .076. Another stunning swing. The Sox hitters actually had an SLG of .400 in close and late situations, by far their highest for any case.
So it is pretty easy to see what happened that year. I have not looked at other teams, but the case of the 1959 White Sox must be very unusual.
Technical notes: The regression was linear. The r-squared was .806 and the standard error was .035, which amounts to 5.67 wins per 162 games. I also put each team's data in groups of 5 years and then did the same regression. The r-squared was .917 and the standard error in terms of wins fell quite a bit (I think it was about 2.2 wins but I left that data at the office). So some of the randomness is mitigated by aggregating over 5 years. The coefficient values were about the same in each regression.
Tuesday, August 11, 2009
Were The 1922 St. Louis Browns The 14th Best Team Since 1920?
Yesterday I mentioned that they have done very well in computer simulation seasons. I think that might be due to how well they performed statistically in real life. If the computer simulations don't include any situational adjustments (like performance with runners on base or in close and late situations), then a team's raw stats will dictate how well they do. Some teams have bad luck. They don't win as many games as their runs scored and runs allowed might suggest. And they might score fewer runs and give up more than their stats might predict. That might have happened to the Browns.
Ideally, we would rate all teams on their OBP & SLG, both by their hitters and what is allowed by their pitchers. But these data are not easily obtained going back so far in history. So I found each team's HRs, BBs, and non-HR hits per game for both their pitchers and hitters and then converted this to a differential. Then I came up with the following formula for winning percentage using regression analysis:
Pct = .5 + .071*NONHR + .047*BB + .157*HR
Then I calculated a predicted winning percentage for every team since 1920. The table below shows the top 25 in terms of predicted winning percentage. You can click on the table to see a bigger version. Not a surprise that the 1927 Yankees come out on top. They had an actual pct. of .714 but the predicted pct. is .744, so they fell .03 short of what the model says.
The 1931 Yankees did very well, coming in third. Yet they came in 2nd place, to the A's that year, about 13 games out! The 1931 A's are 31st according to the model. As you can also see, the 1922 Browns are 14th. The 1969-71 Orioles had 3 teams in the top 33. Now the teams with worst predicted percentages.
In case you are wondering where the 1962 Mets are, they get mentioned below. Now the teams that exceeded their predicted pct. the most. I suspect that these teams did especially well in clutch situations. I will have to look at their splits in Retrosheet (where available) to see if this is true.
Now the teams that underperformed the most. The 1962 Mets were the 2nd worst.
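The over- and underperformer lists are just teams sorted by actual pct. minus predicted pct. Here is a small sketch of that sorting, using the only two teams whose actual and predicted numbers appear in these posts. The variable and function names are mine.

```python
# (team, actual pct, predicted pct) as quoted in these posts.
teams = [
    ("1927 Yankees", 0.714, 0.744),
    ("1959 White Sox", 0.610, 0.512),
]


def residual(actual, predicted):
    """Positive means the team won more than the model says it should have."""
    return actual - predicted


# Biggest overperformers first.
ranked = sorted(teams, key=lambda t: residual(t[1], t[2]), reverse=True)
for name, actual, predicted in ranked:
    print(name, round(residual(actual, predicted), 3))
```

Working from the rounded percentages gives .098 for the White Sox. The .099 I report for them comes from the unrounded values.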
I will try to add some more discussion later when I get some time. Roger Godin told me that there was a book written in 1950 by Tom Meany called Baseball's Greatest Teams and that the 1922 Browns were one of the ten teams discussed.
Monday, August 10, 2009
Are The 1922 St. Louis Browns An Unacknowledged Great Team?
There is a book about them. It is 1922 St. Louis Browns: Best of the American League's Worst by Roger A. Godin. But I have not read it and I don't know if it attempted to rank them among all teams in baseball history (I will post something on this tomorrow and it will show that they rank very high).
I started thinking about this when I read about a simulation called the Seamheads Near Miss League at wezen-ball. The Browns did extremely well. Actually, they were an outstanding team statistically.
The Browns' team strikeout-to-walk ratio was about 34% above the league average. That is the 110th best ever in AL & NL history. With well over 2000 team-seasons, that puts them in the top 5%. Their team ERA was about 19% below the league average, good for 195th, still in the top 10%. But their park factor was 108, meaning that they pitched in a somewhat high run environment. They did give up a few more HRs than the average AL team in 1922. They gave up 71 HRs while the average for the other 7 teams was about 65. But their park gave up 91% more HRs than average.
On the hitting side, their team offensive winning percentage was .577, which ranks 191st. Again, that is in the top 10%. OWP is the Bill James stat that answers this question: if a team had 9 identical hitters and gave up an average number of runs, what would its winning percentage be? Since I used the Lee Sinins Complete Baseball Encyclopedia, it is park adjusted. So pretty impressive, being in the top 10% in both hitting and pitching.
What happened to them? Why didn't they win the pennant? Using Retrosheet, here are some interesting facts about the 1922 season:
The Browns finished 1 game out, behind the Yankees. But the Browns had a run differential of 224 to the Yankees' 140. (Of course, Ruth missed 44 games, most probably due to his suspension. He also got off to a slow start in May, batting just .190 with 2 HRs in 42 ABs.)
The Browns had 256 more hits than their opponents that year, 50 more walks and 27 more HRs. For the Yankees, it was 101-69-25. So the Browns stats look much better.
The Yankees beat the Browns 14 times out of 22 even though the Browns outscored the Yankees 105-100 in those games. It looks like the Yankees won all 8 of the 1-run games between the two teams that year, with 4 in extra innings.
In mid-Sept, the Yankees came to Stl. for the last series between the two teams that year. The Yanks were a half game ahead when the series started and won 2 out of 3. In the last game, the Browns led 2-0 after 7, but NY won it 3-2 with 2 in the 8th and 1 in the 9th. 2 of the 3 runs were unearned as the Browns made 3 errors.
Later, with 2 games for each team left, the Browns were 2 back. But both of them won game 153 and so it was over. But if the Browns had just one more win against NY, they would have been tied with 2 games left.
Update at 7:43 am central time, 8-11-2009: Chris Jaffe had a good discussion of this team last year at October country’s refugees (part 2 of 2)
Saturday, August 8, 2009
Rangers Power Update
(or maybe we should call them the "Power Rangers") With 2/3 of the season having been played, the Rangers have 168 HRs. If we simply increase that by 50%, they would finish with 252, 4th best all time. I first wrote about this issue last May with Texas Rangers On A Pace To Set Power Hitting Records. Obviously the Rangers have tailed off in their power hitting since then. But their isolated power is still .201, which would be tied for 3rd all-time. Their HR% is 4.56. If they finish with that, they would be 2nd.
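The pace calculation is simple enough to sketch in a couple of lines of Python. The function name is mine, and it assumes the team keeps hitting HRs at exactly the same rate the rest of the way.

```python
def project_full_season(current_total, fraction_played):
    """Scale a running total up to a full season at the same pace."""
    return current_total / fraction_played


# 168 HRs with 2/3 of the season played is the same as adding 50%.
projected_hr = round(project_full_season(168, 2 / 3))
print(projected_hr)
```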
But they are only scoring an average number of runs per game (4.84, the league average is 4.83). The reason they are only average in runs per game even though they are doing all this power hitting is that their OBP is only .317 while the league average is .335.
Monday, August 3, 2009
Jim Rice and the Hall of Fame (Revisited)
Bob Ryan of boston.com recently wrote an article called The big picture is that Rice earned his plaque. He takes a swipe at "SABR people." But we're not monolithic and I don't know if all or even most SABR members agree with my views on Rice. But anyway, here is part of what Ryan said and I follow by summarizing some of my past posts on Rice that counter what Ryan says.
"The SABR people are resolutely anti-Rice. They’ve got numbers parsed by the truckload to downplay his impact, and to this I say, “Phooey,’’ or maybe even something stronger. For SABR people refuse to acknowledge the concept of anecdotal evidence when evaluating a ballplayer (no, not you, Bill James). So when I speak of the time Milwaukee manager Alex Grammas confirmed for me that, yes, indeed, he had ordered a sizzling Jim Rice pitched around (like, four straight unhittable balls out of the strike zone) in a sixth-inning, bases-loaded situation, or when fellow inductee Rickey Henderson says, as he did yesterday, that when the A’s had pitchers meetings prior to Red Sox series in the Rice era guys “trembled,’’ they say that’s nice, but irrelevant.
Sorry, it matters.
There was a three-year period from 1977-79 when Rice was The Man in the American League, averaging 41 homers, 127 RBIs, and 206 hits a year. And did you know he had back-to-back seasons (’77-78) of 15 triples? He was a feared - yeah, SABR people, feared - hitter, because he was very content to get a base hit in a key situation. He was, after all, just trying to win the game."
One of my posts was Was Jim Rice A Feared Hitter?. I showed that he did not draw very many intentional walks compared to other top hitters and that players who batted in front of him were not especially helped.
With Jim Rice and the Hall of Fame I showed that his clutch hitting stats, although better than his overall stats, were very close. I wrote "According to retrosheet, with ROB [runners on base], his AVG-SLG were .305 & .509. With RISP [runners in scoring position] he had .308 & .501. These are very close to his overall stats of .298 & .502." In fact, he was more likely to come up with runners on base in Fenway, a good hitter's park. So naturally he would hit better in those situations. He most likely had a disproportionate number of ABs with RISP & ROB in Fenway. His close and late AVG-SLG were .274-.453. So that does not look very clutch.
He was helped by Fenway. His AVG-SLG in home games was .320 & .546 while on the road they were only .277 & .459. I also showed that his RBI-to-GDP ratio was very poor, even below average.
One commenter at Ryan's article mentioned that Rice had a lot of clutch hits in September 1986 when the Red Sox were in a tight divisional race. But his AVG was .310 in Sept. while it was .324 for the whole season. He also grounded into 6 double plays that month. He had 19 for the whole season, so he had close to 1/3 in Sept. His SLG was .560 in Sept. while it was .490 for the whole season. So he did slug better even if he got fewer hits.
I have also found that Jose Cruz of the Astros may be just as Hall worthy as Rice. Go to Jim Rice vs. Jose Cruz.