Saturday, December 20, 2008

Was Jim Rice A Feared Hitter?

This issue came up on the SABR list this week. Someone suggested that batters in the lineup slot ahead of him were helped by his presence. That is, since pitchers knew Rice was up next, they gave good pitches to the current batter. Did batting in front of Rice actually help anyone? I address this below but first I discuss Rice and intentional walks.

My recollection is that Rice was very feared and very imposing. So many HRs (and so many long ones) were probably the reason. But he finished in the top 10 in IBBs only 3 times in his career (thanks to the Lee Sinins Complete Baseball Encyclopedia): a 5th, a tie for 10th and a tie for 9th. Also, he was only tied for 12th in the AL in IBBs from 1975-89. Here are the leaders:

1 George Brett 187
2 Eddie Murray 131
3 Rod Carew 111
4 Ben Oglivie 95
5 Harold Baines 89
6 Wade Boggs 87
T7 Reggie Jackson 85
T7 Ken Singleton 85
T9 Don Baylor 82
T9 Don Mattingly 82
11 Carlton Fisk 78
T12 Kent Hrbek 77
T12 Jim Rice 77

I would expect a feared hitter to rank higher. There are lots of factors that go into IBBs. Maybe he always had someone good behind him (but these other guys might have, too). The guys ahead of him on this list tend to be lefties or switch hitters; maybe that is the reason. (I think there is another interesting issue here about IBBs, which I address below after I discuss whether batting in front of Rice actually helped anyone.)

As for how batters in front of him did, I looked at 4 seasons using Retrosheet: 1977-79 and 1983, arguably his 4 best years. I threw out 1979 since Rice batted 4th all year, Lynn was pretty much the only No. 3 hitter, and Lynn did not bat anywhere else.

Let's start with 1977, when Rice pretty much batted third. Below are the players who had a significant number of ABs batting both 2nd and in other slots. I show their ABs, AVG and SLG: first their stats batting 2nd (in front of Rice) and then the rest (combining all ABs not in front of Rice).

Doyle (137-.219-.285) (318-.248-.318)
Lynn (364-.253-.453) (133-.278-.428)

Now 1978 (Rice was pretty much 3rd)

Burleson (76-.197-.263) (550-.255-.349)
Lynn (80-.275-.463) (461-.302-.497)
Remy (418-.280-.349) (165-.273-.352)

Now 1983 (Rice was pretty much 3rd)

Boggs (315-.352-.470) (267-.371-.506)
Evans (258-.225-.419) (212-.255-.458)
Stapleton (54-.259-.352) (488-.246-.365)

It does not look like hitters did a lot better in front of Rice than they did elsewhere.

I mentioned the leaders in the AL in IBBs from 1975-89 above. I also just checked the NL. Below are the top 20 in each league. The AL top 20 had only 5 righties while the NL had 10. Also, the top 2 in the NL were righties, while in the AL the highest ranked righty was tied for 9th. Seems like a big difference between the two leagues. It also looks like all 10 righties in the NL top 20 had more IBBs than the highest ranked AL righty (Baylor). Maybe it is just a fluke. My apologies if I mislabeled anyone below. I put an R after the righties and nothing after the lefties and switch hitters.

AL

1 George Brett 187
2 Eddie Murray 131
3 Rod Carew 111
4 Ben Oglivie 95
5 Harold Baines 89
6 Wade Boggs 87
T7 Reggie Jackson 85
T7 Ken Singleton 85
T9 Don Baylor-R 82
T9 Don Mattingly 82
11 Carlton Fisk-R 78
T12 Kent Hrbek 77
T12 Jim Rice-R 77
T14 Cecil Cooper 73
T14 Fred Lynn 73
16 Mike Hargrove 68
T17 Alvin Davis 67
T17 Robin Yount-R 67
19 Bruce Bochte 65
20 Buddy Bell-R 62

NL

1 Mike Schmidt-R 184
2 Dale Murphy-R 141
3 Dave Parker 139
4 Garry Templeton 134
5 Keith Hernandez 127
6 Ted Simmons 124
7 Jose Cruz 123
8 Bill Madlock-R 112
9 Tim Raines 110
10 Jack Clark-R 104
11 Andre Dawson-R 103
12 Steve Garvey-R 100
13 Gary Carter-R 98
T14 Ron Cey-R 96
T14 Pedro Guerrero-R 96
T14 Leon Durham 96
T14 George Foster-R 96
18 Darryl Strawberry 93
19 Ron Oester 92
20 Dan Driessen 91

Monday, December 15, 2008

Maybe Joe Gordon Does Belong In The Hall Of Fame

Gordon had 242 career win shares. Through 2001, that was tied for 334th among all players, including pitchers. But he did miss two seasons, 1944 and 1945, due to the war. In the two previous seasons he had 28 and 31 win shares (although the competition in 1943 was not so good). In 1946 he had only 9; he must have been hurt. In the next two years he had 25 and 24. Suppose we give him 50 for the two years missed. That brings him up to 292, which would be tied for 187th through 2001. Not too bad of a ranking. Good enough for the Hall? I don't know.

But in general, 2B men have an average win shares per PA that is lower than at other positions, even though Win Shares is supposed to allow us to compare players across positions. So Gordon might deserve even more win shares, maybe another 10-20. To see the data on win shares per PA for different positions, go to

http://us.share.geocities.com/cyrilmorong@sbcglobal.net/WSperPA.htm (Update Jan. 10, 2016: Here is the new, correct link   http://cyrilmorong.com/WSperPA.htm)

If we do give him all these extra win shares he gets close to the top 150 through 2001. I really don't know what type of adjustments to make for him, but I guess a good case could be made for him.

One other thing I thought of is that he was a right-handed batter in Yankee Stadium. Win Shares uses runs created to get offensive value. Runs created are adjusted for park effects but, to the extent that I understand them, no adjustment is made if a park favors lefties over righties. Gordon hit 69 HRs in home games at Yankee Stadium and 84 in road games; you would expect more at home. In his Cleveland years, he had 50 both home and away. Perhaps, on balance, over his career, he was hurt by his parks.

I was a little surprised by his selection (and that he was the only one selected). But it may be okay. Joe McCarthy said Gordon was the best all-around player he ever saw.

Sunday, December 7, 2008

Two Follow Ups: Underpaid Second Basemen And What Happens When Players Cut Down On Strikeouts

A recent report called Increase in MLB salary slowed in 2008 shows that only relief pitchers get paid less than second basemen. Here is the key excerpt:

"Among regulars at positions, designated hitters had the highest average at $7.5 million, followed by first basemen ($7.1 million), third basemen ($6.6 million), shortstops ($5 million), outfielders ($4.8 million), catchers ($3.7 million), second basemen ($3.5 million) and relief pitchers ($1.9 million)."

So second basemen are only half as valuable as designated hitters? Hard to believe. A few months ago I posted a study called Have Second Basemen Been Underpaid?. I found in regressions that, holding hitting performance constant and accounting for free agent/arbitration status, being a second baseman had a negative effect on salaries.

For the other issue: two weeks ago, I posted Should Ryan Howard Try To Strikeout Less?. The basic idea was that from year to year, there was a positive correlation between a player's change in strikeout frequency and his change in contact average.

But a commenter named Vince at the Sabernomics blog said:

"Could this just be a selection effect? If your strikeout rate rises and your contact rate falls, then you might get benched and not show up in the sample."

My response was:

"There were 267 players in 2005 who had 300+ ABs. 200 of them also had 300+ ABs in 2006. So it is possible that those 67 who did not make it to 300 in 2006 were benched for poor performance (which would include a low contact average).

But I took those 67 guys and found the ones who had at least 100 ABs in 2006 (I think anything less is a small sample size). That left 41 guys. The correlation between their change in strikeout frequency and change in contact average was .037. So it was still positive for the ones who were “selected out” but not as strong an effect."

Sunday, November 30, 2008

Do The Best Hitters Strikeout More Than Other Hitters (And Has This Changed Over Time)?

I found the correlation between strikeout frequency and offensive winning percentage (OWP) decade by decade. I started with the NL from 1910-1919 and the AL from 1913-19 (there was a period after 1900, before this, when strikeouts for batters were not compiled). I used players with 2000+ PAs in each decade or time period. Strikeout frequency was calculated two ways, per PA and per AB. So the table below has the correlation between OWP and strikeout frequency for each period: the PA column shows the correlation between OWP and strikeouts per PA, and the AB column does the same for strikeouts per AB.



There seems to be quite a bit of fluctuation over time, and I don't think I have any good reasons why. Most of the time the correlation is positive, meaning that the better hitters usually strike out more than average. With guys like Ryan Howard and Adam Dunn around, I am surprised that the correlations have come down since 1980 and that they are not as high today as they were in the 1960s and 1970s.

It is also interesting that the 1930s were much higher than the periods right before and after. Same for the 1960s and 1970s. The table below shows the top ten batters in OWP for the 1930s and their strikeout rates.



The simple averages of the two strikeout rates for these ten were 7.87% and 9.29% while the rates for the entire group in the 1930s were 6.94% and 7.79%. So the very best hitters struck out a lot more than average then.

The next table shows the top ten in strikeouts per AB from the 1930s. The simple average of the OWP of these players was .631: Ruth was over .800, Foxx and Greenberg were over .700, and three others were over .600.



In the AL of 1913-19, the top ten in OWP had strikeout rates of 5.81% and 6.75% while the averages for the whole group were 6.97% and 7.99%. So in this period and league, the best hitters struck out a lot less than average.
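For anyone who wants to replicate the table, here is a minimal Python sketch of the calculation for one period; the DataFrame and its columns (owp, so, pa, ab) are hypothetical names of mine, not from any particular data set:

```python
import pandas as pd

def owp_strikeout_corrs(hitters: pd.DataFrame) -> tuple[float, float]:
    """Correlation of OWP with SO/PA and with SO/AB for one decade's
    qualifying hitters (2000+ PA in the period)."""
    so_per_pa = hitters["so"] / hitters["pa"]
    so_per_ab = hitters["so"] / hitters["ab"]
    return hitters["owp"].corr(so_per_pa), hitters["owp"].corr(so_per_ab)
```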

Sunday, November 23, 2008

Should Ryan Howard Try To Strikeout Less?

You might think so. In both 2006 and 2007 he led the major leagues in "contact average," which I define as hits divided by (AB - K + SF). His contact average in 2006 was .448 and in 2007 it was .421 (although it fell to .372 in 2008). And he strikes out about 190 times a year. So more contact would mean more hits, right? Maybe, maybe not. I looked at this issue a few years ago in Strikeouts and the value of hitters.
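In code, the definition is simply (argument names are mine):

```python
def contact_average(h: int, ab: int, k: int, sf: int) -> float:
    """Batting average on contact: H / (AB - K + SF)."""
    return h / (ab - k + sf)
```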

Generally I found that when batters cut down their strikeout rates from year to year, they hit better. But I also found the effect was slight. Here is an excerpt:

"Using the data from the 2002-3 seasons, I ran a regression with change in AVG being the dependent variable and change in strikeouts per AB being the independent variable.

The equation was:

AVGChange = -.00036 - .274*(SO/AB)Change

This means that if a player cut his strikeouts down by 100, his hits would go up by 27.4. That is like saying on his additional ABs when he does not strikeout, he bats .274. This may not be impressive because for all of these players over the 2002-3 seasons, they already bat about .336 when they don't strikeout. Also, the r-squared was only .068, meaning that the regression explains only about 6.8% of the variation in AVGChange. So if there is any negative side to striking out, it is probably not too large."

This got me to thinking about what happens to batters' contact averages when their strikeout rates change (something I had not looked at in this earlier study). I found all the hitters in baseball who had 300+ ABs in both 2006 and 2007 (190 players). Then I calculated their strikeout rates (K/AB), their contact rates, and how each one changed from 2006 to 2007. The correlation between the change in strikeout rate and the change in contact rate was .142. So if a batter's strikeout rate increased, his batting average while making contact also tended to increase. Looking at the changes from 2005 to 2006 gave a .18 correlation.
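Here is a rough Python sketch of that year-to-year calculation; the seasons table and its columns (player, year, h, ab, k, sf) are my own assumption about how the data might be laid out, not the actual study's format:

```python
import pandas as pd

def change_correlation(seasons: pd.DataFrame, y1: int, y2: int) -> float:
    """Correlate each player's change in K/AB with his change in
    contact average between seasons y1 and y2 (300+ AB rows only)."""
    df = seasons.copy()
    df["k_rate"] = df["k"] / df["ab"]
    df["contact"] = df["h"] / (df["ab"] - df["k"] + df["sf"])
    a = df[df["year"] == y1].set_index("player")
    b = df[df["year"] == y2].set_index("player")
    both = a.join(b, lsuffix="_1", rsuffix="_2", how="inner")
    return (both["k_rate_2"] - both["k_rate_1"]).corr(
        both["contact_2"] - both["contact_1"])
```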

Maybe this makes sense: if you swing harder, you strike out more, but a harder swing means the ball is hit harder, which should mean more hits. So, combined with the earlier study, a player should be careful if he thinks he should make a big effort to strike out less.

Sunday, November 9, 2008

Which Players Had The Most Uncharacteristically Good Seasons? (adjusted for their age)

I did this last week but did not adjust for age. The key stat I used is offensive winning percentage, so read last week's post to understand it. The idea is to find out which player had a season that deviated the most from his norm or career average. But I did not take age into account: player performance improves, then peaks, then declines, and the typical peak may come as young as 25. So a player doing 100 points better than his norm at age 25 is not the same as one doing 100 points better at age 38. To find the expected performance at a given age, I found the relationship between age and average OWP at each age, using all players with 15+ seasons of 400+ PAs. That relationship is

OWP = -0.0008*AGE^2 + 0.0474*AGE - 0.0574

This comes from regression analysis, which had an r-squared of .95, meaning that 95% of the variation in an age's average OWP is explained by the equation. The standard error was .008, pretty low. But as Bill James, Phil Birnbaum and probably many others have pointed out, averaging each player's OWP at a given age to predict career trends can have many problems. One is that at the older ages there are not many players left to average, because so many players are no longer good enough to play. If those retired guys had kept playing, the average OWP for ages 39, 40, etc. would be much lower. So this equation will underestimate how unusual some seasons might have been for older players.

To predict a player's OWP at a given age, the above equation is used, but an adjustment is also made based on his career norm. The average OWP by age for the group was .588. If a player had a .550 career OWP, then at any age his predicted OWP is adjusted down by .038 (a player with a career OWP of .638 would have each predicted OWP upped by .050). Once that was done, I found the 50 top seasons in terms of OWP above the prediction; the table below shows them. For example, Tommy Tucker in 1889 had an OWP of .783 at age 25. The equation predicts that he would have an OWP of .628. But his career OWP was .495, or .093 below the norm, so his adjusted prediction is .535. Since .783 - .535 = .248, his OWP was .248 better than expected. This was the highest positive difference ever (you will need to click on the table to see a larger version).
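The prediction is easy to compute; this sketch just restates the equation and the career-norm shift from above and reproduces the Tucker numbers:

```python
GROUP_MEAN_OWP = 0.588  # average OWP by age for the 15+ season group

def predicted_owp(age: int, career_owp: float) -> float:
    """Age-curve prediction, shifted by how far the player's career
    OWP sits from the group mean."""
    base = -0.0008 * age**2 + 0.0474 * age - 0.0574
    return base + (career_owp - GROUP_MEAN_OWP)

# Tommy Tucker, 1889, age 25: season OWP .783, career OWP .495
print(round(0.783 - predicted_owp(25, 0.495), 3))  # 0.248
```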

Barry Bonds' 2004 season at age 39 is number 31. His 2002 season is 54th, his 2001 season is 163rd and his 2003 season is 172nd. There were a total of 6319 seasons. So the four Bonds seasons from 2001-04 (ages 36-39) are all in the top 2.7%. He is the only player with 4 seasons in the top 3%.

Sunday, November 2, 2008

Which Players Had The Most Uncharacteristically Good Seasons?

Many fans know that Norm Cash batted .361 in 1961. He also had 41 HRs and 132 RBIs. He never batted .300 again (his last year was 1974), nor did he ever reach 40 HRs or 100 RBIs. Perhaps this is the most atypically good season ever. He clearly performed well above what ended up being his career norms (was it the corked bat mentioned in the ESPN almanac? I recall that physicist Robert Adair said a corked bat would not really help).

Anyway, to study this, I looked at all players with 10+ seasons of 400+ PAs through 2005 (there were 504 players). I found the simple mean of their yearly offensive winning percentage, or OWP (a Bill James stat that says what a team's winning percentage would be if all 9 batters were identical and you gave up an average number of runs). Since I used data from the Lee Sinins Complete Baseball Encyclopedia, OWP is also park adjusted. Then I subtracted that mean from their best year. The following table shows the top 25 in terms of best minus average OWP. Cash's 1961 season was 25th. Another table follows that only looks at seasons since 1920.



Sunday, October 26, 2008

Another Look At Consistency

Last week I had a post on which players had the most consistent careers. One measure I used was a player's yearly standard deviation in offensive winning percentage, which I then divided by the mean, thinking that high-OWP hitters would fluctuate more. But Gerry Myerson suggested on the SABR list that with an upper bound on OWP of 1.000, the best hitters won't fluctuate more than the worst. So I redid the list, which you can see if you click here. Actually, this time there are two lists, as there were last week: one ranks everyone just by standard deviation and the other by SD divided by number of years.

Sunday, October 19, 2008

Which Players Had The Most Consistent Careers?

I don't know if anyone has ever proved that consistency has value. But I have compiled two lists which you can see here. I took all the players who had 10+ seasons with 400+ PAs through 2005 (there were 504 players). Then I found the standard deviation of their offensive winning percentage (a Bill James stat that says what a team's winning percentage would be if all 9 batters were identical and you gave up an average number of runs). Since I used data from the Lee Sinins complete baseball encyclopedia, OWP is also park adjusted. Then that SD is divided by the mean OWP. This is necessary because players with high OWPs will see bigger absolute year-to-year fluctuations.

But then I wondered if players with extra-long careers would be penalized. When you get older, your performance can tail off very quickly, and those very low OWPs increase your SD, so you get penalized for longevity. Your career mean OWP also falls, so the SD gets divided by a smaller number, which makes SD/mean bigger and the player look less consistent (the lower the SD/mean, the more consistent). So I created one more list where SD/mean was then divided by the number of years. Hank Aaron, for example, jumped from 112th to 12th.
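Both measures are simple to compute from a player's yearly OWPs; a minimal sketch (names are mine):

```python
import statistics

def consistency(yearly_owp: list[float]) -> tuple[float, float]:
    """Return (SD/mean, (SD/mean)/years); lower means more consistent."""
    cv = statistics.pstdev(yearly_owp) / statistics.mean(yearly_owp)
    return cv, cv / len(yearly_owp)
```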

On the first list (SD/mean), Dom DiMaggio is first. He had only 10 seasons of 400+ PAs. On the 2nd list, (SD/mean)/Years, Mel Ott jumped to first and Dom DiMaggio dropped to 8th. The guys hurt by the 2nd list are the ones who lost years to military service in WW II: they get divided by a smaller number.

Sunday, October 12, 2008

Does Experience Affect Clutch Hitting?

In the Red Sox-Rays game yesterday, one of the announcers mentioned that Dioner Navarro batted .314 this year with runners in scoring position (RISP) while it was only .214 last year. He said that Navarro improved in the clutch due to experience. Maybe, maybe not. His overall average went from .227 to .295, also a big jump. Experience might make you a better hitter overall anyway.

But I had an article published on a similar topic in "By the Numbers," SABR's statistical bulletin several years ago. It was called Clutch Hitting and Experience (I know, not a real creative title). I only looked at one year, but I found that more experienced players did better, relative to their normal performance, in close and late situations than less experienced players. For example:

"ONE THING I DID NOT MENTION IN THE PAPER WAS THAT THE AVERAGE OPS FOR EXPERIENCED PLAYERS (2000 OR MORE PA) IN THE NONCLUTCH WAS .815 AND .808 IN THE CLUTCH, A DROP OF ONLY .007. FOR THE INEXPERIENCED PLAYERS, THEIR NONCLUTCH OPS WAS .792 AND NONCLUTCH WAS .741. A DROP OF .051, MUCH LARGER THAN FOR THE EXPERIENCED PLAYERS. THE DIFFERENCE IN DECLINES IS .044. THAT IS HIGH IN BASEBALL TERMS. THERE IS A RELATIVELY SMALL DIFFERENCE BETWEEN THE TWO GROUPS OF PLAYERS IN THE NONCLUTCH BUT A MUCH LARGER ONE IN THE CLUTCH SITUATIONS."

So it is possible that experience affects clutch hitting. But it was just one study over one year. If you know of any other studies on this, let me know. Also, I have a page called Clutch Hitting Links. There are links to lots of good stories and research. If you know of any that are not listed there, please let me know about that, too.

Sunday, October 5, 2008

Something New In Clutch Hitting? A Couple Of Recent Articles

One was called Analysts: Tough to determine if there is such a thing as clutch by Paul White. In discussing the issue of whether or not some guys are clutch hitters, Reggie Jackson was mentioned. "Mr. October" had an AVG-OBP-SLG of .357-.457-.755 in 27 World Series games (data from Retrosheet). But what about in 45 league championship series games? There he was at .227-.298-.380. Combining the two, he had a .276 AVG and a .521 SLG. Still good numbers but hardly stunning, and why did he hit so poorly in the LCS? Lucky for him his teammates were doing well enough for him to make it into the World Series.

The article also mentioned Derek Jeter. Nike even has a shoe called the "Jeter Clutch." But in his career his AVG in close and late situations is .286 while his overall AVG is .317 (both through 2007). Generally players hit more poorly when it is close and late because they face ace relievers and the pitcher is more likely to have the platoon advantage. But his differential is probably bigger than normal.

To read lots of other good articles on clutch hitting go to Clutch Hitting Links. One thing that is important to ask when we talk about clutch hitting is do teams make personnel moves even partly based on it? Have you ever heard of a team trading a .300 hitter because he hit poorly in the clutch or trading for a .250 hitter because he was good in the clutch?

What about Barry Bonds in the post season? Based on his post-season performances, it appeared that he was a bad clutch hitter until 2002. His averages in the LCS in 1990-2 were .167, .148, and .261. Did Dusty Baker decide to bench him in the 2002 playoffs because he was a bad clutch hitter? No. Obviously Baker, a big league manager, does not buy into clutch. For more on this kind of argument, go to Please, no more clutch hitting statistics!

The other article is called Clutch hitting is no accident. Apparently, Twins manager Ron Gardenhire thinks you can teach it. From this article:

"The Twins' .311 batting average with runners in scoring position is so much higher than any other major league team's — runner-up Baltimore is 24 points behind, at .287 — that it seems like a statistical fluke. In the Twins' case, the manager said, they can shorten their swings, watch for particular pitches, and use the entire field as a target. Under batting coach Joe Vavra, Gardenhire said, every Twins hitter sharpens his run-producing skills every day.

"It's execution — getting them over, getting them in. I think that's definitely a skill," Gardenhire said. "You work at anything long enough, you get a mind-set for what you're trying to do." The Twins didn't have that last season, when they batted .276 with runners in scoring position, 14th best in baseball."

We will have to see if the Twins continue to do so well with runners in scoring position next year. If they do, maybe other teams will adopt what they do and we will see them hit better in these situations, too. But I am not holding my breath. Pitchers might start pitching differently then.

Sunday, September 28, 2008

Have The Angels Been Lucky This Year?

The Wall Street Journal had an article about this recently called Baseball's Luckiest Team. It mentioned some things like their AVG with runners on base and how many more games they have won than expected. It also mentions how well their pitchers have done in stranding runners. But with runners on they allow a .264 AVG, 11th best in baseball, and a .744 OPS, 5th best. They are 10th in AVG allowed with runners in scoring position at .260.

So I decided to do my own analysis. First, I checked to see what their winning percentage should be based on their OPS differential using the equation

Pct = .5 + 1.21*OPSDIFF

The table below shows how teams ranked in wins above those predicted using this formula (which is based on regression analysis I did a few years ago). With an OPSDIFF of .010, the Angels should have a pct of .512, but they actually have .615! So they have won about 16 more games than predicted (over 161 games; I used that for all teams). You can click on the table to see a bigger image. After the table, I present another analysis which takes clutch situations into account.
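Here is the prediction arithmetic in code, reproducing the Angels numbers from above:

```python
def wins_above_predicted(ops_diff: float, actual_pct: float,
                         games: int = 161) -> float:
    """Wins beyond what the OPS-differential regression predicts."""
    return (actual_pct - (0.5 + 1.21 * ops_diff)) * games

# 2008 Angels: OPS differential .010, actual pct .615
print(round(wins_above_predicted(0.010, 0.615), 1))  # ~16.6
```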



I have some research called Does Team Clutch Matter in Baseball? I broke down performance into close and late and non-close and late situations. Then I ran a regression with OPS and opponents' OPS in close and late and non-close and late situations as independent variables explaining pct. Here is the equation:

PCT = 0.501 + 0.918*NONCLOPS + 0.345*CLOPS - 0.845*OPPNONCLOPS - 0.421*OPPCLOPS

Then I predicted each team's pct and how many more games they won than predicted. The table below shows how the teams did and again the Angels are first in terms of doing better than expected. Maybe they are lucky.

Sunday, September 21, 2008

Was Devon White A Good Leadoff Man As A Blue Jay?

Last Sunday one of the announcers on the TBS game (I think it was Buck Martinez) said that Devon White became a good leadoff hitter when he came to the Blue Jays. Let's see if he was a good leadoff man during his Toronto years, 1991-5.

His SLG and OBP were .432 & .327 while the league averages were .406 & .335. So his OBP was below the league average, probably not a good sign for a leadoff man. White did steal 31.11 bases per 162 games with 5.68 CS; the league averages were 13.58 & 6.67 (data from the Lee Sinins Complete Baseball Encyclopedia). So he was a better base stealer than the league average, but got on base less often. Since OBP is probably the most important stat for leadoff men, this is not a good sign.

The average leadoff man during those years in the AL had an SLG & OBP of .391 & .350 with 33.92 SB & 12.78 CS per 162 games. White has a slight edge in stealing due to his better success rate and a higher SLG but a big deficit in OBP. And SLG is not the key to being a good leadoff man.

I explained a fairly complex way of evaluating leadoff men a few months ago. You can read about it at Who are good leadoff men. The basic idea is that hit%, walk%, extra-base-hit%, SB per game and CS per game each has a run value based on which lineup slot you are talking about. As you might guess, walk% and SB% are very important for leadoff men, but less so for cleanup hitters where extra-base-hit% is more important. I had found these run values a few years ago using regression analysis.

Anyway, White's marginal run value as a leadoff man was 1.290 while for the average leadoff man it was 1.292. So he was below average. Not by a lot, but that is not good.

Finally, I had come up with a simple statistical rating for leadoff men earlier this summer in Who Are The Good Leadoff Men?. Here is the gist of it:

It seems obvious: hitters who are fast and get on base a lot. You also probably don't want someone who hits a lot of HRs, since you want those guys to bat with runners on. So I tried to devise a stat that would capture this. Here it is:

(2B + 1.25*3B - HR + SB)/outs

In other words, how many times a player gets into scoring position per out. Since triples are worth about 25% more than 2Bs according to run expectancy tables, I multiply them by 1.25. By dividing by outs, the ability to get on base is taken into account, since if you make an out you don't reach base. Also, outs include caught stealing. By subtracting HRs I am saying that guys who hit a lot of HRs, even though they may have other good leadoff traits, are "penalized" here, since they might be better suited to batting lower in the order.
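In code, the stat is just this (a trivial sketch; argument names are mine):

```python
def leadoff_rating(doubles: int, triples: int, hr: int,
                   sb: int, outs: int) -> float:
    """Times a player gets himself into scoring position per out;
    outs include caught stealing."""
    return (doubles + 1.25 * triples - hr + sb) / outs
```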

Anyway, White ranked 11th among all AL players 1991-5 with 2000+ PAs. Considering that there are 14 teams and each one has a leadoff man, 11th is not that great a rank.

Sunday, September 14, 2008

Cliff Lee vs. Roy Halladay

There was an interesting post on this at Battersbox: Lee vs. Halladay. One thing they mentioned is that the batters Lee has faced this year have a collective OPS of .732 while for Halladay it is .766 (OPS = OBP + SLG).

But how should this difference affect each guy's ERA? I did not see it mentioned or discussed there (my apologies if it was). So I will take a look at this issue.

Based on data from all major league teams from 2001-2004, here is the relationship between OPS and runs per game

R/G = 13.26*OPS - 5.29

For all teams this year it is

R/G = 12.07*OPS - 4.39

If we multiply 13.26 by .034, the difference in the OPS of their opponents, we get .45. If we use 12.07, we get .41. If we add those to Lee's ERA of 2.36, we get 2.81 or 2.77. Halladay is at 2.77. That makes things pretty even.
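Here is that adjustment in code, reproducing the numbers just computed:

```python
def opponent_adjusted_era(era: float, ops_gap: float, slope: float) -> float:
    """Add the run value of the opponent-quality OPS gap to the ERA."""
    return era + slope * ops_gap

gap = 0.766 - 0.732  # Halladay's opponents' OPS minus Lee's
print(round(opponent_adjusted_era(2.36, gap, 13.26), 2))  # Lee: 2.81
print(round(opponent_adjusted_era(2.36, gap, 12.07), 2))  # Lee: 2.77
```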

But what if we look at DIPS ERA, an ERA computed based only on things the pitcher controls himself, like strikeouts, walks and HRs (DIPS means defense independent and was developed by Voros McCracken)? Lee has a 2.85 DIPS ERA and Halladay has 3.06. With the opponent adjustment added to his DIPS ERA, Lee would then jump well above Halladay.

Now we don't know how good the pitchers were that these batters faced. Maybe the batters who have a collective OPS of .732 (the ones Lee has faced) faced unusually good pitchers. Probably not, but we do need to note it. If not, then this analysis gives the edge to Halladay. Halladay came into today with an edge of 14 in IP (224 to 210), and he pitched 7 more today. Given that he will end up with more IP (though maybe not by as much as 21, depending on how much each guy pitches from now on), Halladay has a case for the Cy Young award.

Technical note: the standard error in the OPS/runs regression was about .15 in each case, or about 24 runs a season. Certainly not the best estimators around, but decent, and OPS is the stat at issue here.

Sunday, September 7, 2008

How Good Are Playoff Bound Teams At Preventing Homeruns?

Last week a commentator on a game (I think it was on TBS) said that the White Sox might have problems in the playoffs since they rely on HRs so much and pitchers in the playoffs are good at preventing HRs. So I looked at all the playoff teams in both leagues over the last three years and compared their HR rate allowed (HRs divided by batters faced) to the league average. Over that time, the NL playoff teams allowed about 1.5% fewer HRs than average. The AL teams allowed about 5.3% fewer HRs than average.

How might this impact the White Sox if they make it to the post-season this year? Suppose their season rate of hitting HRs is 1.5 per game (it is not quite that high, but close). Even if that goes down 5.3% in the playoffs, that still leaves them with about 1.42 HRs per game. If a typical HR is worth 1.4 runs (using the linear weights value from Pete Palmer), the White Sox would lose about .112 runs per game (since .08*1.4 = .112). If a typical playoff team hits 1 HR per game, that goes down to .947 a game in the playoffs, costing them about .074 runs per game.
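Here is the arithmetic in code:

```python
HR_RUN_VALUE = 1.4  # Pete Palmer's linear weights value of a HR

def playoff_hr_cost(hr_per_game: float, drop: float = 0.053) -> float:
    """Runs per game lost if the team's HR rate falls by `drop`."""
    return hr_per_game * drop * HR_RUN_VALUE

print(round(playoff_hr_cost(1.5), 3))  # White Sox: ~0.111
print(round(playoff_hr_cost(1.0), 3))  # typical team: ~0.074
```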

Now the difference between what the White Sox lose and what the typical team loses is less than .04 runs per game. Not very big. And the other teams will probably see something that they do better go down more than the Sox do, and suffer a bigger loss (like in stealing or walking or just plain hits; remember that the pitching staffs of playoff-bound teams are probably better than average at things other than just preventing HRs). That brings the two teams even closer together. The Sox's reliance on the HR is not a big deal.

Sunday, August 31, 2008

More On The Changing Historical Relationship Between Walks, HBPs and HRs

What I posted last week was something I posted on the SABR list last year. At that time, someone raised a question about this. Below is the question and how I responded, with a little more research. I think my basic finding is that there are not more HBP these days due to pitchers throwing faster.

"Cyril mentioned that current pitchers seem to be more willing to hit batters than pitchers in the past. How about since a lot more pitchers now pitch the ball around 90 MPH, it's harder for batters to get out of the way. Historically, have the pitchers leading the leagues in HB been hard throwers (more Ks) or poor control pitchers (more BBs)?"

I did some analysis on this although it is not exactly what John Lewis suggests. I took the top 500 pitchers in batters faced (seasonal data) from 1960-69 and 1997-2006. I ran a regression in each case in which the HBP rate was the dependent variable and the strikeout rate and the walk rate were the independent variables. Intentional walks were removed.

Here is the regression equation for the 1960s

HBP = .00387 + .0177*BB + .00186*SO

For the 1997-2006 period it was

HBP = .005 + .0031*BB + .00486*SO

The r-squared in the first case was just .013 and in the second it was .025. The r-squared tells us what percent of the variation in the dependent variable is explained by the model. So it is pretty weak. But the T-values for BBs and SOs in the first case were 2.44 and .44. So the walk rate is statistically significant. For the second period they were 3.32 and 1.13.

In the first period, a one standard deviation increase in BB rate increased HBP rate .000392. For the strikeout rate it was .00007. So if a pitcher increases his walk rate he increases his HBP rate more than if he increases his SO rate. For the second period these numbers were .00065 and .00022. So again, the walk rate has a bigger impact.

So all of this suggests that it is worse control in general that increases the HBP rate.
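For anyone who wants to replicate this kind of regression, here is a rough statsmodels sketch; the pitcher-season DataFrame and its per-batter-faced rate columns (hbp_rate, bb_rate with intentional walks removed, so_rate) are hypothetical names of mine:

```python
import statsmodels.api as sm

def hbp_regression(df):
    """Regress HBP rate on walk and strikeout rates; also report the
    effect of a one-standard-deviation increase in each rate."""
    X = sm.add_constant(df[["bb_rate", "so_rate"]])
    fit = sm.OLS(df["hbp_rate"], X).fit()
    sd_effects = fit.params[["bb_rate", "so_rate"]] * df[["bb_rate", "so_rate"]].std()
    return fit, sd_effects
```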

*********************

Now another response to that question

The other day I discussed a regression relating HBP, BBs and SOs. I did that again but added in HRs, with the idea that a pitcher might be more likely to hit a guy who hit a HR last time up (or the next guy). I again looked at both the 1960s and the last 10 years. Skipping the regression details (except to say the coefficient values and the r-squared values did not change much), the interesting thing I found was that HRs had a negative relationship with HBP in the 1960s but a positive one in the last 10 years. So in the 1960s, a pitcher who gave up more HRs hit fewer batters, but today a pitcher who gives up more HRs hits more batters.

Having an increase in HR% of .01 over 1000 batters faced reduced HBP in the 1960s by about .23. In the last 10 years, they went up by .33. A 1 standard deviation increase in HR% in the 1960s decreased HBP by .15. In the last 10 years it increased HBP by .24 (again, over 1000 batters). The standard deviation of HR% in the 1960s was .0066. In the last 10 years it was .0075.

The T-value on HRs was not significant for either time period. But maybe the difference in their coefficients could be. Anyone know if you can look at two different regressions and run some kind of a test to see if the difference between coefficients from the regressions is significant?

I ran a regression which combined the two periods. There was a dummy variable for time period. It indicates that pitching in the last 10 years instead of the 1960s, holding everything else constant, means 2.5 more HBP per 1000 batters faced. The T-value was 8.98. In other words, highly significant.

I also ran a regression with the dummy variable where the dummy was multiplied by each of the other variables (HRs, BBs, SOs). In this case the dummy for time period was just about zero and not significant. The value of the HR*dummy coefficient was .055 (although the T-value was just 1.53, and about 2 is usually needed for significance). I think the .055 value means that any given increase in HR% in the last 10 years would make the HBP rate go up by .055 more than in the 1960s. So over 1000 batters faced, if your HR% goes up by .01 (say you give up 10 more HRs), you would hit .55 more batters in the last 10 years than you would have in the 1960s.
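Here is a sketch of that pooled regression with the dummy interacted with each variable (a Chow-style test), using the same hypothetical column names as above plus hr_rate and a 0/1 modern dummy for 1997-2006:

```python
import statsmodels.formula.api as smf

def era_interaction_fit(df):
    """Pool both periods; `modern` is 1 for 1997-2006, 0 for 1960-69.
    The hr_rate:modern term asks whether the HR effect changed."""
    fit = smf.ols("hbp_rate ~ (bb_rate + so_rate + hr_rate) * modern",
                  data=df).fit()
    return fit.params["hr_rate:modern"], fit.tvalues["hr_rate:modern"]
```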

Monday, August 25, 2008

The Changing Historical Relationship Between Walks, HBPs and HRs

Since I posted something on HBP's last week, I thought I would post a couple of items that I put on the SABR list last year. Here they are.

As many of you probably know, the HBP rate has been on a general increase for many years (since about 1980). But one thing that could account for it is that pitchers have poorer control than they used to (I am not saying that they do, just that it could be a reason for the rise in HBP rates). So I thought that it might be useful to look at the HBP-to-walk ratio over time. I created 4 graphs and they are at

http://www.geocities.com/cyrilmorong@sbcglobal.net/HBPWalks.doc

There are two graphs for each league. The first uses all walks in the HBP-to-walk ratio and the second excludes intentional walks (they were not officially recorded until 1955). I also started the NL in 1897 or so because it did not look like all of the HBP were recorded before then. The file is a Microsoft Word file, so when you click on it you might be asked to open it in that program. You will have to say yes.

Both leagues were around .16 about 1900. That is, there were 16 HBP for every 100 walks. But by around 1940 or so, it was 4 (or fewer) HBP per 100 walks. For both leagues, the rate has been rising since 1980. This suggests to me that the higher HBP rates these days are not due to poor control. There may be other issues involved, though, so we might not be able to conclude that for sure.

************************************************

Yesterday I discussed the HBP rate relative to the walk rate and how HBP/Walks has risen over time. But I also thought about how HRs might affect this. If a player hits a HR, the pitcher might want to pitch inside more to that player or anyone else on that team. This could lead to more HBP. Maybe even sometimes pitchers intentionally try to hit someone because of HRs. So I looked at HBP/HR over time. Since 1920, in both leagues, the rate has pretty much stayed under .5. But, of course, control is an issue, too. So I figured out the non-intentional walk rate each season since 1955 for both leagues and then the historical average from 1955-2006 in both leagues.

For each league/season, I then divided the non-intentional walk rate by the average over the 1955-2006 period. If a league/season had a rate that was 10% higher than the historical average, it got a 1.10, and the HBP/HR rate for that league/season was divided by 1.10. So I deflate the HBP/HR rate by 10%, since that league/season's pitchers had control that was 10% worse than average, which could partly account for a higher HBP/HR rate. I did that for all league/seasons. The new number is called the adjusted HBP/HR rate. I graphed this for each league since 1955. The two graphs are at

http://www.geocities.com/cyrilmorong@sbcglobal.net/HBPHR.doc

The file is a Microsoft Word file so when you click on it you might be asked to open it in that program. You will have to say yes.
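Here is that deflation step as code, assuming a hypothetical table of league/seasons with columns hbp, hr, nibb (non-intentional walks) and bf (batters faced):

```python
import pandas as pd

def adjusted_hbp_per_hr(df: pd.DataFrame) -> pd.Series:
    """Deflate each league/season's HBP-per-HR by its control index:
    the non-intentional walk rate relative to the 1955-2006 average
    (an index of 1.10 means control 10% worse than average)."""
    nibb_rate = df["nibb"] / df["bf"]
    control_index = nibb_rate / nibb_rate.mean()
    return (df["hbp"] / df["hr"]) / control_index
```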

What I see here is that if you adjust for HRs and control (as measured by the walk rate), pitchers today seem pretty willing to hit batters. Does this mean that they are willing to pitch inside and that the high HBP rate is a side effect of that? We probably can't tell for sure since we don't have stats on how many pitches are thrown inside. But certainly pitchers today are willing to hit batters. In the AL, each of the last 6 seasons is above the historical average of my adjusted HBP/HR rate (which is about .27), and 5 of the top 6 seasons in the unadjusted HBP/HR rate were from 2001-05; 2006 was the 11th highest.

In the NL, the historical average of the adjusted HBP/HR rate is also about .27 and each of the last 6 years is above that. 6 of the 10 highest unadjusted HBP/HR rates were from 2001-06. One of the reasons I looked into this issue is that it came up at the most recent SABR convention. There was a panel on St. Louis baseball, and the former players all said that pitchers today don't pitch inside enough, that they leave the ball out over the plate too much, and that they are reluctant to hit, or be aggressive with, guys who are hitting HRs. Based on what I have done, this does not seem to be true.

Sunday, August 17, 2008

Are Good Pitchers More Likely To Hit Batters Who Hit Them Well?

I started wondering about this after last week's post on whether or not HR hitters are more likely to get hit by the pitch in recent times than they were in the 1950s and 60s. I took the top 10 in wins from 1960-69 and from 1998-2007. Then, for each pitcher, I found the correlation between the HBP% of the batters he faced and their HR%, OPS and SLG against him. For HBP% the formula was HBP/(HBP + AB); the other stats are calculated normally. My table below shows only 5 pitchers from the last 10 years since only 5 of them had 30+ batters with at least 50 ABs against them (those were the cutoffs I used). The data comes from Retrosheet. You can click on the table to see a bigger version. Note that a batter's record against a pitcher also includes cases outside the specified period; it covers their entire careers.

There may not be a lot to learn here. Some guys have negative correlations and many are very low. The two who stand out are Bunning and Mussina. According to the Lee Sinins Complete Baseball Encyclopedia, Bunning hit 160 batters while the average pitcher would have hit only 90. Relative to the league average, he was the 3rd most likely to hit a batter among 1960s pitchers with 1000+ IP.

Mussina is very interesting. In his career he hit only 52 batters while the average pitcher would have hit 125. He was the 10th least likely to hit a batter relative to the league average in the last 10 years. Yet he has very high correlations on OPS & SLG. It seems like if a guy hit Mussina well, he was more likely to get hit by him. Yet Mussina has been very good at not hitting people in general. Has he been selectively and intentionally hitting certain guys? Of the 39 batters who have 50+ ABs against Mussina in the last 10 years, only 10 have been hit at least once. But their collective AVG against him is .321 (again, that is for their whole careers, not just the last 10 years). The other batters combined for only a .253 AVG. Getting back to the 10 who have been hit: they have collectively slugged .541 in their careers against Mussina.

Saturday, August 9, 2008

Do Sluggers Get Hit By The Pitch More Than They Used To?

I found the correlation between HR frequency and HBP frequency for each decade since the 1950s. In one case the denominator was AB + HBP; in the other it was AB + BB + HBP. Here are the correlations for the first case, decade by decade:

1950s: 0.029
1960s: 0.119
1970s: 0.088
1980s: 0.222
1990s: 0.186
2000s: 0.170

Now for the second measure.

1950s: 0.022
1960s: 0.101
1970s: 0.072
1980s: 0.220
1990s: 0.173
2000s: 0.128

The correlations are higher in the 80s, 90s and 2000s, meaning players who hit HRs more frequently were more likely to get hit by a pitch than in the 50s, 60s and 70s. So when old-timers tell you something like "if you hit a HR off Bob Gibson, next time you got brushed back or put on your butt," don't believe them. If that kind of thing was going on so much, there would have been more hit batters (some of those brushbacks would be a little off the mark, so the pitch would hit you, not just come close), and the correlation between HR hitting and getting hit would have been higher back in those days. But it is higher now.

In fact, hitting a HR in the 1990s increased your chances of getting hit a lot more than hitting a HR in the 1960s did. Here is the regression equation from the 1960s:

HBP% = 0.0311*HR% + 0.0058

Now for the 1990s

HBP% = 0.0573*HR% + 0.0065

Since .0573/.0311 = 1.84, hitting a HR in the 1990s was about 84% more dangerous than it was in the 1960s. And the T-value on HR% in the 1990s was significant (2.84) while it was not significant in the 1960s (1.52).
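A sketch of one of these decade regressions (the per-player rate arrays are hypothetical inputs; requires scipy):

```python
from scipy import stats

def hbp_on_hr(hr_pct, hbp_pct):
    """Regress HBP% on HR% for one decade's players.
    The post reports slopes of .0311 (1960s) and .0573 (1990s)."""
    fit = stats.linregress(hr_pct, hbp_pct)
    return fit.slope, fit.intercept, fit.rvalue
```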

Sunday, August 3, 2008

Predicting 2nd Half Winning Pct With First Half OPS Differential and Winning Pct

Earlier in the season, I did a post on which teams had the best OPS differentials. So I thought it might be interesting to see what has a higher correlation with second half (actually post all-star) winning pct: first half (actually pre all-star) winning pct or first half OPS differential? Using data from ESPN, here are those correlations for the years 2000-2007. The first number is for first half pct and the second is for first half OPS differential.

2000: 0.384  0.498
2001: 0.384  0.340
2002: 0.708  0.669
2003: 0.612  0.708
2004: 0.625  0.607
2005: 0.327  0.297
2006: 0.226  0.237
2007: 0.444  0.361

Interesting that the correlations were much higher in 2002-4. First half pct has the higher correlation in 5 of the 8 years, but the average correlation for first half winning pct is 0.46375 while for first half OPS differential it is 0.46463. So, overall, a very slight edge for OPS.
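In code, reproducing those averages from the eight seasons listed above:

```python
pct_corrs = [0.384, 0.384, 0.708, 0.612, 0.625, 0.327, 0.226, 0.444]
ops_corrs = [0.498, 0.340, 0.669, 0.708, 0.607, 0.297, 0.237, 0.361]
print(sum(pct_corrs) / len(pct_corrs))  # 0.46375
print(sum(ops_corrs) / len(ops_corrs))  # 0.464625
```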

I expected a bigger edge for OPS since it gives a good idea of a team's performance, while pct can be more affected by luck in a short time span. Maybe pct also reflects how good the closer or bullpen is, and that carries over from half to half.

Sunday, July 27, 2008

The Best "Leadoff" Hitters Since 1951

A few weeks ago I had a post called Who Are The Good Leadoff Men?. In the latter part of that post I explained the ranking system that is used here. To summarize, it takes into account the ability to get walks, hits and extra-base hits, and stealing. It also takes into account the value that would be lost if the hitter batted elsewhere in the lineup. The complete list can be found if you click here. The top 25 are below. I looked at all players who had 4,000+ PAs since 1951, the earliest year that both leagues began continuously keeping track of caught stealing. I did not adjust stats for park effects or league average.

Rank Player
1 Ted Williams
2 Barry Bonds
3 Wade Boggs
4 Frank Thomas
5 Eddie Yost
6 Edgar Martinez
7 Jason Giambi
8 Mickey Mantle
9 Todd Helton
10 Dave Magadan
11 Joe Cunningham
12 John Olerud
13 Mike Hargrove
14 Jim Thome
15 Gene Tenace
16 Lance Berkman
17 Brian Giles
18 John Kruk
19 Ken Singleton
20 Stan Musial
21 Richie Ashburn
22 Mark McGwire
23 Albert Pujols
24 Manny Ramirez
25 Gene Woodling

Sunday, July 20, 2008

Do Fast Players Hit Fewer 2Bs and 3Bs With A Runner On First?

If a fast player hits a ball hard and/or far down the line or into the gap and there is a runner on first, that runner, if he is slow or at least not fast, might hold up at 3B, and the fast batter will have to hold at 2B. Had there been no runner on, he might have hit a triple. A similar story could be told for doubles. So do fast players hit fewer 2Bs and 3Bs if there is a man on first base?

First, I identify the fastest players using the triple-to-double ratio. Triples alone are not good enough since some fast players don't hit the ball often enough, or far enough, to get triples. But by using this ratio we are looking at long hits where the batter has a chance to turn a double into a triple, and fast players will do this more than slow players.

The top 15 in this ratio from 2005-2007 with 1200+ PAs were

Dave Roberts 0.5926
Jose Reyes 0.5111
Curtis Granderson 0.4667
Juan Pierre 0.4533
Ichiro Suzuki 0.4444
Carl Crawford 0.4444
Chone Figgins 0.3333
Jimmy Rollins 0.3306
Luis Castillo 0.2830
Rafael Furcal 0.2791
Orlando Hudson 0.2644
Nick Punto 0.2632
Omar Vizquel 0.2500
Willy Taveras 0.2444
Mark Teahen 0.2346

I did not include Kenny Lofton since he is not currently playing and I could not get the data for him that I use below. He was 7th.

With no runner on first base, this group of players had a 2B rate of 4.491% and a 3B rate of 1.57%. With a runner on first base, those rates were 4.235% and 1.68%. So they hit triples more often with a man on first base than without, but doubles less often: the 2B rate was 6% lower. This group of players hit 254 doubles with a runner on first over this three-year period; 6% of that is only about 15, which works out to just about .33 fewer doubles per player per year. The triple rate was about 7% higher with a runner on first base. The group hit 101 3Bs over the three years with a runner on first; 7% of that is about 7, or only about .15 extra triples per player per year. So, all in all, fast players hit about the same number of doubles and triples with a runner on first base as they do with no runner on first.
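Here is the per-player arithmetic in code (the post rounds the intermediate totals to 15 and 7, giving .33 and .15):

```python
doubles_on_first = 254   # group 2Bs with a runner on first, 2005-07
triples_on_first = 101   # group 3Bs with a runner on first
player_years = 15 * 3    # 15 players over 3 seasons

lost_2b = doubles_on_first * 0.06 / player_years    # 2B rate ~6% lower
gained_3b = triples_on_first * 0.07 / player_years  # 3B rate ~7% higher
print(round(lost_2b, 2), round(gained_3b, 2))  # ~0.34 and ~0.16 per player-year
```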

Monday, July 7, 2008

The most indispensable seasons

I saw someone mention this idea once, but I can't remember where or what came of it. The question is which player seasons were the most indispensable, that is were most vital or necessary to their team making the post season or coming in first place?

I calculated this by subtracting from a player's Total Player Rating (TPR) the number of games his team finished ahead of the team behind them. If a guy had a TPR of 7 and his team came in first by 1 game, he gets a 6. TPR tells us how many more games a team would win if an average player at a given position were replaced by the player in question; an average player has a zero TPR. It takes fielding, hitting and base stealing into account. (The numbers in parentheses are the player's actual TPR and how many games ahead of the next team his team finished.)

Bonds-2002-8.2 (11.7-3.5)
Yount-1982-6.3 (7.3-1)
Boudreau-1948-6.1 (7.1-1)
Yastrzemski-1967-5.9 (6.9-1)
Schmidt-1980-5.9 (6.9-1)
AROD-2000-5.8 (6.8-1)
Brett-1985-5.7 (6.7-1)
Ruth-1926-5.5 (8.5-3)
Boggs-1988-5.4 (6.4-1)
Mays-1962-5.2 (6.2-1)

So Bonds is at the top again. The Giants finished 3.5 games ahead of the Dodgers for the wild card, and Bonds had a TPR of 11.7. I have not looked at pitchers very carefully, but Ron Guidry had 6.4 pitching wins in 1978 and the Yankees beat the Red Sox by just one game, so he would get a 5.4. Also, as I scanned the seasons from 1900-1919, there did not seem to be many close races; a player could have a great year, but if his team easily came in first, he would not rank very high here. I did not look at pre-1900. I used the latest Baseball Encyclopedia by Pete Palmer and Gary Gillette.

Here are the Sept/Oct regular season numbers (AVG-OBP-SLG) for the guys Retrosheet has the numbers for:

Bonds .362-.614-.681 (MVP)
Yount .341-.404-.563 (MVP)
Boudreau (MVP)
Yaz .417-.504-.761 (MVP)
Schmidt .294-.366-.677 (MVP)
AROD .241-.363-.565 (3rd in MVP)
Brett .261-.358-.512 (2nd in MVP)
Ruth (was not eligible for MVP)
Boggs .423-.551-.536 (6th in MVP)
Mays .337-.437-.673 (2nd in MVP)

Yaz was simply sensational in 1967. Maury Wills won the NL MVP in 1962, edging Mays 209-202. But Mays beat him in TPR 6.1 to 2.7. The leagues gave out the awards in the 1920s and the rules said you could only win once. Ruth won in 1923, so he was not eligible in 1926.

Saturday, June 21, 2008

Who Are The Good Leadoff Men?

It seems obvious: hitters who are fast and get on base a lot. You also probably don't want someone who hits a lot of HRs, since you want those guys to bat with runners on. So I tried to devise a stat that would capture this. Here it is:

(2B + 1.25*3B - HR + SB)/outs

In other words, how many times a player gets into scoring position per out. Since triples are worth about 25% more than 2Bs according to run expectancy tables, I multiply them by 1.25. By dividing by outs, the ability to get on base is taken into account, since if you make an out you don't reach base. Also, outs include caught stealing. By subtracting HRs I am saying that guys who hit a lot of HRs, even though they may have other good leadoff traits, are "penalized" here, since they might be better suited to batting lower in the order. But I also ran the numbers without subtracting HRs (the correlation between the two different formulas for all the players in the study was about .86). The table below shows the top 15 from 2007 among players with 400+ PAs using both methods.



The players in the top 15 are probably not big surprises. But are they really that great at being leadoff men? Do they increase team runs by batting leadoff compared to anyone else? To try to answer these questions, I turned to some analysis I did on lineups two years ago. You can read those articles here and here. In that research, I studied the impact on team scoring of what each slot in the lineup did. In the latter of those two articles, team runs per game was the dependent variable in a linear regression, while walk%, hit%, extra-base%, SB per game and CS per game were the independent variables. The regression found a run value for each event in each lineup slot.

I plugged in the values for those events for Jose Reyes in the number one slot to see what impact he would have on team runs per game. But I also did the same for Adam Dunn, a player you probably would not think of as making a good leadoff man. In my rankings above, he is 208th out of 216 players. In fact, I tried both Reyes and Dunn in the leadoff slot and both in the cleanup slot. The table below shows their relevant stats and the run values for each lineup slot.



If Reyes bats first, his numbers combine to make 1.326, while if Dunn bats 4th we get 1.453 (the regression had an intercept or constant equal to about -5, so to get a number for team runs per game I would have to plug in numbers for all slots, multiply things out, then subtract 5; the numbers here are just individual contributions). So those two add up to 2.779. But what if Dunn batted first and Reyes batted 4th? Dunn gets 1.521 and Reyes gets 1.306 for a total of 2.827. That is actually better than having Reyes bat first and Dunn 4th: your team would score .0485 more runs per game, or about 7.86 more per season. The reason it happens this way is that Dunn walks more (101 vs. 30) and, as the links above show, the run value for walks is highest for the leadoff slot.
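Using just the four slot contributions given above, the comparison in code (the post, carrying unrounded values, gets 7.86):

```python
# Individual slot contributions from the regression, as given above.
impact = {("Reyes", 1): 1.326, ("Reyes", 4): 1.306,
          ("Dunn", 1): 1.521, ("Dunn", 4): 1.453}

conventional = impact[("Reyes", 1)] + impact[("Dunn", 4)]  # 2.779
flipped = impact[("Dunn", 1)] + impact[("Reyes", 4)]       # 2.827
print(round((flipped - conventional) * 162, 1))  # ~7.8 runs per season
```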

Now if a team really tried this, Dunn might not get walked so much since he won't be as big a threat batting with the bases empty. But if the guys right behind him don't have much power and since he is not fast, they might walk him more. Reyes might not get as many extra base hits since some of his triples and doubles are a result of speed and with runners on base he might have someone clogging the bases. I looked at his career stats on that and the results are mixed. It is also possible that Dunn would not score on hits that would have scored Reyes and since some of Reyes' doubles are a result of speed more than hitting distance, his doubles might drive in fewer runs than Dunn's doubles. Would it make a 7.86 run difference over the course of a season? Maybe, but even if it did, it is still interesting that batting Dunn first and Reyes fourth, instead of vice-versa, does not seem to hurt scoring that much, even though Reyes is rated far better as a leadoff man by my measure (which seems to make some sense).

I also tried using David Pinto's optimal lineup finder, which is based on my lineup research. I set a lineup with Reyes batting first and Dunn 4th, then used Retrosheet data to fill in the rest of the lineup with the OBP & SLG of each lineup slot for the NL in 2007. This tool has two methods, each based on one of my two separate lineup studies, which used different years. Having Reyes batting first and Dunn 4th, with everyone else being league average for their slot, generated 4.93 to 4.94 runs per game. The tool in each case did find that Reyes should bat leadoff. In one case it had Dunn batting 4th, which generated 5.04 runs per game; in the other it had Dunn 2nd, for 4.99 runs per game. In the two cases where I had Dunn first and Reyes 4th, the runs per game were 4.90 and 4.93 (as stated above, the reverse yielded 4.93 and 4.94).

Now that model did not include stealing. But again, even though Reyes batting first and Dunn 4th does better than vice-versa, it is not by much. If stealing were included in Pinto's tool, the difference would be bigger. But recall that in the model with things broken down by hits, walks and extra bases, Dunn batting first did better.

Then I ran a simulation using the Star Simulator. I plugged in all the numbers for each lineup slot, again using Retrosheet data (2007 NL). The simulation had the average team scoring about 754 runs per season, about 2% less than in real life. But it also had about 2% fewer ABs (maybe because it only does offense and does not have extra inning games). Then I put Reyes first and Dunn 4th: the team scored 793.8 runs per season. Reversed, it was 791.5. So the difference, although in favor of Reyes batting first, is only about 2 runs over a season. Having Reyes bat first with an average cleanup hitter, it was 773.36. With Dunn batting leadoff (and an average cleanup hitter) it was 788.27! So having Adam Dunn instead of Jose Reyes as your leadoff hitter would mean about 15 more runs per season.

If you go back to my earlier analysis with the second table, if we just multiply out the impact of Dunn batting first and Reyes batting first, we get 1.52 for Dunn and 1.33 for Reyes. Over 162 games, that difference of about .19 is about 31.6 runs!

This all seems to be about tradeoffs: getting on base versus speed, and having a high-OBP guy bat leadoff versus losing his power if he batted in the middle of the lineup. I am looking for a way to incorporate all those factors to find the optimal leadoff man. So I tried one more thing. I calculated each guy's impact batting leadoff (the way I did using the second table), so each player has a leadoff impact. But even if someone gets a high score there, it might not be a good idea to bat him first, since you might lose an even better score or impact from another slot he might bat in. So I found each guy's impact in all nine slots. Then each of those got subtracted from his leadoff impact.

Barry Bonds, for example, had a leadoff impact of 1.77. His impact in the number 2 slot was 1.72, so he is .05 better batting leadoff than 2nd. He was .23 better at number 1 than at number 3. I did that all the way down to the number 9 slot. Here are all of Bonds' differences:

0.05
0.23
0.13
0.20
0.32
0.47
0.40
0.59

That adds up to about 2.4. Then I added up those differences for each player and ranked the players from highest to lowest. Remember that I am taking into account not just how good they would be leading off, but how much better (or worse) they would be there than batting anywhere else. Below are the top 15 leadoff men from last year by this measure, even taking into account what you would lose by not having them bat elsewhere (based on walks, hits, extra bases, SB and CS). A small sketch of the calculation follows the list.

Barry Bonds
Todd Helton
David Ortiz
Jorge Posada
Jack Cust
Pat Burrell
Magglio Ordonez
Jim Thome
Chipper Jones
Carlos Pena
Albert Pujols
Travis Hafner
Scott Hatteberg
Kevin Youkilis
David Wright
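To make the bookkeeping concrete, here is a minimal sketch of that leadoff-advantage calculation in Python, using Bonds' numbers from above. His leadoff impact (1.77) and number 2 slot impact (1.72) are from the post; the slot 3-9 impacts are back-solved from the listed differences, so they are reconstructions rather than the original table values.

# Per-slot impacts, slot 1 (leadoff) first; slots 3-9 reconstructed
# from the differences listed above
bonds = [1.77, 1.72, 1.54, 1.64, 1.57, 1.45, 1.30, 1.37, 1.18]

def leadoff_advantage(impacts):
    # Sum of (leadoff impact minus impact in each other slot)
    return sum(impacts[0] - x for x in impacts[1:])

print(round(leadoff_advantage(bonds), 2))  # 2.39, about 2.4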

Just to be complete, here are the top 15 (based on walks, hits, extra bases, SB and CS) without adjusting for how well they would hit elsewhere. It is some of the same players as above, but not identical:

Barry Bonds
David Ortiz
Alex Rodriguez
Magglio Ordonez
Chipper Jones
Carlos Pena
Albert Pujols
Matt Holliday
Jorge Posada
David Wright
Prince Fielder
Chase Utley
Jim Thome
Todd Helton
Mark Teixeira

Saturday, June 14, 2008

Chipper Jones And Batting .400

Both Tangotiger and Phil Birnbaum blogged about this after an article appeared at Baseball Prospectus. One issue is what Jones' "true" average is. Here is my comment at Phil's site:

In the last 4 months last year, Jones batted .354. From a cursory look at his Retrosheet stats, that comes pretty close to his best 4-month stretch (within one season) at any point in his career. But it came in only 353 ABs.

Anyway, combining the last 4 months of last year with this year, he has batted .378 over his last 580 ABs. And his yearly AVGs have been going up lately. Starting with 2004, here are his averages, with his age in parentheses:

.248 (32)
.296 (33)
.324 (34)
.337 (35)

This does not seem like a normal aging/performance pattern. How unusual, I don't know. But I just wonder if something is going on or changing with this guy that makes it really hard to know his true ability.

So I decided to look at what .400 hitters were like in the past. There are two tables below, one for the guys from the 1800s and one for the guys since 1900. The tables show their age, the league average, their career average, their career average before the year they hit .400, and their average in the year before they hit .400.

One thing you might notice is that the average age is about 27 for both groups. Jones is 36. Cobb batted .400 when he was 35, but the league average was .285 that year, much higher than the NL average this year of .259. Cobb also had batted .400 twice before, had a much higher career average than Jones and had batted a lot higher the year before.

Barnes batted .400 in 1876, but I think that was when a bunted ball that rolled foul before it got to third or first base was still a hit. Dunlap did it in the only year of the Union Association. Notice that Jones' career average and his average last year are far below what was normal in the past for .400 hitters. The same goes for the league average. Those are all just simple averages, and in the case of Joe Jackson, he had only 75 ABs the previous year and 115 ABs in his career before that. Taking him out would not change things much, with the last two columns becoming .344 and .377.

Besides Cobb, the only other guy to bat .400 since 1900 while being at least 30 years old was Bill Terry. But he did it when the league average was .303. So when it comes to age, league average and previous performance, Jones is not even close to what .400 hitters were in the past. If he does it, it will be amazing, and I will wonder how he did it.



Tuesday, June 10, 2008

Have Second Basemen Been Underpaid?

It seems like 2nd basemen get paid less than they should. In trying to explain player salaries taking hitting performance, position and free agent/arbitration status into account, I found that being a 2nd baseman had a negative impact. I was doing a salary study looking for something else when I thought it would be a good idea to have dummy variables for the "skilled" positions. What I found is discussed below. If you are interested in this issue, you might want to read a couple of papers by Jahn K. Hakes and Raymond D. Sauer (one was published in the Journal of Economic Perspectives). References to those papers are at the end.

I looked at 5 years: 1985, 1990, 1995, 2000 and 2005. I used regression analysis to predict each player's salary. The data set included all players with 400+ plate appearances in a given year. Here is the basic model or equation:

SAL = Constant + b1*FA + b2*ARB + b3*2B + b4*SS + b5*3B + b6*CF + b7*C + b8*HITS + b9*XB + b10*BB

FA means the player had played long enough to be a free agent or had been granted free agency (in some cases I found that out from Retrosheet). The salary data came from the SABR "Business of Baseball" site. Players with 3 years of service (and some with 2) can be eligible for arbitration, so ARB is for those guys. Both FA and ARB are dummy variables, 1 or 0. The same is true for the "skill" positions: 2B is for second basemen, SS for shortstops and so on.

I broke hitting performance down into three variables: hits (HITS), extra bases (XB) and walks (BB). This measures 3 different types of ability (as in the work of Hakes and Sauer). XB counts 1 for a double, 2 for a triple and 3 for a HR; that is, all bases beyond first on a hit. So there are three abilities: getting a hit, hitting for power and drawing walks.

I actually ran two versions of the model. One was a linear regression and the other was non-linear, where I took the natural log of salary (called LOGSAL). The results are summarized in the tables below; the first shows the linear results and the second the non-linear (LOGSAL) results. The values for each variable are the coefficient estimates. * means significant at the 10% level, ** the 5% level and *** the 1% level. It is probably not a big surprise that FA, ARB, HITS, XB and BB are all very significant in both the linear and non-linear models. With higher r-squared values and F values, the non-linear model looks like a better fit.
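For anyone who wants to replicate the setup, here is a minimal sketch of both regressions in Python using pandas and statsmodels. The data frame here is synthetic stand-in data, not the real salary file, and the column names are my own (the position dummies are spelled POS2B, POSSS, and so on, since a name like 2B is not a valid identifier in a regression formula).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; the real study used players with 400+ PA
# and salaries from the SABR Business of Baseball site. Note that in
# the real data a player is FA or ARB, not both.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "SAL":   rng.lognormal(14, 1, n),   # salary in dollars
    "FA":    rng.integers(0, 2, n),     # free-agent dummy
    "ARB":   rng.integers(0, 2, n),     # arbitration dummy
    "POS2B": rng.integers(0, 2, n),     # second-base dummy
    "POSSS": rng.integers(0, 2, n),
    "POS3B": rng.integers(0, 2, n),
    "POSCF": rng.integers(0, 2, n),
    "POSC":  rng.integers(0, 2, n),
    "HITS":  rng.integers(80, 220, n),
    "XB":    rng.integers(20, 120, n),  # extra bases
    "BB":    rng.integers(10, 120, n),
})

rhs = "FA + ARB + POS2B + POSSS + POS3B + POSCF + POSC + HITS + XB + BB"
linear = smf.ols("SAL ~ " + rhs, data=df).fit()          # linear model
loglin = smf.ols("np.log(SAL) ~ " + rhs, data=df).fit()  # LOGSAL model
print(linear.summary())
print(loglin.summary())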

Being a FA in 2005 meant about an extra $5.7 million in salary in the linear model, everything else being equal. It is harder to see an exact value for being a FA in the non-linear model, but I simply changed the 1 to a 0 for a few guys to see how their predicted salary would change. The model predicted a $12.2 million salary for Alex Rodriguez in 2005; if he had not been a FA (or eligible for ARB), his predicted salary would have been just $1.1 million. So, for him, being a free agent meant an extra $11.1 million. For Brad Ausmus, it meant about $7.6 million (or almost all of his $8.3 million in salary). Since the regression is non-linear, the effect is not the same for everyone.

Now what about the 2nd basemen? Their coefficient is negative in all 5 years in the non-linear regression and negative in 4 of the 5 years in the linear regression. If being a 2nd baseman truly has no effect on salary, the chance of getting a negative sign in all 5 years is just 1 in 32. The only year it was positive in the linear model was 2000, and it only added about $60,000 in salary. The coefficient was significant in only one case, the non-linear model in 2005. That coefficient is -.334, so it is hard to see the dollar value of the loss from being a 2nd baseman; the linear model puts it at $996,000 (although that was not significant, with a p-value of .22). In the non-linear model, I switched the 1 to a 0 for all the 2nd basemen in 2005. For 11 out of 20 of them, it meant a drop of more than $1 million (again, as in the FA case mentioned above for AROD and Ausmus, the effect is not the same for each player in the non-linear model). That is, if those guys had been in LF, RF or at 1B, they would be making about $1 million more. For 6 others, the drop was six figures, but those were guys with salaries under $1 million anyway, so it was a big portion of their salary.

The story might not be much different for some of the other skilled positions. The coefficient for CF was negative in all 5 years of the non-linear model. The same is true for 3B, and it was significant in 2000. The results for catchers (C) and SS, however, are mixed: sometimes negative, sometimes positive. But it looks like there has been a general tendency to underpay players at the skilled positions. In fact, in the non-linear regression, 20 of the 25 coefficients (5 positions per year for 5 years) are negative.





Other Work
Here are the papers by Hakes and Sauer

An Economic Evaluation of the Moneyball Hypothesis

The Moneyball Anomaly and Payroll Efficiency: A Further Investigation

Saturday, June 7, 2008

Should Sox Manager Guillen Have Been Upset?

After losing a third straight game to the Rays last Sunday, Guillen went on a tirade, complaining that players were not hitting and not coming through with runners on base. Especially frustrating was scoring just 4 runs in the 3 losses and leaving 10 runners on base in a 4-3, 10-inning loss. He swore and wanted Sox GM Ken Williams to get some better players. But were the Sox underperforming?

Let's first look at what the Sox were expected to do and what they have been doing. The table below shows the OBP and SLG for the 9 Sox regulars and what they were projected to do in the Bill James Handbook.



Alexei Ramirez was not included because he did not have a projection in the book. A weighted average of the projected OBPs and SLGs is .343 and .460 (guys like Quentin and Swisher were projected for parks other than U.S. Cellular Field, so that is a problem, but I hope not too big a one). This year, the league OBP and SLG are lower than last year: OBP has fallen from .338 to .330 and SLG from .423 to .402. If we apply the same declines to the Sox projections, they end up with .335 and .439. How many runs per game should that bring them?

Based on regression analysis of all teams from 2001-03,

R/G = 17.11*OBP + 11.13*SLG - 5.66

That predicts the Sox should score 4.95 runs per game. Right now they are at 4.51, almost half a run less per game than expected, a big disappointment. Their actual OBP and SLG right now are .330 and .416, also below projections. At those numbers, they should be scoring 4.62 runs per game, but they are actually scoring 4.51. So that is a little lower than it should be, but nothing major.
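Here is a small Python sketch that reproduces those numbers, including the league-decline adjustment to the projections (all figures are from the post itself):

def runs_per_game(obp, slg):
    # Regression on all teams, 2001-03, from above
    return 17.11 * obp + 11.13 * slg - 5.66

# Projections adjusted down by this year's league-wide declines
proj_obp = .343 - (.338 - .330)   # .335
proj_slg = .460 - (.423 - .402)   # .439

print(round(runs_per_game(proj_obp, proj_slg), 2))  # 4.96 (rounds to about 4.95)
print(round(runs_per_game(.330, .416), 2))          # 4.62 from their actual rates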

Before the three-game losing streak started in Tampa Bay about a week and a half ago, their OBP and SLG were about .324 and .416. So they should have been scoring 4.51 R/G, while they were actually at 4.44. The team was scoring about what it should have. So Guillen should not have been upset about not scoring enough runs or about leaving runners on base. He could have been upset about players not hitting up to expectations, as shown above, but that is it. Also, the game they got shut out in at Tampa was started by Scott Kazmir, who went 7 IP and is one of the best pitchers in baseball. And Tampa Bay has good pitching in general. Sometimes they will hold you to just 4 runs in three games. You can't go crazy when that happens.

On the flip side, the Sox have pitched better than expected. Below are the ERAs for the Sox pitchers this year and their projections. Nick Masset is left out because he did not have a projection in the book.



The weighted average of the projected ERAs for these pitchers is 4.45. But the league ERA is .37 lower this year than last year, so we can lower the prediction to 4.08. The Sox actually have a league-leading 3.33 ERA, far lower than expected. So they are allowing .75 fewer earned runs per game than expected (while scoring .44 fewer runs than expected). On balance, they are .31 runs per game ahead of expectations.

Are they winning the number of games they should based on their runs scored and allowed this year? They have scored 275 runs and allowed 226. Using the Bill James "pythagorean projection," that works out to a .597 pct and 35.8 wins. They have won 34. So a bit of a disappointment, but nothing huge. Last week I showed that the Sox had won about 1 game fewer than expected based on their OPS differential.
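A minimal sketch of that calculation, using the standard exponent-2 version of the formula (the 60 games played is inferred from the 35.8-win figure):

def pythag_pct(runs_scored, runs_allowed):
    # Bill James pythagorean projection, exponent 2
    return runs_scored ** 2 / (runs_scored ** 2 + runs_allowed ** 2)

pct = pythag_pct(275, 226)
print(round(pct, 3))       # .597
print(round(pct * 60, 1))  # 35.8 expected wins in 60 games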

As I stated above, the Sox were expected to score 4.95 runs a game this year. Their expected ERA was 4.08; add about .4 for unearned runs and you get 4.48 runs allowed per game. Scoring and allowing runs at those rates would give them a .549 pct and about 33 wins, 1 less than they actually have. So overall, the Sox are doing about as well as expected when the season started, and they are scoring about as many runs as they should based on their OBP and SLG. They are also winning pretty close to the number of games they should based on both OPS differential and runs scored and allowed. So what is there to complain about?

Saturday, May 31, 2008

Which teams have the best OPS differentials so far in 2008?

Since the season is about one-third over (teams have played 54 games or close to that number), it might be a good time to look at this.

I've done some research before where I came up with the following equation to explain a team's winning percentage:

PCT = 1.21*OPSDIFF + .5

Where OPSDIFF is OPS differential, a team's hitting OPS minus the OPS its pitchers allow. OPS is on-base percentage plus slugging percentage. You can read that earlier study here.

In the tables below, teams in each league are ranked by OPS differential. The next column shows each team's actual winning percentage, followed by the pct predicted by the equation, and then their actual wins, predicted wins and the difference.

The Angels have a -.030 OPSDIFF but a .571 winning pct. Maybe they have been lucky so far this year. But check out the Astros: they have a -.060 differential yet a winning record! Then there are the Twins, who have a winning record with a -.069 OPSDIFF. So far, the Cubs and Red Sox are the strongest teams in their respective leagues. The Braves are very strong, too, but their record so far does not show it.
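Plugging a few of those differentials into the equation shows just how far off the actual records are (a quick sketch; the -.030, -.060 and -.069 values are from the tables described above):

def predicted_pct(ops_diff):
    # PCT = 1.21*OPSDIFF + .5 from the earlier study
    return 1.21 * ops_diff + .5

for team, diff in [("Angels", -.030), ("Astros", -.060), ("Twins", -.069)]:
    print(team, round(predicted_pct(diff), 3))
# Angels 0.464, Astros 0.427, Twins 0.417 -- all predicted to be
# losing teams despite their actual winning records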



Sunday, May 25, 2008

Another look at salaries and wins

A lot of people have looked at this. But I started thinking about it again after I came across some data at JC Bradbury's site. You can view that data here. The data shows how many games, on average, each team won per year from 1986-2005. It also shows how much above or below the league average in total salary each team paid, in percentage terms, again as a yearly average. Suppose a team was 10% above average one year and 30% above average another year; it would get 20 (if it were just those two years).

What I did was to run a regression with average wins per year as the dependent variable and the average salary (SAL, the % above or below the league average) as the independent variable.

Here is the regression equation:

Wins = 0.157*SAL + 80.22

The r-squared was .489 and the standard error was 3.89 wins. The t-value for SAL was 5.17. The .157 coefficient means that if you spent 10% more on salaries than the average team, you would win about 1.57 more games than average. A zero for SAL means a team spent the average amount on salaries; a negative number means it spent below the average. The table below summarizes each team.



Tampa Bay, for example, had a payroll that was, on average, 38.87% below the league average. They were predicted to win 74.12 games per year but won only 64.33. If a team were to spend 100% more than average, it should win about 96-97 games a year. The Yankees had the highest payroll above average, spending about 70% more than the average team. They were predicted to win 91.26 games a year but actually won 90.24.
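A quick sketch reproducing those two predictions (payroll figures from the text; the Yankees' number uses the approximate 70% mentioned above):

def predicted_wins(pct_above_avg_salary):
    # Wins = 0.157*SAL + 80.22 from the regression above
    return 0.157 * pct_above_avg_salary + 80.22

print(round(predicted_wins(-38.87), 2))  # Tampa Bay: 74.12
print(round(predicted_wins(70), 2))      # Yankees: about 91.2 (the 91.26
                                         # above uses the exact payroll figure)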

I think the results are fairly strong: 16 of the 30 teams were predicted to within 3 wins, and only 3 were off by 6 or more. I think what I did differently from JC Bradbury was to use the average annual values for each team, instead of each team's data for each year. By using the averages, I think much of the randomness from year to year is eliminated. A team can sign a big free agent and maybe one year he does not do well; or you get lucky and some non-arbitration-eligible young players do very well. So by averaging, some of the good and bad luck gets flushed out.

The graph below also summarizes the results. You can see that the relationship is strong.

Monday, May 19, 2008

Can Brandon Webb win 25 games this year?

Since he already has 9 wins and might get another 26-27 starts, it seems possible. But since 1980, only one pitcher has won 25 or more games, Bob Welch, who won 27 in 1990.

I looked at all the seasons since 1946 with 25+ wins. Those pitchers averaged 38.5 starts per season and 308 IP. Webb's career high in games started is 35, and for IP it is 236.33. Mel Parnell, in 1949, had the lowest number of starts for a 25-win pitcher (33). But Parnell won one game in relief and had 295 IP.

The pitchers with 35 or fewer starts (Webb's career high) who won 25+ games averaged 274 IP, well above Webb's career high of 236. That group of 6 pitchers averaged 34.5 starts and 37 games pitched, so they had about 2.5 relief appearances on average, which could help them win an extra game or two. Webb has pitched only 1 game in relief in his entire career.

The lowest number of IP for a 25-win pitcher since 1946 was Welch's 238 in 1990, just a bit more than Webb's career high. If Webb were to make 26 more starts, he would have to win 16 of those games, or 61.5% of his starts. In his career he has won 43% of his starts, and last year he won 52.9%. He has been allowing 2.98 runs per 9 IP and the Diamondbacks are scoring 5.4 runs per game. Using the Bill James Pythagorean formula, that works out to a winning percentage of .767.

If he were to get a decision in 82.3% of his starts (his rate from last year) the rest of the way (26 starts), and if he won at a .767 clip in those decisions, he wins 16.4 more games. Added to the 9 he already has, that gets him to 25. So he needs to keep pitching as well as he has this year, and the Diamondbacks need to keep scoring 5.4 runs per game.
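Here is that arithmetic in one place, a sketch using the figures above ("decision rate" is my shorthand for the share of starts in which he gets a decision):

def pythag_pct(rs, ra):
    # Bill James pythagorean pct, exponent 2
    return rs ** 2 / (rs ** 2 + ra ** 2)

starts_left = 26
decision_rate = .823             # his rate from last year
win_pct = pythag_pct(5.4, 2.98)  # D-backs' scoring vs. Webb's runs allowed per 9
more_wins = starts_left * decision_rate * win_pct

print(round(win_pct, 3))         # .767
print(round(more_wins, 1))       # 16.4
print(round(9 + more_wins, 1))   # 25.4 -- just past 25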

Tuesday, May 13, 2008

Berkman is on a hot streak, but Cecil Cooper had a few good ones, too

Berkman, of course, is on a great streak: he is batting .641 in May with a 1.205 SLG. You can click here to see his May stats. So Astros manager Cecil Cooper, who was a pretty good major league hitter himself (he batted .352 one year), said:

"I was never in a streak like that. Never, ever."

He is either forgetting or being a little modest. He went 23 for 41 (a .561 AVG) in a 10-game stretch in Aug 1980. Of course that is short of Berkman's 25 for 39 (.641), and Cooper had "only" 3 2Bs and 2 HRs. But that is still pretty darn good.

Through the first 11 games in 1979, Cooper had a .465 AVG and a .930 SLG over 43 ABs.

He also had 18 hits in 32 ABs over a 7-game stretch in Aug 1981, for a .563 AVG and 29 TBs (a .906 SLG).

I got all these stats using Retrosheet.

Tuesday, April 29, 2008

How Many HRs Would Ruth Have Hit With Integration?

I have written about this before, but the reason I am doing it again is that a friend emailed me some comments her friends had made about my research on this. I answer them below, but first, here is the gist of what I did: I estimated how many non-white pitchers there might have been, and how good they might have been, if they had been allowed to play pre-1947. Then I estimated how many fewer HRs there would have been in baseball due to the improved pitching quality. I came up with 5% and assumed that Ruth's total would go down 5% (to 678).
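That last step is just arithmetic; a one-line check (714 is Ruth's actual career total):

# Ruth's actual 714 HRs reduced by the estimated 5%
print(round(714 * 0.95))  # 678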

Before I respond to the comments, here are the links.

How Would Integration Have Affected Ruth and Cobb?
How Many HRs Would Babe Ruth Have in Integrated Baseball?

The first link points to an article I wrote for the now-defunct Chicago Sports Weekly; that link no longer works. Now to the comments.

COMMENT: "I'm not sure how accurate is is to presume that the "worst" 15% of white pitchers would necessarily be replaced by black pitchers. Without the color barrier, they would have just promoted who was better. Can anyone say with any certainty that black pitchers were significantly better, inherently than whites? The other thing is that in Ruth's time, pitching staffs generally consisted of about 8 or 9 pitchers at any given time. There were 4-man rotations, and those pitchers were conditioned from early on to be able to complete games. Pitchers today could be conditioned that way too....but they're not. Back then, there was less need for relievers, hence the small pitching staffs.

So this means that at any given time, there were approximately 140 pitchers in the game-spread out over 16 teams. Even if we went with the figure of 15%, that means that there might have been about 21 black pitchers, spread out over 16 teams - which means about 1.3 per team-and half of those Ruth would not face because they would be in the National League.

So the 9 or so he would face on the opposing 7 teams-how many of them would have been exceptional enough to really put a significant dent in Ruth's perfromance? I'm not thinking too many. In fact, he may well have dominated some of them."

Yes, I think the worst 15% would be replaced. Here is how I look at it. Suppose all of a sudden a new talent pool was discovered that had major league quality pitchers in it. You would want some of them on your team, right? So whatever number of pitchers you carry, for every pitcher you add from this new talent pool, you have to send one to the minors or release him. The only logical thing to do would be to release your worst pitcher every time you add a good one. For example, if you add Satchel Paige, you don't dump Bob Feller or Bob Lemon; you dump your worst pitcher.

I am not saying that any one race is better than any other. Here is what I wrote in one article:

"I estimate that about 15% of the IP then were by non-whites, blacks, dark-skinned Hispanics and Asians. Using the Lee Sinins Complete Baseball Encyclopedia, I found all the pitchers with 1,000+ IP in this period and then calculated what percent of the IP by these guys was by non-whites. You can see the list here. I checked the race of any pitcher I did not already know by looking at when they played and finding pictures of them in books or online. Any pitchers with Hispanic names were considered non-white. There were pitchers like Lefty Gomez before 1947, whose skin was light enough to play. But I did not want to have to judge who would have been able to play and who would not.

In that list, I have relative ERA listed. That is simply ERA divided by the league ERA. The relative ERA of all the whites combined was 105.75, meaning their ERA’s were about 5.75% better than the league average. For the non-whites it was just a bit higher at 106.8. In the analysis below, I assume that the ERAs of whites and non-whites will be the same. The number of IP by the pitchers with 1,000+ IP since 1947 accounted for 58% of all the IP in this time period."

So, the quality of the whites who have pitched in MLB since 1947 and that of the non-whites is about the same; neither is better. But if those non-whites had not been there this whole time, who would have been in their place? Some white guys who were not as good (if they had been as good, they would have been there instead). Let's call them the bad-whites. Having the bad-whites instead of the non-whites means the overall quality of pitching would have been worse, meaning more HRs hit (assuming no change in who is batting). But that means having the non-whites there improves the quality of pitching, and the hitters would not do quite as well. As I said in my articles, I came up with a rough estimate of Ruth's total going down 5%, leaving him with 678 HRs, still a lot.

It does not matter how many pitchers are on the teams or what kind of rotation they had: 15% is 15%. Of course he would not have faced the guys in the other league, but I assume 15% in each league, which keeps everything fair. Yes, he would have dominated some of the non-white pitchers; a few of them would have been just a hair better than the bottom-rung pitchers (the bad-whites). But some were great, like Gibson and Marichal. I am just estimating that collectively, all the non-whites were as good as the top 85% of the whites.


COMMENT: "But think about this now. If 60 years of integration can only produce a handful of solid minority pitchers, why should the 15% percent you are replacing them with be any better than the ones you are deducting?"

I think this is explained above.

COMMENT: "Ruth did not hit a HR every time up but changing 8-9 pitchers (15%)who may be better pitchers is no guarantee that they would dominate Ruth or at least do better than their predecessors."

Think of it this way. Suppose all of a sudden the worst 15% of the pitchers were let go, and the best 85% all pitched a little more to make up for the lost IP (suppose there were some pill they could take that let them add extra innings with no loss of effectiveness). Then all the batters would face a higher quality of pitching, and HRs would go down. I don't think that losing 5% of your HRs means you were dominated.

A few other issues. I think more than 15% of pitchers right now are non-white, and probably fewer than 15% were non-white when Aaron and Mays were starting to rack up their big HR totals. If there had been more non-white pitchers in the 1950s, those two would also lose HRs. I think my analysis shows how good Ruth was and how much we should respect his records. It is also an answer to those who ask, "How many HRs would Ruth have hit if he had to face Pedro Martinez?" Well, still quite a few, because he would not face Pedro all the time and Pedro does allow some HRs.