Cybermetrics: October 2009

Wednesday, October 28, 2009

What Does The Past 3 Years Tell Us About The World Series? (updated)

I tried to use a tool of Tangotiger's called Marcel. There is a good chance I did not apply it correctly the first time (I think I have now, which is why this post is updated), but I attempted to measure the skill level of the players using the last 3 years of performance, with more recent years weighted more heavily and with a regression to the mean. Maybe in a day or two I will go through the numbers.

First I tried to generate an OPS relative to the league average for the 8 position players on each team. Then I took the simple average of that. For the Yankees, it was 10.13% above the league average. For the Phillies, it was 7.54% better. If we assume a league average of .750, then the Yankees would be at .826 while the Phillies would be at .807. I did not make any park adjustments and this might hurt Ibanez for his two years in Seattle.

For pitchers, I did the same thing using the FIP ERA from Fangraphs. Here are their ratios to the league average for the top 3 starters on each team, listed by matchup (lower is better):

Sabathia 0.751
Lee 0.829
Burnett 0.945
Martinez 1.052
Pettitte 0.907
Hamels 0.881

The Yankees have a big edge in the first and second matchups while the Phillies have the edge in the 3rd one. But Fangraphs has the same league average each year for both leagues. This may not be right, and if not, it would probably mean that the Yankees have an edge in all 3 slots. Then, as I mentioned yesterday, the Yankees were much better against lefties this year than the Phillies.

As I understand the Marcel method, the last 3 years have weights of 5, 4, and 3 (most recent year first). So the total is 12. Then the weights are

5/12 = .417
4/12 = .333
3/12 = .25

Now I think that assumes that the player has an equal number of PAs in each year. But I think those weights should be changed if PAs are not equal in each year. Let's take Jeter. Here are his PAs from 2007-9:

695
648
706

The total is 2049. Here is each year's share of that total:

.339
.316
.345

Now how does that change the weights of 5, 4, 3? In 2009, he had a larger than expected pct (the expected pct is .333, an equal share for each year). The pct was .345/.333 = 1.036 times the expected value. So instead of using .417 for 2009, I used .417*1.036 = .432. The same adjustment gives .316 for 2008 and .254 for 2007. Something similar was done for all the other players. Here are Jeter's OPS divided by the league average from 2007-9:

1.11
1.02
1.14

So what is his ratio for the 3 years? 1.11*.254 + 1.02*.316 + 1.14*.432 = 1.097. Now the regression to the mean. First multiply the PAs from the 3 recent years (going backwards) by 5, 4 and 3. Adding those up gives 8207. But the regression to the mean involves two seasons' worth of league average hitting. Each season is 600 PAs, so that is 1200. So we have a denominator of 9407 (8207 + 1200). So Jeter's OPS relative to the league average is

(8207/9407)*1.097 + (1200/9407)*1 = 1.0846

Jeter's skill level means his OPS is 8.46% better than average. So I did this for each player on each team. I added their ratios relative to the league average and then divided by 8. I did something similar for the starting pitchers.
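The Jeter calculation above can be sketched in a few lines of Python. This is only my reading of the steps in this post, not Tangotiger's official Marcel code; the function and variable names are made up for illustration. Because it carries full precision instead of the rounded figures, it lands near, but not exactly on, the 1.097 and 1.0846 shown above.

```python
# Sketch of the PA-adjusted 5/4/3 weighting described in this post.
# Names are hypothetical; this is not Tangotiger's actual Marcel implementation.
YEAR_WEIGHTS = (5, 4, 3)   # most recent season first
REGRESSION_PA = 1200       # two seasons of league-average hitting at 600 PA each


def marcel_ratio(pa_by_year, ratio_by_year):
    """Both sequences are ordered most recent season first."""
    total_pa = sum(pa_by_year)
    weight_sum = sum(YEAR_WEIGHTS)
    expected_share = 1 / len(pa_by_year)

    # Scale each base weight (5/12, 4/12, 3/12) by actual PA share / expected share.
    adjusted = [
        (w / weight_sum) * ((pa / total_pa) / expected_share)
        for w, pa in zip(YEAR_WEIGHTS, pa_by_year)
    ]
    weighted_ratio = sum(a * r for a, r in zip(adjusted, ratio_by_year))

    # Regress toward a league-average ratio of 1.0.
    weighted_pa = sum(w * pa for w, pa in zip(YEAR_WEIGHTS, pa_by_year))
    denom = weighted_pa + REGRESSION_PA
    regressed = (weighted_pa / denom) * weighted_ratio + (REGRESSION_PA / denom) * 1.0
    return weighted_ratio, weighted_pa, regressed


# Jeter, 2009 first: PAs and OPS relative to the league average from this post.
jeter = marcel_ratio(pa_by_year=(706, 648, 695), ratio_by_year=(1.14, 1.02, 1.11))
print(jeter)
```

The team figure is then just the simple average of the eight regressed ratios.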

Monday, October 26, 2009

Yankees vs. Phillies: Can OPS Tell Us Anything?

The table below shows the team OPS for both the Yankees and Phillies as well as the OPS their pitchers allowed.

So the Yankees have an overall advantage of .081 in OPS. I once found that winning pct = 1.26*OPSDIFF + .5. A team with that big of an OPS differential wins about 60% of its games. So that might be the Yankees' probability of winning (although it may not be that simple-I actually came up with about a 72% chance for them to win the series if they have a 60% chance of winning each game). Of course, the Phillies did not have Lee or Martinez in their rotation all year. So the differences may not be that great in the series and we need to take that into account.
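Here is a rough sketch of how a per-game probability turns into a series probability. It assumes each game is independent with the same chance of winning and ignores home field and who is pitching, so it is a simplification and not necessarily how I got my original number. The function names are made up for illustration. With exactly 60% per game it comes out near 71%, in the neighborhood of the figure above.

```python
from math import comb


def win_pct_from_ops_diff(ops_diff):
    """The rule of thumb from this post: winning pct = 1.26*OPSDIFF + .5."""
    return 1.26 * ops_diff + 0.5


def series_win_prob(p, wins_needed=4):
    """Chance of winning a best-of-(2*wins_needed - 1) series.

    Assumes independent games with a constant per-game win probability p.
    Sums over k = number of losses before the clinching win.
    """
    q = 1 - p
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


p_game = win_pct_from_ops_diff(0.081)
print(round(p_game, 3))
print(round(series_win_prob(0.60), 3))
```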

The next table shows the OPS of each pitcher in the rotation for the two teams, in what looks like will be the order for the series. Each pitcher's OPS is compared to the league average.

The next table shows which pitcher has the advantage in each matchup.

The Yankees have a big advantage in each of the first three games. My guess is that it will only be in game 4 that the Phillies have the advantage. Then the rotation starts up again. The Yankee hitters also had an OPS that was .076 better than the league average while the Phillies were only .042 better.

The next table shows some other breakdowns. It shows both hitting and pitching OPS for both teams, home and road and also the league averages for those respective stats.

The Yankees outhit their opponents at home by .129 in OPS. For the Phillies, it is only .037. On the road, these two stats are .082 and .012. So when in Yankee Stadium, the Yankees have an advantage of .117 (.129 - .012). Even in Philadelphia, the Yankees' advantage is .045 (.082 - .037).

My guess is that park effects are not a big deal here. The simple average of the OPS in Yankee Stadium was about 1.6% higher than in Yankee road games (I simply added what the Yankees hit and allowed at home and divided by 2, then did the same for road games, and then the home number was divided by the road number-that is all probably not quite right since Yankee pitchers have more innings at home than Yankee hitters, who often don't bat at home in the bottom of the 9th). For the Phillies, this was 2.2% higher in home games. So, overall, not much going on with park effects.
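The crude park comparison described above amounts to the following. The function name is made up, and the OPS inputs in the example are hypothetical placeholders, not the actual Yankee or Phillie splits.

```python
def park_ratio(home_hit, home_allowed, road_hit, road_allowed):
    """Simple average of OPS hit and allowed at home, divided by the same on the road."""
    home_avg = (home_hit + home_allowed) / 2
    road_avg = (road_hit + road_allowed) / 2
    return home_avg / road_avg


# Hypothetical inputs for illustration only (not real team splits).
example = park_ratio(home_hit=0.840, home_allowed=0.700,
                     road_hit=0.800, road_allowed=0.720)
print(f"home OPS environment is {100 * (example - 1):.1f}% higher than road")
```

A ratio of 1.016 would correspond to the 1.6% figure for Yankee Stadium.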

Also notice that the Yankees hit .081 better than the league against lefties this year and they get to face 3 lefty starters.

The next table shows how the two bullpens fared compared to the league average bullpens.

So even here, the Yankees have an advantage.

Also, I once calculated that the team with home field advantage wins 51.52% of the time, if the two teams are of equal strength. The Yankees played in the tougher league (the AL has been winning most of the interleague games the past few years). And the other 4 teams in the AL East combined to finish 6 games over .500 outside their division this year. In the NL East, it was 28 games under. So it looks like the Yankees played in a much tougher division, too.

Friday, October 23, 2009

Will Barry Larkin Get Elected To The Hall Of Fame?

This is being discussed at Baseball Think Factory now. Click on Red Reporter: JinAZ: A HOF Case for Barry Larkin. I sure hope he gets elected. Sean Smith's Wins Above Replacement Rankings have him at 58th all time. Seems like a no brainer.

But what do the voters like? I created two models earlier this year. One is called Predicting Who Makes The Hall Of Fame Using A Logit Model. It gives him a probability of only about 17% of making it. The model took into account career average, number of 100 RBI seasons, all-star games, PAs, MVP awards, world series performance, getting 3000 hits and being a catcher.

The other model was called What Determines Vote Percentage In The First Year Of Hall Of Fame Eligibility? (Part 2). It said it would be 34.6% for Larkin. It took into account the same things as above plus getting 500 HRs, getting 500 SBs, gold gloves (but not being a catcher).

I sure hope my models are wrong. But this analysis was based on what the voters did from 1990-2009.

Tuesday, October 20, 2009

Some Very Old Sabermetric Classics That Are Online

Goodby To Some Old Baseball Ideas (from LIFE magazine 1954-contains some fairly advanced formulas)

If you really want to blow your mind, read this article from Fortune magazine in 1935 about a very early and very sophisticated

The Base in Baseball By Travis Hoke

Why the System of Batting Averages Should Be Changed (by FC Lane around the year 1917-has linear weights values-decades ahead of its time-the man was a trained scientist)

Then his analysis of the value of walks is at

The Base on Balls

And links to more Baseball Magazine articles are at

Cyril Morong's Sabermetric Research

I posted the following at BTB a few years ago

The post below is the few pages from FC Lane's book called "Batting" that dealt with the batting order. Whether or not it matches up with some of the recent analysis on lineups I will leave up to readers. One expert mentioned that it was a good idea to bat Cy Williams 2nd. FC Lane was a great baseball writer and editor of Baseball Magazine in the early part of the 20th century. It comes to you through the miracle of scanning (well, it was a miracle that I figured out how to use the scanner-actually my wife who is a computer programmer showed me how-the miracle is that she stays married to me)

How the Batting Order "Colors" Batting

FC Lane on the Batting Order

Monday, October 19, 2009

Does Jimmy Rollins Have More Pop As A Left-Handed Batter?

One of the announcers last night, I think it was Buck Martinez, said that Rollins did. If I recall correctly, it was because he hit more HRs as a lefty this past season. He did hit 14 HRs vs. righties (when he bats left-handed) and 7 vs. lefties. But, as many fans know, he also faced righties a lot more. Here are his HR%'s vs. lefties and righties this year:

vs. lefties (as a right-handed batter) 4.02%
vs. righties (as a left-handed batter) 2.81%

Now for his entire career.

vs. lefties (as a right-handed batter) 2.83%
vs. righties (as a left-handed batter) 2.33%

So it looks like he actually has more HR power as a right-handed batter, although it is pretty close for his entire career. Another way to look at "pop" is to use isolated power or SLG - AVG. Here are his 2009 figures:

vs. lefties (as a right-handed batter) .195
vs. righties (as a left-handed batter) .165

Now for his whole career.

vs. lefties (as a right-handed batter) .168
vs. righties (as a left-handed batter) .163

So, it looks like he has more power as a right-handed batter (but again, the edge is slight for his whole career). I don't think this is sabermetrics. I think it is just arithmetic. But it seems like announcers make this mistake a lot when talking about lefty/righty stats. They look at raw totals instead of percentages, forgetting that there are fewer lefty pitchers than righties.
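The arithmetic is simple enough to show. The HR totals are from this post; the at-bat counts (498 and 174) are not given above and are my assumption, backed out from the 2009 percentages on the premise that HR% means HR per at-bat.

```python
def hr_rate(home_runs, at_bats):
    """Home runs per 100 at-bats."""
    return 100 * home_runs / at_bats


# (HR, AB) for 2009; the AB values are assumptions inferred from the percentages.
splits = {
    "vs. righties (as a left-handed batter)": (14, 498),
    "vs. lefties (as a right-handed batter)": (7, 174),
}

for label, (hr, ab) in splits.items():
    print(f"{label}: {hr} HR, {hr_rate(hr, ab):.2f}%")
```

Twice as many HRs from the left side, but a lower rate, because he had almost three times as many chances.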

Friday, October 16, 2009

Even If There Really Are Clutch Hitters And We Can Tell Who They Are, Does It Significantly Affect Winning Or Affect Personnel Decisions?

There were some recent posts around the blogosphere on clutch hitting. As many times before, the discussion was mainly about whether or not it exists. Here are the links:

Overestimating the Fog by JC Bradbury at Sabernomics. This article got discussed at Baseball Think Factory. JC Bradbury also posted two other studies: Does Clutch Pitching Exist? and A Little Clutch Hitting Study. Phil Birnbaum had Doesn't "The Book" study pretty much settle the clutch hitting question?.

Bradbury's "Fog" article refers to an article from a few years ago by Bill James (JC has a link to it). Bill James suggested that our statistical methods might not be able to detect clutch hitting. JC presents a different view.

Phil makes a reference to "The Book" by Tom Tango, Mitchel Lichtman and Andrew Dolphin. Their basic finding was that there is clutch ability but it is very slight.

Now getting back to Bill James. He wrote an article a couple of years ago called Mr. Clutch: Big Papi, Chipper, Pujols come through when it counts. James said:

""Clutch" is a complicated concept, containing at least seven elements:

1. The score,
2. The runners on base,
3. The outs,
4. The inning,
5. The opposition,
6. The standings,
7. The calendar."

Then he showed how certain players did a lot better in these cases than they normally do. But what he does not say in this article (it may be elsewhere) is how much differently all players hit in these situations than they normally do (that is, the league average differential). This information is necessary to see which players' clutch performance is statistically significant. I made a crude attempt at analyzing James' new measure of clutch in this post: Is David Ortiz A Clutch Hitter?. For differences from normal performance, I used those in close and late situations. It looked like his clutch performance was not significant. But I have not seen James post the clutch data for all players, so a complete analysis has not been done (maybe he has posted this on his site but I have not signed up to pay for it).

I did a study a few years ago called How Many Games Do Clutch Hitters Really Win?. I had two methods of seeing how many wins clutch hitters added above their normal hitting. In one method, only about 10% of the hitters I looked at were able to generate as many as .5 more wins a season than expected by hitting better in the clutch than they normally do. That assumes that this was their true clutch ability. In the other method, only 3 out of 71 players added as many as .5 wins (that table is partly cut off now at the link).

Getting back to what "The Book" says, they show that the biggest clutch hitting skill of any player over the 2000-2004 period was .0018 on their wOBA stat (based on another formula they mention, I estimate that is about .004 in OPS). Their clutch situation was the 8th inning or later and the batting team is down 1-3 runs. I don't know what percentage of all plate appearances are made up by these situations, but for close and late situations (CL) it is 15%. My study, mentioned in the previous paragraph, found only a small number of hitters making much difference by their clutch performance and I made no "regression to the mean" adjustment to their clutch stats like "The Book" people did.

I assumed that if a guy's OPS was .050 higher in the clutch than otherwise, that was his true clutch ability. If "The Book's" clutch situation is also about 15% of the PAs (like CL), then I have to assume that their methods say that players add many fewer wins from their clutch performance than my method since my method has a top differential of .117 for Tino Martinez. That is, his OPS was that much higher in the clutch than otherwise. They have a biggest difference of about .004 in OPS, which probably creates very few extra wins. And that is the best they found.

Phil Birnbaum's post also mentioned how different kinds of hitters, like power hitters vs. singles hitters, hit differently in the clutch and whether or not it was due to a change in their approach with the game on the line. I did a study once called Do Power Hitters Choke in the Clutch?. It was inspired by a study by Andrew Dolphin (one of "The Book" people-I have a link to it at this study). I found mixed results, but maybe power hitters did do a little worse in the clutch than other hitters.

Finally, if clutch hitters are real, do teams make trades to get them? Do they offer those free agents more money? I would love to know if teams have ever done this. There is a study on this called Are Players Paid for "Clutch" Performance? by Jahn K. Hakes and Raymond D. Sauer. My guess is that teams never consider any clutch data when making personnel decisions. If that is the case, then effectively clutch is a non-issue.

Thursday, October 8, 2009

The Percentage Of Batters Faced By Relief Pitchers Since 1953

The data came from Retrosheet. The graph below shows the % faced in the AL.

Now for the NL.

Now for both leagues in the same graph. The AL is the red line and the NL is the blue line.

This last graph shows the difference between the two leagues (NL - AL). In the first year, 1953, the NL had 0.288 while the AL had 0.259 for a difference of about .029. Then the next year the NL was .04 higher. It is interesting to see that there was one trend to about 1970 of the NL edge falling (actually turning negative in 1960 and staying there until 1970, except for 1962). Then there is a trend for at least 10 years of the NL rising relative to the AL. Then it generally declines until about 2000 and then it starts rising again. Maybe the DH plays some role here but it can't explain all of it.

Monday, October 5, 2009

Did The Increased Use Of Relief Pitching Cause A Decline In Clutch Hitting?

This is mainly an elaboration on last week's post called Clutch Hitting Over Time (1952-2008). What I found was a correlation between the fall in percentage of games completed and clutch hitting (as measured by the difference between non-close and late (NCL) situations and close and late (CL) situations). Here, I just turn things around and make the measure of clutch CL - NCL (the two stats I used were AVG and isolated power or ISO).

The table below shows the AL AVG in both CL and NCL for the given periods. I broke things down by 3 year periods because there was a lot of volatility from year to year (the Retrosheet data on this starts in 1953 for the AL and 1952 for the NL). The period averages are simple averages. The DIFF column is just the first minus the second and the last column is the percentage of games not completed.

You can see that the difference has generally gotten more negative over time as the percentage of games not completed has increased (a proxy for the use of relief pitching). I was surprised to find that there were years when the AVG in CL situations was higher than in NCL situations. The next graph shows the relationship between the last two columns from the table above.

The r-squared in the graph refers to the percentage of variation in clutch hitting (CL - NCL) explained by the percentage of games not completed (%NCG). It was 71.94%. Now the same two tables for the NL.

Interesting that the r-squared is so much lower in the NL. No reason comes to mind.
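For anyone who wants to check an r-squared like the ones quoted for these graphs, here is a minimal sketch for a one-variable fit. The function name is made up, and the two lists are hypothetical placeholder values, not the period data from my tables.

```python
def r_squared(xs, ys):
    """Share of the variation in ys explained by a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # For one predictor, r-squared is the squared correlation coefficient.
    return sxy ** 2 / (sxx * syy)


# Hypothetical values for illustration only.
pct_games_not_completed = [0.60, 0.70, 0.80, 0.90]
cl_minus_ncl_avg = [0.002, -0.003, -0.005, -0.012]

print(round(r_squared(pct_games_not_completed, cl_minus_ncl_avg), 4))
```

With only a couple dozen 3-year periods per league, a big gap in r-squared between leagues could come from a few unusual periods, so I would not read too much into it.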

The next set of graphs does the same thing for ISO in the AL.

The .8652 seems very high. 86.52% of the variation in clutch is explained by the change in games not completed. Now for the NL.

Saturday, October 3, 2009

Pujols wins triple crown

Okay, he has won the triple crown covering the years 2001-2008 in the NL. Here are the top 10 in AVG, HRs, RBIs with a 2000 PA minimum. Once we extend it to 2009, he will still lead in all 3. Maybe some other hitters have done this over a 9 year stretch or longer. Hornsby did for his entire NL career! Ted Williams did it for his entire career! So did Stan Musial! Anybody know who else had a long span triple crown? I will look at obvious choices when I get a chance. Data from the Lee Sinins Complete Baseball Encyclopedia.

AVERAGE
1 Albert Pujols .334
2 Todd Helton .326
3 Barry Bonds .325
4 Matt Holliday .319
5 Chipper Jones .317
6 Larry Walker .316
7 Miguel Cabrera .313
8 David Wright .309
9 Hanley Ramirez .308
10 Moises Alou .304

HOMERUNS
1 Albert Pujols 319
2 Adam Dunn 278
3 Barry Bonds 268
4 Lance Berkman 263
5 Andruw Jones 255
6 Aramis Ramirez 237
7 Pat Burrell 233
T8 Chipper Jones 219
T8 Jim Edmonds 219
10 Derrek Lee 207

RBI
1 Albert Pujols 977
2 Lance Berkman 879
3 Aramis Ramirez 815
4 Andruw Jones 770
T5 Pat Burrell 748
T5 Todd Helton 748
7 Chipper Jones 739
8 Jeff Kent 725
9 Adam Dunn 672
10 Luis Gonzalez 664

Hornsby had a career triple crown while in the NL. He led the NL in all 3 stats (even with just a 1000 PA minimum) for his entire NL career, from 1915-33. Maybe someone has said this before but I have not seen it. Here are the top 10:

AVG
1 Rogers Hornsby .359 (.35936)
2 Chuck Klein .359 (.35907)
3 Lefty O'Doul .355
4 Paul Waner .346
5 Bill Terry .341
6 Riggs Stephenson .339
7 Babe Herman .332
8 Lloyd Waner .332
9 Kiki Cuyler .330
10 Spud Davis .330

HR
1 Rogers Hornsby 298
2 Cy Williams 247
3 Hack Wilson 238
4 Jim Bottomley 194
5 Chuck Klein 191
6 Mel Ott 176
7 Gabby Hartnett 154
8 George Kelly 148
9 Babe Herman 143
10 Bill Terry 138

RBI
1 Rogers Hornsby 1555
2 Jim Bottomley 1188
3 Pie Traynor 1176
4 Frankie Frisch 1084
5 Hack Wilson 1033
6 George Kelly 1020
7 Charlie Grimm 1015
8 Cy Williams 967
9 Bill Terry 892
10 Edd Roush 891

Now for Ted Williams

AVG
1 Ted Williams .344
2 Joe DiMaggio .322
3 Jimmie Foxx .315
4 Harvey Kuenn .313
5 Dale Mitchell .312
6 Barney McCosky .312
7 Luke Appling .310
8 Hank Greenberg .309
9 Bob Dillinger .308
10 Taffy Wright .308

HR
1 Ted Williams 521
2 Mickey Mantle 320
3 Yogi Berra 318
4 Joe DiMaggio 254
5 Larry Doby 253
T6 Vic Wertz 247
T6 Vern Stephens 247
8 Roy Sievers 243
9 Gus Zernial 237
10 Joe Gordon 228

RBI
1 Ted Williams 1839
2 Yogi Berra 1306
3 Mickey Vernon 1296
4 Vern Stephens 1174
5 Bobby Doerr 1153
6 Joe DiMaggio 1105
7 Vic Wertz 1092
8 Larry Doby 970
9 Mickey Mantle 935
10 Rudy York 922

Now Musial

AVG
1 Stan Musial .331
2 Hank Aaron .320
3 Willie Mays .315
4 Tommy Davis .313
5 Dixie Walker .312
6 Jackie Robinson .311
7 Orlando Cepeda .310
8 Vada Pinson .309
9 Richie Ashburn .308
10 Joe Medwick .305

HR
1 Stan Musial 475
2 Eddie Mathews 422
3 Willie Mays 406
4 Duke Snider 403
5 Gil Hodges 370
6 Ernie Banks 353
7 Ralph Kiner 351
8 Hank Aaron 342
9 Hank Sauer 288
10 Del Ennis 286

RBI
1 Stan Musial 1951
2 Duke Snider 1316
3 Del Ennis 1277
4 Gil Hodges 1274
5 Willie Mays 1179
6 Eddie Mathews 1166
7 Hank Aaron 1121
8 Carl Furillo 1058
9 Bob Elliott 1051
10 Ernie Banks 1026

The following sites also discussed these issues

http://www.philly.com/philly/sports/phillies/20090923_High___Inside__NL_Notes.html

http://www.baseballthinkfactory.org/files/newsstand/discussion/goold_albert_pujols_claim_to_a_triple_crown_or_two/

http://www.stltoday.com/blogzone/bird-land/bird-land/2009/01/albert-pujols-could-be-close-to-claiming-a-triple-crown-or-two/

Thursday, October 1, 2009

Yes, We Should Have Kept An Eye On The Rockies

On June 8th, I asked Should We Keep An Eye On The Rockies? It was right after they swept the Cardinals in St. Louis, scoring a lot of runs in a combination of blowouts and un-close games. Given that the Cards were (and still are) a very good team, I thought the sweep was an indicator of how good the Rockies might be.

But I sure got some other predictions wrong. Like Albert Pujols Has A Good Chance To Win The Triple Crown. He led the league in HRs and RBIs on July 4th while trailing Hanley Ramirez by only .008 in average. I thought his better track record (including 2nd half hitting) gave him a good shot to lead in AVG over Ramirez and the other top hitters. But he may not even lead in RBIs.

And then there was Is Ryan Howard The New Mickey Vernon? (Or Is His Career Really In Decline?). His offensive winning percentage (OWP) had declined the last 2 years (age in parentheses):

.777 (26)
.675 (27)
.582 (28)

So those declines were .102 & .093. If I had limited the study to declines of .093 or more, there were only 8 guys. The only one whose decline started before age 30 was Vernon. Here is what happened to Vernon:

.759 (28)
.465 (29)
.284 (30)

But he bounced back at age 31 with .579. And Howard, too, has bounced back. I don't have his OWP for this year, but his adjusted OPS the last 4 years, including this year have been (from Baseball Reference)

167
144
124
136

Vernon's 4 years were

160
99
73
113

So I guess I was right: Howard is the new Vernon. Any player under 30 with a .093 or more decline in OWP 2 straight years will bounce back the next year with a better season.:)
