Monday, June 29, 2009

Which Players Had The Best HR-To-Strikeout Ratios?

I looked at every player with 5000+ PAs since 1920. I found their relative HRs and their relative strikeouts, then found the ratio of the two. Ken Williams, for example, hit 3.70 times as many HRs as the average player of his time and league while striking out only 75% as often as the average player. Since his ratio of ratios (3.7/.75 = 4.93) is the highest of anyone in the study, he is ranked first. The data come from the Lee Sinins Complete Baseball Encyclopedia. The table below shows the top 25:

DiMaggio hit only 41% of his HRs at home in his career while Ken Williams hit 72%. So it is likely that DiMaggio would rank first, and probably by a wide margin, if HRs were park adjusted. Ted Williams hit less than 50% of his HRs at home.
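The ratio of ratios used for the ranking above can be sketched in a few lines. This is only an illustration built on the Ken Williams figures quoted in the post; the function name `hr_to_k_ratio` is made up for the example and does not come from the encyclopedia.

```python
def hr_to_k_ratio(relative_hr, relative_k):
    """Ratio of relative HR rate to relative strikeout rate.

    Both inputs are relative to the league average of the player's
    time, where 1.0 means exactly average.
    """
    if relative_k <= 0:
        raise ValueError("relative strikeout rate must be positive")
    return relative_hr / relative_k


# Ken Williams: 3.70 times the average HR rate, 75% of the average K rate.
williams = hr_to_k_ratio(3.70, 0.75)
print(round(williams, 2))  # 4.93
```

A higher number means more power per strikeout, so a player can rank well either by hitting many HRs or by rarely striking out.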

The next table shows which players had the lowest relative strikeout rates among guys who hit 40+ HRs. Again, no pikers here. In 2004, Bonds had only 41 strikeouts while the average player would have had 100. I am so proud to see the demonstration of Polish power with 3 for Ted Kluszewski and 1 for Carl Yastrzemski (whose 1970 season ranks 27th). Don't forget Stan Musial is 13th on the above list.

Monday, June 22, 2009

Harold Reynolds And Using Context To Evaluate Hitters

ESPN analyst and former major league player Harold Reynolds wrote a blog entry called Enjoy it for what it's worth. Sky Kalkman at "Beyond the Boxscore" wrote a response called Defending Harold Reynolds. Reynolds criticizes some of the "newer" stats like OPS:

"Not all statistics work. Some do, some don't. And one of the stats that has become real popular is OPS. On-base plus slugging. All of a sudden, it's this stat that defines whether a guy is a good ball player or not. And the fact of the matter is, if you're a power hitter then the situation will dictate what a pitcher does with you - either walk you or pitch you real careful. So more than likely you're going to end up on base and therefore your on-base percentage goes up. This in my mind has become the stat the everyone thinks is the be all and end all. It is not. If you have a ball club that's a great offensive team then that changes everything. But if you have a guy like Adrian Gonzalez, for example, his OPS is going to be high - he's got a lot of home runs and walks a lot...because you're not going to pitch to him. Power guys like Giambi and Dunn have always had high OPS because no one wants to pitch to them. But it takes two hits to score them from first."

Reynolds began by saying that context and situation matter, and they probably do. But this raises the question of how much. Some of my past research touches on these issues and I will discuss that below. But first, even if you don't like OPS, or OBP + SLG, it is still better than the traditional stats (for example, he mentions that Ichiro Suzuki gets 200+ hits every year). The 1998 STATS, INC. Baseball Scoreboard book had a nice little study that showed that the team with the higher OPS in a game has a winning percentage of .852, while it was .804 for the team with the higher batting average (they looked at several other stats and OPS had the highest winning percentage).

But let's look at some of what Reynolds said specifically. He seems to be saying that when a slugger walks on a weak-hitting team, it is not so valuable. But I have done some analysis on this. It was called The Value of OBP and SLG by Lineup Position for High-Scoring and Low-Scoring Teams. If you go to this link, you will see that the marginal run value for the cleanup hitter's OBP is actually higher on the low-scoring team.

Now how much might context matter or change our evaluation of hitters if we are using OBP and SLG? My analysis on this is called Evaluating Hitters Based on Their Lineup Slot. The most anyone was adjusted was a +6.2 runs per season, for Luis Castillo. So if I took into account that he was a leadoff hitter instead of a generic hitter, his value to his team would be about 6 more runs a year. This seems pretty small. So context does not change our valuation much.

Then there is the issue of situational hitting. My analysis on this is called The Problem With “Total Clutch” Hitting Statistics. What I found was that OPS was highly correlated with how much impact a hitter had on winning and losing depending upon the situation. The stat I used was Ed Oswalt’s measure “player’s win value” (or PWV). It makes a HR in a close and late game more valuable than one in a blowout. It calculates how much each hitter's result changed his team's chances of winning. The correlation between PWV/PA and OPS was .948 (a perfect correlation is 1.00). The relationship was even stronger when I broke down OPS into its separate components of OBP and SLG. So the bottom line is that we really don't need to know the situations a player faced to evaluate him. His regular stats tell us that.

Monday, June 15, 2009

Which Players Had The Most Surprising Walk Rates? (Part 2)

Click on Part 1 to see what I did last January. Back then I looked at walk rates relative to the league average as a function of isolated power (also relative to the league average), with the idea being that it is harder to walk a lot if you are not a power hitter.

What inspired me to go back and do more on this was a discussion of walks between Bill James and Joe Posnanski at Talkin' about the underappreciated base on balls, with Bill James. Another interesting take on walks appeared in Baseball Magazine in 1917. The article was by FC Lane and seems ahead of its time. It was called The Base on Balls: Why Should the Records Ignore This Powerful Factor in Brainy Baseball?

This time I also included a variable for height and one for stealing. Height was in inches and stealing was stolen bases divided by singles + walks + HBP, sort of a frequency. That was also relative to the league average. The idea is that shorter guys have an easier time walking and guys who steal a lot won't get walked too much if the pitcher can help it. Here is the regression equation. Everything is relative to the league average except height. My data source is the Lee Sinins Complete Baseball Encyclopedia.

Walks = 195.58 - 1.25*SB - 1.8*HT + .369*ISO

The stats are all converted to a number relative to 100. If you were average at something, then you get a 100 (except for SB where 1.00 was average). Height and isolated power were significant but stealing was not.
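As a rough sketch of how the equation produces a "surprise" number, the snippet below plugs in the figures this post quotes for Thomas (walks 2.19 times the league rate, isolated power 57% of average, 71 inches tall, stolen base rate 68% of average). Reading those figures as 219, 57 and 0.68 on the scales described above is an interpretation, and the function name is hypothetical.

```python
def predicted_relative_walks(sb, height_inches, iso):
    """Predicted walk rate (league average = 100) from the fitted equation.

    sb: stolen base rate relative to the league (1.00 = average)
    height_inches: height in inches (not league-relative)
    iso: isolated power relative to the league (100 = average)
    """
    return 195.58 - 1.25 * sb - 1.8 * height_inches + 0.369 * iso


# Assumed inputs for Thomas, taken from the figures quoted in the post.
predicted = predicted_relative_walks(sb=0.68, height_inches=71, iso=57)
surprise = 219 - predicted  # actual relative walks minus predicted

print(round(predicted, 1), round(surprise, 1))  # about 88.0 and 131.0
```

Under these assumptions the equation expects a below-average walk rate, so an actual rate of 219 leaves a residual far larger than the standard error of about 30.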

The graph below shows the players with the most surprising walk rates. That is, their walks relative to the league average were the most above league average compared to what the equation predicted.

So Thomas walked 2.19 times as often as the average hitter. His isolated power was only 57% of the league average, he was 71 inches tall and his stolen base rate was only 68% of the league average. Now the guys who walked the least compared to expectations.

I will try to give more details later. But time to give a test.

I am back. The r-squared was .148 and the standard error was about 30. I also tried taking logs of all the variables but the results were no better. For the linear regression there was no correlation between the prediction error and any of the independent variables. I also wonder if height should be relative to the league average. But that raises the question of whether a 6'0" pitcher has a harder time throwing strikes to a 5'6" batter than a 5'6" pitcher does. I don't know, but I assumed the height of the pitcher did not matter.

Saturday, June 13, 2009

What Luis Castillo teaches us about the internet

I did the following Google search (searching blogs only):

"Luis Castillo" yankees

I restricted it at first to the following dates: June 10-11. 25 hits came up. Then I had the search cover the past 12 hours and there were 1100 hits. The last day gave over 1500.

In case you don't know, last night Castillo dropped a pop up that allowed 2 runs to score in the bottom of the 9th inning, giving the Yankees a 1-run win over the Mets. If he had caught it, the game would have ended.

Wednesday, June 10, 2009

My Interview With Vince Gennaro

Last year I interviewed him for the now defunct paper, the Chicago Sports Weekly. Here is a link. It is a PDF file.

Vince Gennaro interview

Vince is the author of Diamond Dollars: The Economics of Winning in Baseball. He also just got elected as secretary of SABR. So it is good for us to have someone with his training and experience on board.

He was also interviewed at Hardball Times.

Monday, June 8, 2009

Should We Keep An Eye On The Rockies?

They just swept the Cardinals in a 4-game series in St. Louis. Prior to the series, the Rockies were 21-32 and the Cards were 31-23. And last year the Cards won 12 more games (86 vs. 74).

The Rockies also won their last game in Houston just before going on to St. Louis, beating the Astros 10-3. The scores against the Cards were 11-4, 10-1, 7-2 and 5-2. So no game was close. Also, before that win against the Astros, the Rockies had been averaging only 3.83 runs per game on the road. Then they scored 43 in 5 games. The starting pitchers they faced were not a bad lot. Below are their names, followed first by their ERAs going into the game with the Rockies and then by their ERAs from last year.

Wandy Rodríguez-2.26, 3.54
Adam Wainwright-3.38, 3.20
Todd Wellemeyer-5.05, 3.71
Joel Piñeiro-3.86, 5.15
Brad Thompson-4.12, 5.15

No all-stars, but they are capable major league hurlers. Combined they have to be at least average. Fellow SABR member Dean Hendrickson suggested that firing Clint Hurdle is what changed things. I sure don't know.

Saturday, June 6, 2009

Rick Reuschel for the Hall of Fame

(This was posted on the Chicago Sports Review site a few years ago, but as far as I can tell, a lot of my articles there have been taken down. Someone at Joe Posnanski's site mentioned that Reuschel was very underrated. I have been wanting to post some of my old CSR articles, so this seems like a good time.) First, some highlights:

-His strike-out-to-walk ratio was 31% better than the league average

-He gave up 21.6% fewer HRs than average

Yes, I’m crazy. But not because I think Reuschel deserves to be in the Hall of Fame. What makes me crazy is that I was doing a lot of ridiculously time-consuming analysis that opened my eyes. As of now, I don’t know how he has done in the voting or if he is still eligible for induction based on the writers’ vote or if he has to wait for the Veterans Committee.

Before I get into explaining the actual analysis, let’s review a few things. A pitcher needs to prevent the other team from scoring and, in this endeavor, he can be aided or hindered by his fielders. So one thing to look at is his defense-independent pitching stats. Many of you probably know that I am taking a page out of Voros McCracken’s book on this. Readers who have not heard of him can Google his name. He came up with the idea a few years ago that on balls in play (not HRs, walks or strikeouts), the batting average that most pitchers allow is not too much different from other pitchers and that it appears to be mainly influenced by the fielders and hitters. Analysts Tom Tippett and Mike Emeigh, to name just two, have challenged McCracken’s thesis. But it is still pretty good.

So we should look at the things a pitcher controls, like HRs, walks and strikeouts (and even if the pitcher has some influence on the batting average on balls in play, the fielders and hitters still play a role, so we are still isolating just what the pitcher does and so it is a legitimate analysis). But how a pitcher does in HRs, walks or strikeouts must be put into context. They should be adjusted for the league average and for park effects.

So using the linear regression technique, I came up with a formula for estimating a pitcher’s ERA. I looked at all pitching seasons from 1920-2000 with 150+ IP. Here is the formula that I got:

ERA = .44*HR + .4*BB - .3*K

The intercept or constant term was less than .0001. The r-squared was .422 and the standard error was .75. But for each pitcher I used not his ERA, but how much he differed from the league average in the given year (actually how many standard deviations from the mean he was). Same thing for the HRs, walks and strikeouts. Using standard deviations, since they measure dispersion, does a better job of placing a pitcher in the context of his season than what percent above or below the mean they are in some stat (or the absolute difference) since we see where a pitcher fits in the statistical distribution of the given year.

Once I had this equation, I plugged in each pitcher's data on HRs, BBs and Ks and ranked them all (2000+ IP minimum during the 1920-2000 period) in terms of how many standard deviations below the mean they were in their careers (using only seasons when they pitched 150+ IP; data came from the Sean Lahman database). Each pitcher’s HRs were adjusted for park effects before their HRs were plugged into the formula. I used park effect data from fellow economist and SABR member Ron Selter. So if a pitcher was 1 standard deviation below the mean in HRs, he got a -.44. If he were, say, half a standard deviation better than average on both walks and strikeouts, he would get a -.2 and a -.15. So he would come out at -.79. How I used the park data is explained below in the technical note (I don’t have any data on how parks affect strikeouts and walks so those were not adjusted).
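The worked example above can be reproduced with a short sketch. It assumes every input is already a z-score (standard deviations from that season's league mean); the function and parameter names are illustrative only. Note that being "better" on strikeouts means a positive z-score, which the formula's minus sign turns into a lower estimated ERA.

```python
def predicted_era_z(hr_z, bb_z, k_z):
    """Estimated ERA, in standard deviations from the league mean.

    Uses the coefficients from the post: ERA = .44*HR + .4*BB - .3*K.
    Negative results mean a better-than-average (lower) ERA.
    """
    return 0.44 * hr_z + 0.4 * bb_z - 0.3 * k_z


# 1 SD fewer HRs, half an SD fewer walks, half an SD more strikeouts.
example = predicted_era_z(hr_z=-1.0, bb_z=-0.5, k_z=0.5)
print(round(example, 2))  # -0.79
```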

Rick Reuschel was 14th! Yes. That seems to be a high enough ranking in an 80-year period to merit the Hall of Fame. Here are the top 20 in terms of how many standard deviations below the average ERA they were for their careers:

Dazzy Vance -1.25
Lefty Grove -1.21
Roger Clemens -1.18
Greg Maddux -1.14
Carl Hubbell -.94
Randy Johnson -.92
Kevin Brown -.91
Dwight Gooden -.89
Mike Garcia -.84
Hal Newhouser -.82
Sandy Koufax -.81
Bert Blyleven -.75
Ron Guidry -.73
Rick Reuschel -.71
G. Alexander -.70
Gaylord Perry -.68
Urban Shocker -.65
John Smoltz -.65
Lefty Gomez -.65
Bob Gibson -.62

(I also looked at ERA relative to the league average in addition to this standard deviation technique and he would be 20th, still a very high ranking.) Reuschel is obviously in very good company. This means that, when only looking at pitcher-controlled factors, he was outstanding at preventing runs in the context of his era and parks. This was done while pitching 3500 innings (38th since 1920) and winning over 200 games.

Some conventional stats back up my claim. From 1972-1984, the years Reuschel was on the Cubs, he was 23rd among all major league pitchers with 1000+ IP in HRs allowed relative to the league average (thanks to the Lee Sinins Sabermetric Encyclopedia). He allowed 25% fewer HRs than the average pitcher would have, pitching in Wrigley Field! Wrigley was a great HR park during this period, compared to other NL parks, allowing 42% more HRs than average. Yet Reuschel was one of the best in baseball at preventing HRs during this period!

For his entire career, Reuschel gave up 21.6% fewer HRs than average. This is 41st for all pitchers from 1920-2004 with 2000+ IP. Pitchers he is ahead of include

Randy Johnson
Dwight Gooden
Bob Feller
Bob Lemon
Dazzy Vance
Allie Reynolds
Lefty Gomez
Warren Spahn
Herb Pennock
Waite Hoyt
Jim Palmer

He is 50th in strike-out-to-walk ratio relative to the league average in all of baseball history for pitchers with 3000+ IP. His strike-out-to-walk ratio was 2.16, 31% better than the league average of 1.65. Pitchers he is ahead of include

Catfish Hunter
Warren Spahn
Bob Gibson
Whitey Ford
Nolan Ryan
Waite Hoyt
Jack Morris
Orel Hershiser
Phil Niekro
Jim Palmer
Ted Lyons

Jim Palmer, for example, had the luxury of great fielders behind him, like Brooks Robinson, Mark Belanger, Dave Johnson, Bobby Grich and Paul Blair. Did the Cubs have anyone that good from 1972-1984?

So, we can see that Reuschel was very, very good in things that the pitcher mostly controls: HRs, BBs and Ks. His high rankings in these stats warrant his induction into the Hall of Fame, especially when we see some of the pitchers he is ahead of.

Technical note: In using the park effects to adjust for HRs, I take the factor, say 120, and find the number that is half-way between it and 100 since a pitcher only pitches half his innings in his own park. Then I would use 110. If a pitcher allowed 1 HR per 9 IP, I divide 1 by 1.10 and get .91. Then I see how far from the league average that is. That difference gets divided by the standard deviation. Then that gets plugged into the formula.
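The steps in the technical note can be sketched as two small functions. The park factor of 120 and the rate of 1 HR per 9 IP come from the note itself; the league average (0.80) and standard deviation (0.25) in the last lines are placeholder values invented purely to show the final step, not figures from the study.

```python
def adjust_hr_rate(hr_per_9, park_factor):
    """Park-adjust a pitcher's HR rate.

    park_factor uses 100 as neutral. Only half of it is applied,
    since a pitcher throws about half his innings at home.
    """
    half_factor = (park_factor + 100) / 2  # e.g. 120 -> 110
    return hr_per_9 / (half_factor / 100)


def z_score(value, league_mean, league_sd):
    """Standard deviations above (+) or below (-) the league mean."""
    return (value - league_mean) / league_sd


adjusted = adjust_hr_rate(1.0, 120)
print(round(adjusted, 2))  # 0.91

# Hypothetical league mean and SD, for illustration only.
hr_z = z_score(adjusted, league_mean=0.80, league_sd=0.25)
```

The resulting `hr_z` is the value that would be plugged into the HR term of the ERA formula.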

Friday, June 5, 2009

Can Joe Mauer Bat .400 This Year?

His average right now is .436 and people are already talking about it. You can read a couple of articles about this here and here and here. But he only has 110 at-bats so far. He will need to hit about .390 the rest of the way to finish at .400 (assuming about 3.5 at-bats per game and 107 more games).
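The "about .390" figure can be checked with a quick calculation. This sketch uses only the numbers in the paragraph above (.436 in 110 at-bats, 3.5 at-bats per game, 107 games left) and treats hits as a fractional quantity, so it is an approximation rather than an exact hit count. The function name is illustrative.

```python
def required_average(current_avg, current_ab, target_avg, remaining_ab):
    """Average needed over the remaining at-bats to finish at target_avg."""
    current_hits = current_avg * current_ab
    total_ab = current_ab + remaining_ab
    needed_hits = target_avg * total_ab - current_hits
    return needed_hits / remaining_ab


remaining_ab = 3.5 * 107  # 374.5 at-bats, per the assumption in the post
needed = required_average(0.436, 110, 0.400, remaining_ab)
print(round(needed, 3))  # 0.389
```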

Just about a year ago I had a post on Chipper Jones batting .400. You can read that here. He was at .421 on June 6, in over 200 ABs but he finished at .364. In that post, I looked at the previous .400 hitters. What I found was that on average, since 1900, the mean of their previous career average was .342, they hit .378 on average the year before, their average age was 27 and the league average in the year they hit .400 was, on average, .282.

How does Mauer compare to this profile? He came into 2009 with a career average of .317 and hit .328 last year. So those numbers are not very close to the .400 hitters. His age is about right. But the league average in the AL this year is just .267, well below the norm of .282. When Ted Williams hit .406 in 1941 (the last time anyone hit .400), the league average was .266. But that included the pitchers. Taking them out, the average was .276. So except for age, Mauer does not fit the profile of a .400 hitter.

One more thing. The highest average ever recorded by a catcher who qualified for the batting title was .362 by Mike Piazza in 1997. That means that Mauer, at .400, would be 38 points ahead of the next best. Here are the top 2 in season average for the other 7 positions. No one else has a 38-point lead.

George Sisler 1922 .420
George Sisler 1920 .407
Bill Terry 1930 .401

Nap Lajoie 1901 .426
Rogers Hornsby 1924 .424

Hughie Jennings 1896 .401
Luke Appling 1936 .388

John McGraw 1899 .391
George Brett 1980 .390

Tip O'Neill 1887 .435
Ed Delahanty 1899 .410
Jesse Burkett 1896 .410

Hugh Duffy 1894 .440
Ty Cobb 1911 .420

Willie Keeler 1897 .424
Joe Jackson 1911 .408

The only one with a lead even close to 38 points is O'Neill. So a .400 average from Mauer would be simply unprecedented by this other standard.

Thursday, June 4, 2009

Is Lefty Grove The Most Underrated Player In History?

Joe Posnanski thinks so. I concur (and not just because Joe has a Polish sounding last name, although that is often reason enough). See his post Lefty. Grove did very poorly in a recent opinion poll on who was the greatest lefthanded pitcher ever.

I have written several articles that show Grove may have been the best ever. I took park effects into account, normalized to the league average, used fielding independent ERA, and calculated wins above replacement level. I also looked at the best 5-year performances and Grove almost always comes out on top. In fact, he often had 2 distinct 5-year periods among the leaders. Here are the articles:

The Best Five-Year Pitching Performances

The Best Five-Year Pitching Performances Since 1920 Based on Fielding Independent ERA

The Best Pitchers Since 1920

The All-Time Leaders in Park-Adjusted Pitching Wins Above Replacement Level

Grove appears to be the top lefty on this last one, although if 2006 through today were counted, Randy Johnson might be better. But through last year, Randy Johnson was still 133 runs saved behind Grove in about 100 more IP (adjusting for park effects and league average, from the Lee Sinins Complete Baseball Encyclopedia).

Tuesday, June 2, 2009

Homeruns And The New Yankee Stadium (they are about 55% more likely than elsewhere)

Just in case no one else has posted this, Yankee batters have a HR% at home of 5.6% while it is 3.5% on the road. So the frequency is 60% higher at home (5.6/3.5 = 1.6). Yankee pitchers allow a HR% of 4.9% at home and 3.24% on the road. So that frequency is 51% higher at the new Yankee Stadium (4.9/3.24 = 1.51).

That 5.6% the Yankee batters have at home would mean that each batter in your lineup would hit about 35 HRs for the whole season, assuming each batter in the lineup got 620 ABs (last year the average AL team had 5,580 ABs and that divided by 9 is 620). Then .056*620 = 34.72. Of course, you would have to do that both home and away.
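The arithmetic in this post can be laid out as a short sketch. All inputs are the figures quoted above; averaging the batter and pitcher increases to arrive near the "about 55%" in the title is an assumption about how that headline number was reached, not something the post states.

```python
def relative_increase(home_rate, road_rate):
    """Fractional increase of the home rate over the road rate."""
    return home_rate / road_rate - 1


batters = relative_increase(5.6, 3.5)    # Yankee batters, HR% home vs. road
pitchers = relative_increase(4.9, 3.24)  # Yankee pitchers, HR% allowed
combined = (batters + pitchers) / 2      # assumed simple average

ab_per_slot = 5580 / 9             # average AL team ABs split over 9 slots
hr_per_slot = 0.056 * ab_per_slot  # season HRs at the home HR%

print(round(batters, 2), round(pitchers, 2), round(combined, 2))
print(round(ab_per_slot), round(hr_per_slot, 2))
```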

Monday, June 1, 2009

Are Mauer And Morneau The New M & M Boys?

Back in the 1960s, slugging Yankee teammates Roger Maris and Mickey Mantle were sometimes called the "M & M boys." I don't recall if Willie Mays and Willie McCovey got that nickname (although a Topps baseball card called them "fence busters" even though they did not study to become cops like Shaquille O'Neal). Now, the Twins have Justin Morneau and Joe Mauer. They both just had great months in May. Morneau hit .361 with a .459 OBP and a .713 SLG (for an OPS of 1.172). Mauer did even better, with the line of .414-.500-.838, for an OPS of 1.338. So I started to wonder how that stacked up against the best months from the earlier versions of the M & M boys.

I found all the months when the three pairs of teammates both had an OPS of 1.000 or higher (minimum 20 games played). Then I added them and also multiplied them (multiplying might give a better idea of a great 1-2 punch since both players have to do well). The results are ranked by the product of their monthly OPS in the table below:

Mantle and Maris were sensational in July 1961. Mantle's line was .375-.508-.854. He hit 14 HRs and had 28 RBIs in 29 games. Maris had .330-.403-.755 with 13 HRs and 31 RBIs in 28 games. The Yankees went 20-9 while scoring 162 runs. The Twins this May did not fare so well, going only 14-16, although they did score 168 runs. Morneau had 9 HRs and 29 RBIs while Mauer had 11 & 32.
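To show how the sum and product rankings work, here is a minimal sketch using the OBP and SLG figures quoted in this post for July 1961 and May 2009. The helper names are illustrative and the output is not a substitute for the full table.

```python
def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg


def combine(ops_a, ops_b):
    """Return (sum, product) of two teammates' monthly OPS."""
    return ops_a + ops_b, ops_a * ops_b


# July 1961: Mantle .508 OBP / .854 SLG, Maris .403 / .755
mantle_maris = combine(ops(0.508, 0.854), ops(0.403, 0.755))

# May 2009: Mauer .500 / .838, Morneau .459 / .713
mauer_morneau = combine(ops(0.500, 0.838), ops(0.459, 0.713))

for label, (total, product) in [("Mantle & Maris", mantle_maris),
                                ("Mauer & Morneau", mauer_morneau)]:
    print(f"{label}: sum {total:.3f}, product {product:.3f}")
```

The product rewards balance: for a fixed sum, it is largest when the two players' OPS figures are equal, which is why it may better capture a 1-2 punch.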

The performance of Mays and McCovey in September 1968 is amazing because it was the year of the pitcher. But the Giants entered September 12 games behind the Cardinals and still finished 9 back in 2nd place. They were eliminated on September 15 and only went 15-12.

I did try to adjust each player's OPS for the league average. In doing so, I took off .024 for Mauer & Morneau in June 2006, since that was the difference between the 2006 NL OPS with and without pitchers included. For this past May, I used .021, the difference from the 2007 NL. So each player's OPS was divided by the relevant league average and then normalized by multiplying by an OPS of .725. Then I summed them and multiplied them as before. The results, ranked by product, are in the table below:

Sources included Retrosheet, the Lee Sinins Complete Baseball Encyclopedia, the ESPN site, and the Yahoo site.