Saturday, July 11, 2009

How Important is Home Field Advantage in the World Series?

That is the name of an article I wrote a few years ago for Beyond the Boxscore, where it is posted under the same title. I figured out the probability of the team with HFA winning in 4, 5, 6 and 7 games and then added those together to get 51.52%. Not everyone agreed with my methods, and you can read about that in the comments. I was reminded of this because Sky Andrecheck recently wrote an interesting article at Baseball Analysts called Is The All-Star Game The Biggest Remaining Game for Dodgers?. He came up with 51.26%. We both used the historical average of home teams winning 54% of the time in regular season games.
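Here is a rough sketch of the "win in 4, 5, 6 or 7 games, then add them up" approach. It is not the exact method from either article. It assumes two evenly matched teams, independent games, a 2-3-2 home/road format, and the 54% home win rate. The names HOME_PATTERN and win_probs_by_length are made up for this illustration, and under these assumptions the total need not match either published figure exactly.

```python
from itertools import product

# Assumed 2-3-2 format: True means the team with HFA is at home for that game.
HOME_PATTERN = (True, True, False, False, False, True, True)


def win_probs_by_length(p_home=0.54):
    """Probability that the HFA team wins the series in exactly 4, 5, 6 or 7 games.

    Assumes evenly matched teams, so the only edge is the home win rate p_home.
    """
    probs = {}
    for n in range(4, 8):
        total = 0.0
        # Enumerate every win/loss sequence for the first n games.
        for outcome in product((True, False), repeat=n):
            # The series ends in game n only if the HFA team wins game n
            # and that win is its fourth.
            if not outcome[-1] or sum(outcome) != 4:
                continue
            prob = 1.0
            for won, at_home in zip(outcome, HOME_PATTERN):
                p_win = p_home if at_home else 1.0 - p_home
                prob *= p_win if won else 1.0 - p_win
            total += prob
        probs[n] = total
    return probs


by_length = win_probs_by_length()
series_prob = sum(by_length.values())
print(by_length)
print(round(series_prob, 4))
```

Changing p_home or HOME_PATTERN shows how sensitive the answer is to the assumed home win rate and to the series format.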

Monday, July 6, 2009

My Sabermetrics Page Has Moved

Click on this link to go to the new address: Cyril Morong's Sabermetric Research. This link has a lot of my research on it. If some of those links don't work, please be patient. I am working on getting it all straightened out. Geocities is going away and I just switched to the new Yahoo service. So the new address is

http://cyrilmorong.com

Sunday, July 5, 2009

Do Pitchers Differ In Their Ability To Prevent HRs? (and does it persist over time?)

There was a very intriguing post a few weeks ago at the Hardball Times by Derek Carty called Using FIP to evaluate pitchers? I wouldn’t. FIP stands for "fielding independent pitching," the idea that pitchers should be evaluated only on outcomes that don't involve the fielders. That would include HRs allowed. But the article said:

"Here's how things work: a pitcher can influence the rate of fly balls he gives up. By this logic, the more fly balls allowed, the more total balls will clear the fences for home runs (all else being equal). However, while a starting pitcher can control the rate of fly balls allowed, he cannot do a very good job of controlling the rate at which those fly balls become home runs (with very few exceptions).

To put it more simply, starting pitchers don't have any underlying ability to prevent home runs—the best they can do is prevent fly balls. If those fly balls are clearing the fence at too high a rate (or too low), we say that the pitcher has been unlucky (or lucky)."

I am not sure I completely agree with this. It could be that differences in flyballs allowed account for the differences in HR rates allowed across pitchers. But whatever the reason, the year-to-year correlations of HR rates allowed by pitchers, although not as high as those for walk rates and strikeout rates, are not small.

The data I looked at involves year-to-year correlations, for various pairs of years, for pitchers who faced at least 500 batters in each of two consecutive seasons. The table below summarizes the results. Starting with the 1955 season, I eliminated IBBs from the calculations. HBP were counted as walks in all years. The columns show the year-to-year correlation of the rate allowed for each stat. The last line is the simple average of all the correlations.

Overall, the correlations are much higher for strikeout rates and walk rates (the denominator I used in all cases was batters faced). But the correlations do seem to be getting higher for the HR rates. It was very surprising to see how low they were in some of the earlier years.

One more thing that I tried (and this really makes me think that we should keep looking at HR rates) was to look at the correlation in HR rates from one period to the next using more years. For that, I found all the pitchers that had 1000+ batters faced in both the 2003-05 period and the 2006-08 period. The correlations for walk rates and strikeout rates from period 1 to period 2 were 0.736 and 0.767, respectively. For HR rates it was 0.505. This seems high enough to say that, yes, pitchers do differ in the HR rates they allow, even if the reason is their flyball rates.
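The period-to-period comparison described above can be sketched in a few lines. This is only an illustration: the pitcher names and totals are invented, and the dictionary layout and the function names pearson and period_corr are my own assumptions, not the format of the actual data source.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def period_corr(period1, period2, stat, min_bf=500):
    """Correlate a per-batter-faced rate across two periods.

    Only pitchers with at least min_bf batters faced in BOTH periods count.
    """
    common = [
        name for name in period1
        if name in period2
        and period1[name]["BF"] >= min_bf
        and period2[name]["BF"] >= min_bf
    ]
    rates1 = [period1[name][stat] / period1[name]["BF"] for name in common]
    rates2 = [period2[name][stat] / period2[name]["BF"] for name in common]
    return pearson(rates1, rates2), len(common)


# Invented totals, for illustration only.
first = {
    "Pitcher A": {"BF": 800, "HR": 20},
    "Pitcher B": {"BF": 900, "HR": 18},
    "Pitcher C": {"BF": 700, "HR": 21},
    "Pitcher D": {"BF": 300, "HR": 12},  # too few batters faced, dropped
}
second = {
    "Pitcher A": {"BF": 850, "HR": 17},
    "Pitcher B": {"BF": 1000, "HR": 25},
    "Pitcher C": {"BF": 600, "HR": 24},
    "Pitcher D": {"BF": 900, "HR": 27},
}

r, n_pitchers = period_corr(first, second, "HR")
print(n_pitchers, round(r, 3))
```

For the three-year periods, the same function would be called with min_bf=1000 and the totals summed over each period.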

Saturday, July 4, 2009

Albert Pujols Has A Good Chance To Win The Triple Crown

He leads the NL in both HRs and RBIs by 7. He is 2nd in AVG with .336 while Hanley Ramirez is hitting .344. But I compared Pujols to Ramirez and the rest of the NL top ten in AVG, and based on previous performances, Pujols has done much better in AVG. The table below shows the current averages of the NL top 10. It also shows what they hit in 2008, their current career average, and their highest average before 2008.

In terms of what he hit last year, his career average and his high average, Pujols is well ahead of the other guys in the top ten. Sandoval only had 145 ABs in 2008 and only has 422 so far in his career. Pujols also has a history of hitting well after the All-Star break. Last year it was .366 and for his career it is .344. Looks like he has a good chance to lead in AVG once the season is over.

Monday, June 29, 2009

Which Players Had The Best HR-To-Strikeout Ratios?

I looked at every player with 5000+ PAs since 1920. I found their relative HRs and their relative strikeouts, then found the ratio of the two. Ken Williams, for example, hit 3.70 times as many HRs as the average player of his time and league while striking out only 75% as often as the average player. Since his ratio of ratios (3.7/.75 = 4.93) is the highest of anyone in the study, he is ranked first. The data comes from the Lee Sinins Complete Baseball Encyclopedia. The table below shows the top 25:



DiMaggio hit only 41% of his HRs at home in his career while Ken Williams hit 72%. So it is likely the case that DiMaggio would rank first, and probably by a wide margin, if HRs were park adjusted. Ted Williams hit less than 50% of his HRs at home.

The next table shows which players had the lowest relative strikeout rates among guys who hit 40+ HRs. Again, no pikers here. In 2004, Bonds had only 41 strikeouts while the average player would have had 100. I am so proud to see the demonstration of Polish power with 3 for Ted Kluszewski and 1 for Carl Yastrzemski (whose 1970 season ranks 27th). Don't forget Stan Musial is 13th on the above list.

Monday, June 22, 2009

Harold Reynolds And Using Context To Evaluate Hitters

ESPN analyst and former major league player Harold Reynolds wrote a blog entry called Enjoy it for what it's worth. Sky Kalkman at "Beyond the Boxscore" wrote a response called Defending Harold Reynolds. Reynolds criticizes some of the "newer" stats like OPS:

"Not all statistics work. Some do, some don't. And one of the stats that has become real popular is OPS. On-base plus slugging. All of a sudden, it's this stat that defines whether a guy is a good ball player or not. And the fact of the matter is, if you're a power hitter then the situation will dictate what a pitcher does with you - either walk you or pitch you real careful. So more than likely you're going to end up on base and therefore your on-base percentage goes up. This in my mind has become the stat the everyone thinks is the be all and end all. It is not. If you have a ball club that's a great offensive team then that changes everything. But if you have a guy like Adrian Gonzalez, for example, his OPS is going to be high - he's got a lot of home runs and walks a lot...because you're not going to pitch to him. Power guys like Giambi and Dunn have always had high OPS because no one wants to pitch to them. But it takes two hits to score them from first."

Reynolds began by saying that context and situation matter, and they probably do. But this raises the question of how much. Some of my past research touches on these issues and I will discuss that below. But first, even if you don't like OPS, or OBP + SLG, it is still better than the traditional stats (for example, he mentions that Ichiro Suzuki gets 200+ hits every year). The 1998 STATS, INC. Baseball Scoreboard book had a nice little study showing that the team with the higher OPS in a game had a winning percentage of .852, while the team with the higher batting average had a winning percentage of .804 (they looked at several other stats and OPS had the highest winning percentage).

But let's look at some of what Reynolds said specifically. He seems to be saying that when a slugger walks on a weak hitting team, it is not so valuable. But I had done some analysis on this. It was called The Value of OBP and SLG by Lineup Position for High-Scoring and Low-Scoring Teams. If you go to this link, you will see that the marginal run value for the cleanup hitter's OBP is actually higher on the low scoring team.

Now how much might context matter or change our evaluation of hitters if we are using OBP and SLG? My analysis on this is called Evaluating Hitters Based on Their Lineup Slot. The most anyone was adjusted was a +6.2 runs per season, for Luis Castillo. So if I took into account that he was a leadoff hitter instead of a generic hitter, his value to his team would be about 6 more runs a year. This seems pretty small. So context does not change our valuation much.

Then there is the issue of situational hitting. My analysis on this is called The Problem With “Total Clutch” Hitting Statistics. What I found was that OPS was highly correlated with how much impact a hitter had on winning and losing depending upon the situation. The stat I used was Ed Oswalt’s measure “player’s win value” (or PWV). It makes a HR in a close and late game more valuable than one in a blowout. It calculates how much each hitter's result changed his team's chances of winning. The correlation between PWV/PA and OPS was .948 (a perfect correlation is 1.00). The relationship was even stronger when I broke down OPS into its separate components of OBP and SLG. So the bottom line is that we really don't need to know the situations a player faced to evaluate him. His regular stats tell us that.

Monday, June 15, 2009

Which Players Had The Most Surprising Walk Rates? (Part 2)

Click on Part 1 to see what I did last January. Back then I looked at walk rates relative to the league average as a function of isolated power relative to the league average, the idea being that it is harder to walk a lot if you are not a power hitter.

What inspired me to go back and do more on this was a discussion of walks between Bill James and Joe Posnanski at Talkin' about the underappreciated base on balls, with Bill James. Another interesting take on walks appeared in Baseball Magazine in 1917. The article was by FC Lane and seems ahead of its time. It was called The Base on Balls: Why Should the Records Ignore This Powerful Factor in Brainy Baseball?

This time I also included a variable for height and one for stealing. Height was in inches and stealing was stolen bases divided by singles + walks + HBP, sort of a frequency. That was also relative to the league average. The idea is that shorter guys have an easier time walking and guys who steal a lot won't get walked too much if the pitcher can help it. Here is the regression equation. Everything is relative to the league average except height. My data source is the Lee Sinins Complete Baseball Encyclopedia.

Walks = 195.58 - 1.25*SB - 1.8*HT + .369*ISO

The stats are all converted to a number relative to 100. If you were average at something, then you get a 100 (except for SB where 1.00 was average). Height and isolated power were significant but stealing was not.
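To make the scales concrete, here is a small sketch that plugs numbers into the equation above and computes how "surprising" a walk rate is (actual minus predicted). The function name predicted_walks is made up for this example. The sample inputs are the ones quoted in this post for Thomas: relative walks of 219, isolated power of 57, height of 71 inches, and a stolen base rate of 0.68.

```python
def predicted_walks(sb, height_in, iso):
    """Relative walk rate predicted by the regression equation in this post.

    sb        -- stolen base frequency relative to league (1.00 = average)
    height_in -- height in inches (not league relative)
    iso       -- isolated power relative to league (100 = average)
    """
    return 195.58 - 1.25 * sb - 1.8 * height_in + 0.369 * iso


# Inputs quoted in this post for Thomas.
actual = 219.0
predicted = predicted_walks(sb=0.68, height_in=71, iso=57)
surprise = actual - predicted  # positive means more walks than expected

print(round(predicted, 1), round(surprise, 1))
```

A large positive surprise means the hitter walked far more often than his power, height and stealing would suggest.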

The graph below shows the players with the most surprising walk rates. That is, their walks relative to the league average were the most above league average compared to what the equation predicted.



So Thomas walked 2.19 times as often as the average hitter. His isolated power was only 57% of the league average, he was 71 inches tall and his stolen base rate was only 68% of the league average. Next are the guys who walked the least compared to expectations.



I will try to give more details later. But time to give a test.

I am back. The r-squared was .148 and the standard error was about 30. I also tried taking logs of all the variables but the results were no better. For the linear regression there was no correlation between the prediction error and any of the independent variables. I also wonder if height should be relative to the league average. But that raises the question of whether a 6'0" tall pitcher has a harder time throwing strikes to a 5'6" batter than a 5'6" pitcher does. I don't know, but I assumed the height of the pitcher did not matter.