Sunday, November 30, 2008

Do The Best Hitters Strike Out More Than Other Hitters (And Has This Changed Over Time)?

I found the correlation between strikeout frequency and offensive winning percentage (OWP) decade by decade. I started with the NL from 1910-1919 and the AL from 1913-19 (there was a period after 1900, before this, when strikeouts for batters were not compiled). I used players with 2000+ PAs in each decade or time period. Strikeout frequency was calculated two ways: per PA and per AB. So the table below has the correlation between OWP and strikeout frequency for each period. The PA column shows the correlation between OWP and strikeouts per PA and the AB column does the same for strikeouts per AB.
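For readers who want to see the mechanics, here is a minimal sketch of that calculation in Python, assuming a hypothetical pandas DataFrame named players with one row per player per period and columns period, PA, AB, SO, and OWP (an illustration only, not the code or data source actually used for the table):

```python
import pandas as pd

def owp_strikeout_correlations(players: pd.DataFrame) -> pd.DataFrame:
    """Correlate OWP with SO/PA and SO/AB for qualifying hitters in each period."""
    qualified = players[players["PA"] >= 2000].copy()  # 2000+ PAs in the period
    qualified["SO_per_PA"] = qualified["SO"] / qualified["PA"]
    qualified["SO_per_AB"] = qualified["SO"] / qualified["AB"]
    return qualified.groupby("period").apply(
        lambda g: pd.Series({
            "PA": g["OWP"].corr(g["SO_per_PA"]),  # the PA column of the table
            "AB": g["OWP"].corr(g["SO_per_AB"]),  # the AB column of the table
        })
    )
```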



There seems to be quite a bit of fluctuation over time. I don't think I have any good reasons why. Most of the time the correlation is positive, meaning that the better hitters usually strike out more than average. I am surprised that the correlations have come down since 1980 and that they are not as high today as they were in the 1960s and 1970s. I say this because we have guys like Ryan Howard and Adam Dunn around.

It is also interesting that the correlations for the 1930s were much higher than for the periods right before and after. The same is true for the 1960s and 1970s. The table below shows the top ten batters in OWP for the 1930s and their strikeout rates.



The simple averages of the two strikeout rates for these ten were 7.87% and 9.29%, while the rates for the entire group in the 1930s were 6.94% and 7.79%. So the very best hitters struck out a lot more than average then.

The next table shows the top ten in strikeouts per AB from the 1930s. The simple average of the OWP of these players was .631. Ruth was over .800, Foxx and Greenberg were over .700, and three others were over .600.



In the AL from 1913-19, the top ten in OWP had strikeout rates of 5.81% and 6.75%, while the averages for the whole group were 6.97% and 7.99%. So in this period and league, the best hitters struck out a lot less than average.

Sunday, November 23, 2008

Should Ryan Howard Try To Strike Out Less?

You might think so. In both 2006 and 2007 he led the major leagues in "contact average." I define that as hits divided by (AB - K + SF). His contact average in 2006 was .448 and in 2007 it was .421 (although it fell to .372 in 2008). And he strikes out about 190 times a year. So more contact would mean more hits, right? Maybe, maybe not. I looked at this issue a few years ago in Strikeouts and the value of hitters.
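To make the definition concrete, here is a one-function sketch of contact average as defined above; the usage line uses made-up numbers, not Howard's actual totals:

```python
def contact_average(hits: int, ab: int, so: int, sf: int) -> float:
    """Contact average: hits divided by (AB - K + SF)."""
    return hits / (ab - so + sf)

# Hypothetical season line, just to show the arithmetic:
print(round(contact_average(hits=180, ab=580, so=190, sf=6), 3))  # 0.455
```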

Generally I found that when batters cut down their strikeout rates from year to year, they hit better. But I also found that the effect was slight. Here is an excerpt:

"Using the data from the 2002-3 seasons, I ran a regression with change in AVG being the dependent variable and change in strikeouts per AB being the independent variable.

The equation was:

AVGChange = -.00036 - .274*(SO/AB)Change

This means that if a player cut his strikeouts down by 100, his hits would go up by 27.4. That is like saying that on his additional ABs when he does not strike out, he bats .274. This may not be impressive because for all of these players over the 2002-3 seasons, they already bat about .336 when they don't strike out. Also, the r-squared was only .068, meaning that the regression explains only about 6.8% of the variation in AVGChange. So if there is any negative side to striking out, it is probably not too large."
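To spell out the arithmetic behind that interpretation, here is a quick back-of-the-envelope sketch of the quoted equation (my own illustration, not code from the original study); the 550 AB figure is hypothetical:

```python
INTERCEPT, SLOPE = -0.00036, -0.274  # from the equation quoted above

def extra_hits(at_bats: float, strikeouts_cut: float) -> float:
    """Predicted additional hits if a batter cuts strikeouts while AB stays fixed."""
    delta_so_per_ab = -strikeouts_cut / at_bats      # strikeout rate falls
    delta_avg = INTERCEPT + SLOPE * delta_so_per_ab  # predicted change in AVG
    return at_bats * delta_avg                       # extra hits on the same ABs

print(round(extra_hits(at_bats=550, strikeouts_cut=100), 1))  # about 27.2, close to the 27.4 above
```

Because the AB terms cancel in the slope part, the answer barely depends on the 550 AB assumption.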

This got me to thinking about what happens to a batter's contact average when his strikeout rate changes (something I had not looked at in this earlier study). I found all the hitters in baseball who had 300+ ABs in both 2006 and 2007 (190 players). Then I calculated their strikeout rates (K/AB), their contact rates, and how each one changed from 2006 to 2007. The correlation between the change in strikeout rate and the change in contact rate was .142. So if a batter's strikeout rate increased, his batting average while making contact tended to increase as well. Looking at the changes from 2005 to 2006 gave a .18 correlation.
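Here is a minimal sketch of how that comparison can be set up, assuming hypothetical DataFrames y2006 and y2007 indexed by player with columns H, AB, SO, and SF (an illustration only, not the actual data handling):

```python
import pandas as pd

def change_correlation(y1: pd.DataFrame, y2: pd.DataFrame, min_ab: int = 300) -> float:
    """Correlation between year-to-year change in K/AB and change in contact average."""
    both = y1.join(y2, lsuffix="_1", rsuffix="_2", how="inner")
    both = both[(both["AB_1"] >= min_ab) & (both["AB_2"] >= min_ab)]
    k_change = both["SO_2"] / both["AB_2"] - both["SO_1"] / both["AB_1"]
    contact_1 = both["H_1"] / (both["AB_1"] - both["SO_1"] + both["SF_1"])
    contact_2 = both["H_2"] / (both["AB_2"] - both["SO_2"] + both["SF_2"])
    return k_change.corr(contact_2 - contact_1)

# e.g. change_correlation(y2006, y2007) should give something like the .142 figure, given the same data
```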

Maybe this makes sense. If you swing harder, you strike out more. But a harder swing means the ball is hit harder, which should mean more hits. So, combined with the earlier study, this suggests a player should be careful before making a big effort to strike out less.

Sunday, November 9, 2008

Which Players Had The Most Uncharacteristically Good Seasons? (adjusted for their age)

I did this last week but did not adjust for age. The key stat I used is offensive winning percentage, so read last week's post to understand it. The idea is to find out which player had a season that deviated the most from his norm or career average. But I did not take age into account. Player performance improves, then peaks, then declines. The typical peak may be as young as 25. So a player doing 100 points better than his norm at age 25 is not the same as doing 100 points better at age 38. To find the expected performance at a given age, I found the relationship between age and average OWP, using all players with 15+ seasons of 400+ PAs. That relationship is:

OWP = -0.0008*AGE^2 + 0.0474*AGE - 0.0574

This comes from a regression analysis with an r-squared of .95, meaning that 95% of the variation in average OWP across ages is explained by the equation. The standard error was .008, which is pretty low. But as Bill James, Phil Birnbaum, and probably many others have pointed out, averaging each player's OWP at a given age to estimate career trends can have many problems. One is that at the older ages there are not many players left to average, because so many players are no longer good enough to play. If those retired guys had kept playing, the average OWP for ages 39, 40, etc. would be much lower. So this equation will underestimate how unusual some seasons might have been for older players.

To predict a player's OWP at a given age, the above equation is used, but an adjustment is also made based on his career norm. The average OWP for the group was .588. If a player had a .550 career OWP, then at any age his predicted OWP is adjusted down by .038 (a player with a career OWP of .638 would have each predicted OWP bumped up by .050). Once that was done, I found the top 50 seasons in terms of OWP above the prediction. The table below shows this. For example, Tommy Tucker in 1889 had an OWP of .783 at age 25. The equation predicts that he would have an OWP of .628, but his career OWP was .495, or .093 below the norm, so his adjusted prediction is .535. Since .783 - .535 = .248, his OWP was .248 better than expected. This was the highest positive difference ever.
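Here is a small sketch of that adjusted prediction (my reconstruction from the description above, not the original code), checked against the Tommy Tucker example:

```python
GROUP_MEAN_OWP = 0.588  # average OWP for the 15+ season group, per the text

def predicted_owp(age: int, career_owp: float) -> float:
    """Age-curve prediction, shifted by how far the player's career OWP sits from the group mean."""
    age_curve = -0.0008 * age**2 + 0.0474 * age - 0.0574
    return age_curve + (career_owp - GROUP_MEAN_OWP)

# Tommy Tucker at age 25 with a .495 career OWP, per the example above:
prediction = predicted_owp(25, 0.495)  # about .535 (.628 from the curve, minus .093)
surprise = 0.783 - prediction          # about .248 above expectation
```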

Barry Bonds' 2004 season at age 39 is number 31. His 2002 season is 54th, his 2001 season is 163rd, and his 2003 season is 172nd. There were a total of 6,319 seasons. So the four Bonds seasons from 2001-04 (ages 36-39) are all in the top 2.7%. He is the only player with four seasons in the top 3%.

Sunday, November 2, 2008

Which Players Had The Most Uncharacteristically Good Seasons?

Many fans know that Norm Cash batted .361 in 1961. He also had 41 HRs and 132 RBIs. He never batted .300 again (his last year was 1974), nor did he ever again reach 40 HRs or 100 RBIs. Perhaps this is the most atypically good season ever. He clearly performed well above what ended up being his career norms (was it the corked bat mentioned in the ESPN almanac? I recall that physicist Robert Adair said a corked bat would not really help).

Anyway, to study this, I looked at all players with 10+ seasons of 400+ PAs through 2005 (there were 504 players). I found the simple mean of their yearly offensive winning percentage, or OWP (a Bill James stat that estimates what a team's winning percentage would be if all nine batters were identical to that player and the team gave up an average number of runs). Since I used data from the Lee Sinins Complete Baseball Encyclopedia, OWP is also park adjusted. Then I subtracted that mean from their best year. The following table shows the top 25 in terms of best-minus-average OWP. Cash's 1961 season was 25th. Another table follows that only looks at seasons since 1920.
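For anyone who wants to replicate the idea, here is a minimal sketch of the best-minus-mean calculation, assuming a hypothetical DataFrame seasons with columns player, year, PA, and OWP (an illustration only; the OWP values themselves came from the Sinins encyclopedia):

```python
import pandas as pd

def most_uncharacteristic(seasons: pd.DataFrame, min_pa: int = 400, min_seasons: int = 10) -> pd.DataFrame:
    """Rank players by how far their best OWP season sits above their own career mean OWP."""
    qual = seasons[seasons["PA"] >= min_pa]
    by_player = qual.groupby("player")["OWP"].agg(["count", "mean", "max"])
    by_player = by_player[by_player["count"] >= min_seasons]
    by_player["best_minus_mean"] = by_player["max"] - by_player["mean"]
    return by_player.sort_values("best_minus_mean", ascending=False).head(25)
```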