Saturday, March 24, 2012

Power Hitting 2B Men Over Time

This issue came up recently on the email list of the Hornsby chapter of SABR. Someone said that the power hitting 2B man was not necessarily a recent phenomenon.

I covered something similar in a post from 2009 called Positional Hitting Over Time. I looked at how each position did in SLG relative to the league average. Here is what I said about 2B men:
"2B men started declining in the AL in the 1940s, then starting rising again in the 1960s. But even now they have not reached their earlier peak. They started declining in the 1920s in the NL but then started back up in the late 1950s."

Below are the top 25 2B men with 5000+ PAs in SLG relative to the league average. The top 4 played a long time ago, and it looks like 6 of the top 10 played before 1940, or mostly before 1930. The numbers shown are the rate, the player's SLG and the league SLG. For Hornsby, we have .577/.392 = 1.47, and multiplied by 100 we get 147. The table after that uses ISO. Data from the Lee Sinins Complete Baseball Encyclopedia.
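The rate is just the player's SLG divided by the league SLG, times 100. A minimal sketch in Python, using the Hornsby figures from above (the helper name is my own):

```python
def slg_rate(player_slg, league_slg):
    """Relative SLG: player's SLG divided by league SLG, times 100."""
    return 100 * player_slg / league_slg

# Hornsby: .577 SLG against a .392 league SLG
print(round(slg_rate(0.577, 0.392)))  # 147
```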

It looks like many of the power hitting 2B men played a long time ago.

Tuesday, March 13, 2012

Do Batters Learn During Their Career?

This is a guest post by Tom Ruane. It is from some SABR list posts in 1999. I was reminded of them by a recent post at "The Book" blog by mgl called What effect does batter/pitcher familiarity have on performance?

Tom Ruane posted three separate items and I combined them all into one page, which you can read at Do Batters Learn During Their Career? The basic question is how batters do against pitchers the more they face them.

Below is an excerpt from the first post Tom did in 1999 and then a list of some issues he dealt with in the next two posts.
"Back in 1996, Dave Smith, the president and founder of Retrosheet, did a study showing that batters tended to improve against a pitcher as a game went on. His paper was entitled "Do Batters Learn During a Game?" and, rather than attempt to summarize his methods and findings, I recommend that you read his work for yourself. A copy of the paper can be found at:

This got me to wondering if a batter showed a similar increase in performance as his career progressed. Was a batter or a pitcher at a disadvantage the first time they ever faced each other? How did their performance change as they got more familiar? In order to attempt to answer these questions, I looked at all batter-pitcher matchups from 1980 to 1998. I only considered matchups where the batter and pitcher faced each other at least 20 times over the course of their careers and examined how they did in each of their first 20 confrontations. (Note: since my data started in mid-career for all players active prior to 1980, I did not include any match-ups where both pitcher and batter were active in the 1970s.)

[Editor's note: Tom has a very detailed table here so click on the link I provided to see it]

So batters seem to be at a noticeable disadvantage when facing an unfamiliar pitcher. The field seems to level off after the third time they see each other, but the batter still seems to get slightly better the more he sees a pitcher. Here's the breakdown in groups of five plate appearances:

1- 5 .727
6-10 .742
11-15 .744
16-20 .758

Five of the top six slugging percentages and four of the top five on-base percentages occurred in PAs 16 through 20."

Now some issues that Tom addressed in the second two posts:
Someone wondered if requiring twenty or more plate appearances was biasing the sample by eliminating the batters who fail to learn (and so leave the big leagues before facing any one pitcher enough to meet my criteria).

Someone else argued that what we're really seeing here are within-game effects, that the first appearance is earlier in a game than the second, which is probably earlier in a game than the third.

Someone else wondered if the later plate appearances came in higher run scoring years.

There was also some question at the time about how much of this effect was really due to batter learning during a game.

Friday, March 9, 2012

Does Consistent Play Help a Team Win?

That is the title of a recent post at Fangraphs by Bill Petti. He looked at the volatility of runs scored per game and runs allowed per game for teams. Click here to read it.

I am not totally clear on his technique, but here is one of his conclusions:
"Overall, RS_Vol [game to game runs scored volatility] had a negative relationship to team wins. So the more consistent a team’s run scoring, game to game, the higher their win total. The relationship was the same for RA_Vol [game to game runs allowed volatility], just much stronger."

I did a study a couple of years ago. It is below. I agree that "RS_Vol had a negative relationship to team wins," but I got the opposite result for runs allowed. I also found that just runs scored per game and runs allowed per game mattered a lot more than consistency.

I took both consistency and runs per game (both scored and allowed) into account. He only took volatility (or consistency) into account. I also found that adding in variables for consistency did not improve the accuracy of the model very much.

So here is my study from April 2010:

It seems to matter, but maybe much less than simply scoring and preventing runs.

I looked at two periods, 1963-68 and 1996-2000. In each case, I first ran a regression with team winning percentage as the dependent variable and runs per game and opponents' runs per game as the independent variables. Then I added two variables in a second regression which measured consistency. HITCON was the standard deviation (SD) of runs per game divided by runs per game (just the SD would not be right since high scoring teams will have a greater SD). PITCON does something similar on the pitching side.

For 1963-68 (120 teams), the first regression equation was

PCT = .528 + .108*R - .115*OR
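As a quick sanity check on this fitted equation, a hypothetical team that scores and allows 4 runs per game projects to almost exactly .500:

```python
def predicted_pct(r, orr):
    """1963-68 fitted equation: PCT = .528 + .108*R - .115*OR,
    with R and OR in runs per game."""
    return 0.528 + 0.108 * r - 0.115 * orr

# Score 4 and allow 4 runs per game: a .500 team
print(round(predicted_pct(4, 4), 3))  # 0.5
```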

Again, R & OR are per game. The r-squared was .903, meaning that the equation explains 90.3% of the variation in the dependent variable. The standard error was .023. Over 162 games, that works out to about 3.73 wins. Now the 2nd regression with the consistency variables added in.

PCT = .493 + .098*R - .103*OR - .084*HITCON + .117*PITCON

The r-squared did rise, but only slightly, to .912, while the standard error fell to 3.59 wins per season. The coefficient values on the consistency variables seem to make sense. The more consistent hitting teams win more for a given average runs per game, while the less consistent pitching teams win more. That may seem strange, but if you allowed 4 runs per game on average, you would win at least 81 games if you gave up 0 runs half the time and 8 the other half: you win all of the 0-run games, and you would also win some of those 8-run games, so you would have a winning record. If it were a league that had an average of 4 runs per game, you would win more than expected.
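The 0-or-8 example can be checked by enumeration. Assuming a hypothetical scoring distribution with a mean of 4 runs per game (the distribution itself is made up purely for illustration):

```python
# Team allows 0 runs in half its games and 8 in the other half.
# Hypothetical scoring distribution, mean = 4 runs per game:
score_probs = {1: 0.2, 3: 0.3, 5: 0.4, 9: 0.1}

# Always win the 0-run games; win the 8-run games when scoring 9.
win_pct = 0.5 * sum(p for s, p in score_probs.items() if s > 0) \
        + 0.5 * sum(p for s, p in score_probs.items() if s > 8)
print(round(win_pct, 2))  # 0.55

# A .550 team wins about 89 of 162 games: a winning record despite
# allowing a league-average 4 runs per game.
```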

On the surface, it might look like the consistency variables are pretty important. But the coefficient values are only about as high as they are for R & OR because the consistency variables are a lot lower. For example, average runs per game was 3.86 while the HITCON average was .73. So the coefficient values have to be relatively high on the consistency variables.

The R & OR variables were more significant, with higher t-values. Here they are for all four:

R: 16
OR: -17.96
HITCON: -2.2
PITCON: 2.99

I also found the number of extra wins that would be generated by a one standard deviation improvement in each variable. That means scoring more runs, giving up fewer runs, scoring more consistently and giving up runs less consistently (because the coefficient on PITCON was positive).

R: 7.68
OR: 8.12
HITCON: 1.05
PITCON: 1.34

So a one SD improvement in run scoring consistency (HITCON) adds 1.05 wins. That is a lot less than the 7.68 for simply scoring. We could say something similar on the pitching side. So it looks like a team should be more interested in just trying to score runs than in being more consistent. Less consistency on the pitching side is desirable, but not nearly as much as simply preventing runs.
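These one-SD figures come from multiplying a coefficient by the across-team SD of its variable and by 162 games. The post doesn't report those SDs, so the SD value below is hypothetical, chosen only to show the arithmetic:

```python
def extra_wins(coef, sd_of_variable, games=162):
    """Wins gained from a one-SD improvement in a regression
    variable: |coefficient| * SD of the variable * games."""
    return abs(coef) * sd_of_variable * games

# With the .108 coefficient on R, an across-team SD of about
# 0.44 runs per game (hypothetical; not reported in the post)
# implies roughly the 7.68 extra wins listed above.
print(round(extra_wins(0.108, 0.44), 1))  # 7.7
```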

For the 1996-2000 period (146 teams), the first regression was

PCT = .500 + .0944*R - .0945*OR

The r-squared was .894 and the standard error worked out to 3.63 wins per season. The second regression was

PCT = .441 + .085*R - .082*OR - .139*HITCON + .202*PITCON

The r-squared was .915 and the standard error worked out to 3.28 wins per season. The results are similar to those of the 1963-68 period. Adding the consistency variables does improve the accuracy of the model, but only slightly. The signs on the coefficients are the same.

The t-values were

R: 22.5
OR: -21.3
HITCON: -3.3

The extra wins that would be generated by a one standard deviation improvement in each variable were:

R: 7.34
OR: 7.4
HITCON: 1.02
PITCON: 1.81

These numbers are very close to the numbers for the 1963-68 period. So again, it looks like it is much more important to score and prevent runs than to become more consistent (or less consistent, on the pitching side). This is true for a low scoring era, 1963-68, when the average runs per game was 3.86, as well as for the latter period, when it was 4.97.

Sources: Retrosheet, Baseball Reference, Sean Lahman Baseball Archive