Sunday, August 31, 2008

More On The Changing Historical Relationship Between Walks, HBPs and HRs

What I posted last week was something I had posted on the SABR list last year. At that time, someone raised a question about it. Below is the question and my response, with a little more research. My basic finding is that the rise in HBP these days is not due to pitchers throwing faster.

"Cyril mentioned that current pitchers seem to be more willing to hit batters than pitchers in the past. How about since a lot more pitchers now pitch the ball around 90 MPH, it's harder for batters to get out of the way. Historically, have the pitchers leading the leagues in HB been hard throwers (more Ks) or poor control pitchers (more BBs)?"

I did some analysis on this although it is not exactly what John Lewis suggests. I took the top 500 pitchers in batters faced (seasonal data) from 1960-69 and 1997-2006. I ran a regression in each case in which the HBP rate was the dependent variable and the strikeout rate and the walk rate were the independent variables. Intentional walks were removed.

Here is the regression equation for the 1960s

HBP = .00387 + .0177*BB + .00186*SO

For the 1997-2006 period it was

HBP = .005 + .0031*BB + .00486*SO

The r-squared in the first case was just .013 and in the second it was .025. The r-squared tells us what percent of the variation in the dependent variable is explained by the model, so the fit is pretty weak. But the T-values for BBs and SOs in the first case were 2.44 and .44, so the walk rate is statistically significant while the strikeout rate is not. For the second period they were 3.32 and 1.13.

In the first period, a one standard deviation increase in BB rate increased HBP rate .000392. For the strikeout rate it was .00007. So if a pitcher increases his walk rate he increases his HBP rate more than if he increases his SO rate. For the second period these numbers were .00065 and .00022. So again, the walk rate has a bigger impact.

So all of this suggests that it is worse control in general that increases the HBP rate.
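A regression like this can be sketched in Python with numpy. The pitcher data below is synthetic (made-up rates, not the actual seasonal data), so only the mechanics (coefficients, r-squared, T-values) match the analysis above:

```python
import numpy as np

def ols(y, X):
    """OLS with an intercept: returns (coefficients, r-squared, t-values)."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)            # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2, beta / se

# Synthetic stand-in for the seasonal pitcher data (rates per batter faced)
rng = np.random.default_rng(0)
bb = rng.uniform(0.05, 0.12, 500)               # non-intentional walk rate
so = rng.uniform(0.10, 0.25, 500)               # strikeout rate
hbp = 0.004 + 0.02 * bb + 0.002 * so + rng.normal(0, 0.002, 500)

beta, r2, t = ols(hbp, np.column_stack([bb, so]))
```

With the real data, `beta` would hold the intercept and the BB and SO coefficients reported above, and `t` the corresponding T-values.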

*********************

Now another response to that question

The other day I discussed a regression relating HBP, BBs and SOs. I did that again but I added in HRs, with the idea that a pitcher might be more likely to hit a guy who hit a HR last time up (or the next guy). I again looked at both the 1960s and the last 10 years. Skipping the regression details (except to say the coefficient values and the r-squared values did not change much), the interesting thing I found was that HRs had a negative relationship with HBP in the 1960s but a positive one in the last 10 years. So in the 1960s, a pitcher who gave up more HRs hit fewer batters, but today a pitcher who gives up more HRs hits more batters.

An increase in HR% of .01 meant about .23 fewer HBP per 1000 batters faced in the 1960s; in the last 10 years it meant about .33 more. A 1 standard deviation increase in HR% decreased HBP by .15 per 1000 batters faced in the 1960s and increased it by .24 in the last 10 years. The standard deviation of HR% in the 1960s was .0066. In the last 10 years it was .0075.

The T-value on HRs was not significant for either time period. But maybe the difference in their coefficients could be. Anyone know if you can look at two different regressions and run some kind of a test to see if the difference between coefficients from the regressions is significant?
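One standard answer: when the two regressions come from independent samples, the difference between two coefficients can be tested with an approximate z-statistic, z = (b1 - b2)/sqrt(se1^2 + se2^2). The HR coefficients below (-.023 and +.033) are the ones implied by the numbers above, but the standard errors are made up for illustration:

```python
from math import sqrt

def coef_diff_z(b1, se1, b2, se2):
    """Approximate z-test for the difference between coefficients
    from two independent regressions."""
    return (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)

# HR coefficients from the two periods, with hypothetical standard errors
z = coef_diff_z(-0.023, 0.015, 0.033, 0.022)
# |z| > 1.96 would mean the coefficients differ at the 5% level
```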

I ran a regression which combined the two periods. There was a dummy variable for time period. It indicates that pitching in the last 10 years instead of the 1960s, holding everything else constant, means 2.5 more HBP per 1000 batters faced. The T-value was 8.98. In other words, highly significant.

I also ran a regression with the dummy variable and the dummy variable was multiplied by each of the other variables (HRs, BBs, SOs). In this case the dummy for time period was just about zero and not significant. The value of the HR*dummy coefficient was .055 (although the T-value was just 1.53 and about 2 is usually needed for significance). So I think the .055 value means that any given increase in HR% in the last 10 years would make the HBP rate go up by .055 more than in the 1960s. So over 1000 batters faced, if your HR% goes up by .01 (say you give up 10 more HRs) you would hit .55 more batters in the last 10 years than you would have in the 1960s.
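The design matrix for this combined regression, with a period dummy and the dummy multiplied by each of the other variables, can be built like this (the pitcher-season data here is made up):

```python
import numpy as np

def pooled_design(X_early, X_late):
    """Stack two periods into one design matrix with a period dummy and
    dummy*variable interaction columns, as described above."""
    X = np.vstack([X_early, X_late])
    d = np.concatenate([np.zeros(len(X_early)), np.ones(len(X_late))])
    return np.column_stack([X, d, X * d[:, None]])

# Made-up data: 500 pitcher-seasons per period, columns = BB, SO, HR rates
rng = np.random.default_rng(1)
X60s = rng.uniform(0.0, 0.2, (500, 3))
X00s = rng.uniform(0.0, 0.2, (500, 3))
X = pooled_design(X60s, X00s)
```

Regressing HBP rate on these columns gives the interaction coefficients directly; a significant HR*dummy coefficient would be the formal version of the .055 finding above.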

Monday, August 25, 2008

The Changing Historical Relationship Between Walks, HBPs and HRs

Since I posted something on HBP's last week, I thought I would post a couple of items that I put on the SABR list last year. Here they are.

As many of you probably know, the HBP rate has been on a general increase for many years (since about 1980). But one thing that could account for it is that pitchers have poorer control than they used to (I am not saying that they do, just that it could be a reason for the rise in HBP rates). So I thought that it might be useful to look at the HBP-to-walk ratio over time. I created 4 graphs and they are at

http://www.geocities.com/cyrilmorong@sbcglobal.net/HBPWalks.doc

There are two graphs for each league: the first uses all walks and the second excludes intentional walks (which were not officially recorded until 1955). I also started the NL around 1897 because it did not look like all of the HBP were recorded before then. The file is a Microsoft Word file, so when you click on it you might be asked to open it in that program. You will have to say yes.

Both leagues were around .16 about 1900. That is, there were 16 HBP for every 100 walks. But by around 1940 or so, it was 4 (or fewer) HBP per 100 walks. For both leagues, the rate has been rising since 1980. This suggests to me that the higher HBP rates these days are not due to poor control. There may be other issues involved, so we might not be able to conclude that for sure.

************************************************

Yesterday I discussed the HBP rate relative to the walk rate and how HBP/Walks has risen over time. But I also thought about how HRs might affect this. If a player hits a HR, the pitcher might want to pitch inside more to that player or anyone else on that team. This could lead to more HBP. Maybe even sometimes pitchers intentionally try to hit someone because of HRs. So I looked at HBP/HR over time. Since 1920, in both leagues, the rate has pretty much stayed under .5. But, of course, control is an issue, too. So I figured out the non-intentional walk rate each season since 1955 for both leagues and then the historical average from 1955-2006 in both leagues.

For each league/season, I then divided the non-intentional walk rate by the average over the 1955-2006 period. If a league/season had a rate that was 10% higher than the historical average, then it got a 1.10. The HBP/HR rate for that league/season was divided by 1.10. So I deflate the HBP/HR rate by 10% since that league/season's pitchers had control that was 10% worse than average, which could partly account for a higher HBP/HR rate. I did that for all league/seasons. The new number is called the adjusted HBP/HR rate. I graphed this for each league since 1955. The two graphs are at

http://www.geocities.com/cyrilmorong@sbcglobal.net/HBPHR.doc

The file is a Microsoft Word file so when you click on it you might be asked to open it in that program. You will have to say yes.
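The adjustment itself is a one-line calculation. Here it is as a sketch, using a hypothetical league/season rather than any actual one:

```python
def adjusted_hbp_per_hr(hbp_per_hr, niw_rate, niw_rate_avg):
    """Deflate a league/season's HBP-per-HR rate by its relative control.

    niw_rate is that season's non-intentional walk rate; niw_rate_avg is
    the 1955-2006 league average. A season with control 10% worse than
    average has its HBP/HR divided by 1.10.
    """
    return hbp_per_hr / (niw_rate / niw_rate_avg)

# Hypothetical league/season: HBP/HR of .33, walk rate 10% above average
adj = adjusted_hbp_per_hr(0.33, 0.088, 0.080)   # deflated to .30
```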

What I see here is that if you adjust for HRs and control (as measured by the walk rate), pitchers today seem pretty willing to hit batters. Does this mean that they are willing to pitch inside and that the high HBP rate is a side effect of that? We probably can't tell for sure since we don't have stats on how many pitches are thrown inside. But certainly pitchers today are willing to hit batters. In the AL, each of the last 6 seasons is above the historical average of my adjusted HBP/HR rate (which is about .27). In the AL, 5 of the top 6 seasons in the unadjusted HBP/HR rate were from 2001-05. 2006 was the 11th highest.

In the NL, the historical average of the adjusted HBP/HR rate is also about .27 and each of the last 6 years is above that. 6 of the 10 highest unadjusted HBP/HR rates were from 2001-06. One of the reasons I looked into this issue is that it came up at the most recent SABR convention. There was a panel on St. Louis baseball and the former players all said that pitchers today don't pitch inside enough, that they leave the ball out over the plate too much and that they are reluctant to hit, or be aggressive with, guys who are hitting HRs. Based on what I have done, this does not seem to be true.

Sunday, August 17, 2008

Are Good Pitchers More Likely To Hit Batters Who Hit Them Well?

I started wondering about this after last week's post on whether or not HR hitters are more likely to get hit by the pitch in recent times than they did in the 1950s and 60s. I took the top 10 in wins from 1960-69 and from 1998-2007. Then I found the correlation between their HBP% and HR%, OPS and SLG. For HBP% the formula was HBP/(HBP + AB). The other stats are calculated normally. My table below shows only 5 pitchers from the last 10 years since only 5 of them had faced 30 or more batters who each had at least 50 ABs against them (those were the cutoffs I used). The data comes from Retrosheet. You can click on the table to see a bigger version. A batter's record against a pitcher also includes cases outside the specified period; it covers their entire careers.

There may not be a lot to learn here. Some guys have negative correlations and many are very low. The two who stand out are Bunning and Mussina. According to the Lee Sinins Complete Baseball Encyclopedia, Bunning hit 160 batters while the average pitcher would have hit only 90. Relative to the league average, he was the 3rd most likely to hit a batter in the 1960s with 1000+ IP.

Mussina is very interesting. In his career he only hit 52 batters while the average pitcher would have hit 125. He was the 10th least likely to hit a batter relative to the league average in the last 10 years. Yet he has very high correlations on OPS & SLG. It seems like if a guy hit Mussina well, he was more likely to hit him. Yet Mussina has been very good at not hitting people in general. Has he been selectively and intentionally hitting certain guys? Of the 39 batters who have 50+ ABs against Mussina in the last 10 years, only 10 have been hit at least once. But their collective AVG against him is .321 (again, that is for their whole careers, not just the last 10 years). The other batters combined for only a .253 AVG. Getting back to the 10 who have been hit, they have collectively slugged .541 in their careers against Mussina.

Saturday, August 9, 2008

Do Sluggers Get Hit By The Pitch More Than They Used To?

I found the correlation between HR frequency and HBP frequency for each decade since the 1950s. In one case the denominator was AB + HBP, in the other it was AB + BB + HBP. Here are the correlations for the first case, starting with the 1950s

1950s: 0.029
1960s: 0.119
1970s: 0.088
1980s: 0.222
1990s: 0.186
2000s: 0.17

Now for the second measure.

1950s: 0.022
1960s: 0.101
1970s: 0.072
1980s: 0.22
1990s: 0.173
2000s: 0.128
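The correlations can be computed like this with numpy. The batter lines here are made up, so the numbers won't match the ones above, but the two denominators are the two measures used:

```python
import numpy as np

def hr_hbp_corr(hr, hbp, ab, bb, include_walks=False):
    """Correlation between HR frequency and HBP frequency across batters.

    Denominator is AB + HBP, or AB + BB + HBP when include_walks is True,
    matching the two measures above.
    """
    denom = ab + hbp + (bb if include_walks else 0)
    return np.corrcoef(hr / denom, hbp / denom)[0, 1]

# Made-up decade of 200 batter lines (the real analysis grouped by decade)
rng = np.random.default_rng(2)
ab = rng.integers(300, 600, 200).astype(float)
hr = rng.integers(0, 40, 200).astype(float)
bb = rng.integers(20, 100, 200).astype(float)
hbp = rng.integers(0, 15, 200).astype(float)
r1 = hr_hbp_corr(hr, hbp, ab, bb)           # AB + HBP denominator
r2 = hr_hbp_corr(hr, hbp, ab, bb, True)     # AB + BB + HBP denominator
```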

The correlations are higher in the 80s, 90s and the 2000s, meaning players who hit HRs more frequently are more likely to get hit by a pitch than in the 50s, 60s and 70s. So when old-timers tell you something like "if you hit a HR off Bob Gibson, next time you got brushed back or put on your butt," don't believe them. If that kind of thing was going on so much, there would have been more hit batters (some of those brushbacks would be a little off the mark, so the pitch would hit you, not just come close). And the correlation between HR hitting and getting hit would have been higher back in those days. But it is higher now.

In fact, hitting a HR in the 1990s increased your chances of getting hit a lot more than hitting a HR in the 1960s did. Here is the regression equation from the 1960s

HBP% = 0.0311*HR% + 0.0058

Now for the 1990s

HBP% = 0.0573*HR% + 0.0065

Since .0573/.0311 = 1.84, hitting a HR was 84% more dangerous in the 1990s than it was in the 1960s. And the T-value on HR% in the 1990s was significant (2.84) while it was not significant in the 1960s (1.52).

Sunday, August 3, 2008

Predicting 2nd Half Winning Pct With First Half OPS Differential and Winning Pct

Earlier in the season, I did a post on which teams had the best OPS differentials. So I thought it might be interesting to see what has a higher correlation with second half (actually post all-star) winning pct: first half (actually pre all-star) winning pct or first half OPS differential? Using the data from ESPN, here are those correlations for the years 2000-2007. The first number is the first half pct and the second is OPS differential.

2000: 0.384, 0.498
2001: 0.384, 0.34
2002: 0.708, 0.669
2003: 0.612, 0.708
2004: 0.625, 0.607
2005: 0.327, 0.297
2006: 0.226, 0.237
2007: 0.444, 0.361

Interesting that the correlations were much higher in 2002-04. Overall, the two do almost equally well: the average correlation for first half winning pct is 0.46375 and for first half OPS differential it is 0.46463. So a very slight edge for OPS.
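As a quick check on the arithmetic, the averages can be recomputed from the correlations listed above:

```python
# Correlations copied from the list above: first half winning pct, then
# first half OPS differential, for 2000-2007
pct = [0.384, 0.384, 0.708, 0.612, 0.625, 0.327, 0.226, 0.444]
ops = [0.498, 0.34, 0.669, 0.708, 0.607, 0.297, 0.237, 0.361]

avg_pct = sum(pct) / len(pct)   # 0.46375
avg_ops = sum(ops) / len(ops)   # 0.464625 -- the slight edge for OPS
```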

I expected a bigger edge for OPS since it gives a good idea of a team's performance, while winning pct can be more affected by luck in a short time span. Maybe winning pct also reflects how good the closer or bullpen is, and that carries over from half to half.