Cybermetrics: Did Roger Clemens Have the Best Age-Adjusted Season Ever in 2005? Part 2.

I originally posted this to Beyond the Box Score in 2006.

I looked at this question earlier this week. The stat I used to measure pitcher performance was RSAA, which tells us how many runs a pitcher saves above average (it is also park adjusted). But it is affected by the quality of the fielders. Now I will look at the issue by trying to create a defense independent stat to rate the pitchers, following the idea of Voros McCracken who found that pitchers have little influence over what happens on balls in play. The measure I will use won’t be as sophisticated as the one he uses, but it will give us some idea of which pitchers did better than we might have expected based on their age while removing the influence of the fielders.

First, I found all pitcher seasons with 150 or more IP from 1920-2005. Then I ran a regression with each pitcher’s ERA relative to the league average as the dependent variable and strikeouts, HRs and walks (all relative to the league average) as the independent variables (data is from the Lee Sinins “Complete Baseball Encyclopedia”). Here is the regression equation:

(1) Rel ERA = .658 + .279*BB + .242*HR - .201*SO

Again, these are all relative to the league average. Each pitcher’s stats can be plugged into this formula to get his defense independent relative ERA (the correct term might be fielding independent, but I’m not sure). In any case, this is his expected relative ERA based only on stats that he is responsible for. Once I had this for all pitchers in the study, I cut down the total to only pitchers who had at least 10 seasons with 150+ IP. Then I found the relationship between this predicted relative ERA (or defense independent ERA or DIPS ERA) and age. The graph below shows the relationship.
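As a minimal sketch of how equation (1) is applied, the function below plugs a pitcher's league-relative rates into the fitted coefficients. The function and argument names are my own labels, not anything from the original study, and the second pitcher is hypothetical. The inputs are assumed to be already divided by the league average, so 1.0 means league average.

```python
def dips_relative_era(rel_bb, rel_hr, rel_so):
    """Equation (1): expected ERA relative to league from BB, HR and SO only."""
    return 0.658 + 0.279 * rel_bb + 0.242 * rel_hr - 0.201 * rel_so

# A pitcher who is exactly league average in all three rates.
average = dips_relative_era(1.0, 1.0, 1.0)

# Hypothetical pitcher: 20% fewer walks and HRs, 50% more strikeouts than average.
strong = dips_relative_era(0.8, 0.8, 1.5)

print(round(average, 3), round(strong, 3))
```

Lower values are better, since the result is a ratio to the league ERA.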

This graph covers the ages 22-44 for pitchers who had at least 10 seasons with 150+ IP. There were a few ages younger and older than this, but not many. The relationship was actually better when only these ages were covered. The numbers in the graph are the average DIPS ERA for each age in this sub-group of pitchers.

I could have just used the average DIPS ERA for each age for all the pitchers. But that would be misleading, because it’s possible that a pitcher must be pretty good to be used at very young and/or very old ages. Sometimes the average for very old ages looks better than it should because only good pitchers are still around. By looking only at pitchers who had at least 10 seasons with 150+ IP, we get a more realistic aging pattern. Everyone in this group is likely to be pretty good, since they all pitched so long, so we don’t have to worry about the old guys being better than the rest.

The equation which shows the relationship between DIPS ERA and AGE is:

(2) DIPS ERA = .00052*AGESQUARED - .0322*AGE + 1.3755

But this was for the pitchers who had at least 10 seasons with 150+ IP. For all the pitchers, I assumed they had the same aging pattern, but I moved the intercept up to take into account the inferior quality as compared to the smaller group. The shift in the intercept was equal to the difference in average DIPS ERA between the whole group of pitchers (.9466) and the smaller group (.9011). That was .0455. That got added to 1.3755, making the intercept 1.421. So the equation to predict DIPS ERA becomes

(3) DIPS ERA = .00052*AGESQUARED - .0322*AGE + 1.421

Now each pitcher’s age was plugged into equation (3) to get a predicted DIPS ERA. Then that got compared to the value from equation (1). Let’s take Dazzy Vance from 1925, for example (age 34). His predicted DIPS ERA would be .9042 based on that age (plugging an AGE of 34 into equation (3)). But his DIPS ERA from equation (1), based on his relative strikeouts, HRs and walks, was .4546. So he was .4496 better than his age predicted.
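The intercept shift and the aging curve can be checked with a few lines of arithmetic. This is only a sketch using the coefficients exactly as printed in equations (2) and (3); the constant and function names are mine.

```python
# Coefficients as printed in equations (2) and (3).
AGE_SQ_COEF = 0.00052
AGE_COEF = -0.0322
INTERCEPT_LONG_CAREER = 1.3755  # pitchers with 10+ seasons of 150+ IP

# Average DIPS ERA for all pitchers vs. the long-career subgroup.
MEAN_ALL = 0.9466
MEAN_LONG_CAREER = 0.9011

# Shift the curve up by the gap between the two group averages.
shift = MEAN_ALL - MEAN_LONG_CAREER
intercept_all = INTERCEPT_LONG_CAREER + shift

def age_predicted_dips(age, intercept=intercept_all):
    """Equation (3): age-predicted relative DIPS ERA (quadratic in age)."""
    return AGE_SQ_COEF * age ** 2 + AGE_COEF * age + intercept

# Age at which the quadratic bottoms out (vertex of the parabola).
best_age = -AGE_COEF / (2 * AGE_SQ_COEF)

print(round(shift, 4), round(intercept_all, 4))
print(round(best_age, 1), round(age_predicted_dips(34), 4))
```

With the coefficients exactly as printed, age 34 evaluates to about .927 rather than the .9042 quoted for Vance, so the figures in the text were presumably computed from coefficients that differ slightly from the printed ones.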

The next step was to see how many runs he saved as a result of this. He pitched 265.33 innings. That makes 29.5 complete games. The league ERA that year was 4.27, so his age predicted ERA would be 3.86 (.9042*4.27). But his predicted relative ERA (or DIPS ERA) translated to an ERA of 1.94 (4.27*.4546). That difference is 1.92 (3.86 – 1.94). That gets multiplied by 29.5 to get 56.6 runs saved. This was the most runs saved once AGE was taken into account and I considered only the pitcher’s stats. Here are the top 25 age adjusted seasons in runs saved based solely on the pitcher’s stats.

Pitcher	YEAR	AGE	Runs Saved
Dazzy Vance	1925	34	56.5956533
Pedro Martinez	1999	27	56.50779595
Lefty Grove	1930	30	55.3703178
Bob Feller	1940	21	49.41058275
Dazzy Vance	1924	33	49.24335387
Pedro Martinez	2000	28	46.46541235
Dazzy Vance	1928	37	43.97893864
Bob Feller	1939	20	43.40136659
Dazzy Vance	1930	39	43.13738011
Roger Clemens	1997	34	41.05611036
Bert Blyleven	1973	22	40.72525954
Dolf Luque	1923	32	40.26181751
Johnny Allen	1936	30	40.07463135
Randy Johnson	1995	31	38.66133362
Lefty Grove	1927	27	37.97919687
Pete Donohue	1925	24	37.79832039
Lefty Gomez	1937	28	37.72100203
Randy Johnson	2004	40	36.97886686
Kevin Brown	1998	33	36.85872621
Lefty Grove	1926	26	36.54630565
Lefty Grove	1931	31	36.33560254
Lefty Grove	1929	29	36.208095
Lefty Grove	1932	32	36.1143788
Cy Blanton	1935	26	36.10329252
Bob Feller	1946	27	36.05170998

These stats are not park adjusted. That is why Pete Donohue is up there. He pitched in a low-run park. Clemens of 2005 is not here. He would only be 215th.
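The age adjusted runs saved calculation behind this table can be sketched as follows, using the Dazzy Vance 1925 figures quoted earlier. The function name and argument names are my own; the inputs are the age-predicted relative ERA, the DIPS relative ERA from equation (1), the league ERA and innings pitched.

```python
def age_adjusted_runs_saved(age_pred_rel_era, dips_rel_era, league_era, innings):
    """Runs saved versus the age-based expectation, following the steps in the text."""
    expected_era = age_pred_rel_era * league_era  # ERA expected from age alone
    dips_era = dips_rel_era * league_era          # ERA implied by BB, HR and SO
    games = innings / 9.0                         # nine-inning games pitched
    return (expected_era - dips_era) * games

# Dazzy Vance, 1925, using the figures quoted in the text.
vance_1925 = age_adjusted_runs_saved(0.9042, 0.4546, 4.27, 265.33)
print(round(vance_1925, 1))
```

The result agrees with the first row of the table to within rounding of the inputs.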

I used one other method last week to find the best age adjusted seasons. I subtracted the normal peak age from each guy’s age and took the absolute value. That got multiplied by the number of runs saved, which was not age adjusted as in the above explanation; it was simply (IP/9)*(league ERA minus the ERA predicted by equation (1)). The peak age was 29.89. I found that by taking the average age for the top 250 seasons in predicted relative ERA (using equation (1)). For example, Dazzy Vance in 1930 was aged 39. That minus 29.89 is 9.11. He saved 53.75 runs. His relative ERA was predicted by equation (1) to be .6237. That times the league ERA of 4.97 is 3.10. The difference from the league ERA is 1.87. He pitched 258.667 innings. That divided by 9 is 28.74. That times the 1.87 difference is the 53.75 runs saved. That gets multiplied by 9.11. That gave him 489.67 “age points.” The top 25 in “age points” are listed below.

Pitcher	YEAR	AGE	Runs Saved	Age Points
Dazzy Vance	1930	39	53.75051586	489.6671995
Bob Feller	1940	21	54.8268672	487.4108494
Bob Feller	1939	20	46.90794659	463.9195918
Randy Johnson	2004	40	44.86122903	453.5470255
Dazzy Vance	1928	37	54.65467339	388.5947278
Bert Blyleven	1973	22	46.98793732	370.7348255
Dwight Gooden	1985	20	37.02470115	366.1742944
Dwight Gooden	1984	19	32.87340375	357.9913668
Jack Quinn	1928	44	24.19020867	341.3238443
Waite Hoyt	1921	21	37.74713278	335.5720104
Randy Johnson	2001	37	44.7325033	318.0480985
Vida Blue	1971	21	34.27435979	304.6990586
Babe Adams	1922	40	29.91900156	302.4811057
Dazzy Vance	1929	38	37.20115978	301.7014058
Roger Clemens	2005	42	24.7192402	299.3499988
Lefty Grove	1937	37	42.06229935	299.0629484
Nolan Ryan	1989	42	24.30933947	294.3861009
Bob Feller	1938	19	26.98090041	293.8220054
Dazzy Vance	1925	34	68.65550985	282.1741455
Frank Tanana	1975	21	31.45033843	279.5935086
Pete Donohue	1925	24	46.90945662	276.2966995
Lefty Gomez	1931	22	34.76705924	274.3120974
Dutch Leonard	1949	40	27.05088775	273.4844751
Randy Johnson	2000	36	44.15458989	269.7845442
Mark Prior	2003	22	34.1780478	269.6647972

Clemens makes this list. Notice that there are some relatively young pitchers here. By using absolute value, the farther a pitcher is from the “peak age,” the more points he would get. So this puts a guy 10 years over peak on the same footing as a guy who is 10 years under.
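The “age points” calculation can be sketched like this, using the Dazzy Vance 1930 figures from the worked example. The function names are my own; the peak age of 29.89 is the one given in the text.

```python
PEAK_AGE = 29.89  # average age of the top 250 seasons, per the text

def runs_saved_vs_league(dips_rel_era, league_era, innings):
    """Runs saved against the league average (not age adjusted)."""
    return (league_era - dips_rel_era * league_era) * innings / 9.0

def age_points(age, runs_saved, peak_age=PEAK_AGE):
    """Runs saved weighted by the absolute distance from the peak age."""
    return abs(age - peak_age) * runs_saved

# Dazzy Vance, 1930: relative ERA .6237, league ERA 4.97, 258.667 IP, age 39.
runs = runs_saved_vs_league(0.6237, 4.97, 258.667)
points = age_points(39, runs)
print(round(runs, 2), round(points, 2))
```

Because of the absolute value, a pitcher ten years under the peak age gets the same multiplier as one ten years over it.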

Here is my response to a comment:

I tried a 4th order polynomial, and certainly it fits the data better. But that is a select group of pitchers. I am using it to try to estimate the "true" pattern for all pitchers. I am not sure that the absolute best fit for a small group of pitchers should be applied to a much larger group. The 4th order polynomial changes directions a few times. That may not happen for the whole group of pitchers (6,690). The smaller group, the guys who had 10+ seasons with 150+ IP, had about 1,900 guys. I think it is more reasonable to assume a u-shaped function and try to make the best of it.

On HRs, I tried a regression with HRs, BBs, and SOs per 9 IP as the independent variables and ERA (not taken relative to the league average) as the dependent variable. The values for HRs, BBs, and SOs are more like what we would expect (HRs were 1.47). Then I got a predicted value for each pitcher and divided that by the league average. Then I ranked those pitchers. The problem is that almost all of the best 25 or so seasons then come after 1990. That is not the case in the method I described above.

There might be something strange going on because those pitchers came in a high HR and perhaps high strikeout era. Maybe that is why the value for relative HRs came out so low in the first place.

Friday, June 20, 2014
