Did Roger Clemens Have the Best Age-Adjusted Season Ever in
2005? Part 2.
I looked at this question earlier this week. The stat I used
to measure pitcher performance was RSAA, which tells us how many runs a pitcher
saves above average (it is also park adjusted). But it is affected by the
quality of the fielders. Now I will look at the issue by trying to create a
defense independent stat to rate the pitchers, following the idea of Voros
McCracken, who found that pitchers have little influence over what happens on
balls in play. The measure I will use won’t be as sophisticated as the one he
uses, but it will give us some idea of which pitchers did better than we might
have expected based on their age while removing the influence of the fielders.
First, I found all pitcher seasons with 150 or more IP from
1920-2005. Then I ran a regression with each pitcher’s ERA relative to the
league average as the dependent variable and strikeouts, HRs and walks (all
relative to the league average) as the independent variables (data is from
the Lee Sinins “Complete Baseball
Encyclopedia”). Here is the regression equation:
(1) Rel ERA = .658 + .279*BB + .242*HR - .201*SO
Again, these are all relative to the league average. Each
pitcher’s stats can be plugged into this formula to get their defense
independent relative ERA (the correct term might be fielding independent, but
I’m not sure). In any case, this is their expected relative ERA based only on
stats that they are responsible for. Once I had this for all pitchers in the study,
I cut down the total to only pitchers who had at least 10 seasons with 150+ IP.
Then I found the relationship between this predicted relative ERA (or defense
independent ERA or DIPS ERA) and age. The graph below shows the relationship.
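As a minimal sketch of how equation (1) gets applied, the Python function below plugs league-relative walk, home run and strikeout rates into the formula. The function and argument names are illustrative assumptions, and a value of 1.0 means exactly league average.

```python
def dips_relative_era(rel_bb, rel_hr, rel_so):
    """Expected relative ERA from equation (1).

    Each argument is the pitcher's rate divided by the league rate,
    so 1.0 means league average.
    """
    return 0.658 + 0.279 * rel_bb + 0.242 * rel_hr - 0.201 * rel_so


# Sanity check: a pitcher who is league average in all three stats
# should come out close to a relative ERA of 1.
average_pitcher = dips_relative_era(1.0, 1.0, 1.0)
print(round(average_pitcher, 3))  # 0.978
```

More strikeouts lower the estimate, while more walks and home runs raise it.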
This graph covers the ages 22-44 for pitchers who had at
least 10 seasons with 150+ IP. There were a few ages younger and older than
this, but not many. The relationship was actually better when only these ages
were covered. The numbers in the graph are the average DIPS ERA for each age in
this subgroup of pitchers.
I could have just used the average DIPS ERA at each age for
all the pitchers. But that would be misleading, because a pitcher probably has
to be pretty good to be used at a very young or very old age. The average at
very old ages can look pretty good simply because only good pitchers are still
around. By looking only at pitchers who had at least 10 seasons with 150+ IP,
we get a more realistic aging pattern. Everyone in this group was good enough
to pitch for a long time, so we don’t have to worry that the old guys are a
specially selected sample.
The equation which shows the relationship between DIPS ERA and
AGE is
(2) DIPS ERA = .00052*AGESQUARED - .0322*AGE + 1.3755
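Because equation (2) is a quadratic in AGE, the age at which expected DIPS ERA bottoms out can be read off directly. The sketch below uses the coefficients as printed, so the result is approximate; the function name is an illustrative assumption.

```python
def aging_curve(age, intercept=1.3755):
    """Equation (2): expected DIPS ERA at a given age (lower is better)."""
    return 0.00052 * age ** 2 - 0.0322 * age + intercept


# The vertex of a*x^2 + b*x + c sits at x = -b / (2a).
# Here a = 0.00052 and b = -0.0322.
best_age = 0.0322 / (2 * 0.00052)
print(round(best_age, 2))  # 30.96
```

On either side of that age the curve rises, which is the U shape assumed for the aging pattern.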
But this was for the pitchers who had at least 10 seasons
with 150+ IP. For all the pitchers, I assumed they had the same aging pattern,
but I moved the intercept up to take into account the inferior quality as
compared to the smaller group. The shift in the intercept was equal to the
difference in average DIPS ERA between the whole group of pitchers (.9466) and
the smaller group (.9011). That was .0455. That got added to 1.3755, making the
intercept 1.421. So the equation to predict DIPS ERA becomes
(3) DIPS ERA = .00052*AGESQUARED - .0322*AGE + 1.421
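The intercept shift can be checked with a few lines of Python. This is only a sketch using the averages and coefficients quoted in the text; the constant and function names are illustrative assumptions.

```python
ALL_PITCHERS_AVG = 0.9466  # average DIPS ERA, whole group of pitchers
LONG_CAREER_AVG = 0.9011   # average DIPS ERA, 10+ seasons with 150+ IP

# Move the curve up by the gap between the two group averages.
shift = ALL_PITCHERS_AVG - LONG_CAREER_AVG
intercept_all = 1.3755 + shift


def expected_dips_era(age):
    """Equation (3): same shape as equation (2), with the shifted intercept."""
    return 0.00052 * age ** 2 - 0.0322 * age + intercept_all


print(round(shift, 4), round(intercept_all, 4))  # 0.0455 1.421
```

With the coefficients exactly as printed, an age of 34 gives about .927 from this function rather than .9042. Rounding in the printed coefficients is one possible reason, so small gaps against the quoted figures are to be expected.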
Now each pitcher’s age was plugged into equation (3) to get
a predicted DIPS ERA. Then that got compared to the value from equation (1).
Let’s take Dazzy Vance from 1925 as an example (age 34). His predicted DIPS ERA
would be .9042 based on that age (plugging an AGE of 34 into equation (3)).
But his DIPS ERA from equation (1), based on his relative strikeouts, HRs and
walks was .4546. So he was .4496 better than his age predicted.
The next step was to see how many runs he saved as a result
of this. He pitched 265.33 innings. That is the equivalent of 29.5 complete
games (265.33/9). The league ERA that year was 4.27, so his age predicted ERA
would be 3.86 (.9042*4.27). But his DIPS ERA from equation (1), converted to
runs, was 1.94 (.4546*4.27). That difference is 1.92 (3.86 – 1.94).
That gets multiplied by 29.5 to get 56.6 runs saved. This was the most runs
saved once AGE was taken into account and I considered only the pitcher’s stats.
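The Dazzy Vance arithmetic can be reproduced with a short Python sketch. The function and argument names are illustrative assumptions; the inputs are the figures quoted above (265.33 IP, league ERA of 4.27, age-predicted relative ERA of .9042 and DIPS ERA of .4546).

```python
def age_adjusted_runs_saved(innings, league_era, age_predicted_rel, dips_rel):
    """Runs saved compared with what the pitcher's age predicted.

    Both relative ERAs are scaled by the league ERA, and the gap is
    multiplied by the number of nine-inning games pitched.
    """
    era_gap = (age_predicted_rel - dips_rel) * league_era
    return era_gap * innings / 9.0


vance_1925 = age_adjusted_runs_saved(265.33, 4.27, 0.9042, 0.4546)
print(round(vance_1925, 1))  # 56.6
```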
Here are the top 25 age adjusted seasons in runs saved based solely on the
pitcher’s stats.
| Pitcher | Year | Age | Runs Saved |
|---|---|---|---|
| Dazzy Vance | 1925 | 34 | 56.5956533 |
| Pedro Martinez | 1999 | 27 | 56.50779595 |
| Lefty Grove | 1930 | 30 | 55.3703178 |
| Bob Feller | 1940 | 21 | 49.41058275 |
| Dazzy Vance | 1924 | 33 | 49.24335387 |
| Pedro Martinez | 2000 | 28 | 46.46541235 |
| Dazzy Vance | 1928 | 37 | 43.97893864 |
| Bob Feller | 1939 | 20 | 43.40136659 |
| Dazzy Vance | 1930 | 39 | 43.13738011 |
| Roger Clemens | 1997 | 34 | 41.05611036 |
| Bert Blyleven | 1973 | 22 | 40.72525954 |
| Dolf Luque | 1923 | 32 | 40.26181751 |
| Johnny Allen | 1936 | 30 | 40.07463135 |
| Randy Johnson | 1995 | 31 | 38.66133362 |
| Lefty Grove | 1927 | 27 | 37.97919687 |
| Pete Donohue | 1925 | 24 | 37.79832039 |
| Lefty Gomez | 1937 | 28 | 37.72100203 |
| Randy Johnson | 2004 | 40 | 36.97886686 |
| Kevin Brown | 1998 | 33 | 36.85872621 |
| Lefty Grove | 1926 | 26 | 36.54630565 |
| Lefty Grove | 1931 | 31 | 36.33560254 |
| Lefty Grove | 1929 | 29 | 36.208095 |
| Lefty Grove | 1932 | 32 | 36.1143788 |
| Cy Blanton | 1935 | 26 | 36.10329252 |
| Bob Feller | 1946 | 27 | 36.05170998 |

These stats are not park adjusted. That is why Pete Donohue
is up there. He pitched in a low-run park. Clemens of 2005 is not here. He
would only be 215th.
I used one other method last week to find the best age
adjusted seasons. I subtracted the normal peak age from each guy’s age and took
the absolute value. That got multiplied by the number of runs saved, which was
not age adjusted as in the above explanation; it was simply (IP/9)*(the
difference between league ERA and the ERA predicted by equation (1)). The peak
age was 29.89. I found that by taking the average age for the top 250 seasons
in predicted relative ERA (using equation (1)). For example, Dazzy Vance in
1930 was 39. That minus 29.89 is 9.11. His relative ERA predicted by
equation (1) was .6237. That times the league ERA of 4.97 is
3.10. The difference from the league ERA is 1.87. He pitched 258.667 innings.
That divided by 9 is 28.74. That times the 1.87 difference is 53.75 runs saved.
That gets multiplied by 9.11, which gave him 489.67 “age points.” The top 25 in “age
points” are listed below.
| Pitcher | Year | Age | Runs Saved | Age Points |
|---|---|---|---|---|
| Dazzy Vance | 1930 | 39 | 53.75051586 | 489.6671995 |
| Bob Feller | 1940 | 21 | 54.8268672 | 487.4108494 |
| Bob Feller | 1939 | 20 | 46.90794659 | 463.9195918 |
| Randy Johnson | 2004 | 40 | 44.86122903 | 453.5470255 |
| Dazzy Vance | 1928 | 37 | 54.65467339 | 388.5947278 |
| Bert Blyleven | 1973 | 22 | 46.98793732 | 370.7348255 |
| Dwight Gooden | 1985 | 20 | 37.02470115 | 366.1742944 |
| Dwight Gooden | 1984 | 19 | 32.87340375 | 357.9913668 |
| Jack Quinn | 1928 | 44 | 24.19020867 | 341.3238443 |
| Waite Hoyt | 1921 | 21 | 37.74713278 | 335.5720104 |
| Randy Johnson | 2001 | 37 | 44.7325033 | 318.0480985 |
| Vida Blue | 1971 | 21 | 34.27435979 | 304.6990586 |
| Babe Adams | 1922 | 40 | 29.91900156 | 302.4811057 |
| Dazzy Vance | 1929 | 38 | 37.20115978 | 301.7014058 |
| Roger Clemens | 2005 | 42 | 24.7192402 | 299.3499988 |
| Lefty Grove | 1937 | 37 | 42.06229935 | 299.0629484 |
| Nolan Ryan | 1989 | 42 | 24.30933947 | 294.3861009 |
| Bob Feller | 1938 | 19 | 26.98090041 | 293.8220054 |
| Dazzy Vance | 1925 | 34 | 68.65550985 | 282.1741455 |
| Frank Tanana | 1975 | 21 | 31.45033843 | 279.5935086 |
| Pete Donohue | 1925 | 24 | 46.90945662 | 276.2966995 |
| Lefty Gomez | 1931 | 22 | 34.76705924 | 274.3120974 |
| Dutch Leonard | 1949 | 40 | 27.05088775 | 273.4844751 |
| Randy Johnson | 2000 | 36 | 44.15458989 | 269.7845442 |
| Mark Prior | 2003 | 22 | 34.1780478 | 269.6647972 |

Clemens makes this list. Notice that there are some
relatively young pitchers here. By using absolute value, the farther a pitcher
is from the “peak age,” the more points he would get. So this puts a guy 10
years over peak on the same footing as a guy who is 10 years under.
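The “age points” calculation described above can be sketched in Python as follows. The function names are illustrative assumptions; the peak age of 29.89 and the Dazzy Vance 1930 inputs (age 39, 258.667 IP, league ERA of 4.97, relative ERA of .6237) are the figures from the text.

```python
PEAK_AGE = 29.89  # average age of the top 250 seasons, as stated in the text


def runs_saved(innings, league_era, dips_rel):
    """Unadjusted runs saved: (IP/9) * (league ERA - DIPS ERA in runs)."""
    return innings / 9.0 * (league_era - dips_rel * league_era)


def age_points(age, innings, league_era, dips_rel):
    """Runs saved weighted by the distance in years from the peak age."""
    return abs(age - PEAK_AGE) * runs_saved(innings, league_era, dips_rel)


vance_1930 = age_points(39, 258.667, 4.97, 0.6237)
print(round(vance_1930, 1))  # 489.7
```

Because of the absolute value, a pitcher 10 years under the peak age gets the same multiplier as one 10 years over it.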
Here is my response to a comment:
I tried a 4th order polynomial, and certainly it fits the data better.
But that is a select group of pitchers. I am using it to try to estimate
the "true" pattern for all pitchers. I am not sure that finding the
absolute best fit for a small group of pitchers should be applied to a
much larger group. The 4th order polynomial changes directions a few
times. That may not happen for the whole group of pitchers (6,690). The
smaller group, the guys who had 10+ seasons with 150+ IP, had about
1,900 guys. I think it is more reasonable to assume a U-shaped function
and try to make the best of it.
On HRs, I tried a regression with HRs, BBs, and SOs per 9 IP as the independent variables and ERA (not taken relative to the league average) as the dependent variable. The coefficients for HRs, BBs, and SOs are more like what we would expect (the one for HRs was 1.47). Then I got a predicted value for each pitcher and divided that by the league average. Then I ranked the pitchers. The problem is that almost all of the best 25 or so seasons then come after 1990. That is not the case in the method I described above.
There might be something strange going on because those pitchers came in a high-HR and perhaps high-strikeout era. Maybe that is why the coefficient for relative HRs came out so low in the first place.