Cybermetrics: November 2009

Thursday, November 26, 2009

Baseball's "300" Hitters

What players have had both 300+ times reaching base (RB) and 300+ total bases (TB) in the same season? Not many. RB includes hits, walks and HBP. To see a list of all such occurrences, go to 300RBTB. It is in chronological order.
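As a sketch of the test itself (my own illustration, not the method of any stat database), here is how RB and TB can be computed from a season's counting stats and checked against the 300/300 bar. TB is written as hits plus extra bases (a double adds 1, a triple 2, a homer 3). The stat line is hypothetical.

```python
def times_on_base(h: int, bb: int, hbp: int) -> int:
    """RB as defined in the post: hits + walks + hit by pitch."""
    return h + bb + hbp


def total_bases(h: int, doubles: int, triples: int, hr: int) -> int:
    """TB = 1B + 2*2B + 3*3B + 4*HR, written in terms of total hits."""
    # Every hit already counts one base, so extra-base hits add 1, 2 or 3 more.
    return h + doubles + 2 * triples + 3 * hr


def is_300_hitter(h, doubles, triples, hr, bb, hbp) -> bool:
    """True when a season has 300+ RB and 300+ TB."""
    return (times_on_base(h, bb, hbp) >= 300
            and total_bases(h, doubles, triples, hr) >= 300)


# Hypothetical stat line, not a real player's season.
line = dict(h=180, doubles=40, triples=5, hr=40, bb=110, hbp=10)
print(times_on_base(line["h"], line["bb"], line["hbp"]))                      # 300
print(total_bases(line["h"], line["doubles"], line["triples"], line["hr"]))  # 350
print(is_300_hitter(**line))                                                  # True
```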

When you get there, the list on the left has all the players who did it. The list on the right shows some near misses: players who had 280+ in each stat but did not reach 300 in both. The tables also show each player's offensive winning percentage and RCAA (runs created above average), which is park adjusted since it comes from the Lee Sinins Complete Baseball Encyclopedia. The most recent occurrence was Pujols in 2009, with 310 RB and 374 TB. The table below shows the leaders in such seasons.

The next table shows the very near misses, guys who had 297+ in both stats but not 300+ in both. Frank Thomas has one other very near miss. In 1995, when the season was only 144 or 145 games, he had 294 RB & 299 TB.

Now the breakdown by decade:

1890s 2
1910s 2
1920s 19
1930s 20
1940s 9
1950s 6
1960s 3
1970s 2
1980s 3
1990s 21
2000s 24
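The eleven decade counts listed sum to 111 seasons. A minimal sketch of how qualifying seasons can be bucketed into calendar decades like the ones above; the years passed in are hypothetical, just to show the grouping.

```python
from collections import Counter


def count_by_decade(years):
    """Group season years into calendar decades labelled like '1920s'."""
    return Counter(f"{(y // 10) * 10}s" for y in years)


# Hypothetical qualifying seasons, only to illustrate the bucketing.
print(count_by_decade([1894, 1921, 1927, 2001, 2009]))
```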

Sunday, November 22, 2009

Did The Yankees Buy A World Championship In 2009?

That was the subject of a recent Wall Street Journal article by economist Andrew Zimbalist titled The Yankees Didn't Buy the World Series. On the surface, it would seem that they did. They usually have the highest payroll, and in the offseason they signed three big free agents: first baseman Mark Teixeira and pitchers C. C. Sabathia and A. J. Burnett. Teixeira led the American League in home runs and runs batted in, while the two pitchers both finished in the top 20 in earned run average and the top 11 in innings pitched.

But Zimbalist said:

"It's a little surprising, but the statistical relationship between a team's winning percentage and its payroll is not very high. When I plot payroll and win percentage on the same graph, the two variables don't always move together. In other words, knowing a team's payroll does not enable one to know a team's win percentage.

More precisely, depending on the year, I find somewhere between 15% and 30% of the variance in team win percentage can be explained by the variance in team payroll. That means between 70% and 85% of a team's on-field success is explained by factors other than payroll. Those factors can include front office smarts, good team chemistry, player health, effective drafting and player development, intelligent trades, a manager's in-game decision-making, luck, and more."

Some readers, however, disagreed, making some good points in letters to the editor a few days later (see In Baseball’s World Series, Money Loads the Bases). The best point may have been made by Ira H. Malis, who noted that the top 4 American League teams in salaries make up, on average, 60% of the teams that make the playoffs.

Economist T. Norman Van Cott makes a good point in support of Zimbalist: long before the era of free agency, when players can sell their services to the highest bidder (within certain limits), the Yankees dominated baseball. But we need to recall that before 1965, when the amateur draft began, a player coming out of high school or college could sign a contract with any team (although the reserve clause then kept him tied to that team forever). The Yankees had a money advantage over the other teams in signing good players in the first place. They could offer bigger bonuses, plus the promise of frequent World Series checks and business connections in New York.

Zimbalist also analyzed the salary-win relationship only one year at a time. I did something different last year, using team averages over many years. I found a stronger relationship between salaries and wins than Zimbalist did: almost 50% of the variance in team win percentage can be explained by the variance in team payroll. Here is that post (Another look at salaries and wins).

A lot of people have looked at this, but I started thinking about it again after I came across some data at JC Bradbury's site. You can view that data here. The data shows how many games, on average, teams won each year from 1986-2005. It also shows, in percentage terms, how far above or below the league average in total salary each team was. Again, these are yearly averages. Suppose a team was 10% above average one year and 30% above average another year; its average would be 20% (if the data covered just those two years).
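A small sketch of the averaging described above. The exact formula behind the percentages in Bradbury's data is not given here, so `pct_vs_league` assumes the usual definition (team payroll divided by league-average payroll, minus 1, times 100); the payroll figures are hypothetical.

```python
def pct_vs_league(team_payroll: float, league_avg_payroll: float) -> float:
    """Percent above (+) or below (-) the league-average payroll (assumed definition)."""
    return 100.0 * (team_payroll / league_avg_payroll - 1.0)


def average_pct(yearly_pcts: list[float]) -> float:
    """Simple average of a team's yearly percentages: the SAL variable."""
    return sum(yearly_pcts) / len(yearly_pcts)


# The two-year example from the text: +10% one year, +30% the next.
print(average_pct([10.0, 30.0]))  # 20.0
# Hypothetical payrolls: $60M against a $50M league average (about +20%).
print(pct_vs_league(60.0, 50.0))
```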

I ran a regression with average wins per year as the dependent variable and the average salary (SAL, the % above or below the league average) as the independent variable.
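For readers who want to reproduce this kind of fit, here is a minimal ordinary-least-squares sketch for one independent variable. It returns the same kinds of numbers reported for the salary regression (slope, intercept, r-squared, standard error, t-value of the slope). It is not the software I used, and the data in the usage line are made up.

```python
import math


def simple_ols(x: list[float], y: list[float]):
    """Least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared, standard_error, t_slope).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and total sums of squares give r-squared.
    sse = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    se = math.sqrt(sse / (n - 2))             # standard error of the regression
    t_slope = slope / (se / math.sqrt(sxx))   # slope divided by its standard error
    return slope, intercept, r2, se, t_slope


# Made-up data: x could be SAL, y average wins.
print([round(v, 3) for v in simple_ols([0, 1, 2, 3], [1, 3, 2, 5])])
# [1.1, 1.1, 0.691, 1.162, 2.117]
```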

Here is the regression equation

Wins = 0.157*SAL + 80.22

The r-squared was .489 and the standard error was 3.89 wins. The t-value for SAL was 5.17. The .157 means that if you spent 10% more on salaries than the average team, you would win about 1.57 more games than the average team. A SAL of zero means a team spent the average amount on salaries; a negative number means the team spent below the average salary level. The table below summarizes each team.
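A quick check of these numbers. The sketch below plugs SAL values into the reported equation and recovers the t-value from r-squared using the standard one-variable identity t = sqrt(r² (n − 2) / (1 − r²)), assuming n = 30 teams. It gives about 5.18, in line with the reported 5.17 (the small gap is presumably rounding in r-squared).

```python
import math

SLOPE, INTERCEPT = 0.157, 80.22  # coefficients reported in the post


def predicted_wins(sal_pct: float) -> float:
    """Predicted average wins for a team SAL percent above (+) or below (-) average."""
    return SLOPE * sal_pct + INTERCEPT


def t_from_r2(r2: float, n: int) -> float:
    """t-value of the slope in a one-variable regression, from r-squared and n."""
    return math.sqrt(r2 * (n - 2) / (1.0 - r2))


print(round(predicted_wins(10.0) - predicted_wins(0.0), 2))  # 1.57
print(round(predicted_wins(-38.87), 2))                      # 74.12
print(round(t_from_r2(0.489, 30), 2))                        # 5.18
```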

Tampa Bay, for example, had an average payroll 38.87% below the league average. They were predicted to win 74.12 games a year but actually won only 64.33. If a team were to spend 100% more than average, it should win about 96 games a year (0.157 × 100 + 80.22 = 95.92). The Yankees had the highest payroll relative to the average, spending about 70% more than the average team. They were predicted to win 91.26 games a year but actually won only 90.24.

I think the results are fairly strong. 16 of the 30 teams were predicted within 3 wins, and only 3 were off by 6 or more wins. I think what I did differently from JC Bradbury was to use the average annual values for each team instead of each team's data for each year. Using the averages smooths out much of the year-to-year randomness. A team can sign a big free agent who has one bad year. Or it gets lucky and some young players who are not yet eligible for arbitration do very well. So by averaging, some of the good and bad luck gets flushed out.
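The accuracy counts above can be tallied with a few lines. This sketch only has the two teams whose numbers appear in the post (the full team table is not reproduced here), and it assumes "within 3" means an absolute miss of 3 wins or less.

```python
def residual_summary(teams: dict[str, tuple[float, float]],
                     close: float = 3.0, far: float = 6.0):
    """Given {team: (predicted, actual)}, return residuals and two counts.

    Residual = actual - predicted. Counts teams missed by `close` wins or
    less, and teams missed by `far` wins or more.
    """
    residuals = {t: actual - pred for t, (pred, actual) in teams.items()}
    n_close = sum(1 for r in residuals.values() if abs(r) <= close)
    n_far = sum(1 for r in residuals.values() if abs(r) >= far)
    return residuals, n_close, n_far


teams = {"Tampa Bay": (74.12, 64.33), "Yankees": (91.26, 90.24)}
res, n_close, n_far = residual_summary(teams)
print({t: round(r, 2) for t, r in res.items()})  # {'Tampa Bay': -9.79, 'Yankees': -1.02}
print(n_close, n_far)                            # 1 1
```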

The graph below also summarizes the results. You can see that the relationship is strong.

Tuesday, November 17, 2009

Age And Performance Of Outfielders And First Basemen Who Had Long Careers

I found all the players who were primarily first basemen and/or outfielders and had 15+ seasons with 400+ PAs, then found their average RCAA at every age from 21-39. RCAA is runs created above average. It comes from the Lee Sinins Complete Baseball Encyclopedia. Here is how he defines it: “It’s the difference between a player’s RC total and the total for an average player who used the same amount of his team’s outs. A negative RCAA indicates a below average player in this category.” It is also park adjusted.
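A minimal sketch of the age averaging, assuming the seasons have already been pulled for the qualifying players and come in as (player, age, rcaa) tuples. The tuple layout and the sample records are hypothetical; the post does not say how the data were stored.

```python
from collections import defaultdict


def rcaa_by_age(seasons, min_age=21, max_age=39):
    """Average RCAA and number of seasons at each age.

    `seasons` is an iterable of (player, age, rcaa) tuples for players who
    already meet the 15+ seasons of 400+ PA criterion (filtering not shown).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _player, age, rcaa in seasons:
        if min_age <= age <= max_age:
            totals[age] += rcaa
            counts[age] += 1
    return {age: (totals[age] / counts[age], counts[age]) for age in sorted(totals)}


# Hypothetical records; the age-40 season falls outside the 21-39 window.
sample = [("A", 25, 30), ("B", 25, 10), ("A", 26, 20), ("B", 40, 5)]
print(rcaa_by_age(sample))  # {25: (20.0, 2), 26: (20.0, 1)}
```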

The graph below shows the averages at each age. It surprised me to see that there is no peak but a plateau from 25-29. I sure don't know why it would be like that for this group.

The table below gives the average for each age as well as the number of players at each age. Seems like a pretty stable number of players from 24-36.

Sunday, November 15, 2009

Aging Patterns And Full-Time Players

In 2006 I wrote an article for Beyond the Boxscore called Player Aging Patterns Over Time. One thing I showed there was what percentage of the full-time players in each decade were a given age (I used 400+ PAs for full-time). Below is a graph of the distribution for 1991-2000. The trend line is the two-year moving average. Age 27 is usually the highest, with around 10% of the full-time players being that age.
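A minimal sketch of the distribution and its trend line. I assume the two-year moving average is the trailing kind a spreadsheet trend line uses, so each age is averaged with the age just before it; the ages passed in are hypothetical.

```python
from collections import Counter


def age_shares(ages):
    """Percent of full-time player-seasons at each age."""
    counts = Counter(ages)
    total = sum(counts.values())
    return {age: 100.0 * counts[age] / total for age in sorted(counts)}


def two_year_moving_average(shares):
    """Average each age's share with the previous age's share (trailing window of 2)."""
    return {a: (shares[a - 1] + shares[a]) / 2 for a in sorted(shares) if a - 1 in shares}


# Hypothetical ages of full-time players in one decade.
shares = age_shares([26, 26, 27, 27, 27, 28, 28, 28, 28, 28])
print(shares)                           # {26: 20.0, 27: 30.0, 28: 50.0}
print(two_year_moving_average(shares))  # {27: 25.0, 28: 40.0}
```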

Now for the 1961-1970 period. There was no particular reason I picked these decades; I just wanted to show a sample.

The next graph has all the decades starting with 1901-10.

The table below shows the average of each decade's percentage for each age. It is a simple average: for age 27, for example, I added its percentage from each decade and divided by 11, the number of decades I used (although 2001-05 was only a half decade).
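The simple average described above, as a short sketch. Every decade gets equal weight, including the half decade 2001-05. Treating a decade with no players at an age as 0% is my assumption, and the percentages in the example are hypothetical.

```python
def average_across_decades(decade_shares, age):
    """Unweighted mean of one age's percentage across decades.

    `decade_shares` maps a decade label to {age: percent}. A decade with no
    entry for `age` counts as 0% (an assumption, not stated in the post).
    """
    values = [shares.get(age, 0.0) for shares in decade_shares.values()]
    return sum(values) / len(values)


# Hypothetical percentages for age 27 in three of the decades.
example = {"1901-10": {27: 10.0}, "1911-20": {27: 9.0}, "2001-05": {27: 11.0}}
print(average_across_decades(example, 27))  # 10.0
```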

Tuesday, November 10, 2009

Should Andy Pettitte Make The Hall Of Fame?

This got discussed recently at Baseball Think Factory after Sean Forman wrote Pettitte Falls Short for the Hall of Fame for the NY Times. So here is my take on it.

I first looked at where he ranked all time in RSAA. That stat is from the Lee Sinins Complete Baseball Encyclopedia. It is "RSAA--Runs saved against average. It's the amount of runs that a pitcher saved vs. what an average pitcher would have allowed," including park adjustments. Pettitte now has 204 RSAA, which ranks him 77th all-time. That seems too low a rank to make the Hall. But he is 18th among lefties. Maybe left-handed pitchers have a tougher time than righties, so maybe the bar should be a little lower for them. It is not anyone's fault if they are left-handed; they could not simply have worked hard to become righties. Of course, it is also the case that lefties simply have less value since there are many more right-handed batters. And maybe the Hall has to recognize how much value a pitcher had. But I will continue to show where Pettitte ranks among lefties.

Next I found the RSAA per IP for all pitchers with 2000+ career IP. Pettitte had .0697 (or .63 runs per 9 IP). That was good enough for 58th overall and 13th among lefties. Then I found each pitcher's expected winning percentage using the Bill James Pythagorean formula, assuming a league average of 4.5 runs per game. Each pitcher was given a number of games equal to his IP/9, and that was multiplied by the expected winning percentage to get projected wins. From that I subtracted the number of wins a replacement pitcher would have, assuming a .400 winning percentage for the replacement. This process predicted that Pettitte would win 186.8 games while the replacement would win 130.06. That gives him 56.74 WARP, or wins above replacement pitcher. He ranks 87th in this WARP measure but is 20th among lefties.
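Here is a sketch of the WARP calculation. Two details are my assumptions rather than stated above: the Pythagorean exponent is 2, and the pitcher's runs allowed per 9 IP are the 4.5 league average minus his RSAA per 9 IP. With those choices, the figures quoted above come back out. The 2926.35 IP in the example is backed out of the 130.06 replacement wins (130.06 / .400 × 9), not taken from a stat source.

```python
LEAGUE_RPG = 4.5         # league runs per game assumed in the post
REPLACEMENT_PCT = 0.400  # replacement pitcher's winning percentage


def pythag_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Bill James Pythagorean winning percentage (exponent 2 assumed)."""
    return runs_scored ** exponent / (runs_scored ** exponent + runs_allowed ** exponent)


def warp(ip: float, rsaa_per_ip: float) -> float:
    """Projected wins over ip/9 games minus a .400 replacement's wins."""
    games = ip / 9.0
    runs_allowed = LEAGUE_RPG - 9.0 * rsaa_per_ip  # assumed runs allowed per 9 IP
    projected = games * pythag_pct(LEAGUE_RPG, runs_allowed)
    return projected - games * REPLACEMENT_PCT


print(round(warp(2926.35, 0.0697), 2))  # 56.74
```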

But runs saved is partly determined by the fielders, so I created a simple fielding-independent ERA. I looked at all pitchers with 2000+ career IP and used the following stats, all relative to the league average: ERA, HR, SO and BB. 100 is average, and a number over 100 means better than average. I ran a regression with ERA as the dependent variable and the others as the independent variables. Here is the equation:

ERA = 37.96 + .187*BB + .262*SO + .202*HR

Here are Pettitte's numbers:

BB 122
SO 103
HR 144

So, for example, he gave up 44% fewer HRs than the average pitcher (this comes from the Lee Sinins Complete Baseball Encyclopedia). Plugging these numbers into the equation, Pettitte gets 116.85, meaning his projected ERA based on fielding-independent stats is 16.85% better than the league average. His actual ERA is 17% better, so the projection comes very close to his actual ERA. Anyway, he ranks 69th overall but is 20th among lefties.
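Plugging Pettitte's indexes into the equation can be checked directly. This sketch only applies the reported coefficients (100 = league average, higher = better); it does not refit the regression.

```python
# Coefficients from the regression reported above.
INTERCEPT = 37.96
COEF = {"BB": 0.187, "SO": 0.262, "HR": 0.202}


def predicted_era_index(bb: float, so: float, hr: float) -> float:
    """Fielding-independent ERA index from BB, SO and HR indexes."""
    return INTERCEPT + COEF["BB"] * bb + COEF["SO"] * so + COEF["HR"] * hr


print(round(predicted_era_index(bb=122, so=103, hr=144), 2))  # 116.85
```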

I also computed a WARP using this predicted ERA in the manner described above. With 57.62, Pettitte ranks 89th overall and 21st among lefties.

The biggest thing in his favor is ranking 13th among lefties in RSAA/IP. But some of his other ranks are pretty low. I think the Hall of Fame has about 219 players, of whom 71 (32.4%) are pitchers. If a team has 25 players and pitchers are 40% of the team, then the Hall should have about 87 pitchers (40% of 219). But if all the position players are deserving (not likely, but I will play along anyway), then about 38 more pitchers need to be in, for 109 in all (109/257 is about .4). Only 15 of the pitchers in the Hall are lefties, and Pettitte does rank fairly high among lefties. If there should be 109 pitchers, he seems to be in the top 109 all-time. Even if there should be 87, his worst rank that I found is close to that. Of course, all of this assumes that there are no undeserving players or pitchers in right now.

A couple of other things. I thought maybe Pettitte got an advantage from pitching at Yankee Stadium beyond the normal park adjustments, since he is a lefty and might face a lot of righties there, and they have a harder time hitting HRs in that park. But from Retrosheet, his HR% (based on batters faced) at home was 1.94%. On the road it was 2.01%. That does not seem too out of the ordinary. But his HR% (based on ABs) vs. righties has been 2.18%, while vs. lefties it has been 2.18% as well. It seems like it should be higher against righties, because over the last three years in MLB left-handed pitchers have allowed a HR% about .5 percentage points higher against right-handed batters. That points to him getting an advantage from Yankee Stadium, but then his home HR% does not seem to give him much of an edge. So I don't know what to conclude from that.

I also once created what I called the Pitcher’s Homerun/Walk Rating. It combined a pitcher's ability to prevent both HRs and BBs into one index rating. Pettitte was 23rd among pitchers with 2000+ IP from 1920-2006. Now it looks like he has slipped to 32nd. But that is out of 277 pitchers. Pretty darn good.

Wednesday, November 4, 2009

Starting Pitchers As Relievers Over Time

Many fans know that starters were often also used as relievers in the past. Lefty Grove, for example, only started 30 games the year he won 31 games (in 1931). He came in 11 times as a reliever. In 1930, he won 28 games while starting 32 and coming in to relieve 18 times.

On May 23, 1911, Christy Mathewson pitched a complete game victory giving up only 1 earned run. Then on May 26, he pitched the last 1 and 2/3 innings to get a win. When he came in during the 8th, the Phillies had two men on and had just scored 2 runs to tie the game. Then he got a double play. The Giants scored 2 in the bottom of the 8th and Mathewson pitched the 9th for the win, giving up no hits. The next day he pitched a complete game shutout.

But how often did starters pitch in relief in the past, and how has this changed over time? I looked at the percentage of games pitched in relief by starters in each decade starting with 1900-09. In each decade I found this % for the season leaders in games started, taking 3 leaders per team per year; I figured that each team would have at least 3 guys who started fairly often. I also looked at the % for all pitchers who started at least 31 games (at least 33 beginning in 1960). The table below shows these percentages:

The first column shows the % of games pitched in relief by the leaders in starts. For the 1920s, for example, that would be 480 pitcher-seasons: the top 48 in games started in each season (16 teams × 3) over 10 seasons. So in that group, 19.5% of their games were in relief. The next column shows the % of games pitched in relief by pitchers who started at least 31 games (through the 1950s) or 33 games (since the 1960s). The trends are pretty clear.
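A sketch of how both columns could be computed from pitcher-season records. The dictionary layout (`year`, `g`, `gs`) and the `teams_by_year` map are hypothetical, and I assume the percentage is pooled (total relief games divided by total games for the whole group) rather than an average of each pitcher's own percentage. The example uses Lefty Grove's 1930 and 1931 seasons as described at the top of this post.

```python
from collections import defaultdict


def relief_pct(seasons):
    """Pooled percent of games pitched in relief: sum(G - GS) / sum(G)."""
    games = sum(s["g"] for s in seasons)
    starts = sum(s["gs"] for s in seasons)
    return 100.0 * (games - starts) / games


def start_leaders(seasons, teams_by_year):
    """Top 3 * (number of teams) pitchers in games started, for each year.

    Ties at the cutoff are broken arbitrarily by the sort.
    """
    by_year = defaultdict(list)
    for s in seasons:
        by_year[s["year"]].append(s)
    leaders = []
    for year, group in by_year.items():
        group.sort(key=lambda s: s["gs"], reverse=True)
        leaders.extend(group[: 3 * teams_by_year[year]])
    return leaders


def heavy_starters(seasons):
    """Pitchers with 31+ starts before 1960, or 33+ starts from 1960 on."""
    return [s for s in seasons if s["gs"] >= (31 if s["year"] < 1960 else 33)]


# Lefty Grove: 32 starts + 18 relief games in 1930, 30 starts + 11 in 1931.
grove = [{"year": 1930, "g": 50, "gs": 32}, {"year": 1931, "g": 41, "gs": 30}]
print(round(relief_pct(grove), 1))  # 31.9
print(len(heavy_starters(grove)))   # 1
```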

The graph below shows the percentages over time.