Cybermetrics: May 2008

Saturday, May 31, 2008

Which teams have the best OPS differentials so far in 2008?

Since the season is about one-third over (teams have played 54 games or close to that number), it might be a good time to look at this.

In earlier research, I came up with the following equation to explain a team's winning percentage:

PCT = 1.21*OPSDIFF + .5

Where OPSDIFF is OPS differential, a team's hitting OPS minus the OPS its pitchers allow. OPS is on-base percentage plus slugging percentage. You can read that earlier study here.
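As a rough illustration of how the equation is applied, here is a minimal Python sketch. The function names are my own for illustration, and the 54-game figure is just the approximate number of games played so far; the -.030 differential is the Angels' figure discussed in this post.

```python
def predicted_pct(ops_diff):
    """Winning percentage from the equation PCT = 1.21*OPSDIFF + .5."""
    return 1.21 * ops_diff + 0.5


def predicted_wins(ops_diff, games):
    """Predicted wins over a given number of games played."""
    return predicted_pct(ops_diff) * games


# Angels: OPS differential of -.030, assuming roughly 54 games played
angels_pct = predicted_pct(-0.030)
angels_wins = predicted_wins(-0.030, 54)
print(round(angels_pct, 3), round(angels_wins, 1))
```

A team with a zero differential comes out at exactly .500, and each .010 of OPS differential is worth about .012 in winning percentage.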

In the tables below, teams in each league are ranked by OPS differential. The next column shows each team's actual winning percentage, followed by the percentage predicted by the equation. Then come actual wins, predicted wins and the difference.

The Angels have a -.030 OPSDIFF but have a .571 winning pct. Maybe they have been lucky so far this year. But check out the Astros. They have a -.060 differential yet have a winning record! Then there are the Twins, who have a winning record with a -.069 OPSDIFF. So far, the Cubs and Red Sox are the strongest teams in their respective leagues. The Braves are very strong, too, but their record so far does not show it.

Sunday, May 25, 2008

Another look at salaries and wins

A lot of people have looked at this. But I started thinking about it again after I came across some data at JC Bradbury's site. You can view that data here. The data shows how many games, on average, teams won each year from 1986-2005. It also shows, in percentage terms, how far above or below the league average each team's total salary was. Again, these are yearly averages. If a team was 10% above average one year and 30% above average another year, it would get a 20 (if it were just over two years).

What I did was to run a regression with average wins per year as the dependent variable and the average salary (SAL, the % above or below the league average) as the independent variable.

Here is the regression equation:

Wins = 0.157*SAL + 80.22

The r-squared was .489 and the standard error was 3.89 wins. The T-value for SAL was 5.17. The .157 means that if you spent 10% more on salaries than the average team, you win 1.57 more games than the average team. A zero for SAL would mean that a team spent the average amount on salaries. A negative number means the team spent below the average salary level. The table below summarizes each team.

Tampa Bay, for example, on average, had a payroll that was 38.87% below the league average. They were predicted to win 74.12 games a year but actually averaged only 64.33 wins. If a team were to spend 100% more than average, it should win about 96-97 games a year. The Yankees had the highest payroll above average. They spent about 70% more than the average team. They were predicted to win 91.26 games a year but actually only won 90.24.
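To make the arithmetic concrete, here is a small Python sketch that plugs a salary figure into the regression equation. The function and constant names are mine, and the coefficients are the rounded ones shown above, so results may differ slightly from figures computed with the unrounded regression output.

```python
SLOPE = 0.157       # extra wins per percentage point of salary above average
INTERCEPT = 80.22   # predicted wins for a team with an average payroll


def wins_from_salary(sal_pct):
    """Predicted average wins per year, given SAL (% above or below average)."""
    return SLOPE * sal_pct + INTERCEPT


# Tampa Bay: payroll 38.87% below the league average
print(round(wins_from_salary(-38.87), 2))

# A team spending 10% more than average gains 1.57 wins over the intercept
print(round(wins_from_salary(10) - wins_from_salary(0), 2))
```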

I think the results are fairly strong. Sixteen of the 30 teams were predicted to within 3 wins. Only 3 were off by 6 or more wins. I think what I did differently than JC Bradbury was to use the average annual values for each team, instead of each team's data for each year. By using the averages, I think much of the randomness from year to year is eliminated. A team can sign a big free agent and maybe one year he does not do well. Or you get lucky and some non-arbitration eligible young players do very well. So by averaging, some of the good and bad luck gets flushed out.

The graph below also summarizes the results. You can see that the relationship is strong.

Monday, May 19, 2008

Can Brandon Webb win 25 games this year?

Since he already has 9 wins and might get another 26-27 starts, it seems possible. But since 1980, only one pitcher has won 25 or more games, Bob Welch, who won 27 in 1990.

I looked at all the seasons since 1946 with 25+ wins. Those pitchers averaged 38.5 starts per season and 308 IP. Webb's career high in games started is 35 and for IP it is 236.33. Mel Parnell, in 1949, had the lowest number of starts for a 25 win pitcher (33). But Parnell won one game in relief and had 295 IP.

The pitchers who had 35 or fewer starts (Webb's career high) that won 25+ games averaged 274 IP, well above Webb's career high of 236. This group of pitchers also averaged 37 games pitched, so they had 2.5 relief appearances on average, which could help them win an extra game or two (that group of 6 pitchers averaged 34.5 starts). Webb has only pitched 1 game in relief in his entire career.

The lowest number of IP for a 25 win pitcher since 1946 was Welch's 238 in 1990, just a bit more than Webb's career high. If Webb were to make 26 more starts, he would have to win 16 of those games, or 61.5% of his starts. In his career, he has won 43% of his starts and last year he won 52.9%. He has been allowing 2.98 runs per 9 IP and the Diamondbacks are scoring 5.4 runs per game. Using the Bill James Pythagorean formula, that works out to a winning percentage of .767.

If he were to get the decision in 82.3% of his starts (his % from last year) the rest of the way (26 starts) and if he had a .767 winning percentage in those games, he wins 16.4 games. Added to the 9 he already has, he gets to 25. So he needs to keep pitching as well as he has this year and the Diamondbacks need to keep scoring 5.4 runs per game.
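The calculation above can be sketched in a few lines of Python. The function names are my own, and I am assuming the classic form of the Pythagorean formula with an exponent of 2, which is the version that reproduces the .767 figure.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James Pythagorean winning percentage (exponent 2 assumed)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(starts, decision_rate, win_pct):
    """Expected wins = starts * share of starts with a decision * win pct."""
    return starts * decision_rate * win_pct


# Diamondbacks score 5.4 runs per game; Webb allows 2.98 runs per 9 IP
webb_pct = pythagorean_pct(5.4, 2.98)

# 26 remaining starts, with a decision in 82.3% of them
extra_wins = expected_wins(26, 0.823, webb_pct)

print(round(webb_pct, 3), round(extra_wins, 1), round(9 + extra_wins, 1))
```

Lowering either the run support or the decision rate a little drops the projected total below 25, which shows how much has to go right.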

Tuesday, May 13, 2008

Berkman is on a hot streak, but Cecil Cooper had a few good ones, too

Berkman, of course, is on a great streak. He is batting .641 in May with a 1.205 SLG. You can click here to see his May stats. Astros manager Cecil Cooper, who was a pretty good major league hitter himself (he batted .352 one year), said:

"I was never in a streak like that. Never, ever."

He is either forgetting or being a little modest. He went 23 for 41 (.561 AVG) in a 10 game stretch in Aug 1980. Of course that is short of Berkman's 25 for 39 (.641) and Cooper had "only" 3 2Bs and 2 HRs. But that is still pretty darn good.

Through the first 11 games in 1979, Cooper had a .465 AVG and a .930 SLG over 43 ABs.

He also had 18 hits in 32 ABs over a 7 game stretch in Aug 1981, for a .563 AVG, with 29 TBs (a .906 SLG).
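For anyone who wants to check streak numbers like these, here is a minimal Python sketch of the two rate stats involved. The helper names are mine; the hit, at-bat and total-base counts are the ones quoted in this post.

```python
def batting_avg(hits, at_bats):
    """AVG = hits divided by at-bats."""
    return hits / at_bats


def slugging(total_bases, at_bats):
    """SLG = total bases divided by at-bats."""
    return total_bases / at_bats


# Berkman in May: 25 for 39
print(round(batting_avg(25, 39), 3))

# Cooper, Aug 1980: 23 for 41
print(round(batting_avg(23, 41), 3))

# Cooper, Aug 1981: 29 total bases in 32 ABs
print(round(slugging(29, 32), 3))
```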

I got all these stats using Retrosheet.
