Thursday, February 20, 2014

"Variation in payroll in baseball explains less than 20 percent of the variation in winning percentage"

That quote is from Dave Berri, professor of economics at Southern Utah University. He researches the economics of sports. See "Why Masahiro Tanaka’s Yankees Contract Is Good for Baseball" at Freakonomics, 2-4-2014.

Berri goes on to say:
"One reason why spending doesn’t match outcomes is that forecasting the future in baseball is difficult. We can look at the stats and know who was “good” or “bad” in the past, but the future – especially for pitchers – is hard to predict.  Consequently, it is hard for the richest teams to simply spend money and win."
He doesn't give any more details, like what years he looked at, whether he combined many years together in a regression analysis, etc.

I did a post in 2008 called "Another look at salaries and wins." I got payroll explaining close to 50% of the variation in wins by using each team's average wins per year over a long period and each team's average percentage above or below the league average in salary. By using the averages, I think much of the randomness from year to year is averaged out. Here is that post:

A lot of people have looked at this. But I started thinking about it again after I came across some data at JC Bradbury's site. You can view that data here. The data shows how many games, on average, each team won per year from 1986-2005. It also shows, in percentage terms, how far above or below the league average each team's total salary was. Again, these are yearly averages. Suppose a team was 10% above average one year and 30% above average another year. It would get a value of 20 (if it were just over two years).

What I did was to run a regression with average wins per year as the dependent variable and the average salary (SAL, the % above or below the league average) as the independent variable.

Here is the regression equation:

Wins = 0.157*SAL + 80.22

The r-squared was .489 and the standard error was 3.89 wins. The T-value for SAL was 5.17. The .157 means that if you spent 10% more on salaries than the average team, you win 1.57 more games than the average team. A zero for SAL would mean that a team spent the average amount on salaries. A negative number means the team spent below the average salary level. The table below summarizes each team.
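As a rough illustration of how to read the equation, here is a minimal Python sketch. The function names are my own, chosen only for illustration; the coefficients are the ones reported above. It also checks that the reported T-value is consistent with the r-squared, using the standard identity for a one-variable regression and assuming n = 30 teams.

```python
import math

SLOPE = 0.157      # wins per percentage point of payroll above average (from the post)
INTERCEPT = 80.22  # predicted wins for a team with an exactly average payroll (from the post)

def predicted_wins(sal):
    """Predicted average wins per year for a team whose payroll is
    `sal` percent above (positive) or below (negative) the league average."""
    return INTERCEPT + SLOPE * sal

def t_from_r_squared(r_squared, n):
    """Magnitude of the slope's t-statistic in a one-variable regression,
    recovered from r-squared and the number of observations n."""
    return math.sqrt(r_squared * (n - 2) / (1.0 - r_squared))

# A team spending 10% more than average is predicted to win 1.57 more games.
print(round(predicted_wins(10) - predicted_wins(0), 2))

# Consistency check: r-squared of .489 with an assumed 30 teams.
print(round(t_from_r_squared(0.489, 30), 2))
```

The second number comes out very close to the 5.17 reported above, which is what you would expect if the regression used one observation per team.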



Tampa Bay, for example, had a payroll that was, on average, 38.87% below the league average. They were predicted to win 74.12 games a year but actually averaged only 64.33 wins. If a team were to spend 100% more than average, it should win about 96 games a year (80.22 + 0.157*100 = 95.92). The Yankees had the highest payroll relative to the average. They spent about 70% more than the average team. They were predicted to win 91.26 games a year but actually won only 90.24.
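The arithmetic behind these examples can be reproduced with a short Python sketch. This is only a check on the numbers quoted in the text, not the original calculation; the helper names are hypothetical, and the only inputs are figures stated in this post.

```python
SLOPE = 0.157      # from the regression equation in the post
INTERCEPT = 80.22  # from the regression equation in the post

def predicted_wins(sal):
    """Predicted average wins per year for payroll `sal` percent above/below average."""
    return INTERCEPT + SLOPE * sal

def implied_sal(wins):
    """Invert the equation: the payroll percentage that yields a given prediction."""
    return (wins - INTERCEPT) / SLOPE

# Tampa Bay: payroll 38.87% below average, actual average of 64.33 wins.
tampa_pred = predicted_wins(-38.87)
tampa_miss = 64.33 - tampa_pred          # actual minus predicted

# Yankees: predicted 91.26 wins, actual 90.24; back out their payroll percentage.
yankees_sal = implied_sal(91.26)
yankees_miss = 90.24 - 91.26

# A hypothetical team spending 100% more than average.
double_pred = predicted_wins(100)

print(f"Tampa Bay predicted {tampa_pred:.2f}, missed by {tampa_miss:.2f}")
print(f"Yankees implied payroll {yankees_sal:.1f}% above average, missed by {yankees_miss:.2f}")
print(f"Team at +100% payroll predicted {double_pred:.2f}")
```

Tampa Bay falls short of its prediction by nearly 10 wins a year, while the Yankees miss theirs by about 1.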

I think the results are fairly strong. Sixteen of the 30 teams were predicted to within 3 wins. Only 3 were off by 6 or more wins. I think what I did differently than JC Bradbury was to use the average annual values for each team, instead of each team's data for each year. By using the averages, I think much of the randomness from year to year is averaged out. A team can sign a big free agent and maybe one year he does not do well. Or you get lucky and some young players who are not yet arbitration-eligible do very well. So by averaging, some of the good and bad luck gets flushed out.

The graph below also summarizes the results. You can see that the relationship is strong.