## Tuesday, May 24, 2016

### What Might Explain The Difference Between A Team's ERA And Their FIP ERA?

To see a good explanation of FIP (Fielding Independent Pitching), go to the Fangraphs glossary entry on FIP.

The idea is to estimate what a pitcher's ERA might be based only on his walks, strikeouts and HRs allowed. It was inspired by research by Voros McCracken, who found that pitchers might not have much control over what happens on balls in play (which exclude HRs). I thought that the FIP formula they give there was created by Tangotiger, but they don't mention him. Data used here come from Fangraphs and the Baseball Reference Play Index.
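The standard Fangraphs FIP formula can be written as a short function. This is a sketch; the constant is league- and year-dependent (typically near 3.10), and the inputs here are hypothetical:

```python
def fip(hr, bb, hbp, k, ip, fip_constant=3.10):
    """FIP per the Fangraphs formula: weights HRs, walks (plus HBP)
    and strikeouts, ignoring what happens on balls in play.
    fip_constant is set each year so league FIP matches league ERA."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + fip_constant

# Hypothetical pitcher line: 20 HR, 50 BB, 5 HBP, 200 K in 200 IP
print(fip(20, 50, 5, 200, 200.0))  # 3.225
```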

The first thing I did was to correlate (ERA - FIP) with a team's defensive efficiency rating (DER). That is the percentage of balls in play converted into outs. If a team is good at this, the number of runs it allows depends less on its walks, strikeouts and HRs. I looked at all teams from 2013-2015, so 90 teams.

The correlation was -.786. As DER rises, a team's actual ERA falls farther below the ERA predicted by its walks, strikeouts and HRs. It is probably not a big surprise to see a fairly high correlation here. Teams that have a high DER will allow fewer runs, so the actual ERA will be lower than for a team with a low DER that allows the same number of walks, strikeouts and HRs.
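The correlation itself is just a Pearson coefficient over the 90 team-seasons. A minimal sketch, with made-up numbers standing in for the actual team data:

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation: covariance divided by the product
    of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical values: DIFF = ERA - FIP, paired with each team's DER.
diff = [0.25, -0.10, 0.05, -0.30, 0.15]
der = [0.680, 0.705, 0.695, 0.715, 0.690]
print(pearson(diff, der))  # negative, as in the post
```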

Then I ran a regression where the ERA - FIP ERA differential (DIFF) depended on DER and the difference between the OPS allowed with runners on base and the OPS allowed with none on (OPSDIFF). Here is the regression equation

DIFF  = 2.17*OPSDIFF - 17.6*DER + 12.1

The r-squared is .709, meaning that 70.9% of the variation in the dependent variable (DIFF) is explained by the two independent variables. If we square the original correlation of -.786, we get .617. So there is some improvement in the model by adding the OPSDIFF variable. The standard error of the regression was .13.
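A two-variable OLS regression like this can be run with a least-squares solve. This sketch uses synthetic data generated from the equation above just to show the mechanics; the real fit used the 90 team-seasons:

```python
import numpy as np

def fit_ols(opsdiff, der, diff):
    """Fit DIFF = b1*OPSDIFF + b2*DER + intercept by least squares
    and return the coefficients and r-squared."""
    diff = np.asarray(diff, dtype=float)
    X = np.column_stack([opsdiff, der, np.ones(len(diff))])
    coefs, *_ = np.linalg.lstsq(X, diff, rcond=None)
    pred = X @ coefs
    ss_res = np.sum((diff - pred) ** 2)
    ss_tot = np.sum((diff - diff.mean()) ** 2)
    return coefs, 1 - ss_res / ss_tot

# Hypothetical inputs; DIFF generated exactly from the fitted equation,
# so the solver should recover 2.17, -17.6 and 12.1 with r-squared of 1.
x1 = [0.01, 0.03, 0.05, 0.02]
x2 = [0.68, 0.70, 0.69, 0.71]
y = [2.17 * a - 17.6 * b + 12.1 for a, b in zip(x1, x2)]
coefs, r2 = fit_ols(x1, x2, y)
print(coefs, r2)
```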

The t-values for the variables are

OPSDIFF) 5.26
DER) -13.59

So they are both significant.

I also calculated the impact of a one standard deviation (SD) change in each variable on DIFF

OPSDIFF) .07
DER) -.19

As OPSDIFF increases, it means that teams are getting hit harder with runners on than otherwise, so we expect the ERA - FIP difference to rise. A one SD increase in OPSDIFF raises DIFF .07 (here I am talking about a team getting 1 SD worse). A team will have a higher ERA than expected if it has bad timing and gets hit unusually hard with runners on base.

For DER, a one SD improvement leads to the actual ERA falling .19 farther below the ERA predicted by walks, strikeouts and HRs (here I am talking about a team getting 1 SD better).
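The one-SD impacts above are just each regression coefficient multiplied by that variable's standard deviation. A minimal sketch, with hypothetical DER values:

```python
from statistics import pstdev

def one_sd_impact(coef, values):
    """Effect on the dependent variable of a one-standard-deviation
    change in a predictor: coefficient times the predictor's SD."""
    return coef * pstdev(values)

# Hypothetical team DER values; the real calculation used the 90 teams.
der = [0.680, 0.705, 0.695, 0.715, 0.690]
print(one_sd_impact(-17.6, der))
```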

But there is still about 30% of the variation in DIFF that this model does not explain. I thought also taking into account OPS allowed with runners on and none on would make a bigger difference than it did.