## Friday, March 24, 2023

### Log5 vs. Strat-O-Matic

See the Wikipedia article on Log5 for background. In this post I compare the batting average we expect a hitter to get against a pitcher in the tabletop baseball game Strat-O-Matic with what Log5 predicts.

Log5 is a formula that estimates the winning pct one team will have against another. For example, if team A has a .700 winning pct while team B has a .600 pct (and they played in the same league and each played a large number of games), what winning pct would team A have against team B?

It can also be used to estimate what batting average a player will have against a pitcher if we know the hitter's average, the average the pitcher allowed and the league average. That formula is

$$P_{B,P} = \frac{\dfrac{P_B \, P_P}{P_L}}{\dfrac{P_B \, P_P}{P_L} + \dfrac{(1 - P_B)(1 - P_P)}{1 - P_L}}$$

where $P_B$ is the hitter's average, $P_P$ is the average allowed by the pitcher and $P_L$ is the league average. $P_{B,P}$ is the probability (or average) that the player will get a hit against the pitcher.
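The batter-versus-pitcher version of Log5 is easy to check numerically. Below is a minimal Python sketch; the function name `log5_batting_average` and its argument names are my own labels for illustration, not part of any library.

```python
def log5_batting_average(p_b: float, p_p: float, p_l: float) -> float:
    """Log5 estimate of a hitter's average against one pitcher.

    p_b: the hitter's batting average
    p_p: the batting average allowed by the pitcher
    p_l: the league batting average
    """
    # Relative weight of a hit, scaled by the league hit rate
    hit = p_b * p_p / p_l
    # Relative weight of an out, scaled by the league out rate
    out = (1 - p_b) * (1 - p_p) / (1 - p_l)
    return hit / (hit + out)


# A .250 hitter facing a pitcher who allowed .200 in a .270 league
print(round(log5_batting_average(0.250, 0.200, 0.270), 3))  # 0.184
```

A useful sanity check is that a hitter facing a league-average pitcher gets back his own average, since the league terms cancel.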

The Strat-O-Matic (SM) game works by rolling dice. Half the time you have to look at the batter's card and half the time the pitcher's card (determined by the dice). The hitter and pitcher cards have rows and columns, and the dice tell you where to look for the result.

Suppose the league average is .270. A .300 hitter is facing a pitcher who allowed a .300 average. The hitter has to have an average higher than .300 on his card, since .300 is how he does against the whole league and the average pitcher's card yields only .270. This pitcher is worse than average, so the hitter should do even better than he normally does.

As far as I can tell, what SM does is give the batter's card a .330 average. Since he was 30 points better than the league average, we add that to .300 to get .330. So that means that if he faced a league average pitcher all the time (one who allows a .270 average), he would end up batting .300 since half the time he would hit .330 and the other half .270 (since the game is set up to be on the hitter's card half the time and the pitcher's card half the time). The pitcher who allows a .300 average would have a .330 card. So this hitter would hit .330 against this pitcher.
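The card arithmetic described above can be sketched in a few lines of Python. This encodes only my reading of how the cards are built (it is an assumption, not a confirmed description of how the game company rates players), and the function names are hypothetical.

```python
def strato_card_average(avg: float, league_avg: float) -> float:
    """Average printed on a player's card under the assumed method.

    The card is pushed away from the league average by the same amount
    the player differed from it, so that a 50/50 mix with a
    league-average card reproduces the player's real average.
    """
    return avg + (avg - league_avg)


def strato_matchup_average(batter_avg: float, pitcher_avg: float,
                           league_avg: float) -> float:
    """Expected average when half the rolls land on each card."""
    batter_card = strato_card_average(batter_avg, league_avg)
    pitcher_card = strato_card_average(pitcher_avg, league_avg)
    return (batter_card + pitcher_card) / 2


# .300 hitter vs. a pitcher who allowed .300, in a .270 league
print(round(strato_matchup_average(0.300, 0.300, 0.270), 3))  # 0.33
```

Algebraically this collapses to hitter's average plus pitcher's average minus league average, which is why the method is purely additive while Log5 is not.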

I don't know if either Log5 or Strat-O-Matic takes into account that hitters don't face pitchers on their own team (and vice-versa). If the league average is .270 and your team's pitchers allowed a .260 average, then the average pitcher you faced will be a little above .270. Also, I don't know if either method takes into account unbalanced schedules. You play a disproportionate share of games against your own division. A batter might face a group of pitchers whose composite average allowed does not match the overall league average. I don't know how much difference any of this makes.

I created two tables of all the predicted batting averages, one using the Strat-O-Matic method (Table 1) and the other using the Log5 method (Table 2) for a league with a .270 batting average. Batting averages for the hitters and pitchers both ranged from .150 to .390 in increments of .010. The averages allowed by the pitchers are read going across and for the batters they are read going down (it actually does not matter who gets rows and who gets columns, since both methods treat the hitter's and pitcher's averages symmetrically).

There is also a third table, Table 3, that shows the differences (Strato minus Log5).

Table 1 for Strato shows that if a .250 hitter faced a pitcher who allowed a .200 average, he would bat .180. Log5 says he would bat .184. So the difference is -.004. That is shown in Table 3.

There are a total of 625 cases. In 419 of them, the difference between Strato and Log5 is .010 or less. Those cases are in bold red in Table 3. There are 50 cases where the difference is .025 or more. Those are in bold green in Table 3.

So it looks like in about two-thirds of the cases (419 of 625, or 67%), Strato and Log5 differ by no more than .010. They differ by .025 or more in only 8% of the cases (50 of 625).
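The grid comparison can be reproduced with a short script. This is a sketch under my own assumptions: the Strato prediction is taken to be hitter plus pitcher minus league, and differences are compared unrounded, so the counts it prints may differ slightly from the ones I got from tables rounded to three decimals.

```python
LEAGUE = 0.270

def log5(p_b, p_p, p_l):
    hit = p_b * p_p / p_l
    out = (1 - p_b) * (1 - p_p) / (1 - p_l)
    return hit / (hit + out)

def strato(p_b, p_p, p_l):
    # Assumed Strat-O-Matic method: average of the two adjusted cards
    return p_b + p_p - p_l

# .150, .160, ..., .390 (25 values, so 625 batter/pitcher pairs)
averages = [round(0.150 + 0.010 * i, 3) for i in range(25)]

# Strato minus Log5 for every (batter, pitcher) pair
diffs = {
    (b, p): strato(b, p, LEAGUE) - log5(b, p, LEAGUE)
    for b in averages
    for p in averages
}

close = sum(1 for d in diffs.values() if abs(d) <= 0.010)
far = sum(1 for d in diffs.values() if abs(d) >= 0.025)
print(len(diffs), close, far)
```

Swapping the batter and pitcher in any pair gives the same difference, which matches the observation that it does not matter who gets rows and who gets columns.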

The 1980 AL season had a league average of .269. There were 134 players with 300+ PAs. 121 of them had averages between .230 and .330 (90%). There were 112 pitchers who faced 300+ batters and 102 of them were between .230 and .330 (91%). Data from Stathead.

The numbers below show the difference in expected batting average (Strato minus Log5) for the 4 cases where the endpoints of these brackets face each other:

.230 batter vs. .230 pitcher) -.004
.230 batter vs. .330 pitcher) .005
.330 batter vs. .230 pitcher) .005
.330 batter vs. .330 pitcher) -.006

So 90% of the cases would differ by no more than .006 in either direction. Now what about the most extreme cases? Here are the highest and lowest averages for the batters and pitchers from 1980:

Batters
George Brett .390
Kiko Garcia .199

Pitchers
Ed Figueroa .364
Mike Norris .209

Below are the predicted averages for the 4 cases using Strato and Log5:

Brett vs. Figueroa) Strato .485, Log5 .499
Brett vs. Norris) Strato .330, Log5 .315
Garcia vs. Figueroa) Strato .294, Log5 .279
Garcia vs. Norris) Strato .139, Log5 .151

So even at those extremes, there is not that much difference. The largest gap is .015.
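The four 1980 matchups can be recomputed directly from the averages listed above. As before, this is only a sketch: the Strato figure assumes the hitter-plus-pitcher-minus-league method, and the helper names are made up for illustration.

```python
LEAGUE_1980_AL = 0.269

batters = {"Brett": 0.390, "Garcia": 0.199}
pitchers = {"Figueroa": 0.364, "Norris": 0.209}

def log5(p_b, p_p, p_l):
    hit = p_b * p_p / p_l
    out = (1 - p_b) * (1 - p_p) / (1 - p_l)
    return hit / (hit + out)

def strato(p_b, p_p, p_l):
    # Assumed Strat-O-Matic method (average of the two adjusted cards)
    return p_b + p_p - p_l

def predictions(league=LEAGUE_1980_AL):
    """Return {(batter, pitcher): (strato, log5)} rounded to 3 places."""
    table = {}
    for b_name, b_avg in batters.items():
        for p_name, p_avg in pitchers.items():
            table[(b_name, p_name)] = (
                round(strato(b_avg, p_avg, league), 3),
                round(log5(b_avg, p_avg, league), 3),
            )
    return table

for (b_name, p_name), (s, l5) in predictions().items():
    print(f"{b_name} vs. {p_name}) Strato {s:.3f}, Log5 {l5:.3f}")
```

The same function works for any other season by swapping in that league's average and the players of interest.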

Another season I looked at was the 1996 NL. Here are the extremes in AVG for all batters and pitchers who had 300+ PAs or 300+ batters faced.

Batters
Jim Eisenreich .361
Rey Sanchez .211

Pitchers
Chris Hammond .315
Trevor Hoffman .161
Mel Rojas .193

The reason I put in Rojas is that Hoffman's season is a bit of an outlier (and Rojas had the next lowest average allowed). Hoffman's .161 is the 16th lowest average allowed in the Stathead database for pitchers with 300+ batters faced. They have only 33 seasons of .170 or lower. So Hoffman 1996 is an unusual case.

Below are the predicted averages for the 6 cases using Strato and Log5. The NL batting average in 1996 was .262.

Eisenreich vs. Hammond) Strato .414, Log5 .422
Eisenreich vs. Rojas) Strato .292, Log5 .276
Eisenreich vs. Hoffman) Strato .260, Log5 .234
Sanchez vs. Hammond) Strato .264, Log5 .257
Sanchez vs. Rojas) Strato .142, Log5 .153
Sanchez vs. Hoffman) Strato .110, Log5 .126

These are generally pretty close except for Eisenreich vs. Hoffman, where there is a .026 difference between Strato and Log5. But again, it was an unusual season for Hoffman. The differences we observe are in the same range as for the extremes in the 1980 AL. So it is likely that a large % of the matchups in the 1996 NL would have very little difference between Strato and Log5.

## Thursday, March 9, 2023

### How would the team with the highest OPS differential (1927 Yankees) do against the team with second highest OPS differential?

Last August I compiled the top 25 teams in OPS differential from 1901-2021. The 2022 Dodgers would tie for 4th place while the 2022 Astros would be tied for 15th. See Highest Team OPS Differentials, 1901-2021.

The 1927 Yankees are first with .197. If we only include full seasons, the 2019 Astros are 2nd with .167, then the 1939 Yankees and 2022 Dodgers (both at .158), then the 2019 Dodgers and 1902 Pirates (both at .149). So the 1927 Yankees are well ahead of the pack.

The 2020 Dodgers had .194 but that was only a 60 game season.

I wondered what this huge edge for the 1927 Yankees means mathematically. So the first thing I did was to estimate their winning pct. In regressions, the equation I get for pct is usually something like

Pct = .5 + 1.3*OPSDIFF

That would give the 1927 Yankees a .756 winning pct (in real life it was only .714 so they had some bad luck or bad timing). The 2nd place Astros would have .717 by this equation.
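The arithmetic behind those two estimates is a one-liner. The sketch below uses the approximate coefficients from the equation above; the function name is just a label I picked.

```python
def estimated_win_pct(ops_diff: float,
                      slope: float = 1.3,
                      intercept: float = 0.5) -> float:
    """Rough winning pct implied by a team's OPS differential.

    The default coefficients are the approximate ones quoted in the
    post (Pct = .5 + 1.3*OPSDIFF), not a freshly fitted regression.
    """
    return intercept + slope * ops_diff


print(round(estimated_win_pct(0.197), 3))  # 1927 Yankees: 0.756
print(round(estimated_win_pct(0.167), 3))  # 2019 Astros: 0.717
```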

What winning pct might a .756 team have against a .717 team? For that, I turned to a Bill James formula called Log5 (and I think some of the credit also goes to a guy named Dallas Adams). The formula assumes that the two teams faced the same competition. That is not the case here, but all I want to know is what the mathematical difference is between the teams based on their respective OPS differentials.

Here is the formula for the winning pct of team 1 (1927 Yankees) if they play a set of games against team 2 (2019 Astros):

$$\frac{\mathrm{PW1}\,(1 - \mathrm{PW2})}{\mathrm{PW1}\,(1 - \mathrm{PW2}) + \mathrm{PW2}\,(1 - \mathrm{PW1})}$$

PW1 is the Yankees winning pct (.756). PW2 is the Astros pct (.717). Plugging in those numbers, we get .550. So the 1927 Yankees are so much better than even the next best team (albeit only mathematically) that they would beat that team 55% of the time.
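Here is the same calculation as a small Python sketch (the function name is my own), which makes it easy to try other pairs of teams.

```python
def log5_win_pct(pw1: float, pw2: float) -> float:
    """Log5 estimate of team 1's winning pct against team 2.

    pw1, pw2: each team's winning pct against common competition.
    """
    team1_wins = pw1 * (1 - pw2)  # team 1 wins, team 2 loses
    team2_wins = pw2 * (1 - pw1)  # team 2 wins, team 1 loses
    return team1_wins / (team1_wins + team2_wins)


pct = log5_win_pct(0.756, 0.717)
print(round(pct, 3))    # 0.55
print(round(pct * 162)) # 89 wins over a 162-game schedule
```

Two equal teams come out at exactly .500, and swapping the arguments gives one minus the original answer, as you would expect.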

A .550 winning pct is 89 wins over 162 games, which is not too far off from what it takes to come in first place or make the playoffs. So even if they played only the next best team, the 1927 Yankees would win about as often as a contender does in a normal league.

Update March 11, 2023: I came across an interesting comment by Tangotiger at "Bill James Online." He showed a simpler way to estimate the winning pct that gets the same answer as Log5. See The Log5 Method, Etc, Etc, Etc.

"To show how the odds ratio method works, and how it gives the identical results to log5.

You have a .600 team (.6 wins per .4 losses or 1.5 wins ratio per loss) facing a .400 team (.4 wins per .6 losses or 0.67 wins ratio per loss).

When a .600 v .400, you would do:
1.5 / 0.67 = 2.25

That's 2.25 wins per 1 loss. To convert a ratio to a percentage:
2.25 / (2.25 + 1) = .692

Just like log5."

In this case, for the Yankees, they have 3.1 wins for every loss (.756/(1 - .756) =3.1). For the Astros it is 2.53 wins for every loss (.717/(1 - .717) = 2.53).

Then 3.1/2.53 = 1.22

Then 1.22/(1.22 + 1) = 1.22/2.22 = .550.

That is what I got using Log5.
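The equivalence of the two approaches can also be checked in code. This sketch implements the odds ratio steps from the quote next to the Log5 formula; both function names are mine.

```python
def log5_win_pct(pw1, pw2):
    return pw1 * (1 - pw2) / (pw1 * (1 - pw2) + pw2 * (1 - pw1))

def odds(pct):
    """Wins per loss for a team with the given winning pct."""
    return pct / (1 - pct)

def odds_ratio_win_pct(pw1, pw2):
    """Team 1's winning pct against team 2 via the odds ratio method."""
    ratio = odds(pw1) / odds(pw2)  # team 1 wins per team 1 loss
    return ratio / (ratio + 1)     # convert the ratio back to a pct


for pw1, pw2 in [(0.600, 0.400), (0.756, 0.717)]:
    print(round(odds_ratio_win_pct(pw1, pw2), 3),
          round(log5_win_pct(pw1, pw2), 3))
```

The two agree because dividing the numerator and denominator of the Log5 formula by PW2*(1-PW1) leaves ratio / (ratio + 1).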

## Wednesday, March 8, 2023

### Team OPS differentials for 2022

The numbers are in the table below. POPS is the OPS allowed by the team's pitching staff. The Dodgers hitters had a .775 OPS while their pitchers allowed .617. That gives them a .158 differential. Data from Baseball Reference and Stathead.

Last August I compiled the top 25 teams from 1901-2021. The 2022 Dodgers would tie for 4th place while the 2022 Astros would be tied for 15th. See Highest Team OPS Differentials, 1901-2021.

| Team | OPS | POPS | W-L% | Diff |
|---|---|---|---|---|
| Los Angeles Dodgers | 0.775 | 0.617 | 0.685 | 0.158 |
| Houston Astros | 0.743 | 0.613 | 0.654 | 0.130 |
| New York Yankees | 0.751 | 0.640 | 0.611 | 0.111 |
| Atlanta Braves | 0.761 | 0.651 | 0.623 | 0.110 |
| New York Mets | 0.744 | 0.676 | 0.623 | 0.068 |
| Toronto Blue Jays | 0.760 | 0.707 | 0.568 | 0.053 |
| St. Louis Cardinals | 0.745 | 0.696 | 0.574 | 0.049 |
| Philadelphia Phillies | 0.739 | 0.701 | 0.537 | 0.038 |
| Cleveland Guardians | 0.699 | 0.665 | 0.568 | 0.034 |
| Milwaukee Brewers | 0.724 | 0.691 | 0.531 | 0.033 |
| San Diego Padres | 0.700 | 0.679 | 0.549 | 0.021 |
| Tampa Bay Rays | 0.686 | 0.669 | 0.531 | 0.017 |
| Seattle Mariners | 0.704 | 0.688 | 0.556 | 0.016 |
| Minnesota Twins | 0.718 | 0.707 | 0.481 | 0.011 |
| San Francisco Giants | 0.705 | 0.702 | 0.500 | 0.003 |
| Los Angeles Angels | 0.687 | 0.685 | 0.451 | 0.002 |
| Chicago White Sox | 0.698 | 0.700 | 0.500 | -0.002 |
| Boston Red Sox | 0.731 | 0.745 | 0.481 | -0.014 |
| Texas Rangers | 0.696 | 0.722 | 0.420 | -0.026 |
| Baltimore Orioles | 0.695 | 0.724 | 0.512 | -0.029 |
| Chicago Cubs | 0.698 | 0.731 | 0.457 | -0.033 |
| Arizona Diamondbacks | 0.689 | 0.729 | 0.457 | -0.040 |
| Miami Marlins | 0.658 | 0.720 | 0.426 | -0.062 |
| Colorado Rockies | 0.713 | 0.786 | 0.420 | -0.073 |
| Detroit Tigers | 0.632 | 0.710 | 0.407 | -0.078 |
| Kansas City Royals | 0.686 | 0.770 | 0.401 | -0.084 |
| Cincinnati Reds | 0.676 | 0.766 | 0.383 | -0.090 |
| Pittsburgh Pirates | 0.655 | 0.748 | 0.383 | -0.093 |
| Washington Nationals | 0.688 | 0.791 | 0.340 | -0.103 |
| Oakland Athletics | 0.626 | 0.750 | 0.370 | -0.124 |

The graph below shows the relationship between team OPS differential and team W-L%.

The trend line shows that there was a very close relationship last year. The regression equation, where x is the team's OPS differential, is

W-L% = 1.2151x + 0.4998

The R² (R squared) = 0.9343, meaning 93.43% of the variation in team W-L% is explained by OPS differential.
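For anyone who wants to check the fit, here is a sketch of an ordinary least squares regression on the 30 (Diff, W-L%) pairs from the table, written in plain Python. I originally got the trend line from a charting tool, so the helper `fit_line` is only an illustration of the standard formulas, not the exact tool I used.

```python
# (OPS differential, W-L%) for the 30 teams, in table order
TEAMS_2022 = [
    (0.158, 0.685), (0.130, 0.654), (0.111, 0.611), (0.110, 0.623),
    (0.068, 0.623), (0.053, 0.568), (0.049, 0.574), (0.038, 0.537),
    (0.034, 0.568), (0.033, 0.531), (0.021, 0.549), (0.017, 0.531),
    (0.016, 0.556), (0.011, 0.481), (0.003, 0.500), (0.002, 0.451),
    (-0.002, 0.500), (-0.014, 0.481), (-0.026, 0.420), (-0.029, 0.512),
    (-0.033, 0.457), (-0.040, 0.457), (-0.062, 0.426), (-0.073, 0.420),
    (-0.078, 0.407), (-0.084, 0.401), (-0.090, 0.383), (-0.093, 0.383),
    (-0.103, 0.340), (-0.124, 0.370),
]

def fit_line(points):
    """Least squares fit y = slope*x + intercept, plus R squared."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = fit_line(TEAMS_2022)
print(f"W-L% = {slope:.4f}x + {intercept:.4f}, R squared = {r_squared:.4f}")
```

Because the differentials average out to almost exactly zero across the league, the intercept lands right at the league's .500 winning pct.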

I was surprised by such a close relationship for just one year of data. Some earlier regressions I did averaged team OPS differential and team W-L% over five-year periods and had a lower R squared. See The Relationship Between OPS Differential And Winning Percentage Using 5 Year Averages.