Friday, January 9, 2026

Value of OBP and SLG by Lineup Position

I first posted this to the "Beyond the Boxscore" site back in 2006. That site might not exist any more.

So here is the first post I did on this. I did others later on and I will probably add them here soon.

One question that often comes up is "what is the relative value of on-base percentage (OBP) and slugging percentage (SLG)?" Is OBP 50% more important than SLG? Or 60%? Or something else? A stat called OPS simply adds the two, giving them equal weight. But maybe the weight should not be equal. For example, here is the regression equation of team runs per game for the years 2001-03:

R/G = 17.11*OBP + 11.13*SLG - 5.66

This makes OBP about 54% more important than SLG (17.11/11.13 is about 1.54), a fairly typical result. But OBP might be more important for certain positions in the lineup, like the leadoff batter, and SLG might be more important for the cleanup hitter. To check this, I ran a regression in which team runs per game was the dependent variable (DV) and the OBP and SLG of each lineup slot were the independent variables (IVs). OBP1 means the OBP of the leadoff batter, SLG3 means the SLG of the third-place hitter, etc. I used data from Retrosheet for the 1989-2002 seasons. Retrosheet shows the stats for each team by lineup position. Below are the coefficient values for the IVs.

There is quite a bit of variation. A point of OBP is worth about .003 runs per game from the leadoff man (a .021 increase in the leadoff OBP would be about .063 runs more per game, or about 10 for a whole season, which usually means about 1 win). The value of OBP is much less for the number 8 man. For the leadoff man, OBP is three times as important as SLG. For the cleanup hitter, they are almost the same. So this analysis shows that the relative values of OBP and SLG could be different depending on the lineup position of the batter in question.
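The arithmetic above can be checked with a few lines of Python. This is only a sketch: the coefficients and the .003 runs per point come from the post, while the 162-game schedule, the function names, and the sample .330/.420 team are my assumptions for illustration.

```python
# Coefficients from the 2001-03 team regression quoted in the post.
OBP_COEF = 17.11
SLG_COEF = 11.13
INTERCEPT = -5.66

# Assumption for illustration (not stated in the post): a 162-game schedule.
GAMES_PER_SEASON = 162


def team_runs_per_game(obp, slg):
    """Predicted team runs per game from the fitted team-level equation."""
    return OBP_COEF * obp + SLG_COEF * slg + INTERCEPT


def leadoff_obp_gain(points, runs_per_point=0.003, games=GAMES_PER_SEASON):
    """Runs gained per game and per season from extra points of leadoff OBP.

    One "point" is .001 of OBP; .003 runs per game per point is the
    approximate leadoff value reported in the post.
    """
    per_game = runs_per_point * points
    return per_game, per_game * games


relative_weight = OBP_COEF / SLG_COEF        # ratio of the two coefficients
per_game, per_season = leadoff_obp_gain(21)  # a .021 increase in leadoff OBP

print(f"OBP/SLG coefficient ratio: {relative_weight:.2f}")
print(f"Leadoff gain: {per_game:.3f} runs/game, {per_season:.1f} runs/season")
# Hypothetical team with a .330 OBP and a .420 SLG:
print(f"Predicted R/G: {team_runs_per_game(0.330, 0.420):.2f}")
```

Dividing the season total by roughly 10 runs per win gives the "about 1 win" figure.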

Mark Pankin has already looked at this issue using a tool called Markov chains. He presented his results at the SABR convention in 2004. His study is online at:

http://www.pankin.com/sabr34.pdf

There could be multicollinearity in my analysis, meaning that the coefficient estimates are not as reliable as they could be because the IVs are highly correlated with each other. I discuss what I did to detect multicollinearity below. In case it was a problem, I also tried a different but similar model in which the IVs would likely be less correlated with each other.

Each lineup slot had 3 variables: walk percentage, hit percentage and extra-base percentage. For walks, hits, and extra bases, the denominator was plate appearances (PAs). This is a little different from comparing OBP and SLG, since OBP has PAs as the denominator and SLG has ABs. Also, using extra bases makes it a little like isolated power. SLG is not always a good measure of power because a guy who hits a single drives up his SLG. Isolated power is SLG - AVG, or extra bases divided by ABs. Of course, here I am using PAs. H1 is the hit% of the leadoff man, W1 is the walk% of the leadoff man, XB1 is the extra-base% of the leadoff man, etc. Here are the coefficient estimates:

Again, there are some big differences. The value of a walk to the leadoff man is twice what it is for the number 6 man. The cleanup hitter has the highest extra-base value.
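To make the three second-model variables concrete, here is a small sketch of how they could be computed for one lineup slot. The function name and the sample stat line are hypothetical, and I am assuming extra bases means total bases minus hits, which is what makes isolated power equal SLG - AVG.

```python
def per_pa_rates(pa, ab, hits, walks, total_bases):
    """Walk%, hit% and extra-base% per plate appearance, plus isolated power.

    Extra bases are taken as total bases minus hits (an assumption that is
    consistent with isolated power being SLG - AVG).
    """
    extra_bases = total_bases - hits
    return {
        "walk_pct": walks / pa,    # W in the post's notation
        "hit_pct": hits / pa,      # H
        "xb_pct": extra_bases / pa,  # XB, per PA
        "iso": extra_bases / ab,   # isolated power, per AB, for comparison
    }


# Hypothetical season totals for one lineup slot (not real data).
rates = per_pa_rates(pa=700, ab=620, hits=180, walks=70, total_bases=290)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

The only difference between xb_pct and iso is the denominator, which is the PA versus AB point made above.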

I did try some other variables. I had SBs and CS per game in the first model with OBP and SLG. Things were generally fine there except that in a couple of cases, the value of a CS was positive and in one case the value of a SB was negative. Why some lineup slots would have negative values for SBs or positive values for CS is not clear. I tried one regression with just the AL since they have the DH and a regular player bats ninth. The results seemed about the same. Email me if you want those.

Multicollinearity. In the first model with OBP and SLG, most of the correlations between the IVs were under .5. The ones that were higher were all between the OBP and SLG of the same lineup position. The correlation between OBP1 and SLG1 was .596. Those correlations ranged from .596 to .739, except for OBP9 and SLG9, which was very high, at .897. But in the second model, only one correlation between IVs was over .5 and that was H9 and XB9 at .648. The vast majority of the others were under .2.
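A pairwise check like the one just described can be sketched as follows. The helper names and the three short columns of numbers are made up for illustration and are not the Retrosheet data; the .5 cutoff is just the level I used as a reference point above.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def high_correlations(columns, threshold=0.5):
    """Return (name_a, name_b, r) for every pair with |r| above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical team-season values (not real data).
columns = {
    "OBP1": [0.330, 0.350, 0.340, 0.360, 0.320],
    "SLG1": [0.400, 0.430, 0.410, 0.450, 0.390],
    "OBP8": [0.310, 0.300, 0.320, 0.315, 0.305],
}
flagged = high_correlations(columns)
print(flagged)
```

With these made-up numbers only the OBP1/SLG1 pair is flagged, which mirrors the pattern in the real data where the high correlations were within a lineup slot.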

Another way to check for multicollinearity is to run regressions in which one IV is a function of all of the other IVs. In the first model with OBP and SLG, the r-squared was generally in the .5-.6 range (that was 18 regressions). R-squared tells us what percentage of the variation in the DV is explained by the model. There is a stat called the "variance inflation factor" or VIF. It is 1/(1 - r-squared). So if r-squared is .5, then 1 - .5 = .5 and 1/.5 = 2. A couple of sources I looked at suggested that if the VIF is under 10, multicollinearity is not a problem. Most of these were about 2. One got close to 6 (that was SLG9). I did come across one source that said there is no rule about the value of VIF and multicollinearity.
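The VIF formula is simple enough to write out. This is a minimal sketch with function names of my own choosing; it takes the r-squared from an auxiliary regression as given rather than running the regression itself.

```python
def vif(r_squared):
    """Variance inflation factor, 1 / (1 - r-squared), for one IV."""
    if not 0 <= r_squared < 1:
        raise ValueError("r-squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_for_vif(target_vif):
    """The auxiliary r-squared that would produce a given VIF."""
    return 1.0 - 1.0 / target_vif


# The .5-.6 range reported for most of the 18 auxiliary regressions.
for r2 in (0.5, 0.6):
    print(f"r-squared {r2:.1f} -> VIF {vif(r2):.2f}")

# The rule-of-thumb cutoff of 10 corresponds to this auxiliary r-squared.
print(f"VIF 10 -> r-squared {r_squared_for_vif(10):.2f}")
```

Inverting the formula shows that a VIF of 10 would take an auxiliary r-squared of .9, well above the .5-.6 range seen for most of these variables.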

For the second model, I only ran a couple of these regressions where one IV depended on all the others. The first one was W1 and the r-squared was only about .2. I tried XB9 (which corresponds a little to SLG9, the one that was closest to being a problem in the other model) and the r-squared was only about .4, which would mean a very low VIF of about 1.7.

Also, multicollinearity is supposed to show up as high standard errors on the coefficient estimates, which makes it hard for the estimates to be significant. But that was generally not the case here. One thing I don't know is whether there might be some kind of joint hypothesis about the VIF. Maybe if you have a large number of IVs, it only takes a certain number of them having a VIF over 2, or something like that, for there to be a problem.

 

Saturday, December 20, 2025

Which batters led their league in total bases by 100 or more since 1900?

I used the Lee Sinins Complete Baseball Encyclopedia and Baseball Reference to compile this list. It has the top 11 differentials between the leader and the 2nd place guy (I did 11 in this case since the Holmes year was a war year).

Player            TB    Diff   Year
Rogers Hornsby    450   136    1922
Jim Rice          406   113    1978
Stan Musial       429   113    1948
Babe Ruth         457    92    1921
Tommy Holmes      367    88    1945
Babe Ruth         391    85    1924
Stan Musial       366    83    1946
Aaron Judge       392    82    2022
Shohei Ohtani     411    80    2024
Ty Cobb           335    74    1917
Joe Medwick       406    73    1937

Rice (1978) is the only batter with 400+ total bases in a season when no other batter in his league even reached 300. Eddie Murray was 2nd with 293. Other Hall of Famers in the AL that year include Reggie Jackson, Carlton Fisk, George Brett and Rod Carew.

The next table has the top ratios of the leader's TBs to the TBs of the 2nd place guy.

Player            TB    Ratio   Year
Rogers Hornsby    450   1.433   1922
Jim Rice          406   1.386   1978
Stan Musial       429   1.358   1948
Tommy Holmes      367   1.315   1945
Stan Musial       366   1.293   1946
Ty Cobb           335   1.284   1917
Babe Ruth         391   1.278   1924
Aaron Judge       392   1.265   2022
Nap Lajoie        350   1.254   1901
Babe Ruth         457   1.252   1921
Shohei Ohtani     411   1.242   2024
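Both the differential and the ratio come from the same two numbers, the leader's TBs and the runner-up's TBs. Here is a quick sketch using the 1978 AL figures given above (Rice 406, Murray 293); the function name is my own.

```python
def lead_over_runner_up(leader_tb, runner_up_tb):
    """Return (difference, ratio) of the leader's TBs to the runner-up's."""
    return leader_tb - runner_up_tb, leader_tb / runner_up_tb


# 1978 AL: Jim Rice 406 total bases, Eddie Murray 293.
diff, ratio = lead_over_runner_up(406, 293)
print(diff, round(ratio, 3))  # prints: 113 1.386
```

The ratio adjusts for the overall level of offense, which is why the order of the two lists is not the same.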

I noticed something else about Rice in 1978. That was one of three straight years (1977-79) when he had both 200+ hits and 350+ TBs. Reaching both of those levels three straight years is very rare (and even two straight has not been done very often). Here are the longest streaks of reaching those levels.

Player            Years     Streak
Lou Gehrig        1930-32   3
Joe Medwick       1935-37   3
Jim Rice          1977-79   3
Rogers Hornsby    1921-22   2
Babe Ruth         1923-24   2
Rogers Hornsby    1924-25   2
Lou Gehrig        1927-28   2
Chuck Klein       1929-30   2
Al Simmons        1929-30   2
Jimmie Foxx       1932-33   2
Chuck Klein       1932-33   2
Hank Greenberg    1934-35   2
Joe DiMaggio      1936-37   2
Lou Gehrig        1936-37   2
Stan Musial       1948-49   2
Don Mattingly     1985-86   2
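A streak search like this can be sketched in a few lines. This is not the query I actually ran in the encyclopedia; the function name and the season lines for the hypothetical player are made up purely to show the logic, with the TB threshold left as a parameter so the same code covers a lower cutoff such as 300.

```python
def qualifying_streaks(seasons, min_hits=200, min_tb=350):
    """Find runs of consecutive seasons meeting both thresholds.

    seasons maps year -> (hits, total_bases).
    Returns a list of (first_year, last_year, length) tuples.
    """
    streaks = []
    start = prev = None
    for year in sorted(seasons):
        hits, tb = seasons[year]
        ok = hits >= min_hits and tb >= min_tb
        if ok and prev is not None and year == prev + 1:
            prev = year  # extend the current streak
        else:
            if start is not None:
                streaks.append((start, prev, prev - start + 1))
            start = prev = year if ok else None
    if start is not None:
        streaks.append((start, prev, prev - start + 1))
    return streaks


# Hypothetical player (made-up numbers, not a real career).
seasons = {
    2001: (190, 340),
    2002: (205, 360),
    2003: (210, 355),
    2004: (201, 352),
    2005: (203, 320),
    2006: (215, 370),
}
print(qualifying_streaks(seasons))              # 200+ hits and 350+ TBs
print(qualifying_streaks(seasons, min_tb=300))  # 200+ hits and 300+ TBs
```

For this made-up player the 350 cutoff gives a three-year streak plus a single later season, while lowering the cutoff to 300 joins them into one five-year streak.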

And there are not many cases of three straight years of both 200+ hits and 300+ TBs. 

Player               Years     Streak
Chuck Klein          1929-33   5
Paul Waner           1927-30   4
Bill Terry           1929-32   4
Al Simmons           1929-32   4
George Sisler        1920-22   3
Rogers Hornsby       1920-22   3
Lou Gehrig           1930-32   3
Charlie Gehringer    1934-36   3
Joe Medwick          1935-37   3
Jim Rice             1977-79   3
Steve Garvey         1978-80   3
Don Mattingly        1984-86   3
Kirby Puckett        1986-88   3
Michael Young        2004-06   3

I wonder if streaks like this are what make people want to vote for Rice or Mattingly for the Hall. For a short period fans and writers would see a guy get hits and extra-base hits at levels rarely seen. You would be pretty impressed. This ignores other valuable things like on-base percentage. But it would look like these guys were reaching some unusual heights.