Monday, June 17, 2013

Trout Could Join A Group That Includes Only Ted Williams And Ty Cobb

Trout has a 161 OPS+ so far this year, and last year he had 171. So he could have 2 seasons of 160 or higher through the age of 21 (he won't turn 22 until after June 30). Below are all the seasons with an OPS+ of at least 160 through the age of 21, with at least 400 plate appearances. Both Cobb and Williams did it twice. Trout could join them this year. Even having one season like this puts you in very impressive company. These players generally had long, productive careers, if not Hall of Fame ones.

 
 
What if I lower the standard to a 150 OPS+? Here are all the players to have 2 seasons of at least a 150 OPS+ through the age of 21.
 
Mel Ott
Rogers Hornsby
Ted Williams
Ty Cobb
 
It seems like Trout has a great chance to join that group. Here are all the players who had 1 or more seasons of at least a 150 OPS+ through the age of 21:
 
Al Kaline
Albert Pujols
Alex Rodriguez
Cesar Cedeno
Denny Lyons
Eddie Mathews
Fred Carroll
Hal Trosky
Jimmie Foxx
Ken Griffey
Mel Ott
Mickey Mantle
Mike Tiernan
Mike Trout
Rogers Hornsby
Sam Crawford
Stan Musial
Ted Williams
Tom McCreery
Tris Speaker
Ty Cobb
 
Again, an outstanding group of players.
 

Wednesday, June 12, 2013

Josh Hamilton's Strange Season

He has 23 extra-base hits this year and only 20 RBIs. As I show later, he is hitting terribly with runners on base this year, while in his career he has generally been pretty good. He is on a pace to get 57 XBHs this year. Below is a list of all the players in the last 10 years to get 50+ XBHs yet have more XBHs than RBIs. They tend to look like guys who bat 1-2, while Hamilton has been batting mostly 4-5 this year, with 23 ABs batting 2nd.


I think Sizemore in 2006 was the only player ever to get 90+ XBHs in a season yet have more of those than RBI. He batted mostly leadoff. I think the record positive difference is Frank Baumholtz in 1953. He had 46 XBHs and 25 RBI. He was pretty much a 1-2 man.

Here are Hamilton's splits for this year and his career.

 

Click here to go to Hamilton's Yahoo page

Wednesday, June 5, 2013

Players Who Had 90+ Extra-Base Hits In A Season Before They Turned 25

Here is the list


I got this from the Lee Sinins Complete Baseball Encyclopedia. The age column shows their age as of June 30. So if a guy turned 25 on July 1, his age for that season is listed as 24. But all of these guys turned 25 after the season in question. They are all pretty much Hall of Famers or guys who put up Hall of Fame type numbers, except for Sizemore and Trosky.

But even Trosky was very good. Through age 27, he is 124th in career WAR among position players. If he had finished with that rank for his whole career, he would be a borderline case. He only played 312 games after the age of 27. His SABR bio says he suffered from very bad headaches starting around that time. He also finished in the top 7 in OPS+ 4 times.

Sizemore had 53 2Bs, 11 3Bs, 28 HRs.

Through age 25, he was 48th in career WAR among position players. He had 4 straight years in the top 10 including a number 1. Maybe he was headed for a Hall of Fame career before injuries.

And here are all the guys that had from 85-89 under age 25 (I did not check to see if they turned 25 in the season in question, so a few of them might have made the cutoff after their 25th birthday). It is also a pretty impressive list.


Monday, June 3, 2013

Chris Davis Has A .722 SLG Over His Last 302 ABs

That goes back to last year, including Sept and Oct. He has 30 HRs in that stretch. His SLG this year is .754. That is about 83% higher than the league average, which is .413. If he kept that up for the whole season, it would be one of the greatest relative SLGs ever, at 183. Here are the top 20.


 
 
 See also
 
Chris Davis' Absurd Season by Matt Hunter of "Beyond the Boxscore"

Friday, May 31, 2013

How On-Base Percentage and Slugging Percentage Affect Winning

This is something that I posted at Beyond the Boxscore in 2006.

It's probably obvious that if a team increases its on-base percentage (OBP) or slugging percentage (SLG), its winning percentage will go up. Get more runners on and hit for more power, you win more games. But how many more? If OBP goes up by as much as SLG, will they both lead to the same increase in wins? What does it mean for OBP to go up by as much as SLG? The same number of points? By the same percentage? Or should we look at something slightly more sophisticated, like a one standard deviation increase for each one? And what about reducing the OBP and SLG of your opponents? How many wins will that bring?

To try to get a handle on this, I used linear regression to find an equation for team winning percentage (I looked at all teams from 1989-2002). This is what I got:

PCT = .493 + 2.01*OBP + .858*SLG - 2.06*OPPOBP - .806*OPPSLG

OPPOBP and OPPSLG are, respectively, the OBP and SLG teams allow their opponents. Given this relationship, how many more games will a team win if it increases its OBP and SLG (or reduces its opponents' OBP and SLG)? Table 1 shows the various increases in wins for a given change in performance.


For example, if team OBP goes up by .010, wins over a 162 game season will increase by 3.26 (2.01*.01*162 = 3.26). For the same increase in team SLG, wins will go up by 1.39 (.858*.01*162 = 1.39). The next column shows that the OBP increase is 2.35 times as important as the SLG increase. Lowering your opponents' OBP and SLG has about the same effect and relationship.
 
The average team OBP was about .331. So a 10% increase would be about .033. The average team SLG was about .411, so a 10% increase would be about .041. The numbers were the same for OPPOBP and OPPSLG. A 10% increase in OBP adds 10.79 wins while a 10% increase in SLG adds 5.71. This makes OBP 1.89 times as important as SLG. The changes are about the same on the pitching side.
 
Standard deviation (SD) is a measure of spread or dispersion. The SD of OBP was .0149. That increase would add 4.85 wins. The SD for SLG was .0311. That increase would add 4.32 wins. In this case OBP is 1.12 times as important as SLG. On the pitching side, the SDs were about the same, so the results are similar.
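The arithmetic behind the three comparisons above (a .010 increase, a 10% increase, and a 1 SD increase) can be sketched in a few lines of Python. This is only a sketch: the coefficients, averages, and SDs are the rounded values quoted in the post, and the names `added_wins` and `scenarios` are mine, so the results may differ from the table in the last digit.

```python
GAMES = 162  # length of a full season

# Rounded coefficients from the first regression in the post.
COEF = {"OBP": 2.01, "SLG": 0.858}


def added_wins(coef, change, games=GAMES):
    """Wins added over a season when a rate stat changes by `change`."""
    return coef * change * games


# Size of the change under each framing, using the rounded figures in the text:
# .010 flat, 10% of the averages (.331 and .411), and one SD (.0149 and .0311).
scenarios = {
    "10 points": {"OBP": 0.010, "SLG": 0.010},
    "10 percent": {"OBP": 0.0331, "SLG": 0.0411},
    "1 SD": {"OBP": 0.0149, "SLG": 0.0311},
}

for label, changes in scenarios.items():
    obp = added_wins(COEF["OBP"], changes["OBP"])
    slg = added_wins(COEF["SLG"], changes["SLG"])
    print(f"{label}: OBP {obp:.2f}, SLG {slg:.2f}, ratio {obp / slg:.2f}")
```

The same function works for the pitching side by passing in the OPPOBP and OPPSLG coefficients and treating a decrease as the improvement.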
 
So the relative win value of OBP and SLG can depend on how you frame the question or what kind of change you are looking at. In regressions with team runs per game as the dependent variable instead of winning percentage, the coefficient value on OBP is usually about 1.5 or 1.6 times that of SLG. It is more than double here for some reason. I am not sure why.
 
I also did the analysis with isolated power (ISO) instead of SLG. ISO is SLG minus AVG and is a better measure of power hitting than SLG, since a guy could get a single every time up and have an SLG of 1.000 with no extra base power. In this case, the regression equation was:
 
PCT = .499 + 2.52*OBP + .962*ISO - 2.54*OPPOBP - .923*OPPISO
 
Table 2 shows the various win increases. I won't discuss those results since it would just repeat the previous discussion. The numbers mean the same things they meant in Table 1. The average ISO was .147 and the SD of ISO was .0227. Those were about the same on the pitching side.
 
 
Technical notes: The r-squared for the first regression was .817, meaning that 81.7% of the variation in team winning percentage is explained by the equation. The standard error was .0297. That is about 4.8 wins a season. All of the independent variables were statistically significant, with all t-values above 8 or below -8. The r-squared for the second regression was .818, meaning that 81.8% of the variation in team winning percentage is explained by the equation. The standard error was again .0297, or about 4.8 wins a season, and again all of the independent variables were statistically significant, with all t-values above 8 or below -8. There were 394 teams.
 
Now the comments
 

Correlation

How much do OBP and SLG correlate with each other? If there is a high correlation between the two, OBP might be sucking up some of the effect of SLG. Isolated power might help reduce some of that but not all. Is there something you could use that breaks OBP into its component parts, walks and hits?
 

Correlation

The correlation between OBP & SLG was .777. For OPPOBP & OPPSLG, it was .838. For ISO & OBP it was .616. For OPPOBP & OPPISO it was .703. Those seem high, so collinearity may be a problem. But I had low standard errors for the coefficient estimates, which is usually an indication that collinearity is not a problem.
 
Another way to check for multicollinearity is to run regressions in which one IV is a function of all of the other IVs. In the first model with OBP and SLG, the r-squared was about .5 when OBP was the dependent variable and the other variables (SLG, OPPOBP, OPPSLG) were the independent variables. There is a stat called the "variance inflation factor" or VIF. It is 1/(1 - r-squared). So if r-squared was .5, 1 - .5 = .5. Then 1/.5 = 2. A couple of sources I looked at suggested that if the VIF is under 10, multicollinearity is not a problem. So in this case, the VIF is only about 2. For the other 3 cases, VIF only got as high as 4. I did come across one source that said there is no rule about the value of VIF and multicollinearity.
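The VIF calculation described above is simple enough to sketch directly. The function name `vif` is mine, and the .75 input is not from my regressions: it is just the r-squared that would produce the VIF of 4 mentioned above.

```python
def vif(r_squared):
    """Variance inflation factor, 1 / (1 - r-squared).

    `r_squared` comes from regressing one independent variable
    on all of the other independent variables.
    """
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


print(vif(0.5))   # the OBP case from the text
print(vif(0.75))  # illustrative: the r-squared that gives a VIF of 4
```

Under the rule of thumb cited above, a VIF would not reach 10 until the r-squared of the auxiliary regression hit .9.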
 
But I did run the following regression based on your suggestions
 
PCT = .491 + 1.04*EXB + 2.72*H + 2.53*W - 1*OPPEXB - 2.7*OPPH - 2.54*OPPW
 
EXB is extra bases/PA (PA = walks + ABs)
H is hits/PA
W is walks/PA
 
So it looks like there is a pretty big difference between getting on base (by hits or walks) and hitting for power. Here are the win changes for a 1 SD improvement:
 
EXB 3.38
W 4.64
H 4.43
OPPEXB 3.06
OPPH 5.19
OPPW 4.15
 

Sunday, May 26, 2013

Players Who Had A Line Drive Percentage Of At Least 30%

Sean Forman compiled this list for me. I noticed the other day that Miguel Cabrera had 30% so far this year. I wondered what the record was, and Sean was kind enough to come up with an answer. It is interesting that about 90% of them are between 1996 and 2002, yet the stat goes back to 1988. I am not sure why no one has done it in the last 10 years. It is the % of all balls put in play that are line drives.


Wednesday, March 13, 2013

Rabbit Maranville, Mr. RBI

Why him? The following Bill James formula predicts his RBIs better than any other player:

RBI = (TB/4) + HRs

It predicts he would have had 883.75 RBIs while he actually had 884. For every 700 PAs, or about a full season, that is only off by +.016. That is the most accurate prediction for all players with 5,000+ PAs from 1876-2012 (I used Baseball Reference and RBIs might not be available for all pre-1900 years). Click here to see the rankings. The rankings are arranged by how much over or under a player was predicted.
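Here is a minimal sketch of the formula and the per-700-PA error. The function names are mine. The inputs for Maranville (3,423 total bases, 28 HRs, and 11,256 PAs) are assumptions that are not stated in the post: the first two were chosen because they reproduce the 883.75 prediction, and the PA total is approximate.

```python
def predicted_rbi(total_bases, home_runs):
    """Bill James's quick RBI estimate: RBI = (TB / 4) + HR."""
    return total_bases / 4 + home_runs


def error_per_700_pa(actual_rbi, predicted, plate_appearances):
    """Actual minus predicted RBIs, scaled to 700 PAs (about a full season)."""
    return (actual_rbi - predicted) / plate_appearances * 700


# Assumed career inputs for Maranville (not given in the post).
tb, hr, pa = 3423, 28, 11256
actual = 884  # actual RBIs, from the post

pred = predicted_rbi(tb, hr)
print(pred)
print(round(error_per_700_pa(actual, pred, pa), 3))
```

A positive error means the player drove in more runs than the formula expected, which is the sign convention the rankings use.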

Cap Anson was predicted to have 77.24 RBIs per 700 PAs while he actually had about 130. So he gets +52.76. Of course, this does not mean he was necessarily a great clutch hitter, although he could have been: he did lead the league 8 times in RBIs according to Baseball Reference, and if you notice, he is 7 RBIs ahead of the next best guy, so he looks like a bit of an outlier. But his team led the league in OBP several times back then and in other years was often near the top.

So what might be going on with Anson? For one, he did not hit many HRs (just 97). But no one did back then so you had low HR guys batting in the middle of the order, where you would get more than the average number of RBI opportunities. Second, he might have played in some years when the league OBP was high. Third, more players reached on errors back then, creating even more opportunities.

Over the last 10 years, the formula has predicted about 20 more RBIs per team each year in the AL than they actually got. In the NL, it is about 25 more. So the prediction is coming in around 3% too high. Again, we are in a low error period, so not as many runners are reaching on errors as in other periods in baseball history.

In fact, there is a high correlation between how often runners reach (by whatever means) and the size of the prediction error for a whole league in any given year. I added the OBP each year to the error rate (ERATE) each year (ERATE is 1 - fielding percentage). That sum was then correlated with how big the prediction error was per team (the more teams you have, the bigger the error might be). For all of NL history, that correlation is .87 and for the AL it is .85. So in years when an entire league had more RBIs than predicted, it most likely had a lot more baserunners than normal, by hits, walks, HBP and errors.
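The calculation behind those correlations can be sketched as follows. The helper names are mine, and the four "seasons" are made-up placeholder values, not league data. They are only there to show the shape of the inputs: league OBP, league fielding percentage, and the RBI prediction error per team (actual minus predicted).

```python
from math import sqrt


def reach_rate(obp, fielding_pct):
    """OBP plus ERATE, where ERATE is 1 - fielding percentage."""
    return obp + (1.0 - fielding_pct)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder seasons (NOT real data): (OBP, fielding pct, RBI error per team)
seasons = [
    (0.320, 0.985, -22.0),
    (0.335, 0.975, -5.0),
    (0.350, 0.965, 14.0),
    (0.315, 0.980, -18.0),
]

x = [reach_rate(obp, fpct) for obp, fpct, _ in seasons]
y = [err for _, _, err in seasons]
print(round(pearson(x, y), 2))
```

With one row per league season in place of the placeholders, this would give the kind of figure quoted above.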

Now getting back to Maranville, he tended to bat leadoff, 2nd or 7th. Hardly great RBI slots. So you might expect him to get fewer RBIs than expected. But he did play mainly in the 1920s and 30s, when OBPs were high and the ERATE was higher. He also hit well with runners on base. Retrosheet only has about 1300 of his 8800 career ABs broken down for this. But with none on he batted .277. With runners on, .317 and with runners in scoring position, .324. Click here to see his splits.

If you look at the rankings from the first link, you can see that many of the batters who had the biggest negative differentials (meaning they got fewer RBIs than expected) were leadoff men.

This formula may apply best to power hitters who bat in the middle of the order. So I also looked at how well the formula predicted for all players with 300+ career HRs. Click here to see that link. The guy that jumps out there is Al Simmons. He got about 25 more RBIs than expected per season and the next highest is Greenberg at about 18.

Now Simmons batted 4th most of his career (especially with the A's) and had Max Bishop leading off a lot of that time. Bishop had a career OBP of .423. The 2-3 hitters probably averaged around .365. So he had a lot of opportunities. But Retrosheet has an even smaller number of on-base splits for Simmons, so it is hard to tell if he was a clutch hitter.

I was surprised to see Willie Mays so far down. He had 15 fewer RBIs than predicted per season, yet he hit well with runners on. Click here to see his splits. Maybe he got intentionally walked a lot with runners in scoring position. Mantle and Barry Bonds are also near the bottom, and the same thing could have happened to them. Alfonso Soriano has batted leadoff in nearly half of his PAs, so that may be why he is last.

Bill James discusses this formula in his book Solid Fool's Gold: Detours on the Way to Conventional Wisdom

I have published two articles about RBI prediction:

RBIs, Opportunities and Power Hitting

Do Hitter’s Get Their Expected RBIs?