Friday, April 30, 2010

Explaining The Rays Fast Start

They are 17-5 and have outscored their opponents 142-73. That works out to a Pythagorean pct of .791, or 17.4 wins. So sabermetrically they seem to be getting what they are supposed to. But they are actually coming up huge in the clutch so far.
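As a quick check of that arithmetic, here is a minimal Python sketch of the Pythagorean calculation. It assumes the classic exponent of 2 (the post does not say which exponent was used, but 2 reproduces the .791 figure), and the function name is just for illustration.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning pct from runs scored and allowed.

    The exponent of 2 is an assumption; other versions of the formula
    use slightly different exponents.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


games = 17 + 5                      # the Rays' record so far
pct = pythagorean_pct(142, 73)      # runs scored and allowed from the post
expected_wins = pct * games

print(round(pct, 3), round(expected_wins, 1))
```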

Their hitting OPS is .809 while their opponents' is .700. That differential of .109 translates into a winning pct of .632, based on an old regression I ran. The equation was

Pct = .5 +1.21*OPSDIFF

That would give them only 13.9 wins, 3.5 fewer than Pythagoras. Here is where the clutch comes in. Their OPS with runners on base (ROB) is .926 while with none on it is only .710. Last year in MLB, the averages for those two were .763 & .741. Teams do slightly better with ROB, but not over 200 points of OPS better like the Rays this year. Of course, a .634 pct would mean 102 wins for the year, which is still outstanding.
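The OPS-differential projection can be sketched the same way. This is only an illustration built from the equation and the rounded OPS figures quoted above, so the last digit may differ slightly from the numbers in the text; the function name is made up for the example.

```python
def ops_diff_pct(team_ops, opp_ops, intercept=0.5, slope=1.21):
    """Winning pct from the regression Pct = .5 + 1.21*OPSDIFF."""
    return intercept + slope * (team_ops - opp_ops)


pct = ops_diff_pct(0.809, 0.700)    # Rays' OPS and their opponents' OPS
wins_so_far = pct * 22              # projected wins over 22 games
wins_full_season = pct * 162        # projected wins over a full season

print(round(pct, 3), round(wins_so_far, 1), round(wins_full_season))
```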

Their pitchers have allowed a ROB OPS of .647 while it is .735 with none on. Again, that is a much bigger split than normal, and in the opposite direction. Their ROB advantage is .088, while last year teams usually gave up an OPS .022 higher with ROB. You might also notice that with none on, the Rays' opponents are out-OPSing them .735 to .710!

But with runners in scoring position (RISP), things are even more startling. Their hitters have a .999 OPS! Their pitchers have allowed only a .515 OPS. Now last year in MLB, the OPS by hitters in all situations was .750 and with RISP it was .761, only slightly better. The Rays are .190 better. On the pitching side, they are allowing an OPS .185 lower with RISP than overall. But last year, teams normally allowed an OPS .011 higher with RISP.

In close and late situations (CL), their OPS is just .807. But that is only .002 less than overall. Last year in the AL, OPS in CL situations was .721, .043 lower than the overall of .764. For the Rays to be even close to their normal OPS in CL situations is very good.

Their pitchers have allowed an OPS of just .615 in the CL, while it is .700 overall. So that is a gain of .085. Last year the average AL pitching staff allowed an OPS of .759 in all situations, while it was .719 in the CL, a margin of .040. So the Rays this year have doubled that margin.

Finally, a regression gives the following equation for winning pct by breaking down the OPS differential into ROB and none on cases. It is

Pct = .772*ROBDIFF + .508*NONEDIFF + .4995

It projects the Rays to have a .702 pct. Now the earlier regression gave them .634 while their actual pct is .773. So that one was .139 too low. This one raised the projection by .068, meaning that taking into account their ROB performances explains about half the difference. Considering their RISP performance as well would probably close the gap even more.
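For anyone who wants to follow the numbers, here is a rough sketch of that projection. It assumes ROBDIFF and NONEDIFF are simply the Rays' OPS minus their opponents' OPS in each situation, using the splits quoted earlier in the post; the .634 baseline is the figure stated in the text, and the function name is hypothetical.

```python
def split_pct(rob_diff, none_diff):
    """Winning pct from Pct = .772*ROBDIFF + .508*NONEDIFF + .4995."""
    return 0.772 * rob_diff + 0.508 * none_diff + 0.4995


rob_diff = 0.926 - 0.647     # Rays' OPS minus opponents' OPS with runners on
none_diff = 0.710 - 0.735    # same difference with the bases empty
projected = split_pct(rob_diff, none_diff)

actual = 17 / 22             # actual winning pct
earlier = 0.634              # earlier projection, as quoted in the post
share_explained = (projected - earlier) / (actual - earlier)

print(round(projected, 3), round(share_explained, 2))
```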

Monday, April 26, 2010

Do Home Runs Cause More Hit Batters?

Possibly. This issue came up in a NY Times article by STUART MILLER called Plunking Parallel: Steroid Use and Hit Batsmen (Hat Tip: JC Bradbury). I have done some work on this. Here are some findings, in no particular order:

From More On The Changing Historical Relationship Between Walks, HBPs and HRs
-There is a significant positive relationship between a pitcher's walk rate and his HBP rate

-In the 1960s, a pitcher who gave up more HRs hit fewer batters but today a pitcher who gives up more HRs hits more batters.

From The Changing Historical Relationship Between Walks, HBPs and HRs
-For both leagues, the HBP/Walk rate has been rising since 1980 (so poor control is not the only reason for more HBP).

-In recent years (up through 2007), the HBP/HR rate has been relatively high, even adjusting HBPs for control as measured by the walk rate.

From Do Sluggers Get Hit By The Pitch More Than They Used To?
-Players who hit HRs more frequently are now more likely to get hit by a pitch than in the 50s, 60s and 70s.

-Hitting a HR was 83% more dangerous in the 1990s than it was in the 1960s in terms of causing the player to be HBP.

Saturday, April 24, 2010

How Much Does Team Consistency Matter?

It seems to matter, but maybe much less than simply scoring and preventing runs. I looked at two periods, 1963-68 and 1996-2000. In each case, I first ran a regression with team winning percentage as the dependent variable and runs per game and opponents' runs per game as the independent variables. 

Then I added two variables in a second regression which measured consistency. HITCON was the standard deviation (SD) of runs per game divided by runs per game (just the SD would not be right since high scoring teams will have a greater SD). PITCON does something similar on the pitching side.
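To make the consistency measure concrete, here is a small sketch of SD divided by the mean (the coefficient of variation). The two game logs are made up for illustration, and using the population SD rather than the sample SD is an assumption, since the post does not say which was used.

```python
from statistics import mean, pstdev


def consistency(runs_by_game):
    """SD of runs per game divided by average runs per game."""
    return pstdev(runs_by_game) / mean(runs_by_game)


# Two hypothetical teams that both average 4 runs per game.
steady = [4, 4, 4, 4, 4, 4]
streaky = [0, 8, 0, 8, 0, 8]

print(consistency(steady), consistency(streaky))
```

Feeding in runs allowed instead of runs scored would give the pitching-side version. A lower value means a more consistent team.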

For 1963-68 (120 teams), the first regression equation was 

PCT = .528 + .108*R - .115*OR 

Again, R & OR are per game. The r-squared was .903, meaning that the equation explains 90.3% of the variation in the dependent variable. The standard error was .023. Over 162 games, that works out to about 3.73 wins. 
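Here is a short sketch of how that equation and the standard error translate into wins. The 3.86 runs per game is the 1963-68 average given elsewhere in this post; the function name is only for illustration.

```python
def pct_1963_68(r, opp_r):
    """Winning pct from PCT = .528 + .108*R - .115*OR (per-game rates)."""
    return 0.528 + 0.108 * r - 0.115 * opp_r


# A team that scores and allows the era's average of 3.86 runs per game
# should come out close to .500.
average_team = pct_1963_68(3.86, 3.86)

# Convert the standard error from winning pct to wins over a 162-game season.
se_wins = 0.023 * 162

print(round(average_team, 3), round(se_wins, 2))
```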

Now the 2nd regression with the consistency variables added in. 

PCT = .493 + .098*R - .103*OR - .084*HITCON + .117*PITCON 

The r-squared did rise, but only slightly, to .912 while the standard error fell to 3.59 wins per season. 

The coefficient values on the consistency variables seem to make sense. The more consistent hitting teams win more for a given average runs per game while the less consistent pitching teams win more.

That may seem strange, but if you allowed 4 runs per game on average you would win at least 81 games if you gave up 0 runs half the time and 8 the other half. You would win some of those 8 runs games, so you would have a winning record. If it were a league that had an average of 4 runs per game, you would win more than expected. 

On the surface, it might look like the consistency variables are pretty important. But the coefficient values are only about as high as they are for R & OR because the values of the consistency variables are a lot lower. For example, average runs per game was 3.86 while the HITCON average was .73. So the coefficient values have to be relatively high on the consistency variables. The R & OR variables were more significant, with higher t-values. 

Here they are for all four: 

R: 16 

OR: -17.96 

HITCON: -2.2 

PITCON: 2.99 

I also found the number of extra wins that would be generated by a one standard deviation improvement in each variable. That means scoring more runs, giving up fewer runs, scoring more consistently and giving up runs less consistently (because the coefficient on PITCON was positive). 

R: 7.68 

OR: 8.12 

HITCON: 1.05 

PITCON: 1.34 

So a one SD improvement in run scoring consistency (HITCON) adds 1.05 wins. That is a lot less than the 7.68 for simply scoring. We could say something similar on the pitching side. So it looks like a team should be more interested in just trying to score runs than being more consistent. Less consistency on the pitching side is desirable, but not nearly as much as simply preventing runs. 

For the 1996-2000 period (146 teams), the first regression was 

PCT = .500 + .0944*R - .0945*OR 

The r-squared was .894 and the standard error worked out to 3.63 wins per season. 

The second regression was 

PCT = .441 + .085*R - .082*OR - .139*HITCON + .202*PITCON 

The r-squared was .915 and the standard error worked out to 3.28 wins per season. The results are similar to those of the 1963-68 period. Adding the consistency variables does improve the accuracy of the model, but only slightly. The signs on the coefficients are the same. 

The t-values were 

R: 22.5 

OR: -21.3 

HITCON: -3.3 

PITCON: 5.3 

The number of extra wins that would be generated by a one standard deviation improvement in each variable were: 

R: 7.34 

OR: 7.4 

HITCON: 1.02

PITCON: 1.81 

These numbers are very close to the numbers for the 1963-68 period. So again, it looks like it is much more important to score and prevent runs than to become more consistent (or less, on the pitching side). This is true for a low scoring era, 1963-68, when the average runs per game was 3.86, as well as for the later period when it was 4.97.

Sources: Retrosheet, Baseball Reference, Sean Lahman Baseball Archive

Sunday, April 18, 2010

What Were The Best Relative Base Stealing Careers?

This is based on what I did last week to look at the best seasons. It involves finding base stealing runs (using the linear weights values) divided by times reaching first base. Then I found how much bigger that was than the league average and ranked the players. I looked at all players who had 200+ SBs since 1920, since that is when the AL started keeping constant, year-by-year track of caught stealing (the NL began in 1951). So a few NLers were left out: Pee Wee Reese, Richie Ashburn, Billy Werber, Frankie Frisch and Kiki Cuyler. I did include Max Carey since the Lee Sinins Complete Baseball Encyclopedia shows his CS for 1913-25, although I don't know how accurate that is. So the two tables below show the top 50.




Update: Tom Hanrahan just published something similar in "By The Numbers" called The Greatest Base Thief.

Sunday, April 11, 2010

What Were The Best Relative Base Stealing Seasons?

I took all players who had 20+ SBs in seasons when CS was also recorded from the Lee Sinins Complete Baseball Encyclopedia. These seasons were:

AL: 1914-15, 1920-2009
NL: 1915, 1920-25, 1951-2009

Then I calculated each player's base stealing runs (BSR) using the run values from the Baseball Encyclopedia by Pete Palmer. A SB is worth .22 runs and a CS is worth -.35. Once every player's BSR was calculated, I divided it by the number of times he reached first base by single, walk or HBP. That gave him a "rate." Then I calculated the average rate for that league and season. Finally, the difference was calculated. The table below shows the best 25 seasons for players who had 20+ SBs but also 100+ times reaching first base (1866 players).


In 1986, Coleman only had a .301 OBP, so that limits his opportunities to steal. His SB numbers are very close to what Wills had in 1962. But Wills had a .347 OBP. I expected that Max Carey would be up there, since in 1922 he stole 51 bases and was only caught twice! That was in a league that stole 755 bases with 634 CS. That means the rest of the league was 704-632, only a 52.7% success rate. But Carey's 1922 season was 74th while his best season was 1920 at 51st (52-10).
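The calculations described above can be sketched in a few lines of Python. The run values and Carey's 1922 totals come from the post; the 200 times on first base and the .01 league rate in the last call are hypothetical placeholders, since those figures are not given here, and the function names are just for illustration.

```python
# Linear-weights run values quoted in the post (from Pete Palmer's encyclopedia).
SB_RUNS = 0.22
CS_RUNS = -0.35


def base_stealing_runs(sb, cs):
    """BSR: runs gained from steals minus runs lost on caught stealings."""
    return SB_RUNS * sb + CS_RUNS * cs


def relative_rate(sb, cs, times_on_first, league_rate):
    """BSR per time reaching first base, minus the league-average rate."""
    return base_stealing_runs(sb, cs) / times_on_first - league_rate


def success_rate(sb, cs):
    """Share of steal attempts that succeeded."""
    return sb / (sb + cs)


carey_bsr = base_stealing_runs(51, 2)             # Carey in 1922: 51 SB, 2 CS
rest_of_league = success_rate(755 - 51, 634 - 2)  # league totals minus Carey

# Hypothetical inputs: 200 times on first and a .01 league rate are made up.
example_rate = relative_rate(51, 2, times_on_first=200, league_rate=0.01)

print(round(carey_bsr, 2), round(rest_of_league, 3), round(example_rate, 4))
```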

The player that jumps out is Fritz Maisel of the Yankees. He had a good but not great OBP of .334. But the league went 1657-1373 for a percentage of 54.7%.

I did not try to find out actual opportunities. A player can get a walk, but if there is a runner on 2nd, he has a harder time stealing. I also used the same SB/CS value for everyone. These run values can vary by year and league, depending on the run environment. Another issue is pinch running: some of these guys stole bases (and were caught) while pinch running. Taking this into account might affect the rating. The table below shows how many times each of the leaders pinch ran and their SB & CS while doing so (data from Retrosheet).


My guess is that if I factored in pinch running that things would not change that much. In some cases, players stayed in the game after pinch-running. Then they got on base as a hitter and stole a base or were caught. Maisel played 150 games that year. Retrosheet lists him with 148 at 2B. He probably did not pinch run very much, if at all. He also had 630 PAs, so he is pretty close to a full-time player.

Now the leaders among players who reached base between 25 and 99 times. These guys either pinch ran a lot or did not play a lot.


Others who had high ratings but did a lot of pinch running were

Matt Alexander 1977
Don Hopkins 1975
Larry Lintz 1976
Matt Alexander 1976

And there was also Herb Washington who stole 29 bases in 1974 solely as a pinch runner.

Wednesday, April 7, 2010

Scouts vs. Statheads: What Might Branch Rickey Say?

Some sportswriters still like to make fun of the statheads or sabermetricians who never played the game and still live in their mom's basement. But to those writers I say "read the 1954 LIFE magazine article where Branch Rickey discusses some very modern looking formulas." This article is online and was called GOODBY TO SOME OLD BASEBALL IDEAS: The 'Brain' of the game unveils formula that statistically disproves cherished myths and demonstrates what really wins. Some of the new stats he proposed were "on-base average" and "isolated power." The article even shows many formulas, some of which are complex.

Rickey is in the Hall of Fame for his work as an executive. But he also played and managed. I think if you ridicule statheads, you would probably ridicule Rickey. Here is the introductory paragraph:

"Baseball people generally are allergic to new ideas. We are slow to change. For 51 years I have judged baseball by personal observation, by considered opinion and by accepted statistical methods. But recently I have come upon a device for measuring baseball which has compelled me to put different values on some of my oldest and most cherished theories. It reveals some new and startling truths about the nature of the game. It is a means of gauging with a high degree of accuracy important factors which contribute to winning and losing baseball games. It is most disconcerting and at the same time the most constructive thing to come into baseball in my memory."
That is followed by a fairly complex formula. Then Rickey asks "Can this bizarre mathematical device be put to any practical use?" And his answer? "It can indeed! It can be applied to any major league club for any season or part of a season to diagnose points of weakness and strength."

So Rickey, perhaps one of the most influential men ever involved in baseball, saw the need for new and complex ways of analyzing the game. How can some writers, and some GMs, not see this today?

But what about intangibles? Rickey says:

"But somehow baseball's intangibles balance out. They reflect themselves in other ways. Over an entire season, or many seasons, individuals and teams build an accumulation of mathematical constants. A man can work with them. He can measure results and establish values. He can then construct a formula which expresses something tangible, and that is why this formula was devised."
After compiling many stats and data, what did Rickey do? "We took the figures to mathematicians at a famous research institute. Did they know baseball? No, but that was not essential."

Did RBIs figure in Rickey's formula? No. "As a statistic, RBIs were not only misleading but dishonest."

There is much more to read in this article that is of interest. Near the end of the article he mentions getting his scouts involved in finding players with power, guys who will improve the ability of his team (the Pirates) to bring runners home. But that is based on the formula. Imagine that. Rickey was going to tell his scouts what to look for based on a formula.

If you have never read the article, I think you are in for a treat since it is so well written and it was written so long ago.

Sunday, April 4, 2010

Which Players Had The Most Surprising Walk Rates? (Part 3)

To read the second part go to Which Players Had The Most Surprising Walk Rates? (Part 2).

In Part 1, I looked at walk rates relative to the league average as a function of isolated power relative to the league average, the idea being that it is harder to walk a lot if you are not a power hitter.

In Part 2, I also included a variable for height and one for stealing. Height was in inches and stealing was stolen bases divided by singles + walks + HBP, a sort of frequency. That was also relative to the league average. The idea is that shorter guys have an easier time walking and guys who steal a lot won't get walked too much if the pitcher can help it. My data source is the Lee Sinins Complete Baseball Encyclopedia. I used all players with 5000+ PAs.

But "By The Numbers," the newsletter of SABR's statistical analysis committee, recently published an article by Tom Hanrahan called Which Batter Had the Greatest “Eye”?. He used a different method than I did. One variable we both had was ISO, but he squared it. I had tried taking logs of the variables but it did not improve the results. So I thought I would redo the regression with all the same variables except that I would square ISO.

Here is the regression equation. Everything is relative to the league average except height.

Walks = 214.25 - 1.51*SB - 1.79*HT + .0016*ISOSQD

The r-squared went up a bit, from .149 to .166. The standard error fell from 30.66 to 30.35 (the variable "Walks" is actually a player's relative walk rate: if you walked 100 times while the average player walked 50 times, your rate is 200, and 100 is average). The table below shows the top 25 in terms of having a walk rate greater than that predicted by the regression equation.


So the model predicted that Thomas would have a walk rate of about 91 while it was actually 219. So he was about 128 above expectations, which gives him the most surprising walk rate. He was not very tall or short at 71 inches, so he had a reasonably large strike zone. Pitchers would not have walked him out of fear, since he had a relative ISO or "isolated power" of 57 (that means he had only 57% of the average level of power). Pitchers might not have minded walking him since his SB rate was only .68 (he only stole 68% as often as the average runner). That would help him get walks. But with the coefficient on SB rate at 1.51, even if he had been Rickey Henderson (whose SB rate was 4.6 times the league average), Thomas' walk rate would only go down about 5.9 (since 1.51*(4.6 - .68) is about 5.9). That would not change things much. He would still be near the top in surprising walk rates.
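The Thomas numbers can be reproduced with a short sketch of the regression equation. It follows the scales used in the worked arithmetic above (SB rate entered as a ratio such as .68, relative ISO as an index such as 57, height in inches); the function name is only for illustration.

```python
def predicted_walk_rate(sb_rel, height_in, iso_rel):
    """Walks = 214.25 - 1.51*SB - 1.79*HT + .0016*ISOSQD."""
    return 214.25 - 1.51 * sb_rel - 1.79 * height_in + 0.0016 * iso_rel ** 2


thomas = predicted_walk_rate(sb_rel=0.68, height_in=71, iso_rel=57)
surprise = 219 - thomas      # actual relative walk rate minus predicted

# Same player, but stealing at Rickey Henderson's relative rate of 4.6.
with_henderson_speed = predicted_walk_rate(sb_rel=4.6, height_in=71, iso_rel=57)
drop = thomas - with_henderson_speed

print(round(thomas), round(surprise), round(drop, 1))
```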

The overall top 25 did not change much from Part 2. But Ted Williams did fall all the way to 74th. He is now predicted to have a rate of 159 instead of 144. John Olerud rose from 26th to 24th.

Here are the players who led in getting fewer walks than predicted.


This looks pretty close to the list from Part 2.

Here are the leaders in best eye that Hanrahan found.

M Bishop
R Thomas
J McGraw
E Stanky
T Hartsel
M Huggins
F Fain
E Yost
R Henderson

They are on my list except for McGraw and Fain, who did not get 5000 or more PAs.

Thursday, April 1, 2010