Thursday, December 30, 2010

How Much Of A Yankee Killer Was Frank Lary?

His career record against them was 28-13. From 1955-61, it was 27-10 with a 3.06 ERA while his ERA against everyone else was 3.42. But did he really pitch better or differently against the Yankees?

Let's start with strikeout-to-walk ratio. In those years, Lary's was 1.62 against non-Yankee teams (I included HBP and took out IBBs; all data from Retrosheet). Against NY, it was 1.71. That may seem consistent with the "Yankee Killer" nickname, but over those years the Yankees themselves had a 1.43 ratio while the rest of the league had 1.32. So the typical pitcher had a strikeout-to-walk ratio that was .11 higher against the Yanks than against everyone else. Lary was .09 better. So he was doing just about what other pitchers did.

Now HRs, or HR rate (I use HRs divided by PAs with IBBs taken out). Lary allowed the Yanks a 2.6988% rate while he allowed the rest of the league 1.75%. So the Yanks did about 0.948 percentage points better against Lary than the average team from the rest of the AL. But that is just about normal. Over these years, the Yankees had a rate of 3.0097% while the rest of the league had a rate of 2.162%. The Yankees were about 0.848 percentage points better than the league average. So again, Lary's relative performance vs. NY is about what it was for other pitchers.
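The comparison in the HR paragraph is a difference of differences: Lary's gap (Yankees vs. everyone else) set against the same gap for the league as a whole. Here is a minimal Python sketch of that arithmetic, using only the rates quoted in this post. The variable names are mine and are just for illustration.

```python
# HR rates (HR per PA, IBBs removed) in percent, as quoted in the post, 1955-61.
lary_vs_yankees = 2.6988   # rate Lary allowed to the Yankees
lary_vs_rest = 1.75        # rate Lary allowed to the rest of the AL
yankees_overall = 3.0097   # Yankee hitters against all pitchers
rest_overall = 2.162       # rest of the league against all pitchers

# How much better the Yankees hit for power than other teams,
# first against Lary and then against pitchers in general.
lary_gap = lary_vs_yankees - lary_vs_rest
league_gap = yankees_overall - rest_overall

# If Lary had some special power-suppressing effect on the Yankees,
# his gap would be clearly smaller than the league gap.
excess = lary_gap - league_gap

print(f"Lary gap:   {lary_gap:.3f} percentage points")
print(f"League gap: {league_gap:.3f} percentage points")
print(f"Difference: {excess:.3f} percentage points")
```

The two gaps differ by only about a tenth of a percentage point, which is the basis for saying Lary did about what other pitchers did.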

What about other hits? Lary's non-HR hit% against NY was .199 while against other teams it was .217. So that is a fairly big improvement. Somehow he was better at preventing hits against the Yankees than he was against other teams. The Yankees themselves had a .205 rate while the rest of the league had .204. So the typical pitcher allowed more hits (but not a lot more) to the Yankees than to other teams.

So it seems like the one thing that Lary was good at when he faced the Yankees was in preventing them from getting singles, doubles and triples. But the difference was only .018. Over, say, 36 PAs per game, that is just .648 hits. The run value of those hits is about .55 (the weighted average of the linear weights values that Pete Palmer established). So that makes a run value of .36 (interesting that that is just about the difference between his ERA against other teams and the one he had against the Yankees, 3.42 vs. 3.06).
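The chain of arithmetic in that paragraph (rate difference, to hits per game, to runs per game) can be laid out step by step. This is just a sketch with the figures from the post; the 36 PAs per game and the .55 run value are the post's own round numbers, and the variable names are mine.

```python
# Non-HR hit rates Lary allowed, from the post.
rate_vs_rest = 0.217
rate_vs_yankees = 0.199

pa_per_game = 36          # round figure used in the post
run_value_per_hit = 0.55  # weighted average linear weight for non-HR hits

rate_gap = rate_vs_rest - rate_vs_yankees        # .018
hits_saved_per_game = rate_gap * pa_per_game     # about .648
runs_saved_per_game = hits_saved_per_game * run_value_per_hit

# ERA gap for comparison: everyone else vs. the Yankees.
era_gap = 3.42 - 3.06

print(f"Hits saved per game: {hits_saved_per_game:.3f}")
print(f"Runs saved per game: {runs_saved_per_game:.3f}")
print(f"ERA gap:             {era_gap:.2f}")
```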

The Tigers did score 4.93 runs per game in his starts against the Yankees from 1955-61. They averaged 4.61 runs per game overall. So the hitters rose to the occasion to support him. And maybe the fielders played a role in lowering the rate of non-HR hits he allowed. So it is possible that Lary became the "Yankee Killer" due to the aid of his teammates.

Wednesday, December 22, 2010

Bert Blyleven vs. Jack Morris

It seems like people who favor Morris over Blyleven say Morris was better in the clutch or better in big games. So I try to look at those issues here.

The table below shows their stats in 3 situations: runners on base (ROB), runners in scoring position (RISP), and close and late (CL). Data from Retrosheet.

I did not try to adjust these numbers for the league average. Blyleven might get a slight edge since the early 70s were not a big hitting era. But much of their careers did overlap. The only place where either pitcher has a big edge is Morris's edge in AVG in CL situations. But that .021 does not add up to a lot. Blyleven had 2,129 ABs faced in those cases. That amounts to about 44 hits or 2 per season. That seems pretty small.
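To show where the "about 44 hits or 2 per season" comes from, here is a quick sketch. The .021 and 2,129 are from the post. The 22 seasons is my assumption for the length of Blyleven's career, used only to get the per-season figure.

```python
avg_gap = 0.021          # Morris's edge in AVG allowed, close and late
blyleven_cl_ab = 2129    # Blyleven's ABs faced in close and late situations
seasons = 22             # assumption: number of seasons Blyleven pitched

# Extra hits Blyleven allowed relative to pitching at Morris's CL average.
extra_hits = avg_gap * blyleven_cl_ab
extra_hits_per_season = extra_hits / seasons

print(f"Extra hits over career: {extra_hits:.1f}")
print(f"Extra hits per season:  {extra_hits_per_season:.1f}")
```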

The next table shows their post season stats. League Championship Series and World Series are combined.

Morris has just about twice the IP. So if you doubled Blyleven's stats, you can see that there is not much difference between the two. Blyleven would have 86 hits, just about what Morris has. Same for HRs. But he would have more strikeouts and fewer walks.

I also looked at how they did in September pennant races. If a team finished 10 or more games ahead or behind, it was not considered to be a pennant race. If a team finished less than 10 games ahead or behind and if they were 5 or fewer games ahead or behind at the end of play of Aug. 31, it was considered a pennant race. 1991 for the Twins was not considered a pennant race (Morris was on that team). They began September 7 games ahead (GA). On Sept. 15 they were 7.5 GA and they finished the season 8 GA. 1981 was not included since it was a strike year with a split season. Many teams were within a few games in Sept. This is highly unusual and winning the 2nd half only gave you a chance to play for the divisional title.
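The pennant race rule is simple enough to write as a small function. This is only a sketch of the rule as stated in the paragraph above. The function name and arguments are mine, and the 1981 exclusion is handled separately (it is not part of this check).

```python
def is_pennant_race(final_margin: float, aug31_margin: float) -> bool:
    """Apply the post's rule for a September pennant race.

    Both margins are games ahead or behind; the sign does not matter,
    so absolute values are used.
    """
    finished_close = abs(final_margin) < 10   # 10 or more games out: no race
    close_on_aug31 = abs(aug31_margin) <= 5   # 5 or fewer games on Aug. 31
    return finished_close and close_on_aug31


# 1991 Twins: 7 games ahead starting September, finished 8 ahead.
print(is_pennant_race(final_margin=8, aug31_margin=7))
```

The 1991 Twins fail the Aug. 31 test even though they finished less than 10 games ahead, which is why that season is left out for Morris.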

So the years I have for Morris as Sept. pennant races are 83, 87, 88, 92, 93. For Blyleven they were 77-80, 87, 89. Each pitcher had a total of 231.66 IP (Oct. data was included). Some of this data might include games pitched after the divisional title was decided. But I did not feel like spending the time to figure that out. The table below shows how each pitcher did in these cases.

Again, it does not look like there is much difference between the two. So given Blyleven's far superior career stats (and peak value as measured by stats like WAR), he still deserves to make the Hall of Fame ahead of Morris. Whatever edge in the clutch or big games Morris might have, it is definitely not enough to put him ahead of Blyleven.

Wednesday, December 15, 2010

A Crude Measure Of The Most "All-Around" Players Since 1957

I started thinking about this when Cooper Nielson in a Baseball Think Factory discussion said:

"I suppose the "best all-around player" argument could go like this (keep in mind this is not my argument and not one I even agree with, but one that could conceivably and logically put Walker #1 in his era): There are five traditional baseball tools: hitting (for average), hitting for power, running, playing defense, and throwing."

See Cooperstowners in Canada: Larry Walker should be the second Canadian player elected to Cooperstown.

So here is how the crude measure works:

Multiply Gold Glove awards times 30. The idea here was to scale a great player in this stat to a great player in HRs or SBs. Brooks Robinson had the most GGs among position players with 16 and 16*30 = 480, close to 500.

Divide non-HR hits by 5. If a player had 2500 non-HR hits, you get 500.

Multiply SB*HR*non-HR*GG (with the above mentioned adjustments being made for GG and non-HR). If a player had no GGs, I stopped multiplying before the GG term so he did not end up at zero.

For Willie Mays it was 42,129,996,480. That is way too high a number to work with. So I raised it to the .25 power. That gave him 453, a more familiar kind of number to baseball fans. But that was divided by PAs and then multiplied by 10 to get the final number. Mays then had .363 (a nice number, close to the highest all-time batting average of .366 belonging to Ty Cobb). Here is the top 25:

1 Willie Mays 0.363
2 Torii Hunter 0.362
3 Barry Bonds 0.357
4 Larry Walker 0.355
5 Ichiro Suzuki 0.352
6 Ryne Sandberg 0.349
7 Eric Davis 0.345
8 Cesar Cedeno 0.345
9 Roberto Alomar 0.337
10 Devon White 0.333
11 Andruw Jones 0.330
12 Andre Dawson 0.327
13 Garry Maddox 0.325
14 Bobby Bonds 0.316
15 Andy Van Slyke 0.313
16 Mike Schmidt 0.311
17 Ken Griffey Jr. 0.309
18 Carlos Beltran 0.302
19 Paul Blair 0.296
20 Joe Morgan 0.295
21 Marquis Grissom 0.293
22 Ivan Rodriguez 0.292
23 Dwayne Murphy 0.291
24 Bill White 0.285
25 Jimmy Rollins 0.284

If I started with his stats from 1957 on, when they started giving out Gold Gloves, Mays gets .378.
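Here is a sketch in Python of how the crude measure works, following the steps listed earlier in this post. The function names are mine. The inputs for Mays (338 SB, 660 HR, 3,283 hits, 12 Gold Gloves) are career totals that reproduce the 42,129,996,480 product mentioned above; the 12,496 PAs is my assumption for his career total.

```python
def raw_product(sb: int, hr: int, hits: int, gold_gloves: int) -> float:
    """SB * HR * (non-HR hits / 5) * (Gold Gloves * 30)."""
    non_hr_scaled = (hits - hr) / 5
    product = sb * hr * non_hr_scaled
    # Per the post: with no Gold Gloves, stop multiplying so the
    # player does not end up at zero.
    if gold_gloves > 0:
        product *= gold_gloves * 30
    return product


def all_around_score(sb: int, hr: int, hits: int, gold_gloves: int, pa: int) -> float:
    """Fourth root of the product, divided by PAs, times 10."""
    return raw_product(sb, hr, hits, gold_gloves) ** 0.25 / pa * 10


# Willie Mays, full career (PA figure is an assumption).
mays_product = raw_product(sb=338, hr=660, hits=3283, gold_gloves=12)
mays_score = all_around_score(sb=338, hr=660, hits=3283, gold_gloves=12, pa=12496)

print(f"Product: {mays_product:,.0f}")
print(f"Score:   {mays_score:.3f}")
```

The sketch does not deal with a player who has zero SBs or zero HRs, since the post only describes the adjustment for players with no Gold Gloves.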

Sunday, December 12, 2010

What Might Explain Ron Santo's Low Hall Of Fame Voting Percentages?

It seems like it might be for the reasons I have seen people give the last week or so: no post-season exposure, somewhat short career (he did not reach 10,000 PAs), lack of milestones like 3000 hits or 500 HRs and lack of MVP awards.

Last year and earlier this year I posted some regression generated equations that tried to explain the percentage of the Hall of Fame vote players got in their first year of eligibility (and also their highest percentage). The model I came up with was based on some trial and error. That seemed unavoidable, since it is hard to have priors on what exactly the voters are thinking. The model looked at all players that became eligible for the first time from 1980-2009.

The model uses the following data to explain vote percentage:

Reaching 10,000 PAs
500 HRs
3000 hits
500 SBs
Gold Gloves
All-Star games
World Series performance
MVP awards

Gold Gloves and All-Star games got capped at certain levels which were then squared. The idea was that those things have an exponential effect which tapers off. There were also interaction terms for World Series performance, Gold Gloves and All-Star games. The idea there was that getting lots of Gold Gloves and playing in lots of All-Star games has more than an additive effect (after I discuss what the model predicted for Santo, technical details like regression results and variable descriptions will be covered).

Santo's first year percentage was 3.9%. Normally, he would no longer be eligible in the writers' voting. But he and some other players were re-instated in 1985. He got 13.4%. The model predicted that he would get 17.65%. The standard error was .08. So even if we give him 8% more, that only jumps him up to 21.4%. Still a pretty low total for a first year (Billy Williams got 23.4% in his first year in 1982 and steadily increased until he got 85.7% in 1987).

Santo's highest percentage was 43%. The model predicted it would be 30%. So he actually did better than that. The standard error was .117. So he was predicted to be about 4 standard errors below what is needed for induction, 75%. And his actual highest percentage was still about 3 standard errors below 75%. Billy Williams's highest predicted percentage was 29.6% while it was actually 85.7%. That differential of 56.1 percentage points is the highest positive differential. Why Williams is in and Santo isn't is an interesting question.
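The "about 4" and "about 3" standard errors are just the distance from the 75% threshold divided by the model's standard error. A quick sketch with the numbers from this post (variable names are mine):

```python
induction_threshold = 0.75
standard_error = 0.117      # from the highest-percentage model

santo_predicted = 0.30
santo_actual = 0.43

# Distance below the induction threshold, in standard errors.
se_below_predicted = (induction_threshold - santo_predicted) / standard_error
se_below_actual = (induction_threshold - santo_actual) / standard_error

# Billy Williams: actual highest percentage minus predicted.
williams_differential = 0.857 - 0.296

print(f"Santo predicted: {se_below_predicted:.2f} SEs below 75%")
print(f"Santo actual:    {se_below_actual:.2f} SEs below 75%")
print(f"Williams differential: {williams_differential:.3f}")
```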

Here was the equation where the player's first year vote percentage was the dependent variable

PCT = -.010 + .00086(WSAS) + .048(GGAS) + .070(MVP) + .404(3000 HIT) + .280(500 HR) + .002(ASSQ10) - .00089(GGSQ7) + .071(500SB) - .006(WSIMPSQ50) + .100(10000PA)

The adjusted r-squared was .898. The standard error was .08.

Here was the equation where the player's highest vote percentage was the dependent variable

PCT = -.014 + .00037(WSAS/1000) + .025(GGAS/1000) + .067(MVP) + .257(3000 HIT) + .201(500 HR) + .0048(ASSQ10) - .0013(GGSQ7) + .071(500SB) - .00167(WSIMPSQ50/1000) + .137(10000PA)

The adjusted r-squared was .861. The standard error was .117.

MVP is the number of MVP awards won. 3000 HIT is a dummy variable (1 if a player reached it, 0 otherwise). 500 HR, 500SB and 10000PA are also dummy variables (if you made it to 10,000 career plate appearances, you get a 1, 0 otherwise). I used all the voting data from 1990-2009.

What is ASSQ10? It is the square of the number of All-Star games played in. But AS games played is maxed out at 10 before squaring. The assumption here is that being an All-Star has a positive exponential effect but only up to a point where more games no longer help (I have a graph below to help explain this). The GGSQ7 is the same thing for Gold Gloves, maxed out at 7.

WSIMPSQ50 involves World Series play. First, WSIMP is World Series PAs times OPS. The idea here is that the more you play in the World Series the more votes you would get, but by multiplying it by OPS, it also includes how well you played (or just hit). This gets maxed out at 50 and is squared, for the same reason as All-Star games (yes, Reggie Jackson is first here and way ahead of everyone else at 141, with Dave Justice and Lonnie Smith tied for 2nd at 101).

The last two variables are interaction variables. GGAS is the Gold Glove variable multiplied by the All-Star variable and WSAS is the World Series variable times the All-Star game variable. It looks strange that the coefficient values on GGSQ7 and WSIMPSQ50 are negative. But you might notice that they are positive on the interaction variables. I think this is like when a regression uses both X and X-squared if the phenomenon is non-linear (an inverted parabola, for example). The coefficient on X ends up being positive while the X-squared coefficient is negative. The reason I put in these interaction variables was to see if players who were strong in both got an extra boost, as if there was some synergy going on. It seems like they did get an extra boost.

Since the dependent variable can only go from 0 to 100, the coefficient would be very low. So I divided these three variables by 1000 (my stat package was showing coefficient values of .00000 before I did this).
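To make the variable descriptions a little more concrete, here is a rough Python sketch of how the capped and squared variables could be built for one player. It is only an illustration, not the code behind the regressions. Two things are my assumptions: that the interaction variables multiply the capped-and-squared versions of each variable, and that the three variables divided by 1000 are WSAS, GGAS and WSIMPSQ50 (as the second equation shows). The function names and the sample inputs are hypothetical.

```python
def capped_square(value: float, cap: float) -> float:
    """Cap the value at `cap`, then square it."""
    return min(value, cap) ** 2


def build_features(all_star_games: int, gold_gloves: int,
                   ws_pa: int, ws_ops: float) -> dict:
    assq10 = capped_square(all_star_games, 10)   # All-Star games, capped at 10
    ggsq7 = capped_square(gold_gloves, 7)        # Gold Gloves, capped at 7
    wsimp = ws_pa * ws_ops                       # World Series PAs times OPS
    wsimpsq50 = capped_square(wsimp, 50)         # capped at 50, then squared

    return {
        "ASSQ10": assq10,
        "GGSQ7": ggsq7,
        # Assumption: these three are the variables scaled down by 1000.
        "WSIMPSQ50": wsimpsq50 / 1000,
        # Assumption: interactions use the capped-and-squared variables.
        "GGAS": ggsq7 * assq10 / 1000,
        "WSAS": wsimpsq50 * assq10 / 1000,
    }


# Hypothetical player: 12 All-Star games, 9 Gold Gloves,
# 30 World Series PAs with an .800 OPS.
features = build_features(all_star_games=12, gold_gloves=9, ws_pa=30, ws_ops=0.8)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

The cap is what makes the effect taper off: the 11th All-Star game or the 8th Gold Glove adds nothing to these variables.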

Monday, December 6, 2010

Did Santo Play In An Era Of Poor Third Basemen?

Here are the offensive winning percentages for NL 3B men for different periods. Data from the Lee Sinins Complete baseball encyclopedia.

1941-50) .507
1951-60) .495
1961-70) .516
1971-80) .512
1981-90) .498

Santo had .618 from 1961-70. He was about 9% of the total, so without him it was probably about .506. Nothing unusual. The guys Santo got compared to were not subpar in hitting.
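The .506 comes from backing Santo out of the group figure. Here is a sketch of that step, treating the group's offensive winning percentage as a weighted average of Santo and everyone else. That is only an approximation (offensive winning percentage is not strictly additive), and the variable names are mine.

```python
nl_3b_owp = 0.516      # NL third basemen, 1961-70, Santo included
santo_owp = 0.618      # Santo, 1961-70
santo_share = 0.09     # Santo's approximate share of the total

# Weighted average: group = share * santo + (1 - share) * others.
# Solve for the others.
others_owp = (nl_3b_owp - santo_share * santo_owp) / (1 - santo_share)

print(f"NL 3B without Santo: {others_owp:.3f}")
```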

Santo led the NL 4 straight years in Total Zone Runs (fielding) for 3rd basemen (from Baseball Reference). But his total over those 4 years, 39, is one of the lowest (BR starts this stat in the early 1950s; I calculated the cumulative total of the leaders over each 4 year period regardless of who it was). It is tied for the 7th lowest in the NL. The lowest is 35 and some of the periods that were lower include the 1981 strike year. The average cumulative 4 year total for the leaders was about 55 in the NL and 72 in the AL. Only two periods in the AL were below 40.

So it is possible that in some years Santo benefits by being compared to poor fielding 3rd basemen. But this is probably not a lot of his overall value.

See Yearly League Leaders & Records for Total Zone Runs as 3B at BR. Santo's numbers from 1965-8, the years he led, seem low compared to the AL in those years and the NL in the years both before and after.

Sunday, December 5, 2010

Santo Was Valuable Outside Of Wrigley Field

Santo did seem to benefit a lot from Wrigley. But what if we tried to estimate only his value in road games? Doing a quick calculation to find his road OBP & SLG from 1960-73, I got .346 & .413. Does not sound that great. But in his time, it was pretty valuable. Here is the relationship from regression analysis between runs per game and OBP & SLG:

R/G = 16.55*OBP + 10.56*SLG - 5.15

A team with an OBP of .346 and an SLG of .413 would score 4.93 R/G. The league average in those years was about 4.06. That would give us a Pythagorean pct of .596. Pretty darn good.
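Here is a sketch of those two steps in Python, using Santo's road OBP and SLG of .346 and .413. The runs-per-game function is the regression equation from this post. For the Pythagorean pct I am assuming the classic form with an exponent of 2, which does reproduce the .596. The function names are mine.

```python
def runs_per_game(obp: float, slg: float) -> float:
    """Regression estimate from the post: R/G = 16.55*OBP + 10.56*SLG - 5.15."""
    return 16.55 * obp + 10.56 * slg - 5.15


def pythagorean_pct(runs_scored: float, runs_allowed: float,
                    exponent: float = 2.0) -> float:
    """Pythagorean winning pct; the exponent of 2 is an assumption."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# A team of hitters with Santo's road OBP and SLG.
santo_rg = runs_per_game(obp=0.346, slg=0.413)

# Use the post's 4.93 R/G against a league-average 4.06 allowed.
pct = pythagorean_pct(runs_scored=4.93, runs_allowed=4.06)

print(f"Estimated R/G:   {santo_rg:.2f}")
print(f"Pythagorean pct: {pct:.3f}")
```

With the rounded coefficients shown in the equation, the runs estimate can differ from 4.93 in the second decimal place.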

I also ran a regression with winning pct being the dependent variable and runs per game and opponents runs per game being the independent variables. Here is the equation

Pct = .515 + .111*RG - .114*ORG

If a team scored 4.93 runs per game and allowed 4.06 per game, they would have a .596 pct. That is how good Santo was just in road games.
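The linear equation can be checked the same way. This sketch plugs the post's 4.93 and 4.06 into the equation with the rounded coefficients shown above, so the result lands within a few thousandths of the .596 rather than exactly on it. The function name is mine.

```python
def linear_win_pct(runs_per_game: float, opp_runs_per_game: float) -> float:
    """Regression from the post: Pct = .515 + .111*RG - .114*ORG."""
    return 0.515 + 0.111 * runs_per_game - 0.114 * opp_runs_per_game


santo_road_pct = linear_win_pct(runs_per_game=4.93, opp_runs_per_game=4.06)
print(f"Linear pct: {santo_road_pct:.3f}")
```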