Click on Part 1 to see what I did last January. Back then I looked at walk rates relative to the league average as a function of isolated power, also relative to the league average, the idea being that it is harder to walk a lot if you are not a power hitter.
What inspired me to go back and do more on this was a discussion of walks between Bill James and Joe Posnanski at Talkin' about the underappreciated base on balls, with Bill James. Another interesting take on walks appeared in Baseball Magazine in 1917. The article was by FC Lane and seems ahead of its time. It was called The Base on Balls: Why Should the Records Ignore This Powerful Factor in Brainy Baseball?
This time I also included a variable for height and one for stealing. Height was in inches and stealing was stolen bases divided by singles + walks + HBP. Sort of a frequency. That was also relative to the league average. The idea is that shorter guys have an easier time walking and guys who steal a lot won't get walked too much if the pitcher can help it. Here is the regression equation. Everything is relative to the league average except height. My data source is the Lee Sinins Complete Baseball Encyclopedia.
Walks = 195.58 - 1.25*SB - 1.8*HT + .369*ISO
The stats, other than height, are all converted to a number relative to 100. If you were average at something, then you get a 100 (except for SB, where 1.00 was average). Height and isolated power were significant but stealing was not.
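To make the variable definitions concrete, here is a minimal sketch of how the stealing frequency and the league-relative numbers could be computed. The function names and all the counts are hypothetical, made up only for illustration; they are not taken from the Sinins encyclopedia or from the actual spreadsheet used for the regression.

```python
def steal_frequency(sb, singles, walks, hbp):
    """Stolen bases per time reaching first: SB / (singles + walks + HBP)."""
    return sb / (singles + walks + hbp)


def relative_to_league(player_rate, league_rate, scale=100):
    """Express a player's rate relative to the league rate.

    scale=100 means an average player gets 100;
    scale=1 means an average player gets 1.00 (the convention used for SB).
    """
    return scale * player_rate / league_rate


# Hypothetical player: 20 steals, 100 singles, 50 walks, 10 HBP
player_freq = steal_frequency(sb=20, singles=100, walks=50, hbp=10)

# Hypothetical league-wide stealing frequency
league_freq = 0.10

# SB uses the 1.00 = average scale
sb_index = relative_to_league(player_freq, league_freq, scale=1)

# Isolated power uses the 100 = average scale (hypothetical ISO values)
iso_index = relative_to_league(0.120, 0.150)

print(round(sb_index, 2), round(iso_index, 1))
```

With these made-up inputs the player steals 25% more often than the league and has 80% of the league's isolated power.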
The graph below shows the players with the most surprising walk rates. That is, their walk rates relative to the league average exceeded what the equation predicted by the largest margins.
So Thomas walked 2.19 times as often as the average hitter. His isolated power was only 57% of the league average, he was 71 inches tall and his stolen base rate was only 68% of the league average. Now the guys who walked the least compared to expectations.
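Going back to the Thomas numbers, the equation can be applied directly to see how big the surprise was. This is only a sketch: the function name is made up, SB is entered as 0.68 on the assumption that the 1.00 = average scale is what went into the regression, and the actual walk index of 219 is just the "2.19 times" figure put on the 100 scale.

```python
def predicted_walk_index(sb, ht, iso):
    """Walks = 195.58 - 1.25*SB - 1.8*HT + .369*ISO (coefficients from the post)."""
    return 195.58 - 1.25 * sb - 1.8 * ht + 0.369 * iso


# Thomas: SB at 68% of league (0.68), 71 inches tall, ISO at 57% of league (57)
thomas_predicted = predicted_walk_index(sb=0.68, ht=71, iso=57)

# He walked 2.19 times as often as the average hitter, so 219 on the 100 scale
thomas_actual = 219

# How far above the prediction he was
thomas_surprise = thomas_actual - thomas_predicted

print(round(thomas_predicted, 1), round(thomas_surprise, 1))
```

The prediction comes out to about 88, so the equation expected him to walk less than the average hitter, and he beat that by roughly 131 points.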
I will try to give more details later. But time to give a test.
I am back. The r-squared was .148 and the standard error was about 30. I also tried taking logs of all the variables but the results were no better. For the linear regression there was no correlation between the prediction error and any of the independent variables. I also wonder if height should be relative to the league average. But that raises the question of whether a 6'0" pitcher has a harder time throwing strikes to a 5'6" batter than a 5'6" pitcher does. I don't know, but I assumed the height of the pitcher did not matter.
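For anyone who wants to run the same kind of diagnostics, here is a rough sketch using NumPy. The data are synthetic, generated at random just so the code runs; they are not the real player data, and the output will not match the .148 and 30 reported above. One thing worth knowing: in an ordinary least squares fit with an intercept, the residuals are uncorrelated with each independent variable by construction, so that check will always come out at about zero. A curved pattern in a plot of residuals against a variable would be a more telling sign of a problem.

```python
import numpy as np

# Synthetic stand-in data (NOT the real players): 500 made-up hitters
rng = np.random.default_rng(0)
n = 500
sb = rng.normal(1.0, 0.5, n)    # stealing, 1.00 = league average
ht = rng.normal(72, 2, n)       # height in inches
iso = rng.normal(100, 40, n)    # isolated power, 100 = league average
noise = rng.normal(0, 30, n)
walks = 195.58 - 1.25 * sb - 1.8 * ht + 0.369 * iso + noise

# Design matrix with an intercept column, then ordinary least squares
X = np.column_stack([np.ones(n), sb, ht, iso])
coef, *_ = np.linalg.lstsq(X, walks, rcond=None)

fitted = X @ coef
resid = walks - fitted

# r-squared: share of the variance in walks explained by the fit
r_squared = 1 - resid.var() / walks.var()

# Standard error of the regression (residual degrees of freedom = n - 4)
std_error = np.sqrt((resid ** 2).sum() / (n - X.shape[1]))

# Correlation of the prediction error with each independent variable
resid_corr = [np.corrcoef(resid, X[:, j])[0, 1] for j in (1, 2, 3)]

print("coefficients:", np.round(coef, 3))
print("r-squared:", round(float(r_squared), 3))
print("standard error:", round(float(std_error), 1))
print("residual correlations:", np.round(resid_corr, 10))
```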