It seems obvious: hitters who are fast and get on base a lot. You also probably don't want someone who hits a lot of HRs, since you want those guys to bat with runners on. So I tried to devise a stat that would capture this. Here it is:
(2B + 1.25*3B - HR + SB)/outs
In other words, how many times a player gets into scoring position per out. Since triples are worth about 25% more than doubles according to run expectancy tables, I multiply them by 1.25. Dividing by outs takes the ability to get on base into account, since if you make an out you don't reach base. Outs also include caught stealing. Subtracting HRs "penalizes" guys who hit a lot of them, even though they may have other good leadoff traits, since they might be better suited to batting lower in the order. But I also ran the numbers without subtracting HRs (the correlation between the two formulas for all the players in the study was about .86). The table below shows the top 15 from 2007 among players with 400+ PAs using both methods.
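As a rough sketch of how the stat could be computed, the function below applies the formula with and without the HR penalty. The stat line is hypothetical, not any real player's season, and the definition of outs as AB - H + CS is an assumption; the text only says that outs include caught stealing.

```python
def scoring_position_per_out(doubles, triples, homers, steals, outs, subtract_hr=True):
    """Times reaching scoring position per out: (2B + 1.25*3B - HR + SB) / outs."""
    if outs <= 0:
        raise ValueError("outs must be positive")
    numerator = doubles + 1.25 * triples + steals
    if subtract_hr:
        numerator -= homers  # the optional "penalty" for home run hitters
    return numerator / outs


# Hypothetical stat line for illustration only.
# Assumption: outs = AB - H + CS (the article does not spell out the definition).
at_bats, hits, caught_stealing = 600, 180, 10
outs = at_bats - hits + caught_stealing

with_hr_penalty = scoring_position_per_out(30, 8, 12, 40, outs)
without_hr_penalty = scoring_position_per_out(30, 8, 12, 40, outs, subtract_hr=False)

print(round(with_hr_penalty, 3), round(without_hr_penalty, 3))
```

Running both versions over every qualifying player and correlating the two columns would be the way to reproduce a figure like the .86 mentioned above.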
The players in the top 15 are probably not big surprises. But are they really that great at being leadoff men? Do they increase team runs by batting leadoff compared to anyone else? To try to answer these questions, I turned to some analysis I did on lineups two years ago. You can read those articles here and here. In that research, I studied how what each slot in the lineup did affected team scoring. In the latter of those two articles, team runs per game was the dependent variable in a linear regression, while walk%, hit%, extra-base%, SB per game and CS per game were the independent variables. The regression found a run value for each event and for each lineup slot.
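To make the idea of "a run value for each event and for each lineup slot" concrete, here is a minimal sketch of how one player's contribution in one slot could be computed from such a regression. Every number in it is made up for illustration; the dictionary keys, the function name and the coefficient values are all assumptions, not the actual regression output.

```python
# The five event rates used as independent variables in the regression.
EVENTS = ("walk_pct", "hit_pct", "xbh_pct", "sb_per_game", "cs_per_game")


def slot_contribution(player_rates, slot_run_values):
    """Sum of (slot's run value for an event) * (player's rate for that event)."""
    return sum(slot_run_values[event] * player_rates[event] for event in EVENTS)


# HYPOTHETICAL run values for one lineup slot (not the article's coefficients).
leadoff_run_values = {
    "walk_pct": 4.0,
    "hit_pct": 3.0,
    "xbh_pct": 2.0,
    "sb_per_game": 0.2,
    "cs_per_game": -0.4,
}

# HYPOTHETICAL player rates (not a real player's numbers).
player_rates = {
    "walk_pct": 0.12,
    "hit_pct": 0.25,
    "xbh_pct": 0.08,
    "sb_per_game": 0.30,
    "cs_per_game": 0.10,
}

contribution = slot_contribution(player_rates, leadoff_run_values)
print(round(contribution, 3))
```

Swapping in a different slot's run values for the same player is what lets the same hitter be compared across lineup positions.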
I plugged in the values for those events for Jose Reyes in the number one slot to see what impact he would have on team runs per game. I also did the same for Adam Dunn, a player you probably would not think of as a good leadoff man. In my rankings above, he is 208th out of 216 players. In fact, I tried both Reyes and Dunn in the leadoff slot and both in the cleanup slot. The table below shows their relevant stats and the run values for each lineup slot.
If Reyes bats first, his numbers combine to make 1.326, while if Dunn bats 4th we get 1.453. (The regression had an intercept, or constant, equal to about -5, so to get a number for team runs per game I would have to plug in numbers for all slots, multiply things out, then subtract 5. The numbers here are just individual contributions.) So those two add up to 2.779. But what if Dunn batted first and Reyes batted 4th? Dunn gets 1.521 and Reyes gets 1.306 for a total of 2.827. That is actually better than having Reyes bat first and Dunn 4th. Your team would score .0485 more runs per game, or about 7.86 more per season. The reason it happens this way is that Dunn walks more (101 vs. 30), and if you go to one of my links above, you can see that the run value for walks is highest for the leadoff slot.
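The comparison above is simple enough to check in a few lines. This sketch uses only the rounded contributions quoted in the text, so it gives a per-game gain of .048 rather than .0485 (the latter presumably comes from unrounded values); the variable names are my own.

```python
GAMES_PER_SEASON = 162

# Individual contributions as quoted in the text (rounded to three decimals).
reyes_first, dunn_fourth = 1.326, 1.453
dunn_first, reyes_fourth = 1.521, 1.306

traditional = reyes_first + dunn_fourth  # Reyes 1st, Dunn 4th
swapped = dunn_first + reyes_fourth      # Dunn 1st, Reyes 4th

per_game_gain = swapped - traditional
per_season_gain = per_game_gain * GAMES_PER_SEASON

print(f"traditional: {traditional:.3f}, swapped: {swapped:.3f}")
print(f"gain per game: {per_game_gain:.4f}, per season: {per_season_gain:.2f}")

# Using the unrounded per-game figure quoted in the text instead:
quoted_season_gain = 0.0485 * GAMES_PER_SEASON
print(f"season gain from .0485 per game: {quoted_season_gain:.2f}")
```

Because the intercept is the same for both lineups, it cancels out of the difference, which is why the individual contributions are enough here.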
Now if a team really tried this, Dunn might not get walked so much since he won't be as big a threat batting with the bases empty. But if the guys right behind him don't have much power and since he is not fast, they might walk him more. Reyes might not get as many extra base hits since some of his triples and doubles are a result of speed and with runners on base he might have someone clogging the bases. I looked at his career stats on that and the results are mixed. It is also possible that Dunn would not score on hits that would have scored Reyes and since some of Reyes' doubles are a result of speed more than hitting distance, his doubles might drive in fewer runs than Dunn's doubles. Would it make a 7.86 run difference over the course of a season? Maybe, but even if it did, it is still interesting that batting Dunn first and Reyes fourth, instead of vice-versa, does not seem to hurt scoring that much, even though Reyes is rated far better as a leadoff man by my measure (which seems to make some sense).
I also tried using David Pinto's optimal lineup finder, based on my lineup research. I set a lineup with Reyes batting first and Dunn 4th. Then I used Retrosheet data to fill in the rest of the lineup. I used the OBP & SLG of each lineup slot for the NL in 2007. This tool has two methods, each based on my two separate lineup studies that used different years. Having Reyes batting first and Dunn 4th with everyone else being league average for their slot generated 4.93 to 4.94 runs per game. But the tool in each case did find that Reyes should bat leadoff. In one case it had Dunn batting 4th which generated 5.04 runs per game. In the other case it had Dunn 2nd for 4.99 runs per game. In the two cases where I had Dunn first and Reyes 4th, the runs per game were 4.90 and 4.93 (as stated above, the reverse yielded 4.93 and 4.94).
Now that model did not include stealing. But again, even though Reyes batting first and Dunn 4th does better than vice-versa, it is not by much. If stealing were included in Pinto's tool, the difference would be bigger. But recall that in the model with things broken down by hits, walks and extra bases, Dunn batting first did better.
Then I ran a simulation using the Star Simulator. I plugged in all the numbers for each lineup slot, again using Retrosheet data (2007 NL). The simulation had the average team scoring about 754 runs per season, about 2% less than in real life. But it also had about 2% fewer ABs (maybe because it only does offense and does not have extra inning games). Then I put Reyes first and Dunn 4th. The team scored 793.8 runs per season. If it were reversed, it was 791.5. So the difference, although in favor of Reyes batting first, is only about 2 runs over a season. Having Reyes bat first with an average cleanup hitter, it was 773.36. With Dunn batting leadoff it was 788.27! So having Adam Dunn instead of Jose Reyes as your leadoff hitter would mean about 15 more runs per season.
If you go back to my earlier analysis, from the second table, if we just multiply out the impact of Dunn batting first and Reyes batting first, we get 1.52 for Dunn and 1.33 for Reyes. Over 162 games that difference of about .195 runs per game is about 31.6 runs!
This all seems to be about tradeoffs: getting on base versus speed, and having a high OBP guy bat leadoff versus losing his power if he batted in the middle of the lineup. I am looking for a way to incorporate all those factors to find the optimal leadoff man. So I tried one more thing. I calculated each guy's impact in batting leadoff (the way I did using the second table). So each player has a leadoff impact. But even if someone gets a high score there, it might not be a good idea to bat him first, since you might lose an even better score or impact from another slot he might bat in. So I found each guy's impact in all nine slots, then subtracted each of the other eight from his leadoff impact.
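The method in the paragraph above can be sketched as follows: take a player's impact in each of the nine slots, subtract each non-leadoff impact from the leadoff impact, and add up the eight differences. The impacts below are hypothetical placeholders, and the function and player names are my own.

```python
def leadoff_advantage(slot_impacts):
    """Sum of (leadoff impact - impact in slot k) for slots 2 through 9."""
    if len(slot_impacts) != 9:
        raise ValueError("expected one impact for each of the nine lineup slots")
    leadoff = slot_impacts[0]
    return sum(leadoff - other for other in slot_impacts[1:])


# HYPOTHETICAL impacts by slot (index 0 = leadoff); not real players.
impacts_by_player = {
    "Player A": [1.50, 1.45, 1.30, 1.35, 1.25, 1.20, 1.15, 1.10, 1.05],
    "Player B": [1.40, 1.42, 1.45, 1.50, 1.38, 1.30, 1.25, 1.20, 1.15],
}

# Rank players from largest to smallest summed leadoff advantage.
ranking = sorted(
    impacts_by_player,
    key=lambda name: leadoff_advantage(impacts_by_player[name]),
    reverse=True,
)

for name in ranking:
    print(name, round(leadoff_advantage(impacts_by_player[name]), 2))
```

A player whose best slot is somewhere in the middle of the order, like Player B here, ends up with a small or even negative total, which is the adjustment this measure is meant to capture.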
Barry Bonds, for example, had a leadoff impact of 1.77. His impact in the number 2 slot was 1.72. So he is .05 better batting leadoff than 2nd. He was .23 better at number 1 than at number 3. I did that all the way down to the number 9 slot. Here are all of Bonds' differences:
That adds up to about 2.4. Then I added up all of those differences for each player and ranked them from highest to lowest. Remember that I am taking into account not just how good they would be leading off, but how much better (or worse) they would be than batting elsewhere. Below are the top 15 leadoff men from last year, even taking into account what you would lose by not having them bat elsewhere (based on walks, hits, extra bases, SB and CS):
Just to be complete, here are the top 15 (based on walks, hits, extra bases, SB and CS) while not adjusting for how well they would hit elsewhere. Many of the same players appear as above, but the list is not identical.