June 12, 2011: This post completely fraks up the calculation of David Smyth's Base Runs statistic. I've now fixed that, and added the data from 2009 and 2010. You can find all the updated tables here.
I'm a big fan of Sabermetrics, the use of statistical information to understand how baseball teams win games. Part of this is my love for the game, part my natural tilt toward numerical data, and part is that I've always enjoyed reading Bill James's work (full disclosure: he and I overlapped at KU, though we never met). Not to mention the fact that, from my desk, I can see several editions of both The Baseball Encyclopedia and Total Baseball.
But … in the old days, we judged batters by average (> 0.300 is good), home runs (> 30), and runs batted in (> 100). That was it. These stats have some problems: batting average doesn't tell you how many times a guy gets on base by walking, you can only bat in runs when your teammates are already on base, and as for home runs, well, OK, home runs are a pretty fair way to determine part of a player's value.
The inadequacy of the traditional trio of AVG/HR/RBI led to the development of new measures for player performance: on-base percentage, slugging percentage, runs created, etc., etc. The problem is that off the top of my head I don't know what's a good number for any of these statistics. OK, a slugging percentage of 0.900 is better than 0.400, but is a player who slugs 0.500 a power hitter, or just Joe Shlabotnik?
This post is an attempt to get a handle on some of those questions. I've taken individual player data from 1995-2008 from Sean Lahman's Baseball Archive Database, selected those players who were eligible for batting titles under rule 10.22a (a minimum of 3.1 plate appearances per scheduled game), and determined averages and standard deviations for each category for each year in the database.
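The selection and averaging steps above can be sketched roughly as follows. This is only an illustration, not the code used for the tables below: the rows and field names (`sched_games` and so on) are made up, plate appearances are approximated as AB + BB + HBP + SH + SF, and the population standard deviation is an assumption, since the post doesn't say which form of σ was computed.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical player-season rows; field names are illustrative only.
rows = [
    {"player": "a", "year": 2008, "sched_games": 162, "AB": 600, "H": 180, "BB": 60, "HBP": 5, "SH": 0, "SF": 5},
    {"player": "b", "year": 2008, "sched_games": 162, "AB": 500, "H": 130, "BB": 40, "HBP": 2, "SH": 1, "SF": 4},
    {"player": "c", "year": 2008, "sched_games": 162, "AB": 300, "H": 100, "BB": 20, "HBP": 0, "SH": 0, "SF": 2},
]

def plate_appearances(row):
    # Assumed approximation: PA = AB + BB + HBP + SH + SF.
    return row["AB"] + row["BB"] + row["HBP"] + row["SH"] + row["SF"]

def qualifies(row, pa_per_game=3.1):
    # Batting-title threshold: 3.1 plate appearances per scheduled game.
    return plate_appearances(row) >= pa_per_game * row["sched_games"]

def yearly_summary(rows, stat):
    # Group the qualifying rows by year, then report (count, mean, std dev).
    by_year = defaultdict(list)
    for row in rows:
        if qualifies(row):
            by_year[row["year"]].append(stat(row))
    return {year: (len(vals), mean(vals), pstdev(vals)) for year, vals in by_year.items()}

summary = yearly_summary(rows, lambda r: r["H"] / r["AB"])
for year, (n, avg, sd) in sorted(summary.items()):
    print(f"{year}: n={n} mean={avg:.3f} sd={sd:.3f}")
```

Here player "c" falls short of 3.1 × 162 ≈ 502 plate appearances, so only the other two rows feed the 2008 average.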
Limiting the study to post-strike seasons means that we're looking at the modern game, i.e., small parks, large batters, lots of players attempting home runs, and consequently lots of players striking out. The restriction to players eligible for the batting title means that we're only looking at everyday players, i.e., those that some organization has deemed worthy to be a starter on a major league team. This means that the average values we find will be somewhat higher than the true average over all major league players. That's OK, we're looking for a definition of good, not average.
Oh, one more thing. The database treats players that were traded during the season as two separate entities. So, for example, it has two entries for Manny Ramirez: Boston, 2008, and Los Angeles, 2008. Neither was eligible for the batting title, so the 2008 Manny isn't in these averages.
To show you how it works, let's look at the traditional statistics from these years:
Year | # Players | Batting Avg. (Ave.) | Batting Avg. (σ) | Home Runs (Ave.) | Home Runs (σ) | Runs Batted In (Ave.) | Runs Batted In (σ) |
---|---|---|---|---|---|---|---|
1995 | 125 | 0.286 | 0.028 | 17.83 | 10.53 | 73.40 | 24.04 |
1996 | 137 | 0.288 | 0.027 | 21.36 | 12.81 | 84.99 | 29.39 |
1997 | 134 | 0.283 | 0.029 | 19.97 | 11.29 | 80.96 | 26.26 |
1998 | 148 | 0.287 | 0.026 | 20.93 | 13.01 | 82.50 | 28.44 |
1999 | 152 | 0.290 | 0.026 | 22.02 | 12.61 | 85.40 | 28.49 |
2000 | 150 | 0.288 | 0.030 | 22.01 | 11.73 | 85.70 | 27.73 |
2001 | 146 | 0.282 | 0.028 | 21.62 | 13.40 | 81.89 | 29.11 |
2002 | 143 | 0.278 | 0.027 | 20.78 | 11.22 | 78.94 | 24.13 |
2003 | 146 | 0.281 | 0.027 | 19.86 | 11.06 | 78.86 | 24.17 |
2004 | 154 | 0.284 | 0.026 | 20.79 | 11.01 | 79.12 | 23.58 |
2005 | 143 | 0.280 | 0.022 | 19.97 | 10.94 | 78.04 | 23.31 |
2006 | 146 | 0.286 | 0.024 | 20.46 | 12.18 | 80.95 | 25.70 |
2007 | 156 | 0.283 | 0.028 | 19.13 | 9.92 | 79.76 | 23.37 |
2008 | 137 | 0.280 | 0.025 | 19.62 | 9.76 | 78.03 | 22.90 |
All | 2016 | 0.284 | 0.027 | 20.49 | 11.61 | 80.70 | 25.99 |
So this tells us that since the year that must not be named, on average about 140 players played pretty much every day in a given year. They hit, on average, 0.284, with 20 home runs and 81 RBIs. The standard deviation for each statistic is given under the label σ, and for completeness we've added in the total for all regular-player-years since 1995.
OK, those numbers are a little bit lower than the ideal numbers I listed above (0.300/30/100), but remember we aren't just looking at superstars, we're looking at all the guys who are good enough to play every day. And, really, if you didn't know anything else about a player, wouldn't you be inclined to take a guy whose stats were (0.284/20/81)?
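One way to put the σ columns to work is to ask how many standard deviations a stat line sits above the everyday-player average. The sketch below does that for the old-school 0.300/30/100 line, using the averages and σ values copied from the "All" row of the table above; the function name and dictionary layout are just for illustration.

```python
# (average, standard deviation) pairs copied from the "All" row of the table above.
ALL_ROW = {"AVG": (0.284, 0.027), "HR": (20.49, 11.61), "RBI": (80.70, 25.99)}

def z_scores(line):
    # Distance from the everyday-player average, measured in standard deviations.
    return {stat: (line[stat] - avg) / sd for stat, (avg, sd) in ALL_ROW.items()}

ideal = {"AVG": 0.300, "HR": 30, "RBI": 100}
for stat, z in z_scores(ideal).items():
    print(f"{stat}: {z:+.2f} standard deviations from average")
```

By this measure each of the three "ideal" numbers lands less than one standard deviation above the average for everyday players.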
But, as I noted before, AVG/HR/RBI isn't really a great determination of player worth. In recent years, the consensus has been that we're better off looking at how often a player gets on base (his on-base percentage, or OBP), slugging percentage (SLG), and the sum of those two, On Base Plus Slugging (OPS), where
OPS = OBP + SLG .
After all, if you can't get on base, you can't score, (and if you can't get on base, you'll only get 27 at bats in a game), and if you don't have some power, represented by SLG > AVG, then you won't drive in the runners that do get on. OPS, well, on its own OPS doesn't mean very much, but it is useful as a one-number statistic representing the worth of a player.
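For reference, here is a small sketch of how the three numbers are built from counting stats, using the standard definitions OBP = (H + BB + HBP) / (AB + BB + HBP + SF) and SLG = total bases / AB. The season line in the example is invented for illustration.

```python
def obp(h, bb, hbp, ab, sf):
    # On-base percentage: times on base per at bat, walk, hit-by-pitch, or sacrifice fly.
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slg(singles, doubles, triples, hr, ab):
    # Slugging percentage: total bases per at bat.
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab

def ops(on_base, slugging):
    # OPS is simply the sum of the two.
    return on_base + slugging

# Hypothetical season: 550 AB, 160 H (100 1B, 35 2B, 5 3B, 20 HR), 60 BB, 5 HBP, 5 SF.
player_obp = obp(h=160, bb=60, hbp=5, ab=550, sf=5)
player_slg = slg(singles=100, doubles=35, triples=5, hr=20, ab=550)
player_ops = ops(player_obp, player_slg)
print(f"OBP {player_obp:.3f}  SLG {player_slg:.3f}  OPS {player_ops:.3f}")
```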
Here are the results for the players in our study:
Year | # Players | On Base (Ave.) | On Base (σ) | Slugging (Ave.) | Slugging (σ) | OPS (Ave.) | OPS (σ) |
---|---|---|---|---|---|---|---|
1995 | 125 | 0.360 | 0.036 | 0.460 | 0.075 | 0.819 | 0.100 |
1996 | 137 | 0.361 | 0.041 | 0.472 | 0.085 | 0.833 | 0.117 |
1997 | 134 | 0.354 | 0.041 | 0.460 | 0.075 | 0.815 | 0.108 |
1998 | 148 | 0.355 | 0.036 | 0.467 | 0.081 | 0.822 | 0.108 |
1999 | 152 | 0.364 | 0.039 | 0.479 | 0.077 | 0.843 | 0.107 |
2000 | 150 | 0.362 | 0.041 | 0.479 | 0.085 | 0.841 | 0.119 |
2001 | 146 | 0.353 | 0.040 | 0.470 | 0.092 | 0.823 | 0.127 |
2002 | 143 | 0.351 | 0.044 | 0.461 | 0.080 | 0.812 | 0.118 |
2003 | 146 | 0.351 | 0.039 | 0.461 | 0.075 | 0.812 | 0.108 |
2004 | 154 | 0.355 | 0.039 | 0.468 | 0.073 | 0.822 | 0.104 |
2005 | 143 | 0.348 | 0.034 | 0.458 | 0.069 | 0.806 | 0.095 |
2006 | 146 | 0.355 | 0.035 | 0.469 | 0.073 | 0.824 | 0.100 |
2007 | 156 | 0.354 | 0.036 | 0.458 | 0.066 | 0.812 | 0.095 |
2008 | 137 | 0.350 | 0.033 | 0.457 | 0.065 | 0.807 | 0.089 |
All | 2016 | 0.355 | 0.038 | 0.466 | 0.077 | 0.821 | 0.107 |
That means, of course, that the average regular player gets on base 35.5% of the time, slugs a bit under 0.500, and has an OPS over 0.800. It also means that anybody with an OBP/SLG/OPS line better than 0.355/0.466/0.821 is a pretty good player.
While AVG/HR/RBI and OBP/SLG/OPS tell us a good deal about a player's ability, that hasn't stopped people from generating other statistics. The idea that one number can represent a player's ability is particularly popular. The most famous of these one-number statistics, of course, is Bill James's Runs Created (RC) (we'll use the technical version). Another, similar statistic is David Smyth's Base Runs (BsR), which has the advantage that it will give the correct number of runs for extreme cases, such as a team that only hits home runs.
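To see the home-runs-only case in action, here is a rough sketch. It does not use the technical version of Runs Created, and it is not the code behind the tables in this post. It uses the basic form RC = (H + BB) × TB / (AB + BB) and one commonly quoted simple form of Base Runs, A × B / (B + C) + D, with A = H + BB − HR, B = 1.02 × (1.4 TB − 0.6 H − 3 HR + 0.1 BB), C = AB − H, and D = HR. Other variants of the B factor exist, so treat the coefficients as an assumption.

```python
def runs_created_basic(ab, h, tb, bb):
    # Basic Runs Created (not the technical version used for the tables).
    return (h + bb) * tb / (ab + bb)

def base_runs(ab, h, tb, hr, bb):
    # One simple published form of Base Runs; the B coefficients are an assumption.
    a = h + bb - hr                                        # baserunners
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02    # advancement
    c = ab - h                                             # outs
    d = hr                                                 # automatic runs
    return a * b / (b + c) + d

# Hypothetical game: 37 at bats, 10 hits, all solo home runs, no walks.
# Every homer comes with the bases empty, so exactly 10 runs score.
rc = runs_created_basic(ab=37, h=10, tb=40, bb=0)
bsr = base_runs(ab=37, h=10, tb=40, hr=10, bb=0)
print(f"basic RC = {rc:.2f}, BsR = {bsr:.2f}")
```

With no baserunners other than the home runs themselves, the A term is zero and Base Runs returns exactly the 10 runs that scored, while the basic Runs Created formula overshoots a little.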
Now both Runs Created and Base Runs are cumulative stats, that is, the more games you play, the higher these stats get. To get a better idea of what these look like in a per game situation, I computed the number of outs made by a player, where
Outs = At Bats - Hits + Caught Stealing + Sacrifice Hits + Sacrifice Flies + Grounded into Double Play
and assumed that 24 outs made a game. So, for example, a player who made 480 outs in a season would have played 20 games, if he played by himself. So I'd divide his Runs Created and Base Runs by 20 to get RC/Game and BsR/Game, respectively. (For what it's worth, the players in this study averaged 413 outs per season, or 17 games' worth of outs, at 24 outs/game.)
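The per-game conversion just described can be sketched like this. The season line is hypothetical, chosen so the outs come to the 480 used in the example above, and the 24 outs per game is the assumption stated in this post.

```python
OUTS_PER_GAME = 24  # the assumption used in this post

def outs_made(ab, h, cs, sh, sf, gidp):
    # Outs = At Bats - Hits + Caught Stealing + Sacrifice Hits
    #        + Sacrifice Flies + Grounded into Double Play
    return ab - h + cs + sh + sf + gidp

def per_game(season_total, outs, outs_per_game=OUTS_PER_GAME):
    # Convert a cumulative stat (RC or BsR) into a per-game rate.
    games = outs / outs_per_game
    return season_total / games

# Hypothetical season: 600 AB, 170 H, 5 CS, 5 SH, 10 SF, 30 GIDP.
season_outs = outs_made(ab=600, h=170, cs=5, sh=5, sf=10, gidp=30)
rc_per_game = per_game(110.0, season_outs)
print(season_outs, season_outs / OUTS_PER_GAME, rc_per_game)
```

So a player with 110 Runs Created over those 480 outs (20 games' worth) comes out at 5.5 RC/Game.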
Year | # Players | Runs Created (Avg) | Runs Created (σ) | RC/Game (Avg) | RC/Game (σ) | Base Runs (Avg) | Base Runs (σ) | BsR/Game (Avg) | BsR/Game (σ) |
---|---|---|---|---|---|---|---|---|---|
1995 | 125 | 101.88 | 24.49 | 6.587 | 1.584 | 86.41 | 22.60 | 5.584 | 1.459 |
1996 | 137 | 116.08 | 30.93 | 6.739 | 1.838 | 99.16 | 28.23 | 5.767 | 1.741 |
1997 | 134 | 109.42 | 28.46 | 6.394 | 1.725 | 93.94 | 25.14 | 5.496 | 1.556 |
1998 | 148 | 113.96 | 27.06 | 6.535 | 1.620 | 96.94 | 26.19 | 5.556 | 1.543 |
1999 | 152 | 117.44 | 29.20 | 6.888 | 1.735 | 100.54 | 26.80 | 5.892 | 1.579 |
2000 | 150 | 116.83 | 31.93 | 6.851 | 1.964 | 100.61 | 28.39 | 5.909 | 1.786 |
2001 | 146 | 110.87 | 31.86 | 6.457 | 1.899 | 95.71 | 31.06 | 5.584 | 1.905 |
2002 | 143 | 106.99 | 28.27 | 6.251 | 1.972 | 93.73 | 27.35 | 5.499 | 1.978 |
2003 | 146 | 108.87 | 28.24 | 6.375 | 1.735 | 93.02 | 26.00 | 5.451 | 1.653 |
2004 | 154 | 111.54 | 27.71 | 6.516 | 1.908 | 95.62 | 25.71 | 5.606 | 1.939 |
2005 | 143 | 107.46 | 25.60 | 6.176 | 1.410 | 92.19 | 23.23 | 5.318 | 1.368 |
2006 | 146 | 113.09 | 25.25 | 6.542 | 1.432 | 96.13 | 24.18 | 5.570 | 1.455 |
2007 | 156 | 110.98 | 27.57 | 6.442 | 1.544 | 93.58 | 23.37 | 5.437 | 1.350 |
2008 | 137 | 108.99 | 25.17 | 6.284 | 1.462 | 92.95 | 21.49 | 5.367 | 1.291 |
All | 2016 | 111.18 | 28.32 | 6.505 | 1.722 | 95.16 | 26.02 | 5.576 | 1.636 |
So our hypothetical average everyday player is responsible for around 100 runs per year. A team of these players would score about 6 runs per game.
And that's it. Hopefully this exercise gives us an idea of what an average everyday major league ballplayer looks like.