Saturday, October 10, 2009

A Pretty Good Average

June 12, 2011: This post completely fraks up the calculation of David Smyth's Base Runs statistic. I've now fixed that, and added the data from 2009 and 2010. You can find all the updated tables here.

I'm a big fan of Sabermetrics, the use of statistical information to understand how baseball teams win games. Part of this is my love for the game, part my natural tilt toward numerical data, and part is that I've always enjoyed reading Bill James's work (full disclosure: he and I overlapped at KU, though we never met). Not to mention the fact that, from my desk, I can see several editions of both The Baseball Encyclopedia and Total Baseball.

But … in the old days, we judged batters by average (> 0.300 is good), home runs (> 30), and runs batted in (> 100). That was it. These stats have some problems: batting average doesn't tell you how many times a guy gets on base by walking, you can only bat in runs when your teammates are already on base, and as for home runs, well, OK, home runs are a pretty fair way to determine part of a player's value.

The inadequacy of the traditional trio of AVG/HR/RBI led to the development of new measures for player performance: On-base percentage, slugging average, runs created, etc., etc. The problem is that off the top of my head I don't know what's a good number for any of these statistics. OK, a slugging percentage of 0.900 is better than 0.400, but is a player who slugs 0.500 a power hitter, or just Joe Shlabotnik?

This post is an attempt to get a handle on some of those questions. I've taken individual player data from 1995-2008 from Sean Lahman's Baseball Archive Database, selected those players who were eligible for batting titles under rule 10.22a (a minimum of 3.1 plate appearances per scheduled game), and determined averages and standard deviations for each category for each year in the database.

Limiting the study to post-lockout seasons means that we're looking at the modern game, i.e., small parks, large batters, lots of players attempting home runs, and consequently lots of players striking out. The restriction to players eligible for the batting title means that we're only looking at everyday players, i.e., those that some organization has deemed worthy to be a starter on a major league team. This means that the average values we find will be somewhat higher than the true average over all major league players. That's OK, we're looking for a definition of good, not average.

Oh, one more thing. The database treats a player who was traded during the season as two separate entities. So, for example, it has two 2008 entries for Manny Ramirez: one for Boston and one for Los Angeles. Neither entry was eligible for the batting title on its own, so the 2008 Manny isn't in these averages.
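For the curious, the selection and averaging step can be sketched in a few lines of Python. This is a minimal sketch, not the code behind the tables. The row layout, the function names, and the approximation of plate appearances as AB + BB + HBP + SH + SF are all assumptions, as is the choice of the population standard deviation for σ (the sample version is `statistics.stdev`). Each row is tested on its own, which is what keeps a traded player's split seasons out.

```python
from statistics import mean, pstdev

PA_PER_GAME = 3.1  # rule 10.22a threshold quoted in the post


def plate_appearances(row):
    """Approximate PA from counting stats (an assumption of this sketch)."""
    return row["AB"] + row["BB"] + row["HBP"] + row["SH"] + row["SF"]


def is_eligible(row, scheduled_games=162):
    """True if this single database row clears the PA minimum on its own."""
    return plate_appearances(row) >= PA_PER_GAME * scheduled_games


def summarize(rows, stat, scheduled_games=162):
    """Return (count, average, sigma) of `stat` over the eligible rows."""
    values = [stat(r) for r in rows if is_eligible(r, scheduled_games)]
    return len(values), mean(values), pstdev(values)


def batting_average(row):
    return row["H"] / row["AB"]


# Hypothetical rows, one per player-team entry; these are not real players.
rows = [
    {"AB": 500, "H": 150, "BB": 50, "HBP": 2, "SH": 0, "SF": 4},
    {"AB": 600, "H": 150, "BB": 30, "HBP": 1, "SH": 1, "SF": 3},
    {"AB": 200, "H": 70, "BB": 10, "HBP": 0, "SH": 0, "SF": 1},  # too few PA
]

count, avg, sigma = summarize(rows, batting_average)
print(count, round(avg, 3), round(sigma, 3))
```

The `scheduled_games` parameter is there because a shortened season would need a smaller threshold than the 162-game default.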

To show you how it works, let's look at the traditional statistics from these years:

Year  # Players  AVG (Ave.)  AVG (σ)  HR (Ave.)  HR (σ)  RBI (Ave.)  RBI (σ)
1995 125 0.286 0.028 17.83 10.53 73.40 24.04
1996 137 0.288 0.027 21.36 12.81 84.99 29.39
1997 134 0.283 0.029 19.97 11.29 80.96 26.26
1998 148 0.287 0.026 20.93 13.01 82.50 28.44
1999 152 0.290 0.026 22.02 12.61 85.40 28.49
2000 150 0.288 0.030 22.01 11.73 85.70 27.73
2001 146 0.282 0.028 21.62 13.40 81.89 29.11
2002 143 0.278 0.027 20.78 11.22 78.94 24.13
2003 146 0.281 0.027 19.86 11.06 78.86 24.17
2004 154 0.284 0.026 20.79 11.01 79.12 23.58
2005 143 0.280 0.022 19.97 10.94 78.04 23.31
2006 146 0.286 0.024 20.46 12.18 80.95 25.70
2007 156 0.283 0.028 19.13 9.92 79.76 23.37
2008 137 0.280 0.025 19.62 9.76 78.03 22.90
All 2016 0.284 0.027 20.49 11.61 80.70 25.99

So this tells us that since the year that must not be named, on average about 140 players played pretty much every day in a given year. They hit, on average, 0.284, with 20 home runs and 81 RBIs. The standard deviation for each statistic is given under the label σ, and for completeness we've added in the total for all regular-player-years since 1995.

OK, those numbers are a little bit lower than the ideal numbers I listed above (0.300/30/100), but remember we aren't just looking at a superstar, we're looking at all the guys who are good enough to play every day. And, really, if you didn't know anything else about a player, wouldn't you be inclined to take a guy whose stats were (0.284/20/81)?
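One way to put a number on "a little bit lower" is to ask how many standard deviations the old ideal line sits above the all-years averages. The sketch below does that arithmetic using the "All" row of the table; the function and variable names are just for illustration.

```python
# (average, sigma) pairs taken from the "All" row of the table above.
ALL_YEARS = {
    "AVG": (0.284, 0.027),
    "HR": (20.49, 11.61),
    "RBI": (80.70, 25.99),
}

# The traditional "ideal" line from earlier in the post.
IDEAL = {"AVG": 0.300, "HR": 30, "RBI": 100}


def z_score(value, average, sigma):
    """Number of standard deviations that value lies above the average."""
    return (value - average) / sigma


for stat, value in IDEAL.items():
    average, sigma = ALL_YEARS[stat]
    print(stat, round(z_score(value, average, sigma), 2))
```

By this measure the 0.300/30/100 line is somewhere between roughly 0.6 and 0.8 standard deviations above the average everyday player in each category.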

But, as I noted before, Avg/HR/RBI isn't really a great determination of player worth. In recent years, the consensus has been that we're better off looking at how often a player gets on base (his on-base percentage, or OBP), his slugging percentage (SLG), and the sum of those two, On Base Plus Slugging (OPS), where

OPS = OBP + SLG .

After all, if you can't get on base, you can't score (and if you can't get on base, you'll only get 27 at bats in a game), and if you don't have some power, represented by SLG > AVG, then you won't drive in the runners that do get on. OPS, well, on its own OPS doesn't mean very much, but it is useful as a one-number statistic representing the worth of a player.

Here are the results for the players in our study:
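For reference, here is how the three numbers fall out of a batting line. This is a small sketch using the standard definitions of OBP and SLG; the function names and the sample line are made up for illustration.

```python
def obp(h, bb, hbp, ab, sf):
    """On-base percentage: times on base per qualifying plate appearance."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slg(h, doubles, triples, hr, ab):
    """Slugging percentage: total bases per at bat."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def ops(h, doubles, triples, hr, bb, hbp, ab, sf):
    """OPS = OBP + SLG, as defined above."""
    return obp(h, bb, hbp, ab, sf) + slg(h, doubles, triples, hr, ab)


# Hypothetical season: 500 AB, 150 H (30 doubles, 0 triples, 20 HR), 50 BB.
line = dict(h=150, doubles=30, triples=0, hr=20, bb=50, hbp=0, ab=500, sf=0)
print(round(ops(**line), 3))
```

Note that a hitter with nothing but singles has SLG equal to AVG, which is why SLG > AVG is the sign of power.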

Year  # Players  OBP (Ave.)  OBP (σ)  SLG (Ave.)  SLG (σ)  OPS (Ave.)  OPS (σ)
1995 125 0.360 0.036 0.460 0.075 0.819 0.100
1996 137 0.361 0.041 0.472 0.085 0.833 0.117
1997 134 0.354 0.041 0.460 0.075 0.815 0.108
1998 148 0.355 0.036 0.467 0.081 0.822 0.108
1999 152 0.364 0.039 0.479 0.077 0.843 0.107
2000 150 0.362 0.041 0.479 0.085 0.841 0.119
2001 146 0.353 0.040 0.470 0.092 0.823 0.127
2002 143 0.351 0.044 0.461 0.080 0.812 0.118
2003 146 0.351 0.039 0.461 0.075 0.812 0.108
2004 154 0.355 0.039 0.468 0.073 0.822 0.104
2005 143 0.348 0.034 0.458 0.069 0.806 0.095
2006 146 0.355 0.035 0.469 0.073 0.824 0.100
2007 156 0.354 0.036 0.458 0.066 0.812 0.095
2008 137 0.350 0.033 0.457 0.065 0.807 0.089
All 2016 0.355 0.038 0.466 0.077 0.821 0.107

That means, of course, that the average regular player gets on base 35.5% of the time, slugs 0.466, and has an OPS over 0.800. It also means that anybody with an OBP/SLG/OPS line better than 0.355/0.466/0.821 is a pretty good player.

While AVG/HR/RBI and OBP/SLG/OPS tell a good deal about a player's ability, that hasn't stopped people from generating other statistics. The idea that one number can represent a player's ability is particularly popular. The most famous of these, of course, is Bill James's Runs Created (RC) (we'll use the technical version). Another, similar statistic is David Smyth's Base Runs (BsR), which has the advantage that it will give the correct number of runs for extreme cases, such as a team that only hits home runs.
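The home-run-only claim is easy to check with a sketch. Base Runs has the general form A × B / (B + C) + D, where A counts baserunners other than home runs, B is an advancement factor, C is outs, and D is home runs. The coefficients in B below come from one commonly published simple version and are an assumption, since nothing here says which version produced the tables. For contrast, `basic_runs_created` is the basic Runs Created formula, not the technical version used in this post.

```python
def base_runs(ab, h, bb, hr, tb):
    """Base Runs in the form A * B / (B + C) + D (B coefficients assumed)."""
    a = h + bb - hr  # baserunners who did not homer
    b = (1.4 * tb - 0.6 * h - 3 * hr + 0.1 * bb) * 1.02  # advancement factor
    c = ab - h  # outs made at the plate
    d = hr  # a home run always scores the batter
    return a * b / (b + c) + d


def basic_runs_created(ab, h, bb, tb):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


# Hypothetical team game: 10 solo home runs, 27 outs, and nothing else.
ab, h, bb, hr, tb = 37, 10, 0, 10, 40
print(base_runs(ab, h, bb, hr, tb))
print(round(basic_runs_created(ab, h, bb, tb), 2))
```

With no other baserunners, A is zero and Base Runs returns exactly the number of home runs, while the basic Runs Created formula comes out higher than the 10 runs this team actually scored.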

Now both Runs Created and Base Runs are cumulative stats, that is, the more games you play, the higher these stats get. To get a better idea of what these look like in a per game situation, I computed the number of outs made by a player, where

Outs = At Bats - Hits + Caught Stealing + Sacrifice Hits + Sacrifice Flies + Grounded into Double Play

and assumed that 24 outs made a game. So, for example, a player who made 480 outs in a season would have played 20 games, if he played by himself. So I'd divide his Runs Created and Base Runs by 20 to get RC/Game and BsR/Game, respectively. (For what it's worth, the players in this study averaged 413 outs per season, or about 17 games' worth of outs, at 24 outs/game.)
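The per-game conversion is simple enough to write down directly. This is a small sketch of the arithmetic just described; the function names are mine, and the sample batting line is hypothetical.

```python
OUTS_PER_GAME = 24  # the assumption stated in the post


def outs(ab, h, cs, sh, sf, gidp):
    """Outs = AB - H + CS + SH + SF + GIDP, as defined above."""
    return ab - h + cs + sh + sf + gidp


def games_of_outs(total_outs):
    """Number of games the player would fill if he made every out himself."""
    return total_outs / OUTS_PER_GAME


def per_game(cumulative_stat, total_outs):
    """Convert a cumulative stat (RC or BsR) to a per-game rate."""
    return cumulative_stat / games_of_outs(total_outs)


# The example from the text: 480 outs is 20 games' worth.
print(games_of_outs(480))

# Hypothetical player: 100 Runs Created while making 480 outs.
print(per_game(100, 480))
```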

Year  # Players  RC (Ave.)  RC (σ)  RC/Game (Ave.)  RC/Game (σ)  BsR (Ave.)  BsR (σ)  BsR/Game (Ave.)  BsR/Game (σ)
1995 125 101.88 24.49 6.587 1.584 86.41 22.60 5.584 1.459
1996 137 116.08 30.93 6.739 1.838 99.16 28.23 5.767 1.741
1997 134 109.42 28.46 6.394 1.725 93.94 25.14 5.496 1.556
1998 148 113.96 27.06 6.535 1.620 96.94 26.19 5.556 1.543
1999 152 117.44 29.20 6.888 1.735 100.54 26.80 5.892 1.579
2000 150 116.83 31.93 6.851 1.964 100.61 28.39 5.909 1.786
2001 146 110.87 31.86 6.457 1.899 95.71 31.06 5.584 1.905
2002 143 106.99 28.27 6.251 1.972 93.73 27.35 5.499 1.978
2003 146 108.87 28.24 6.375 1.735 93.02 26.00 5.451 1.653
2004 154 111.54 27.71 6.516 1.908 95.62 25.71 5.606 1.939
2005 143 107.46 25.60 6.176 1.410 92.19 23.23 5.318 1.368
2006 146 113.09 25.25 6.542 1.432 96.13 24.18 5.570 1.455
2007 156 110.98 27.57 6.442 1.544 93.58 23.37 5.437 1.350
2008 137 108.99 25.17 6.284 1.462 92.95 21.49 5.367 1.291
All 2016 111.18 28.32 6.505 1.722 95.16 26.02 5.576 1.636

So our hypothetical average everyday player is responsible for around 100 runs per year (about 111 by Runs Created, about 95 by Base Runs). A team of these players would score about 6 runs per game (6.5 by RC/Game, 5.6 by BsR/Game).

And that's it. Hopefully this exercise gives us an idea of what an average everyday major league ballplayer looks like.
