Sunday, July 31, 2011

Almost Spot On

I'm currently listening to The Essential Kansas, the '70s boy band (I jest) from Topeka. For some reason, I've never bought a Kansas album, and I don't plan to in the near future. Yet I'm listening completely legally, and not paying a dime.

That's because Spotify has finally made it from Europe to the U.S. I read about it in a New York Times article (I just access the free stuff there, too) the other day and decided to try it out. As far as I can tell, the deal is:

  • With a free-as-in-beer account, you can stream any song in Spotify's vast library (and it's vast, if somewhat uneven).
  • For now, you can listen to an unlimited amount of music, interrupted only by two minutes of commercials every hour.
  • After six months, your free account is limited to ten hours of music per month. I read that in the NYT article; good luck finding it in Spotify's account description.
  • Of course, there are other plans that let you stream unlimited amounts of music, without ads, for a price.
  • You have to use Spotify's music player.

That last part was almost a deal breaker. (Can you break a free deal?) There is no generally available Linux client. There is an alpha version of a Linux player, but it doesn't seem to have been worked on for a year or so, and it can only be used with paid accounts because, in Spotify's words, "we haven't found a reliable way to display ads yet."

However, the Windows client runs under Wine, and Spotify gives detailed instructions on how to get it started. You can't play your local MP3 files through Spotify (not sure I'd want to), but the stuff that comes over the web sounds good — according to the NYT article, it's in 160-kbps Ogg Vorbis format.

All right, how is it? Funny you should ask.

The sound quality is good enough for my speakers-in-the-monitor setup. Beyond that I couldn't say. Audio purists will probably find some fault, but the real purists are listening to vinyl anyway. The playlist is extensive, but not complete. OK, I didn't expect the Beatles here, but the Eagles are mostly missing. You do get what seems to be complete coverage of the Rolling Stones, Simon & Garfunkel, Creedence Clearwater Revival, Bob Marley, Peter, Paul & Mary, the Mamas and the Papas, the aforementioned Kansas, Roger Miller, Glen Campbell, and Johnny Cash, along with a limited amount of Dylan and even Hugh Laurie reading Three Men in a Boat. They have what seems to be the complete Jimmie Rodgers the Elder, but I can't find anybody's version of Jimmie Rodgers the Younger's classic ballad It's Over. This would frustrate me if I were actually paying for this stuff, but since it's all free I really can't complain. It seems as though they're working on it.

Oh yes, one more thing: if you want to sign up for the free account, you need an invite. Apparently paid account holders can give you an invite, or you can go to Spotify.com and ask for one. It took about thirty seconds for me to get an email after I asked, but YMMV.

All in all, it seems to be a pretty good service, at least for now. If you put a map of the continental U.S. on a dartboard so that Kansas is the bull's eye, Spotify's dart hits at about Oklahoma City. Add the Eagles, more Dylan, and the Beatles (Ha!), and they can hit Holyrood.

Sunday, July 17, 2011

When He's Right, He's Right

Penguin Pete. Opinionated and passionate about it.

And boy, when he's right, he's right.

Monday, July 11, 2011

Baseball After the All Star Break

Some years ago, I did a predictive study of how Major League Baseball teams would rank at the end of the season, based on their records at the All Star Break plus a Pythagorean projection of their remaining wins from the runs each team had scored and allowed.

It didn't work all that well. In particular, I predicted that Boston would win the AL East and Washington would be the NL Wild Card. That sorta didn't happen.

Nevertheless, I'll try again. Here's the table, based on the MLB standings at the All Star Break. The method is the same as last time, so you can read all about it there.

American League
East                  W   L    PCT Place    GB   RS   RA   Pyth  GL     PW     PL      TW      TL    PCT     GB
New York Yankees     53  35  0.602     2     1  455  334  0.637  74  47.14  26.86  100.14   61.86  0.618   0.00
Boston               55  35  0.611     1     0  482  371  0.617  72  44.42  27.58   99.42   62.58  0.614   0.73
Tampa Bay            49  41  0.544     3     6  380  343  0.546  72  39.35  32.65   88.35   73.65  0.545  11.80
Toronto              45  47  0.489     4    11  426  416  0.511  70  35.76  34.24   80.76   81.24  0.498  19.39
Baltimore            36  52  0.409     5    18  355  454  0.390  74  28.85  45.15   64.85   97.15  0.400  35.29
Central               W   L    PCT Place    GB   RS   RA   Pyth  GL     PW     PL      TW      TL    PCT     GB
Cleveland            47  42  0.528     2   0.5  386  382  0.505  73  36.85  36.15   83.85   78.15  0.518   0.00
Detroit              49  43  0.533     1     0  413  421  0.491  70  34.39  35.61   83.39   78.61  0.515   0.46
Chicago White Sox    44  48  0.478     3     5  366  383  0.479  70  33.55  36.45   77.55   84.45  0.479   6.29
Minnesota            41  48  0.461     4   6.5  347  414  0.420  73  30.69  42.31   71.69   90.31  0.443  12.16
Kansas City          37  54  0.407     5  11.5  402  449  0.450  71  31.94  39.06   68.94   93.06  0.426  14.91
West                  W   L    PCT Place    GB   RS   RA   Pyth  GL     PW     PL      TW      TL    PCT     GB
Texas                51  41  0.554     1     0  457  404  0.556  70  38.91  31.09   89.91   72.09  0.555   0.00
Los Angeles Angels   50  42  0.543     2     1  355  330  0.533  70  37.32  32.68   87.32   74.68  0.539   2.59
Seattle              43  48  0.473     3   7.5  301  319  0.474  71  33.63  37.37   76.63   85.37  0.473  13.28
Oakland              39  53  0.424     4    12  315  339  0.467  70  32.66  37.34   71.66   90.34  0.442  18.24
National League
East                  W   L    PCT Place    GB   RS   RA   Pyth  GL     PW     PL      TW      TL    PCT     GB
Philadelphia         57  34  0.626     1     0  384  295  0.618  71  43.86  27.14  100.86   61.14  0.623   0.00
Atlanta              54  38  0.587     2   3.5  365  312  0.571  70  39.96  30.04   93.96   68.04  0.580   6.89
New York Mets        46  45  0.505     3    11  399  388  0.513  71  36.40  34.60   82.40   79.60  0.509  18.46
Washington           46  46  0.500     4  11.5  352  354  0.497  70  34.82  35.18   80.82   81.18  0.499  20.04
Florida              43  48  0.473     5    14  352  396  0.447  71  31.71  39.29   74.71   87.29  0.461  26.15
Central               W   L    PCT Place    GB   RS   RA   Pyth  GL     PW     PL      TW      TL    PCT     GB
St. Louis            49  43  0.533     2     0  433  407  0.528  70  36.97  33.03   85.97   76.03  0.531   0.00
Milwaukee            49  43  0.533     1     0  405  406  0.499  70  34.92  35.08   83.92   78.08  0.518   2.05
Pittsburgh           47  43  0.522     3     1  354  346  0.510  72  36.75  35.25   83.75   78.25  0.517   2.22
Cincinnati           45  47  0.489     4     4  437  408  0.531  70  37.18  32.82   82.18   79.82  0.507   3.79
Chicago Cubs         37  55  0.402     5    12  375  459  0.409  70  28.63  41.37   65.63   96.37  0.405  20.34
Houston              30  62  0.326     6    19  358  464  0.384  70  26.89  43.11   56.89  105.11  0.351  29.08
West                  W   L    PCT Place    GB   RS   RA   Pyth  GL     PW     PL      TW      TL    PCT     GB
San Francisco        52  40  0.565     1     0  332  322  0.514  70  35.97  34.03   87.97   74.03  0.543   0.00
Arizona              49  43  0.533     2     3  416  407  0.510  70  35.70  34.30   84.70   77.30  0.523   3.28
Colorado             43  48  0.473     3   8.5  395  407  0.486  71  34.53  36.47   77.53   84.47  0.479  10.44
Los Angeles Dodgers  41  51  0.446     4    11  340  373  0.458  70  32.06  37.94   73.06   88.94  0.451  14.92
San Diego            40  52  0.435     5    12  304  338  0.452  70  31.63  38.37   71.63   90.37  0.442  16.34

Abbreviations:

  • W: Current team wins
  • L: Current team losses
  • PCT: Current winning percentage (wins divided by games played)
  • Place: Current place in standings
  • GB: Games Behind
  • RS: Total Runs scored by team
  • RA: Total Runs allowed by team
  • Pyth: Pythagorean expected win rate. Following MLB, I used an exponent of 1.82 rather than Bill James's original value of 2. It doesn't make a lot of difference, and it didn't change the order.
  • GL: Games left in season for the team
  • PW: Projected wins in remainder of season, assuming they win at the Pythagorean rate
  • PL: Projected Pythagorean losses in remainder of season
  • TW: Total wins, current + projected Pythagorean
  • TL: Total losses, current + projected Pythagorean
  • PCT: Projected final winning percentage
  • GB: Projected final games behind
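
The projection columns follow from a few lines of arithmetic. Here is a minimal sketch, not the spreadsheet actually used for the table: the function names are hypothetical, and it assumes a 162-game season and the 1.82 exponent mentioned above. It computes the Pythagorean rate and plays out the remaining games at that rate.

```python
# Minimal sketch of the All Star Break projection (hypothetical names).
SEASON_GAMES = 162  # games in an MLB regular season
EXPONENT = 1.82     # exponent used in this post; Bill James's original was 2


def pythagorean(rs, ra, exponent=EXPONENT):
    """Expected winning fraction from runs scored (rs) and runs allowed (ra)."""
    return rs**exponent / (rs**exponent + ra**exponent)


def project(w, l, rs, ra):
    """Play out the rest of the season at the team's Pythagorean rate."""
    pyth = pythagorean(rs, ra)
    gl = SEASON_GAMES - w - l   # GL: games left
    pw = gl * pyth              # PW: projected wins in the remaining games
    pl = gl - pw                # PL: projected losses in the remaining games
    tw, tl = w + pw, l + pl     # TW, TL: projected season totals
    return {"Pyth": pyth, "GL": gl, "PW": pw, "PL": pl,
            "TW": tw, "TL": tl, "PCT": tw / SEASON_GAMES}


# Yankees at the break: 53-35, 455 runs scored, 334 allowed.
yankees = project(53, 35, 455, 334)
for column, value in yankees.items():
    print(column, round(value, 3))
```

Run on the Yankees' row, it reproduces the table's Pyth, PW, TW and projected PCT to within rounding.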

OK, not a lot of changes going on. Despite an anemic offense, San Francisco's fantastic pitching will keep them in first in the NL West. Philadelphia will win the NL East going away, while Atlanta takes the NL Wild Card. Texas will hang on in the AL West.

There are a few predicted swaps: St. Louis will pull ahead of Milwaukee, and Cleveland will (yawn) edge out Detroit. Surprisingly, the only changes occur at the top, which probably says something about competitive balance in MLB.

And, finally, the Red Sox will be the AL Wild Card. Which means …

Frak
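
Spotting the predicted swaps is mechanical: sort each division by projected record and compare the new order with the current one. A sketch under the same assumptions as before (hypothetical names; the rows are copied from the AL Central part of the table), using the usual games-behind formula, half the difference in wins plus half the difference in losses:

```python
# Sketch: re-rank a division by projected record and flag place changes.
# Rows copied from the AL Central part of the table: (team, current place, TW, TL).
al_central = [
    ("Cleveland", 2, 83.85, 78.15),
    ("Detroit", 1, 83.39, 78.61),
    ("Chicago White Sox", 3, 77.55, 84.45),
    ("Minnesota", 4, 71.69, 90.31),
    ("Kansas City", 5, 68.94, 93.06),
]


def games_behind(leader_w, leader_l, w, l):
    """Usual games-behind formula: half the win gap plus half the loss gap."""
    return ((leader_w - w) + (l - leader_l)) / 2


# Sort by projected winning percentage, best first.
projected = sorted(al_central, key=lambda row: row[2] / (row[2] + row[3]), reverse=True)
_, _, lead_w, lead_l = projected[0]
swaps = []
for new_place, (team, old_place, tw, tl) in enumerate(projected, start=1):
    if new_place != old_place:
        swaps.append(team)
    gb = games_behind(lead_w, lead_l, tw, tl)
    print(f"{new_place}. {team:<18} GB {gb:5.2f}")

print("Predicted swaps:", ", ".join(swaps))
```

It flags Cleveland and Detroit and puts Detroit about 0.46 games back, matching the projected GB column.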

Saturday, July 02, 2011

The Home Team Wins Most Ties (In Baseball, Anyway)

I've been playing around with Retrosheet to see how often a baseball team wins a game if it's, say, five runs ahead at the end of the fourth inning. I plan to get that up sometime during this long weekend, but while doing the study I found another interesting result.

Retrosheet's Play-by-Play files, along with the cwevent program from Chadwick, let you extract all sorts of information from almost every MLB game played between 1950 and 2010. From that data I extracted every game that was tied at the end of a half-inning, and figured out who eventually won. Then I counted up the number of times the home team won for each half-inning. The results are shown below:
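
The counting step can be sketched in a few lines. The record format here is a hypothetical simplification, not cwevent's actual output: each game is reduced to the cumulative (visitor, home) score after every completed half-inning plus its final result, and the toy games are cut short at three innings. The sketch also assumes games that ended in a tie are thrown out at every half-inning, as the post does for the start-of-game numbers.

```python
from collections import defaultdict

# Hypothetical, simplified records (NOT cwevent's output format): the cumulative
# (visitor, home) score after each completed half-inning, and the final result,
# "H" (home win), "V" (visitor win) or "T" (tie). Toy games, shortened to 3 innings.
games = [
    {"scores": [(0, 0), (0, 1), (2, 1), (2, 2), (2, 2), (2, 3)], "result": "H"},
    {"scores": [(1, 0), (1, 1), (1, 1), (1, 1), (2, 1), (2, 1)], "result": "V"},
    {"scores": [(0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0)], "result": "T"},
]


def half_inning_label(i):
    """Index 0 is the end of the top of the 1st, 1 the end of the bottom of the 1st, ..."""
    return f"{'top' if i % 2 == 0 else 'bottom'} {i // 2 + 1}"


def home_wins_when_tied(games):
    """Map each half-inning to (home wins, decided games tied at its end)."""
    tied = defaultdict(int)
    home_wins = defaultdict(int)
    for game in games:
        if game["result"] == "T":  # throw out games that ended in a tie
            continue
        home_won = int(game["result"] == "H")
        tied["start"] += 1         # every game is tied 0-0 before the first pitch
        home_wins["start"] += home_won
        for i, (visitor, home) in enumerate(game["scores"]):
            if visitor == home:
                label = half_inning_label(i)
                tied[label] += 1
                home_wins[label] += home_won
    return {label: (home_wins[label], tied[label]) for label in tied}


for label, (won, n) in home_wins_when_tied(games).items():
    print(f"{label:>9}: home team won {won} of {n}")
```

Dividing each pair gives one diamond on the graph; "top" entries correspond to ties in the middle of an inning, "bottom" entries to ties at the end of one.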

Probability Home Team wins baseball game if it is tied at the end of a half-inning

The black diamond represents the situation at the start of the game, the red diamonds the situation where the game is tied in the middle of the inning, and the blue diamonds when it's tied at the end of an inning. We'll get to the error bars in a minute.

So what is all of this? Well, at the beginning of a game the score is obviously tied, so that should be part of the study. If we look at all 115,748 games in the database, we find:

  • The Home Team won 62,418 games,
  • the Visiting Team won 53,192 games, and
  • there were 138 games that were tied when the game was called.

If we throw out the ties, the Home Team won 53.990% of the games that went to a decision. That's the black diamond at the far left of the graph. The 54% win rate is baseball's version of Home Field Advantage, and it has been remarkably constant from year to year:

Probability Home Team wins baseball game in a given year
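
The arithmetic behind the black diamond is short enough to check directly. This sketch uses only the three counts listed above:

```python
# Start-of-game counts from the list above.
home_wins, visitor_wins, ties = 62_418, 53_192, 138

total = home_wins + visitor_wins + ties  # all games in the database
decided = home_wins + visitor_wins       # throw out the ties
p_home = home_wins / decided

print(f"{total:,} games; home team won {p_home:.3%} of decided games")
```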

I performed the same calculation for all the games that were tied at the end of a half-inning. For example, if the game is tied at the end of the fifth, the home team has a 52.0% chance of eventually winning the game. Tied after the top of the sixth? It's up to 60%.

So what do the error bars represent? Basically they give you an idea of the number of games in the sample. Suppose that in a given game the home team wins with probability p. Then in an N-game sample the probability that the home team wins n games follows the binomial distribution, i.e.

$$P(N,n) = \frac{N!}{n!\,(N-n)!}\, p^{n}\,(1-p)^{N-n}$$

If we look at a large number of N-game samples, then we'll find that on average the home team will win $Np$ games, which makes sense. The standard deviation will be $\sqrt{Np(1-p)}$. Since the graph normalizes everything by the number of games played, the error bars are the standard deviation divided by N, or $\sqrt{p(1-p)/N}$. Since most values of p are between 0.5 and 0.7, the factor p(1-p) barely changes (from 0.25 down to 0.21), so wider error bars basically tell you that fewer games have gotten to that point. And when the error bars get really wide, as they do after the fourteenth inning or so, it says there aren't enough statistics available to give you meaningful information.
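
A short sketch of the formulas above (the function names are hypothetical): the binomial probability, a numerical check that its mean and standard deviation come out to Np and the square root of Np(1-p), and the error bar on the win fraction. The last function just inverts the error-bar formula, which is one way to read "wider error bars mean fewer games" quantitatively.

```python
import math


def binomial_pmf(N, n, p):
    """Probability that the home team wins exactly n of N games."""
    return math.comb(N, n) * p**n * (1 - p)**(N - n)


def error_bar(p, N):
    """Standard deviation of the home team's win fraction n/N."""
    return math.sqrt(p * (1 - p) / N)


def games_for_error_bar(p, sigma):
    """Invert error_bar: roughly how many games give an error bar of sigma."""
    return p * (1 - p) / sigma**2


# Numerical check that the binomial mean and spread match N p and sqrt(N p (1-p)).
N, p = 200, 0.54
mean = sum(n * binomial_pmf(N, n, p) for n in range(N + 1))
var = sum((n - mean) ** 2 * binomial_pmf(N, n, p) for n in range(N + 1))
print(mean, N * p)
print(math.sqrt(var), math.sqrt(N * p * (1 - p)))
print(error_bar(p, N))
```

Because the error bar shrinks like one over the square root of N, halving it takes four times as many games.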

What does it all mean, you ask? Well, first it says that the home team advantage is real. Why there is a home field advantage is another question, and there is not enough information here to answer that question.

Then there's the observation that the home team has a larger advantage if the game is tied in the middle of an inning than if it is tied at the end of one. That's just common sense. In the middle of the fourth inning, the visiting team has five more innings at the plate. The home team has six: five for sure, and one more if they need it. This isn't the "home team bats last" advantage; it's the "home team gets one more turn at bat than the visitors" advantage, which is not the same thing.

Next, we see that if the game is tied at the end of an inning, the home team's advantage decreases slightly, so that at the end of eight innings it's only 51.94%. Presumably that's because the home team does have some advantage in being at home, but as the game progresses they have less and less chance to use that advantage. After the fifth inning the home team's advantage oscillates around 52%, down from the 54% advantage they had at the start of the game.

Indeed, the fact that the blue diamonds go down from the first inning through the fourth suggests that the "home team bats last" advantage isn't worth a whole lot. If it were, you'd expect the advantage to grow in tie-game situations as the game wears on, because that last at-bat becomes a larger and larger proportion of what's left of the game.

Finally, there is that dip in the red diamonds between the first and second innings. If the game is tied going into the bottom of the first, meaning that the visiting team didn't score, then the home team will win 59.15% of the time, with a standard deviation of 0.17%. If the game is tied going into the bottom of the second, however, the home team has only a 58.01% chance of winning, with σ = 0.22%. That 1.14-point drop is about five times the larger of the two standard deviations, which I would think is statistically significant. The win rate is pretty much constant in the third inning (58.16% ± 0.27%) and then starts going up, as you'd expect, since in the middle of an inning the home team has proportionately more at-bats left than the visitors.

Why is this so? I have no idea. All I can say is that if you're a visiting baseball team, it's better to be tied with the home team in the middle of the second inning than it is to be tied before the home team first comes to bat. And you'd better be ahead by the middle of the third.
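
The five-σ figure for the dip compares the 1.14-point drop with the larger of the two standard deviations. Here is a quick check of that arithmetic, alongside a more conservative version that adds the two standard deviations in quadrature. The quadrature version assumes the two samples are independent, which is only an approximation, since many games appear in both.

```python
import math

p1, s1 = 0.5915, 0.0017  # tied going into the bottom of the 1st
p2, s2 = 0.5801, 0.0022  # tied going into the bottom of the 2nd

diff = p1 - p2
z_vs_larger = diff / max(s1, s2)        # measured against the larger error bar
z_combined = diff / math.hypot(s1, s2)  # errors added in quadrature (assumes independence)

print(f"drop = {diff:.2%}: {z_vs_larger:.1f} sigma vs. larger bar, "
      f"{z_combined:.1f} sigma combined")
```

Even the conservative version comes out above four σ.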

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at www.retrosheet.org.