Get Play-By-Play NBA SportVU Data

SportVU is a video system made by STATS Inc, used by teams to track the movement of players and the ball on the court. I looked, but couldn’t find anything describing how to get play-by-play data, so I did some investigating myself. The great has every game here. Clicking on any of the linked values in the table brings up a popup with an option ‘Movement’:

NBA play-by-play SportVU data

Inspecting the network request that goes out when clicking on Movement shows us that it fires off an AJAX request to We can infer that eventid=2 refers to field goal attempts, and gameid=0021500003 likely refers to the third game in 2015. Indeed, changing the gameid to 0021400003 returns data for a game dated October 28, 2014, between the Rockets and Lakers.

Once you get the eventid of the type you want, you could theoretically enumerate all the games in a season this way, or across multiple seasons. The response comes in JSON format, and Savvas Tjortjoglou nicely detailed that in this post. Happy number crunching!
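If you want to enumerate games, here is a minimal sketch of building candidate gameid values. It assumes the pattern inferred above holds: a fixed 002 prefix, the two-digit season year, then a zero-padded five-digit game number. The function names and the default of 1230 games (30 teams playing 82 games each) are my own choices for illustration, not anything the site documents.

```python
def game_id(season_start_year, game_number):
    """Build a gameid string such as '0021500003'.

    Assumes the inferred layout: '002' + two-digit year + five-digit game number.
    """
    return f"002{season_start_year % 100:02d}{game_number:05d}"


def season_game_ids(season_start_year, n_games=1230):
    """Yield candidate gameids for one season (n_games is an assumption)."""
    for number in range(1, n_games + 1):
        yield game_id(season_start_year, number)


if __name__ == "__main__":
    # Print the first three candidate ids for the 2015 season.
    for gid in list(season_game_ids(2015))[:3]:
        print(gid)
```

Each id would then go into the gameid parameter of the request described above. Not every candidate is guaranteed to return data.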

Unshorten URLs with

Obfuscated URLs stink. Being able to hover over a link and see where it goes is a tremendous feature of any modern browser. Use unshorten on your command line to see where shortened links go. The project is on GitHub. Put somewhere on your path and make it executable. You don’t even have to leave the .py on the end!

$ git clone
$ chmod u+x
$ mv /usr/local/bin/unshorten
$ unshorten

Hopefully it’s useful.
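For the curious, the general idea behind unshortening can be sketched in a few lines of standard-library Python. This is not necessarily how the unshorten script itself works; it just relies on urllib following HTTP redirects and reporting the final URL. The opener parameter is a hypothetical hook added here so the function can be exercised without network access.

```python
import urllib.request


def unshorten(url, opener=urllib.request.urlopen, timeout=10):
    """Return the URL we end up at after following any HTTP redirects."""
    # urlopen follows redirects by default; geturl() reports the final location.
    with opener(url, timeout=timeout) as response:
        return response.geturl()
```

Calling unshorten() on a shortened link should hand back the destination, assuming the shortener uses ordinary HTTP redirects rather than JavaScript or a meta refresh.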

Multi-line statements with PLY

I’ve been experimenting with PLY and was encountering an obscure error that was surprisingly Google-resistant. I’ve been building up a language and a REPL to interpret commands. When trying to parse a statement spanning multiple lines, I would get an error {TypeError}Can't convert 'type' object to str implicitly. Debugging revealed the parser was running into an EOF. The usual sources online didn’t appear to have any information about it, either.

It turns out the problem wasn’t PLY, but Python’s input(). It only reads one line at a time. So even if you paste in

def foo(bar):
    return bar + 10

input() will only pass along the string 'def foo(bar):'. As a workaround, try something like this:

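import ply.yacc as yacc

def evaluate(string):
    # your eval function
    pass

parser = yacc.yacc()

while True:
    string = ''
    line = input('reap> ')
    # keep reading until the user enters an empty line
    while line != '':
        string += line + '\n'
        line = input()
    evaluate(string)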

There’s your read-eval-print loop, spanning multiple lines of user input. When you’ve finished typing a block, hit enter twice (the second one produces the empty line that ends the block), and you should be good to go.

US states and words with no letters in common

“Ohio is the only state not to share a letter with the word mackerel” my buddy tells me.

Of course, I need to find out what other words don’t share any letters with states. So I start hunting for a large English dictionary. It turns out the Brown corpus, included in NLTK, has ~1 million words. So I whip up a Python script to figure out what state-word combos pass. The code lives here.

Spoiler alert: apparently there are 38969. At least. That’s just words from the particular corpus I used. Ohio is the most popular (least popular?) state, clocking in at 1085. Next closest is Mississippi, at 678. Third is Alabama with 599. Ohio doesn’t have any A’s or E’s, so that helps.
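The core check is just set disjointness. Here’s a minimal sketch of it, not the actual script linked above: the function names are made up for illustration, and it uses a tiny hand-typed word list instead of the Brown corpus so it runs without NLTK.

```python
def shares_no_letters(state, word):
    """True if the state name and the word have no letters in common."""
    state_letters = set(state.lower()) - {" "}  # ignore spaces in names like "New York"
    return state_letters.isdisjoint(word.lower())


def disjoint_pairs(states, words):
    """Return every (state, word) pair with no letters in common."""
    return [(s, w) for s in states for w in words if shares_no_letters(s, w)]


if __name__ == "__main__":
    states = ["Ohio", "Mississippi", "Alabama"]  # just the three mentioned above
    words = ["mackerel", "fun"]                  # stand-in for a real corpus
    for state, word in disjoint_pairs(states, words):
        print(state, word)
```

Swapping in the full list of states and the corpus vocabulary is all it takes to scale this up. The pair count will depend on how you deduplicate and normalize the words.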

NBA winning percentages

During a Milwaukee Bucks broadcast on February 11, Jon McGlocklin mentioned something to the effect of “the first team to get to 100 points usually wins.” Intuitively, this makes some sense. If you’re the first team to a given score, you have the lead at that point. 100 points is usually a late-game score, and having the lead near the end of the game makes it more likely you are going to win.

I was curious as to whether the data backed this up. I collected play-by-play scoring data for the 2013-2014 season, and ran some analysis on it. First, the number of times the first team to X points won or lost:
point totals

That’s a little hard to read in the higher-scoring (interesting) portion of the graph. Here’s the winning percentages plotted as a function of score:
point ratios

Some notes:

  • There were 1319 games played, including playoffs.
  • Teams win at a .932 clip for scoring 100 first. Not a bad rule of thumb!
  • On the other hand: at no point does being first to a score give you a sub-.500 winning percentage. So you could also say “the first team to 1 point usually wins”.
  • The worst is 1 or 2 points, yielding wins at .547. Even getting to 3 first improves your chances a bit, an extra percent and a half, all the way up to .562.
  • The highest point total was 145: the Rockets beat the Lakers on the road on April 8. The game didn’t even go to overtime!
  • There is a curious bump in the ratio at 121. This is an artifact of not many teams scoring that many and still losing - only 3 to 5 losses in that range. Makes for a noisy signal.
  • No team scoring 128 or more lost.
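For anyone who wants to reproduce this kind of number, here’s a rough sketch of the calculation. It is not the exact code I ran: it assumes each game is stored as a chronological list of (team, points scored) events plus the name of the winner, which is a made-up format, and it counts a team as “first to X” as soon as its total reaches or passes X.

```python
from collections import defaultdict


def first_to(events, threshold):
    """Return the team whose running total first reaches threshold, or None."""
    totals = defaultdict(int)
    for team, points in events:
        totals[team] += points
        if totals[team] >= threshold:
            return team
    return None


def win_rate_when_first(games, threshold):
    """Fraction of games won by the first team to reach threshold.

    games is an iterable of (events, winner) pairs (hypothetical format).
    Games where nobody reaches the threshold are skipped.
    """
    wins = reached = 0
    for events, winner in games:
        team = first_to(events, threshold)
        if team is None:
            continue
        reached += 1
        wins += team == winner
    return wins / reached if reached else None
```

Running win_rate_when_first for every threshold from 1 up to the season’s highest score gives one point per threshold on a ratio plot like the one above.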

The Bucks were first to 100 that night, and beat the Kings 111-103 :)

Thanks to for the play-by-play data.
