I have been interested in the game flow and dynamics of NBA games for a while, and so have decided to start a series of blog posts that will hopefully culminate in a web app that processes and displays game flows for present and historical games. Before that, however, I wanted to sink my teeth into the data and run a few analyses that may be of interest.
To start off, I wrote a Python script that scraped all available play-by-play data for all games played between the 2001 and 2014 seasons. The output was stored in a SQL database of ~450 MB, which translates to a 7,364,787 x 8 matrix when read into memory. With the data now available for analysis, we can proceed towards producing some visualizations. First, I set out to explore the score differential for each team, which I figured would be an interesting proxy for overall performance. In this blog post, I look at the average number of points that teams were ahead or behind when playing at home or away.
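As a rough illustration of the "read into memory" step, here is a minimal sketch using Python's built-in `sqlite3` module. The post does not say which SQL engine or schema was used, so SQLite, the table name `play_by_play` and all eight column names are assumptions, and the three rows are made-up stand-ins for the real data.

```python
import sqlite3

# Hypothetical column names: the post only states that the table has 8 columns.
COLUMNS = (
    "game_id", "season", "period", "clock",
    "home_team", "away_team", "home_score", "away_score",
)


def build_demo_db():
    """Create a tiny in-memory stand-in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE play_by_play ("
        "game_id TEXT, season INTEGER, period INTEGER, clock TEXT, "
        "home_team TEXT, away_team TEXT, home_score INTEGER, away_score INTEGER)"
    )
    # Made-up rows, for illustration only.
    demo_rows = [
        ("g1", 2014, 1, "11:40", "SAS", "NYK", 2, 0),
        ("g1", 2014, 1, "11:12", "SAS", "NYK", 2, 3),
        ("g1", 2014, 1, "10:51", "SAS", "NYK", 5, 3),
    ]
    conn.executemany(
        "INSERT INTO play_by_play VALUES (?, ?, ?, ?, ?, ?, ?, ?)", demo_rows
    )
    conn.commit()
    return conn


def load_play_by_play(conn):
    """Read every row into memory as a list of 8-element tuples."""
    query = "SELECT " + ", ".join(COLUMNS) + " FROM play_by_play ORDER BY rowid"
    return conn.execute(query).fetchall()


rows = load_play_by_play(build_demo_db())
print(len(rows), "rows x", len(rows[0]), "columns")
```

With the real database, `build_demo_db()` would be replaced by a connection to the file on disk, and the row count would be in the millions rather than three.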
Here, the score differential attributed to a team during any given season was simply calculated as the average score difference observed over all scoring events. More formally, for a team playing at home, we can write this as:
$$\bar{D} = \frac{1}{N} \sum_{i=1}^{N} \left( H_i - A_i \right)$$
where $N$ is the total number of scoring events played by a team at home or away, and $H_i$ and $A_i$ are the scores for the home and away team at scoring event $i$, respectively. For a team playing away, the difference is taken as $A_i - H_i$, so that the sign always reflects that team's point of view. Therefore, a positive $\bar{D}$ indicates that a team tends to be in the lead, while a negative $\bar{D}$ indicates that it tends to be behind. The first chart below shows the average score differential for teams playing on their home court.
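The averaging described above can be sketched in a few lines of plain Python. This is only an illustration, not the script used for the charts: the function name `average_score_differential`, the tuple layout of each event and the sample events are all assumptions.

```python
from collections import defaultdict


def average_score_differential(events, venue="home"):
    """Mean score difference per (team, season), seen from the chosen venue.

    `events` is an iterable of hypothetical tuples
    (season, home_team, away_team, home_score, away_score),
    one per scoring event.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for season, home_team, away_team, home_score, away_score in events:
        if venue == "home":
            key, diff = (home_team, season), home_score - away_score
        elif venue == "away":
            # Flip the sign so that positive always means "this team leads".
            key, diff = (away_team, season), away_score - home_score
        else:
            raise ValueError("venue must be 'home' or 'away'")
        totals[key] += diff
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up scoring events, for illustration only.
events = [
    (2014, "SAS", "NYK", 2, 0),
    (2014, "SAS", "NYK", 2, 3),
    (2014, "SAS", "NYK", 5, 3),
    (2014, "NYK", "SAS", 0, 2),
]
home = average_score_differential(events, venue="home")
away = average_score_differential(events, venue="away")
print(home)
print(away)
```

Note that every scoring event gets equal weight here, so a team that leads for a long stretch with few baskets scored counts less than the game clock alone would suggest.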
Average score differential for teams when playing at home
One cool thing about these plots is that they make it easy to see which teams have been consistently excellent at home. For example, we can see that the San Antonio Spurs and Dallas Mavericks are two teams that have achieved positive score differentials all the way from 2000 to 2014 when playing at home. Could it be a Texas thing?
Strong performance at home for the Spurs and Mavericks
Next, we can look at the score differentials achieved by teams playing away. In this case, we see that San Antonio is again the team that plays the best when away from home. Among the historically worst teams on the road, we have New York, Toronto, Golden State and Utah.
Average score differential for teams playing away
Worst performing teams away
The charts below show the advantages that stem from teams playing on their home court. Surprisingly, the average score differential per season (both away and at home) was not correlated with how many wins were achieved by each team. This is clearly shown in the two plots below. I would guess that this is due to the fact that many teams go on significant runs that may actually supersede the fact that they are not necessarily in the lead all the time.
Correlation between home score differential and win shares
Correlation between away score differential and win shares
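For readers who want to check this kind of relationship themselves, a Pearson correlation coefficient is one simple way to quantify it. The sketch below is an assumption about how one might do so, not the method behind the plots above; the helper `pearson_r` is hypothetical and the numbers are synthetic, chosen only to show a strong and a zero correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Synthetic values, for illustration only (not taken from the NBA data).
differentials = [1.0, 2.0, 3.0, 4.0]
perfectly_related = [10.0, 20.0, 30.0, 40.0]
unrelated = [1.0, -1.0, -1.0, 1.0]

r_related = pearson_r(differentials, perfectly_related)
r_unrelated = pearson_r(differentials, unrelated)
print(round(r_related, 3), round(r_unrelated, 3))
```

In practice, `xs` would hold one average score differential per team and season and `ys` the matching win measure; a coefficient near zero would be consistent with the lack of correlation described above.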
In this first part, I have looked at the average number of points by which teams lead or trail across all games played in a season. I have seen other examples online of game flow charts that typically display the score differential over the course of a game. While this way of looking at game flow is interesting, it is somewhat limited. In the upcoming part of this blog series, I will look at scoring streaks, proportion of time in the lead, and largest leads.