
Who’s Ready for Some FOOTBALL (Data)?

If you live in America right now, you’re probably aware that the Super Bowl is about to happen. If you work in technology and happen to live in the Bay Area, as many of our readers do, then you’ve been inundated with this news. Did you know that the 49ers were playing? And that the Giants won the World Series? And wouldn’t it be cool if both the Giants and the 49ers won championships in the same year?

The answer to all these questions is, of course: OF COURSE! I’m excited to see my 49ers stomp all over Ray Lewis and the Ravens. But rather than get completely distracted by the game and by the huge amounts of gross food I’m going to eat, I thought I’d take a few minutes to crunch through some football data using Keen IO and share some interesting tidbits.

My first step was to find a source of data, which I did at Armchair Analysis. These guys provide a sweet data set covering every NFL play for seasons dating back to 2000. Since I’m cheap, I opted for the free data set, which covers plays from 2000 through 2011. The data is awesome: it tells you not only what happened on each play, but also what yard line it started on, who was playing, where it was played, what the temperature was, what penalties happened, how far a kickoff went, the year a player was drafted, and more. It’s super cool.

I loaded it into Keen, which was pretty easy once I had denormalized the data. I treated each play as a discrete event. If you want the code I used, let me know on Twitter!
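Denormalizing here means folding the game-level details (where it was played, the temperature, and so on) into every play row, so each play can stand on its own as one event. I didn't share the real code, so here's a minimal sketch of the idea. The table layouts, the `gid` join key, and every field name below are hypothetical stand-ins, not Armchair Analysis's actual schema, and the values are made up.

```python
# Hypothetical, made-up tables; the real Armchair Analysis columns differ.
games = {
    1: {"season": 2011, "stadium": "Home Field", "temperature": 58},
}
plays = [
    {"pid": 100, "gid": 1, "offense": "SF", "play_type": "PASS", "yards_gained": 12},
    {"pid": 101, "gid": 1, "offense": "SF", "play_type": "RUSH", "yards_gained": 3},
]

def denormalize(plays, games):
    """Return one flat event dict per play, with its game's fields merged in."""
    events = []
    for play in plays:
        event = dict(games[play["gid"]])  # copy the game-level fields
        event.update(play)                # play-level fields win on any conflict
        events.append(event)
    return events

events = denormalize(plays, games)
print(events[0]["temperature"], events[0]["play_type"])  # 58 PASS
```

Each dict in `events` is the kind of self-contained record that could then be sent to Keen as a single event; the upload step itself isn't sketched here.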

Once that was done, it was time to start crunching (data, not abs — I’m the one doing the analysis, not the one playing, remember?).

Let’s start with something basic: how many plays were run in the NFL over the seasons in this data set? It’s trivial to find out with a Keen IO count query.
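Conceptually, a count query with no filters just tallies the events in a collection. To make that concrete, here's a tiny local stand-in over a few made-up play events. This is not the Keen IO SDK call used for the real numbers, and the `offense` and `play_type` values are hypothetical sample data.

```python
# Made-up sample events standing in for the real collection of plays.
plays = [
    {"offense": "SF", "play_type": "PASS"},
    {"offense": "SF", "play_type": "RUSH"},
    {"offense": "BAL", "play_type": "PASS"},
]

def count(events):
    """Local equivalent of an unfiltered count: the number of events."""
    return len(events)

print(count(plays))  # 3
```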

The answer: just a tad over 518,000 plays in total. Wow! That’s fun, but I only care about the 49ers, so let’s see how many plays they ran on offense. All we have to do is add a quick filter to our query.
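A filter just narrows the set of events before counting. The sketch below mimics that locally. The filter dict shape (`property_name`, `operator`, `property_value`) is only loosely modeled on Keen's filter objects, so treat it as an assumption, and the `offense` property is a hypothetical name for whichever field records the team with the ball.

```python
# Made-up sample events; "offense" is a hypothetical property name.
plays = [
    {"offense": "SF", "play_type": "PASS"},
    {"offense": "SF", "play_type": "RUSH"},
    {"offense": "BAL", "play_type": "PASS"},
]

def matches(event, f):
    """Apply one filter; only an equality operator ("eq") is sketched here."""
    if f["operator"] != "eq":
        raise ValueError("unsupported operator: " + f["operator"])
    return event.get(f["property_name"]) == f["property_value"]

def count(events, filters=()):
    """Count events that satisfy every filter (filters are ANDed together)."""
    return sum(1 for e in events if all(matches(e, f) for f in filters))

niners_offense = [{"property_name": "offense", "operator": "eq", "property_value": "SF"}]
print(count(plays, niners_offense))  # 2
```

With no filters, `count` falls back to counting everything, which is why adding the filter is the only change needed.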

Okay, so the 49ers ran a bit over 15,700 offensive plays. I wonder which kinds of plays they run most often? Let’s see how often they rush and how often they pass. All we have to do to our Keen IO query is add a “group_by” parameter on the “play_type” property.
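Grouping turns one total into one count per distinct value of a property. Here's a local sketch of a filtered count grouped by “play_type”, using `collections.Counter`. The sample events are made up, and passing the filter as a plain predicate function is a simplification for illustration, not how Keen expresses filters.

```python
from collections import Counter

# Made-up sample events; "offense" is a hypothetical property name.
plays = [
    {"offense": "SF", "play_type": "PASS"},
    {"offense": "SF", "play_type": "PASS"},
    {"offense": "SF", "play_type": "RUSH"},
    {"offense": "BAL", "play_type": "RUSH"},
]

def count_by(events, group_by, where=lambda e: True):
    """Count the events that pass `where`, grouped by one property's value."""
    return Counter(e.get(group_by) for e in events if where(e))

result = count_by(plays, "play_type", where=lambda e: e["offense"] == "SF")
print(dict(result))  # {'PASS': 2, 'RUSH': 1}
```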

Interesting. The 49ers ran significantly more pass plays (6,739) than rushing plays (5,312). Now I’m curious about the average yardage gained, broken down by play type. Let’s change our query to an average of the property called “yards_gained”.
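An average grouped by play type is just a per-group sum divided by a per-group count. The local sketch below skips any event where the target property isn't defined. That skipping is an assumption chosen to match the behavior described in this post, not a documented guarantee about Keen, and the sample events are made up.

```python
from collections import defaultdict

# Made-up sample events; the third pass has no "yards_gained" at all.
plays = [
    {"offense": "SF", "play_type": "PASS", "yards_gained": 10},
    {"offense": "SF", "play_type": "PASS", "yards_gained": 6},
    {"offense": "SF", "play_type": "PASS"},
    {"offense": "SF", "play_type": "RUSH", "yards_gained": 4},
]

def average_by(events, target_property, group_by):
    """Average target_property per group, skipping events that lack it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for e in events:
        if target_property in e:
            totals[e[group_by]] += e[target_property]
            counts[e[group_by]] += 1
    return {k: totals[k] / counts[k] for k in totals}

print(average_by(plays, "yards_gained", "play_type"))  # {'PASS': 8.0, 'RUSH': 4.0}
```

Notice that the pass with no “yards_gained” is silently left out, so it doesn't pull the PASS average down at all.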

Cool! Now we know that the 49ers gained, on average, about 9.1 yards per pass and 4.3 yards per rush. That passing average seems very high, so let’s dig into the data source a bit and figure out why. If you look closely, you can see that the “yards_gained” property is only defined on completed passes! So our average basically leaves out incomplete passes. Let’s see how many passes were complete and how many were incomplete.
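Since “yards_gained” only shows up on completions, one way to split passes into complete and incomplete is to check whether that property is present. This is a local sketch built on that observation, not necessarily how the real query was written (the data set may well have a dedicated completion field). The sample passes are made up.

```python
from collections import Counter

# Made-up sample passes; a 0-yard completion still has "yards_gained" set.
passes = [
    {"play_type": "PASS", "yards_gained": 10},
    {"play_type": "PASS"},
    {"play_type": "PASS", "yards_gained": 0},
    {"play_type": "PASS"},
]

def completion_counts(passes):
    """Label each pass complete/incomplete by whether yards_gained is defined."""
    return Counter(
        "complete" if "yards_gained" in p else "incomplete" for p in passes
    )

print(dict(completion_counts(passes)))  # {'complete': 2, 'incomplete': 2}
```

Checking for presence (`in`) rather than truthiness matters here: a completion for zero yards would otherwise be miscounted as incomplete.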

That makes more sense. Out of the 6,739 passes, only 3,686 were completed. Since our 9.1-yard average covers just those 3,686 completions, we can spread the same total yardage over all 6,739 attempts to get an average that counts every incomplete pass as a zero-yard play. Let me bust out my trusty TI-89 here… and the REAL average is 5.0 yards. I can buy that!
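If you don't have a TI-89 handy, here's the same back-of-the-envelope arithmetic in Python. It uses only the numbers quoted in the post. Because 9.1 is itself a rounded average, treat the recovered total yardage as approximate.

```python
completions = 3686
attempts = 6739
avg_per_completion = 9.1  # rounded average yards over completed passes only

# Recover (approximately) the total passing yards behind that average.
total_yards = avg_per_completion * completions  # about 33,543 yards

# Spread those yards over every attempt; incompletions contribute 0 yards.
avg_per_attempt = total_yards / attempts
print(round(avg_per_attempt, 1))  # 5.0
```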

I’m going to stop here and turn it over to you. I’ve created a Keen IO account that anybody can use: keennfl@mailinator.com / goniners. Log in with these credentials to use our API Workbench, where you can start exploring the data I’ve uploaded! Note that I’ve disabled ADDING events to this project, so none of you naughty folks out there can mess up this lovely data set.

P.S. View source on this page to see the JavaScript I wrote to show these charts to you. They’ve all been rendered using our Keen IO JavaScript SDK and visualization libraries!