
Analyzing your Keen data with Jupyter Notebooks

At Keen, we provide the Explorer to help our customers quickly analyze and visualize their data. Sometimes, though, the Explorer can’t slice or chart the data exactly the way we want. When I hit that limit while analyzing Keen data, I turn to Python.

One of the tools I use regularly to analyze our own data is Jupyter Notebook, a browser-based application that lets you run Python code interactively and render visualizations inline. It lets me tweak and visualize several queries at once, and it makes the results easy to share: if teammates need a slightly different analysis, they can copy my notebook and modify it to fit their needs. A notebook also doubles as documentation of how we’re analyzing our data.

Over the next few weeks, I’m going to share some of the ways I use Python and Jupyter Notebook to analyze and visualize my data. In this post, we’re going to focus on setting up Jupyter Notebook and making queries with Keen’s Python client.

Setup

Follow the installation instructions on the Jupyter website. Once Jupyter Notebook is installed, use pip to install the Keen Python client:

pip install keen
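Jupyter kernels don’t always run in the same Python environment that your shell’s pip installs into. A minimal check you can run in a notebook cell, using only the standard library (the helper name is_installed is our own), confirms that the keen package is importable from the kernel:

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if package_name can be imported by the current interpreter."""
    return importlib.util.find_spec(package_name) is not None


# sys.executable shows which Python interpreter the notebook kernel is using.
print(sys.executable)
print("keen installed:", is_installed("keen"))
```

If this prints False, the package most likely went into a different environment; in recent IPython versions, running %pip install keen in a notebook cell installs into the kernel’s own environment.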

Getting Data

We’re going to look at earthquake data that I’ve been storing in a Keen project. Run jupyter notebook from your command line, then create a new notebook in the browser window that opens. In the first cell, set up a client object so we can query the data, using the project ID and read key provided in the example below.

from keen.client import KeenClient

# Project ID and read key for the example earthquake project
KEEN_PROJECT_ID = "572dfdae3831443195b2f30c"
KEEN_READ_KEY = "5de7f166da2e36f6c8617347a7a729cfda6d5413db8d88d7f696b61ddaa4fe1e5cdb7d019de9bb0ac846d91e83cdac01e973585d0fba43fadf92f06a695558b890665da824a0cf6a946ac09f5746c9102d228a1165323fdd0c52c92b80e78eca"
client = KeenClient(
    project_id=KEEN_PROJECT_ID,
    read_key=KEEN_READ_KEY
)
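Hard-coding keys is fine for this public example project, but if you share notebooks that query your own data, you may prefer to keep keys out of the notebook. A minimal sketch, assuming you export environment variables named KEEN_PROJECT_ID and KEEN_READ_KEY before launching Jupyter (those names are our choice here, and keen_credentials_from_env is a hypothetical helper):

```python
import os


def keen_credentials_from_env(environ=os.environ):
    """Read Keen credentials from environment variables (names assumed).

    Raises KeyError naming the missing variable, so a misconfigured
    notebook fails loudly instead of sending queries without a key.
    """
    return {
        "project_id": environ["KEEN_PROJECT_ID"],
        "read_key": environ["KEEN_READ_KEY"],
    }


# In a notebook cell, with keen installed:
# from keen.client import KeenClient
# client = KeenClient(**keen_credentials_from_env())
```

Passing the mapping in as a parameter (defaulting to os.environ) keeps the helper easy to check without touching your real environment.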

Run a simple query to check that everything’s working. The example below counts the unique earthquakes (by their id property) in the time range we’ll be working with, October 2016 through February 2017.

total_earthquakes = client.count_unique("earthquakes",
    timeframe={
        "start": "2016-10-01 00:00:00+00:00",
        "end": "2017-02-28 00:00:00+00:00"
    },
    target_property="id"
)
print(total_earthquakes)  # 43782
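The timeframe argument above is an absolute range: a dict with start and end timestamps. Typing those strings by hand makes it easy to drop a leading zero or the UTC offset, so one option is to build them from datetime objects. A minimal sketch; absolute_timeframe is a hypothetical helper, and it assumes Keen accepts the ISO 8601 strings that datetime.isoformat() produces:

```python
from datetime import datetime, timezone


def absolute_timeframe(start, end):
    """Build a Keen-style absolute timeframe dict from two timezone-aware datetimes."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes so the UTC offset is explicit")
    if end <= start:
        raise ValueError("end must be after start")
    return {"start": start.isoformat(), "end": end.isoformat()}


timeframe = absolute_timeframe(
    datetime(2016, 10, 1, tzinfo=timezone.utc),
    datetime(2017, 2, 28, tzinfo=timezone.utc),
)
print(timeframe)
# {'start': '2016-10-01T00:00:00+00:00', 'end': '2017-02-28T00:00:00+00:00'}
```

The resulting dict can be passed as timeframe=timeframe to the same count_unique call.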

That’s it! We can run any type of Keen query from our Python environment. For instance, to get the number of earthquakes per day, all we need to do is add an interval argument to the query above:

earthquakes_by_day = client.count_unique("earthquakes",
    timeframe={
        "start": "2016-10-01 00:00:00+00:00",
        "end": "2017-02-28 00:00:00+00:00"
    },
    target_property="id",
    interval="daily"
)
print(earthquakes_by_day[:3])  # show the first three daily results
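With interval set, the query returns a list instead of a single number: one dict per interval, each holding a timeframe (with start and end) and a value. The sketch below uses a couple of made-up rows in that shape (the numbers are illustrative, not real results) and turns them into a date-to-count mapping, which is handy for lookups. counts_by_date is a hypothetical helper, and slicing the first ten characters assumes the start timestamps are ISO 8601 strings:

```python
def counts_by_date(interval_results):
    """Map each interval's start date (YYYY-MM-DD) to its value."""
    return {
        row["timeframe"]["start"][:10]: row["value"]
        for row in interval_results
    }


# Illustrative rows in the shape of a daily interval result (values are made up).
sample = [
    {"timeframe": {"start": "2016-10-01T00:00:00.000Z", "end": "2016-10-02T00:00:00.000Z"}, "value": 281},
    {"timeframe": {"start": "2016-10-02T00:00:00.000Z", "end": "2016-10-03T00:00:00.000Z"}, "value": 307},
]
print(counts_by_date(sample))
# {'2016-10-01': 281, '2016-10-02': 307}
```

In the notebook you would call counts_by_date(earthquakes_by_day) instead of using the sample rows.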

Because the results are plain Python objects (a list of dictionaries), we can use ordinary Python to answer questions about our data quickly, such as which days had the most and the fewest earthquakes.

print(max(earthquakes_by_day, key=lambda x: x["value"]))
print(min(earthquakes_by_day, key=lambda x: x["value"]))
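The same approach extends to other summary statistics. A minimal sketch using the standard library’s statistics module; daily_summary is a hypothetical helper, and the sample rows are made up and trimmed to the fields the helper reads (in the notebook you would pass earthquakes_by_day):

```python
import statistics


def daily_summary(interval_results):
    """Summarize per-interval values from a Keen interval query result."""
    values = [row["value"] for row in interval_results]
    busiest = max(interval_results, key=lambda row: row["value"])
    quietest = min(interval_results, key=lambda row: row["value"])
    return {
        "days": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "busiest_day": busiest["timeframe"]["start"],
        "quietest_day": quietest["timeframe"]["start"],
    }


# Made-up rows, trimmed to the fields daily_summary uses.
sample = [
    {"timeframe": {"start": "2016-10-01T00:00:00.000Z"}, "value": 280},
    {"timeframe": {"start": "2016-10-02T00:00:00.000Z"}, "value": 310},
    {"timeframe": {"start": "2016-10-03T00:00:00.000Z"}, "value": 250},
]
print(daily_summary(sample))
```

If two days tie for the highest or lowest count, max and min return whichever appears first in the list.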

We’re only scratching the surface of what we can do with Python and Keen IO. You can clone this project and use it to analyze your own Keen IO datasets! Next time, we’ll talk about how to visualize the data we get back from Keen using Matplotlib.