Data Extractions

Data Extractions allow you to get your event data out of Keen IO. We strongly believe that you should always have full access to all of your data, and we aim to make extracting it as simple and painless as possible.

The following types of Data Extractions are currently supported in the Keen Analysis API:

Extraction API

To use the Extraction API, make an HTTP GET request like this:

https://api.keen.io/3.0/projects/<project_id>/queries/extraction?api_key=<read_key>&event_collection=<event_collection>

Extractions take the following parameters:

  • api_key (optional) - The API Key for the project containing the data you are analyzing. The API key can alternatively be provided in the request header. See Authentication for more information.
  • event_collection (required) - The name of the event collection you are analyzing.
  • filters (optional) - Filters are used to narrow down the events used in an analysis request based on event property values.
  • timeframe (optional) - A Timeframe specifies the events to use for analysis based on a window of time.
  • email (optional) - If an email address is specified, an email will be sent to it when your extraction is ready for download. If email is not specified, your extraction will be processed synchronously and your data will be returned as JSON.
  • latest (optional) - An integer containing the number of most recent events to extract.
  • property_names (optional) - A URL-encoded array of strings containing properties you wish to extract. If this parameter is omitted, all properties will be returned.
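As a rough illustration of how these parameters combine into a request URL, here is a minimal Python sketch. The helper name `build_extraction_url` and the placeholder values (`PROJECT_ID`, `READ_KEY`, `purchases`) are assumptions made for this example and are not part of the Keen API. It covers only `latest`, `property_names`, and `email`; `filters` and `timeframe` are left out because their formats are described elsewhere.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.keen.io/3.0/projects/{project_id}/queries/extraction"


def build_extraction_url(project_id, read_key, event_collection,
                         latest=None, property_names=None, email=None):
    """Return an extraction URL; optional parameters left as None are omitted."""
    params = {"api_key": read_key, "event_collection": event_collection}
    if latest is not None:
        params["latest"] = int(latest)
    if property_names is not None:
        # Serialize as a compact JSON array; urlencode() percent-encodes it.
        params["property_names"] = json.dumps(list(property_names),
                                              separators=(",", ":"))
    if email is not None:
        params["email"] = email
    return BASE_URL.format(project_id=project_id) + "?" + urlencode(params)


# Hypothetical project, key, and collection names.
url = build_extraction_url("PROJECT_ID", "READ_KEY", "purchases", latest=10)
print(url)
```

Building the query string with `urlencode` rather than by hand keeps values such as email addresses and JSON arrays correctly escaped.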

Data Extraction to JSON

Extracting your data to JSON is a synchronous operation. That means that the data is returned in the response of the API call. All extraction requests are of this type unless the email parameter is included.

https://api.keen.io/3.0/projects/<project_id>/queries/extraction?api_key=<read_key>&event_collection=<event_collection>

Limits

  • The number of events per request is limited to 100,000. If you request more data than that, the request will return an error.
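Because the JSON extraction is synchronous, a client can read the data straight from the response body. The sketch below shows one way this might look in Python using only the standard library. The function name `fetch_extraction` and its `opener` argument are assumptions for illustration, and the stand-in response payload is invented so the example runs offline; it does not reflect the actual response format.

```python
import io
import json
from urllib.request import urlopen


def fetch_extraction(url, opener=urlopen):
    """Send a GET request to the extraction URL and return the decoded JSON body."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_opener(url):
    # Stand-in for the network call; this payload is a placeholder,
    # not the shape of a real extraction response.
    return io.BytesIO(b'{"placeholder": true}')


demo = fetch_extraction(
    "https://api.keen.io/3.0/projects/PROJECT_ID/queries/extraction",
    opener=fake_opener,
)
print(demo)
```

With the default `opener`, `urlopen` raises `urllib.error.HTTPError` when the server answers with an error status, which is presumably how a request over the event limit would surface in this client.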

Data Extraction to CSV file

Extracting your data to CSV format is an asynchronous operation. This means that the data will be sent to you some time after the request has been made. To retrieve your data as CSV, add the email parameter to your request, and we’ll send you an email with a link to your .csv document.

https://api.keen.io/3.0/projects/<project_id>/queries/extraction?api_key=<read_key>&event_collection=<event_collection>&email=<email>

Note: You can supply the content_encoding parameter with a value of “gzip” to compress the output. This only works with asynchronous CSV requests.

Limits

  • The number of events that can be extracted per .csv file is 10,000,000. If you request more data than that, the request will return an error.
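To make the CSV variant concrete, the following Python sketch assembles an asynchronous request URL with the email parameter and, optionally, content_encoding set to gzip. The function name `build_csv_extraction_url`, the `gzip` flag, and all placeholder values are assumptions for this example only.

```python
from urllib.parse import urlencode


def build_csv_extraction_url(project_id, read_key, event_collection,
                             email, gzip=False):
    """Return a URL for an asynchronous (emailed) CSV extraction."""
    params = {
        "api_key": read_key,
        "event_collection": event_collection,
        "email": email,  # the presence of email makes the extraction asynchronous
    }
    if gzip:
        # Only meaningful for asynchronous CSV requests.
        params["content_encoding"] = "gzip"
    base = "https://api.keen.io/3.0/projects/%s/queries/extraction" % project_id
    return base + "?" + urlencode(params)


# Hypothetical project, key, collection, and address.
csv_url = build_csv_extraction_url(
    "PROJECT_ID", "READ_KEY", "purchases", "analyst@example.com", gzip=True
)
print(csv_url)
```

Note that `urlencode` escapes the `@` in the address as `%40`, so the email value is safe to place in the query string.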

Extract Most Recent Events

Add a latest parameter to your extraction request to get back the last 5 events, last 10 events, etc. This parameter can be used with both JSON and CSV extractions.

https://api.keen.io/3.0/projects/<project_id>/queries/extraction?api_key=<read_key>&event_collection=<event_collection>&latest=<number>

Extract Only Certain Properties

Add a property_names parameter to your extraction request to only return a given set of properties. This value must be a URL-encoded JSON array of fully qualified property names. This parameter works on both JSON and CSV extractions.

https://api.keen.io/3.0/projects/<project_id>/queries/extraction?api_key=<read_key>&event_collection=<event_collection>&property_names=<property_name_array>

For instance, if you want to retrieve only the keen.timestamp and keen.created_at properties, you would create a JSON array of the property names like this:

["keen.timestamp","keen.created_at"]

then URL-encode the string to this:

%5B%22keen.timestamp%22%2C%22keen.created_at%22%5D
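The same two steps can be reproduced in Python with the standard library, which is a convenient way to check an encoded value before putting it in a URL. This is only a sketch; the variable names are arbitrary.

```python
import json
from urllib.parse import quote, unquote

names = ["keen.timestamp", "keen.created_at"]

# Step 1: compact JSON array (no spaces after the commas).
as_json = json.dumps(names, separators=(",", ":"))

# Step 2: percent-encode everything that is not an unreserved character.
encoded = quote(as_json, safe="")
print(encoded)  # %5B%22keen.timestamp%22%2C%22keen.created_at%22%5D

# Decoding recovers the original list.
decoded = json.loads(unquote(encoded))
```

Passing `separators=(",", ":")` avoids the space that `json.dumps` inserts after commas by default, which would otherwise appear as `%20` in the encoded string.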

Notes on Data Extraction

Technical Reference: Extraction Resource

Here is some additional info related to data extraction:

  • If you don’t specify any filters, your extract will include every event in an Event Collection. Unless you pass property_names, all Event Properties are included for each event in the extract. The files can get quite large. Use timeframes and filters to narrow the set of events that you extract.
  • Every event in your extraction will have a keen.timestamp property. That’s the value used for sorting events by Timeframe. The timezone of this timestamp is UTC.
  • There is currently no way to specify the order of the properties (columns) in your extraction file. They might not come out in the order you expect, but they will all be there.
  • Extractions are done by Event Collection. If you want to extract 100% of your data from Keen, you’ll need to run an extraction for each Event Collection.
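Since column order is not guaranteed, code that consumes an extracted CSV file is safer reading columns by name than by position. The Python sketch below illustrates the idea. It assumes the file starts with a header row of property names, and the sample data (column names, values, and timestamp format) is invented for the example.

```python
import csv
import io

# Invented sample standing in for a downloaded extraction file.
sample = (
    "price,keen.timestamp,item\n"
    "9.99,2014-01-01T00:00:00Z,golden widget\n"
    "4.50,2014-01-02T00:00:00Z,plain widget\n"
)

# DictReader keys each row by the header names, so column order is irrelevant.
rows = list(csv.DictReader(io.StringIO(sample)))

for row in rows:
    print(row["keen.timestamp"], row["item"], row["price"])
```

All values come back as strings, so numeric properties such as the hypothetical `price` need converting before use.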
