Compute Pricing Guide

At its core, the Keen Compute API is how you ask questions of the data you have collected. It is priced on a simple pay-as-you-go model. This guide explains how those prices are calculated and demonstrates the cost-saving capabilities of our advanced compute features: Cached Queries and Cached Datasets.

What does “Properties Scanned” mean?

In short, it represents the amount of data we had to process to answer the query you requested.

The formula used to calculate the number of properties scanned per query looks like this:

E * P = N

E is the number of events that exist within the timeframe you provided.
P is the number of properties per event required to calculate the query.
N is the total properties scanned.

To calculate P for a given query, simply count the unique properties referenced in the filters, group_by, or target_property parameters and add 1 to that number. This means that if you’re filtering and grouping by the same property, that only increases P by 1. This also means a simple count (which references no properties) has P = 1.
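
As a rough illustration of the counting rule above, here is a minimal Python sketch. The function name and its arguments are hypothetical helpers for this guide, not part of the Keen API; the sketch only assumes you can list the property names your query references.

```python
def properties_per_event(target_property=None, filter_properties=(), group_by=()):
    """Return P: the number of unique referenced properties, plus 1."""
    referenced = set(filter_properties) | set(group_by)
    if target_property is not None:
        referenced.add(target_property)
    return len(referenced) + 1


# Filtering and grouping by the same property only counts once.
print(properties_per_event(target_property="A", filter_properties=["A"], group_by=["A"]))  # 2
# A simple count references no properties.
print(properties_per_event())  # 1
```

Using a set is what makes repeated references to the same property count only once.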

Following this formula, a query over 2 million events that uses 3 properties has an N of 6 million. This query would cost 6 cents, which works out to 1 cent per million properties scanned. If this query powers a KPI on a dashboard that is viewed 20 times per day, or about 600 times per month, the monthly cost of that KPI would be $36.00. Read on to see how caching can help bring that price down.
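
The arithmetic in this example can be sketched in Python. The rate of 1 cent per million properties scanned is inferred from the example above (6 million properties costing 6 cents), and the 30-day month is an assumption; check your plan for the rate that actually applies. The function names are hypothetical.

```python
CENTS_PER_MILLION_SCANNED = 1  # assumption, inferred from the 6-cent example
DAYS_PER_MONTH = 30            # assumption


def properties_scanned(events, p):
    """N = E * P for a single query run."""
    return events * p


def cost_in_cents(n):
    """Convert properties scanned into a cost in cents."""
    return n * CENTS_PER_MILLION_SCANNED / 1_000_000


n = properties_scanned(2_000_000, 3)
per_query_cents = cost_in_cents(n)
monthly_dollars = per_query_cents * 20 * DAYS_PER_MONTH / 100

print(n)                # 6000000
print(per_query_cents)  # 6.0
print(monthly_dollars)  # 36.0
```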

Your data is indexed intelligently in our system to minimize the number of properties we have to touch for a given query. Events can have hundreds of unique properties, but most queries typically only require scanning a few of them. You’ll only end up paying for the ones you actually query.

How are Extractions priced?

Giving you access to your raw data is very important to us at Keen. We provide the ability to extract chunks of your data in CSV or JSON format via our Extraction API.

Extractions follow the same pricing formula described above. In this case, P is equal to one plus the number of properties per event you want to extract. By default, this is all of them. However, you can restrict the extraction with the property_names parameter so that it returns only the properties you care about.
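
To see why restricting property_names matters, here is a small Python sketch. The helper function, the 100-property event, and the 1 million event count are all hypothetical values chosen for illustration.

```python
def extraction_properties_per_event(total_properties, property_names=None):
    """Return P for an extraction: 1 plus the number of properties extracted."""
    if property_names is None:
        # Default behaviour: every property on the event is extracted.
        return total_properties + 1
    return len(set(property_names)) + 1


events = 1_000_000  # hypothetical number of events in the timeframe

full = events * extraction_properties_per_event(100)
restricted = events * extraction_properties_per_event(100, ["A", "B", "C"])

print(full)        # 101000000
print(restricted)  # 4000000
```

Under these assumptions, naming three properties scans roughly 25 times less data than extracting everything.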

How are funnels priced?

Funnels are a powerful tool for analyzing data. They allow you to analyze a cohort’s behavior across multiple events.

Because a funnel can have multiple steps, each step is evaluated against the pricing formula and then added up for the total.
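
A minimal Python sketch of that per-step sum follows. It assumes, as in the funnel row of the Examples section at the end of this guide, that a step's actor property counts toward that step's P. The event counts per step and the function name are hypothetical.

```python
def funnel_properties_scanned(steps):
    """Sum E * P over every step; each step is an (events, p) pair."""
    return sum(events * p for events, p in steps)


steps = [
    (1_000_000, 3),  # step 1: actor property A plus a filter on B, so P = 3
    (500_000, 2),    # step 2: actor property A only, so P = 2
]

print(funnel_properties_scanned(steps))  # 4000000
```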

How Caching Saves on Query Costs

By default, our compute API calculates answers at the time of the request. While we pride ourselves on the speed at which we can crunch data, response times will grow as your data grows. If you’re presenting analytics in a dashboard, the user experience can suffer. This is where our caching features can have a huge impact, not only on user experience but also on your bill.

Cached Queries

Cached Queries take a query definition and run it on a fixed refresh interval. The result is kept in a cache, so each retrieval is served from the cache instead of being recomputed.

If a dashboard is viewed 100 times per day and its queries are calculated from scratch on every view, the compute bill will rise very quickly. If we move them to Cached Queries, they will only be calculated between 1 and 24 times per day (configurable), reducing the amount of compute that needs to be done. On top of that, the data required to power the dashboard will be served from the cache for increased speed.

The formula for calculating the total monthly properties scanned for a Cached Query looks like this:

E * P * R = N

E is the number of events that exist within the timeframe you provided.
P is the number of properties per event required to calculate the query.
R is the number of times this query is run per month. This is based on the refresh rate.
N is the total properties scanned.

Referring back to our example above, we have now turned our query into a Cached Query with a refresh rate of every 4 hours. This means that our R is 180 (6 runs per day over a 30-day month). Each run still scans 6 million properties, so N is 1.08 billion and the cost of this Cached Query is $15.80: $10.80 of raw compute plus the $5 fee for a Cached Query.
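
The Cached Query example can be reproduced with a short Python sketch. The per-million rate is inferred from the earlier 6-cent example and the 30-day month is an assumption; the function names are hypothetical.

```python
CENTS_PER_MILLION_SCANNED = 1  # assumption, inferred from the 6-cent example
CACHED_QUERY_FEE_CENTS = 500   # the $5 per Cached Query quoted above
DAYS_PER_MONTH = 30            # assumption


def runs_per_month(refresh_hours):
    """R: monthly runs, assuming the interval divides 24 hours evenly."""
    return (24 // refresh_hours) * DAYS_PER_MONTH


def cached_query_cost_cents(events, p, refresh_hours):
    """Monthly cost: compute for E * P * R properties plus the flat fee."""
    n = events * p * runs_per_month(refresh_hours)
    return n * CENTS_PER_MILLION_SCANNED / 1_000_000 + CACHED_QUERY_FEE_CENTS


print(runs_per_month(4))                               # 180
print(cached_query_cost_cents(2_000_000, 3, 4) / 100)  # 15.8
```

Note that the cost no longer depends on how often the dashboard is viewed, only on the refresh rate.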

Cached Datasets

Cached Datasets operate differently: the data is processed as it flows into Keen, so the compute cost is based on the new relevant data added for the query rather than on how often the query is run. Cached Datasets also give you access to powerful segmentation across different times and dimensions of your data.

The formula for calculating the total monthly properties scanned for a Cached Dataset looks like this:

E * P * 2 = N

E is the number of events ingested during the month that are relevant for the query.
P is the number of properties per event required to calculate the query.
N is the total properties scanned.

Building upon the example Query and Cached Query above, imagine we convert this to a Cached Dataset. Our E is 2 million, and our P is 3, so our N is 12 million. This means our total cost for this dataset is $10.12. The compute cost is 12 cents plus the $10 cost of a Cached Dataset.
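
The same calculation as a Python sketch, under the same assumptions as before: the per-million rate is inferred from the 6-cent example, and the function name is hypothetical.

```python
CENTS_PER_MILLION_SCANNED = 1    # assumption, inferred from the 6-cent example
CACHED_DATASET_FEE_CENTS = 1000  # the $10 per Cached Dataset quoted above


def cached_dataset_cost_cents(events_ingested, p):
    """Monthly cost: compute for E * P * 2 properties plus the flat fee."""
    n = events_ingested * p * 2
    return n * CENTS_PER_MILLION_SCANNED / 1_000_000 + CACHED_DATASET_FEE_CENTS


print(cached_dataset_cost_cents(2_000_000, 3) / 100)  # 10.12
```

For this example, that compares with $36.00 for the uncached query and $15.80 for the Cached Query.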

Examples

Here are some examples of calculating the P for various queries. It is assumed that all of these queries contain the timeframe parameter.

Query Definition                                                P
sum on property A, filter on property B                         3
sum on property A, filter on property A, group_by property A    2
count collection C, filter on property A                        2
count collection C                                              1
funnel:                                                         5
  step 1 with actor property A, filter on property B
  step 2 with actor property A
extraction on property A, B, C                                  4