
Controlling query complexity with Execution Metadata

Keen’s pricing is based mostly on two factors: how much new data you collect and how much of it you crunch. The first factor is the more straightforward of the two, but we have received many questions about the detailed cost of computations. This is especially relevant when our customers use Keen to let their own users query data freely, yet the cost of those queries is unpredictable and unclear.

We have released a feature that lets you see the complexity of ad-hoc queries at execution time and use that information to build more efficient queries.

A quick overview of Keen’s compute pricing

Each ad-hoc computation request requires us to scan your events. You pay for scanning the events created within the timeframe defined in your query (the wider the timeframe, the more events are scanned). This is the first factor. The second factor depends on the number of unique properties referenced in the query (the more filters, group_by properties, and so on, the higher this factor). We then multiply the two, and the result is called total properties scanned. Because our standard pricing is straightforward, you can calculate exactly how much you will be charged for an executed request. But how do you get these two numbers?
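To make the math concrete, here is a minimal sketch in Python, using illustrative numbers that mirror the example response shown below:

```python
# Illustrative arithmetic only -- these numbers are hypothetical and match the
# example response shown later in this post.
events_scanned = 1000        # events created within the query's timeframe
properties_per_event = 4     # unique properties referenced (filters, group_by, etc.)

# The two factors multiply to give the value your compute charge is based on.
total_properties_scanned = events_scanned * properties_per_event
print(total_properties_scanned)  # 4000
```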

The answer: Execution Metadata

We have recently introduced an optional query parameter: include_metadata. Every time you execute a query and specify include_metadata=true, the response will be enhanced with additional details:

```json
{
  "result": 5,
  "execution_metadata": {
    "events_scanned": 1000,
    "properties_per_event": 4,
    "total_properties_scanned": 4000,
    "total_processing_time": 0.09057211875915527
  }
}
```

The total_properties_scanned property is what your charges are based on. However, both events_scanned and properties_per_event can help you tweak your query so that you only pay for what you really need. The total_processing_time tells you how many seconds your query took.
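Here is a minimal sketch of reading these fields from the API, assuming a standard ad-hoc count query against Keen’s query endpoint; the project ID, read key, and collection name are placeholders:

```python
# Sketch: run an ad-hoc count query with include_metadata=true and inspect the
# execution details. PROJECT_ID, READ_KEY, and the event collection are
# placeholders; adjust the query parameters to match your own data.
import requests

PROJECT_ID = "YOUR_PROJECT_ID"
READ_KEY = "YOUR_READ_KEY"

url = f"https://api.keen.io/3.0/projects/{PROJECT_ID}/queries/count"
params = {
    "event_collection": "pageviews",   # hypothetical collection name
    "timeframe": "this_7_days",
    "include_metadata": "true",        # ask for execution_metadata in the response
}
response = requests.get(url, params=params, headers={"Authorization": READ_KEY})
body = response.json()

meta = body["execution_metadata"]
print("events scanned:          ", meta["events_scanned"])
print("properties per event:    ", meta["properties_per_event"])
print("total properties scanned:", meta["total_properties_scanned"])  # billed on this
print("processing time (s):     ", meta["total_processing_time"])
```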

How might you leverage this?

The total_properties_scanned value is obviously useful when prototyping new metrics that you wish to show users, since it can help you estimate the cost (and performance) of those new metrics and insights. But it can be especially useful in scenarios where end users have access to their own dashboards and can run ad-hoc, potentially costly, queries.

In scenarios like this, you might integrate the execution details with your error tracking service (such as Sentry.io or Rollbar). That would allow you to get alerts when a query takes longer than X seconds to execute or when a user runs a query that results in more than Y property scans.
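As one possible shape for that integration, here is a sketch using the official sentry_sdk Python client; the thresholds, the query name, and the helper function are made up for illustration:

```python
# Hypothetical sketch: report expensive Keen queries to Sentry.
# The thresholds and the report_expensive_query() helper are illustrative only.
import sentry_sdk

sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0")

MAX_SECONDS = 5.0                    # the "X seconds" threshold
MAX_PROPERTIES_SCANNED = 1_000_000   # the "Y property scans" threshold

def report_expensive_query(query_name, execution_metadata):
    """Send a warning to Sentry when a query exceeds either threshold."""
    too_slow = execution_metadata["total_processing_time"] > MAX_SECONDS
    too_costly = execution_metadata["total_properties_scanned"] > MAX_PROPERTIES_SCANNED
    if too_slow or too_costly:
        sentry_sdk.capture_message(
            f"Expensive Keen query '{query_name}': "
            f"{execution_metadata['total_properties_scanned']} properties scanned "
            f"in {execution_metadata['total_processing_time']:.2f}s",
            level="warning",
        )
```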

Planned future enhancements

Execution Metadata is currently available for ad-hoc queries executed via Keen’s API. In the near future, we will release a Data Explorer update that adds the ability to review query execution details directly in the UI.

Cached Queries are executed periodically, so the real cost of using them is more complicated to calculate. You pay for the cache updates rather than for fetching results, so this enhancement will not be as useful there. The number of daily updates depends on the query’s configured refresh rate, and the number of scanned properties and streamed events may vary over time. However, we are exploring how to expose execution costs for Cached Queries in a different way.

Cached Dataset costs are also based solely on the queries that Keen runs internally to update the cache; there is no cost for fetching results. We will be exploring ways to help you better understand the costs associated with your Cached Datasets over the coming weeks.

The big picture

Our goal at Keen is to provide user-facing analytics infrastructure that is much lighter weight and lower overhead than what our customers would have to build and maintain on their own. Providing real-time visibility into the cost-effectiveness of queries is just one way that we can help our customers optimize their use of our platform.