Ad-Hoc Query Performance
This guide will walk you through tips for the following:
Increase Query Speed
The amount of time a query request takes is proportional to the size of the collection you are querying & the timeframe of data your query spans. Thus, there are a few easy ways to improve query performance:
1. Shorten the timeframe in your query
The smaller the timeframe, the faster the query. A query on one week of data will be 4X faster than a query on 1 month of data (roughly).
2. Reduce your collection size
A query analyzing 400M events will take about twice as long as a query analyzing 200M events. For this reason, we recommend that you don’t store all event types in a single mega-collection. Keen was designed to have multiple collections for each type of action (Signups, Opens, Messages, etc.). See our Data Modeling Guide for more details and tips.
3. Include a timeframe for each funnel step
If you’re running funnels, include a timeframe on each step of the funnel to limit the amount of data scanned in each step.
4. Cache your queries
Running the same query over and over? Use our Query Caching feature.
These recommendations mostly apply to collections with millions of events or more. Queries on small datasets will almost always be fast, and you might not notice a difference with the techniques above :).
Calculating Metrics for Many Users at Once
Say you want to calculate the maximum purchase amount for each of your 3,000 accounts. Your first instinct might be to run 3,000 queries like “average the purchase.amount for user ‘eq’ X”. However, you can actually run this query for all of your accounts at once!
A. “average the purchase.amount for users, grouped_by user.id”
or, run the query on a set of users:
B. “average the purchase.amount for users ‘in’ [X, Y, Z, …], grouped_by user.id”
Both of these queries will return an array of query results by user.id.
Of course, there are limitations to the number of accounts you can request in a single HTTP query (your request size will hit the maximum size once your array gets too big). Sometimes it may be strategic to loop through your accounts to do group_bys on groups of, say, 500 users at a time.
Running Simultaneous Queries
Here are some tips for running a large number of queries simultaneously to avoid hitting timeouts and query limits:
1. Minimize the number of queries kicked off simultaneously
You’ll want to minimize the number of queries that your dashboard kicks off at a time in order to increase loading speed and avoid query limits. (There are also limits to the number of http requests a browser will wait on at once).
2. Optimize your data model & query pattern
Optimizing your data model for faster-completing queries reduces the number that will be running in parallel (each will complete faster).
3. Break your dashboard up into sections where possible
Revealing parts of a dashboard in sections rather than one giant report reduces the number of queries run at one time, increases load speed, and makes it easier for your teams and customers to navigate to the data they neded.
4. Use multi-analysis
There is a query type called multi-analysis which allows you to combine some types of queries into a single request.
5. Use group_by
You can use the “group_by” parameter to get lots of data in a single query. For example, count the number of checkouts grouped by scenario rather than running a separate query for each scenario. You can also group_by multiple properties at a time. For example, group-by scenario and mobile_or_web to get a full matrix of segmented results at once.
6. Pre-computing the queries with Datasets or Cached Queries!