Avatar photo

How to do a Join on Event Data

Joins are a powerful feature in traditional static databases that combine data stored in two or more entity tables using SQL. You might use a join to answer a question like “Which customer spent the most money last month?”

A lot of our customers have asked us “Can I do a join with event data?”While you can’t do a traditional join on event data, you can accomplish exactly the same outcome by running a group_by on an event property. It’s flexible and very easy!

Here’s how:

First, if you were using a traditional entity database recall that the information you might need for your particular analyses are stored in strict, separate tables. With Event Data which allows for non-normalized and flexible schemas, all of that information properties about the user and their actions are already there: right inside the event, every single time!

To illustrate this, let’s take a look at a sample event. The event is an action triggered by a customer or user. Along a definition of what action occurred, the event contains other properties regarding when it occurred, who did it, on what device, and other most recent information about the user at that time.

If you want to learn more about the differences between Entity Data and Event Data, check out our more detailed guide on: “How to Think About Event Data”.

Here’s a sample Purchase event. If you work at an e-commerce company, you will probably want to track purchases (very important!). Every time you track a purchase event, you can include rich data about the purchase.

Sample Data Model for a “Purchase” event:

purchases = {
   "user": {
       "first_name": "Arya",
       "last_name": "Stark",
       "email": "as@keen.io",
       "id": 22
   }
   "order": {
       "id": "XD-01-25"
   },
   "product": {
       "list_price": 19.99,
       "description": "This is the best Dog Shirt",
       "name": "Dog Shirt",
       "id": 10
   },
   "keen": { // these keen properties are automatically added to each event
       "timestamp": "2015-06-16T23:24:05.558Z", // when the event occurred
       "created_at": "2015-06-16T23:24:05.558Z", // when the event is written
       "id": "5580b0153bc6964d87a3a657" // unique event id
   }
}

As you can see, every time a purchase is made we are tracking relevant information such as:

  • The action that occurred: a purchase
  • Details about the user
  • Order data
  • Product information
  • Timestamp information (when the action occurred)

These properties are included right in the event and allow for rich analyses. In this format we gain the ability to easily derive efficient querying for sums, counts, averages, and quick and other aggregation analyses.

Here are some real answers we can obtain from asking our sample event data some business-critical questions:

  • Which products were purchased most often?
  • Which users have spent the most money?
  • What is the average order value?

Each of these analyses can be answered in one analysis call — without joins!

Let’s try it out with: “What was the most popular product?” Here’s what our analysis call would look like:

new Keen.Query("count", {
    eventCollection: "purchases",
    groupBy: "product.name",
    timeframe: "last_week",
   });

Analysis Result: Of our products, the “Mallard” and the “Horse Shirt” are the most popular.

0_vzslTSowVGznk1IW (1).png

Now, let’s say… we want to know which customer made the most purchases last week.

Which user made the most purchases? Here’s our query:

new Keen.Query("count", {
     eventCollection: "purchases",
     groupBy: "user.first_name",
     timeframe: "last_week", 
   });

Analysis Result: Username “Sansa” & “Stannis” tie as our most heavy shoppers.

0_J5cuI9wnnWQnE-nW (1).png

Finally, let’s find out what our total gross revenue is across all users.

What is my total gross revenue?

new Keen.Query("sum", {
     eventCollection: "purchases",
     targetProperty : "product.price",
     timeframe: "last_week", 
   });

Analysis Result: $439 (not bad for oddly specific, Game of Thrones, animal-themed t-shirts!)

0_bZ33NTcu2XAP4ZRp.png

Event Data vs. Entity Data

With entity data, you would use a SQL command which would require multiple joins on several tables to answer these questions. To answer the question “What was the most popular product?” you would need to have your users table, a products table, and a purchases table ready. You would get the same result, but the path to get there would be longer.

With Event Data, when an event is triggered you’ll know everything you know about the user and the user’s action as well as the point in time it occurred. This serves as a snapshot of the user as you know him/her. Some examples of rich properties and information your event can include are their name, username, account number, userid, age at that time, what device they were using, what was purchased, and any other properties you have available. Keen IO also features five separate data enrichments which automatically append a lot of helpful data at analysis time — they include IP to Geo, User Agent Parsing, and URL Parsing as examples. When you’re ready to query, this rich snapshot becomes incredibly powerful.

The most important point with event data is to think carefully about the kind of questions you’d like answered when you set up your data tracking. That way you’ll be sure to have the information available when it comes time to query.

So which is better: Entity Data or Event Data?

Both have their strengths. In general, entity data is best for storing static information about nouns (users, inventory, assets, etc.) while event data is ideal for tracking data related to verbs (signup, purchase, click, upgrade, etc.)

Very often, the questions that are most important to your business revolve around these user actions, and event data allows you to run analytics on them without having to do joins at all. Another benefit of event data is its ability to collect high volume — lots and lots of non-normalized data, and does not require a strict schema. This flexibility offers a lot of power to adapt to new types of properties you may want to include in your data at a later point, as well as the ability to query or segment against any property you’ve included with your data!

In the case there are data sources that really cannot be combined whether via an engineering challenge or the fact that the need was very ad-hoc in your data setup process and came after-the-fact, if you’re using Keen IO Streamsyou can stream your data via S3 to AWS. This allows you the ability to run SQL on your event data and join with tables you’ve loaded into the Amazon environment.

Learn More About Modeling Event Data

To learn more about what to track, and when, check out our Data Modeling Guide.

Know of data in mind that you want to start exploring? Sign up for a free account now.

Questions? Join our Slack chat or DM on Twitter @jandwiches 🍞.