
Event Data vs Entity Data — How to store user properties in Keen IO

I frequently get questions about how entity data (aka state data) like user properties should be captured in an event database like Keen IO. For example:

It looks like Keen IO is based on event data. Is there any idea of user traits? Like a user’s age, email address, or ID?

What about something like the number of projects a user has? Is that a user trait or is the recommended way to measure “number of projects” to look at how many times they’ve triggered “Created a Project”?

   
lovely photo by #WOCinTech!

I love getting questions like this, because it shows the person is really digging into their event data modeling and wants to make sure they are doing it the right way. Here’s my response.

Short answer:

You absolutely should record properties or traits about your users in Keen IO. They should be stored as point-in-time properties of the events that you send. There is no separate Keen IO data store for these types of properties. Generally, developers store “current” data or state about their users in their own app database, and then send the current state of those properties along with any events related to a particular user.
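
To make "point-in-time" concrete, here is a minimal sketch in plain Python. It does not call Keen IO at all: `app_db`, `event_log`, and `record_event` are hypothetical stand-ins for your own app database, an event store, and whatever function you use to send events. The point is only that each event carries a copy of the user's state as of the moment it was recorded.

```python
import copy
from datetime import datetime, timezone

# Hypothetical app database: holds only the *current* state of each user.
app_db = {"u1": {"id": "u1", "plan": "free", "project_count": 0}}

# Stand-in for an event store; a real setup would send these events out instead.
event_log = []

def record_event(collection, user_id, **extra):
    """Append an event carrying a copy of the user's state as of right now."""
    event = {
        "collection": collection,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": copy.deepcopy(app_db[user_id]),  # snapshot, not a reference
        **extra,
    }
    event_log.append(event)
    return event

# The user creates a project while on the free plan.
app_db["u1"]["project_count"] += 1
record_event("Create_Project", "u1", project={"id": "p1"})

# The user upgrades later; only the app database changes.
app_db["u1"]["plan"] = "pro"
app_db["u1"]["project_count"] += 1
record_event("Create_Project", "u1", project={"id": "p2"})

# Each event still reports the plan the user had when it happened.
plans_at_event_time = [e["user"]["plan"] for e in event_log]
print(plans_at_event_time)  # ['free', 'pro']
```

Because the first event keeps `plan: "free"`, you can later ask questions like "how many projects were created by free users?" even after those users have upgraded.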

  • To count how many projects were created in a given timeframe, count the number of Create_Project events.
  • To find out how many projects were created by a specific user, count the number of Create_Project events, filtered by user.id.
  • To get a list of projects created by a specific user, do a select_unique on a project property such as project.id in Create_Project, filtered by user.id = <user.id>.
  • To find out how many unique users created projects, run a count_unique on user.id over the Create_Project events.
  • Use group_by to get a list of all users or organizations that created projects, and the number of projects they created.
  • Use any of your event properties to do further segmentation and filtering!
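
If it helps to see what those analyses compute, here is a rough in-memory sketch in Python. These functions are my own stand-ins, not the Keen IO client: they borrow the names count, select_unique, and count_unique from the list above, `matching` and `count_grouped` are hypothetical helpers, and the sample events are made up. Timeframes are left out to keep it short.

```python
from collections import Counter

# Hypothetical Create_Project events, written with dotted property names.
create_project = [
    {"user.id": "u1", "organization.id": "o1", "project.id": "p1"},
    {"user.id": "u1", "organization.id": "o1", "project.id": "p2"},
    {"user.id": "u2", "organization.id": "o1", "project.id": "p3"},
    {"user.id": "u3", "organization.id": "o2", "project.id": "p4"},
]

def matching(events, filters=None):
    """Keep events whose properties equal every value in `filters`."""
    filters = filters or {}
    return [e for e in events if all(e.get(k) == v for k, v in filters.items())]

def count(events, filters=None):
    """Number of events that pass the filters."""
    return len(matching(events, filters))

def select_unique(events, target_property, filters=None):
    """Sorted list of distinct values of one property."""
    return sorted({e[target_property] for e in matching(events, filters)})

def count_unique(events, target_property, filters=None):
    """Number of distinct values of one property."""
    return len(select_unique(events, target_property, filters))

def count_grouped(events, group_by, filters=None):
    """Event count per distinct value of the group_by property."""
    return dict(Counter(e[group_by] for e in matching(events, filters)))

total_projects = count(create_project)                         # 4
projects_by_u1 = count(create_project, {"user.id": "u1"})      # 2
u1_project_ids = select_unique(create_project, "project.id", {"user.id": "u1"})
unique_creators = count_unique(create_project, "user.id")      # 3
per_user = count_grouped(create_project, "user.id")
per_org = count_grouped(create_project, "organization.id")
```

Every one of these answers comes from the same Create_Project events; the only things that change are the target property, the filter, and the grouping.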

Long answer:

It’s important to understand the difference between “entity data” and “event data”.

Entity data is what people normally think of when they imagine data in a database. For example, a user table with columns like first name, last name, organization ID, user ID. Each row in the table represents a user. Perhaps you have a table for products which has properties about your products, where each row is a unique item. Entity data describes the current state of your application (your users, products, prices, etc).

In contrast, event data describes actions that happened in your application. Perhaps the biggest difference between event data and entity data is scale. For every user in your app, you might have hundreds or even thousands of events. Users are not the only things that generate event data, either: server logs are another example. Ben Johnson calls event data “behavior data” and I really like the way he describes it in his presentation on “behavioral-databases”:

Event data is described by an action, a timestamp, and snapshot of state.

Another way I describe event data is a verb, a timestamp, and nouns with properties.

So for the Create_Project example, your “event collection” name is the action, “Create_Project”. Your timestamp will be applied at the time the event is sent, and the state will be all the properties about the user, their platform, your app (perhaps what version of your website they are on), and the new information about the project itself. Using this model you can build extremely advanced analytics about every interaction in your app (or your business).

Every time you record an event, you should send all the relevant entity information that you can. For example, send all user traits (user.id, user.age, user.gender, user.email, user.referrer, user.location.state, user.location.country), all agent traits (agent.browser.version, agent.os, and so on), and all the traits you have for any other relevant entities in your app (for example, perhaps you have information about a user’s organization like organization.id, organization.name, organization.subscriptionlevel). Project would be another example of an object for which you could record properties like project.id, project.name, project.birthdate.
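
As an illustration, here is a small Python sketch of what one such event might look like before it is sent. The entity values are invented, and `flatten` is a hypothetical helper I added only to show how nested objects line up with dotted property names like user.location.state; it is not part of any Keen IO library.

```python
def flatten(obj, prefix=""):
    """Turn nested dicts into dotted property names, e.g. user.location.state."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

# Hypothetical entity records, as they might be loaded from an app database.
user = {"id": "u1", "age": 34, "location": {"state": "CA", "country": "US"}}
organization = {"id": "o1", "name": "Acme", "subscriptionlevel": "pro"}
agent = {"os": "macOS", "browser": {"version": "49.0"}}
project = {"id": "p1", "name": "Roadmap"}

# One Create_Project event: every relevant entity rides along as a nested object.
event = {
    "user": user,
    "organization": organization,
    "agent": agent,
    "project": project,
}

# The dotted names are what you would filter and group on later.
properties = flatten(event)
for name in sorted(properties):
    print(name, "=", properties[name])
```

Grouping each entity under its own key (user, organization, agent, project) keeps property names predictable across all of your event collections.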

By including rich entity information in your event data object, you open up a vast array of filtering opportunities and analysis capabilities on your event data, without having to join various tables across your relational database.