Google BigQuery Integration (Beta)

Stream enriched event data to Google BigQuery

Thank you for being an early adopter of Keen’s BigQuery integration beta. We’re evaluating several competing priorities to inform our 2018 product roadmap. Given other customer demands, development on Keen’s BigQuery beta integration is indefinitely paused. While continuing to function, the integration is not being actively supported for the time being. Apologies for any inconvenience.

Stream inbound event data into a BigQuery project

Stream your full-resolution, enriched event data to Google BigQuery (BQ) with a scalable, fully-managed, highly reliable, and cost-efficient data pipeline.

How it works: Use Keen’s suite of SDKs and sources (Stripe, SendGrid, etc) to collect your arbitrary JSON event data from anything connected to the internet. Events can describe any user interaction and have up to 1000 custom-defined properties. Turn on enrichments to automatically append location and device information to your data. Once data is flowing into Keen, you can follow this guide to then stream your incoming data to BigQuery in realtime.

This BigQuery integration empowers analysts and teams who prefer SQL to query user behavior and marketing attribution data. Once your event data is in BigQuery, you can painlessly integrate with the latest and greatest enterprise BI technologies such as Tableau, Looker, Metabase, and Informatica.

You will be able to:

  • Run SQL Queries and quickly answer business questions on your application data
  • Store all of your user interactions in one consolidated Enterprise Data Warehouse
  • Analyze data using favorite BI & visualization tools (Informatica, Looker, Qlik, SnapLogic, Tableau, Talend, Google Analytics)
  • Dig deeper with more complex analysis, including joins, window functions, and arrays
  • Leverage a fully managed and supported infrastructure with no pipeline to maintain

Note: If there are other cloud data warehouses where you would like to stream your enriched data, we would love to hear from you.

Table of Contents

Configuring your Keen IO project to stream to BigQuery

Here is an overview of key terms that Keen and Google use and how the main concepts map to each other:

Keen IO Terminology Google BigQuery Terminology
Organization Project
Project Dataset
Event Collection Table
Event Properties Columns
Events Rows

Configuration Steps

Step 1: If you don’t already have one, create a Google Cloud Platform account here. Be sure to “enable BigQuery”.

Step 2: Go to Keen IO and select the Keen Organization you’d like to start streaming to BigQuery. Select the “Settings” tab from the top menu bar to access the settings for your Organization. Using the left side menu, select “Google BigQuery Integration” and press the “Authenticate” button to initiate a connect with your Google Cloud Platform account.

Keen IO - Settings Configuration for Google BigQuery

Step 3: Choose your Google Account. Select your account and then click “Allow”. You have now succesfully authenticated with Google.

Step 4: Next, open your Google Cloud dashboard here and press “Select a project” to create the Google project you’d like to stream Keen data to. If you don’t have a project in mind yet, press the + arrow to name and “Create a project”. See image below.

Google Cloud Dashboard- Create a BigQuery project

Step 5: Note the name of this new Google project and return to Keen IO Organization’s Settings. On the screen where you previously authenticated, please enter in your Google Cloud project ID. Tip: A valid Google Cloud project ID will be a dash separated version of your new Google project’s name. It will look something like this: amazing-apps-keen.

Enter your Google Cloud Project ID to  Keen Settings

Step 6: On the Google Cloud UI, from the home screen’s left side menu navigate to and select “BigQuery”.

Find BigQuery in the Google Cloud dashboard's menu

Step 7: Next, let’s create a new Dataset in BigQuery. This will be the streaming destination for all data coming from one of your Keen IO Projects. Select the new Google Project you’ve created (in our screenshot example our project is called “Amazing Apps Keen”). Use the drop down arrow to “Create Dataset”. Name your Dataset and select “OK”.

Create a new Dataset in Bigquery

Step 8: Return to Keen IO and now select from within your Keen Organization the Keen Project you’d like to start streaming to BigQuery.

Step 9: Navigate to your Keen Project’s “Streams” tab to access your streams configurations and settings for this Project. Scroll down past the other streams options to “Google Cloud Big Query” and click on the “Configure BigQuery Streaming” button.

Step 10: Enter name of the BigQuery Dataset you’d like the data from your Keen IO Project to flow into. Now click the “Update” button.

Configure your Keen Project Settings

That’s it! You’re now set up your BigQuery and Keen IO integration.

How to Run SQL on your Keen IO Event Data

Step 1: Open up BigQuery

Step 2: Navigate to your your BigQuery Project and Dataset

Step 3: Create a new query

That’s it! Begin joining event data with entity data tables. Below shows what a sample SQL query looks like in the basic BigQuery console:

Create a SQL Query on Keen Data using joins in Google BigQuery

Data Enrichment and Data Availability

Your event data still enjoys the benefits of all Keen Streams features such as collection via 15+ SDKs, data enrichment of your events, and a copy is still written to Keen IO for data analysis, compute, and visualization.

As data streams to Keen your event data can be ehanced via our Data Enrichment capabilities. The following data enrichments are available for your events: IP to Geo information, DateTime parser, URL parser, and User Agent enrichment.

Data is streamed to BigQuery within an hour of being received, enriched, processed, and written by Keen. Data in BigQuery is partitioned by day based on the keen.timestamp property.

Note: Data being streamed to BigQuery must have a valid and accurate keen.timestamp that is also no more than 30 days in the past and five days into the future.

Historical Data

As soon you configure a project to stream to BigQuery, data will being loading into your BigQuery Project. To import historical data, please reach out. Up to the previous 30 days of your historical data that exists in Keen can be loaded into your BigQuery Project. This can take a few days and its progress can be tracked using the BigQuery console.

Schema and Naming

Each of your Keen IO Projects will be written to BigQuery as a BigQuery Dataset. Each table in your BigQuery Dataset represents a Keen event collection, and each column within the table represents the properties in your data collection. As your events occur they are appended to the table as a new row.

Keen IO allows for dynamic data collection and analysis but BigQuery has stricter schema. To handle this, the first value sent for a property will be stored in a BigQuery column with that data type. If a subsequent event sends a value of a different data type for that same property, it will be stored in a separate column with a suffix on the column name that indicates the data type, for example "property-name-foobar_string”

BigQuery table names and field names have naming and character restrictions that are stricter than Keen’s naming conventions. We will convert collection and property names to match these limits.

Table Names

  • Contain up to 1,024 characters. Contain letters (a-z, A-Z), numbers, and underscores

Field Names

  • The name must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore.
  • The maximum length is 128 characters.
  • The name cannot contain “.” which Keen uses to represent nested properties.
  • BigQuery’s field names are not case sensitive. This means if your event schema in Keen contains case sensitive property names (eg ProductID and productid) there will be a collision and an error will be logged.

To read more about how errors may occur and how they are logged see Error Logging below.

Because of these restrictions, your Keen property names may be transformed or truncated when streamed to BigQuery. If the truncated name matches another property name, that causes a collision. If the properties that encounter a conflict have the same data type they will be combined. If the properties that encountered a conflict have different data types, a new property will be written. You will find a new column created with a descriptive suffix indicating its data type.

Error Logging

A table named keen_errors will automatically be created and maintained for each of your Keen IO Projects.

Connection Errors

There may be times in which all the events for a given time period may not be relayed to your streaming destination. This can be caused by network latency, 3rd party system failure, or an active issue. A table named keen_errors will automatically be created and maintained for each of your Keen IO Projects.

If you remove authentication to your BigQuery account, we will not be able to stream events.

Schema Errors

At present, BigQuery locks streaming events after table schema is changed. If you add a new property to your Keen event, writes to the corresponding BigQuery table will fail for up to 15 minutes. A work-around is currently in development.

Business Intelligence and Visualization

One of the best parts of using BigQuery is that it integrates easily with many of the top Business Intelligence (BI) tools. Here are a few selected guides:

Google Data Studio is free and is ridiculously easy to use with BigQuery

Metabase is also free because it’s open source!

Tableau is one of the most popular BI options.

Here is a full list of BI tools supported by BigQuery:

  • Anodot
  • Arimo
  • AtScale
  • Bime
  • Chartio
  • CoolaData
  • Domo
  • IBM Cognos Analytics
  • iCharts
  • Indicative
  • Dundas BI
  • Looker
  • Metabase
  • Mode
  • MicroStrategy
  • OWOX
  • QlikView
  • re:dash
  • SAP Analytics Cloud
  • Tableau
  • Yellowfin
  • Zoomdata