Google BigQuery Integration (Beta)
Stream enriched event data to Google BigQuery
Thank you for being an early adopter of Keen’s BigQuery integration beta. As we weigh competing priorities for our 2018 product roadmap, and given other customer demands, development on Keen’s BigQuery beta integration is indefinitely paused. The integration continues to function, but it is not being actively supported for the time being. Apologies for any inconvenience.
Stream inbound event data into a BigQuery project
Stream your full-resolution, enriched event data to Google BigQuery (BQ) with a scalable, fully-managed, highly reliable, and cost-efficient data pipeline.
How it works: Use Keen’s suite of SDKs and sources (Stripe, SendGrid, etc) to collect your arbitrary JSON event data from anything connected to the internet. Events can describe any user interaction and have up to 1000 custom-defined properties. Turn on enrichments to automatically append location and device information to your data. Once data is flowing into Keen, you can follow this guide to then stream your incoming data to BigQuery in realtime.
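The event collection step above can be sketched in code. This is a minimal illustration, not SDK code: the collection name (`purchases`), project ID, and write key are placeholders, and the `keen:ip_to_geo` add-on shape follows Keen’s documented enrichment format.

```python
import json

KEEN_PROJECT_ID = "YOUR_PROJECT_ID"  # placeholder
KEEN_WRITE_KEY = "YOUR_WRITE_KEY"    # placeholder

# An arbitrary JSON event with nested properties and an enrichment add-on.
event = {
    "item": "golden widget",
    "price": 49.99,
    "user": {"id": "user-123"},  # nested properties are addressed with "." in Keen
    "ip_address": "203.0.113.7",
    "keen": {
        # Enrichments are configured per event via add-ons; this one asks
        # Keen to turn the "ip_address" property into geo information.
        "addons": [
            {
                "name": "keen:ip_to_geo",
                "input": {"ip": "ip_address"},
                "output": "ip_geo_info",
            }
        ]
    },
}

def record_event():
    # Posting to Keen's HTTP API requires the `requests` package and real
    # credentials; shown for illustration only, not executed here.
    import requests
    url = (
        "https://api.keen.io/3.0/projects/"
        f"{KEEN_PROJECT_ID}/events/purchases?api_key={KEEN_WRITE_KEY}"
    )
    return requests.post(url, json=event)

payload = json.dumps(event)
```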
This BigQuery integration empowers analysts and teams who prefer SQL to query user behavior and marketing attribution data. Once your event data is in BigQuery, you can painlessly integrate with the latest and greatest enterprise BI technologies such as Tableau, Looker, Metabase, and Informatica.
You will be able to:
- Run SQL Queries and quickly answer business questions on your application data
- Store all of your user interactions in one consolidated Enterprise Data Warehouse
- Analyze data using favorite BI & visualization tools (Informatica, Looker, Qlik, SnapLogic, Tableau, Talend, Google Analytics)
- Dig deeper with more complex analysis, including joins, window functions, and arrays
- Leverage a fully managed and supported infrastructure with no pipeline to maintain
Note: If there are other cloud data warehouses where you would like to stream your enriched data, we would love to hear from you.
Table of Contents
- Send Your Keen Event Data to BigQuery
- Run Your First SQL Query
- Data Enrichment & Data Availability
- Historical Data
- Your Data: Schema & Naming
- Error Logging & Connection Loss
- Additional Resources: Integrate with Popular BI Tools
Configuring your Keen IO project to stream to BigQuery
Here is an overview of key terms that Keen and Google use and how the main concepts map to each other:
| Keen IO Terminology | Google BigQuery Terminology |
|---|---|
| Project | Dataset |
| Event Collection | Table |
| Event Property | Column |
| Event | Row |
Step 1: If you don’t already have one, create a Google Cloud Platform account here. Be sure to “enable BigQuery”.
Step 2: Go to Keen IO and select the Keen Organization you’d like to start streaming to BigQuery. Select the “Settings” tab from the top menu bar to access the settings for your Organization. Using the left side menu, select “Google BigQuery Integration” and press the “Authenticate” button to initiate a connection with your Google Cloud Platform account.
Step 3: Choose your Google Account. Select your account and then click “Allow”. You have now successfully authenticated with Google.
Step 4: Next, open your Google Cloud dashboard here and press “Select a project” to choose the Google project you’d like to stream Keen data to. If you don’t have a project in mind yet, press the “+” button, name your project, and select “Create a project”. See image below.
Step 5: Note the name of this new Google project and return to your Keen IO Organization’s Settings. On the screen where you previously authenticated, enter your Google Cloud project ID. Tip: A valid Google Cloud project ID is a dash-separated version of your new Google project’s name; a project named “Amazing Apps Keen”, for example, would have an ID like “amazing-apps-keen”.
Step 6: On the Google Cloud UI, from the home screen’s left side menu navigate to and select “BigQuery”.
Step 7: Next, let’s create a new Dataset in BigQuery. This will be the streaming destination for all data coming from one of your Keen IO Projects. Select the new Google Project you’ve created (in our screenshot example our project is called “Amazing Apps Keen”). Use the drop down arrow to “Create Dataset”. Name your Dataset and select “OK”.
Step 8: Return to Keen IO and, from within your Keen Organization, select the Keen Project you’d like to start streaming to BigQuery.
Step 9: Navigate to your Keen Project’s “Streams” tab to access the streams configurations and settings for this Project. Scroll down past the other streams options to “Google Cloud BigQuery” and click the “Configure BigQuery Streaming” button.
Step 10: Enter the name of the BigQuery Dataset you’d like the data from your Keen IO Project to flow into. Now click the “Update” button.
That’s it! You’ve now set up your BigQuery and Keen IO integration.
How to Run SQL on your Keen IO Event Data
Step 1: Open up BigQuery
Step 2: Navigate to your BigQuery Project and Dataset
Step 3: Create a new query
That’s it! You can now begin joining event data with entity data tables. The example below shows what a sample SQL query looks like in the basic BigQuery console:
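A sample query can also be run from code. This is a hedged sketch: the project, dataset, and table names are hypothetical, and it assumes Keen’s nested `keen.timestamp` property is flattened to a `keen_timestamp` column (see the schema notes later in this guide). Running it requires the `google-cloud-bigquery` package and valid credentials.

```python
# Hypothetical names: replace "your-project-id", "your_dataset", and
# "purchases" with your own Google project ID, Dataset, and collection.
QUERY = """
SELECT
  item,
  COUNT(*) AS purchase_count,
  SUM(price) AS revenue
FROM `your-project-id.your_dataset.purchases`
WHERE keen_timestamp >= TIMESTAMP("2017-11-01")
GROUP BY item
ORDER BY revenue DESC
"""

def run_query():
    # Not executed here; requires google-cloud-bigquery and credentials.
    from google.cloud import bigquery
    client = bigquery.Client()
    return list(client.query(QUERY).result())
```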
Data Enrichment and Data Availability
Your event data still enjoys the benefits of all Keen Streams features, such as collection via 15+ SDKs and data enrichment of your events, and a copy of every event is still written to Keen IO for data analysis, compute, and visualization.
As data streams into Keen, your event data can be enhanced via our Data Enrichment capabilities. The following data enrichments are available for your events: IP to Geo information, DateTime parser, URL parser, and User Agent enrichment.
Data is streamed to BigQuery within an hour of being received, enriched, processed, and written by Keen. Data in BigQuery is partitioned by day based on the keen.timestamp property.
Note: Data being streamed to BigQuery must have a valid and accurate keen.timestamp that is no more than 30 days in the past or five days in the future.
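The acceptance window in the note above can be expressed as a small check. This mirrors the rule stated in this guide; it is an illustration, not Keen library code.

```python
from datetime import datetime, timedelta, timezone

# keen.timestamp must be no more than 30 days in the past
# or 5 days in the future relative to "now".
PAST_LIMIT = timedelta(days=30)
FUTURE_LIMIT = timedelta(days=5)

def within_streaming_window(keen_timestamp, now=None):
    """Return True if an event's keen.timestamp would be accepted."""
    now = now or datetime.now(timezone.utc)
    return now - PAST_LIMIT <= keen_timestamp <= now + FUTURE_LIMIT
```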
As soon as you configure a project to stream to BigQuery, data will begin loading into your BigQuery Project. To import historical data, please reach out. Up to the previous 30 days of your historical data that exists in Keen can be loaded into your BigQuery Project. This can take a few days, and its progress can be tracked using the BigQuery console.
Schema and Naming
Each of your Keen IO Projects will be written to BigQuery as a BigQuery Dataset. Each table in your BigQuery Dataset represents a Keen event collection, and each column within the table represents a property in that collection. As your events occur, they are appended to the table as new rows.
Keen IO allows for dynamic data collection and analysis, but BigQuery has a stricter schema. To handle this, the first value sent for a property determines the data type of the BigQuery column that stores it. If a subsequent event sends a value of a different data type for that same property, it will be stored in a separate column with a suffix on the column name that indicates the data type, for example “property-name-foobar_string”.
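The first-value-wins rule above can be sketched as follows. The suffix naming here illustrates the idea; it is not Keen’s exact implementation.

```python
# Map Python types to the kind of suffix described above.
TYPE_SUFFIX = {str: "string", bool: "boolean", int: "integer", float: "float"}

def column_for(prop, value, schema):
    """Return the column a value lands in.

    `schema` maps column name -> type of the first value seen for it.
    """
    if prop not in schema:
        # First value seen for this property fixes the column's type.
        schema[prop] = type(value)
        return prop
    if schema[prop] is type(value):
        return prop
    # A later value of a different type goes to a suffixed column.
    suffixed = f"{prop}_{TYPE_SUFFIX[type(value)]}"
    schema.setdefault(suffixed, type(value))
    return suffixed
```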
BigQuery table names and field names have naming and character restrictions that are stricter than Keen’s naming conventions. We will convert collection and property names to match these limits:

BigQuery table names:
- Can contain up to 1,024 characters: letters (a-z, A-Z), numbers (0-9), and underscores.

BigQuery field names:
- Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_), and must start with a letter or underscore.
- Can be at most 128 characters long.
- Cannot contain “.”, which Keen uses to represent nested properties.
- Are not case sensitive. This means if your event schema in Keen contains case-sensitive property names (e.g. ProductID and productid), there will be a collision and an error will be logged.
To read more about how errors may occur and how they are logged see Error Logging below.
Because of these restrictions, your Keen property names may be transformed or truncated when streamed to BigQuery. If a transformed name matches another property name, that causes a collision. If the colliding properties have the same data type, they will be combined into one column. If they have different data types, a new column will be created with a descriptive suffix indicating its data type.
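A sketch of the kind of name transformation described above: dots converted to underscores, disallowed characters replaced, a leading digit escaped, and truncation to 128 characters. The exact rules Keen applies may differ; this only illustrates the constraints.

```python
import re

MAX_FIELD_LEN = 128  # BigQuery field-name length limit

def to_bigquery_field_name(keen_property):
    """Illustrative conversion of a Keen property name to a valid BigQuery field name."""
    # "." marks nesting in Keen and is not allowed in BigQuery field names.
    name = keen_property.replace(".", "_")
    # Keep only letters, numbers, and underscores.
    name = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Field names must start with a letter or underscore.
    if name and name[0].isdigit():
        name = "_" + name
    return name[:MAX_FIELD_LEN]
```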
Error Logging and Connection Loss
There may be times when all the events for a given time period are not relayed to your streaming destination. This can be caused by network latency, third-party system failure, or an active issue. A table named keen_errors will automatically be created and maintained for each of your Keen IO Projects to log these errors.
If you revoke Keen’s authentication to your Google account, we will not be able to stream events.
At present, BigQuery briefly blocks streaming inserts after a table’s schema changes. If you add a new property to your Keen event, writes to the corresponding BigQuery table may fail for up to 15 minutes. A work-around is currently in development.
Business Intelligence and Visualization
One of the best parts of using BigQuery is that it integrates easily with many of the top Business Intelligence (BI) tools. Here are a few selected guides:
- Google Data Studio is free and is ridiculously easy to use with BigQuery
- Metabase is also free because it’s open source!
- Tableau is one of the most popular BI options.
BigQuery also supports many other BI tools, including:
- IBM Cognos Analytics
- Dundas BI
- SAP Analytics Cloud