Keen Status Page: For when you need data about your data about your customers’ data

About a week ago, I decided to update our status page again. The previous version had slowly evolved into something resembling an EKG monitor: for query durations it displayed frighteningly high peaks and sudden downward shifts. And yet, during these periods of seemingly abnormal behavior, we never updated the status page with any mention that our system was misbehaving. The reason: our status page was viciously misrepresenting the data.

Our internal dashboards were displaying happy pictures while the status page displayed doom and gloom. I fixed this discrepancy, but in the interest of transparency and honesty, I’d like to discuss the changes I made to these figures and the motivations behind them.

There is a new display for extractions:

Help! I am alt-text hiding in this blog. Someone please help me before they---

Extractions are requests for raw data stored with Keen IO. I am avoiding referring to them as queries because they have very different performance characteristics from queries such as counts or averages. Queries, as I am defining them, are answers to questions such as “How many customers made a purchase last month?” or “What was the average shipping cost of all purchases ordered from Canada during the month of July?” By contrast, extractions really only answer the question “Would you be so kind as to give me all of my data from the last week?”

Extractions typically take longer to complete for a number of reasons, and their durations rarely reflect how queries are performing at a given time. Thus, I created a separate display for them. It shows a line of samples of the median time to complete extractions, and the upper-right corner displays the mean of those samples over the past 24 hours. The extraction graph will typically be spiky because the usage pattern for extractions differs from that of queries: when someone wants a batch of data, they typically request a lot of it at once in a burst and then stop asking.

The other big change was to the display labelled “Query Duration”, which has been renamed “Median Query Duration”; a number of changes have also been made to the data that feeds it.

Another graph, yay!

The first change was to remove the durations of failed queries. Queries can fail for a number of reasons, and failed queries tend to have long reported durations, which dragged the graph up and made the average case look slower than it was. I do not feel this is very helpful to those visiting the status page: people with incorrectly configured setups are not representative of the general users checking it. For users who are encountering errors, a status page that looks healthy helps them recognize they are having an atypical experience, so they can reach out to us or the community to get their issue resolved.

The second change to this display was to move extractions out of the graph. Before, users running large extractions were shifting this curve upward even though other queries were not impacted at all.

The final change is the most minor: switching from a mean-based average to a median-based average. This keeps extremely fast as well as extremely slow queries from unduly influencing the graph. I believe the typical user’s situation, as represented by a median, is more relevant to viewers of our status page.
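
To make the difference concrete, here is a tiny illustration with invented numbers: a single pathological query drags the mean way up while barely moving the median.

// Made-up sample of query durations in milliseconds, with one pathological outlier
var durationsMs = [120, 125, 130, 135, 140, 9000];

var mean = durationsMs.reduce(function (sum, d) { return sum + d; }, 0) / durationsMs.length;

var sorted = durationsMs.slice().sort(function (a, b) { return a - b; });
var mid = Math.floor(sorted.length / 2);
var median = sorted.length % 2 === 0
  ? (sorted[mid - 1] + sorted[mid]) / 2
  : sorted[mid];

console.log(mean);   // ~1608 ms -- dominated by the single outlier
console.log(median); // 132.5 ms -- much closer to the typical experience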

I hope this explanation is useful and, more importantly, I hope the new status page gives you more helpful information about the state of Keen IO’s systems.

(The following cat played only a minor role in writing this post.)

Cute or evil, you decide.

Devin Ekins

Engineer. Tells lame jokes. Only sometimes wears a cape.

Introducing Open Source Data Explorer

Empowering teams to answer their own questions with data

We’re extremely excited to announce the new Open Source Data Explorer, a point-and-click query interface to analyze and visualize your event data.

We believe everyone should be able to use data to make decisions. The Data Explorer makes the power and flexibility of our analytics API more accessible. Now that the Explorer is open source, developers can embed it anywhere - making it easier to build analytics tools for your teams or add value for your customers by incorporating white-labeled analytics directly within your product.

What can I do with the Data Explorer?

The Data Explorer interacts directly with event data stored in Keen IO. It’s an extremely simple and intuitive query interface built to explore event data, with all of the analysis functions built in. Even better, your teams and your customers do not need to know a complex query language like SQL to run their analysis. Here’s how you might use the open source Data Explorer:

  • Empower your teams - Give your teams a tool to quickly and easily answer their own questions with data
  • Improve your product - If you’re a SaaS company using Keen to power customer-facing dashboards, you can give your customers another tool to explore the data that matters to them.
  • Build something new - Want to build an analytics panel just for product managers? Or your own version of Google Analytics? Data Explorer lowers the barrier for you to do so.

Key features

The Explorer has all the functionality of an analysis tool built in and ready to go. Your teams and customers can intuitively build queries, create charts, and extract data within seconds.

Query building

  • Choose your collection - signups, downloads, pageviews - whatever collection of data you need
  • Choose your analysis type - count, count unique, sum, min, max, average, select unique, percentile, median
  • Ask deep questions by running a group by on any property
  • Build a filter for your query, using the event type as a base for your filter - choose from string, number, null, list, Boolean, or date/time
  • Try out the geo-filter, which enables you to filter events by latitude/longitude
  • Pick a date and time range for your query using our calendar selector (a sketch of the kind of query the Explorer builds for you appears just after this list)
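
Under the hood, the Explorer assembles the same queries you could write by hand with our JavaScript SDK. As a rough sketch (the collection, property names, and values are invented for illustration), a query with a group-by and a filter looks something like this:

new Keen.Query("count", {
    eventCollection: "pageviews",
    groupBy: "referrer",
    timeframe: "last_week",
    filters: [
      {
        property_name: "visitor.country",
        operator: "eq",
        property_value: "Canada"
      }
    ]
});

The Explorer builds these pieces for you as you point and click, so nobody has to memorize the syntax.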

Visualize the results

  • Toggle between different visualizations of your data, choosing from chart types including area, line, table, or pie. You can also view your results in a metric or JSON format.
  • Embed the charts anywhere by viewing the chart’s source code and pasting it wherever you need it
  • Save your favorite queries, so you can come back and access them again and again
  • Extract your events - view the raw data by sending a full extraction to your email

Why now?

We built Keen IO to solve the increasingly difficult challenge of event data collection, storage, and analysis at scale. We aim to make it easy to not only analyze data via API, but also to expose data to your teams and customers who need it.

For our first couple of years at Keen, we focused primarily on building the analytics API and backend tools. While that remains our top priority, we now have a team of engineers focused on building out our front-end and visualization offerings, and the Explorer is one of our open source product releases. We’re excited about growing this team to better serve your needs.

We’re so grateful for all of the feedback we’ve received from our developer community along the way. If you have any feedback or questions, please send us an email or ping us on Slack!

Ready to explore your own data? Create a free Keen IO account, check out the sample demo or fork the project on Github.

Happy Exploring!

Justin Johnson

community guy, hacker, music nut. i like to help people build stuff.

How to Give and Receive Effective Feedback

The most valuable lesson I learned in grad school was not a theory, problem-solving technique, or research method. It was learning how others perceived me.

For most of my life I’d had a fear of taking up too much space. In the classroom or workplace I was careful not to speak up too often, lest I be perceived as attention-seeking, egocentric, or dominating. My reticence to speak was built on a series of assumptions I thought of as simple truths.

That changed during a class called Group Dynamics, known in some circles as “Touchy-Feely,” in which much of the curriculum involved giving one another feedback and sharing our perceptions. I was shocked to discover that my peers did not find me space-consuming as I feared, but rather too quiet, wishing that I would speak up more often to share my thoughts. They encouraged me to take up even more space in the room. It was life-changing.

Finding my voice in group situations enabled an entirely new career path of consulting, facilitation, and leadership that I was surprised to discover suits me quite well.

It surprised me that this catalyzing learning did not come from going deeper within under the banner of self-improvement (as I had previously done via introspection, journaling, therapy, meditation, etc.), but was only possible by perceiving myself through others’ eyes rather than my own. And it was far more actionable and life-enhancing.

What impact do our words and behaviors actually have on others?

We can guess, but we don’t really know until we ask. The answers are often different than we expect. I thought I was being accommodating, conscientious, and polite, but others saw me as withholding, aloof, and withdrawn. That is quite a delta.

Yet all I had to do to close the gap between intention and impact was to ask. My peers held a wealth of information about me, which, if I asked in the right way, I could unlock.

These deltas of intention and impact happen all the time from interpersonal one-on-one relationships to large-scale brand perceptions. A company may believe it is presenting its product as mature, sleek, and clean, while its consumers actually find it dull and unengaging. This discrepancy is why focus groups and branding firms exist in our marketplace.

Effective customer research, crowd-sourcing, and supply-chain optimization projects decrease the delta between product intention and consumer reception. But who provides this service for the individual?

Obtaining individual feedback is as simple as asking for it, but this does not make it easy.

The word feedback often carries a cringe-worthy association: an opportunity for someone to deliver unsolicited criticism, or an annual event in which someone rates you against a contrived scoring system to determine your compensation, career trajectory, and ranked placement against your peers.

But real feedback, information carried from an output back to the input, provides a wealth of insight and opportunity to learn.

In my case, I had been told previously that I was intimidating. But no one ever explained to me why that was. I was left to interpret for myself. I made up a story about being “too much.” Not until graduate school did someone explain to me that it was my quietness that was intimidating. My silence made people feel judged. I was intending to be polite and the impact was intimidation. That is a big difference.

Receiving feedback on your actual impact allows you to narrow the delta between intention and impact and increase your effectiveness.

Good feedback has this goal at its heart: success for the recipient.

Good feedback is also actionable. It is not directed at one’s character (e.g. “you are boring, careless, or intimidating.”) Useful feedback identifies the specific, observable behaviors that lead to the character labels. Telling someone they are boring does very little for them except make them defensive and hurt their feelings; they don’t know what about them is boring (the stories they tell? the way they dress? the tone of their voice?).

However, when someone describes a specific behavior (e.g. “when you speak in monologue without pausing, I find myself losing interest”), the recipient has the data needed to change the impact. Graceful feedback empowers the recipient.

Isn’t the opportunity to be more effective a lovely gift we can give each other?

Here at Keen, our coaching team is striving to change perceptions about direct feedback from being a scary, confrontational event to a learning opportunity and expression of caring.

In addition to individual coaching sessions, we provide an Effective Communication Learning Lab to all employees. We teach the principles of non-violent communication and active listening while giving participants opportunities to practice requesting, delivering, and receiving feedback in a conscientious, honest, and caring way.

Narrowing the delta of intention and impact can be broken into three steps:

  1. Acknowledge that the way you intend to be perceived is not necessarily what is happening.

  2. Find a group of people you trust to deliver honest and caring feedback.

  3. Muster the courage to ask and listen.

By seeking and sharing meaningful feedback, you can overcome misconceptions about yourself and others, discover hidden strengths and talents, and build trust with the people close to you. Most importantly, you can close the gap between intention and impact and be perceived the way you truly want to be seen.

Lisa Nielsen

People developer, behavioral science enthusiast and baking diva.

How to do a join on event data

Joins are a powerful feature in traditional static databases that combine data stored in two or more entity tables using SQL. You might use a join to answer a question like “Which customer spent the most money last month?”

A lot of our customers have asked us “Can I do a join with event data?”

The answer is: While you can’t do a traditional join on event data, you can accomplish exactly the same outcome by running a group_by on an event property. It’s pretty cool and very easy!

Here’s how:

First, imagine all the information you might want to join together if you were using a traditional entity database. With event data, all of that information is already there, right inside the event, every single time!

To understand this, let’s take a look at how event data is stored. An event is triggered by a customer or user’s actions, and this event contains data regarding what the action was, when it occurred, and the most up-to-date information about the state of that user at that time.

For example, if you work at an e-commerce company, you will probably want to track purchases. Every time you track a purchase, you can include information about that purchase. Here’s an example of some of the information you might want to track on every purchase and how you would model this event with event data:

purchases = {
   "user": {
       "first_name": "Arya",
       "last_name": "Stark",
       "email": "as@keen.io",
       "id": 22
   },
   "order": {
       "id": "XD-01-25"
   },
   "product": {
       "list_price": 19.99,
       "description": "This is the best Dog Shirt",
       "name": "Dog Shirt",
       "id": 10
   },
   "keen": { // these keen properties are automatically added to each event
       "timestamp": "2015-06-16T23:24:05.558Z", // when the event occurred
       "created_at": "2015-06-16T23:24:05.558Z", // when the event is written
       "id": "5580b0153bc6964d87a3a657" // unique event id
   }
}

As you can see, every time a purchase is made we are tracking things like:

  • User information
  • Order information
  • Product information
  • Time

This format allows for quick and efficient aggregation querying: that is, the ability to easily derive sums, counts, averages, and other calculations. With this format, we will be able to ask questions like:

  • Which products were purchased most often?
  • Which users have spent the most money?
  • What is the average order value?

We can do this all in one simple query! As an example, let’s ask the question, “What was the most popular product?” Here’s what the query would look like:

What is the most popular product purchased?

new Keen.Query("count", {
    eventCollection: "purchases",
    groupBy: "product.name",
    timeframe: "last_week",
   }); 

Result: The Mallard and The Horse Shirt are the most popular.

Now, let’s say we want to know which customer made the most purchases last week.

Which user made the most purchases?

new Keen.Query("count", {
     eventCollection: "purchases",
     groupBy: "user.first_name",
     timeframe: "last_week", 
   }); 

Query Result: Sansa & Stannis tie!

Finally, let’s find out what our total gross revenue is across all users.

What is my total gross revenue?

new Keen.Query("sum", {
     eventCollection: "purchases",
     targetProperty : "product.price",
     timeframe: "last_week", 
   }); 

Query Result: $439 (not bad for animal-themed t-shirts!)

With entity data, you could use SQL to answer these questions by running joins on multiple tables. To answer the question “What was the most popular product?” you would need to have a users table, a products table, and a purchases table. You would get the same result, but the path to get there would be longer.

In Keen, when an event is triggered you’ll include everything you know about the user at that point in time. This serves as a snapshot of the user as you know him/her. That information can include data about who (their name, username, account number, userid, age at that time), what device they were using, what was purchased, and any other properties you have available. When you’re ready to query, this snapshot becomes incredibly powerful.

If you want to learn more about the difference between Entity Data and Event Data, check out this guide on How to Think About Event Data.

The most important point with event data is to think carefully about the kind of questions you’d like answered when you set up your data tracking. That way you’ll be sure to have the information available when it comes time to query.

To learn more about what to track, and when, check out our Data Modeling Guide.

So which is better: entity data or event data?

Both have their strengths. In general, entity data is best for storing static information about nouns (users, inventory, assets, etc.) while event data is ideal for tracking data related to verbs (signup, purchase, click, upgrade, etc.)

Very often, the questions that are most important to your business revolve around these user actions, and event data allows you to run analytics on them without having to do joins at all.

Learn more about modeling event data

Check out our data modeling guide and sign up for a free account to start playing around with your own event data. Questions? Reach out to us or post a question on Slack!

Maggie Jan

Data Scientist, Engineer, Teacher & Learner

New status page metrics

Devin, one of our platform engineers, recently made a change to our Keen IO Status page. He sent out a great email to the rest of the Keen IO team with a detailed explanation of how and why. Since this is a new user-facing metric, I wanted to share it here: it will help users debug, check on our platform’s status, and get a clearer picture of the inner workings of Keen IO. Thanks for taking the time to write this email, Devin! -Taylor

TL;DR We have a new user-facing metric for transparency and to act as an aid in debugging for our engineering teams.

Out with the old

I have updated our status page with a new metric and removed an old one. Previously, we displayed the “Write Event Delta”: the number of events that our users had supplied to us for writing that were still waiting to be written to Cassandra. This metric wasn’t particularly meaningful to our users – it is hard to know what 3,000 waiting events mean versus 12,000.

In with the new

The new metric is the “Event Write Delay”. This indicator shows how long events are waiting to be written to our data store, Cassandra, in milliseconds.

Event Write Delay graph

On a normal day, Keen events are available to be queried approximately 6 seconds after they are sent. We wanted to provide further transparency into the length of time our users will have to wait between writing and reading at any given time, so we added the Event Write Delay metric to our status page.

This metric matters because until an event has been written to Cassandra, it will not show up in any queries. We are displaying the 95th percentile of these delays, which is a conservative estimate of how long a customer should expect their events to wait before being available for queries.

The 95th percentile typically hovers around 8.5 seconds over a one-day window, while the 50th percentile hovers around 6 seconds, as mentioned earlier. The graph may shift when we make a configuration change or experience an incident that pushes these delays upward, but we don’t expect this to happen very often, as we work hard to keep the event write delay consistent!

Who does this impact?

First, our users have better access to company transparency, which is a win. Secondly, our support team can point to this graph to help answer questions about why events are not immediately showing up in queries.

Additionally, this can serve as a debugging aid for the Platform and Middleware teams.

How is this measured?

As events are passed to us, they go through a “bolt” (a piece of code) that writes batches of events to Cassandra. This bolt is where I added code to sample roughly every 2,000th event that we write. We compare the current time to the keen.created_at property and take the difference, which tells us how long the event waited before it was written to Cassandra. Sampling only 0.05% of the events we write still gives us about 3 samples every second, which I feel is sufficient to produce this metric without incurring any performance cost.
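
The bolt itself lives in our JVM-based write path, but the calculation is simple enough to sketch. Here is a rough JavaScript illustration of the idea (not our actual code; reportMetric is a stand-in for whatever ships the sample to our dashboards):

var SAMPLE_EVERY = 2000; // roughly every 2,000th event, or about 0.05%
var eventCounter = 0;

function maybeRecordWriteDelay(event) {
  eventCounter += 1;
  if (eventCounter % SAMPLE_EVERY !== 0) return;

  // keen.created_at is stamped when the event first arrives at our API
  var createdAtMs = new Date(event.keen.created_at).getTime();
  var delayMs = Date.now() - createdAtMs;

  // reportMetric stands in for the code that ships the sample off for aggregation
  reportMetric("event_write_delay_ms", delayMs);
}

The status page then displays the 95th percentile of these sampled delays over time.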

Special Thanks (because regular thanks wouldn’t suffice)

Shout out to Cory for helping with the visualization aspect in the status page and Datadog. Double shout out to Kevin for helping me understand enough of our back-end to make this happen as well as reviewing the code.

We also recently enabled Webhook Notifications on our status page, which you can subscribe to at status.keen.io, as seen below. This can be super helpful if you want to be notified via webhook about an incident on our platform. Our goal is to give users as many tools as possible for their toolkit when using Keen IO. -Taylor

Devin Ekins

Engineer. Tells lame jokes. Only sometimes wears a cape.

How we improved our sales workflow with Slack

We use Slack a lot at Keen IO. We’re constantly using and building Slack integrations to improve our workflow. We’re kind of obsessed. We realized we needed a way to aid our sales and customer success workflows on Slack, so we built a tool that lets people type a command that looks like this:

/company slack.com

and pulls up a response like this:

The company info is retrieved from Clearbit’s API. This has been incredibly useful for our Sales and Customer Success teams when they need to look up information about a new signup or an existing customer.

We’ve open sourced all of the code on Github. If you want to use this integration for your own company just follow these steps:

What you’ll need:

Step 1: Grab your Clearbit API key
Step 2: Create a Slack Incoming Webhook (you can reuse an existing one)
Step 3: Copy the webhook URL - you’ll need that later
Step 4: Create a Slack slash command. Preferably /company for the command. The URL should point to your Pushpop instance, on the /slack/company path. Copy the Token - you’ll need that later
Step 5: Create a new job in your Pushpop instance, using the company info source.
Step 6: Add all of the environment variables

  • CLEARBIT_KEY is the Clearbit API key from Step 1
  • SLACK_WEBHOOK_URL is the webhook URL from Step 3
  • SLACK_TOKEN_COMPANY is the slash command token from Step 4

Step 7: Restart Pushpop (make sure you’re running pushpop as a webserver)
Step 8: Type /company keen.io into slack!

Person Info

We can also look up information on individual people.

This creates a slash command that will retrieve info about a person (via email address) from Clearbit, and send it back in to Slack.

The person info will look like this in Slack:

How to set up the /person command:

Step 1: Grab your Clearbit API key
Step 2: Create a Slack Incoming Webhook (you can reuse an existing one)
Step 3: Copy the webhook URL - you’ll need that later
Step 4: Create a Slack slash command, preferably /person for the command
Step 5: The URL should point to your Pushpop instance, on the /slack/person path. Copy the Token - you’ll need that later
Step 6: Create a new job in your Pushpop instance, using the person info source.
Step 7: Add all of the environment variables

  • CLEARBIT_KEY is the Clearbit API key from Step 1
  • SLACK_WEBHOOK_URL is the webhook URL from Step 3
  • SLACK_TOKEN_COMPANY is the slash command token from Step 5

Step 8: Restart Pushpop (make sure you’re running pushpop as a webserver)
Step 9: Type /person jack@squareup.com into slack!

That’s it! Check it out on GitHub to learn more. If you have any questions or ideas of your own, drop by our community Slack channel.

Joe

Joe Wegner

Open source something

So You’ve Decided to Build Analytics In-House

So you’ve decided to take the plunge and build an in-house analytics system for your company. Maybe you’ve outgrown Google Analytics and Mixpanel, or maybe you’re an early-stage business with unique analytics needs that can’t be solved by existing software. Whatever your reasons, you’ve probably started to write up some requirements, fired up an IDE, and are ready to start cranking out some code.

At Keen we began this process several years ago and we’ve been iterating on it ever since, having successes and stumbles along the way. We wanted to share some of the lessons we learned to help you through the build process.

Today we’ll give an overview of key areas to consider when building an in-house analytics system. We’ll follow up with detailed posts on these areas in the weeks to come.

Input

Before you build your in-house analytics system, you need to consider what inputs will be coming into it, both expected and unexpected. Assuming you already know what kinds of data you want to track and what your data model will look like, here are a few things to think about:

  • Scalability

  • Traffic variability

  • DDoS

  • Rate limiting and traffic management

  • Good old-fashioned input validation

Each of these concerns needs to be addressed properly to make sure that your users get a solid experience. Most of them go quite a bit beyond checking inputs to a function.

We’ve all heard about defensive programming, validating inputs, and script injection. When you build a public-facing analytics system there are a variety of types of malicious input, and not all of them manifest as readily as others. Defending against a DDoS event requires architectural decisions about what constitutes an acceptable load profile. Rate limiting is heavily informed by what sort of business or service you want to run, and by the level of service you want to give certain users.

Some questions to ask: Are all users equal? Do certain users somehow need to be treated differently from others? Considering these questions in advance will help you build the right system for your users’ needs.
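
As one way to think about that last question, here is a minimal per-tier, token-bucket rate limiter sketched in JavaScript. The tiers, limits, and in-memory storage are invented for illustration; a production limiter would keep its state somewhere shared rather than in process memory:

// Allowed requests per minute by plan tier (tiers and numbers are made up)
var LIMITS = { free: 60, paid: 600, enterprise: 6000 };

var buckets = {}; // userId -> { tokens: Number, lastRefill: ms timestamp }

function allowRequest(userId, tier) {
  var limit = LIMITS[tier] || LIMITS.free;
  var now = Date.now();
  var bucket = buckets[userId] || { tokens: limit, lastRefill: now };

  // Refill tokens continuously, capped at the tier's ceiling
  var elapsedMinutes = (now - bucket.lastRefill) / 60000;
  bucket.tokens = Math.min(limit, bucket.tokens + elapsedMinutes * limit);
  bucket.lastRefill = now;

  var allowed = bucket.tokens >= 1;
  if (allowed) {
    bucket.tokens -= 1; // spend a token for this request
  }
  buckets[userId] = bucket;
  return allowed; // when false, respond with HTTP 429
}

Deciding what those limits should be, and who gets which tier, is the business question; the code is the easy part.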

Storage

Today, almost all web applications require developers to select at least one storage solution, and this is an especially important consideration for an in-house analytics system. Some key questions to consider are:

  • What sort of scale are you looking to support?

  • What is the relationship between reads/writes?

  • Are you trying to build a forever solution or something for right now?

  • How well do you know the technology?

  • How supportive is the community?

The better set up you are to answer these questions, the more successful your solution will be.

At Keen we use Cassandra as our primary data store and have a few other storage solutions for rate limiting, application data, and so on. We chose Cassandra as our primary store because of its performance and availability characteristics, and because of how well it scales with writes as data volume gets very large. We will discuss this in more depth in a future post.

Tech Selection

There are more technologies available to developers today than ever before. How do you know which ones will work best for your analytics needs? What OS do you use? What caching technologies?

At Keen we have gone through this process numerous times as we built and scaled our analytics platform. One recent example was selecting the language for two of the systems in our middleware layer: caching and query routing. These are fairly well-studied problems that don’t require bleeding-edge technologies to solve well.

Here are the criteria we used to make our selection:

  • We needed a mature toolchain that would allow us to predictably troubleshoot and deploy our software

  • We needed a language that was statically typed and concise

  • We did not need everyone to have prior knowledge of the language (since we didn’t have an existing codebase to build on top of)

With these factors in mind, we ended up eyeing the Java Virtual Machine (JVM). The toolset is mature, performance is adequate, behavior is very predictable, and there is a large set of frameworks for solving common problems. However, we didn’t want to develop in Java itself, as it tends to be overly verbose for our needs.

In the end we decided to use Scala. It runs on the JVM so we get all of the benefits of the mature toolchain, but we are able to avoid the extra verbosity of the Java language itself. We were able to build a few services with Scala with quick results and have been very happy with both the language and the tooling around it.

Querying + Visualization

Once you’ve figured out where your data will live, you will need to decide how to give your teams access to it. What will reporting look like? Will you build a query interface teams can use to run their own analysis? Will you create custom dashboards for individual teams: product, marketing, and sales?

At Keen, we built query capabilities into an API, powered by Storm. The query capabilities allow users to run counts, sums, max, mins, and funnels on top of the data stored in Cassandra. We also built a JavaScript library so users can visualize their queries in charts and dashboards. We wanted to make it super simple to run queries, so we created a Data Explorer - a point-and-click query interface built using React and Flux. It hooks in with our JavaScript visualization library to generate a wide variety of charts and graphs.
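
For example, with the JavaScript library a query can be rendered straight into a page element in just a few lines. The sketch below assumes a keen-js 3.x-style client; the project keys, collection, and element id are placeholders:

var client = new Keen({
  projectId: "YOUR_PROJECT_ID",
  readKey: "YOUR_READ_KEY"
});

Keen.ready(function () {
  var query = new Keen.Query("count", {
    eventCollection: "pageviews",
    groupBy: "page",
    timeframe: "last_7_days"
  });

  // Draw the grouped counts as a pie chart inside <div id="chart"></div>
  client.draw(query, document.getElementById("chart"), {
    chartType: "piechart",
    title: "Pageviews by page"
  });
});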

Troubleshooting

Ok, so now your service is up and running, you are providing value to your teams, and business is up and to the right. Unfortunately you have a team member who isn’t particularly happy with query performance. “Why are my queries slow?” they ask.

You now have to dig in to understand why it is taking so long to serve a query. This feels odd because you specifically chose technologies that scale well and performance a month ago was blazingly fast.

Where do you start? In most analytics solutions there are a number of systems involved with serving the request. There is usually an inbound write queue, some query dispatching mechanism, an HTTP API layer, various tiers for request processing, storage layers, etc… It is critical to be able to trace a request end to end as well as monitor the aggregate performance of each component of the system and understand total response times.

At Keen we have invested in all of these areas to ensure we have real-time visibility into performance of the service. Here’s an overview of our process:

  • Monitor each physical server and each component

  • Monitor end to end performance

  • Build internal systems that trace requests throughout our stack

  • Build auto-detection for performance issues that notify a human Keen engineer to investigate further

This investigation process leverages our JVM tools, along with various custom tools and testing environments that help us quickly pinpoint and fix the problem when the system is underperforming.

Murphy’s Law

Yep. This is actually a thing: “If something can go wrong, it will.” Inevitably pieces of your analytics solution will have issues, if not the whole system itself. I touched on this in the troubleshooting section, but there are much larger issues you will need to think through, such as:

  • How are you laying out your servers in the network?

  • How do you deal with data corruption or data loss?

  • What is your backup and recovery timeline and strategy?

  • What happens when a critical team member moves on to another role or company?

Imagine these scenarios. Maybe you were using FoundationDB, only to have it scooped up by Apple, and now you are trying to figure out how this impacts you. Maybe someone was expanding storage and took down all your load balancers because your machines weren’t labeled correctly. Maybe your sticks of memory went bad. Maybe Level3 just went down and took your whole service offline.

These represent just a few of the issues you will likely run into as you run your own service. How well you deal with them will help define how well you can serve your customers.

Stay tuned for more details

Over the next few months we will release in-depth posts covering each of the areas above to help you build a successful in-house analytics system. We look forward to sharing our thoughts and lessons we learned building out our service.

Want an alternative to build-it-yourself analytics?

We went through all the work of building an analytics infrastructure so you don’t have to. Our APIs for collecting, querying, and visualizing data let you get in-house analytics up and running fast, to give your team and your customers the data they need.

Sign up for free to see how it works, or reach out to us with any questions.

Brad Henrickson

Builder of things.

DataViz Show and Tell

Thank you to everyone who listened, shared, and asked questions at our first Data Visualization Show and Tell. We learned a lot and had tons of fun. We hope you did too.

A big thank you to our speakers:

Keen.Dataset, Dustin Larimer


Drafting and Experimenting with Data to create a Final Visualization, Will Johnson


Your Data Doesn’t Mean What You Think it Does, Zan Armstrong


Github Language Visualizations, Rebecca Brugman
In-product viral growth patterns seen at DocuSign, Chip Christensen

Discovering hidden relationships through interactive network visualizations, Kaustuv De Biswas, Mappr

To stay up to date on data visualization projects and events, subscribe to our monthly dataviz digest :) If you have something you’d like to see featured in our next digest, shoot us an email!

Till next time!

Ricky Vidrio

empath adventurer

Announcing New Docs for Keen IO

We’re excited to announce the release of our new Keen API Documentation. We’ve updated both the content and design of our documentation to make it even easier for you to use the Keen API for collecting, querying, and visualizing your data.

Our new documentation includes:

API Reference: Look up all API resources here in our three-pane API Reference, complete with code samples in cURL, JavaScript, Ruby, Python, PHP, Java, and .NET.

Data Collection, Data Analysis, and Data Visualization: These newly designed overview pages give a snapshot of each of these areas, with quick links to take you to the right resources.    

Guides: Find how-to guides, recipes, and deep-dives into areas such as building funnels, conversion analysis, and user behavior metrics. We’ll be adding lots more guides here to help you make the most of using Keen and to get the maximum value from your data. Stay tuned!    

Quick-Start Guide: If you’re not already a user of Keen, you can get started here. You can also select an SDK from the options outlined on our SDK page, sorted by collection, analysis, and visualization.    

Integrations: Our many integrations with partners such as Stripe, SendGrid, and Runscope are featured here, with step-by-step instructions.

We can’t wait for you to get started using our new documentation and we’d love to get your feedback! Please send your comments to team@keen.io or chat with us on Slack!

Nahid Samsami

Product at Keen. Cat Advocate. Also known as Hidi.

Introducing Analytics Tracking for Arduino

We’ve heard the clamoring, and we’re finally proud to announce an Arduino library for sending events to Keen IO! If you want to check out the code, it’s all open sourced here.

To get the creative ideas flowing, I started a sample project using this library to create a dashboard that tracks motion detection from a PIR sensor. The full code for the dashboard and Arduino snippet live here.

Activity Tracker

What are we building?

Have you ever wondered how active you are throughout the day, or if your cats are running around all over your desk at night? I have! What we will build here is a motion sensor hooked up to an Arduino Yún board that sends event data to Keen so we can display it on a nice dashboard.

Components Used

Setting up the Arduino Example

So, Keen IO requires SSL to work, and currently the Yún is the only Arduino board that supports it. And, to make things even more fun, you have to do a little work with security certs to get it working. There’s a nice write-up on how to do that here.

Once the Yún is configured with the new certificate, it’s time to run the example code to make sure you can send events to Keen IO. One small caveat to the built-in example: since I am programming the board over wifi, I had to use Console instead of Serial to see debug output.

#include <Bridge.h>
#include <ApiClient.h>
#include <KeenClient.h>
#include <Console.h>

KeenClient keen;

void setup() {
  pinMode(13, OUTPUT);
  digitalWrite(13, LOW);
  Console.begin();
  digitalWrite(13, HIGH);

  Serial.begin(9600); // only used when debugging over USB

  while (!Console);
  Console.println("Started!");
}

void loop() {
  keen.setApiVersion(F("3.0"));
  keen.setProjectId(F("YOUR_PROJECT_ID"));
  keen.setWriteKey(F("YOUR_WRITE_KEY"));

  keen.addEvent("motion_detections", "{\"cat\": 1}");
  keen.printRequest();

  while (keen.available()) {
    char c = keen.read();
    Console.print(c);
  }

  Console.println();
  Console.flush();

  delay(2000);
}

This code will boot up on the Yún and then send an event to the motion_detections collection associated with your Keen account. If you’re programming it through the USB cable, the Serial object is what you’ll want to use to see debug output.

Tracking Motion

Before we write more code, we have to hook up the PIR sensor to the Arduino.

Now, what we really want is to track motion, but there are a few things we have to figure out to do that. First, we have to be able to parse the date and time, which isn’t very straightforward. I found this helpful example, which I then modified to parse out the pieces of a date and time that I would need for my data model.

//if there's a result from the date process, parse it:
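// "date" is a Bridge Process (set up earlier in the full sketch) that runs the Linux date command on the Yún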
while (date.available() > 0) {
  // get the result of the date process (should be day:month:year:day of week:hour:minute:second):
  String timeString = date.readString();

  // find the colons:
  int dayColon = timeString.indexOf(":");
  int monthColon = timeString.indexOf(":", dayColon + 1);
  int yearColon = timeString.indexOf(":", monthColon + 1);
  int dayOfWeekColon = timeString.indexOf(":", yearColon + 1);
  int hourColon = timeString.indexOf(":", dayOfWeekColon + 1);
  int minuteColon = timeString.indexOf(":", hourColon + 1);
  int secondColon = timeString.indexOf(":", minuteColon + 1);
  int nanoColon = timeString.lastIndexOf(":");

  // get the substrings for hour, minute second:
  String dayString = timeString.substring(0, dayColon); 
  String monthString = timeString.substring(dayColon+1, monthColon);
  String dayOfWeekString = timeString.substring(yearColon+1, dayOfWeekColon);
  String hourString = timeString.substring(dayOfWeekColon+1, hourColon);
  String minuteString = timeString.substring(hourColon+1, minuteColon);
  String secondString = timeString.substring(minuteColon+1, nanoColon);
  String nanoString = timeString.substring(nanoColon+1);

  // convert to ints, saving the previous second:
  // int year, month, month_day, day_of_week, hours, minutes, seconds;
  month_day = dayString.toInt();
  month = monthString.toInt();
  day_of_week = dayOfWeekString.toInt();
  hours = hourString.toInt();
  minutes = minuteString.toInt();
  lastSecond = seconds;
  seconds = secondString.toInt();
  nano = nanoString.toInt();

  // Need to make sure we don't send an erroneous first motion event.
  if (lastHour == -1) {
    lastHour = hours;
  }
}

There’s a lot of nasty boilerplate code in that snippet, but this lets us track the different numbers we need to look at things like active seconds per day, hour, month, etc.

Next, we want to add some logic to the main loop to detect when the PIR sensor picks up motion:

void loop() { 
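  // pirVal is the PIR sensor reading (updated via digitalRead() at the top of each pass in the full sketch)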
  if (pirVal == HIGH) { 
    if (pirState == LOW) {
      digitalWrite(13, HIGH); // LED ON to show we see motion.
      Console.println("Motion detected!");
      pirState = HIGH;
      lastActivity = nano;

      keen.addEvent("motion_detections", "{\"motion_state\": \"start\"}");
      keen.printRequest();

      while (keen.available()) {
        char c = keen.read();
        Console.print(String(c));
      }

      Console.println();
    }
  } else {
    if (pirState == HIGH) {
      Console.println("Motion stopped!");
      pirState = LOW;
      digitalWrite(13, LOW);
      keen.addEvent("motion_detections", "{\"motion_state\": \"stop\"}");

      while (keen.available()) {
        char c = keen.read();
        Console.print(String(c));
      }

      Console.println();
    }
  }

  Console.flush();

  // poll every second
  delay(1000);
}

Setting up the Dashboard

I wanted to set up a quick dashboard to track motion, so I took our hero-thirds dashboard starter and loaded it into an Ember.js project (I wanted to learn Ember as well). You can see a live demo here.

I played around in the Data Explorer until I found the visualizations I wanted, then added them to the page. The final version of the Arduino code is also available to view.

So with a few simple lines of code and a quick dashboard, you can start tracking some interesting data with your Arduinos on Keen IO!

Have ideas for a project or want to hack around on your Arduinos? Come chat with me in our Slack channel!

Alex Kleissner

Software engineer by day, Partybot by night