What We Learned When We Surveyed Our Developer Community

Survey Objectives

In December we sent out a 40 question survey to learn more about the developers we build products for. We’re pretty proud to have over 50,000 people on our platform, but there are an estimated 18M developers out there, so there are plenty more to reach. We believed data on our existing base could help us focus our efforts this year.

Diversity and inclusion are also things we think a lot about, and we wanted to get a benchmark for some of our community demographics.

Methodology

Developer Community expert Sarah-Jane Morris was the mastermind behind our survey. She canvassed the company to learn what we wanted to know, put together a great set of questions that went through several review cycles, then shipped it out using Typeform. We distributed the survey request through our usual community channels: an email to our base of 50k signups, tweets to our 30k Twitter friends, and multiple messages to our public Community Slack. Without offering any incentive, we got over 400 responses, which we then filtered to exclude employees and trolls.

It’s probably worth noting the inherent bias in the survey: It’s filled out by people who had the time, energy, and inclination to fill out a Keen IO Developer Community survey (thank you!).

Results

Below you’ll find highlights of our findings. Some matched our instincts, but others surprised us! We share this to be helpful to other developer communities, and hope that some of this might apply to other developer-focused products as well. 😄

We’ve put the results in 5 categories:

  • Demographics
  • Tools and Frameworks
  • Open Source Contributions
  • How Welcome Do You Feel in Our Community?
  • Community Events

Demographics

What is your gender identity?
What is your race/ethnicity?

We weren’t shocked to learn that the largest demographic in our developer community are white males. One of the main motivations for running the survey was a desire to make our community more diverse and inclusive. We needed a way to benchmark where we are right now. Project Include has lots of great recommendations to help tracking these metrics.

We have noticed our local community at Keen events in San Francisco is more diverse than the larger worldwide Keen community. We will be working on ways to include more people online as well.

What is your age?

The age demographics of our community were a tad surprising. After all, aren’t there far more developers in their 20s than there are developers in their 30s? One idea that stands out is that community begets like community. The founding team and most of the early employees at Keen IO are in their thirties, and our community seems to have naturally spread predominantly to developers with similar demographics.

Is your first language English?

Wow! We found this statistic very interesting. It made us wonder, "How could we communicate better?", especially as an API company that relies heavily on documentation. We don't only mean communicating in someone's first language, but also, "How can we communicate better in English?" Phrases in a language that isn't your first can be nuanced and cause confusion.

Tools and Frameworks

What framework(s) do you most frequently use?

As you can see, Node.js beat all the other frameworks by far with AngularJS coming in second. Depending on the type of developer community, this can be really helpful for a product team to know. Is it easy to start using our API with Node.js and other popular frameworks? Is the “time to first hello world” under 5 minutes? Do we have documentation to help support users using popular frameworks?
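
To make "time to first hello world" concrete, here is a minimal sketch of what a first event write can look like over plain HTTP (TypeScript on Node.js 18+, using the built-in fetch). The project ID and write key are placeholders, and the endpoint shape follows Keen's documented event-write API, so double-check the docs for your account.

// Minimal "first hello world" sketch: write a single event to Keen IO over HTTP.
// Assumes Node.js 18+ (global fetch) and placeholder credentials; the endpoint
// shape follows Keen's documented event-write API -- verify against the docs.
const PROJECT_ID = "YOUR_PROJECT_ID"; // placeholder
const WRITE_KEY = "YOUR_WRITE_KEY";   // placeholder

async function recordHelloWorld(): Promise<void> {
  const url = `https://api.keen.io/3.0/projects/${PROJECT_ID}/events/hello_world`;
  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Authorization": WRITE_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ greeting: "hello world", framework: "node" }),
  });
  console.log("Keen responded with status", response.status);
}

recordHelloWorld().catch(console.error);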

Do you use any of the following developer tools in production?

We confirmed our suspicion that Stripe, Twilio, and SendGrid have wide adoption in our developer community. This helps us reinforce our investments in integrations, tutorials, and collaborations with these companies.

Surprisingly, lots of respondents said they were using no developer tools in production. Perhaps this question was confusing? Do most developers consider APIs to be "developer tools"? This is something to think about for future surveys.

How can we improve our documentation?

We don't blame you if your first thought when looking at this image is, "A word cloud in Comic Sans, really?!" Even though word clouds are silly, this one was interesting and is useful to anyone at an API or developer-focused company. Developers want documentation to have more examples!

We have taken this feedback to heart and have been adding many more examples to our docs over the last couple of months, like these visualization examples with JSFiddles, datetime enrichment examples, and video tracking code examples.

Open Source

52.64% contribute to an open source project

It's really awesome that over half of the Keen IO community contributes to an open source project. We made sure to ask this question not only of developers, but of everyone who took the survey. It's important to remember that anyone can be an open source contributor, not just developers. A copy edit, documentation, a logo, bug reports, community management, project management, mockups, and marketing material are all forms of contribution. It's also important to note that open source contribution is not for everyone. It takes time and other privileges to be able to do so. I gave a talk about it at OSCON London in October.


In most open source community research, the share of female contributors ranges from 1.5% to 10%. We were shocked to see that, of our respondents who identify as female, 31.15% contribute to an open source project. It would be interesting to see what this figure is for other communities.

One suggestion for increasing this percentage in other communities is to donate to organizations like Outreachy and Rails Girls Summer of Code, which are working to increase participation from underrepresented groups in open source.

It will be interesting to see the results of The Open Source Survey, which is being designed by GitHub in partnership with the Open Source Initiative and researchers at Carnegie Mellon University. Once it is completed, it can help give us more insight into open source communities.

How Welcome Do You Feel in Our Community?

18.89% are aware of the Keen IO Community Code of Conduct

We found this number very low. About 8 months ago, we announced the Keen IO Community Code of Conduct. After its release, we promoted it heavily to bring awareness to it. We still do, but as we learned from the survey we could do more.

Currently, we:

  • Have it as a requirement for any new Keen IO open source project
  • Mention it in event invitations for office events and during events
  • Have it as a handy Slack command in our Keen Community Slack

Some ideas to promote it more could be:

  • Including it on posters around the office when we hold events
  • Adding it to a welcome message to anyone joining Community Slack
  • Making sure 100% of our open source projects include it alongside the Contributors guide!

Do you feel welcome in the Keen IO community?

We found that most people feel welcome in the Keen IO Community. One way we have tried to achieve this is our Community Code of Conduct. We also host a wide range of events in the office, on everything from mental health in tech to communication labs. By having events like this we hope to create a community that shares the same values of Introspection, Continuous Learning, Personal Agency, Honesty, and Empathy that we hold internally. We have found that this welcomes a larger group of people into our community than standard developer events do.

In other places like Keen IO Community Slack, we have other community members helping each other as well as Keenies helping users. We aren’t available at all hours, but when we are we try to be as welcoming as possible. There are other small ways to be welcoming like sending out stickers, shirts, or cute little animals.


Right now, when you sign up for Keen IO, you get an email that invites you to pair with a developer at Keen IO. Our “Invitation to Pair” email also encourages users to ask questions and join other places like Community Slack for more help. You have to find what works for you to create a welcoming community.

Respondents also suggested that we could strengthen the community by communicating more about what other developers are doing with Keen.

Community Events

Have you ever been to an event at Keen IO or hosted by Keen IO?

We found that only about 16% of people have been to a Keen event. Also, 16.63% of respondents live in the San Francisco/Bay Area, so we might be doing a great job of bringing in locals for the events in the Keen IO office. But what about the other 83%? In the past, Keen has done events all over the world. If you have ever done national or international events that involve travel, you know how time-consuming and costly they can be. Livestreaming can bring a larger audience into your events when attendance is limited by location.

What kinds of events would you like to see Keen host more of?

We thought the list of events respondents would like to see more of would be helpful to others too. Some of the ideas are outside the normal realm of events. For example: events working more with locals. This could manifest in a few different ways. You could do technical events with organizations like Hack the Hood, or you could do an event where attendees volunteer their time to a local non-profit organization.

Do you attend developer conferences?
Do you attend developer meetups?

We found about a quarter of respondents say they never go to developer conferences or meetups. If we were only focused on conferences and meetups, we would never get to interact with these developers.

Thanks to everyone who generously took time out of their day to complete our survey. You can check out the full survey results by downloading them here.

Taylor Barnett

developer, community builder, and huge fan of tacos

Design Dashboards To Help Customers Love Your Product

You've got users. Your conversion rates are looking pretty good. However, your users aren't really using your product. You have a retention issue.

There are different ways of solving retention problems, and I want to look at how you can use dashboards and data to help your users understand and love your product.

When companies add native analytics or dashboards to their products, they usually start by building their “MVP dashboard”. This is usually a simple dashboard with a handful of charts.

Instead of stopping there, you can work to improve your dashboards into something that helps users become more engaged and better customers.

Designing analytics dashboards is a tricky task because you need to decide what data to show your users and what data they should ignore. You can’t just give them hundreds of charts and expect them to sort through the noise.

In this guest blog article, I want to walk you through 3 principles that you should keep in mind when designing analytics dashboards. You can then use these principles to analyze and deconstruct how other companies decided on their own dashboard designs.

Use Data to Close the Gap Between Metrics and Action

Let’s first understand why we want to create dashboards for our users.

Despite the ever increasing abundance of data, users (and companies) still struggle to take action. This is perhaps best explained in the Paradox of Choice TED Talk which states that as you give people more choices, their ability to choose (or take action) decreases.

Barry Schwartz had a great quote on “learning to choose”:

“Learning to choose is hard. Learning to choose well is harder. And learning to choose well in a world of unlimited possibilities is harder still, perhaps too hard.” - Source: The Paradox of Choice: Why More Is Less

This is where analytics dashboards and data come in. We can help our users "choose" better options by presenting them with relevant data.

If done well, a great analytics dashboard can do the following:

  1. Inform and motivate on progress: You can remind your users how much they have accomplished and how much closer they are to their goal.
  2. Provide a complete and holistic picture: You can provide the user with a complete picture by pulling data from different sources. This tends to apply more to business use cases than consumer ones.
  3. Incentivize continued usage: You can also inspire your users to continue using your product while still getting value out of it.

Our goal isn't to have all the data in the world. We simply need to find the data that will be relevant and useful to our users. Like most things, this is an iterative process where you get better with time.

Universal Design Principles Behind Actionable Dashboards

If you're just starting to research how to track the right metrics to show your users, I recommend getting a tracking plan in place. This simple document (usually a spreadsheet) can help you organize all of your events and properties in one place.

In this article, we will assume you’ve already completed the process of finding the right metrics for your users. Instead, let’s focus on the design and structure of our dashboards. For that, we can look at 3 principles that can help you decide what data to show and what to ignore.

Principle #1: Make It Visual


Numbers are great but visuals are even better. Visuals like charts allow us to communicate more information and make it easier to digest.

Let’s look at Mailchimp, an email marketing software. One of the core numbers that Mailchimp users want is the number of new newsletter signups.

Instead of simply showing the number of new signups, Mailchimp shows us the overall change in “audience”. This metric takes into account users who unsubscribed or who provided fake emails.

Audience Change Email Marketing Chart

You have different options for visualizing your data, including common chart types like bar, pie, and trend lines. You can also look at providing segments within a given chart, e.g. newsletter audience change by list.

We can see the number in the top left and we can see a bar chart that shows us the changes broken down by day, week or month. The bar chart is showing us the trends within the selected data period.

Principle #2: Provide Context By Comparing Numbers Against Other Periods

Our second principle is about comparing data against other periods. Numbers by themselves don't mean much. Is 15% good or bad? We simply don't know until we add context and say "15% is good compared to 12% last month".

Most companies will let you compare "this month vs last month" or "this week vs last week", which is a great start. You could also compare against other date periods that give even more context, such as the following (a quick sketch of the math appears after this list):

  • This month vs a rolling 6 month average
  • This month vs projections (set by user)
  • Greatest or smallest change
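
As a small illustration of the math behind these comparisons, here is a TypeScript sketch with made-up numbers: it compares "this month" against a rolling six-month average and against a user-set projection. It is pure arithmetic with no external services.

// Sketch: add context to "this month" by comparing it against a rolling
// six-month average and a user-set projection. The numbers are made up.
const monthlySpend = [420, 510, 480, 530, 600, 450]; // last six months
const thisMonth = 575;
const projection = 500; // set by the user

const rollingAverage =
  monthlySpend.reduce((sum, value) => sum + value, 0) / monthlySpend.length;

const vsAverage = ((thisMonth - rollingAverage) / rollingAverage) * 100;
const vsProjection = ((thisMonth - projection) / projection) * 100;

console.log(`This month vs 6-month average: ${vsAverage.toFixed(1)}%`);
console.log(`This month vs projection:      ${vsProjection.toFixed(1)}%`);
console.log(`Largest month in window:       ${Math.max(...monthlySpend)}`);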

Mint.com does a great job with these kinds of date comparisons. For example, you can see the overall trends within a 12 month period in the graph below:

12 month trend with dates

The average, greatest change, and least spent provide context for your financial numbers. Am I projected to spend more than my average? Am I now spending more than the “Most Spent”?

Mint also sends out alerts when your numbers diverge from the average, as seen in the example below:

Dollars Spent with Comparison to Other Users

The examples and screenshots above show analytics that are embedded directly into your product and in your customer’s inbox as an email – this in-app experience is what Keen IO calls “Native Analytics”. Both of these examples of Native Analytics are great because they show how you can compare numbers against other date periods to add more context.

Principle #3: Overviews and Then Allow Drilldowns

The first screen or page that your user sees is critical. This usually functions as an overview of the most important metrics. You can then click into a given chart to dig deeper.

When designing your overview screen/page, keep a few things in mind:

  • What are the 4-5 things that I want my users to know? You might want them to know 20 things but the overview only gives you enough space for a handful of numbers.
  • What kind of drilldowns do I want my users to take? You told them something crucial and now you want them to dig deeper into that number.
  • What actions do I want my users to take? Besides the drilldowns, what do I want users to do after they learn about “X number”?

Remember that analytics data is all about taking action. Always keep thinking about what you want your users to do and what data they need to take action.

Ahrefs does a good job of providing a great overview screen with the ability to do drilldowns on specific numbers.

Charts and Graphs with Drill Down

Conclusion

Designing a great dashboard is half science and half art. The art part is all about trying to understand what your user wants to see while the science part is about tracking how your users are engaging and using your dashboard (and product). If you’re thinking about creating a custom embedded dashboard for your customer, Keen IO has created tools for building API-based computations and visualizations that make getting started easy.

Do you have any other useful tips for how to design analytics dashboards? Let me know in the comments, or message me @ugarteruben.

Ruben Ugarte

Founder of Practico Analytics. He helps venture-backed startups sort through all of their analytics noise to make informed decisions.

Apache Kafka vs Amazon Kinesis to Build a High Performance Distributed System

At Keen IO, we’ve been running Apache Kafka in a pretty big production capacity for years, and are extremely happy with the technology. We also do some things with Amazon Kinesis and are excited to continue to explore it.

Apache Kafka vs Amazon Kinesis

For any given problem, if you’ve narrowed it down to choosing between Kinesis and Kafka for the solution, the choice usually depends more on your company’s size, stage, funding, and culture than it does on your use case (although I believe that for some use cases, the answer is obviously Kafka, as I’ll get to later). If you’re a distributed systems engineering practice, have lots of distributed dev ops / cluster management / auto-scale / stream processing / sysadmin chops, and prefer interacting with Linux over interacting with an API, you may choose Kafka regardless of other factors. The inverse is true if you’re more of a web, bot, or app development practice and are a bigger fan of services like Amazon RDS, Amazon EC2, Twilio, and SendGrid than of services like Apache ZooKeeper and Puppet.

In somewhat-artificial tests, Kafka today has more horsepower out of the box. Kafka can thus be tuned to outperform Kinesis in terms of raw numbers on practically any given test – but are you really going to do all that tuning? And are those really the factors that matter most to you, or are there other pros and cons to consider? By analogy: a Corvette can beat a Toyota Corolla in a lot of tests, but maybe gas mileage is what matters most to you; or longevity; or interoperability? Or, like lots of business decisions, is it Total Cost of Ownership (TCO) that wins the day?

What follows is a bit of a side-by-side breakdown of the big chunks of the TCO for each technology.

Performance (can it do what I want?)

For the vast, vast, vast majority of the use cases you may be considering them for, you really can’t go wrong with either of these technologies from a performance perspective. There are other great posts (Ingestion Comparison Kafka vs Kinesis) that point to the numbers demonstrating where Kafka really shines in this department.

Advantage: Kafka — but performance is often a pass/fail question, and for nearly all cases, both pass.

Setup (human costs)

I would say Kinesis is more than just slightly easier to set up than Kafka. Compared with rolling your own Kafka, Kinesis abstracts away a lot of problems: cross-region concerns, plus everything you’d otherwise have to learn and manage yourself (Apache ZooKeeper, cluster management/provisioning/failover, configuration management, etc.). Especially if you’re a first-time user of Kafka, it’s easy to sink days or weeks into making Kafka into a scale-ready production environment. Kinesis, by contrast, will take you a couple of hours max, and as it’s in AWS, it’s production-worthy from the start.

Advantage: Kinesis, by a mile.

Ongoing ops (human costs)

It also might be worth adding that there can be a big difference between the ongoing operational burden of running your own infrastructure (and a 24-hour pager rotation to deal with hiccups, building a run book over time based on your learnings, etc — the standard Site Reliability stuff), vs. just paying for the engineers at AWS to do it for you.

In many Kafka deployments, the human costs related to this part of your stack alone could easily reach the high hundreds of thousands of dollars per year.

To be fair, that ops work still has to be done by someone even if you’re outsourcing it to Amazon, but Amazon has more expertise running Kinesis than your company will ever have running Kafka, plus the multi-tenancy of Kinesis gives Amazon’s ops team significant economies of scale.

Advantage: Kinesis, by a mile.

Ongoing ops (machine costs)

This one is hard to peg down, as the only way to be certain for your use case is to build fully-functional deployments on Kafka and on Kinesis, then load-test them both for costs. That’s worthwhile for some investments, but not others. Still, we can make an educated guess. Because Kafka exposes low-level interfaces and you have access to the Linux OS itself, Kafka is much more tunable. This means that (if you invest the human time) your costs can go down over time based on your team’s learning, seeing your workload in production, and optimizing for your particular usage. With Kinesis, your costs will probably go down over time automatically because that’s how AWS as a business tends to work, but that cost reduction curve won’t be tailored to your workload (mathematically, it’ll work more like an averaging-out of the various ways Amazon’s other customers are using Kinesis — this means the more typical your workload is for them, the more you’ll benefit from AWS’ inevitable price reductions).

Meanwhile — and this is quite like comparing cloud instance costs (e.g. EC2) to dedicated hardware costs — there’s the utilization question: to what degree are you paying for unused machine/instance capacity? On this front, Kinesis has the standard advantage of all multi-tenant services, from products like Heroku and SendGrid to commuter trains and HOV lanes: it is far less likely to be as over-provisioned as a single-tenant alternative would be, which means a given project’s cost curve can much better match the shape of its usage curve. Yes, the vendor makes a profit margin on your usage, but AWS (and all of Amazon, really) is a classic example of Penetration Pricing, never focused on extracting big margins.

Advantage: Probably Kinesis, unless your project is a super special snowflake.

Incident Risk

Your risks of production issues will be far lower with Kinesis, as others have answered here.

After your team has built up a few hundred engineer-years of managing your Kafka cluster — or if you can find a way to hire this rare and valuable expertise from the outside — these risks will decline significantly, so long as you’re also investing in really good monitoring, alerting, 24-hour pager rotations, etc. The learning curve will be less steep if your team also manages other heavy distributed systems.

But between go-live and the time you have grown or acquired that expertise, can you afford outages and lost data in the meantime? The impact depends on your use case and where it fits into your business. The risk is difficult to model mathematically, because if you could predict a given service outage or data loss incident well enough to model its impact, you’d know enough to avoid the incident entirely.

Advantage: Kinesis

Conclusion

In conclusion, the TCO is probably significantly lower for Kinesis. So is the risk. And in most projects, risk-adjusted TCO should be the final arbiter.

Addendum

So why do my team and I use Kafka, despite the fact that the risk-adjusted TCO may be higher?

The first answer is historical: Kinesis was announced in November 2013, which was well after we had built on Kafka. But we would almost certainly choose Kafka even if we were making the call today.

A few core reasons:

  • Event streaming is extremely core to what we do at our company. In the vast majority of use cases, data engineering is auxiliary to the product, but for us it is product: one of our products is called Keen Streams, and is itself a large-scale streaming event data input + transformation + enrichment + output service. Kafka helps power the backbone of the product, so tunability is key for our case.
  • Nothing is more tunable than running an open source project on your own stack, where you can instrument and tweak any layer of the stack (on top of Kafka, within Kafka, code in the Linux boxes underneath, and configuration of those boxes to conform to a variety of workloads). And because what we sell is somewhere between PaaS and IaaS ourselves, and because performance is a product feature for us as opposed to an auxiliary nice-to-have on an internal tool, we’ve chosen to invest heavily into that tuning and into the talent base to perform that tuning.

  • Apache Kafka is open source and can be deployed anywhere. Given that infrastructure cost is a key input to our gross margins, we enjoy a lot of benefits by being able to deploy into various environments — we’re currently running in multiple data-centers in both IBM and AWS. Meanwhile, data location is a key input to some enterprise customers’ decision-making process, so it’s valuable for us to maintain control over where all of our services, including the event queue itself, are deployed.

At Keen IO, we built a massively scalable event database that allows you to stream, store, compute, and visualize, all via our lovingly-crafted APIs. Keen’s platform uses a combination of Tornado, Apache Storm, Apache Kafka, and Apache Cassandra, which together form a highly available, scalable, distributed database. Have an experience or content you’d like to share? We enjoy creating content that’s helpful and insightful. Enjoyed the article? Check us out, or email us – we would love to hear from you.

Kyle Wild

Founder/CEO at Keen IO. Way cooler on the internet than in person.

Announcing: Access Key Creation & Management Tool

Here at Keen IO we’ve built an Analytics API. We’re very excited to share a new feature with you – a tool for creating Access Keys. Access Keys are authentication tokens that can be used to grant permission to read or write data to Keen IO, or automatically enrich events with entity data.

Our new UI tool for managing Access Keys gives you a simple point-and-click method for generating keys, as well as an easy way to see all of the keys you have created. Of course, as always these Keys can be provisioned and customized via our lovingly-crafted API programmatically.
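
For example, here is a rough sketch of provisioning a write-only key over HTTP rather than through the UI (TypeScript on Node.js 18+ with the built-in fetch). The project ID and master key are placeholders, and the /keys route and payload shown here mirror the key definition format later in this post; consult the API reference for the authoritative request and response shapes.

// Rough sketch: provision an Access Key over HTTP instead of the UI.
// Assumes Node.js 18+ (global fetch), placeholder credentials, and an endpoint
// of the form /3.0/projects/{PROJECT_ID}/keys -- check the API reference for
// the authoritative route and response shape.
const PROJECT_ID = "YOUR_PROJECT_ID"; // placeholder
const MASTER_KEY = "YOUR_MASTER_KEY"; // placeholder

async function createWriteKey(): Promise<void> {
  const keyDefinition = {
    name: "Write key for Acme Corp",
    is_active: true,
    permitted: ["writes"],
    options: {
      writes: {
        autofill: {
          customer: { id: "93iskds39kd93id", name: "Acme Corp." },
        },
      },
    },
  };

  const response = await fetch(
    `https://api.keen.io/3.0/projects/${PROJECT_ID}/keys`,
    {
      method: "POST",
      headers: {
        "Authorization": MASTER_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(keyDefinition),
    }
  );
  console.log("Created key:", await response.json());
}

createWriteKey().catch(console.error);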

Edit Access Key

In case you’re not familiar with custom API Access Keys, the main use case for creating an Access Key is security. By defining custom permissions for different users, you can lean on Keen’s security features to serve data + customer-facing analytics instead of building your own tools from scratch.

Some other use cases for custom Access Keys include:

  • You’d like to build in custom permissions and follow security best practices
  • You’re presenting dashboards with analytics to your customers and want to ensure that customer A can’t see customer B’s data
  • When writing data you want to make sure that customer A & B’s data streams don’t mix
  • You’d like to make sure administrative operations in Keen IO (such as deletes) are not granted to all
  • You’d like to stream in data from other sources via webhook and still take advantage of Keen IO’s Data Enrichments
  • You’re interested in adding entity data or Master Data to every event

Your Access Key can allow you to add entity data to each of your events - this is a powerful ability. By specifying what you’d like to append just once in the Access Key’s autofill property, you can bake a particular user, customer, or company’s entity data into each event. For example:

"autofill": {
  "user_id": 232130982,
  "name": "Some Name"
}

Boom💥 These properties will show up in every single event written with that key.

Autofill can also be used to enrich your incoming webhook data stream via Keen's Data Enrichment Add-Ons. If you're streaming SendGrid email data, the long URL strings that exist in all Click events can be parsed into something more useful with the URL Enrichment. (Note: because this is a webhook, we currently have no way of notifying you if events fail due to a missing property. As always, test your integration.)

The SendGrid data model also includes the IP address in each event, so wherever a user opens the email you can maximize the power of Access Keys and use autofill to enrich those Opened events with IP to Geo data. Heck, enrich all of your events. Keen has five data enrichment tools for even more cool analyses.😋

Here’s an example Access Key definition. You’ll see the autofill property being used to include some entity data + the IP to Geo and User Agent Data Enrichments:

{
  "name": "Access Key for Acme Corp with Data Enrichments",
  "is_active": true,
  "permitted": ["writes"],
  "options": {
    "writes": {
      "autofill": {
        "customer": {
          "id": "93iskds39kd93id",
          "name": "Acme Corp."
        },
        "keen": {
          "addons": [ // Keen Addons to be used
            {
              "name": "keen:ip_to_geo", // IP to Geo parser add-on
              "input": {
                "ip": "ip_address" // Make sure the "ip_address" field exists in each event sent
              },
              "output" : "geo"
            },
            {
              "name": "keen:ua_parser", // User Agent parser add-on
              "input": {
                "ua_string": "user_agent" // Make sure the “user_agent” field exists in each event sent
              },
              "output": "tech"
            }
          ]
        }
      }
    }
  }
// continue rest of Access Key definition 
}
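
And here is a sketch of what writing an event with a restricted key like the one above might look like (TypeScript on Node.js 18+; the project ID and access key are placeholders, with ACCESS_KEY standing in for the key string returned when the definition above is created). Because the key carries the autofill and add-on configuration, the stored event should come back with the customer, geo, and tech properties even though the client never sends them.

// Sketch: write an event using the restricted Access Key defined above.
// Assumes Node.js 18+ (global fetch); ACCESS_KEY is a placeholder for the key
// returned when the definition above is created. The key's autofill and
// add-on configuration are applied server-side at write time.
const PROJECT_ID = "YOUR_PROJECT_ID";       // placeholder
const ACCESS_KEY = "RESTRICTED_ACCESS_KEY"; // placeholder

async function trackClick(): Promise<void> {
  const event = {
    ip_address: "8.8.8.8",                 // consumed by the IP to Geo add-on
    user_agent: "Mozilla/5.0 (Macintosh)", // consumed by the User Agent add-on
    url: "https://example.com/pricing",
  };

  const response = await fetch(
    `https://api.keen.io/3.0/projects/${PROJECT_ID}/events/clicks`,
    {
      method: "POST",
      headers: {
        "Authorization": ACCESS_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(event),
    }
  );
  console.log("Write accepted:", response.status);
}

trackClick().catch(console.error);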

Do you have a use case where you might want to define and manage fine-grained permissions for who can access your data streams and analysis? Thinking of ways you can incorporate data enrichments or entity data into your events?

Sign up and try it out!

Maggie Jan

Data Scientist, Engineer, Teacher & Learner

Introducing Transparent Pricing

To our current and future customers:

Hi, I’m Ryan, the Chief Product Officer here at Keen IO. I’m excited to make a handful of announcements today about changes to our pricing and to how you buy and use the platform.

First, we’re moving to a new metered pricing system. This means you can now pay directly for your usage of the various components of our platform (which are divided into three products: Streams, Compute, and Access), rather than having to choose a bundled plan that was measured only on data ingest.

Second, we’re removing the $1000/month cap on what you can buy in a self-service fashion (previously you had to negotiate a custom, annual contract with us if your usage was above a certain level).

Finally, on feature availability: many of our platform’s capabilities (e.g. Stream to S3, Cached Datasets, and the entire Access product) have only been available as part of custom, annual contracts. The feedback we got from our users was that while these technical capabilities were quite exciting, the prospect of having to engage in a negotiation process just to use them was not. Now and in the future we’ll be working to make all of our functionality discoverable, learnable, and purchasable online, in a self-service fashion – and of course, transparently priced.

Already a fan & using Keen? We’re working on a transition timeline for current customers. Until you hear more from us, your billing won’t change. You can see your payment tiers listed under your billing settings. We’ll be transitioning your accounts to the new pricing in waves, but there are several thousand of you, so this will take some time. The target date we have in mind for completing this transition is March 31st, 2017.

We’ve been gathering an exhaustive set of customer feedback for the past few months, which has guided our decisions here, but of course we haven’t been able to talk to everyone. If you have any feedback on any of this, or questions about your account specifically, we would love to hear from you! Please reach out to us at team@keen.io, or login and start a new conversation with us using the Intercom chat at the bottom-right of the page.

Stay tuned for an exciting 2017,

Ryan & the whole team at Keen IO

Ryan Spraetz

Problem Solver — in business, coding, building, and bouldering.

New Architecture Design for Distributed Caching Fleet

We’re happy to announce we recently rolled out significant performance improvements to the Keen IO Compute API. This post explains how we were able to dramatically reduce query latency for all query types by evolving our query architecture.

Our platform processes millions of ad hoc and batch queries daily, while maintaining 99.9%+ uptime.

Improved Query Response Times

Let’s start with the results. First, overall response times have improved. Queries to Keen (via the API or through the Explorer) should be faster now. The following graph shows the impact of the changes.

95th Percentile Query Duration

Improved Query Consistency

We have also made our query processing more robust by fixing a bug in our platform that could cause query results to fluctuate (different results for the same query) during certain operational incidents like this one.

The Magic of Caching

These dramatic results have been possible due to more effective caching of data within our query platform.

We’ve been working on improving query response times for many months. To understand the most recent update, it’s useful to have a little background on how Keen uses caching and how it has evolved over time.

Query Caching Evolution

At the lowest level we have a fleet of workers (within Apache Storm) responsible for computing query results. Any query can be considered as a function that processes events.

Query = function(events)

Workers pull pending queries from a queue, load the relevant events from the database, and apply the appropriate computation to get the result. The amount of data needed to process a query varies a lot, but some of the larger queries need to iterate over hundreds of millions of events in just a few seconds.

If you want to know more about how we handle queries of varying complexity and ensure consistent response times, I wrote an earlier blog post on that, which is available here.

Simplified view of a query being processed

We started experimenting with caching about a year ago. Initially, we had a simple memcached-based cache running on each Storm worker for frequently accessed data. At this stage, the main problem we had to solve was invalidating data in the cache.

Cache Invalidation

We don’t store individual events as individual records in Cassandra because that wouldn’t be efficient, so instead we group events (by collection and timestamps) into what we call ‘buckets’. These buckets sometimes get updated when new events come in or when our background compaction process decides that the events need to be re-grouped for efficiency.

If we used a caching scheme that relied on a TTL or expiry, we would end up with queries showing stale or inconsistent results. Additionally, one instance of cache per worker means that different workers could have a different view of the same data.

This was not acceptable, and we needed to make sure the cache would never return data that had since been updated. To solve this problem, we:

  1. Added a last-updated-at timestamp to each cache entry, and
  2. Set up memcached to evict data based on an LRU algorithm.

The scheme we used to store events was something like the following:

Cache Key = collection_name + bucket_id + bucket_last_updated_at

Cache Value = bucket (or an array of events)

The important thing here is that we use the timestamp bucket_last_updated_at as part of our cache key. The query processing code first reads a master index in our DB that gives it a list of buckets to read for that particular query. We made sure that the index also gets updated when a bucket is updated and carries the latest timestamp. This way the query execution code knows the expected timestamp for each bucket it reads, and if the cache holds an older version it is simply ignored and eventually evicted.
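
To make the scheme concrete, here is a conceptual sketch in TypeScript. An in-memory Map stands in for memcached, and the index and Cassandra reads are stubbed with placeholder data; the real code runs inside Apache Storm workers. The point is that last_updated_at is baked into the cache key, so an updated bucket can never be served from a stale entry – it simply misses, and the old entry ages out via LRU.

// Conceptual sketch of the versioned-cache-key scheme. A Map stands in for
// memcached; the index and event-store reads are stubs with placeholder data.
type KeenEvent = Record<string, unknown>;

interface BucketRef {
  collection: string;
  bucketId: string;
  lastUpdatedAt: number; // bumped whenever the bucket is rewritten
}

// Stand-in for memcached: in production this is a distributed LRU cache,
// so entries keyed by an old timestamp simply age out.
const cache = new Map<string, KeenEvent[]>();

// Stub: the master index lists the buckets a query needs, each with its
// latest last-updated-at timestamp.
function readIndex(collection: string): BucketRef[] {
  return [{ collection, bucketId: "2017-01-15-000", lastUpdatedAt: 1484524800000 }];
}

// Stub: the authoritative event store (Cassandra in Keen's case).
function readBucketFromDatabase(ref: BucketRef): KeenEvent[] {
  return [{ bucket: ref.bucketId, value: 1 }];
}

// The timestamp is part of the key, so an updated bucket always misses.
function cacheKey(ref: BucketRef): string {
  return `${ref.collection}:${ref.bucketId}:${ref.lastUpdatedAt}`;
}

function loadEvents(collection: string): KeenEvent[] {
  const events: KeenEvent[] = [];
  for (const ref of readIndex(collection)) {
    const key = cacheKey(ref);
    let bucket = cache.get(key);
    if (!bucket) {
      bucket = readBucketFromDatabase(ref); // cache miss: read from the database
      cache.set(key, bucket);               // stale versions age out via LRU
    }
    events.push(...bucket);
  }
  return events;
}

// Query = function(events): the worker applies the query's computation last.
const count = loadEvents("pageviews").length;
console.log("events processed:", count);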

So our first iteration of the cache looked something like the following:

Query Caching V1

This was successful in reducing load on Cassandra and worked for many months, but we weren’t able to fully utilize the potential of caching because we were limited by the memory on a single Storm machine.

We went on to create a distributed caching fleet. We decided to use Twitter’s Twemproxy as a proxy to front a number of memcached servers. Twemproxy handles sharding the data, dealing with server failures, and so on.

This configuration allows us to pool the spare memory on all our storm machines and create a big, distributed-cache cluster.

Query Caching V2

Once we rolled out the new configuration the impact was pretty dramatic. We saw a major increase in cache hit-rate and improvements in query performance.

Improved cache hit rate after distributed caching rollout

Improving Query Consistency

Keen’s platform uses Apache Cassandra, which is a highly available and scalable, distributed database. We had a limitation in our architecture and usage of Cassandra such that we were susceptible to reading incomplete data for queries during operational issues with our database.

Improved cache hit rates meant that most of the query requests were served out of cache and we were less sensitive to latency increases in our backend database. We used this opportunity to move to using a higher Consistency Level with Cassandra.

Earlier, we were reading one copy (out of multiple copies) of data from Cassandra when evaluating queries. This was prone to errors due to delays in replicating new data and was also affected by hardware failures on individual servers. We now read at least two copies of data each time we read from Cassandra.

This way, if a particular server does not have the latest version of the data or is having problems, we are likely to get the latest version from another server, which improves the reliability of our query results.
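
Keen's query workers run on the JVM inside Apache Storm, but to illustrate the idea of "read at least two copies", here is a rough sketch using the DataStax Node.js driver (cassandra-driver): with a replication factor of three, a QUORUM read requires at least two replicas to respond before a result is returned. The keyspace and table names are placeholders, not Keen's actual schema.

// Illustration only: request a QUORUM read so at least two of three replicas
// must answer. Keen's actual workers are JVM-based (Apache Storm); this
// Node.js sketch just shows the consistency-level idea. The keyspace and
// table names are placeholders.
import { Client, types } from "cassandra-driver";

const client = new Client({
  contactPoints: ["127.0.0.1"],
  localDataCenter: "datacenter1",
  keyspace: "events", // placeholder keyspace
});

async function readBucket(bucketId: string) {
  const result = await client.execute(
    "SELECT * FROM buckets WHERE bucket_id = ?", // placeholder table
    [bucketId],
    { prepare: true, consistency: types.consistencies.quorum } // 2 of 3 replicas
  );
  return result.rows;
}

readBucket("2017-01-15-000")
  .then((rows) => console.log("read", rows.length, "rows"))
  .catch(console.error)
  .finally(() => client.shutdown());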

Manu Mahajan

Backend Software Developer. Human.

Delivering embedded analytics to your customers

More and more companies are making embedded analytics a core part of their product offering. Companies like Medium, Spotify, Slack, and Intercom are leveraging data as a core product offering to drive results. This isn't just happening within high-growth tech startups. In a recent survey, The Data Warehousing Institute found that around 30% of enterprises already have embedded analytics as a product offering in development or production, and this effort is expected to double by the year 2018. Regardless of your industry or company size, you might have thought about ways to use data to engage your users, demonstrate product value, or create opportunities for upsell and differentiation.

Whatever your objective, delivering embedded analytics to your customers can be a significant undertaking and addition to your product roadmap. You'll need to tackle questions like: What is the purpose of having analytics for your customers? How will you display data to customers? Will you let your customers run their own analysis on data? Will you build in-house or leverage existing technology? How many engineering resources can you dedicate to this? What is the timeline?

We’ve put together a framework for thinking through all the moving parts of delivering embedded analytics to your customers so you’ll be best set up for success. Click here to view the handy PDF version.

Define your analytics objective

  • Can data help drive customer engagement?
  • How will providing embedded analytics to your customers differentiate your product?
  • Do you have dedicated resources to help build out this product?
  • Do you have executive buy-in?

Data Readiness

  • Do you currently have customer data stored?
  • What sources do you need to collect data from? Are there APIs you can utilize for third party providers?
  • How clean is your data?
  • What format is your data in? Will you need to extract, load and transform it?
  • What are the key KPIs your customers care about?

Security & Access

  • How strict are the security requirements of your customers? What type of compliance do they require?
  • How granular do you want to get security permissions? Securing by company, by department, by role?
  • What are your hosting and infrastructure requirements?

Application UX

  • How do you want to display the analytics within your application?
  • How much control do you want customers to have over their analytics? Do you want to make it exportable? Do you want them to run their own queries?
  • Do you know where in the user flow you’d like to incorporate analytics?
  • Do you have a support structure set in place for customers who engage with your analytics service?

Performance

  • How real time do your customers need their data to be?
  • Do you have a sense for how many queries and how often you’ll need to run these queries per customer?

Engineering Resources

  • What are your current resource constraints?
  • Do you have data engineering and data modeling expertise?
  • Do you have a UI Engineer to design the look and feel of analytics into your application?
  • What additional resources will you need?

Delivery & Extensibility

  • Do you have a sense for the timeline to deliver an MVP?
  • How often do you expect your customer metrics to change?
  • Can you dedicate full time resources to build this?

Want to reference this list later? We’ve created this handy PDF checklist for you to print off. We also curated a list of 25 companies who are delivering analytics to their customers for fun inspiration.

Happy building! If you’d like to chat about how we’ve helped companies deliver analytics to their customers, give us a shout or request a demo.


We’ll be writing and sharing more content soon. Sign up to join thousands of builders and stay up to date on the latest tips for delivering analytics to your customers:

Alexa Meyer

Growth and UX. Cheese chaser. Aspiring behavioral economist.

Just Released: Download to CSV + Clone Queries

We have some very exciting news to share today! We’ve released some updates to Keen’s Data Explorer that we think you’ll enjoy. Keen IO users can now:

  • Download query results directly into CSV files
  • Clone saved queries

These two features have been widely requested by our community and we’re thrilled to make them available to everyone.

How to download query results to CSV

Now you can download query results displayed in the “Table” view as a CSV file from the Explorer. If you’ve entered a name for your query, that name will automatically be used as the CSV file name. If your query has not been named, we’ll provide a placeholder file name that you can update whenever you like.

To download a CSV:

  • Log in to your Keen IO account and run a query in the Explorer
  • Select the “Table” visualization type from the dropdown
  • Click “Download CSV”

How to clone a saved query

A cloned query is essentially a copy of a saved query. Once you’ve cloned a query, you can modify it without impacting the original query. This is especially handy when you want to build off of complex queries (like funnels with custom filters on each step) without having to enter all of the query parameters from scratch each time.

To clone a query:

  • Log in to your Keen IO account and select a saved query from the “Browse” tab
  • Click “Clone”
  • Enter a name for your cloned query and click “Save”

A note of thanks

A huge thank you goes out to Keen IO user and community member, Israel Menis, for their open source contributions to the Data Explorer. Their contributions helped make these features possible!

As always, if you have any questions or feedback, please reach out to us anytime. We hope cloned queries and CSV download help streamline your workflow.

Happy Exploring!

Sara Falkoff

Software Engineer

Announcing: Search on Keen Docs!

We’ve been spending time working on the developer experience of using Keen. Making the Keen documentation searchable is one of the first updates, with more to come.

Try it out here!

Searchable Docs

In the weeks to come, we're excited to write a technical blog post on how we implemented search in our docs with Algolia. At Keen IO, we are a developer-first company and believe in creating a world-class developer experience. We have functional tools and APIs for our developers to build applications that show off their data quickly. We also believe that the workflow on our site should be as easy to use as possible, and we're committed to creating that positive developer experience.

Do you have feedback for our Developer Experience? Just drop us a comment or write to us at community@keen.io.

Happy Coding! 📊

–Developer Advocacy Team

Maggie Jan

Data Scientist, Engineer, Teacher & Learner

25 Examples of Native Analytics Data Designs in Modern Products

Data is so ubiquitous, we are sometimes oblivious to just how much of it we interact with—and how many companies are making it a core part of their product. Whether you’re aware of it or not, product leaders across industries are using data to drive engagement and prove value to their end-users. From Fitbit and Medium to Spotify and Slack, data is being leveraged not just for internal decision-making, but as an external product offering and differentiator.

These data-as-product features, often displayed as user-facing dashboards, are known as “native analytics” because they are offered natively within the context of the customer experience. We’ve gathered 25 examples of native analytics in modern software to highlight their power and hopefully inspire their further adoption.


Ahrefs Lets Website Owners Drill Down on Referrers

Every day, Ahrefs crawls 4 billion web pages, delivering a dense but digestible array of actionable insights from 12 trillion known links to website owners (and competitors), including referrers, social mentions, keyword searches, and a variety of site rankings.


AirBnB Helps Hosts Improve their Ratings and Revenue

In addition to providing intimate housing options in 161 countries to 60M+ guests, Airbnb also reminds its more than 600,000 hosts of the fruits of their labors—with earnings reports—and gently nudges them to provide positive guest experiences—with response rates and guest ratings.


Etsy Helps Build Dream Businesses

The go-to online shop Etsy, which boasts 35M+ products, provides its 1.5M+ sellers with engagement and sales data to help them turn their passion into the business of their dreams.


Eventbrite Alerts Organizers to Sales and Check-ins

Event organizers use Eventbrite to process 4M tickets a month to 2M events in 187 countries. They also turn to Eventbrite for real-time information, to stay up to date with ticket sales and revenue, to track day-of check-ins, and to understand how to better serve and connect with their attendees.


Facebook Expands Reach of Paid Services

With Facebook striving to take a bigger bite out of Google’s share of online ad sales, its strategic use of data has spread beyond the already robust Facebook Ads Manager to comprehensive metrics for Pages, including, of course, key opportunities to “boost” posts.


Fitbit Helps Users Reach Their Fitness Goals

Fitbit’s robust app, connected to any of its eight activity trackers, allows its 17M+ worldwide active users to track steps, distance, and active minutes to help them stay fit; track weight change, calories, and water intake to stay on pace with weight goals; and track sleep stats to help improve energy levels.


GitHub Tracks Evolving Code Bases

GitHub, the world’s largest host of source code with 35M+ repositories, allows its 14M+ users to gain visibility into their evolving code bases by tracking clones, views, visitors, commits, weekly additions and deletions, and team member activity.


Intercom Targets Tools—and Data—to Users’ Needs

Intercom, the “simple, personal, fun” customer communications platform, delivers targeted data-driven insights depending on which of the platform’s three products a team uses: Acquire tracks open, click, and reply rates; Engage tracks user profiles and activity stats; and Resolve tracks conversations, replies, and response times.


Jawbone UP Enables Ecosystem of Fitness Apps with Open API

Jawbone’s four UP trackers help users hit fitness goals by providing insights related to heart rate, meals, mood, sleep, and physical activity, both in its award-winning app and through an extensive ecosystem of apps that draw data from the platform’s open API.


LinkedIn Premium Tracks Funnel Conversions

LinkedIn’s Premium suite of networking and brand-building tools helps demonstrate the ROI of sponsored campaigns by providing users with visibility into their engagement funnel—from impression, to click, to interaction, to acquired follower.


Medium Provides Publishers with Key Reader Metrics

Though Medium’s model is sometimes murky—publishing platform, publication, or social network?—it provides clear insights to its writers (or is that publishers?) in the form of views, reads, recommends, and referrers for published stories.


Mint Helps Users Budget and Save

Mint encourages users to make better financial decisions and save up for big goals by giving them visibility into their spending trends, especially as they relate to personalized budgets.


Pinterest Allows Pinners to Track Engagement

The internet’s favorite mood board, Pinterest provides its 110M monthly active users with traffic and engagement stats including repins, impressions, reach, and clicks.


Pixlee Illuminates Its Unique Value Proposition

Pixlee helps brands build authentic marketing by making it easy to discover images shared by their customers, and then deploy them in digital campaigns. To help its clients understand the impact of this unique value proposition, Pixlee serves up an on-brand, real-time dashboard that presents custom metrics like “lightbox engagement” alongside traditional metrics like pageviews and conversions.


Shopkeep Improves Business Decision Making

Shopkeep’s all-in-one point-of-sale platform uses a wide range of data—from best-selling items to top-performing staff—to help businesses make fact-based decisions that improve their bottom line.


Slack Delivers Visibility Into Internal Communications

The messaging app of choice for more than 60,000 teams—including 77 of the Fortune 100 companies — Slack delivers stats related to message frequency, type, and amount, plus storage and integrations.


Spotify Shares Stats as Stunning Visuals

Spotify’s stream-anywhere music service turns data insights into beautiful, bold visuals, informing their listeners of how many hours of songs they listened to in a year and ranking most-listened-to artists. They also help artists get the most from the platform by highlighting listeners by location and discovery sources.

Fan insights by Spotify


Square Zeros In On Peak Hours and Favorite Items

Going beyond credit card payments to comprehensive business solutions, Square provides business owners with real-time reports that include hourly sales by location, which help them home in on peak hours and preferred products.


Strava Turns Everyday Activities Into Global Competitions

Strava turns everyday activities into athletic challenges by comparing its users’ performance stats against the community’s for a given walk, run, or ride. The app also used its 136B data points to create the Strava Insights microsite, providing insight into cycling trends in 12 cities across the globe.


Swarm Updates the Foursquare Experience with New Gamified Features

Swarm adds additional gamification and social features to the original Foursquare check-in experience, providing users with their popular check-ins broken out by type, as well as friend rankings and leaderboards for nationwide “challenges.”


Triptease Builds Strong Relationships with Hotels

The Triptease smart widget allows hotels to display real-time prices for rooms listed by competing sites like Hotels.com to help convince guests to book directly and help the hotel build richer customer relationships. To keep a strong relationship with their own hotel-users, Triptease shows the impact on revenue of widget-enabled conversions, as well as the hotel’s real-time price rankings compared to other websites.


Twitter Beefs Up Its Business Case

As the internet’s 140-character collective consciousness positions itself more decisively as a boon for businesses, it has beefed up and beautified its analytics dashboard. Twitter’s dashboard now includes impressions, profile visits, mentions, and follower change for the past month, plus cards for Top Tweet, Top Follower, and Top Mention.


Vimeo Provides “Power” Stats in a Straightforward Interface

“We basically wanted to give users a power tool, but didn’t want them to feel like they needed a license to operate it,” explains Vimeo senior product designer Anthony Irwin of the video-hosting platform’s analytics tool. Today, Vimeo’s 100M+ users can dig deep—or stay high-level—on traffic, engagement, and viewer demographics.


Yelp Extrapolates Conversion-Generated Revenue

More than a ratings site for local businesses, Yelp also helps its 2.8M businesses engage and grow relationships with their customers. To highlight this value proposition, the company provides business users with a tally of customer leads generated through the platform, as well as a calculation of estimated related revenue.


Zype Helps Users Track Video Revenue

With a single interface, Zype makes it easy to publish and monetize video content across various platforms. Core to its value is the ability to provide users with key stats including monthly earnings, new subscriptions, and successful revenue models.


Building analytics into your product? We can help with that. Check out Native Analytics.

Want to see your stats featured in our next post? Send us a note


We’ll be releasing more guides and examples in the coming months. Subscribe to join hundreds of other product leaders who are following native analytics trends.

Alexa Meyer

Growth and UX. Cheese chaser. Aspiring behavioral economist.