Presenting The New Analytics at IBM Pulse

Pulse is IBM’s massive annual conference in Las Vegas, and this year it’s all about Cloud. Like any conference, there are announcements, demos, and Fall Out Boy opening for Elvis Costello.

Yet, hidden in a swanky nightclub an indoor mile away, lies something else: a conference within a conference. It’s called dev@Pulse, and it is specifically for developers. Here, it’s OK to show code on a slide, or do a demo with curl, or say the word Ubuntu.

I gave a 15-minute lightning talk here on Monday. While I didn’t show any code, I think the mixed crowd of startup and enterprise developers enjoyed hearing a startup’s perspective on analytics-in-the-cloud.

Here’s my presentation. The text is what was in my notes. If I had a perfect memory and nerves of steel it’s what I would have said, but in reality I’m sure it varies from the truth the video will tell.

— —


Hello everyone! I’m here today to talk about my experience building analytics in the cloud at Keen IO. Keen IO is an elegant, cloud-based API for doing data collection, reporting, and visualization.

But first, let’s get to know each other a bit. I know we’re at a nightclub, but I’m old-fashioned. (pause for laughter)


I’m Josh. I’m the VP of Engineering at Keen IO. I live in San Francisco, but I’m from Chicago. I enjoy reading and nature, and my favorite food is a fried chicken sandwich.

I’m fascinated with distributed systems and how we scale with software. I’m interested in real-time data processing systems and what you can build on them.

I’ve worked at companies from 1 employee to more than 100,000, and that makes me think about company culture a lot and specifically what it takes to scale great organizations.

I’m the author and maintainer of several open source projects, including Pingpong, a tool we built inside of Keen IO to do HTTP monitoring and open sourced last month. We hope it’s the first of many.

So, why are we doing what we’re doing? Why is the new analytics inside the cloud?


Here’s something you already know. Mobile is exploding. IoT is exploding. Connected devices are exploding. There is exponential growth in each of these areas. I didn’t put any numbers on the slide because they would have been obsolete by the time I gave this presentation.


These new devices and products generate an unprecedented amount of data. Developers and manufacturers must capture and use this data to make products better, keep customers happier, and stay competitive. He or she with the best data wins.


Uh-oh. But there’s a problem. Technology, right? Such a pain. It’s always something. The problem is that these companies are not big data companies. They are business-to-consumer apps or devices, or business-to-business sensors. Storing and scaling terabytes of data is not their core competency.


And big data is hard. Big data requires distributed systems engineering, dedicated operations teams, highly available architectures, and performance and scale expertise. It takes big words to do big data.

I know this firsthand. I’m building a big data startup, and I spent 5 years working with data in the enterprise and Silicon Valley. I’ve done Cassandra deployments, DB2 driver development, insurance data warehousing, and ad platform ETL for a big web company in Sunnyvale.

It takes a talented and diverse group of people to do big data well. It’s very expensive. Building a world-class big data team doesn’t make financial sense for most companies.

This is the crux of the issue — companies need data to beat their competition, but investing in a data platform at the expense of product development can have the reverse effect.


Cloud to the rescue! Whoo-hoo! This is a cloud conference — I knew you would love this slide. The cloud fixes everything! Got a sore elbow? Put some cloud on it! Rising real-estate prices in your neighborhood? Move to the cloud! Don’t feel like going to Best Buy on a Sunday to buy hardware? The cloud is right for you.


Keen IO is analytics-in-the-cloud for web, mobile, and IoT developers. No data team assembly required! Keen IO is self-service. Instead of building out big data infrastructure, app developers and manufacturers spend time building their products.


These products stream data to Keen IO. The Keen API does high-availability data collection, storage, and scaling. This is the gritty stuff, a complex mix of distributed systems architecture and operations. It’s our job to know what the latest version of ZooKeeper is, or how to migrate to Kafka 0.8, or how to add nodes to a multi-DC Cassandra cluster with vnodes enabled. Our customers don’t have to think about any of that. This is the beauty of the API supply chain.

Instead, customers focus on what matters to them. Answering business questions. Building dashboards, reports, and visualizations. With Keen, customers can get an impressive end-to-end analytics solution up and running in days, not months.


APIs are great. But to succeed, APIs must be elegant, intuitive, and powerful. And they must be documented! The most common reason that an API, or any piece of software for that matter, fails to get adoption is poor documentation. Documentation is the top of the funnel for API adoption and plays a big role in retention. Consider this my PSA to write great docs!

Another success factor for APIs is flexibility, especially if you’re serving a diverse customer base. At Keen, our customers come from very different industries and geographies, so our API has to be very flexible.


Our service is APIs for KPIs, and everyone’s KPIs are different. KPIs are key performance indicators — metrics about how a feature or product or business is performing in the marketplace. Companies build what they measure, and what they measure are KPIs. The most effective KPIs are those that are unique to the company, not just copied in from the industry.

Every company’s KPIs should be different because every company’s product should be differentiated. This is why companies need custom analytics. Out-of-the-box, segment-specific analytics products slot their users into a cohort that is expected to behave the same way. And the bigger the cohort, the more each member’s competitiveness is diluted.


Good KPI APIs allow the customer to express their business domain with precision. Here’s our approach at Keen IO. Developers model their events in JSON — a flexible data representation that allows for multiple data types and nested hierarchies of properties. And it’s human readable, which makes construction and debugging easier.
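To make the event model concrete, here is a minimal Python sketch of what such an event could look like. The "purchase" event and every property name in it are made up for illustration; they are not taken from the Keen IO docs or from the talk. The point is only that a single JSON document can mix data types and nest properties while staying readable.

```python
import json

# Hypothetical "purchase" event. Every field name here is illustrative only.
purchase_event = {
    "item": "fried chicken sandwich",
    "price": 8.5,                      # floating-point number
    "quantity": 2,                     # integer
    "gift": False,                     # boolean
    "tags": ["lunch", "takeout"],      # list of strings
    "customer": {                      # nested hierarchy of properties
        "id": "c-1001",
        "plan": "free",
        "location": {"city": "Chicago", "country": "US"},
    },
}

# Serialize to indented JSON, which is easy to read while debugging.
payload = json.dumps(purchase_event, indent=2, sort_keys=True)
print(payload)

# Decoding gives back the same structure, nesting and types included.
decoded = json.loads(payload)
```

Because the event is plain data, a business can add or rename properties as its product changes without a schema migration on its side.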

KPI reports must be flexible as well. Offering canned reports wouldn’t work — again it would normalize customers into the same view of data, cutting them off from the competitive advantage of using Keen.

Instead of offering canned reports, we supply report building tools that anyone with a knowledge of JavaScript and HTML can use. This is an example of a custom dashboard built on top of Keen by one of our customers. No customer-managed backend required.
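As a rough illustration of why flexible reporting beats canned reports, the Python sketch below counts events grouped by any property the caller names, with an optional filter. It is not Keen IO's query API; the sample events and the helper names `get_path` and `count_by` are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical sample events; a real system would read these from storage.
events = [
    {"type": "signup", "user": {"plan": "free", "country": "US"}},
    {"type": "signup", "user": {"plan": "pro", "country": "US"}},
    {"type": "signup", "user": {"plan": "free", "country": "DE"}},
    {"type": "purchase", "user": {"plan": "pro", "country": "US"}},
    {"type": "signup", "user": {"plan": "free", "country": "US"}},
]


def get_path(event, path):
    """Follow a dotted path such as 'user.plan' into nested properties."""
    value = event
    for key in path.split("."):
        value = value[key]
    return value


def count_by(events, group_by, where=None):
    """Count events per value of `group_by`, keeping only events that
    match every path/value pair in `where`."""
    where = where or {}
    counts = Counter()
    for event in events:
        if all(get_path(event, p) == v for p, v in where.items()):
            counts[get_path(event, group_by)] += 1
    return dict(counts)


signups_by_plan = count_by(events, "user.plan", where={"type": "signup"})
events_by_country = count_by(events, "user.country")
print(signups_by_plan)
print(events_by_country)
```

The grouping and filtering properties are chosen by the caller rather than baked in, which is the difference between a report builder and a canned report.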


Before I wrap things up, I’d like to take a few minutes and talk about what it’s like to do analytics as a startup. I get asked this all the time. Keen IO is a small company today, and that means we have to be choosy about what we take on. Even with that, there are some challenges we face every day.

We do all of our own operations work. From getting new servers up on SoftLayer to fighting fires in the wee hours, it’s all something our internal team does. We’re moving toward more sophisticated automation and recovery, but that’s an evolving journey, not a check mark.

Sometimes a customer will ask us about a feature that we’d love to offer but don’t have yet. We have to say “it’s on the roadmap” and then find a place and priority for it. We’re building a high-performing data stack from the ground up, not just wrapping a single database that already has a full feature set.

Analytics as a startup is challenging, but it’s even more rewarding.

We’re able to move very fast on the features we do build. We benefit from cutting-edge test and deployment frameworks. We have no affinities or dogma — we have the freedom to choose the best tool for the job. And because we’re in the cloud behind a unified API, once a feature is shipped all customers have access to it immediately.

We use cutting-edge technologies within the core product itself. Almost all of the technology on our backend is less (or much less) than 5 years old. Sure, we find bugs, but because everything is open source we’re able to fix them ourselves or work with contributors who can. Bugs are a small price to pay for providing next-level performance, availability, and cost.


Thank you! And thanks to IBM for bringing developers together with dev@Pulse. I hope I’ve given you some things to think about when it comes to APIs, KPIs, and leveraging the cloud to make your business more competitive. Enjoy the rest of the show!

I had a great time giving the talk, but we were also in Vegas, baby! So later on this happened.