Speed up dashboards with Query Caching

After ad-hoc analysis, many of you put your most valuable queries to repeated work in internal dashboards or front and center in your product. As your data volume and query complexity grow, you might find yourself asking “can I get access to these insights even faster? Can I improve my product’s experience by loading my customers’ metrics in the blink of an eye?” The answer is yes.

Data sent to Keen is available for querying almost immediately. We strongly believe that quick access empowers you to make decisions and build features that accurately reflect the state of your business’s world.

Query Caching allows you to trade this instant accuracy (or ‘freshness’) in exchange for increased query speed. The primary use case of caching is for queries that don’t need to have up-to-the-second answers, but that need to be answered as quickly as possible. If you’ve got a monthly, weekly, or even daily dashboard you are probably less concerned with the data from the last five minutes. But you certainly don’t want to wait over a minute to look at your historical trends!

Does your day ever feel like this?

Check out your product dashboard. Look at those charts. Up and to the right. Look at your Stephen Curry bobblehead. Up and down. Aw yeah, everything looks great. Boss emails a question. Back to your dashboard. Time to refresh. Loading. Loading. Loading. Are you still waiting for the {“result”: “valuable information”} to load, or are you already getting up to get your fourth cup of coffee and maybe play some foosball with Mike?

How many minutes are wasted due to slow loading dashboards? How many of your users leave your site before their visualizations finish loading? If you get frustrated waiting 111 seconds for a big query to finish, you can’t expect them to stick around. And I can’t speak for your code, but it’s not going to be happy blocking and waiting for an answer to “most popular cat shirts” to determine what recommendations to show.


Alright, better alt-tab back to your dashboard. You notice your acquisition funnel looks a little low in the last 30 seconds! Look to your right. Look to your left. Do you have an emergency lever you can pull to enhance your product immediately? Of course not. As a human, you probably never manually make drastic changes based on the last 30 seconds of behavior. You try to steer your product based on longer-term trends.


What you want, what your customers value, what your product needs is not always up-to-the-second accuracy but reliably rapid results. What your slow dashboard or product needs to leverage is Query Caching.

How to add Query Caching

Adding caching is as simple as adding a max_age parameter (maxAge in the JavaScript SDK) to your Keen queries:

var count = new Keen.Query("count", {
  eventCollection: "pageviews",
  groupBy: "property",
  timeframe: "this_7_days", 
  maxAge: 300
});
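Conceptually, max_age works like a time-to-live cache: if there is an answer for this exact query that is younger than max_age seconds, it comes back instantly; otherwise a fresh answer is computed. Here is a minimal sketch of that idea (illustrative only, not Keen’s actual implementation):

```javascript
// Illustrative time-to-live (TTL) cache, sketching the idea behind
// max_age. This is NOT Keen's implementation; `runQuery` stands in
// for a slow query, and `now` is injectable to make testing easy.
function cachedQuery(runQuery, maxAgeSeconds, now = () => Date.now()) {
  let cached = null; // { result, storedAt }
  return function () {
    const ageSeconds = cached ? (now() - cached.storedAt) / 1000 : Infinity;
    if (ageSeconds < maxAgeSeconds) {
      return cached.result; // fast path: instant, possibly slightly stale
    }
    const result = runQuery(); // slow path: fresh, up-to-the-second
    cached = { result, storedAt: now() };
    return result;
  };
}
```

With maxAge: 300 as in the query above, an answer computed at 9:00:00 is served instantly until 9:05:00; the next request after that pays the full query cost and refreshes the cache.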

So go ahead, add the max_age caching parameter to any query you wish was snappier! If you have any questions about query caching, building dashboards, or data analysis, don’t hesitate to reach out to us or ping us on Slack!

Peter Nachbaur

Passionate about throwing and catching flying discs and data

Observations from a Stress Test of Our Event Write Path

Manu, one of our platform infrastructure engineers, sent out an amazing write-up of a recent stress test on the Keen IO write event path. I wanted to share this to give a peek into the inner workings of our platform team as they build out, test, and strengthen the Keen IO infrastructure. Enjoy, and thanks again to Manu for sharing! -Kevin

tl;dr With the help of Terry’s awesome ETL tool, we ran a stress test on our write event path. We sustained a 3x write rate for almost half an hour without any impact, and we tried out a couple of changes to confirm what made the throughput improve (and that we did actually improve it).

Background

Last month, a particular client started sending us 2x more events than we had been handling before, and we ran into some problems with not being able to keep up with the write load. Another similar event occurred today, during which we made multiple config changes to get everything back into a stable state. With Terry’s bulk insert of events we were able to confirm the impact in a more definitive way.

The image below overlays the write event volume graph on the write event delta graph and shows the timeline of the changes.

Observations

Observation #1: We were able to increase throughput by adding more IO executors and running more JVMs. Storm hinted at adding IO executors by reporting that the capacity of those bolts was > 1.0, so that was fairly obvious. But increasing executors also seemed to increase the CPU load on some boxes. We were able to offset that by running more JVMs (or so I think).

Observation #2: Our overall throughput decreases when we are already backlogged. There was some talk about this being similar to a ‘thundering herd’ problem. When we are backlogged we need more resources to get out of the backlog, but we also read and write more data from Kafka, so Kafka (or the Kafka consumer / producer code in Storm) could get overbooked. It’s a theory but we need to investigate this further.

The reason why this is a real thing is that while we were able to handle a 3x increase in load in steady state, we couldn’t chew through the delta faster than incoming events, even after reducing the load by 2/3rds. Increasing executors and JVMs reduced the slope of the increase but couldn’t get it moving downwards like something that worked in steady state.

Observation #3: Our max Kafka fetch size was set to 2MB (later, accidentally, 1MB) and that was limiting the rate at which we read from Kafka. These values were too low: we were stuck in a situation where Storm executors were free but were not able to drain the queue. We just weren’t able to read from Kafka fast enough. Increasing this had the immediate effect that all the bolts in the topology started seeing a lot more load and showing high ‘capacity’ values. It also showed up as an immediate drop in delta. This is one of the fastest recoveries I’ve seen :)
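For reference, on an 0.8-era Kafka high-level consumer the knob in question is fetch.message.max.bytes (exact property names vary with the Kafka and spout versions, so treat this as a sketch, not our actual config):

```properties
# Maximum bytes fetched per partition per request. Setting this too low
# caps read throughput even when downstream Storm executors sit idle,
# which is exactly the stuck-but-idle state described above.
fetch.message.max.bytes=4194304
```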

None of this caused a significant increase in load on Kafka or Cassandra. So I’m pretty sure we can go higher if we need to. This also demonstrates the usefulness of stress testing. We should try and make this a regular feature and possibly add better ways to stress test other parts of our stack like queries.

Manu Mahajan

Learner. Friend.

Pingpong: A reintroduction

About a year ago we introduced Pingpong, our open-source analytics app for anything with a URL. And in the past few months, we’ve beefed up the functionality, polished the interface, and given it a shiny new landing page. If you haven’t played around with Pingpong yet, there’s no better time to check it out and start getting deeper HTTP request data.

The New and Notable

1) Display that beautiful data

You might have seen our HTML visualization kit Dashboards pop up recently. Now, Pingpong is shipping with Dashboards to run the visual side of things. So all your response info is embeddable, skinnable, customizable, responsive, and yep — beautiful.

2) Push it real good

Pushpop is another one of our open-source projects, and it only made sense to build it right into Pingpong. It’s a scheduling and reporting framework that lets you set custom parameters to get notifications for the stuff you’re tracking. When your app or site is slow to respond, for instance, you can get an SMS sent to your phone. Neat, right? We thought so.

3) Click to deploy

Getting Pingpong up and running only takes a click. Deploy the app on your Heroku account with this nifty little button:

Not on the Heroku train? No worries. You can still launch a Pingpong instance in less than 5 minutes. Just add your URLs, deploy, and start learning. The README has everything you need to know, plus some bonus options and recipes to keep things interesting.

To help you along, here’s a super-quick walkthrough video:

Got it up and running? Let us know what you’re using it for — we’re always looking for cool and interesting use cases (because we sure haven’t thought of them all).

Alex Kleissner

Software engineer by day, Partybot by night

Introducing Pushpop for Slack: push your analytics data to Slack

A few months ago, Cory blogged about some of the Slack hacks we use at Keen. That was a long list of wonderful integrations that Slack provides out-of-the-box. We use a lot of them, and love them deeply. Slack lets us keep track of most of our work in one place. The missing piece? Analytics. We didn’t have a way to push our most important analytics data to Slack channels, so we built it!

Pushpop for Slack

We built pushpop-slack, a new open source Slack integration that will push alerts to Slack based on your Keen IO event data. Pushpop-slack is an extension of Pushpop, our open source tool for sending analytics reports via email and text based on flexible triggers.

Slacking with Pushpop

The vast majority of our internal use of Pushpop has been sending emails or SMS for weekly analytics reports. Recently we have been thinking that it might be nice to have more lightweight notifications when notable trends start showing up in our analytics. These kinds of notifications aren’t really important enough for email, but fit really well with how we use Slack.

pushpop-slack is pretty flexible - you can do all sorts of things like impersonate a user, set a goofy icon, and send attachments. As an example, here is a really basic job I set up that sends a Slack message whenever you get more than 30 unique pageviews in an hour:

require 'pushpop'
require 'pushpop-keen'
require 'pushpop-slack'

job do
  every 1.hour

  keen do
    event_collection 'pageviews'
    analysis_type 'count_unique'
    target_property 'uuid'
    timeframe 'previous_1_hours'
  end

  step 'check_threshold' do |pageviews|
    pageviews['value'] > 30 # returning false cancels any subsequent steps
  end

  slack do |last_response, step_responses|
    channel '#general'
    username 'PushPop'
    message "We're on fire! We've had #{step_responses['keen']} pageviews in the past hour!"
  end
end

As you can see, it’s super simple to start generating messages, and with all of the other Pushpop integrations you can build some pretty cool notifications with very little effort.

Pushpop is similar to some other trigger services, in that it listens for events from anywhere, and then triggers an action to anywhere. The big difference is that Pushpop is much more malleable, and works better for complex recipes. Pushpop can have unlimited triggers and steps in a single job, and gives you a full ruby environment to mold and inspect data for more logical actions.
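The step-chaining behavior is easy to picture. Here is a toy sketch of the idea in JavaScript (this is not Pushpop’s actual internals, which are a Ruby DSL): each step receives the previous step’s response, and a falsy response cancels the remaining steps.

```javascript
// Toy sketch of Pushpop-style step chaining (not the real library).
// Steps run in order; each receives the previous step's result, and a
// falsy result cancels the remaining steps, like the job above.
function runJob(steps) {
  let last;
  for (const step of steps) {
    last = step(last);
    if (!last) return null; // falsy response: cancel subsequent steps
  }
  return last;
}

// Example: query -> threshold check -> message, mirroring the Slack job.
const message = runJob([
  () => 42,                       // stand-in for the keen query step
  (count) => count > 30 && count, // threshold check
  (count) => `We're on fire! ${count} pageviews!`,
]);
```

If the stand-in query returned 10 instead, the threshold step would return false and the message step would never run.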

Right now, Pushpop works with event data from Keen IO. If you don’t yet have an account, you can sign up for free. We plan on adding more analytics backends soon. Post a GitHub issue to let us know which backends you’d like to see supported!

If you’ve got any other questions/comments/requests/GIFs about pushpop-slack, please drop them on the GitHub issues page, or come chat with me in Slack!

Joe Wegner

Open source something

Knowing When It’s Time to Just Quit

Quitting my job was one of the best decisions of my life.

I had wanted to leave my previous company for a long time, but I wasn’t confident I would be able to find something better. I didn’t understand what useful skills I had, and I didn’t know what kind of job I would be good at.

I felt comfortable in my role and knew I was valued at my company even though I wasn’t always given opportunities to expand my horizons.

Then last year, work began to get worse.

I had always cared deeply about the project I was working on and the people I worked with directly. Then senior management basically disabled open communication with the people I had worked with for years. They didn’t seem to care about the distrust that was developing as long as we were under budget.

Additionally, the company I worked at was very risk averse. Individuals were not really allowed to make their own decisions. It was frustrating how little I could accomplish on my own.

Work sucked.

I felt stifled from the lack of communication and empowerment. I had low self-confidence in my ability to find another job, since I wasn’t a software engineer, and it seemed like that’s what most companies wanted.

I felt like I was getting lazier at my job, and it went completely unnoticed. And mostly I felt hopeless about ever finding a new job. I came to work feeling like Peter Gibbons from Office Space explaining to his therapist that “every single day of my life has been worse than the day before.”

Then, my relationship of over 10 years began falling apart, and I couldn’t take the misery in both my personal and professional life.

Something had to change.

Around that time, I also read Think Like A Freak by Levitt and Dubner, which had a chapter on how failure can provide valuable feedback and how quitting can be beneficial to your life.

That chapter pushed me over the edge, and I decided I was done. It made no difference that I didn’t have a job lined up. I was going to quit. And as soon as I made that decision, I felt strong and liberated. I agreed with Levitt and Dubner that it was ok if I failed. Regardless of what happened next, I had made this decision. It was mine. I owned that choice.

Looking back, I feel like I had been sleep-walking through the last 5 years of my life. Most of my days were spent at a job that sucked the life out of me, and then I would be too drained and tired to do anything but watch television when I came home.

Once I made my decision, I had time to think about what I actually wanted in my life. What was I actually good at? How could I meet people who would enable my success? What did I think success meant for me?

What could I do about living my life, instead of just plodding through each day?

One of the best parts about being unemployed was that I could reflect on what I actually wanted in my life. I realized that what I needed in order to be happy was a good working environment.

I needed an environment where the people around me were also self-reflective, where people tried their best to communicate openly and honestly, where there was a lot of positivity and encouragement for both work and personal goals, where people were valued as individuals, where all learning was encouraged regardless of whether it directly related to your job title, and where people actually cared and were invested in the company’s success.

I began attending networking events and conferences, and talking to friends about what kind of jobs they had and what a typical day was like. I saw a career counselor who encouraged me to do informational interviewing. I took the Gallup StrengthsFinder test, which made me feel more confident that I did have useful strengths. I started seeing that there were many different jobs I was qualified for.

And then I got lucky.

I think that luck is always out there for everyone, but sometimes you don’t see it. I saw it in the form of a company called Keen IO. And yes, I got the job through networking. I made connections with people who then decided to take a chance on me. So yes, I was lucky, but I would never have seen that opportunity had I not decided to just quit. I needed to understand what I wanted, and I needed to do something that empowered me instead of staying down on myself for remaining in an environment that was harmful to my well-being.

The kind of environment at Keen IO is right for me. It isn’t right for everyone. There are many different work environments, and I think that when you don’t know what you want and you have some financial security, you need to allow yourself the ability to take time and figure it out.

You also need to allow yourself to fail.

You need to be the one making the decisions, not the people around you. If you don’t, then you can remain stuck forever in an environment that can leave you unfulfilled and miserable. I am happy that I didn’t fail, and that I am in a place that makes me feel happier.

I wish I hadn’t needed the end of a relationship to be the catalyst that enabled me to quit. I wish I could have had the courage to quit years ago.

But that doesn’t matter. I did quit, and I am much happier today because of it.

Maria Dumanis

Good news everyone!

Thanks for sharing

The latest iteration of Open Source Show & Tell is now officially a wrap. Thanks to everyone who came out to show, tell, learn, helicopter high-five (more about that later), and hang out.

We had a great time and learned a bunch from all of our awesome presenters. One way we know we messed up a bit was by signing up too many community presentations, which meant that not everyone who prepared something had an opportunity to present. We’ll make sure not to do that next time, but if you have any other suggestions to make the event better, please don’t hesitate to send them over via a pull request.

If you didn’t get a chance to attend, we will be posting videos of the talks shortly (including the helicopter high-five), so stay tuned. In the meantime, if you’re interested in having an Open Source Show & Tell in your city, check out this playbook. We’re happy to share our experiences, help you get the ball rolling, and make it a reality.

Thank you to our awesome presenters!

To stay up to date on open source projects and events, subscribe to community-code, an open source open source newsletter!

Justin Johnson

community guy, hacker, music nut. i like to help people build stuff.

We have a changelog!

Hey Keen-folk!

I’ve been giving weekly updates in our developer group about work we’ve completed at Keen to keep everyone informed. This week I’m shifting things up a bit and giving everyone a link to a more thorough and awesome changelog.

Keen Changelog - April 24th, 2015

This week we’re joined by our Platform team, who work on the storage and compute systems that underlie the overall Keen platform. They’re awesome and I’m happy to have them included in our update!

This is our first iteration of a weekly changelog and we’re testing out donedid.io as a platform for it. If you’ve got questions about anything on the changelog, let us know. We’re happy to clarify or discuss.

Cory Watson
Principal Infrastructure Engineer

Cory Watson

Bigger than a breadbox.

How Are You Structuring Your Startup?

Starting a company is overwhelmingly exciting.

You are pouring every ounce of your being into a product you believe will make the world a better place. Your team is small, fast, and motivated. If you’re the right combination of smart, lucky, and persistent, things will start to click and you’ll grow. You’ll grow way faster than you wanted to.

Congrats, you have capitalized on your opportunity to create an amazing product.  

Don’t miss out on your next big opportunity – to create an amazing organization.  

For some reason people neglect this. When they reach the size where some structure makes sense, they take shortcuts by copying the standard corporate hierarchy you find all over. I find this strange because everyone knows working in those structures sucks. For most people, it’s why they quit their job at Big Co. to join a startup.  

Why would you want to re-create the exact monster you just left?  

I’ve talked with a lot of startup dorks in the last 3 years. The recurring theme I’ve seen is that no one stops to ask “why do we do it that way?” when it comes to their organization. They’re amazing at asking “why do we do it that way?” when it comes to product. It’s that very question that led them to their innovative awesomeness. But for some reason it is far rarer to question the way they operate internally.

I recently had a conversation with a startup CEO. The first 30 minutes or so I sat in horror as he recounted these soap-opera-esque stories of political fights and power dynamics he’s currently wrestling with. The scary part was that he seemed so bought in that this was The Way.

When I asked him why his company was organized in such a way that made these problems exist, he said something interesting: “There is no other way to run a software company.”

I was super taken aback by this and I realized this is the kind of thinking that has to change. And to be clear, I don’t blame this CEO or think he’s a bad leader. In fact, quite the opposite. But the fact that so many intelligent and driven people haven’t ever thought about alternate ways to run an organization is a shame. I bet they could make something truly amazing.  

To reach the next level of structuring organizations, we need to:

  1. Acknowledge that the organization that worked during the Industrial Revolution isn’t suited for 21st Century technology.

  2. Recognize that companies have the power to change this and to experiment.  Just because no one has done it yet doesn’t mean it can’t be done.

  3. Bring existing examples (successes and failures) to the mainstream.

Number 1 feels obvious to me and building the case for it would require a whole other blog post (perhaps a dissertation).

Number 2 requires snapping yourself out of a comfortable (and unfortunate) follower mentality.

Number 3 requires just a small bit of research. Zappos is a pretty big example for starters.

Imagine what we could do if we applied a fraction of our overall creativity to the way we organize ourselves, and had the confidence to test out new approaches. I think the next set of truly massive, truly robust, truly long-lasting companies will be the ones that perfect this type of experimentation.  

And the ones that do it best right now will attract top talent.  

So don’t screw up and ignore it.

Ryan Spraetz

Problem Solver — in business, coding, building, and bouldering.

Integrate Eventbrite and Keen IO in 60 seconds

We were getting ready for an event recently and wanted an easy way to visualize all of our Eventbrite registrations. Eventbrite has handy webhooks that enabled us to quickly start sending our registration data to Keen IO. We used this data to build a dashboard to share with our team and promotional partners:

130 people will be at Open Source Show and Tell! woohoo

Here’s what you do:

  1. Create a free Keen IO account if you haven’t already
  2. Grab your Keen Project ID and Write Key
  3. Head over to your Eventbrite Account page
  4. Scroll down and click on ‘Webhooks’
  5. Add a webhook with your Keen URL: https://api.keen.io/3.0/projects/<KeenProjectId>/events/Eventbrite_Events?api_key=<KeenWriteKey>
  6. Watch your registration events start flowing into Keen!
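The URL in step 5 is just Keen’s standard event-collection endpoint with your write key passed as a query parameter. A tiny helper (hypothetical, for illustration only) makes the pieces explicit:

```javascript
// Builds the webhook URL from step 5. projectId and writeKey come from
// your Keen project settings; the collection name is your choice
// ("Eventbrite_Events" in this post).
function keenWebhookUrl(projectId, writeKey, collection = 'Eventbrite_Events') {
  return `https://api.keen.io/3.0/projects/${projectId}` +
    `/events/${collection}?api_key=${writeKey}`;
}
```

Eventbrite POSTs each registration to that URL, and Keen stores it as an event in the named collection.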

Once your Eventbrite data is flowing in, you can use the Keen IO data explorer to start querying and visualizing your Eventbrite registrations.

A view of our daily Eventbrite registrations

Or you can use our JavaScript library to create your very own custom dashboard to share with your team! We got our event dashboard live on the interwebs super quickly using Divshot.

Let us know if you have any questions. Have a great event!

Searching for a Better Way to Do On-Call Rotations

The other day, Cory (one of our platform infrastructure engineers) sent out a company-wide email about how Keen’s Platform and Middleware Teams were trying to make on-call more manageable. It was a really interesting glimpse into the challenges of ensuring round-the-clock reliability, while also maintaining healthy personal relationships and some degree of sanity.

I thought this might be helpful to other people working on-call and asked Cory if he’d be okay with sharing his original email on the blog. Always eager to help, he said yes, so here it is! -Kevin

Hello party people!

Recently I was chatting with some folks and realized we’ve not talked much outside of the on-call group as to what’s been going on with on-call. I wanted to take some time to conduct some information out as to what we’ve been doing!

Recap

As many of you know things were pretty rough in February and part of March. A lot of long nights got pulled and we had to resort to swapping people out of on-call a few times to rest folks. We learned a lot. Primarily we learned how to band together to fix problems and help each other out. It was a time of sacrifice that many of us (and our families) are still recovering from.

While we’re here, we want to thank everyone for being so understanding and willing to help. We know many of you wanted to do anything you could to help. You didn’t have the knowledge needed (yet!) to sit at a keyboard and fix a busted thing, but you all contributed in your own way and we appreciate it!

First, Current State

We have made huge strides in the last few weeks to improve the on-call situation. The most important metrics are people’s attitudes and our rested state, which are hard to measure. From my seat the team is in a much happier place. Many of us have taken small vacations to help shore up our moods and repair some of our relationships.

What is measurable is the number of pages:

The spikes represent The Troubles™, and we’ve made a huge improvement. The recent uptick is not representative of problems; it’s representative of improvements we’ve made that took some tuning and got sorta noisy for a week or so.

I can relay that Jay, who just came off primary on-call last week, called it one of the “lightest” on-call weeks in recent memory. Yay! Congrats to everyone spending so much time on these improvements.

Second, Mechanics

We’ve been meeting regularly to review how on-call works so we can optimize things for everyone. The first thing we decided was that adding new people to the rotation would not immediately help. In fact, as Brooks’s Law describes, it would’ve hurt us as we raced to recover from our problems. We made this clear to some of the new team members. This is not a permanent thing, just a short-term plan to mitigate the blast radius.

Breadth Is Hard

It’s tough to know your way around Keen’s entire stack. Our desire to be polyglot and leverage OSS tools means that we have a lot of stuff for people to learn. So much so that no single person knows how everything works. To that end we’ve begun to specialize our on-call rotation into three categories:

  • Stormshield: Cassandra, Storm, some Kafka bits
  • Middleware: Pine, Myrrh, Service, LBs, some Kafka bits
  • Triage: General overview of everything, meant to help mitigate simple failures and escalate harder ones

Thanks to Kevin, as of last week we officially have two on-call rotations and our alerts are divvied up between 3 escalations. We’re beginning to leverage both on-calls depending on the nature of the failure. This has some great side effects:

  • You have a domain expert on hand to help deal with a problem
  • You aren’t alone

We’re not done with these mechanical improvements. We’re still meeting every two weeks to iterate toward an on-call that is more approachable. We’re now discussing how to integrate new people and bring down the OMG ON-CALL IS HARD AND LONELY problem. Luckily we have a lot of on-call experience, smarts and compassion.

Third, Infrastructure Improvements

There has been a ton of work in the area of maintenance, bugfixes, upgrades and other contributions from nearly everyone in PLAT and MID. Here are some of the big items:

  • Complete overhaul of Zookeeper machines, which coordinate both our Storm and Kafka machines. (Thanks to Brad for keeping this going, which was really scary!)
  • Ongoing repair and improvements to our Cassandra data. (Shout out to Brad for stewarding all of the repairs and to Manu and Kevin for working with our Cassandra consultants!)
  • Revamp of our fleet of Storm machines to have gobs of memory and not run supervisor instances on our nimbus nodes. (Thanks to Shu for provisioning, overseeing upgrades and making all the changes for this.)
  • Overhaul of our “chat ops” deployment system to homogenize the deploy commands for all our stuff. Every Keen-created service is now consistently deployable from @robot! (Thanks to Alan for the revamp and to Shu for continued care and feeding of the bot!)
  • Continued improvement of a “query tracing” feature for diagnosing where slowdowns occur and where we can optimize execution of queries. (Thanks to Kevin for introducing this feature and to Manu for his amazing efforts at producing measurable analysis of query execution so that we can compare efforts going forward.)
  • Improvements in the efficiency of the compaction path, causing fewer pages and operation issues around compaction, as well as reducing overall load on Cassandra (Amazing effort by Kevin!)
  • Pine has evolved and developed a considerable number of protections to keep the service healthy. Some have been bumpy, but overall we both stay out of trouble more often and recover from trouble much faster under its supervision of query scheduling.
  • Keen-Service has seen dozens of bug fixes and improvements to logging, query tracking, error handling and general maintenance over the last few months. The most recent improvement fixed an oversight where a large number of queries were not being load balanced! (Shout out to Jay and Stephanie for their continued diligence and ingenuity in improving Keen-Service!)
  • Our observability and monitoring has been repeatedly improved and rethought across every service within Keen. We have considerably more fine-grained visibility into how things are behaving, from per-queue query durations to specific wait times in storm bolts. (Amazing work by Stephanie in testing metrics in Service, Manu in creating Turmeric and every person who handles on call for continually improving our monitoring.)

I’m probably leaving out contributions by a bunch of people. Sorry, I did this from memory and tried to iterate through every major component I could think of.

The Future

Note that we’re not just focused on short term fixes. PLAT is actively working on query performance improvements, data storage/compaction improvements and a bunch of other stuff. MID is working on caching and continued improvements to Keen-Service and its future incarnations. There are also 3 new folks that have joined (or will be joining soon) to contribute their considerable experience to the mix. Yay!

Thanks!

On-call shouldn’t dominate our lives, but it is a necessary and important part of how we maintain the trust our customers place in us every day. We’re lucky enough to work in a company where the power to control this major part of our job is in our hands. To that end we’re working weekly to make on-call an experience that as many people as possible can contribute to. It’s worth noting that this point in Keen’s history is hard. We’re just big enough to need to specialize, just small enough to not have all the people (yet) that we need to specialize, and all present in a period of growth wherein this transition is hard and messy. Thanks to everyone for working every day to make this a supportive experience.

Henceforth we’ll try and collect information about this every month or so to conduct things out to everyone at Keen. If you’ve got any questions, let me know!

Cory Watson

Bigger than a breadbox.