Justin Johnson wrote this post on April 28, 2016
The 3rd annual Open Source Show and Tell is a wrap, and we had such a good time.
Mathias kicked off the day by wowing us with the power of F#, .NET, and Visual Studio Code via some live coding. The demo gods were friendly on this day.
Ian Eyeberg gave us a thorough introduction to the emerging unikernel ecosystem, complete with some good #jokes.
Francesc, of Google fame, talked through a brief history of Open Source with some interesting behind the scenes details and an overview of how Google supports and uses OSS.
There were lots of other excellent talks about things like the first ever open source voting platform and hacking PostgreSQL to do advanced data parsing, which is pretty handy if you don’t want to hang out in RegEx land all the time. Check out the rest of the topics here.
Alexa Meyer wrote this post on April 26, 2016
The digital media industry is marked by regular change and reinvention. As platforms change in popularity and power, companies must invest in new distribution channels to grow and engage audiences. At Keen IO, we support our customers by staying ahead of these trends and ensuring our cloud analytics platform is tuned to deliver unique capabilities and insights that are contextually relevant to each new major channel.
In this product spotlight, we focus on our solution for Apple TV and tvOS.
When a media company embraces a new distribution channel, the investment is often significant while the ROI is uncertain because the tools to manage performance are new and unproven. More simply, employing generic analytics tools is like flying a plane without computer-guided instrumentation. How can you effectively measure mobile in-app user events if you are using a tool that was designed to track pageviews? Some developers creatively hack a stopgap solution, but there is a better way.
Here at Keen, we intentionally built our platform to collect and deeply analyze data from anything connected to the internet. Media companies love that, and they have a word for it: omnichannel. It is this omnichannel architecture that enabled our customers to rapidly adopt tvOS analytics, just as in the past customers used Keen to power analytics in new data environments like the Pebble smartwatch, TI microcontrollers, and the NASA Mars rover.
As a case study of our Apple TV analytics solution, we interviewed Keen customer AJ+, Al Jazeera’s digital-only division. For AJ+, Apple TV represented a major opportunity to capture market share and grow audience, provided the company could analyze sentiment and engagement in the new channel.
With a mobile integration already in place, AJ+ was able to immediately add event tracking to their tvOS app and start analyzing user behavior on Apple TV at launch. As with all Keen integrations, the tvOS integration provided real-time insights that informed AJ+’s production decisions and gave them a leg up over the other media companies in the Apple TV app store, boosting engagement by 133%.
“With Keen, we didn’t need to wait around for a tool to create a specific SDK or launch support for an emerging platform like tvOS,” said AJ+ Platforms Manager Alberto Naranjo. “After an extensive vetting and selection process, we found that Keen is currently the only solution capable of providing the custom insights we needed without forcing us to build our own analytics infrastructure. Keen saves us a ton of time and helps us make data-informed production decisions with real-time information.”
To learn more about Keen IO Analytics for Apple TV and tvOS, please contact us to schedule a demo.
Alexa Meyer wrote this post on April 04, 2016
DataEngConf SF is around the corner and we can’t wait! The Data Engineering and Data Science communities have really been taking off over the last few years as companies look to build self-serve data tools and extract real-time insights from the massive amount of data at their fingertips.
Here are 4 of the talks we’re really excited about:
- Bridging the Gap Between Data Engineering and Data Science - Josh Wills, Director of Data Engineering, Slack
We’re excited to hear Josh talk about these important and interdependent functions. There is still a great deal of misunderstanding about the boundaries between the roles and the different constraints that each is operating under.
- Beginning with Ourselves: Using Data Science to Improve Diversity at Airbnb - Elena Grewal, Data Science Manager, Airbnb
Airbnb used data to change the composition of their team from 15% women to 30%, all while maintaining high employee satisfaction scores across the team. Diversity and inclusivity is important to us at Keen, and we’re thrilled to see a company like Airbnb leading the charge in using data for good.
- Running Thousands of Ride Simulations at Scale - Saurabh Bajaj, Tech Lead, Data Platform, Lyft
How does Lyft power features like Lyft Line and driver dispatch so effortlessly? Luckily, Lyft has tons of data they can rely on to run simulations at scale to ensure the rider has a seamless experience every time.
- Unifying Real-Time and Historical Analytics at Scale Using the Lambda Architecture - Peter Nachbaur, Platform Architect, Keen IO
We’re excited that Peter will be talking about how we’ve scaled our analytics platform at Keen to process trillions of events per day for thousands of customers. He’ll share how we’ve evolved our custom query engine to unify real-time and historical analytics at scale using Cassandra, Apache Storm, and the Lambda Architecture.
You can check out all of the talks here.
If you want to hang out at DataEngConf with us, you can register for 20% off with the code “KEEN20X”. Hope to see you there!
Kevin Wofsy wrote this post on March 30, 2016
Have you ever obsessively refreshed a dashboard to check your favorite stats? Fitness, finance, travel, sports, politics, gaming, trending cat-memes, whatever…
I’m guessing the answer is: Of course. Who hasn’t? I’m doing it in another tab right now!
At Keen, we talk about the business value of data all the time. For teams, customers, companies, decision-makers. Numbers make everyone smarter. Charts and graphs = insights! And that’s true, definitely. But I think there’s another piece of the data story, and this is it: people just love their data. They’re into it.
Maybe I should warn you that I’m trained as an English teacher, not a data scientist. But I believe the real reason people go crazy for data is because it’s a concrete manifestation of an abstract desire.
Okay, I know, you think I’m your crazy English teacher from junior year, but hear me out on this.
What are things that people want?
Success and Mastery: This one is obvious. All those Key Performance Indicators for company and personal growth. Users subscribed. Miles run. Levels upped. Retirement dollars saved. Success is a feeling but a bar chart is a rectangle, and a rectangle is real!
Love and Belonging: Love may be the most complex human emotion, but stats on Tinder are surprisingly precise. A friend recently showed me his dashboard and said he now knows, with mathematical certainty, what his type is (and whose type he is).
Significance and Impact: Who hasn’t watched the counter tick up and up on how many people like the picture of your child, or dog, or spouse, or brisket? How many re-tweets you got when you came up with just the right witticism about that thing that happened?
When it’s done right, data taps into some serious emotion.
So we’ve decided to share some of the data we love, why we love it, and what we can learn from it. To kick things off, here are the top two data obsessions on my list.
Data Obsession #1: Flightdiary.net
I love to travel. To be more specific, I love to rack up frequent flyer miles in creative ways and see how far they can take me. And I mean I want to see it!
That’s why I flipped when I found Flightdiary.net. It is an aviation geek’s dream. I enter my flights and then they get visually represented on a map of the world.
Colors represent how often I’ve flown a route: yellow for once, red for twice, purple or purpler for three or more, white for flights yet to be flown.
Why do I love this? Because, as Tears for Fears sang in the 1980s, everybody wants to rule the world. It used to be you needed an armada for that. Now I can just admire my colorful lines on this map and suddenly I am an explorer of continents, discoverer of destinations.
But what if I want to know more?
Of course I want to know more, and that’s why I can look at graphs of my top routes, longest routes, flights by airline, by aircraft, by seat location.
What’s in it for Flightdiary?
Sometimes I ask myself this. Mostly I ask because I want to make sure they survive so I can keep building my empire. And they don’t charge me any money for it, so what’s the deal?
Honestly, I don’t know. But I speculate that they are working to get me hooked, and then in the future they will use the data to show me ads or send targeted offers based on all the stuff they know about where I like to go.
That’s just a guess. But I wouldn’t mind, because they’re not just using my data for their own purposes. They’re letting me get value out of my data, too. In this case, the value is emotional, but what’s better than that? I like emotions.
Data Obsession #2: Querytracker.net
Like many English teachers and copywriters, I harbor an ambition to publish a book someday. And I have learned from past experience that writing the book is not the hardest part of the bargain. The most challenging part is getting an agent.
It sounds so Hollywood: getting an agent. Like something that happens by magic. But I don’t want to stake my dreams on magic. Dreams are dreamy enough as it is. I want to invest in data.
That’s where Querytracker.net comes in. This website maintains a database of all known literary agents. I can sort by genres they represent, whether they’re open to new clients, where they’re located, etc. I can save my favorites, keep track of all my queries and log the responses.
That’s amazing, and it’s all free!
But what if there’s a premium membership with even more data?
Well gee whiz, am I going to skimp on my dream? No way! I ponied up my 25 bucks like they were on fire. And here’s some of the stuff I got for it.
Data Timeline
This feature shows me all the data points (without user info) of other members who have queried a particular agent. So if an agent hasn’t responded to a single soul since 2013, I know to save my heartache for another day.
By contrast, if a particular agent seems to be lightning quick with rejections but more reflective about requests for pages, then I know the data is giving me permission to be hopeful about a slow reply.
Premium Reports
With these velvet-rope reports, I can see things like Submission Replies broken down by whether there was no response, a rejection, a full or partial manuscript requested, all the way up to the ultimate response: an offer of representation.
Learning from Data We Love
It’s only natural to analyze the value of data in purely numeric ways. Of course you should consider the numbers. And showing customers their data can absolutely pay off in very measurable ways: higher signups, referrals, advertising revenue, premium feature upgrades, and more.
But I think there’s a meta side to the whole analytics equation. By measuring and quantifying things your customers care about, you can get intangible benefits as a happy side bonus. Things like: loyalty, enthusiasm, buzz, excitement that might make them write blog posts about you (note: Flightdiary and Querytracker have no affiliation to me or to Keen - I just think they’re awesome.) As Nir Eyal puts it, you can use data to get your customers “hooked” on your product.
Do you want to use data to enhance customer love?
It’s actually pretty easy to build analytics into your app, site, game, or device so you can show customers the data they care most about. In fact, Bluecore built a customer-facing dashboard in less than 24 hours!
Alexa Meyer wrote this post on March 29, 2016
We’re thrilled to announce that Will O’Brien will be joining Keen IO as chief operating officer! He’ll be focused on driving operational excellence and financial performance as we continue to meet the rising demand for our analytics platform and build out exciting new solutions for our Keen Pro customers.
Before Keen, Will was instrumental in helping high-growth technology companies achieve tremendous success. He most recently held executive leadership roles at BitGo, Big Fish Games, and TrialPay, helping these companies scale during periods of rapid growth. We’re super excited to have Will’s operating experience, values, and strategic vision as an asset on our growing team. (ps. we’re hiring!)
Beyond being an awesome addition to our executive team, Will possesses many talents that he was willing to share with us in his first couple weeks. Here’s Will serenading our entire office with an Eric Clapton cover during our Monday All Hands. Thanks, Will! Welcome to the team!
You can see the full press release announcing Will’s hiring below:
Keen IO, a leader in cloud analytics, today announced that it hired Will O’Brien as chief operating officer. O’Brien is a seasoned technology executive with a proven track record of scaling organizations, managing corporate financings, and structuring strategic partnerships. O’Brien joins Keen at a pivotal time to help the company meet the rising demand for its real-time, scalable, and extensible analytics platform.
“I am delighted to welcome Will O’Brien as COO of Keen IO. He is an extremely accomplished business executive who knows first-hand how big the opportunity is for cloud analytics,” said Keen founder and CEO, Kyle Wild. “Will’s operating experience, values, and strategic vision will be a tremendous asset to Keen during this time of rapid growth and expansion.”
“I am thrilled to join Keen IO as the company continues to define the future of the analytics industry,” said Will O’Brien. “Keen’s flexible and intuitive platform enables customers of all sizes to collect, enrich, and analyze custom event data at an unprecedented scale. Customers are flocking to Keen as they recognize it is no longer competitive to build analytics tools in-house or rely on the ‘one-size-fits-none’ dashboards that exemplified the first stage of digital business intelligence. They are using Keen’s platform to make decisions with real business impact and ROI.”
Last year, Keen expanded beyond its self-service analytics API and launched Keen Pro, an enterprise platform with flexible data models, embedded analytics, and support for large data volumes. As part of ongoing account management, Keen Pro customers also get hands-on expert advice from Keen’s engineers and data scientists to ensure their integration with Keen’s API is tuned for performance and delivers the desired results. The company has experienced tremendous customer growth and the Keen platform has been integrated across a wide range of categories, including mobile, gaming, e-commerce, media, IoT and retail.
As COO of Keen, O’Brien is tasked with driving operational excellence and financial performance. O’Brien arrives with a wealth of experience in building and running high-growth technology firms, including recent executive roles as CEO of BitGo, SVP at Big Fish Games (acquired for $885 million by Churchill Downs, NASDAQ:CHDN), and GM at TrialPay (acquired by Visa, NYSE:V). His angel investment and advisory portfolio includes more than 50 companies in a broad range of sectors including fintech, blockchain, gaming, media, healthtech and cloud services. O’Brien holds a B.A. in Computer Science from Harvard University and an MBA from MIT Sloan School of Management.
Lisa Nielsen wrote this post on March 21, 2016
Last week, we posted the Mission and Values we developed in a company-wide exercise at Keen IO. In this post, I’ll explore the thinking behind why we felt this was a useful exercise to support our organization’s growth. And what we are doing to make sure our values are more than just words on a wall.
The link between values and company culture
Culture is often written about as some magical secret sauce that can make or break organizational performance. Some people buy this and some don’t. Keen is in the culture-is-important camp. We see culture as a key element of everything we do, from employee engagement to innovation, productivity, brand, team performance, customer satisfaction, and more.
In my role on the People Team, and as a byproduct of three psychology degrees, I don’t view culture as an unpredictable spinoff that magically occurs when you put a bunch of awesome people in a room together (although we do love bringing together magically awesome people on a regular basis).
Instead, I think of culture as an output of the behaviors, structures, processes, reward systems, philosophies, and group norms that are executed day after day within an organization. And these behaviors are not random. They are driven, intentionally or not, by values.
Mission and Values are sometimes overlooked. And indeed, in many organizations, they are toothless statements: espoused declarations printed on a wall somewhere or referenced in an employee handbook, but not truly representative of how the organization actually behaves.
The tricky thing about values is that they are most honestly communicated by behavior, but they also drive behavior. A manager who speaks all day long about empowering her employees but overrules all of their decisions is embodying what value? Most of us would come up with a phrase based on her actions, not her words.
So why bother writing values down at all?
If culture is a complex set of behaviors and beliefs, isn’t it best shared through role modeling? Yes, it is! And this is how culture is most often grown, by the sharing of tribal knowledge in organizations: implicit behavior inadvertently communicated to new hires, emulated, and replicated.
Values are especially crucial when employees are faced with a choice: competition or cooperation? perfection or iteration? speed or quality? knowledge sharing or knowledge protection?
Workers are faced with these choice points every day. In a slow-growing organization, a new hire only has to wait for the next team meeting to observe and absorb the behaviors which communicate the underlying values. Does a team leader encourage discussion and ideation or does she solicit efficiency and expertise? Very quickly new employees learn whether or not it is culturally appropriate to shout out ideas or wait until they have a fully formulated proposal.
But what happens when teams are growing at a pace where there are more new hires than company veterans?
This is the moment when a company culture can easily start to morph. And it’s why there are so many panel discussions on the topic of how to scale culture in startups. During times of high growth, in the absence of group norms to emulate, the team will quickly establish new ones. Perhaps this turns out very well or perhaps the company founders are left feeling like they are holding the reins on a pack of wild horses.
For this reason, creating explicit values statements is particularly useful to a company that is scaling faster than the pace at which tribal knowledge can be shared, especially if the company has a culture it cares about retaining.
As an organization grows and new territory is navigated, values provide direction in new, unknown situations. Values establish foundational behavior markers. They also help employees evaluate whether they are interested in helping contribute to what the company is trying to achieve in the world and whether their own personal development will be furthered in the process.
A mission statement answers the why and what of a company’s goals. The values inform the how. This returns us to the importance of the values being genuine indicators of company behavior and not simply words on a wall.
How can we hold ourselves accountable to our values?
As mentioned in the previous post, Keen has an advantage in that culture has been a focal point of the company since its inception. This reflective propensity is very much on Keen’s side; a values exercise can be a pointless use of time unless the leaders within your organization are demonstrating the values through their own behavior.
Given the scalability limits of role modeling, we’ve built up an array of additional supports to cultivate our values and make sure they don’t get lost in the shuffle of rapid organizational growth.
Every Thursday, the company gathers for Introspection Happy Hour, where we reflect and share what we are feeling anxious and excited about. This weekly event gives people the chance to hear what is happening on other teams and to get to know their colleagues on a personal level. The New York Times just published a great article about the correlation between psychological safety and high-functioning teams that makes a strong case for practices such as this.
At the end of each two-week sprint, each of our engineering teams holds a retrospective 30-60 minute meeting to reflect on what worked well and what can be further improved.
Keen offers an executive coaching program to all of its employees. We believe all of our employees are high performers and should have equal access to self-development resources to increase their effectiveness so they can progress toward their highest potential, not just as professionals, but as humans.
We teach a recurring learning lab called Managing Emotions where participants explore emotional regulation and mindfulness tools to increase self-awareness.
In addition to the Managing Emotions learning lab, we also teach an Effective Communication workshop where participants learn non-violent communication techniques, methods for delivering feedback, conflict resolution techniques, and trust-building models.
Each employee receives an annual stipend to spend on any educational pursuits of their choice: books, conferences, trainings, classes - it is totally at the employee’s discretion.
Our blog is a forum we cultivate to share our ongoing learning - we encourage all members of the Keen team to write for the blog.
Our value of personal agency is deeply reflected in the design of both our product and our organization. Keen’s product is highly customizable to provide developers and organizations with the broadest array of possibilities to explore their data however they like and discover the answers that matter most to them.
Our organization is unique in that we don’t have managers; we believe individuals should have autonomy and authority over their own domain and work style. Instead of managers, we have detailed responsibility matrices, collaboratively created by each team, so everyone knows what they are uniquely responsible for achieving. We also receive performance feedback from our peers vs. a single source of authority.
Direct, effective communication requires honesty. This is something we practice in our Learning Labs, meetings, community interactions, client relationships, and board meetings.
We also believe in communicating honestly with our customers no matter what. If something goes wrong, we own up to it rather than trying to cover up or look for someone to blame. Everyone at Keen knows that it is always the right choice to speak honestly with customers, potential customers, each other, and the community.
Our commitment to honesty is also reflected in our organization-wide communication habits. Each week we gather for a session of “Ask Anyone Anything” where no topic is off-limits.
This is perhaps the most meta value, the one that informs the execution of the other values. As we pursue honesty, it reminds us to consider the needs, perspectives, and feelings of others as we communicate, assert our agency, and seek development for ourselves, our business, our product, and our organization.
The Effective Communication Learning Lab provides an environment to practice speaking honestly in cases of conflict, while simultaneously taking the other person’s feelings into account.
We also ensure that every Keen employee rotates through support shifts so they can better understand our customers’ needs and experiences, and feel a personal connection to our community.
These are just some of the ways we are making our values real through concrete practices.
Of course the practices would vary depending on the values of any given organization. But I hope this gives a sense of how values can be much more than a feel-good page in a handbook. When backed up by solid routines and behaviors, they can play a vital role in preserving and promoting the culture of a fast-growing organization.
As always, we’d love to hear your feedback. How are you using values within your organization? Do you have any tips or suggestions to share? And if you want to hear more about the process of how we came up with our Mission and Values, please reach out to me at firstname.lastname@example.org
Lisa Nielsen wrote this post on March 14, 2016
Here at Keen we spend a lot of time thinking about culture. This habit extends as far back as Keen has existed, since we’ve always been inspired not only by a product mission, but also by the question of how to build the type of organization where people love to work.
We recently went through a company-wide exercise to take this thinking a step further and create explicit Mission and Values statements that feel real to us and uniquely Keen. We’re excited to share them now.
Mission: Turn Explorers into Discoverers
We rallied around this Mission because it encompasses how we feel about everything we’re building at Keen. On the product side, we’re building a flexible analytics platform that makes it easy for people to collect and explore their data and discover new insights about their business, products, and customers. On the company side, we’re bringing together curious, open-minded humans eager to run experiments, learn new things, and keep growing as people.
In terms of culture, the Mission also feels very Keen. It connects to our love of exploration and discovery at the grandest celestial scale, and it also hints at our more playful side, since ‘discoverers’ is, after all, a word we kind of made up.
To support the Mission, we came up with a specific set of Values that have always been a part of who we are, but weren’t clearly defined until now.
Introspection is something we practice at Keen on both an individual and organizational level. We believe in the power of curiosity, reflection, and thoughtfulness. And in the immense catalyzing force of self-awareness.
Related to introspection, continuous learning guides us to continually assess ourselves, our product, our business, and our organization. We are always open to growth and discovery via access to new data, feedback, opportunities, and insights.
We believe individuals should have autonomy and authority over their own domain and work style. We regard all humans as capable of brilliant problem-solving, innovation, and invention. We deliberately structure our product, platform, and organization design to support this.
Honesty begins with thoughtful introspection and extends to the way we communicate with each other and our customers. If something goes wrong, we own up to it and work together to find solutions. Through a culture of honesty, we build trust and gain the confidence to experiment with new ideas and make new discoveries.
Empathy informs all our values, reminding us to consider the needs, perspectives, and feelings of others as we communicate honestly, assert our agency, and seek development for ourselves, our business, our product, and our organization.
We’d love to hear your feedback about our Mission and Values, as well as the role statements like these play in your organization.
And stay tuned for an upcoming post about why we think values are so important to a healthy company culture.
Taylor Barnett wrote this post on March 09, 2016
Have you ever been collecting a lot of awesome data, but you felt like it was a vast jungle of hidden gems and had no idea where to start exploring it? Popily + Keen IO are here to help.
Popily can instantly provide you with tons of charts, so you can pick out your favorites or the ones that help you dig deeper into your data. Popily can impress your boss, team, customers, cats, and dogs. You can import data from anywhere into Popily: CSVs, databases, and more. With Keen IO, we’ve got you covered. No fancy data ninja skills are needed to import your event data from Keen IO into Popily. It’s as easy as three steps.
Find the gold in your data
Discover mind-blowing, meaningful insights in Popily. You can easily extract different data sources from Keen and other sources and merge them in Popily to get a better view into your data. Sometimes you won’t find what you are looking for right away - exploring Popily’s charts can help you dig deeper in your Keen IO data!
Share the gold you discover
When you find something interesting, Popily makes it super easy to communicate in a meaningful way. You can export charts as images into PowerPoint or Excel, or embed them with a few lines of code. You can even embed them alongside your Keen IO data visualizations.
Need direct API access?
We’ve got you taken care of! You can build your own data collection and exploration engine with API access to both Keen IO and Popily. Collect and store data from users, websites, apps, and smart devices with the Keen IO API and SDKs, and then explore and visualize that data with Popily. You can see more about the Popily API on their blog and API Documentation.
Don’t miss out on loading your Keen IO data into your free Popily account today! Also, check out how exactly the Popily team sent UFO sighting data from Keen IO to Popily for a bunch of instant charts in their latest blog post.
Ellie Day wrote this post on March 08, 2016
In today’s digital marketing landscape, AdTech companies are popping up everywhere. The ability to precisely target your audience online can reap great rewards for your marketing and revenue objectives, and entrepreneurs and founders are taking note.
But how do you know if your AdTech efforts are successful? How do you know if your customers’ campaigns are successful? More importantly, how do your customers know that your AdTech product is working for them?
That’s where AdTech Metrics and Analytics come in.
In this post, we’ll look at the top metrics AdTech companies need to measure - and display to their customers - to ensure success and get a competitive advantage.
Companies like AdRoll, Facebook, and Twitter all have ad platforms that display AdTech metrics to their customers. We’ll show you how you can build and embed these kinds of analytics into your products, too - with real-time results and unlimited scalability.
Spoiler alert: it’s actually pretty easy to do.
AdRoll’s AdTech Dashboard
Step 1. Focus on what’s unique
Building an AdTech platform can be quite complicated. From building responsive ad units to creating detailed customer segments, countless hours must be spent on the delivery of ads. So once an ad is running and racking up views, clicks, and revenue, you want to make sure you can store and display this data in a scalable and real-time manner.
To do this, first you’ll need a scalable data infrastructure for data storage and collection, and then you’ll need a way to query and display individualized data to each of your clients.
We’ll show you how you can do all of this, starting with the first order of business: figuring out which metrics you need to track and display.
AdTech Metrics that Matter
So you’ve built an AdTech platform that can place and serve up ads for your clients. Great! The next thing a client will want to know is “Are my campaigns performing?” To answer this question, it’s super important to know what metrics you should be surfacing for your clients.
To get you started, here are six key metrics used by leading advertising providers such as Facebook and Google:
- Impressions: the number of people who have seen an ad, with a breakdown between Unique Impressions and Total Impressions.
- Exposure: the average number of times an ad is served per person.
- Engagement Rate: the number of people who have “engaged” as a percentage of all ad views. For most ads, an engagement is typically a click-through to the advertiser’s site, but can be a video play or other interaction.
- Conversion Rate: the percentage of people that convert on a desired outcome, such as becoming a paying customer, as a result of an engagement.
- Relevance Score: a score between 1 and 10 that indicates an ad’s effectiveness and relevance to an audience, calculated by combining other metrics such as Conversion and Engagement Rates.
- Revenue: total value of all purchases made as a result of an engagement with an ad or campaign.
Twitter’s AdTech dashboard
Using the right data model, you can produce these metrics with only three events:
- Ad Views: runs once on ad load.
- Engagement: runs each time a user engages with an ad.
- Purchases: runs once after a purchase is completed on a client’s site.
These three events also support standard metrics like User Locations and Referral Sources if such information is needed.
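To make the data model concrete, here is a minimal sketch of deriving the six metrics from the three tracked events. The event shapes (the "user" and "value" fields) and the sample records are illustrative assumptions, not Keen IO’s actual schema:

```python
# Hypothetical event records for one campaign; field names are assumptions.
ad_views = [
    {"user": "a"}, {"user": "a"}, {"user": "b"}, {"user": "c"},
]
engagements = [{"user": "a"}, {"user": "b"}]
purchases = [{"user": "b", "value": 25.0}]

total_impressions = len(ad_views)                       # Total Impressions
unique_impressions = len({e["user"] for e in ad_views})  # Unique Impressions
exposure = total_impressions / unique_impressions        # avg views per person
engagement_rate = len(engagements) / total_impressions   # engagements / views
conversion_rate = len(purchases) / len(engagements)      # purchases / engagements
revenue = sum(p["value"] for p in purchases)             # total purchase value
```

Relevance Score is the one metric that can’t be derived this directly, since each network combines the others with its own proprietary weighting.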
Now that you’re familiar with what to track, you’re ready to learn why the above metrics are so important.
Why Certain AdTech Metrics are Important
You might be familiar with the metrics of Impressions, Exposure, Engagement Rate, Conversion Rate, Relevance Score, and Revenue but here’s a quick refresher as to why they matter:
Impressions
Guaranteed to be the largest number on a dashboard, Impressions, the total views of an ad, are crucial when quickly assessing the success (or failure) of an ad campaign.
Sure, Impressions alone don’t specify how many people interacted with an ad, but even Google understands the importance of being seen by thousands of potential customers, which can be very exciting for your clients. Where metrics like Engagement Rate show actual interactions, Impressions show the possibilities of untapped engagement, seemingly limited only by the size of the audience.
In addition, with the Interactive Advertising Bureau’s in-progress Viewability Standard, an impression now means, more than ever, that someone has actually viewed a client’s ad.
Exposure
One of the goals of advertising on the web is to familiarize potential customers with a brand, so it makes sense that increased exposure to a specific brand can be an effective strategy for improving an ad’s performance.
For example, Retargeting, a popular method for increasing exposure, is known for “high click-through rates and increased conversions,” as stated by AdRoll, a leader in Retargeting. Your clients can greatly benefit from this strategy, so displaying the exposure levels of their ads is critical.
Engagement & Conversion Rates
The usefulness of Engagement and Conversion Rates stems from the precise data these metrics produce. Because both measure what percentage of users perform specific actions, there is little room for ambiguity when interpreting the resulting data. Where Impressions provide an opportunity for estimating potential, Engagement and Conversion Rates are great for evaluating what actually occurs, letting a client actively manage the performance of their ads.
Relevance Score
With numerous metrics vying for the attention of a client, large ad networks run by companies like Google and Facebook have created proprietary scoring systems that estimate an ad’s quality or relevance, giving a simple score between 1 and 10. While parts of these scoring algorithms are not public, it is known that metrics like Impressions, Engagement Rate, and Exposure are used, in part, to calculate Relevance Scores.
To reward highly relevant ads, platforms like Facebook use an ad’s Relevance Score to determine what a client pays to have their ad displayed, with a higher score resulting in a lower price.
This sophisticated metric can be a difference-maker for your AdTech platform.
Revenue
While impressions and engagements are useful in measuring the success of an ad campaign, it’s important to pair those metrics with how much revenue an ad is generating.
For many clients, the end goal is sales and other metrics are just part of the advertising process, so accurately listing revenue from an ad is very important.
Making it happen
Manu Mahajan wrote this post on February 29, 2016
Hi, I’m Manu and I’m a software engineer with Keen IO’s Platform team. Over the past year I’ve focused on improving our query performance and scalability. I wanted to share some things we’ve learned from this experience in a series of posts.
Today, I’ll describe how we’re working to guarantee consistent performance in a multi-tenant environment built on top of Apache Storm.
tl;dr We were able to make query response times significantly more consistent and improve high-percentile query duration by 6x by making incremental changes that included isolating heterogeneous workloads, making I/O operations asynchronous, and using Storm’s queueing more efficiently.
High Query Performance Variability
Keen IO is an analytics API that allows customers to track and send event data to us and then query it in interesting ways. We have thousands of customers with varying data volumes that can range from a handful of events a day to upwards of 500 million events per day. We also support different analysis types like counts, percentiles, select-uniques, funnels, and more, some of which are more expensive to compute than others. All of this leads to a spectrum of query response times ranging from a few milliseconds to a few minutes.
The software stack that processes these queries is built on top of Apache Storm (along with Cassandra and many other layers). Queries run on a shared Storm cluster and share CPU, memory, I/O, and network resources. An expensive query can easily consume physical resources and slow down simpler queries that would otherwise be quick.
If a simple query takes a long time to execute it creates a really bad experience for our customers. Many of them use our service to power real-time dashboards for their teams and customers, and nobody likes to wait for a page while it’s loading.
Measuring Query Performance Variability
Given that some queries do orders of magnitude more work than others, how do we go about guaranteeing consistent response times?
Before we could answer this question, we needed to know the extent of the problem, so we came up with metrics that would define the problem clearly and help us measure progress as we made improvements.
We created a client application that queried our API with a fairly simple query. We monitored results closely and found that there was a big variation in response times at the 99th percentile. We then went on to define internal criteria that included different kinds of queries that were more representative of our customer traffic, instead of a single test query.
We defined “simple queries” as those operating on the median size of data or lower; about 20% of all queries fell into this category.
P99 for simple queries: one in a hundred queries were taking up to 60 seconds to execute - clearly we had a big problem!
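For context, a P99 like the one we tracked can be computed from a sample of response times with the nearest-rank method. A minimal sketch (the sample durations are illustrative, not our production data):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that at
    least p percent of the samples are at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

durations_ms = list(range(1, 101))   # 100 illustrative response times
p99 = percentile(durations_ms, 99)   # the value the slowest 1% exceed
```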
Identifying How to Make Improvements
Once we had a way to measure performance variability we started thinking about how to tackle this.
We came up with a couple of high level goals:
Reduce variability as much as possible in the type of work that a single storm worker (i.e. JVM) was doing.
Within a JVM, prevent a single incoming request from consuming all of the most precious resources.
A Note About Incremental Progress
As engineers, we have a tendency to look at a complex software system and immediately see all the problems with it. At this point we asked ourselves questions like: Is Storm even the right platform for doing this? Should we just rewrite the query execution service?
While our system was not perfect, big changes are disruptive and bring their own operational and performance problems that take time to tune. So we took the approach of small incremental goals instead of one big disruptive change.
Improvement: Reducing Variability by Isolating Workers
When I started working on this problem we were running a single Storm cluster with multiple topologies doing different kinds of work, ranging from executing queries to ingesting events into our data store and even streaming data to S3. The first thing we did was to create separate Storm clusters. We now run five different clusters across two data centers, where each cluster runs a related set of topologies, including a dedicated query-processing cluster in each data center.
The next step was to examine the distribution of workers on our Storm cluster. Our query topologies are built on top of Storm’s concept of Distributed RPC. Each of these topologies can independently handle multiple concurrent client requests, where each request behaves like an RPC call to a caller.
By default Storm distributes workers (JVMs) for each topology across the cluster. Storm 0.8.2 added the Isolation Scheduler which makes it possible to ‘isolate’ topologies to a set of machines. We couldn’t use the Isolation Scheduler directly because of our deployment infrastructure so we ended up writing our own scheduler that distributes workers in a similar fashion. We also built some additional features like the ability to change the isolation configuration dynamically.
The following diagram illustrates this.
We further went on to reduce our query topologies to a single worker per topology which performed better in our testing.
This adjustment reduced serialization overhead and traffic between different workers, which in turn reduced overall Storm CPU usage and gave us a performance boost.
P99 for simple queries: we were still getting some nasty spikes but there was a big improvement. One in a hundred queries were now taking close to 25 seconds instead of 60 as before.
Improvement: Better Sharing of Resources by Making I/O Operations Asynchronous
Once we had isolation to a point where each JVM was executing a small number of concurrent queries we started profiling the code more aggressively. This immediately led us to a problem: the slowest operation in query execution was reading data from our distributed storage layer, which included a caching layer (Memcached) and a distributed database (Cassandra). Queries that required lots of reads from a database would consume the I/O pipeline and make other requests wait until resources were freed.
In Storm, the execution of business logic happens within Bolts. More specifically inside the ‘execute’ method of each bolt class.
Storm’s design promotes doing all I/O operations inside the ‘execute’ method. While this works great for maximizing throughput, it was causing slowdowns for simpler queries in our case.
Here’s a simplified view of the execution of the I/O-hungry bolt in our initial design:
Note that with shuffle-grouping the tuples were distributed across multiple queues. In the above example, a query with a large number of tuples fills up the queue, and another query with fewer tuples gets queued up behind the already pending tuples.
Some large queries required 100,000+ read operations to be performed, whereas simpler ones had a few hundred. This caused execution times to be highly variable depending on what queries were in-flight at that time.
Solving this problem was hard, especially within the execution environment that Storm provided. This was another “why are we using Storm for this?” moment.
While shuffle-grouping seems to be the problem here, if we got rid of it completely then the overall throughput would suffer because we would lose concurrency.
We tried using fields-grouping, which uses a hash function to determine how tuples get assigned to executors. By hashing on the ‘request-Id’ we could make all the tuples for a query hit the same bolt executor. (Each executor in Storm is a separate thread.)
There were still a few problems with this approach:
If all tuples for a query went to a single executor we would lose concurrency in our query processing. This means that queries would become slower overall.
If there were a hash collision then a simpler query might get assigned to the same executor that was processing an expensive query.
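The fields-grouping behavior, and the collision risk just described, can be sketched like this. This is not Storm’s actual code; `zlib.crc32` simply stands in for Storm’s internal hash so the example is deterministic:

```python
import zlib

def executor_for(request_id: str, num_executors: int) -> int:
    # Fields-grouping hashes the grouping field (here the request-Id)
    # to pick an executor, so every tuple for one request lands on the
    # same executor thread.
    return zlib.crc32(request_id.encode()) % num_executors

# All tuples for one request are routed consistently...
same = executor_for("req-42", 8) == executor_for("req-42", 8)
# ...but two unrelated requests can hash to the same executor, which is
# how a cheap query can end up queued behind an expensive one.
```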
We came up with a solution after trying a few different things. We continued to use Fields Grouping but decided to create a separate shared thread pool for the actual I/O operations. This meant that the job of the bolt was now only to schedule an I/O operation on a thread pool which would be executed asynchronously.
In theory this was great, but because we were using a DRPC topology in Storm, we relied on the CoordinatedBolt for managing ACKs and keeping track of when a step in the query was fully complete. Unfortunately, the CoordinatedBolt doesn’t work with code that is asynchronous and we started seeing exceptions when trying to use it this way. The following email thread talks about a similar problem that another developer experienced. https://groups.google.com/forum/#!topic/storm-user/u3I1W9Dj8-A
We had to work around this and the final scheme we came up with had a few changes.
Use fields-grouping on request-Id.
Convert the Bolt to implement the BaseBatchBolt interface.
Use a shared thread pool (shared across the JVM) to execute I/O operations.
Use KeyedFairBolt to prevent hash collisions from starving execution of a query.
The new configuration looked something like the following diagram:
An important lesson here was not to have a separate queue for the I/O thread pool. Trying to add more queueing with a traditional LinkedBlockingQueue caused our performance to tank. We realized that the overhead of blocking queues in Java is significant, and that is exactly the problem the LMAX Disruptor (which Storm uses internally for queueing) is designed to prevent.
This means we use the thread pool as a buffer of extra I/O threads. If there are threads available on the thread pool an operation gets scheduled; otherwise the calling thread blocks and tries to execute the operation itself. This way we rely on the queueing that Storm provides via the receive queues and it also acts as a simple back pressure mechanism when there are too many I/O operations in-flight.
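The caller-runs scheme described above can be sketched as follows. This is an illustration in Python rather than our actual Java code, and `BoundedIOPool` is a hypothetical name:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class BoundedIOPool:
    """Run I/O tasks on a fixed pool of extra threads; when no thread is
    free, the calling thread executes the task itself instead of queueing
    it, which acts as a simple backpressure mechanism."""

    def __init__(self, max_threads):
        self._pool = ThreadPoolExecutor(max_workers=max_threads)
        self._slots = threading.Semaphore(max_threads)

    def submit(self, fn, *args):
        if self._slots.acquire(blocking=False):
            def wrapped():
                try:
                    return fn(*args)
                finally:
                    self._slots.release()  # free the slot when the I/O finishes
            return self._pool.submit(wrapped)
        # No free I/O thread: the caller runs the operation inline.
        return fn(*args)
```

Because there is no intermediate queue, pending work backs up into Storm’s own receive queues, which is where the Disruptor-based queueing already handles it efficiently.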
The async I/O changes led to a dramatic improvement that was immediately visible.
P99 for simple queries: since we deployed the change, queries now take around 10 seconds or less and are much more consistent over time.
Here’s a finer-grained version of the same metric showing per-minute data (instead of hourly) and the change after the deployment.
This was a big milestone for us. In the past we’ve been able to make multiple improvements in overall query response times but we’ve struggled with the particular problem of how to keep response times consistent for all kinds of queries.
I would like to end by saying that we’re not done yet. We are continuing to invest in improving consistency of response times even further. We’ve also introduced query caching, which enables sub-second response times for queries of any size. This is especially helpful for our customers building customer-facing analytics into their product.
An idea that we’ve started exploring to improve the consistency of response times for real-time queries is to build a query ‘weight’ prediction service: something that would allow us to predict the complexity of a query before it is executed.
We could use information like the size of the dataset being queried, historic response times, or even machine learning to come up with a ‘weight’ for every query. We could then use that to assign the query to an appropriate queue.
While we don’t have a true query weight prediction service just yet, we’ve gone ahead and partitioned our Storm topologies into three groups: small, medium and large, each with its own queue. At the moment we rely on historical data for query response times per project to decide what queue a query should be assigned to.
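The routing described above can be sketched as a simple threshold function. The function name and the cutoff values are illustrative assumptions, not our actual configuration:

```python
def choose_queue(historical_p50_ms: float) -> str:
    """Route a query to the small/medium/large topology group based on
    the project's historical median response time (thresholds are
    illustrative)."""
    if historical_p50_ms < 500:
        return "small"
    if historical_p50_ms < 5000:
        return "medium"
    return "large"
```

A real weight predictor would replace the single historical number with a richer feature set (dataset size, analysis type, and so on), but the routing step would look much the same.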
This has already given us the ability to separate the most expensive and the least expensive customer queries to a reasonable degree. More on this to come.