How to Track User Data Without Being Creepy

There’s been a lot of scrutiny lately of companies that act irresponsibly with their users’ data. That got me thinking about data, ethics, and our moral responsibility as companies. Last month I gave a talk about ‘responsible analytics’ at Defrag 2014. My intention wasn’t to give a definitive decision-making guide, but to spark a discussion about the kinds of data we should and shouldn’t collect from our users.

The big questions I wanted to get people thinking about were:

  • What kinds of analytics are most helpful to companies and users?
  • What types of data should we think carefully about collecting?
  • How can we avoid getting into “creepy analytics”?
  • How can we respect individual privacy and still learn from user data?

What follows is my commentary on the talk, in the same order as the slides. Please feel free to comment on this post, or tweet me @alexk_k with your reactions.

I’m Alex, an open source developer/advocate at Keen IO. I’ve worked on a wide range of web-based applications over the years, everything from online dating to social gaming. At almost all of these jobs, I’ve been responsible for implementing internal analytics. This is how the thought process about analytics usually starts.

Let’s say we’re starting a new company/product, and we’re thinking about what we want to track in terms of analytics. Storage is cheap, so it makes sense to store as much as we can now, and worry about analyzing later. It’s super tempting to store extra information, even if it’s not relevant to our product. Should we do it?

That’s something we should think very carefully about. Let’s look first at why we would want to collect a lot of data and then think about where we might want to draw the line.

Tracking and analyzing usage lets us improve our products in a ton of ways. We can see which features of our products are getting used and how they are being used. We can build recommendations based on usage history and provide product personalization.

One great example of this: I worked at Chomp, which was a search engine for applications. It actually started as an app recommendation engine, but we found out after looking at our analytics that users were using the search bar far more frequently than clicking on the recommendations. We then opted to pivot and start investing in a much better app search engine.

There are certain types of information that people are much more sensitive about when it comes to tracking analytics. They fall into four categories: location information, financial data, habitual data, and health-related data. Of course, there are products where it makes a lot of sense to track these types of analytics. But sometimes, as app developers and product designers, we inadvertently start tracking data in a way that doesn’t respect the privacy of our users.

Creepy Statistics

With smartphones constantly tracking our location, that information is easy to collect. My phone’s notification screen tries to be smart and tell me, based on common patterns, how long my commute will take. That’s neat, but it’s also creepy, since it never asked whether it should do that.

Ad retargeting is another example of analytics that can feel like an invasion of privacy. When you visit a site, an ad network embedded on that site can store a cookie in your browser (if you have cookies enabled). Other sites that use the same ad network can then recognize that cookie and “learn” what you have been browsing. This information is generally used for ad targeting, which can become especially creepy in certain targeted markets like weight loss and diet pills.

Avoid Being a Creeper

All apps should be tracking analytics related to their product, and generally, you won’t fall into the “creepy” zone until you start tracking things that have nothing to do with your product. If your product doesn’t use location information to provide value to the user, then you probably shouldn’t be tracking it.
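
One way to keep tracking tied to the product is an explicit allowlist of event properties. The sketch below is only an illustration: the property names and the `strip_unrelated_properties` helper are hypothetical, and a real product would choose its own list.

```python
# Hypothetical allowlist: only the properties this (imaginary) product needs.
ALLOWED_PROPERTIES = frozenset({"event", "feature", "session_id", "timestamp"})


def strip_unrelated_properties(event, allowed=ALLOWED_PROPERTIES):
    """Return a copy of `event` containing only allowlisted properties."""
    return {key: value for key, value in event.items() if key in allowed}


raw_event = {
    "event": "search",
    "feature": "search_bar",
    "session_id": "abc123",
    "timestamp": 1417392000,
    "latitude": 37.7749,     # not used by the product, so it gets dropped
    "longitude": -122.4194,  # same here
}
clean_event = strip_unrelated_properties(raw_event)
```

Filtering at the point of collection means the unrelated data is never stored, so there is nothing to misuse later.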

It’s also important to have a clear paragraph in your terms of service about what you track about your users and what you use that information for. Most users won’t read this, but it’s important information to have.

Track Smartly

There are a lot of ways to track information, and depending on the type of analysis we want to do, we don’t always have a ton of options. Even so, we should always be thinking of ways to protect our users while still tracking the information we need to provide a great product.

When we’re tracking a user’s location, it’s much better to key it to a session id than to an email address or name. Give each session a fixed lifetime. That way you aren’t tracking the same session id over time, and it’s harder to piece together the movements of a single person. You can also blur the location data. If you’re storing latitude and longitude, truncate or randomize a few decimal places. This is not great, but it adds another layer of obfuscation. Unless you start to blur at the 3rd decimal place, the blurring isn’t very effective: keeping three decimal places still pins a position to within roughly 100 meters (about 360 feet).
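
Here is a minimal sketch of both ideas: truncating coordinates and rotating session ids after a fixed lifetime. The names (`blur_location`, `SessionTracker`), the 30-minute lifetime, and the choice to keep two decimal places (about a kilometer of latitude) are all assumptions for illustration, not a recommendation for any particular product.

```python
import math
import secrets
import time

SESSION_LIFETIME_SECONDS = 30 * 60  # hypothetical fixed session length


def truncate_coordinate(value, decimals=2):
    """Drop everything past `decimals` decimal places (truncates toward zero)."""
    factor = 10 ** decimals
    return math.trunc(value * factor) / factor


def blur_location(lat, lon, decimals=2):
    """Blur a latitude/longitude pair by truncating both coordinates."""
    return truncate_coordinate(lat, decimals), truncate_coordinate(lon, decimals)


class SessionTracker:
    """Hands out a random session id and replaces it once it expires."""

    def __init__(self, lifetime=SESSION_LIFETIME_SECONDS, clock=time.time):
        self._lifetime = lifetime
        self._clock = clock
        self._session_id = None
        self._started_at = 0.0

    def current_id(self):
        now = self._clock()
        if self._session_id is None or now - self._started_at >= self._lifetime:
            # Random ids carry no information about the user or earlier sessions.
            self._session_id = secrets.token_hex(16)
            self._started_at = now
        return self._session_id


tracker = SessionTracker()
event = {
    "session_id": tracker.current_id(),
    "location": blur_location(37.774929, -122.419416),  # (37.77, -122.41)
}
```

Because each new id is random, events from different sessions can’t be joined back into one person’s movement history by the id alone.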

So, here we are, tracking information about our application and running analytics on it. Often the value a user gets for giving up some of their private data is readily apparent, and that makes it easy to ask for that data. For example, an app can analyze location data to display nearby restaurants.

When that isn’t the case, you should tell your user why you need to collect the data, or make sure the time between collecting the data and delivering value to the user is as short as possible.

If you can, aggregate your data. Aggregated analysis is inherently less creepy. An individual isn’t analyzed, but their cohort is. There are many ways to slice and dice cohorts, such as time of session, day of week, or general location. Pick one that makes sense for your data.
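
As a rough illustration of cohort-level analysis, the sketch below counts sessions per day of the week and throws the session ids away. The event shape (a dict with `session_id` and a Unix `timestamp`) and the function name are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def sessions_by_weekday(events):
    """Count events per weekday (UTC); no identifier survives the aggregation."""
    counts = Counter()
    for event in events:
        started = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
        counts[WEEKDAYS[started.weekday()]] += 1
    return dict(counts)


events = [
    {"session_id": "a", "timestamp": 0},      # Thursday, 1 Jan 1970 (UTC)
    {"session_id": "b", "timestamp": 3600},   # same Thursday, one hour later
    {"session_id": "c", "timestamp": 86400},  # Friday, 2 Jan 1970 (UTC)
]
weekday_counts = sessions_by_weekday(events)  # {"Thursday": 2, "Friday": 1}
```

The output answers “when do people use the product?” without saying anything about what any one person did.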

It would be naive to say that we can always abstract away the data we collect from users, but we should always take at least the minimum steps to ensure the user is protected, and that their data gets used for the right purposes.

(During the audience Q&A, one attendee suggested using analytics to monitor when anyone accesses sensitive data. This would be a great way to ensure accountability and build trust between users and the companies that collect their data.)
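
The attendee’s suggestion could look something like the sketch below, where every call to a function that reads sensitive data records an access event first. Everything here is hypothetical: the `audited` decorator, the `ACCESS_LOG` list (a stand-in for wherever access events would really be sent), and the `fetch_location_history` placeholder.

```python
import functools
import time

ACCESS_LOG = []  # stand-in for a real analytics or audit event store


def audited(resource):
    """Record who touched `resource` before running the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(employee_id, *args, **kwargs):
            ACCESS_LOG.append({
                "employee_id": employee_id,
                "resource": resource,
                "action": func.__name__,
                "timestamp": time.time(),
            })
            return func(employee_id, *args, **kwargs)
        return wrapper
    return decorator


@audited("location_history")
def fetch_location_history(employee_id, user_id):
    """Placeholder for a real lookup of one user's sensitive data."""
    return []
```

Calling `fetch_location_history("emp-7", "user-42")` appends one entry to `ACCESS_LOG`, so access to sensitive data becomes something you can chart and review like any other event.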

What Happens When You Don’t Track Smartly?

Originally, the iPhone tracked your location over time in an unencrypted file. When people discovered this, there was a lot of bad press in the tech media about how hackers could exploit this information, which wasn’t great for Apple. If this had happened to a smaller company, there’s a good chance it could have tanked them.

Remember Facebook applications that would send you spam messages and notifications? Many of the applications were tracking data about your friends and their friends to build a social graph to “creepily” gain more users.

More recently, Uber was found to have an internal tool that would allow employees to track the usage of Uber rides by a user’s identifying information. It makes sense for a company like Uber to track a user’s location, and even store a link from the user to the location information. The question, though, would be whether that information needs to live in perpetuity.

Would it be feasible for a company like Uber to link a session id to a user, and then remove that link after a week or two? They could still use the information from the session to analyze usage over time, but could not use that id to link back to a user once an appropriate window of time had passed.
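
To make the idea concrete, here is a small sketch of a store that keeps session-to-user links only for a retention window. It is not a description of how Uber or anyone else actually works; the `SessionLinkStore` class and the two-week window are assumptions. Session events would live elsewhere, keyed only by session id, and stay available after the link is gone.

```python
import time

RETENTION_SECONDS = 14 * 24 * 60 * 60  # assumed window: two weeks


class SessionLinkStore:
    """Keeps session-to-user links only for a limited retention window."""

    def __init__(self, retention=RETENTION_SECONDS, clock=time.time):
        self._retention = retention
        self._clock = clock
        self._links = {}  # session_id -> (user_id, linked_at)

    def link(self, session_id, user_id):
        """Remember which user a session belongs to, with the current time."""
        self._links[session_id] = (user_id, self._clock())

    def purge_expired(self):
        """Delete links at or past the retention window; return how many."""
        cutoff = self._clock() - self._retention
        expired = [sid for sid, (_, linked_at) in self._links.items()
                   if linked_at <= cutoff]
        for sid in expired:
            del self._links[sid]
        return len(expired)

    def user_for(self, session_id):
        """Return the linked user id, or None once the link has expired."""
        self.purge_expired()
        link = self._links.get(session_id)
        return link[0] if link else None
```

The important design choice is that the link is deleted rather than hidden: once the window passes, nobody inside the company can map the session back to a person.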

Looking Forward

Data helps make our products better. But with collecting it comes responsibility. We have a moral obligation to consider the uses and consequences of what we track and how it affects our users and their privacy.

If you have any thoughts, ideas, or best practices to share about responsible data tracking, please post them in the comments or tweet me. I’d love to know what you think.