An Open Source Conversation with OpenAQ

Last month, I sat down with Christa Hasenkopf and Joe Flasher from OpenAQ, one of the first open, real-time air quality data platforms, to talk about open environmental data, community building, analytics, and open source. I hope you enjoy the interview!

Taylor: Could you both tell me a little bit about yourselves, and how y’all got interested in environmental data?

Christa: I’m an atmospheric scientist, and my background for my doctoral work was on ‘air quality’ on a moon of Saturn, Titan. As I progressed through my career, I got more interested in air pollution here on Earth, and realized I could apply the same skills I’d gained in my graduate training to do something more Earth-centric.

That took Joe, my husband, and me to Mongolia, where I was doing research in one of the most polluted places in the world: Ulaanbaatar. As a side project, Joe and I worked together with colleagues at the National University of Mongolia to launch a little open air quality data project that measured air quality and then sent out the data automatically to Twitter and Facebook. It was such a simple thing, but the impact of that work felt way more significant to me than my research. It also seemed more impactful to the community we were in, and that experience led us down this path of being interested in open air quality data across the world. As we later realized, there are about 5–8 million air quality data points produced each day around the world by official or government-level entities, in disparate and sometimes temporary forms, but they aren’t easily accessible in aggregate.

Joe: I was trained as an astrophysicist, but then I quickly moved into software development. So when Christa and I were living in Mongolia, I think we just sort of looked around and saw things that didn’t exist that we could make, and we went ahead and did that. Open data was always something that seemed like the right thing to do, especially when it’s data that affects everyone, like air quality data. I think we had the tools together: I had the software development skills and Christa had the atmospheric science background, so we could put things in place that could really help people.

Taylor: That’s awesome. Could you tell me more about the OpenAQ Project?

Christa: Basically what we do is we aggregate air quality data from across the world and put it in one format in one place, so that anyone can access that data. And the reason we do that is because there is still a huge access gap between all of the real-time air quality data publicly produced across the world and the many sectors for the public good that could use these data. Sectors like public health research or policy applications, or an app developer who wants to make an app of global air quality data. Or say even a low-cost sensor group that wants to measure indoor air quality and also know what the outdoor air quality is like, so you know when to open your windows if you live in a place like Dhaka, Bangladesh or Beijing, China. And so by putting the data in this universal format, many people can do all kinds of things with them.
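To make the "one format in one place" idea concrete, here is a minimal sketch of normalizing readings from two differently shaped sources into a single record type. The two sources, their field names, and the `Measurement` record are all hypothetical and chosen for illustration; this is not OpenAQ's actual schema or ingestion code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Measurement:
    """Hypothetical common record; not OpenAQ's real schema."""
    location: str
    parameter: str
    value: float
    unit: str
    utc: datetime


def from_source_a(row: dict) -> Measurement:
    # Hypothetical source A: PM2.5 in µg/m³, ISO 8601 timestamp with offset.
    return Measurement(
        location=row["station"],
        parameter="pm25",
        value=float(row["pm25_ugm3"]),
        unit="µg/m³",
        utc=datetime.fromisoformat(row["time"]).astimezone(timezone.utc),
    )


def from_source_b(row: dict) -> Measurement:
    # Hypothetical source B: PM2.5 in mg/m³, Unix epoch seconds.
    return Measurement(
        location=row["site_name"],
        parameter="pm25",
        value=float(row["pm25_mgm3"]) * 1000.0,  # convert mg/m³ to µg/m³
        unit="µg/m³",
        utc=datetime.fromtimestamp(row["epoch"], tz=timezone.utc),
    )
```

Once every source has an adapter like these, anything downstream only has to understand one record shape, which is the point Christa makes about a universal format.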

Joe: Yeah, I think we’re just focused on two things. One is getting all the underlying air quality data collected in one place and making it accessible, and the main way to do that is with an API that people can build upon. And then we also have some of these other tools that Christa mentioned to help groups examine the data and look at the data, but meshing that with tools built by people in the community. Because I think the chances of building the best thing right away are very small. What we’re trying to do is make the data openly available to as many people as possible, because a lot of these solutions are based in the local context of a community.

Taylor: That’s really cool. I have heard from other organizations that when you open up the data, you democratize the data because it’s available for the people.

I read the Community Impact document for the project and you had mentioned that some researchers from NASA and NSF and UNICEF are using the data from OpenAQ. I was wondering, what are some other cool applications of the data that you are seeing?

Christa: I think when we first started the project it was all about the data. It was all about collecting the data, getting as much data as we could. And as we went on, we realized, pretty quickly, it’s actually about the community we are building around it and the stuff that people are building. And so there are a few different pieces.

One thing we have seen is journalists taking OpenAQ-aggregated data to analyze air quality in their local communities. There is a journalist in Ulaanbaatar, Mongolia, who has published a few data-driven articles about air quality in Ulaanbaatar relative to Beijing. There are some developers who have built packages that make the data more accessible to people using different programming languages.

There is a statistician in Barcelona, Spain, who has built a package in R that makes the data very accessible in R and makes cool visualizations. This person made a visualization where she analyzed fireworks across the US on the Fourth of July. She did a time series, and you could see a map of the US, and as 9pm rolled around in the various time zones you could see air quality change across the US as the fireworks went off.

There is a developer in New Delhi, India, who has made a global air quality app and Facebook bot that compares air quality in New Delhi to other places or will send you alerts. We feel these usages point to the power of democratizing data. No one person or one entity can come up with all the possible use cases themselves, but when it’s put out there on a global basis, you’re not sure where it’s going to go.

Joe: We have also been used by some commercial entities to do weather modeling, pollution forecasting. Christa, there was an education use case right… Was it Purdue?

Christa: Yeah, a professor there is using it in his classroom to bring outdoor air quality data into indoor air quality models. Students pick a place around the world. They use outdoor air quality data from there to model what indoor air quality would look like, so they are not just modeling air quality in Seattle, which has pretty good air quality. They are also pulling in places like Jakarta or Dhaka, to see what air quality would be like indoors, based on the outdoor parameters.

Low-cost sensor groups have contacted us because they are interested in getting their air quality data shared on our platform. These groups would like their data to be accessible in universal ways so that more people can do cool stuff with it too. Right now, for our platform, we have government-level data and some research-grade data, and a future direction we are hoping to move in is low-cost sensors, too.

Taylor: As you have touched on, I read that OpenAQ has community members across four continents and has aggregated 16 million data points from 24 countries. I am curious, how were you able to grow the project to have all that data coming in?

Christa: We have a couple of ways of getting the word out about OpenAQ and getting people interested in their local community and engaged with the OpenAQ global community. One way is in person. We visit places that are facing what our community calls “air inequality” (extremely poor air quality in a given location), and we hold a workshop that convenes various people: not just scientists, not just software developers, but also artists, policy makers, people working in air quality monitoring within a given government, and educators. We focus on getting them all in the same room, working on ways they can use open data to advance fighting air inequality in their area.

So far, we’ve held a workshop in Ulaanbaatar, and we have had meetups in San Francisco and DC, since that’s where we’re based. We have also done presentations in the UK, Spain, and Italy. We are about to have our next workshop in Delhi in November. We’re getting the word out through the workshops, the meetups, Twitter, and our Slack channel. Participation in the OpenAQ community has been growing organically, whether it’s on the development end, pulling in more data, or in the application of the data. We tend to get more people interested in using the data once they are aggregated than people helping to build ways to add in more data, which makes sense. We are always in need of more people to help build and improve the platform.

Joe: In the beginning it was very interesting how we decided to add in new sources — there are so many possible ones to add from different places. You could look at a map and see where we had been, because whenever we would go somewhere to give a presentation we would want to make sure we had local air quality data. So before we would give a presentation in the UK, we would make sure we had some UK data. Data has been added like that and according to interest for particular locations in the community.

An interesting thing that we are able to do now with the Keen analytics is look at what data people are requesting most, and even if we don’t have the data, they might still be requesting it. So we can see from the analytics where we should potentially focus on bringing in new data. It has been a very helpful way for us to be more data-driven when looking at what data to bring in.
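As a rough illustration of the idea Joe describes, the sketch below counts requests for places that are not yet covered, so the most-requested gaps surface first. The event shape (a `country` query parameter) and the function name `unmet_demand` are assumptions made for this example; they are not taken from OpenAQ's code or from Keen's API.

```python
from collections import Counter


def unmet_demand(requests, available_countries):
    """Rank requested countries that have no data yet, most requested first.

    `requests` is an iterable of dicts with an optional "country" key
    (a hypothetical event shape used only for this sketch).
    """
    counts = Counter(
        r["country"]
        for r in requests
        if r.get("country") and r["country"] not in available_countries
    )
    return counts.most_common()


# Usage with made-up request events and a made-up coverage set.
logged = [
    {"country": "MN"},
    {"country": "KE"},
    {"country": "PE"},
    {"country": "KE"},
    {},  # request with no country filter
]
gaps = unmet_demand(logged, available_countries={"MN", "IN", "US"})
```

A ranking like this is one simple way to turn anonymous request logs into a prioritized list of sources to add next.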

Taylor: When you have a project that is an open source or an open data platform, your time becomes very valuable. You want to put your resources where they are needed most.

Joe: We want to be as data-driven as possible. And it’s hard for us to talk directly to all of the people who are using the data. I think we have a similar problem to anyone who opens up data completely. We don’t require anyone to sign up for anything. We have a lot more people using the data than we know about. We can see just from how many times the data is getting grabbed that it is popular. The analytics really help us tell something about those use cases, even if we don’t know of them specifically.

Taylor: Could you explain your use of Keen for everyone so they can understand how you are figuring that out?

Joe: The API is powered by a Node.js application that includes the Keen library. Every request that comes in goes to Keen and so we have a way to sift through it.

We don’t track any use, any sign-ups, any API keys or anything at the moment. We don’t see the addresses that requests come in from; they are anonymous. But we do get tons of data that we can look through, and that has been super helpful. It gave me two lines of code that go into my API, and then all my requests come into Keen and I can handle all the queries there.

We do all the normal things that you would do: we look at total counts of requests that are coming in and at our endpoint usage statistics. This is also very interesting, and we were looking at it the other day: not all our endpoints are equal, and our system has some that are much heavier computationally and have taken a lot more work to create. It’s interesting to look at how much they are getting hit versus how much effort we put into making them. We can see the most popular endpoints that we have, and then we can also see ones that aren’t used as much. This helps me figure out what to prioritize and how. We have a very database-request-heavy system. Knowing specifically the sort of queries that are coming in really helps us optimize the database to get the most out of it and make it as cost-efficient as possible.
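The kind of per-endpoint summary Joe describes can be sketched as follows. This is only an illustration under stated assumptions: events are kept in a plain Python list rather than sent to Keen, the endpoint paths are hypothetical, and response time is used here as a stand-in for how computationally heavy an endpoint is. Note that the event deliberately carries no address, API key, or user identifier.

```python
from collections import defaultdict


def make_event(endpoint: str, query: dict, duration_ms: float) -> dict:
    # Only request-level facts are recorded: no IP address, API key, or user ID.
    return {"endpoint": endpoint, "query": dict(query), "duration_ms": duration_ms}


def summarize(events):
    """Return per-endpoint hit counts and mean duration, most-hit first."""
    totals = defaultdict(lambda: {"hits": 0, "total_ms": 0.0})
    for event in events:
        entry = totals[event["endpoint"]]
        entry["hits"] += 1
        entry["total_ms"] += event["duration_ms"]
    rows = [
        {"endpoint": ep, "hits": t["hits"], "mean_ms": t["total_ms"] / t["hits"]}
        for ep, t in totals.items()
    ]
    return sorted(rows, key=lambda row: row["hits"], reverse=True)


# Usage with made-up events; the endpoint paths are hypothetical.
events = [
    make_event("/measurements", {"country": "MN"}, 120.0),
    make_event("/latest", {}, 30.0),
    make_event("/measurements", {"country": "IN"}, 80.0),
]
report = summarize(events)
```

Putting hit counts next to a cost measure makes it easy to spot endpoints that are expensive but rarely used, or cheap but very popular, which is the comparison described above.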

Taylor: That’s interesting that you were able to gauge how much effort you put into some of those endpoints and then look at their usage. When you don’t have that data, you are just guessing. It can also help you see that maybe there should be more education on some endpoints.

Why was it important to y’all for this platform to be open source?

Christa: So one of the major reasons we built this platform and made it open source is that we noticed a few groups were gathering this sort of data, but the data themselves weren’t open, nor was it clear how they were gathered. There were a few efforts: some commercial, some unclear if they were commercial or public, and some run by researchers. And everyone was doing it in a different way, or it wasn’t entirely clear how it was being done. We saw a lot of efforts having to duplicate what another effort was doing because their work wasn’t open. So we thought someone should just make the data open and also make the platform itself open source and transparent, so it’s clear how we’re grabbing the data. That’s a huge reason to do it. The other reason was that when we first started this, there were just two of us in our little basement apartment. It’s a big project, and we knew we would need help. So making it open source was an obvious route to find folks around the world interested in helping us.

Joe: I think the other piece here is that open source and free aren’t the same thing, but they are oftentimes lumped together. Beyond just open source, I think what we wanted to be was freely available, because air pollution disproportionately affects people in developing countries. They are the ones that would generally have to pay for this data or don’t have access to it at all. And so we wanted to break down that barrier and let everyone have access to the data and to making tools, and not have that be a roadblock.

Taylor: To end things, what is the most exciting thing about the project to each of y’all?

Christa: I think for me it’s definitely interacting with people in specific communities and sharing the data in the open. I love that, it’s the best.

Joe: For me it is definitely having people build something on top of it. As a developer, that’s the best feeling. In fact, at the first workshop we did in Mongolia, there was a developer who, just over the weekend, built an interface, like a much better exploration interface for the data than what I had initially made. Which was great, right? So I think we used that, and pointed people to that over and over and over again, because I think it took us probably, I don’t know, six months until we finally rolled out sort of a different exploration interface for the data. And that was just made by one community member and that was awesome.

I wanted to thank Christa and Joe for taking the time to talk to me about OpenAQ. I don’t know about you, but I learned a lot! It is a wonderful project that you should definitely check out.

Keen IO has an open source software discount that is available to any open source or open data project. We’d love to hear more about your project of any size and share more details about the discount. We’d especially like to hear about how you are using Keen IO or any analytics within your project. Please feel free to reach out to opensource@keen.io for more info.
