Jacob Vasquez

Demystifying Apache Kafka Message Delivery Semantics: At-most-once & At-least-once & Exactly-once…oh my!

If you clicked on this article, you’re probably already familiar with Apache Kafka. In case you aren’t, Apache Kafka® is an open-source distributed event streaming platform capable of handling trillions of events a day. It is commonly used for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Apache Kafka supports three message delivery semantics: at-most-once, at-least-once, and exactly-once. So how do you choose which configuration is right for you? The decision typically comes down to the level of data integrity your use case requires, but other factors such as cost, implementation overhead, and performance can also affect which delivery semantic is your best option.

At-most-once Delivery

With the at-most-once delivery semantic, a message is delivered either exactly one time or not at all. A message typically goes undelivered because of a communication error or other disruption that prevents consumers from handling an event. Because of its fire-and-forget nature, at-most-once is ideal for applications that need high throughput and low latency. It is also the simplest semantic to operate: the producer neither waits for acknowledgements nor retries, so cost and implementation overhead are relatively low. Note that it is not what Kafka gives you out of the box. Kafka’s documentation describes its default guarantee as at-least-once, and at-most-once is obtained by disabling retries on the producer and committing offsets in the consumer before processing a batch of messages. The key drawback of the at-most-once delivery scheme is that some data may be lost, which is often a disqualifier for data-sensitive applications. Use cases that may suit an at-most-once delivery semantic include log collection and IoT applications such as tracking and sensor measurements. With Keen’s Event Streaming Platform, an at-most-once delivery semantic is achieved by not using a retry mechanism for event collection.
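The fire-and-forget behavior can be illustrated with a small simulation. This is a minimal sketch rather than Kafka client code: `LossyChannel`, `drop_on`, and `send_at_most_once` are hypothetical names invented for the example, and the "network" is just a list that silently drops scripted attempts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LossyChannel:
    """Hypothetical transport that silently drops the send attempts in `drop_on`."""

    drop_on: set[int]                                   # 1-based attempts that are lost
    delivered: list[str] = field(default_factory=list)
    attempts: int = 0

    def send(self, message: str) -> None:
        self.attempts += 1
        if self.attempts not in self.drop_on:
            self.delivered.append(message)
        # Nothing is returned: the sender never learns whether the message arrived.


def send_at_most_once(channel: LossyChannel, messages: list[str]) -> None:
    """Fire-and-forget: one attempt per message, no acknowledgement, no retry."""
    for message in messages:
        channel.send(message)


channel = LossyChannel(drop_on={2})
send_at_most_once(channel, ["e1", "e2", "e3"])
print(channel.delivered)  # ['e1', 'e3'] -> e2 was lost, nothing was duplicated
```

Each message is attempted once, so it shows up zero or one times on the receiving side, which is the at-most-once guarantee.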

At-least-once Delivery

With the at-least-once delivery semantic, a message can be delivered one or more times, but will never be lost. This delivery semantic is ideal for applications where receiving every message is more important than having high throughput and low latency. Performance is limited because the producer must maintain state (it is no longer fire-and-forget), wait for acknowledgement from the brokers, and potentially retry if an event is not acknowledged. This also results in higher cost and implementation overhead than the at-most-once delivery configuration. At-least-once is an extremely common delivery semantic for use cases where duplicated data is acceptable, including many analytics solutions such as product, embedded, and internal analytics. With Keen’s Event Streaming Platform, an at-least-once delivery semantic is achieved by using a retry mechanism for event collection.
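To see where the duplicates come from, consider a retry loop that resends whenever no acknowledgement is seen. The sketch below is a simulation under stated assumptions, not Kafka or Keen client code: `FlakyBroker`, `lose_ack_on`, and `send_at_least_once` are hypothetical names, and the broker always stores the message but loses the acknowledgement on scripted attempts.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlakyBroker:
    """Hypothetical broker that stores every write but loses some acknowledgements."""

    lose_ack_on: set[int]                         # 1-based attempts whose ack is lost
    log: list[str] = field(default_factory=list)
    attempts: int = 0

    def send(self, message: str) -> bool:
        self.attempts += 1
        self.log.append(message)                  # the write itself succeeds
        return self.attempts not in self.lose_ack_on  # False means no ack was seen


def send_at_least_once(broker: FlakyBroker, message: str, max_retries: int = 5) -> None:
    """Resend until an acknowledgement arrives, or give up after max_retries."""
    for _ in range(1 + max_retries):
        if broker.send(message):
            return
    raise RuntimeError(f"no acknowledgement for {message!r}")


broker = FlakyBroker(lose_ack_on={2})
for msg in ["e1", "e2", "e3"]:
    send_at_least_once(broker, msg)
print(broker.log)  # ['e1', 'e2', 'e2', 'e3'] -> nothing lost, e2 stored twice
```

The producer cannot tell a lost write from a lost acknowledgement, so it has to resend in both cases. That is why no message is lost but some may appear more than once.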

Exactly-once Delivery

With the exactly-once delivery semantic, a message is always delivered exactly one time. With this configuration, a message can be neither dropped nor duplicated. Exactly-once delivery is typically achieved by filtering out duplicate events, which requires maintaining state on the consumer side in addition to the producer-side state that at-least-once already requires. Because of this, exactly-once carries the highest implementation overhead and highest cost, as well as potentially the worst performance, of all of the delivery semantics. That being said, exactly-once delivery is the only configuration that can guarantee no events will be missed or duplicated. This is necessary for many use cases with business-critical data, such as financial applications like Events-based billing, Keen and Chargify’s integrated product that allows users to bill their customers based on events.
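The duplicate filtering described above might look like the following. It is a minimal sketch that assumes every event carries a producer-assigned unique ID; `Event`, `event_id`, and `deduplicate` are hypothetical names, and the in-memory `seen` set stands in for whatever consumer-side state a real system would persist.

```python
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    event_id: str   # assumed: a unique ID assigned once by the producer
    payload: str


def deduplicate(events: Iterable[Event]) -> Iterator[Event]:
    """Pass through only the first event seen for each event_id."""
    seen = set()    # consumer-side state; in-memory only in this sketch
    for event in events:
        if event.event_id in seen:
            continue                # a retry delivered this event again, drop it
        seen.add(event.event_id)
        yield event


# An at-least-once stream in which event "b" was delivered twice.
stream = [
    Event("a", "order placed"),
    Event("b", "order paid"),
    Event("b", "order paid"),
    Event("c", "order shipped"),
]
unique = list(deduplicate(stream))
print([e.event_id for e in unique])  # ['a', 'b', 'c']
```

The `seen` set grows with every event and disappears on restart, which hints at why this semantic costs more: a production system would likely need to bound and persist that state.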

In addition to the financial space, exactly-once delivery is beginning to be adopted across a wider array of B2B SaaS and IoT applications, such as real-time monitoring, alerting, and analytics, as well as event-driven apps. This is due to the lower barrier to entry offered by Kafka-as-a-Service platforms like Keen. With Keen’s Event Streaming Platform, an exactly-once delivery semantic is achieved by using Keen Uniqueness Tokens in conjunction with a retry mechanism. Keen Uniqueness Tokens do not introduce any differences in the overall performance of the platform and require minimal effort to implement.
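The idea of pairing retries with a uniqueness token can be sketched generically. The code below is not Keen’s API: `TokenStore`, `collect`, `collect_with_retry`, and the `lost_acks` parameter are hypothetical names, and the store is a plain dictionary that keeps only the first event submitted under each token.

```python
from __future__ import annotations


class TokenStore:
    """Hypothetical event store that keeps only the first event seen per token."""

    def __init__(self) -> None:
        self._events: dict[str, dict] = {}

    def collect(self, token: str, event: dict) -> bool:
        """Persist `event` unless `token` was already used; return True if stored."""
        if token in self._events:
            return False
        self._events[token] = event
        return True

    def events(self) -> list[dict]:
        return list(self._events.values())


def collect_with_retry(store: TokenStore, token: str, event: dict, lost_acks: int) -> int:
    """Resend until an ack is seen; `lost_acks` simulates that many dropped acks."""
    attempts = 0
    while True:
        attempts += 1
        store.collect(token, event)     # every attempt reaches the store
        if attempts > lost_acks:        # the ack finally gets through
            return attempts


store = TokenStore()
attempts = collect_with_retry(store, "invoice-42", {"amount": 10}, lost_acks=2)
print(attempts, store.events())  # 3 [{'amount': 10}]
```

The retry loop guarantees the event arrives at least once, and the first-write-wins rule makes the repeated sends harmless, so the stored result matches what exactly-once delivery would produce.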

Closing Thoughts

If you need a high-throughput, low-latency solution that won’t break the bank, and you don’t mind occasionally losing a bit of data, at-most-once delivery is the obvious choice. Deciding between at-least-once and exactly-once delivery is trickier. If you are building it yourself, you might accept at-least-once delivery as “good enough” to save on cost and implementation overhead. But if you are considering a managed solution like Keen, you won’t have to settle for good enough. Keen’s Event Streaming Platform offers a Uniqueness Token feature that makes it easy to guarantee exactly-once delivery of your events by persisting only the very first event collected with a specific token value. No more worrying about taking on extra overhead! What was once an elusive delivery semantic, often deemed impossible due to price and practicality, is neatly wrapped up in Keen’s fully managed event streaming platform. Try it for free today and check out Keen’s documentation to learn more.