Kafka Outbound Cluster
If you need to stream your events from Keen to an external system in real time you can use the Kafka Outbound Cluster and a standard Kafka Consumer. Examples of popular services that consume from Kafka include ksqlDB, Materialize.io, and any system capable of consuming from Kafka with open-source Kafka connectors. This is a great way to build alerting functionality, power an event-driven product, create an integration, or simply back up your data.
Process overview
- Enable the Kafka Outbound in your Project on the Streams page.
- In your system, create a Kafka Consumer and authenticate to the Kafka Outbound Cluster.
- Use the name of the collection you want to stream from as the topic name.
- After a few minutes your Kafka Consumer should start receiving events streamed to Keen.
Consumer
To initialize the Kafka Consumer you need to provide:
Property | Value |
---|---|
bootstrap-server | b1.kafka-out.keen.io:9092,b2.kafka-out.keen.io:9092,b3.kafka-out.keen.io:9092 |
security.protocol | SASL_SSL |
sasl.mechanism | PLAIN |
username | your PROJECT_ID |
password | the MASTER_KEY, READ_KEY, or an ACCESS_KEY with queries enabled |
Example using the Kafka built-in `kafka-console-consumer.sh`:

```sh
kafka-console-consumer.sh \
  --bootstrap-server b1.kafka-out.keen.io:9092,b2.kafka-out.keen.io:9092,b3.kafka-out.keen.io:9092 \
  --topic "Target-collection-name" \
  --consumer-property security.protocol=SASL_SSL \
  --consumer-property sasl.mechanism=PLAIN \
  --consumer-property sasl.jaas.config='org.apache.kafka.common.security.plain.PlainLoginModule required username="PROJECT_ID" password="READ_KEY";' \
  --from-beginning
```
The `bootstrap-server` property is different for the Kafka Inbound Cluster and the Kafka Outbound Cluster. Do not use the same `bootstrap-server` for a Producer and a Consumer.
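If you prefer a client library over the CLI, the same connection can be sketched with the kafka-python package (a minimal, hedged example; the capitalized values are placeholders for your own collection name and credentials):

```python
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "Target-collection-name",              # your event collection name
    bootstrap_servers=[
        "b1.kafka-out.keen.io:9092",
        "b2.kafka-out.keen.io:9092",
        "b3.kafka-out.keen.io:9092",
    ],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="PROJECT_ID",      # your project id
    sasl_plain_password="READ_KEY",        # MASTER_KEY, READ_KEY or an ACCESS_KEY with queries enabled
    auto_offset_reset="earliest",          # equivalent of --from-beginning
)

for message in consumer:
    print(message.value.decode("utf-8"))
```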
Topic name
Use the event collection name as your topic in the Kafka Consumer.
A valid Kafka topic name may only contain alphanumeric characters, `-` (dash), `_` (underscore) and `.` (dot). Keen allows you to use a wider set of characters in your collection names. If your collection name is not a valid Kafka topic name, Keen replaces every unsupported character with `_` (underscore) in the outbound topic name, e.g. events from the collection `$my collection$ %` will be streamed to the topic `_my_collection___` (note that the name ends with 3 underscores, as `$`, space and `%` were replaced with `_`).
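As an illustration of the rule above (not Keen's actual implementation), the mapping can be approximated in a few lines:

```python
import re

def outbound_topic_name(collection_name: str) -> str:
    """Approximate Keen's mapping: any character that is not alphanumeric,
    '-', '_' or '.' is replaced with '_' in the outbound topic name."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", collection_name)

print(outbound_topic_name("$my collection$ %"))  # -> _my_collection___
```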
Dead Letter Queue
Events submitted to the Kafka Inbound Cluster which cannot be processed (e.g. invalid JSON, bad event structure, etc.) are sent to the Dead Letter Queue (DLQ).

Each topic (an event collection) has its own DLQ. To consume from the DLQ, authenticate your Kafka Consumer to the Kafka Outbound Cluster and subscribe it to a topic with the `-dlq` suffix, e.g. if you are sending events to the topic `purchases` in the project `Staging` then use:
Property | Value |
---|---|
username | Staging project id |
password | the MASTER_KEY, READ_KEY, or an ACCESS_KEY with queries enabled |
topic | purchases-dlq |
Messages in the DLQ topic will contain a JSON object with 2 keys:

- `event` - the original event payload as a single String
- `error` - the error message
Remember to enable the Kafka Outbound in your Project on the Streams page. Otherwise invalid events won't be streamed to the DLQ topic.
Do not assume you can parse the `event` node from the DLQ topic message into a JSON object, as it might contain invalid JSON (if invalid JSON was sent to Keen).
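For illustration, a minimal DLQ consumer using the kafka-python package might look like the sketch below (credentials are placeholders; the `event` value is kept as a raw string for the reason above):

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "purchases-dlq",                       # <collection name>-dlq
    bootstrap_servers=[
        "b1.kafka-out.keen.io:9092",
        "b2.kafka-out.keen.io:9092",
        "b3.kafka-out.keen.io:9092",
    ],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="PROJECT_ID",      # the Staging project id
    sasl_plain_password="READ_KEY",
    auto_offset_reset="earliest",
)

for message in consumer:
    dlq_record = json.loads(message.value)
    print("error:", dlq_record["error"])
    # Keep "event" as a string: it may not be valid JSON.
    print("event:", dlq_record["event"])
```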
Kafka tech details
While setting up the Kafka Outbound Cluster we made several design decisions that you should be aware of to correctly set up Kafka streaming.
Replication factor
The Kafka Outbound Cluster has the replication factor set to 3. This is done in order to ensure no event is lost in case of a failure of a Kafka Broker.
Max message size
The Kafka Outbound Cluster is configured with the default `message.max.bytes = 1048588`. Configure your Consumer accordingly to avoid exceptions.
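For example, with the kafka-python package the per-partition fetch limit (which defaults to 1048576 bytes, slightly below `message.max.bytes`) could be raised as in the sketch below; other clients expose equivalent settings:

```python
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "Target-collection-name",
    # ... same bootstrap_servers and SASL settings as in the consumer example above ...
    max_partition_fetch_bytes=1048588,  # at least message.max.bytes of the cluster
    fetch_max_bytes=52428800,           # total fetch size per request (kafka-python default)
)
```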
Partitions count
The Kafka Outbound Cluster uses `num_partitions = 9`. Do not use more than 9 Consumer threads, as higher values will not improve throughput.
If you need more partitions do not hesitate to contact us.
Limits
- You are not allowed to produce messages to the Kafka Outbound Cluster. You are only allowed to consume events from the cluster.
- Kafka Transactions are not supported.
- Admin operations such as create topic or delete topic are not available.
- Keen does not guarantee any message ordering (FIFO might not be preserved). If you need to consume messages in order, use `keen.timestamp` or add a `sequence_number` when sending events and sort messages on the consumer side.
- Duplicates in the Kafka Outbound stream might occur even if you are using a `uniqueness_token` (events will still be stored in Keen without duplicates). If you need Exactly Once delivery semantics you need to implement duplicate filtering on the client side; you can use `keen.id` for duplicate filtering (see the sketch after this list).
- Keen does not guarantee any specific message partitioning: messages might be sent to Topic Partitions in a round-robin fashion.
- Messages in the Outbound Kafka topics are available for consumption for 4 days (`log.retention.hours = 96`). Kafka will remove older messages from a topic after this time.
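A hedged sketch of such client-side duplicate filtering, assuming the outbound payload carries the event id as a nested `keen.id` property and using a simple in-memory set (a real consumer would typically persist the seen ids):

```python
import json

seen_ids = set()  # in production, persist seen ids (e.g. in a database or cache)

def handle_message(raw_value: bytes) -> None:
    """Process one message from the Kafka Outbound Cluster, skipping duplicates by keen.id."""
    event = json.loads(raw_value)
    event_id = event["keen"]["id"]      # assumed location of the event id
    if event_id in seen_ids:
        return                          # duplicate delivery, ignore
    seen_ids.add(event_id)
    # ... actual processing of the event goes here ...
```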