Kafka Outbound Cluster

If you need to stream your events from Keen to an external system in real time, you can use the Kafka Outbound Cluster and a standard Kafka Consumer. Popular consumers include ksqlDB, Materialize.io, and any system capable of consuming from Kafka via open-source Kafka connectors. This is a great way to build alerting functionality, power an event-driven product, create an integration, or simply back up your data.

Process overview

  1. Enable the Kafka Outbound in your Project on the Streams page.
  2. In your system, create a Kafka Consumer and authenticate it to the Kafka Outbound Cluster.
  3. Use the name of the collection you want to stream from as the topic name.
  4. After a few minutes, your Kafka Consumer should start receiving the events streamed to Keen.

Consumer

To initialize the Kafka Consumer you need to provide:

Property           Value
bootstrap-server   b1.kafka-out.keen.io:9092,b2.kafka-out.keen.io:9092,b3.kafka-out.keen.io:9092
security.protocol  SASL_SSL
sasl.mechanism     PLAIN
username           your PROJECT_ID
password           the MASTER_KEY, READ_KEY, or an ACCESS_KEY with queries enabled
If you are using an ACCESS_KEY with `queries` enabled, it cannot contain any filters - you won’t be able to authenticate to a topic with such a key. See: Access Keys.

Example using the Kafka built-in kafka-console-consumer.sh:

kafka-console-consumer.sh \
    --bootstrap-server b1.kafka-out.keen.io:9092,b2.kafka-out.keen.io:9092,b3.kafka-out.keen.io:9092 \
    --topic "Target-collection-name" \
    --consumer-property security.protocol=SASL_SSL \
    --consumer-property sasl.mechanism=PLAIN \
    --consumer-property sasl.jaas.config='org.apache.kafka.common.security.plain.PlainLoginModule required username="PROJECT_ID" password="READ_KEY";' \
    --from-beginning
Please note that the bootstrap-server property is different for the Kafka Inbound Cluster and the Kafka Outbound Cluster. Do not use the same bootstrap-server for a Producer and a Consumer.
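
If you are writing your own consumer, here is a minimal sketch in Python using the kafka-python library (the library choice and the PROJECT_ID, READ_KEY, and purchases placeholders are assumptions; any Kafka client that supports SASL_SSL with the PLAIN mechanism will work):

import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "purchases",  # topic = the name of the Keen collection you want to stream from
    bootstrap_servers=[
        "b1.kafka-out.keen.io:9092",
        "b2.kafka-out.keen.io:9092",
        "b3.kafka-out.keen.io:9092",
    ],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="PROJECT_ID",
    sasl_plain_password="READ_KEY",
    auto_offset_reset="earliest",  # start from the oldest retained message
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    print(message.value)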

Topic name

Use the event collection name as your topic in the Kafka Consumer.

Please note that valid characters for Kafka topics are the ASCII letters (a-z, A-Z), digits (0-9), - (dash), _ (underscore) and . (dot). Keen allows you to use a wider set of characters in your collection names. If your collection name is not a valid Kafka topic name, Keen will replace every unsupported character with _ (underscore) in the outbound topic name, e.g. events from the collection $my collection$ % will be streamed to the topic _my_collection___ (note that the name ends with 3 underscores, as $, space and % were replaced with _).
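
As an illustration, the replacement rule can be mirrored with a small helper like the one below (the function name and the exact regular expression are our own sketch, not a Keen API):

import re

def outbound_topic_name(collection_name: str) -> str:
    # Replace every character outside a-z, A-Z, 0-9, '-', '_' and '.' with '_'.
    return re.sub(r"[^a-zA-Z0-9._-]", "_", collection_name)

print(outbound_topic_name("$my collection$ %"))  # -> _my_collection___
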
The topic names within a single project are isolated from topics in other projects. If you have multiple projects, such as Staging and Production, you can have a Keen collection named purchases in both of them. When you authenticate a Kafka Consumer using the Staging project id and subscribe to the purchases topic, the Consumer will consume only messages from the Staging project. You need another Kafka Consumer, authenticated with the Production project id and subscribed to the purchases topic, to consume from the Production project.

Dead Letter Queue

Events submitted to the Kafka Inbound Cluster that cannot be processed (e.g. invalid JSON, bad event structure, etc.) are sent to the Dead Letter Queue (DLQ). Each topic (an event collection) has its own DLQ. To consume from the DLQ, authenticate your Kafka Consumer to the Kafka Outbound Cluster and subscribe it to a topic with the -dlq suffix. For example, if you are sending events to the topic purchases in the project Staging, then use:

Property  Value
username  the Staging project id
password  the MASTER_KEY, READ_KEY, or an ACCESS_KEY with queries enabled
topic     purchases-dlq

Messages in the DLQ topic will contain a JSON object with 2 keys:

  • event - the original event payload as a single String
  • error - the error message
You need to enable the Kafka Outbound in your Project on the Streams page; otherwise, invalid events won’t be streamed to the DLQ topic.
Do not try to parse the event field of a DLQ message into a JSON object, as it might contain invalid JSON (if invalid JSON was sent to Keen).
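
A minimal sketch of a DLQ consumer, again assuming kafka-python and placeholder credentials, that reads the error message and keeps the original event payload as a plain string:

import json

from kafka import KafkaConsumer

dlq_consumer = KafkaConsumer(
    "purchases-dlq",
    bootstrap_servers=[
        "b1.kafka-out.keen.io:9092",
        "b2.kafka-out.keen.io:9092",
        "b3.kafka-out.keen.io:9092",
    ],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="STAGING_PROJECT_ID",
    sasl_plain_password="READ_KEY",
)

for message in dlq_consumer:
    record = json.loads(message.value)  # the DLQ wrapper itself is valid JSON
    print("error:", record["error"])
    print("event:", record["event"])    # keep as a string - it may not be valid JSON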

Kafka tech details

While setting up the Kafka Outbound Cluster we made several design decisions that you should be aware of to correctly set up Kafka streaming.

Replication factor

The Kafka Outbound Cluster has the replication factor set to 3. This ensures that no event is lost if a Kafka Broker fails.

Max message size

The Kafka Outbound Cluster is configured with default message.max.bytes = 1048588. Configure your Consumer accordingly to avoid exceptions.
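
For example, with kafka-python the relevant setting is max_partition_fetch_bytes, whose default (1048576) is slightly below the broker limit above (the parameter name is specific to kafka-python; other clients expose it as max.partition.fetch.bytes or similar):

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "purchases",
    bootstrap_servers=["b1.kafka-out.keen.io:9092"],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="PROJECT_ID",
    sasl_plain_password="READ_KEY",
    max_partition_fetch_bytes=1048588,  # at least the cluster's message.max.bytes
)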

Partitions count

The Kafka Outbound Cluster uses num_partitions = 9. Do not use more than 9 Consumer threads, as higher values will not improve throughput. If you need more partitions, do not hesitate to contact us.

Limits

  • You are not allowed to produce messages to the Kafka Outbound Cluster. You are only allowed to consume events from the cluster.
  • Kafka Transactions are not supported.
  • Admin operations such as create topic, or delete topic are not available.
  • Keen does not guarantee any message ordering (FIFO might not be preserved). If you need to consume messages in order, use keen.timestamp or add a sequence_number when sending events, and sort messages on the consumer side (see the sketch after this list).
  • Duplicates in the Kafka Outbound stream might occur even if you are using uniqueness_token (events will still be stored in Keen without duplicates). If you need Exactly Once delivery semantics, you need to implement duplicate filtering on the client side. You can use keen.id for duplicate filtering.
  • Keen does not guarantee any specific message partitioning - messages might be sent to Topic Partitions in a round robin fashion.
  • Messages in the Outbound Kafka topics are available for consumption for 4 days (log.retention.hours = 96). Kafka will remove older messages from a topic after this time.
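
If you need client-side ordering or de-duplication, a minimal sketch could look like the one below (kafka-python again; it assumes each event carries its Keen metadata under the keen key and that re-ordering in batches of 100 is acceptable - tune both to your needs):

import json

from kafka import KafkaConsumer

def process(event):
    # Placeholder handler - replace with your own logic.
    print(event["keen"]["timestamp"], event["keen"]["id"])

consumer = KafkaConsumer(
    "purchases",
    bootstrap_servers=[
        "b1.kafka-out.keen.io:9092",
        "b2.kafka-out.keen.io:9092",
        "b3.kafka-out.keen.io:9092",
    ],
    security_protocol="SASL_SSL",
    sasl_mechanism="PLAIN",
    sasl_plain_username="PROJECT_ID",
    sasl_plain_password="READ_KEY",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

seen_ids = set()  # keen.id values already processed (duplicate filtering)
buffer = []       # small batch that gets re-ordered by keen.timestamp

for message in consumer:
    event = message.value
    event_id = event["keen"]["id"]
    if event_id in seen_ids:
        continue  # drop the duplicate
    seen_ids.add(event_id)
    buffer.append(event)

    if len(buffer) >= 100:
        buffer.sort(key=lambda e: e["keen"]["timestamp"])
        for ordered_event in buffer:
            process(ordered_event)
        buffer.clear()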