Kafka Inbound Cluster

If your event pipeline is based on Kafka, you may find it easier to stream events to Keen via our Kafka Inbound Cluster than via the HTTP Stream API.

Process overview

  1. To stream events to Keen via the Keen Kafka Cluster, you first initialize a Kafka Producer.
  2. The topic name you choose will be used as the event collection name.
  3. The Kafka Message sent by the Kafka Producer must be a valid JSON document. All the recommendations for the event structure described in the data modeling guide apply.
  4. Keen backend servers read events from all topics in the Keen Kafka Cluster.
    • For each topic: an event collection is created.
    • For each event: the “keen” object is created, add-ons are evaluated, and autofill from the Access Key is applied.
    • Finally, the event is persisted in the datastore.
  5. The whole process takes less than 10 seconds. After that, the event is available for querying.
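Step 3 above requires each Kafka message value to be a single valid JSON document. A minimal sketch of building such a message value in Python (the topic and field names are illustrative, not part of any required schema):

```python
import json

# A minimal event for a hypothetical "purchases" topic. Field names are
# illustrative; the data modeling guide's recommendations apply.
event = {
    "item": "golden widget",       # hypothetical property
    "price": 29.99,
    "customer": {"id": "c-42"},    # nested objects are allowed
}

# The producer sends the serialized form as the Kafka message value.
message_value = json.dumps(event)
print(message_value)
```

The serialized string is what your producer writes to the topic; Keen parses it back into a JSON document on its side.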

Producer

To initialize the Kafka Producer you need to provide:

Property            Value
bootstrap-server    b1.kafka-in.keen.io:9092,b2.kafka-in.keen.io:9092,b3.kafka-in.keen.io:9092
security.protocol   SASL_SSL
sasl.mechanism      PLAIN
username            your PROJECT_ID
password            the MASTER_KEY, WRITE_KEY, or an ACCESS_KEY with writes enabled
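The same connection settings expressed as a client configuration, sketched here for the kafka-python client (an assumption — any Kafka client that supports SASL_SSL with the PLAIN mechanism works):

```python
# Connection properties from the table above, in kafka-python's naming.
# PROJECT_ID and WRITE_KEY are placeholders for your own credentials.
producer_config = {
    "bootstrap_servers": [
        "b1.kafka-in.keen.io:9092",
        "b2.kafka-in.keen.io:9092",
        "b3.kafka-in.keen.io:9092",
    ],
    "security_protocol": "SASL_SSL",
    "sasl_mechanism": "PLAIN",
    "sasl_plain_username": "PROJECT_ID",  # your Keen project ID
    "sasl_plain_password": "WRITE_KEY",   # master, write, or access key
}

# producer = kafka.KafkaProducer(**producer_config)  # connects to the cluster
```

The commented-out last line is where a real producer would be created; it is left out so the sketch does not attempt a network connection.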

Example using the Kafka built-in kafka-console-producer.sh:

$ kafka-console-producer.sh \
    --bootstrap-server b1.kafka-in.keen.io:9092,b2.kafka-in.keen.io:9092,b3.kafka-in.keen.io:9092 \
    --topic "Target-collection-name" \
    --producer-property security.protocol=SASL_SSL \
    --producer-property sasl.mechanism=PLAIN \
    --producer-property sasl.jaas.config='org.apache.kafka.common.security.plain.PlainLoginModule required username="PROJECT_ID" password="WRITE_KEY";'

Topic name

The topic name you choose will be used as the event collection name, which means that both the Kafka topic name limitations and the Keen collection name limitations apply. The following characters are allowed: a-z, A-Z, 0-9, . (dot), _ (underscore), and - (dash). The topic name must be at most 64 characters long.
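The rules above can be checked before producing. A small sketch of a validator (the function name is ours, not part of any Keen or Kafka API):

```python
import re

# Allowed: letters, digits, dot, underscore, dash; length 1..64.
TOPIC_NAME_RE = re.compile(r"[a-zA-Z0-9._-]{1,64}")

def is_valid_topic_name(name: str) -> bool:
    """Return True if `name` satisfies the topic/collection name rules."""
    return TOPIC_NAME_RE.fullmatch(name) is not None
```

For example, `is_valid_topic_name("purchases")` is True, while names containing spaces or exceeding 64 characters are rejected.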

The topic name only needs to be unique within a given PROJECT_ID. So, if you have multiple projects, such as Staging and Production, each can have a topic called purchases, which translates to a Keen collection named purchases in each project.

Kafka tech details

When setting up the Kafka Inbound Cluster, we made several design decisions that you should be aware of to configure Kafka streaming correctly.

Replication factor

The Kafka Inbound Cluster has the replication factor set to 3. This ensures that no event is lost if a Kafka broker fails. As a consequence, when streaming events we recommend setting the Kafka Producer property acks=all, to make sure an event was delivered (and replicated) successfully.
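In client configuration this is a single extra property; sketched below in kafka-python's naming (an assumption — the Java client property is spelled acks):

```python
# Wait for all in-sync replicas (replication factor 3) to acknowledge
# the record before the broker confirms delivery.
producer_config = {
    "acks": "all",
    # ...plus the connection properties shown in the Producer section
}
```

With acks set to "all", a send only succeeds once every replica has the event, trading a little latency for durability.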

Max message size

The Kafka Inbound Cluster is configured with the Kafka default message.max.bytes = 1048588 (roughly 1 MB). Keep that in mind when optimizing your Producer.
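A quick pre-flight check on the producer side can catch oversized events before the broker rejects them. A minimal sketch (the helper name is ours):

```python
import json

MESSAGE_MAX_BYTES = 1048588  # broker-side limit on the inbound cluster

def fits_in_message(event: dict) -> bool:
    """True if the serialized event stays within the broker limit.

    Message headers add a small overhead, so leave some margin in practice.
    """
    return len(json.dumps(event).encode("utf-8")) <= MESSAGE_MAX_BYTES
```

A small event passes; an event whose JSON serialization exceeds the limit does not.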

Error handling

As of now, the only way to verify the result of streaming an event is to run a query via the HTTP API: a count query with a very specific filter, or an extraction. Incorrect events (e.g. invalid JSON, or an event larger than 900 KB) are skipped during processing. We are planning to add a DLQ (dead-letter queue) feature in the near future, so you will be notified immediately if an error occurs.
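A sketch of such a verification request, built but not sent. The URL follows the shape of the standard Keen query endpoint; the collection and filter property names are hypothetical placeholders for your own schema:

```python
import json

PROJECT_ID = "PROJECT_ID"  # placeholder for your project ID
url = f"https://api.keen.io/3.0/projects/{PROJECT_ID}/queries/count"

params = {
    "api_key": "READ_KEY",               # placeholder read key
    "event_collection": "purchases",     # hypothetical collection
    "timeframe": "this_1_hours",
    # A very specific filter that should match exactly the streamed event.
    "filters": json.dumps([
        {
            "property_name": "order_id",  # hypothetical property
            "operator": "eq",
            "property_value": "o-123",
        }
    ]),
}

# requests.get(url, params=params).json()["result"] would return the count;
# a result of 1 confirms the event was ingested.
```

The request itself is left commented out so the sketch runs without network access or real credentials.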

Limits

  • You are not allowed to consume from the Kafka Inbound Cluster. You are only allowed to produce events to the cluster.
  • Kafka Transactions are not supported.
  • Admin operations, such as creating or deleting topics, are not available.