Kafka Inbound Cluster
If your event pipeline is based on Kafka, you may find it easier to stream events to Keen via our Kafka Inbound Cluster than via the HTTP Stream API.
- In order to stream events to Keen via the Keen Kafka Cluster, you need to initialize a Kafka Producer.
- The topic name you choose will be used as the event collection name.
- The Kafka message sent by the Kafka Producer must be a valid JSON document. All the recommendations for event structure described in the data modeling guide apply.
- Keen backend servers read events from all topics in the Keen Kafka Cluster.
- For each topic, an event collection is created.
- For each event, the “keen” object is created, add-ons are evaluated, and autofill from the Access Key is applied.
- Finally, the event is persisted in the datastore.
- The whole process takes less than 10 seconds. After that time, the event is available for querying.
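As a sketch of what the Producer actually sends, the snippet below builds an event for a hypothetical "purchases" collection. The property names are illustrative, not required by Keen; the only hard requirement from the list above is that the message value is a valid JSON document:

```python
import json

# A hypothetical event for a "purchases" topic/collection.
# No "keen" object is needed here: Keen creates it server-side
# while processing the event.
event = {
    "item": "golden widget",   # illustrative properties only
    "price": 49.99,
    "user": {"id": "12345"},
}

# The Kafka message value must be a valid JSON document,
# so serialize the event before producing it.
message_value = json.dumps(event)

# Sanity check: the payload round-trips as JSON.
assert json.loads(message_value) == event
```

This serialized string is what you would pass as the message value to whichever Kafka Producer client you use.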
To initialize the Kafka Producer you need to provide:
|Property|Value|
|---|---|
|bootstrap.servers|b1.kafka-in.keen.io:9092,b2.kafka-in.keen.io:9092,b3.kafka-in.keen.io:9092|
|security.protocol|SASL_SSL|
|sasl.mechanism|PLAIN|
|username|the PROJECT_ID|
|password|the MASTER_KEY, WRITE_KEY, or an ACCESS_KEY with write permissions|
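These connection settings can also be kept together in a standard Kafka client properties file (a sketch; substitute your real PROJECT_ID and key):

```properties
# producer.properties — connection settings for the Keen Kafka Inbound Cluster
bootstrap.servers=b1.kafka-in.keen.io:9092,b2.kafka-in.keen.io:9092,b3.kafka-in.keen.io:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="PROJECT_ID" password="WRITE_KEY";
```

With the built-in console producer, a file like this can be passed via `--producer.config` instead of repeating each `--producer-property` flag.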
Example using the Kafka built-in console producer:

```shell
$ kafka-console-producer.sh \
    --bootstrap-server b1.kafka-in.keen.io:9092,b2.kafka-in.keen.io:9092,b3.kafka-in.keen.io:9092 \
    --topic "Target-collection-name" \
    --producer-property security.protocol=SASL_SSL \
    --producer-property sasl.mechanism=PLAIN \
    --producer-property sasl.jaas.config='org.apache.kafka.common.security.plain.PlainLoginModule required username="PROJECT_ID" password="WRITE_KEY";'
```
The topic name you choose will be used as the event collection name, which means that both the Kafka topic name limitations and the Keen collection name limitations apply.
The following characters are allowed:
a-z, A-Z, 0-9, . (dot), _ (underscore), and - (dash). The total topic name length must be less than or equal to 64 characters.
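These constraints are easy to check before creating a Producer. A minimal sketch in Python; the regex simply encodes the character set and length limit above:

```python
import re

# Allowed: a-z, A-Z, 0-9, dot, underscore, dash; length 1-64.
TOPIC_NAME_RE = re.compile(r"^[a-zA-Z0-9._-]{1,64}$")

def is_valid_topic_name(name: str) -> bool:
    """Return True if `name` satisfies the topic name rules above."""
    return TOPIC_NAME_RE.fullmatch(name) is not None

assert is_valid_topic_name("Target-collection-name")
assert not is_valid_topic_name("spaces are not allowed")
assert not is_valid_topic_name("x" * 65)  # too long
```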
Kafka tech details
While setting up the Kafka Inbound Cluster we made several design decisions that you should be aware of in order to set up Kafka streaming correctly.
The Kafka Inbound Cluster has the replication factor set to 3.
This is done in order to ensure no event is lost in case of a failure of the Kafka Broker.
As a consequence, when streaming events we recommend setting the Kafka Producer property acks=all
to make sure an event was delivered (and replicated) successfully.
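In a properties file this is a single line. `acks=all` is the standard Kafka producer setting that waits for the leader and all in-sync replicas to acknowledge a write before a send is considered successful:

```properties
# Do not consider a send successful until all in-sync replicas have the record.
acks=all
```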
Max message size
The Kafka Inbound Cluster is configured with the default message.max.bytes = 1048588. Keep that in mind when optimizing your Producer.
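Since oversized messages are rejected, it can be worth guarding the payload size before producing. A minimal sketch, using the 1048588-byte cluster default quoted above:

```python
import json

MAX_MESSAGE_BYTES = 1048588  # message.max.bytes on the Kafka Inbound Cluster

def fits_in_message(event: dict) -> bool:
    """Return True if the JSON-serialized event fits within the broker limit.

    Note: this only measures the serialized value; record headers and
    protocol overhead also count toward the broker-side limit.
    """
    payload = json.dumps(event).encode("utf-8")
    return len(payload) <= MAX_MESSAGE_BYTES

assert fits_in_message({"item": "golden widget", "price": 49.99})
```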
As of now, the only way to verify the result of streaming an event is to run a query via the HTTP API:
a count query with a very specific filter, or an extraction.
Incorrect events (e.g. invalid JSON, an event larger than 900 KB, …) are skipped during processing.
We are planning to add a DLQ (dead-letter queue) feature in the near future, so you are notified immediately if an error occurs.
- You are not allowed to consume from the Kafka Inbound Cluster. You are only allowed to produce events to the cluster.
- Kafka Transactions are not supported.
- Admin operations, such as creating or deleting topics, are not available.