Managed Apache Kafka vs. DIY: What’s the difference and how to choose?

Apache Kafka® is an open-source distributed event streaming platform used by 80% of Fortune 100 companies as well as thousands of small-to-midsize businesses (SMBs) for implementing high-performance data pipelines, streaming analytics, data integration, and mission-critical applications. Event streaming architecture is commonly used in robotic manufacturing, self-driving vehicle navigation, IoT device monitoring, real-time shipping logistics, and many other custom analytics applications for businesses and enterprises.

When implementing a Kafka-based event streaming architecture, IT teams face two major challenges: 1) building and maintaining the Apache Kafka infrastructure; and 2) adding custom code to implement event messaging across existing products, services, and operations. The required scale of the event messaging system depends on web/mobile traffic or device requirements. There are two common approaches to the infrastructure challenge: deploy, scale, monitor, and maintain the architecture in-house, or pay for a managed Kafka solution that handles this work. Deciding between a do-it-yourself (DIY) and a managed cloud approach depends on many factors, including resources, budget, timeline, and special project requirements. This article discusses how to decide whether a managed cloud event streaming platform or a DIY build is the best option for your team.

In this guide, we will:

  • Review the market landscape for prominent Managed Apache Kafka Solutions
  • Discuss DIY event streaming solutions including Apache Kafka as well as suites of tools from Cloud Service Providers
  • Compare the benefits vs. drawbacks of using a Managed Kafka solution vs. building event streaming infrastructure in-house
  • Learn which Apache Kafka event streaming solution is recommended for the different requirements of SMB and enterprise organizations supporting web, mobile, and IoT applications.

Managed Apache Kafka Cloud Solutions: Marketplace

Distributed cloud architecture spans more than one data center, much as elastic server clusters extend the traditional vertically scaled hardware stack. In both cases, hardware resources are virtualized through virtual machines (VMs) and containers, with software-defined networking (SDN) and APIs connecting them so that data from hundreds or thousands of microservices can be combined into custom web/mobile application runtimes.

Within the different layers of abstraction in data center operations, many major IT companies and startups now offer software and hardware products, as well as contracting, development, and integration services. Event streaming solutions built on Apache Kafka form their own development ecosystem, with a wide variety of features increasingly offered as managed cloud products built for DevOps teams, including production monitoring and network analytics.

Managed Cloud Solutions – Apache Kafka Event Streaming Architecture:

  • Keen is an all-in-one event streaming platform that provides pre-configured, managed big data infrastructure accessible via an API. The platform also includes functionality beyond streaming, including data enrichment, persistent storage, real-time analytics, and embedded visualization.
  • Confluent is an enterprise Apache Kafka solutions platform that offers both self-managed and fully-managed cloud plans based on dedicated multi-clusters with ksqlDB support.
  • Aiven offers managed open-source Apache Kafka that runs on VMs in the public cloud of your choice, alongside a choice of managed databases and analytics services.
  • Cloudera is a data management platform with flexibility and scale to manage event streaming with additional functionality for data warehousing and AI/ML processing.
  • CloudKarafka offers managed Kafka clusters on GCP/AWS that can be configured through a browser with tools for VPC networking across 37 international data centers.
  • Instaclustr provides a fully managed service for Apache Kafka® with customization and optimization of each user’s cluster; it is SOC 2 certified and hosted on AWS, Azure, GCP, or on-prem.

The main advantage of managed Apache Kafka platforms is that the providers employ experienced data center engineers who apply enterprise best practices. Staffing and provisioning infrastructure internally for 24/7/365 on-prem management of event streaming resources can be especially expensive for SMBs. Managed products help SMBs jump-start Apache Kafka event streaming with a quicker time to implementation and at an affordable price, without absorbing the risk of building and managing it themselves.

Pricing for managed cloud plans varies, with some providers offering a monthly/annual subscription model and others using a “pay-as-you-go” billing model. Metered billing is typically based on the amount of compute, bandwidth, and storage consumed by the event streaming workload. Keen offers monthly subscriptions starting as low as $149 for a set amount of events captured and queried. When comparing pricing across options, keep in mind the full scope of services, including customer support, premium features, and features offered by some solutions but not others. For example, Keen and Confluent offer solutions for storage and analytics, while these would require additional configuration with tools like Aiven, CloudKarafka, and Instaclustr.
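To make the metered model concrete, here is a minimal sketch that adds up one month of usage-based charges and compares the total with a flat subscription price. All unit rates and usage figures are hypothetical placeholders, not any provider’s published pricing, and the names `MeteredRates`, `metered_monthly_cost`, and `cheaper_plan` are illustrative. A flat plan may cover different resources (for example, a set number of events), so compare like for like.

```python
from dataclasses import dataclass


# Hypothetical unit rates for illustration only; real providers publish their own.
@dataclass(frozen=True)
class MeteredRates:
    compute_per_hour: float      # USD per broker-hour
    transfer_per_gb: float       # USD per GB of network transfer
    storage_per_gb_month: float  # USD per GB-month of retained data


def metered_monthly_cost(rates: MeteredRates, broker_hours: float,
                         transfer_gb: float, storage_gb: float) -> float:
    """Estimate one month's pay-as-you-go bill from usage totals."""
    return (rates.compute_per_hour * broker_hours
            + rates.transfer_per_gb * transfer_gb
            + rates.storage_per_gb_month * storage_gb)


def cheaper_plan(metered: float, subscription: float) -> str:
    """Return which billing model costs less for the same month (ties favor subscription)."""
    return "metered" if metered < subscription else "subscription"


rates = MeteredRates(compute_per_hour=0.10, transfer_per_gb=0.05,
                     storage_per_gb_month=0.10)
# Hypothetical usage: 3 brokers running all month (~730 h each),
# 500 GB transferred, and 200 GB retained.
bill = metered_monthly_cost(rates, broker_hours=3 * 730,
                            transfer_gb=500, storage_gb=200)
```

Rerunning the estimate with your own measured usage and each vendor’s actual rate card is what turns this into a meaningful comparison.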

Apache Kafka and Cloud Platforms Event Streaming Solutions: Public Cloud

In the enterprise solutions marketplace, there is little fundamental difference in the operational functionality of software between private and public cloud platforms. AWS Outposts, Google Anthos, and Microsoft Azure Stack allow businesses to run the same stack software on-premises as in the cloud to take advantage of the most recent industry innovation in data center management. These products make hybrid and multi-cloud architecture operate seamlessly with automated elasticity to support the requirements of high-performance computer networks.

DIY approaches to event streaming architecture can be implemented in a wide range of ways on both public and private cloud hardware. When building with Apache Kafka, teams can leverage the open-source development community and the Apache Software Foundation, where programmers from thousands of companies work together toward vendor-agnostic interoperability. Proprietary DIY streaming solutions are also available from the major cloud service providers. Although this article focuses on comparing Kafka-based event streaming solutions, the public cloud solutions below should also be considered for a DIY implementation.

Public Cloud Solutions – Event Streaming Architecture:

  • Google Cloud Platform Dataflow allows developers to build stream analytics with real-time AI/ML processing on TensorFlow TPUs using the Apache Beam SDK.
  • Microsoft Azure Event Hubs supports millions of events per second with Kafka APIs that direct data to blob or data lake storage to build streaming analytics on Azure hardware.
  • Amazon Kinesis includes Video Streams, Data Streams, Firehose, and Data Analytics services for event data, with real-time processing in SQL or Apache Flink for AI/ML interpretation and visualization.
  • IBM Event Streams is a premier Apache Kafka platform for enterprise clients that includes IBM Cloud Pak for Integration support, streaming analytics, and MQ for IoT requirements.
  • Oracle Cloud Infrastructure Streaming works with OCI, GoldenGate, & Integration Cloud to support database operations for HPC in industry, manufacturing, publishing, etc.
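Because Event Hubs exposes Kafka APIs, an existing Kafka client can often be pointed at it through configuration alone. The sketch below builds the client settings Microsoft documents for the Event Hubs Kafka endpoint (port 9093, `SASL_SSL` with the `PLAIN` mechanism, and the literal username `$ConnectionString`). The helper name, namespace, and connection string are hypothetical placeholders, and nothing here contacts Azure.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build librdkafka-style settings for Azure Event Hubs' Kafka endpoint.

    `namespace` and `connection_string` are placeholders supplied from the
    Azure portal; this function only assembles a dict.
    """
    return {
        # Event Hubs serves its Kafka-compatible endpoint on port 9093.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The literal username "$ConnectionString" signals that the
        # password field carries an Event Hubs connection string.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


# Hypothetical namespace and truncated placeholder connection string.
config = event_hubs_kafka_config(
    "my-namespace",
    "Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=PLACEHOLDER",
)
```

The keys use librdkafka property names, so the dict could be handed to a client such as `confluent_kafka.Producer(config)`; check the current Azure documentation for tier-specific limits before relying on it.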

In addition to streaming, cloud service providers offer processing and analytics services that integrate with their streaming technology, similar to the open-source Apache ecosystem. Together, these services let enterprises build robust event streaming and analytics infrastructure for applications ranging from event-driven applications and real-time personalization to machine learning (ML)/artificial intelligence (AI), online transaction processing (OLTP), and online analytical processing (OLAP). Comparing these solutions is outside the scope of this article, but we do cover why we chose Apache Kafka over Amazon Kinesis to build the Keen platform.

Example of proprietary solutions for event streaming, processing, and analytics from Amazon Web Services, Microsoft Azure, and Google Cloud Platform (Source: Capgemini, Real-time analytics in the cloud)

SMBs with advanced IT departments experience varying degrees of difficulty managing the event streaming service layer that supports their apps and services. Many enterprises have already committed to event messaging systems as the central nervous system of their data-driven operations. Digital-native and cloud-native standards help organizations adopt data-driven methodologies across verticals and eliminate data silos. Each IT team should determine which solution best fits its own requirements.

Many businesses choose DIY approaches to save on software development, application runtime support, batch processing, and expensive managed cloud subscription plans. Other organizations choose DIY methods to implement custom configurations that managed services do not support. DIY methods also let IT departments build event streaming architecture on their preferred hybrid and multi-cloud solutions.

DIY Solutions: Apache Storm architecture in Java 2.0 built by Oracle engineers. (Louwers, 2020)

Managed Apache Kafka vs. DIY: How to Choose?

Any organization considering Apache Kafka as an event streaming solution should start with a complete audit of its IT resources. SMBs and enterprises have different requirements for supporting websites, mobile apps, and IoT. Many complex organizations use Kafka event streaming as their corporate central nervous system, integrating all products and services with real-time analytics.

It is important to consider how software applications, devices, and users will generate the API data that becomes event records. Each IT department will need a strategy for deciding which platform events to record as data points, and will need to build additional processing pipelines so the data can be used in interactive constructs like digital experience platforms (DXPs) or real-time logistical search. The total cost of implementation includes programming the custom software that produces the event streams.
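As an illustration of what a recorded event might look like, here is a minimal sketch of a hypothetical `EventRecord` with an event type, the entity it concerns, free-form properties, a unique ID, and a UTC timestamp. The field names are assumptions, not a Keen or Kafka schema. Using the entity ID as the message key is one common choice: with Kafka’s default partitioner and a fixed partition count, records with the same key land in the same partition, so their relative order is preserved.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class EventRecord:
    """Hypothetical event record; field names are illustrative, not a standard."""
    event_type: str                      # e.g. "checkout_completed"
    entity_id: str                       # user, device, or order the event is about
    properties: dict = field(default_factory=dict)
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def partition_key(self) -> bytes:
        # Keying by entity keeps one entity's events in a single partition
        # (given the default partitioner and an unchanged partition count).
        return self.entity_id.encode("utf-8")

    def to_json(self) -> bytes:
        # Serialize the whole record as the message value.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")


evt = EventRecord("checkout_completed", "user-42",
                  {"cart_total": 59.90, "items": 3})
key, value = evt.partition_key(), evt.to_json()
```

The `key` and `value` bytes are what a Kafka producer would send; which events and properties to record is the strategic decision described above.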

Once stream events are defined in code, IT managers can calculate the expected message processing requirements for the web/mobile applications or IoT products being supported. Estimate the total volume of events per second, minute, hour, and so on that is expected on the network. From this estimate, teams can begin to determine the hardware required for real-time processing of event streams from all applications and the persistent storage capacity needed to retain the data over time.
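The estimate described above reduces to a few multiplications. This sketch assumes a peak and an average event rate, an average serialized event size, a retention period, and a replication factor of 3 (a common Kafka setting, assumed here). All input values are hypothetical, and the result ignores compression, index overhead, and headroom.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class CapacityEstimate:
    events_per_day: int
    ingress_mb_per_sec: float
    storage_gb: float


def estimate_capacity(peak_events_per_sec: float, avg_events_per_sec: float,
                      avg_event_bytes: int, retention_days: int,
                      replication_factor: int = 3) -> CapacityEstimate:
    """Back-of-the-envelope sizing from expected event volume.

    Throughput is sized for the peak rate; storage uses the average rate,
    multiplied by the replication factor because each replica keeps a full
    copy. Compression, indexes, and headroom are ignored.
    """
    events_per_day = int(avg_events_per_sec * SECONDS_PER_DAY)
    ingress_mb_per_sec = peak_events_per_sec * avg_event_bytes / 1_000_000
    storage_bytes = (avg_events_per_sec * SECONDS_PER_DAY * retention_days
                     * avg_event_bytes * replication_factor)
    return CapacityEstimate(events_per_day, ingress_mb_per_sec,
                            storage_bytes / 1_000_000_000)


est = estimate_capacity(peak_events_per_sec=5_000, avg_events_per_sec=1_000,
                        avg_event_bytes=1_000, retention_days=7)
```

With these inputs, 7 days of 1 KB events at 1,000 events per second is about 605 GB of raw data, or roughly 1.8 TB once tripled by replication.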

Managed Apache Kafka vs DIY: Pros & Cons

After conducting an IT audit, a business should have a clearer understanding of the budget, time, and resources needed to implement usable event streaming infrastructure and analytics. Keeping this information in mind, the next step is to evaluate the trade-offs between the event streaming implementation options.

Managed cloud solutions are ideal for businesses with smaller IT teams that don’t have the time or resources to deploy and maintain custom event streaming infrastructure. Instead, they can pay for a monthly subscription and have experts deploy, scale, log, and monitor their infrastructure. Subscription plans at Keen start at $149 per month. Managed event streaming platforms also offer APIs and SDKs for streaming and analytics that let development teams get up and running in weeks. DIY approaches, on the other hand, can take months or even years to implement but allow more flexibility in configuration.

Trade-offs – Managed Cloud Event Streaming Architecture:

Benefits:

  • Managed cloud services help SMBs adopt Apache Kafka event streaming architecture across the organization more quickly, affordably, and efficiently, without the risk of building in-house.
  • Invest in programming custom event messaging software with Kafka APIs rather than in staffing and supporting 24/7 data center operations teams.
  • Adopt industry best practices and enterprise security on managed event streaming platforms like Keen with persistent storage and real-time analytics for your data.

Drawbacks:

  • For teams with sufficient IT resources, building custom data infrastructure often makes more financial sense than paying a recurring cost. Here’s a guide on evaluating purchase price vs. the total cost of ownership.
  • It’s worth considering building custom infrastructure to support use cases with exceptionally high event volumes and performance requirements.
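The total-cost-of-ownership comparison in the first drawback can be framed as a break-even calculation: DIY trades a large up-front investment for a potentially lower running cost. This minimal sketch finds the month at which cumulative costs are equal. The figures and the function name are hypothetical, and `diy_monthly` should include ongoing staff time for deployment, scaling, and monitoring, which is often the largest DIY cost.

```python
import math


def break_even_month(managed_monthly: float, diy_upfront: float,
                     diy_monthly: float) -> float:
    """Months until cumulative DIY cost equals cumulative managed cost.

    Managed after m months: managed_monthly * m
    DIY after m months:     diy_upfront + diy_monthly * m
    Returns math.inf when DIY's running cost is not lower, so it never catches up.
    """
    monthly_saving = managed_monthly - diy_monthly
    if monthly_saving <= 0:
        return math.inf
    return diy_upfront / monthly_saving


# Hypothetical figures: a $2,000/month managed plan versus a DIY build with
# $60,000 of up-front engineering and $500/month of running cost.
months = break_even_month(managed_monthly=2_000, diy_upfront=60_000,
                          diy_monthly=500)
```

With these inputs the DIY build pays for itself after 40 months; if the project’s planning horizon is shorter, the managed plan is cheaper.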

Many development teams prefer a DIY approach to Apache Kafka event streaming architecture because they need custom operational configurations that managed service platforms cannot provide. DIY solutions for Apache Kafka event streaming can be built on public or private cloud hardware. Elastic multi-cluster VMs allow hardware to scale with variable traffic demands, but that scaling must be configured in advance.
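Partition count is one of the settings that has to be chosen in advance: Kafka lets you add partitions to a topic later but not remove them, and adding them changes which partition a given key maps to. A widely cited rule of thumb is to provision at least max(t/p, t/c) partitions, where t is the target throughput and p and c are the producer and consumer throughput you measure for a single partition on your own hardware. This sketch applies that rule; the throughput figures and growth factor are hypothetical.

```python
import math


def partitions_needed(target_mb_s: float, producer_mb_s_per_partition: float,
                      consumer_mb_s_per_partition: float,
                      growth_factor: float = 1.0) -> int:
    """Rule-of-thumb partition count: max(t/p, t/c), rounded up.

    target_mb_s: throughput the topic must sustain (MB/s).
    producer/consumer_mb_s_per_partition: single-partition rates measured
    on your own hardware. growth_factor pads for expected growth, since
    repartitioning later remaps keys.
    """
    t = target_mb_s * growth_factor
    return math.ceil(max(t / producer_mb_s_per_partition,
                         t / consumer_mb_s_per_partition))


n = partitions_needed(target_mb_s=5.0, producer_mb_s_per_partition=10.0,
                      consumer_mb_s_per_partition=2.0, growth_factor=2.0)
```

Here the consumer side is the bottleneck: 10 MB/s of padded demand at 2 MB/s per consumer partition calls for 5 partitions.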

Trade-offs – DIY Apache Kafka Event Streaming Architecture:

Benefits:

  • DIY solutions for Apache Kafka event streaming architecture can be managed on public, private, hybrid, or multi-cloud using VMware, OpenStack, and Kubernetes platforms.
  • Business organizations can adopt tools like Apache Storm, Spark, Flink, and Beam to build custom AI/ML processing for web/mobile app or IoT integration requirements.
  • Build support for high-performance web/mobile applications, IoT networks, enterprise logistics, and industrial manufacturing facilities with embedded real-time data analytics.
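To show the kind of computation these frameworks run, here is a minimal plain-Python sketch that counts events per type in fixed, non-overlapping (tumbling) time windows. It is not the API of Storm, Spark, Flink, or Beam, and the function name is illustrative; production frameworks such as Flink, Spark, or Beam add watermarks for late or out-of-order events, fault-tolerant state, and parallel execution across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[float, str]],
                           window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, event type) in fixed windows.

    `events` is a stream of (epoch_seconds, event_type) pairs. Each event is
    assigned to the window whose start is its timestamp rounded down to a
    multiple of window_seconds.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, event_type in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return dict(counts)


stream = [(0.5, "click"), (10.0, "click"), (59.9, "view"),
          (60.0, "click"), (61.2, "click")]
result = tumbling_window_counts(stream, window_seconds=60)
```

Each key pairs a window start time with an event type, which is the shape of data a real-time dashboard would plot.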

Drawbacks:

  • DIY solutions carry a higher risk of growing costs and delayed timelines for smaller IT teams, due to the complexity of implementing a usable solution and to scope creep.
  • DIY solutions require dedicated resources for deploying, scaling, logging, and monitoring, which may not be feasible for smaller IT departments.

Every IT department develops custom services based on its organization’s business requirements. DIY Apache Kafka event streaming architectures power some of the largest data center deployments in the world, driving the real-time analytics capabilities of Fortune 500 multinational companies and brands at scale. Managed cloud solutions make the same stack available to SMBs and startups at a fraction of the cost to adopt and maintain. DIY approaches take longer to implement (months to years) and can carry greater security risk.

Get Started for Free! Sign up for a 30-day free trial and get unlimited access to Keen’s event streaming and analytics platform, no credit card required.

Apache Kafka: Connecting Enterprise & SMB Organizations

Managed cloud solutions for Apache Kafka event streaming architecture allow SMBs to level the playing field and scale into real-time analytics following industry best practices. Programming teams can build custom data analytics displays for sales, marketing, logistics, or platform metrics. Keen offers SDKs for 15+ programming languages, along with NoSQL storage built on Cassandra.

Keen’s managed cloud infrastructure is priced lower than competing products based on dedicated multi-cluster hardware at public cloud hosts. SMBs can create an event streaming service layer that runs in parallel with their websites and mobile applications, providing real-time platform analytics or product/content recommendations for DXPs.

Integrating Apache Kafka through its APIs requires software customized to each business. Managed cloud solutions allow SMBs to focus their investment on programming teams and to access pre-configured Kafka resources at affordable rates. DIY plans are best suited to power users with specific SDN/VPC configurations to support at runtime. Keen’s HTTP Stream API allows programmers to bring custom software to market more quickly.
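As a rough illustration of sending events over HTTP rather than through a native Kafka client, this sketch prepares a JSON POST with Python’s standard library. The endpoint URL, its path layout, and passing a write key in the `Authorization` header are placeholders and assumptions, not Keen’s documented API; take the real values from the provider’s API reference. The helper names are hypothetical.

```python
import json
import urllib.request

# Hypothetical placeholders; the real endpoint and auth scheme come from
# the provider's API reference.
EVENTS_URL = "https://api.example.com/projects/PROJECT_ID/events/purchases"
WRITE_KEY = "YOUR_WRITE_KEY"


def build_event_request(url: str, write_key: str,
                        event: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST carrying one event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": write_key},  # assumed auth scheme
    )


def send_event(req: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


req = build_event_request(EVENTS_URL, WRITE_KEY,
                          {"item": "t-shirt", "price": 19.99})
# send_event(req) would perform the network call against a real endpoint.
```

Keeping request construction separate from sending makes the payload easy to inspect and test before any network call is made.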

Learn more about Keen and Apache Kafka event streaming architecture: read the complete guide, Event Streaming and Analytics: Everything a Dev Needs to Know.