Kafka Management
A KRaft-based Kafka cluster consists of broker nodes responsible for message delivery and controller nodes that manage cluster metadata and coordinate the cluster. These roles can be configured using node pools in deployment manifests.
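As an illustration, the sketch below shows how broker and controller roles could be declared as node pools, assuming a Strimzi-style operator (`kafka.strimzi.io/v1beta2` resources). The cluster name, pool names, and storage sizes are placeholders; Condense's actual manifests may differ.

```yaml
# Hypothetical Strimzi-style node pools; names and sizes are placeholders.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: controllers
  labels:
    strimzi.io/cluster: condense-kafka   # Kafka cluster this pool belongs to
spec:
  replicas: 3                            # one controller per availability zone
  roles:
    - controller                         # KRaft metadata and coordination role
  storage:
    type: persistent-claim
    size: 20Gi
---
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaNodePool
metadata:
  name: brokers
  labels:
    strimzi.io/cluster: condense-kafka
spec:
  replicas: 3
  roles:
    - broker                             # message delivery role
  storage:
    type: persistent-claim
    size: 100Gi
```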
Other Kafka components interact with the Kafka cluster for specific tasks.
Kafka Connect is an integration toolkit for streaming data between Kafka brokers and external systems, such as databases, using connector plugins. It provides a framework for importing data into and exporting data out of Kafka; connectors provide the connection configuration needed.
A source connector pushes external data into Kafka.
A sink connector extracts data out of Kafka.
External data is translated and transformed into the appropriate format.
Kafka Connect can be configured to build custom container images with the required connectors.
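For illustration, a Kafka Connect deployment that builds a custom image with a connector plugin might look like the following, again assuming Strimzi-style resources. The plugin name, artifact URL, registry image, and secret names are hypothetical placeholders.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: condense-connect
  annotations:
    strimzi.io/use-connector-resources: "true"  # manage connectors as KafkaConnector resources
spec:
  replicas: 1
  bootstrapServers: condense-kafka-kafka-bootstrap:9093   # placeholder bootstrap address
  tls:
    trustedCertificates:
      - secretName: condense-kafka-cluster-ca-cert        # cluster CA created by the Cluster Operator
        certificate: ca.crt
  build:
    output:
      type: docker
      image: registry.example.com/condense/connect:latest # placeholder target image
      pushSecret: registry-credentials                    # placeholder registry secret
    plugins:
      - name: example-source-plugin                       # placeholder connector plugin
        artifacts:
          - type: tgz
            url: https://example.com/example-source-connector.tgz
```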
Kafka MirrorMaker replicates data between two Kafka clusters, either in the same data center or across different locations.
Kafka Exporter extracts data for analysis as Prometheus metrics, primarily data relating to offsets, consumer groups, consumer lag, and topics. Consumer lag is the delay between the last message written to a partition and the message currently being picked up from that partition by a consumer.
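Kafka Exporter is typically enabled alongside the cluster definition. A minimal sketch, assuming a Strimzi-style `Kafka` resource, could look like this (the regex values are placeholders):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: condense-kafka
spec:
  # ... broker, controller, and listener configuration omitted ...
  kafkaExporter:
    topicRegex: ".*"   # export metrics for all topics
    groupRegex: ".*"   # export metrics, including lag, for all consumer groups
```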
Kafka in Condense supports Transport Layer Security (TLS), a protocol for encrypted communication.
IMPORTANT! Communication is always encrypted between Kafka components.
Kafka listeners use authentication to ensure a secure client connection to the Kafka cluster. Clients can also be configured for mutual authentication. Security credentials are created and managed by the Cluster and User Operator. The following mechanisms are supported (a listener sketch follows the list):
mTLS authentication (on listeners with TLS-enabled encryption)
SASL SCRAM-SHA-512
OAuth 2.0 token-based authentication
Custom authentication (supported by Kafka)
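As a sketch, listener-level encryption and authentication could be declared like this in a Strimzi-style `Kafka` resource (excerpt); listener names and ports are illustrative:

```yaml
spec:
  kafka:
    listeners:
      - name: mtls
        port: 9093
        type: internal
        tls: true                 # TLS encryption on the listener
        authentication:
          type: tls               # mTLS client authentication
      - name: scram
        port: 9094
        type: internal
        tls: true
        authentication:
          type: scram-sha-512     # SASL SCRAM-SHA-512
      # authentication types "oauth" and "custom" follow the same pattern
```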
Authorization controls the operations that specific clients or users are permitted to perform on Kafka brokers. The following mechanisms are supported (a sketch follows the list):
Simple authorization using ACL rules
OAuth 2.0 authorization (if you are using OAuth 2.0 token-based authentication)
Open Policy Agent (OPA) authorization
Custom authorization (supported by Kafka)
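As an illustration of simple ACL-based authorization, the sketch below assumes Strimzi-style resources; the super user, user, topic, and consumer group names are placeholders, and the user's credentials would be created by the User Operator.

```yaml
# Excerpt from the Kafka resource: enable the simple (ACL-based) authorizer.
spec:
  kafka:
    authorization:
      type: simple
      superUsers:
        - CN=cluster-admin          # placeholder super user
---
# A user whose credentials and ACLs are managed by the User Operator.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaUser
metadata:
  name: telemetry-consumer
  labels:
    strimzi.io/cluster: condense-kafka
spec:
  authentication:
    type: scram-sha-512
  authorization:
    type: simple
    acls:
      - resource:
          type: topic
          name: vehicle-telemetry    # placeholder topic
        operation: Read
      - resource:
          type: group
          name: telemetry-consumers  # placeholder consumer group
        operation: Read
```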
If the cloud provider's native Kubernetes service supports it, Kafka in Condense can run on FIPS-enabled Kubernetes clusters to ensure data security and system interoperability.
This architecture demonstrates how Condense achieves a highly available, scalable, and resilient Kafka deployment within a Virtual Private Cloud (VPC) on Azure Kubernetes Service (AKS).
A dedicated node pool is distributed across three availability zones (Zone A, B, C) in Region A to enhance fault tolerance.
Each node contains:
Kafka Operator – Manages Kafka lifecycle and configurations.
Kafka Broker – Handles message storage and distribution, backed by Persistent Volume Claims (PVCs) for data durability.
Controller – Manages cluster metadata and leadership election, also using PVCs for persistence.
Multi-Zone Redundancy
Kafka components are distributed across three zones, preventing single points of failure and ensuring continuous availability.
Node Affinity & Pod Anti-Affinity
Node Affinity ensures Kafka components are deployed onto the dedicated node pool, whose nodes span the availability zones.
Pod Anti-Affinity prevents multiple critical Kafka instances from running on the same node, improving redundancy (a sketch of both rules follows).
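A sketch of how these scheduling rules could be expressed in the pod template of a Strimzi-style `Kafka` resource; the node pool label (`agentpool: kafka`) and the pod label values are placeholders for the actual AKS and cluster labels.

```yaml
spec:
  kafka:
    template:
      pod:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: agentpool                  # placeholder AKS node pool label
                      operator: In
                      values:
                        - kafka                       # dedicated Kafka node pool
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    strimzi.io/cluster: condense-kafka        # placeholder cluster name
                    strimzi.io/name: condense-kafka-kafka     # placeholder broker pod label
                topologyKey: kubernetes.io/hostname   # no two brokers on the same node
```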
Internal Access (VNet Peering): Allows internal clients to communicate with Kafka through an internal load balancer, maintaining security and low latency. This makes it easier to connect existing services to Kafka internally.
External Access (Internet Load Balancer): Distributes traffic efficiently to external clients, ensuring reliable connectivity.
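As an illustration, internal and external access could be exposed as load-balancer listeners in a Strimzi-style `Kafka` resource (excerpt). The Azure annotation requests an internal load balancer on AKS, reachable over VNet peering; ports and names are placeholders.

```yaml
spec:
  kafka:
    listeners:
      - name: internal
        port: 9094
        type: loadbalancer
        tls: true
        configuration:
          bootstrap:
            annotations:
              # internal Azure load balancer, reachable via VNet peering
              service.beta.kubernetes.io/azure-load-balancer-internal: "true"
          # the same annotation would also be applied to per-broker services
          # via configuration.brokers
      - name: external
        port: 9095
        type: loadbalancer        # public, internet-facing load balancer
        tls: true
```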
Dynamic Scaling: Kubernetes automatically scales Kafka brokers based on workload demands.
Self-Healing Mechanisms: Kubernetes reschedules failed pods and maintains cluster stability without manual intervention.
Building on the high availability architecture defined earlier, this approach extends Kafka’s resilience by implementing a multi-region deployment for disaster recovery using Kafka MirrorMaker 2. This ensures seamless data replication and failover across geographically distributed clusters.
Primary Region (Region A):
A local producer writes data to the source Kafka cluster.
Cross-Region Replication:
Kafka MirrorMaker 2 replicates data to a secondary Kafka cluster in Region B.
Backup Region (Region B):
The target Kafka cluster serves as a hot backup, ensuring data availability in case of failure in Region A.
Use Case: Provides a disaster recovery mechanism with a standby cluster that can take over operations if the primary region fails.
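A minimal sketch of this active-passive replication with MirrorMaker 2, assuming a Strimzi-style `KafkaMirrorMaker2` resource deployed alongside the Region B cluster; the cluster aliases, bootstrap addresses, and patterns are placeholders, and per-cluster TLS and authentication are omitted for brevity.

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: region-a-to-b
spec:
  replicas: 1
  connectCluster: "region-b"      # MirrorMaker 2 runs against the target cluster
  clusters:
    - alias: "region-a"
      bootstrapServers: kafka-region-a-bootstrap.example.com:9093  # placeholder source
    - alias: "region-b"
      bootstrapServers: condense-kafka-kafka-bootstrap:9093        # placeholder target
  mirrors:
    - sourceCluster: "region-a"
      targetCluster: "region-b"
      topicsPattern: ".*"         # replicate all topics
      groupsPattern: ".*"         # replicate consumer group offsets via checkpoints
      sourceConnector:
        config:
          replication.factor: 3
      checkpointConnector:
        config:
          checkpoints.topic.replication.factor: 3
```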
Multi-Region Active Clusters:
Clusters in Region A and Region B actively handle data ingestion, processing, and replication.
Real-Time Cross-Replication:
Kafka MirrorMaker 2 ensures continuous synchronization between both regions.
If one cluster fails, the other automatically continues operations, preventing downtime.
Use Case:
Enables an active-active multi-region setup, ensuring real-time failover and load balancing.
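For the active-active pattern, the `mirrors` section of the previous sketch would simply declare replication in both directions (cluster aliases remain placeholders):

```yaml
  mirrors:
    - sourceCluster: "region-a"
      targetCluster: "region-b"
      topicsPattern: ".*"
      groupsPattern: ".*"
    - sourceCluster: "region-b"
      targetCluster: "region-a"
      topicsPattern: ".*"
      groupsPattern: ".*"
```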
Monitoring data allows you to monitor the performance and health of Kafka in Condense. You can configure your deployment to capture metrics data for analysis and notifications.
Metrics data is useful when investigating issues with connectivity and data delivery. For example, metrics data can identify under-replicated partitions or the rate at which messages are consumed. Alerting rules can provide time-critical notifications on such metrics through a specified communications channel. Monitoring visualizations present real-time metrics data to help determine when and how to update the configuration of your deployment.
The following tools, which are shipped with the Condense deployment, are used for metrics and monitoring:
Prometheus
Prometheus pulls metrics from Kafka, Controllers, and Kafka Connect clusters. The Prometheus Alertmanager plugin handles alerts and routes them to a notification service.
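Prometheus scraping is typically enabled by exposing JMX metrics through the Prometheus JMX Exporter. A sketch for a Strimzi-style `Kafka` resource (excerpt), where the referenced ConfigMap name is a placeholder holding the exporter rules:

```yaml
spec:
  kafka:
    metricsConfig:
      type: jmxPrometheusExporter
      valueFrom:
        configMapKeyRef:
          name: kafka-metrics-config      # placeholder ConfigMap with JMX exporter rules
          key: kafka-metrics-config.yml
```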
Kafka Exporter
Kafka Exporter adds additional Prometheus metrics.
Grafana
Grafana Labs provides dashboard visualizations of Prometheus metrics.
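As an example of an alerting rule on Kafka Exporter metrics, a Prometheus rule flagging sustained consumer lag might look like the following; the threshold, duration, and severity label are placeholders to be routed by Alertmanager to a notification channel.

```yaml
groups:
  - name: kafka-consumer-lag
    rules:
      - alert: KafkaConsumerLagHigh
        # kafka_consumergroup_lag is exposed by Kafka Exporter per group/topic/partition
        expr: sum(kafka_consumergroup_lag) by (consumergroup, topic) > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Consumer group {{ $labels.consumergroup }} is lagging on topic {{ $labels.topic }}"
```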