Apache Kafka is an open-source event streaming platform that was created at LinkedIn and open-sourced in 2011. It initially served as a messaging queue.
What Is Apache Kafka
Apache Kafka is used as a highly available messaging queue. It receives messages from some services in the environment and provides them to others.
Kafka is commonly deployed as a cluster of three or more brokers (nodes) so that replicas (backups) of the data can be kept on other brokers.
Kafka receives messages from producers and provides them to consumers. Each message is saved to a named topic.
A message can be text, a number, or an object, depending on the implementation. A topic is a category name for messages.
Producers write messages to topics and consumers read messages from topics. Kafka retains all messages for a configured period, and consumers are responsible for tracking their own position in these messages.
Each Kafka topic is divided into a number of partitions, each of which contains messages in an immutable, ordered sequence.
A partition is a section of a topic that is separate from the other partitions and lets users divide data into logical sections. Each message in a partition has a specific offset, which is its position in that sequence.
Source: Apache Kafka
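The topic, partition, and offset model described above can be sketched with a small in-memory stand-in. This is an illustration only, not how Kafka is implemented: the `Topic` and `Consumer` classes are hypothetical names, and a CRC32 hash is assumed here in place of Kafka's real partitioner.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy in-memory stand-in for a Kafka topic (illustration only)."""

    def __init__(self, name, num_partitions):
        self.name = name
        # Each partition is an append-only list; the list index is the offset.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Messages with the same key land in the same partition.
        # CRC32 is an assumption standing in for Kafka's own partitioner.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[index]
        log.append(value)
        return index, len(log) - 1  # (partition, offset)

    def read(self, partition, offset):
        # Return every message at or after the given offset.
        return self.partitions[partition][offset:]


class Consumer:
    """Tracks its own position per partition, as Kafka consumers do."""

    def __init__(self, topic):
        self.topic = topic
        self.positions = defaultdict(int)  # partition -> next offset to read

    def poll(self, partition):
        records = self.topic.read(partition, self.positions[partition])
        self.positions[partition] += len(records)
        return records


topic = Topic("orders", num_partitions=3)
p1, o1 = topic.append("customer-42", "order created")
p2, o2 = topic.append("customer-42", "order paid")

consumer = Consumer(topic)
first = consumer.poll(p1)   # both messages, in the order they were written
second = consumer.poll(p1)  # nothing new since the last poll
```

Because the topic never deletes a message when it is read, a second consumer with its own `positions` could read the same partition from offset 0 independently.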
Kafka uses ZooKeeper as a centralized service for maintaining configuration information,
naming, providing distributed synchronization, and providing group services. When new brokers are added to the cluster, they register with ZooKeeper and become available to host partitions of newly created topics; existing partitions stay on their current brokers until they are reassigned.
Why You Might Want to Implement Apache Kafka
Kafka helps you move large amounts of data reliably and is a very flexible tool for communication between services. It scales easily by adding brokers and partitions, and within a consumer group each partition is read by only one consumer, so a message is normally processed once per group.
Advantages of Kafka:
- High Concurrency
- Real-time Handling
- Low Latency
- By Default Persistent
Problems Apache Kafka Helps to Solve
Microservices architecture without Kafka
Microservices architecture with Kafka
How to Implement Apache Kafka
You need a deployed Apache Kafka cluster, including a ZooKeeper ensemble to manage the Kafka nodes. Client libraries are available for many programming languages, which makes connecting to Kafka easy.
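As a minimal sketch of what connecting through a client library can look like, the example below uses the third-party `kafka-python` package. Everything specific in it is an assumption made for illustration: the broker address `localhost:9092`, the JSON encoding of messages, and the helper names `encode_event`, `decode_event`, `produce`, and `consume`.

```python
import json


def encode_event(event):
    """Serialize a dict to bytes; Kafka stores message values as bytes."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def decode_event(raw):
    """Inverse of encode_event."""
    return json.loads(raw.decode("utf-8"))


def produce(topic, events, bootstrap_servers="localhost:9092"):
    """Write each event to the topic (broker address is an assumption)."""
    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=encode_event,
    )
    try:
        for event in events:
            producer.send(topic, value=event)
        producer.flush()  # block until buffered messages are sent
    finally:
        producer.close()


def consume(topic, group_id, bootstrap_servers="localhost:9092"):
    """Yield (partition, offset, value) until no message arrives for 5 s."""
    from kafka import KafkaConsumer  # third-party: pip install kafka-python

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=bootstrap_servers,
        group_id=group_id,
        auto_offset_reset="earliest",  # start from the oldest retained message
        value_deserializer=decode_event,
        consumer_timeout_ms=5000,
    )
    try:
        for record in consumer:
            yield record.partition, record.offset, record.value
    finally:
        consumer.close()
```

With a broker running, `produce("orders", [{"id": 1}])` followed by `list(consume("orders", "demo-group"))` would write one message and read it back. The `kafka` import sits inside the functions so that the serialization helpers can be used and tested without a broker or the client library installed.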
Common Pitfalls of Apache Kafka
- Keeping too much data
- Old data in topics not being deleted
- Not balancing topics
- Not accounting for long-term storage
- No disaster recovery
- No API enforcement
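The first two pitfalls usually come down to topic retention settings. The sketch below shows one way to set an explicit time-based retention when creating a topic with the third-party `kafka-python` admin client. The helper names, the broker address, and the partition and replica counts are assumptions for illustration; `cleanup.policy` and `retention.ms` are standard Kafka topic-level settings.

```python
def retention_config(days):
    """Build topic settings that delete messages older than `days` days."""
    return {
        "cleanup.policy": "delete",
        # retention.ms is expressed in milliseconds, passed as a string.
        "retention.ms": str(days * 24 * 60 * 60 * 1000),
    }


def create_topic_with_retention(name, days, bootstrap_servers="localhost:9092"):
    """Create a topic with explicit retention (all values are assumptions)."""
    # third-party: pip install kafka-python
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        topic = NewTopic(
            name=name,
            num_partitions=3,
            replication_factor=3,  # needs at least three brokers
            topic_configs=retention_config(days),
        )
        admin.create_topics(new_topics=[topic])
    finally:
        admin.close()
```

Setting retention explicitly per topic, rather than relying on the broker-wide default, makes it easier to see how long each topic's data is kept.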
Resources for Apache Kafka
- cloudkarafka.com: Apache Kafka for beginners - What is Apache Kafka?
- kafka.apache.org: Apache Kafka Quickstart
- Confluent.io: Introduction to Kafka
- Confluent.io: Kafka Clients
- thenewstack.io: Apache Kafka: A Primer
- softwaremill.com: Message queue benchmark
- 5 Pitfalls to Kafka Architecture Implementation
- data-flair.training: Advantages and Disadvantages of Kafka
- NewRelic.com: 20 Best Practices for Working With Apache Kafka at Scale