Apache Kafka Interview Questions & Answers (2026)
The go-to guide for Kafka interview questions — from core architecture to real-world streaming scenarios asked at Netflix, LinkedIn, Uber and top tech companies.
📋 What's Covered
Kafka is now the standard for real-time data pipelines. If you're interviewing for a Senior Data Engineer, Platform Engineer, or Backend Engineer role — Kafka questions are almost guaranteed. Here's everything you need to know.
─────────────────────────────────────────────
Producers ──► [ Topic: orders ]
├── Partition 0 ──► Consumer Group A (Consumer 1)
├── Partition 1 ──► Consumer Group A (Consumer 2)
└── Partition 2 ──► Consumer Group B (Consumer 1)
─────────────────────────────────────────────
ZooKeeper / KRaft ──► Broker Coordination
Brokers: 3 (each stores partition replicas)
1. Core Architecture & Concepts
Partition — A physical subdivision of a topic stored on a broker. Each partition is an ordered, immutable log of records. Partitions enable:
• Parallelism — Multiple consumers can read different partitions simultaneously.
• Scalability — Partitions are distributed across brokers.
• Ordering guarantee — Order is guaranteed within a partition, NOT across partitions.
Key rule: A topic with N partitions can have at most N consumers in one consumer group actively reading at the same time.
ZooKeeper (legacy) — Managed broker metadata, leader election, and cluster coordination. Required in Kafka versions before 2.8.
KRaft (Kafka 3.3+ GA) — Kafka's own built-in consensus protocol replacing ZooKeeper. Eliminates the operational complexity of maintaining a separate ZooKeeper cluster. KRaft is now the default and recommended mode.
Key behaviours:
• If you have 4 partitions and 2 consumers → each consumer reads 2 partitions.
• If you have 4 partitions and 5 consumers → 1 consumer is idle (no partition for it).
• Multiple consumer groups can read the same topic independently (fan-out).
Rebalancing — When a consumer joins or leaves the group, Kafka triggers a rebalance to reassign partitions. During a rebalance, consumption pauses briefly.
2. Producer & Consumer Deep Dive
The acks setting controls when the producer considers a message "successfully sent":

| acks value | Meaning | Risk |
|---|---|---|
| 0 | Fire and forget — no acknowledgement | Data loss possible |
| 1 | Leader broker acknowledges | Loss if leader fails before replication |
| all (or -1) | All in-sync replicas (ISR) acknowledge | Slowest, but no data loss when paired with a proper min.insync.replicas |

Recommended production setting: acks=all with min.insync.replicas=2.
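As a concrete illustration, here is a reliability-first producer configuration sketched as a plain dict, using librdkafka-style key names as accepted by the confluent-kafka Python client (an assumption — verify the exact names against your client's documentation; the broker address is a placeholder):

```python
# Hypothetical reliability-focused producer settings (librdkafka-style keys).
reliable_producer_conf = {
    "bootstrap.servers": "broker1:9092",  # placeholder address
    "acks": "all",                 # wait for all in-sync replicas
    "enable.idempotence": True,    # broker dedupes retried sends
    "retries": 2147483647,         # keep retrying transient failures
}
# Note: min.insync.replicas=2 is a TOPIC (or broker-default) setting,
# not a producer setting — it must be configured server-side.

print(reliable_producer_conf["acks"])  # → all
```

You would pass this dict to `confluent_kafka.Producer(...)`; the key point is that durability comes from the combination of producer-side acks and topic-side min.insync.replicas, not either alone.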
With a key: Kafka hashes the key with murmur2 and applies hash(key) % numPartitions. The same key always goes to the same partition — guaranteeing order for related messages (e.g., all events for user ID 123).
Without a key: Kafka uses a sticky partitioner (default since Kafka 2.4) — it batches messages to the same partition until the batch is full, then rotates. Earlier versions used round-robin.
Custom partitioner: You can implement your own to route messages based on business logic (e.g., send high-priority orders to partition 0).
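A rough Python sketch of the keyed path — the murmur2 hash masked to a positive value, mod the partition count, mirroring the Java client's algorithm. Treat this as illustrative, not a drop-in replacement for the real partitioner:

```python
# Sketch of Kafka's default keyed partitioning:
# partition = (murmur2(key_bytes) & 0x7FFFFFFF) % num_partitions

def murmur2(data: bytes) -> int:
    seed, m, r = 0x9747B28C, 0x5BD1E995, 24
    h = seed ^ len(data)
    i = 0
    while len(data) - i >= 4:
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & 0xFFFFFFFF
        k ^= k >> r
        k = (k * m) & 0xFFFFFFFF
        h = ((h * m) & 0xFFFFFFFF) ^ k
        i += 4
    rem = len(data) - i          # mix in the trailing 1-3 bytes
    if rem == 3:
        h ^= data[i + 2] << 16
    if rem >= 2:
        h ^= data[i + 1] << 8
    if rem >= 1:
        h ^= data[i]
        h = (h * m) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * m) & 0xFFFFFFFF
    h ^= h >> 15
    return h

def partition_for(key: str, num_partitions: int) -> int:
    return (murmur2(key.encode()) & 0x7FFFFFFF) % num_partitions

# Same key always lands on the same partition:
p = partition_for("user-123", 3)
assert p == partition_for("user-123", 3) and 0 <= p < 3
```

The determinism is the whole point: every event for "user-123" lands on one partition, so ordering holds for that user.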
Kafka supports three delivery semantics:

| Delivery Semantic | Risk | How to Achieve in Kafka |
|---|---|---|
| At-most-once | Messages can be lost | Auto-commit offsets before processing |
| At-least-once | Duplicates possible | Commit after processing (most common) |
| Exactly-once (EOS) | No loss, no duplicates | Idempotent producer + transactional API |
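The at-least-once row can be made concrete with a toy in-memory model (no real broker — the list stands in for a partition log): offsets are committed only after processing, so a crash between processing and committing replays a record instead of losing it.

```python
# Toy model of at-least-once delivery: commit AFTER processing, so a
# crash before the commit reprocesses (duplicates) rather than loses.

log = ["order-1", "order-2", "order-3"]   # the partition's record log
committed = 0                              # last committed offset
processed = []

def consume(crash_before_commit_at=None):
    global committed
    for offset in range(committed, len(log)):
        processed.append(log[offset])           # 1. process
        if offset == crash_before_commit_at:
            return                              # crash before committing
        committed = offset + 1                  # 2. then commit

consume(crash_before_commit_at=1)  # processes offsets 0,1; commits only 0
consume()                          # restart resumes at offset 1
print(processed)  # ['order-1', 'order-2', 'order-2', 'order-3']
```

Note the duplicate 'order-2': that is exactly why at-least-once consumers need idempotent processing downstream. Committing before processing flips the failure mode to at-most-once (loss instead of duplicates).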
3. Reliability, Replication & Offsets
Message retention is controlled by retention.ms (default 7 days).
Who manages offsets?
• Kafka itself stores committed offsets in an internal topic called __consumer_offsets (since Kafka 0.9).
• Consumers commit their offset after processing to track progress.
• If a consumer restarts, it resumes from its last committed offset.
auto.offset.reset — Controls what happens when there's no committed offset: earliest (read from the beginning) or latest (read only new messages).
ISR (In-Sync Replicas) — The set of replicas that are caught up with the leader within replica.lag.time.max.ms. If a follower falls too far behind, it's removed from the ISR.
Typical production config: replication.factor=3, min.insync.replicas=2, acks=all
This means: 3 copies of the data, and at least 2 replicas must acknowledge each write — tolerates 1 broker failure with zero data loss.
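The "tolerates 1 broker failure" arithmetic generalizes: with acks=all, writes keep succeeding as long as at most replication.factor − min.insync.replicas replica-holding brokers are down. A one-line sanity check:

```python
# With acks=all, the number of replica-holding brokers that can fail
# while writes still succeed is: replication.factor − min.insync.replicas.

def tolerated_failures(replication_factor: int, min_insync: int) -> int:
    return replication_factor - min_insync

print(tolerated_failures(3, 2))  # → 1 broker can fail with no data loss
```

This is also why replication.factor=3 with min.insync.replicas=3 is a poor choice: a single broker outage would halt all acks=all writes.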
4. Performance & Tuning
Batching: Tune batch.size (default 16KB → try 64KB–256KB) and linger.ms (add a small delay so batches fill up).
Compression: Set compression.type=snappy or lz4 — dramatically reduces network and disk I/O.
Increase partitions: More partitions = more parallelism = more producers writing simultaneously.
Async sends: Use the async producer with a callback instead of blocking on each send.
buffer.memory: Increase from the default 32MB to 64–128MB to reduce producer back-pressure.
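The tuning knobs above can be collected into a single producer.properties fragment (Java-client property names; the specific values are illustrative starting points, not universal recommendations — benchmark against your own workload):

```properties
# Illustrative throughput-oriented producer.properties (Java client)
batch.size=131072          # 128KB batches (default ~16KB)
linger.ms=10               # wait up to 10ms to fill a batch
compression.type=lz4       # cheap CPU, big network/disk savings
buffer.memory=67108864     # 64MB send buffer (default 32MB)
```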
Consumer lag — the consumer group falls behind the latest offsets. Causes:
• Consumer processing is too slow (heavy computation, slow DB writes).
• Too few consumer instances for the number of partitions.
• Frequent rebalances causing pause time.
Fixes:
• Scale out — add more consumers (up to the number of partitions).
• Optimize consumer processing — batch DB writes, async processing.
• Increase max.poll.records to process more records per poll.
• Monitor lag with Kafka's kafka-consumer-groups.sh --describe or a tool like Burrow.
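Lag itself is simple arithmetic — per partition, it's the log-end offset minus the group's committed offset. A toy computation over made-up offset numbers (the real values come from the tools above):

```python
# Consumer lag = log-end offset − committed offset, summed per partition.

def total_lag(end_offsets: dict, committed: dict) -> int:
    """Sum of (log-end − committed) across all partitions."""
    return sum(end_offsets[p] - committed.get(p, 0) for p in end_offsets)

end  = {0: 1500, 1: 1480, 2: 1510}   # latest offsets per partition
done = {0: 1500, 1: 1200, 2: 1505}   # the group's committed offsets
print(total_lag(end, done))          # → 285  (0 + 280 + 5)
```

A steadily growing total is the red flag; a large but stable lag may just mean the group started from earliest.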
5. Scenario-Based Questions
Designing an order-processing pipeline with Kafka:
1. Order Service (Producer) — Publishes order events to the orders topic with order_id as the key (ensures all events for the same order go to the same partition).
2. Kafka Topics: orders-created → orders-validated → orders-fulfilled
3. Consumer Microservices:
• Validation Service — reads from orders-created, validates stock/payment, publishes to orders-validated.
• Fulfillment Service — reads from orders-validated, triggers shipping.
• Notification Service — reads both topics, sends emails/SMS.
4. Reliability: acks=all, idempotent producers, dead-letter topic for failed orders.
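The dead-letter piece of step 4 can be sketched as a try/except around processing — failed records are routed aside instead of blocking the partition. This is a toy with an in-memory list standing in for a hypothetical "orders-dlq" topic:

```python
# Toy dead-letter routing: failures go to a DLQ instead of halting consumption.

dlq = []  # stands in for producing to a hypothetical "orders-dlq" topic

def process_order(order: dict) -> None:
    if order.get("amount", 0) <= 0:
        raise ValueError("invalid amount")
    # ... validate stock/payment here ...

def handle(order: dict) -> None:
    try:
        process_order(order)
    except Exception as exc:
        # In a real pipeline you would produce the record (plus error
        # metadata) to the dead-letter topic here.
        dlq.append({"order": order, "error": str(exc)})

handle({"id": 1, "amount": 42})   # processed normally
handle({"id": 2, "amount": -5})   # lands in the DLQ
print(len(dlq))  # → 1
```

The key design choice: a poison message must never stall the partition, because everything behind it shares that partition's ordering.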
Handling duplicate messages:
1. Idempotent Processing — Design your processing logic to be safe to run twice (e.g., upsert to DB using the message ID as the primary key).
2. Deduplication Store — Track processed message IDs in Redis with a short TTL. If seen before, skip processing.
3. Exactly-Once Semantics (EOS) — Use Kafka's transactional API with enable.idempotence=true for true end-to-end exactly-once guarantees within the Kafka ecosystem.
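Strategy 2 (the deduplication store) is small enough to sketch: a TTL-bounded map of seen message IDs, with a plain dict standing in for Redis SETEX. The class and method names here are made up for illustration:

```python
# Minimal dedup store: message IDs expire after a TTL, so memory stays
# bounded. A dict stands in for Redis; names are hypothetical.
import time

class DedupStore:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.seen = {}  # message_id -> expiry timestamp

    def first_time(self, message_id: str) -> bool:
        """True if unseen (process it); False if a duplicate within TTL."""
        now = time.monotonic()
        expiry = self.seen.get(message_id)
        if expiry is not None and expiry > now:
            return False               # duplicate: skip processing
        self.seen[message_id] = now + self.ttl
        return True

store = DedupStore(ttl_seconds=60)
print(store.first_time("msg-1"))  # → True  (process it)
print(store.first_time("msg-1"))  # → False (duplicate, skip)
```

The TTL matters: it only needs to exceed the window in which Kafka could plausibly redeliver (rebalance plus retry time), not the full retention period.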
🎯 Master More Data Engineering Interview Topics
Practice 100-question interactive quizzes on SQL, Spark, PySpark, Hadoop and more — completely free. Then get the 300Q PDF bundle for offline deep prep.
Visit the Blog → Get the PDF Bundle