Khushal Agrawal

Distributed Systems / active

Kafka Clone

A Kafka-like C++ broker with topic creation, produce/fetch APIs, list-offsets support, partition log directories, byte-offset reads, and broker API tests.

Architecture Overview

The Kafka clone is a C++ broker built around append-only topic-partition logs and protocol-style APIs for creating topics, producing messages, fetching records, and listing offsets.
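As a rough illustration of an append-only topic-partition log on disk, the sketch below appends a length-prefixed record to a per-partition file and returns the byte position where the record starts. Everything named here is an assumption for illustration, not the project's actual code: the `partition_dir` and `append_record` helpers, the `segment.log` file name, and the 4-byte big-endian length prefix. The `<topic>-<partition>` directory naming is borrowed from Apache Kafka's convention; the clone's real layout may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical layout: <root>/<topic>-<partition>/segment.log
fs::path partition_dir(const fs::path& root, const std::string& topic, int partition) {
    assert(partition >= 0);
    return root / (topic + "-" + std::to_string(partition));
}

// Appends one length-prefixed record and returns the byte position at which
// the record starts. Assumes a single writer per partition, since the base
// offset is taken from the file size before the write.
std::uint64_t append_record(const fs::path& dir, const std::string& payload) {
    fs::create_directories(dir);
    const fs::path log = dir / "segment.log";
    const std::uint64_t base = fs::exists(log) ? fs::file_size(log) : 0;

    std::ofstream out(log, std::ios::binary | std::ios::app);
    if (!out) throw std::runtime_error("cannot open " + log.string());

    // Assumed framing: 4-byte big-endian payload length, then the payload.
    const auto len = static_cast<std::uint32_t>(payload.size());
    const char header[4] = {
        static_cast<char>((len >> 24) & 0xFF),
        static_cast<char>((len >> 16) & 0xFF),
        static_cast<char>((len >> 8) & 0xFF),
        static_cast<char>(len & 0xFF),
    };
    out.write(header, 4);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));

    // flush() hands the bytes to the OS; it is not an fsync, so this sketch
    // makes no durability guarantee across a crash.
    out.flush();
    if (!out) throw std::runtime_error("write failed for " + log.string());
    return base;
}
```

Under this framing, appending "hello" to an empty partition returns 0, and the next record starts at byte 9 (4 bytes of length plus 5 bytes of payload).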

The project focuses on the core mechanics behind streaming infrastructure: partition metadata, byte offsets, broker request handling, persistence, and correctness tests for producer/consumer flows.

Technical Challenges

  • topic and partition metadata management
  • append-only log layout on disk
  • byte-offset-based produce and fetch behavior
  • earliest and latest offset discovery
  • duplicate topic handling
  • broker API tests for end-to-end message flow correctness
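
Topic metadata and duplicate-topic handling from the list above could be sketched as a small in-memory registry. This is a minimal sketch under assumptions: `TopicRegistry`, `CreateTopicResult`, and `partition_count` are hypothetical names, and the real broker may store metadata differently or persist it to disk.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical result codes; the names echo Kafka's protocol errors but the
// enum itself is illustrative only.
enum class CreateTopicResult { Ok, TopicAlreadyExists, InvalidPartitions };

class TopicRegistry {
public:
    // Registers a topic once. A second request for the same name is rejected
    // and leaves the original partition count untouched.
    CreateTopicResult create_topic(const std::string& name, std::int32_t partitions) {
        if (partitions <= 0) return CreateTopicResult::InvalidPartitions;
        if (topics_.find(name) != topics_.end()) {
            return CreateTopicResult::TopicAlreadyExists;
        }
        [[maybe_unused]] const bool inserted = topics_.emplace(name, partitions).second;
        assert(inserted);  // guaranteed by the find() check above
        return CreateTopicResult::Ok;
    }

    // Returns the partition count, or -1 if the topic is unknown.
    std::int32_t partition_count(const std::string& name) const {
        const auto it = topics_.find(name);
        return it == topics_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::string, std::int32_t> topics_;
};
```

Returning an explicit result instead of silently overwriting keeps a repeated CreateTopics request from changing the partition count of a topic that already holds data.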

Current API Surface

The broker currently supports CreateTopics, Produce, Fetch, and ListOffsets. Produce returns a base offset expressed as a byte position in the partition log, ListOffsets reports the earliest and latest offsets, and Fetch reads either from the beginning of the log or from a specific offset.
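
The offset semantics described here can be modeled in memory to make the contract concrete. The sketch below is an assumption-laden illustration, not the broker's code: `PartitionLog`, `FetchResult`, and `next_offset` are hypothetical, records use an assumed 4-byte big-endian length prefix, and the earliest offset is fixed at 0 because no retention or truncation is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct FetchResult {
    std::vector<std::string> records;
    std::uint64_t next_offset;  // byte position a consumer would fetch from next
};

class PartitionLog {
public:
    // Produce: append one framed record and return the byte position where it starts.
    std::uint64_t produce(const std::string& payload) {
        const std::uint64_t base = bytes_.size();
        const auto len = static_cast<std::uint32_t>(payload.size());
        for (int shift = 24; shift >= 0; shift -= 8) {
            bytes_.push_back(static_cast<char>((len >> shift) & 0xFF));
        }
        bytes_ += payload;
        return base;
    }

    // ListOffsets: earliest is 0 in this sketch; latest is the end of the log.
    std::uint64_t earliest_offset() const { return 0; }
    std::uint64_t latest_offset() const { return bytes_.size(); }

    // Fetch: read whole records from a byte offset to the end of the log.
    // The offset is assumed to be a record boundary (a value returned by
    // produce() or a previous next_offset). A misaligned offset is only
    // caught when the framing turns out to be inconsistent.
    std::optional<FetchResult> fetch(std::uint64_t offset) const {
        if (offset > bytes_.size()) return std::nullopt;
        FetchResult result{{}, offset};
        std::uint64_t pos = offset;
        while (pos < bytes_.size()) {
            if (bytes_.size() - pos < 4) return std::nullopt;  // truncated header
            std::uint32_t len = 0;
            for (int i = 0; i < 4; ++i) {
                len = (len << 8) | static_cast<unsigned char>(bytes_[pos + i]);
            }
            if (bytes_.size() - pos - 4 < len) return std::nullopt;  // truncated payload
            result.records.push_back(bytes_.substr(pos + 4, len));
            pos += 4 + len;
        }
        assert(pos == bytes_.size());
        result.next_offset = pos;
        return result;
    }

private:
    std::string bytes_;
};
```

With byte-position offsets, a consumer cannot advance by adding 1 to the last offset it saw; it needs the framed size of what it read, which is why this sketch hands back `next_offset` alongside the records.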

Lessons Learned

Kafka-style systems are less about one API and more about the contract between durability, offsets, batching, and consumer progress. Even a compact clone quickly becomes an exercise in making the log format and broker semantics explicit.

Tools

C++, broker APIs, append-only logs, filesystem persistence, tests