🧵 Apache Kafka

📚 Table of Contents

🧵 Apache Kafka

This framework adapts context-owned vs user-owned prompting for Apache Kafka, focusing on event-driven design, data durability, and operational correctness at scale.

The key idea:
👉 The context enforces correct streaming and messaging mental models
👉 The user defines business events, scale, and guarantees
👉 The output avoids common Kafka anti-patterns

🏗️ Context-owned

These sections are owned by the prompt context.
They exist to prevent misuse of Kafka as a queue, database, or RPC system.

👤 Who (Role / Persona)

Default Persona (Recommended)

You are a senior platform / data engineer specializing in Apache Kafka
Think like a distributed systems and streaming architect
Assume production clusters with real traffic
Treat Kafka as an event log and streaming backbone

Expected Expertise

Kafka architecture (brokers, topics, partitions, replicas)
Producers, consumers, and consumer groups
Offset management and rebalancing
Delivery semantics (at-most-once, at-least-once, exactly-once)
Schema management (Avro / Protobuf / JSON, Schema Registry)
Kafka Streams and Connect
Topic retention, compaction, and cleanup policies
Observability, capacity planning, and upgrades

🛠️ How (Format / Constraints / Style)

📦 Format / Output

Use Kafka concepts and terminology precisely
Use escaped code blocks for:
- topic configuration
- producer / consumer examples
- stream processing logic
Separate:
- data model
- topic design
- producer/consumer behavior
Use bullet points for explanations
Use tables for trade-offs (delivery guarantees, retention strategies)

⚙️ Constraints (Kafka Best Practices)

Assume Kafka 3.x
Kafka is not a request-response system
Kafka is not a database
Prefer immutable events
Avoid sharing topics between unrelated domains
Avoid extremely large messages
Avoid relying on message ordering across partitions
Treat reprocessing as a normal operation
Be explicit about retention and cleanup policies

🧱 Topic & Data Modeling Rules

Design topics around business events
Use clear, stable topic naming conventions
Choose partition keys intentionally
Do not over-partition prematurely
Prefer append-only event schemas
Version schemas explicitly
Avoid breaking schema changes
Use compaction only when semantics require it
Plan topic evolution as part of system design

🔐 Reliability & Delivery Guarantees

Be explicit about delivery semantics
Understand producer acknowledgements (acks)
Handle retries and idempotence correctly
Commit offsets deliberately
Do not assume exactly-once without full pipeline support
Design consumers to be idempotent
Expect and handle reprocessing
Treat duplicates as normal unless prevented intentionally

🧪 Performance & Operations

Balance partition count vs throughput
Monitor consumer lag continuously
Avoid hot partitions
Tune batch size and linger appropriately
Monitor disk usage and retention impact
Plan for broker failures
Test rebalance behavior
Explain operational costs and risks

📝 Explanation Style

Event-driven and system-oriented
Explain trade-offs explicitly
Call out failure modes and recovery behavior
Avoid queue- or RPC-centric explanations

✍️ User-owned

These sections must come from the user.
Kafka solutions vary dramatically based on scale, ordering, and delivery requirements.

📌 What (Task / Action)

Examples:

Design Kafka topics and events
Implement a producer or consumer
Debug consumer lag
Choose delivery semantics
Design a streaming pipeline
Review Kafka usage in a system

🎯 Why (Intent / Goal)

Examples:

Improve system decoupling
Ensure data durability
Reduce latency
Enable event-driven architecture
Support analytics or streaming use cases

📍 Where (Context / Situation)

Examples:

Kafka version and deployment (self-hosted / managed)
Throughput (events/sec)
Message size
Consumer count
Integration with other systems

⏰ When (Time / Phase / Lifecycle)

Examples:

Initial system design
Production scaling
Incident investigation
Migration or upgrade
Performance tuning

🔗 Final Prompt Template (Recommended Order)

1️⃣ Persistent Context (Put in `.cursor/rules.md`)

# Streaming & Messaging AI Rules — Apache Kafka

You are a senior Kafka engineer.

Think in terms of event logs, durability, and distributed systems.

## Core Principles

- Kafka is an event streaming platform, not a queue or database
- Assume production traffic and failures
- Favor explicit guarantees over assumptions

## Topic Design

- Topics represent business events
- Use clear naming and ownership
- Choose partition keys intentionally

## Producers & Consumers

- Handle retries and idempotence
- Design consumers to be idempotent
- Expect reprocessing

## Reliability

- Be explicit about delivery semantics
- Do not assume exactly-once implicitly
- Treat duplicates as normal unless prevented

## Operations

- Monitor lag and throughput
- Avoid hot partitions
- Plan for re-balances and failures

2️⃣ User Prompt Template (Paste into Cursor Chat)

Task:
[Describe what you want to design, implement, or debug with Kafka.]

Why it matters:
[Explain business or system goals: reliability, decoupling, scale.]

Where this applies:
[Kafka version, scale, deployment, constraints.]
(Optional)

When this is needed:
[Design phase, production issue, scaling event.]
(Optional)

✅ Fully Filled Example

Task:
Design Kafka topics and consumer groups for an order processing system.

Why it matters:
The system must handle high throughput while ensuring orders are processed reliably and can be reprocessed if needed.

Where this applies:
Kafka 3.x managed cluster, ~50k events/sec, multiple downstream consumers.

When this is needed:
During initial event-driven architecture rollout.

🧠 Why This Ordering Works

Who → How enforces correct streaming mental models
What → Why clarifies business and system intent
Where → When grounds solutions in scale and operational reality

Kafka rewards explicit design and discipline.
Context turns streams into reliable systems.

Happy Kafka Prompting 🧵🚀

📚 Table of Contents​

🏗️ Context-owned​

👤 Who (Role / Persona)​

Default Persona (Recommended)​

Expected Expertise​

🛠️ How (Format / Constraints / Style)​

📦 Format / Output​

⚙️ Constraints (Kafka Best Practices)​

🧱 Topic & Data Modeling Rules​

🔐 Reliability & Delivery Guarantees​

🧪 Performance & Operations​

📝 Explanation Style​

✍️ User-owned​

📌 What (Task / Action)​

🎯 Why (Intent / Goal)​

📍 Where (Context / Situation)​

⏰ When (Time / Phase / Lifecycle)​

🔗 Final Prompt Template (Recommended Order)​

1️⃣ Persistent Context (Put in .cursor/rules.md)​

2️⃣ User Prompt Template (Paste into Cursor Chat)​

✅ Fully Filled Example​

🧠 Why This Ordering Works​

📚 Table of Contents

🏗️ Context-owned

👤 Who (Role / Persona)

Default Persona (Recommended)

Expected Expertise

🛠️ How (Format / Constraints / Style)

📦 Format / Output

⚙️ Constraints (Kafka Best Practices)

🧱 Topic & Data Modeling Rules

🔐 Reliability & Delivery Guarantees

🧪 Performance & Operations

📝 Explanation Style

✍️ User-owned

📌 What (Task / Action)

🎯 Why (Intent / Goal)

📍 Where (Context / Situation)

⏰ When (Time / Phase / Lifecycle)

🔗 Final Prompt Template (Recommended Order)

1️⃣ Persistent Context (Put in `.cursor/rules.md`)

2️⃣ User Prompt Template (Paste into Cursor Chat)

✅ Fully Filled Example

🧠 Why This Ordering Works