Apache Kafka is a distributed streaming platform for building real-time data pipelines. This guide covers essential patterns for producing, consuming, and processing streaming data.
## Core Concepts
```
┌─────────────────────────────────────────────────┐
│                  KAFKA CLUSTER                  │
├─────────────────────────────────────────────────┤
│                                                 │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐     │
│   │ Broker  │    │ Broker  │    │ Broker  │     │
│   │    1    │    │    2    │    │    3    │     │
│   └─────────┘    └─────────┘    └─────────┘     │
│                                                 │
│   Topic: orders                                 │
│   ┌────────────┬────────────┬────────────┐      │
│   │ Partition 0│ Partition 1│ Partition 2│      │
│   │ [0,1,2,3]  │ [0,1,2]    │ [0,1,2,3,4]│      │
│   └────────────┴────────────┴────────────┘      │
└─────────────────────────────────────────────────┘

Producers ──────────► Topics ──────────► Consumers
```
## Producer Implementation
### Basic Producer
### Batched Producer
## Consumer Implementation
### Basic Consumer
### Batch Consumer
## Stream Processing
### Kafka Streams with TypeScript
### Event Aggregation
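A sketch of stateful aggregation: an in-memory tumbling-window counter keyed by event key. This is illustrative only; production code would persist the state (for example in a compacted changelog topic) so it survives restarts.

```typescript
// Count events per key within fixed, non-overlapping time windows.
export class WindowedCounter {
  private windows = new Map<string, number>()

  constructor(private windowMs: number) {}

  // Align a timestamp to the start of its tumbling window.
  private windowStart(ts: number): number {
    return ts - (ts % this.windowMs)
  }

  // Record one event; returns the running count for (key, window).
  add(key: string, timestampMs: number): number {
    const id = `${key}@${this.windowStart(timestampMs)}`
    const next = (this.windows.get(id) ?? 0) + 1
    this.windows.set(id, next)
    return next
  }
}
```

Wired into a consumer, `add` would be called from the message handler with the record's key and timestamp, and window results emitted to an output topic.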
## Error Handling
### Dead Letter Queue
### Consumer Error Recovery
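For transient failures (broker hiccups, downstream timeouts), retrying with exponential backoff before giving up is a common recovery pattern. A sketch; the base delay, cap, and attempt count are assumptions to tune:

```typescript
// Exponential backoff delay for a given attempt, capped at maxMs.
export function backoffMs(attempt: number, baseMs = 200, maxMs = 10_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

// Retry an async handler a few times before rethrowing, at which point a
// DLQ or the consumer framework's restart policy can take over.
export async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err
      await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)))
    }
  }
}
```

Inside an `eachMessage` handler this would wrap the processing call, keeping retries local to one message rather than crashing the whole consumer.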
## Monitoring
### Consumer Lag Monitoring
## Best Practices
- Use meaningful keys: keys determine partition assignment, and ordering is guaranteed only within a partition
- Set appropriate retention: balance storage cost against replay needs
- Monitor consumer lag: alert when lag grows steadily
- Implement idempotency: with at-least-once delivery, consumers must tolerate duplicate messages
- Use schemas: Avro or Protobuf (via a schema registry) for type safety and safe evolution
- Test failure scenarios: network partitions, broker failures, consumer-group rebalances
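The idempotency point above can be sketched as a processed-ID check around the handler. This in-memory version is illustrative only; a real deployment would back the seen-set with a durable store that is updated atomically with the processing side effects.

```typescript
// Skip events whose ID has already been handled, making redelivery safe.
export class IdempotentHandler {
  private seen = new Set<string>()

  constructor(private handle: (eventId: string) => void) {}

  // Returns true if the event was processed, false if it was a duplicate.
  process(eventId: string): boolean {
    if (this.seen.has(eventId)) return false // duplicate delivery: drop
    this.handle(eventId)
    this.seen.add(eventId)
    return true
  }
}
```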
## Conclusion
Kafka enables scalable, fault-tolerant data streaming. Start with simple producers and consumers, then add stream processing as complexity grows. Focus on proper error handling and monitoring for production reliability.