Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.
At PostHog we mainly use it to stream events from our ingestion pipeline to ClickHouse.
Dictionary
broker
: a cluster is built by one or more servers. The servers forming the storage layer are called brokersevent
: an event records the fact that "something happened" in the world or in your business. It is also called record or message in the documentation. When you read or write data to Kafka, you do this in the form of events. Conceptually, an event has a key, value, timestamp, and optional metadata headersproducers
: client applications that publish (write) events to Kafkaconsumer
: client application subscribed to (read and process) events from Kafkatopic
: group of eventspartition
: topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokersreplication
: to make your data fault-tolerant and highly-available, every topic can be replicated, so that there are always multiple brokers that have a copy of the data just in case things go wrong