When Kafka's disk gets full, the broker can get stuck and we end up dropping all incoming events. To mitigate this, we can tune Kafka's log retention settings to free up space. There are two configs we can set; both act as lower bounds, i.e. data is not deleted before it exceeds the configured time or size:
- time: `log.retention.ms` / `log.retention.hours` at the broker level, `retention.ms` per topic (Kafka docs)
- bytes: `log.retention.bytes` at the broker level, `retention.bytes` per topic (Kafka docs)
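A minimal sketch of what the broker-level settings look like in `server.properties` (values shown are the Kafka defaults, for illustration only, not what we run today):

```properties
# Time-based retention: segments whose newest record is older than this are
# eligible for deletion. log.retention.ms takes precedence over
# log.retention.minutes and log.retention.hours if set.
log.retention.hours=168

# Size-based retention: once a partition's log exceeds this many bytes, the
# oldest segments become eligible for deletion. -1 means no size limit.
log.retention.bytes=-1
```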
Note that the retention check loop runs every 5 minutes by default (`log.retention.check.interval.ms`); we could make it more frequent, but probably don't need to.
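If we ever do want retention to kick in faster, this is the knob, shown here at its default:

```properties
# How often the broker scans for log segments eligible for deletion.
# Default is 300000 ms (5 minutes).
log.retention.check.interval.ms=300000
```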
We want to minimize the probability of the Kafka disk filling up and events being lost, while also maximizing disk usage so retention is as long as possible and we have data to recover from in case something breaks in ingestion. We therefore suggest setting the time limit relatively low (2h or 24h) and the size limit to roughly 90% of the volume size.
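A hypothetical sizing example, assuming a 500 GiB data volume and ~12 partition replicas hosted on the broker (both numbers are made up for illustration). Keep in mind that `log.retention.bytes` is enforced per partition, so the per-partition cap is the 90% target divided by the partition count:

```properties
# 24h time-based retention.
log.retention.ms=86400000

# ~90% of a 500 GiB volume spread over ~12 partition replicas:
# 500 GiB * 0.9 / 12 ≈ 37.5 GiB per partition.
log.retention.bytes=40265318400
```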