Disaster recovery considerations

Last updated: Nov 02, 2022

It's important to understand how to recover from failure scenarios before they happen.

Data considerations

Data is stored in PostgreSQL, ClickHouse and Kafka. At a minimum you should have backups of PostgreSQL and ClickHouse, as these are crucial to PostHog. Kafka stores the incoming analytics events, which are then persisted into ClickHouse.

Note: ClickHouse and Kafka both rely on ZooKeeper as a distributed key-value store and locking service. While ZooKeeper doesn't host any data related to PostHog, it's a core dependency of the overall stack.

PostgreSQL

PostgreSQL stores your PostHog instance configuration and authentication/authorization.

We recommend that you use a provider that supports backups. For example, AWS RDS will handle backups and restores for you. Other cloud providers offer similar services.

ClickHouse

For ClickHouse, see our ClickHouse backup runbook.

ClickHouse consumes from Kafka using a static consumer group, so make sure that only one cluster is consuming at a time. When bringing up a cluster from backup, it is a good idea to reset the consumer group offsets so that the cluster re-consumes any events in Kafka that may be missing from your ClickHouse backup.
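As a rough sketch of such an offset reset, the snippet below builds a `kafka-consumer-groups.sh` invocation that rewinds a consumer group to the time the backup was taken. The broker address, group name and topic name are placeholders, not values taken from PostHog, so substitute the ones from your own deployment. The helper `build_offset_reset_command` is hypothetical and only assembles the command. It defaults to `--dry-run` so the planned offsets can be reviewed before anything changes.

```python
from datetime import datetime, timezone


def build_offset_reset_command(bootstrap_server, group, topic, backup_time, execute=False):
    """Build a kafka-consumer-groups.sh command that rewinds `group` on `topic`
    to the offsets matching `backup_time`. Nothing is executed here."""
    if backup_time.tzinfo is None:
        raise ValueError("backup_time must be timezone-aware")
    utc = backup_time.astimezone(timezone.utc)
    # --to-datetime takes 'YYYY-MM-DDTHH:mm:SS.sss'. This sketch assumes a value
    # without a zone suffix is read as UTC; verify that for your Kafka version.
    stamp = utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}"
    return [
        "kafka-consumer-groups.sh",
        "--bootstrap-server", bootstrap_server,
        "--group", group,
        "--topic", topic,
        "--reset-offsets",
        "--to-datetime", stamp,
        # Only apply the change when explicitly asked to.
        "--execute" if execute else "--dry-run",
    ]


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    cmd = build_offset_reset_command(
        "kafka:9092",
        "example-consumer-group",
        "example-events-topic",
        datetime(2022, 11, 2, 12, 30, tzinfo=timezone.utc),
    )
    print(" ".join(cmd))
```

Kafka only resets offsets for a consumer group that has no active members, so the ClickHouse consumers need to be stopped before running the command with `--execute`. Rewinding slightly earlier than the backup time may re-deliver some events, so consider how duplicates are handled in your setup.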

Kafka

Kafka is the first place events are persisted before being consumed by other services and ultimately ending up in ClickHouse. Solutions exist for streaming this data to long-term storage. For example, with AWS you can use either the Lenses.io S3 Kafka Connector or AWS Kinesis Firehose.

If events are being consumed from Kafka into ClickHouse in good time, you should not lose much data if Kafka fails, so this may be a risk that you want to accept.
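One way to judge whether that risk is acceptable is to look at consumer lag: the number of events written to Kafka that have not yet been consumed. The sketch below computes it from two per-partition offset maps. The function name and the sample numbers are made up for illustration; in practice the offsets would come from your Kafka tooling.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return per-partition lag: events in Kafka not yet consumed by the group.

    Both arguments map partition number -> offset.
    """
    lag = {}
    for partition, end in end_offsets.items():
        # A partition with no committed offset is treated as fully unconsumed
        # from offset 0. This is an approximation that overstates the lag if
        # older log segments have already been deleted.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(0, end - committed)
    return lag


if __name__ == "__main__":
    # Hypothetical offsets for a two-partition topic.
    end_offsets = {0: 1500, 1: 980}
    committed_offsets = {0: 1495, 1: 980}
    lag = consumer_lag(end_offsets, committed_offsets)
    print(lag, "total:", sum(lag.values()))
```

The total lag is roughly the number of events that would be lost if Kafka failed at that moment without a copy in long-term storage.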

ZooKeeper

ZooKeeper is a centralized service providing distributed synchronization. It is a core dependency of both ClickHouse and Kafka.

