Scaling ClickHouse Vertically


🌇 Sunsetting Kubernetes Deployments

This page covers our PostHog Kubernetes deployment, which we are currently in the process of sunsetting. Existing customers will receive support until May 31, 2023 and we will continue to provide security updates for the next year.

For existing customers
We highly recommend migrating to PostHog Cloud (US or EU). Take a look at this guide for more information on the migration process.
Looking to continue self-hosting?
We still maintain our open-source Docker Compose deployment. Instructions for deploying can be found here.

How to scale ClickHouse vertically

Currently, the easiest way to scale up ClickHouse deployed with our Helm chart is to pin ClickHouse to a specific node (or node group) and then scale that node up, giving it more CPU and memory. This is straightforward in practice. Let's get down to the nuts and bolts of how to get this done!

  • Create a node instance or node group with more CPU and memory in your K8s cluster, and give it the label clickhouse=true (this label is used to target that node for the ClickHouse deployment). How you create a node group depends on your Kubernetes platform; references on creating and managing node groups are available for GKE, EKS, and DigitalOcean.
    • If you already know which node you want ClickHouse to run on, you can label it directly: kubectl label nodes <desired-clickhouse-node-name> clickhouse=true
    • To keep other pods off that node, add a taint, so that pods without a matching toleration are not scheduled there: kubectl taint nodes <desired-clickhouse-node-name> dedicated=clickhouse:NoSchedule
  • Update your values.yaml:
    clickhouse:
      nodeSelector:
        clickhouse: "true"
      tolerations:
        - key: "dedicated"
          value: "clickhouse"
          operator: "Equal"
          effect: "NoSchedule"
  • After applying the updated values (for example, with helm upgrade), you might need to trigger rescheduling of the ClickHouse pod so it moves to the new node, e.g. by running kubectl delete pod chi-posthog-posthog-0-0-0
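The steps above can be strung together in a small shell script. This is a minimal sketch, not an official PostHog tool. It assumes PostHog was installed with Helm as the release posthog from the chart posthog/posthog into the namespace posthog, and that values.yaml already contains the nodeSelector and tolerations settings shown above. The function name dedicate_clickhouse_node and the namespace, release, and chart defaults are hypothetical; only the pod name chi-posthog-posthog-0-0-0 comes from this page.

```shell
#!/usr/bin/env sh
# Sketch: dedicate a node to ClickHouse and move the ClickHouse pod onto it.
# ASSUMPTIONS (hypothetical, adjust to your install): Helm release "posthog"
# from chart "posthog/posthog" in namespace "posthog"; values.yaml already
# contains the clickhouse.nodeSelector and clickhouse.tolerations settings.
set -eu

NAMESPACE="${NAMESPACE:-posthog}"
RELEASE="${RELEASE:-posthog}"
CHART="${CHART:-posthog/posthog}"
VALUES_FILE="${VALUES_FILE:-values.yaml}"
CLICKHOUSE_POD="chi-posthog-posthog-0-0-0"

# Hypothetical helper: label and taint the node, apply values, reschedule the pod.
dedicate_clickhouse_node() {
  node="$1"

  # Label the node so clickhouse.nodeSelector (clickhouse: "true") matches it.
  kubectl label nodes "$node" clickhouse=true --overwrite

  # Taint the node so pods without the matching toleration are not scheduled on it.
  kubectl taint nodes "$node" dedicated=clickhouse:NoSchedule --overwrite

  # Apply the updated values.yaml to the existing release.
  helm upgrade -f "$VALUES_FILE" --namespace "$NAMESPACE" "$RELEASE" "$CHART"

  # Delete the running pod so it is recreated under the new scheduling rules.
  kubectl delete pod "$CLICKHOUSE_POD" --namespace "$NAMESPACE"

  # The pod is recreated under the same name; poll (up to ~5 minutes) until it exists.
  tries=0
  until kubectl get pod "$CLICKHOUSE_POD" --namespace "$NAMESPACE" >/dev/null 2>&1; do
    tries=$((tries + 1))
    [ "$tries" -lt 60 ] || { echo "pod was not recreated" >&2; return 1; }
    sleep 5
  done

  # Wait until the new pod is Ready, then print the node it was scheduled on.
  kubectl wait --for=condition=Ready "pod/$CLICKHOUSE_POD" \
    --namespace "$NAMESPACE" --timeout=10m
  kubectl get pod "$CLICKHOUSE_POD" --namespace "$NAMESPACE" \
    -o jsonpath='{.spec.nodeName}'
}
```

Usage, with a placeholder node name: dedicate_clickhouse_node <desired-clickhouse-node-name>. As noted above, the delete step may not be needed if the pod already moved after the upgrade; in that case you can skip it and run only the final kubectl get pod check.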

You can find more information about optional settings like these here, as well as more about nodeSelectors and about taints and tolerations.
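To confirm that the node selector and taint behave as intended, a read-only check like the one below can help. It is a hedged sketch: the function name check_clickhouse_node is hypothetical, and it only uses standard kubectl queries. It lists nodes carrying the clickhouse=true label with their allocatable CPU and memory, prints the taints on the chosen node, and lists every pod scheduled there. Keep in mind that a NoSchedule taint only affects new scheduling; pods that were already running on the node before it was tainted are not evicted, and pods that tolerate all taints (as some system DaemonSets do) may also appear.

```shell
#!/usr/bin/env sh
# Sketch: read-only check that a dedicated ClickHouse node is labeled, tainted,
# and sized as expected. check_clickhouse_node is a hypothetical helper name.
set -eu

check_clickhouse_node() {
  node="$1"

  # Nodes matched by clickhouse.nodeSelector, with their allocatable resources.
  kubectl get nodes -l clickhouse=true \
    -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory

  # Taints on the node; expect an entry for dedicated=clickhouse:NoSchedule.
  kubectl get node "$node" -o jsonpath='{.spec.taints}'
  echo

  # Every pod currently scheduled on the node, across all namespaces.
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$node" -o wide
}
```

Usage, with a placeholder node name: check_clickhouse_node <desired-clickhouse-node-name>.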
