Debugging long-running migrations

Last updated:

When trying to upgrade a self-hosted Postgres-backed PostHog instance, we have seen some users have issues with long-running migrations on tables in the hot path.

For example, when trying to migrate the persons table, some users have reported that a lock is never acquired, making the migration hang.

This usually happens because some analytics queries are taking too long to complete.

To get rid of these queries and run the migration, you can use the script below on the node where the PostHog Django server (web) is running.

⚠️ Warning! Proceed with caution. The following script has potentially destructive behavior.

Python script
Python
# How to run:
# Add this code to a delete_queries.py file
# Run python manage.py shell < delete_queries.py
# Set DRY_RUN=0 if you're sure you know what you're doing
import os
from time import sleep
from django.db import connection
QUERY_TIME_LIMIT = os.environ.get("QUERY_TIME_LIMIT", "5 minutes")
TARGET_TABLE = os.environ.get("TARGET_TABLE", "posthog_person")
DRY_RUN = os.environ.get("DRY_RUN", "1") == "1"
MAX_TRIES = 10
def fetch_open_queries():
res = []
with connection.cursor() as cursor:
cursor.execute(
f"""
SELECT
pid,
now() - pg_stat_activity.query_start AS duration,
query
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '{QUERY_TIME_LIMIT}'
AND state = 'active'
"""
)
res = cursor.fetchall()
return res
open_queries = fetch_open_queries()
tries = 0
while open_queries and len(open_queries) > 0 and tries < MAX_TRIES:
for query_details in open_queries:
if TARGET_TABLE in query_details[2]:
print(f"Preparing to kill query {query_details[0]}:")
print(query_details[2])
print(f"It has been active for {query_details[1].seconds}s")
if not DRY_RUN:
with connection.cursor() as cursor:
cursor.execute(f"SELECT pg_terminate_backend({query_details[0]})")
print(f"Killed query {query_details[0]}")
print("")
open_queries = fetch_open_queries()
tries += 1
sleep(1)
I'd rather run the SQL myself

Instead of the script, you can also look for long running queries in Postgres yourself, like so:

SQL
SELECT
pid,
now() - pg_stat_activity.query_start AS duration,
query
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
AND state = 'active'

Using the PIDs from the query above, you can then kill them like so:

SQL
SELECT pg_terminate_backend(pid)

Questions?

Was this page useful?

Next article

Making sense of the plugin server

The plugin server is the core of event ingestion and the plugins platform at PostHog. Despite being called the "plugin server", it is a core service that PostHog cannot function without . This doc should give you a sense of what the plugin server is, how it operates, why it works this way, as well as a better sense of how to operate it in production. So buckle up, the journey is about to start. What does the plugin server do? Before getting into the hows and the whys, it's important to…

Read next article