BigQuery Export

Last updated:

This app streams events from PostHog into BigQuery as they are ingested.

Installation

The BigQuery Export app requires either PostHog Cloud, or a self-hosted PostHog instance running version 1.30.0 or later.

Not running 1.30.0? Find out how to update your self-hosted PostHog deployment!

Enabling the destination

  1. Visit the 'Apps' page from PostHog.
  2. Search for 'BigQuery' and select the 'BigQuery Export' app.
  3. Click on the blue settings icon and follow the configuration steps:
    1. Upload your Google Cloud key .json file. (See below for instructions on how to retrieve this.)
    2. Enter your Dataset ID
    3. Enter your Table ID
  4. Watch events roll into BigQuery

Setting up BigQuery access

To set the right permissions up for the BigQuery plugin, you'll need:

  1. A service account.
  2. A dataset which has permissions allowing the service account to access it.

Here's how to set these up so that the app has access only to the table it needs:

  1. Create a service account. Keep hold of the JSON file at the end of these steps for setting up the app, and remember the name too.

  2. Create a role which has only the specific permissions the PostHog BigQuery app requires (listed below), or use the built in BigQuery DataOwner permission. If you create a custom role, you will need:

    • bigquery.datasets.get
    • bigquery.tables.create
    • bigquery.tables.get
    • bigquery.tables.list
    • bigquery.tables.updateData
  3. Create a dataset within a BigQuery project (ours is called posthog, but any name will do).

  4. Follow the instructions on granting access to a dataset in BigQuery to ensure your new service account has been granted either the role you created or the "BigQuery Data Owner" permission.

SQL_workspace_–_BigQuery_–_Data_Warehouse_Exp_–_Google_Cloud_Platform

Use the Share Dataset button to share your dataset with your new service account and either the BigQuery DataOwner role, or your custom role created above. In the below, we've used a custom role PostHog Ingest.

SQL_workspace_–_BigQuery_–_Data_Warehouse_Exp_–_Google_Cloud_Platform

That's it! Once you've done the steps above, your data should start flowing from PostHog to BigQuery.

Event schema

Here is a summary of all the fields that are exported to BigQuery.

FieldTypeDescription
uuidSTRINGThe unique ID of the event within PostHog
eventSTRINGThe name of the event that was sent
propertiesSTRINGA JSON object with all the properties sent along with an event
elementsSTRINGElements surrounding an autocaptured event
setSTRINGA JSON object with any person properties sent with the $set field
set_onceSTRINGA JSON object with any person properties sent with the $set_once field
distinct_idSTRINGThe distinct_id of the user who sent the event
team_idSTRINGThe team_id for the event
ipSTRINGThe IP address that was sent with the event
site_urlSTRINGThis is always set as an empty string for backwards compatibility
timestampTIMESTAMPThe timestamp when the event was ingested into PostHog
bq_ingested_timestampTIMESTAMPThe timestamp when the event was sent to BigQuery

Configuration

OptionDescription
JSON file with your google cloud key
Type: attachment
Required: True
Dataset ID
Type: string
Required: True
In case Google Cloud tells you "my-project-123245:Something", use "Something" as the ID.
Table ID
Type: string
Required: True
A table will be created if it does not exist.
Events to ignore
Type: string
Required: False
Comma separated list of events to ignore
Maximum upload size in bytes
Type: string
Required: False
Default 1MB. Upload events after buffering this many of them. The value must be between 1 MB and 10 MB.
Export events at least every X seconds
Type: string
Required: False
Default 30 seconds. If there are events to upload and this many seconds has passed since the last upload, then upload the queued events. The value must be between 1 and 600 seconds.
Export the property $elements on events that aren't called `$autocapture`?
Type: choice
Required: False
Advanced

Troubleshooting

What should I do if events aren't showing up?

The best way to debug events not showing up is by viewing the logs, which can be accessed by clicking the 'Logs' icon just to the left of the blue settings button. This will bring up a new panel with a list of all the most recent logs from our app. Take a look back through the log and see if there are any ERROR messages that can help provide more information on why the export is failing.

Tip: You can filter down and only view ERROR or WARN messages using the toggles at the top of the panel next to 'Show logs of type'

Why am I seeing duplicate PostHog events in BigQuery?

There's a very rare case when duplicate events appear in BigQuery. This happens due to network errors, where the export seems to have failed, yet it actually reaches BigQuery.

While this shouldn't happen, if you find duplicate events in BigQuery, follow these Google Cloud docs to manually remove them.

Here is an example query based on the Google Cloud docs that would remove duplicates:

SQL
WITH
-- first add a row number, one for each uuid
raw_data AS
(
SELECT *,
Row_number() OVER (partition BY uuid) AS row_number
FROM `<project_id>.<dataset>.<TABLE>`
WHERE date(timestamp) = '<YYYY-MM-DD>' ),
-- now just filter for one row per uuid
raw_data_deduplicated AS
(
SELECT *
EXCEPT (row_number)
FROM raw_data
WHERE row_number = 1 )
SELECT *
FROM raw_data_deduplicated ;

FAQ

Who created this app?

We'd like to thank PostHog team members Yakko Majuri, Marius Andra, Neil Kakkar, Michael Matloka and community member Edward Hartwell Goose for creating this BigQuery Export app.

Who maintains this app?

This app is maintained by PostHog. If you have issues with the app not functioning as intended, please let us know!

What if I have feedback on this app?

We love feature requests and feedback! Please tell us what you think! to tell us what you think.

What if my question isn't answered above?

We love answering questions. Ask us anything via our community forum, or drop us a message.

Questions?

Was this page useful?

Next article

Customer.io Connector

The Customer.io Connector sends event data from PostHog into Customer.io. User emails will also be sent if available and customers will be created in Customer.io. Requirements The Customer.io Connector requires either PostHog Cloud, or a self-hosted PostHog instance running version 1.30.0 or later. Not running 1.30.0? Find out how to update your self-hosted PostHog deployment ! You'll also need access to the relevant Customer.io account. Installation Visit the 'Apps' page in your instance of…

Read next article