Statistical significance

Last updated:

For your results and conclusions to be valid, any experiment must have a significant exposure. For instance, if you test a product change and only one user sees the change, you can't extrapolate from that single user that the change will be beneficial/detrimental for your entire user base. This is true for any experiment that is a simple randomized controlled experiment (e.g. this is also done when testing new drugs or vaccines).

Furthermore, even a large sample size (e.g. approx. 10,000 participants) can result in ambiguous results. If, for example, the difference in conversion rate between the variants is less than 1%, it's hard to say whether one variant is truly better than the other. To be significant, there must be enough difference between the conversion rates, given the exposure size.

PostHog computes this significance for you automatically - we will let you know if your experiment has reached significant results or not. Once your experiment reaches significant results, it's safe to use those results to reach a conclusion and terminate the experiment. You can read more about how we do this in our 'Advanced' section below.

Calculating exposure

Since count data can be over a total count, vs. the number of unique users, we use a proxy metric to measure exposure: The number of times $feature_flag_called event returns control or test is the respective exposure for the variant. This event is sent automatically when you do: posthog.getFeatureFlag().

It's possible that a variant showing fewer count data can have higher probability, if its exposure is much smaller as well.

How we determine significance

In the early days of an experiment, data can vary wildly, and sometimes one variant can seem overwhelmingly better. In this case, our significance calculations might say that the results are significant, but this shouldn't be the case, since we need more data.

Thus, before we hit 100 participants for each variant in an experiment, we default to results being not significant. Further, if the probability of the winning variant is less than 90%, we default to results being not significant.

So, you'll only see the green significance banner when all 3 conditions are met:

  1. Each variant has >100 unique users
  2. The calculations above declare significance
  3. The probability of being the best > 90%.

Questions?

Was this page useful?

Next article

Experiments under the hood

Below are all formulas and calculations we go through when recommending sample sizes and determining significance. How we ensure distribution of people For every experiment, we leverage PostHog's multivariate feature flags. There's one control group, and up to three test groups. Based on their distinctID, each user is randomly distributed into one of these groups. This is stable, so the same users, even when they revisit your page, stay in the same group. We do this by creating a hash out of the…

Read next article