Analyzing Results

Reading Results

Frequentist and Bayesian analysis, confidence thresholds, and key fields.

Every test in Otter A/B is scored using one of two analysis methods, configured per-test alongside a confidence level (80%, 90%, 95%, or 99%). The results page shows a single decision score whose label and meaning depend on the method you chose.

A results page in Otter A/B answers three questions: is there a difference between the variants, how big is it, and how sure are we? Each of those questions has its own column on the table — the score, the lift (improvement), and the confidence interval — and the three are connected. A big lift with tight intervals and a high score is a clear win. A small lift with wide intervals and a borderline score is noise.

The most important thing to understand is which method is scoring your test. Frequentist and Bayesian both work, but they answer slightly different questions. Frequentist asks “if there were no real difference, how surprised would I be to see data this extreme?” Bayesian asks “what's the probability the variant is genuinely better?” Most teams use frequentist; teams that run many low-traffic tests or value an intuitive interpretation often prefer Bayesian.

Frequentist

Significance (1-p)

The classic A/B testing approach. Otter A/B picks the right test for your data: Fisher's exact test when conversion counts are sparse, a two-proportion z-test for typical conversion-rate comparisons, and Welch's t-test (with Satterthwaite degrees of freedom) for revenue metrics. The score is one minus the p-value, expressed as a percentage.

  • Score label on the results page: "Significance (1-p)".
  • You hit "significance" when the score reaches your effective confidence threshold (see below).
  • Multivariate tests automatically apply a Bonferroni correction — the bar to clear rises with the number of challenger variants.
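
If you want to sanity-check a score by hand, the arithmetic for the common case is short. The sketch below is our own minimal illustration of a pooled two-proportion z-test with a two-sided p-value, scored as one minus the p-value; it is not Otter A/B's implementation, it covers only the z-test path (not Fisher's exact or Welch's t-test), and the function name and sample numbers are made up.

  # Minimal sketch of a two-proportion z-test scored as "Significance (1-p)".
  # Illustrative only — not Otter A/B's internal code.
  from math import sqrt
  from statistics import NormalDist

  def significance_score(control_visitors, control_conversions,
                         variant_visitors, variant_conversions):
      p1 = control_conversions / control_visitors
      p2 = variant_conversions / variant_visitors
      pooled = (control_conversions + variant_conversions) / (control_visitors + variant_visitors)
      se = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))
      z = (p2 - p1) / se
      p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
      return (1 - p_value) * 100                     # score shown as a percentage

  # e.g. 10,000 visitors per arm, 500 vs 560 conversions
  print(round(significance_score(10_000, 500, 10_000, 560), 1))   # 94.2 — short of a 95% bar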

Bayesian

Chance to Beat Original

A probability-based approach that answers "what is the chance this variant is better than the control?" Otter A/B runs a Bayesian bootstrap and reports the probability that each variant outperforms the original. Revenue metrics use the same bootstrap on the underlying sample distributions.

  • Score label on the results page: "Chance to Beat Original".
  • If a variant is currently behind, the label flips to "Chance Original Beats Variant" and the percentage shown is the chance the control wins.
  • The configured confidence level is used directly as the threshold — there is no Bonferroni adjustment for Bayesian analysis.
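
The bootstrap itself is simple enough to sketch. Below is a minimal, illustrative Bayesian bootstrap (Dirichlet-weighted resampling) over per-visitor values — 0/1 outcomes for conversions, amounts for revenue. The function name, draw count, and example data are ours, not Otter A/B's, and the real implementation may differ.

  # Minimal sketch of "Chance to Beat Original" via a Bayesian bootstrap.
  # Illustrative only — not Otter A/B's internal code.
  import numpy as np

  def chance_to_beat(control_values, variant_values, draws=2000, seed=0):
      rng = np.random.default_rng(seed)
      control = np.asarray(control_values, dtype=float)
      variant = np.asarray(variant_values, dtype=float)
      wins = 0
      for _ in range(draws):
          # Dirichlet(1, ..., 1) weights over the observations give one
          # posterior draw of each arm's mean (Rubin's Bayesian bootstrap).
          w_control = rng.dirichlet(np.ones(len(control)))
          w_variant = rng.dirichlet(np.ones(len(variant)))
          if w_variant @ variant > w_control @ control:
              wins += 1
      return 100 * wins / draws   # "Chance to Beat Original" as a percentage

  # 0/1 conversion outcomes per visitor: 500/10,000 vs 560/10,000 conversions
  control = np.r_[np.ones(500), np.zeros(9_500)]
  variant = np.r_[np.ones(560), np.zeros(9_440)]
  print(round(chance_to_beat(control, variant), 1))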

Effective confidence threshold (Frequentist only)

When you run a frequentist test with more than one challenger variant, Otter A/B raises the confidence threshold using a Bonferroni correction so the chance of any single variant looking like a winner by accident stays bounded. The formula is:

effective_threshold = (1 - α / challenger_count) × 100
  where α = 1 - (confidence_level / 100)

For example, a test with 95% confidence and 2 challenger variants (3 variants total) needs the score to clear 97.5%, not 95%, before declaring a winner. Bayesian tests use the configured confidence level as-is — no adjustment is applied. The results page always shows the effective threshold you actually need to beat, not just what you configured.
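
If you want to reproduce the adjustment yourself, a few lines cover it. This is a direct transcription of the formula above; the function name is ours, not part of any Otter A/B API.

  # Bonferroni-adjusted threshold, from the formula above.
  def effective_threshold(confidence_level, challenger_count):
      alpha = 1 - confidence_level / 100
      return (1 - alpha / challenger_count) * 100

  print(round(effective_threshold(95, 1), 2))   # 95.0  — one challenger, no adjustment
  print(round(effective_threshold(95, 2), 2))   # 97.5  — the example above
  print(round(effective_threshold(99, 3), 2))   # 99.67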

Common fields on the results page

  • Visitors — unique humans assigned to each variant (bot traffic and impersonation sessions are excluded).
  • Conversions — primary-goal conversions attributed to each variant. Secondary goals are reported separately and do not affect the score.
  • Conversion rate / Revenue per visitor — the metric family determined by the primary goal. Revenue goals report revenue per visitor; every other goal type uses conversion rate.
  • Improvement — relative lift over the control (see the sketch after this list). Can go negative when the variant underperforms.
  • Score — the resolved decision score described above. The progress bar fills relative to the effective threshold, capping at 100%.
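
For reference, the Improvement and Score-bar arithmetic comes down to two ratios. The sketch below is our own illustration of that arithmetic, not Otter A/B's code, and the sample numbers are made up (they reuse the figures from the examples above).

  # Illustrative arithmetic for the Improvement and Score fields.
  def improvement(control_rate, variant_rate):
      # Relative lift over the control, as a percentage; negative if the variant loses.
      return (variant_rate - control_rate) / control_rate * 100

  def score_bar_fill(score, effective_threshold):
      # How full the progress bar is, relative to the effective threshold, capped at 100%.
      return min(score / effective_threshold, 1.0) * 100

  print(round(improvement(0.050, 0.056), 1))    # 12.0 — a 12% relative lift
  print(round(score_bar_fill(94.2, 97.5), 1))   # 96.6 — nearly full, but short of the threshold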

Reading results well

Don't call it early. Watching the score creep up on day two and shipping the “winner” is the most common way to ship a false positive. Combine the score with a visitor floor (the sample-size target you set in the wizard) so you stop on power, not on a peek.

Pay attention to the lift, not just the score. A test can hit a high score with a 0.3% lift if the sample is huge — statistically real, practically irrelevant. Ask whether the lift is large enough to matter to the business before you ship.

If results look surprising, check the activity log first. Pauses, edits, and stop-condition trips can all explain unexpected patterns. The activity log on the test page captures what happened and when.

Use Bayesian when you want to peek safely. Bayesian's “chance to beat original” reading is much safer to monitor mid-test than a frequentist p-value, whose false-positive rate inflates every time you check.

Frequently asked questions

Quick answers to the questions teams ask most about this part of Otter A/B.