How A/B Testing Works — Live in 5 Minutes
Three steps to data-driven decisions. One line of code, and you're testing.
Add the snippet
Copy the snippet below into your website's <head>. That's it. It works with any HTML page, WordPress, Shopify, Next.js, Nuxt, or any other framework.
<!-- Otter -->
<style id="optimo-hide">body{opacity:0 !important}</style>
<script src="https://www.otterab.com/sdk/optimo.js"
key="YOUR_API_KEY" async></script>
Create your test
Name your test, set the target URL, and define your variants. Add DOM changes — swap headlines, images, buttons, or entire sections. Set your conversion goals: page views, clicks, custom events, or revenue.
- Define multiple variants with different weights
- 10 DOM change types: text, HTML, attributes, styles, classes, and more
- 4 goal types: pageview, click, custom event, revenue
Get results
Visitors are automatically assigned to variants using our deterministic algorithm. Conversions are tracked in real time. Choose frequentist or Bayesian analysis, and Otter will surface the right decision score, lift, and winner status for the method you selected.
Under the hood
What happens between a visit and a decision
A reliable A/B test depends on four things working together: a fast SDK, a stable variant assignment, a goal that maps to revenue or behavior, and a statistical decision that actually means something. Here is how each piece fits.
1. The SDK loads early and hides the page briefly
The snippet you paste into your <head> does two things before anything else renders. First, it applies an anti-flicker style that hides the body for up to 300 milliseconds while variants are resolved. Second, it loads optimo.js asynchronously so it never blocks the rest of the page. A 3-second failsafe removes the anti-flicker style automatically if the SDK fails to load — your visitors are never stuck on a blank screen.
The whole SDK is under 9KB gzipped. After the first page, it lives in the browser cache, so subsequent navigations add essentially no latency. There is no virtual DOM, no React tree, and no framework — it is plain JavaScript that touches the DOM directly so changes apply on the first paint.
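The failsafe described above can be approximated in a few lines. This is a minimal sketch, not Otter's actual SDK code: the function name `installAntiFlickerFailsafe` and its parameters are hypothetical, and only the `optimo-hide` style id and the 3-second timeout come from the snippet and text above.

```javascript
// Hypothetical sketch of an anti-flicker failsafe; not Otter's actual SDK code.
// `doc` is any object with getElementById (the browser `document` in practice).
// `schedule` defaults to setTimeout and is injectable so the logic can be tested.
function installAntiFlickerFailsafe(doc, timeoutMs = 3000, schedule = setTimeout) {
  let revealed = false;

  // Remove the hiding <style> element at most once, whoever calls first.
  function reveal() {
    if (revealed) return;
    revealed = true;
    const style = doc.getElementById("optimo-hide");
    if (style && style.parentNode) {
      style.parentNode.removeChild(style);
    }
  }

  // Failsafe: reveal the page even if the SDK never finishes loading.
  schedule(reveal, timeoutMs);

  // An SDK would call the returned function as soon as variants are applied.
  return reveal;
}
```

In a browser this would be called as `const reveal = installAntiFlickerFailsafe(document);`, with `reveal()` invoked once the variant changes have been applied.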
2. Visitors are assigned deterministically
Each visitor gets a stable assignment ID stored in a first-party cookie and replayed on every request. Variant assignment is a deterministic hash of (visitor_id, test_id) — the same visitor sees the same variant across sessions, devices that share the cookie, and any subdomains under the project domain.
Weights are exact and not approximated. A 50/50 split splits 50/50. An 80/10/10 split splits 80/10/10. There is no "balancing" pass that nudges traffic between variants partway through the test, which would invalidate the statistical model.
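To illustrate what a deterministic hash of (visitor_id, test_id) can look like, here is a minimal sketch. It assumes an FNV-1a hash and a cumulative-weight lookup. Otter's actual hash function is not described here, so `fnv1a32`, `assignVariant`, and the shape of the `variants` array are illustrative names only.

```javascript
// FNV-1a 32-bit hash over the string's UTF-16 code units.
// Assumption: chosen for illustration; the SDK's real hash is not documented here.
function fnv1a32(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map (visitorId, testId) to a variant name according to the configured weights.
// `variants` is a hypothetical shape: [{ name: "control", weight: 80 }, ...]
function assignVariant(visitorId, testId, variants) {
  const totalWeight = variants.reduce((sum, v) => sum + v.weight, 0);
  // A point in [0, totalWeight) that depends only on the two IDs.
  const point = (fnv1a32(`${visitorId}:${testId}`) / 0x100000000) * totalWeight;
  let cumulative = 0;
  for (const variant of variants) {
    cumulative += variant.weight;
    if (point < cumulative) return variant.name;
  }
  // Guard against floating-point rounding at the upper edge.
  return variants[variants.length - 1].name;
}
```

Because the result depends only on the two IDs and the weights, calling `assignVariant` again for the same visitor and test returns the same variant, with no stored state beyond the visitor ID itself.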
3. Goals connect to revenue, not vanity
Otter supports four goal types: pageview, click, custom event, and revenue. The first three answer "did the visitor do this?" The revenue goal answers the harder question — "did this variant make us more money?" — by attaching a monetary value to a conversion and computing average order value, revenue per visitor, and incremental revenue alongside the conversion lift.
For ecommerce teams, this matters because conversion-rate winners can lose on revenue if the variant moves customers toward cheaper purchases. Revenue tracking surfaces that conflict instead of hiding it.
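A small worked example shows how a conversion-rate winner can lose on revenue. The numbers are made up for illustration, and `summarizeVariant` and `relativeLift` are hypothetical helpers, not part of the Otter SDK.

```javascript
// Summarize one variant from its visitor count and a list of order values.
function summarizeVariant(visitors, orderValues) {
  const conversions = orderValues.length;
  const revenue = orderValues.reduce((sum, value) => sum + value, 0);
  return {
    conversionRate: conversions / visitors,
    averageOrderValue: conversions > 0 ? revenue / conversions : 0,
    revenuePerVisitor: revenue / visitors,
  };
}

// Relative change of `treatment` over `baseline` (0.2 means +20%).
function relativeLift(baseline, treatment) {
  return (treatment - baseline) / baseline;
}

// Made-up data: the variant converts more often, but on cheaper orders.
const control = summarizeVariant(1000, Array(50).fill(80)); // 50 orders at 80 each
const variant = summarizeVariant(1000, Array(60).fill(60)); // 60 orders at 60 each

const conversionLift = relativeLift(control.conversionRate, variant.conversionRate);
const revenueLift = relativeLift(control.revenuePerVisitor, variant.revenuePerVisitor);
console.log({ conversionLift, revenueLift });
```

With these inputs the variant's conversion rate is about 20% higher while its revenue per visitor is about 10% lower, which is the conflict a revenue goal is meant to expose.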
4. The decision uses the right math for the test
Choose frequentist or Bayesian analysis per test. The frequentist mode reports p-values and confidence intervals with a configurable threshold — usually 95%, but 90% is a defensible choice for low-risk tests on small traffic. For multivariate tests, the threshold adjusts automatically to control the false-discovery rate so you do not "win" by running enough variants.
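For intuition about the frequentist mode, the sketch below runs a two-sided, pooled two-proportion z-test on conversion counts. It is one textbook way to obtain a p-value, shown under the assumption of a simple two-variant test. It is not necessarily the computation Otter performs, and `twoProportionZTest` and its parameters are hypothetical names.

```javascript
// Error function via the Abramowitz-Stegun 7.1.26 approximation (error around 1e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t - 0.284496736) * t +
      0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal cumulative distribution function.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Two-sided z-test for a difference in conversion rate between A (control) and B.
// `confidence` is the configurable threshold, for example 0.95 or 0.90.
function twoProportionZTest(convA, visitorsA, convB, visitorsB, confidence = 0.95) {
  const rateA = convA / visitorsA;
  const rateB = convB / visitorsB;
  const pooled = (convA + convB) / (visitorsA + visitorsB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / visitorsA + 1 / visitorsB));
  const z = (rateB - rateA) / standardError;
  const pValue = 2 * (1 - normalCdf(Math.abs(z)));
  return { z, pValue, significant: pValue < 1 - confidence };
}
```

For example, `twoProportionZTest(500, 10000, 600, 10000)` compares a 5% control against a 6% variant. Passing `0.90` as the last argument relaxes the threshold for low-risk tests.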
The Bayesian mode reports the probability that each variant beats control plus the expected lift distribution. It is the better fit when you need to make a decision before reaching textbook significance — the question becomes "given everything we know, what is the chance this variant is better?" rather than "is the p-value below 0.05?"
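The "probability that a variant beats control" can be estimated by simulation. The sketch below assumes a uniform Beta(1, 1) prior on each conversion rate and draws from the resulting Beta posteriors. The prior, the sampler, and every function name here are illustrative assumptions, not a description of Otter's internals.

```javascript
// Small seeded PRNG (mulberry32) so the simulation is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via the Box-Muller transform.
function sampleNormal(rand) {
  const u1 = 1 - rand(); // in (0, 1], avoids log(0)
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Gamma(shape, 1) draw via Marsaglia-Tsang; valid for shape >= 1.
function sampleGamma(shape, rand) {
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x;
    let v;
    do {
      x = sampleNormal(rand);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = rand();
    if (u < 1 - 0.0331 * x * x * x * x) return d * v;
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(alpha, beta) draw as a ratio of two gamma draws.
function sampleBeta(alpha, beta, rand) {
  const x = sampleGamma(alpha, rand);
  const y = sampleGamma(beta, rand);
  return x / (x + y);
}

// Monte Carlo estimate of P(variant rate > control rate) under Beta(1, 1) priors.
function probabilityToBeatControl(control, variant, draws = 20000, seed = 1) {
  const rand = mulberry32(seed);
  let wins = 0;
  for (let i = 0; i < draws; i++) {
    const pControl = sampleBeta(1 + control.conversions, 1 + control.visitors - control.conversions, rand);
    const pVariant = sampleBeta(1 + variant.conversions, 1 + variant.visitors - variant.conversions, rand);
    if (pVariant > pControl) wins++;
  }
  return wins / draws;
}
```

Calling `probabilityToBeatControl({ conversions: 500, visitors: 10000 }, { conversions: 600, visitors: 10000 })` returns a value close to 1, while two variants with identical data land near 0.5.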
Either way, Otter exposes the resolved decision label, score, and effective threshold on the results page — the same value the API and CSV export return — so the math and the UI never disagree.
Setting expectations
When A/B testing pays off — and when it doesn't
A/B testing is the right tool when
- You have enough traffic to detect the lift you care about, typically a few thousand conversions per variant.
- The change you are testing is meaningful enough that a 5–20% lift is plausible.
- The decision is reversible and the cost of being wrong is low.
- You care about a measurable outcome (conversion, revenue, retention) more than a qualitative feeling.
A/B testing is the wrong tool when
- Your traffic is too small to detect the effect you care about in a reasonable timeframe; a sample-size calculator will tell you up front.
- You are testing brand or strategic decisions where the right question is "what should we mean," not "what converts higher."
- The change is irreversible and being wrong is expensive.
- You are looking for explanations, not decisions; qualitative research and session replay are usually better.
The fastest way to find out if you have enough traffic is the sample size calculator. Run it before launching anything.
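As a rough guide to what such a calculator computes, the sketch below uses the standard normal-approximation formula for comparing two proportions, fixed at a two-sided 95% confidence level and 80% power. The function name and parameters are hypothetical, and Otter's calculator may use different defaults or a different formula.

```javascript
// Approximate visitors needed per variant to detect a relative lift in
// conversion rate. Assumes a two-sided 95% confidence level and 80% power.
function sampleSizePerVariant(baselineRate, relativeLift) {
  const zAlpha = 1.959964; // standard normal quantile for two-sided 95% confidence
  const zBeta = 0.841621; // standard normal quantile for 80% power
  const p1 = baselineRate;
  const p2 = baselineRate * (1 + relativeLift);
  const varianceSum = p1 * (1 - p1) + p2 * (1 - p2);
  const delta = p2 - p1;
  return Math.ceil(((zAlpha + zBeta) ** 2 * varianceSum) / (delta * delta));
}

// Example: 5% baseline conversion rate, hoping to detect a 10% relative lift.
const visitorsNeeded = sampleSizePerVariant(0.05, 0.1);
console.log(visitorsNeeded);
```

The required sample grows with the inverse square of the difference you want to detect, so halving the detectable lift roughly quadruples the traffic needed.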
Works with any stack
Drop in the snippet and you're live.