What do the colours and words mean?

Each check gets one of four statuses. Green ("all clear") means that check looks healthy. Yellow ("warning") means something is worth a look before you trust the result. Red ("issue") means stop — something is likely wrong and the numbers can't be trusted until you fix it. Blue ("notice") is just information, not a problem. The card's headline always reflects the most serious thing it found.
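If it helps to picture how the headline is chosen, here is a minimal sketch in Python. It is an illustration only, not Otter's actual code: the `Status` names and the `headline_status` helper are hypothetical, and ranking "notice" just above "all clear" is an assumption.

```python
from enum import IntEnum


class Status(IntEnum):
    # Higher number = more serious. This ordering is an assumption
    # made for illustration, not taken from the product.
    ALL_CLEAR = 0  # green
    NOTICE = 1     # blue
    WARNING = 2    # yellow
    ISSUE = 3      # red


def headline_status(check_statuses):
    """Return the most serious status among all the checks."""
    return max(check_statuses, default=Status.ALL_CLEAR)


checks = [Status.ALL_CLEAR, Status.WARNING, Status.NOTICE]
print(headline_status(checks).name)
```

With one warning and nothing red, the headline in this sketch comes out as a warning.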

What does the 'sample size' check tell me?

It compares how many visitors you've collected against how many your test was planned to need. If you've reached the target, you're green and can read the score at face value. If you're still short, it warns you and shows roughly how many days are left at your current traffic. Reaching the planned number is what stops a result that looks exciting on a small sample from quietly vanishing once more people arrive.
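The "days left" estimate is simple arithmetic. The sketch below shows one plausible way to work it out, assuming traffic stays at its current daily average. The function name and parameters are hypothetical, and the card's exact formula may differ.

```python
import math


def days_remaining(collected, planned, visitors_per_day):
    """Estimate whole days until the planned sample size is reached.

    Returns 0 if the target is already met, and None if there is no
    traffic to base an estimate on.
    """
    if collected >= planned:
        return 0
    if visitors_per_day <= 0:
        return None
    return math.ceil((planned - collected) / visitors_per_day)


# 4,000 visitors still needed at 500 a day
print(days_remaining(collected=6000, planned=10000, visitors_per_day=500))
```

Rounding up is a deliberate choice here: part of a day still means waiting until that day is over.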

It says 'confident winner before full sample' — can I ship now?

You can, but do it knowingly. Your score crossed the line early, which is great, but stopping at the very first good reading makes a fluke more likely than the score suggests. The safest move is to let the test reach its planned visitors. If you ship early, it helps to first see the score stay above the line for several days in a row rather than acting on a single lucky moment.
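If you track the score yourself, the "several days in a row" rule can be written down as a small check. This is a sketch under stated assumptions: the three-day default and the `held_above_line` name are made up for illustration, and scores are assumed to be one reading per day, oldest first.

```python
def held_above_line(daily_scores, threshold, min_days=3):
    """True if the most recent `min_days` readings are all at or above the threshold."""
    if len(daily_scores) < min_days:
        return False
    return all(score >= threshold for score in daily_scores[-min_days:])


# Crossed the line once, dipped, then crossed again: not stable yet.
print(held_above_line([0.96, 0.94, 0.97], threshold=0.95))

# Stayed above the line for the last three days.
print(held_above_line([0.91, 0.96, 0.97, 0.98], threshold=0.95))
```

Waiting for a run of good days lowers the odds of acting on a fluke, but it is still weaker protection than reaching the planned sample.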

What is a 'sample ratio mismatch' and why is it red?

When you set up the test you chose how to split traffic — say 50/50. This check counts how many visitors each version actually got. If the real split is very far from what you asked for (far enough that random chance almost certainly isn't the cause), something is broken in how visitors are being sent to each version. That usually bends every other number on the page, so it's flagged red. Common causes: a redirect that loses visitors before they're counted, or filtering that hits one version harder. Fix the cause and start a fresh test — the data already collected can't be rescued.
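For the curious, a common way to decide whether a split is "too far off to be chance" is a chi-square test on the visitor counts. The sketch below shows that technique for a two-version test. It is not necessarily the method Otter uses: the function names are hypothetical, and the 0.001 cutoff is a widely used convention assumed here for illustration.

```python
import math


def srm_p_value(visitors_a, visitors_b, planned_share_a=0.5):
    """Chance of seeing a split at least this lopsided if assignment works correctly.

    Chi-square goodness-of-fit test with one degree of freedom (two versions).
    """
    total = visitors_a + visitors_b
    if total == 0 or not 0 < planned_share_a < 1:
        raise ValueError("need at least one visitor and a planned share between 0 and 1")
    expected_a = total * planned_share_a
    expected_b = total - expected_a
    chi_square = ((visitors_a - expected_a) ** 2 / expected_a
                  + (visitors_b - expected_b) ** 2 / expected_b)
    # For one degree of freedom, the chi-square tail probability has this closed form.
    return math.erfc(math.sqrt(chi_square / 2))


def looks_like_mismatch(visitors_a, visitors_b, planned_share_a=0.5, alpha=0.001):
    """Flag the split when chance alone is a very unlikely explanation (assumed cutoff)."""
    return srm_p_value(visitors_a, visitors_b, planned_share_a) < alpha


print(looks_like_mismatch(5050, 4950))  # a small wobble on a 50/50 plan
print(looks_like_mismatch(5300, 4700))  # a large gap on a 50/50 plan
```

The strict cutoff is the point: the check should stay quiet for ordinary wobble and only turn red when something is almost certainly broken.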

Why does the traffic-split check sometimes say 'not enough data'?

Below about 200 visitors total, a wobbly split is just normal randomness, not a real problem — so the check waits. It turns on by itself once enough visitors have arrived. There's nothing for you to do in the meantime.
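To see why early splits wobble so much, the sketch below computes how often a perfectly healthy 50/50 test would still send 60% or more of its visitors to one version, purely by chance. The function is hypothetical and exists only to illustrate the point; it models each visitor as a fair coin flip.

```python
from math import comb


def chance_of_lopsided_split(total_visitors, share_percent=60):
    """Probability that a fair 50/50 assignment gives either version
    at least `share_percent` percent of the visitors."""
    if total_visitors < 1 or not 50 < share_percent <= 100:
        raise ValueError("need at least one visitor and a share above 50 percent")
    # Smallest visitor count that reaches the share (integer ceiling).
    cutoff = -(-total_visitors * share_percent // 100)
    upper_tail = sum(comb(total_visitors, k) for k in range(cutoff, total_visitors + 1))
    # Double it: either version could be the one that ends up ahead.
    return 2 * upper_tail / 2 ** total_visitors


for n in (50, 200, 1000):
    print(n, chance_of_lopsided_split(n))
```

The probability shrinks quickly as visitors arrive, which is why a lopsided split means little early on and a lot later.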

What does the 'secondary goals' check watch for?

A version can win your main goal while quietly hurting something else — for example, more sign-ups but fewer purchases. This check looks at your other (secondary) goals and warns you if one of them has dropped meaningfully. If it flags a goal, open that goal in the results below and decide whether the main win is still worth it before you ship.
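As a rough picture of what "dropped meaningfully" could look like, the sketch below flags any secondary goal whose conversion rate fell by more than a chosen relative amount. The 5% cutoff, the function name, and the data shape are all assumptions for illustration; the real check may use a statistical test instead of a fixed cutoff.

```python
def flag_dropped_goals(goal_rates, max_relative_drop=0.05):
    """Return the names of goals whose variant rate fell too far below control.

    `goal_rates` maps a goal name to (control_rate, variant_rate).
    """
    flagged = []
    for name, (control_rate, variant_rate) in goal_rates.items():
        if control_rate <= 0:
            continue  # no baseline to compare against
        relative_change = (variant_rate - control_rate) / control_rate
        if relative_change < -max_relative_drop:
            flagged.append(name)
    return flagged


rates = {
    "purchases": (0.040, 0.034),   # down 15% relative to control
    "newsletter": (0.100, 0.101),  # slightly up
}
print(flag_dropped_goals(rates))
```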

What does 'config changes during the test window' mean?

It means the test was edited after it started — a variant, a goal, or who it targets was changed mid-flight. That mixes together visitors who saw slightly different things, which can quietly spoil the comparison. The card points you to the version history so you can see exactly what changed and when. Small cosmetic edits (fixing a typo, renaming a goal) are usually fine; changing what visitors actually saw usually means you should restart the test.

Why did a blue 'active segment filter' notice appear?

Because you're looking at a slice of your visitors (for example, only mobile, or only one country) instead of everyone. Slices have fewer people and are noisier, so a result that looks strong inside a segment is much more likely to be a coincidence. Clear the filter to read the result for all your traffic before deciding, and treat anything you spot in a segment as an idea to test next — not a conclusion.

Do these checks change or stop my test?

No. The health card only looks and reports — it never pauses, stops, or edits anything on its own. (If you want a test to stop automatically, that's a separate feature: see Stop Conditions.) The checks update by themselves as new visitors and conversions come in, so the card always reflects the latest data.

Experiment Health

What is the Experiment health card?

It's a little checklist that sits on every test's results page. Before you decide a winner, it quietly checks a few things that often trip people up — like whether you've collected enough visitors and whether your traffic split looks right — and gives each one a simple status. Think of it as a friend who looks over your shoulder and says "hang on, check this first" before you make a call.

Where do I find it?

Open any test and go to its results page. The card is labelled "Experiment health" and starts collapsed, showing a one-line summary like "Looks clean — safe to read" or "A few things worth a look." Click it to open the full list of checks. Click any single check to see why it matters, what your result means right now, and what to do about it.

Every results page has an automatic checklist that tells you whether your numbers are safe to trust yet — and what to fix if they aren’t. Think of it as a friend looking over your shoulder before you call a winner.

On every test’s results page you’ll find a card labelled Experiment health. Before you decide a winner, it quietly checks a handful of things that trip people up — like whether you’ve collected enough visitors and whether your traffic split looks right — and gives each one a simple status. It only looks and reports: it never pauses, stops, or edits your test.

The card starts collapsed, showing a one-line headline like “Looks clean — safe to read” or “A few things worth a look.” Click it to open the full list, then click any single check to see why it matters, what your result means right now, and what to do about it. The headline always reflects the most serious thing the card found.

What the statuses mean

All clear

This check looks healthy. Nothing to do.

Warning

Worth a look before you trust the result.

Issue

Stop — something is likely wrong. Fix it before deciding.

Notice

Just information, not a problem.

The checks it runs

Sample size

Have you collected enough visitors yet?

Compares the visitors you’ve gathered against the number your test was planned to need, and shows roughly how many days are left at your current traffic. Reaching the target is what keeps an exciting-looking result from quietly vanishing once more people arrive. If your score crosses the line early, it lets you know — so you can choose to wait or ship knowingly.

Traffic split

Did each version get its fair share of visitors?

You chose how to split traffic when you set the test up (say, 50/50). This check counts what actually happened. If the real split is far from what you asked for, visitors aren’t being shared out correctly, which bends every other number on the page. Below about 200 visitors in total it waits, because a wobbly split that early is just normal randomness.

Secondary goals

Is a winner quietly hurting your other goals?

A version can win your main goal while harming something else — more sign-ups but fewer purchases, for example. This watches your other goals and warns you if one of them drops meaningfully, so you can decide whether the main win is still worth it.

Mid-test changes

Was the test edited after it started?

Editing a variant, goal, or targeting after a test starts mixes together visitors who saw different things, which can spoil the comparison. This spots edits made during the test window and points you to the version history so you can see exactly what changed.

Active segment filter

Are you looking at everyone, or just a slice?

Appears only when you’ve filtered the results to a slice of visitors (like mobile only, or one country). Slices are smaller and noisier, so a result that looks strong inside one is more likely to be a coincidence. Clear the filter to read the result for all your traffic before deciding.

Getting the most from it

Open it before you call a winner. A glance at the headline tells you whether to trust the page. A red issue means the numbers can’t be trusted until you fix the cause. For a traffic-split problem, you’ll usually also need to start a fresh test once the cause is fixed, because the data already collected can’t be rescued.

A clean card isn’t a guarantee of a real winner. It means the common traps it checks for didn’t fire. You still need a score that’s reached your confidence threshold and a result that makes sense for your business. See Reading Results for how to read the score itself.

Let it do the worrying. The checks update by themselves as new visitors and conversions arrive, so the card always reflects the latest data — you don’t have to re-run anything.

Frequently asked questions

Quick answers to the questions teams ask most about this part of Otter.