A five-step checkout funnel. Browse → Configure → Order Review → Contact Details → Payment. Analytics shows users regularly backtrack to verify their selections before committing. So the team tests a variation - keep product information visible on every screen to reduce uncertainty and drop-off.

The results come in. 236,000 visitors across control and variation.

Metric | Measures | Impact
Browse → Purchase | End-to-end conversion (primary) | +4.13%
Browse → Order Review | Upper-funnel progression | -1.45%
Order Review → Payment | Lower-funnel progression | +5.53%

The readout conclusion:

Value concentrates in the lower funnel, so deploy the variation from Order Review onward and skip the early stages where it causes friction.

Clean logic. Actionable recommendation. Supported by the numbers.

Wrong.

Where the reasoning breaks

Order Review → Payment is not measured across the full randomized population. It is measured only among users who reached Order Review - a self-selected subgroup in both control and variation. The variation can change who makes it through, even when the aggregate pass-through rate stays flat. The -1.45% upper-funnel drop makes a composition shift hard to dismiss, but the problem exists regardless.

This is post-treatment selection bias - conditioning on a variable the treatment itself can influence. The step-level metrics look like they tell you where the variation created value. The experiment never tested that.

It persists because step-level metrics are the default view in every analytics platform. The data is presented this way, so it gets interpreted this way.
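To make the selection effect concrete, here is a minimal sketch in Python. The two segments and every rate in it are invented for illustration, not taken from the experiment. It shows an arm where the upper-funnel pass-through rate is identical to control and no user behaves differently after Order Review, yet the step-level lower-funnel metric still moves.

```python
# Hypothetical segments and rates, invented purely for illustration.
# Each entry: (share of visitors,
#              P(reach Order Review),
#              P(reach Payment | reached Order Review))
control = {
    "high_intent": (0.5, 0.4, 0.8),
    "casual":      (0.5, 0.4, 0.4),
}
# The variation changes WHO reaches Order Review.
# Per-segment lower-funnel rates (third value) are unchanged.
variation = {
    "high_intent": (0.5, 0.5, 0.8),
    "casual":      (0.5, 0.3, 0.4),
}


def step_metrics(arm):
    """Return (upper-funnel pass-through, lower-funnel rate among reachers)."""
    reached = sum(share * reach for share, reach, _ in arm.values())
    paid = sum(share * reach * pay for share, reach, pay in arm.values())
    return reached, paid / reached


control_upper, control_lower = step_metrics(control)
variation_upper, variation_lower = step_metrics(variation)

print(f"upper funnel: control {control_upper:.3f}, variation {variation_upper:.3f}")
print(f"lower funnel: control {control_lower:.3f}, variation {variation_lower:.3f}")
```

With these assumed inputs, both arms pass 40% of visitors to Order Review, but the lower-funnel rate reads 60% in control and 65% in the variation. The entire difference comes from the mix of users who arrived, not from anything that happened after they arrived.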

The thought experiment that makes it obvious

Imagine the variation were not helpful at all. Imagine it were a massive, ugly distraction that cluttered the early screens.

It would filter out all but the most determined buyers before they ever reached Order Review. The casual browsers - the ones most likely to abandon at payment - would already be gone. The surviving population would be disproportionately high-intent.

Lower-funnel conversion would increase. Not because the variation helped anyone, but because the only users left were the ones who were going to buy anyway. The lift would show up in exactly the same place - Order Review → Payment - and it would have nothing to do with the variation’s value in the lower funnel.

I am not saying this is what happened. What I am saying is that the lower-funnel lift is equally consistent with this explanation and with the readout’s. The experiment cannot distinguish between them.
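The thought experiment can be run as a small calculation. The sketch below is in Python and assumes a hypothetical population of determined and casual visitors. The `Segment` class, the `funnel_rates` helper, and all rates are illustrative assumptions. The cluttered arm only adds upper-funnel friction. By construction it changes nothing about how either segment behaves after Order Review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    share: float             # fraction of randomized visitors (assumed)
    reach_review: float      # P(reach Order Review) (assumed)
    pay_given_review: float  # P(reach Payment | reached Order Review) (assumed)


def funnel_rates(segments):
    """Return (upper-funnel, lower-funnel, end-to-end) rates for one arm."""
    reached = sum(s.share * s.reach_review for s in segments)
    paid = sum(s.share * s.reach_review * s.pay_given_review for s in segments)
    return reached, paid / reached, paid


# Order: determined buyers first, casual browsers second.
# pay_given_review is identical across arms: the clutter helps no one.
control = [Segment(0.2, 0.80, 0.9), Segment(0.8, 0.30, 0.2)]
cluttered = [Segment(0.2, 0.75, 0.9), Segment(0.8, 0.15, 0.2)]

c_upper, c_lower, c_total = funnel_rates(control)
v_upper, v_lower, v_total = funnel_rates(cluttered)

for name, old, new in [
    ("Browse -> Order Review", c_upper, v_upper),
    ("Order Review -> Payment", c_lower, v_lower),
    ("Browse -> Payment", c_total, v_total),
]:
    print(f"{name}: {old:.3f} -> {new:.3f} ({(new - old) / old:+.1%})")
```

Under these made-up numbers, Order Review → Payment rises from 48% to roughly 59% even though the variation does nothing for anyone past Order Review. The sketch isolates the selection mechanism only and is not meant to reproduce the experiment's figures: here end-to-end conversion falls, which is why the end-to-end metric is the one to trust.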

What the upper-funnel decline actually tells you

The -1.45% has at least four possible explanations, and each implies a different decision:

  • Friction. Visual clutter slows users down. Remove the variation from the upper funnel. This is what the readout assumed.
  • Deliberate decision-making. Users validate more carefully before progressing. Keep the variation - it is working.
  • Earlier error correction. Users catch configuration mistakes sooner instead of discovering them at payment. Keep it - fewer downstream reversals.
  • Composition shift. A different population reaches Order Review. Every downstream metric is measured against a changed baseline. You cannot act on this without a separate test.

Only one of these supports the readout’s recommendation. The data does not tell you which one you are looking at.

Bottom line

The experiment tested one thing: product information visible throughout the funnel. That version lifted end-to-end purchase conversion by 4.13%. Ship it as tested.

Everything else - where the value lives, which stages to keep, which to cut - is a story the data cannot tell. The numbers feel precise enough to answer those questions. That is exactly why they are dangerous.