A/B Test Sample Size Calculator (2026)
How many visitors per variant before your A/B test is statistically valid? Adjust baseline conversion, MDE, significance, and power; the numbers update instantly.
100% client-side. Your inputs stay in this browser.
Set your baseline conversion rate, the smallest lift you care about detecting, and how strict you want to be. The numbers below update instantly.
Per variant: 31,235 visitors needed in each variant
Total: 62,470 across control + variant
Estimated duration: 32 days at 1,000/day per variant
Test setup
95% conf · 80% power
Two-sided test, p1 = 5.00% → p2 = 5.50%
Sensitivity
If you tighten the MDE to +5.0% you need roughly 122,126 per variant. Loosen it to +20.0% and you only need 8,159.
Sample size as MDE changes
How per-variant sample size shrinks as the smallest lift you care about gets larger, at your current baseline, 95% confidence, 80% power.
How this tool works
The calculator uses the standard two-proportion z-test formula for sample size. You provide four parameters: your baseline conversion rate, the minimum detectable effect (the smallest relative improvement worth detecting), the statistical significance level (expressed here as a confidence level such as 95%, meaning you accept a 5% false-positive rate), and statistical power (the probability of detecting a real effect when one exists). The tool converts your significance and power levels into z-scores, computes the variance at the average of the two conversion rates, and outputs the visitors per variant needed to detect your specified lift: n = (z_alpha + z_beta)^2 x 2 x p_avg x (1 - p_avg) / (p2 - p1)^2, rounded up. It also calculates test duration by dividing the per-variant sample size by your daily traffic per variant and rounding up to whole days. A curve chart shows how sample size changes as you adjust the MDE, so you can see the trade-off between sensitivity and test duration.
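The steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the tool's actual source: the function names `sample_size_per_variant` and `duration_days` are hypothetical, and the sketch assumes the pooled-variance form of the formula with results rounded up to whole visitors and whole days.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, confidence=0.95,
                            power=0.80, two_sided=True):
    """Visitors per variant for a two-proportion z-test (pooled-variance form)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)        # relative lift -> absolute rate
    alpha = 1 - confidence                    # accepted false-positive rate
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)  # z-score for significance
    z_beta = NormalDist().inv_cdf(power)      # z-score for power
    p_avg = (p1 + p2) / 2
    variance = 2 * p_avg * (1 - p_avg)        # both arms, at the average rate
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)                       # assumption: round up


def duration_days(n_per_variant, daily_per_variant):
    """Whole days needed at a steady daily traffic level (rounded up)."""
    return math.ceil(n_per_variant / daily_per_variant)


n = sample_size_per_variant(0.05, 0.10)
print(n, 2 * n, duration_days(n, 1000))  # 31235 62470 32
```

With a 5% baseline and a 10% relative MDE, the sketch reproduces the per-variant, total, and duration figures shown in the results panel.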
Worked example
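Baseline: 5%. MDE: 10% relative (detecting a lift to 5.5%, an absolute difference of 0.005). Significance: 95% two-sided (z = 1.95996). Power: 80% (z = 0.84162). Daily traffic per variant: 1,000. p_avg = (0.05 + 0.055) / 2 = 0.0525. Variance term: 2 x 0.0525 x 0.9475 = 0.0994875. n = (1.95996 + 0.84162)^2 x 0.0994875 / 0.005^2 = 31,235 per variant after rounding up. Total sample: 62,470. Test duration: 31,235 / 1,000 = 31.2, rounded up to 32 days. Rounding the z-scores to 1.96 and 0.842 gives a slightly higher figure (about 31,244), so keep full precision if you want to match the tool. Widening MDE to 20% drops the sample to roughly 8,159 per variant and the test to 9 days. The MDE curve in the tool shows this trade-off across all values.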
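To see the trade-off across several MDE values at once, the same calculation can be run in a loop. The sketch below is illustrative only: `n_per_variant` and `mde_sweep` are hypothetical names, and it assumes a two-sided test with days rounded up.

```python
import math
from statistics import NormalDist


def n_per_variant(baseline, relative_mde, confidence=0.95, power=0.80):
    """Two-sided, pooled-variance sample size per variant (rounded up)."""
    p2 = baseline * (1 + relative_mde)
    z_sum = (NormalDist().inv_cdf(1 - (1 - confidence) / 2)
             + NormalDist().inv_cdf(power))
    p_avg = (baseline + p2) / 2
    return math.ceil(z_sum ** 2 * 2 * p_avg * (1 - p_avg) / (p2 - baseline) ** 2)


def mde_sweep(baseline, mdes, daily_per_variant):
    """Return (mde, visitors per variant, days) for each relative MDE."""
    rows = []
    for mde in mdes:
        n = n_per_variant(baseline, mde)
        rows.append((mde, n, math.ceil(n / daily_per_variant)))
    return rows


for mde, n, days in mde_sweep(0.05, [0.05, 0.10, 0.20], 1000):
    print(f"MDE +{mde:.0%}: {n:,} per variant, {days} days")
```

The 5% and 20% rows should land within a visitor or two of the Sensitivity figures above (122,126 and 8,159); tiny differences can come from how the z-scores are approximated.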
Frequently asked questions
What does "minimum detectable effect" mean?
MDE is the smallest improvement you care about finding. A 10% MDE on a 5% baseline means you want to detect a lift to 5.5%. Smaller MDEs require larger samples. If you set the MDE too small, the test runs for months. If you set it too large, you might miss a real but modest improvement.
Should I use a one-sided or two-sided test?
Use two-sided (the default) unless you are certain the change can only improve the metric, never hurt it. Two-sided tests detect both positive and negative effects. One-sided tests require fewer visitors but cannot flag a variation that performs worse than the control. Most testing platforms default to two-sided. Whichever you choose, decide before the test starts; switching after you have seen the data inflates the false-positive rate.
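To get a feel for how many visitors a one-sided test saves, the comparison can be sketched as follows. The function name `n_per_variant` is hypothetical; the only difference between the two calls is whether the 5% false-positive budget is split across two tails or spent on one.

```python
import math
from statistics import NormalDist


def n_per_variant(baseline, relative_mde, confidence=0.95, power=0.80,
                  two_sided=True):
    """Pooled-variance sample size per variant for a two-proportion z-test."""
    p2 = baseline * (1 + relative_mde)
    alpha = 1 - confidence
    tail = alpha / 2 if two_sided else alpha   # one tail keeps all of alpha
    z_sum = NormalDist().inv_cdf(1 - tail) + NormalDist().inv_cdf(power)
    p_avg = (baseline + p2) / 2
    return math.ceil(z_sum ** 2 * 2 * p_avg * (1 - p_avg) / (p2 - baseline) ** 2)


two = n_per_variant(0.05, 0.10, two_sided=True)
one = n_per_variant(0.05, 0.10, two_sided=False)
print(f"two-sided: {two:,}  one-sided: {one:,}  saving: {1 - one / two:.0%}")
```

At a 5% baseline and 10% relative MDE, the one-sided requirement comes out roughly a fifth smaller, which is the price of giving up the ability to detect a harmful change.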
What happens if I stop the test before reaching the sample size?
You risk a false positive or false negative. Stopping early because the result looks significant inflates your error rate. The sample size this tool calculates is the minimum needed to trust the result at your chosen significance and power levels. Run the full duration.
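A small simulation makes the inflation concrete. The sketch below runs A/A tests (both arms share the same true rate, so every "significant" result is a false positive) and compares checking at every interim look against checking once at the end. All names and parameters here (`simulate_aa`, 10 looks, 200 visitors per arm per look) are illustrative assumptions, not settings from the tool.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of equal size n."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:                      # no conversions yet in either arm
        return 0.0
    return (conv_b - conv_a) / n / se


def simulate_aa(runs=400, looks=10, per_look=200, rate=0.05,
                z_crit=1.96, seed=1):
    """Return (false-positive rate when peeking, rate when testing once)."""
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        crossed = False
        significant = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < rate for _ in range(per_look))
            conv_b += sum(rng.random() < rate for _ in range(per_look))
            significant = abs(z_stat(conv_a, conv_b, look * per_look)) > z_crit
            crossed = crossed or significant   # stop-at-first-win behaviour
        peek_hits += crossed
        final_hits += significant              # result at the last look only
    return peek_hits / runs, final_hits / runs


peek_rate, final_rate = simulate_aa()
print(f"peeking: {peek_rate:.1%}  single final test: {final_rate:.1%}")
```

The single final test should stay near the nominal 5%, while stopping at the first significant look produces a noticeably higher false-positive rate.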
Is the test-duration estimate accurate?
Yes, if your daily traffic is steady. If traffic spikes on weekends or drops during holidays, the actual duration will differ. The estimate assumes even daily traffic and rounds up to whole days. Run tests in full-week increments to account for weekly traffic cycles.
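Rounding the estimate up to full weeks is a one-line adjustment. A minimal sketch, with hypothetical helper names and the same steady-traffic assumption as the estimate itself:

```python
import math


def duration_days(n_per_variant, daily_per_variant):
    """Whole days needed at a steady daily traffic level (rounded up)."""
    return math.ceil(n_per_variant / daily_per_variant)


def round_up_to_full_weeks(days):
    """Extend a duration to the next multiple of 7 days."""
    return math.ceil(days / 7) * 7


days = duration_days(31_235, 1_000)
print(days, round_up_to_full_weeks(days))  # 32 35
```

For the default example, the 32-day estimate becomes a 35-day (five-week) test, so every weekday is represented equally.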
What baseline conversion rate should I use?
Use your last 30 days of conversion data for the page or flow you are testing. Do not use site-wide averages, which blend high-intent and low-intent pages. A checkout page at 3% is a very different test than a homepage CTA at 0.5%.
Why does lowering the MDE increase the sample size so much?
Because detecting a small difference between two similar proportions requires more data to distinguish the signal from noise. The relationship is roughly inverse-square: halving the MDE quadruples the required sample. This is a fundamental property of the statistical test, not a limitation of this tool.
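The "roughly" can be checked directly. When the MDE changes, the z-scores stay the same and cancel out of the ratio, so only the variance term and the squared difference matter. The helper name below is hypothetical, and the sketch assumes the pooled-variance form of the formula.

```python
def variance_over_delta_sq(baseline, relative_mde):
    """The part of the sample-size formula that depends on the MDE."""
    p2 = baseline * (1 + relative_mde)
    p_avg = (baseline + p2) / 2
    return 2 * p_avg * (1 - p_avg) / (p2 - baseline) ** 2


# Halving the MDE from 10% to 5% at a 5% baseline.
ratio = (variance_over_delta_sq(0.05, 0.05)
         / variance_over_delta_sq(0.05, 0.10))
print(f"{ratio:.2f}")  # 3.91
```

The ratio falls slightly short of 4 because a smaller lift also lowers the average conversion rate, which shrinks the variance term a little.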