How do you calculate needed sample size? How do you select sample size needed for AB Test?

4 Answers
MockInterview Staff answered 7 years ago

Estimate a population mean:
– General formula is $ME = t \times \frac{s}{\sqrt{n}}$ or $ME = z \times \frac{s}{\sqrt{n}}$
$ME$ is the desired margin of error
$t$ (or $z$) is the t score or z score that we need to use to calculate our confidence interval
$s$ is the standard deviation
Example: we would like to start a study to estimate the average weekly internet usage of households for our business plan. How many households must we randomly select to be 95% confident that the sample mean is within 1 minute of the true population mean? A previous survey of household usage has shown a standard deviation of 6.95 minutes.

  • Z score corresponding to a 95% interval: 1.96 (the 97.5th percentile, $\alpha/2 = 0.025$)
  • $s = 6.95$
  • $n = \left(\frac{z \times s}{ME}\right)^2 = (1.96 \times 6.95)^2 = 13.62^2 \approx 186$
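
The same arithmetic can be checked in R. This is only a minimal sketch: the variable names are illustrative, and `qnorm()` is used in place of the rounded 1.96.

```r
# Sample size needed to estimate a mean within a given margin of error.
conf_level <- 0.95
s  <- 6.95   # standard deviation from the previous survey (minutes)
me <- 1      # desired margin of error (minutes)

z <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 for a 95% interval
n_mean <- ceiling((z * s / me)^2)     # round up to a whole household
n_mean
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.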

Estimate a proportion:
– Similar: $ME = z \times \sqrt{\frac{p(1-p)}{n}}$
Example: a professor at Harvard wants to determine the proportion of students who support gay marriage. She asks “how large a sample do I need?”
She wants a margin of error of less than 2.5%, and she has found a previous survey which indicates a proportion of 30%.
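
The example does not state a confidence level, so the sketch below assumes 95% (an assumption, not something given in the question). It solves the margin-of-error formula for a proportion, $n = z^2 \, p(1-p) / ME^2$, using illustrative variable names.

```r
# Sample size needed to estimate a proportion within a given margin of error.
# ASSUMPTION: 95% confidence level; the example itself does not specify one.
p  <- 0.30    # proportion suggested by the previous survey
me <- 0.025   # desired margin of error

z <- qnorm(0.975)                            # two-sided 95% critical value
n_prop <- ceiling(z^2 * p * (1 - p) / me^2)  # round up to a whole student
n_prop
```

Under that assumption the professor would need roughly 1,291 students. If no prior estimate were available, using p = 0.5 would give the most conservative (largest) sample size.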

MockInterview Staff answered 7 years ago

How to Determine the Statistical Significance of an A/B Test:

MockInterview Staff answered 7 years ago

If you are asked to do this in R: 

power.prop.test(p1=0.1, p2=0.11, power=0.8, alternative='two.sided', sig.level=0.05)
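
The call above prints a summary; to use the result in later code, the per-group sample size can be pulled out of the returned object. A small sketch (the variable names are illustrative):

```r
# power.prop.test() returns a list; n is the required size of EACH group.
res <- power.prop.test(p1 = 0.1, p2 = 0.11, power = 0.8,
                       alternative = "two.sided", sig.level = 0.05)

n_per_group <- ceiling(res$n)  # n is fractional, so round up
n_per_group
```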

Explanation is here:

Staff answered 7 years ago

You want to run an AB test. How many participants do you need in your test?
As always, the answer is “it depends”. In this case, it depends on:

  1. What your base conversion rate is.
  2. How large of a difference you want to be able to detect.
  3. How concerned you are about Type I and Type II errors (false positives and false negatives)

There is no generic rule of thumb. Don’t trust any advice like “you want about 3,000 people in the test to be confident.” The correct sample size always depends on these 3 parameters for your specific test.

  • The lower the base conversion rate, the more participants you’re going to need (for a given relative difference)
  • To detect smaller differences you’re going to need more participants
  • If you want to increase your confidence in your result, you guessed it, you’re going to need more participants.
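
These three effects can be seen by varying one input at a time. The sketch below uses a hypothetical helper, `n_for()`, that wraps `power.prop.test()`; the specific rates are illustrative choices, not figures from a real test.

```r
# Hypothetical helper: per-group sample size for a two-sided test.
n_for <- function(p1, p2, power = 0.8, sig.level = 0.05) {
  ceiling(power.prop.test(p1 = p1, p2 = p2, power = power,
                          sig.level = sig.level)$n)
}

baseline       <- n_for(0.25, 0.275)                    # 25% base, 10% relative lift
lower_base     <- n_for(0.10, 0.11)                     # lower base, same relative lift
smaller_diff   <- n_for(0.25, 0.2625)                   # 5% relative lift instead of 10%
more_power     <- n_for(0.25, 0.275, power = 0.9)       # fewer false negatives
stricter_alpha <- n_for(0.25, 0.275, sig.level = 0.01)  # fewer false positives

c(baseline = baseline, lower_base = lower_base, smaller_diff = smaller_diff,
  more_power = more_power, stricter_alpha = stricter_alpha)
```

Each variation should require more participants per group than the baseline.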

How to calculate necessary sample size
If you know your base conversion rate and the size of the difference you wish to detect, it is easy to calculate the necessary sample size using R.

power.prop.test(p1=0.25, p2=0.275, power=0.8, alternative='two.sided', sig.level=0.05)
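
To see where the number comes from, here is a sketch of the usual normal-approximation formula for comparing two proportions, written out by hand. It should agree closely with `power.prop.test()` for these inputs; the variable names are illustrative.

```r
p1 <- 0.25; p2 <- 0.275
alpha <- 0.05; power <- 0.8

z_alpha <- qnorm(1 - alpha / 2)  # critical value for the two-sided test
z_beta  <- qnorm(power)          # quantile matching the desired power
p_bar   <- (p1 + p2) / 2         # pooled rate under the null hypothesis

# Per-group sample size from the normal approximation.
n_closed <- ((z_alpha * sqrt(2 * p_bar * (1 - p_bar)) +
              z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) / (p2 - p1))^2

n_builtin <- power.prop.test(p1 = p1, p2 = p2, power = power,
                             sig.level = alpha)$n

c(closed_form = ceiling(n_closed), builtin = ceiling(n_builtin))
```

The denominator `(p2 - p1)` is squared, which is why halving the detectable difference roughly quadruples the required sample.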

So, in an ideal world you would run all tests as follows:

  1. Track your base conversion rate. For example, 25% of people who reach a registration page successfully register.
  2. Agree on the size of the difference you want to detect. You may only care about detecting relative differences of 10% or more (27.5% or better conversion using the example above).
  3. Decide on the desired significance level. This is the chance of a false positive. It is common to use 0.05 (which represents a 5% chance of a false positive).
  4. Decide on the desired statistical power. This is the chance of detecting a difference when one really exists, i.e. one minus the chance of a false negative. It is common to use 0.80 (which means that if there is a difference there is a 20% chance we’ll miss it).
  5. Calculate the necessary sample size as described above. Using these examples we would need to have 4862 people in each group.
  6. Run the test until you have enough participants in both your control and treatment. Don’t look at the results while the test is running.
  7. End the test
  8. Analyze the test
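
The final analysis step might look like the sketch below. The conversion counts are hypothetical, made up purely for illustration, and `prop.test()` is one reasonable choice of test rather than the only one.

```r
# Sample size planned before the test (25% base rate, 27.5% target).
n_needed <- ceiling(power.prop.test(p1 = 0.25, p2 = 0.275, power = 0.8,
                                    sig.level = 0.05)$n)

# HYPOTHETICAL results collected once both groups reached n_needed.
control_n   <- 4862; control_conv   <- 1216
treatment_n <- 4862; treatment_conv <- 1337

# Only analyze once the planned sample size has been reached.
stopifnot(control_n >= n_needed, treatment_n >= n_needed)

result <- prop.test(x = c(control_conv, treatment_conv),
                    n = c(control_n, treatment_n))
result$p.value   # compare against the significance level chosen in step 3
result$conf.int  # interval for the difference in conversion rates
```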

Unfortunately, that isn’t how it normally goes in the real world:

  • We often don’t know what the baseline conversion is. Often, conversion rates for the control aren’t clearly tracked until you start the test. Sometimes you’re unable to effectively baseline a conversion rate because it varies wildly. I have a little bit of experience optimizing ecommerce sites where inventory is only available for a limited time. The quality of inventory can have a large effect on the conversion rate, so it is very difficult to compare conversion rates across time.
  • Most AB testing software provides real-time results, which makes it easy to fall victim to repeated significance testing errors.
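
The repeated significance testing problem can be illustrated with a small simulation. In this sketch both groups share the same true conversion rate, so every "significant" result is a false positive. The number of simulations, group size, and peeking schedule are arbitrary illustrative choices.

```r
set.seed(1)
n_sims      <- 500
n_per_group <- 1000
looks       <- seq(100, n_per_group, by = 100)  # peek every 100 participants
base_rate   <- 0.25                             # same true rate in both groups

sim_once <- function() {
  a <- rbinom(n_per_group, 1, base_rate)
  b <- rbinom(n_per_group, 1, base_rate)
  # p-value at each peek, using only the participants seen so far
  p_at_look <- sapply(looks, function(k) {
    prop.test(c(sum(a[1:k]), sum(b[1:k])), c(k, k))$p.value
  })
  c(peeking = any(p_at_look < 0.05, na.rm = TRUE),  # stop at first "win"
    final   = p_at_look[length(looks)] < 0.05)      # look only at the end
}

results    <- replicate(n_sims, sim_once())
fp_peeking <- mean(results["peeking", ])
fp_final   <- mean(results["final", ])
c(peeking = fp_peeking, final_only = fp_final)
```

Because the last peek is also the final analysis, the peeking false positive rate can never be lower than the final-only rate, and in practice it tends to be noticeably higher.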

To combat these pitfalls we can use the control group’s observed rate as an approximation of the true base conversion rate. We can then plug that rate into the sample size calculation to figure out how much longer we need to run the test to detect a difference of the size observed so far.
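
A sketch of that approach, using hypothetical interim counts (the numbers are invented for illustration). Note that a difference estimated from interim data is itself noisy, so the result is a rough planning figure.

```r
# HYPOTHETICAL interim counts from a test that is still running.
control_n   <- 2000; control_conv   <- 500
treatment_n <- 2000; treatment_conv <- 550

p_control   <- control_conv / control_n      # stand-in for the base rate
p_treatment <- treatment_conv / treatment_n  # observed treatment rate

# Per-group sample size needed to detect a difference of the observed size.
n_required <- ceiling(power.prop.test(p1 = p_control, p2 = p_treatment,
                                      power = 0.8, sig.level = 0.05)$n)

# Participants still needed in the smaller group before analyzing.
remaining <- max(0, n_required - min(control_n, treatment_n))
c(required_per_group = n_required, remaining_per_group = remaining)
```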

