Comparing Two Proportions - Sample Size - Select Statistical Consultants

Calculator
What confidence level do you need? Typical choices are 90%, 95% or 99%	%	This reflects the confidence with which you would like to detect a significant difference between the two proportions. The higher the confidence level, the larger the sample size.
What power do you need? A common choice is 80%	%	The power is the probability of detecting a significant difference when one exists. The higher the power, the larger the sample size.
What do you believe the likely sample proportion in group 1 to be?	%	What do you expect the sample proportion to be? This can often be determined by using the results from a previous survey, or by running a small pilot study.
What do you believe the likely sample proportion in group 2 to be?	%	What do you expect the sample proportion to be? This can often be determined by using the results from a previous survey, or by running a small pilot study.
Your recommended sample size is	95	This is the minimum sample size you need for each group to detect whether the stated difference exists between the two proportions (with the required confidence level and power).

Alternative Scenarios
With a confidence level of	%	%	%
Your sample size would be	75	95	141
With a power of	%	%	%
Your sample size would be	75	95	127
With a sample proportion in group 1 of	%	%	%
And with a sample proportion in group 2 of	%	%	%
Your sample size would be	1531	385	40

More Information

Worked Example

Before implementing a new marketing promotion for a product stocked in a supermarket, you would like to ensure that the promotion results in a significant increase in the number of customers who buy the product. Currently 15% of customers buy this product and you would like to see uptake increase to 25% in order for the promotion to be cost effective. In this case you would need to compare 248 customers who have received the promotional material and 248 who have not to detect a difference of this size (given a 95% confidence level and 80% power).

Formula

This calculator uses the following formula for the sample size n:

n = (Z_α/2+Z_β)² * (p₁(1-p₁)+p₂(1-p₂)) / (p₁-p₂)²,

where Z_α/2 is the critical value of the Normal distribution at α/2 (e.g. for a confidence level of 95%, α is 0.05 and the critical value is 1.96), Z_β is the critical value of the Normal distribution at β (e.g. for a power of 80%, β is 0.2 and the critical value is 0.84) and p₁ and p₂ are the expected sample proportions of the two groups.
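
As a rough illustration, the formula above can be sketched in Python using only the standard library. The function name `sample_size_two_proportions` and its parameter names are hypothetical, chosen for this example, and the calculator's own implementation may differ (for instance in how it rounds). The sketch uses unrounded critical values rather than 1.96 and 0.84.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, confidence=0.95, power=0.80):
    """Per-group sample size for detecting a difference between p1 and p2.

    Hypothetical helper: a direct transcription of the formula above,
    with proportions, confidence and power given as fractions (0 to 1).
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ, otherwise n is undefined")
    alpha = 1.0 - confidence
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)                # about 0.84 for 80% power
    variance_sum = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2
    return ceil(n)  # round up to a whole number of subjects per group


# Worked example from this page: 15% vs 25%, 95% confidence, 80% power.
print(sample_size_two_proportions(0.15, 0.25))  # prints 248
```

With the rounded critical values 1.96 and 0.84 the same arithmetic gives about 246.96, so the unrounded values are what push the result up to the 248 quoted in the worked example.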

Note: A reference to this formula can be found in the following paper (pages 3-4; section 3.1 Test for Equality).

Wang, H. and Chow, S.-C. 2007. Sample Size Calculation for Comparing Proportions. Wiley Encyclopedia of Clinical Trials.

Discussion

The above sample size calculator provides you with the recommended number of samples required to detect a difference between two proportions. By changing the four inputs (the confidence level, power and the two group proportions) in the Alternative Scenarios, you can see how each input is related to the sample size and what would happen if you didn’t use the recommended sample size.
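
The effect of changing one input at a time can also be explored in code. The sketch below is only an illustration: it assumes the worked example's proportions (15% and 25%) and 80% power, varies the confidence level, and uses a hypothetical helper `per_group_n` that restates the formula from the Formula section.

```python
from math import ceil
from statistics import NormalDist


def per_group_n(p1, p2, confidence, power):
    """Hypothetical helper: per-group sample size from the formula on this page."""
    z_alpha = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Vary the confidence level while holding the other three inputs fixed.
scenarios = {conf: per_group_n(0.15, 0.25, conf, 0.80) for conf in (0.90, 0.95, 0.99)}
for conf, n in scenarios.items():
    print(f"confidence {conf:.0%}: n = {n} per group")
```

Under these assumptions the required sample size rises from 195 to 248 to 368 per group as the confidence level goes from 90% to 95% to 99%, which matches the pattern described in the calculator's help text.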

For some further information, see our blog post on The Importance and Effect of Sample Size.

Most sample size calculations assume that the population is large (or even infinite). With a finite, small population, the variability of the sample is actually less than expected, and therefore a “finite population correction”, FPC, can be applied to account for this greater efficiency in the sampling process.

For a large population (greater than 100,000 or so), no correction to the standard sample size formulae is normally needed: the FPC has little effect and the sample size will be similar to that for an infinite population. This is explained in more detail in our blog: Why Use A Complex Sample For Your Survey.

However, the effect of the FPC will be noticeable if one or both of the population sizes (N₁ and N₂) is not large relative to n in the formula above. To apply a finite population correction to the sample size calculation for comparing two proportions, we multiply each group's variance term by its correction factor, f₁=(N₁-n)/(N₁-1) and f₂=(N₂-n)/(N₂-1):

n = (Z_α/2+Z_β)² * (f₁*p₁(1-p₁)+f₂*p₂(1-p₂)) / (p₁-p₂)²

Because f₁ and f₂ themselves depend on n, we rearrange to solve for n, which gives:

n = X*A / (1 + X*B),

where

X = (Z_α/2+Z_β)² / (p₁-p₂)²,

A = (N₁/(N₁-1))*(p₁*(1-p₁)) + (N₂/(N₂-1))*(p₂*(1-p₂)), and

B = (1/(N₁-1))*(p₁*(1-p₁)) + (1/(N₂-1))*(p₂*(1-p₂))
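
The corrected formula can be sketched in the same way. This is a minimal illustration rather than the calculator's actual code: the function name `sample_size_with_fpc` and the example population sizes of 1,000 are assumptions made for this example, while X, A and B follow the definitions above.

```python
from math import ceil
from statistics import NormalDist


def sample_size_with_fpc(p1, p2, N1, N2, confidence=0.95, power=0.80):
    """Per-group sample size with a finite population correction.

    Hypothetical helper implementing n = X*A / (1 + X*B) as defined above.
    N1 and N2 are the population sizes of the two groups (each must be > 1).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    v1 = p1 * (1.0 - p1)  # variance term for group 1
    v2 = p2 * (1.0 - p2)  # variance term for group 2
    X = (z_alpha + z_beta) ** 2 / (p1 - p2) ** 2
    A = (N1 / (N1 - 1)) * v1 + (N2 / (N2 - 1)) * v2
    B = v1 / (N1 - 1) + v2 / (N2 - 1)
    return ceil(X * A / (1.0 + X * B))


# Assumed example: the worked example's proportions with two populations of 1,000.
print(sample_size_with_fpc(0.15, 0.25, N1=1000, N2=1000))    # prints 199
# With very large populations the correction has almost no effect.
print(sample_size_with_fpc(0.15, 0.25, N1=10**9, N2=10**9))  # prints 248
```

In this assumed scenario the correction reduces the requirement from 248 to 199 per group, and the result returns to the uncorrected figure as the populations grow.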

Definitions

Confidence level

This reflects the confidence with which you would like to detect a significant difference between the two proportions. If your confidence level is 95%, then this means you have a 5% probability of incorrectly detecting a significant difference when one does not exist, i.e., a false positive result (otherwise known as type I error).

Power

The power is the probability of detecting a significant difference when one exists. If your power is 80%, then this means that you have a 20% probability of failing to detect a significant difference when one does exist, i.e., a false negative result (otherwise known as type II error).

Sample Proportions

The sample proportions are what you expect the results to be. This can often be determined by using the results from a previous survey, or by running a small pilot study. If you are unsure, use proportions near 50%, which is conservative and gives the largest sample size. Note that this sample size calculation uses the Normal approximation to the Binomial distribution. If one or both of the sample proportions are close to 0 or 1, then this approximation is not valid and you need to consider an alternative sample size calculation method.

Sample size

This is the minimum sample size for each group to detect whether the stated difference exists between the two proportions (with the required confidence level and power). Note that if some people choose not to respond they cannot be included in your sample, so if non-response is a possibility your sample size will have to be increased accordingly. In general, the higher the response rate the better the estimate, as non-response will often lead to biases in your estimate.
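
One common way to allow for non-response, shown here only as an assumption since this page does not prescribe a method, is to divide the required sample size by the expected response rate. The function name `inflate_for_non_response` and the 75% response rate are hypothetical.

```python
from math import ceil


def inflate_for_non_response(n_required, response_rate):
    """Number of people to approach per group so that about n_required respond.

    Assumption: a simple division by the expected response rate (0 < rate <= 1).
    This adjusts only the count; it does not address non-response bias.
    """
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return ceil(n_required / response_rate)


# Assumed example: 248 respondents needed per group, 75% expected to respond.
print(inflate_for_non_response(248, 0.75))  # prints 331
```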
