Written by RAG- AMMA ERF (Research Action Group)
In this post, we will look at why sample size matters and at the assumptions behind estimating it.
You have defined your research question. You have a hypothesis in hand and are ready to start a study that will either support or refute it. You have searched the literature and found evidence that your question is worth pursuing. You have chosen an appropriate design for your study and decided which variables to study. (We hope these statements are not just assumptions and that you really have done all of this!) You have now reached a very important phase: how many subjects should you study?
HOW MANY SUBJECTS DO I STUDY?
The number of subjects to study is important for several reasons:
- It ensures that there are enough subjects in the study to minimize the risk of drawing wrong conclusions.
- It tells you, before you actually start, what resources are needed to complete the study, and so gives a very good indication of whether the study is feasible within your resources.
- It assumes particular significance when we do not find a significant difference between groups: we need to show that this lack of difference is real and not simply the result of studying too few subjects.
Needless to say, the sample size for a study has to be estimated before the study starts.
Sample size estimation can turn out to be complex, especially for RCTs, and you may be better off letting a biostatistician do it. However, knowing the assumptions behind sample size estimation will help you give the biostatistician meaningful information, which in turn makes the estimate easier to produce.
| Results of your study | The truth: treatments are not different | The truth: treatments are different |
|---|---|---|
| Conclude treatments are not different | Correct decision | Type II error (β) |
| Conclude treatments are different | Type I error (α) | Correct decision (1 − β): POWER |
α = probability of a Type I error: the probability of concluding that the treatments are different when they actually do not differ.
β = probability of a Type II error: the probability of concluding that the treatments do not differ when they actually do.
Power (1 − β) = probability of correctly concluding that the treatments differ, i.e. the probability of detecting a difference between the treatments when one truly exists.
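To make α, β and power concrete, here is a minimal Python sketch (our illustration, not a method from this post) that approximates the power of a two-sided z-test comparing two group means. It assumes normally distributed outcomes with a known standard deviation that is the same in both groups; the function name two_sided_power and the example numbers are hypothetical. When the true difference is zero, the value it returns is just α, the Type I error rate; as the number of subjects per group grows, power rises and β falls.

```python
from statistics import NormalDist


def two_sided_power(delta, sd, n_per_group, alpha=0.05):
    """Hypothetical helper: approximate power of a two-sided z-test for two means.

    delta       -- true difference between the group means (assumed)
    sd          -- common standard deviation in both groups (assumed known)
    n_per_group -- number of subjects in each of the two groups
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value, e.g. 1.96 for alpha = 0.05
    se = sd * (2 / n_per_group) ** 0.5  # standard error of the difference in means
    shift = abs(delta) / se  # how far the true effect moves the test statistic
    # Probability that the test statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# No true difference: the "power" is the Type I error rate alpha
print(f"delta=0: P(conclude different) = {two_sided_power(0, 10, 50):.3f}")

# A real difference of 5 units (SD 10): power grows and beta shrinks with n
for n in (20, 50, 100):
    power = two_sided_power(delta=5, sd=10, n_per_group=n)
    print(f"n={n} per group: power={power:.2f}, beta={1 - power:.2f}")
```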
Sample size for a proportion (descriptive study)
Input Data:
- Population size
- Expected prevalence of the condition of interest or the expected proportion
- Confidence limits, i.e. the absolute precision (how much deviation from the true value you are willing to tolerate; usually kept between 5% and 15%, but this can vary depending on what you are studying)
- Design effect for complex cluster surveys (used when subjects are not selected by simple random or systematic sampling, for example in cluster sampling)
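One common way to turn these inputs into a number is Cochran's formula, n0 = z²p(1 − p)/d², followed by a finite population correction and multiplication by the design effect. The Python sketch below assumes that approach; the function sample_size_proportion and its parameter names are hypothetical, and your biostatistician or software may use a different variant.

```python
import math
from statistics import NormalDist


def sample_size_proportion(p, d, population=None, confidence=0.95, deff=1.0):
    """Hypothetical helper: subjects needed to estimate a single proportion.

    p          -- expected prevalence/proportion, as a fraction (0.30 = 30%)
    d          -- absolute precision, as a fraction (0.05 = +/- 5 percentage points)
    population -- population size N; None treats the population as very large
    deff       -- design effect (1.0 for simple random sampling)
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = z**2 * p * (1 - p) / d**2  # Cochran's formula for a very large population
    if population is not None:
        n = n / (1 + (n - 1) / population)  # finite population correction
    return math.ceil(n * deff)  # inflate for cluster designs, then round up


print(sample_size_proportion(0.5, 0.05))                   # -> 385
print(sample_size_proportion(0.5, 0.05, population=1000))  # -> 278
print(sample_size_proportion(0.5, 0.05, deff=2.0))         # -> 769
```

Using p = 0.5 gives the largest (most conservative) sample size when you have no idea of the expected prevalence.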
Sample size for an unmatched case control study
Input Data:
- Two-sided confidence level (1 − α): usually chosen between 90% and 99%, most commonly 95%
- Power (1 − β): usually 80% or higher; 80% is generally regarded as the minimum
- Ratio of cases to controls
- Percentage of controls exposed to the factor of interest (between 0 and 99.99)
- Percentage of cases with exposure to the factor of interest (between 0 and 99.99)
- Odds ratio of interest
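As an illustration only, the Python sketch below computes the number of cases and controls using the normal-approximation formula commonly attributed to Fleiss, without continuity correction. It assumes the ratio is expressed as controls per case and that exposure is given as fractions rather than percentages; the function case_control_sample_size and its parameters are hypothetical. The percentage of cases exposed is derived here from the odds ratio, since fixing the control exposure and the odds ratio determines it.

```python
import math
from statistics import NormalDist


def case_control_sample_size(p0, odds_ratio, controls_per_case=1.0,
                             alpha=0.05, power=0.80):
    """Hypothetical helper: (cases, controls) for an unmatched case-control study.

    p0 -- proportion of controls exposed, as a fraction (0.30 = 30%)
    Uses a Fleiss-type normal approximation without continuity correction.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)  # two-sided alpha
    z_b = z.inv_cdf(power)          # power = 1 - beta
    r = controls_per_case
    # Proportion of cases exposed implied by the odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    # Pooled exposure proportion, weighted by group sizes (1 case : r controls)
    p_bar = (p1 + r * p0) / (1 + r)
    q_bar = 1 - p_bar
    root = (z_a * math.sqrt((1 + 1 / r) * p_bar * q_bar)
            + z_b * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0) / r))
    n_cases = math.ceil(root**2 / (p1 - p0) ** 2)
    return n_cases, math.ceil(r * n_cases)


# 30% of controls exposed, odds ratio of interest 2, one control per case
print(case_control_sample_size(0.30, 2.0))
# Two controls per case: fewer cases needed, more subjects overall
print(case_control_sample_size(0.30, 2.0, controls_per_case=2))
```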
Sample size for cross-sectional studies, cohort studies and RCTs
- Two-sided confidence level (1 − α): usually chosen between 90% and 99%, most commonly 95%
- Power (1 − β): usually 80% or higher; 80% is generally regarded as the minimum
- Ratio of unexposed to exposed in sample
- Percentage of unexposed with outcome (between 0 and 99.99)
- Percentage of exposed with outcome (between 0 and 99.99)
- Odds ratio of interest
- Risk/Prevalence ratio
- Risk/prevalence difference (between -99.99 and 99.99)
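Note that several of these inputs overlap: once you fix the percentage of unexposed and exposed subjects with the outcome, the risk ratio, risk difference and odds ratio follow. The Python sketch below shows this and then applies the formula commonly attributed to Kelsey for comparing two proportions. The function names, the use of fractions instead of percentages, and the choice of the Kelsey formula (other calculators may use Fleiss or add a continuity correction) are all assumptions for illustration.

```python
import math
from statistics import NormalDist


def effect_measures(p_unexposed, p_exposed):
    """Hypothetical helper: risk ratio, risk difference and odds ratio."""
    risk_ratio = p_exposed / p_unexposed
    risk_difference = p_exposed - p_unexposed
    odds_ratio = (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))
    return risk_ratio, risk_difference, odds_ratio


def cohort_sample_size(p_unexposed, p_exposed, unexposed_per_exposed=1.0,
                       alpha=0.05, power=0.80):
    """Hypothetical helper: (exposed, unexposed) subjects, Kelsey-type formula."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    r = unexposed_per_exposed
    # Pooled outcome proportion, weighted by group sizes (1 exposed : r unexposed)
    p_bar = (p_exposed + r * p_unexposed) / (1 + r)
    n_exposed = (z_sum**2 * p_bar * (1 - p_bar) * (r + 1)
                 / (r * (p_exposed - p_unexposed) ** 2))
    n_exposed = math.ceil(n_exposed)
    return n_exposed, math.ceil(r * n_exposed)


# 10% of unexposed and 20% of exposed expected to develop the outcome
print(effect_measures(0.10, 0.20))     # RR 2.0, RD 0.10, OR 2.25
print(cohort_sample_size(0.10, 0.20))  # -> (201, 201)
```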
Additional parameters of interest for RCT
- Effect size of interest
- Superiority, non-inferiority or equivalence margin
Note: You can choose a one-sided confidence interval if you are interested in results in only one direction (for example, in a non-inferiority trial).
Sample Size for comparing two means
- Two-sided confidence level (1 − α)
- Power (1 − β)
- Ratio of sample sizes (group 2/group 1)
- Expected mean in each group, or the expected mean difference
- Standard deviation or variance
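For two means, the usual normal-approximation formula for group 1 is n1 = (z₁₋α/₂ + z₁₋β)² σ² (1 + 1/k) / Δ², with n2 = k·n1, where Δ is the mean difference, σ the common standard deviation and k the ratio group 2/group 1. The Python sketch below implements that formula under the assumption of equal standard deviations in both groups; the function name is hypothetical. Software that uses the t distribution will typically give a slightly larger number.

```python
import math
from statistics import NormalDist


def two_means_sample_size(mean_difference, sd, ratio=1.0, alpha=0.05, power=0.80):
    """Hypothetical helper: (group 1, group 2) sizes to compare two means.

    ratio -- group 2 size / group 1 size (k); sd assumed equal in both groups
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n1 = z_sum**2 * sd**2 * (1 + 1 / ratio) / mean_difference**2
    n1 = math.ceil(n1)  # always round up to whole subjects
    return n1, math.ceil(ratio * n1)


# Detect a 5-unit difference when the SD is 10 (a standardized effect of 0.5)
print(two_means_sample_size(5, 10))           # -> (63, 63)
print(two_means_sample_size(5, 10, ratio=2))  # -> (48, 96)
```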
These assumptions or input data will help you derive sample sizes for relatively uncomplicated studies. You may need to consult a biostatistician for more complex sample size estimations.
Sample size estimation differs by study design, and the study design depends on the research question you ask. Collecting the input data is essential for estimating the sample size. Inappropriate sample sizes lead to studies that do not really add value to the literature, and such studies can also bias systematic reviews and meta-analyses. Do a sample size estimation before you start the study.
Consult someone with biostatistical expertise if you are not sure.