Which of the following is not a reason that a study might yield a null result?

In science, a null result is a result without the expected content: that is, the proposed result is absent. It is an experimental outcome which does not show an otherwise expected effect. This does not imply a result of zero or nothing, simply a result that does not support the hypothesis.

In statistical hypothesis testing, a null result occurs when an experimental result is not significantly different from what is to be expected under the null hypothesis; that is, its p-value (computed under the null hypothesis) exceeds the significance level, the threshold for rejecting the null hypothesis that is set before testing. The significance level varies, but common choices include 0.10, 0.05, and 0.01.
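As a minimal sketch of this decision rule, assuming a two-sided z-test against a standard normal null distribution (the function names are illustrative and not taken from any source), the p-value can be compared with the significance level like this:

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value of a z statistic under a standard normal null."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_null_result(z: float, alpha: float = 0.05) -> bool:
    """True when the test fails to reject the null hypothesis at level alpha."""
    return two_sided_p_value(z) > alpha


# Hypothetical z statistics, chosen only for illustration.
for z in (1.0, 2.5):
    print(z, round(two_sided_p_value(z), 4), is_null_result(z))
```

The same statistic can be a null result at one significance level and not at another, which is why the level has to be fixed before testing.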

As an example in physics, the results of the Michelson–Morley experiment were of this type, as it did not detect the expected velocity relative to the postulated luminiferous aether. This experiment's famous failed detection, commonly referred to as the null result, contributed to the development of special relativity. The experiment did appear to measure a non-zero "drift", but the value was far too small to account for the theoretically expected results; it is generally thought to be inside the noise level of the experiment.

Scientific journals for null results

There are now several scientific journals dedicated to the publication of negative or null results.

While it is not exclusively dedicated to publishing negative results, BMC Research Notes also publishes negative results in the form of research or data notes.

KP Suresh, Department of Biostatistics, National Institute of Animal Nutrition and Physiology, Bangalore, India

S Chandrashekara, Department of Immunology and Rheumatology, ChanRe Rheumatology and Immunology Center and Research, Bangalore, India

Address for correspondence: Dr. KP Suresh, Scientist(ss), National Institute of Animal Nutrition and Physiology, Adugodi, Bangalore-560030, India. E-mail: sureshkp97@gmail.com

Received 2012 Feb 16; Revised 2012 Feb 16; Accepted 2012 Mar 7.

Copyright : © Journal of Human Reproductive Sciences

This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article has been retracted. See J Hum Reprod Sci. 2015; 8(3): 186.

Abstract

Determining the optimal sample size for a study assures adequate power to detect statistical significance. Hence, it is a critical step in the design of a planned research protocol. Using too many participants in a study is expensive and exposes more subjects to the procedure than necessary. Conversely, if a study is underpowered, it will be statistically inconclusive and may make the whole protocol a failure. This paper covers the essentials of calculating power and sample size for a variety of applied study designs. Sample size computations for a single group mean, survey-type studies, two-group studies based on means and on proportions or rates, correlation studies, and case-control studies assessing a categorical outcome are presented in detail.

KEY WORDS: Correlation, odds ratio, power, prevalence, survey, proportions, sample size

INTRODUCTION

Clinical research studies can be classified into surveys, experiments, observational studies, etc. They need to be carefully planned to achieve the objective of the study. The planning of good research has many aspects. The first step is to define the problem in operational terms. The second step is to define the experimental or observational units and the appropriate subjects and controls. One also has to define the inclusion and exclusion criteria meticulously, taking account of all variables that could influence the observations and the units that are measured. The study design must be clear, and the procedures must be defined using the best available methodology. Based on these factors, the study must have an adequate sample size relative to its goals and its possible sources of variability. The sample must be ‘big enough’ that an effect of the expected, scientifically significant magnitude is also statistically significant. At the same time, the sample should not be ‘too big’, such that an effect of little scientific importance is nevertheless statistically detectable. In addition, sample size is important for economic reasons: an under-sized study can be a waste of resources since it may not produce useful results, while an over-sized study uses more resources than necessary. In an experiment involving human or animal subjects, sample size is also a critical ethical issue, since an ill-designed experiment exposes the subjects to potentially harmful treatments without advancing knowledge. Thus, a fundamental step in the design of clinical research is the computation of power and sample size.

Power is the probability of correctly rejecting the null hypothesis that sample estimates (e.g., mean, proportion, odds, correlation coefficient) do not statistically differ between study groups in the underlying population. Large values of power, at least 80%, are desirable given the available resources and ethical considerations. Power increases as the sample size increases. Accordingly, an investigator can control the study power by adjusting the sample size, and vice versa.

The results of a clinical study are expressed in terms of an estimate of effect, an appropriate confidence interval, and a P value. The confidence interval indicates the likely range of values for the true effect in the population, while the P value indicates how likely it is that an effect at least as large as the one observed in the sample would arise by chance alone. A related quantity is the statistical power: the probability of detecting a difference between 2 groups in the study samples when one genuinely exists in the populations from which the samples were drawn.

Factors that affect the sample size

The calculation of an appropriate sample size relies on the choice of certain factors and, in some instances, on crude estimates. There are 3 factors that should be considered in calculating an appropriate sample size; they are summarized in Table 1. Each of these factors influences the sample size independently, but it is important to combine all of them in order to arrive at an appropriate sample size.

Table 1

Factors that affect sample size calculations

[table not reproduced]

The normal deviates for different significance levels (Type I error, or alpha) for one-tailed and two-tailed alternative hypotheses are shown in Table 2.

Table 2

The normal deviates for Type I error (Alpha)

[table not reproduced]

The normal deviates for different levels of power (the probability of rejecting the null hypothesis when it is not true, or one minus the probability of a type II error) are shown in Table 3.

Table 3

The normal deviates for statistical power

[table not reproduced]
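The normal deviates used throughout the formulas that follow can be recomputed from the standard normal distribution. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names `z_for_alpha` and `z_for_power` are illustrative assumptions, not part of the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_for_alpha(alpha: float, two_tailed: bool = True) -> float:
    """Normal deviate for a Type I error rate alpha."""
    tail = alpha / 2.0 if two_tailed else alpha
    return STD_NORMAL.inv_cdf(1.0 - tail)


def z_for_power(power: float) -> float:
    """Normal deviate for a given power (1 minus the Type II error rate)."""
    return STD_NORMAL.inv_cdf(power)


for alpha in (0.05, 0.01):
    print(alpha,
          round(z_for_alpha(alpha, two_tailed=True), 3),
          round(z_for_alpha(alpha, two_tailed=False), 3))
for power in (0.80, 0.90):
    print(power, round(z_for_power(power), 3))
```

To two decimal places these agree with the values quoted in the text: about 1.96 and 2.58 (two-tailed), about 1.64 and 2.33 (one-tailed), and about 0.84 and 1.28 for 80% and 90% power.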

Study design, outcome variable and sample size

Study design has a major impact on the sample size. Descriptive studies need hundreds of subjects to give an acceptable confidence interval for small effects. Experimental studies generally need smaller samples, and a cross-over design needs one-quarter of the number required with a separate control group because every subject receives the experimental treatment. An evaluation study in a single group with a pre-post design needs half the number required for a similar study with a control group. A study design with a one-tailed hypothesis requires 20% fewer subjects than a two-tailed design. Non-randomized studies need 20% more subjects than randomized studies in order to accommodate confounding factors. An additional 10 - 20% of subjects are required to allow for other factors such as withdrawals, missing data, and losses to follow-up.

The “outcome” expected in the study should also be considered. There are 3 possible categories of outcome. The first is a simple case where 2 alternatives exist: yes/no, death/alive, vaccinated/not vaccinated, etc. The second category covers multiple, mutually exclusive alternatives such as religious beliefs or blood groups. For these 2 categories of outcome, the data are generally expressed as percentages or rates. The third category covers continuous response variables such as weight, height, blood pressure, VAS score, IL6, TNF-a, and homocysteine, which are summarized as means and standard deviations. The statistical method used to determine the sample size depends on which of these outcome measures is critical for the study; for example, a larger sample size is required to assess a categorical variable than a continuous outcome variable.

Alpha level

Alpha is the probability of detecting a significant difference when the treatments are in fact equally effective, that is, the risk of a false-positive finding. The alpha level used in determining the sample size in most academic research studies is either 0.05 or 0.01. The lower the alpha level, the larger the sample size required. For example, a study with an alpha level of 0.01 requires more subjects than a study with an alpha level of 0.05 for a similar outcome variable. Lower alpha levels, viz. 0.01 or less, are used when the decisions based on the research are critical and errors may cause substantial financial or personal harm.

Variance or standard deviation

The variance or standard deviation for the sample size calculation is obtained either from previous studies or from a pilot study. The larger the standard deviation, the larger the sample size required. For example, a study whose primary outcome variable is TNF-a needs more subjects than one whose outcome is birth weight or a 10-point VAS score, as the natural variability of TNF-a is wide compared with the others.

Minimum detectable difference

This is the expected difference or relationship between 2 independent samples, also known as the effect size. The obvious question is how to know the difference in a study that has not yet been conducted. If available, the effect size found in prior studies may be used. Where no previous study exists, the effect size is determined from literature review, logical assertion, and conjecture.

Power

The difference between 2 groups in a study is explored in terms of an estimate of effect, an appropriate confidence interval, and a P value. The confidence interval indicates the likely range of values for the true effect in the population, while the P value indicates how likely it is that the observed effect in the sample is due to chance. A related quantity is the statistical power of the study, which is the probability of detecting a predefined clinically significant difference. The ideal study is one that has high power. This means that the study has a high chance of detecting a difference between groups if it exists; consequently, if the study demonstrates no difference between the groups, the researcher can be reasonably confident in concluding that none exists. A power of at least 80% is conventionally considered adequate for a study.

In research, statistical power is generally calculated with 2 objectives. 1) It can be calculated before data collection, based on information from previous studies, to decide the sample size needed for the current study. 2) It can also be calculated after data analysis. The second situation occurs when the result turns out to be non-significant. In this case, statistical power is calculated to verify whether the non-significant result is due to a lack of relationship between the groups or to a lack of statistical power.

Statistical power is positively related to sample size: for given levels of the other factors, viz. alpha and the minimum detectable difference, a larger sample size gives greater power. However, researchers should clearly distinguish between a statistically significant difference and a scientifically meaningful one. Although a larger sample size enables researchers to find smaller differences statistically significant, the difference found may not be scientifically meaningful. Therefore, it is recommended that researchers have a prior idea of what they would expect to be a scientifically meaningful difference before doing a power analysis and determining the actual sample size needed. Power analysis is now integral to the health and behavioral sciences, and its use is steadily increasing wherever empirical studies are performed.
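The relationship between sample size and power can be illustrated numerically. The sketch below assumes a two-sided, two-sample z-test with equal group sizes and a known common standard deviation, which is a standard normal approximation rather than a method stated in the paper; the function name and the numbers in the loop are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(n_per_group: float, diff: float, sd: float,
                    alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test (equal groups).

    The negligible chance of rejecting in the wrong direction is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    standard_error = sd * sqrt(2.0 / n_per_group)
    return NormalDist().cdf(abs(diff) / standard_error - z_alpha)


# Hypothetical effect (5 units) and standard deviation (15 units).
for n in (10, 25, 50, 100, 200):
    print(n, round(power_two_means(n, diff=5.0, sd=15.0), 3))
```

With the effect size and alpha held fixed, the printed power rises steadily with the number of subjects per group; with a very large sample, even a small and scientifically unimportant difference becomes detectable.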

Withdrawals, missing data and losses to follow-up

The calculated sample size is the total number of subjects required for the final study analysis. There are a few practical issues that need to be considered when calculating the number of subjects required. Not all eligible subjects may be willing to take part, so it may be necessary to screen more subjects than the final number entering the study. In addition, even in well-designed and well-conducted studies, it is unusual to finish with a dataset that is complete, in a usable format, for all the subjects recruited. The reasons may lie with the subjects: they may fail or refuse to give valid responses to particular questions, physical measurements may suffer from technical problems, and in studies involving follow-up (e.g., trials or cohort studies) there will be some degree of attrition. The reasons may also be technical or procedural, such as contamination or failure to perform an assessment or test in time. It is, therefore, necessary to consider these issues when calculating the number of subjects to be recruited, in order to achieve the final desired sample size.

For example, suppose that a total of N subjects with complete data are required at the end of the study, but a proportion (q) are expected to refuse to participate or to drop out before the study ends. In this case, the following total number of subjects (N1) would have to be recruited to ensure that the final sample size (N) is achieved:

$$N_1 = \frac{N}{1 - q},$$

where q is the proportion of attrition, generally taken as 10%.

The proportion of eligible subjects who will refuse to participate or will provide inadequate information is unknown at the beginning of the study. Approximate estimates are often possible using information from similar studies in comparable populations or from an appropriate pilot study.
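The attrition adjustment described above, in which the required number is divided by one minus the attrition proportion, can be sketched as follows. The function name is illustrative, and rounding up to the next whole subject is an assumption of this sketch.

```python
from math import ceil


def adjust_for_attrition(n_required: int, q: float = 0.10) -> int:
    """Number of subjects to recruit so that about n_required remain.

    q is the expected proportion of refusals and drop-outs (0 <= q < 1).
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in the range [0, 1)")
    return ceil(n_required / (1.0 - q))


# Hypothetical target of 200 analysable subjects with 10% attrition.
print(adjust_for_attrition(200, q=0.10))
```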

Sample size estimation for proportion in survey type of studies

A common goal of survey research is to collect data representative of a population. The researcher uses information gathered from the survey to generalize findings from the drawn sample back to the population, within the limits of random error. The generally acceptable margin of error in survey research is 5 - 10%. The sample size can be estimated using the following formula:

$$N = \frac{Z_{\alpha/2}^2 \, P(1-P) \, D}{E^2}$$

where P is the prevalence or proportion of the event of interest for the study and E is the precision (or margin of error) with which the researcher wants to measure it. Generally, E is taken as 10% of P. Zα/2 is the normal deviate for a two-tailed alternative hypothesis at the chosen level of significance; for example, Zα/2 is 1.96 for the 5% level of significance and 2.58 for the 1% level, as shown in Table 2. D is the design effect, which reflects the sampling design used in the survey. It is 1 for simple random sampling and takes higher values (usually 1 to 2) for other designs such as stratified, systematic, or cluster random sampling, to compensate for the deviation from simple random sampling. The design effect for cluster random sampling is taken as 1.5 to 2. For purposive, convenience, or judgment sampling, D will exceed 10. The higher the D, the larger the sample size required for a study. Simple random sampling is unlikely to be the sampling method in an actual field survey. If another sampling method such as systematic, stratified, or cluster sampling is used, a larger sample size is likely to be needed because of the “design effect”. In an impact study, P may be set at 50% to reflect the assumption that an impact is expected in 50% of the population. A P of 50% is also a conservative estimate.

Example: A researcher wants to know the sample size for a survey measuring the prevalence of obesity in a certain community. Previous literature gives an estimate of obesity of 20% in the population to be surveyed. Assuming a 95% confidence interval (5% level of significance) and a 10% margin of error, the sample size can be calculated as follows:

$$N = \frac{Z_{\alpha/2}^2 \, P(1-P) \times 1}{E^2} = \frac{(1.96)^2 \times 0.20 \times (1-0.20)}{(0.1 \times 0.20)^2} = \frac{3.8416 \times 0.16}{(0.02)^2} \approx 1537$$

for a simple random sampling design. Hence, a sample size of 1537 is required to conduct a community-based survey to estimate the prevalence of obesity. Note that E is the margin of error; in the present example, it is 10% × 0.20 = 0.02.

To find the final adjusted sample size, allowing non-response rate of 10% in the above example, the adjusted sample size will be 1537/(1-0.10) = 1537/0.90 = 1708.
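The survey calculation above can be reproduced with a short sketch. It assumes, as in the worked example, that the margin of error is specified as a fraction of the prevalence; the function and parameter names are illustrative, and the result is rounded up to a whole subject.

```python
from math import ceil
from statistics import NormalDist


def survey_sample_size(p: float, rel_error: float = 0.10,
                       alpha: float = 0.05, design_effect: float = 1.0) -> int:
    """Sample size for estimating a proportion p in a survey.

    rel_error is the margin of error as a fraction of p (E = rel_error * p).
    design_effect is 1 for simple random sampling.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    e = rel_error * p
    return ceil(z ** 2 * p * (1.0 - p) * design_effect / e ** 2)


# Obesity example from the text: prevalence 20%, E = 10% of P, alpha = 5%.
print(survey_sample_size(0.20))
# Same survey with a hypothetical design effect of 1.5.
print(survey_sample_size(0.20, design_effect=1.5))
```

The first call reproduces the 1537 subjects of the worked example; a design effect above 1 scales the requirement up in proportion.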

Sample size estimation with single group mean

If a researcher is conducting a study in a single group, such as an outcome assessment in a group of patients subjected to a certain treatment or patients with a particular type of illness, and the primary outcome is a continuous variable whose results are expressed as a mean and standard deviation, the sample size can be estimated using the following formula:

$$N = \frac{Z_{\alpha/2}^2 \, s^2}{d^2},$$

where s is the standard deviation obtained from a previous study or pilot study, and d is the accuracy of the estimate, that is, how close it should be to the true mean. Zα/2 is the normal deviate for a two-tailed alternative hypothesis at the chosen level of significance.

For research studies with a one-tailed hypothesis, the above formula can be rewritten as

$$N = \frac{Z_{\alpha}^2 \, s^2}{d^2},$$

where the Zα values are 1.64 and 2.33 for the 5% and 1% levels of significance, respectively.

Example: Suppose a study aims to estimate the mean weight of a population with an error of estimation of less than 2 kg from the true mean (that is, an expected difference in weight of 2 kg). The sample standard deviation was 5 kg and the confidence level is 95% (that is, an error rate of 5%). The estimated sample size is $N = (1.96)^2 (5)^2 / 2^2 \approx 24$ subjects. If an allowance of 10% for missing data, losses to follow-up, and withdrawals is assumed, the corrected sample size is 24/(1.0 - 0.10) = 24/0.9 ≈ 27 subjects; for a 20% allowance, the corrected sample size is 30.
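A sketch of the single-mean calculation follows, with illustrative function and parameter names. It returns the unrounded value so that the rounding choice stays visible: the formula gives about 24.01 for the weight example, which the text reports as 24, whereas rounding up would give 25.

```python
from statistics import NormalDist


def single_mean_sample_size(sd: float, d: float, alpha: float = 0.05,
                            two_tailed: bool = True) -> float:
    """Unrounded sample size for estimating a mean to within d units."""
    tail = alpha / 2.0 if two_tailed else alpha
    z = NormalDist().inv_cdf(1.0 - tail)
    return (z * sd / d) ** 2


# Weight example from the text: sd = 5 kg, d = 2 kg, alpha = 5%.
print(round(single_mean_sample_size(5.0, 2.0), 2))
# One-tailed version of the same example.
print(round(single_mean_sample_size(5.0, 2.0, two_tailed=False), 2))
```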

Sample size estimation with two means

Consider a study with the research hypotheses H0: m1 = m2 (null hypothesis) vs. Ha: m1 = m2 + d (alternative hypothesis), where d is the difference between the two means, and n1 and n2 are the sample sizes for Group I and Group II such that N = n1 + n2. The ratio r = n1/n2 is used whenever the researcher needs unequal sample sizes for reasons such as ethics, cost, or availability.

Then, the total sample size for the study is as follows

$$N = \frac{(r+1)\,(Z_{\alpha} + Z_{1-\beta})^2 \, \sigma^2}{r \, d^2}$$

where Zα is the normal deviate at the chosen level of significance (1.96 for the 5% level and 2.58 for the 1% level) and Z1-β is the normal deviate at a power of 1-β, with type II error β (0.84 at 80% power and 1.28 at 90% power). r = n1/n2 is the ratio of the sample sizes of the 2 groups; it is generally 1, which keeps the 2 groups equal in size, while r = 0.5 gives a sample size distribution of 1:2 between the 2 groups. σ and d are the pooled standard deviation and the difference between the means of the 2 groups. These values are obtained either from previous studies of a similar hypothesis or from a pilot study.

For example, suppose a clinical researcher wants to compare the effect of 2 drugs, A and B, on systolic blood pressure (SBP). From a literature search, the researcher found that the mean SBP in the 2 groups was 120 and 132, with a common standard deviation of 15. The total sample size for the study with r = 1 (equal sample sizes), α = 5%, and power of 80% and 90% was computed as

$$N = \frac{(1+1)\,(1.96+0.84)^2 \, (15)^2}{1 \times (12)^2} \approx 24.5$$

for 80% statistical power and, by the same calculation with Z1-β = 1.28, about 32.8 for 90% statistical power, that is, about 25 and 33 subjects after rounding up. With unequal sample sizes of 1:2 (r = 0.5), 90% statistical power, and the 5% level of significance, the sample size required is about 49.2, or 50 subjects after rounding up.
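The SBP example can be checked with the sketch below. It evaluates the expression (r+1)(Zα + Z1-β)²σ²/(r d²), which is this sketch's reading of the formula above. In the usual normal-approximation derivation for a two-sided test, that expression is the size of the second group, n2, with n1 = r·n2, so the function returns both group sizes rather than a single total. Treat that interpretation, and the function name, as assumptions.

```python
from statistics import NormalDist


def two_means_group_sizes(diff: float, sd: float, r: float = 1.0,
                          alpha: float = 0.05, power: float = 0.80):
    """Unrounded (n1, n2) for comparing two means, where r = n1 / n2."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1.0 - alpha / 2.0) + nd.inv_cdf(power)
    n2 = (r + 1.0) / r * (z_sum * sd / diff) ** 2
    return r * n2, n2


# SBP example from the text: means 120 vs 132 (diff 12), common sd 15.
for r, power in ((1.0, 0.80), (1.0, 0.90), (0.5, 0.90)):
    n1, n2 = two_means_group_sizes(12.0, 15.0, r=r, power=power)
    print(r, power, round(n1, 1), round(n2, 1))
```

The values of n2 printed here (about 24.5, 32.8, and 49.3) correspond to the figures quoted in the example; the small differences come from using unrounded normal deviates.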

Sample size estimation with two proportions

In a study whose outcome is the proportion of an event in two populations (groups), such as the percentage of complications, mortality improvement, awareness, or a surgical or medical outcome, the sample size estimation is based on the proportions of the outcome, which are obtained from a review of previous literature or from a pilot study on a smaller sample. For a study with the null hypothesis H0: π1 = π2 vs. Ha: π1 = π2 + d, where π1 and π2 are the population proportions and p1 and p2 are the corresponding sample estimates, the sample size can be estimated using the following formula:

[equation not recovered]

where p1 and p2 are the proportions of the event of interest (outcome) in group I and group II, and p is given by

[equation not recovered]

Zα/2 is the normal deviate at the chosen level of significance and Z1-β is the normal deviate at a power of 1-β, with type II error β; the type II error is normally set at 20% or less.

If a researcher plans to conduct a study with unequal groups, the sample size N should first be calculated as if the groups were equal, and then modified. If r = n1/n2 is the ratio of the sample sizes of the 2 groups, the required sample size is $N_1 = N(1+r)^2/(4r)$. For example, if n1 = 2n2, that is, the sample size ratio is 2:1 for group 1 and group 2, then N1 = 9N/8, a fairly small increase in total sample size.

Example: It is believed that the proportion of patients who develop complications after undergoing one type of surgery is 5% while the proportion of patients who develop complications after a second type of surgery is 15%. How large should the sample be in each of the 2 groups of patients if an investigator wishes to detect, with a power of 90%, whether the second procedure has a complications rate significantly higher than the first at the 5% level of significance?

In the example,

  • a) Test value of difference in complication rate: 0%
  • b) Anticipated complication rates: 5% and 15% in the 2 groups
  • c) Level of significance: 5%
  • d) Power of the test: 90%
  • e) Alternative hypothesis (one-tailed): (p1-p2) < 0%

The total sample size required is 74 for equal-sized groups. For an unequal distribution of 1.5:1, that is, r = 1.5, the total sample size will be 77, with 46 for group I and 31 for group II.
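The unequal-allocation adjustment N1 = N(1+r)²/(4r) and the resulting split between groups can be sketched as follows. The function name is illustrative, and the equal-allocation total of 74 is taken from the example as given rather than recomputed.

```python
def unequal_allocation(total_equal: float, r: float):
    """Inflate an equal-allocation total N for a group ratio r = n1 / n2.

    Returns the unrounded (total, n1, n2).
    """
    total = total_equal * (1.0 + r) ** 2 / (4.0 * r)
    n2 = total / (1.0 + r)
    return total, r * n2, n2


# Surgery example from the text: N = 74 with r = 1.5.
total, n1, n2 = unequal_allocation(74, 1.5)
print(round(total), round(n1), round(n2))
```

For r = 2 the inflation factor is 9/8, as stated in the text; rounding the printed values gives 77 subjects in total, split 46 and 31.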

Sample size estimation with correlation co-efficient

In an observational study that aims to estimate the correlation (r) between 2 variables of interest, say X and Y, a typical hypothesis has the form H0: r = 0 against Ha: r ≠ 0. The sample size for a correlation study can be obtained by computing

[equation not recovered]

where Zα/2 and Z1-β are the normal deviates for the type I error (significance level) and the power of the study (Tables 2 and 3).

Example: According to the literature, the correlation between salt intake and systolic blood pressure is around 0.30. A study is conducted to test this correlation in a population, with a significance level of 1% and power of 90%. The sample size for such a study can be estimated as follows:

[equation not recovered]

The sample size for 90% power at the 1% level of significance was 99 for a two-tailed alternative test and 87 for a one-tailed test.

Sample size estimation with odds ratio

In a case-control study, data are usually summarized as an odds ratio rather than as a difference between two proportions when the outcome variable of interest is categorical. If P1 and P2 are the proportions of cases and controls, respectively, exposed to a risk factor, then:

$$OR = \frac{P_1 (1 - P_2)}{P_2 (1 - P_1)}$$

If we know the prevalence of exposure in the general population (P), the total sample size N for estimating an OR is

[equation not recovered]

where Zα/2 and Z1-β are the normal deviates for the type I error (significance level) and the power of the study (Tables 2 and 3).
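As a small illustration of the odds ratio itself, the sketch below divides the odds of exposure among cases by the odds of exposure among controls. The function name and the input proportions are hypothetical and are not taken from the paper.

```python
def odds_ratio(p_cases: float, p_controls: float) -> float:
    """Odds ratio of exposure for cases relative to controls."""
    odds_cases = p_cases / (1.0 - p_cases)
    odds_controls = p_controls / (1.0 - p_controls)
    return odds_cases / odds_controls


# Hypothetical exposure proportions: 40% of cases, 25% of controls.
print(round(odds_ratio(0.40, 0.25), 2))
```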

Example: The prevalence of vertebral fracture in a population is 25%. When a study aims to estimate the effect of smoking on fracture, with an odds ratio of 2, at a significance level of 5% (one-sided test) and power of 80%, the total sample size for a study with equal group sizes can be estimated by:

[equation not recovered]

DISCUSSION

The equations in this paper rest on two assumptions. First, the selection of individuals is random and unbiased: the decision to include a subject in the study must not depend on whether or not that subject has the characteristic or outcome being studied. Second, in studies in which the mean is calculated, the measurements are assumed to have normal distributions.

The concept of statistical power is closely associated with sample size: the power of a study increases with an increase in sample size. Ideally, the minimum power required of a study is 80%. Hence, the sample size calculation is critical and fundamental to designing a study protocol. Even after completion of a study, a retrospective power analysis can be useful, especially when a statistically non-significant result is obtained. Here, the actual sample size and alpha level are known, and the variance observed in the sample provides an estimate of the population variance. A retrospective power analysis helps to establish whether a negative finding is a true negative finding.

The ideal study for the researcher is one in which the power is high. This means that the study has a high chance of detecting a difference between groups if one exists; consequently, if the study demonstrates no difference between groups, the researcher can be reasonably confident in concluding that none exists. The power of a study depends on several factors, but as a general rule, higher power is achieved by increasing the sample size. Many apparently null studies may be under-powered rather than genuinely demonstrating no difference between groups: absence of evidence is not evidence of absence.

A sample size calculation is an essential step in research protocols and is a must to justify the size of clinical studies in papers, reports, etc. Nevertheless, one of the most common errors in papers reporting clinical trials is a lack of justification of the sample size, and it is a major concern that important therapeutic effects are being missed because of inadequately sized studies. The purpose of this review is to make available a collection of formulas for sample size calculations, with examples for a variety of situations likely to be encountered.

Often, researchers are faced with various constraints that may force them to use an inadequate sample size for both practical and statistical reasons. These constraints may include budget, time, personnel, and other resource limitations. In these cases, the researchers should report the appropriate sample size along with the sample size actually used in the study, the reasons for using an inadequate sample size, and a discussion of the effect that the inadequate sample size may have on the results of the study. The researcher should exercise caution when making pragmatic recommendations based on research with an inadequate sample size.

CONCLUSION

Sample size determination is a major step in the design of a research study. Appropriately sized samples are essential to infer with confidence that sample estimates are reflective of the underlying population parameters. The sample size required to reject or accept a study hypothesis is determined by the power of the test. A study that is sufficiently powered has a statistically reasonable chance of answering the questions put forth at the beginning of the research study. Inadequately sized studies often result from investigators' unrealistic assumptions about the effectiveness of the study treatment. Misjudgment of the underlying variability of parameter estimates, a wrong estimate of the follow-up period needed to observe the intended effects of the treatment, inability to predict lack of compliance with the study regimen, a high drop-out rate, and failure to account for the multiplicity of study endpoints are common errors in clinical research. Conducting a study that has little chance of answering the hypothesis at hand is a misuse of time and valuable resources and may unnecessarily expose participants to potential harm or unwarranted expectations of therapeutic benefit. As scientific and ethical issues go hand in hand, awareness of how to determine the minimum required sample size and application of appropriate sampling methods are extremely important in achieving scientifically and statistically sound results. Using an adequate sample size along with high-quality data collection will produce more reliable, valid, and generalizable results, and it can also save resources. This paper was designed as a tool that a researcher could use in planning and conducting quality research.

Why would a study yield a null result?

In statistical hypothesis testing, a null result occurs when an experimental result is not significantly different from what is to be expected under the null hypothesis; that is, its p-value (computed under the null hypothesis) exceeds the significance level, i.e., the threshold set prior to testing for rejection of the null ...

Which of the following are factors that can contribute to a null effect in a study?

- Weak manipulation: a weak manipulation can make the between-groups differences too small to detect.
- Situation noise: situation noise can increase within-groups variability and thus obscure group differences.

Why is there a publication bias against null effects?

People generally want to read about independent variables that matter.

Which of the following is a reason why a researcher might choose to conduct a double-blind study?

A double-blind study is one in which neither the participants nor the experimenters know who is receiving a particular treatment. This procedure is utilized to prevent bias in research results. Double-blind studies are particularly useful for preventing bias due to demand characteristics or the placebo effect.