When a researcher refers to a study showing significant results, what does it mean?


Sidebar to Jakob Nielsen's column Risks of Quantitative Studies, March 2004.

In the main article, I said that "one out of every twenty significant results might be random" if you rely solely on statistical analysis. This is a bit of an oversimplification. Here's the detailed story.

"Statistical significance" refers to the probability that the observed result could have occurred randomly if it has no true underlying effect. This probability is usually referred to as "p" and by convention, p should be smaller than 5% to consider a finding significant. Sometimes researchers insist on stronger significance and want p to be smaller than 1%, or even 0.1%, before they'll accept a finding with wide-reaching consequences, say, for a new blood-pressure medication to be taken by millions of patients.

If we test twenty questions that have no underlying effect at play, we would on average expect one statistical test to come out as "significant." This doesn't really mean that one out of every twenty published studies is wrong. It means that one out of every twenty statistical tests is wrong if we simply go fishing for results without founding our study on an understanding of the issues at play.
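This expectation is easy to check by simulation. The sketch below, assuming Python with numpy and scipy, runs twenty t-tests on randomly generated data in which no real effect exists, and counts how many come out "significant" anyway:

```python
# Simulating the "fishing" scenario: twenty t-tests where no real
# effect exists, counting how many come out "significant" anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
false_positives = 0
for _ in range(20):
    a = rng.normal(0, 1, 30)  # both groups are drawn from the same
    b = rng.normal(0, 1, 30)  # distribution, so no true effect exists
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

print(f"'significant' results among 20 null tests: {false_positives}")
```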

Good researchers start by building their hypotheses on qualitative insights. For example, after having observed how people read online, a researcher might suspect that scannable layouts would make website content easier to read and understand. If you run statistical tests on questions that are likely to be true, your findings are less likely to be false.

As a thought experiment, let's assume that a researcher, Dr. Bob, has established 100 hypotheses, of which 80% are true. With a 5% significance threshold, Bob will on average erroneously accept one of the 20 false hypotheses. Assuming Bob is running a study with good statistical power, he'll accept most of the 80 true hypotheses, rejecting maybe 10 as insignificant. Bob will then publish 71 of his conclusions, of which 70 are true and one is false. In other words, only 1.4% of Bob's papers will be bogus.
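Bob's arithmetic can be written out directly, using only the numbers stated above:

```python
# Dr. Bob's numbers written out: 100 hypotheses, 80 true and 20 false,
# alpha = 0.05, and power of 70/80 as described in the text.
alpha = 0.05
power = 70 / 80                       # Bob accepts 70 of his 80 true hypotheses

false_findings = 20 * alpha           # on average 1 false hypothesis accepted
true_findings = 80 * power            # 70 true hypotheses accepted
published = true_findings + false_findings  # 71 conclusions published

print(f"bogus fraction: {false_findings / published:.1%}")  # about 1.4%
```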

Unfortunately, not all real-world researchers are good enough that 80% of their hypotheses will be correct. And not all studies have sufficient statistical power to accept 70 out of every 80 correct hypotheses. Thus, the percentage of bogus results in most published quantitative research is higher, but we can't determine that percentage exactly because it depends both on researchers' competence and their pre-study insights.


Published: 27th September 2021

What is statistical significance?

When reading about or conducting research, you are likely to come across the term ‘statistical significance’. ‘Significance’ generally refers to something having particular importance – but in research, ‘significance’ has a very different meaning. Statistical significance is a term used to describe how certain we are that a difference or relationship between two variables exists and isn’t due to chance. When a result is identified as being statistically significant, this means that you are confident that there is a real difference or relationship between two variables, and it’s unlikely that it’s a one-off occurrence.

However, it’s commonplace for statistical significance (i.e., being confident that chance wasn’t involved in your results) to be confused with general significance (i.e., having importance). A statistically significant finding may, or may not, have any real-world utility. Therefore, having a thorough understanding of what statistical significance is, and what factors contribute to it, is important for conducting sound research.

1 – Hypotheses:

A hypothesis is a particular type of prediction for what the outcomes of research will be, and comes in two forms. A null hypothesis predicts that there is no difference or relationship between two groups or variables of interest, and therefore that the two groups or variables are equal. In contrast, an alternate hypothesis predicts that there is a difference or relationship between two groups or variables of interest. In this case, the two groups or variables are not equal, and so one could be greater or less than the other.

A key purpose of statistical significance testing is to determine whether your observed result could plausibly have occurred by chance if the null hypothesis were true. If it could easily have occurred by chance, we do not reject (we retain) the null hypothesis and conclude there is no difference; a chance result is unlikely to recur in the real world. However, if the result is very unlikely to have occurred by chance, we reject the null hypothesis and conclude there is a difference, one that is likely to hold in the real world. This in turn affects the conclusions that you can draw from your research.

2 – The Likelihood of Error:

When dealing with chance, there is always the possibility of error, namely Type I and Type II errors. A Type I error occurs when the null hypothesis is rejected when it should have been retained (i.e., a false positive). This means that the results are identified as significant when they actually occurred by chance; because they occurred by chance, they are unlikely to recur in the real world and should have been identified as non-significant. A Type II error occurs when the null hypothesis is retained when it should have been rejected (i.e., a false negative). This means that the results are identified as non-significant when they did not in fact occur by chance; a result that did not occur by chance is likely to reflect a real-world effect, and so should have been identified as significant.
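Both error types can be estimated by simulation. A sketch, assuming Python with numpy and scipy; the group size (30) and the true effect (0.5 standard deviations) are arbitrary illustrative choices:

```python
# Estimating Type I and Type II error rates by simulation; the group
# size (30) and true effect (0.5 SD) are arbitrary illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
runs, n, alpha = 2000, 30, 0.05
type1 = type2 = 0

for _ in range(runs):
    # Null true: both groups share the same mean.
    a, b = rng.normal(0, 1, n), rng.normal(0, 1, n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        type1 += 1  # rejected a true null: false positive
    # Null false: the second group's mean really is 0.5 SD higher.
    c, d = rng.normal(0, 1, n), rng.normal(0.5, 1, n)
    if stats.ttest_ind(c, d).pvalue >= alpha:
        type2 += 1  # retained a false null: false negative

print(f"Type I rate ~ {type1 / runs:.1%}, Type II rate ~ {type2 / runs:.1%}")
```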

3 – Alpha and p Values:

Prior to any statistical analyses, it is important to determine what you will consider the definition of statistically significant to be. This is referred to as the alpha value, and represents the probability of making a Type I error (i.e., rejecting the null hypothesis when it is true). Alpha values are typically set at .05 (5%), meaning that we accept a 5% risk of making a Type I error when the null hypothesis is true. More conservative tests use smaller alpha values such as .01 (1%), reducing that risk to 1%. Alpha is not to be confused with the p value, which is the calculated probability of the obtained result occurring by chance. For statistical significance, alpha is used as the threshold value and the p value is compared to it. If the p value is above the alpha value (p > .05), the result is not statistically significant. If it is below alpha (p < .05), it is statistically significant.
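The decision rule itself is just a threshold check, as this minimal sketch shows; the p value here is invented for illustration:

```python
# The decision rule itself is a simple threshold check; the p value
# here is invented for illustration.
alpha = 0.05     # chosen before the analysis
p_value = 0.032  # the calculated probability of the result arising by chance

if p_value < alpha:
    print("statistically significant: reject the null hypothesis")
else:
    print("not statistically significant: retain the null hypothesis")
```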

4 – One or Two Tailed Tests:

Your hypotheses will determine which type of significance test you will need to conduct. A one-tailed hypothesis is one where you predict a specific direction of the difference (higher, lower) or relationship (positive, negative) between the two groups or variables of interest. With a one-tailed test, your alpha value stays the same, but because all of it is placed in the one predicted direction, the p value is half of what the equivalent two-tailed test would give. On the other hand, a two-tailed hypothesis does not predict a specific direction of the difference or relationship, and so with a two-tailed test you keep the full, unhalved p value. Two-tailed tests are more widely used in research than one-tailed tests.
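As a sketch of the difference in practice, scipy's t-test accepts an `alternative` argument (available in scipy 1.6 and later); the data below are invented for illustration:

```python
# One- vs two-tailed tests via scipy's `alternative` argument
# (available in scipy 1.6+); the data are invented for illustration.
from scipy import stats

control = [12.1, 11.8, 13.0, 12.4, 11.5, 12.9, 12.2, 11.9]
treated = [13.2, 12.8, 13.9, 13.1, 12.6, 13.5, 13.0, 12.7]

two_tailed = stats.ttest_ind(treated, control, alternative="two-sided")
one_tailed = stats.ttest_ind(treated, control, alternative="greater")

# When the effect lies in the predicted direction, the one-tailed p
# is half the two-tailed p.
print(f"two-tailed p = {two_tailed.pvalue:.4f}")
print(f"one-tailed p = {one_tailed.pvalue:.4f}")
```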

5 – Sample Size and Power:

Statistical power refers to the probability that the statistical test you are using will correctly reject a false null hypothesis. Type II errors are reduced by having enough statistical power, which is generally kept at 80% or higher. Statistical power is increased by having an adequate sample size; if your study is underpowered because you don't have enough participants, this will affect statistical significance. Generally, if the alternate hypothesis is true and there is a difference or relationship to be observed, then a larger sample increases the chances of detecting it. If you see a difference or relationship between two small groups, you could reasonably expect it to reach significance more readily if the groups became larger.
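Sample size and power can be related numerically. A sketch, assuming the statsmodels package is available, asking how many participants per group are needed to detect a medium-sized effect (Cohen's d = 0.5) with 80% power at a two-tailed alpha of .05:

```python
# A power-analysis sketch, assuming the statsmodels package: how many
# participants per group are needed to detect a medium effect (d = 0.5)
# with 80% power at a two-tailed alpha of .05?
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5, power=0.80, alpha=0.05, alternative="two-sided"
)
print(f"required participants per group: {n_per_group:.0f}")  # about 64
```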

Determining Statistical Significance Using Hand Calculations:

  1. Determine your thresholds and tailed tests: Before performing any analyses, decide what your alpha value is (.05 or .01), and whether you are performing a one-tailed or two-tailed test.
  2. Determine your critical value: This step is unique to calculations done by hand. A critical value is a number that corresponds to the probability equal to your pre-determined alpha value, and for hand calculations it serves as the threshold for significance. Critical values are based on the number of tails in your test and your alpha value, which is why these parameters are determined first. There are different sets of critical values for each type of statistical test you are conducting; these are easily accessible in statistics textbooks, or online.
  3. Calculate your test statistic: With your parameters set, perform the hand calculations needed. Your observed test statistic is the final numerical result.
  4. Compare your observed test statistic (Step 3) to the critical value (Step 2), and draw your conclusions (a code sketch of this comparison follows the list):
    a. If your observed statistic is greater than the critical value (observed > critical; compare absolute values for a two-tailed test), reject the null hypothesis. This means that the probability that this finding occurred by chance is less than your alpha value (e.g., 5%), and is evidence in support of a likely real-world difference or relationship between two groups or variables.

    b. If your observed statistic is less than the critical value (observed < critical), retain the null hypothesis. This means that the probability that your finding occurred by chance is greater than your alpha value, and suggests that there is no evidence of a real-world difference or relationship between two groups or variables.
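A minimal sketch of steps 2 through 4 for a two-tailed independent-samples t-test at alpha = .05, assuming Python with scipy; the observed statistic and degrees of freedom are invented examples:

```python
# Steps 2 to 4 for a two-tailed independent-samples t-test at alpha = .05;
# the observed statistic and degrees of freedom are invented examples.
from scipy import stats

alpha, df = 0.05, 28   # e.g. two groups of 15 participants: df = n1 + n2 - 2
observed_t = 2.31      # the hand-calculated test statistic (Step 3)

# Step 2: for a two-tailed test, place alpha/2 in each tail.
critical_t = stats.t.ppf(1 - alpha / 2, df)
print(f"critical value: {critical_t:.3f}")  # about 2.048

# Step 4: compare (in absolute value) and draw a conclusion.
if abs(observed_t) > critical_t:
    print("observed > critical: reject the null hypothesis")
else:
    print("observed <= critical: retain the null hypothesis")
```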

Determining Statistical Significance Using Software Packages:

  1. Determine your thresholds and tailed tests: Before performing any analyses, decide what your alpha value is (.05 or .01), and whether you are performing a one-tailed or two-tailed test.
  2. Calculate your test statistic: With your parameters set, perform the calculations needed. Your observed test statistic is the final numerical result, and the software package will report the exact p value next to it.
  3. Compare your observed p value (Step 2) to your alpha value (Step 1), and draw your conclusions (a code sketch of this workflow follows the list):
    a. If your p value is less than your alpha value (p < .05), reject the null hypothesis. This means that the probability that this finding occurred by chance is less than 5%, and is evidence in support of a likely real-world difference or relationship between two groups or variables.
    b. If your p value is greater than your alpha value (p > .05), retain the null hypothesis. This means that the probability that your finding occurred by chance is greater than 5%, and suggests that there is no evidence of a real-world difference or relationship between two groups or variables.
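A minimal sketch of this workflow, assuming Python with scipy; the data are invented for illustration:

```python
# The software workflow end to end: compute the statistic, read off
# the p value, and compare it to alpha; the data are invented.
from scipy import stats

group_a = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7]
group_b = [5.9, 5.7, 6.2, 5.8, 5.5, 6.0, 5.6, 5.4]
alpha = 0.05  # Step 1

result = stats.ttest_ind(group_a, group_b)  # Step 2
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

# Step 3: compare p to alpha.
if result.pvalue < alpha:
    print("reject the null hypothesis")
else:
    print("retain the null hypothesis")
```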

Effect Sizes:

Just because a result has statistical significance, it doesn't mean that the result has any real-world importance. To help 'translate' the result to the real world, we can use an effect size. An effect size is a numerical index of how much your dependent variable of interest is affected by the independent variable, and helps determine whether the observed effect is important enough to matter in the real world. Therefore, effect sizes should be interpreted alongside your significance results. Two common effect sizes are Cohen's d, which indexes the size of the difference between two groups in units of standard deviation, and eta-squared, which measures the strength of the relationship between two variables. For Cohen's d, a score of 0.2 is a small effect, 0.5 is a medium effect, and 0.8 is a large effect. For eta-squared, a score of .05 is a weak effect, .10 is a medium effect, and .15 is a strong effect. Both of these effect sizes can be calculated by hand, or statistics software can calculate them for you.
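Cohen's d is straightforward to compute by hand. A sketch using the pooled-standard-deviation formula, with invented data for illustration:

```python
# Cohen's d computed by hand with the pooled standard deviation;
# the data are invented for illustration.
import numpy as np

group_a = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7])
group_b = np.array([5.9, 5.7, 6.2, 5.8, 5.5, 6.0, 5.6, 5.4])

n1, n2 = len(group_a), len(group_b)
s1, s2 = group_a.std(ddof=1), group_b.std(ddof=1)
pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

d = (group_b.mean() - group_a.mean()) / pooled_sd
print(f"Cohen's d = {d:.2f}")  # 0.8 or above would count as a large effect
```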
