Tuesday, September 4, 2012

Clinical significance versus statistical significance

The ability to appraise research evidence for its validity and determine its applicability to a particular patient case is a critical skill for evidence-based physical therapy practitioners. While the research evidence comprises only one component of evidence-based practice (EBP), it is an important component. (The other key components are the therapist’s clinical expertise and the patient’s individual circumstances, values and preferences.)

In evaluating evidence from research studies, my students are often confused by the difference between statistical and clinical significance, or they hang their hat on whether or not the findings are statistically significant. Both concepts are important and both need to be considered when making decisions about the value of research evidence.

In research studies comparing the effects of two interventions on a particular outcome, the difference between treatment groups at the end of the intervention is usually determined by a statistical test. The specific statistical test will depend on the type of data, the study design, and the research hypothesis being tested. The statistical test indicates whether Treatment A has a different effect than Treatment B, but it does not reveal any information about the size of the treatment effect. The effect size is vital information in determining whether the resulting outcome is worth caring about from a clinical standpoint.

Statistical significance does not necessarily indicate clinical significance
Lack of statistical significance does not necessarily indicate lack of clinical significance

To illustrate this, let’s discuss the difference between the p-value and the effect size.

p-values and statistical significance

The p-value is the probability of obtaining a difference as large as (or larger than) the one observed if there were truly no difference between the treatments – in other words, the probability that the result is due to chance alone. Therefore, if you are trying to demonstrate a difference between two interventions you want this value to be small – that is, NOT likely that the difference between the groups was due to chance. Typically, the threshold for statistical significance is set at 0.05. This means that if the p-value is greater than 0.05 (more than a 5% probability that the difference arose by chance), the difference is not considered to be statistically significant.
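To make this concrete, here is a minimal sketch in Python (using made-up outcome scores for two hypothetical treatment groups – not data from any real study) of how an independent-samples t-test produces a p-value:

```python
# Minimal sketch: comparing two hypothetical treatment groups with an
# independent-samples t-test. The data below are invented for illustration.
from scipy import stats

# Hypothetical post-treatment outcome scores (e.g., points on a 0-100 scale)
treatment_a = [62, 70, 68, 75, 66, 71, 69, 73]
treatment_b = [60, 64, 63, 69, 61, 65, 62, 66]

t_stat, p_value = stats.ttest_ind(treatment_a, treatment_b)

# If p_value <= 0.05, the difference is conventionally called
# "statistically significant" -- but this says nothing about its size.
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```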

Statistical significance is important; you want to know that the difference is unlikely to be a chance result. But it is not enough by itself to establish the importance of the findings. The difference between groups might not be due to chance, yet it might still be too small to matter clinically. Therefore, you also want to consider the effect size, to determine whether the results are clinically meaningful. This is what it means for a finding to be clinically significant – the observed difference between the groups is a difference worth caring about; it is scientifically or clinically important.

Effect sizes and clinical significance

It is possible to have a scenario in which the results of a study are statistically significant, but the effect size is small or not clinically important. Conversely, a study may fail to find a statistically significant difference between two interventions even though the effect size is large and clinically important. (In the latter case, it is worth considering whether the study was underpowered to detect the desired effect size.)

An effect size measures the magnitude of the difference between two groups. It may be presented as the absolute effect size – the difference between the mean outcome value for Treatment A and the mean outcome value for Treatment B at the conclusion of the study. (Absolute effect sizes are easy to interpret if you know what change in the unit of measurement is meaningful.) Standardized effect sizes take the variability of the scores into account by dividing the difference between the means by a measure of spread, such as the pooled standard deviation. Standardized effect sizes are useful when you want to compare the magnitude of an intervention’s impact across different outcome measures in the study. (A common example of a standardized effect size index is Cohen’s d, which expresses the difference between two means in standard deviation units.)
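Continuing the hypothetical example above, the sketch below computes both the absolute effect size (the raw difference in group means) and Cohen’s d using the pooled standard deviation; the data are again invented for illustration:

```python
# Minimal sketch: absolute and standardized effect size (Cohen's d) for two
# hypothetical treatment groups. Data are invented for illustration only.
import numpy as np

treatment_a = np.array([62, 70, 68, 75, 66, 71, 69, 73], dtype=float)
treatment_b = np.array([60, 64, 63, 69, 61, 65, 62, 66], dtype=float)

# Absolute effect size: difference between group means, in the outcome's units
absolute_effect = treatment_a.mean() - treatment_b.mean()

# Cohen's d: difference in means divided by the pooled standard deviation
n_a, n_b = len(treatment_a), len(treatment_b)
pooled_sd = np.sqrt(((n_a - 1) * treatment_a.var(ddof=1) +
                     (n_b - 1) * treatment_b.var(ddof=1)) / (n_a + n_b - 2))
cohens_d = absolute_effect / pooled_sd

print(f"absolute effect = {absolute_effect:.1f} points, Cohen's d = {cohens_d:.2f}")
```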

Confidence intervals and clinical significance

Confidence intervals are also helpful in determining potential meaningfulness because they establish the precision (or lack thereof) of the effect size. A confidence interval is a range of values within which the true value of the effect is estimated to lie, with a specified level of confidence (usually 95%, though 90% or 99% are also used). The narrower the confidence interval, the more precise the estimate of the effect size.
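Still using the same invented data, here is a sketch of a conventional 95% confidence interval for the difference between the two group means (assuming equal variances):

```python
# Minimal sketch: an approximate 95% confidence interval for the difference in
# means between two hypothetical treatment groups (equal-variance t interval).
import numpy as np
from scipy import stats

treatment_a = np.array([62, 70, 68, 75, 66, 71, 69, 73], dtype=float)
treatment_b = np.array([60, 64, 63, 69, 61, 65, 62, 66], dtype=float)

diff = treatment_a.mean() - treatment_b.mean()
n_a, n_b = len(treatment_a), len(treatment_b)
pooled_var = (((n_a - 1) * treatment_a.var(ddof=1) +
               (n_b - 1) * treatment_b.var(ddof=1)) / (n_a + n_b - 2))
se_diff = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Critical t value for 95% confidence with n_a + n_b - 2 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n_a + n_b - 2)
lower, upper = diff - t_crit * se_diff, diff + t_crit * se_diff

# The narrower this interval, the more precise the estimate of the effect.
print(f"difference = {diff:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

If that interval is wide, or includes differences too small to matter to patients, the finding is less useful for practice even when the p-value is small.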

So, statistical significance and clinical significance are both important, but neither is sufficient by itself. You need to have both! You want to know that the results are not due to chance and that they are clinically important. Therefore, as consumers of the medical literature, it is important to look at both the p-value AND the effect size! Critically evaluate both before making decisions about the evidence for use in clinical practice.
