Just as juries need good evidence to reach prudent verdicts, healthcare providers need good evidence as a guide for better decision making in treating patients with pain. And, patients need to understand reasons behind treatment recommendations so they can be active participants in their care. Therefore, the most important questions addressed by this “Making Sense of Pain Research” series are:
- How can research be assessed in terms of its quality, reliability, and applicability for everyday clinical practice?
- How can research results be interpreted and put into language that makes sense to practitioners and patients for decision making purposes?
Misunderstood Statistics Muddle Healthcare Decisions
A recent and enlightening systematic review from the Cochrane Collaboration found that health professionals, as well as consumers, often misunderstand the meaning of research statistics [Akl et al. 2011]. A major finding was that the presentation of exactly the same data but in different statistical formats can result in strikingly different healthcare decisions. This is particularly evident when it comes to interpreting the risks or benefits of different therapies, since statistics can be used persuasively to portray health interventions in different lights, for example…
One could read a research report (hypothetical in this case) saying that, in elderly patients with severe arthritis, taking a new analgesic reduced the risk of hip fracture due to falls during a 1-year period by 40%. (Actually, the press release headline would emphasize the 40% figure; the researchers might state in their report that the 1-year Risk Ratio, or RR, was 0.60 in those taking the newer drug compared with an older drug. The reader then needs to calculate what is called the Relative Risk Reduction, RRR [1 minus 0.60 = 0.40, or 40%], afforded by the new drug.)
In the Cochrane Review, investigators examined evidence from 35 studies assessing understandings of risk statistics by health professionals and consumers. They found that study participants understood frequencies better than probabilities, such as outcomes stated in relative terms. That is, the Relative Risk Reduction, as in “the new drug cuts the risk by 40% compared with the older one,” was less well understood than the Number Needed to Treat, or “1 in 250 patients might benefit from the new drug.” At the same time, however, participants misperceived relative risk (eg, Risk Ratio = 60%) to be inappropriately greater than the exact same benefits presented using either Relative or Absolute Risk Reductions (eg, RRR = 40% or ARR = 0.4%) or the NNT (eg, 250). Surprisingly, both healthcare providers and consumers equally misinterpreted statistical data.
At first glance, cutting the risk of bone fractures by 40% certainly seems clinically worthwhile. However, an inquisitive reader looking closer at the raw data in the research paper would notice that 1.0% of patients taking the older analgesic had fractures during the 1-year timeframe whereas only 0.6% did with the new drug. In this case, what is called the Absolute Risk Reduction, or ARR, is merely 0.4% rather than 40% (0.01 minus 0.006 = 0.004 or 0.4%). So, now the benefit of the new drug does not seem as impressive.
Another way of looking at these same data would be that 250 patients need to take the new analgesic for 1 year to achieve one less incidence of hip fracture than would occur with the older drug. (This is called the Number Needed to Treat, or NNT = 250; calculated in this case by 1.0 divided by the ARR of 0.004.) The effect also can be simply stated as a frequency: one out of 250 persons taking the new analgesic rather than the older one might avoid hip fracture.
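The arithmetic behind the four statistics in the hypothetical example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `risk_measures` and its parameter names are invented here, and the inputs are the hypothetical 1-year fracture rates from the example (1.0% with the older analgesic, 0.6% with the new one).

```python
def risk_measures(control_risk: float, treated_risk: float) -> dict:
    """Return RR, RRR, ARR, and NNT for two event rates given as proportions.

    control_risk: event rate with the comparison (older) treatment
    treated_risk: event rate with the new treatment
    (Hypothetical helper written for this illustration.)
    """
    if not (0 < control_risk <= 1 and 0 <= treated_risk <= 1):
        raise ValueError("risks must be proportions, with control_risk above 0")

    rr = treated_risk / control_risk    # Risk Ratio
    rrr = 1 - rr                        # Relative Risk Reduction
    arr = control_risk - treated_risk   # Absolute Risk Reduction
    # NNT is only meaningful when the new treatment lowers the risk.
    nnt = 1 / arr if arr > 0 else None  # Number Needed to Treat
    return {"RR": rr, "RRR": rrr, "ARR": arr, "NNT": nnt}


# Hypothetical example: 1.0% fractures on the older drug, 0.6% on the new one.
example = risk_measures(control_risk=0.010, treated_risk=0.006)
print(f"RR  = {example['RR']:.2f}")    # 0.60
print(f"RRR = {example['RRR']:.0%}")   # 40%
print(f"ARR = {example['ARR']:.1%}")   # 0.4%
print(f"NNT = {example['NNT']:.0f}")   # 250
```

The same two raw percentages produce all four figures, which is why a 40% headline and a 1-in-250 frequency can describe identical evidence.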
When stated in simpler, real-world terms (which researchers often do not do for readers in their reports), the newer analgesic may not be very appealing, especially if it is more expensive or has more side effects than the older drug.
The above-mentioned statistics (RR, RRR, ARR, NNT) and others will be discussed more fully in future articles in this series. Meanwhile, if the discussion above seemed difficult to follow, there are at least three reasons: (1) the example illustrates how an understanding of even basic research statistics involves learning a new vocabulary; (2) interpreting data in research reports requires conceptual math skills that many people do not practice on a daily basis; and (3) identical data can be transformed and presented in different statistical formats, sometimes deceptively so, to depict different portraits of exactly the same evidence.
Getting Started — Some Basic Principles
In an earlier UPDATES article [Can Pain Research Be Trusted? here], we emphasized that published research reports in the pain management field cannot always be trusted and the best advice is caveat lector — reader beware. We noted examples of (a) data fraud or falsification (fortunately rare), (b) deceptive authorship practices, (c) improprieties in clinical trial reporting focusing only on positive outcomes, and (d) misrepresentations or “spinning” of research results to the public. Along with those concerns, it is vital to appreciate some basic principles underlying research in the pain management field (or any medical discipline for that matter):
- The amount of research in pain medicine is huge and growing out of control.
A prior UPDATES article [here] noted that in 2007 about 970 journals were publishing pain-related research, amounting to about 4,620 articles each year. Examining even a small percentage of those would be a daunting task for most healthcare providers, and the thought of wasting time by reading poor quality, incomprehensible, or misleading research may explain why a significant percentage of practitioners do not regularly follow the literature at all.
Saving time by merely reading research abstracts can be a risky undertaking, since the full data needed for prudent decision-making are often omitted and the highlighted results stated in abstracts can be misleading. In the example above, a 40% relative advantage of the new analgesic stated in the abstract turned out to be 0.4% in absolute value, or 1 in 250 in terms of actual numbers of patients benefitting.
- Absolute certainty in pain research is an illusion.
Research investigations do not actually “prove” anything, and they cannot demonstrate that a therapy is completely “safe.” While authors sometimes wrongly use those terms in their writing, proof and safety are relative concepts and the very nature of research always acknowledges the possibility that pure chance or random effects may influence outcomes, good or bad. This is particularly true in research on humans because there can be so much variability in physiology, anatomy, and even psychology affecting how individuals respond to any therapy for pain.
An allied tenet is that it is impossible to prove a negative. That is, clinical research cannot with 100% confidence ever rule out the possible occurrence of either a beneficial or harmful effect of a treatment. Just because an effect was not found in a research study does not mean it does not or cannot occur; or, in other words, “The absence of evidence is not evidence of absence.”
- Evaluating research evidence is about assessing probabilities.
Research outcomes usually generalize from a relatively tiny proportion of patients to a much larger population by expressing results in relative terms, such as ratios or percentages, which represent the probability, or likelihood, of an effect either occurring or not being present. In the example above, 0.60, 0.40, 40%, and 0.4% were all ratios or percentages (that is, probabilities) pertaining to the same research data describing beneficial effects of a new analgesic ostensibly for all elderly patients with arthritis.
Furthermore, an important role of statistical analyses is to assess the probability that observed effects are real and do not merely reflect the play of chance or random events. Statistical methods are used to anticipate and control for such error, such as saying that there was a significant effect of a treatment when none actually existed (called a “false positive,” “α error,” or “Type I error”) or, conversely, saying that the treatment failed when it actually did have a worthwhile effect (“false negative,” “β error,” or “Type II error”). [Note: mind-bending concepts and terms like these are good examples of why the language of research and statistics can be challenging.]
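The “false positive” idea can be illustrated with a small simulation. The sketch below is not drawn from any study; it assumes two groups sampled from the same distribution (so no true treatment effect exists), uses a simple two-sided z-test with a known standard deviation of 1 as a simplification, and counts how often the comparison nonetheless comes out “significant” at P < 0.05. The function name, sample sizes, and seed are illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def simulate_false_positive_rate(trials: int = 2000,
                                 n_per_group: int = 50,
                                 alpha: float = 0.05,
                                 seed: int = 1) -> float:
    """Estimate the Type I error rate when there is no real effect.

    Both groups are drawn from a standard normal distribution, so any
    'significant' difference between them is a false positive.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Standard error of the difference in means when sigma = 1 is known.
    standard_error = (2 / n_per_group) ** 0.5

    false_positives = 0
    for _ in range(trials):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(group_a) - fmean(group_b)) / standard_error
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided P-value
        if p_value < alpha:
            false_positives += 1
    return false_positives / trials


rate = simulate_false_positive_rate()
print(f"Share of no-effect trials declared significant: {rate:.3f}")
```

Under these assumptions the printed share should land close to 0.05, which is what the chosen significance level implies: roughly 1 in 20 comparisons of identical treatments would still be reported as showing an effect.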
- All medical research is inherently imperfect.
Researchers understand that there is always a probability of unknown, chance, or random factors that might produce false or misleading outcomes. In fact, in determining the statistical significance of their results, they decide in advance how often they are willing to be wrong. For example, a standard minimum level of significance is a probability or P-value = 0.05, which is the researchers’ way of conceding that 5% of the time their results might be in error, such as finding that a treatment had an important effect when it actually did not.
Based on this, it is interesting to consider that each year at least 2,310 erroneous or false results may be published in the pain research literature. This hypothetically assumes that each of the 4,620 articles noted under the first point, above, includes on average 10 outcome measures stated as significant at a P = 0.05 level. That is, 1 in 20 results claimed as being statistically significant could be in error, and there is no way of knowing which ones. Some critics have asserted that nearly one-third of all published research studies may portray either false or exaggerated claims [Ioannidis 2005, Lehrer 2010].
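The 2,310 figure is simple arithmetic, sketched below. The variable names are illustrative, and the calculation follows the article's own hypothetical assumptions (10 significant outcome measures per article, and 5% of those results in error) rather than any measured error rate.

```python
# Inputs taken from the text; all are hypothetical assumptions.
articles_per_year = 4620    # pain-related articles published annually
outcomes_per_article = 10   # assumed significant outcome measures per article
error_rate = 0.05           # assumed share of significant results in error (P = 0.05)

significant_results = articles_per_year * outcomes_per_article
expected_errors = significant_results * error_rate

print(f"Significant results per year: {significant_results:,}")      # 46,200
print(f"Possibly erroneous results:   {expected_errors:,.0f}")       # 2,310
```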
Another problem of considerable consequence is that, while statistics may be calculated by error-free computer software, research articles themselves are written by humans who often make mistakes. For example, one investigation found that more than two-thirds of the journal article abstracts examined contained data that were inconsistent with or absent from the main body of the articles [Leavitt 2008; Pitkin et al. 1999]. Even more frustrating, it is not uncommon to find mathematical inconsistencies within the text and/or tables of articles — eg, numbers not adding up or dividing properly — and it is impossible to know if the mistakes were in the data numbers or the math calculations.
- Most research articles are written for other researchers.
Healthcare providers, and some patients, look to research articles for answers to clinical questions; eg, “Will this treatment be of benefit or harm, and how much so?” On the other hand, researcher-authors often become so focused on presenting data and statistics to support their evidence that they do not take the time or space to explain their data in language that practitioners or patients can understand and put to use.
Part of the reason is that journals have rigid requirements for what must be included in research articles, and most of those center on providing technical details of methodology and statistical manipulations so other researchers can understand and possibly replicate the study. Since journals also limit the allowable length of articles, and there is no requirement that authors must explain their approach and outcomes in language that typical healthcare providers or patients can understand, they rarely do so.
- All research articles and presentations are persuasive communications.
While few research communications are blatant propaganda, although some are, all researcher-authors (or presenters at conferences) have a point of view or particular goal in mind and use evidence selectively to support those positions. There are several concerns with this…
> First, there is no guarantee that the writers or presenters, themselves, have thoroughly examined, or understood, the issues communicated to their audiences. It is not uncommon for authors to selectively choose and interpret content in ways that support their objectives, and conference presenters have been known to “cherry-pick” nuggets of data from research to reinforce their positions, which may be biased in some way.
> Second, medical research is big business and some critics have claimed that the greater the financial or other interests at stake the less likely research findings are to be accurately or objectively reported, or true [Ioannidis 2005; Leavitt 2008].
> Third, there is the “Lake Wobegon Effect” whereby researchers, their sponsors, and/or their publishers often have overly enthusiastic or optimistic views of the importance and value of the evidence. This comes from the fictional town of Lake Wobegon where storyteller Garrison Keillor says, “all the women are strong, all the men are good-looking, and all the children are above average” [reference in Leavitt 2008].
References
> Akl EA, Oxman AD, Herrin J, et al. Using alternative statistical formats for presenting risks and risk reductions. Cochrane Database of Systematic Reviews. 2011;(3):CD006776 [abstract here].
> Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):696-701 [article here].
> Leavitt SB. Can Pain Medicine Research Be Trusted? Pain-Topics e-Briefing [online]. 2008;3(1):1-5 [PDF available here].
> Lehrer J. The Truth Wears Off. The New Yorker. 2010(Dec 13):52-57 [article here].
> Pitkin RM, Branagan MA, Burmeister LF. Accuracy of data in abstracts of published research articles. JAMA. 1999;281(12):1110-1111 [abstract here].