Friday, May 20, 2011

Risks of Misinterpreting “Risk” in Pain Research

Part 6 – Understanding RR, RRR, ARR, & NNT

An understanding of outcomes presented in pain research studies as “risk statistics,” which are special estimates of effect, is central to evidence-based medicine approaches. Yet, this can be confusing, since authors often do not portray or discuss study results involving these measures in everyday terms that can be applied in pain practice. Therefore, being able to interpret risk effects — denoted as RR, RRR, ARR, or NNT — is vital for making sense of pain research.

Healthcare providers (and their patients) need answers to basic questions about the chances of success or failure when considering new treatments or interventions, such as:

  • How does the new treatment compare with other therapies?
  • How many patients might achieve better outcomes with the new treatment?

For the non-statistician healthcare professional there are several easy-to-perform calculations that provide answers. These might appropriately be called “bottom-line clinical effects,” because they help put research into perspective for everyday practice. However, there also are some pitfalls to be aware of when interpreting these estimates of effect.

Data Presentation Makes the Difference

Many pain research studies compare different treatments or interventions with each other or with placebo to determine the relative risks versus benefits of new approaches. However, identical data can be transformed and presented in formats that depict quite different portraits of exactly the same evidence.

Concepts of risk effects were first introduced in Part 1 of this series [here], which discussed a hypothetical study finding that a new analgesic reduced the risk of falls and hip fracture in elderly patients with severe arthritis by 40%. Another way of looking at this same data was that fractures occurred only 60% as often with the new drug as with an older analgesic.

However, these were improvements expressed in relative or proportional terms. The actual incidence of fractures with the older drug was 1.0%, so the new drug reduced the risk by only 0.4 percentage points (from 1.0% to 0.6%). Still another way of presenting the same data, which could be more meaningful from a clinical perspective, was that 250 patients would need to take the new analgesic to achieve one less occurrence of fall/fracture than would occur with the older drug; that is, the new analgesic would be advantageous for 1 of every 250 patients taking it instead of the older drug.

In this example, 60% represents the Relative Risk, or RR; 40% is the Relative Risk Reduction, or RRR; 0.4% is the Absolute Risk Reduction, or ARR; and 250 is the Number-Needed-to-Treat, or NNT. These terms are explained further below, but it should be apparent that the presentation of exactly the same data in different statistical formats could motivate strikingly different healthcare decisions.
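
As a quick check of the arithmetic in this example, the four figures can be reproduced from the two event rates alone. The following is only an illustrative sketch; the function and variable names are made up for this purpose and do not come from any study or software package.

```python
def risk_measures(cer, eer):
    """Return RR, RRR, ARR, and NNT from two event rates.

    cer: event rate in the control group (older drug), as a fraction
    eer: event rate in the experimental group (new drug), as a fraction
    """
    rr = eer / cer    # Relative Risk
    rrr = 1 - rr      # Relative Risk Reduction
    arr = cer - eer   # Absolute Risk Reduction
    nnt = 1 / arr     # Number-Needed-to-Treat
    return rr, rrr, arr, nnt


# Hypothetical fall/fracture study: 1.0% with the older drug, 0.6% with the new one.
rr, rrr, arr, nnt = risk_measures(cer=0.01, eer=0.006)
print(f"RR  = {rr:.0%}")   # 60%
print(f"RRR = {rrr:.0%}")  # 40%
print(f"ARR = {arr:.1%}")  # 0.4%
print(f"NNT = {nnt:.0f}")  # 250
```

The same two inputs yield a "40% reduction" and a "1 in 250" benefit, which is why the choice of format matters so much.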

The Meaning of “Risk Effects”

In statistical parlance, “risk” is the probability of something (eg, an event) happening that is either good or bad, desirable or undesirable, of benefit or harm. Of course, the usual goal of clinical research is to demonstrate that the treatment or intervention of interest either increases some benefit (eg, pain relief) or reduces a harm (eg, adverse event, such as falls/fractures in the above example).

People are more accustomed to statements of risk and risk-related probabilities than they realize. For example, consider the following…

A sign in the store window says “Half-Price Sale” — this is a relative risk statement about the offer (the “risk” here is paying full price for the item). Or, if it reads “50% Off Regular Price,” this would be the relative risk reduction. For a $200 item, the sign might read, “Save $100” or “$100 Off Regular Price,” which is the absolute risk reduction. Or, the offer might be “Get 2 for the Price of 1,” which is somewhat of a number-needed-to-treat statement (ie, how many items need to be purchased to avoid the “risk” of paying for one at full price).

Since “risk” can denote either a benefit or harm, its meaning needs to be interpreted within the context of the particular research. Another challenge is that, since some of the risk-effect statistics are expressed in proportional terms as ratios — eg, Risk Ratio, Relative Risk Reduction — they need to be translated into language that can be better understood and applied in clinical practice.

The relationships are not always clearly explained in research reports and, sometimes, authors themselves misinterpret their own data. So, critical readers must understand the concepts and basic math behind them; fortunately, the calculations are actually not complicated and a closer look at risk-effect statistics will help.

Calculating Risk Estimates of Effect

There are six statistical measures that are of importance for this discussion of risk effects, which relate to the figure (a two-by-two table of study outcomes). This table represents a typical pain research study design in which two groups are examined: 1) an Experimental group exposed to a therapy or intervention of interest, and 2) a Control group that instead receives an alternate therapy/intervention or placebo. For each group, either the outcome event of interest occurs or it does not occur. The alphabetical letters symbolize the numbers of research subjects in each cell of the table: a and b are the Experimental group subjects with and without the event, and c and d are the Control group subjects with and without the event.

For example, in the study above, elderly patients received either a new analgesic (Experimental group) or an older drug (Control group), and the subjects either experienced the event of interest (fall with hip fracture) or they did not. Here is how the essential risk effects are computed…

  • EER (Experimental Event Rate) = a/(a+b). This is the proportion of all Experimental (eg, treatment) group subjects in whom the event of interest (eg, fall/fracture) is observed. It is expressed as a probability — fraction or percentage — ranging from 0.0 (0%) to 1.0 (100%). In the example above regarding falls/fractures, the rate with the new (experimental) drug was 0.6% or 0.006. [Note: research report authors often present event rates as percentages, rather than providing raw data for each cell in the table above.]

  • CER (Control Event Rate) = c/(c+d). This is the proportion of all Control (eg, comparative treatment or placebo) group subjects in whom the event of interest is observed. It is expressed as a probability (fraction or percentage) ranging from 0.0 (0%) to 1.0 (100%). From the above example, the rate of falls/fractures with the older (control) drug was 1.0% or 0.01.

  • RR (Risk Ratio or Relative Risk) = [a/(a+b)] / [c/(c+d)] = EER/CER. This is the ratio of the event rate (or, outcome risk) in the Experimental group compared with that in the Control group. It responds to the clinical question, “What was the relative advantage (or disadvantage) of the Experimental condition in comparison with the Control condition?”

    From our example above, the statement of RR may be something like, “Compared with the older analgesic there was only a 60% occurrence of fall/fracture events with the new drug [the RR is 0.60].” In a different type of study, the authors might write something like, “Subjects receiving the experimental drug were 3.5 times more likely to develop constipation than those administered placebo [the RR is 3.50, but note that this is NOT the same as saying there was a 350% increase in constipation with the experimental drug; see RRR below].”

    Theoretically, the RR can range from 0.0 to 1/CER, an upper limit that grows without bound as events become rarer in the Control group. An RR of 1.0 indicates that outcomes in the two groups compared are equivalent; values may be greater than or less than 1.0, depending on whether there were more or fewer events in the Experimental than the Control group, respectively.

    How can RRs be interpreted in terms of their clinical meaning and effect size? As noted above, the meaning needs to be interpreted within the context of the research. For example, in a study examining adverse effects of a therapy, having fewer events in the Experimental group than the Control group (and, thus, an RR < 1.0) would be favorable; whereas, if the event of interest were pain relief, having a greater response in the Experimental group (RR > 1.0) would be desired.

    As for effect size, an RR between 0.8 and 1.25 is often considered a small effect size, an RR of 0.5–0.8 or 1.25–2.0 is considered a medium effect size, and an RR below 0.5 or above 2.0 is a large effect size. These are not rigidly defined effect sizes, and their overall context within the particular research should be considered. (Also, see the discussion of Confidence Intervals below.)

  • RRR (Relative Risk Reduction) = 1–RR. This also can be calculated as (CER–EER)/CER, and it answers the clinical question, “How does the reduction (or increase) in outcome risk as a result of the Experimental condition compare with that risk in the Control group?”

    In parallel with the above examples, a statement of RRR might read, “The new analgesic reduced the risk of falls/fractures by 40% as compared with the older drug [1–0.60 = 0.40 RRR].” In the second RR example above the authors might observe, “Exposure to the experimental drug increased the occurrence of constipation by 250% compared with placebo [1–3.50 = –2.50 RRR].” (Note that the negative value here needs to be interpreted in context as an increase in the risk event.)

    An RRR of 0.0 indicates events in the Experimental and Control groups are equal, and there is no effect of the experimental treatment; however, this rarely happens. An RRR other than zero can be a positive or negative value depending on whether events are greater in the Experimental or the Control group, and this needs to be interpreted within the context of the study.

  • ARR (Absolute Risk Reduction) = [c/(c+d) – a/(a+b)] = CER–EER. Whereas, the RR and RRR are relative probabilities, or ratios, the ARR answers the question, “In specific terms, how much better (or worse) was the outcome in one group than the other?” Sometimes simply called the “risk difference,” or RD, it is the difference in the event rate (risk probability) between the Control group (CER) and the Experimental group (EER).

    An ARR of 0 would indicate no difference between comparison groups; whereas, the highest possible values of +1.0 or –1.0 denote either a 100% reduction or increase, respectively, of events in the Experimental group. The clinical meaning of this must be interpreted within the context of the study, because more events in the experimental condition may be either preferable (eg, pain relief) or undesirable (eg, a side effect).

    The ARR is based on actual event data rather than ratios (eg, RR or RRR), so it puts the outcome result into better clinical perspective. In the above example, the actual incidence of fractures with the older drug was 1.0% (CER) and the event rate with the newer drug was only 0.6% (EER), so the ARR was 0.4% [1.0–0.6].

  • NNT (Number-Needed-to-Treat) = 1/ARR. This statistic answers the question, “How many patients need to receive the experimental treatment to either achieve one additional desired outcome (eg, pain relief) or prevent one additional undesired outcome (eg, a side effect)?” The NNT can be very helpful for deciding on the clinical advantage of a treatment or intervention.

    From our example above, the NNT indicates that 250 patients need to take the new analgesic to achieve one less occurrence of hip fracture than would occur if they were taking the older drug [1.0 divided by 0.004]. Whether an NNT represents gaining a desired effect or preventing a negative outcome, and the value of this, depends on the context of the study. For example, treating 250 patients to prevent 1 fall/fracture may not be cost-effective, or the best alternative if the new drug incurs added side effects.

    The clinical importance of particular NNTs must be determined on a case-by-case basis, and there are a number of factors that can influence NNT size; so, this will be the subject of a separate future UPDATES article in this series. For example, the NNT can be influenced by the duration of a study since, during longer trials, more patients receiving placebo might experience increasing rates of negative outcomes or lack of benefit, which, in turn, could increase the ARR and reduce NNT size for demonstrating benefits of the Experimental treatment. It also is possible to have a negative NNT, which designates harmful effects of a treatment or intervention and is usually called the “Number-Needed-to-Harm.”
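
The six calculations above can be sketched in a few lines of code, starting from the four cell counts a, b, c, and d. This is a minimal illustration rather than a validated tool: the names `RiskEffects`, `risk_effects`, and `rr_effect_size` are invented here, and the counts in the usage lines are hypothetical, chosen only to match the fall/fracture example (6 of 1,000 versus 10 of 1,000).

```python
from dataclasses import dataclass


@dataclass
class RiskEffects:
    eer: float
    cer: float
    rr: float
    rrr: float
    arr: float
    nnt: float


def risk_effects(a, b, c, d):
    """Compute the six measures from 2x2 cell counts.

    a = Experimental subjects with the event, b = Experimental without it
    c = Control subjects with the event,      d = Control without it
    """
    eer = a / (a + b)
    cer = c / (c + d)
    if cer == 0:
        raise ValueError("RR and RRR are undefined when the control event rate is zero")
    rr = eer / cer
    rrr = 1 - rr      # equivalent to (cer - eer) / cer
    arr = cer - eer
    # With no risk difference, no number of treated patients yields one extra benefit.
    nnt = float("inf") if arr == 0 else 1 / arr
    return RiskEffects(eer, cer, rr, rrr, arr, nnt)


def rr_effect_size(rr):
    """Rough label using the cut-offs quoted in the text (not rigid rules)."""
    if 0.8 < rr < 1.25:
        return "small"
    if 0.5 <= rr <= 2.0:
        return "medium"
    return "large"


# Hypothetical counts: 6/1,000 events with the new drug, 10/1,000 with the older drug.
result = risk_effects(a=6, b=994, c=10, d=990)
print(result)
print("Effect size by RR:", rr_effect_size(result.rr))
```

A negative `nnt` from this sketch means more events occurred in the Experimental group; whether that is a harm or a benefit still depends on what the event is.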

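The risk-effect measures and their calculation equations are summarized in the table below.

  Measure                              Equation
  EER (Experimental Event Rate)        a/(a+b)
  CER (Control Event Rate)             c/(c+d)
  RR (Risk Ratio or Relative Risk)     EER/CER
  RRR (Relative Risk Reduction)        1–RR, or (CER–EER)/CER
  ARR (Absolute Risk Reduction)        CER–EER
  NNT (Number-Needed-to-Treat)         1/ARR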


Confidence in Risk Estimates of Effect

All risk statistics — EER, CER, RR, RRR, ARR, and NNT — are essentially point estimates of effect, and may or may not be “true” values for the population. Tests of statistical significance (P-values) and confidence intervals (CIs) can be calculated for each of these just as with any other measures. Prior UPDATES in this series discussed P-values [here] and confidence intervals [here]. Research authors should report P-values and 95% CIs for key risk statistics, though this is not always done.

When interpreting CIs of risk-effect statistics it is important to keep the null value (point of no significant difference) and possible CI ranges for each in mind…

  • The null value for RR is 1.0 — denoting that at this point the EER and CER are equivalent — and values for the RR CI can theoretically range from 0 to infinitely large.

  • For RRR the null value is 0 (when CER = EER), and the RRR CI can theoretically range from +1.0 (a 100% relative reduction, when EER = 0) down to an infinitely large negative value (a relative risk increase).

  • The ARR can theoretically range from +1.0 (+100%) to –1.0 (–100%), with the null value at 0.

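  • The NNT is a special case in which the theoretical range extends from +1.0 to an infinitely large positive value or from –1.0 to an infinitely large negative value; values between –1 and +1 are impossible, and the null value (ARR = 0) corresponds to an infinitely large NNT.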

As with CIs for other statistics, the narrower the range the more precision can be assumed in the risk-effect measure. CI ranges for RR, RRR, or ARR that include the null value denote that the respective point estimate is statistically non-significant and might have occurred due to chance alone. For the NNT, a CI that includes both a positive and negative value denotes statistical non-significance.
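
To make the null-value check concrete, here is a rough sketch of how 95% CIs for the RR and ARR might be computed from the four cell counts. The article does not say which CI method to use, so the choices below are assumptions: a common large-sample approximation on the log scale for the RR and a simple normal (Wald) approximation for the ARR. All function names and the example counts are hypothetical, and the formulas need at least one event in each group.

```python
import math

Z95 = 1.96  # normal quantile for a two-sided 95% interval


def rr_ci(a, b, c, d, z=Z95):
    """Approximate CI for the Risk Ratio, computed on the log scale."""
    n1, n2 = a + b, c + d
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def arr_ci(a, b, c, d, z=Z95):
    """Approximate (Wald) CI for the Absolute Risk Reduction, CER - EER."""
    n1, n2 = a + b, c + d
    eer, cer = a / n1, c / n2
    arr = cer - eer
    se = math.sqrt(eer * (1 - eer) / n1 + cer * (1 - cer) / n2)
    return arr, arr - z * se, arr + z * se


def is_significant(lower, upper, null_value):
    """A CI that excludes the null value suggests statistical significance."""
    return not (lower <= null_value <= upper)


# Hypothetical counts: 6/1,000 events (new drug) vs 10/1,000 (older drug).
rr, rr_lo, rr_hi = rr_ci(6, 994, 10, 990)
arr, arr_lo, arr_hi = arr_ci(6, 994, 10, 990)
print(f"RR  {rr:.2f} (95% CI {rr_lo:.2f} to {rr_hi:.2f})")
print(f"ARR {arr:.4f} (95% CI {arr_lo:.4f} to {arr_hi:.4f})")
print("RR significant: ", is_significant(rr_lo, rr_hi, null_value=1.0))
print("ARR significant:", is_significant(arr_lo, arr_hi, null_value=0.0))
```

With these small hypothetical counts both intervals include their null values, so the 40% relative reduction would not be statistically significant. Taking the reciprocals of the two ARR limits gives corresponding NNT limits; when the ARR interval spans zero, one NNT limit is positive and the other negative.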

Advice & Caveats for Interpreting Risk Effects

  1. Event rates — EER, CER — represent dichotomous data; that is, either the event of interest occurs or it does not. However, the event may represent a more complex outcome. For example, event data for pain relief may represent the numbers of subjects not just achieving pain relief but experiencing a 50% reduction in scores from baseline on a visual analog scale, or VAS, during a specific period of time. Report authors should clearly explain what is being represented in their outcome events and how the events were measured. If this is unclear, there is no way for a reader to understand or interpret the data.

  2. In many, but not all cases, the EER and CER represent the average, or mean, frequency of the particular event occurring in the respective group. The ARR is then the “mean difference,” either increase or decrease, of the event rate between groups. This is often the statistic provided by report authors; although, they may not call it the ARR.

    In some of the literature, biostatisticians suggest calculating the ARR as EER–CER, rather than CER–EER as shown above. In this case, the magnitude remains the same but the sign, plus or minus, is reversed and its interpretation needs to be considered within the overall context of the research.

  3. Data presented only as RRs or RRRs are not in themselves clinically informative and can be misleading unless they are translated into ARRs or NNTs. Often, it is up to the educated reader to make the transition from statistical language to what the data might actually mean for patient outcomes; that is, what the numbers mean in terms of harmful or beneficial effects, their magnitude of effect, and their significance for clinical practice.

    RRs and RRRs can be easily confused and used inappropriately, even by report authors themselves. For example, an RR of 3.5 is the rate of the event in the Experimental group compared with the Control group; it is not the same as the relative reduction (or increase) of the event as represented by the RRR (calculated as 1–RR). If the RR is erroneously used to calculate the decrease (or increase) of the event due to the Experimental condition (eg, using 3.5 rather than 2.5, the magnitude of the RRR), the outcome will be grossly inflated. Journal abstracts alone can be an unreliable basis for understanding a study, and journalists often confuse RR with RRR in their news articles.

  4. Keep in mind that RR, RRR, or ARR, can represent risks that are favorable or unfavorable depending on the context, and the ARR can be positive or negative in value. In their descriptions, report authors should make it clear whether the effects are benefits or harms, but this sometimes can be confusing without careful reading.

  5. Many studies in the pain field make risk-effect statements based on assessments of large databases, which is called “data-mining.” The result is often very large estimates of risk based on relatively few events. For example, a study may find that an adverse event (eg, opioid overdose) is 1.50 times as likely to occur in one group as in another (this is the RR). The relative risk increase is 50% (1–RR = –0.50, a negative RRR); however, in absolute terms, this may represent an increase from 10/100 to 15/100 patients or from 10/10,000 to 15/10,000 patients. The actual event rates make a big difference in terms of the severity of effects in patients, and this may not be clearly presented in the research report. Expressing Absolute Risk Reductions (ARRs, or the increase in this case) as percentages often helps to clarify the situation (eg, a 5% vs 0.05% increase in this example), but readers may need to do these calculations themselves.

  6. Finally, some studies express relative risk-effect outcomes as Odds Ratios (ORs) and these often are confused with and interpreted the same as if they were Risk Ratios (RRs). The two measures both rely on event rates in the groups compared; however, each is calculated quite differently. Except in certain circumstances, the OR is NOT equivalent to the RR and, in fact, the OR calculated from the same data is farther from 1.0 than the RR, noticeably so when events are common. Therefore, interpreting ORs as if they were RRs can depict an inflated portrayal of the outcomes in question, and readers need to be wary of being misled by this.

    Some authorities have suggested that the OR is like an “evil cousin” of the RR, that ORs are difficult to interpret in clinical terms, and that there is no place for them in published reports of pain research. Yet, ORs are frequently reported and this statistic will be more adequately discussed in a future UPDATE in this series on “Making Sense of Pain Research.”
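
Two of the caveats above (items 5 and 6) lend themselves to a quick numerical check: the same relative risk can hide very different absolute increases, and the OR drifts away from the RR as events become more common. The sketch below uses the 10-versus-15 events per 100 or per 10,000 patients from item 5; the function name `compare` is made up for illustration, and the OR is computed in the usual way as the ratio of the odds (events divided by non-events) in the two groups.

```python
def compare(events_exp, n_exp, events_ctl, n_ctl):
    """Return RR, OR, and absolute risk increase (EER - CER) for two groups."""
    eer = events_exp / n_exp
    cer = events_ctl / n_ctl
    rr = eer / cer
    odds_exp = events_exp / (n_exp - events_exp)  # events vs non-events
    odds_ctl = events_ctl / (n_ctl - events_ctl)
    odds_ratio = odds_exp / odds_ctl
    return rr, odds_ratio, eer - cer


for n in (100, 10_000):
    rr, odds_ratio, increase = compare(15, n, 10, n)
    print(
        f"15 vs 10 events per {n:,}: RR = {rr:.2f}, OR = {odds_ratio:.2f}, "
        f"absolute increase = {increase:.2%}, "
        f"Number-Needed-to-Harm = {1 / increase:,.0f}"
    )
```

In both scenarios the RR is 1.50, yet the absolute increase is 5% in one and 0.05% in the other. The OR is about 1.59 when events are common (per 100) but nearly identical to the RR when they are rare (per 10,000).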
