Part 15 – NNT, NNH, and Harm-to-Benefit Ratios
Pain research reports typically convey an array of statistical analyses, but among the more useful, and at the same time least frequently presented, is the Number-Needed-to-Treat (NNT) for benefit or harm from a therapy or intervention. In simplest terms, NNT helps healthcare providers, and their patients, to assess the extent to which a pain treatment is likely to be helpful or harmful in specific ways. This article in the series “Making Sense of Pain Research” explains why and how to calculate and use NNT statistics, which are simple in concept yet can be challenging to properly interpret.
The concept of NNT was proposed 25 years ago as a useful measure of the clinical effects and consequences of a treatment [Laupacis et al. 1988]. It estimates the effort that practitioners must expend in treating patients to help attain a good outcome (eg, pain relief) or to avoid an undesirable consequence (eg, adverse effect) of a therapy or intervention [DiCenso 2001; McAlister 2008]. Additionally, NNT is a meaningful way of expressing the magnitude of a treatment effect in contrast to either a control (eg, placebo or no treatment) or comparative intervention.
Furthermore, knowing the NNT helps to determine whether likely treatment benefits will overcome associated harms and costs. For example, it could be reasonable to treat 2 patients with a new, relatively safe, but very expensive analgesic to achieve 50% pain relief in one of those patients (NNT=2), when compared with an older, safe, but much less expensive analgesic requiring that 4 patients must be treated for one of them to achieve the same level of pain relief (NNT=4).
In overview, NNT is a statistical measure of effect that helps to answer several clinical questions of importance to practitioners and their patients:
- How many patients need to be treated with a therapy or intervention for one of them to benefit in some way?
- How many patients need to be treated for one of them to be harmed (eg, adverse effect) in some way?
- How many patients might benefit from treatment for each one who experiences some harmful effect?
When an NNT reflects an undesirable event (eg, adverse effect) it is usually denoted as NNH, or Number-Needed-to-Harm. Some authors suggest that, whether signifying benefit or harm, the concept of NNT remains the same and should be designated as either “NNT-B” (NNT to benefit) or “NNT-H” (NNT to harm) [Cates 2005]. This makes sense; but NNT-B or NNT-H are rarely used, and it is most important that researchers unambiguously communicate in their reports what is intended by an NNT or NNH and its significance.
In the following discussions, the term “NNT” is used primarily, although the calculations of NNH are exactly the same as for NNT. In both cases, smaller numbers imply either greater benefit or greater harm, depending on the context. A correctly described NNT or NNH should always specify the treatment and its outcome of interest, the comparator, the duration of time necessary for achieving the outcome, and the 95% Confidence Interval with P-value for the statistic [Moore 2009].
The NNT may be reported for individual clinical trials or in meta-analyses that combine multiple studies. When the NNT is not provided by researchers, it often can be calculated from data in the report or estimated from other statistical measures of effect, as described below. However, while NNT is a relatively straightforward measure of effect, there are subtleties and nuances of this statistic that readers of pain research literature should understand for proper interpretation and use.
The Number-Needed-to-Treat (NNT) was first discussed in Part 6 of this series [here], which considered bottom-line treatment effects presented as “risk statistics” in reports of clinical trials. Relative Risk (RR), Relative Risk Reduction (RRR), and Absolute Risk Reduction (ARR) were explained in some detail (and readers may find it helpful to review the earlier material in Part 6).
The notion of “risk” in statistical parlance refers to the probability of an event occurring that is either beneficial or harmful depending on what sort of outcomes are being measured in the research study. For calculating NNT in clinical trials, the most important metric is the ARR — or, its counterpart, Absolute Risk Increase, ARI — both of which might be most simply described as the Absolute Risk Difference (ARD) between 2 groups that are being compared: an experimental (eg, therapy or intervention) group and a control (eg, placebo, comparator, or no treatment) group [Citrome 2007; Sackett et al. 1997].
The ARD is simply the difference between the probability or rate of an event of interest occurring in the control group (Control Event Rate, or CER) and the rate in the experimental group (Experimental Event Rate, or EER). Thus, ARD = CER-EER [Citrome 2007, 2008; McAlister 2008; Sackett et al. 1997]. The ARD can reflect an increase or decrease in event rates, depending on study design and what is being measured.
Since the CER and EER are expressed as probabilities — ie, the proportion, or the percentage converted to decimal form — values for each can range from 0.0 to 1.0. And, the difference between the two, or ARD, can range in value from -1.0 to +1.0, with 0.0 indicating no difference between groups. The clinical meaning of positive vs negative ARDs must be interpreted within the framework of the individual study design (discussed below).
The NNT is then simply calculated as the inverse (reciprocal) of the Absolute Risk Difference, or NNT = 1/ARD. Mathematically, this indicates how many persons must receive the experimental treatment rather than the control/comparator treatment for 1 of them to realize the effect, whether it is beneficial or harmful (ie, NNH).
Example: Suppose hypothetically that in a large study (total n=5000) the Relative Risk of hip fractures due to falls in elderly patients with knee osteoarthritis was 60% (RR=0.60) in the group prescribed a new analgesic compared with those prescribed an older drug during a one-year period. The Relative Risk Reduction would be 40% (RRR = 1–0.60 = 0.40). This sounds clinically important and worthwhile since, compared with the older drug, the newer analgesic reduced fracture occurrences by 40%.
However, in absolute terms, suppose that during the year of study 20/2500 or 0.80% of patients in the control group taking the older analgesic experienced falls/fractures (CER), compared with 12/2500 or 0.48% taking the new drug (EER). This is still a 40% reduction in risk, and possibly statistically significant, but the absolute risk difference (ARD, or CER-EER) is only 0.32%, and the NNT=313 (ie, 1/0.0032=312.5).
That is, for every 313 elderly patients with osteoarthritis prescribed the new analgesic rather than the older drug during a one-year period, 1 additional patient might be spared a hip fracture due to a fall. This effect also may be expressed as a frequency: 1 out of 313 elderly patients taking the new analgesic rather than the older one for a year might avoid hip fracture. Either way, this benefit of the new analgesic would not seem very impressive, unless it also has much lower cost and/or offers important other advantages over the older drug.
The smaller the NNT the greater the effect size, or fewer patients needed to be treated for benefit or harm. Depending on study design and context, calculated NNTs can be positive or negative in value, although they are always presented as positive whole numbers. Calculations resulting in numbers with fractions (eg, 312.5 in the above example) are rounded to a whole number, since fractions of patients are meaningless in this context.
To be conservative, most authors recommend that the NNT should always be rounded upward; eg, NNT=3.6 would become 4 [Citrome 2008]. In some cases, however, small fractions might be more appropriately rounded downward; for example, if 3.1 is rounded up to 4, instead of down to 3, it might significantly understate the NNT effect size (ie, the larger NNT artificially portrays a smaller effect). Research report authors should indicate how rounding was done since it can make a difference when NNT or NNH values are small in size.
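The arithmetic described so far can be sketched in a few lines of Python. This is only an illustration of the formulas in the text (ARD = CER-EER, NNT = 1/ARD) applied to the hypothetical hip-fracture example; the function names are invented for this sketch and are not taken from any of the cited sources.

```python
import math

def absolute_risk_difference(cer: float, eer: float) -> float:
    """ARD = CER - EER, following the convention used in this article."""
    return cer - eer

def nnt(cer: float, eer: float) -> float:
    """Unrounded NNT (or NNH): reciprocal of the absolute value of the ARD."""
    ard = absolute_risk_difference(cer, eer)
    if ard == 0:
        return math.inf  # identical event rates: nobody is helped or harmed
    return 1.0 / abs(ard)

def round_up(value: float) -> int:
    """Conservative rounding of a finite NNT up to the next whole patient."""
    return math.ceil(value - 1e-9)  # tiny tolerance guards against float noise

# Hypothetical hip-fracture example from the text
cer = 20 / 2500  # older analgesic (control group): 0.80%
eer = 12 / 2500  # new analgesic (experimental group): 0.48%

rr = eer / cer   # Relative Risk
rrr = 1 - rr     # Relative Risk Reduction
raw_nnt = nnt(cer, eer)

print(f"RR={rr:.2f}  RRR={rrr:.2f}  ARD={absolute_risk_difference(cer, eer):.4f}")
print(f"NNT unrounded={raw_nnt:.1f}  rounded up={round_up(raw_nnt)}")

# Rounding choices matter most when the NNT is small
for value in (3.6, 3.1):
    print(f"{value}: rounded up={round_up(value)}, nearest={round(value)}")
```

The same 40% relative reduction therefore coexists with an NNT of 313, and the last loop shows why a report should state which rounding rule was applied.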
Unfortunately, when learning about research statistics, it seems almost axiomatic that no estimate of treatment effect size is ever so simple that it cannot be complicated beyond the comprehension of average readers of the medical literature. So it is with NNT; essentially a simple concept, but with important requirements, limitations, and cautions regarding its calculation and interpretation. When wrongly construed — whether intentionally or not — various factors can bias the presentation of NNT, or NNH, in a report to make a pain treatment unduly appear either more or much less favorable.
A. Study Design & Context Are Critical
NNTs are only useful when the evidence on which they are based fulfills criteria of good quality in terms of methodology, sample size, accuracy, reliability, and validity [Moore 2009]. Additionally, the study’s design and purpose must be taken into account.
For example, “risk” is usually thought of as being negative, but statistically it can represent events that are either favorable (eg, pain relief) or unfavorable (eg, adverse effects) depending on the study design. As noted above, the Absolute Risk Difference (ARD), can reflect either an increase or decrease in “risks” per se, which may be desirable or undesirable depending on the context of what is being measured.
Example: If the outcome of interest is pain relief with a new analgesic, one might expect more event occurrences in the treatment than in the control (eg, placebo) group — the corresponding ARD (CER-EER) would be negative, even though the outcome is favorable. In the case of a treatment that reduces an undesirable outcome (eg, occurrence of a side effect), there would be more events in the control group and the ARD would be both positive and favorable.
Just to add another layer of potential confusion, some biostatisticians suggest calculating the ARD as EER–CER, rather than CER–EER as shown above. In this case, the absolute value remains the same but the sign, plus or minus, is reversed. In all cases, when converting the ARD to an NNT (1/ARD) the plus or minus sign is usually discarded, but it is still important to keep in mind whether the NNT represents more events in the control or experimental group.
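To make the sign conventions concrete, the following sketch computes the ARD both ways for two scenarios. The event rates are invented purely for illustration, and the helper name `describe_ard` is hypothetical.

```python
def describe_ard(cer: float, eer: float) -> dict:
    """Summarize an Absolute Risk Difference under both sign conventions."""
    ard = cer - eer      # convention used in this article (CER - EER)
    ard_alt = eer - cer  # alternative convention (EER - CER): sign is flipped
    if ard == 0:
        more_events_in = "neither"
        nnt = float("inf")
    else:
        more_events_in = "control" if ard > 0 else "experimental"
        nnt = 1.0 / abs(ard)  # the sign is discarded for the NNT itself
    return {"ard": ard, "ard_alt": ard_alt,
            "more_events_in": more_events_in, "nnt": nnt}

# Invented event rates, for illustration only
scenarios = {
    "pain relief (desirable event)": (0.30, 0.50),    # (CER, EER)
    "side effect (undesirable event)": (0.20, 0.10),  # (CER, EER)
}

for label, (cer, eer) in scenarios.items():
    result = describe_ard(cer, eer)
    print(f"{label}: CER-EER={result['ard']:+.2f}, "
          f"EER-CER={result['ard_alt']:+.2f}, "
          f"more events in {result['more_events_in']} group, "
          f"NNT={result['nnt']:.1f}")
```

In both scenarios the treatment is favorable, yet the ARD has opposite signs; only the note about which group had more events tells the reader how to interpret the resulting number.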
A single study or trial can have a number of NNTs representing different therapeutic endpoints of interest and representing benefits or harms. Furthermore, when comparisons of NNTs are made across studies, as in meta-analyses, it is essential that there is clinical homogeneity — ie, the data have been derived for the same treatments and outcomes, assessed against similar comparators, in similar patient populations at the same stage of disease, and followed for the same duration of time [Anon 2003; McAlister 2008]. Otherwise, conclusions about the similarity of NNTs, or NNHs, can be misleading.
B. NNTs Require Dichotomous Data
An NNT (or NNH) can only be calculated from dichotomous or binary data; that is, those in which subjects are categorized as either achieving a specified endpoint or not doing so [Moore 2009]. For example, the proportion (eg, percentage) of subjects achieving ≥50% pain relief from baseline as a result of a therapy versus those who do not reach this endpoint. Or, the percentage of subjects who experience an adverse effect of importance versus those who do not.
Continuous or scaled data representing a range of individual responses (such as differences between groups in overall changes in average pain scores on a 100-point visual analog scale) cannot be portrayed as NNTs unless the data are transformed or categorized into yes/no types of outcomes. For example, within each group, changes in pain scores below a certain point could be categorized as a favorable response and the rest as unfavorable.
The need for dichotomous data is sometimes neglected by researchers and readers of the literature. It can be very tempting to just subtract a mean percentage outcome on an endpoint measure in the treatment group from that in the control group and treat it like an ARD (Absolute Risk Difference) to calculate an NNT — which would be incorrect.
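A small sketch may help show the difference between dichotomizing the data and misusing a difference in means. The per-subject values below are invented for illustration, and the ≥50% responder threshold is simply the example threshold mentioned above.

```python
import math

def responder_rate(percent_relief, threshold=50.0):
    """Proportion of subjects whose pain relief meets or exceeds the threshold."""
    responders = sum(1 for value in percent_relief if value >= threshold)
    return responders / len(percent_relief)

def mean(values):
    return sum(values) / len(values)

# Invented per-subject percent pain relief from baseline (illustration only)
treatment = [80, 65, 10, 55, 90, 20, 70, 15, 60, 5]
placebo = [10, 55, 5, 20, 60, 15, 0, 25, 50, 10]

eer = responder_rate(treatment)  # proportion of responders on treatment
cer = responder_rate(placebo)    # proportion of responders on placebo

# Correct: NNT from the difference in responder proportions
nnt_dichotomous = 1.0 / abs(cer - eer)

# Incorrect: treating the difference in mean percent relief as if it were an ARD
mean_difference = (mean(treatment) - mean(placebo)) / 100.0
nnt_from_means = 1.0 / abs(mean_difference)

print(f"Responders: treatment={eer:.2f}, placebo={cer:.2f}")
print(f"NNT from responder rates: {nnt_dichotomous:.2f} "
      f"(rounded up: {math.ceil(nnt_dichotomous)})")
print(f"'NNT' wrongly derived from mean difference: {nnt_from_means:.2f}")
```

With these made-up numbers the two approaches give different answers, and only the first one counts patients who actually crossed the response threshold.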
C. Categories Can Be Misleading
It must be considered that the threshold boundaries used to divide data into binary categories for calculating an NNT, or NNH, may be inclusive of a range of patients and not entirely depict the complexities of clinical reality [Moore et al. 2010]. For example, with any pain therapy, there is a tendency for many patients to have either good pain relief or poor pain relief, rather than moderate or average relief; across the spectrum, the trend encompassing all patients is “U-shaped” [see Figure].
A typical NNT may represent the proportion of subjects achieving 50% pain relief due to a therapy, but it does not depict that many patients might have actually fared much better, say 75% or 90% relief; or, at the other extreme, much worse (eg, only 10% or 25% pain relief). Or, an NNH may indicate the proportion of subjects who experienced an adverse effect compared with those who did not, but it does not suggest if it was a singular event for most patients or a more frequent problem in many of them.
D. Timeframes Matter
The interpretation of NNTs must take into account the timeframes during which events were observed [DiCenso 2001, McAlister 2008]. Most studies in the pain field are rather short term, involving weeks rather than months or years, even though in clinical practice it often may take some time for beneficial or harmful effects to fully emerge.
It is essential that statements describing NNTs specify the time periods of observation. An NNT of 3 may seem favorable, but does this require 4 weeks of treatment, 6 months, a year, or longer? And, there could be important clinical implications for a treatment requiring only 4 weeks versus several months for 1 in 3 patients to benefit.
Along with that, NNTs across similar studies of the same treatment, but with different timeframes, cannot be accurately compared with each other [Moore 2009]. For example, in two identically-designed studies, with one lasting 4 weeks and the other 6 months, NNTs of 3 in each study favoring some treatment may not be truly equivalent, largely because each NNT was observed during a different timeframe.
Furthermore, an NNT cannot usually be simply multiplied or divided by some number to denote a different time period. For example, an NNT=12 for treatment during 4 years is not the same as an NNT=48 for 1 year (12×4), unless it is known that the rates of events (CER and EER) are consistently and equally distributed over the total 4-year timeframe. In most cases, events may occur either early or late during an observation period; so, if it is desirable to estimate NNTs at various time points, a time-to-event analysis (eg, Cox proportional hazards or Kaplan-Meier survival analysis) must be applied to the data.
In that regard, NNTs for benefit or harm can be calculated from epidemiological data collected over time and reflected in incidence or prevalence statistics [discussed in Part 14 of this series here]. It is important, however, that time periods of data collection are taken into account, with an understanding that rates of events may vary at different time points and cumulatively over time. This complicates data interpretation and can be a source of bias in how NNTs are presented in research reports that ignore time-based effects.
Another important caveat is that epidemiological data reported as patient-years must be used cautiously for calculating NNTs, since NNTs portray effects in terms of individual patients (not patient-years) during the time period of observation [Suissa 2009]. For example, data expressed as 200 patient-years could reflect 200 patients treated for 1 year, 100 patients treated during 2 years, 50 patients treated for 4 years, and so on.
Example: Part 14 in this series described a study assessing benefits of a zoster-virus vaccine for decreasing the incidence of painful herpes zoster (shingles) and postherpetic neuralgia (PHN) in older adults [Oxman et al. 2005]. The researchers reported that, overall, during their 5-year study the vaccine significantly (p<0.001) reduced the shingles incidence rate from 11.12 per 1000 person-years in the placebo group to 5.42 per 1000 person-years in the vaccine group. In the report, they did not present NNTs for the data.
A later article describing this study noted that the NNT was 175; ie, “1 case of shingles is avoided for every 175 people vaccinated” [Thakur and Philip 2012]. This was probably calculated from the overall incidence rates per 1000 person-years: ARD = 0.01112 (placebo) – 0.00542 (vaccine) = 0.0057; NNT=1/0.0057≈175. However, this was faulty in that the NNT calculated this way reflects patient-years rather than individual patients, and the authors did not qualify their NNT by specifying the total observation period, or 5 years in this case.
Other authors described the NNT to prevent 1 case of shingles over 3 years as 58 [Fashner and Bell 2011]. They did not indicate how this was calculated; however, if they erroneously assumed that the NNT of 175 represented a single year, then spreading that out instead over each of 3 years (175/3) would be an NNT≈58.
A more correct approach, since it cannot be assumed that event (incidence) rates were constant throughout the course of the study, is to consider time-to-event data for estimating NNTs for the total period and for time points in between [Suissa 2009]. To aid in this, the original report included Kaplan-Meier survival curves spanning the 5 years of the study [see Figure, from Oxman et al. 2005], which illustrate how cumulative incidence proportions in placebo and vaccine groups gradually diverged over time.
From the curves it can be estimated that cumulative incidence rates at 3 years were about 3.2% in the placebo group and 1.5% for vaccine, or an NNT≈59 (1/[0.032–0.015]; rounded up). This is close to the number from Fashner and Bell noted above, but more validly calculated.
Similarly, at the end of the study the cumulative incidences were about 5.3% (placebo) and 2.5% (vaccine), or an NNT≈36 (1/[0.053–0.025]). That is, during a 5-year period 1 of 36 persons treated with the zoster vaccine rather than placebo could be spared from developing shingles. This is remarkably different from and more favorable than the NNT of 175 suggested in some reports [eg, Thakur and Philip 2012].
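The three competing figures in this example can be reproduced with a short sketch. The inputs are the numbers quoted above (the cumulative incidences are visual estimates read from the published curves, so the results are approximate); the function name is invented for illustration.

```python
import math

def nnt_from_rates(control_rate: float, experimental_rate: float) -> float:
    """Unrounded NNT as the reciprocal of the absolute risk difference."""
    return 1.0 / abs(control_rate - experimental_rate)

# Incidence per person-year: this yields "person-years needed to treat",
# not the number of individual patients treated over the study period
per_person_year = nnt_from_rates(11.12 / 1000, 5.42 / 1000)

# Cumulative incidence proportions estimated from the Kaplan-Meier curves
nnt_3_years = nnt_from_rates(0.032, 0.015)
nnt_5_years = nnt_from_rates(0.053, 0.025)

print(f"From person-year rates: {per_person_year:.1f}")
print(f"Naive 3-year figure (175/3): {175 / 3:.1f}")
print(f"From cumulative incidence at 3 years: {nnt_3_years:.1f} "
      f"(rounded up: {math.ceil(nnt_3_years)})")
print(f"From cumulative incidence at 5 years: {nnt_5_years:.1f} "
      f"(rounded up: {math.ceil(nnt_5_years)})")
```

Keeping the unrounded values visible makes it easier to see that the person-year figure and the cumulative-incidence figures answer different questions.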
At the very least, the above example shows how NNT calculations can differ depending on the subtle ways that data are presented and interpreted. Unfortunately, interpretations in the pain literature sometimes are incorrect and critical readers need to understand enough about NNTs to check them for accuracy and validity.
Deriving NNTs from Other Effect-Size Estimates
As discussed above, an NNT derived from an Absolute Risk Difference (ARD) is a clinically intuitive measure of treatment effect, provided certain constraints are taken into account. One reason NNTs may not be more frequently presented in pain research reports is because other estimates of effect size — Odds Ratio (OR), Risk Ratio (RR), Relative Risk Reduction (RRR), or Standardized Mean Difference (SMD) — that may seem impressive could turn out to be less clinically remarkable if converted to NNT [Kraemer and Kupfer 2006; Moore 2009]. In the hypothetical example above regarding a new analgesic for osteoarthritis in elderly patients, a seemingly important 40% RRR in fractures due to falls turned out to have a lackluster NNT of 313.
Regrettably, authors reporting other effect-size statistics often do not provide all of the necessary event-rate data to derive an Absolute Risk Difference (ARD) between groups for directly calculating the NNT (or NNH). So, what can be done? Fortunately, conversion formulas have been devised that allow transforming other effect-size estimates into NNTs, as described below.
[Caveat: Readers with “statistiphobia” to some extent may wish to skip the remainder of this section. The important point is that NNTs can be calculated from other measures of effect — whether OR, RR, RRR, or SMD — but only if report authors are willing and able to do so for helping readers to better understand the clinical impact of their study outcomes. Otherwise, readers are left to do the conversions on their own, using the approaches below.]
An important requirement of the conversion formulas below is having a known or an estimated value for event/risk rates in the control or comparator group. The focus in the formulas on rates in the group not receiving the new/experimental treatment of interest is because that group is used as the reference for comparison purposes when calculating and defining the NNT (or NNH).
If data suggesting the Control Event Rate (CER) are presented somewhere in the research report, then these numbers can be directly plugged into the formulas below. If not, it is necessary to estimate a parameter called the “Patient Expected Event Rate,” or PEER. This also is sometimes called an “Assumed Control Risk,” or ACR (eg, in the Cochrane Handbook, Higgins and Green 2011).
Numerically, PEER is the indirectly estimated risk of an event in the control group of a particular study, or it can be the background prevalence in an untreated population. Essentially, it represents the assumed event rate for an outcome measure in patients who do not receive the experimental therapy or intervention of interest; ie, those who either receive placebo, a comparator or conventional treatment, or no treatment.
How does one go about determining the PEER? This can require investigation, since some authors recommend finding similar trials in the literature that may or may not test the exact same experimental treatment, but do involve the same clinical health condition and provide outcome data for a control group of patients (eg, those receiving placebo or no treatment). Review articles or product information sometimes can be helpful in this regard or, depending on the circumstances, there may be baseline population prevalence data available for estimating event rates in persons receiving usual care or no treatment for a particular condition.
Example: In a large, cross-sectional, data-mining study spanning one year, researchers used prescription of medications for erectile dysfunction and/or testosterone replacement as a surrogate measure of sexual dysfunction/hypogonadism in men receiving long-term opioid therapy for chronic back pain [Deyo et al. 2013; also discussed in a Pain-Topics UPDATE here]. Patients receiving higher daily opioid doses (≥120mg morphine-equivalents) exhibited greater evidence of sexual dysfunction than those who did not receive opioid analgesics; Odds Ratio (OR) = 1.58.
The report authors do not provide specific event-rate data in patients not receiving opioids (comparator group); however, according to review articles, the prevalence of symptomatic hypogonadism in the general male population is typically about 5% to 6%. Using these data as the PEER, along with the OR in the study, the NNH can be calculated (see formula below) to yield values of 32 to 37. That is, for every 32 to 37 men treated with high-dose opioid analgesics during one year, rather than not being treated with those agents, 1 additional patient than normally expected may experience symptoms of sexual dysfunction/hypogonadism.
It is not uncommon for research reports in the pain field to provide RRs (Risk Ratios, or Relative Risks), along with their Confidence Intervals and P-values. Converting RR or RRR to NNT can be done with the aid of simple formulas [Chatellier et al. 1996; DiCenso 2001; Higgins and Green 2011; McAlister et al. 2000; Sackett et al. 1997].
When the CER is known or a PEER can be estimated, the following formula is used with an RR:
NNT from RR = 1/([CER or PEER] x [1-RR])
If the Relative Risk Reduction (RRR) is given instead of RR, and since the RRR is equal to [1-RR] in the above formula, NNT also can be calculated as follows:
NNT from RRR = 1/([CER or PEER] x RRR)
Whether the NNT pertains to a benefit or harm (ie, NNH) depends on the research design. These calculations are relatively easy to perform; however, there also is a convenient Microsoft Excel worksheet in our Pain-Topics PTCalcs program for doing this [NNT-from-RR available here]. Using the worksheet makes it easy to test different CER or PEER values to see how NNT results might vary. Note, however, that if the RR or RRR is not statistically significant (ie, if the 95% CI for the RR crosses 1.0, or p>0.05) any NNT also will be non-significant.
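For readers who prefer code to a spreadsheet, the two formulas can be sketched as follows. This is not the PTCalcs worksheet, only a minimal illustration with invented function names, applied to the hypothetical hip-fracture example (CER=0.008, RR=0.60) and to a few assumed control rates.

```python
import math

def nnt_from_rr(control_rate: float, rr: float) -> float:
    """NNT = 1 / (CER or PEER x [1 - RR]); sign discarded, as in the text."""
    rrr = 1.0 - rr
    if control_rate == 0 or rrr == 0:
        return math.inf  # no baseline risk or no relative effect
    return 1.0 / abs(control_rate * rrr)

def nnt_from_rrr(control_rate: float, rrr: float) -> float:
    """NNT = 1 / (CER or PEER x RRR)."""
    if control_rate == 0 or rrr == 0:
        return math.inf
    return 1.0 / abs(control_rate * rrr)

# Hypothetical hip-fracture example from earlier in this article
print(f"CER=0.008, RR=0.60  -> NNT={nnt_from_rr(0.008, 0.60):.1f}")
print(f"CER=0.008, RRR=0.40 -> NNT={nnt_from_rrr(0.008, 0.40):.1f}")

# Testing a range of assumed control rates (PEERs) with the same RR
for peer in (0.005, 0.008, 0.02, 0.10):
    print(f"PEER={peer:.3f} -> NNT={nnt_from_rr(peer, 0.60):.1f}")
```

The loop illustrates the main practical point: the same Relative Risk produces very different NNTs depending on the baseline risk that is assumed.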
Odds Ratios (ORs) were discussed in Part 7 of this series [here], and these estimates of effect can be confusing or difficult to interpret. Still, in many pain research reports only the ORs for outcomes of interest are indicated and it would be more clinically useful if these also could be converted to NNTs. There are formulas that readers can use to convert an OR to NNT (or NNH), at least approximately [Anon 2012; Higgins and Green 2011; Kraemer and Kupfer 2006; McAlister et al. 2000; Sackett et al. 1996, 1997].
First, the best possible (ie, lowest) NNT may be calculated from an Odds Ratio alone by the following equation:
Minimal NNT from OR = (√OR + 1) / (√OR – 1)
Using this formula, although the actual NNT may be larger, it will not be smaller than this value.
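As a rough sketch, the lower bound can be computed as shown here. Taking the absolute value so that an OR below 1.0 also yields a positive number is an assumption of this sketch, in keeping with the practice of discarding the sign when reporting an NNT; the function name is invented.

```python
import math

def minimal_nnt_from_or(odds_ratio: float) -> float:
    """Smallest possible NNT for a given Odds Ratio: (sqrt(OR)+1)/(sqrt(OR)-1)."""
    if odds_ratio <= 0:
        raise ValueError("An Odds Ratio must be greater than zero")
    root = math.sqrt(odds_ratio)
    if root == 1.0:
        return math.inf  # OR of 1.0 means no difference between groups
    return abs((root + 1.0) / (root - 1.0))

for odds_ratio in (1.58, 2.0, 0.5, 4.0):
    print(f"OR={odds_ratio}: minimal NNT={minimal_nnt_from_or(odds_ratio):.2f}")
```

Note that an OR and its reciprocal (eg, 2.0 and 0.5) give the same bound, since they describe the same strength of association in opposite directions.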
Second, if the CER is known or a PEER can be estimated, a more accurate NNT can be calculated as follows:
NNT from OR = (1 – ([CER or PEER] x [1–OR])) / ([CER or PEER] x [1–OR] x [1 – (CER or PEER)])
In all cases, if the OR pertains to a harm of some sort (eg, chance of an adverse effect), then the same formulas are used for calculating NNH. The calculations are rather tedious, so there is a Microsoft Excel worksheet in the Pain-Topics PTCalcs program for doing these [NNT-from-OR available here]. In this worksheet, values for OR and CER or PEER can be simply inserted to derive results, and a range of values can be easily tested.
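The conversion can also be sketched in code. The example below is not the PTCalcs worksheet; it applies the formula to the opioid example described earlier (OR=1.58, with an assumed PEER of 5% to 6%) and cross-checks the result by first deriving the experimental event rate implied by the OR. Function names are invented for illustration.

```python
import math

def nnt_from_or(control_rate: float, odds_ratio: float) -> float:
    """NNT (or NNH) from an Odds Ratio and a CER or PEER; sign discarded."""
    numerator = 1.0 - control_rate * (1.0 - odds_ratio)
    denominator = control_rate * (1.0 - odds_ratio) * (1.0 - control_rate)
    if denominator == 0:
        return math.inf  # OR of 1.0, or a control rate of 0 or 1
    return abs(numerator / denominator)

def eer_from_or(control_rate: float, odds_ratio: float) -> float:
    """Experimental event rate implied by the OR and the control rate."""
    experimental_odds = odds_ratio * control_rate / (1.0 - control_rate)
    return experimental_odds / (1.0 + experimental_odds)

odds_ratio = 1.58  # high-dose opioids vs no opioids, from the example above
for peer in (0.05, 0.06):
    nnh = nnt_from_or(peer, odds_ratio)
    # Cross-check: 1/ARD using the event rate implied by the OR
    check = 1.0 / abs(eer_from_or(peer, odds_ratio) - peer)
    print(f"PEER={peer:.2f}: NNH={nnh:.2f} (cross-check {check:.2f}), "
          f"nearest whole number {round(nnh)}")
```

Both routes give the same answer, which suggests the formula is simply the reciprocal of the ARD written in terms of the OR and the control rate.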
The Table at right (adapted from Anon 2012; McAlister et al. 2000; Sackett et al. 1996) displays NNTs (in lighter gray boxes) derived from a selection of ORs, and CERs or PEERs. As usual, the NNTs might represent NNHs, depending on study design and context. Furthermore, an OR >1.0 may represent either benefit or harm, and similarly for values <1.0, depending on how outcome variables are being measured and presented in the study.
It should be noted, as pointed out in Part 7 of this series, when risks of events are relatively small — that is, few events occurring in either treatment or control groups relative to the total sizes of the populations or patient groups being studied — Odds Ratios and Risk Ratios become approximately equal in size. This may be especially evident in large-scale trials using epidemiological data or medical records databases and, in such cases, the NNT for the OR might be derived most simply by the formula above for converting RR to NNT (insert the OR as if it were an RR).
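The convergence of OR and RR at low event rates can be checked numerically. The first pair of rates below comes from the hypothetical hip-fracture example; the second pair is invented to show a common event with the same Relative Risk.

```python
def risk_ratio(cer: float, eer: float) -> float:
    """RR = EER / CER."""
    return eer / cer

def odds_ratio(cer: float, eer: float) -> float:
    """OR = odds of the event in the experimental group / odds in the control group."""
    return (eer / (1.0 - eer)) / (cer / (1.0 - cer))

examples = {
    "rare events (hip-fracture example)": (0.008, 0.0048),  # (CER, EER)
    "common events (invented rates)": (0.40, 0.24),         # (CER, EER)
}

for label, (cer, eer) in examples.items():
    rr = risk_ratio(cer, eer)
    odds = odds_ratio(cer, eer)
    true_nnt = 1.0 / abs(cer - eer)
    # Shortcut: insert the OR into the RR formula as if it were an RR
    shortcut_nnt = 1.0 / abs(cer * (1.0 - odds))
    print(f"{label}: RR={rr:.3f}, OR={odds:.3f}, "
          f"NNT={true_nnt:.1f}, shortcut NNT={shortcut_nnt:.1f}")
```

With rare events the shortcut lands very close to the true NNT, whereas with common events the OR drifts away from the RR and the shortcut becomes noticeably inaccurate.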
The Standardized Mean Difference (SMD) is an effect size measurement previously discussed in Part 8 of this series [here]. It also is known as “Cohen’s d” and is very useful for gauging the clinical importance of outcomes in pain research reports.
However, SMDs are derived from continuous data, so they do not fit the requirement that NNTs must be calculated from dichotomous data. Transforming SMDs into NNTs is a somewhat complex and indirect process, which has been described in the Cochrane Handbook for Systematic Reviews of Interventions [Higgins and Green 2011, here]:
1. First, the SMD is converted to the natural logarithm of the Odds Ratio: lnOR = (π/√3) x SMD (≈1.81 x SMD)
2. This lnOR is then converted to the OR base value: OR = e^lnOR (=2.718^lnOR)
3. Then, the OR is used in the formula above for converting OR to NNT. To represent CER or PEER, a value is used that signifies the proportion of subjects in the control/comparator group who have improved by some amount from baseline in the continuous outcome variable — ie, control responder Proportion Improved.
These calculations can be difficult; so, again, a Pain-Topics PTCalcs Excel worksheet is available to make the process easier, using the SMD provided in a research report and an estimated value for the Proportion Improved [NNT-from-SMD available here].
The Table at right [adapted from Higgins and Green 2011 (here)] displays NNTs derived from select SMDs, assuming different Proportions Improved in the control or comparator group. Resulting NNTs must be regarded as approximations, since there is an assumption that the underlying continuous variable being assessed has a logistic distribution with equal standard deviations in the control and treatment groups. As with the PEER described above, Proportion Improved needs to be guesstimated from study data or other sources when report authors do not provide sufficient information for a more accurate determination.
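The three-step conversion can be sketched as follows. This is not the PTCalcs worksheet, only an illustration with invented function names; the Proportion Improved of 0.40 and the SMD values of 0.2, 0.5, and 0.8 are used here simply as example inputs, and the results carry the same distributional assumptions noted above.

```python
import math

def or_from_smd(smd: float) -> float:
    """Steps 1 and 2: lnOR = (pi / sqrt(3)) x SMD, then OR = e^lnOR."""
    ln_or = (math.pi / math.sqrt(3.0)) * smd
    return math.exp(ln_or)

def nnt_from_or(control_rate: float, odds_ratio: float) -> float:
    """Step 3: convert the OR to an NNT using the control-group proportion."""
    numerator = 1.0 - control_rate * (1.0 - odds_ratio)
    denominator = control_rate * (1.0 - odds_ratio) * (1.0 - control_rate)
    if denominator == 0:
        return math.inf
    return abs(numerator / denominator)

def nnt_from_smd(smd: float, proportion_improved: float) -> float:
    """Approximate NNT from an SMD and the control-group Proportion Improved."""
    return nnt_from_or(proportion_improved, or_from_smd(smd))

proportion_improved = 0.40  # example value; should be varied in practice
for smd in (0.2, 0.5, 0.8):
    value = nnt_from_smd(smd, proportion_improved)
    print(f"SMD={smd}: OR={or_from_smd(smd):.2f}, "
          f"NNT={value:.2f} (rounded up: {math.ceil(value)})")

# Sensitivity to the assumed Proportion Improved, for a medium effect size
for proportion in (0.2, 0.3, 0.4, 0.5):
    print(f"Proportion Improved={proportion}: "
          f"NNT={nnt_from_smd(0.5, proportion):.2f}")
```

Running the second loop is a quick way to see how much the estimate depends on the guessed Proportion Improved.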
Assessing the Significance of NNTs
As with all other measures of effect, both the statistical significance and clinical significance of NNTs must be considered. Both Confidence Intervals (CIs) and P-values can be calculated for NNTs and should be indicated by report authors as measures of statistical significance and strength of evidence [DiCenso 2001]. Prior UPDATES in this series discussed P-values [here] and Confidence Intervals [here].
If the CI and/or P-value are not provided for an NNT, then one needs to look at the data that went into calculating the NNT (or NNH if that is the focus). As noted above, if any of those measures — eg, RR, RRR, EER, CER, or ARD — were not statistically significant, then the NNT would not be significant either and probably should not have been featured in the study report.
As with CIs for other statistics, the narrower the range of the confidence limits the more precision and strength of evidence can be assumed in the NNT. There are complex formulas for calculating confidence limits for an NNT, but in many cases, some authors suggest simply inverting the confidence limits (if known) of the Absolute Risk Difference, or ARD [Altman 1998; Bender 2000, Cates 2005; Citrome 2007].
Example: Given an ARD=0.40 and its 95% CI = 0.30 to 0.50, then the NNT=1/0.40 and the confidence limits for the NNT = 1/0.50 to 1/0.30; that is, NNT=3; 95% CI = 2 to 4 (numbers rounded up).
If the CI for the ARD is not statistically significant to begin with — ie, the range includes both positive and negative numbers and crosses the null value of 0.0 — then the CI for the NNT also will be nonsignificant and cannot be accurately calculated with this method.
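The inversion approach is simple enough to sketch directly. In this illustration (function names invented), the interval is returned as None when the ARD limits include zero, reflecting the point that this method cannot give a usable interval in that situation; all values are rounded up, as in the example above.

```python
import math

def round_up(value: float) -> int:
    """Round a finite value up to a whole number, tolerating float noise."""
    return math.ceil(value - 1e-9)

def nnt_with_ci(ard: float, ard_low: float, ard_high: float):
    """Return (NNT, (lower, upper)) by inverting the ARD and its confidence limits.

    The interval is None when the ARD limits include 0.0, because simple
    inversion does not yield a meaningful interval in that case.
    """
    if ard == 0:
        raise ValueError("An ARD of 0 corresponds to an infinite NNT")
    point = round_up(1.0 / abs(ard))
    if ard_low <= 0.0 <= ard_high:
        return point, None
    # The larger ARD limit gives the smaller NNT limit, so sort after inverting
    bounds = sorted(1.0 / abs(limit) for limit in (ard_low, ard_high))
    return point, (round_up(bounds[0]), round_up(bounds[1]))

print(nnt_with_ci(0.40, 0.30, 0.50))   # example from the text
print(nnt_with_ci(0.05, -0.02, 0.12))  # invented non-significant ARD
```

The second call uses made-up limits that span zero, so only the point estimate is returned.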
Having adequate study sample sizes is important for deriving NNTs that have favorably narrow confidence limits. An important reason is that random effects tend to be irregular in small, short-term studies, but predictability increases as study size and duration increase. This reflects the “Law of Large Numbers” and is an important phenomenon in all research designs [Hazewinkel 2001]. For example, if an NNT is 10 for a favorable treatment effect we might expect to have 1,000 successes if 10,000 patients are treated; however, if only a dozen patients are treated we may observe 0, 1, 3, or some other number of favorable outcomes due to random variation or chance.
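A simple simulation can illustrate this point. As a deliberate simplification, the sketch below treats each patient as having an independent 1-in-10 chance of being a treatment success (NNT=10); the seed value and function name are arbitrary choices made for this illustration.

```python
import random

def simulate_successes(n_patients: int, nnt: float, rng: random.Random) -> int:
    """Count successes when each patient has an independent 1/NNT chance."""
    probability = 1.0 / nnt
    return sum(1 for _ in range(n_patients) if rng.random() < probability)

rng = random.Random(2024)  # fixed seed so that the run is repeatable

# Ten small "studies" of a dozen patients each
small_studies = [simulate_successes(12, 10, rng) for _ in range(10)]

# One large "study" of 10,000 patients
large_study = simulate_successes(10_000, 10, rng)

print("Successes per 12 patients:", small_studies)
print("Successes per 10,000 patients:", large_study)
print(f"Observed rate in the large study: {large_study / 10_000:.3f}")
```

The small studies scatter widely around the expected 1.2 successes, while the large study should land close to the expected 1,000.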
A highly significant NNT of 1.0 can occur only if the event rate in one group is 100% and it is zero (0%) in the other group, which is very unlikely to happen in any valid research study. If both groups approach having exactly the same event rate, the ARD becomes infinitesimally small and the NNT calculation approaches 1/±0.0; the resulting NNT is an infinitely large and non-significant number, denoted by the infinity sign, or ∞.
The relationships have been graphically portrayed by Citrome [2008, Figure at right]. In general, it is proposed that single-digit NNT values represent outcomes that could be meaningful in everyday clinical practice and somewhat compelling. Larger values, 50 or 100 or more, usually suggest differences in outcome and subsequent NNTs, or NNHs, that are clinically unimportant.
However, the numbers must be considered within context of the study. Some interventions with large NNTs still may be important if they prevent severe adverse events (eg, death, stroke, etc.). Conversely, a small, significant NNT for a mild benefit (eg, 10% pain relief) or to prevent only a nuisance adverse effect (eg, dry mouth) may not be of clinical importance to most patients.
In terms of clinically significant effect sizes, Standardized Mean Differences (SMD, or Cohen’s d) of 0.2, 0.5, and 0.8 are generally considered small, medium, and large effects, respectively. Citrome [2008, Figure at right] and others [Kraemer and Kupfer 2006] have proposed that the comparable NNTs are roughly 9, 4, and 3, respectively.
However, those NNT values do not fully account for potential variability in NNTs depending on event rates in control/comparator groups. The Table at right uses the approach suggested above from the Cochrane Handbook [Higgins and Green 2011] for calculating NNTs from SMDs, with a Proportion Improved value of 40%. It shows that small, medium, and large effect sizes would roughly correspond to NNTs of 12, 5, and 3, respectively. The Table also shows comparable estimates for OR and RR, if CER or PEER values of 0.40 are used.
This 40% value for Proportion Improved seems like a practical starting point, since long ago it was observed in pain-treatment studies that >30% of patients in placebo-control groups often demonstrate meaningful improvement [Beecher 1955; also discussed in UPDATE here]. However, unless the specific CER or Proportion Improved is known, it is recommended that a range of values should be tested when calculating the likely NNT (or NNH).
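Testing a range of control event rates can be sketched as follows. This is a sketch under assumptions, not necessarily the exact procedure behind the Table: it converts an SMD to an odds ratio using ln(OR) = (π/√3) × SMD, then derives the experimental event rate and the NNT from an assumed CER. The function names are hypothetical, and results are rounded up.

```python
import math

def smd_to_odds_ratio(smd):
    # Assumed conversion: ln(OR) = (pi / sqrt(3)) * SMD
    return math.exp(math.pi / math.sqrt(3) * smd)

def nnt_from_odds_ratio(odds_ratio, cer):
    # Experimental event rate implied by the OR at this control event rate.
    eer = odds_ratio * cer / (1 - cer + odds_ratio * cer)
    # NNT is the inverse of the absolute risk difference.
    return 1 / (eer - cer)

def nnts_for_cer(cer, smds=(0.2, 0.5, 0.8)):
    """NNTs (rounded up) for small, medium, and large effect sizes."""
    return [math.ceil(nnt_from_odds_ratio(smd_to_odds_ratio(d), cer))
            for d in smds]

# Try several assumed control event rates (Proportion Improved).
for cer in (0.20, 0.30, 0.40, 0.50):
    print(f"CER={cer:.2f}: NNTs for SMD 0.2, 0.5, 0.8 = {nnts_for_cer(cer)}")
```

With a CER of 0.40, this calculation gives NNTs of 12, 5, and 3, in agreement with the values quoted above. Other CERs shift the estimates, which is why a range should be examined when the true rate is unknown.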
It must be remembered that the size of the NNT alone provides an effect-size estimate of the likelihood of either a benefit or harm as measured in a trial, but this does not directly indicate whether the clinical importance of the effect itself being measured is marginal, minimal, moderate, or substantial; that is, whether a treatment is worthy of use [McAlister 2008]. Nor does the NNT suggest if the effect occurs early in treatment or later, or if it continues or fades over time. These are qualities that must be evaluated separately as part of the research study design.
Example: A study that uses ≥20% improvement in pain over 4 weeks as a threshold for therapeutic success may produce an impressively low NNT favoring the treatment over some control condition. However, in terms of clinical significance, one still needs to question whether that level of pain relief is important to patients and whether the effects continue beyond the relatively short-term period of observation. If 50% or greater improvement had been used as the treatment endpoint threshold in a much longer-term trial, we might be more confident in the clinical importance of a low NNT.
In sum, there are statistical measures that can assess the probability of an NNT being a chance or random finding (P-value), and the strength of the evidence in terms of the width of the Confidence Interval. However, the clinical significance of an NNT needs to be assessed on a case-by-case basis taking into account study design and the thresholds used to define endpoints.
There can be a fairly narrow range of NNTs that are of clinical importance in pain medicine. For example, reviews of analgesics for acute pain, providing ≥50% relief within 4 to 6 hours, have found NNTs commonly ranging from 2 to 5 (numbers rounded up) and with confidence intervals overlapping considerably [Moore et al. 2011, also see UPDATE here]. Only rarely do analgesics still in use exhibit an NNT≥10 (eg, codeine), which is generally considered clinically unacceptable.
For any therapy or intervention there are likely to be trade-offs between possible harms and potential benefits, so considering in isolation only the NNT or an NNH associated with a treatment tells only part of the story [DiCenso 2001]. It also can be helpful to understand the relationship of these effects to each other, since a successful new treatment would have a low NNT and a high NNH in comparison with another therapy or intervention.
Therefore, in looking at the reported outcomes of a clinical trial, an important question is: “How many patients might benefit from a treatment for each one who experiences some sort of harm?” In answer to that, a metric called “Likelihood to be Helped or Harmed,” or LHH, has been suggested [Citrome and Weiss-Citrome 2012]. The formula for calculating this is simply: LHH = NNH/NNT.
Example: Shah and colleagues [2012; also discussed in UPDATE here] conducted a review and meta-analysis to estimate NNT and NNH values for pharmacotherapies used in treating irritable bowel syndrome (IBS) with diarrhea. Calculated NNTs depicted event rates in patients who “responded” favorably to therapy. The NNHs were based on study discontinuations by patients due to combined adverse effects associated with each therapy.
An interesting feature of this report by Shah et al. was a comparative analysis of 3 agents, providing a Likelihood to be Helped or Harmed analysis, or what the authors called “Benefit-to-Harm Ratios” (see Table, numbers were not rounded).
In this analysis, compared with tricyclic antidepressants and alosetron, rifaximin was clearly the most beneficial with very few discontinuations (ie, large NNH) and an LHH, or NNH/NNT ratio of 846. That is, there was only 1 discontinuation of the drug due to adverse effects (harm) for every 846 patients who favorably responded to the therapy (benefit). For the other two agents, approximately 1 patient discontinued therapy for every 3 who benefitted. However, if only the NNTs were considered, rifaximin would not have appeared to be the most advantageous among the 3 drugs.
With better reporting of research outcomes in the pain management literature, providing adequate benefits and harms data, an LHH analysis might be applied to a variety of therapies. However, while NNT and NNH are clinically useful and intuitive measures of effect size — allowing assessments of both clinical efficacy and tolerability — there are some important points regarding LHH to consider, for example [Citrome 2007 and 2008; Citrome and Weiss-Citrome 2012]:
- An LHH, or harm-to-benefit ratio (NNH/NNT), much greater than 1.0 is obviously preferred, since it denotes that many patients benefit for each one harmed in some way. At a ratio of 1.0 there is an equal trade-off between benefits and harms; less than 1.0 would denote that harm exceeds benefit.
- The clinical significance of LHH ratios must be considered in context. For example, a drug might have an unfavorably small LHH ratio, say 3, when it comes to dry mouth, but this might be viewed as somewhat inconsequential since it is a minor side effect that is unlikely to influence treatment failures. Conversely, what seems like a desirably large LHH ratio, such as 500, involving a serious adverse effect like stroke or heart failure might still be of great consequence for influencing a decision against a therapy.
- An important limitation is that an LHH value, itself, may not account for the relative impact of time-to-event outcomes and duration of an effect. For example, an adverse effect may be less troublesome if it arises early during treatment and is short-lived; or, a beneficial effect could be preferred if it occurs early in treatment and continues for some time.
- Interpreting the clinical relevance of an NNT, NNH, or NNH/NNT can be somewhat subjective and may be most helpful when comparing those metrics across multiple therapies for the same disorder. This is clearly demonstrated in the Shah et al. study of IBS therapies above, in which alosetron and tricyclic antidepressants appear comparable in terms of their LHH ratios, but rifaximin is clearly superior on that measure.
- It should be noted that in some of the earlier literature [McAlister et al. 2000] the LHH was described as a “benefit-to-risk ratio” calculated as [1/NNT]:[1/NNH]. However, this seems to be a more complex way of accomplishing the same objective. For example, with NNT=20 and NNH=60, this becomes 0.05:0.017, or 3:1. By contrast, simply taking NNH/NNT yields 60/20, or 3. Appropriate interpretation of the two approaches arrives at essentially the same understanding of LHH.
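The two calculations discussed above, LHH = NNH/NNT and the earlier [1/NNT]:[1/NNH] form, can be compared in a short sketch. The function names `lhh` and `benefit_to_risk` are hypothetical, and the inputs are the NNT=20 and NNH=60 values from the example.

```python
def lhh(nnt, nnh):
    """Likelihood to be Helped or Harmed: NNH divided by NNT."""
    return nnh / nnt

def benefit_to_risk(nnt, nnh):
    """Earlier formulation: (1/NNT) relative to (1/NNH)."""
    return (1 / nnt) / (1 / nnh)

nnt, nnh = 20, 60
print(f"LHH (NNH/NNT)       = {lhh(nnt, nnh):.1f}")
print(f"[1/NNT]:[1/NNH]     = {benefit_to_risk(nnt, nnh):.1f}")
```

Both lines print 3.0, since the two expressions are algebraically identical. A value above 1.0 indicates more patients helped than harmed, and a value below 1.0 indicates the reverse.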
Despite the data requirements, caveats, and limitations, calculations of NNT, NNH, and NNH/NNT (LHH) can be very useful for assessing and selecting optimum pain therapies; however, this is not always possible with current pain management research reports. For example, Shah and colleagues found that their analytical approach was impractical when it came to treatments for IBS with constipation due to missing data in available studies. Whether inadvertently or otherwise, research report authors often do not provide adequate data for calculating the sort of clinically meaningful comparisons afforded by NNT, NNH, and LHH.
Clinicians, in consultation with patients, need to decide when treatment effects are sufficiently large and beneficial to more than offset harms or costs of a therapy or intervention. NNT, NNH, and LHH are statistical tools that can help put those effects into a meaningful context, although they are only part of the total assessment when such decisions must be made. At the same time, some researchers have found that more appropriately conservative decisions are made when data are presented in terms of NNT, NNH, and/or LHH than as SMD, OR, RR, RRR, or other measures of effect [Moore 2009].
Unfortunately, effect sizes expressed as NNT or NNH are commonly omitted from research reports. Of greatest concern, sometimes when study data are converted to these measures — rather than SMD, OR, RR, RRR, etc. — what appeared to be an advantageous therapy or intervention is revealed as being of much lesser consequence. Therefore, the burden of discovering the true quality of evidence presented in pain research reports often rests with educated consumers of the literature who understand the nuances of effects expressed as NNT, NNH, and/or LHH metrics and know how to calculate them.
> Altman DG. Confidence intervals for the number needed to treat. BMJ. 1998(Nov);317:1309-1312.
> Anon. Calculating and using NNTs. Bandolier (Oxford University). 2003(Feb); online [here].
> Anon. Number Needed to Treat (NNT). Centre for Evidence-Based Med (Oxford Univ). 2012; online [here].
> Beecher HK. The powerful placebo. JAMA. 1955;159(17):1602-1606.
> Bender R. Improving the calculation of confidence intervals for the number needed to treat. In: Hasman A, et al. eds. Medical Infobahn for Europe. IOS press, 2000 [PDF here].
> Cates C. NNT - No need to be confused. UPDATE newsletter. 2005(Sep 15), online [PDF here].
> Chatellier G, Zapletal E, Lemaitre D, Menard J, Degoulet P. The number needed to treat: A clinically useful nomogram in its proper context. BMJ. 1996;312:426-429.
> Citrome L, Weiss-Citrome A. A Systematic Review of Duloxetine for Osteoarthritic Pain: What is the Number Needed to Treat, Number Needed to Harm, and Likelihood to be Helped or Harmed? Postgrad Med. 2012(Jan);124(1):83-93 [access here].
> Citrome L. Compelling or irrelevant? Using number needed to treat can help decide. Acta Psychiatr Scand. 2008;117(6):412-419 [article here].
> Citrome L. Show me the evidence: Using number needed to treat. Southern Med J. 2007;100(9):881-884 [article here].
> Deyo RA, Smith DHM, Johnson ES, et al. Prescription Opioids for Back Pain and Use of Medications for Erectile Dysfunction. Spine. 2013(May 15);38(11):909-915 [abstract here].
> DiCenso A. Clinically useful measures of the effects of treatment. Evid Based Nurs. 2001;4:36-39 [available here].
> Higgins JPT, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration, 2011. [available here].
> Fashner J, Bell AL. Herpes zoster and postherpetic neuralgia: prevention and management. Am Fam Physician. 2011;83(12):1423-1437.
> Hazewinkel M, ed. Law of large numbers, in Encyclopedia of Mathematics. New York: Springer; 2001.
> Kraemer HC, Kupfer DJ. Size of treatment effects and their importance to clinical research and practice. Biol Psychiatry. 2006;59:990-996.
> Laupacis A, Sackett DL, Roberts RS. An assessment of clinically useful measures of the consequences of treatment. NEJM. 1988;318:1728-1733.
> McAlister FA. The “number needed to treat” turns 20 — and continues to be used and misused. CMAJ. 2008;179(6):549-553 [article here].
> McAlister FA, Straus SE, Guyatt GH, et al. User’s Guides to the Medical Literature: XX. Integrating Research Evidence With the Care of the Individual Patient. JAMA. 2000;283(21):2829-2836.
> Moore RA, Derry S, Eccleston C, Kalso E. Expect analgesic failure; pursue analgesic success. BMJ. 2013 (May);346:f2690 [abstract].
> Moore RA, Derry S, McQuay HJ, Wiffen PJ. Single dose oral analgesics for acute postoperative pain in adults. Cochrane Database of Systematic Reviews. 2011;9(CD008659) [available here].
> Moore RA, Eccleston C, Derry S, et al. “Evidence” in chronic pain – establishing best practice in the reporting of systematic reviews. PAIN. 2010;150:386-389.
> Moore RA. What is an NNT? Bandolier (Oxford Univ). 2009, online [PDF here].
> Oxman MN, Levin MJ, Johnson GR, et al. A vaccine to prevent herpes zoster and postherpetic neuralgia in older adults. NEJM. 2005;352(22):2271-2284 [abstract here].
> Sackett DL, Deeks JJ, Altman DG. Down with odds ratios! Evidence-Based Med. 1996(Sep/Oct);1(6):164-166.
> Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-Based Medicine: How to Practice & Teach EBM. New York, NY: Churchill Livingstone; 1997.
> Shah E, Kim S, Chong K, et al. Evaluation of Harm in the Pharmacotherapy of Irritable Bowel Syndrome. Am J Med. 2012(Apr);125(4):381-393 [abstract here].
> Suissa S. Calculation of number needed to treat [letter]. NEJM. 2009;361:424-425.
> Thakur R, Philips AG. Treating herpes zoster and postherpetic neuralgia: an evidence-based approach. J Fam Prac. 2012;61(9):S9-S15.