Part 13 – Guiding Evidence-Based Clinical Decisions
Systematic reviews of the literature followed by meta-analyses of the collected data can be invaluable for helping to guide evidence-based clinical decisions regarding pain management therapies or interventions. However, there are potential caveats and limitations to understand and take into account, since a poorly designed and executed systematic review and meta-analysis can be confusing and misleading.
Simply put, a meta-analysis is an observational research study in which the “subjects” for inclusion are carefully selected studies that investigate a common clinical question, or hypothesis, of interest [Siegfried 2010]. The process begins with a thorough, systematic search and review of all relevant studies and, when randomized controlled clinical trials (RCTs) are selected as “subjects” for investigation, the meta-analysis of study data achieves the highest status in the hierarchy of evidence.
This “Evidence Hierarchy” (table) was discussed in Part 2 of this series [here]. Ranked from weakest at the bottom to strongest at the top, this recognizes that evidence represented in studies toward the top of the pyramid may be given greater emphasis for guiding clinical decision-making.
As with all other types of pain research, systematic reviews and meta-analyses are straightforward in concept but complex when it comes to their design, execution, and interpretation. It can require several hours to thoroughly read and understand the nuances of a published report, and readers need to be aware of the many limitations that may lead to faulty, biased, or misleading conclusions.
Perhaps the most important lesson demonstrated by good systematic reviews and meta-analyses is that any single research study — no matter how extensive, large, and methodologically sound — provides only a partial picture of what has been discovered or is yet to be revealed about a therapy or intervention for better pain management. It is only by objectively evaluating a body of evidence accumulated over time that relatively unbiased and externally valid conclusions can be reached for clinical decision making.
Systematic Reviews — The Discovery Process
Systematic reviews gather all evidence available to address clearly-focused hypotheses, such as whether one therapy or intervention is better than another for a particular pain disorder. Unlike narrative reviews, or “perspectives articles,” in which authors select only those studies that they consider relevant and then summarize what they think the studies mean (which has a high risk of personal-opinion bias), a systematic review identifies and includes studies according to an explicit and objective set of criteria established ahead of time [Cochrane Collaboration 2011; Egger et al. 1997; Greenhalgh 1997; Oxman et al. 1994].
The researchers must ferret out all studies on the issue and then fairly adjudicate which evidence meets preset standards for inclusion [evidence quality was discussed in Part 11 of this series here]. To be thorough, multiple databases — eg, Medline, Embase, CINAHL, Cochrane Library, trial registries, and many others — should be searched for studies in English as well as other languages. However, any review containing only published studies may be incomplete, and the authors should also scour the “grey literature”; eg, research that may have never appeared in journals [Ahmed et al. 2012].
The researchers must consider that “positive” trials, those finding significant and/or favorable outcomes, are more likely than “negative” ones to be published (publication bias), or to be cited more often in the literature (citation bias), or even to be submitted for publication in the first place (called the “file drawer problem”) [Ahmed et al. 2012; Sterne et al. 2011]. Therefore, a truly comprehensive discovery process can be arduous and time consuming for the researchers.
Almost always, a great many studies on a topic are discovered (usually hundreds), but only a handful meet the rigid eligibility criteria for inclusion in a systematic review. This is particularly evident in the pain research field, in which there usually is an abundance of small, poor-quality studies on most topics. However, there is some evidence that only half of systematic reviews adequately assess the quality of selected studies [Moore et al. 2010].
Systematic reviews may include almost any type of clinical study, but the highest quality review is achieved when selecting only well-designed randomized controlled trials (RCTs). Importantly, if the quality of selected studies is poor, the results may tend to falsely favor benefits of the therapy or intervention examined, leading to distorted conclusions.
Good systematic reviews facilitate the relatively rapid assimilation of large amounts of research by readers. However, critics have expressed concerns about the validity of combining studies that were done on different patient populations, in different settings, at different times, and sometimes for different reasons [Siegfried 2010]. Still, such diversity can be a strength of a review if the researchers account for potential biases or confounding factors that may come into play.
Not all systematic reviews are precursors of data meta-analyses; sometimes, the purpose is to provide more narrative descriptions of collected research on a specific subject, summarizing the evidence in non-statistical terms [Gliner et al. 2003]. While this may be of value to readers, it also could be necessitated by insufficient or incompatible data in the respective study reports, in which case the quality and reliability of the evidence might be questioned.
Clinical practice guidelines often result from a systematic review process that can be quite elaborate. Conclusions and recommendations tend to be reliable and appropriate IF there is an abundance of high-quality evidence available, which often is not the case in the pain field. Too often, strong recommendations are made on the basis of weak evidence and readers need to be wary of this possibility [Leavitt 2009].
Meta-Analysis — Aggregating Outcomes Data
Meta-analyses take systematic reviews a step further by combining data from the selected studies — almost always RCTs — and using statistical techniques to analyze the results and reach summary conclusions [Siegfried 2010]. Hence, as noted above, these are research projects in which the unit of analysis becomes individual studies rather than individual patients/subjects. This approach facilitates achieving greater precision and clinical applicability of results than is possible with any individual study or systematic review; however, the validity of the meta-analysis depends heavily on the quality of the systematic review on which it is based.
Essential requirements of meta-analysis are summarized by Hennekens and DeMets as follows:
“The quality and usefulness of any meta-analysis are dependent on the quality and comparability of data from the component trials. In particular, the trials combined should have high adherence and follow-up rates and should have reasonably comparable drugs, doses, and outcomes. The characteristics of the participants and the magnitude of effect from each trial must be sufficiently similar so that their combination will not produce a distorted estimate. Thus, meta-analyses can reduce the role of chance in the interpretation but may introduce bias and confounding.”
The foremost approaches to systematic reviews and meta-analyses have been specified in the Cochrane Handbook and its associated software [Cochrane Collaboration 2011]. Inaugurated in 1993, the mission of the Cochrane Collaboration — an international group of clinicians and researchers — is to help practitioners and patients make well-informed decisions about health by preparing and maintaining a collection of systematic reviews and meta-analyses on the benefits and risks of selected healthcare interventions.
Transforming Data Into Effect Measures
A first step in meta-analysis is ensuring that the included studies from the systematic review address the same hypothesis using similar methods. There also must be adequate outcomes data on important variables from each study — such as pain, functionality, or other measures — so that they can be combined, or “pooled,” for analysis.
In many cases, the summary data for each study may not be expressed the same and the researchers will need to statistically transform the data into comparable standardized measures of clinical effect along with their respective Confidence Intervals. The effect can be expressed, for example, as a Risk Ratio (RR), Odds Ratio (OR), Correlation Coefficient (r), Standardized Mean Difference (SMD or Cohen’s d), or Number-Needed-to-Treat (NNT). This allows an “apples-to-apples” aggregation of outcome effects, even if the data were originally presented differently across studies. [Effect sizes were discussed in Part 10 here, and Confidence Intervals, or CIs, were discussed in Part 5 here — readers may wish to review these concepts.]
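As a concrete illustration of one such transformation, the sketch below (with purely hypothetical cell counts, not data from any study cited here) computes a Risk Ratio and its 95% Confidence Interval on the log scale, which is the standard approach before effects are pooled:

```python
import math

def risk_ratio_ci(events_tx, n_tx, events_ctl, n_ctl, z=1.96):
    """Risk Ratio with a 95% CI, computed on the log scale as is
    customary before pooling effects across studies."""
    p_tx, p_ctl = events_tx / n_tx, events_ctl / n_ctl
    rr = p_tx / p_ctl
    # Approximate standard error of log(RR) from the 2x2 cell counts
    se_log = math.sqrt(1/events_tx - 1/n_tx + 1/events_ctl - 1/n_ctl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical trial: 30/100 events on treatment vs 45/100 on control
rr, lo, hi = risk_ratio_ci(30, 100, 45, 100)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An analogous log transformation is used for Odds Ratios; Standardized Mean Differences are pooled on their natural scale. Because the CI here does not cross 1.0 (the value of no effect for a ratio measure), this hypothetical result would count as statistically significant.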
Size Matters in Meta-Analyses
The importance of sample sizes in clinical trials, and the associated statistical power, was discussed in Part 9 of this series [here]. Small studies have many limitations that can negatively affect their validity; therefore, a collection of small, underpowered trials, no matter how numerous, does not amount to a large and valid aggregation in a meta-analysis.
An important principle is that, even if a statistically significant and beneficial effect of a treatment is found in a study with relatively few participants, it is likely that this result might be skewed or biased. For example, in an extensive examination of this problem, Nüesch et al. observed that the predominance of small studies in the pain field — considered as <100 subjects in each group — can distort results in favor of falsely depicting beneficial effects of a treatment; the true effect size may be overestimated by as much as 50%.
Even at 100 participants in each group, Nüesch and colleagues noted, there is only 80% power to detect a small-to-medium standardized effect size (eg, Cohen’s d) of about 0.40 at a two-sided P=0.05. This would amount to only a difference of about 1 unit between experimental and control groups on a 0-to-10 unit visual analog scale assessing pain.
Others have suggested that pain trials with fewer than 50 subjects per treatment arm are potentially more biased toward erroneously favoring treatment effects than those with 50 to 200 patients [Moore et al. 2010]. Such risk drops further with numbers greater than 200 subjects per group; although, studies this large in the pain field are uncommon.
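The power figures quoted above can be approximated in a few lines. This sketch uses the usual normal approximation to a two-sided, two-sample test at alpha=0.05; the numbers are illustrative, not a re-analysis of the cited studies:

```python
import math

def power_two_sample(d, n_per_group):
    """Approximate power of a two-sided two-sample test for a standardized
    effect size d (normal approximation; alpha = 0.05 two-sided)."""
    z_alpha = 1.96
    z = d * math.sqrt(n_per_group / 2) - z_alpha
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

# ~80% power to detect d = 0.40 with 100 subjects per group...
print(round(power_two_sample(0.40, 100), 2))  # ≈ 0.81
# ...but only ~52% power with 50 per group, illustrating the small-trial problem
print(round(power_two_sample(0.40, 50), 2))   # ≈ 0.52
```

This mirrors the point made by Nüesch and colleagues: at fewer than 100 subjects per group, a trial has less than the conventional 80% power to detect a small-to-medium effect.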
Larger studies benefit from increased statistical power and greater precision in resulting effect sizes, with narrower Confidence Intervals. However, these advantages can be nullified in a meta-analysis if the studies are of low quality in terms of methodology and execution. So, there sometimes must be compromises made between size and quality.
Weighing the Evidence
Even in a good systematic review, the selected studies customarily differ in terms of their size and/or quality, which must be taken into account for a fairly balanced meta-analysis [Egger et al. 1997]. The results of the different studies cannot be merely added together and divided by the number of studies, or averaged, since this would produce a distorted and misleading outcome.
Via statistical methods, usually built into computer software, individual studies are assigned more or less “weight” depending on size (eg, number of subjects and the associated data variance) and, occasionally, quality factors as well — eg, length of followup, blinding, randomization techniques, etc. Then, the final calculation of aggregated data depicts an adjusted, weighted average of the individual studies.
In this way, stronger evidence is given preference over weaker evidence, as it should be. That is, larger and higher quality studies, with less random variation in outcomes, are usually given more emphasis in summary, or pooled, effect-size calculations than smaller ones of lower quality. Report authors should indicate the weight assigned to each study and describe how weighting was determined; especially, whether both size and quality factors were taken into account, which is not always the case.
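The standard size-based scheme is inverse-variance weighting: each study's weight is the reciprocal of the variance of its effect estimate, so larger, more precise studies dominate the weighted average. A minimal sketch (hypothetical log risk ratios and standard errors):

```python
import math

def fixed_effect_pool(effects, ses):
    """Inverse-variance (fixed-effect) pooling of study effect estimates.
    Each study's weight is 1/SE^2; the pooled SE is sqrt(1 / sum of weights)."""
    weights = [1 / se**2 for se in ses]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1 / total)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    pct_weights = [100 * w / total for w in weights]  # as shown in Forest Plots
    return pooled, ci, pct_weights

# Three hypothetical trials: the most precise one (SE = 0.10) dominates
pooled, ci, pct = fixed_effect_pool([-0.30, -0.10, -0.45], [0.10, 0.25, 0.30])
```

Here the first study carries nearly 79% of the total weight. Note that this scheme reflects size and precision only; any quality-based adjustment is a separate judgment made by the analysts.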
Graphic Presentation as “Forest Plots”
The classic feature of a good meta-analysis is the graphic depiction of data as a “Forest Plot” [Crombie and Davies 2009; Greenhalgh 1997; Guyatt et al. 1993; Oxman et al. 1994]. There are variations of these graphs depending on available data and researcher preferences, but they follow a pattern similar to the figure.
The stacked horizontal lines represent individual studies, with line-length indicating the Confidence Interval (almost always a 95% CI) and a box — or a dot/blob in some cases — along the line indicating the point estimate of the standardized effect. Sometimes, different sized boxes or blobs are used to indicate varying study weights or precision, usually relating to study size. An arrow at the end of a line suggests that the CI extends off the scale of the Plot.
Along the bottom, a horizontal line represents the scale of effect sizes, which varies depending on effect measures used. A solid vertical line corresponds to the point of no difference in effect size between groups in the individual studies, or null effect, so it is the line of no effect, such as an RR=1.0 or r=0.0. If the CI crosses the line of no effect, it suggests that there was no statistically significant difference between groups in the particular study (eg, P>0.05).
Toward the bottom, a diamond shaped mark represents the pooled (mathematically combined) standardized effect size data from all of the studies, which has been adjusted to take into account the various weightings. The horizontal width of the diamond represents the CI, and the center is the point estimate of the pooled effect. If a horizontal tip of the diamond is at or crosses the line of no effect, then the pooled estimate is assumed to be statistically nonsignificant and the null hypothesis cannot be rejected — ie, there is no significant difference in overall effect due to the therapy or intervention other than by chance. Also see examples below.
Forest Plots are very practical and useful in allowing readers to visualize what happened in the individual studies, as well as how their combined or pooled results suggest clinically useful conclusions. These Plots also convey where there may be limitations of the data or cause for concern; eg, in the hypothetical Plot above only 3 of the 4 included studies are statistically significant, and 1 of those 3 (the topmost) is significant in the opposite direction; it also is probably the largest study (evidenced by the narrowest CI line) and would be most heavily weighted.
Homogeneity vs Heterogeneity in Meta-Analysis
In the ideal scenario, the results of all trials in a meta-analysis — eg, effect sizes and their CIs — would be consistently similar to and mathematically compatible with each other; that is, there is homogeneity across trial outcomes. Many times this is impractical, if nothing else due to the various sizes of included trials and the different ways that they were conducted; so, there often are considerable inconsistencies and variability in outcomes, called heterogeneity.
Forest Plots help to visualize these conditions, facilitating use of an “eyeball test.” When homogeneity exists, all of the effect-size point estimates are on the same side of the line of no effect and the CIs all overlap each other to some extent. In contrast, heterogeneity is evident when there is a mix of point estimates on both sides of the line of no effect, with some CIs not overlapping each other. See examples below.
There are various statistical tests to assess heterogeneity, but those commonly reported in Cochrane reviews are Cochran’s Q test and the χ² (chi-squared) test of goodness of fit [Cochrane Collaboration 2011; Sedgwick 2012]. These assess whether observed differences across study results are likely due merely to chance or random error alone; if not, then heterogeneity exists. For example, a statistically significant P-value (eg, P<0.05) for χ² provides evidence of heterogeneity across study effects (ie, variation in effect-size estimates beyond chance).
In some cases, the Q or χ² tests have low power to detect heterogeneity when it is present, so an additional measure — the Higgins I² (I-squared) statistic — is usually reported, which scores the degree of heterogeneity between 0% and 100% [Higgins and Thompson 2002; Higgins et al. 2003]. I² describes the percentage of total variation across studies that is due to heterogeneity rather than chance. As a rule of thumb, I²≈25% is considered low heterogeneity, ≈50% moderate, and ≈75% high. Often, a Confidence Interval is provided for the I² value, which is helpful in gauging the plausible extent of heterogeneity in the data [Crombie and Davies 2009].
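Both Q and I² are straightforward to compute from study effect sizes and standard errors. A minimal sketch with hypothetical inputs:

```python
def q_and_i2(effects, ses):
    """Cochran's Q and the Higgins I² statistic. Q sums the weighted squared
    deviations of each study from the fixed-effect pooled estimate; I² is the
    percentage of that variation beyond what chance alone would explain."""
    w = [1 / se**2 for se in ses]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled)**2 for wi, e in zip(w, effects))
    df = len(effects) - 1  # Q is referred to a chi-squared with k-1 df
    i2 = max(0.0, 100 * (q - df) / q) if q > 0 else 0.0
    return q, i2

# Hypothetical effects in close agreement -> I² of 0 (homogeneity)
print(q_and_i2([0.30, 0.32, 0.29], [0.10, 0.10, 0.10]))
# Hypothetical effects in conflict -> I² well above 75% (high heterogeneity)
print(q_and_i2([0.50, -0.20, 0.80, 0.10], [0.10, 0.10, 0.10, 0.10]))
```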
The degree of heterogeneity is important, because it determines what sort of statistical analysis procedure will be applied when pooling the various study data: ie, either a fixed or random effects model [Crombie and Davies 2009, Gliner et al. 2003; Riley et al 2011].
- Fixed-effects modeling is applied when heterogeneity is low or absent. It assumes that the size of treatment effects across all studies is essentially the same (fixed) and any variation observed between studies is most likely due to the play of chance.
- Random-effects modeling is applied when significant heterogeneity is present. It assumes that there actually might be significant differences of importance between treatment effects in the studies, and takes into account both within-study and between-study variability. This model tends to give more weight than usual to small trials in balancing effect sizes across studies.
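The two models can be sketched side by side. This illustration uses the common DerSimonian-Laird estimate of the between-study variance (tau²); with heterogeneous hypothetical inputs, the random-effects CI comes out markedly wider than the fixed-effect CI would be:

```python
import math

def random_effects_pool(effects, ses):
    """Random-effects pooling via the DerSimonian-Laird estimate of the
    between-study variance tau². Adding tau² to every study's variance gives
    relatively more weight to small trials and widens the pooled CI."""
    w = [1 / se**2 for se in ses]
    pooled_fe = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled_fe)**2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # zero when heterogeneity is absent
    w_re = [1 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled), tau2

# Heterogeneous hypothetical effects: tau² > 0 and the pooled CI crosses zero
pooled, ci, tau2 = random_effects_pool([0.50, -0.20, 0.80, 0.10],
                                       [0.10, 0.10, 0.10, 0.10])
```

With these inputs a fixed-effect analysis would yield a CI of roughly 0.20 to 0.40, looking decisively significant, while the random-effects CI spans zero. That added caution is exactly what heterogeneity warrants.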
When it is necessary to use a random-effects model, the variance of the pooled effect size will tend to be increased, resulting in less precision and a wider Confidence Interval. Thus, as heterogeneity increases, it becomes more difficult to obtain a valid, statistically significant result. In fact, if the amount of heterogeneity is large (I²≥75%), there may be some question as to the appropriateness of continuing the meta-analysis and attempting to calculate any summary effect size. The following examples help to illustrate the above discussions:
EXAMPLE 1: Homogeneity
Researchers undertook a meta-analysis to assess the efficacy of parenteral corticosteroid (dexamethasone) for the relief of acute severe migraine headaches and the prevention of recurrent headaches in adults [Colman et al. 2008]. Based on an extensive systematic review, 7 RCTs were identified in which either single dose dexamethasone or placebo was given in combination with standard abortive treatment to patients with severe headache. For the prevention of recurrent headache within 72 hours, the Forest Plot below demonstrates that dexamethasone was significantly more efficacious than placebo (P=0.003).
Several important features in the Forest Plot are worth noting:
- The 95% CIs of all 7 studies overlap to some extent and the point estimates (boxes) are all on the same side of the vertical line of no effect (1.0), which suggests by “eyeball test” that there is little if any heterogeneity. Indeed, as indicated, the χ² test for heterogeneity was nonsignificant (P=0.40), and the I² statistic was only 3.4%. Therefore, a fixed-effects model was appropriately used in calculating the pooled effect.
- The assigned weights appear to be proportional to study size; however, the largest study (twice the weight of any others, 30.26) was not statistically significant. The authors do not discuss if study quality also entered into the weightings.
- The pooled overall Relative Risk effect was statistically significant (P=0.003), but the estimated RR=0.74 — or, a 26% reduction in headache recurrence with dexamethasone compared with placebo — was only a small-to-medium effect size, with a 95% CI of 0.60 to 0.90.
- Additionally, the authors calculated an NNT=9 for the aggregated outcome; that is, for every 9 patients treated with dexamethasone rather than placebo, 1 additional patient would benefit from a reduction in headache recurrence. This is considered to be only a small effect size that is comparable to a Cohen’s d of about 0.20 [Citrome 2008].
- An interesting feature is that only the first study (at the top) exhibited a statistically significant result (ie, the 95% CI does not cross the line of no effect), while the others exhibited weaker outcomes that are not statistically significant (although the authors do not discuss this). Yet, the pooled RR was statistically significant (P=0.003).
This seemingly contradictory outcome was largely due to a narrower 95% CI achieved by aggregated numbers of subjects as weighted studies were mathematically combined (see cumulative Forest Plot below). This resulted in greater precision and, consequently, statistical significance.
- It also should be noted that, except for the largest study, all of the others were relatively small scale and may have been underpowered. This might bias outcomes; although, it is not known if larger and adequately powered studies would yield a more clinically significant outcome.
So, can these results be trusted as having external clinical validity for patient care? A number of factors may have biased outcomes toward a favorable effect of dexamethasone and the results of this meta-analysis might be only cautiously accepted until larger, high-quality studies are conducted.
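As a side note on the arithmetic, the NNT=9 reported above follows from a Relative Risk only once a baseline (control-group) event rate is specified. The sketch below uses a purely hypothetical placebo recurrence rate (not a figure taken from Colman et al.) chosen to show how an RR of 0.74 can land near an NNT of 9:

```python
def nnt_from_rr(rr, control_event_rate):
    """NNT = 1 / ARR, where the absolute risk reduction (ARR) equals the
    control-group event rate times (1 - RR)."""
    arr = control_event_rate * (1 - rr)
    return 1 / arr

# Hypothetical: if about 43% of placebo patients had recurrent headache,
# RR = 0.74 implies roughly 9 patients treated per additional patient helped
nnt = nnt_from_rr(0.74, 0.43)  # ≈ 8.9
```

This also shows why an NNT must always be read against its baseline: the same RR applied to a lower control-group event rate yields a larger (less impressive) NNT.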
EXAMPLE 2: Heterogeneity
Linn et al. conducted a systematic review and meta-analysis of studies assessing the relationship between physical activity and disability in patients with nonspecific chronic low back pain. The Plot at right shows correlation coefficient point estimates (blobs/dots) with their respective 95% CIs for each of 14 included studies. The “pooled” line represents a statistical summary of the data; the negative correlation in this case indicates that higher levels of disability are significantly associated with lower levels of physical activity in these patients.
There are several observations that can be made from the Forest Plot, many of which were not discussed by the report authors:
- Overall, the 14 studies were diverse and inconsistent (ie, heterogeneous), as is clearly suggested by dispersion of the Confidence Intervals and point estimates by “eyeball test.” While the report authors did not provide a statistical assessment of this heterogeneity, as they should have, they did appropriately use a random-effects model in their calculations.
- More than half (8) of the 14 studies exhibit no or marginal statistical significance (ie, P>0.05), based on CIs that cross or touch the line of null effect (0.0).
- The remaining 6 studies were statistically significant (P<0.05) and exhibited moderate to strong effects. Two of those studies alone — farthest on the left, with the largest effect sizes and narrowest CIs — probably had the greatest weightings (not presented by the authors) and might alone have driven a statistically significant pooled estimate.
- Also, 8 of the 14 studies were RCTs whereas 6 were observational (cross-sectional or cohort) in design, which is a mixing of methodologies that might have reduced precision and biased results of the meta-analysis [Egger et al. 1998].
- The pooled correlation is statistically significant (P<0.05), since the CI range (-0.51 to -0.15) does not include the line of no effect (0.0). However, a correlation coefficient point estimate of r = -0.33 is a relatively small-to-medium effect size, and this combined with the other concerns may challenge the strength of the evidence in supporting the relationship between physical activity and disability.
These two examples illustrate the importance of meta-analysis for a better understanding of a particular issue in pain management. If examined in isolation, many of the individual studies would foster a completely misleading conclusion, particularly when there is heterogeneity in the overall body of evidence and/or small sample sizes.
Only through a statistical aggregation of outcomes can a clearer and more objective clinical assessment of treatment effects be achieved. Along with that, however, there is a need to consider where there may have been limitations and gaps in the available evidence, or biases in the analysis and interpretation that weaken results.
The Critical Importance of Sensitivity Analyses
There are many ways in which the countless decisions that researchers must make can affect the conduct and interpretation of a systematic review and meta-analysis, tilting the verdict in one direction or another [Goodman and Dickersin 2011]. In a seminal paper discussing this dilemma, Simmons et al. coined the phrase “researcher degrees of freedom” to express the countless small choices that may add up to largely questionable outcomes.
In statistical parlance, degrees of freedom is the number of values in a calculation that are free to vary. Similarly, researcher degrees of freedom represents the potentially considerable variation in the choices meta-analysts make when judging the quality and validity of the included studies, deciding on which statistical procedures to use, and determining how to report and interpret results.
Often, the various choices are made in ways that will influence a finding of positive outcomes or significant effects that are publishable. This is not necessarily a deception, but an unfortunate reality in today’s publish-or-perish environment that may motivate researchers in self-serving ways [Simmons et al. 2011].
A good meta-analysis report aids readers in determining for themselves the reasonableness of the decisions made by the researchers and their likely impact on the final pooled estimate of effect and its significance. This can be particularly important if there was heterogeneity among the studies, since there is a need to understand how the differences came about [Egger et al. 1997].
In this regard, it is critical that the researchers conduct some form of sensitivity analysis. In general, sensitivity analyses explore ways in which findings might be changed by varying inclusion/exclusion of certain studies, the statistical treatment of the data, and the influences of other factors [Crombie and Davies 2009].
As a start, the researchers may explore effects of excluding certain studies from the analysis — eg, very large or small studies, unpublished studies, or those of low quality — to see if the main results hold constant in terms of effect size and significance. They also might examine how consistent the results are across various subgroups — eg, defined by patient demographics, clinical setting, etc. — to uncover important factors that influenced outcomes. For example:
Above, in EXAMPLE 1 (Homogeneity), only 1 of the 7 trials was statistically significant, and it was of only moderate size. So, it could be important to know what the pooled effect size and its statistical significance would be without that particular trial included in the analysis.
Similarly, in EXAMPLE 2 (Heterogeneity) above, it could be important to assess pooled outcomes only for the 57% of studies that were RCTs rather than observational studies, and/or to selectively eliminate studies that were outliers in terms of size or methodological quality. Meta-analyses of observational studies (eg, cohort or case-control) are usually burdened by many biases and confounding factors that weaken their precision and the reliability of outcomes [Egger et al. 1998].
To be thorough, researchers should compare both fixed- and random-effects statistical modeling for analyzing key data. Computer software makes this a relatively easy process.
For example, the Forest Plot in the figure at right [excerpted from Vickers et al. 2012] shows (red diamonds) that acupuncture for chronic headache had a moderate, statistically significant pooled effect size under a fixed-effect model. However, since there was some heterogeneity in the data, random-effects modeling produced a smaller effect size with a much wider Confidence Interval, thus somewhat lowering the precision and the efficacy estimate of the intervention.
If researchers use only a fixed-effects approach, it can yield misleading results when there is heterogeneity in the data, demonstrating strong and significant effects that may not truly exist. On the other hand, if there is little or no heterogeneity and a random-effects model is applied, the results will be the same as if a fixed-effects approach were used. So, there may be greater confidence in meta-analysis results if researchers use the more conservative random-effects modeling statistics or, at the least, present dual-modeling analyses as in the example.
Funnel Plots – Checking for Publication Bias
An important concern in any meta-analysis is that important studies may have gone undiscovered during the systematic review process. More often than not, these are studies demonstrating nonsignificant or negative outcomes that have succumbed to publication bias or the file-drawer problem (described above).
A Funnel Plot displays the studies included in a meta-analysis in a scatter plot portraying effect size against some measure of precision or the extent to which the findings might be affected by the play of chance (eg, sample size, standard error, differences between groups on some parameter, etc.) [Sterne et al. 2011]. For example…
The expected portrayal, if there is little or no bias or heterogeneity in study selection, is roughly a symmetrical, inverted funnel. The figure [adapted from Sterne et al. 2011] shows a symmetrical Plot in which the vertical dashed line is the pooled effect-size estimate and the solid vertical line is the line of no effect. The outer dashed lines denote a triangular region in which 95% of the studies are expected to lie if selection bias and/or heterogeneity across studies are largely absent.
Well-constructed Funnel Plots help to demonstrate strengths and weaknesses in research on a particular therapy or intervention. If the plot is grossly asymmetric (eg, lopsided), or there are few pertinent studies, it suggests that many important trials may have been overlooked or excluded during the systematic review — often those demonstrating little or unfavorable effect.
In some cases, Funnel Plots can be difficult to visually interpret, and there are statistical tests of asymmetry that can be used if there are 10 or more studies in the meta-analysis [Sterne et al. 2011]; although, some experts have claimed that asymmetry tests do not elucidate the full extent of publication bias [Bandolier 2005]. However, these Plots often are omitted entirely in meta-analysis reports in the pain field, which should be explained by report authors.
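One widely used asymmetry test is Egger's regression: the standardized effect (effect divided by its SE) is regressed on precision (1/SE), and an intercept far from zero flags small-study effects consistent with publication bias. A minimal sketch with hypothetical inputs (a real application should have the 10 or more studies noted above; only 4 are used here for brevity):

```python
def egger_intercept(effects, ses):
    """Intercept of Egger's regression of standardized effect (effect/SE)
    on precision (1/SE); values far from zero suggest funnel asymmetry."""
    y = [e / s for e, s in zip(effects, ses)]
    x = [1 / s for s in ses]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx  # ordinary least-squares intercept

# Symmetric hypothetical data (same true effect at every precision): ~0
print(egger_intercept([0.3, 0.3, 0.3, 0.3], [0.1, 0.2, 0.3, 0.4]))
# Small (high-SE) studies showing inflated effects: intercept well above 0
print(egger_intercept([0.3, 0.5, 0.8, 1.2], [0.1, 0.2, 0.4, 0.8]))
```

A full Egger test also attaches a significance level to the intercept; as the text notes, even then such tests do not capture the full extent of publication bias.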
Cumulative Forest Plot
As noted above, when there is significant heterogeneity it may reduce confidence in the strength of pooled results, even if the outcome is statistically significant. And, data depicted in traditional Forest Plots may sometimes raise important unanswered questions. A more revealing approach is offered by a Cumulative Forest Plot [Carr 2008; Egger and Smith 1997].
The figure below [adapted from Carr 2008] depicts a meta-analysis of RCTs spanning 20 years that addressed the reduction of postoperative pulmonary atelectasis (lung collapse) during epidural versus systemic opioid analgesia. The traditional Forest Plot on the left shows a statistically significant, moderately sized pooled effect (in red at the bottom) favoring epidural opioid administration; however, by “eyeball test” there appears to be some heterogeneity of outcomes across individual studies that might diminish confidence in the overall clinical validity of the findings.
The Forest Plot on the right uniquely depicts a cumulative meta-analysis in which aggregate point estimates with CIs are successively recalculated as each study is added to the analysis in chronological order of its publication. It is apparent that, as larger studies became available and a critical mass was achieved by 1984 (approaching 200 subjects total) and thereafter, the outcome became statistically significant and the emerging consistent pattern evokes somewhat greater confidence that epidural administration may, indeed, be most favorable.
When a therapy or intervention is genuinely effective, it would be expected that continuing research would produce stronger and more convincing results over time as the body of evidence accumulates: aggregated numbers of subjects increase and CIs become narrower. However, if a treatment is studied for some time and the outcomes continue to be weak and inconsistent (with heterogeneity between studies), treatment efficacy should be questioned.
A Cumulative Forest Plot depiction — which is unfortunately rarely provided in pain research reports — helps to demonstrate the pattern over time and the importance of larger, statistically significant trials for influencing pooled outcomes. Still, in the above example it is relevant to consider that, if the two larger trials from 1985 and 1987 were eliminated in a sensitivity analysis, the overall pooled effect at the end might not be statistically or clinically significant. A closer inspection of those two trials for critical differences in terms of methodology, patient population, or other factors affecting the strength of their outcomes might be enlightening.
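The mechanics behind such a plot are simple: after each study is added in chronological order, the inverse-variance pooled estimate and its 95% CI are recomputed. The following sketch illustrates this with hypothetical effect sizes and standard errors (not the actual data from the Carr 2008 example), showing the CI narrowing as subjects accumulate:

```python
import math

def pooled_effect(effects, ses):
    """Fixed-effect inverse-variance pooled estimate with a 95% CI."""
    weights = [1.0 / se ** 2 for se in ses]  # precision weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

def cumulative_meta(studies):
    """Recompute the pooled estimate as each study is added in
    chronological order -- the data behind a cumulative Forest Plot."""
    rows = []
    for k in range(1, len(studies) + 1):
        years, effects, ses = zip(*studies[:k])
        est, (lo, hi) = pooled_effect(effects, ses)
        rows.append((years[-1], est, lo, hi))
    return rows

# Hypothetical studies: (year, effect estimate, standard error);
# later trials are larger, so their standard errors are smaller.
studies = [(1980, 0.90, 0.60), (1982, 0.40, 0.50), (1984, 0.60, 0.25),
           (1985, 0.50, 0.15), (1987, 0.55, 0.10)]
for year, est, lo, hi in cumulative_meta(studies):
    print(f"through {year}: {est:+.2f} [{lo:+.2f}, {hi:+.2f}]")
```

In this toy example the CI tightens with each added trial and eventually excludes zero, the pattern one hopes to see for a genuinely effective treatment. A sensitivity analysis would simply rerun the same pooling with selected trials removed. (Real cumulative meta-analyses typically use random-effects models when heterogeneity is present; the fixed-effect version is shown here for simplicity.)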
Network Meta-Analyses – Extending the Capabilities
Researchers, as well as clinicians and their patients, are often interested in efficacy comparisons among a number of drugs or across different interventions for pain conditions. Most meta-analyses involve only pair-wise direct comparisons, such as a drug vs placebo or one drug vs another drug. Rarely are there sufficient studies for meta-analysis examining head-to-head clinical trials of all the different treatments that may be of interest.
A newer approach — “Network Meta-Analysis” — seeks to overcome these limitations. Also called a “Multiple (or Mixed) Treatment Comparison (MTC) Meta-Analysis,” this technique permits a meta-analysis in which multiple treatments (3 or more) are compared using both direct comparisons (if available) as well as indirect comparisons across trials based on common comparators (eg, placebo or a standard treatment) [Li et al. 2011; Mills et al. 2012; Song et al. 2003].
For example, in the simplest case there may be a number of trials comparing drug A to placebo and also comparing drug B to placebo, but not A vs B. Since they each have the same comparator (placebo), the researchers can use a network approach to compare drug A vs B. Many additional drugs might be compared in an expanded network if sufficient studies are available, but there are limitations that researchers must take into account in their analyses and these should be discussed in their published reports.
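The arithmetic of this simplest case (the Bucher adjusted indirect comparison) is straightforward: the A-vs-B effect is the difference of the two placebo-controlled effects, and their variances add. A minimal sketch, using entirely hypothetical effect estimates:

```python
import math

def indirect_comparison(d_a_pbo, se_a, d_b_pbo, se_b):
    """Bucher adjusted indirect comparison of drug A vs drug B through
    a common comparator (placebo): the indirect effect is the difference
    of the two direct effects, and the variances sum."""
    d_ab = d_a_pbo - d_b_pbo
    se_ab = math.sqrt(se_a ** 2 + se_b ** 2)
    return d_ab, (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)

# Hypothetical trial results (eg, mean pain-score change vs placebo):
# drug A: -1.0 (SE 0.20); drug B: -0.6 (SE 0.25)
d_ab, ci = indirect_comparison(-1.0, 0.20, -0.6, 0.25)
print(f"A vs B (indirect): {d_ab:+.2f}, 95% CI [{ci[0]:+.2f}, {ci[1]:+.2f}]")
# → A vs B (indirect): -0.40, 95% CI [-1.03, +0.23]
```

Note how the variances add: the indirect estimate is always less precise than either direct comparison, which is one reason sparse networks can stretch the data thin. Full network meta-analyses generalize this logic to many treatments at once, usually within a regression or Bayesian framework, and must also assume the trials are similar enough for the common comparator to be exchangeable across them.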
Network meta-analyses can be quite complex, but useful. For example, as discussed in an UPDATE [here], Swiss researchers conducted a systematic review and network meta-analysis to indirectly assess the relative safety of 7 different NSAID medications that had all been compared with placebo but not head-to-head with each other. The analysis incorporated 31 RCTs that had enrolled 116,429 patients. From these data, the researchers were able to estimate which drugs relative to the others had the most favorable safety profiles regarding cardiovascular adverse events.
Problems may arise if relatively few studies are available for inclusion in a network meta-analysis. Overambitious researchers sometimes seem to stretch available data from a meager selection of studies beyond reasonable limits for reaching their conclusions (described as “data torture” in Part 12 of this series here).
For example, investigators from the UK reported a network meta-analysis to compare patient pain response to all drugs commonly used in treating fibromyalgia [Roskell et al. 2011]. Their search uncovered 21 clinical trials variously examining 9 different medications that had each been compared with placebo, and some drugs also were tested at multiple doses. Heterogeneity across studies could not be assessed due to the small number of trials for each drug, and many trials also were very small in size.
Using a network meta-analysis approach, the researchers concluded that there were no strong differences in pain relief effects for fibromyalgia between the medications. However, data for the individual drugs incorporated in the meta-analysis came from no more than 3 studies, and most came from only 1 or 2 studies per drug. It is questionable whether such data were statistically representative of each drug’s true effects, or could be considered as adequate for meta-analysis purposes, so the reliability and validity of the researchers’ findings will depend on future analyses when and if more studies become available.
Conclusion: Promises & Pitfalls of Meta-Analysis
Meta-analysis is a very attractive approach because it promises more precise and definitive answers to clinical questions. It is very good at combining data from multiple studies and assessing the evidence from statistical perspectives as well as providing important information about the strength of relationships. Therefore, one can have much greater confidence in the evidence — in terms of its direction, magnitude, and statistical significance — than if there were only one or a few isolated investigations of a therapy or intervention for consideration.
Thus, meta-analysis is bestowed the highest position in the evidence hierarchy; yet, meta-analyses still must be carefully scrutinized by critical readers and the outcomes considered cautiously. There are many potential pitfalls in the conduct of meta-analyses, and the approach is fundamentally limited by the quality and size of the included studies.
Statistical procedures necessary for performing meta-analyses are driven by computer software, which makes the process accessible to a great many researchers. However, the old computer acronym, GIGO — Garbage In, Garbage Out — holds very true [Crombie and Davies 2009]. This is a particular concern in the pain field with its predominance of small-scale trials and lower-quality research evidence in many areas.
Systematic reviews and subsequent data meta-analyses were never intended to control for the influence of poor quality studies [Moore et al. 2010]. Some critics have questioned the trustworthiness of meta-analysis because of its potential to include data from relatively flawed studies, thereby covering up for “rotten apples.” Others are concerned about the prospect of combining studies assessing and measuring outcomes in very different ways, thereby comparing “apples to oranges.”
These are valid concerns, which are often overlooked (or avoided) in the sensitivity analyses and discussion sections of meta-analysis reports. In many cases authors also do not adequately discuss the clinical significance of the pooled effect sizes. In that regard, an inherent problem with meta-analyses, as with many other research designs, is that they usually describe “population average” effects, whereas clinicians want to know what is best for particular patients [Egger and Smith 1997].
Rarely is there an attempt at a qualitative analysis that addresses which patients, under what circumstances, appear to benefit most from the treatments in the studies examined. This places the burden on educated consumers of pain research literature to carefully examine the data and judge for themselves whether the proposed conclusions are valid and reliable for a given clinical purpose, setting, and patient. Most importantly, readers should keep in mind that an aggregation of weak, low-quality research outcomes does not make for strong evidence.
> Ahmed I, Sutton AJ, Riley RD. Assessment of publication bias, selection bias, and unavailable data in meta-analyses using individual participant data: a database survey. BMJ. 2012;344:d7762 [article here].
> Bandolier. Funnel plots: is seeing believing? Bandolier Knowledge Library. 2005(Oct);140-146 [available here].
> Carr DB. When Bad Evidence Happens to Good Treatments. Reg Anesth Pain Med. 2008;33(3):229-240.
> Citrome L. Compelling or irrelevant? Using number needed to treat can help decide. Acta Psychiatr Scand. 2008;117(6):412-419 [article here].
> Cochrane Collaboration (Higgins JPT, Green S, eds.). Cochrane Handbook for Systematic Reviews of Interventions; Ver 5.1.0. 2011(Mar) [available here].
> Colman I, Friedman BW, Brown MD, et al. Parenteral dexamethasone for acute severe migraine headache: meta-analysis of randomized controlled trials for preventing recurrence. BMJ. 2008;336:1359-1361 [abstract].
> Crombie IK, Davies HT. What is meta-analysis, 2nd ed. Hayward Medical Communications. 2009 [available here].
> Egger M, Schneider M, Smith GD. Meta-analysis: Spurious precision? Meta-analysis of observational studies. BMJ. 1998;316:140+ [abstract].
> Egger M, Smith GD. Meta-analysis: Potentials and promise. BMJ. 1997;315:1371+.
> Egger M, Smith GD, Phillips AN. Meta-analysis: Principles and procedures. BMJ. 1997;315(7121) [abstract].
> Gliner JA, Morgan GA, Harmon RJ. Meta-Analysis: Formulation and Interpretation. J Am Acad Child Adolesc Psychiatry. 2003;42(11):1376-1379.
> Goodman S, Dickersin K. Metabias: A Challenge for Comparative Effectiveness Research. Ann Intern Med. 2011;155(1):61-62.
> Greenhalgh T. How to read a paper: Papers that summarize other papers (systematic reviews and meta-analyses). BMJ. 1997;315(7109).
> Guyatt GH, Sackett DL, Cook DJ. Users’ guides to the medical literature: II. How to use an article about therapy or prevention; A. Are the results of the study valid? JAMA. 1993;270(21):2598-2601.
> Hennekens CH, DeMets D. The need for large-scale randomized evidence without undue emphasis on small trials, meta-analyses, or subgroup analyses. JAMA. 2009(Dec 2);302(21):2361-2362 [here].
> Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21(11):1539-1558.
> Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analysis. BMJ. 2003(Sep 6);327:557-560.
> Leavitt SB. Misguided Pain Guidelines? Pain-Topics e-Briefing. 2009;4(1) [PDF here].
> Li T, Puhan MA, Vedula SS, et al. Network meta-analysis highly attractive but more methodological research is needed. BMC Med. 2011;9:79 [here].
> Lin CC, McAuley JH, Macedo L, et al. Relationship between physical activity and disability in low back pain: A systematic review and meta-analysis. PAIN. 2011(Mar)152(3):607-613 [abstract here].
> Mills EJ, Ioannidis JPA, Thorlund K. How to use an article reporting a multiple treatment comparison meta-analysis. JAMA. 2012;308(12):1246-1253.
> Moore RA, Eccleston C, Derry S, et al. “Evidence” in chronic pain — establishing best practice in the reporting of systematic reviews. PAIN. 2010;150:386-389.
> Nüesch E, Trelle S, Reichenbach S, et al. Small study effects in meta-analyses of osteoarthritis trials: meta-epidemiological study. BMJ. 2010;341:c3515 [here].
> Oxman AD, Cook DJ, Guyatt GH. Users’ guides to the medical literature: VI. How to use an overview. JAMA. 1994;272(17):1367-1371.
> Riley RD, Higgins JPT, Deeks JJ. Interpretation of random effects in meta-analyses. BMJ. 2011(Feb 10);342:d549.
> Roskell NS, Beard SM, Zhao Y, Le TK. A Meta-Analysis of Pain Response in the Treatment of Fibromyalgia. Pain Practice. 2011;11(6):516-527 [abstract here].
> Sedgwick P. Meta-analyses: tests of heterogeneity. BMJ. 2012;344:e3971.
> Siegfried T. Odds Are, It’s Wrong: Science Fails to Face the Shortcomings of Statistics. ScienceNews. 2010(Mar 27);177(7):26+.
> Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psych Sci. 2011;22(11):1359-1366 [PDF here].
> Song F, Altman DG, Glenny A-M, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ. 2003;326:472+.
> Sterne JAC, Sutton AJ, Ioannidis JPA. Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomized controlled trials. BMJ. 2011;342:d4002 [here].
> Vickers AJ, Cronin AM, Maschino AC, et al. Acupuncture for Chronic Pain: Individual Patient Data Meta-analysis. Arch Intern Med. 2012;172(11):E1-E10 [abstract].