Friday, February 24, 2012

“Proof” in Pain Research: How Much is Enough?

Making Sense of Pain Research Part 11 – Weighing Evidence in Law and Medicine

Just as “Where’s the beef?!” was the exhortation in a once-popular hamburger commercial in the U.S., healthcare providers and patients should be demanding “Where’s the proof?!” when it comes to research in the pain management field. This article in our ongoing Series, “Making Sense of Pain Research,” examines parallels of proof in law and medicine regarding several critical questions: What qualifies as evidence? What are the requirements of evidence as proof? How much proof is necessary for reaching valid and just clinical decisions?

As we stated early in this Series, in Part 1 [here], “Just as juries need good evidence to reach prudent verdicts, healthcare providers need good evidence as a guide for better decision-making in treating patients with pain.” Along with that, we noted that absolute certainty in pain research is an illusion and research investigations do not actually prove anything. This should be further qualified by saying, “in pain research, just as in the law, nothing is proved beyond the shadow of a doubt,” which is discussed further below.

Common Threads of Law and Medicine

In April 2000, the U.S. Agency for Healthcare Research and Quality (AHRQ) and the Institute of Medicine (IOM) hosted an expert meeting, "Evidence: Its Meanings and Uses in Law, Medicine, and Health Care," to explore similarities in how researchers, clinical practitioners, legal scholars, judges, and juries interpret and use evidence. The result was a special issue of the Journal of Health Politics, Policy and Law [journal edition here 2001(Apr);Vol.26].

In one of the articles, Cynthia Mulrow and Kathleen Lohr [2001] observe that judging what constitutes sound evidence in medicine can be difficult because of the sheer quantity, diversity, and complexity of medical evidence available today; the various scientific methods that have been advanced for gathering, evaluating, and interpreting such information; and the guides for applying medical research evidence to individual patient situations. Many gaps and deficiencies exist in the methods for assessing evidence, and judging medical research involves subjective, not solely explicit, processes.

Still, since the early 1990s there has been a focus on evidence-based medicine, or EBM. This involves increased reliance on formal, systematic analysis and synthesis of the research literature — especially, as published in peer-reviewed journals — to determine clinical effectiveness. It challenges consensus-based judgments and applies critical assessment of the best available research to decide if there is methodologically sound evidence that a clinical option is valid. EBM also seeks to identify the types of patients for whom a particular clinical approach would be most effective.

Yet, Mulrow and Lohr stress that EBM is an aid, not a panacea, for establishing benefits and harms of medical care. The contributions that medical research evidence can make in any clinical, or legal, situation must be understood from perspectives of critical judgment, an understanding of probability, and tolerance for uncertainty — areas in which most healthcare providers need further training.

Consequently, in a second article, John Eisenberg [2001] writes, “most clinicians' practices do not reflect the principles of evidence-based medicine but rather are based upon tradition, their most recent experience, what they learned years ago in medical school, or what they have heard from their friends. The average physician is said to read scientific journals approximately two hours per week, and most are likely overwhelmed by the volume of material confronting them.”

It would be interesting and probably instructive if evidence in pain medicine were adjudicated by courts of law, with evidence required to fulfill well-established burdens of proof. Eisenberg notes some of the complexities: “In addition to deciding what evidence should be admitted, there is the challenge of determining how the evidence should be weighed in driving a decision.” There also would be a need to reconcile evidence that is probabilistic in healthcare with evidence that must be at least “beyond a reasonable doubt” in criminal law or represent a “preponderance of evidence” in civil cases (described further below).

Eisenberg further emphasizes, “Every participant in the healthcare system should care about how evidence is defined. Patients will receive services based upon how evidence is weighed, and clinicians will provide services based upon their conclusions about the evidence of effectiveness and risk. Healthcare managers, purchasers, and system leaders will make decisions based upon the evidence that certain services should be provided to the clientele that they serve, and policy makers, including judicial policy makers such as judges and juries, will weigh evidence to decide whether harm has been done because a service was or was not provided.”

This suggests two major distinctions between how scientific evidence is used in healthcare and in law:

  • In healthcare, evidence as proof may be most helpful in determining the likelihood that a practice, therapy, or intervention may cause benefit or harm to patients in the future.

  • In the law, evidence is used as proof to assess possible causation of an event in the past, to determine who or what was accountable for it and who was harmed or helped by it.

Therefore, the time perspectives are different (looking forward or backward), but there are common objectives in medicine and law that seek to judge evidence as a basis for arriving at unbiased and valid conclusions regarding “proof.” As one of the many commonalities of evidence and proof in both law and medicine, consider that just as an accused party in most justice systems around the world is presumed innocent until proven otherwise, scientific inquiry begins with a “null hypothesis.” This assumes that a practice, therapy, or intervention in question has no significant effect unless the body of evidence demonstrates differently within prespecified limits of certainty. It may be no coincidence that clinical research investigations are often called “trials.”

Rating the Strength & Quality of Research Evidence

Obviously, there are many types of clinical research studies designed to answer particular questions, or hypotheses, in pain medicine. As was noted in Part 2 of this Series, these may be assembled into a general “Hierarchy of Evidence” representing the relative strength of each type of study for providing results that are likely to be free of bias and valid for use by healthcare providers and their patients [see Table at right, and Part 2 here for explanations]. While the strength and the quality of evidence often go together, they are not the same thing.

Through the years, various groups, particularly developers of clinical guidelines, have developed schemes for grading evidence in terms of both its strength and quality. All of the grading systems rate evidence as weakest toward the bottom of the hierarchy pyramid and stronger toward the top. Rankings do not question the ability of any individual research approach to be valid and of value for a particular purpose; however, each type of study has its limitations and the rankings recognize that certain forms of evidence may be given greater emphasis for guiding clinical decisions, just as certain evidence might be given greater weight in courts of law.

A problem is that there are diverse schemes for grading research evidence. Roughly a decade ago the AHRQ [2001] commissioned a report to comprehensively examine systems used for rating the strength and quality of scientific evidence. The authors discovered 20 different rating systems for systematic reviews, 49 for RCTs, 19 for observational studies, 18 for diagnostic test studies, and 40 that addressed overall grading of a body of evidence. In all, they found 121 different grading systems (individual systems total more than 121 because several included more than one type of study).

Many systems numerically rank the level of evidence from 1 at the top of the hierarchy pyramid above to 5 or more at the bottom, sometimes with intermediate stages (eg, 1a, 1b, 2a, 2b, 2c, etc.). Additionally, a “grade of evidence” assessing quality may alphabetically (eg, A, B, C, etc.) follow similar rankings. For clinical practice guidelines, there also may be various “strength of recommendation” ratings. Taken together, the various ranking, grading, and rating systems have the common goal of objectively identifying evidence that can be trusted as valid and as serving as a level of proof; but their number and diversity obfuscate rather than clarify the task.

Furthermore, many evidence-grading systems fall short by focusing on methodological quality (eg, random allocation, blinding, etc.) and statistical significance of outcomes (eg, P-values), without at least an equal emphasis on statistical power (eg, size of study) and the clinical strength of results depicted by effect sizes. Even the best designed studies, with statistically significant results, may produce effect sizes that are of negligible or questionable clinical importance [as was discussed in Part 10 of this Series here]. And, in pain-practice guidelines it is not uncommon for strong recommendations to be based on collections of weak evidence [discussed in a Pain-Topics e-Briefing, PDF here].

When estimates of clinical effect size demonstrated by individual research studies or compilations of such investigations (eg, in systematic reviews and meta-analyses) are taken into account for assessing evidence and, thereby, its capacity for serving as valid proof, a relatively simple “quality grading” of evidence has been suggested by Guyatt and colleagues [2008]:

  1. High Quality — further research is very unlikely to change our confidence in the estimate of effect.

  2. Moderate Quality — further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.

  3. Low Quality — further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.

  4. Very Low Quality — any estimate of effect based on this evidence is very uncertain.

The emphasis here is on clinical research in human subjects and, admittedly, there can be a certain amount of subjectivity in judging quality, with diverse opinions when it comes to weighing the evidence at hand. It also can be difficult to predict whether today’s best estimates of effect size will likely endure over time. However, it seems reasonable to observe that the pain research field is overwhelmed by evidence of relatively low-to-moderate quality, with only a minority of studies being of higher quality. (Figure at right adapted from BMJ, 2008;337:327.)

The Evidence Hierarchy (above) is a starting place for assessing strength and quality levels, with clinical study types toward the top of the pyramid usually also yielding a higher quality of evidence. And, as other articles in this Series have demonstrated, there are objective criteria for gauging the validity of research outcomes and the level of confidence in their probable accuracy, reliability, and clinical importance. Unfortunately, researchers and organizations offering recommendations to the healthcare community have sometimes erred as a result of not taking sufficient account of evidence strength and quality.

One classic example of this — discussed by Guyatt and colleagues [2008] — is how, for a decade, organizations advised that clinicians should encourage postmenopausal women to use hormone-replacement therapy and many primary care physicians dutifully complied with the recommendation. This guidance was driven by a belief, based on limited research evidence, that such therapy substantially decreased women’s cardiovascular risks.

Had a more rigorous system of analysis been applied, it would have shown that the evidence for a reduction in cardiovascular risk was of low quality, since the data came from observational studies with inconsistent results. Ultimately, stronger evidence from good quality RCTs (randomized controlled trials) established clear and convincing data as proof that hormone-replacement therapy not only fails to reduce cardiovascular risk but may even increase the risk in some women.

In view of the above concerns, a more enlightened perspective — which is both utilitarian and straightforward in comparison with the multitude of complex ranking, rating, and grading systems — may be to examine evidence in pain research the way a court of law might approach the challenge, guided by its burden of proof standards.

Onus Probandi — Burden of Proof

“Proof” is a difficult concept in science and medicine, since it could falsely imply something absolute and without a chance of error or random occurrence. Most pain research seeks to demonstrate whether or not a therapy or intervention has an effect, and to rule out unlikely explanations, but always within a certain level of doubt — possibly or probably, rather than proof, are the usual operational terms.

For example, earlier articles in this Series have discussed how statistically significant differences between groups — eg, an experimental-treatment and a comparison or control group — suggest only that something has happened and it is probably not some random event that might have occurred anyway. However, it does not provide absolute proof that there is a difference between groups, since there is always a probability within certain limits of confidence, no matter how small, that the outcome truly might have been due to chance alone. Furthermore, a statistical improvement in outcomes does not necessarily equate to a clinically meaningful effect, and, as noted above, research in pain medicine too frequently focuses on statistical benefits or harms rather than magnitude of effects as evidence of clinical importance.
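The gap between statistical significance and clinical importance can be sketched with a few lines of Python. The numbers below are hypothetical (they do not come from any study), and the function name, the 0-to-10 pain scale, the shared standard deviation, and the large-sample z-test are all assumptions chosen only for illustration.

```python
import math

def compare_groups(mean_a, mean_b, sd, n_per_group):
    """Two-sided z-test and Cohen's d for two equal-sized groups that
    share a common standard deviation (large-sample approximation)."""
    diff = mean_a - mean_b
    se = sd * math.sqrt(2.0 / n_per_group)        # standard error of the difference
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    cohens_d = diff / sd                          # standardized effect size
    return p_value, cohens_d

# Hypothetical: a 0.2-point difference on a 0-10 pain scale, SD = 2.0
p_large, d_large = compare_groups(5.0, 4.8, 2.0, n_per_group=2000)
p_small, d_small = compare_groups(5.0, 4.8, 2.0, n_per_group=50)
print(f"n=2000 per group: P={p_large:.4f}, d={d_large:.2f}")
print(f"n=50 per group:   P={p_small:.4f}, d={d_small:.2f}")
```

In this sketch the effect size is identical in both scenarios (d of about 0.1, below the 0.2 that is conventionally described as small), yet only the larger study crosses the P<0.05 threshold. The P-value reflects sample size as well as magnitude, so it cannot by itself indicate whether a difference matters to patients.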

The burden of proof in law (Onus Probandi in Latin) refers most generally to the obligation of a party to present evidence establishing its allegations or other pleading at trial, as adjudicated by a “trier of fact” — which may be a judge, jury, board of inquiry, or select panel. Acceptable evidence must meet certain established standards for decision-making purposes by the trier of fact, and the level of required evidence as proof for rendering final judgments may differ depending on the type of case, such as a criminal versus civil action.

Within this overall framework, the burden of proof in law might be viewed as comparable to using evidence for arriving at judgments regarding the adequacy and validity of research in making clinical practice decisions. The “trier of fact” here consists of journal editors, peer-review panels, guidelines-development committees, regulatory agencies, audiences of readers/listeners, and even the popular press. How might these groups better assess evidence as “proof” for reaching well-founded and just decisions?

The following 7 “Standards of Proof” should be considered: [Black 2012; LLI 2010; Herring 2004; Cooper 2003; Wikipedia]

1. Reasonable Suspicion

This is the lowest standard of proof under U.S. law — whereby a citizen may be briefly stopped for investigation or search — but a mere guess or hunch is not enough to constitute reasonable suspicion. Evidence is evaluated using a “reasonable person” standard; that is, a typical person could reasonably reach the same conclusion given the totality of circumstances, which may result from a combination of particular facts, even if each fact is individually innocuous.

In medicine, this sort of proof usually represents the “educated opinion” of a researcher, writer, or speaker — perhaps bolstered by personal experience or the opinions of others — which appears as reasonable to the audience. For example, a conference speaker might opine that a particular therapy is good or bad for a particular pain condition based on his own clinical observations and/or the writings of other “experts.” Such evidence might be suitable for motivating further investigation, but it is highly subject to personal bias and should not be taken as a reliable attestation of any “proof.”

2. Probable Cause

Probable cause must be based on some substantiated evidence and not just on suspicion; and, while it is more definitive, it is still a relatively low standard of proof. It may be used in the U.S. to determine whether a formal search or an arrest is warranted.

Sources of probable-cause evidence typically include (a) observations of actions or events, (b) the particular expertise of the observer based on training and/or experience, (c) statements or information gathered from various sources, and (d) circumstantial evidence implying something has occurred (eg, a harmful act). Such evidence does not directly prove what occurred and might be only hearsay.

In medicine, probable cause goes beyond reasonable suspicion or opinion by adding a semblance of unbiased information and fact gathering, albeit the amount of this may be quite small. Very low quality evidence, as described above, also might be used since the burden of proof at this level is quite tenuous. Medical evidence meeting the probable cause standard may be of some interest hypothetically or for inspiring further research but, in itself, this would be unacceptable as proof for reaching important conclusions or decisions.

3. Some Credible Evidence

In law, this standard goes a step further in the proof continuum by requiring a bare minimum of materially convincing evidence to support allegations for or against the accused or some argument in court. This level of proof does not necessarily require the trier of fact to judge the quantity or merits of conflicting evidence, if any, just to ascertain that there is sufficient evidence of seemingly reasonable quality available for proceeding and/or rendering a preliminary opinion for further action.

In medicine, even weak forms of research evidence might be used in attaining this standard of proof. For example, results of laboratory research in tissue (in vitro) or animal (in vivo) models might support a theory explaining the merits of a therapy or intervention. Furthermore, anecdotal evidence (eg, case reports) might be available supporting the theory’s potential applicability in humans. However, while such weak and low-quality evidence might be considered as credible and warranting attention, meeting this burden of proof only suggests that further inquiry could be appropriate and worthwhile.

4. Preponderance of Evidence

This is the minimum standard of proof required in most civil law cases and grand jury indictment proceedings in the U.S. The preponderance of evidence standard is met if the accusation or assertion is more likely to be true than not true. In effect, the standard is satisfied if, taking into account all of the available credible evidence, there is greater than a 50% chance that the accusation/assertion is true. This does not specify the exact amount of evidence that must be available or its absolute quality, merely that sufficient credible evidence is available for weighing a decision to reach a judgment.

In medical decisions regarding clinical practice, the evidence base must include investigations in human subjects, not just laboratory experiments. However, “preponderance of evidence” requires having a sufficient quantity of research studies available as evidence to assess the probability of truth in reaching a judgment of proof. Having too few studies available always raises doubts about the adequacy of evidence for judgment, especially if the studies are conflicting.

Quality also matters. The more that low-to-moderate quality evidence, described above, is included within the body of evidence for meeting this standard of proof, the greater the likelihood that further investigation may change the estimates of outcome effects, which lowers our confidence in the veracity of conclusions or judgments. For example, small-scale studies (eg, statistically underpowered) are highly prone to errors in the estimation of true effects, even when large effect sizes are found, and studies in larger groups would likely produce different outcomes.
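To see why small studies estimate effects so imprecisely, consider a rough sketch of how the margin of error around an observed difference shrinks as group size grows. The function name, the standard deviation of 2.0, and the normal-approximation multiplier of 1.96 are illustrative assumptions, not values taken from any particular trial.

```python
import math

def ci_half_width(sd, n_per_group, z=1.96):
    """Approximate 95% confidence-interval half-width for a difference in
    means between two equal-sized groups with a common SD (assumed)."""
    return z * sd * math.sqrt(2.0 / n_per_group)

# Hypothetical outcome scale with SD = 2.0
for n in (10, 40, 160, 640):
    margin = ci_half_width(2.0, n)
    print(f"n={n:>3} per group: observed difference +/- {margin:.2f}")
```

Under these assumptions the margin of error halves only when the sample size is quadrupled. A study with 10 patients per group could report a seemingly large effect that a bigger trial later shrinks considerably.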

Therefore, “preponderance of evidence” may sound impressive but, in actuality, it can amount to rather limited proof and consumers of pain research need to be alert to this possibility. Unfortunately, many authors of research reports or articles seem to accept this relatively low standard as definitively supporting their arguments or conclusions. However, given the minimal requirement noted above — only >50% probability of being true — the preponderance of evidence argument may represent conclusions that are only modestly better than the play of chance.

The above 4 standards of proof are reminiscent of the old saw about three “experts” arguing the merits of a new therapy. The first expert says, “In my experience this has proven to be a worthwhile therapy”; meaning he has successfully treated one patient. The second says, “Well, in my series of cases this therapy was a failure”; meaning he treated 2 patients. Then, the third expert chimes in, “I disagree, in case after case after case this therapy has proven to be effective”; meaning he has observed or heard of 3 patients who benefitted. Based on this, the three experts reach a consensus that the preponderance of evidence proves that the therapy is of benefit for patients. Despite the absurdity (and the fact that the plural of “anecdote” is not “evidence”), this happens frequently as writers or speakers make authoritative claims supported by alleged evidence or personal experience meeting only minimal standards of proof.

It should be noted that the extent to which outcomes from observational, epidemiological, or data-mining studies might be legitimately included in the body of evidence for this or any other standard of proof is debatable. Such studies are useful for generating hypotheses but they are at most suggestive of causation; oftentimes unreliably so. It also is important to remember that a lack of evidence for some argument or clinical effect is not itself a form of evidence; for example, the notion that there is no research evidence supporting a drug’s long-term beneficial effects is not evidence that such effects do not exist.

5. Clear and Convincing Evidence

This goes a step further: meeting the clear and convincing evidence standard requires that the body of evidence must be highly and substantially more probable to be true than not true, and the trier of fact must have a firm belief or conviction in its factuality. Interestingly, in many states, this is the standard of proof that must be met during Licensing Board investigations regarding healthcare providers accused of some infraction.

In medical research, this implies that at least a majority of available evidence is of moderate-to-high quality, in that further investigation of the subject is unlikely to alter our confidence in the estimate of outcome effects. Additionally, the implication is that evidence is available from multiple studies meeting tests of unbiased methodology (randomization, blinding, etc.), adequate size (statistical power), statistically significant outcomes (P<0.05 or better), and relevant clinical effect sizes.

6. Beyond Reasonable Doubt

This is the highest standard necessary to meet the burden of proof in most systems of jurisprudence and applies most typically in criminal proceedings. In a negative sense, proof beyond reasonable doubt is attained if there is no plausible reason to believe otherwise. If there is substantial doubt based upon reason and common sense after careful and unbiased consideration of all evidence then this standard of proof has not been met.

Proof beyond a reasonable doubt, therefore, is of such a convincing nature that one would be willing to rely and act on it without hesitation; however, it does not denote absolute certainty. In medicine, this level of proof might only be attained over time, with multiple high-quality studies, and, possibly, after a number of meta-analyses of data have been performed that include an amalgamation of outcomes in various settings applying different methodological and analytical techniques.

7. Beyond the Shadow of a Doubt

This is the strictest standard of proof and it is largely unattainable; yet, it is what many people wrongly have in mind when they think of “proof.” The beyond a shadow of doubt (sometimes called “beyond all doubt”) standard requires that there be absolutely no uncertainty about the issue at hand. It is widely accepted as an impossible standard in both law and medicine, and evidence can never attain this ultimate level because of the natural probability of error or chance, no matter how infinitesimally small that likelihood may be. This is particularly true in medicine where, if only due to the vicissitudes of biology, no practice, therapy, or intervention can be proved 100% effective or safe for all persons at all times, and no diagnostic test or other assessment can be 100% accurate and reliable.

Using the 7 Standards of Proof described above (and summarized at right) as a framework, critical consumers of pain research should feel empowered to judge for themselves the level attained by individual studies or by a body of evidence, such as presented in research reviews, meta-analyses, guidelines, or commentary. Given the quantity, strength, and quality of the evidence presented, how certain can one be of the veracity of an author’s or speaker’s conclusions? Does the evidence barely rise above suspicion or does it prove beyond reasonable doubt that the arguments or conclusions are likely to be valid? Or, is some intermediate and less convincing level of proof more likely attained?

A current example of the evidence-as-proof conundrum in pain medicine is the phenomenon of opioid-induced hyperalgesia (OIH); that is, opioid pain relievers paradoxically seeming to make a patient’s pain worse. OIH has been accepted by many researchers, authors, and pain practitioners as a proven concept that may justify abruptly weaning patients with chronic noncancer pain (CNCP) from their opioid medication. However, is OIH truly a culprit causing pain, and what is the level of proof to support its occurrence?

There is a significant body of evidence describing OIH, as discussed in several comprehensive systematic reviews [eg, Ramasubbu and Gupta 2011; Tompkins and Campbell 2011; Fishbain et al. 2009; Angst and Clark 2006]; although, the reported data have been inadequate to perform a quantitative meta-analysis. A scientific rationale for OIH was proposed based on extensive preclinical experiments, primarily in rodents, which created reasonable suspicion that it might be important in humans. Laboratory experiments in healthy human volunteers and in persons currently or previously addicted to opioids suggested probable cause for the existence of OIH, but this was generally lower-quality evidence.

Clinical observations of what might be OIH came from perioperative settings where opioid infusions were used and from multiple case reports among patients with CNCP; however, there were no extensive and more conclusive observational studies or prospective controlled trials to detect OIH in patients maintained on opioids for CNCP. Furthermore, there has been some confusion as to whether opioid tolerance or withdrawal was being observed in such patients, or possibly an addiction disease process, rather than a biological lowering of the pain threshold expected of true OIH. The type of opioid also might be a critical factor, since many of the preclinical and clinical studies involved only morphine and/or its toxic metabolites.

In one of the most rigorous reviews, Fishbain and colleagues [2009] found numerous confounding factors in assessing OIH that might have biased outcome observations and measurements, producing inaccurate and sometimes inconsistent results, as well as lowering the quality of evidence. They concluded that there is insufficient evidence, except possibly in the case of normal volunteers receiving opioid infusions experimentally, to support the existence of OIH in human patients. At the same time, most reviewers have been careful to note that there is inadequate evidence to refute the existence of OIH; but, in any event, this would be trying to prove a negative (the nonexistence of something), which is theoretically impossible. In sum, until a higher level of proof can be established, at most, there might only be reasonable suspicion that OIH plays any role in the treatment of patients with CNCP.

The OIH example also raises a question about the extent to which cause and effect relationships are demonstrated in pain research as a form of proof. One of the greatest challenges facing a “trier of fact” — whether it be a judge, jury, or consumers of pain research — is assessing how the totality of evidence might be used to assess causality.

Cause & Effect According to Hill

An ultimate goal of research in pain management, and a foundation of the highest levels of proof, is to establish causation beyond reasonable doubt; that a practice, therapy, or intervention is a direct and independent cause of either a beneficial effect or an adverse effect of some sort. Yet, this often is a difficult and elusive quest.

Perhaps the best known approach for establishing cause-effect relationships is the Bradford Hill Criteria, also called Hill's Criteria for Causation. These are a group of 9 criteria for judging whether there is adequate evidence of a causal relationship between an incidence and a consequence, established in 1965 by the British epidemiologist and statistician Sir Austin Bradford Hill [Hill 1965]. To some extent, the principles set forth by Hill form a foundation of evaluation used in all modern scientific research.

Hill’s 9 cause-effect criteria include the following:

  1. Strength — How strong is the association between the putative cause and the effect? Does a change in a causative factor produce a robust change in an outcome effect? For example, this association is sometimes measured by statistical correlation; a full-strength positive correlation has a coefficient of 1.0 and a lesser number suggests a weaker association and less confidence in the relationship.

  2. Consistency — Has the association of cause-effect been repeatedly and similarly observed by different researchers, and in different subjects, settings, and circumstances? If some studies find a strong relationship, while in others it is weak or conflicting, a cause-effect relationship would be questionable.

  3. Specificity — Is the relationship of cause-effect such that there is no other likely explanation? In pain research, outcome effects may sometimes have a range of potential causes, including strong placebo effects, that must be carefully considered.

  4. Temporality — Is there a time relationship, such that the effect occurs within an expected period after the cause? Or, if a time delay is expected between cause and effect, does that interval consistently occur?

  5. Biological Gradient — Does an expected and consistent dose-response relationship occur to suggest causality? For example, does increasing the dose consistently increase the response or reaction, or vice versa? This need not be a simple linear relationship, and there could be minimum or maximum thresholds for the dose-response curve.

  6. Plausibility — Does the cause-effect relationship make sense from perspectives of current scientific theories and biologic rationales? However, what seems biologically plausible depends on the scientific knowledge of the day; if this is deficient, additional hypothesizing and testing may be required before a true cause and effect relationship can be determined.

  7. Coherence — Is the cause-effect relationship in accordance with generally known facts and research regarding the natural history and biology of the disease or condition in question? While future investigations may be required to determine a truly plausible scientific explanation, as noted above, the proposed cause-effect relationship still should not conflict with current knowledge; it must cohere with what is presently known. However, Hill cautions, it could be imprudent to assume that laboratory evidence derived from animal models would be coherent with cause-effect relationships in humans.

  8. Experiment — Is the cause-effect relationship supported by experimental research evidence? It is unclear why Hill had this lower down on his list of criteria; however, he does note, “Here the strongest support for the causation hypothesis may be revealed.” Indeed, it is through research, particularly in human subjects, that most of the other criteria can be tested and either substantiated or contested.

  9. Analogy — Is the relationship in question comparable to an already established cause-effect relationship? For example, evidence that a newly discovered opioid molecule produces analgesia might be supported by a well-established cause-effect relationship of another, similar opioid in relieving pain.
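As a rough illustration of the Strength criterion (item 1 above), the correlation coefficient mentioned there could be computed as in the following Python sketch. The function name `pearson_r` and the dose and pain-relief numbers are hypothetical, invented only for this example; they do not come from any study discussed in this Series. A coefficient near 1.0 would suggest a strong positive association, but by itself it satisfies only one of Hill’s criteria and does not establish causation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unscaled covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unscaled variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, invented for illustration only:
# a dose in arbitrary units and a pain-relief score for five subjects.
doses = [10, 20, 30, 40, 50]
pain_relief = [1.0, 2.1, 2.9, 4.2, 5.0]

r = pearson_r(doses, pain_relief)
print(f"Pearson r = {r:.3f}")
```

With these made-up numbers the coefficient is close to 1.0. Real clinical data are usually far noisier, and a lower coefficient would suggest a weaker association and less confidence in the relationship.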

Hill is careful to warn that the 9 criteria are not hard-and-fast rules of evidence that must be obeyed before cause and effect relationships can be accepted. What they can do, with greater or lesser strength, he says, “is to help us to make up our minds on the fundamental question — is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?” As much as anything, Hill’s Criteria provide a mindset for thinking about causality when using research or other evidence as a form of proof.

OIH Dilemma

Returning to the example of opioid-induced hyperalgesia (OIH) from above, some of Hill’s Criteria for Causation have been addressed by research, but primarily in preclinical animal models or laboratory experiments in humans. Even then, a definitive biologic rationale has been only vaguely elucidated, which leaves the strength of the relationship between opioids and increased pain, as well as its plausibility, coherence, and analogy, poorly established. In clinical observations, temporality and dose-response relationships (biological gradient) have been variable, and there is a lack of consistency in some of the reported research outcomes. Specificity is problematic, as worsening pain pathology, opioid tolerance or withdrawal, and addictive disease processes often suggest alternate explanations for hyperalgesia. These deficiencies are especially evident when it comes to OIH in patients with chronic noncancer pain (CNCP), since clinical research evidence in this population is sparse and/or of low quality, consisting primarily of case reports. Therefore, while OIH might be a theoretical possibility in patients with CNCP, and some practitioners and authors may still believe it is a viable factor, evidence of causality and external validity for the phenomenon in everyday clinical practice is generally lacking.

Only when the totality of evidence is of a sufficient strength, quality, and level of proof can healthcare professionals make rational decisions for individual patients with pain and policy-makers make appropriate recommendations for the public health. On the other hand, when the totality of evidence is incomplete or weak, and causality seems obscure, it often is most prudent to remain uncertain and postpone judgment [Hennekens and DeMets 2011].

> AHRQ. Systems to Rate the Strength of Scientific Evidence. Evidence Report/Technology Assessment: Number 47. 2002;Pub. No. 02-E015 [PDF here].
> Angst MS, Clark JD. Opioid-induced hyperalgesia: a qualitative systematic review. Anesthesiology. 2006;104(3):570-587.
> Black HC. Black’s Law Dictionary. 2nd ed; online 2012 [available here].
> Cooper S. Human Rights and Legal Burdens of Proof. Web J Curr Legal Issues. 2003;3 [available here].
> Eisenberg JM. What Does Evidence Mean? Can the Law and Medicine Be Reconciled? J. Health Politics, Policy, Law. 2001;26 [article here].
> Fishbain DA, Cole B, Lewis JE, et al. Do opioids induce hyperalgesia in humans? An evidence-based structured review. Pain Med. 2009;10(5):829-839 [
> Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008(Apr);336:924 [abstract here].
> Hennekens CH, DeMets D. Statistical association and causation. JAMA. 2011(Mar 16);305(11):1135-1137.
> Herring J. Criminal Law: Text, Cases, and Materials. Oxford: Oxford University Press; 2004:58–64.
> Hill AB. The environment and disease: association or causation? Proc Royal Soc Med. 1965;58:295-300 [available here].
> LII (Legal Information Institute). Complete text of Federal Rules of Civil Procedure. Cornell Univ Law. 2010 [available here].
> Mulrow CD, Lohr KN. Proof and Policy from Medical Research Evidence. J. Health Politics, Policy, Law. 2001;26 [article here].
> Ramasubbu C, Gupta A. Pharmacological treatment of opioid-induced hyperalgesia: a review of the evidence. J Pain Palliat Care Pharmacother. 2011;25(3):219-230.
> Tompkins DA, Campbell CM. Opioid-induced hyperalgesia: clinically relevant or extraneous research phenomenon? Curr Pain Headache Rep. 2011;15(2):129-136 [
> Wikipedia. Legal burden of proof [online

A listing of this entire Series on “Making Sense of Pain Research,” including a consolidated document in MS Word format and access to the PTCalcs Excel statistical calculator, is available [here].
