Friday, April 1, 2011

Validity, Reliability, & Bias in Pain Research

Part 3 – When Bad Things Happen to Good Research
Critics of medical research have proposed that many wrong, or at least unreliable and invalid, therapeutic answers are being generated by biased studies that are poorly designed and use inappropriate analyses. The pain field is no exception, even when the underlying research seeks answers to clinical questions of genuine importance. Understanding potential sources of bias in pain research is vital for assessing the reliability and validity of the outcomes.

The first UPDATE article in this series [here] noted some basic principles for understanding limitations of pain research and its interpretation. The second article [here] introduced the importance of evidence-based medicine and presented a hierarchy of research evidence for consideration. Before moving on to the mechanics of how to better assess and use data within a study report, there are several other considerations for determining whether a pain research study is worthwhile and potentially meaningful for application in clinical practice.

Troubling Truths About Medical Research

Distinguishing an effective treatment from a less effective or ineffective one — called “assay sensitivity” — is a major challenge of pain research [Dworkin et al. 2011]. Planning and conducting research is so complex and inherently imperfect that even potentially good research studies can go bad because of faulty design at the outset or unanticipated factors that drive outcomes in unexpected directions.

Furthermore, one of the most ardent critics of research quality, John P.A. Ioannidis — an epidemiologist at Tufts University School of Medicine, Boston — contends that studies can be designed and/or interpreted in ways that make essentially ineffective treatments look like lifesavers while genuinely helpful treatments may be underrated as being only marginally beneficial. He further proposes that false claims in research reports are commonplace and, for many published studies, outcomes proposed as evidence for some effect or lack thereof may simply reflect biases. Even worse, he suggests, highly biased stakeholders (ie, prominent researchers, major institutions, or government agencies) can create barriers that deter efforts at obtaining research results and disseminating viewpoints that oppose their own prejudiced perspectives [Ioannidis 2005].

Why do these problems come about? Ioannidis proposes some provocative possibilities [excerpted and quoted from Freedman 2010]…
We think of the scientific process as being objective, rigorous, and even ruthless in separating out what is true from what we merely wish to be true, but in fact it’s easy to manipulate results, even unintentionally or unconsciously. At every step in the process, there is room to distort results, a way to make a stronger claim or to select what is going to be concluded. There is an intellectual conflict of interest that pressures researchers to find whatever it is that is most likely to get them funded.

To get funding and tenured positions, and often merely to stay afloat, researchers have to get their work published in well-regarded journals, where rejection rates can climb above 90 percent. Not surprisingly, the studies that tend to make the grade are those with eye-catching findings. But while coming up with eye-catching theories is relatively easy, getting reality to bear them out is another matter.

Imagine, though, that five different research teams test an interesting theory that’s making the rounds, and four of the groups correctly prove the idea false, while the one less cautious group incorrectly “proves” it true through some combination of error, fluke, and clever selection of data. Guess whose findings your doctor ends up reading about in the journal, and you end up hearing about on the evening news?

Researchers can sometimes win attention by refuting a prominent finding, which can help to at least raise doubts about results, but in general it is far more rewarding to add a new insight or exciting-sounding twist to existing research than to retest its basic premises — after all, simply re-proving someone else’s results is unlikely to get you published, and attempting to undermine the work of respected colleagues can have ugly professional repercussions.

Even when the evidence shows that a particular research idea is wrong, if you have thousands of scientists who have invested their careers in it, they’ll continue to publish papers on it. It’s like an epidemic, in the sense that they’re infected with these wrong ideas, and they’re spreading it to other researchers through journals.
While there is probably much truth in what Ioannidis says, there is also much pessimism. It might seem tempting to adopt a glum outlook that distrusts and discards all research; however, there is much to be learned and gained for better patient care by developing an educated and healthy skepticism — selecting only the best research and using it, while letting the rest go unread. It begins with knowing how to judge quality in pain research.

Assessing Validity & Reliability of Pain Research

Questions of validity and reliability consider whether reported research outcomes represent the most accurate direction and size of treatment effects. Basically, can the research be trusted for clinical application? Validity and reliability may be conceptualized as follows [refs in Leavitt 2003, except as noted]:
  1. Internal validity is the degree to which a research outcome or result is likely to be correct and free of bias. It refers to the appropriateness of the methodology and analyses, and to whether the observed effects hold true for the subjects in the particular study (as opposed to external validity).

  2. External validity — also called generalizability, relevance, or transferability — is the extent to which the outcomes/results of an investigation might be expected to occur in typical pain management settings and/or apply to populations beyond those included in the study.

  3. Reliability can be assessed only by replication of an investigation that produces similar outcomes/results, thereby confirming the validity of the findings. If the study under review has never been repeated in any fashion, the reliability of the results and their relevance for clinical decision-making are less certain. Even when studies are replicated, they may differ in so many ways as to make comparisons difficult or impossible, and this is common in pain research.
Validity is determined largely by examining a study’s outcomes and sources of potential bias. Outcomes are determined by measurements or observations of endpoints, and these should be specified in advance as part of the study design or protocol to avoid bias in analyzing the data.

Essentially, endpoints determine the “payoff” — the results confirming or contesting the hypotheses of the study. Researchers generally specify the most important endpoints as being “primary” and those of lesser importance for purposes of the study as “secondary.” However, from a quality of evidence perspective there are two types to consider:
  • Primary/Observable Endpoints can be quantitatively measured, and they directly and objectively portray the targeted outcome or result of interest. For example, retention in treatment can be directly observed and measured in days, weeks, or months; similarly, the amount of medication consumed can be accurately measured. Pain research presents challenges in devising and measuring primary endpoints that dependably represent clinically important outcomes, with results that also can be replicated in followup studies.

  • Surrogate Endpoints are indirect measures, serving as “markers” of an outcome of interest, and are very common in pain research. The most problematic is assessment of pain and its relief, since pain is a subjective experience and self-reported scores (eg, on a pain scale or questionnaire) cannot directly and objectively measure this variable. Similarly, urine drug testing (UDT) is often used as a surrogate marker of medication compliance, but variable metabolism of drugs can affect UDT outcomes, and UDT does not measure the exact amounts and times at which medications are taken.

    Many other endpoints in pain research also raise concerns about their validity and reliability, such as those assessing: a) symptomatic effects (eg, fatigue, somnolence); b) functional status (eg, activities of daily living); c) psychological affect (eg, depression, anxiety); or, d) social outcomes (eg, family relationships). At the least, there should be a complete description in the study report of how such endpoints accurately and dependably reflect status changes in study participants.
Some recent trials in pain research have used novel approaches for managing endpoints. For example, a “composite endpoint” might be used that integrates data from multiple variables, such as pain intensity plus rescue medication use, or pain score change plus physical functioning [Dworkin et al. 2011]. Such measures may show greater responsiveness to treatment than each variable taken alone, but there also have been concerns that the benefits of treatment may be overstated by this approach and thus be of questionable validity.

Another approach is “responder analysis.” That is, endpoint data analyses focus only on those subjects who benefitted most from the experimental treatment [Dworkin et al. 2011]. This approach may demonstrate positive outcomes and help to facilitate a successful trial; however, readers need to carefully consider whether the data may have been overly “enriched” and if such responders are typical of their patients — is there external validity?
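To make the distinction concrete, here is a minimal, purely illustrative sketch in Python using invented pain scores and an assumed 30% improvement threshold (actual trials define their own responder criteria), contrasting a group mean change with a responder analysis:

```python
# Illustrative sketch only -- hypothetical 0-10 pain scores, not data from any real trial.
baseline = [8, 7, 9, 6, 8, 7, 9, 8]   # pain scores before treatment
followup = [3, 6, 9, 2, 7, 6, 8, 3]   # pain scores after treatment

# Conventional group-level endpoint: mean change across all subjects
changes = [b - f for b, f in zip(baseline, followup)]
mean_change = sum(changes) / len(changes)

# Responder analysis: proportion of subjects achieving at least a 30% reduction
# from their own baseline (threshold chosen here only for illustration)
responders = [b for b, f in zip(baseline, followup) if (b - f) / b >= 0.30]
responder_rate = len(responders) / len(baseline)

print(f"Mean change in pain score: {mean_change:.1f} points")
print(f"Responder rate (>=30% improvement): {responder_rate:.0%}")
```

In this made-up example the average improvement looks modest while a minority of subjects improves substantially, which is exactly why readers should ask how a responder threshold was chosen and whether those responders resemble their own patients.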

Potential Bias in the Publication of Research

Bias can play an insidious role in when, where, and why research gets published, as well as in how research is promoted to healthcare professionals and the public. It must be noted, however, that bias is usually distinct from prejudice, which would include presenting purposely inaccurate, unsupported, distorted, or slanted (one-sided) interpretations of facts or data to manipulate audience perceptions and opinions. Prejudiced communications are considered unethical and are fortunately rare in mainstream pain literature. Aside from this, there are several areas of concern when it comes to the communication of research.

Publication Bias
There are many types of bias relating to which studies ever appear in print and, consequently, how a body of research evidence comes to be perceived. To begin, investigations with statistically significant positive results — measuring up to expectations and/or favoring the Experimental treatment — are more likely to be submitted for publication than those with negative or equivocal outcomes. And, journals are more prone to accept for publication articles reporting positive outcomes. This sort of publication bias can make it appear that certain pain treatments are more effective than actually might be the case.
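As a rough, hypothetical illustration of how this works, the following Python sketch assumes a small true benefit and a crude “publish only clearly positive results” rule; the specific numbers and threshold are assumptions for demonstration, not estimates from any actual literature:

```python
# Toy simulation of publication bias -- all values are arbitrary assumptions.
import random
import statistics

random.seed(1)

TRUE_EFFECT = 0.2        # assumed true average benefit of the treatment (arbitrary units)
NOISE_SD = 0.5           # assumed trial-to-trial sampling error
PUBLISH_THRESHOLD = 0.5  # crude stand-in for "positive and statistically significant"

observed = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(1000)]  # 1,000 simulated trials
published = [effect for effect in observed if effect >= PUBLISH_THRESHOLD]

print(f"True effect:                 {TRUE_EFFECT:.2f}")
print(f"Mean effect, all trials:     {statistics.mean(observed):.2f}")
print(f"Mean effect, published only: {statistics.mean(published):.2f}")
print(f"Share of trials published:   {len(published) / len(observed):.0%}")
```

Because only the most favorable results survive the filter, the published average comes out several times larger than the assumed true benefit, even though no individual simulated study was fraudulent.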

Sometimes, there are legitimate reasons why trials fail and should be discarded (perhaps, they were ill-conceived at the outset, or could not enroll sufficient numbers of subjects); however, not knowing about such attempts to demonstrate efficacy of a treatment can mislead other researchers and the public. There is little the average consumer of research can do about this except to be cautious of individual trials that have not been replicated and might represent only the latest iteration of a research approach and treatment that had previously failed.

Another, more subtle form of bias is the persistent publication by some journals of low-quality research that is poorly designed, enrolls too few subjects to achieve adequate statistical power, and is inadequately reported. As Marcia Angell, former editor-in-chief of the New England Journal of Medicine (NEJM), has admitted:
“Let me tell you the dirty secret of medical journals: It is very hard to find enough articles to publish. With a rejection rate of 90 percent for original research, we were hard pressed to find 10 percent that were worth publishing. So you end up publishing weak studies because there is so much bad work out there. Doctors are not skeptical enough about what they read in top journals.” [ref. in Leavitt 2008]
If a top-tier publication like NEJM has problems filling its pages with worthwhile research, one can only imagine what happens at the many journals of lesser quality. In some cases, perhaps they provide a haven for articles that have been rejected by or were never submitted to the scrutiny of more demanding editors and peer reviewers.

It also is important to consider that, even in the best of circumstances, it can take years from the time of data gathering until a study appears in print or on a journal’s website (called “time-lag bias”). Consequently, the latest revelations or innovations appearing in today’s journals may be completely overturned by studies still waiting their turn in the publication pipeline.
Pre-publication Bias
It seems traditional for researchers to announce early results of their work at conferences, either in “poster sessions” or live presentations. These abstracts of work in progress suffer from “pre-publication bias”; that is, they are not peer reviewed and may be so preliminary and speculative that further investigation or analysis of the work may completely reverse or negate the outcomes. One review found that 1 in 5 human studies presented at conferences involved fewer than 30 subjects and, 3 years later, only half of the abstracts had been published in journals as full articles while 25% were never published at all [Schwartz et al. 2002].

Often, the preliminary results may be hyped in press releases and/or appear in published conference abstracts (which then may be erroneously cited as legitimate evidence by future researcher-writers). While there may be justifiable reasons for informing colleagues of ongoing research, and it certainly elevates the esteem of the researchers, there also is a real danger of propagating faulty evidence. Readers should be cautious about accepting such communications as being valid.
Sources of Bias in Research Design & Reporting

There are many possible sources of bias in pain research studies themselves, which may be broadly defined as anything that potentially distorts comparisons between groups under investigation and leads to invalid conclusions. Bias almost always exaggerates effect sizes to favor the experimental treatment under investigation while lowering the quality of the evidence. Since all research is inherently imperfect, the question is not whether any individual study reflects bias, but how much, and whether the particular biases are sufficient to eclipse the internal and/or external validity of the results. Following are some of the more common forms of bias in pain research [refs. in Leavitt 2003, except as noted]:

Participant Selection
When two or more groups of subjects are compared, it is important for them to be as similar as possible at the outset. However, even when study-group composition is equivalent at baseline, the included participants may exhibit selection or sampling biases that could affect external validity (eg, the study might include only females, or older persons, etc.).

Just as important is who is excluded from a study. For example, a trial might exclude patients with severe forms of a pain condition in question, or those who do not respond in certain ways to a class of drugs, or who are not compliant with particular treatment regimens. Such approaches for limiting participation may facilitate more efficient study designs, and produce more dramatic outcomes, but they limit the external validity of conclusions. Readers must always consider if study subjects are typical of their patient population in the everyday world.
Confounding Factors
Many researchers use subject inclusion/exclusion criteria to reduce the presence of extraneous factors — or “confounding” variables — that may bias outcomes in some way. Such confounders might prevent the outcome of interest from occurring or cause it to occur when it otherwise might not. For example, response to an experimental analgesic could be altered by subjects concurrently taking over-the-counter pain medications.

In some cases, patients with known potential confounding factors might be included in a study, since they would be more representative of a typical population. Statistical adjustments to the data to account for confounders that are known and measured can then be made by using sophisticated techniques; however, it makes data computation and interpretation more complex and potentially less precise. Still, most troublesome for researchers is the nagging possibility that unknown confounding factors may be present and bias their results.
The essential principle of randomization is that every research subject has an equal chance of assignment to any study condition or group. Randomization is the only way to control for confounders that are unknown or not measured. However, many research designs in pain management do not use randomization, and the likely influence of potential confounders and associated bias in such cases always needs to be carefully considered. Non-randomized study designs tend to overestimate the effects of pain treatments, although the extent and even the direction of this bias are often impossible to predict.
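A minimal sketch (invented subjects; the unmeasured "uses_otc_analgesics" flag is only an assumed stand-in for any unknown confounder) shows why simple random allocation tends to balance such factors across groups:

```python
# Minimal illustration: randomization balances a confounder nobody measured.
import random

random.seed(7)

# Hypothetical subjects; about 40% carry an unmeasured confounding characteristic.
subjects = [{"id": i, "uses_otc_analgesics": random.random() < 0.4} for i in range(200)]

# Simple randomization: every subject has an equal chance of either assignment.
for s in subjects:
    s["group"] = random.choice(["experimental", "control"])

for group in ("experimental", "control"):
    members = [s for s in subjects if s["group"] == group]
    share = sum(s["uses_otc_analgesics"] for s in members) / len(members)
    print(f"{group}: n={len(members)}, OTC analgesic users={share:.0%}")
```

With enough subjects, the prevalence of the confounder comes out roughly equal in each group even though no one knew to measure or adjust for it; a non-randomized design offers no such protection.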
Blinding (Masking)
Patients, clinicians, and other study personnel who are aware of just who is and who is not receiving a therapy or intervention of interest — an “open label” design — are likely to form opinions about its efficacy. Such opinions, whether optimistic or pessimistic, may systematically distort clinical interactions and other aspects of treatment, and bias the reporting of outcomes.

The most effective way of avoiding such bias is by double-blinding (also called “double-masking”), in which neither subjects nor study personnel know who is in the Control or Experimental groups. When outside study evaluators are involved and also blinded it is called “triple-blinding.”

In pain research, blinding can be difficult if not impossible to achieve. Subjects can often correctly guess whether they are receiving an active drug or placebo, based on effects or side effects. With invasive interventions — eg, acupuncture, injections, surgery — group assignment is often apparent to both investigators and subjects. Researchers should acknowledge and address limitations or problems of blinding in their reports.
Placebo Effects
One of the most insidious forms of bias occurs when a favorable response to drug therapy — regardless of whether it is active medication or an inert placebo — is attributable to the mere expectation of some benefit. This is called the “placebo effect,” which is a potent force in pain research [discussed in a series of UPDATES here]. Conversely, a “nocebo phenomenon” also has been described in which subjects receiving either active treatment or placebo may report negative outcomes due to the anticipation of adverse effects or other harmful consequences.

The only way to control for placebo and nocebo influences is via effective double-blind research designs. Even then, the inactive placebo treatment must be indistinguishable from the active treatment in all other respects. And, whether or not a truly inert and inactive placebo exists in pain research is still being debated.

Overall, the use of placebos has raised ethical concerns. Some authorities have argued that placebo administration is not appropriate if an effective comparator treatment for a pain condition exists and can be used as a control. The other side of the debate contends that, when assessing efficacy of a treatment, it must be acknowledged that some persons have favorable outcomes without any treatment at all, and only a presumably inactive placebo condition would control for that effect.
Run-in Periods
Some studies of drug therapy for pain have used “run-in periods” involving a time before the “official” trial begins when the active treatment under investigation is given to test potential participant response. This may serve a role in screening out unresponsive or potentially noncompliant subjects. Alternatively, a run-in period may be used to wean participants off of all other drugs or interventions that might compete with the active treatment in the study and distort outcomes. In pain research, run-in periods may result in high rates of pre-trial dropouts or ineligible subjects, and the remaining participants may not be representative of typical patients (ie, limiting external validity).
Compliance & Followup
Participant adherence to the study protocol (ie, the plan for conducting a study specified in the methodology section of a report) and trial completion are essential ingredients for valid outcomes and conclusions. For various reasons, some subjects will disregard instructions and/or drop out during the course of a trial; having more than 10% of participants lost to followup may be cause for concern, and greater than 20% may invalidate study results.

The problem is that reasons for subject noncompliance or dropout are often unknown. Adverse events (side effects, incapacitation, or even death) can be a cause; conversely, subjects may be doing so well that they never return for followup assessments. However, enrolling only patients with the most potential for full compliance and participation raises questions about selection bias and external validity.

Another important concern is the duration of followup. A study must continue long enough for the effect of the treatment, whether drug therapy or other intervention, to be reflected in the outcome variables. For practical and economic reasons, many pain research studies may be too brief to account for outcomes that could take many months or even years to be fully realized.

Whatever the length of the study, for purposes of determining validity, every patient considered for and entering a study should be accounted for in the study report. Statisticians sometimes use novel ways of accounting for missing data from participants who drop out early or simply do not show up for all appointments during the study, and these should be clearly explained in the study report so readers can judge whether the approach was reasonable and valid.
Intention-to-Treat vs Per-Protocol Analyses
When assessing outcome data, the most rigorous approach is called an Intention-to-Treat (ITT) analysis. That is, all patients who enrolled in a study with the intention of being treated are included in statistical analyses as if they received a full course of treatment, even if they dropped out early, were not adherent with instructions, or deviated from the protocol in other ways.

It is assumed that the ITT approach reflects most accurately how patients might act in everyday clinical practice. In randomized trials, dropouts or other departures from protocol for reasons unrelated to the treatment itself would be, on average, equally distributed across the groups. Therefore, any differences in outcomes could likely be due only to effects of the treatment. However, the higher the rates of noncompliance or discontinuation, the greater the likelihood that ITT analyses will produce distorted conclusions about treatment efficacy.

A different strategy is a “Per-Protocol” analysis of data (also called an endpoint, on-treatment, efficacy, or as-treated analysis). For this, researchers take into account only those “good participants” who complete all, or a specified proportion, of the study and are compliant with the study protocol to at least a certain degree. Those who dropped out early, did not adhere to instructions or attend sufficient study sessions, or otherwise departed from the protocol in significant ways are excluded from the analysis.

A per-protocol analysis can bias results in favor of the experimental treatment, but it makes sense when the groups are not randomized to begin with or many subjects are lost to followup. Per-protocol analyses also might be justified in pain research if outcome success hinges specifically on retention in treatment and compliance with the protocol regimen.

Authors should carefully explain their strategies for data gathering and analyses, and ideally present both ITT and per-protocol analyses in their reports. Readers can then more aptly judge the validity of results and conclusions.
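The following sketch, using invented records and assumed field names, shows how ITT and per-protocol analyses of the same small trial can tell different stories (here, dropouts are counted as non-responders in the ITT analysis, one common but not universal convention):

```python
# Illustrative comparison of ITT vs per-protocol analyses on made-up records.
subjects = [
    {"group": "experimental", "completed": True,  "responded": True},
    {"group": "experimental", "completed": True,  "responded": False},
    {"group": "experimental", "completed": False, "responded": False},  # dropout
    {"group": "experimental", "completed": False, "responded": False},  # dropout
    {"group": "control",      "completed": True,  "responded": True},
    {"group": "control",      "completed": True,  "responded": False},
    {"group": "control",      "completed": True,  "responded": False},
    {"group": "control",      "completed": False, "responded": False},  # dropout
]

def response_rate(records):
    """Proportion of records marked as responders."""
    return sum(r["responded"] for r in records) / len(records) if records else 0.0

for group in ("experimental", "control"):
    enrolled = [s for s in subjects if s["group"] == group]   # everyone who entered the trial
    completers = [s for s in enrolled if s["completed"]]      # protocol-compliant subjects only
    print(f"{group}: ITT={response_rate(enrolled):.0%}  "
          f"per-protocol={response_rate(completers):.0%}")
```

In this contrived example the experimental arm looks considerably better under the per-protocol analysis simply because its dropouts were excluded, which is precisely the distortion readers should watch for when only one type of analysis is reported.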
Statistical “Overkill”
There appears to be a trend in much of the pain research literature favoring what might be called a “statistician bias.” That is, extensive and elaborate presentations of data and complex statistical analyses are included in the report, which often are inadequately explained and appeal to statisticians or academic researchers rather than typical readers. Moreover, some reviews have found serious statistical errors in 20% to 40% of the articles examined [Altman 2002].

Albert Einstein once wrote, “Any intelligent fool can make things bigger and more complex... it takes a touch of genius and a lot of courage to move in the opposite direction.” Unfortunately, many authors (and journal editors) seem to believe that the numerical complexity and elegance of their data presentations are a testament to the thoroughness and importance of the research. If the purpose of a published research report is to communicate study results in ways that healthcare providers can understand and put into action, this sort of bias is self-defeating — the quality of a pain research study is not ultimately determined by the quantity of data or statistical analyses.
Researcher-Author Conflicts of Interest
All reputable journals require authors to declare if they have any conflicts of interest that might have biased their research or its interpretation. Such influences might include, for example, funding from or investments in companies that might financially benefit from the pain therapies or interventions being examined in the research. Usually, any potential conflicts are disclosed in the published research report.

Often, such acknowledgements are glanced over by readers, but it can be important to take potential conflicts of interest into account when critically assessing and accepting the research as being valid. Due to the pressures of vested interests, even the most disappointing research outcomes might be “spun” in a more positive light via the creative presentation of data and statistics. This is not to suggest that the authors are being dishonest; rather, it may reflect a natural human tendency to optimistically emphasize the positive while downplaying negative aspects of the research in question — especially when support for future research endeavors may be at stake. Readers need to exercise added caution when considering the validity of such research.

The above discussion, though lengthy, is not all-inclusive; other, subtle forms of bias may creep into research studies and reports, sometimes unknown to the researchers themselves. To become a critical and intelligent consumer of pain research it is necessary to understand both the good and the bad of what can happen during the research process. Being aware of the potential for biases, flawed study designs or analyses, and inappropriate reporting can help in avoiding untrustworthy research and selecting publications that provide the best evidence for improved patient care.

To be alerted by e-mail of when further UPDATES articles in this series are published, register [here] to receive once-weekly Pain-Topics “e-Notifications.”
> Altman DG. Poor-Quality Medical Research: What Can Journals Do? JAMA. 2002;287(21):2765-2767.
> Dworkin RH, Turk DC, Katz NP, et al. Evidence-based clinical trial design for chronic pain pharmacotherapy: A blueprint for ACTION. Pain. 2011(Mar);152(3 suppl):S107-S115 [access by subscription].
> Freedman DH. Lies, Damned Lies, and Medical Science. The Atlantic [online]. 2010(Nov) [full article here].
> Ioannidis JPA. Why most published research findings are false. PLoS Medicine. 2005;2(8):e124 [full article here].
> Leavitt SB. Can Pain Medicine Research Be Trusted? Pain-Topics e-Briefing. 2008;3(1):1-5 [PDF here].
> Leavitt SB. EBAM (Evidence-Based Addiction Medicine) for Practitioners. Addiction Treatment Forum. March 2003 [PDF here].
> Schwartz LM, Woloshin S, Baczek L. Media Coverage of Scientific Meetings: Too Much, Too Soon? JAMA. 2002;287(21):2859-2863.