Friday, January 21, 2011

Data Mining Fallacies in Pain Research Revisited

Why should you care about “data mining” in pain research studies? Simply because this may be a wave of the future and it could produce a flood of misleading “pseudoscience” that might be more of a hindrance than a help in furthering better care for patients with pain.

Writing in a recent edition of the Journal of the American Medical Association (JAMA), researchers from the University of South Florida, Tampa, remark that legislation in the United States has incorporated comparative-effectiveness research (CER) as a fundamental and vital scientific approach for helping to improve health care [Djulbegovic and Djulbegovic 2011]. There is particular interest in discovering which treatments work best in “real world settings,” and the authors believe this encourages observational studies that apply “data mining” techniques to electronic health records.

Data mining, as described by the authors, involves analyzing a body of existing data from different perspectives to reveal new patterns, trends, or correlations of interest. The approach relies on access to large repositories of patient data — such as from government agencies, insurance plans, or networks of electronic health records — and, in our opinion, its value for the pain management field must be cautiously considered.

In previous UPDATES we observed that retrospective analyses using databases of patient records have produced some noteworthy but potentially debatable results. For example, data mining research — sometimes called “data dredging” — found extremely low rates of opioid-use problems in patients with noncancer pain [posting on 1/14/11], greater risks of adverse events with opioids than with NSAIDs in the elderly [12/15/10], a 9-fold greater likelihood of overdose in patients receiving higher opioid doses [2/13/10], and noncompliance with opioid therapy in roughly a third of patients [9/11/10]. In each case, the authors’ findings and conclusions seemed reasonable and justified until their research methodology and its limitations were examined more closely. It then appeared that some, albeit not all, of these studies might be more “pseudoscience” than externally valid and clinically relevant evidence.

As the Djulbegovics [2011] observe in JAMA, a premise of this approach is that, by examining large pools of information using various data mining tools, new discoveries will emerge. However, data mining is a retrospective process, and the databases provide only a historical view of patients and treatments during a particular period of time. Medical practices may have changed since the data were collected, and the databases do not reflect this. Furthermore, researchers often seek answers to questions (hypotheses) for which the databases were not designed or intended, so the outcomes are burdened by informational gaps that may confound or bias conclusions.

Statistically, data mining approaches benefit from the added power of large sample sizes; however, the multiple analyses conducted may introduce error when assessing the significance of results. For example, if one conducts 100 different analyses of any body of data (easy to do with computer technology) at a preset significance level of p<0.05, then about 1 in 20 of the analyses (roughly 5 of the 100) can be expected to come out statistically significant merely by chance and might be erroneous. Yet, it will be those 5 outcomes that are featured in the published journal article. As others have observed, absolute certainty in medical research is an illusion; rather, assessing evidence is about understanding probabilities [ref here].
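This 1-in-20 arithmetic is easy to demonstrate with a short simulation. The sketch below (illustrative only; the function names are our own, and it uses a normal approximation rather than a formal t-test) runs 100 group comparisons on pure random noise, where no real effect exists, and counts how many nonetheless clear the p<0.05 bar:

```python
import math
import random

def two_sample_p(xs, ys):
    """Approximate two-sided p-value for a difference in means
    (normal approximation; adequate for large samples)."""
    n, m = len(xs), len(ys)
    mx, my = sum(xs) / n, sum(ys) / m
    vx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    vy = sum((y - my) ** 2 for y in ys) / (m - 1)
    z = (mx - my) / math.sqrt(vx / n + vy / m)
    # Two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def mine_pure_noise(n_tests=100, n_per_group=200, alpha=0.05, seed=1):
    """Run n_tests comparisons where NO real effect exists;
    return how many come out 'statistically significant' anyway."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        xs = [rng.gauss(0, 1) for _ in range(n_per_group)]
        ys = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sample_p(xs, ys) < alpha:
            hits += 1
    return hits

print(mine_pure_noise())  # typically around 5 of 100, by chance alone
```

The exact count varies from run to run, but averaged over many runs it settles near 5 of 100 — precisely the false-positive rate a reader should keep in mind when a data mining study reports only its “significant” findings.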

There is some benefit in data mining for generating new hypotheses — that is, discovering new questions — which can then be subjected to further, more rigorous scrutiny in targeted clinical trials. Another advantage of data mining is that, given a large enough pool of data from a diverse population, it can examine many discrete subgroups of patients; whereas, traditional clinical trials often provide data for an “average” patient within a single, carefully selected subgroup. Again, however, such discoveries are best used as guides for further research rather than as definitive evidence in published reports.

Readers of pain-research literature relying on data mining approaches need to be cautious in accepting the findings as reflecting current clinical realities. Furthermore, data mining outcomes do not demonstrate valid cause-and-effect relationships, although the results may be misinterpreted that way. Finally, authors rarely describe their approach as data mining or data dredging; for example, one recent study was portrayed as a “post hoc exploratory analysis,” and in other cases the investigations are merely called “observational studies.”

With modern, computerized analytical tools, data mining studies are relatively quick, easy, and inexpensive to conduct. The amount of such research will likely increase significantly in the pain field and others, especially studies involving large administrative databases and electronic medical records. This will fill the pages of pain journals; however, depending on how the outcomes are interpreted and whether they are used merely to guide further research, this trend could end up hindering rather than helping efforts to improve pain care.

REFERENCE: Djulbegovic M, Djulbegovic B. Implications of the Principle of Question Propagation for Comparative-Effectiveness and “Data Mining” Research. JAMA. 2011;305(3):298-299 [extract here].