Thursday, May 26, 2011

Hierarchies of Evidence for Comparative Effectiveness - Reinforced by Unequal Quality of Publications?

The ISPOR Annual International Meeting in Baltimore this week was dominated by workshops and sessions addressing comparative effectiveness research (CER) and its implications for US health policy.  I moderated an issue panel addressing whether experimental or observational studies should have a higher rank in healthcare decisions.  We had finished addressing the trade-offs between the internal validity of experimental studies and the external validity of observational studies when an audience member commented on the high prevalence of low-quality observational studies in the scientific literature.  This moved the panel to an important topic that's not commonly discussed.

While observational research methods have evolved, the quality of observational data has not.  We still throw very complex and sophisticated statistical models at very crude administrative data.  Even worse, we make only cursory mention of the limitations of such datasets in our scientific papers.  In short, either the quality of many studies is poor or the description of those studies is not transparent.  There is a consensus statement that establishes standards for transparency and reporting of observational studies - the STROBE Statement (STrengthening the Reporting of OBservational studies in Epidemiology).  Unfortunately, many papers lack the transparency necessary to evaluate study quality.

Clinical trials have a reporting standard through the CONSORT Statement, which was developed in the early 1990s.  Currently no decent medical journal will publish the results of a randomized controlled trial that does not adhere to CONSORT standards of transparency.  The STROBE Statement was published in 2007 and hopefully will soon become a universal standard for all credible medical journals.  Until the reporting of observational studies rivals the transparency of RCTs in the scientific literature, RCTs may continue to be granted higher status even when they are not the appropriate means of answering the question at hand.

Saturday, April 23, 2011

#Adversity, #Resilience, and #Leadership

Nearly all men can stand adversity, but if you want to test a man's character, give him power.
Abraham Lincoln
I came across this quote in my Twitter feed and immediately wondered whether in fact most people can handle adversity, particularly in leadership settings.  I'm not aware of the context of this quotation, but Lincoln appears to be stating that how one behaves when possessing power reveals their true character.  Others are less likely to hold a powerful individual accountable, so his or her internal checks and balances are the only buffer to avert abuses.  This is fairly self-evident and consistent with other commentaries on the Web.

On the other hand, adversity is also a test of character if we differentiate between those who endure it and those who master it.  Resilience is the capacity to "bounce back" from adversity and has been studied in the context of PTSD among Gulf veterans, survivors of natural disasters, and displaced workers. 

Those who only thrive under ideal circumstances are ill-suited for leadership roles in an increasingly unstable and tumultuous world.  Moreover, failing under adversity is not the kiss of death.  If one succeeds in delivering results and even thriving when times are hard, great.  If one fails under adverse circumstances, the key is how they deal with the consequences.  Scanning the web on keys to resilience, there are several common themes:
  • Develop and maintain caring relationships for support, advice, and companionship
  • Cultivate optimistic self-confidence that is only mildly exaggerated
  • Be playful, curious, and child-like
  • Develop a clear sense of purpose and personal mission
Maybe Lincoln found these traits to be common while growing up on the American frontier.  Today they are increasingly rare, but those who develop them will thrive under adversity and use power effectively.


Monday, February 21, 2011

Confounding By Indication and Comparative Effectiveness

One of the major criticisms of observational research is "confounding by indication", also referred to as "treatment selection bias".  A confounder is a factor associated with both exposure (treatment) and outcome without being part of the causal pathway.  Because selection of treatments is not random and is determined by patient and physician characteristics, the observed effect is influenced by factors other than the treatment.  It may not be possible to fully adjust for the effects of such confounding, and one therefore doesn't know the "true" treatment effect without some additional work.  A simple example is intravenous versus oral antibiotics in hospitalized patients.  Patients receiving IV antibiotics are usually sicker than those receiving oral antibiotics, so we expect the former to have a higher unadjusted mortality rate.  Once we control for disease severity and other confounders that determine who receives IV vs. oral antibiotics, we have a more accurate assessment of the relative effects of each treatment on hospital mortality.
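The antibiotic example can be worked through with a small stratified calculation.  The sketch below uses invented counts (not data from any study) and a Mantel-Haenszel risk ratio as one common way to adjust for a single confounder, here disease severity.  The stratum names and all numbers are assumptions chosen only to make the pattern visible.

```python
# Hypothetical counts, invented purely for illustration (not from any study).
# Each stratum: (deaths_iv, n_iv, deaths_oral, n_oral)
strata = {
    "severe": (160, 800, 50, 200),  # sicker patients mostly receive IV
    "mild": (4, 200, 24, 800),      # less sick patients mostly receive oral
}


def crude_risk_ratio(strata):
    """Risk ratio (IV vs. oral) ignoring disease severity."""
    deaths_iv = sum(s[0] for s in strata.values())
    n_iv = sum(s[1] for s in strata.values())
    deaths_oral = sum(s[2] for s in strata.values())
    n_oral = sum(s[3] for s in strata.values())
    return (deaths_iv / n_iv) / (deaths_oral / n_oral)


def mantel_haenszel_risk_ratio(strata):
    """Severity-adjusted risk ratio (IV vs. oral), pooled across strata."""
    numerator = 0.0
    denominator = 0.0
    for deaths_iv, n_iv, deaths_oral, n_oral in strata.values():
        total = n_iv + n_oral
        numerator += deaths_iv * n_oral / total
        denominator += deaths_oral * n_iv / total
    return numerator / denominator


crude_rr = crude_risk_ratio(strata)
adjusted_rr = mantel_haenszel_risk_ratio(strata)
print(f"Crude risk ratio:    {crude_rr:.2f}")
print(f"Adjusted risk ratio: {adjusted_rr:.2f}")
```

With these made-up counts the crude risk ratio is above 1 (IV looks harmful) while the severity-adjusted ratio is below 1, which is the kind of reversal that confounding by indication can produce.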

Confounding is one of the reasons that randomized controlled trials are considered by many to be the gold standard for answering questions of comparative effectiveness.  Randomization under double-blind conditions prevents physicians from choosing treatments.  With a large enough study sample, randomization ensures that the treatment groups are essentially identical, so any differences in outcome may be attributed solely to the treatments.  However, in clinical practice physicians and patients may choose a medication of lower proven efficacy because it may be more acceptable for reasons specific to them.  Under such circumstances, a more efficacious medication that loses all effect after 1-2 missed doses may be less attractive than a less efficacious medication that has a good effect even with low adherence.  In an observational study, we may find that both medications are equally effective in clinical practice even if one is superior in a clinical trial.  At this point we can either throw out the observational result as invalid and require patients to be perfectly adherent, or acknowledge that assessing comparative effectiveness is complicated.
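The adherence argument is simple arithmetic.  As a rough sketch, assume that the response rate seen in practice is a weighted average of the response among adherent and non-adherent patients.  The response rates and the 70% adherence figure below are hypothetical, as is the function name.

```python
def real_world_response(p_adherent, response_if_adherent, response_if_not):
    """Expected response rate in practice as an adherence-weighted average."""
    return p_adherent * response_if_adherent + (1 - p_adherent) * response_if_not


# Hypothetical drug A: superior in trials, but no effect once doses are missed.
drug_a = real_world_response(0.70, 0.80, 0.00)
# Hypothetical drug B: less efficacious in trials, but forgiving of missed doses.
drug_b = real_world_response(0.70, 0.60, 0.50)

print(f"Drug A: {drug_a:.2f}, Drug B: {drug_b:.2f}")
```

Under these assumptions the two drugs come out at roughly 0.56 and 0.57, so an 80% versus 60% difference under perfect adherence nearly disappears at 70% adherence.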

As scientists and policy makers continue to debate levels of scientific evidence, the obsession with determining which treatment is "really" better seems a little naive.  Few patients and physicians have the resources to replicate clinical trial monitoring, treatment adherence, and clinical outcomes in daily practice.  They are constantly forced to make choices based on limited evidence and resources, and are guided by personal values, none of which can be "controlled" to determine the "real" treatment effect.

In the debate between observational and experimental research design it is important to remember that another term for "treatment selection bias" and "confounding by indication" is clinical judgment.  Trying to "adjust" judgment out of all analyses seems neither credible nor desirable in a world of imperfect knowledge and people.


Sunday, January 16, 2011

Comparative Effectiveness - The Answer Depends on the Question

One of the key challenges in Comparative Effectiveness Research (CER) is synthesizing evidence from multiple studies with varying designs to arrive at an estimate of a treatment's overall effectiveness.  This is the process of health technology assessment (HTA).  Systematic reviews are one means of accomplishing this and may lead to a full meta-analysis or mixed treatment comparison (MTC), a statistical model that uses head-to-head and placebo-controlled trials to estimate the relative efficacy of multiple treatments (sort of a Fantasy Football for healthcare treatments).
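The simplest building block behind such models is an adjusted indirect comparison: two treatments that were each tested against placebo are compared through that common comparator.  The sketch below is not a full MTC.  It assumes odds ratios on the log scale with known standard errors, and every number in it is hypothetical.

```python
import math


def indirect_comparison(log_or_a, se_a, log_or_b, se_b, z=1.96):
    """Indirect A-vs-B odds ratio from A-vs-placebo and B-vs-placebo results.

    Returns (odds_ratio, ci_lower, ci_upper). The variances of the two
    log odds ratios add because the trials are assumed independent.
    """
    log_or_ab = log_or_a - log_or_b
    se_ab = math.sqrt(se_a ** 2 + se_b ** 2)
    lower = math.exp(log_or_ab - z * se_ab)
    upper = math.exp(log_or_ab + z * se_ab)
    return math.exp(log_or_ab), lower, upper


# Hypothetical inputs: drug A vs. placebo OR = 0.50, drug B vs. placebo OR = 0.80.
or_ab, ci_low, ci_high = indirect_comparison(
    math.log(0.50), 0.20, math.log(0.80), 0.25
)
print(f"Indirect OR (A vs. B): {or_ab:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Note that the indirect estimate is less precise than either of the placebo-controlled results it is built from, which is one reason head-to-head trials are included whenever they exist.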

Systematic reviews are the core of HTAs and start with a specific question of interest to the reviewer.  A standard approach to all research questions is PICO, or "Patient, Intervention, Comparison, and Outcome".  By pre-specifying the patient population, intervention(s), comparator(s), and outcome(s) of interest, a search strategy can be developed and implemented in any of the available literature search engines.  For example, "Among cigarette smokers without co-morbid substance abuse or psychiatric illness (P), how does nicotine replacement (I) compare to cognitive behavioral therapy and 12-step programs (C) in achieving one-year tobacco abstinence and with respect to medical and psychiatric complications (O)?"  This question may then be refined to develop search terms to collect abstracts and manuscripts. 
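To make the step from PICO question to search strategy concrete, here is a minimal sketch that joins synonyms within each PICO element with OR and joins the elements with AND.  The class name, the search terms, and the generic Boolean syntax are all assumptions for illustration; real literature databases each have their own query syntax and controlled vocabularies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PicoQuestion:
    """Hypothetical container for the search terms of one PICO question."""
    population: List[str] = field(default_factory=list)
    intervention: List[str] = field(default_factory=list)
    comparison: List[str] = field(default_factory=list)
    outcome: List[str] = field(default_factory=list)

    def to_query(self) -> str:
        """OR the terms within each element, then AND the elements together."""
        blocks = []
        for terms in (self.population, self.intervention,
                      self.comparison, self.outcome):
            if terms:  # skip elements with no search terms
                joined = " OR ".join(f'"{term}"' for term in terms)
                blocks.append(f"({joined})")
        return " AND ".join(blocks)


# Illustrative terms loosely based on the smoking-cessation question above.
question = PicoQuestion(
    population=["cigarette smokers"],
    intervention=["nicotine replacement"],
    comparison=["cognitive behavioral therapy", "12-step"],
    outcome=["tobacco abstinence"],
)
query = question.to_query()
print(query)
```

Requiring both intervention and comparison terms restricts results to studies that mention both; a reviewer who also wants placebo-controlled studies would likely relax that part of the query.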

A systematic review then requires a framework for evaluating the quality of the studies, including those that are informative and excluding those that are not.  The pre-specified search may involve inclusion and exclusion criteria based on a minimum sample size, randomization (for trials), and other design issues.  However, the resulting studies must be evaluated to rate the quality of evidence of each study.  The US Preventive Services Task Force (USPSTF) uses a hierarchy of study design as the starting and ending point for rating studies.  Properly conducted randomized controlled trials (RCTs) receive Class I designation as the highest quality evidence while non-randomized trials or cohort studies receive Class II designation.  Case series and expert consensus receive Class III designation.  The Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group employs a somewhat different approach to apply designations of High, Fair, and Poor quality.  Study design still dominates, with RCTs "starting" as High and observational studies Low quality.  However, a criteria-driven process may downgrade RCTs or upgrade observational studies.  Any study with a "fatal flaw" is designated as Poor quality.  Both of these approaches are based on the assumption that the internal validity offered by double-blind randomization is the most important element in any question related to comparative effectiveness.

The Agency for Healthcare Research and Quality (AHRQ) takes a somewhat different stance in its Methods Guide for Comparative Effectiveness Reviews.  While acknowledging the inherent strength of RCTs to robustly answer questions of efficacy, the authors note that the strength of the design can only be assessed in the context of the question at hand.  Specifically, they point out that the long-term safety of a new medication may best be assessed through an observational study.  Why?  Because RCTs tend to attract healthier patients without many of the co-morbidities and concomitant treatments that may affect the overall safety of the medication.  In other words, RCTs may have biased enrollment that prevents them from answering the question of real-world safety outside of controlled experimental settings.

I would extend this caveat to any questions of real-world effectiveness of treatments to achieve desired clinical outcomes in non-experimental settings.  For example, if physicians do not commonly prescribe the same dose studied in published RCTs or in the product label, systematic reviews that favor such studies may be irrelevant to the question of comparative effectiveness in current practice.  In one study, colleagues and I demonstrated that time to psychiatric hospitalization was longer with one antipsychotic compared to others.  This appears to have been influenced by the fact that all of the medications tended to be initiated at sub-therapeutic doses.  The comparators tended to be dosed much lower than the intervention of interest.  A systematic review and meta-analysis of the RCT literature would have found no meaningful difference between treatments.  In real-world practice, there was.

The experimental design of RCTs is our gold standard for determining whether a medication has sufficient biological activity to favorably affect disease.  Robust methods for comparing such effect sizes across different trials are available.  However, policy makers should be cautious in assuming that experimental results can be directly applied to current clinical practice.  If our question relates to real-world practice settings, observational methods may be the most appropriate design to answer the question.


Saturday, January 1, 2011

New Year's Resolutions for US Healthcare

The ball has dropped, the ACA is law and, despite challenges on both sides of the aisle, the implications of health reform are emerging.  These are a few of the long overdue promises the US healthcare system is making to us.

1.  Expanded health insurance coverage
While the promise is coverage for all citizens, it seems that high risk pools are being underutilized and the individual tax penalties for non-coverage are being challenged in court.  However, broader Medicaid eligibility criteria will certainly enable more people to enroll in this safety net health insurance program.

2.  Commitment to healthcare quality
Accountable care organizations (ACOs) are a key element of the ACA, driving provider/hospital alignment and managing incentives to increase the practice of efficient, effective medicine.  The "Quality Chasm" described by the Institute of Medicine in 2001 called for a radical redesign of healthcare processes and incentives to ensure that practice patterns are consistent with the best scientific evidence.  Only a decade later and we're finally on our way.

3.  Evidence-based medical practice
The Patient-Centered Outcomes Research Institute (PCORI) owns the challenge of prioritizing and defining the process for evaluating the effectiveness of different healthcare interventions.  Comparative Effectiveness Research (CER) is the process of evaluating various alternatives, including watchful waiting, and assessing the relative benefits and risks.  The goal of CER is to provide actionable information to patients, providers, payers, and policy makers that leads to effective, informed, real-world decisions.  This goes far beyond reviewing randomized controlled trials, which have limited applicability to most clinical practice.  A key challenge of the PCORI Methodology Committee will be to create a framework for integrating evidence from a variety of study types, and to inform PCORI on the suitability of specific study designs to answer specific questions.

4.  Managing healthcare spending
There is broad agreement that current healthcare spending trends are unsustainable.  Moreover, the Dartmouth Atlas provides disturbing evidence that higher spending is not associated with better outcomes.  Rather than promoting radical spending cuts and arbitrary restrictions on access to treatments, the ACA, through PCORI and ACOs, aims to squeeze unnecessary, ineffective, and harmful costs out of the system.  Of course, this requires significant up-front investment to develop the systems, processes, evidence, and competencies to ensure that healthcare dollars are spent wisely.  The European focus on cost-effectiveness of individual interventions relies heavily on trial-based evidence generated under near-ideal conditions, i.e. the patient is clearly diagnosed, has few co-morbidities, is motivated for treatment, and is selected based on the best potential to respond.  Economic analyses of care processes, quality improvement, and other systemic changes are likely to generate greater cost savings than managing access to individual interventions.

These New Year's resolutions are not those of the US government, the AMA, or state health authorities.  They are commitments for all of us who claim to have an interest in affordable, timely, effective healthcare that extends and enhances human life.  We may not agree with every provision of the Affordable Care Act, but it does challenge us to join forces to tackle these longstanding problems in US healthcare with rigor and discipline.
