Good cost-effectiveness analysis design requires consideration of multiple elements in order to both adequately explore the relationship between costs and effects, and to determine the robustness of the conclusions and the comparability of the results to those of other studies. Each of these elements is outlined in Table 2-1 with reference to both the PCEHM and ATS guidelines, and discussed individually in more detail below.
The costs considered in a CEA can vary depending on whose perspective is considered. As an example, consider the issue of early discharge from the hospital after childbirth. From the hospital's or managed care organization's perspective, cost may be reduced by early discharge. In contrast, from a societal perspective, the cost savings for the health care system may be offset by additional costs to the patient, such as extra time off work for the husband who must stay home to care for the new mother. Cost studies conducted to date have often lacked a consistent perspective, either within or among studies. Failure to maintain a consistent perspective hampers comparison of results across studies and threatens the validity of the study itself. Both the PCEHM and ATS recommend adoption of the societal perspective when conducting cost-effectiveness studies.
Determining the effect of a therapy is an exceedingly difficult problem for CEA for a variety of reasons. First, information on outcomes usually comes from randomized clinical trials (RCTs), which often do not reflect the actual clinical practice of medicine, whereas the implications of a CEA are intended for real-world practice. A cost-effectiveness ratio is intended to capture the expected relationship between the costs incurred and the effects gained in actual practice. An RCT, by contrast, is usually designed to maximize the likelihood of finding an effect. As such, an RCT can represent a rather idealized situation that is quite different from the real world. For example, only specific patients may be selected, the dosage and timing of therapy will likely be optimized, and other aspects of care may be protocolized and carefully controlled. The effect size generated under such rigorous situations is termed a therapy's efficacy (or maximal effect). In the real world, the effect of a new therapy is likely diluted by less appropriate patient selection, changes in dosing and timing, and increased variability in other aspects of patient care. The effect of a new therapy under these real-world conditions is termed a therapy's effectiveness. The more RCTs are refined, the further removed they are from the reality of using a therapy in clinical practice.10 Thus, the relationship between cost and effect in some RCTs becomes increasingly distorted.
A cost analysis conducted using efficacy from an RCT might better be termed a cost-efficacy study, rather than a cost-effectiveness study. However, there are no clear guidelines on how to reduce the bias introduced by using efficacy data instead of effectiveness data. One possibility is to consider adding an open-label, open-enrollment arm to clinical trials in which a CEA is being conducted.11 However, this presents many logistical and ethical problems. The more accepted alternative is to expose the cost model to varying estimates of reduced effect from those seen in the RCT during sensitivity analysis (see below).
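The idea of exposing the cost model to reduced estimates of effect can be sketched in a few lines. The following is a minimal illustration, not a method prescribed by the PCEHM or ATS: the function name, the dilution factors, and all dollar and QALY figures are hypothetical, and the incremental cost is assumed to stay fixed as the effect is diluted.

```python
def ratios_under_diluted_effect(delta_cost, trial_effect, dilution_factors):
    """Recompute the cost-effectiveness ratio as the trial effect is scaled down.

    delta_cost       -- incremental cost per patient (hypothetical, held fixed)
    trial_effect     -- incremental effect seen in the RCT (efficacy), e.g. QALYs
    dilution_factors -- fractions in (0, 1] of the trial effect assumed to
                        survive in real-world practice (an assumption)
    """
    results = []
    for factor in dilution_factors:
        if not 0 < factor <= 1:
            raise ValueError("dilution factor must be in (0, 1]")
        results.append((factor, delta_cost / (trial_effect * factor)))
    return results


# Hypothetical base case: $10,000 extra cost for 0.5 QALYs gained in the trial.
for factor, ratio in ratios_under_diluted_effect(10_000, 0.5, [1.0, 0.75, 0.5]):
    print(f"{factor:.0%} of trial effect retained: ${ratio:,.0f} per QALY")
```

If the ratio stays below whatever threshold is of interest even when only a fraction of the trial effect is retained, the conclusion is less dependent on the gap between efficacy and effectiveness.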
Another problem encountered when determining effect or outcome is that the outcome measure evaluated in the RCT may not be directly relevant in the cost analysis. The PCEHM recommends, and the ATS agrees, that quality-adjusted life years (QALYs) be used as the units of effect, or utility. However, most RCTs in critical care use short-term (day 28 or hospital) mortality as the primary end point, and still others use indices such as “organ failure–free days” as outcome measures.12 Although short-term survival likely correlates with long-term quality-adjusted survival, the relationship is not explicitly clear. Whether there is any relationship between organ failure–free days and long-term quality of life is even less clear. A recent study by Clermont and associates showed that patients who develop acute organ dysfunction are at risk for poor long-term quality of life (QOL), but that the risk is largely due to poor baseline health status, and not directly to organ failure in the ICU.13
Practice in this area is changing only slowly. While the PCEHM recommends long-term outcome, a National Institutes of Health (NIH)–sponsored workshop on sepsis studies recommended day 28 mortality.12 More recently, the United Kingdom Medical Research Council workshop still recommended day 28 mortality as an appropriate primary end point, but recommended follow-up to ≥90 days, and whenever possible to ≥6 months.14 Recent successful trials in sepsis reported mortality at widely varying time points: 28 days (drotrecogin alfa [activated]),15 28 days and one year (steroids),16 60 days (early goal-directed therapy),17 and a recent study of ARDS (Low Tidal Volume)18 reported mortality to 180 days.
Proponents of short-term outcome state that longer follow-up is too expensive and not necessarily related to the therapy being studied. Advocates of longer follow-up state that short-term survival, of indeterminate quality of life, and possibly with death a short time thereafter, is of little utility to society.8 They further argue that the ability to prioritize health care spending on the basis of value requires that we compare the long-term value of alternative programs for alternative disease processes. Many health care programs are administered, and/or have effects lasting, over a long period of time, making long-term follow-up of patients enrolled in these programs essential.
There is currently relatively little long-term follow-up information on ICU patients. However, the available evidence does suggest that there is considerable mortality and morbidity beyond hospital discharge, supporting the notion that we should consider longer follow-up.19–21 Quartin and colleagues showed that continuing mortality occurs in sepsis patients for many months after discharge from the hospital.19 Studies exploring quality of life after ICU care have yielded conflicting results, but certainly several suggest considerable diminution of quality of life that appears to be sustained over time.22
Thus, until more evidence is available, studies of new ICU therapies upon which CEAs are to be performed should have some mechanism (e.g., a subset study or parallel cohort) to incorporate mortality follow-up for 6 to 12 months with an accompanying quality-of-life assessment.
Which costs should be included? Debates over this subject can be very contentious and can resemble debates over whether to give colloids or crystalloids to hypotensive patients. The subject is further complicated by economic terms such as direct versus indirect costs and tangible versus intangible costs. We will attempt to avoid using too many accounting terms and to suggest alternative ways to understand this issue.
Let us reconsider the cost-effectiveness ratio. It is a ratio of net costs divided by net effects. Therefore, regardless of whether the costs of any given element seem important, if they are distributed equally in both comparison groups, the net difference will be zero and we need not worry about them. In other words, we need consider only those costs we believe to be relevant and likely to differ between the treatment groups. As an example, the PCEHM believed that the intangible costs of pain and suffering were relevant costs that should be measured in CEAs, but we have never measured these costs in any critical care CEA. Therefore, if a new therapy is unlikely to cause either more or less pain, then we can continue to ignore such intangible costs, even though they are considered relevant. The caveat here is that we have now made the important assumption of no difference in pain, which may or may not be true.
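The cancellation argument above can be checked with a short sketch. The cost elements, dollar amounts, and QALY values below are hypothetical and chosen only for illustration; the function simply divides the difference in total costs by the difference in effects.

```python
def incremental_ratio(costs_new, costs_std, effect_new, effect_std):
    """Net costs divided by net effects for a new therapy versus standard care.

    costs_new, costs_std   -- dicts mapping cost element -> cost per patient
    effect_new, effect_std -- effects per patient (e.g., QALYs)
    """
    delta_cost = sum(costs_new.values()) - sum(costs_std.values())
    delta_effect = effect_new - effect_std
    if delta_effect == 0:
        raise ValueError("no difference in effect; the ratio is undefined")
    return delta_cost / delta_effect


# Hypothetical per-patient costs in each arm.
new_arm = {"drug": 7_000, "icu_stay": 30_000}
std_arm = {"icu_stay": 28_000}
base = incremental_ratio(new_arm, std_arm, effect_new=6.0, effect_std=5.5)

# Add a cost element that is identical in both arms (e.g., an assumed
# equal cost of pain and suffering): the net difference is unchanged.
new_arm_with_pain = {**new_arm, "pain": 1_500}
std_arm_with_pain = {**std_arm, "pain": 1_500}
with_pain = incremental_ratio(new_arm_with_pain, std_arm_with_pain, 6.0, 5.5)

print(base, with_pain)
```

Both calls return the same ratio, which is the sense in which equally distributed costs can be left out. If the shared element were not in fact equal across arms, the two results would diverge.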
We have of course glossed over the term “relevant.” Which costs are relevant? All costs to society could be considered relevant from the societal perspective. Utilizing this perspective, one could argue that the costs of lost wages while a patient is sick are relevant. In response to this issue, the PCEHM recognized there are no correct answers. However, in order to promote standardization of CEA methodologies, they recommend inclusion of all health-related costs, and the ATS concurs that this is the current best approach. They also recommended including opportunity costs, and suggested that lost wages, not only as a postdischarge consequence of the illness but also during hospitalization, represent an example of an opportunity cost. Direct application of these guidelines to critical care is not easy. One way to consider them is to think about a health care system without drotrecogin alfa (activated) and a health care system with the new therapy. We then need to include all possible cost elements that could differ between these two health care worlds.
Estimating, Measuring, or Guessing Costs
Not all costs included in a CEA are necessarily measured empirically. The CEA is a model that is often calibrated using estimates. Some of these estimates come from measurements. For example, the estimate of differences in the mortality rate between a drug and placebo often is derived from the effect size in an RCT. Other estimates can be based on expert opinion, or some combination of measurement and opinion. For example, the cost of the actual therapy is usually unknown since the therapy is often not yet approved, and no price has been set by the company that manufactures the therapy. One is therefore forced to estimate on the basis of an educated “best guess,” perhaps with some knowledge of preliminary pricing from the company. While one might be alarmed at this notion of educated guesswork, it is important to appreciate that such estimates can be wildly erroneous, yet have minimal impact on the cost-effectiveness ratio. In order to test how sensitive a CEA ratio is to various estimates in the cost model, the completed CEA model is exposed to a rigorous sensitivity analysis (see below). In this way, we can decide to include many costs in a CEA, yet measure specifically only some portion of that total. As long as the estimated costs have little impact on the overall final CEA conclusions, the strategy regarding which costs to measure and which to estimate can be considered robust.
For How Long Do We Measure Costs?
When the cost of therapy is computed, the duration of the costs attributed to the therapy must also be considered. For example, if our new therapy allows more people to leave the ICU, but causes a higher incidence of renal failure requiring long-term dialysis, shouldn't all the costs of dialysis be attributed to that therapy? The answer is yes.
Although most intensivists do not accept this concept of blaming therapy received in the ICU for incurred long-term costs, it is difficult to argue to the contrary. In producing a survivor, one must also take responsibility for the cost of maintaining survival, which means following the cost streams for a significant length of time. Furthermore, if chronic renal failure leads to a lower quality of life, the new therapy will be doubly penalized, both for the cost of the dialysis and for the reduced quality-adjusted survival.
How Should Costs Be Measured?
For those costs that we choose to measure, we must decide what represents true cost. When we consider hospital costs, true costs are generally assumed to be those generated by formal cost-accounting mechanisms. For example, the cost of a complete blood count includes the wage rate for and time spent by the employee who drew the blood, the cost of the tube, and some tiny amortized fraction of the cost of the equipment upon which the test is run. However, detailed information such as this is rarely available as part of a CEA. Another frequently used approach is to collect hospital charges and adjust them by the hospital- or department-specific cost-to-charge ratios. The relationship between hospital charges and costs has long been a source of skepticism for physicians. However, recent work by Shwartz and associates, comparing charges adjusted by department-specific cost-to-charge ratios with estimates generated from a formal cost-accounting system, found good correlation when assessing patients in groups.23 Agreement was much worse when comparing individual patients and when using hospital-specific ratios. Because CEAs rely on average grouped estimates of costs, department-specific cost-to-charge ratios appear adequate for estimating hospital costs.
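The charge-adjustment approach described above amounts to multiplying each department's charges by that department's cost-to-charge ratio and then averaging over the group. The sketch below assumes hypothetical department names, ratios, and charges; real ratios would come from the hospital's own cost reports.

```python
def estimated_cost(charges_by_dept, ratio_by_dept):
    """Estimate one patient's cost from charges using department-specific ratios."""
    return sum(charge * ratio_by_dept[dept]
               for dept, charge in charges_by_dept.items())


def mean_group_cost(patients, ratio_by_dept):
    """Average estimated cost across a group of patients (the level at which
    CEAs use cost estimates)."""
    costs = [estimated_cost(p, ratio_by_dept) for p in patients]
    return sum(costs) / len(costs)


# Hypothetical department-specific cost-to-charge ratios.
ratios = {"pharmacy": 0.5, "laboratory": 0.25, "icu": 0.6}

# Hypothetical charges for two patients in one treatment arm.
arm = [
    {"pharmacy": 4_000, "laboratory": 2_000, "icu": 10_000},
    {"pharmacy": 2_000, "laboratory": 1_000, "icu": 5_000},
]

print(mean_group_cost(arm, ratios))
```

A single hospital-wide ratio could be substituted by mapping every department to the same value, which is the variant the text reports as agreeing less well with formal cost accounting.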
Other proxy measures of cost, such as the Therapeutic Intervention Scoring System (TISS) or length of stay, can also be used.24,25 As stated above, their value will depend on how sensitive the conclusions are to variations in the relationship between these measures and true costs.
Defining Standard Care (Comparators)
When comparing a new therapy, the choice of comparator, or standard therapy, is also critical. For example, the cost-effectiveness ratio of a 1-year cervical cancer screening program is quite different from that of 2-year or 3-year programs. Similarly, tissue plasminogen activator has a different cost-effectiveness ratio when compared to standard acute myocardial infarction therapy with no thrombolytic therapy as opposed to standard therapy with streptokinase. The PCEHM recommended that the control therapy used for comparative purposes be the least expensive available standard therapy. However, in the field of critical care this view is currently changing. In the treatment of sepsis, should standard care include early goal-directed therapy, steroids, and/or drotrecogin alfa (activated), even though these may be expensive? If so, do we consider all treatments to be standard therapy, or just one or two? The ATS guidelines recognize that standard care is not always “best practice,” and recommend that best practice be the comparator of choice in critical care.
Discounting costs due to time is another important factor to consider when conducting a CEA. When we borrow money, we must pay it back with interest. This is because money is worth more now than it will be in the future. Therefore, $10 is more valuable now than $10 delivered at a rate of $1 per year for the next ten years. Thus, to pay back $10 that we just received over the next 10 years, we would be required to pay back more than $1 per year. Worldwide economic growth is occurring at approximately 3% per year, and therefore the PCEHM has recommended that all costs be discounted at a 3% rate per annum, and the ATS has agreed with this recommendation.
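The arithmetic behind the $10 example can be worked through directly. This sketch assumes payments fall at the end of each year and uses the 3% annual rate mentioned above; the function names are illustrative only.

```python
def present_value(amounts, rate=0.03):
    """Present value of a stream of amounts, one per year, paid at year end."""
    return sum(amount / (1 + rate) ** year
               for year, amount in enumerate(amounts, start=1))


def level_payment(principal, years, rate=0.03):
    """Equal year-end payment whose present value equals the principal."""
    annuity_factor = sum(1 / (1 + rate) ** year for year in range(1, years + 1))
    return principal / annuity_factor


# $1 per year for ten years is worth less than $10 today.
pv_of_stream = present_value([1.0] * 10)

# Repaying $10 received today over ten years requires more than $1 per year.
payment = level_payment(10.0, years=10)

print(round(pv_of_stream, 2), round(payment, 2))
```

Under these assumptions the ten $1 payments are worth roughly $8.53 today, and repaying $10 takes roughly $1.17 per year.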
But what about effects; should they also be discounted? Are ten people living for one year more valuable than one person living for ten years? Although this issue may seem inhumane, consideration of this point is vital. Discounting costs without discounting effects will incur the Keeler-Cretin procrastination paradox wherein we would forever favor health care programs that take place some time in the future.26 This situation would have us forever putting off until tomorrow that which could be done today, and therefore we also discount effects at 3%, the same rate as costs.
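The question about ten people living one year versus one person living ten years can be made concrete by discounting life-years in the same way as costs. This is a sketch under stated assumptions: each life-year is credited at the end of the year in which it is lived, quality of life is ignored, and the 3% rate is the one given above.

```python
def discounted_life_years(years_lived, rate=0.03):
    """Discounted value of living `years_lived` consecutive years from now."""
    return sum(1 / (1 + rate) ** year for year in range(1, years_lived + 1))


# Ten people each gaining one year, all in the first year.
ten_people_one_year = 10 * discounted_life_years(1)

# One person gaining ten years, spread over the next decade.
one_person_ten_years = discounted_life_years(10)

print(round(ten_people_one_year, 2), round(one_person_ten_years, 2))
```

Both scenarios deliver ten undiscounted life-years, but once effects are discounted the benefit that arrives sooner counts for more. Setting `rate=0.0` makes the two values equal.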
Robustness and Uncertainty
When we perform an RCT, our primary conclusion is a statement of effect: did the new therapy change the outcome of interest? While it is highly likely that the outcome rates will be different (rarely would the mortality rates in both trial arms be identical), we rely on statistical significance to tell us whether the observed difference is due to a true effect of the therapy and not chance alone. We traditionally infer statistical significance when the p-value is <0.05. In this instance, a difference at least as large as the one observed would be expected less than 5% of the time if the therapy truly had no effect. If we are interested only in effect, then we care only about which therapy arm is better, not how much better.
It is important to appreciate, however, that the p-value does not confirm the magnitude of effect. Consider the case of drotrecogin, for which a recent large RCT found a mortality rate in the treatment arm of 25%, as opposed to a placebo rate of 31%, with p = 0.006.15 This does not mean that six lives are saved per 100 persons treated. Rather, it tells us that our best estimate is that six lives are saved. If we presume a binomial distribution around the mortality rates, we can generate confidence intervals around the two estimates. These confidence intervals might now tell us that the new therapy saves between 2 and 10 lives per 100 persons treated, but cannot tell us where the true value falls within that range. The p-value speaks only to whether lives are saved by the new therapy, not to how many.
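A rough version of this confidence interval can be computed with the normal approximation to the binomial distribution. The mortality rates below are those quoted above; the arm size of 850 patients per group is an assumption made for illustration, since the text does not give the trial's sample sizes, and the simple Wald interval used here is only one of several possible methods.

```python
import math


def risk_difference_ci(p_control, p_treated, n_control, n_treated, z=1.96):
    """Absolute risk reduction with an approximate 95% (Wald) confidence interval."""
    diff = p_control - p_treated
    std_err = math.sqrt(p_control * (1 - p_control) / n_control
                        + p_treated * (1 - p_treated) / n_treated)
    return diff, diff - z * std_err, diff + z * std_err


# Rates from the text; 850 patients per arm is a hypothetical sample size.
diff, lower, upper = risk_difference_ci(0.31, 0.25, n_control=850, n_treated=850)

print(f"lives saved per 100 treated: {100 * diff:.1f} "
      f"(approx. 95% CI {100 * lower:.1f} to {100 * upper:.1f})")
```

With these assumed arm sizes the interval runs from roughly 2 to roughly 10 lives saved per 100 treated, which is the kind of range a CEA must carry forward rather than the point estimate alone.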
In CEAs, however, we must quantify the magnitude of effect (and cost) so that we can generate a ratio. The general principle is to first take our best point estimates of cost and effect to generate a base case. Thereafter, we vary all our measures and estimates across their range of probabilities (e.g., 95% CI) in order to determine the extent to which the cost-effectiveness ratio varies. This is a sensitivity analysis and can be done either with one or multiple variables simultaneously. In one respect, the sensitivity analysis can be considered analogous to the p-value in that it allows us to explore the robustness of our conclusions. In other words, if, despite varying several or all variables across their stochastic distributions, there is minimal change in the final ratio, then one can have considerable confidence in the CE ratio estimate.
Another aspect of the sensitivity analysis is that it can be used to determine which model estimates must be the most accurate. For example, the CE ratio may be exquisitely sensitive to the estimate of ICU costs, but relatively insensitive to the expected costs of postdischarge health care resource use. In this situation, one might need to measure ICU costs very carefully, yet rely only on approximate estimates of postdischarge resource use. A comprehensive sensitivity analysis can in fact often be considered more powerful than a p-value, because it can be used to graphically show all of the uncertainties inherent in the underlying assumptions of the CEA model.
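A one-way sensitivity analysis of the kind described above can be sketched as follows. The cost model, parameter names, base-case values, and ranges are all hypothetical; they are arranged so that the ratio is more sensitive to ICU costs than to postdischarge costs, mirroring the example in the text.

```python
def ce_ratio(params):
    """Toy cost model: incremental cost per QALY gained (hypothetical structure)."""
    delta_cost = (params["drug_cost"]
                  + params["icu_cost_difference"]
                  + params["postdischarge_cost_difference"])
    return delta_cost / params["qalys_gained"]


def one_way_sensitivity(base, ranges):
    """Vary one parameter at a time across its range, holding the others at
    base-case values, and rank parameters by the swing in the ratio."""
    swings = {}
    for name, (low, high) in ranges.items():
        ratios = [ce_ratio({**base, name: value}) for value in (low, high)]
        swings[name] = (min(ratios), max(ratios))
    return sorted(swings.items(),
                  key=lambda item: item[1][1] - item[1][0],
                  reverse=True)


base_case = {"drug_cost": 7_000, "icu_cost_difference": 2_000,
             "postdischarge_cost_difference": 1_000, "qalys_gained": 0.5}
plausible_ranges = {"drug_cost": (5_000, 9_000),
                    "icu_cost_difference": (0, 6_000),
                    "postdischarge_cost_difference": (500, 1_500)}

ranked = one_way_sensitivity(base_case, plausible_ranges)
for name, (low_ratio, high_ratio) in ranked:
    print(f"{name}: ${low_ratio:,.0f} to ${high_ratio:,.0f} per QALY")
```

Parameters at the top of the ranking are the ones worth measuring carefully; those at the bottom can tolerate approximate estimates.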
Figure 2-3 shows the base case cost effectiveness and reference case cost-effectiveness ratio estimates for drotrecogin alfa (activated) generated by running 1000 simulations.27 This is a common graphic representation of the output from a rigorously conducted CEA. The x axis shows incremental effects and the y axis incremental costs. Quadrants to the right of the y axis represent where treatment with drotrecogin alfa (activated) was associated with a net gain in effect. Quadrants above the x axis represent a net increase in cost. The majority of the simulation estimates fall within the upper right hand quadrant, indicating a net gain in effect with an associated increase in cost (more costly, more effective). The dashed lines represent thresholds of cost, with regions below and to the right of the thresholds being more cost effective than regions above and to the left.
Figure 2-3. Cost effectiveness of drotrecogin alfa (activated) in severe sepsis. The figure shows the CEbase (left panel) and CEreference (right panel) distributions of cost-effectiveness ratios of the 1000 simulations with the corresponding 95% confidence ellipses generated by Fieller's method.31 Incremental effects are shown on the x axes and incremental costs are shown on the y axes. Quadrants to the right of the y axes represent regions where treatment with drotrecogin alfa (activated) is associated with a net gain in effects. Quadrants above the x axes represent regions where treatment is associated with a net increase in costs. The dotted lines are illustrative thresholds. Regions below and to the right of the thresholds are more cost effective than regions above and to the left of the thresholds. The ellipses are the smallest areas containing the average incremental costs and effects, with 95% confidence. Both distributions are predominantly in the “more costly, more effective” upper right quadrant, with the majority of simulations falling below the $500,000 per life saved and $100,000 per quality-adjusted life year thresholds. (Reproduced with permission from Angus et al.27)
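The quadrant and threshold logic of such a plot can be illustrated with a small simulation. This is not the model used by Angus et al.: the normal distributions, their means and standard deviations, and the independence of cost and effect are all assumptions chosen for illustration, and no confidence ellipse is computed.

```python
import random


def simulate_ce_plane(n_sims=1000, threshold=100_000, seed=0):
    """Draw incremental (effect, cost) pairs and tally them by quadrant.

    Returns the quadrant counts and the fraction of simulations lying below
    and to the right of the threshold line (cost < threshold * effect).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    quadrants = {
        "more costly, more effective": 0,
        "more costly, less effective": 0,
        "less costly, more effective": 0,
        "less costly, less effective": 0,
    }
    below_threshold = 0
    for _ in range(n_sims):
        delta_effect = rng.gauss(0.5, 0.2)       # hypothetical QALYs gained
        delta_cost = rng.gauss(10_000, 3_000)    # hypothetical dollars
        cost_label = "more costly" if delta_cost > 0 else "less costly"
        effect_label = "more effective" if delta_effect > 0 else "less effective"
        quadrants[f"{cost_label}, {effect_label}"] += 1
        if delta_cost < threshold * delta_effect:
            below_threshold += 1
    return quadrants, below_threshold / n_sims


counts, fraction_below = simulate_ce_plane()
for label, count in counts.items():
    print(f"{label}: {count}")
print(f"fraction below the $100,000 per QALY line: {fraction_below:.2f}")
```

Each simulated pair corresponds to one point on the plane; the tally shows where the cloud of points sits, and the final fraction is the share falling on the favorable side of the chosen threshold line.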