The results (outcome) of cardiac surgery can be measured in several ways. The type of variable used as the measure of a particular outcome determines the statistical methods that should be used for its analysis. For example, some administrative outcomes are captured by continuous variables, such as hospital charges (in dollars) or length of stay (in days). Other outcomes are collected as categorical variables, such as discharge destination (eg, acute-care facility, specialized nursing facility, or home). Health-related quality of life is another kind of outcome, which is often measured by the Medical Outcome Study (MOS) 36-Item Short-Form Health Survey (SF-36),1 the Sickness Impact Profile,2 or disease-specific quality-of-life measures, and can be transferred to quality-adjusted life years (QALYs).3 Economic endpoints have been used increasingly, such as cost-effective ratio.4

However, the major outcomes of interest to clinicians are described by variables that indicate the occurrence of (usually adverse) events, such as death, stroke, infection, or reoperation. Statistically, we must differentiate between two fundamentally different types of events based on their timing: *early (one-time) events* and *late (time-related) events*. Different types of analyses are used for these two types of events. We divide the areas of statistical inquiry into three major categories based on the goals of the analysis: *summarize*, *compare*, and *model*. This chapter will describe and illustrate the statistical methods used most often in each situation.

In cardiac surgery, early events are those occurring within 30 days of surgery or before hospital discharge, whichever is later. By the time of the analysis, the early outcome of every patient presumably is known. Thus, every patient has a “yes” or “no” value for the event being studied, and an estimate of the probability of the event can be determined by the ratio of patients with the event to total patients, usually multiplied by 100 and expressed as a percentage.

Late events are those that occur after discharge and more than 30 days after surgery. The analysis of these events is complicated by two considerations. First, the time of occurrence must be taken into account because, for example, a death at 6 months will have a different effect on the analysis than a death at 6 years. Second, in the usual ongoing analysis, some patients will have experienced a late event, whereas others will not have experienced an event but are still alive and at risk for the event and may have it in the future. Their event status is termed *censored*, which means that it is known not to have occurred by the time of the latest follow-up. For example, a patient in the study who had surgery 5 years ago and is still alive has a time of death that is not yet known. But we have partial information about his or her survival time, namely, that it exceeds, or is *censored* at, 5 years. When dealing with censored data, it is necessary to use special statistical methods. It is not appropriate, for example, to summarize late mortality by a simple percentage such as the number of late deaths divided by the number of patients. Mortality varies over, and must be related to, postoperative time. The simple “late mortality” percentage in any series of patients will be 100% if the investigator waits long enough.

Statistics are derived for many purposes of varying complexity. The most common ones used for evaluating cardiac surgery are (1) to *summarize* the results from a single series, (2) to *compare* the results between two or more series or subgroups of the same series based on a single discriminating variable or risk factor, and (3) to construct a multivariable *model* that considers the simultaneous effect of many risk factors.

A study usually includes a series of patients, and rather than listing a particular variable of interest for each individual, the first use of statistics is to summarize the variable for the entire group with a single, representative number (statistic). The sample *average,* or *mean value,* and the *median* (50th percentile) provide a measure of central tendency; the *standard deviation* and the *interquartile range* measure the dispersion of the individual values. A single-valued estimate such as this is called a *point estimate,* but, acknowledging the imprecision of a single estimated value, a range of plausible values should also be given. The *standard error* (SE) is a measure of the precision of an estimate, and a *confidence interval* (CI), a range of values for the estimate that is consistent with the observed data, can be constructed using its SE or in other, often better, ways.

A study often consists of evaluating subgroups of patients who received different treatments. Thus, in addition to summarizing the outcomes from the subgroups, we are interested in comparing their summary statistics. To do so, we typically compute another statistic that combines data from both groups and that follows (often approximately) some known statistical reference distribution, such as a *normal* (bell-shaped) or other (*chi-square*, *t*, etc) distribution. We then see how extreme or improbable the value computed from our data would be in that reference distribution if there were in fact no difference between the two groups we are comparing. This probability is called the *p*-value, and when it is smaller than .05 (5%), the difference we have observed is said to be *statistically significant.* This value (.05) is a completely arbitrary number, although in practice it is applied almost universally. Note that the attained level of significance depends strongly on the sample size. A given difference, no matter how small, will be statistically significant (*p* < .05) if the sample size is large enough. Conversely, a large difference will not be significant if the sample size is too small. Thus, the *clinical importance* of a given difference is more important than the *statistical significance* of that difference.

A different paradigm for making comparisons and constructing CIs that has gained much prominence with the availability of computing power and specific software is the *bootstrapping* technique.5 Instead of using an assumed statistical distribution, it generates many repeated random samples from the data themselves to produce the reference distribution.
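The mechanics can be sketched in a few lines. The following code (ours, not from any package named in this chapter) computes a percentile-bootstrap 95% CI for an early-mortality proportion using hypothetical 0/1 outcome data:

```python
import random

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap: resample the data with replacement, recompute
    the statistic each time, and take the empirical 2.5th/97.5th quantiles."""
    rng = random.Random(seed)
    reps = sorted(stat([rng.choice(data) for _ in range(len(data))])
                  for _ in range(n_boot))
    return reps[int(n_boot * alpha / 2)], reps[int(n_boot * (1 - alpha / 2)) - 1]

# Hypothetical 0/1 early-mortality outcomes: 25 deaths among 543 patients
sample = [1] * 25 + [0] * 518
lo, hi = bootstrap_ci(sample, lambda xs: sum(xs) / len(xs))
```

Because no reference distribution is assumed, the same function works for medians, linearized rates, or any other statistic passed in as `stat`.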

Comparing the outcomes between two groups based on a single factor, as described in the preceding section, is called *univariable* (or sometimes *bivariable*) *analysis,* to distinguish it from *multivariable analysis,* in which several characteristics of each subgroup are considered simultaneously. Most clinical studies are based on observational data collected in the normal delivery of care, and the patient subgroups may differ with regard to several influential characteristics. A multiple regression analysis can determine the influence of the treatment under study on the outcome variable after simultaneously adjusting for these potential patient differences. The result is a statistical model that consists of the group of factors that significantly affects the outcome and which may or may not ultimately include the treatment being studied. Each factor is assigned a coefficient that indicates the amount of weight given to that factor in the model. The model hopefully gives us a fuller understanding of the interrelationship among the treatment being studied, the outcome variable, and other important risk factors. One can compute the expected outcome for any patient using these weights applied to the values of that patient's particular set of risk factors.

To describe and illustrate the statistical methods used most frequently in the cardiac surgical literature, we will use a historical data set of mitral valve replacements with Starr-Edwards (S-E) heart valves. Dr. Albert Starr and his group at the Oregon Health Sciences University and Providence St. Vincent Medical Center in Portland, Oregon, implanted 1255 S-E mitral valves in adult patients (age >20 years) from 1965 to 1994.6 A prospective lifetime follow-up service was implemented for every patient. The total follow-up through 2002 was 11,621 patient-years, with a maximum of 37 years. (*Note:* In all figures in this chapter, we have plotted the curves only up to the time when more than 20 patients were at risk because the estimates beyond that time are imprecisely determined.) Table 8-1 contains a summary of the four selected variables. (*Disclaimer:* More variables ordinarily would be considered in a real study than in this simple expository exercise. The data set used in this chapter was chosen only to illustrate the statistical methods.) Several valve models are represented in this group: “FINAL S-E valve model” refers to the Model 6120 valve, which is the most recently used model of the S-E valve; “PREVIOUS S-E valve model” refers to all the other models, which were discontinued. Patients in the FINAL valve model group have a higher mean age, more valve re-replacement surgery, and more concomitant coronary artery bypass grafting (CABG).

| Valve Model | Previous | Final |
|---|---|---|
| Number of patients | 543 | 712 |
| Mean age ± SD (years) | 53.0 ± 10.8 | 60.3 ± 11.3 |
| Female (%) | 60.6 | 61.8 |
| Re-replacement (%) | 4.6 | 6.9 |
| Concomitant CABG (%) | 10.1 | 20.2 |
| Early mortality | | |
| Number of deaths | 25 | 58 |
| Point estimate (%) | 4.6 | 8.1 |
| Standard error (%) | 0.9 | 1.0 |
| 95% confidence interval (%) | | |
| Normal approximation | (2.8, 6.4) | (6.4, 10.2) |
| Exact binomial | (3.0, 6.7) | (6.2, 10.4) |
| Comparison statistics (p-value) | | |
| Pearson chi-square | 0.012 | |
| With continuity correction | 0.017 | |
| Fisher's exact test | 0.016 | |

Most of the statistical methods described in this chapter are available in commonly used statistical software packages. The statistical analyses and graphics in this chapter were done using PASW Statistics 17 (SPSS, Inc., Chicago, IL), Stata 10.1 (Stata Corp., College Station, TX), S-PLUS 6.2 (Insightful Corp., Seattle, WA), and the open-source program R version 2.10.0 (R Foundation for Statistical Computing, Vienna, Austria, www.R-project.org).

The only functionality not found in standard statistical packages is cumulative incidence (“Actual”) analysis. Cumulative incidence in the presence of competing events is implemented in the Stata ado file *stcompet*.7 In addition, Stata version 11 has just included *stcurve* and *stcrreg* for computing and comparing cumulative curves.

Cumulative incidence is also available for S-PLUS and R with the package *cmprsk*.8 Finally, the NCSS Statistical Analysis System (NCSS, Kaysville, UT) implements this function directly.

The *mean* (point estimate) operative mortality is computed as the number of operative deaths divided by the number of patients. Multiplying by 100 converts this decimal to a percentage (*P*). The SE of a proportion *P* based on *N* patients equals the square root of *P* (1 − *P*)/*N.* Thus, as shown in Table 8-1, the percentages of patients with early death are 4.6% (SE = 0.9%) and 8.1% (SE = 1.0%) for the PREVIOUS and the FINAL valve model groups, respectively. Table 8-1 also contains the 95% CI, computed by two popular methods. The first method is the simple (asymptotic) method based on the fact that the *binomial* distribution, which governs proportions, can be approximated by the normal (bell-shaped) distribution as the sample size increases.9 This CI is computed easily as the point estimate plus and minus twice the SE. A second method uses the (exact) binomial distribution directly.10 Although the “exact” method sounds like it obviously would be the most desirable, there are other methods that may have better statistical properties.11
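As a concrete sketch (our code; any statistical package reports these directly), the point estimate, SE, and normal-approximation CI for the PREVIOUS group can be reproduced as follows:

```python
from math import sqrt

def proportion_summary(deaths, n):
    """Point estimate, SE, and normal-approximation 95% CI for a proportion,
    all in percent; the CI is estimate +/- 2*SE, as described in the text."""
    p = deaths / n
    se = sqrt(p * (1 - p) / n)
    return 100 * p, 100 * se, (100 * (p - 2 * se), 100 * (p + 2 * se))

# PREVIOUS valve group from Table 8-1: 25 early deaths among 543 patients
est, se, ci = proportion_summary(25, 543)   # 4.6%, 0.9%, (2.8, 6.4)
```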

To demonstrate a *univariable* comparison, operative mortality between the two valve model groups was used. This does not seem very interesting clinically because valve model should have little to do with operative mortality; nevertheless, many valve comparison papers attempt to draw clinical conclusions from just such questionable comparisons. Comparing two proportions gives rise to a matrix with two rows (for the two valve groups) and two columns (for the two possible outcomes) called a *two-by-two contingency table.* Several methods have been used to assess the significance of such tables.12 The most common method for extracting a *p*-value from such a matrix is the *(Pearson) chi-square test.* This test has an alternative, more conservative form using a *continuity correction*. Validity of the chi-square test depends on having an adequate sample size (technically, each cell of the table should have an expected count of at least 5); when this is not the case, *Fisher's exact test* is often used. All three tests find that the FINAL valve model has significantly higher operative mortality because the *p*-values are smaller than .05 (see Table 8-1).
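The Pearson chi-square statistic for a two-by-two table can be computed by hand; the sketch below (ours) reproduces the p-values in Table 8-1, using the fact that for 1 degree of freedom the chi-square tail probability equals erfc(√(x/2)):

```python
from math import sqrt, erfc

def chi2_2x2(a, b, c, d, correction=False):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].
    With 1 df, the p-value is the chi-square tail area erfc(sqrt(x/2))."""
    n = a + b + c + d
    num = abs(a * d - b * c)
    if correction:                       # Yates continuity correction
        num = max(num - n / 2, 0)
    x = n * num ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return x, erfc(sqrt(x / 2))

# Table 8-1 counts: (deaths, survivors) for PREVIOUS and FINAL groups
stat, p = chi2_2x2(25, 518, 58, 654)                       # p ~ .012
stat_c, p_c = chi2_2x2(25, 518, 58, 654, correction=True)  # p ~ .017
```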

The simple comparison above showed that operative mortality with the FINAL valve model was significantly higher than with the PREVIOUS valve model. But patients with the FINAL valve model were older and had more concomitant CABG and re-replacement operations (see Table 8-1). Could the apparent difference in operative mortality between valve models be a result of these patient characteristics, instead of the valve itself? We explore this possibility using a multivariable analysis.

For binary (dichotomous) outcomes such as operative mortality, the most common method for developing a multivariable model is *logistic regression.*13 In this model, operative death is the outcome (*dependent variable*), and patient characteristics, plus valve model, are the potential risk factors (*independent variables*). For technical reasons, logistic regression does not use the probability (*p*) of death directly as the dependent variable in the model. Instead, it uses the logarithm of the *odds*, *p*/(1 − *p*), of death. To facilitate interpretation of a regression coefficient (*B*) from such a model, the coefficient can be converted into an *odds ratio* (OR) by using the exponential function. Most statistical programs do this automatically, and the ORs are sometimes labeled exp(*B*). The 95% CI for the OR is computed as the exponential of the normal approximation CI (mean plus and minus twice the SE) for the coefficient itself.
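The conversion from coefficient to OR is a one-liner. Using the concomitant CABG row of Table 8-2 (B = 1.016, SE = 0.251) as input, this sketch (ours) reproduces the tabulated interval; note that we use the conventional 1.96 multiplier rather than the rounded factor of 2:

```python
from math import exp

def odds_ratio_ci(b, se, z=1.96):
    """Convert a logistic-regression coefficient B and its SE into an
    odds ratio with normal-approximation CI exp(B +/- z*SE)."""
    return exp(b), (exp(b - z * se), exp(b + z * se))

# Concomitant CABG row of Table 8-2: B = 1.016, SE = 0.251
odds, ci = odds_ratio_ci(1.016, 0.251)   # 2.76, (1.69, 4.52)
```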

A stepwise regression program begins with a univariable test of each potential risk factor,13 using a model with a single variable to get the OR and *p*-value associated with that variable. If the OR is greater than 1, that variable is a risk factor (meaning that it adds to the risk). If the OR is less than 1, it is a protective factor. For the heart valve example (Table 8-2), age, concomitant CABG, and valve model are statistically significant (their *p*-values are less than .05). These variables, plus any others showing a trend toward association with operative mortality (usually *p* < .2), would be included in the next step of the stepwise logistic regression. In the final regression model, only age and concomitant CABG are still significant (see Table 8-2). After those effects are accounted for, the effect of valve model is no longer significant (*p* = .515). Thus, by this analysis, the apparent increase in operative mortality in the FINAL valve model group seems to be an artifact; FINAL valve model is apparently a surrogate for older age and more bypass surgery, which themselves are primarily responsible for the increased mortality. There are no doubt other clinical variables to consider in this model, but because we used the data only for demonstration purposes, not all possible variables were included. As a rule of thumb, 10 events can support one risk factor considered in a risk model.14 In our data set, there are 83 operative deaths, so we would have been justified in considering about eight risk factors. In practice, researchers would reference published models and study their own data to select more variables for consideration.

| | Univariable | | Multivariable | | | |
|---|---|---|---|---|---|---|
| Variable | p-value | Odds Ratio | Coefficient | SE | p-value | Odds Ratio (95% CI) |
| Age | <.001 | 1.05 | 0.045 | 0.012 | <.001 | 1.04 (1.02, 1.07) |
| Concomitant CABG | <.001 | 3.78 | 1.016 | 0.251 | <.001 | 2.76 (1.69, 4.52) |
| FINAL valve model | .014 | 1.84 | | | (.515) | |
| Female gender | .361 | 0.81 | | | | |
| Re-replacement | .314 | 1.52 | | | | |

The OR of a binary variable such as concomitant CABG (2.76 in Table 8-2) means that the odds of mortality for a patient having concomitant CABG are 2.76 times those of a patient not undergoing concomitant CABG. This is the point estimate; the interval estimate (see Table 8-2) ranges from 1.69 to 4.52. When the lower limit of the 95% CI is greater than 1 (as it is for concomitant CABG, ie, 1.69), the OR will be significantly greater than 1. For a continuous variable such as age, the OR of 1.04 means that for each year of age, the odds of an operative death are multiplied by 1.04.

The discrimination of a risk model is the ability to separate those who will have an event from those who will not. Traditionally, discrimination is evaluated by the c-index, which is the area under the receiver operating characteristic (ROC) curve.15 This is the probability that a death will have a higher risk score than a survivor. Generally, a c-index between 0.7 and 0.8 is considered acceptable discrimination, a c-index between 0.8 and 0.9 is considered excellent discrimination, and a c-index greater than 0.9 is considered outstanding discrimination.16

Calibration is the measure of how close the predictions are to reality. For example, if 100 patients had risks of 5% from a well-calibrated model, then 5 of them would be expected to die. Calibration is evaluated by the Hosmer-Lemeshow (H-L) statistic, which computes the significance of the difference between the observed and expected events.17 If the H-L statistic is significant (*p* < .05), it may be a sign of poor calibration. For our final model in Table 8-2, the c-index is 0.710 (95% CI 0.653–0.767) and the H-L statistic is *p* = .365. These values can be considered optimistic, however, because the data used to generate the model also were used to test it. Ideally, one would use a different data set, or bootstrap resampling of the original data, to test the model.18 There are some technical issues with the H-L statistic.19–21 Accordingly, in the next section we introduce a visual, continuous analog of the H-L test using the CUSUM methodology.

The predicted (expected) mortality from logistic regression can be used to compare the risk-adjusted performance between groups of patients, eg, to compare different surgical techniques or different providers. If the ratio of observed (*O*) to expected (*E*) mortality, the *O/E ratio*, is greater than 1, then there are more deaths than expected by the model, and if the *O*/*E* ratio is less than 1, there are fewer deaths than expected. The CI of the *O*/*E* ratio can be calculated by using a normal approximation method, which, as usual, gives a symmetric interval around the point estimate, or by using a logarithmic transformation, which provides a more appropriate asymmetric interval.22,23 Table 8-3 contains these values for our heart valve example. The CIs for the *O*/*E* ratios for both groups include 1, which means that their risk-adjusted mortalities are not different from those predicted by the model.

| Valve Model | Previous | Final |
|---|---|---|
| Observed mortality | 25/543 = 4.6% | 58/712 = 8.1% |
| Expected mortality | 27.5/543 = 5.1% | 55.5/712 = 7.8% |
| O/E ratio | 0.91 | 1.05 |
| 95% confidence interval | | |
| Normal approximation | (0.55, 1.27) | (0.80, 1.29) |
| Log transformation | (0.61, 1.35) | (0.83, 1.32) |
| Odds ratio | 0.90 | 1.05 |
| 95% confidence interval | (0.60, 1.35) | (0.80, 1.38) |

Another method to compare the risk-adjusted performance between groups is the OR, which is technically more suitable. This OR is the ratio of the odds of the observed mortality, *O*/(1 − *O*), to the odds of the expected mortality, *E*/(1 − *E*). An OR of 1 indicates that death is as likely as predicted; an OR greater than 1 indicates that death is more likely than predicted; an OR less than 1 indicates that death is less likely than predicted. The CI of the OR can be calculated by using a likelihood-based method or, more easily, as an output from the logistic regression.24
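One common Poisson approximation for the log-transformed CI of the O/E ratio (our sketch; references 22 and 23 give the full derivations, and published variants differ in detail) takes the SE of log(O/E) as 1/√O:

```python
from math import exp, sqrt

def oe_ratio_ci(observed, expected, z=1.96):
    """O/E ratio with a log-transformed CI. Treating the observed count as
    Poisson, SE[log(O/E)] ~ 1/sqrt(O), giving an asymmetric interval."""
    ratio = observed / expected
    half = z / sqrt(observed)
    return ratio, (ratio * exp(-half), ratio * exp(half))

# PREVIOUS valve group, Table 8-3: O = 25 deaths, E = 27.5 expected deaths
ratio, ci = oe_ratio_ci(25, 27.5)   # 0.91, roughly (0.61, 1.35)
```

Note how the interval is wider above the point estimate than below it, unlike the symmetric normal-approximation interval.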

*Cumulative sum* (CUSUM) analysis methods are often used to examine the performance of a provider across time, by plotting the cumulative sum of observed minus expected events as a function of surgery date.25 For a data set whose observed mortality exactly fits the expected, the line would lie along the horizontal line y = 0. When the CUSUM lies below the y = 0 line, it means fewer deaths were observed than were expected, and when the CUSUM lies above the y = 0 line, it means more deaths were observed than were expected. When the CUSUM is going up, it means the performance is getting worse than expected; when the CUSUM is going down, it means the performance is getting better than expected. Thus CUSUM can be used to detect a learning curve.26 The 95% prediction limits (point-wise 95% confidence intervals) account for the excursions from y = 0 that could be expected to happen by chance.27
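The CUSUM path itself is simple to compute once each patient has a predicted risk; a minimal sketch (ours) with toy data:

```python
def cusum(observed, expected):
    """Cumulative sum of observed-minus-expected events, in order of
    surgery date: each step adds (0/1 outcome) - (predicted risk)."""
    total, path = 0.0, []
    for o, e in zip(observed, expected):
        total += o - e
        path.append(total)
    return path

# Toy series: three survivors then one death, each at 10% predicted risk
path = cusum([0, 0, 0, 1], [0.1, 0.1, 0.1, 0.1])   # drifts down, then jumps up
```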

CUSUM can be used for other purposes by plotting different variables on the x-axis. When the dependent (outcome) variable is dichotomous (death), it is difficult to appreciate its relationship to a continuous risk factor, eg, age, graphically. The CUSUM25 can be used to overcome this difficulty by plotting the CUSUM against age to give us a graphic view. This technique can also be used to examine the fit of a model, by plotting the cumulative sum of observed minus predicted deaths as a function of predicted mortality (Fig. 8-1). For a model whose observed mortality exactly fits the expected, the line would lie along the horizontal line y = 0. When the horizontal axis equals the predicted risk, the CUSUM could be thought of as a continuous version of the H-L test of model calibration, which is based on the differences in observed minus expected deaths in each of the 10 deciles of risk, shown by shaded vertical bars in Fig. 8-1.

###### Figure 8-1

CUSUM plot of operative death. Vertical axis is the cumulative sum of observed deaths minus predicted deaths by the logistic regression model in Table 8-2. The horizontal axis is scaled in number of patients (ordered by the predicted risk), so it is nonlinear in predicted risk of death. The blue/white bars each contain 10% of the patients.

We use both death and thromboembolism (TE) to illustrate methods for the analysis of time-related events.

A single percentage is adequate to summarize mortality at a single point in time, such as the operative period (see preceding). To express the pattern of late survival, however, requires a different estimate at virtually every postoperative time, ie, a survivorship function, whose plot is the familiar survival curve. The most common way to estimate a survival curve is the *Kaplan-Meier* (KM) *method,*28 called *nonparametric* or *distribution-free,* because it does not presuppose any particular underlying statistical distribution. If all the patients in a given series were dead, the survival curve would be very simple to construct, as the percentage that had lived until each point in time. The KM method allows these percentages to be estimated before all the patients have died in an ongoing series, using the assumption that patients who are still alive (whose survival time is censored) will have the same risk of future death as those who have already died. Figure 8-2 shows the KM survival curve for the mitral valve patients, with the 95% CI at 5-year intervals using the Greenwood method.29 The median survival time can be calculated as the survival time at which the survival curve crosses 50% (blue line in Fig. 8-2). The mean survival time can be calculated as the area under the completed survival curve. If follow-up is incomplete, the survival curve can be fitted with a suitable distribution, such as a Gompertz distribution, which can be extrapolated to estimate the incomplete part, or the mean can be reported as a conditional mean, as we did in Fig. 8-2 (blue area).
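A minimal implementation of the KM product-limit estimate (ours, for illustration; real analyses use a statistical package) shows the mechanics: at each death time, the current survival estimate is multiplied by (1 − deaths/at-risk):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate. `events` is 1 for death,
    0 for censored. Returns the (time, S(t)) steps at each death time."""
    data = sorted(zip(times, events))
    at_risk, s, steps = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = tied = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            tied += 1
            i += 1
        if deaths:
            s *= 1 - deaths / at_risk        # product-limit step
            steps.append((t, s))
        at_risk -= tied                      # drop deaths and censorings
    return steps

# Hypothetical follow-up: deaths at 1 and 4 years, censored at 3 and 5
steps = kaplan_meier([1, 3, 4, 5], [1, 0, 1, 0])   # [(1, 0.75), (4, 0.375)]
```

Note how the censored patient at year 3 shrinks the at-risk set, so the death at year 4 removes half of the remaining survival probability.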

Besides survival curves, there are several other statistical functions that can characterize the distribution of a time-related event. Survival curves are the easiest to interpret and apply to a patient or a population because they integrate the possibly varying risks over time and produce the probability of being alive at each point in time. The *hazard function* *h*(*t*) can be considered the fundamental building block of the other functions. The instantaneous hazard measures the risk of the event at each moment for an individual who is so far event-free. For technical reasons, the *instantaneous hazard* is difficult to measure directly, but its integral, the *cumulative hazard function,* is easy to produce either by taking the negative logarithm of the KM estimate30 or by computing it directly using the *Nelson-Aalen method.*31 This latter estimate can be derived from the two basic curves in the upper panel of Fig. 8-3. In the modern *counting process* formulation of survival analysis,32,33 these two curves are the fundamental survival processes. The red curve counts the number of patients still at risk at time *t* and is called the *at-risk process* *Y*(*t*). The blue curve counts the number of events that have happened by time *t* and is called the *event counting process.* The blue curve rises 1 unit when each event occurs. The cumulative hazard function is similar to the blue curve, except that it rises 1/*Y*(*t*) each time an event occurs. The lower panel of Fig. 8-3 shows the cumulative hazard function estimated this way and also as the negative logarithm of the KM curve, −log[*S*(*t*)].
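The Nelson-Aalen estimate follows the same bookkeeping as the KM method but adds deaths/Y(t) at each event time instead of multiplying. A sketch (ours) with a hypothetical series of deaths at 1 and 4 years and censorings at 3 and 5 years:

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard: H(t) rises by deaths/Y(t) at each
    death time, where Y(t) is the number still at risk."""
    data = sorted(zip(times, events))
    at_risk, h, steps = len(data), 0.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = tied = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            tied += 1
            i += 1
        if deaths:
            h += deaths / at_risk            # additive hazard increment
            steps.append((t, h))
        at_risk -= tied
    return steps

# Hypothetical series: deaths at 1 and 4 years, censored at 3 and 5
na_steps = nelson_aalen([1, 3, 4, 5], [1, 0, 1, 0])   # [(1, 0.25), (4, 0.75)]
```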

One property of the cumulative hazard function is that it will be a straight line when the event hazard is constant. It is this property that gave rise to the name used in the cardiac literature to measure event rates that are presumed to be constant over time. If the (instantaneous) hazard is a constant λ, then the cumulative hazard function *H*(*t*) is a linear function of postoperative time *t* with slope λ: *H*(*t*) = λ*t.* In the cardiac literature, this constant risk parameter is called a *linearized rate*. For a given event in a series of patients, the maximum-likelihood estimate of this rate is the number of events (*E*) divided by the total follow-up time (*T*) in patient-years: *E*/*T.* Multiplying this by 100 converts it to “events per 100 patient-years,” often abbreviated as *percent per patient-year* or *percent per year.* The SE is the square root of *E*, divided by *T*. Early events are usually not included in the calculation because the risk of most events is higher after operation, so the assumption of a constant hazard would not hold. Figure 8-4 shows that the cumulative hazard functions for two valve groups fit the constant hazard assumption fairly well, with a slight decline in the slope after about 10 years.
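The linearized rate and its SE involve only the event count and the follow-up time; this sketch (ours) uses the PREVIOUS-group values from Table 8-4:

```python
from math import sqrt

def linearized_rate(events, patient_years):
    """Linearized rate E/T in percent per patient-year, with SE = sqrt(E)/T."""
    return 100 * events / patient_years, 100 * sqrt(events) / patient_years

# PREVIOUS valve group, Table 8-4: 129 late TEs over 4341 patient-years
rate, se = linearized_rate(129, 4341)   # 2.97 %/year, SE 0.26 %/year
```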

Table 8-4 contains the linearized late TE rates by valve model based on the number of late TEs and late follow-up (patient-years beyond 30 days). The normal approximation (the mean plus and minus two times the SE) yields approximate 95% CIs. A preferred approximation is based on a suggestion due to Cox,34 which was recommended after a comparison of several methods.35 In this method, the upper and lower 95% confidence limits are given by the 0.025 and 0.975 quantiles, respectively, of the chi-square distribution with 2*E* + 1 degrees of freedom divided by 2*T.* Another general technique for producing CIs that usually is found to have very good properties is the *likelihood-ratio method.*36 In our example, the CIs given by these three methods agree well, differing from each other only in the second decimal place. Note that the Cox limits and the likelihood-ratio limits are not symmetric around the point estimate, as the normal approximation limits are. (Cox's method also coincides with the probability interval produced by *Bayesian analysis*37 using a noninformative prior.)

| Valve Model | Previous | Final | p-value (2-sided) |
|---|---|---|---|
| Number of late thromboembolisms | 129 | 191 | |
| Late follow-up (patient-years) | 4341 | 4604 | |
| Linearized rate | | | |
| Point estimate (%/year) | 2.97 | 4.15 | |
| Standard error (%/year) | 0.26 | 0.29 | |
| 95% confidence interval (%/year) | | | |
| Normal approximation | (2.47, 3.48) | (3.57, 4.72) | .003 |
| Cox's method | (2.49, 3.52) | (3.59, 4.77) | .003 |
| Likelihood ratio | (2.49, 3.50) | (3.58, 4.78) | .003 |

The KM method is often used for events other than death. Figure 8-5 contains a KM TE-free curve for the mitral valve patients. When used for events such as TE that are not necessarily fatal, KM estimates the probability of being event-free given the unrealistic condition that death does not occur. But patients do, in fact, die before such an event has happened to them, so the KM event-free estimate is lower than the real (“actual”) event-free percentage. Another method, called *cumulative incidence* in the statistical literature38 and *“actual” analysis* in the cardiac literature, provides a mortality-adjusted event-free percentage.39–42 The CI of the “actual” curve can be calculated by *Gray's method.*43,44 The “actual” TE-free curve for this mitral valve series is much higher than the KM TE-free curve (see Fig. 8-5). Besides providing an unrealistic estimate of the probability of TE, there is another, more technical problem with using the KM method in this situation. Its use is justified only if the risk of future TE for patients who died TE-free would have been the same as for those who actually had a TE. But this assumption cannot be proven from the data, so that the KM TE estimate generally is regarded as statistically inappropriate.45
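The cumulative incidence ("actual") estimate can be sketched with the Aalen-Johansen recursion, in which the incidence of TE rises by S(t−)·d₁/Y(t), where S is overall event-free survival. This is our toy implementation handling one competing event, not code from the packages cited above:

```python
def cumulative_incidence(times, status):
    """Aalen-Johansen cumulative incidence of event type 1 in the presence
    of a competing event type 2 (e.g. death). status: 0 censored, 1 TE,
    2 death without TE. Incidence rises by S(t-) * d1/Y(t) at TE times."""
    data = sorted(zip(times, status))
    at_risk, surv, inc, steps = len(data), 1.0, 0.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        d1 = d2 = tied = 0
        while i < len(data) and data[i][0] == t:
            d1 += data[i][1] == 1
            d2 += data[i][1] == 2
            tied += 1
            i += 1
        if d1:
            inc += surv * d1 / at_risk
            steps.append((t, inc))
        if d1 or d2:
            surv *= 1 - (d1 + d2) / at_risk  # both event types deplete survival
        at_risk -= tied
    return steps

# Toy data: TE at year 1, death without TE at year 2, TE at year 3, censored at 4
steps = cumulative_incidence([1, 2, 3, 4], [1, 2, 1, 0])
```

Unlike 1 minus the KM event-free estimate, this quantity never counts a patient who died TE-free as a future TE, so it stays at or below the KM-based figure.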

The log-rank statistic is chosen most often to compare survival (or event-free) curves.46 Figure 8-6 shows the survival curves for the PREVIOUS and FINAL valve models, including all deaths, early and late. The PREVIOUS model has significantly better survival according to this univariable comparison (log-rank test *p* = .042). But the difference is mostly because of early deaths; if we only consider late deaths, there is no significant difference between the two groups (log-rank test *p* = .172).

Comparing late TEs using linearized rates (see Fig. 8-4) also shows identical levels of statistical significance using three different methods of testing: a normal-approximation method, a method recommended by Cox,34 and a likelihood-ratio method47 (see Table 8-4).

Comparison of cumulative incidence curves, or their complement, the “actual” event-free curves, can be accomplished using special techniques.44

Analogous to logistic regression, which provides multivariable analysis of the simple percentages associated with operative mortality, there is a widely used method for assessing multivariable influences on late survival: *Cox proportional hazards regression*.14 This method assumes that the *hazard ratio* (HR) for all risk factors is constant over time. Table 8-5 shows the result of this regression as applied to the valve data for late survival. The univariable comparisons show three variables (ie, age, concomitant CABG, and female gender) to be significant. The final Cox model includes all five variables, although FINAL valve model and re-replacement were not significant by univariable analysis. This latter finding demonstrates that the practice of allowing only significant (by univariable analysis) variables to enter a stepwise regression may eliminate important risk factors.

| | Univariable | | Multivariable Cox Regression | | Multivariable Gompertz Regression* | |
|---|---|---|---|---|---|---|
| Variable | p-value | Hazard Ratio | p-value | Hazard Ratio (95% CI) | p-value | Hazard Ratio (95% CI) |
| Age | <.001 | 1.04 | <.001 | 1.05 (1.04, 1.05) | <.001 | 1.05 (1.04, 1.05) |
| Concomitant CABG | <.001 | 1.65 | .052 | 1.23 (1.00, 1.50) | .049 | 1.23 (1.00, 1.50) |
| FINAL valve model | .172 | 1.10 | .030 | 0.85 (0.73, 0.98) | .030 | 0.85 (0.73, 0.98) |
| Female gender | .005 | 0.82 | .005 | 0.81 (0.70, 0.94) | .004 | 0.81 (0.70, 0.94) |
| Re-replacement | .135 | 1.25 | .036 | 1.37 (1.02, 1.83) | .036 | 1.37 (1.02, 1.83) |

*Additional parameters in the Gompertz regression: scale constant = −5.412, shape = 0.055.

The HRs for female gender and the FINAL valve model are less than 1; this means that they are protective factors rather than risk factors. For each significant factor (*p* < .05) in the model, whether a risk or a protective factor, the 95% CI does not include 1. *Note:* The univariable log-rank test shows no significant difference between the survival of the PREVIOUS and FINAL valve models (*p* = .172), whereas the multivariable Cox regression shows that the FINAL valve model has a lower risk of death (HR = 0.85, *p* = .030). Thus, the univariable log-rank test and the multivariable Cox regression give opposite results. This tells us that when the groups are not comparable, as in this example, a multivariable comparison should be preferred. When the groups are comparable, such as in a randomized clinical trial, univariable comparisons may be appropriate.
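The mechanics of Cox regression can be illustrated for a single binary risk factor (eg, concomitant CABG coded 0/1). This sketch maximizes the partial likelihood by Newton-Raphson on hypothetical data with no tied event times; it is not a substitute for established statistical software:

```python
import math

def cox_hr_binary(times, events, group):
    """Newton-Raphson maximization of the Cox partial likelihood for a
    single binary (0/1) covariate; assumes no tied event times.
    Returns exp(beta), the estimated hazard ratio."""
    data = sorted(zip(times, events, group))
    beta = 0.0
    for _ in range(100):
        score, info = 0.0, 0.0
        for t, e, x in data:
            if not e:
                continue                      # censored: no likelihood term
            risk = [g for tt, _e, g in data if tt >= t]   # risk set at time t
            w = [math.exp(beta * g) for g in risk]
            mean = sum(wi * gi for wi, gi in zip(w, risk)) / sum(w)
            score += x - mean                 # contribution to the score
            info += mean * (1 - mean)         # weighted variance of a 0/1 covariate
        if info == 0:
            break
        step = score / info                   # Newton-Raphson update
        beta += step
        if abs(step) < 1e-10:
            break
    return math.exp(beta)
```

Reversing the group labels returns exactly the reciprocal hazard ratio, reflecting the symmetry of the partial likelihood.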

An analog of the Cox regression model can be used for regression analysis of cumulative incidence and “actual” event-free curves.43

The preceding sections discussed nonparametric and semiparametric methods for describing survival data, such as KM curves and Cox regression, which make no assumptions about the underlying distribution of the survival or hazard functions. Another way to deal with survival data is to use a *parametric method.* A family of distributions is chosen, and the data are used to select the best-fitting member of that family by estimating the parameters that define the distribution. Three popular distributions in cardiac surgery research are the *exponential, Weibull,* and *Gompertz distributions.* Several functions can be used to characterize a survival distribution. We have already discussed three of them: the *hazard function h*(*t*), the *cumulative hazard function H*(*t*), and the *survival function S*(*t*). The hazard function can be considered the fundamental quantity from which the others are derived. It is the risk of the event at each instant of time *t* for a patient who is so far event-free. The cumulative hazard is the mathematical integral of the (instantaneous) hazard function, and the survival function is the exponential of the negative cumulative hazard: *S*(*t*) = exp[−*H*(*t*)].
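The relationship among the three functions can be checked numerically. The sketch below uses a Gompertz hazard with arbitrarily chosen illustrative parameter values; integrating the hazard and exponentiating its negative reproduces the closed-form survival function:

```python
import math

lam, alpha = 0.05, 0.1   # arbitrary illustrative scale and shape parameters

def hazard(t):
    """Gompertz hazard: instantaneous risk at time t for an event-free patient."""
    return lam * math.exp(alpha * t)

def cumulative_hazard(t, steps=100000):
    """H(t): numerical integral of the hazard from 0 to t (trapezoidal rule)."""
    dt = t / steps
    return sum((hazard(i * dt) + hazard((i + 1) * dt)) / 2 * dt
               for i in range(steps))

def survival(t):
    """S(t): closed-form Gompertz survival function, exp[-H(t)]."""
    return math.exp(-lam * (math.exp(alpha * t) - 1) / alpha)
```

Evaluating exp[−*H*(*t*)] from the numerically integrated hazard matches the closed-form survival function, illustrating that the three functions carry the same information.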

Figure 8-7 shows typical plots of these three functions for the *exponential*, *Weibull,* and *Gompertz distributions,* with a selection of values for their parameters. The upper row of plots contains the hazard functions *h*(*t*), the second row the cumulative hazard functions *H*(*t*), and the third row the survival functions *S*(*t*). Table 8-6 contains the formulas used for these functions. The exponential is the simplest lifetime distribution, having only a single parameter λ, called the *scale parameter,* which is the (constant) hazard (“linearized”) rate. The Weibull distribution is a natural generalization of the exponential that adds a second parameter α, called the *shape parameter,* to accommodate an increasing (*α* > 1) or decreasing (*α* < 1) risk; it reduces to the exponential distribution (constant risk) when *α* = 1. The Weibull distribution is used commonly for time to failure and is employed in the cardiac literature to model structural deterioration of prosthetic heart valves. A Gompertz distribution48 has a scale parameter λ and a shape parameter α; its hazard function is an exponential function of time. The Gompertz distribution is widely used to model survival, especially in older age groups. Figure 8-8 uses the late death information from the heart valve data to show the fits derived from these three distributions. The data fit the Gompertz distribution very well.

###### Table 8-6

| | Exponential | Weibull | Gompertz |
|---|---|---|---|
| Hazard function | λ | αλt^(α−1) | λe^(αt) |
| Cumulative hazard | λt | λt^α | λ(e^(αt) − 1)/α |
| Survival function | exp(−λt) | exp(−λt^α) | exp[λ(1 − e^(αt))/α] |
| Mean time to event | 1/λ | Γ(1 + 1/α)/λ^(1/α) | ∫S(t)dt |
| Median time to event | log(2)/λ ≈ 0.693/λ | (log(2)/λ)^(1/α) | log(1 + α log(2)/λ)/α |

Notes: Γ() is the gamma function.

There is not a simple formula for the mean time to event of the Gompertz distribution.
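The median formulas in Table 8-6 can be verified numerically by checking that each survival function equals 0.5 at its median time to event. The parameter values below are arbitrary illustrations:

```python
import math

lam, alpha = 0.1, 1.5   # arbitrary illustrative scale and shape parameters

# Survival functions from Table 8-6
def s_exponential(t):
    return math.exp(-lam * t)

def s_weibull(t):
    return math.exp(-lam * t ** alpha)

def s_gompertz(t):
    return math.exp(lam * (1 - math.exp(alpha * t)) / alpha)

# Median time-to-event formulas from Table 8-6
median_exponential = math.log(2) / lam
median_weibull = (math.log(2) / lam) ** (1 / alpha)
median_gompertz = math.log(1 + alpha * math.log(2) / lam) / alpha
```

Each survival function, evaluated at its median formula, returns 0.5, confirming the algebra of the table.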

###### Figure 8-8

Three parametric distributions used to fit the Kaplan-Meier survival curve. The exponential (constant hazard) gives the poorest fit. The Weibull, with an increasing (power-function) hazard, fits the curve better than the exponential. The best fit is the Gompertz model, which has an exponentially increasing hazard.

Some of the advantages of the parametric method (over nonparametric or semiparametric methods) are:

- The hazard function itself can be portrayed easily, which otherwise requires many data points and a complicated smoothing technique.
- The survival curve can be extrapolated into the future (beyond the maximum follow-up time).
- The median time to failure can be given, which otherwise cannot be estimated until the event-free curve reaches 50%.
- The mean time to failure can be given, which otherwise cannot be estimated until the event-free curve reaches 0%.
- The resulting curves may reproduce the underlying mortality process, which is no doubt smooth over time, more faithfully than the original observed data points, whose random roughness can produce graphic peculiarities.
- The entire survival experience can be summarized with a small number of parameters.

A final and important advantage of fitting parametric models is that there may be a theoretical basis for the model that helps us understand the physical process rather than just describe it. The Weibull distribution has such a theoretical interpretation as the time to failure of a physical system whose integrity depends on very many parts (sites). Thus, the Weibull distribution is commonly used for failure analysis and is employed in the cardiac literature to model the structural deterioration of prosthetic heart valves. The Gompertz distribution is used widely to model human mortality, especially in older populations, based on the assumption that the “average exhaustion of a man's power to avoid death to be such that at the end of equal infinitely small intervals of time he lost equal portions of his remaining power to oppose destruction which he had at the commencement of these intervals.”49

These parametric distributions also can be used as the basis for regression models; usually the scale parameter is expanded to contain the risk factors. For the mitral valve data, a Gompertz regression produced hazard ratios identical to those of the Cox regression (see Table 8-5).
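Expanding the scale parameter to carry the risk factors can be sketched as below. The coefficient and parameter values are invented for illustration; the point is that the resulting hazard ratio, exp(β), is constant over time, matching the proportional-hazards structure assumed by Cox regression:

```python
import math

# Invented illustrative parameters: baseline scale LAM0, shape ALPHA,
# and regression coefficient BETA for a binary risk factor x
LAM0, ALPHA, BETA = 0.01, 0.08, 0.3

def gompertz_hazard(t, x):
    """Gompertz regression hazard: the scale parameter is expanded to
    contain the risk factor, lambda(x) = LAM0 * exp(BETA * x)."""
    return LAM0 * math.exp(BETA * x) * math.exp(ALPHA * t)

# The hazard ratio for x = 1 vs x = 0 equals exp(BETA) at every time point
hazard_ratios = [gompertz_hazard(t, 1) / gompertz_hazard(t, 0)
                 for t in (0.0, 5.0, 10.0, 20.0)]
```

Because the time-dependent factor e^(αt) cancels in the ratio, the Gompertz regression reports hazard ratios on the same scale as Cox regression, which is why the two methods agreed in Table 8-5.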

Different analytical methods must be used for outcome events after surgery depending on whether they are one-time (operative) or time-related (late) events.

Factors found significant on univariable analysis often are overturned by multivariable analysis because they are surrogates for other, more clinically fundamental variables. This was the case with valve model in both the early and overall analyses of mortality. The converse also can happen; re-replacement was not significant for overall mortality by itself but became so in concert with other risk factors. Multivariable analysis, adjusting for the other risk factors, always should be the first choice.

The hazard and cumulative hazard functions measure the instantaneous and cumulative risks of an event, respectively. The cumulative hazard is easier to obtain. The survival function converts these risks into the probability of remaining free of the event.

Linearized rates provide a convenient single-parameter summary of late-event rates but should not be used unless the hazard function is approximately constant.

Kaplan-Meier analysis estimates survival probabilities as a function of time after surgery. When used for events that are not necessarily fatal, KM estimates probabilities as if death were eliminated, whereas “actual” analysis gives a true mortality-adjusted estimate of the event probabilities.

Parametric regression is a useful tool to analyze long-term outcome. Usually, the Gompertz distribution is good for survival, and the Weibull distribution fits tissue and structural valve deterioration very well.

*Med Care* 1992; 30(6):473-483. [PubMed: 1593914]

*Am J Public Health* 1975; 65(12):1304-1310. [PubMed: 1200192]

*J Forensic Econ* 2000; 13(2):145-167.

*Stat Med* 2002; 21(19):2879-2888. [PubMed: 12325104]

*Stat Sci* 1986; 1(1):54-75.

*J Heart Valve Dis* 2004; 13(1):91-96. [PubMed: 14765846]

*Stat Med* 1993; 12:809-824. [PubMed: 8327801]

*Am Statistician* 1992; 46:53-54.

*Am Statistician* 1998; 52:119-126.

*Statistical Methods for Rates and Proportions*, 3rd ed. Hoboken, Wiley, 2003; pp 50-63.

*Stat Med* 1996; 15:361-387. [PubMed: 8668867]

*Radiology* 1982; 143:29-36. [PubMed: 7063747]

*Applied Logistic Regression*, 2nd ed. New York, Wiley, 2000; pp 160-164.

*Commun Stat* 1980; A10:1043-1069.

*Regression Modeling Strategies: with Applications to Linear Models, Logistic Regression, and Survival Analysis*. New York, Springer, 2001; pp 11-40.

*Stat Med* 1997; 16:965-980. [PubMed: 9160492]

*Biometrics* 1995; 51(2):600-614.

*Stat Med* 1990; 9(11):1303-1325.

*Stat Med* 1995; 14(19):2161-2172. [PubMed: 8552894]

*Stat Med* 1994; 13(10):1001-1013. [PubMed: 8073196]

*Ann Thorac Surg* 2007; 83(4):1240-1244. [PubMed: 17383319]

*Stat Med* 1992; 11(8):1115-1129. [PubMed: 1496199]

*Ann Thorac Surg* 2002; 73(1):S358-362.

*Ann Thorac Surg* 2009; 87(2):361-364. [PubMed: 19161738]

*J Am Stat Assoc* 1958; 53:457-481.

*Reports on Public Health and Medical Subjects*, Vol 33. London, His Majesty's Stationery Office, 1926; pp 1-26.

*J Am Stat Assoc* 1977; 72:854-858.

*Ann Statist* 1978; 6(4):701-726.

*Statistical Models Based on Counting Processes (Springer Series in Statistics)*. New York, Springer, 1996.

*J Heart Valve Dis* 1998; 7(2):163-169. [PubMed: 9587856]

*The Statistical Analysis of Failure Time Data*, 2nd ed. Hoboken, Wiley, 2002; pp 62-65.

*The Statistical Analysis of Failure Time Data*, 2nd ed. Hoboken, Wiley, 2002; pp 251-254.

*Circulation* 1997; 96(9 Suppl):II-70-75.

*J Thorac Cardiovasc Surg* 1994; 108(4):709-718. [PubMed: 7934107]

*Curr Opin Cardiol* 1999; 14(2):79-83. [PubMed: 10191964]

*Ann Thorac Surg* 2005; 80(6):2091-2097. [PubMed: 16305851]

*J Am Stat Assoc* 1999; 94:496-509.

*Ann Statist* 1988; 16(3):1141-1154.

*Ann Thorac Surg* 2007; 83(5):1586-1592. [PubMed: 17462363]

*Br J Cancer* 1977; 35(1):1-39. [PubMed: 831755]

*The Statistical Analysis of Failure Time Data*, 2nd ed. Hoboken, Wiley, 2002; pp 66-68.

*Phil Trans R Soc A* 1825; 115:513-580.

*Continuous Univariate Distributions, Volume 2*, 2nd ed. New York, Wiley, 1995.