+++
Scores Established at Admission
++
The scoring systems most commonly used in critically ill adults are APACHE II,18 APACHE III,19 MPM II,20 SAPS II,21 and SOFA.7,22 The variables included in each of these scoring systems are summarized in Table 6-2. The Pediatric Risk of Mortality (PRISM) score23 is the most widely used scoring system in pediatric critical care.
++
++
Some clinical variables are common to APACHE II, APACHE III, MPM II, SAPS II, and SOFA, probably because these variables measure specific clinical and physiologic functions that are major determinants of mortality. Specifically, each of these scoring systems uses age, type of admission, heart rate, blood pressure, assessment of renal function (blood urea nitrogen, creatinine, and/or urine output), assessment of neurologic function (Glasgow Coma Scale or presence of coma), assessment of respiratory function (mechanical ventilation, Pao2/FIo2, or alveolar-arterial oxygen gradient), and assessment of chronic health status. In contrast, other variables are not uniformly shared: serum potassium in APACHE II, glucose and albumin in APACHE III, and serum bicarbonate in SAPS II. These unique variables exist because of differences in the derivation of each scoring system, such as patient sample size, types of patients included, and statistical methods used to derive each score. An important difference between severity-of-illness scoring systems is how the predictor variables were chosen.24 For instance, in the APACHE II model, the developers selected those variables they thought relevant to patient outcome and then arbitrarily weighted each variable. In the development of MPM II, SAPS II, and APACHE III, statistical techniques were applied to identify those variables that were independently associated with death. These variables were then further refined by use of linear discriminant function and stepwise logistic regression analysis, and the final set of variables was then weighted by statistical methods and analyzed and presented as a cumulative score to predict mortality.
++
The Acute Physiology and Chronic Health Evaluation II (APACHE II) system18 is the most commonly used clinical severity-of-illness scoring system in North America. APACHE II is a disease-specific scoring system. It uses age, type of admission, chronic health evaluation, and 12 physiologic variables (acute physiology score or APS) to predict hospital mortality (see Table 6-2). The 12 physiologic variables are defined as the most abnormal values during the 24 hours after ICU admission.
++
The predicted hospital death rate is computed from the weighted sum of APACHE II score, a variable determined by whether the patient had emergency surgery, and the specific diagnostic category weight. APACHE II score was validated in 5815 ICU admissions from 13 hospitals. The correct classification rate for a 50% predicted risk of death was 85%. APACHE III19 extended APACHE II by improving calibration and discrimination through the use of a much larger derivation and validation patient sample. However, at this time, APACHE III is a proprietary commercial product.
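The weighted sum described above is a logistic model. As a minimal sketch, assuming the coefficients reported in the original APACHE II publication (an intercept of -3.517, 0.146 per APACHE II point, and 0.603 for emergency surgery), the calculation might look as follows. These constants are not given in this chapter and should be verified against reference 18 before any use; the function name and the diagnostic category weight in the usage lines are hypothetical.

```python
import math


def apache2_predicted_mortality(apache2_score, emergency_surgery, diagnostic_weight):
    """Sketch of the APACHE II predicted hospital death rate.

    Coefficients are assumed from the original publication; verify before use.
    diagnostic_weight is the weight of the single diagnostic category chosen
    for the patient (supplied by the caller, not tabulated here).
    """
    logit = -3.517 + 0.146 * apache2_score + diagnostic_weight
    if emergency_surgery:
        logit += 0.603
    # The logistic function converts the weighted sum (log-odds) to a probability.
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical diagnostic category weight of 0.0, for illustration only.
risk = apache2_predicted_mortality(25, emergency_surgery=False, diagnostic_weight=0.0)
print(f"Predicted hospital death rate: {risk:.2f}")
```

Because the diagnostic category weight enters the sum directly, choosing a different single diagnosis for the same patient changes the predicted death rate, which is one reason the one-diagnosis requirement is regarded as a limitation.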
++
The main disadvantages of the APACHE II system are its failure to compensate for lead-time bias,25 the requirement to select only one clinical diagnosis, and inaccuracies in clinical subsets, which produce poor interobserver reliability. In spite of these shortcomings, APACHE II remains the best known and most widely used severity-of-illness scoring system.24
++
APACHE III is a disease-specific score that was developed from 17,440 admissions in 40 U.S. hospitals. Eighteen variables (see Table 6-2) were included, and their respective weights were derived by logistic regression modeling. To improve the accuracy of assessment of neurologic function, the Glasgow Coma Scale (GCS) score was changed, because reliability testing suggested the need to eliminate similar GCS scores that could occur in patients who had different neurologic presentations. The APACHE III score is a sum of physiology, age, and data from seven potential comorbid conditions. The final APACHE III score can vary between 0 and 300. Risk estimate equations for hospital mortality are calculated from the weighted sum of disease category (78 diagnostic categories are included), a coefficient related to prior treatment location, and the APACHE III score. In the original derivation sample, estimates of mortality for the first day in the ICU had an area under the ROC curve of 0.90, and the correct classification at the 50% mortality risk level was 88%. Although APACHE III scores can be calculated from published information, the weights needed to convert the score to a probability of death are proprietary; therefore, the APACHE III system has not been widely accepted or used.
++
The Mortality Probability Model (MPM II)20 was developed from 19,124 ICU admissions in 12 countries. MPM II is not disease specific. MPM0 is the only severity-of-illness scoring system that was derived at ICU admission and can therefore be used at ICU admission. MPM II does not yield a score, but rather a direct probability of survival. Burn, coronary care, and cardiac surgery patients are excluded. MPM0 includes three physiologic variables, three chronic diagnoses, five acute diagnoses, and three other variables: cardiopulmonary resuscitation prior to admission, mechanical ventilation, and medical or unscheduled surgery admission (see Table 6-2). Each variable is scored as absent or present and is allocated a coefficient. The sum of these coefficients constitutes the logit that is used to calculate the probability of hospital mortality.
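The step from summed coefficients to a probability can be sketched in a few lines. The example below is an illustration only: the variable names, the intercept, and every coefficient are hypothetical placeholders, not the published MPM II values, and only four of the model's variables are shown.

```python
import math

# Hypothetical values for illustration only; NOT the published MPM II coefficients.
INTERCEPT = -5.0
COEFFICIENTS = {
    "coma_or_deep_stupor": 1.5,
    "cpr_before_admission": 0.6,
    "mechanical_ventilation": 0.8,
    "medical_or_unscheduled_surgery": 1.2,
}


def mortality_probability(findings, intercept=INTERCEPT, coefficients=COEFFICIENTS):
    """Add the coefficient of each variable that is present, then apply the logistic function."""
    logit = intercept + sum(
        coefficients[name] for name, present in findings.items() if present
    )
    return 1.0 / (1.0 + math.exp(-logit))


patient = {
    "coma_or_deep_stupor": False,
    "cpr_before_admission": False,
    "mechanical_ventilation": True,
    "medical_or_unscheduled_surgery": True,
}
p = mortality_probability(patient)
print(f"Probability of hospital mortality: {p:.3f}")
```

The output is a probability rather than a point score, which is the sense in which a model of this kind does not yield a score.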
++
The MPM24 model20 was designed to be calculated for patients who remained in the ICU for 24 hours or longer. MPM24 includes 13 variables, 5 of which are used in the MPM0. In the validation data set, the area under the ROC curve was 0.82 and 0.84 for the MPM0 and MPM24, respectively.
++
The most recent version of the Simplified Acute Physiology Score II (SAPS II)21 was developed from a sample of 13,152 admissions from 12 countries, based on a European/North American multicenter database. SAPS II is not disease specific. SAPS II uses 17 variables (see Table 6-2) that were selected by logistic regression: 12 physiology variables, age, type of admission (scheduled surgical, unscheduled surgical, or medical), and three underlying disease variables (acquired immunodeficiency syndrome, metastatic cancer, and hematologic malignancy). The area under the ROC curve was 0.86 in the validation sample. The probability of hospital mortality is calculated from the score.
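The conversion from SAPS II score to probability of hospital mortality is logistic. A minimal sketch follows, assuming the equation from the original SAPS II publication, logit = -7.7631 + 0.0737 × score + 0.9971 × ln(score + 1). The equation is not reproduced in this chapter, so the constants should be treated as an assumption to verify against reference 21; the function name is hypothetical.

```python
import math


def saps2_predicted_mortality(score):
    """Sketch: probability of hospital mortality from a SAPS II score.

    Constants are assumed from the original publication; verify before use.
    """
    if score < 0:
        raise ValueError("SAPS II score cannot be negative")
    logit = -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1)
    return 1.0 / (1.0 + math.exp(-logit))


for score in (20, 40, 60):
    print(score, round(saps2_predicted_mortality(score), 3))
```

The logarithmic term makes the relationship between score and risk nonlinear on the log-odds scale, so equal increments in score do not correspond to equal increments in log-odds.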
++
The Sequential Organ Failure Assessment (SOFA) was originally developed as a descriptor of a continuum of organ dysfunction in critically ill patients over the course of their ICU stay.22 The SOFA score is composed of scores from six organ systems, each graded from 0 to 4 according to the degree of dysfunction/failure. The score was primarily designed to describe morbidity; however, a retrospective analysis of the relationship between the SOFA score and mortality was performed using the European/North American Study of Severity System database.7,21 Subsequently, SOFA was evaluated as a predictor of outcome in a prospective Belgian study.13 SOFA score on admission was not a good predictor of mortality (area under the ROC curve 0.79); however, mean SOFA score and highest SOFA score had better discrimination (area under the ROC curve 0.88 and 0.90, respectively). Independent of the initial value, an increase in the SOFA score during the first 48 hours of ICU admission predicts a mortality rate of at least 50%.
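The quantities discussed above (admission, mean, and highest SOFA score, and the change over the first 48 hours) are simple to derive once the six organ subscores are known. The sketch below assumes the subscores have already been graded from 0 to 4 by the clinician; the grading thresholds themselves are not encoded, and the function names and patient values are hypothetical.

```python
SOFA_ORGAN_SYSTEMS = (
    "respiratory", "coagulation", "hepatic",
    "cardiovascular", "neurologic", "renal",
)


def total_sofa(subscores):
    """Sum six organ subscores (each 0 to 4) into a total SOFA score (0 to 24)."""
    total = 0
    for organ in SOFA_ORGAN_SYSTEMS:
        value = subscores[organ]
        if not 0 <= value <= 4:
            raise ValueError(f"{organ} subscore must be between 0 and 4")
        total += value
    return total


def summarize_course(daily_subscores):
    """Return admission, mean, and highest total scores plus the change from first to last day."""
    totals = [total_sofa(day) for day in daily_subscores]
    return {
        "admission": totals[0],
        "mean": sum(totals) / len(totals),
        "highest": max(totals),
        "change": totals[-1] - totals[0],
    }


# Invented patient: assessments at admission, 24 hours, and 48 hours.
day0 = dict.fromkeys(SOFA_ORGAN_SYSTEMS, 1)
day1 = {**day0, "cardiovascular": 3, "respiratory": 2}
day2 = {**day0, "cardiovascular": 4, "respiratory": 3, "renal": 2}
summary = summarize_course([day0, day1, day2])
print(summary)
```

A positive `change` over the first 48 hours corresponds to the rising score that the text associates with high mortality.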
+++
Dynamic Severity of Illness Scoring Systems
++
All severity-of-illness scoring systems at ICU admission have a relatively high rate of misclassification of survivors and nonsurvivors. Misclassifications may be caused by the following: (1) exclusion of strong outcome risk factors that cannot be measured or have not been measured at ICU admission, (2) exclusion of complications that occur during ICU stay,26 and/or (3) exclusion of treatment effects that modify outcome. Scoring systems applied over the course of the ICU stay can diminish the impact of these factors. However, discrimination of scoring systems applied during the ICU course is lower than discrimination of scoring systems evaluating outcome at the time of initial admission to the ICU.
++
The MPM48 and MPM72 models27 were developed to estimate the probability of hospital mortality at 48 and 72 hours in the ICU. MPM48 and MPM72 have the same 13 variables and coefficients that are used in MPM24, but the models differ in the constant terms, which reflect the increasing probability of mortality with increasing length of ICU stay, even if physiologic parameters are constant. In the validation group, the areas under the ROC curves of MPM48 and MPM72 were 0.80 and 0.75, respectively.
++
APACHE III also can be used to calculate a daily risk of hospital mortality.28 A series of multiple logistic regression equations was developed for ICU days 2 to 7. The APACHE III daily risk estimate of mortality includes the acute physiology score (APS) on day 1, APS on current day, change in APS since the previous day, the indication for ICU admission, the location and length of treatment before ICU admission, whether the patient was an ICU readmission, age, and chronic health status.
++
The SOFA score has been used to describe increasing accuracy of outcome prediction when used over the first 7 days of the ICU course.13 More recently, the changes in SOFA score in cardiovascular, renal, and respiratory dysfunction from day 0 to day 1 of sepsis were significantly correlated with 28-day mortality in two large cohorts of patients who had severe sepsis.
+++
Comparison of the Different Scoring Systems
++
Comparing the accuracy of the different scoring systems is difficult because of differences in the populations used to derive these scores and in the statistical methods applied. Thus there have been few head-to-head comparisons of different scoring systems. A multinational study29 compared different generations of the three main severity-of-illness scoring systems in 4685 ICU patients. APACHE III, SAPS II, and MPM II all showed good discrimination and calibration in this international database and performed better than did APACHE II, SAPS, and MPM. APACHE II and APACHE III have been compared in 1144 patients from the United Kingdom.30 APACHE II showed better calibration, but discrimination was better with APACHE III. Both scoring systems underestimated hospital mortality, and APACHE III underestimated mortality by a greater degree.
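The two properties compared in these studies can be made concrete. The sketch below, using invented predictions and outcomes, computes discrimination as the area under the ROC curve (the probability that a randomly chosen nonsurvivor received a higher predicted risk than a randomly chosen survivor) and a crude calibration summary as the ratio of observed to expected deaths. Formal calibration testing usually relies on goodness-of-fit statistics, which are omitted here; the function names are hypothetical.

```python
def roc_auc(predicted_risk, died):
    """Area under the ROC curve by pairwise comparison of nonsurvivors and survivors."""
    deaths = [p for p, d in zip(predicted_risk, died) if d]
    survivors = [p for p, d in zip(predicted_risk, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one nonsurvivor and one survivor")
    concordant = 0.0
    for p_death in deaths:
        for p_survivor in survivors:
            if p_death > p_survivor:
                concordant += 1.0
            elif p_death == p_survivor:
                concordant += 0.5  # ties count as half
    return concordant / (len(deaths) * len(survivors))


def observed_to_expected(predicted_risk, died):
    """Observed deaths divided by the sum of predicted risks (expected deaths)."""
    return sum(died) / sum(predicted_risk)


# Invented data for six patients: predicted risk and outcome (1 = died).
predicted = [0.1, 0.2, 0.3, 0.6, 0.8, 0.9]
outcome = [0, 0, 1, 0, 1, 1]
auc = roc_auc(predicted, outcome)
ratio = observed_to_expected(predicted, outcome)
print(f"AUC = {auc:.3f}, observed/expected = {ratio:.3f}")
```

A ratio above 1 means that more patients died than the model predicted, which is the pattern of underestimated hospital mortality described above. The two measures are independent: a model can rank patients well (high area under the curve) and still systematically under- or overestimate risk.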
+++
Comparison of Clinical Assessment with Scoring Systems
++
Clinical judgment to predict outcome has been criticized because it is not very reproducible, it tends to overestimate mortality risk, and it is biased by the ability to recall particularly memorable, rare, and recent events.15 Three studies compared APACHE II with physicians' mortality predictions in the first 24 hours of ICU admission,31–33 and one study evaluated physicians' predictions only.34 Discrimination by physicians had ROC curve areas ranging between 0.85 and 0.89, which were similar to32,34 or even significantly better than31,33 those of APACHE II. In contrast to discrimination, the calibration of physicians' predictions of mortality differed from that of APACHE II. For high-risk patients, APACHE II and physicians had similar rates of correct mortality prediction, ranging from 71% to 85%. However, for estimated mortality risks below 30%, rates of correct classification of physicians' predictions were 39% to 69%, compared with 51% and 67% for APACHE II.31
+++
Customization of Scoring Systems for Specific Diseases
++
Severity-of-illness scoring systems have been developed, derived, and validated for specific diseases to improve on the accuracy of general scoring systems. APACHE III uses 78 disease classifications and derives a unique mortality risk prediction for each of these disease classifications. New scoring systems have been introduced to better predict mortality for patients with multiple organ failure and sepsis. The original models of SAPS II and MPM II did not perform well in patients who had severe sepsis, because mortality in severe sepsis was higher than mortality in patients with other diagnoses. Both models subsequently were customized5 for sepsis by using the original data to derive coefficients unique for sepsis to calculate predicted mortality. Furthermore, severity-of-illness scoring systems specifically designed for sepsis have been developed.
++
Prediction of mortality in sepsis will likely benefit from a dynamic approach that is based on evolution of multiple organ dysfunction. Commonly used organ failure–based systems that have been studied include the Sequential Organ Failure Assessment (SOFA) score,22 the Multiple Organ Dysfunction Score (MODS),35 and the Logistic Organ Dysfunction System (LODS).36
++
All three systems attribute points for organ dysfunction in six different organ systems. MODS,35 which applies to surgical patients, differs from SOFA and LODS in the cardiovascular assessment. MODS scores the cardiovascular system based on the “pressure-adjusted heart rate,” defined as the product of the heart rate multiplied by the ratio of the right atrial pressure to the mean arterial pressure. LODS and MODS have excellent discrimination, with ROC curve areas of 0.85 and 0.93, respectively.35,36
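The pressure-adjusted heart rate defined above is a single arithmetic expression. A minimal sketch follows; the function name and the patient values are hypothetical, and the MODS point thresholds applied to the result are not encoded here.

```python
def pressure_adjusted_heart_rate(heart_rate, right_atrial_pressure, mean_arterial_pressure):
    """Heart rate multiplied by the ratio of right atrial pressure to mean arterial pressure.

    heart_rate is in beats/min; both pressures must share the same unit (e.g., mm Hg).
    """
    if mean_arterial_pressure <= 0:
        raise ValueError("mean arterial pressure must be positive")
    return heart_rate * right_atrial_pressure / mean_arterial_pressure


# Invented patient: heart rate 120/min, right atrial pressure 12 mm Hg, MAP 60 mm Hg.
par = pressure_adjusted_heart_rate(120, 12, 60)
print(par)  # 24.0
```

The value rises with tachycardia, with a high filling pressure, and with a low mean arterial pressure, so a single number reflects several components of cardiovascular dysfunction.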
++
APACHE II, MODS, and SOFA were recently used to compare outcome prediction in a prospective study of 949 ICU patients.37 There were no significant differences between MODS and SOFA in terms of mortality prediction. The areas under the ROC curve for APACHE II, SOFA, and MODS were 0.880, 0.872, and 0.856, respectively. In patients with shock, the MODS and SOFA scores were slightly better mortality predictors than the APACHE II score (area under the ROC curve 0.852 and 0.869 vs. 0.825).
++
Some have suggested that organ failure–based scoring systems could provide an outcome measure to be used as a surrogate for the end point of mortality.38 Thus, for large (and expensive) randomized clinical trials such as those recently conducted in the treatment of sepsis or acute lung injury, could a reduction in some score of organ failure be taken as a measure of reduced morbidity and hence high drug efficacy?
++
Many randomized controlled trials in critical care have successfully evaluated organ dysfunction as secondary outcome variables by using scoring systems. Important recent examples include the ARDS Network study of 6 mL/kg vs. 12 mL/kg of ideal body weight tidal volume in patients who had acute lung injury.39 The use of a protocol of 6 mL/kg ideal body weight, positive end-expiratory pressure (PEEP), and guidelines for respiratory rate and minute ventilation decreased mortality from 40% (with 12 mL/kg tidal volume) to 30%. In addition, the 6 mL/kg tidal volume strategy significantly increased the number of days patients were alive and free of respiratory, hepatic, cardiovascular, coagulation, and renal dysfunction39 as assessed using the Brussels scoring system.9 The randomized controlled trial of recombinant human activated protein C (rhAPC; drotrecogin alfa) showed that rhAPC decreased mortality of severe sepsis from 31% to 25% compared to placebo.40 The SOFA score was used in this study to evaluate organ dysfunction and rhAPC improved markers of organ dysfunction.
+++
Scoring Systems Specific for Trauma Patients
++
Scoring systems have been developed to improve triage of trauma patients and to predict their mortality (see Chap. 92). Trauma scoring systems were developed using general trauma patient samples, not specifically critically ill trauma patients. The initial scores were either anatomic (Injury Severity Score or ISS1,41) or physiologic (Trauma Score or TS42 and Revised Trauma Score or RTS43). Recently, trauma scoring systems have been expanded to combine age, anatomy, and physiology; these include the Trauma and Injury Severity Score or TRISS methodology2 and A Severity Characterization of Trauma or ASCOT.44 Large trauma registries facilitated implementation and validation of trauma scoring systems in large samples of patients. Table 6-3 summarizes the main trauma scoring systems.
++
++
The accuracy of TRISS and APACHE II has been compared in critically ill trauma patients.45 APACHE II classifies trauma patients under only four diagnostic categories: postoperative multiple trauma, postoperative head trauma, nonoperative multiple trauma, and nonoperative head trauma. In APACHE II, patients with combined head and other injuries were assigned to multiple trauma, which was given a lower weight than the isolated head trauma category in predicting mortality.46 The patient sample used to derive APACHE II was much smaller than the samples used for the trauma scores. TRISS tends to perform better than APACHE II. APACHE II significantly overestimates the risk of mortality in the lower ranges of predicted risk and underestimates the risk of mortality in the higher ranges. APACHE III attempted to improve prediction of mortality for head-injured patients by revising the definition of head trauma, allowing patients with isolated head trauma as well as those with head trauma and other injuries to be assigned to the head trauma category. This resulted in a higher predicted mortality that more closely reflected the actual mortality.