BACKGROUND: POSSUM and P-POSSUM are used to assess outcomes in surgical patients. The accuracy of neither scoring system has been established where a level 1 critical care facility (level 1 care ward) is available for perioperative care. We compared mortality predicted by POSSUM and P-POSSUM with observed mortality on a level 1 care ward. METHODS: A prospective observational study was performed between May 2000 and June 2008. POSSUM and P-POSSUM scores were calculated for all postoperative patients admitted to the level 1 care ward. Postoperative mortality data were obtained from hospital records for 2552 episodes of patient care. Observed and expected mortality were compared using receiver operating characteristic (ROC) curves, and goodness of fit was assessed using the Hosmer-Lemeshow test. RESULTS: ROC curves showed good discrimination between survivors and non-survivors for both POSSUM and P-POSSUM. The physiological score discriminated far better than the operative score. Both models showed poor calibration and poor goodness of fit (Hosmer-Lemeshow). The observed to expected (O:E) mortality ratio for POSSUM and P-POSSUM indicated significantly fewer deaths than expected in all deciles of risk. CONCLUSIONS: Our data suggest that observed mortality was 30-60% lower than expected. We suggest that either the use of POSSUM models to predict mortality in patients admitted to a level 1 care ward is inappropriate, or POSSUM must be recalibrated to be useful in this setting.
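
The comparison described in METHODS and RESULTS (discrimination by ROC analysis, calibration by O:E ratio per decile of risk and the Hosmer-Lemeshow statistic) can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, the patient data are synthetic (generated so that the hypothetical model over-predicts deaths by a factor of two), and the p-value step of the Hosmer-Lemeshow test is omitted.

```python
import random


def decile_table(predicted, died, groups=10):
    """Sort episodes by predicted risk and split them into equal-sized groups.

    Returns one (n, observed_deaths, expected_deaths) tuple per group.
    """
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    rows = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(d for _, d in chunk)   # deaths that occurred
        expected = sum(p for p, _ in chunk)   # sum of predicted risks
        rows.append((len(chunk), observed, expected))
    return rows


def hosmer_lemeshow(rows):
    """Hosmer-Lemeshow statistic: sum of (O - E)^2 / (E * (1 - E/n)) over groups."""
    stat = 0.0
    for n_g, observed, expected in rows:
        variance = expected * (1 - expected / n_g)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat


def auc(predicted, died):
    """Area under the ROC curve: probability that a non-survivor has a higher
    predicted risk than a survivor (ties count as one half)."""
    deaths = [p for p, d in zip(predicted, died) if d]
    survivors = [p for p, d in zip(predicted, died) if not d]
    wins = sum((pd > ps) + 0.5 * (pd == ps) for pd in deaths for ps in survivors)
    return wins / (len(deaths) * len(survivors))


# Synthetic data only (assumption): true risk is half the predicted risk.
random.seed(0)
predicted = [random.uniform(0.01, 0.6) for _ in range(1000)]
died = [1 if random.random() < 0.5 * p else 0 for p in predicted]

table = decile_table(predicted, died)
for n_g, observed, expected in table:
    print(f"n={n_g:4d}  observed={observed:3d}  expected={expected:6.1f}  "
          f"O:E={observed / expected:.2f}")
print("Hosmer-Lemeshow statistic:", round(hosmer_lemeshow(table), 1))
print("AUC:", round(auc(predicted, died), 3))
```

In this synthetic case the model can still rank patients reasonably well (AUC above 0.5) while every decile shows an O:E ratio below 1, which is the pattern of good discrimination with poor calibration reported above.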