Lack of Differentiation in Performance of Predictive Radiation Pneumonitis Models
S Krafft*, L Court, T Briere, M Martel, UT MD Anderson Cancer Center, Houston, TX
TU-G-108-4 Tuesday 4:30PM - 6:00PM Room: 108
Purpose: Numerous normal tissue complication probability (NTCP) models for radiation pneumonitis (RP) have been published in the literature, but minimal consideration is given to their predictive performance. Using several published sets of model parameters, we quantified the range of model performance metrics to differentiate between predictive RP models and to assess their clinical utility.
Methods: Clinical, demographic, and dosimetric data were collected for 453 NSCLC patients treated with definitive photon radiotherapy. A series of logistic NTCP models was developed to assess overall performance. A baseline mean lung dose (MLD) model was considered in addition to our institution's best model, which incorporates MLD and smoking status. Model parameters from several publications were identified and incorporated in the analysis. The endpoint considered was CTCAE v3.0 grade ≥3 RP. Metrics of model performance, including area under the curve (AUC) and Spearman's rank correlation (rs), were calculated for each model to facilitate comparison.
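As an illustration of the comparison described in the Methods, the following minimal sketch evaluates a logistic NTCP model and computes AUC and Spearman's rs between predicted risk and observed outcome. It is not the authors' implementation: the function names, the coefficient values (b0, b_mld, b_smoke), and the six-patient toy dataset are hypothetical placeholders chosen only to make the example run. AUC is computed from the rank-sum (Mann-Whitney) identity, and rs as the Pearson correlation of average ranks.

```python
import math


def logistic_ntcp(mld_gy, smoker, b0=-3.0, b_mld=0.1, b_smoke=-0.5):
    """Logistic NTCP = 1 / (1 + exp(-z)), z = b0 + b_mld*MLD + b_smoke*smoker.

    All coefficient defaults are hypothetical, not fitted or published values.
    """
    z = b0 + b_mld * mld_gy + b_smoke * smoker
    return 1.0 / (1.0 + math.exp(-z))


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc(scores, outcomes):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity."""
    ranks = average_ranks(scores)
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy cohort: MLD in Gy, smoking status (1 = smoker),
# and observed grade >= 3 RP (1 = event). Not data from the study.
toy_mld = [8.0, 12.0, 15.0, 18.0, 22.0, 25.0]
toy_smoker = [1, 0, 1, 0, 0, 1]
toy_rp = [0, 0, 0, 1, 0, 1]

toy_pred = [logistic_ntcp(m, s) for m, s in zip(toy_mld, toy_smoker)]
print("AUC:", auc(toy_pred, toy_rp))
print("rs:", spearman_rs(toy_pred, toy_rp))
```

Applying the same two metric functions to the predictions of each candidate model (for example, an MLD-only model and an MLD plus smoking status model) gives the kind of side-by-side AUC and rs comparison reported in the Results.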
Results: The MLD plus smoking status model performed best in terms of observed AUC (0.685) and rs (0.337); however, its performance was similar to that achieved by all of the other tested models (AUC range=[0.663-0.685] and rs range=[0.263-0.337]). In all considered models, predictive performance was poor (i.e., AUC<0.7 and rs<0.4). Additionally, the narrow range of AUC and rs values indicates that there is only a marginal difference in the performance of any one model over another.
Conclusion: The range of quantitative performance metrics indicates that there is minimal ability to differentiate between the selected models. The achieved values for AUC and rs indicate poor overall model accuracy, which limits the practical utility of NTCP models for RP prediction. Furthermore, model parameters from the literature cannot be applied to our dataset to improve RP prediction. Other model formulations and still unidentified patient factors should be considered to develop more applicable and clinically useful NTCP models for treatment personalization.