Program Information

Rectal Cancer Outcome Prediction Based On Institutional Data with Random Forests and Random Survival Forests


M Huang

M Huang1*, H Zhong2, D Liu3, P Gabriel4, E Ben-Josef5, L Yin6, H Geng7, C Cheng8, W Bilker9, Y Xiao10, (1) University of Pennsylvania, Philadelphia, PA, (2) University of Pennsylvania, Philadelphia, PA, (3) NASA Jet Propulsion Laboratory, Pasadena, CA, (4) University of Pennsylvania, Philadelphia, PA, (5) University of Pennsylvania, Philadelphia, Pennsylvania, (6) The Hospital of the University of Pennsylvania, Philadelphia, PA, (7) University of Pennsylvania, Bryn Mawr, Pennsylvania, (8) University of Pennsylvania, Philadelphia, PA, (9) University of Pennsylvania, Philadelphia, PA, (10) University of Pennsylvania, Philadelphia, PA

Presentations

SU-F-FS1-6 (Sunday, July 30, 2017) 2:05 PM - 3:00 PM Room: Four Seasons 1


Purpose: To use the machine learning Random Forest (RF) model and the novel Random Survival Forest (RSF) model to predict overall survival (OS) and local recurrence (LR) for 150 patients with locally advanced rectal cancer treated with neoadjuvant chemoradiotherapy followed by surgery, using institutional data from photon and proton radiation therapy (RT).

Methods: Demographic, prognostic, dosimetric, and RT modality information was collected as features for the RF and RSF prediction models. Features included gender, age, ethnicity, treatment modality (photon/proton), clinical tumor staging (cT/cN/cM), pathological staging (pT/pN/pM), surgery type, chemoradiotherapy (Y/N), and treatment technique (IMRT/CRT, PBS/double scattering). Patients with these features recorded were used in the RF model to predict OS and LR. Feature importance was ranked automatically by RF. RSF builds on RF and further assesses the impact of each feature on survival outcomes. To pursue RSF for event-specific variable outcomes, the prediction result was five-fold cross-validated with RF.
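The five-fold cross-validation step can be illustrated with a minimal sketch in plain Python. The function name `k_fold_indices`, the shuffling seed, and the unstratified round-robin split are assumptions made for illustration; the abstract does not state how the folds were formed or which software was used.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: a simple shuffled, unstratified split.
    Each sample appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Cohort size taken from the abstract (150 patients).
splits = list(k_fold_indices(150, k=5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```

With 150 patients and five folds, each model would be fit on 120 patients and evaluated on the remaining 30, and the per-fold metrics would then be averaged.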

Results: The RF analysis stabilized at 3000 randomized decision trees, with a concordance index (C-index) of 0.77 for OS and 0.63 for LR. Five-fold cross-validation was performed on the cohort; for OS, the training-set C-index was 0.85 and the test-set C-index was 0.75. The important features for OS (Gini metric, high to low) were age, cM, pT, cT, RT duration, pN, IMRT (Y/N), pM, and cN. For LR, the ranking was pN, RT duration, age, pT, pM, dose, cN, cM, fractions, and cT. The RSF model also predicted survival over time: 98.2%, 89.5%, 84.9%, 81.5%, and 78.8% at 1, 2, 3, 4, and 5 years, respectively.
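For readers unfamiliar with the C-index, the following is a minimal sketch of Harrell's concordance index for right-censored data in plain Python. The function name, the choice to skip pairs with tied follow-up times, and the toy data are assumptions for illustration; they are not the authors' implementation or data.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (simplified sketch).

    times:       observed follow-up time per patient
    events:      1 if the event was observed, 0 if censored
    risk_scores: model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times are skipped in this simplified version
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored: pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk matched the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.5, 0.1])
mixed = concordance_index(toy_times, toy_events, [0.9, 0.1, 0.5, 0.7])
print(perfect, mixed)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of 0.77 (OS) and 0.63 (LR) should be read.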

Conclusion: RF and RSF provide a novel machine learning approach to statistical prediction of OS and LR in patients with locally advanced rectal cancer. The approach features automatic data mining and stable performance, making it a potential decision-support tool for precision medicine. More institutional data will be collected for future study.

