Peripheral T-cell lymphomas (PTCL) are a heterogeneous group of lymphomas arising from mature T-cells. Although diverse, these lymphomas often present at advanced stages and carry a poor prognosis. This study aimed to use a machine learning model to identify the variables most associated with overall survival (OS) using comprehensive data collected at diagnosis.
A total of 97 patients with PTCL were identified. Data modeling was performed in Python using a random survival forest model. Model performance was evaluated using the concordance index (C-index), and Brier scores were calculated to assess calibration. Permutation feature importance (PFI) was used to evaluate the contribution of individual variables, and partial dependence plots were used to interpret model behavior.
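The two evaluation steps described above, the C-index and permutation feature importance, can be illustrated with a minimal pure-Python sketch. This is not the study's code: the function names, the toy data, and the stand-in `predict` function (which simply returns the first feature as a risk score instead of a fitted random survival forest) are all hypothetical and chosen only to show how the quantities are computed.

```python
import random

def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # higher risk, shorter survival
                elif risks[i] == risks[j]:
                    concordant += 0.5   # ties count as half
    return concordant / comparable

def permutation_importance(predict, X, times, events, n_repeats=20, seed=0):
    """Mean drop in C-index after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = concordance_index(times, events, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            score = concordance_index(
                times, events, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Hypothetical toy cohort: column 0 drives risk, column 1 is pure noise.
X = [[5.0, 1.0], [4.0, 3.0], [3.0, 2.0], [2.0, 5.0], [1.0, 4.0]]
times = [100, 200, 300, 400, 500]    # follow-up in days
events = [1, 1, 0, 1, 0]             # 1 = death observed, 0 = censored

def predict(row):
    # Stand-in for a fitted survival model: risk equals the first feature.
    return row[0]

baseline, importances = permutation_importance(predict, X, times, events)
print(baseline, importances)
```

In this toy setup the noise column receives an importance of zero because the stand-in model never reads it, while shuffling the informative column lowers the C-index. With a real model, importance is best computed on held-out data so that it reflects generalization and not memorized training patterns.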
The C-index was 0.86 in the full dataset and 0.68 in the validation set. Brier scores between 30 and 1095 days ranged from 0.07 to 0.21. The most predictive features were lactate dehydrogenase (LDH), ECOG performance status, Charlson Comorbidity Index score, ejection fraction (EF), absolute neutrophil count (ANC), hemoglobin, and early transplant (PFI of 0.03, 0.02, 0.02, 0.02, 0.02, 0.01, and 0.01, respectively). Creatinine (PFI = 0.009), number of extranodal sites (PFI = 0.009), and Medicaid coverage (PFI = 0.007) had the next highest feature importance scores.
Our model demonstrated predictive capacity for OS. The discrepancy between the C-index in the full cohort and in the validation set likely reflects the limited sample size and indicates potential overfitting. In our dataset, LDH, performance status (PS) at diagnosis, EF, and ANC were the variables most associated with predicted survival. Future work will focus on model refinement, incorporation of molecular and genetic data, and validation in expanded cohorts.
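For readers less familiar with the Brier scores reported above, the following is a simplified sketch of how the score at a single time horizon can be computed. It makes a loud assumption: patients censored before the horizon are simply skipped, whereas a full survival analysis would typically reweight them using inverse probability of censoring weights. The function name, toy follow-up data, and predicted probabilities are hypothetical and are not taken from the study.

```python
def brier_score_at(horizon, times, events, surv_probs):
    """Mean squared error between predicted survival probability at `horizon`
    and observed status. surv_probs[i] is the predicted probability that
    patient i is still alive at `horizon`.

    Simplification (assumption): patients censored before the horizon are
    dropped instead of being handled with censoring weights.
    """
    squared_errors = []
    for t, e, p in zip(times, events, surv_probs):
        if t > horizon:
            alive = 1.0      # followed past the horizon, so known alive
        elif e:
            alive = 0.0      # death observed at or before the horizon
        else:
            continue         # censored before the horizon: status unknown
        squared_errors.append((alive - p) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Hypothetical toy cohort (days of follow-up, 1 = death observed).
times = [100, 200, 300, 400, 500]
events = [1, 1, 0, 1, 0]
surv_probs = [0.2, 0.3, 0.5, 0.7, 0.9]   # hypothetical model predictions

score = brier_score_at(365, times, events, surv_probs)
uninformative = brier_score_at(365, times, events, [0.5] * 5)
print(score, uninformative)
```

Lower values indicate better predictions. A model that predicts a survival probability of 0.5 for every patient scores 0.25, which gives a rough reference point for interpreting values in the 0.07 to 0.21 range.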