Ensemble learning approach for prediction of early complications after radiotherapy for head and neck cancer using CT and MRI radiomic features

Patient demographics and dosimetric data

Between 2020 and 2021, 187 HNC patients received radiotherapy. Of these, 80 patients with healthy salivary and parotid glands were selected and evaluated prospectively. Because the study examined the parotid glands as an organ at risk (OAR), patients who had undergone parotid removal or had parotid tumors were excluded. Specifically, 35 patients were excluded due to salivary gland tumors or parotid gland removal, 39 because dental artifacts distorted the CT images in the parotid area, 15 due to complications during treatment, and 18 because they did not agree to complete the follow-up questionnaires. Patients were included based on the following criteria: (1) histological diagnosis of hypopharynx, oropharynx, or nasopharynx carcinoma, or lymphoma; (2) primary treatment with radiotherapy, either alone or in combination with chemotherapy; and (3) weekly follow-up questionnaires during treatment and 3 months after the end of treatment. Patients undergoing chemotherapy were treated according to standard institutional guidelines, primarily with cisplatin or carboplatin. Detailed chemotherapy protocols, including drug types, dosages, and schedules, are provided in the Supplementary Materials.

All patients were treated with IMRT at a prescribed dose of 50–70 Gy. The patients, aged between 16 and 85 years, each underwent CT scanning on a CT simulator before treatment. The parotid glands were delineated on each CT by an expert radiation oncologist and independently verified by a second oncologist. The process began with rigid registration of the CT and MRI images in the Eclipse treatment planning system. The oncologist then delineated the contours on the registered CT and MRI images. At the end of the course of radiation therapy, and again 3 months post-treatment, each patient underwent another CT scan for follow-up.

Patient demographics included age, family history of cancer, surgery, smoking status, and history of chemotherapy. In addition, a radiation oncologist completed a questionnaire weekly, comparing the original CT with the cone beam CT and recording an impression of salivary function. The questionnaire covered any changes at the primary tumor site, dry mouth, and eating, speaking, and swallowing functions. The same questionnaire was completed at the end of the radiotherapy sessions and again three months later. The highest severity of the complication observed during this period was taken as the endpoint.

From the dosimetric data (DVHs), we extracted the dose details for both parotids, and the volume receiving x Gy of radiation (\(V_{xGy}\)). The dose-volume table related to parotid glands is shown in Table 1. The DVHs for parotid and other organs were extracted from the treatment planning system (TPS) and used for modeling. Parotid volumes receiving 5 Gy (\(V_{5Gy}\)) to 70 Gy (\(V_{70Gy}\)) were extracted from DVH curves. Other parameters including volume, maximum dose, minimum dose, modal dose, median dose, mean dose, and diameter of equivalent sphere were obtained from the TPS for each patient. A total of 14 different DVH metrics were extracted for each patient.
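As an illustration of how a \(V_{xGy}\) value relates to the dose distribution, the following is a minimal sketch rather than the TPS procedure used in the study. It assumes the dose to a structure is available as a list of per-voxel doses in Gy with equal voxel volumes, and it assumes a 5 Gy step between \(V_{5Gy}\) and \(V_{70Gy}\), which the text does not state. The function names are hypothetical.

```python
import statistics


def volume_receiving_at_least(voxel_doses_gy, threshold_gy):
    """Return V_xGy as a percentage of the structure volume.

    Assumption: all voxels have equal volume, so the volume fraction
    equals the fraction of voxels with dose >= threshold_gy.
    """
    if not voxel_doses_gy:
        raise ValueError("structure has no voxels")
    n_at_or_above = sum(1 for dose in voxel_doses_gy if dose >= threshold_gy)
    return 100.0 * n_at_or_above / len(voxel_doses_gy)


def dvh_summary(voxel_doses_gy, thresholds_gy=range(5, 75, 5)):
    """Collect V_5Gy ... V_70Gy (assumed 5 Gy step) and basic dose statistics."""
    summary = {
        f"V{t}Gy": volume_receiving_at_least(voxel_doses_gy, t)
        for t in thresholds_gy
    }
    summary["mean_dose_gy"] = statistics.fmean(voxel_doses_gy)
    summary["median_dose_gy"] = statistics.median(voxel_doses_gy)
    summary["max_dose_gy"] = max(voxel_doses_gy)
    summary["min_dose_gy"] = min(voxel_doses_gy)
    return summary


# Toy example: five voxels of one parotid gland (made-up doses).
example = dvh_summary([2.0, 10.0, 26.0, 30.0, 45.0])
```

In practice the TPS reports these values directly from the cumulative DVH; the sketch only makes the definition concrete.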

All patients provided written informed consent before starting therapy for their data to be used for research purposes. The data were collected as part of routine clinical practice, and this work was approved by the Ethics Committee of the university Medical School for the conduct of studies based on these data. All patients received standard clinical radiotherapy care. All ethical issues relating to the patients were approved by the ethics committee of the university. All procedures performed in studies involving human participants were in accordance with the 1964 Helsinki Declaration and its later amendments.

Table 1 Mean values for dosimetric data obtained from treatment planning.

Radiomic feature extraction

The main aim of this work was to predict xerostomia after radiotherapy in HNC patients. As schematically illustrated in Figure 1, the workflow comprised three steps: (i) feature selection was performed on each dataset; (ii) dosimetric features were extracted from the TPS; and (iii) machine learning models were trained on each feature subset, and the classifiers were then trained on the selected subsets together.

CT and MR images were acquired before the radiotherapy sessions as part of the treatment planning process. CT scans were obtained using a GE Hi Speed CT scanner (GE Healthcare, Milwaukee, USA) with a slice thickness of 1.5 mm, a field of view of 500 mm, a matrix size of 512\(\times\)512, 120 kVp, and 340 mAs. MR images were acquired at three different centers; the protocols and scanners used are shown in Table 2. Several studies have examined the effect of different imaging protocols and devices on the radiomic features of images^32,33,34,35. Because acquisition on different devices can affect the MR images, image harmonization was performed using the ComBat technique^36,37 to control for multi-center acquisition. Gross tumor volume (GTV) and OARs, including the parotid and other salivary glands, were contoured by an experienced radiotherapy professional. Treatment planning was performed using a Varian Eclipse TPS (version 15, Varian Medical Systems, Palo Alto, CA, USA).

Table 2 Protocols used in MRI images.

Prior to feature extraction, we applied intensity normalization to the MR and CT images to ensure consistency across imaging modalities and centers. Image features for the contoured structures were extracted using the Radiomics module in the freely available 3D Slicer software (version 4.10.0; Harvard University, National Institutes of Health). A total of 642 radiomic features were extracted from both parotids for CT, \(T_1\)-weighted, and \(T_2\)-weighted MR images. The feature types included shape, first order, gray level dependence matrix (GLDM), gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM), and neighborhood gray tone difference matrix (NGTDM). The number of extracted features was very large, and many of them were redundant and had to be removed.

Feature selection and model evaluation

Before any analysis, the dataset was divided into two partitions: 70% for training and validation, and 30% for testing. The division was performed using a stratified random sampling approach to ensure proportional representation of xerostomia cases across both subsets. This stratification ensured that the distributions of demographic, dosimetric, and radiomic features, as well as the prevalence of xerostomia, were consistent between the training and testing sets. Additionally, we have included a table (Table 2S) in the Supplementary Materials that provides anonymized statistics and summaries of patient allocation to the training and testing subsets. All features were normalized using the z-score technique, with the mean and standard deviation calculated from the training data and subsequently applied to the testing data to prevent data leakage. Moreover, a 5-fold cross-validation and random search were conducted on the training data to optimize the hyperparameters of the classifiers. The test dataset, kept completely independent, was used solely for final model evaluation.
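The split and normalization steps described above can be sketched as follows. This is an illustrative outline using only the Python standard library, not the authors' implementation. The function names, the fixed random seed, and the use of the population standard deviation are assumptions made for the example.

```python
import random
import statistics


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), preserving class proportions in both."""
    rng = random.Random(seed)  # seed is an assumption, for reproducibility
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_zscore(train_rows):
    """Compute per-feature mean and standard deviation from training rows only."""
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # Replace a zero standard deviation with 1.0 so constant features map to 0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize rows with statistics that were fitted on the training set."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy usage: fit on training rows, then reuse the same statistics on test rows.
means, stds = fit_zscore([[1.0, 10.0], [3.0, 10.0]])
test_scaled = apply_zscore([[5.0, 12.0]], means, stds)
```

Fitting the mean and standard deviation on the training partition and only applying them to the test partition is what keeps information about the test set out of the model.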

Feature selection was an important stage of modeling, and many algorithms are available for it. In this study, Pearson linear regression was used to select the features. All features were ranked from 0 to 1 based on their relevance to the prediction. We did not fix the number of features to select; instead, the criterion was the importance of each feature, and a feature was selected for modeling if its importance exceeded 0.95. The selected features are shown in Table 3. The models evaluated were Random Tree (RT), Neural Network (NN), Linear Support Vector Machine (LSVM), and Bayesian Network (BN). The algorithms were combined by ensemble learning, as listed in Table 4. Specifically, ensemble learning was conducted using the MATLAB Machine Learning toolbox, which includes an 'Ensemble' operator for combining multiple models. The parameters for each algorithm were optimized during the model training process:

RT: The number of trees was set to 100, and the maximum tree depth was limited to 10 to prevent overfitting. A minimum of 5 samples per leaf was required.
NN: A feed-forward architecture with 2 hidden layers was utilized. The activation function chosen was ReLU, with a learning rate of 0.01 to support gradient-based learning.
LSVM: A linear kernel was used with a regularization parameter of 1 to balance the trade-off between model complexity and error.
BN: Maximum likelihood estimation was employed, using a smoothing parameter of \(10^{-5}\) to account for small probabilities during structure learning.
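The importance-threshold selection described earlier in this section can be sketched as follows. The text does not state how the 0-to-1 importance score is derived from the Pearson analysis, so this example assumes it is the absolute Pearson correlation between a feature and the binary outcome. Some tools instead define importance as one minus the p-value of the test, which would select a different set. All names here are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no linear information
    return sxy / math.sqrt(sxx * syy)


def select_features(feature_columns, labels, threshold=0.95):
    """Keep features whose importance exceeds the threshold.

    feature_columns maps a feature name to its values across patients.
    Assumption: importance = |Pearson r| between feature and outcome.
    """
    scores = {
        name: abs(pearson_r(values, labels))
        for name, values in feature_columns.items()
    }
    return {name: score for name, score in scores.items() if score > threshold}


# Toy usage with made-up features for four patients.
selected = select_features(
    {"feat_a": [0.0, 0.0, 1.0, 1.0], "feat_b": [1.0, 2.0, 1.0, 2.0]},
    labels=[0, 0, 1, 1],
)
```

Because no target number of features is fixed, the size of the selected subset depends entirely on how many features clear the threshold.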

The ensemble models were built by combining these individual algorithms in double, triplet, and quadruplet combinations (e.g., RT-NN, RT-BN, RT-NN-LSVM, etc.). The majority voting technique was employed to aggregate the predictions from the individual models. Moreover, because class imbalance in the population poses a major challenge in training any learning model, we used the synthetic minority oversampling technique (SMOTE) to oversample the radiomic features and leverage the latent features^38,39. SMOTE was implemented using the Python library imblearn, which is specifically designed for oversampling minority classes in machine learning datasets. The performance of the constructed models was evaluated by ROC curve analysis using accuracy, sensitivity, specificity, and area under the curve (AUC) as metrics.
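To make the combination and voting scheme concrete, the sketch below enumerates every double, triplet, and quadruplet of the four base models and aggregates binary predictions by majority vote. It is an illustration only and not the MATLAB implementation used in the study. The tie-breaking rule for even-sized ensembles is an assumption, since the text does not specify one, and the function names are hypothetical. Table 4 lists the ensembles actually evaluated.

```python
from itertools import combinations

BASE_MODELS = ["RT", "NN", "LSVM", "BN"]


def ensemble_combinations(models=BASE_MODELS):
    """All double, triplet, and quadruplet combinations of the base models."""
    return [combo for size in (2, 3, 4) for combo in combinations(models, size)]


def majority_vote(predictions, tie_break=1):
    """Aggregate binary predictions (one list per model), sample by sample.

    Assumption: ties, which can occur with an even number of models,
    resolve to `tie_break` (here the positive class).
    """
    voted = []
    for votes in zip(*predictions):
        positives = sum(votes)
        negatives = len(votes) - positives
        if positives == negatives:
            voted.append(tie_break)
        else:
            voted.append(1 if positives > negatives else 0)
    return voted


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Toy usage: three models voting on three patients (made-up predictions).
ensemble_pred = majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]])
```

Accuracy, sensitivity, and specificity follow from the voted labels, whereas AUC requires a continuous score per patient, such as the fraction of models voting positive.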

Table 3 The selected features and their importance weightings used in models studied.

Table 4 The ensemble models used in this study.
