Machine learning methods to identify risk factors for corneal graft rejection in keratoconus

The protocol of this retrospective interventional study adhered to the tenets of the Declaration of Helsinki and was approved by the Institutional Review Board of Shahid Beheshti University of Medical Sciences, Tehran, Iran. Medical Ethics Committee II of the Ophthalmic Research Center, affiliated with Shahid Beheshti University of Medical Sciences, waived the requirement for participant consent for the use of medical records in this retrospective chart review.

Study population

The study enrolled all patients who underwent primary keratoplasty for keratoconus between February 24, 1994 and January 12, 2021. Indications for keratoplasty included inappropriate rigid gas-permeable contact lens fit, unacceptable corrected distance visual acuity (CDVA < 20/40), and contact lens intolerance. Inclusion criteria required a follow-up duration of ≥ 1 year and complete suture removal. In patients who received bilateral keratoplasty, data from both grafts were included in the analysis. Exclusion criteria were the preoperative presence of corneal neovascularization or other ocular pathologies. Patients with active vernal keratoconjunctivitis (VKC) were treated medically for at least 6 months before keratoplasty.

Surgical technique

A single experienced surgeon (M.A.J.) performed all surgeries under general anesthesia. Penetrating keratoplasty (PK) was performed in all keratoconus patients before December 2005. Deep anterior lamellar keratoplasty (DALK) was the main procedure after December 2005, unless a history of corneal hydrops or an extensive intraoperative tear in the Descemet membrane led to conversion to PK. The recipient trephine size was chosen to be 2.5 mm less than the vertical white-to-white distance in all cases, and grafts were sutured to the recipient bed using 10-0 nylon sutures. Suturing techniques included a 16-bite continuous running suture, 16 separate sutures, or eight separate sutures combined with a 16-bite continuous running suture.

Donor preparation

Fresh donor corneas were stored in corneal preservation media (Optisol-GS preservative; Chiron Vision, Irvine, CA, USA) within 15 h of donor death and transplanted within 14 days of storage. Donor tissue was not required to be HY-, human leukocyte antigen (HLA)-, or ABO-compatible with recipients. Donor tissue of good to excellent quality was used for PK eyes, whereas graft quality for DALK ranged from fair to excellent. Donor quality was determined based on endothelial cell morphology and density, as described previously24. DALK grafts were prepared by stripping the Descemet membrane. All grafts were cut from the endothelial side 0.25 mm larger than the recipient trephine.

Postoperative course

Postoperative examinations were performed on days 1, 3, and 7; at months 1, 3, 6, and 12; and every 6 months thereafter. Interim examinations were done if patients experienced new symptoms such as photophobia or decreased vision. Patients received a topical antibiotic (chloramphenicol 0.5%) every 6 h for 14 days and topical corticosteroids (betamethasone 0.1%) every 6 h for two months, then tapered per the surgeon's judgment. Graft rejection was treated aggressively with frequent application of topical corticosteroids. Oral prednisolone (1 mg/kg) was started early postoperatively for severe graft rejection.

Selective removal of separate sutures started at least 3 months postoperatively when corneal astigmatism was > 4 D. Otherwise, sutures were left in place unless they degraded and needed to be removed or suture-related complications developed. Management of suture-related complications included topical corticosteroids for sterile suture abscess and suture removal for suture tract vascularization or broken or loose sutures. Patients received a topical antibiotic and corticosteroids every 6 h for 1 week after suture removal.

Outcome measures

The main outcome measures were the incidence of and risk factors for graft rejection. Eyes were categorized into those with at least one episode of graft rejection and those with none. The time interval from keratoplasty to the first rejection episode was used for analysis. Graft rejection was considered irreversible when it resulted in persistent graft stromal edema or opacity with CDVA < 20/40 for a minimum of 3 months despite intensive treatment.

This study included 19 recipient, donor, operative, and postoperative variables that previous studies have suggested are associated with graft rejection. Recipient characteristics were sex, age at the time of keratoplasty, keratoplasty in the fellow eye, and history of VKC and atopic diseases. Donor characteristics included sex, age, endothelial cell density, graft quality, and HY compatibility; a graft from a male donor transplanted into a female recipient results in an HY mismatch. Operative data included the technique of corneal transplantation, size of the corneal graft, and suturing technique. Postoperative events were duration of corticosteroid application, discontinuation of corticosteroids at the time of graft rejection, time from keratoplasty to complete suture removal, suture-associated complications, VKC reactivation, and secondary surgical intervention. Suture-related complications and secondary surgical interventions were considered risk factors only if they occurred within 3 months before graft rejection.

Binary variables, including history of VKC and atopic diseases, keratoplasty in the fellow eye, HY compatibility, suture-associated complications, VKC reactivation, and secondary surgical intervention, took a “Yes” or “No” value. Graft quality was graded as “excellent”, “very good”, “good”, or “fair”. The technique of corneal transplantation was categorized as “PK” or “DALK”, and the suturing technique as “separate”, “continuous”, or “combined”.
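As a concrete illustration, the categorical variables above can be mapped to numeric codes before model training. The study does not report its exact coding scheme, so the mappings and the sample record below are assumptions for demonstration only.

```python
# Hypothetical numeric encodings; the study's actual coding is not reported.
yes_no = {"No": 0, "Yes": 1}
graft_quality = {"fair": 0, "good": 1, "very good": 2, "excellent": 3}
technique = {"PK": 0, "DALK": 1}
suturing = {"separate": 0, "continuous": 1, "combined": 2}

# One made-up record with a subset of the study's variables:
eye = {"VKC history": "Yes", "graft quality": "excellent",
       "technique": "DALK", "suturing": "combined"}

encoded = [yes_no[eye["VKC history"]],
           graft_quality[eye["graft quality"]],
           technique[eye["technique"]],
           suturing[eye["suturing"]]]
print(encoded)  # [1, 3, 1, 2]
```

Ordinal variables such as graft quality preserve their natural ordering under this coding, whereas purely nominal variables (e.g. suturing technique) could alternatively be one-hot encoded.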

Prediction model development

This research used supervised learning algorithms, including an artificial neural network, support vector machine, gradient boosting, extra trees classifier, and random survival forests, to predict graft rejection. Supervised learning algorithms learn from data labeled with the correct answers and are often used to predict targets that are either categorical, such as “rejection” or “no rejection”, or continuous, such as height or weight. The support vector machine is a sparse kernel model that predicts unknown class labels based on a subset of the data (the support vectors). The algorithm separates the input data with a best-fitting hyperplane; kernels convert this hyperplane into a non-linear separator in the original input space. Among all hyperplanes that separate the classes, the one with the largest margin between the hyperplane and the marginal samples is chosen. This largest margin is obtained by minimizing a modified version of the hinge loss function25.
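A minimal sketch of a kernel support vector machine of this kind is shown below. The paper does not name its SVM implementation or hyperparameters, so scikit-learn, the RBF kernel, and the synthetic 19-feature data are all assumptions for illustration.

```python
# Illustrative only: synthetic data standing in for the 19 study variables.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=19, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# The RBF kernel turns the maximum-margin hyperplane into a non-linear
# boundary in input space; C trades margin width against hinge-loss penalties.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
accuracy = svm.score(X_test, y_test)
```

Standardizing the inputs before the SVM matters in practice, because the margin is defined in the scaled feature space.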

Gradient boosting implements the gradient boosted trees algorithm for supervised learning by combining multiple simpler models. The model learns by minimizing a regularized objective that combines a convex loss function with a penalty term representing model complexity26. In this research, the algorithm was trained for 100 boosting rounds.
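A sketch of such a model with 100 boosting rounds follows; the library, learning rate, and tree depth are assumptions, since the paper specifies only the number of rounds.

```python
# Illustrative only: synthetic data standing in for the study variables.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=400, n_features=19, random_state=1)

# n_estimators=100 mirrors the 100 boosting rounds; each round fits a small
# tree to the gradient of the loss, and learning_rate shrinks its contribution.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=1)
gb.fit(X, y)
print(gb.n_estimators_)  # 100 fitted boosting stages
```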

The extra trees classifier randomizes both the choice of attribute and the cut-point when splitting a node in a tree-based model. The model builds multiple decision trees and selects features on the basis of their importance scores27. In this study, the model used 100 estimators to learn the training data.
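The corresponding sketch, again on synthetic data and with the implementation assumed to be scikit-learn, matches the study's setting of 100 estimators:

```python
# Illustrative only: synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=400, n_features=19, random_state=2)

# Extra trees draw random split thresholds for randomly chosen attributes,
# then average many such trees; feature_importances_ ranks the inputs.
et = ExtraTreesClassifier(n_estimators=100, random_state=2)
et.fit(X, y)
top_feature = int(et.feature_importances_.argmax())
```

After fitting, `feature_importances_` gives the importance scores by which features can be ranked, analogous to the risk-factor ranking described above.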

Random survival forests are random forests adapted to a survival outcome, such as time to graft rejection. For greater precision, the algorithm obtains a cumulative hazard function for each tree from the approximately 36.8% of the data not used to grow that tree (the out-of-bag samples). The final forest cumulative hazard function for each observation is the average of the predictions of the individual trees28.
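The 36.8% figure is not arbitrary: each tree is grown on a bootstrap sample of n draws with replacement, so any given record is left out of a tree with probability (1 − 1/n)^n ≈ e⁻¹ ≈ 0.368. A quick simulation confirms this:

```python
# Simulate bootstrap sampling to recover the ~36.8% out-of-bag fraction.
import random

random.seed(0)
n = 1000
oob_fractions = []
for _ in range(200):  # 200 bootstrap samples, one per hypothetical tree
    in_bag = {random.randrange(n) for _ in range(n)}  # draw with replacement
    oob_fractions.append(1 - len(in_bag) / n)         # fraction never drawn

mean_oob = sum(oob_fractions) / len(oob_fractions)
print(round(mean_oob, 3))  # close to 1/e ≈ 0.368
```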

These models were chosen for the following reasons. The support vector machine performs well with small datasets and can model non-linear decision boundaries. Gradient boosting can handle complex relationships in data, protect against overfitting, and improve predictive accuracy. The extra trees classifier is less sensitive to noise and irrelevant features; in addition, its random selection of subsets and random splitting points help decrease the bias that can result from using a single decision tree. Random survival forests are a nonparametric method that can model nonlinear effects and interactions. Because multiple trees contribute to the result, this analysis accommodates various sorts of predictors and interactions among them and makes reliable predictions on time-to-event outcomes.

The patients were split into an 80% training set (971 eyes) and a 20% test set (243 eyes), used to train and assess the machine learning models, respectively. All simulations were performed using Python (Version 3.10, Van Rossum, Scotts Valley, CA, USA).
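The split above can be sketched with the standard library alone; the shuffling seed and placeholder IDs are assumptions, but the counts (971 of 1,214 eyes for training, 243 for testing) follow directly from the 80/20 ratio.

```python
# Reproduce the 80/20 split of 1,214 eyes into 971 training and 243 test eyes.
import random

eyes = list(range(1214))  # placeholder IDs for the 1,214 grafts
random.seed(42)           # assumed seed for reproducibility
random.shuffle(eyes)

n_train = int(0.8 * len(eyes))  # int(971.2) = 971
train_set, test_set = eyes[:n_train], eyes[n_train:]
print(len(train_set), len(test_set))  # 971 243
```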

Comparison of predictive performance among different models

The predictive performance of the machine learning models was compared using the C-statistic, the Brier score, accuracy, precision, mean squared error, root mean squared error, and K-fold cross-validation. The concordance index (C-statistic) is a measure of discrimination ability: whether the model correctly assigns a higher predicted risk to patients with graft rejection than to those without. The C-statistic equals the area under the receiver operating characteristic curve for sensitivity and specificity, and values closer to unity indicate better discrimination29. The Brier score is a measure of calibration, reflecting the agreement between predicted and actual risk; a smaller difference between the two yields a lower Brier score and hence better calibration. A model is considered to have favorable calibration when the Brier score is ≤ 0.2530.

Mean squared error measures the average squared difference between the values predicted by a model and the actual values; a lower value indicates that the predictions are closer to the actual values. Root mean squared error, its square root, expresses how much the predicted values deviate from the actual values on average; again, lower values indicate better model performance.

K-fold cross-validation is a model validation method that evaluates how well a machine learning model will generalize to an independent dataset. It helps mitigate overfitting and provides a more reliable evaluation by ensuring that every data point is used for both training and validation.
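The metrics above can be computed from first principles. The predicted risks and outcomes below are made-up values, not study data; note that for binary outcomes with probabilistic predictions, the Brier score coincides with the mean squared error.

```python
# Dependency-free versions of the Brier score, RMSE, and C-statistic,
# on a tiny hypothetical sample.
import math

y_true = [1, 0, 1, 0, 1]            # 1 = graft rejection occurred
y_prob = [0.9, 0.2, 0.7, 0.4, 0.6]  # model's predicted risk of rejection

# Brier score = mean squared difference between predicted and actual risk.
brier = sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)
rmse = math.sqrt(brier)

# C-statistic: fraction of (rejected, non-rejected) pairs in which the
# rejected eye received the higher predicted risk (ties count as 0.5).
pairs = [(p1, p0) for p1, t1 in zip(y_prob, y_true) if t1 == 1
         for p0, t0 in zip(y_prob, y_true) if t0 == 0]
c_stat = sum(1.0 if p1 > p0 else 0.5 if p1 == p0 else 0.0
             for p1, p0 in pairs) / len(pairs)

print(round(brier, 3), round(c_stat, 2))  # 0.092 1.0
```

Here every rejected eye was assigned a higher risk than every non-rejected eye, so the C-statistic is 1.0 (perfect discrimination), while the Brier score of 0.092 is well under the 0.25 calibration threshold.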

Statistical analysis

Data analyses were performed using SPSS statistical software, version 25 (IBM Corp., Armonk, New York, USA). Normality of the data was assessed with the Kolmogorov-Smirnov test and Q-Q plots. Continuous data were presented as range and mean ± standard deviation, and categorical variables as frequencies and percentages. Eyes with versus without graft rejection were compared using Student's t-test for continuous variables and the χ2 test for categorical variables. Two-tailed p values < 0.05 were considered statistically significant.
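As a worked example of the categorical comparison, the χ2 statistic for a 2 × 2 table can be computed by hand. The counts below are entirely hypothetical (they are not study data), and only the test statistic is shown; the p value would additionally require the χ2 distribution with 1 degree of freedom.

```python
# Hand-computed chi-square statistic for a hypothetical 2x2 table:
# rows = suture-related complication yes/no, columns = rejection yes/no.
table = [[30, 70],    # complication:    30 rejected, 70 did not
         [50, 350]]   # no complication: 50 rejected, 350 did not

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# chi2 = sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

print(round(chi2, 2))  # 18.23
```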
