A review of machine learning applications in heart health | BioMedical Engineering OnLine

This section covers papers that apply machine learning methods to predict the diagnosis of a stroke. The section titles are the titles of the surveyed papers. Each paper uses different methods, such as image classification on brain scans, natural language processing on electronic health records, and machine learning models on nominal patient data. These methods are reviewed to highlight the advantages and disadvantages of each. In the subsections that follow, Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF) models were the most commonly used machine learning techniques, and RF and LR appear to be the most useful for stroke diagnosis prediction.

Classification of stroke disease using machine learning algorithms (2019)

The objective of the experiment conducted by Govindarajan et al. [23] was to classify the type of stroke a patient is having using machine learning. The researchers used text-mining techniques, such as a novel stemmer (base-form generator), to extract patient data from case sheets. The case sheets, from Sugam Multispeciality Hospital, India, covered patients between the ages of 35 and 90 and carried a total of 22 different stroke classification labels. However, all 507 instances were classified as one of the two main types of stroke, as the 22 unique labels are subtypes of patient strokes. The acquired data underwent preprocessing to transform them into a usable state, which involved removing duplicates, inconsistencies, and missing data. The receiver operating characteristic (ROC) curve was the main statistic used to evaluate the classification models. Five different machine learning approaches were used on the data: Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Trees (DTs), Logistic Regression (LR), and bagging/boosting, where Random Forest (RF) was used for bagging and AdaBoost for boosting. An ANN is a network of artificial neurons used for computation and machine learning tasks.

The first machine learning method used was an ANN trained with stochastic gradient descent, which optimizes the objective function incrementally, updating the weights after each training instance. The network used one hidden layer of 10 neurons to separate two classes of instances: ischemic and hemorrhagic. Several SVM variants were used for kernel-based classification, including linear and quadratic kernels. The DT method used information gain to split the data into subsets; these subsets, or branches, were pruned based on data regarding lasting improvements in patients' conditions.
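
The network described above can be sketched as follows. This is a minimal illustration only: the paper does not publish its code, and the input dimensionality, learning rate, and synthetic data below are assumptions; only the single hidden layer of 10 neurons and the per-instance (stochastic) updates come from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features, n_hidden = 4, 10           # 10 hidden neurons, as in the study
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=n_hidden)
b2 = 0.0

def forward(x):
    h = np.tanh(x @ W1 + b1)           # hidden-layer activations
    return sigmoid(h @ W2 + b2), h     # probability of one stroke class

def sgd_step(x, y, lr=0.1):
    """One stochastic gradient descent update on the log-loss."""
    global W1, b1, W2, b2
    p, h = forward(x)
    err = p - y                        # dLoss/dLogit for sigmoid + log-loss
    dh = err * W2 * (1 - h ** 2)       # backprop through tanh (old W2)
    W2 -= lr * err * h
    b2 -= lr * err
    W1 -= lr * np.outer(x, dh)
    b1 -= lr * dh

# Toy separable data standing in for the patient features.
X = rng.normal(size=(200, n_features))
y = (X[:, 0] > 0).astype(float)
for _ in range(20):                    # 20 epochs of pure SGD
    for xi, yi in zip(X, y):
        sgd_step(xi, yi)

accuracy = np.mean([(forward(xi)[0] > 0.5) == yi for xi, yi in zip(X, y)])
```

The per-instance updates are what distinguish stochastic gradient descent from batch gradient descent: each case sheet nudges the weights immediately, which is cheap and often converges quickly on small clinical datasets.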

Logistic regression was used to determine whether age, hypertension, diabetes mellitus level, and other covariates have an effect on the stroke outcome in each instance. The accuracy of the LR model was 90.6%, though the model also produced an ROC curve with an area under the curve (AUC) of only 0.63. A high accuracy score paired with a much lower AUC suggests that the data could be imbalanced, and therefore using accuracy as the main measure may be unreliable in this case [24].
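
The accuracy/AUC gap noted above is easy to reproduce. The sketch below uses hypothetical numbers, not the study's data: on a 90/10 imbalanced set, near-constant majority-class predictions score high accuracy while the AUC of uninformative scores stays near chance.

```python
import random

random.seed(1)

# 90 negatives, 10 positives; the model's scores carry no signal.
labels = [0] * 90 + [1] * 10
scores = [random.random() for _ in labels]

# With a high threshold, almost everything is predicted negative (majority).
predictions = [int(s > 0.95) for s in scores]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def auc(labels, scores):
    """AUC = probability a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc_value = auc(labels, scores)  # near 0.5: ranking is no better than chance
```

Accuracy here lands in the mid-80s simply because negatives dominate, while the AUC hovers around 0.5, mirroring the 90.6% vs. 0.63 pattern reported for the LR model.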

Two ensemble methods were tested on the data to see if combining predictions yields better accuracy: bagging with RF and boosting with AdaBoost. The results showed that ischemic stroke was the most common type of stroke among the patients, with weakness, dysarthria, giddiness, and difficulty walking as its most common symptoms; in hemorrhagic stroke, the top symptoms were weakness and vomiting, among others. The ANN produced the highest accuracy score of the models, at 95.3%. Additionally, the ANN achieved high and balanced precision and recall scores of 95.9 and 99.2, respectively. The ANN also had a sensitivity of 95.9, meaning that the number of instances falsely classified as negative was low. However, the specificity was 60, indicating many false positives in the classification. The majority of the specificities of the other classifiers were less than 20%, and many were zero. One way to mitigate the model's inaccurate classifications is to lower the decision threshold [25]. In addition to adjusting the decision threshold, low specificity could stem from class imbalance, a complication that can be addressed through random undersampling or random oversampling in future work [26].
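
Both mitigations can be sketched in a few lines. The scores, labels, and thresholds below are hypothetical illustrations, not values from the paper.

```python
import random

random.seed(0)

# Hypothetical scores for a minority (positive) class the model under-detects:
# positives score in [0.4, 0.5), so a default 0.5 threshold misses all of them.
labels = [1] * 10 + [0] * 90
scores = [0.4 + 0.1 * random.random() for _ in range(10)] + \
         [0.3 * random.random() for _ in range(90)]

def recall_at(threshold):
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    return tp / sum(labels)

recall_default = recall_at(0.5)    # default threshold misses every positive
recall_lowered = recall_at(0.35)   # lowered threshold recovers all of them

# Random undersampling: drop majority instances until the classes balance.
majority = [i for i, y in enumerate(labels) if y == 0]
minority = [i for i, y in enumerate(labels) if y == 1]
kept = minority + random.sample(majority, len(minority))
balance = sum(labels[i] for i in kept) / len(kept)   # now 0.5
```

Lowering the threshold trades specificity for sensitivity at prediction time, while undersampling changes what the model sees at training time; the two can be combined.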

Automating ischemic stroke subtype classification using machine learning and natural language processing (2019)

Normally, ischemic stroke subtyping is performed by doctors after manual testing, monitoring, imaging, and reviewing the data. In an attempt to improve efficiency, Garg et al. [27] conducted a study using natural language processing to extract information from documents and machine learning to classify the stroke subtypes. The data used in the study came from Epic Systems Corporation, a leading electronic health records (EHR) technology company. The authors applied various data cleaning methods for natural language processing, such as removing stop words and numeric characters, to improve data quality. The most common words and phrases were used to generate a feature matrix, and its dimensionality was reduced using principal component analysis (PCA). The features produced by PCA were combined with the top 25% of attributes from feature selection performed using Extreme Gradient Boosting [28]. The data were then classified using K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest (RF), Extremely Randomized Tree classifiers [29], Gradient Boosting Machines (GBM) [30], and Extreme Gradient Boosting (XGB) [31]. The authors used 1091 samples for classification and 50 samples as test data. The dataset was evenly split between the five classes (subtypes): cardioembolic, cryptogenic, large-artery atherosclerosis, small-artery disease, and a fifth class labeled other determined sources. They also performed stacking by feeding the predictions from the models into a meta-classifier to create a new model; the meta-classifier is an ensemble learning approach that uses the outputs of all the models to reach a decision.
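
The pipeline described above can be sketched with scikit-learn stand-ins. The toy corpus, labels, base models, and component counts below are illustrative assumptions, not the authors' data or code; only the overall shape (bag-of-words features, PCA reduction, stacked meta-classifier) follows the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Toy notes standing in for the cleaned EHR text (hypothetical wording).
notes = [
    "atrial fibrillation embolus cortical infarct",
    "carotid stenosis plaque large artery",
    "lacunar infarct small vessel hypertension",
    "extensive workup negative cause unknown",
] * 5
subtypes = ["CE", "LAA", "SAO", "UND"] * 5

# Bag-of-words feature matrix from the common terms, then PCA reduction.
X = CountVectorizer(stop_words="english").fit_transform(notes).toarray()
X = PCA(n_components=3).fit_transform(X)

# Stacking: base-model predictions feed a logistic-regression meta-classifier.
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("knn", KNeighborsClassifier(n_neighbors=3))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X, subtypes)
train_accuracy = stack.score(X, subtypes)
```

`StackingClassifier` generates the base-model predictions out-of-fold (via `cv=5`) before fitting the meta-classifier, which is what keeps the stacked model from simply memorizing its base learners' training-set outputs.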

The models were evaluated using the kappa statistic for inter-rater agreement. The kappa value represents how close the predicted labels are to the actual labels on a scale of −1 to 1, where 1 is perfect agreement. The models showed results of 0.21–0.57 on the combined data. Almost every model performed better when using the combined data from clinical notes and radiology reports than when using clinical notes only or radiology reports only. Among the TOAST (Trial of ORG 10172 in Acute Stroke Treatment) subtypes [32], the cardioembolic subtype generally received the highest kappa values across the classifiers. The five subtypes of ischemic stroke defined by TOAST are large-artery atherosclerosis (LAA), cardioembolism (CE), small-vessel occlusion (SAO), stroke of other determined etiology (OTH), and stroke of undetermined etiology (UND) [32]. Cardioembolic stroke [33] happens when unwanted material pumped into the brain circulation blocks a blood vessel. TOAST subtype classification using stacking with LR had a kappa of 0.72 and an 80% agreement with the neurologist classification. Stacking is an ensemble method that combines the results of multiple models to create a final prediction with improved performance [34]. The authors concluded that their methods for data parsing and prediction performed well, particularly for the cardioembolic and large-artery atherosclerosis types of stroke. The study was well conducted, but the accuracy rates, between 53% and 63%, are too low for this method to be considered useful, given that this classification is intended to help in life-or-death situations. The authors should consider using a multi-model approach to learning or selecting more features to obtain more accurate results [35].
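
Cohen's kappa can be computed directly from its definition: observed agreement corrected for the agreement expected by chance. The model/neurologist labels below are hypothetical, chosen only to exercise the formula.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Kappa = (observed agreement - chance agreement) / (1 - chance)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each class's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Model vs. neurologist TOAST labels on 10 hypothetical cases.
model       = ["CE", "CE", "LAA", "SAO", "CE", "OTH", "LAA", "CE", "SAO", "UND"]
neurologist = ["CE", "CE", "LAA", "SAO", "CE", "CE",  "LAA", "CE", "SAO", "UND"]
kappa = cohens_kappa(model, neurologist)   # ~0.86: strong agreement
```

Raw agreement here is 90%, but kappa is lower because four classes dominated by CE already agree by chance some of the time; this correction is why the paper reports kappa rather than plain agreement.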

A machine learning approach for classifying ischemic stroke onset time from imaging (2019)

Ho et al. [36] used MRI scans to predict the time since stroke onset, which is useful for determining the required treatment. The dataset included 131 samples, of which 35% belonged to the minority class (less than 4.5 h since stroke) and 65% to the majority class (greater than 4.5 h since stroke). They used four different MRI-derived features as inputs: diffusion-weighted imaging (DWI), fluid-attenuated inversion recovery imaging (FLAIR), apparent diffusion coefficient (ADC), and perfusion-weighted imaging (PWI) parameter maps. DWI is a type of magnetic resonance imaging that locates cellular swelling based on the motion of water in tissue [37]. FLAIR uses an inversion recovery sequence to suppress the signal from cerebrospinal fluid in images, highlighting cell bodies and nerve fibers [38]. PWI is an imaging technique used to ascertain how blood flows through the brain [39]. ADC maps use the diffusion-weighted images to quantify the extent of diffusion within the tissue [40].

The images were preprocessed to remove spatial noise and standardize intensity values, and filters were applied to remove irrelevant tissue overlays. The regions of interest were identified by Tmax, which indicates how long an injected contrast bolus takes to reach that area of the brain. The generated features include standard statistical values, such as standard deviation, mean, and median, and values calculated from the identified regions of interest, such as area, volume, and circularity. The feature list was further processed using a deep autoencoder to extract hidden features in the form of a feature map, and different autoencoder architectures were used to evaluate the difference between coupling types. The most correlated baseline and deep autoencoder features were selected based on Pearson correlation, which is calculated by dividing the covariance of two variables by the product of their standard deviations [41]. Subgrouping analysis was performed by evaluating the model outcomes after slight changes in magnetic field strength, year of imaging, and other MRI acquisition parameters.
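
Correlation-based feature selection of the kind described above can be sketched from the definition of Pearson's r. The feature names, values, and the 0.8 cutoff below are hypothetical, not from the study.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance divided by the product of the std. deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# Toy example: keep only features highly correlated with the binary label.
label = [0, 0, 0, 1, 1, 1]
features = {
    "lesion_volume": [1.0, 1.2, 0.9, 3.1, 2.8, 3.3],  # tracks the label
    "scanner_noise": [0.5, 2.0, 1.1, 0.4, 1.9, 0.8],  # irrelevant
}
selected = [name for name, vals in features.items()
            if abs(pearson(vals, label)) > 0.8]
```

One caveat worth keeping in mind: Pearson correlation only captures linear relationships, so a deep-autoencoder feature that relates to the label nonlinearly could be discarded by this filter.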

Next, the data were classified using LR, Gradient-Boosted Regression Trees (GBRT), SVM, Stepwise Multilinear Regression (SMR), and Convolutional Neural Networks (CNN). To avoid classification bias and identify overfitting, tenfold cross-validation was performed. Images from the dataset were separated and evaluated separately to reduce errors caused by changes in medical technologies and practices, such as MRI field strength. The performance metrics used to evaluate the models were AUC, specificity, F1 score, positive predictive value (PPV), and negative predictive value (NPV). The models were trained with baseline features only, deep features only, and both, to determine which configuration would return the best result. Models trained with the combined feature set produced better results than those using only one feature set. In addition, the autoencoder method improved the models in most cases when arterial input function (AIF) and contralateral coupling were used for deep feature generation. When using both AIF and contralateral coupling with baseline and autoencoder features, LR achieved the highest AUC score (0.765), followed by SVM (0.746), SMR (0.690), and GBRT (0.670). The only model that performed better under different circumstances was GBRT, which achieved an AUC score of 0.676 using only contralateral coupling and only autoencoder features. The results were then compared to the DWI-FLAIR method previously proposed by Ho et al. [36], whose AUC scores were all less than 0.67; the comparison showed that the model best suited for time since stroke (TSS) classification was LR with baseline imaging features plus the deep autoencoder features. The use of subgrouping and feature correlation improved the performance of the models.
However, only about 180 images were available for the experiment, and such a small sample size can yield skewed results from lack of variety. Small datasets are common among stroke and heart attack studies, which can harm the generalizability of machine learning algorithms: the model may overfit the training set, and new instances fed to the model will be classified with high error.
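
The tenfold split used in the study can be illustrated from first principles (a sketch, not the authors' code): each of the 131 instances is held out exactly once, so every performance estimate comes from data the model never saw during that fold's training.

```python
def kfold_indices(n_samples, k=10):
    """Return k (train, test) index pairs covering every sample once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        test_set = set(test)
        train = [i for i in range(n_samples) if i not in test_set]
        folds.append((train, test))
        start += size
    return folds

folds = kfold_indices(131, k=10)          # 131 samples, as in the study
covered = sorted(i for _, test in folds for i in test)
```

Averaging a metric such as AUC over the ten held-out folds gives a less optimistic estimate than training-set evaluation, which is exactly the overfitting check the paragraph above describes.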

Machine learning algorithm for stroke disease classification (2020)

The goal of Badriyah et al. [42] was to improve the performance of machine learning models trained on computed tomography (CT) scans. The authors used a dataset of 233 CT scans of 102 patients from a hospital in Indonesia. Only 7 of the scans belonged to the hemorrhagic class, and the remaining 226 belonged to the ischemic class, indicating severe class imbalance. The scans were preprocessed to refine image quality using many techniques, such as data augmentation, grayscaling, cropping, data conversion, and scaling. The following features were extracted from the images using a gray-level co-occurrence matrix (GLCM): contrast, dissimilarity, homogeneity, correlation, angular second moment (ASM), and energy. The scans were used as input to eight models for comparison: K-Nearest Neighbors (KNN), Naïve Bayes (NB), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Multilayer Perceptron (MLP) neural network, Deep Learning (DL) [43], and Support Vector Machine (SVM). The models were evaluated on accuracy, precision, recall, and F1 score. The model with the highest and most balanced precision and recall scores was the RF algorithm; this is important because a model is more useful when it can both predict the target class correctly and find all the objects of a target class. RF showed the best performance for both tenfold and leave-one-out (LOO) cross-validation, with accuracies of 95.67% and 95.97%, respectively. In addition, RF with LOO cross-validation had the highest recall (96.12%) and F1 score (95.39%). However, RF was outperformed by KNN on precision, as KNN achieved a precision of 95%, compared to RF's 94%. This study offered a wide variety of performance metrics and models to compare how well they work with this type of data. However, the use of AUC and precision–recall curves would help establish the models' effectiveness, as the metrics used can vary based on the threshold chosen. The previous study by Ho et al. [36] found that LR was the best option for MRI scans, compared to RF for CT scans here. Future work in relation to this study could include comparing and contrasting the use of CT scans versus MRI scans for the prediction process and determining how their respective features impact the models. Additionally, future work should include class imbalance mitigation, as it was not addressed in the study.
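The GLCM texture features above can be sketched from their definitions (an illustration, not the authors' pipeline, which is not published): a co-occurrence matrix for horizontally adjacent pixels is built, then contrast, homogeneity, ASM, and energy are derived from it.

```python
import numpy as np

def glcm_features(image, levels):
    """Horizontal-neighbor GLCM and a few texture statistics from it."""
    glcm = np.zeros((levels, levels))
    for row in image:                       # count horizontal neighbor pairs
        for a, b in zip(row[:-1], row[1:]):
            glcm[a, b] += 1
    glcm /= glcm.sum()                      # normalize to joint probabilities
    i, j = np.indices(glcm.shape)
    contrast = ((i - j) ** 2 * glcm).sum()          # local intensity variation
    homogeneity = (glcm / (1 + (i - j) ** 2)).sum() # closeness to the diagonal
    asm = (glcm ** 2).sum()                         # angular second moment
    return {"contrast": contrast, "homogeneity": homogeneity,
            "asm": asm, "energy": np.sqrt(asm)}

# A perfectly uniform patch has zero contrast and maximal homogeneity/energy.
flat = np.zeros((4, 4), dtype=int)
feats = glcm_features(flat, levels=2)
```

In practice a library implementation (e.g., scikit-image's `graycomatrix`/`graycoprops`) would be used and multiple offsets and angles aggregated; the single horizontal offset here is a simplification.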

Machine learning prediction of stroke mechanism in embolic strokes of undetermined source (2020)

Kamel et al. [44] used machine learning to classify subtypes of ischemic strokes. The algorithms were trained using the Cornell acute stroke academic registry and tested on cases of embolic stroke of undetermined source (ESUS). Specifically, Kamel et al. [44] sought to identify the proportion of cardioembolic strokes compared to noncardioembolic ones. The data included variables such as demographics, risk factors, vital signs, echocardiograms, and laboratory tests from patients' medical records. This dataset exhibits class imbalance; the class makeup is as follows: 40.5% cardiac embolism, 34.1% undetermined source, 17.1% large-artery atherosclerosis, and 6.1% from other determined sources. They built predictive models for determining whether the source of the stroke is unknown and whether the patient suffered a cardioembolic stroke. The models were put through bias-variance balancing, in which the probabilities were tilted via threshold optimization to create a balanced distribution, and Bayesian updating was used to preserve relevant information. The model used for these predictions was an ensemble learner called the super learner, which included Random Forest, Gradient Boosting, and various types of LR. A super learner combines results from many different models, with the weight of each model determined through tenfold cross-validation. They used only the most correlated variables and evaluated model performance with tenfold cross-validation.
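
The weighting idea behind a super learner can be sketched as follows. The cross-validated accuracies and per-model probabilities are hypothetical, and weighting proportionally to CV accuracy is a simplification: the actual super learner fits the combination weights themselves by cross-validated optimization.

```python
# Hypothetical cross-validated accuracies for the three base learners.
cv_accuracy = {"random_forest": 0.80, "gradient_boost": 0.75, "logistic": 0.70}
total = sum(cv_accuracy.values())
weights = {name: acc / total for name, acc in cv_accuracy.items()}

# Hypothetical per-model probabilities that a stroke is cardioembolic.
probs = {"random_forest": 0.9, "gradient_boost": 0.7, "logistic": 0.6}
ensemble_prob = sum(weights[m] * probs[m] for m in probs)
prediction = "cardioembolic" if ensemble_prob >= 0.5 else "noncardioembolic"
```

Because the weights come from held-out performance rather than training fit, a base learner that overfits contributes less to the final probability, which is the property that makes the super learner robust on small registries like this one.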

The model showed an AUC of 0.85 for predicting whether the stroke came from a cardioembolic source. The study concluded that the most correlated features were age, cardiac disease, location of infarction, and size of the atria. The study used a small and mostly homogeneous population in its data, so it is difficult to generalize the results to other populations. The authors did not use a baseline to compare their results against and used a limited number of performance metrics. They also did not discuss the other subtypes, such as atherosclerosis; future work might explore those subtypes with the same intent. Although the authors used bias-variance balancing to avoid overfitting, they did not incorporate techniques to address the dataset's class imbalance. In addition, an effort could be made to apply super learners to a much larger dataset or to one that includes both numerical and image data.

Utilizing machine learning to facilitate the early diagnosis of posterior circulation stroke (2024)

Posterior circulation stroke (PCS) is a type of stroke that is particularly difficult to diagnose, as CT scans often cannot detect it. Abujaber et al. [45] conducted a study applying machine learning to the diagnosis of PCS, where the goal was to determine which model best aids a physician's diagnosis. The data were retrieved from the National Qatar Stroke Registry at Hamad General Hospital, covered the years 2014 to 2022, and excluded all patients who had suffered hemorrhagic stroke. The patient information in the dataset comprised a total of 29 features, such as vitals upon admission, as well as other factors that contribute to stroke risk, like obesity or smoking. Only 20% of the patients in the study had PCS, so the authors weighted the classes to address potential class imbalance.
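
Class weighting of the kind mentioned above is commonly computed with the "balanced" heuristic used by scikit-learn (n_samples / (n_classes × class_count)); this is a standard stand-in, not necessarily the authors' exact formula.

```python
from collections import Counter

labels = ["PCS"] * 20 + ["other"] * 80    # 20% PCS, as in the study
counts = Counter(labels)
n, k = len(labels), len(counts)

# Balanced weights: rarer classes get proportionally larger penalties.
weights = {cls: n / (k * cnt) for cls, cnt in counts.items()}
# PCS errors now cost 2.5x a typical sample; majority errors cost 0.625x.
```

Passing such weights to a model's loss function (e.g., `class_weight` in scikit-learn estimators) makes each misclassified PCS case count four times as much as a misclassified majority case, matching the 20/80 split.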

The data were separated into training and testing sets, with 80% of the instances used for training and the remaining 20% for testing. After feature scaling with data normalization, the data were fed into five machine learning models, with random undersampling of the majority class applied as an extra measure against class imbalance: XGB, weight-adjusted RF, SVM, Classification and Regression Tree (CART), and LR. The XGB and RF models produced the best results. The XGB model achieved an accuracy of 79%, precision of 0.5, recall of 0.62, F1 score of 0.55, and an AUC of 0.81. The RF model had an accuracy of 83%, precision of 0.72, recall of 0.62, AUC of 0.85, and an F1 score of 0.39. However, RF was deemed an unreliable classifier due to its extremely low F1 score of 0.39, the lowest of all five models. SHAP (SHapley Additive exPlanations) analysis was conducted, which determined that body mass index (BMI), stroke severity on the National Institutes of Health Stroke Scale (NIHSS), random blood sugar, ataxia, dysarthria, diastolic blood pressure, and body temperature were the most important features for classification. SHAP analysis [46] selects the most useful features by calculating their importance based on how a model performs with and without each feature. The mean SHAP values identified in the study are as follows: BMI (1.5), NIHSS baseline (0.43), random blood sugar (0.33), ataxia (0.27), body temperature (0.26), dysarthria (0.25), and diastolic blood pressure (0.25). The study concluded that patients with a BMI over 25 are four times more likely to have PCS, and that ataxia and diastolic blood pressure are key indicators for early diagnosis of PCS, whereas dysarthria is more likely related to other types of stroke. The authors used random undersampling to mitigate class imbalance; however, this could have removed valuable information from the majority class. The models the authors chose generally perform well on imbalanced data [47,48,49,50], so handling class imbalance may not have been necessary. Future work should include a comparison of model performance with and without class imbalance mitigations, such as random undersampling, random oversampling, and the Synthetic Minority Oversampling Technique (SMOTE). In addition, the data in this study were gathered from a single hospital, which limits the generalizability of the results.
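
The SMOTE idea mentioned above can be sketched in a few lines: synthesize minority samples by interpolating between a minority point and one of its nearest minority neighbors. This is a minimal illustration with hypothetical 2-D points, not the imbalanced-learn implementation.

```python
import random

random.seed(0)

def smote(minority, n_new, k=2):
    """Generate n_new synthetic points by neighbor interpolation."""
    synthetic = []
    for _ in range(n_new):
        base = random.choice(minority)
        # k nearest minority neighbors of the chosen base point.
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbor = random.choice(neighbors)
        gap = random.random()   # random position along the connecting segment
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(base, neighbor)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
new_points = smote(minority, n_new=5)
```

Unlike random oversampling, which duplicates existing minority cases, the interpolated points are new, so the classifier sees a denser minority region rather than repeated copies; unlike undersampling, no majority information is discarded.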
