A quantum inspired machine learning approach for multimodal Parkinson’s disease screening

Data description

We used the mPower public research portal, which contains measurements from over 6,000 participants, both healthy individuals and people with Parkinson’s disease²³. The dataset is available under protected access to certified researchers. It includes common Parkinson’s disease biomarkers: demographic information (such as age, gender, and smoking history), voice recordings, tapping measurements, and gait tracking, all recorded through a smartphone app. We restricted our analysis to participants who completed all three activity tests (voice, tapping, and gait) in the mPower dataset. Since many participants completed multiple iterations of the same test, we randomly selected a single trial per activity per participant. Including multiple trials from the same participant could skew model performance by overrepresenting that individual’s characteristics, leaving the model overly tailored to participants with more trials and less generalizable to broader patient populations²⁴. For model training and testing, we focused on 194 participants who completed the voice, gait, and tapping tests. This subset is diverse: it includes male and female participants who identified as Caucasian, African American, Hispanic, East Asian, South Asian, and mixed race. The subset also represents a range of educational backgrounds, with 35% of participants not holding a four-year college degree. We divided these 194 samples into 164 for training and 30 for testing, ensuring a balanced representation of both Parkinson’s-affected and healthy individuals.
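The trial selection and the 164/30 split described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the record layout (`participant`, `activity`, `trial`), the function names, the seed, and the toy labels are all assumptions made for the example.

```python
import random
from collections import defaultdict

def pick_one_trial(records, seed=0):
    """Keep one randomly chosen trial per (participant, activity) pair."""
    rng = random.Random(seed)
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["participant"], rec["activity"])].append(rec)
    # Iterate over sorted keys so the result is reproducible for a given seed.
    return [rng.choice(grouped[key]) for key in sorted(grouped)]

def stratified_split(labels, n_test, seed=0):
    """Split participant ids so the test set mirrors the class proportions.

    Note: per-class rounding can make the test set differ slightly from
    n_test when the class sizes do not divide evenly.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pid, label in sorted(labels.items()):
        by_class[label].append(pid)
    test_ids = []
    for label, ids in sorted(by_class.items()):
        rng.shuffle(ids)
        k = round(n_test * len(ids) / len(labels))
        test_ids.extend(ids[:k])
    test_set = set(test_ids)
    train_ids = [pid for pid in sorted(labels) if pid not in test_set]
    return train_ids, sorted(test_ids)

# Hypothetical toy labels: 194 participants, half PD (1) and half healthy (0).
labels = {f"p{i:03d}": i % 2 for i in range(194)}
train_ids, test_ids = stratified_split(labels, n_test=30)
```

With these toy labels the split yields 164 training and 30 test participants, with both classes equally represented in the test set.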

Feature selection

Using this dataset, we extracted 64 voice, gait, tapping, and demographic features for each of the 194 participants, balanced between healthy individuals and those with Parkinson’s. Each data modality (voice, gait, tapping, demographics) was preprocessed individually to ensure modality-specific feature extraction and noise reduction. For gait and tapping data, we extracted both time-domain metrics (e.g., root mean square, standard deviation, tapping counts) and frequency-domain features (e.g., spectral centroid, spectral spread) to capture Parkinson’s-related tremors. Vocal data consisted of 10-second “ahh” recordings, from which we derived pitch, volume, breathiness, and reduced-dimensional MFCCs, while demographic information included age, smoking history, and gender. Feature correlation was managed through random forest–based importance weighting. Additional details on feature extraction appear in the methods section. We normalized the dataframe using scikit-learn’s StandardScaler so that each feature has a consistent magnitude. Next, we trained a baseline Random Forest model to identify the top-performing features for the final qSVM model, selecting features with importance values above the 80th percentile²⁵ (Fig. 1).
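The normalization and percentile-based selection steps can be illustrated with a short standard-library sketch. The importance values below are hypothetical placeholders (in the study they come from the baseline Random Forest), and the linear-interpolation percentile is an assumed convention that the text does not specify.

```python
import statistics

def standardize(column):
    """Z-score a feature column: zero mean and unit (population) variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # assumes the column is not constant
    return [(v - mean) / std for v in column]

def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def select_top_features(importances, q=80):
    """Keep features whose importance lies strictly above the q-th percentile."""
    threshold = percentile(list(importances.values()), q)
    return {name: imp for name, imp in importances.items() if imp > threshold}

# Hypothetical importances for ten placeholder features.
importances = {f"feature_{i}": i / 100 for i in range(1, 11)}
selected = select_top_features(importances)
```

With these placeholder values only the two most important features exceed the threshold.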

Among the demographic features, age proved to be the most significant, consistent with extensive research illustrating a heightened risk of neurological disorders in older populations. In the voice analysis, the spectral centroid mean was the best predictor of Parkinson’s disease. This feature is the “center of mass” of the voice signal’s spectrum and often reflects how sharp or muffled the sound is, in line with the vocal changes observed in PD. Regarding the gait features, the root mean square and standard deviation of acceleration in the z direction had the highest feature importance. The root mean square, corresponding to the magnitude of acceleration, shows how forcefully the participants moved up and down when walking, and the standard deviation shows the variability of vertical acceleration, corresponding to “shaky motion”, a hallmark of Parkinson’s disease.
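To make the root mean square and spectral features concrete, the sketch below computes them for a one-dimensional signal such as z-axis acceleration. It assumes magnitude-weighted definitions of spectral centroid and spread (power-weighted variants also exist) and uses a naive discrete Fourier transform for self-containment; the sampling rate and test signal are hypothetical.

```python
import cmath
import math

def rms(signal):
    """Root mean square: overall magnitude of the signal."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))

def magnitude_spectrum(signal, sample_rate):
    """Naive O(n^2) DFT over the non-negative frequencies."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(signal))
        freqs.append(k * sample_rate / n)
        mags.append(abs(coeff))
    return freqs, mags

def spectral_centroid_and_spread(signal, sample_rate):
    """Magnitude-weighted mean frequency and its weighted standard deviation."""
    freqs, mags = magnitude_spectrum(signal, sample_rate)
    total = sum(mags)
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    spread = math.sqrt(
        sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    )
    return centroid, spread

# Hypothetical test signal: a pure 5 Hz oscillation sampled at 100 Hz for 1 s.
signal = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
centroid, spread = spectral_centroid_and_spread(signal, sample_rate=100)
```

For a pure tone the centroid sits at the tone's frequency and the spread is close to zero; an irregular, tremor-like signal would distribute energy across more frequencies and increase the spread.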

For the raw tapping information, the number of left taps, right taps, and total taps quantify how many times the participants tapped their screen in 20 s, providing a measure of their dexterity. Tapping consistency, quantified by the standard deviation of the time between taps, shows whether they kept a consistent pace throughout the test. In addition to the raw tapping information, the tapping acceleration measurements from the participants’ smartphones were also significant. Among these features, the root mean square (the magnitude of acceleration) and the standard deviations of acceleration in each direction capture abrupt movements characteristic of PD. In the frequency domain, the average frequency and spectral centroid reflect the smoothness and consistency of tapping acceleration. Finally, the spectral spread of tapping acceleration serves as an additional indicator of how erratic the tapping motion is, which helps detect tremors.
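The raw tapping features can be sketched as below. The input layout, a list of `(timestamp_seconds, side)` tuples, is an assumption made for illustration and is not the mPower file format.

```python
import statistics

def tapping_features(taps):
    """Compute tap counts and pace consistency.

    taps: list of (timestamp_seconds, side) tuples, side in {"left", "right"}.
    """
    times = sorted(t for t, _ in taps)
    intervals = [b - a for a, b in zip(times, times[1:])]
    return {
        "left_taps": sum(1 for _, side in taps if side == "left"),
        "right_taps": sum(1 for _, side in taps if side == "right"),
        "total_taps": len(taps),
        # Standard deviation of inter-tap intervals: lower means a steadier pace.
        "tap_consistency": statistics.stdev(intervals) if len(intervals) > 1 else 0.0,
    }

# Hypothetical, perfectly regular session: one tap every 0.25 s for 20 s.
regular = [(i * 0.25, "left" if i % 2 == 0 else "right") for i in range(80)]
features = tapping_features(regular)
```

A perfectly regular tapper has a consistency value of zero; irregular pacing increases it.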

For input into the proposed qSVM model, which is highly sensitive to feature ordering and magnitude²⁶, we multiplied each feature by its importance and sorted the features by importance to emphasize the higher-performing ones. We then scaled all features by a factor of 10 so that each had a magnitude close to 1, enhancing the model’s ability to process the data effectively.
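A minimal sketch of this weighting, ordering, and scaling step is shown below. The feature names and importance values are hypothetical, and sorting in decreasing order of importance is an assumption, since the text does not state the direction.

```python
def weight_sort_scale(sample, importances, scale=10.0):
    """Weight each standardized feature by its importance, order the features
    by decreasing importance (assumed direction), and apply a global scale."""
    order = sorted(importances, key=importances.get, reverse=True)
    values = [sample[name] * importances[name] * scale for name in order]
    return values, order

# Hypothetical standardized sample and importances.
sample = {"age": 1.0, "gait_rms_z": -0.5, "total_taps": 2.0}
importances = {"age": 0.12, "gait_rms_z": 0.05, "total_taps": 0.08}
values, order = weight_sort_scale(sample, importances)
```

Standardized features have magnitudes near 1 and importances of this size are near 0.1, so the factor of 10 brings the weighted values back to roughly unit magnitude.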

Model architecture

We used Quantum Support Vector Machines (qSVMs) because of their capacity to classify high-dimensional datasets accurately and to capture subtle patterns, such as those in the mPower data, that might otherwise go unnoticed. qSVM models can access high-dimensional quantum Hilbert spaces, allowing them to encode complex relationships more effectively than standard classification models. This enhanced representation often translates into higher classification accuracy, improved generalization, and more efficient computations. As a result, qSVMs frequently outperform conventional SVM methods, especially for intricate classification tasks²⁷. For the mPower data, which includes diverse and complex biomarkers such as voice, gait, tapping, and demographic features, qSVMs can leverage quantum feature mapping to capture subtle, non-linear interactions between these heterogeneous variables. Researchers have increasingly applied qSVMs in clinical diagnosis²⁸. However, quantum computing in the current noisy intermediate-scale quantum (NISQ) era remains costly, time-intensive, and error-prone²⁹. To address these challenges, our study introduces a quantum-inspired kernel architecture that we simulate on classical hardware and that still outperforms traditional models. Unlike many current qSVM kernels, our model does not rely on entanglement, which is challenging to simulate classically; instead, it uses dynamic angle embedding³⁰ to capture complex data patterns without the overhead of full quantum computation.
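To illustrate why an entanglement-free angle embedding is cheap to simulate classically, the sketch below evaluates a fidelity kernel for a plain RY angle embedding in closed form. It is not the proposed kernel: the "dynamic" part of the embedding is not specified in this section, so a fixed one-qubit-per-feature RY encoding is assumed. Under that assumption each feature x_i prepares cos(x_i/2)|0> + sin(x_i/2)|1>, the joint state is a product state, and the overlap factorizes per qubit.

```python
import math

def angle_embedding_kernel(x, y):
    """Fidelity kernel |<phi(x)|phi(y)>|^2 for an RY angle embedding
    without entanglement (assumed encoding, one qubit per feature).

    The per-qubit overlap is cos((x_i - y_i) / 2), so the kernel is the
    product of its squares and costs O(number of features) to evaluate.
    """
    k = 1.0
    for xi, yi in zip(x, y):
        k *= math.cos((xi - yi) / 2) ** 2
    return k

def gram_matrix(A, B):
    """Kernel values between every row of A and every row of B."""
    return [[angle_embedding_kernel(a, b) for b in B] for a in A]

# Hypothetical weighted feature vectors.
X = [[0.3, 1.2, -0.7], [0.1, 0.9, 0.4], [1.5, -0.2, 0.0]]
G = gram_matrix(X, X)
```

A Gram matrix of this kind can be handed to a classical SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Evaluating the same kernel with a state-vector simulator would require memory exponential in the number of qubits, which is the overhead the closed form avoids.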

Evaluation and comparative analysis

Once we constructed the custom qSVM architecture, we trained and tested the model on our dataframe of 194 samples (164 for training and 30 for testing). Afterward, we compared its accuracy, ROC/AUC score, recall/sensitivity, specificity, and precision with those of current state-of-the-art models in the field to demonstrate the viability of our approach. Together, these metrics provide a comprehensive evaluation of the model’s diagnostic capabilities: accuracy measures overall correctness, ROC/AUC summarizes the trade-off between true- and false-positive rates, recall (sensitivity) quantifies the ability to detect actual PD cases, specificity reflects how well false alarms among healthy individuals are avoided, and precision assesses correctness among predicted positives. The benchmark models included architectures that have been explored for medical applications in the past, such as neural networks, SVM and qSVM models, and random forests. The comparative results are displayed in Table 1.
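The threshold-based metrics above follow directly from the four confusion-matrix counts, as the sketch below shows. The counts used here are hypothetical and are not the values reported in Table 2.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts for a 30-subject test set (not taken from Table 2).
metrics = classification_metrics(tp=14, fp=2, tn=13, fn=1)
```

ROC/AUC is not included because it requires the model's continuous decision scores rather than a single set of counts.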

Table 1 Performance of various models across accuracy, ROC/AUC, precision, and recall.

Table 2 This confusion matrix summarizes the classification performance of the proposed model on the 30 test subjects.

Due to the extensive feature set and the high data complexity, models that incorporated strong overfitting protections generally performed better (Table 1). The corresponding confusion matrix for the proposed model is presented in Table 2. Among the classical algorithms, logistic regression and the linear SVM demonstrated the best performance. By contrast, complex neural networks tended to overfit, reducing their accuracy. Among alternative qSVMs, the entanglement-heavy ZZ feature map performed poorly, likely because classical simulators cannot emulate entanglement accurately. The Z feature map without entanglement performed better but lacked the complexity necessary to capture the full dimensionality of the data, as reflected in its lower metrics. The proposed kernel, by using quantum rotation gates to encode features into a complex quantum state without requiring entanglement, achieved the highest performance across accuracy, ROC/AUC score and F1 score (Fig. 2), emphasizing its potential as a baseline for future clinical applications.

We also ran McNemar’s tests on the classification outcomes, comparing the proposed model with each benchmark. The resulting p-values ranged from 0.00049 to 0.0625; most fell below 0.05, indicating that the performance differences between our qSVM model and most classical and deep learning benchmarks are unlikely to be due to chance. This supports the robustness of our model’s 90% accuracy and suggests that the qSVM captures complex, nonlinear relationships in the data more effectively than traditional methods.
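For reference, an exact two-sided McNemar test depends only on the discordant pairs, that is, the test subjects that one model classifies correctly and the other does not. The sketch below assumes the exact binomial form of the test; the text does not state which variant was used, and the counts in the example are illustrative rather than taken from the study.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: cases model A classifies correctly and model B incorrectly.
    c: cases model B classifies correctly and model A incorrectly.
    Under the null hypothesis, the smaller count follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Illustrative counts: five discordant subjects, all favoring one model.
p_value = mcnemar_exact(0, 5)
```

With a test set of 30 subjects the number of discordant pairs is necessarily small, which bounds how small the p-value can become. Five discordant subjects that all favor the same model give p = 0.0625, so a comparison can miss the 0.05 threshold even when every disagreement points in one direction.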

Feature importance in classification

Figure 3 displays the Shapley value plot, illustrating the relative importance of each feature in the model. As anticipated, age emerges as the most influential factor, exerting a significant effect across the predictive spectrum. Several tapping-related features—such as tap acceleration standard deviation (x, y, z), spectral centroid, and spectral spread—play significant roles, indicating that variability and frequency components of tapping movements contribute meaningfully to classification. Additionally, measures of tap consistency, total taps, and left/right taps show notable importance, reinforcing the relevance of motor coordination. Gait acceleration metrics, including root mean square (z) and standard deviation (z), further underscore the significance of movement-related biomarkers. Overall, these findings highlight the importance of motor-focused data, notably tapping dynamics and gait accelerations, as critical indicators for predicting Parkinson’s disease.
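As background for reading a Shapley value plot, the sketch below computes exact Shapley values for a toy model by averaging each feature's marginal contribution over all feature orderings. The value function and its numbers are hypothetical, and practical tools usually rely on approximations because the exact computation grows factorially with the number of features; the text does not state which estimator produced Figure 3.

```python
from itertools import permutations

def shapley_values(value_fn, features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            totals[f] += value_fn(with_f) - value_fn(coalition)
            coalition = with_f
    return {f: total / len(orderings) for f, total in totals.items()}

def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    score = 0.0
    if "age" in coalition:
        score += 0.3
    if "tap_std" in coalition:
        score += 0.1
    if "gait_rms_z" in coalition:
        score += 0.1
    if "tap_std" in coalition and "gait_rms_z" in coalition:
        score += 0.2  # interaction shared equally between the two motor features
    return score

features = ["age", "tap_std", "gait_rms_z"]
phi = shapley_values(toy_value, features)
```

The values sum to the difference between the full-model output and the empty-coalition output, which is the property that lets a Shapley plot attribute a prediction across features.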