A quantum inspired machine learning approach for multimodal Parkinson’s disease screening

Data description
We used the mPower public research portal, which contains measurements from more than 6,000 participants, including both healthy individuals and people with Parkinson’s disease23. The dataset is available under protected access to certified researchers. It includes common Parkinson’s disease biomarkers: demographic information (such as age, gender, and smoking history) as well as voice recordings, tapping measurements, and gait tracking, all recorded through a smartphone app. We restricted our analysis to participants who completed all three tests (voice, tapping, and gait) in the mPower dataset. Because many participants completed multiple iterations of the same test, we randomly selected a single trial per activity per participant to avoid biasing the data toward those with repeated trials. Including multiple trials from the same participant could skew model performance by overrepresenting that individual’s characteristics, making the model overly tailored to participants with more trials and diminishing its generalizability to broader patient populations24. For model training and testing, we focused on 194 participants who completed the voice, gait, and tapping tests. This subset is diverse, including male and female participants who identified as Caucasian, African American, Hispanic, East Asian, South Asian, and mixed race. It also spans a range of educational backgrounds, with 35% of participants not holding a four-year college degree. We divided these 194 samples into 164 for training and 30 for testing, ensuring a balanced representation of Parkinson’s-affected and healthy individuals.
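The cohort construction above (keep only participants with all three activities, draw one trial per activity at random, then hold out a class-balanced test set) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record schema (`participant`, `activity`, `trial` keys), the fixed seeds, and the equal 15/15 class split of the 30 test samples are all assumptions.

```python
import random
from collections import defaultdict

REQUIRED = ("voice", "tapping", "gait")

def complete_participants(records, required=REQUIRED):
    """IDs of participants with at least one trial of every required activity."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec["participant"]].add(rec["activity"])
    return {pid for pid, acts in seen.items() if set(required) <= acts}

def one_trial_per_activity(records, keep, seed=0):
    """Randomly keep a single trial per (participant, activity) pair."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        if rec["participant"] in keep:
            groups[(rec["participant"], rec["activity"])].append(rec)
    # Iterate over sorted keys so the draw is reproducible for a fixed seed.
    return [rng.choice(groups[key]) for key in sorted(groups)]

def stratified_split(ids, labels, n_test, seed=0):
    """Hold out n_test participants, split equally across classes (assumes balanced classes)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for pid in sorted(ids):
        by_label[labels[pid]].append(pid)
    per_class = n_test // len(by_label)
    test = []
    for cls in sorted(by_label):
        members = by_label[cls][:]
        rng.shuffle(members)
        test.extend(members[:per_class])
    held_out = set(test)
    train = [pid for pid in sorted(ids) if pid not in held_out]
    return train, test
```

Keeping exactly one trial per participant and activity means every individual contributes the same amount of data, which is the overrepresentation concern the paragraph raises.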
Feature selection
Using this dataset, we extracted 64 voice, gait, tapping, and demographic features for each of the 194 participants, who were balanced between healthy individuals and those with Parkinson’s. Each data modality (voice, gait, tapping, demographics) was preprocessed separately to allow modality-specific feature extraction and noise reduction. For gait and tapping data, we extracted both time-domain metrics (e.g., root mean square, standard deviation, tapping counts) and frequency-domain features (e.g., spectral centroid, spectral spread) to capture Parkinson’s-related tremors. Vocal data consisted of 10-second “ahh” recordings, from which we derived pitch, volume, breathiness, and reduced-dimensional MFCCs, while demographic information included age, smoking history, and gender. Feature correlation was managed through random forest–based importance weighting. Additional details on feature extraction appear in the Methods section. We standardized the dataframe using scikit-learn’s StandardScaler so that every feature had a consistent magnitude (zero mean and unit variance). Next, we trained a baseline random forest model to identify the top-performing features for the final qSVM model, selecting features with importance values above the 80th percentile25 (Fig. 1).
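The standardization and percentile-based selection steps can be sketched with NumPy alone. In practice the importance vector would come from the `feature_importances_` attribute of a fitted scikit-learn `RandomForestClassifier`; here it is passed in directly, and using a strict "greater than" comparison against the 80th percentile is an assumption.

```python
import numpy as np

def standardize(X):
    """Z-score each column using the population std (ddof=0), like StandardScaler's defaults."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by zero
    return (X - mean) / std

def select_above_percentile(importances, names, q=80):
    """Keep features whose importance exceeds the q-th percentile of all importances."""
    importances = np.asarray(importances, dtype=float)
    threshold = np.percentile(importances, q)
    mask = importances > threshold
    return [name for name, keep in zip(names, mask) if keep], threshold
```

With 64 features, an 80th-percentile cut keeps roughly the top fifth of the ranked features.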

Feature importance values for features above the 80th percentile in a baseline random forest model. The importance metric quantifies each variable’s contribution to the model’s predictive ability. Age is the most important feature, with a markedly higher importance score than any other, indicating its dominant role in the model and its potential as a key demographic indicator. Motor-activity features follow, particularly tapping metrics such as Tap Consistency, Left Taps, Total Taps, and Right Taps, which collectively reflect fine motor coordination and variability in tapping behavior. Voice: Spectral Centroid Mean is a notable inclusion, reflecting the role of vocal biomarkers in the analysis. The figure was generated using version 3.9.2 of the Python package Matplotlib (https://pypi.org/project/matplotlib/).
Among the demographic features, age proved to be the most significant, consistent with extensive research showing a heightened risk of neurological disorders in older populations. In the voice analysis, the spectral centroid mean was the best predictor of Parkinson’s disease. This feature is the “center of mass” of a voice signal’s frequency spectrum and corresponds to how sharp or muffled the sound is, reflecting the vocal changes observed in PD. Among the gait features, the root mean square and standard deviation of acceleration in the z direction had the highest feature importance. The root mean square, which measures the magnitude of acceleration, shows how forcefully participants moved up and down while walking, and the standard deviation captures the variability of vertical acceleration, corresponding to the “shaky motion” that is a hallmark of Parkinson’s disease.
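To make the voice and gait features concrete, here is a minimal NumPy sketch of the root mean square, standard deviation, and spectral centroid described above. The non-overlapping, unwindowed frames, the default frame length, and the function names are assumptions for illustration; audio libraries typically compute spectral centroids over overlapping windowed frames.

```python
import numpy as np

def rms(signal):
    """Root mean square: the overall magnitude of the signal."""
    x = np.asarray(signal, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

def spectral_centroid(signal, fs):
    """Magnitude-weighted mean frequency (Hz) of the one-sided spectrum."""
    x = np.asarray(signal, dtype=float)
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    total = mags.sum()
    return float(np.sum(freqs * mags) / total) if total > 0 else 0.0

def spectral_centroid_mean(signal, fs, frame_len=1024):
    """Average the centroid over non-overlapping frames (a 'spectral centroid mean')."""
    x = np.asarray(signal, dtype=float)
    n_frames = x.size // frame_len
    cents = [spectral_centroid(x[i * frame_len:(i + 1) * frame_len], fs) for i in range(n_frames)]
    return float(np.mean(cents))

def gait_z_features(acc_z):
    """Vertical-acceleration features: magnitude (RMS) and variability (std)."""
    return {"rms_z": rms(acc_z), "std_z": float(np.std(acc_z))}
```

A brighter, sharper voice shifts spectral energy upward and raises the centroid, while a muffled voice lowers it.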
For the raw tapping information, the numbers of left taps, right taps, and total taps quantify how many times participants tapped their screen in 20 s, providing a measure of their dexterity. Tapping consistency, quantified by the standard deviation of the time between taps, shows whether participants kept a steady pace throughout the test. In addition to the raw tapping information, the tapping acceleration measurements from the participants’ smartphones were also significant. Among these features, the root mean square (magnitude) of acceleration and the standard deviations of acceleration in each direction capture abrupt movements characteristic of PD. In the frequency domain, the average frequency and spectral centroid reflect the smoothness and consistency of tapping acceleration. Finally, the spectral spread of tapping acceleration serves as an additional indicator of how erratic the tapping motion is, helping to detect tremors.
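The tapping features described above can be sketched as follows. The input encoding (tap timestamps in seconds plus an `"L"`/`"R"` label per tap) is a hypothetical representation, not the mPower schema; spectral spread is computed as the magnitude-weighted standard deviation of frequency around the spectral centroid.

```python
import numpy as np

def tap_features(tap_times, tap_sides):
    """Count-based and timing features from one tapping test (hypothetical input encoding)."""
    t = np.sort(np.asarray(tap_times, dtype=float))
    intervals = np.diff(t)
    return {
        "left_taps": sum(1 for s in tap_sides if s == "L"),
        "right_taps": sum(1 for s in tap_sides if s == "R"),
        "total_taps": len(tap_sides),
        # Tap consistency: spread of inter-tap intervals (lower means a steadier pace).
        "tap_consistency": float(np.std(intervals)) if intervals.size else 0.0,
    }

def spectral_spread(signal, fs):
    """Magnitude-weighted std of frequency (Hz) around the spectral centroid."""
    x = np.asarray(signal, dtype=float)
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    total = mags.sum()
    if total == 0:
        return 0.0
    centroid = np.sum(freqs * mags) / total
    return float(np.sqrt(np.sum((freqs - centroid) ** 2 * mags) / total))
```

Perfectly regular tapping gives a tap consistency of zero, and acceleration energy scattered across many frequencies (as with tremor) produces a larger spectral spread.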
Because the proposed qSVM model is highly sensitive to feature ordering and magnitude26, we multiplied each feature by its importance value and sorted the features in descending order of importance to emphasize the higher-performing ones. We then multiplied all features by a factor of 10 so that their magnitudes were close to 1, enhancing the model’s ability to process the data effectively.
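The weighting, sorting, and rescaling steps above amount to a few array operations. This sketch assumes the input is the standardized matrix restricted to the selected features and that the importances are the random forest values; the function name is hypothetical.

```python
import numpy as np

def weight_sort_scale(X, importances, names, factor=10.0):
    """Weight columns by importance, order them by descending importance, then multiply by factor."""
    X = np.asarray(X, dtype=float)
    imp = np.asarray(importances, dtype=float)
    order = np.argsort(-imp, kind="stable")  # highest-importance feature first
    Xw = X * imp * factor                    # column j is scaled by imp[j] * factor
    return Xw[:, order], [names[i] for i in order]
```

Importance weights are small (they sum to 1 across features), so the weighted values shrink well below 1; the factor of 10 brings them back toward unit magnitude.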
Model architecture
We used quantum support vector machines (qSVMs) because they can accurately classify high-dimensional datasets such as the mPower data, capturing subtle patterns that might otherwise go unnoticed. qSVM models can access high-dimensional quantum Hilbert spaces, allowing them to encode complex relationships more effectively than standard classification models. This richer representation often translates into higher classification accuracy, improved generalization, and more efficient computation; as a result, qSVMs frequently outperform conventional SVM methods, especially on intricate classification tasks27. For the mPower data, which include diverse and complex biomarkers such as voice, gait, tapping, and demographic features, qSVMs can leverage quantum feature mapping to capture subtle, nonlinear interactions between these heterogeneous variables. Researchers have increasingly applied qSVMs in clinical diagnosis28. However, quantum computing in the current noisy intermediate-scale quantum (NISQ) era remains costly, time-intensive, and error-prone29. To address these challenges, our study introduces a quantum-inspired kernel architecture that we simulate on classical hardware and that still outperforms traditional models. Unlike many current qSVM kernels, our model does not rely on entanglement, which is challenging to simulate classically; instead, it uses dynamic angle embedding30 to capture complex data patterns without the overhead of full quantum computation.
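The reason an entanglement-free kernel is cheap to simulate can be illustrated with a plain angle embedding. The sketch below is not the authors' dynamic angle embedding, whose details are not given here; it assumes a simple stand-in where each feature x_j sets an RY(x_j) rotation on its own qubit. Because no entangling gates are applied, the state is a tensor product, and the fidelity kernel factorizes as K(x, x') = prod_j cos^2((x_j - x'_j) / 2). This can be computed in O(d) per pair instead of building a 2^d-dimensional statevector.

```python
import numpy as np

def product_state(x):
    """Statevector of RY(x_j)|0> on each qubit, joined by Kronecker products (no entangling gates)."""
    state = np.array([1.0])
    for xj in x:
        state = np.kron(state, np.array([np.cos(xj / 2), np.sin(xj / 2)]))
    return state

def angle_kernel(X1, X2):
    """Fidelity kernel |<phi(x)|phi(x')>|^2 = prod_j cos^2((x_j - x'_j) / 2) for all row pairs."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    diff = X1[:, None, :] - X2[None, :, :]
    return np.prod(np.cos(diff / 2) ** 2, axis=-1)
```

`product_state` is included only to check the closed form against an explicit statevector on a few qubits. A Gram matrix from `angle_kernel` could then be passed to scikit-learn's `SVC(kernel="precomputed")`. Because the rotation angles are periodic, input magnitudes near 1 keep feature differences within a range where the kernel stays informative.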
Evaluation and comparative analysis
Once we constructed the custom qSVM architecture, we trained and evaluated it on our 194-sample dataset, using the 164-sample training set and 30-sample test set. We then compared its accuracy, area under the receiver operating characteristic curve (ROC/AUC), recall (sensitivity), specificity, and precision with those of current state-of-the-art models in the field. Together, these metrics provide a comprehensive evaluation of diagnostic capability: accuracy measures overall correctness, ROC/AUC summarizes the trade-off between true- and false-positive rates across decision thresholds, recall quantifies the ability to detect actual PD cases, specificity reflects how well false alarms among healthy individuals are avoided, and precision assesses correctness among predicted positives. The benchmarks included architectures that have previously been explored for medical applications, such as neural networks, SVM and qSVM models, and random forests. By benchmarking against these established methods, we provide evidence of the proposed model’s viability and robustness across multiple clinical performance criteria. The comparative results are displayed in Table 1.
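The metrics listed above can all be derived from the confusion-matrix counts, plus a ranking of scores for ROC/AUC. This is a dependency-free sketch assuming labels coded as 1 for PD and 0 for healthy. AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half; this is equivalent to the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) with 1 = PD and 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with labels `[1, 1, 1, 0, 0, 0]` and predictions `[1, 1, 0, 0, 0, 1]`, the model has two true positives, two true negatives, one false positive, and one false negative, so every metric above equals 2/3.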
Due to the extensive feature set and the high data complexity, models that incorporated strong overfitting protections generally performed better (Table 1). The corresponding confusion matrix for the proposed model is presented in Table 2. Among the classical algorithms, logistic regression and the linear SVM demonstrated the best performance. By contrast, complex neural networks tended to overfit, reducing their accuracy. Among alternative qSVMs, the entanglement-heavy ZZ feature map performed poorly, likely because classical simulators cannot emulate entanglement accurately. The Z feature map without entanglement performed better but lacked the complexity necessary to capture the full dimensionality of the data, as reflected in its lower metrics. The proposed kernel, by using quantum rotation gates to encode features into a complex quantum state without requiring entanglement, achieved the highest performance across accuracy, ROC/AUC score and F1 score (Fig. 2), emphasizing its potential as a baseline for future clinical applications.

Radar chart comparing the proposed model with benchmark models across accuracy, precision, recall, F1 score, and ROC/AUC score. The proposed model, shown in red, achieves consistently high values across all five metrics. Linear SVM and Logistic Regression show moderate, balanced scores, while Naive Bayes and Random Forest perform worse, as indicated by their proximity to the center of the chart. Neural network approaches, including DNN and CNN, are competitive, with DNN achieving slightly higher precision and recall. Quantum machine learning models, such as those using the Z Feature Map, perform well on certain metrics like accuracy and F1 score but do not match the proposed model overall. The figure was generated using version 3.9.2 of the Python package Matplotlib (https://pypi.org/project/matplotlib/).
We also applied McNemar’s test to the paired classification outcomes of the proposed qSVM and each benchmark, which yielded p-values ranging from 0.00049 to 0.0625. Most of these p-values fall below 0.05, indicating that the performance differences between our qSVM model and most classical and deep learning benchmarks are unlikely to be due to chance and supporting the robustness of our model’s 90% accuracy. These results suggest that the qSVM effectively captures complex, nonlinear relationships in the data, improving classification performance over traditional methods and supporting the model’s superior predictive power.
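McNemar’s test compares two classifiers on the same test samples using only the discordant cases: those one model classifies correctly and the other does not. The sketch below uses the exact binomial form of the test. This is one common variant; the chi-squared version with continuity correction is another, and which variant was used here is not stated.

```python
from math import comb

def discordant_counts(y_true, pred_a, pred_b):
    """b: model A right and model B wrong; c: model A wrong and model B right."""
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    return b, c

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value: binomial test of min(b, c) under Bin(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With n discordant pairs, the smallest attainable two-sided p-value is 2 / 2^n. For example, `mcnemar_exact(0, 5)` gives 0.0625, so on a small test set a handful of discordant cases limits how low the p-value can go.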
Feature importance in classification
Figure 3 displays the Shapley value (SHAP) plot, illustrating the relative importance of each feature in the model. As anticipated, age emerges as the most influential factor, exerting a strong effect on predictions. Several tapping-related features, such as tap acceleration standard deviation (x, y, z), spectral centroid, and spectral spread, play significant roles, indicating that the variability and frequency content of tapping movements contribute meaningfully to classification. Measures of tap consistency, total taps, and left/right taps also show notable importance, reinforcing the relevance of motor coordination. Gait acceleration metrics, including root mean square (z) and standard deviation (z), further underscore the significance of movement-related biomarkers. Overall, these findings highlight motor-focused data, notably tapping dynamics and gait accelerations, as critical indicators for predicting Parkinson’s disease.
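To show what a Shapley value measures, the sketch below computes exact Shapley values for a small model. It averages each feature’s marginal contribution over all feature orderings, with absent features held at a single baseline value. This brute-force version is exponential in the number of features, and practical tools such as the SHAP library approximate it. It is an illustration of the concept, not the procedure used to produce Fig. 3.

```python
from itertools import permutations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features not yet 'present' keep their baseline value."""
    d = len(x)
    phi = [0.0] * d
    for order in permutations(range(d)):
        z = list(baseline)
        prev = f(z)
        for i in order:
            z[i] = x[i]          # add feature i to the coalition
            cur = f(z)
            phi[i] += cur - prev  # marginal contribution of feature i in this ordering
            prev = cur
    n_orders = factorial(d)
    return [p / n_orders for p in phi]
```

By construction, the values sum to f(x) - f(baseline), so positive values push a prediction toward the disease class and negative values push it away, matching the reading of the SHAP plot.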

Feature importance in Parkinson’s disease detection model. SHAP values indicate feature impact on model predictions, with positive values (right) increasing disease probability and negative values (left) decreasing it. Color intensity represents relative feature importance.