
A fused weighted federated learning-based adaptive approach for early-stage drug prediction


The primary concern of machine vision in assisting physicians and patients is the early-stage prediction of drugs and the monitoring of patients. Owing to rapid technological advancements, machine vision is now widely employed across industries. In this research, an ML technique is employed to predict suitable drugs at an early stage of the disease, providing precise outcomes.

The core novelty of the proposed FWAFL framework lies in its fusion of three key components:

  • Weighted aggregation: Instead of equal averaging, client contributions are weighted based on validation accuracy and data distribution entropy.

  • Adaptive learning: Each client adjusts its learning parameters during training based on local gradient feedback and performance trends (an illustrative sketch of this adjustment follows the list).

  • Fusion Layer: A meta-fusion layer aggregates weighted client models not just at the parameter level but also at the gradient distribution level, enhancing convergence in heterogeneous data conditions.
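To make the adaptive-learning component concrete, the Python sketch below adjusts a client's local learning rate from its recent loss trend. The function name, decay/growth factors, and bounds are illustrative assumptions, not the paper's exact rule.

```python
# Hypothetical sketch of client-side learning-rate adaptation driven by the local
# loss trend; the decay/growth factors and bounds are illustrative assumptions.
def adapt_learning_rate(lr: float, loss_history: list,
                        decay: float = 0.5, growth: float = 1.05,
                        min_lr: float = 1e-5, max_lr: float = 1e-1) -> float:
    """Shrink the rate when local training stagnates, grow it slightly while improving."""
    if len(loss_history) < 2:
        return lr
    if loss_history[-1] >= loss_history[-2]:   # local loss stopped improving
        lr *= decay
    else:                                      # local loss still decreasing
        lr *= growth
    return max(min_lr, min(lr, max_lr))
```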

Fig. 1: Proposed model for early-stage drug response prediction using federated learning.

Figure 1 illustrates the schematic diagram of the proposed Fused Weighted Adaptive Federated Learning (FWAFL) model for early-stage drug prediction in a federated learning setting. Several local clients on the left hold private datasets, so sensitive patient data remain decentralised on the local devices. These clients train their models locally with local weights \(W_i, V_i\) and do not share raw data. The trained local models communicate with a central cloud aggregation server, known as the Fusion Layer, which integrates the local weights it receives. This layer introduces a client-wise adaptive weighting scheme that considers local reliability (measured by local accuracy and entropy), giving more reliable clients a stronger influence on the global model. An adaptive learning rate customises updates for each client according to local gradient dynamics and training patterns. The weight-computation block weights each client's contribution and integrates it into a robust global model, which is then refined iteratively. The final global model is used for drug prediction, providing precise and privacy-preserving decision support for recommending early disease treatment. This architecture demonstrates the model's scalability, privacy protection, and applicability to the heterogeneous data environments inherent in real-world clinical practice.

The architecture uses six hidden layers; empirical benchmarking on the UCI drug review dataset suggested that architectures deeper than four layers give better generalisation without significant overfitting, and a depth of six layers offered the best trade-off between convergence speed and performance. The sigmoid activation function was used because its output is smooth and bounded in [0, 1], matching the binary nature of the classification output (drug present/absent). ReLU-type activations are popular, but their unbounded outputs can be unstable in the federated setting when local data are small or noisy.
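For illustration only, a minimal Keras sketch of one client's network under these design choices (six sigmoid hidden layers, a sigmoid output, MSE loss) is shown below; the layer width of 64 units and the SGD optimiser settings are assumptions, as the paper does not fix them here.

```python
# Illustrative client model: six sigmoid hidden layers and a sigmoid output for the
# binary drug present/absent task. Layer width and optimiser settings are assumptions.
from tensorflow import keras

def build_client_model(n_features: int) -> keras.Model:
    model = keras.Sequential([keras.layers.Input(shape=(n_features,))])
    for _ in range(6):                                       # six hidden layers, as benchmarked
        model.add(keras.layers.Dense(64, activation="sigmoid"))
    model.add(keras.layers.Dense(1, activation="sigmoid"))   # bounded [0, 1] output
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
                  loss="mse", metrics=["accuracy"])          # MSE loss, matching Eq. (3)
    return model
```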

The weight aggregation functions follow a weighted-averaging strategy inspired by FedAvg, augmented with client-specific trust scores. These scores are derived from local validation performance and the entropy of the prediction distribution, so that clients with more informative or balanced data contribute more to the global model. This dynamic adaptation yields faster convergence and better overall generalisation in heterogeneous, non-IID environments, in line with previous work on adaptive federated optimisation.
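A minimal sketch of this trust-weighted aggregation is given below, assuming each client reports a flattened parameter vector, a local validation accuracy, and the entropy of its prediction distribution. The specific trust formula (accuracy multiplied by entropy) is an illustrative assumption, not the paper's exact weighting.

```python
import numpy as np

def fuse_client_models(params, val_accuracies, entropies):
    """FedAvg-style weighted averaging with per-client trust scores (assumed form)."""
    # Assumption: clients with higher local accuracy and more balanced (higher-entropy)
    # prediction distributions receive a larger trust score.
    trust = np.array(val_accuracies) * (np.array(entropies) + 1e-6)
    weights = trust / trust.sum()              # normalise to a convex combination
    stacked = np.stack(params)                 # shape: (n_clients, n_params)
    return weights @ stacked                   # trust-weighted global parameter vector

# Usage with three hypothetical clients:
# global_params = fuse_client_models([w1, w2, w3], [0.91, 0.88, 0.79], [0.67, 0.42, 0.55])
```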

The trained models uploaded to the cloud are then aggregated to build an efficient and intelligent global model. After aggregation, the model is updated across ‘n’ modules to predict diseases in patients more robustly and efficiently. The pseudocode for the server end is given in Table 1.

Table 1 Proposed pseudocode (server end).

where \(k\) denotes the rank of a feature's importance in the dataset, and the corresponding equations are

$$P(LC\mid X)=\sum_{k} w(G, fml)^{k}\, v(G, fml)^{k}$$

$$P(G\mid LC)=\sum_{k} w(G, fml)^{k}\, v(G, fml)^{k}$$

By computing the optimal weights \(w(G, fml)^{k}\) and \(v(G, fml)^{k}\) in both equations, the probability of occurrence of LC given the input features \(X\), and the likelihood of a gene \(G\) being found mutated (\(fml\)), can be predicted, thereby aiding more accurate prediction of lung cancer.

As depicted in Table 2, each of the neural network models proposed by the clients includes six hidden layers in addition to the input and output layers. A sigmoid activation function is used in all neurons present in the hidden layer. Consequently, the proposed system, based on this suggested model, can be characterised as follows:

$$\mathrm{db}_{j^{l}}=\frac{1}{1+e^{-\left(b_{1}+\sum_{i=1}^{m}\omega_{ij}\,r_{i}\right)}},\qquad j=1,2,3,\dots,n$$

(1)

In this equation, \(r_i\) denotes the input data and \(b_1\) the bias term; the multilayer perceptron has \(m\) input neurons and \(n\) hidden-layer neurons indexed by \(j\).

The output-layer activation function is given below:

$$\mathrm{db}_{k}=\frac{1}{1+e^{-\left(b_{2}+\sum_{j=1}^{n}\nu_{jk^{y}}\,\mathrm{db}_{j}\right)}},\qquad k=1,2,3,\dots,r$$

(2)

The variable \(y\) denotes the hidden layer index41.

$$Z=\frac{1}{2}\sum_{k}\left(\tau_{k}-\mathrm{db}_{k^{y=6}}\right)^{2}$$

(3)

In the above equation, \(Z\) denotes the backpropagation error, while \(\tau_{k}\) and \(\mathrm{db}_{k^{y=6}}\) are the expected and predicted outputs, respectively.

In Eq. (4), the weight update is structured as follows:

$$\Delta W \propto -\frac{\partial E}{\partial W}$$

$$\Delta \nu_{j,k^{y}}=-\epsilon\,\frac{\partial E}{\partial \nu_{j,k^{y}}}$$

(4)

The computation of the hidden-layer activations using the sigmoid function is the first step in our model, as formulated in Eq. (1). This activation applies a non-linear transformation to the weighted sum of the model's inputs. These activations are then transmitted to the output layer, where a further sigmoid function, as described in Eq. (2), produces the final prediction from the weighted hidden-layer outputs. To measure the prediction error objectively, the model employs the MSE loss function in Eq. (3), which computes the difference between the predicted and true target values. Finally, weights are updated using gradient descent, as shown in Eq. (4), where each weight is adjusted based on its contribution to the error and the learning rate. This series of computations enables the model to gradually improve its predictions during training.
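The numpy sketch below mirrors Eqs. (1)–(3) for a single forward pass: sigmoid hidden activations, a sigmoid output, and the squared-error loss. The layer sizes, random initialisation, and example inputs are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # smooth, bounded in [0, 1], as in Eqs. (1)-(2)

rng = np.random.default_rng(0)
m, hidden, r_out = 8, 16, 1                     # input, hidden, and output sizes (assumed)

# Weights and biases for six hidden layers plus the output layer (assumed initialisation).
dims = [m] + [hidden] * 6 + [r_out]
W = [rng.normal(scale=0.1, size=(d_in, d_out)) for d_in, d_out in zip(dims, dims[1:])]
b = [np.zeros(d_out) for d_out in dims[1:]]

def forward(r_in):
    a = r_in
    for Wi, bi in zip(W, b):                    # Eq. (1) layer by layer, Eq. (2) at the output
        a = sigmoid(a @ Wi + bi)
    return a

x = rng.random(m)                               # example patient feature vector (illustrative)
tau = np.array([1.0])                           # expected output
db_out = forward(x)                             # predicted output db_k
Z = 0.5 * np.sum((tau - db_out) ** 2)           # Eq. (3): squared prediction error
```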

Table 2 Proposed model pseudocode.

Applying the chain rule, this can be decomposed as:

$$\Delta \nu_{j,k^{y}}=-\epsilon\,\frac{\partial E}{\partial\,\mathrm{db}_{k^{y}}}\times\frac{\partial\,\mathrm{db}_{k^{y}}}{\partial \nu_{j,k^{y}}}$$

(5)

Then, it can be converted to Eq. (6).

$$\Delta \nu_{j,k^{y}}=\epsilon\left(\tau_{k}-\mathrm{db}_{k^{y}}\right)\times\mathrm{db}_{k^{y}}\left(1-\mathrm{db}_{k^{y}}\right)\times\mathrm{db}_{j}$$

$$\Delta \nu_{j,k^{y}}=\epsilon\,\xi_{k}\,\mathrm{db}_{j}$$

(6)

Where,

$$\xi_{k}=\left(\tau_{k}-\mathrm{db}_{k^{y}}\right)\times\mathrm{db}_{k^{y}}\left(1-\mathrm{db}_{k^{y}}\right)$$

This conversion follows from the chain rule. The update rule for the output-layer weights in Eq. (5) is derived from the chain rule of calculus: it expresses the weight update as the product of the derivative of the error with respect to the output and the derivative of the output activation with respect to the weight being updated. This form makes it possible to track how the prediction error propagates back through the network during training.

This expression is developed into a more physically meaningful representation in Eq. (6). This equation states that the weight update is a function of the difference between the actual and predicted outputs, the derivative of the sigmoid activation function at the output, and the activation value of the hidden layer one step back. These terms are multiplied and used to determine how much each weight should change. The equation is then rewritten in a simplified form after the introduction of a temporary variable to denote the gradient at the output neuron, which, when multiplied by the hidden layer activation, gives the weight update. This strategy utilizes the chain rule to decompose complex gradients into more tractable parts, facilitating learning through the layers of the network.

$$\Delta \omega_{i,j}\propto -\left[\sum_{k}\frac{\partial E}{\partial\,\mathrm{db}_{k^{y}}}\times\frac{\partial\,\mathrm{db}_{k^{y}}}{\partial\,\mathrm{db}_{j}}\right]\times\frac{\partial\,\mathrm{db}_{j}}{\partial \omega_{i,j}}$$

$$\Delta \omega_{i,j}=-\epsilon\left[\sum_{k}\frac{\partial E}{\partial\,\mathrm{db}_{k^{y}}}\times\frac{\partial\,\mathrm{db}_{k^{y}}}{\partial\,\mathrm{db}_{j}}\right]\times\frac{\partial\,\mathrm{db}_{j}}{\partial \omega_{i,j}}$$

In the equations above and below, \(\epsilon\) denotes the constant learning rate.

Further, it can be written as

$$\Delta\omega_{i,j}=\epsilon\,\xi_{j}\,\alpha_{i}$$

(7)

$$\nu_{j,k^{y}}^{+}=\nu_{j,k^{y}}+\lambda\,\Delta\nu_{j,k^{y}}$$

(8)

Equation (7) shows how the input-to-hidden layer weights are updated through the backpropagation chain: the error is traced back through the network to estimate the contribution of each input neuron to the final prediction error, and the weight update is the product of this propagated error and the activation of the corresponding input feature. Equation (8) then applies the update rule: the new weight is obtained by adding the previously calculated weight change, scaled by a factor \(\lambda\), to the current weight. This constitutes one complete learning iteration, in which the network continues to reduce the prediction error in each subsequent round. Equation (8) fine-tunes the weights linking the hidden layer to the output layer, while the following equation adjusts the weights connecting the input layer to the hidden layer, enhancing the network's ability to recognise complex relationships within the dataset.

$$\omega_{i,j}^{+}=\omega_{i,j}+\lambda\,\Delta\omega_{i,j}$$

(9)

The last step of the weight update is given by Eq. (9): the new weight is the old weight plus the calculated weight change scaled by \(\lambda\), which acts as a momentum-like factor. Including a fraction of the previous weight change speeds up convergence and smooths the updates, helping to avoid oscillations during training and to reach an optimal solution quickly.
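A compact numpy sketch of the update rules in Eqs. (5)–(9) for one training step is given below, with \(\epsilon\) as the learning rate and \(\lambda\) as the factor that scales the weight change; biases are omitted and the array sizes and values are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
m, n, r = 3, 4, 1                               # input, hidden, output sizes (assumed)
alpha = rng.random(m)                           # input activations alpha_i (illustrative)
w = rng.normal(scale=0.1, size=(m, n))          # input-to-hidden weights omega_{i,j}
v = rng.normal(scale=0.1, size=(n, r))          # hidden-to-output weights nu_{j,k}
tau = np.array([1.0])                           # expected output tau_k
eps, lam = 0.1, 0.9                             # epsilon and lambda (assumed values)

db_j = sigmoid(alpha @ w)                       # hidden activations db_j, cf. Eq. (1)
db_k = sigmoid(db_j @ v)                        # output activation db_k, cf. Eq. (2)

xi_k = (tau - db_k) * db_k * (1.0 - db_k)       # Eq. (6): output error term xi_k
delta_v = eps * np.outer(db_j, xi_k)            # Eq. (6): delta_nu = eps * xi_k * db_j

xi_j = (v @ xi_k) * db_j * (1.0 - db_j)         # error propagated to the hidden layer, xi_j
delta_w = eps * np.outer(alpha, xi_j)           # Eq. (7): delta_omega = eps * xi_j * alpha_i

v = v + lam * delta_v                           # Eq. (8): nu+ = nu + lambda * delta_nu
w = w + lam * delta_w                           # Eq. (9): omega+ = omega + lambda * delta_omega
```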

During the validation stage of the healthcare application, input values are carefully obtained from the patient. The pre-developed and refined model is then retrieved from the cloud, and the input values are evaluated. The primary function of the model is to diagnose potential early disease states in a patient from the input data. At the end of the analysis, the system concludes that a disease is either present or absent. If the response is positive, meaning a disease is present, the system sends an appropriately worded notification informing the user that a disease has been diagnosed. If the response is negative, no disease is present. The entire process is reviewed, and the application proceeds to the next level or provides the corresponding feedback.

Dataset description

The data source for this study is the publicly available Drug Review Dataset from the UCI Machine Learning Repository7. It is a collection of patient-generated, tabular medical data containing user reviews, the drug names mentioned by users, user ratings (1–10), and medical conditions. Here, we concentrate on well-formatted features such as:

  • Condition (categorical; for example, depression, diabetes).

  • Drug Name (categorical).

  • Review Text Sentiment Score (numeric, scaled from raw text).

  • Rating (numerical).

  • Useful Count (numerical).

The dataset was preprocessed as follows (an illustrative code sketch is given after the list):

  • Text fields were cleaned (lowercased and punctuation removed) and sentiment-scored using VADER.

  • Categorical variables were converted to one-hot encodings.

  • Min–max scaling was applied to all numerical variables.

  • Rows with missing values (approximately 2.3% of observations) were removed.
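A hedged pandas/NLTK sketch of these preprocessing steps is shown below; the column names follow the public UCI release, while the file name and cleaning details are assumptions rather than the paper's exact pipeline.

```python
import re
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # requires nltk.download("vader_lexicon")
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("drugsComTrain_raw.tsv", sep="\t")           # assumed file name from the UCI release
df = df.dropna(subset=["condition", "review", "rating"])      # drop rows with missing values (~2.3%)

# Clean the free text (lowercase, strip punctuation) and derive a VADER sentiment score.
df["review"] = df["review"].str.lower().apply(lambda t: re.sub(r"[^\w\s]", " ", t))
sia = SentimentIntensityAnalyzer()
df["sentiment"] = df["review"].apply(lambda t: sia.polarity_scores(t)["compound"])

# One-hot encode the categorical fields and min-max scale the numerical ones.
df = pd.get_dummies(df, columns=["condition", "drugName"])
num_cols = ["rating", "usefulCount", "sentiment"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
```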

The cleaned dataset ultimately contains 50,654 samples, each assigned a binary class label.

Then, the dataset was divided into training (70%) and validation (30%) sets using a stratified split that preserves the class distribution. This binary formulation was used to train a classifier that estimates the effectiveness of a particular drug from patient reviews and the associated metadata.
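For the stratified 70/30 split, a minimal scikit-learn sketch is given below; the synthetic feature matrix and labels are placeholders standing in for the preprocessed dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and binary labels standing in for the preprocessed review data.
rng = np.random.default_rng(7)
X = rng.random((1000, 20))
y = rng.integers(0, 2, size=1000)

# 70% training / 30% validation while preserving the class distribution (stratified split).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
```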
