A machine learning approach for corrosion rate modeling in Patna water distribution network of Bihar

Physicochemical parameters analysis

The results of the statistical analysis of the physicochemical parameters of the flowing water are shown in Table 1. The water quality in the distribution network was monitored regularly throughout the coupon analysis period. pH is a critical measurement because it matters at every stage of the water supply⁶⁵. The water was naturally alkaline (pH > 7) during the coupon analysis, and the observed pH remained within the BIS standard range of 6.5–8.5⁶⁶. The temperature rose from winter to summer and fluctuated between 23 and 28 °C. BIS⁶⁶ does not specify a conductivity limit; conductivity was observed to remain within the WHO standard of 1500 mg/L⁶⁷. The values of alkalinity, TDS, calcium hardness, magnesium hardness, and total hardness were sometimes within the prescribed BIS limits and occasionally exceeded them. These limits are 200 mg/L for alkalinity, 500 mg/L for TDS, 75 mg/L for calcium hardness, 30 mg/L for magnesium hardness, and 200 mg/L for total hardness⁶⁶. Alkalinity and total hardness frequently surpassed the BIS⁶⁶ standards. Alkalinity quantifies the concentrations of ${HCO}_{3}^{-}$, ${CO}_{3}^{2-}$, and ${OH}^{-}$ in water. Insufficient alkalinity can accelerate the deterioration of plumbing systems and increase the risk of heavy metal contamination from pipes and plumbing fixtures⁶⁸. On the basis of hardness, the water is classified as slightly to moderately hard (soft water < 75 mg/L as CaCO₃, slightly hard water 75–150 mg/L as CaCO₃, moderately hard water 150–300 mg/L as CaCO₃, and very hard water > 300 mg/L as CaCO₃)⁵⁰. The sulfate and chloride levels stayed within the prescribed standards of 200 mg/L and 250 mg/L, respectively, at all times⁶⁶. The observed nitrate values were within the BIS limit of 45 mg/L⁶⁶. The dissolved oxygen (DO) values were often less than 5 mg/L. BIS⁶⁶ does not define a DO limit for drinking water, whereas the Indian Council of Medical Research recommends that DO be above 5 mg/L⁶⁹. The average corrosion rate was 1.70 MPY.

Corrosion rate analysis

The coupons were inserted into the water distribution pipe and removed periodically. The surface of the GI plate gradually developed red spots, indicating scaling, as shown in Fig. 6a. After the coupons were cleaned, surface erosion due to corrosion was observed, as shown in Fig. 6b. Scaling and corrosion therefore co-occur in the water distribution pipes. The kinetic analysis of the corrosion rate and scaling percentage of the galvanized iron plates over 315 days, during which the WDN was monitored directly, is shown in Fig. 6c and d. Figure 6d shows that the corrosion rate increased for the first 90 days, decreased until the 225th day, rose again until the 270th day, and then decreased until the 315th day. Figure 6c shows that the scaling percentage increased up to the 180th day, decreased until the 225th day, and increased again until the 315th day.

According to some studies, mineral films deposited on internal pipe surfaces by scaling help prevent the corrosion of metallic surfaces⁷⁰. Owing to the alkaline nature of the water, scaling occurred slowly after the coupons were inserted, and the corrosion reaction rate was high because the coupon surface was clean. This increased the corrosion rate up to the 90th day, as shown in Fig. 6d. Between days 90 and 225, the corrosion rate decreased because the higher percentage of scale deposited on the coupon surface protected it against corrosion. As the scale thickened, it could no longer withstand the water pressure, and some of the deposited scale was washed away by the water flow. This decreased the scale percentage between days 180 and 225, as shown in Fig. 6c, so less scale remained on the GI plate surface. As the scale, which acts as a corrosion inhibitor, decreased, the corrosion rate increased again between days 225 and 270, as depicted in Fig. 6d. Scale deposition on the galvanized iron plate surface subsequently increased between days 225 and 315 (Fig. 6c), which enhanced the inhibitory behavior and decreased the corrosion rate between days 270 and 315, as illustrated in Fig. 6d.

SEM analysis and EDX analysis

Scanning electron microscopy (SEM) creates images by scanning a sample with a focused beam of electrons. When these electrons interact with the atoms in a sample, they produce various signals that provide information about its surface topography⁷¹. Figure 7 displays the SEM and EDX analyses of the GI samples after 15 and 90 days of exposure. The surface morphology shows fewer cavities after 15 days of exposure, as observed at 5000X magnification and a size of 10 µm under high vacuum mode. In contrast, the surface morphology after 90 days of exposure reveals more cavities at the same magnification and size, indicating the disappearance of the zinc coating from the coupon surface.

Energy dispersive X-ray (EDX) is a technique for analyzing the percentage of chemical composition of materials. When electromagnetic radiation strikes a material, it emits X-rays, which are then analyzed via the EDX method⁷². In this study, the samples were first cleaned with appropriate reagents. After that, EDX analysis was conducted. The results of the EDX analysis for galvanized iron after 15 days of exposure revealed the following composition by weight: 4% carbon (C), 20% oxygen (O), 3% sulfur (S), 3% manganese (Mn), 21% iron (Fe), and 49% zinc (Zn) (Fig. 7a). The high percentage of zinc indicates the presence of a zinc coating on the iron surface.

Figure 7b shows the EDX analysis of the GI plate sample after 90 days of exposure to the WDN. The EDX analysis revealed that the composition of the galvanized iron was as follows: 3% carbon (C), 31% oxygen (O), 2% sulfur (S), 1% manganese (Mn), 60% iron (Fe), and 1% zinc (Zn) by weight. The high percentages of iron and oxygen indicate that the zinc coating has diminished. A comparison of the composition before and after exposure revealed that the carbon content decreased from 4 to 3%, the oxygen content increased from 20 to 31%, the iron content increased from 21 to 60%, the sulfur content decreased from 3 to 2%, the manganese content decreased from 3 to 1%, and the zinc content decreased from 49 to 1% by weight.

XRD analysis

X-ray diffraction (XRD) is a nondestructive testing method used to analyze the structure of crystalline materials. By identifying crystal phases, it can provide further information about the chemical composition and compounds of a material⁷³. Before analysis, any byproducts were carefully removed from the sample surface. XRD analysis of the scale deposited on the coupon surface (iron rust) is shown in Fig. 8. The results revealed the presence of chemical compounds, such as calcium silicate (Ca₂SiO₄), magnesium, iron chloride (FeCl₃), and silicon oxide (SiO₂), in the iron rust. The ions present in the water, which also determine its hardness and alkalinity, contribute to the formation of these compounds. Calcium silicate, also called slag, is formed from the reaction between silicon dioxide (SiO₂) and calcium carbonate (CaCO₃)⁷⁴. The presence of iron chloride in the iron rust is evidence of a chemical reaction between the iron coupon and chloride in the WDN. The silicon dioxide in the iron rust may be produced by the reaction of silicon tetrachloride (SiCl₄) with the water. XRD analysis of the rust powder deposited on the sample also revealed the presence of compounds such as zinc sulfate hydroxide (Zn₄SO₄(OH)₆), zinc oxide (ZnO), zinc sulfate (ZnSO₄), and zinc chloride (ZnCl₂). Zinc sulfate hydroxide is the dry form of osakaite (Zn₄SO₄(OH)₆·5H₂O), and zinc oxide, known as zincite, is an insoluble byproduct, whereas zinc chloride and zinc sulfate are soluble⁷⁵,⁷⁶. These byproducts form through the reaction between the zinc in the galvanized iron sample and the supply water, which is responsible for the disappearance of the zinc coating.

Model configuration

In machine learning, hyperparameters are essential because they affect training efficiency, model performance, and complexity. In this study, fivefold cross-validation was applied while tuning the hyperparameters to ensure accuracy, limit overfitting, and achieve convergence. This paper employs GMDH, MPMR, and MARS to predict corrosion rates in water pipelines. In the forward stage of MARS model training, BFs are generated, and potential knots are selected to increase model accuracy. This process evaluates all possible BFs, incorporating terms that minimize the training error, with coefficients estimated using the least-squares method. The procedure terminates upon reaching the predefined number of BFs. However, because all the input variables are considered initially, the resulting model may overfit. To mitigate this, the backward stage prunes terms on the basis of the GCV criterion, as expressed in Eq. (8). For the GMDH and MPMR models, cross-validation was used to tune the hyperparameters, control model complexity, and prevent overfitting.

The GMDH model was trained with the following parameters: an α of 0.6, a maximum of 4 layers, a maximum of 15 neurons per layer, and 85% of the data used for training. Layer 1, the first layer to be trained, had three neurons and a minimum error of 0.0853. In Layer 2, the number of neurons decreased to two, and the minimum error fell to 0.0457. Layer 3, which had only one neuron, reported the lowest error, 0.0413. For the MPMR model, tuned by trial and error, the value of the noise component (ε) is 0.002, and the RBF kernel has parameter values of 1 and 6. The optimal MARS model is expressed in terms of basis functions (BFs). The prediction equation uses 18 basis functions, as shown in Table 5.

$$\begin{aligned} y = & 0.812 - 0.406 \times BF1 - 3.284 \times BF2 - 0.957 \times BF3 \\ & \; - 0.295 \times BF4 + 2.176 \times BF5 + 1.758 \times BF6 - 4.326 \times BF7 \\ & \; + 1.034 \times BF8 + 3.662 \times BF9 - 3.571 \times BF10 - 0.834 \times BF11 \\ & \; + 1.657 \times BF12 + 4.816 \times BF13 - 1.085 \times BF14 - 0.270 \times BF15 \\ & \; - 0.760 \times BF16 - 4.587 \times BF17 - 3.913 \times BF18 \\ \end{aligned}$$

(15)

Table 5 Equation of the proposed model.
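
Equation (15) is a weighted sum of basis functions. In MARS, a basis function is typically a hinge function, max(0, x − t) or max(0, t − x), or a product of such hinges. The following minimal sketch shows how a model of this form could be evaluated. The intercept and the first two coefficients are taken from Eq. (15), but the feature names, knots, and directions are hypothetical placeholders, since the actual basis functions of Table 5 are not reproduced here.

```python
def hinge(x, knot, direction=+1):
    """MARS hinge: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(intercept, terms, sample):
    """Evaluate y = intercept + sum(coef * BF(sample)).

    terms: list of (coef, factors), where factors is a list of
    (feature, knot, direction) tuples whose hinges are multiplied together.
    """
    y = intercept
    for coef, factors in terms:
        bf = 1.0
        for feature, knot, direction in factors:
            bf *= hinge(sample[feature], knot, direction)
        y += coef * bf
    return y


# Hypothetical basis functions (knots and features are invented for illustration).
terms = [
    (-0.406, [("time", 0.5, +1)]),                           # single hinge
    (-3.284, [("time", 0.5, -1), ("temperature", 0.3, +1)]),  # product of hinges
]
sample = {"time": 0.8, "temperature": 0.4}  # assumed normalized inputs
y = mars_predict(0.812, terms, sample)
print(round(y, 4))
```

Because each hinge is zero on one side of its knot, only the basis functions whose knots are "active" for a given sample contribute to the prediction, which is what makes the fitted equation piecewise linear.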

Performance comparison of the proposed models

The performance metrics for the three models, MARS, GMDH, and MPMR, are given in Table 6. They demonstrate that MARS consistently outperforms the other two models on both the training and test datasets. MARS achieves the highest R² values (0.9872 for training and 0.9741 for testing), indicating good predictive accuracy. Additionally, MARS yields the lowest WMAPE, RMSE, and MAE values, suggesting that it provides the most accurate and reliable predictions. In contrast, MPMR performs worst, with the lowest R² and NS values and the highest WMAPE and RMSE values, indicating that its predictions are less accurate and more prone to error. The GMDH model, while better than MPMR, still falls short of MARS, with moderate performance across the metrics. Overall, MARS is the most robust model, offering high accuracy, minimal error, and better reliability in prediction, whereas GMDH and MPMR lag behind, with MPMR being the least effective.

Table 6 Average performance analysis of the proposed models.

Figure 9 presents a model performance error matrix for corrosion rate prediction. The color scale, ranging from green to red, visually represents performance, where green indicates lower errors (better performance) and red signifies higher errors (poor performance). The error matrix demonstrates that MARS outperforms GMDH and MPMR in terms of WMAPE, RMSE, MAE, PI, WI, VAF, and NS, resulting in lower prediction errors and higher accuracy. GMDH delivers moderate performance, balancing MARS and MPMR. MARS achieves a near-perfect Willmott index (WI) during training (0% error), with only a slight increase to 1% error in testing, indicating that its predictions closely follow actual values. However, all the models present high U95 error values (21–22%), suggesting prediction variability. The A20 index further highlights MARS’s superior accuracy, particularly in scenarios with lower errors.
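
For reference, several of the metrics named above can be computed as in the following sketch. The formulas are commonly used definitions and should be read as assumptions, because the paper's own metric equations are not reproduced in this section. In particular, R² is taken here as the squared Pearson correlation and NS as the Nash-Sutcliffe efficiency. The sample values are illustrative only.

```python
import math


def regression_metrics(actual, predicted):
    """Return a dict of common regression metrics (assumed definitions)."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    mean_e = sum(errors) / n
    sse = sum(e * e for e in errors)                   # sum of squared errors
    sst = sum((a - mean_a) ** 2 for a in actual)       # spread of actual values
    spp = sum((p - mean_p) ** 2 for p in predicted)    # spread of predictions
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    see = sum((e - mean_e) ** 2 for e in errors)       # spread of the errors
    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "WMAPE": sum(abs(e) for e in errors) / sum(abs(a) for a in actual),
        "NS": 1.0 - sse / sst,                 # Nash-Sutcliffe efficiency
        "R2": cov * cov / (sst * spp),         # squared Pearson correlation
        "VAF": (1.0 - see / sst) * 100.0,      # variance accounted for, in %
    }


# Illustrative values only, not data from the study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
m = regression_metrics(actual, predicted)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```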

Regression curve

The regression curves compare the training and testing results across the five folds; for each model, the fold with the highest R² is chosen for comparison. The MARS model demonstrates the best performance, with an R² of 0.9962 during training in fold 4 and a slightly lower R² of 0.9941 during testing in fold 2. The GMDH model shows larger errors, with an R² of 0.9822 during training in fold 2 and an essentially unchanged R² of 0.9825 during testing. The MPMR model has the lowest values of the three, with an R² of 0.9742 during training in fold 2 and a decrease during testing in fold 3 to an R² of 0.9702. The GMDH and MPMR models yield more scattered results than the MARS model does; in the case of MARS, most data points align closely with the trend line. MARS consistently outperforms the other two models in the training and testing phases. However, MARS and MPMR perform slightly worse on the testing set than on the training set, which indicates a mild degree of overfitting. Scatter plots comparing the performance of the three models, MARS, GMDH, and MPMR, in predicting corrosion rates are shown in the supplementary file in Fig. 2S.

Model selection using a ranking technique

Rank analysis is essential for comparing and evaluating the three proposed models. Using performance metrics, these models are ranked to identify the best predictor of corrosion rates. During both the training and testing phases, each model is scored on the basis of its performance and error metrics⁷⁷. The scores obtained for each parameter are added separately for training and testing, and the total score is then computed using Eq. (16).

$$\text{Total score} = \sum\limits_{i = 1}^{n} Score_{TR} + \sum\limits_{i = 1}^{n} Score_{TS}$$

(16)

$Score_{TR}$ represents the score assigned to the training set, and $Score_{TS}$ represents the score assigned to the testing set. In rank analysis, the number of parameters is denoted by ‘n’. In this case, n = 10.

The outcome of the comprehensive ranking analysis is shown in Table 7. Among all the models, the MARS model achieved the highest score, 60 out of 60. The GMDH model ranks second with a score of 40, followed by the MPMR model in third position with a score of 20.

Table 7 Model ranking based on score analysis.
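
The scoring behind Eq. (16) can be sketched as follows, under the assumption that, for each metric, the best of the three models receives a score of 3 and the worst a score of 1. This assumption is consistent with the maximum of 60 for n = 10 metrics over two phases, but the exact scoring rule is not spelled out in this section. The metric values below are hypothetical and only two metrics are used to keep the example short.

```python
def rank_scores(metric_table, higher_is_better):
    """Score each model per metric: worst gets 1, best gets the number of models.

    metric_table: {metric_name: {model_name: value}} for one phase.
    higher_is_better: {metric_name: bool}. Ties are not handled in this sketch.
    """
    totals = {}
    for metric, values in metric_table.items():
        # Order models from worst to best so that list position + 1 is the score.
        worst_to_best = sorted(values, key=values.get,
                               reverse=not higher_is_better[metric])
        for position, model in enumerate(worst_to_best):
            totals[model] = totals.get(model, 0) + position + 1
    return totals


# Hypothetical metric values, not those of Table 6.
higher_is_better = {"R2": True, "RMSE": False}
training = {"R2": {"MARS": 0.98, "GMDH": 0.96, "MPMR": 0.94},
            "RMSE": {"MARS": 0.03, "GMDH": 0.05, "MPMR": 0.06}}
testing = {"R2": {"MARS": 0.97, "GMDH": 0.95, "MPMR": 0.93},
           "RMSE": {"MARS": 0.04, "GMDH": 0.06, "MPMR": 0.07}}

score_tr = rank_scores(training, higher_is_better)
score_ts = rank_scores(testing, higher_is_better)
total_score = {model: score_tr[model] + score_ts[model] for model in score_tr}
print(total_score)  # {'MARS': 12, 'GMDH': 8, 'MPMR': 4}
```

With two metrics the maximum is 2 × 3 × 2 = 12; with the ten metrics used in the study, the same rule gives a maximum of 60.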

Regression error characteristic (REC) curve

The models’ performance was evaluated via the area under the curve (AUC) and area over the curve (AOC) of the REC curve. An REC curve plots the error tolerance against the fraction of samples predicted within that tolerance, which makes it the regression counterpart of the ROC curve used for classification. An ideal model lies near the top left corner of the REC plot (Fig. 10a and b). During the training phase, MARS has the largest area under the curve, followed by GMDH and then MPMR, which suggests that MARS predicts a larger share of the training samples within a given error tolerance. A similar trend is observed in the testing phase, where MARS again has the largest AUC, implying that it may generalize better to unseen data than the other models do. For a more accurate comparison, exact AOC values are presented in Fig. 10c. Because the AOC approximates the expected prediction error, lower values indicate better performance. MPMR has the highest AOC values in both training (AOC = 0.0541) and testing (AOC = 0.074), whereas MARS has much lower values in training (AOC = 0.010) and testing (AOC = 0.0147), suggesting that MARS obtained better prediction accuracy. The MARS and GMDH models display relatively consistent performance between the two datasets. Overall, MARS outperforms GMDH and MPMR in both the training and testing phases.
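
As an illustration of how an REC curve and its AOC can be obtained, the sketch below uses the absolute error as the deviation measure and treats the empirical curve as a step function. Both choices are assumptions, as the study's exact procedure is not described here, and the data are illustrative. For a step-function REC curve, the AOC equals the mean absolute error, which is why a smaller AOC indicates a better model.

```python
def rec_curve(actual, predicted):
    """Return (tolerances, accuracy): the fraction of samples whose absolute
    error is within each tolerance."""
    abs_err = sorted(abs(a - p) for a, p in zip(actual, predicted))
    n = len(abs_err)
    tolerances = [0.0] + abs_err
    accuracy = [sum(1 for e in abs_err if e <= t) / n for t in tolerances]
    return tolerances, accuracy


def area_over_curve(tolerances, accuracy):
    """Integrate (1 - accuracy) along the tolerance axis (step function)."""
    aoc = 0.0
    for i in range(1, len(tolerances)):
        width = tolerances[i] - tolerances[i - 1]
        aoc += (1.0 - accuracy[i - 1]) * width
    return aoc


# Illustrative values only, not data from the study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
tolerances, accuracy = rec_curve(actual, predicted)
aoc = area_over_curve(tolerances, accuracy)
print(round(aoc, 4))  # equals the mean absolute error of this sample
```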

Comprehensive measure (COM) analysis

Machine learning models are ranked via a comprehensive measure (COM) that is based on multiple performance metrics. A single performance metric may not reliably identify the best-performing model. Therefore, the COM method was employed in this study to offer a holistic evaluation by combining the R², RMSE, and MAPE metrics into a single parameter. Combining the metrics in this way gives a more complete picture of a model’s performance than any individual metric does. The COM is calculated using the following equation:

$$COM = \left( {\frac{1}{3}\frac{{RMSE_{Training} \times MAPE_{Training} }}{{R_{Training}^{2} }}} \right) + \left( {\frac{2}{3}\frac{{RMSE_{Testing} \times MAPE_{Testing} }}{{R_{Testing}^{2} }}} \right)$$

(17)

Weights of 1/3 and 2/3 are assigned to the training and testing phases, respectively. The testing phase is weighted more heavily because it reflects the model’s generalizability, which helps ensure accurate and reliable results. A lower COM value indicates better overall model performance⁷⁷. Table 8 summarizes the results of the COM analysis. The MARS model achieved the lowest COM value (0.172), indicating superior performance. It was followed by GMDH (COM = 0.374) and MPMR (COM = 0.451). Thus, the MARS model outperforms the MPMR and GMDH models.

Table 8 Model ranking on the basis of the COM result.
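
Equation (17) translates directly into a short function. The input values in the example are hypothetical and are not the values behind Table 8.

```python
def com(rmse_train, mape_train, r2_train, rmse_test, mape_test, r2_test):
    """Comprehensive measure of Eq. (17): 1/3 weight on training, 2/3 on testing."""
    train_term = rmse_train * mape_train / r2_train
    test_term = rmse_test * mape_test / r2_test
    return train_term / 3.0 + 2.0 * test_term / 3.0


# Hypothetical inputs for two models; the lower COM indicates the better model.
com_a = com(0.03, 5.0, 0.99, 0.04, 6.0, 0.97)
com_b = com(0.05, 8.0, 0.96, 0.07, 9.0, 0.94)
print(round(com_a, 3), round(com_b, 3))
```

Because the testing term carries twice the weight of the training term, a model that fits the training data well but generalizes poorly is penalized more strongly than one with balanced errors.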

Wilcoxon signed ranks test

The Wilcoxon signed-rank test is a nonparametric statistical method used to compare two machine learning (ML) models and assess whether their medians differ. This test is an alternative to paired t tests when the data deviate from a normal distribution. It relies on ranking the differences between paired observations⁷⁸. Table 9 indicates that the number of positive ranks and the sum of ranks are greater for positive differences, suggesting that the MARS model outperforms GMDH and MPMR in the training phase. However, the minor difference between positive and negative ranks in the MPMR-GMDH comparison suggests closer competition between these two models during training.

Table 9 Wilcoxon signed rank analysis for the training dataset.

Table 10 shows a statistically significant difference between GMDH and MARS, with a p value of 0.043 (< 0.05) at the 5% significance level. The negative z value indicates that MARS outperforms GMDH. Similarly, the comparison between the MPMR and MARS yields a p value of 0.011 (< 0.05), signifying a statistically significant difference. The negative z value further confirms that MARS outperforms MPMR in terms of performance.

Table 10 Statistical analysis of the Wilcoxon signed-rank test for the training dataset.

For the MPMR-GMDH comparison, the p value of 0.996 (≫ 0.05) and a z value close to zero indicate no statistically significant difference between MPMR and GMDH. The findings confirm that MARS significantly outperforms GMDH and MPMR, as evidenced by the statistically significant p values in the training phase.

Similarly, Table 11 presents the results for the test dataset, indicating that the sum of positive ranks is higher than that of negative ranks. This confirms that MARS outperforms both GMDH and MPMR in the test phase. However, GMDH and MPMR slightly differ in terms of rank. While MARS remains the superior model, the performance gap between MARS and the MPMR is narrower in the test dataset than in the training dataset. This reduction in performance difference may be attributed to the smaller dataset used in the test phase compared with the training dataset.

Table 11 Wilcoxon signed rank analysis for the testing dataset.

The comparisons between GMDH-MARS and MPMR-MARS revealed that the p values (0.063 and 0.058) were slightly greater than the 0.05 threshold (Table 12). This indicates that the differences are not statistically significant at the 5% level. However, at the 10% significance level, MARS can be considered modestly superior to GMDH and MPMR. Conversely, for the MPMR-GMDH comparison, the p value (0.675 ≫ 0.05) confirmed that there was no statistically significant difference between these two models. This suggests that MPMR and GMDH exhibit very similar performances. On the basis of the Wilcoxon signed-rank analysis conducted for both the training and testing phases, MARS consistently achieves the strongest performance among the models evaluated.

Table 12 Statistical analysis of the Wilcoxon signed-rank test for the testing dataset.
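
The quantities reported in Tables 9 to 12 (rank sums, z value, and p value) can be reproduced in outline with the sketch below. It uses the large-sample normal approximation without tie or continuity corrections, which is an assumption and may differ from the statistical software used in the study. The paired absolute errors are hypothetical.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test via the normal approximation.

    Returns (sum of positive ranks, sum of negative ranks, z, two-sided p).
    Zero differences are dropped; tied |differences| share their average rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_pos, w_neg) - mean_w) / sd_w  # based on the smaller rank sum
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p value
    return w_pos, w_neg, z, p


# Hypothetical absolute errors of two models on the same six samples.
errors_model_a = [0.10, 0.20, 0.15, 0.30, 0.25, 0.12]
errors_model_b = [0.05, 0.22, 0.09, 0.20, 0.10, 0.04]
w_pos, w_neg, z, p = wilcoxon_signed_rank(errors_model_a, errors_model_b)
print(w_pos, w_neg, round(z, 3), round(p, 3))
```

Here most differences are positive, meaning the second model has the smaller error on most samples; the z value is negative because it is computed from the smaller of the two rank sums.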

Bland‒Altman plots

The Bland‒Altman plot is a visual tool used to assess the agreement between two machine learning (ML) models by plotting their average predictions against their differences⁷⁹. Figure 11 shows the Bland‒Altman analysis comparing MARS versus GMDH and MARS versus MPMR for the training and testing datasets. These plots provide insights into model agreement by visualizing the mean differences and limits of agreement (LOA). The x-axis represents the average predictions of the two models, whereas the y-axis represents the difference between their predictions. The red line indicates the mean difference between the models, highlighting any systematic bias, whereas the blue lines represent the range where most differences lie. The mean difference (red line) is closer to zero in both MARS versus GMDH and MARS versus MPMR, suggesting minimal systematic bias in the training dataset (Fig. 11a and b). Most data points fall within the LOA, indicating strong overall agreement. However, some outliers, particularly at lower and higher average values, suggest slight variability between the MARS and GMDH models. Additionally, the difference spread appears slightly wider in MARS versus MPMR, implying that these two models exhibit more significant variation in their training predictions than MARS versus GMDH does.

Similarly, the test dataset reveals that the average difference remains near zero, indicating no significant systematic bias (Fig. 11c and d). The spread and range of disagreements appear narrower than those in the training dataset, suggesting improved agreement during testing. While outliers are present, they are less pronounced than they are in the training dataset. The slight average difference confirms the absence of substantial bias between the models. However, some data points exceed the limits of agreement, indicating instances where MPMR and MARS exhibit notable discrepancies. Overall, all the models demonstrate minimal average differences, suggesting that none of the models consistently outperform or underperform relative to the others. The differences tend to be slightly more prominent in the training dataset than in the testing dataset, implying better generalizability and robustness in the testing phase.
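
The quantities plotted in a Bland‒Altman diagram can be computed as in the following sketch. It assumes the conventional limits of agreement, the mean difference ± 1.96 standard deviations of the differences; the section does not state which multiplier was used for Fig. 11. The prediction values are illustrative.

```python
import statistics


def bland_altman(pred_a, pred_b):
    """Return per-sample means and differences, the mean difference (bias),
    and the (lower, upper) limits of agreement."""
    means = [(a + b) / 2 for a, b in zip(pred_a, pred_b)]
    diffs = [a - b for a, b in zip(pred_a, pred_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Illustrative predictions from two models on the same samples.
model_a = [1.0, 2.0, 3.0, 4.0]
model_b = [1.1, 1.9, 3.2, 3.8]
means, diffs, bias, limits = bland_altman(model_a, model_b)
outside = [d for d in diffs if not limits[0] <= d <= limits[1]]
print(round(bias, 4), [round(v, 4) for v in limits], len(outside))
```

Plotting `diffs` against `means`, with horizontal lines at `bias` and at the two limits, gives the layout described above.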

SHapley additive exPlanations (SHAP) analysis

SHAP analysis provides a comprehensive interpretation of the significance of input variables in predicting corrosion rates, offering valuable insights into their contributions⁸⁰,⁸¹. The average SHAP values, as illustrated in Fig. 12a and b, quantify how each parameter influences the model’s predictions, helping to identify the key factors driving corrosion in a running water distribution system. Of all the features, time emerges as the most significant predictor, with a SHAP impact value of + 0.27, indicating that more prolonged exposure leads to increased corrosion rates. This finding aligns with the fundamental principle that protracted interactions with water and environmental conditions accelerate material deterioration. Other crucial factors influencing the corrosion rate include magnesium hardness (+ 0.08), nitrate content (+ 0.07), and total hardness (+ 0.05). Higher magnesium hardness contributes to scale formation, which can either protect against or accelerate localized corrosion. Similarly, elevated nitrate concentrations promote oxidation reactions, potentially increasing corrosion activity, whereas total hardness, influenced by calcium and magnesium ions, affects both scaling tendencies and overall corrosion dynamics. Additional significant contributors include temperature (+ 0.05), calcium hardness (+ 0.03), alkalinity (+ 0.02), dissolved oxygen (DO) (+ 0.02), and chloride content, all of which play critical roles in shaping corrosion behavior. Higher temperatures generally increase corrosion reaction rates by increasing solubility and ion mobility, whereas calcium hardness influences the formation of protective layers on metal surfaces. Moreover, alkalinity and DO facilitate oxidation reactions, accelerating metal deterioration, and the chloride content is well known for promoting pitting corrosion in metal pipelines.

Furthermore, moderate effects were observed for pH (+ 0.02), sulfate content (+ 0.01), electrical conductivity (+ 0.01), and total dissolved solids (TDS) (+ 0.01). Although pH impacts corrosion potential, its influence may be moderated by alkalinity and hardness. Sulfates contribute to localized corrosion, whereas conductivity and TDS indicate the concentration of dissolved ions, which can enhance electrochemical reactions and lead to accelerated corrosion rates. Overall, the SHAP analysis highlights the dominant role of time in corrosion progression, followed by key water chemistry parameters such as Mg hardness, nitrate, total hardness, and temperature. These insights provide valuable guidance for engineers and water resource managers in optimizing monitoring strategies and implementing targeted corrosion control measures, ultimately ensuring the longevity and safety of water distribution networks.
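
To make the idea behind these attributions concrete, the sketch below computes exact Shapley values for a single sample of a small toy model by averaging each feature's marginal contribution over all feature orderings. This is a simplified illustration and not the procedure used to produce Fig. 12: the model, feature values, and baseline are hypothetical, and practical SHAP implementations estimate the "absent feature" effect from a background dataset rather than from a single baseline point.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features not yet 'added' in an ordering take their baseline value.
    """
    features = list(x)
    phi = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for f in order:
            current[f] = x[f]                # switch feature f to its real value
            updated = model(current)
            phi[f] += updated - previous     # marginal contribution of f
            previous = updated
    return {f: total / len(orderings) for f, total in phi.items()}


def toy_model(s):
    # Hypothetical stand-in for a fitted corrosion model (not the MARS model).
    return 0.5 * s["time"] + 0.2 * s["mg_hardness"] + 0.1 * s["time"] * s["nitrate"]


x = {"time": 1.0, "mg_hardness": 0.5, "nitrate": 0.4}
baseline = {"time": 0.0, "mg_hardness": 0.0, "nitrate": 0.0}
phi = shapley_values(toy_model, x, baseline)
print({f: round(v, 4) for f, v in phi.items()})
```

The attributions sum to the difference between the prediction for the sample and the prediction for the baseline, and the interaction term is split equally between the two features involved. Averaging the absolute attributions over many samples gives a global importance ranking of the kind summarized above.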

Experimental validation performance comparison of the proposed models

An additional dataset comprising 32 experimental results obtained at the Environmental Engineering Laboratory of the NITP was used to validate the best-performing model for corrosion rate prediction. These 32 cases are entirely separate from the datasets initially employed for training and testing the machine learning models, which ensures that the model is evaluated on truly unseen data and provides a more rigorous test of its generalizability. Figure 13 illustrates the performance of the MARS model on these cases. Most predictions cluster near the y = x centerline, indicating substantial agreement between the predicted and actual values and suggesting that the model effectively captures the underlying patterns in the data. The MARS model maintains substantial accuracy on the unseen dataset, with an R² value of 0.9816, RMSE = 0.0337, and MAE = 0.0212. This R² is slightly below the training value (0.9872) and comparable to the testing value (0.9741) reported in section “Performance comparison of the proposed models”. The difference is within acceptable limits, underscoring the model’s robustness and reliability in generalizing to new data. Consequently, the findings affirm that the proposed model retains a strong predictive capacity, even when it is applied to cases outside the original training and testing datasets.