Explainable machine learning methods for predicting electricity consumption in a long-distance crude oil pipeline


Evaluation of predictive performance

To evaluate the predictive performance of the proposed hybrid model, the GS-XGBoost hybrid model was compared with five other models: MLP, SVM, ELM, RF, and GBRT. Figure 6 presents the histograms of prediction error distributions for the six models. The analysis revealed that the MLP model exhibited the highest mean and standard deviation of prediction errors, followed by SVM, ELM, and RF. The GBRT model exhibited marginally higher mean and standard deviation values than the GS-XGBoost hybrid model. Compared to the other five models, the GS-XGBoost hybrid model demonstrated the smallest mean and standard deviation of prediction errors, indicating superior predictive performance.

To further evaluate the prediction performance of the hybrid model, all models were assessed with four metrics, R², MAPE, RMSE, and MAE; the results are shown in Fig. 7. The MLP model exhibited the worst overall predictive performance, with R², MAPE, RMSE, and MAE values of 0.89, 9.6%, 2923.1, and 1978.5, respectively. The corresponding values for the SVM model were 0.91, 10.3%, 2809.4, and 1906.8; except for MAPE, all SVM metrics were better than those of the MLP model. The ELM model recorded values of 0.94, 8.5%, 2644.3, and 19745.2 for the respective metrics. The RF model achieved 0.95, 7.7%, 2355.7, and 1557.3, and the GBRT model 0.96, 6.2%, 2221.3, and 1492.5. The GS-XGBoost hybrid model outperformed all others, with values of 0.98, 4.1%, 2113.0, and 1431.9. Overall, predictive performance improved from MLP through SVM, ELM, RF, and GBRT to GS-XGBoost, with the GS-XGBoost hybrid model achieving the best fit. Given the wide coverage of the dataset, these results also suggest that the GS-XGBoost hybrid model has strong potential for practical engineering applications.
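
For reference, the four metrics can be computed with their standard definitions, as in the minimal Python sketch below. It assumes the conventional formulas (R² from the residual and total sums of squares, MAPE expressed in percent); the section reports the metric values but does not state the formulas, and the function name `regression_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R², MAPE (%), RMSE and MAE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares (assumed non-zero)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        # MAPE assumes that no observed value is zero
        "MAPE": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
    }


# Illustrative values only, not data from the paper.
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])
```

With these illustrative values every error is ±10, so RMSE and MAE are both 10, and R² is 1 - 400/50000 = 0.992.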

Training times were also recorded. The SVM, ELM, and RF models each completed training within one minute, the GBRT and XGBoost models within two minutes, and the MLP model in five minutes. All of these training times are within an acceptable range, and they are likely to decrease further as computing hardware improves.

To further assess the contribution of Grid Search, the predictive performance of XGBoost was compared with and without Grid Search. Without Grid Search, the R², MAPE, RMSE, and MAE of the XGBoost model were 0.92, 9.1%, 2693.7, and 1805.2, respectively. This is worse than the performance of GS-XGBoost (0.98, 4.1%, 2113.0, and 1431.9), indicating that Grid Search hyperparameter tuning effectively improves the predictive performance of XGBoost.
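
Grid Search evaluates every combination of candidate hyperparameter values and keeps the combination with the lowest validation error. The sketch below shows this exhaustive search in plain Python. In practice, `score_fn` would train an XGBoost regressor (for example `xgboost.XGBRegressor`) with the candidate hyperparameters and return a cross-validated error, and scikit-learn's `GridSearchCV` implements the same search off the shelf. The grid values and the `toy_validation_error` function are hypothetical stand-ins, since the section does not report the searched grid.

```python
import itertools
import math
from typing import Callable, Dict, List, Optional, Tuple


def grid_search(param_grid: Dict[str, List], score_fn: Callable[[dict], float]) -> Tuple[Optional[dict], float]:
    """Evaluate every combination in param_grid and return the one with the lowest score."""
    names = list(param_grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # e.g. cross-validated RMSE of a model trained with params
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid and toy error surface standing in for real XGBoost training.
grid = {"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1, 0.3], "n_estimators": [100, 300]}


def toy_validation_error(p: dict) -> float:
    return (p["max_depth"] - 5) ** 2 + 100 * (p["learning_rate"] - 0.1) ** 2 + abs(p["n_estimators"] - 300) / 100


best, score = grid_search(grid, toy_validation_error)
```

Here all 18 combinations are evaluated and the toy error is minimized by max_depth = 5, learning_rate = 0.1, n_estimators = 300. The cost grows multiplicatively with each added hyperparameter, which is the main trade-off of exhaustive search.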

Diebold-Mariano test

The Diebold-Mariano (DM) test, proposed by Francis Diebold and Robert Mariano in 1995³⁵, is a statistical method for comparing the accuracy of two prediction models. It determines whether one model is significantly more accurate than the other by assessing the difference between their prediction errors, which makes it well suited to evaluating predictive performance. The core idea is to apply a loss function to the prediction errors of both models and to use the differences in loss to construct a test statistic for hypothesis testing. The test can be used with a variety of loss functions and does not require the prediction errors to follow a specific distribution. The theory of the DM test is as follows:

The null hypothesis H₀ states that there is no significant difference in predictive accuracy between the established model and the comparison model, i.e. their expected losses are equal. The alternative hypothesis H₁ states that their predictive accuracy differs significantly. H₀ and H₁ are formulated as:

$${H_0}:E\left[ {L\left( {{\text{e}}_{i}^{1}} \right)} \right]{\text{=}}E\left[ {L\left( {{\text{e}}_{i}^{2}} \right)} \right]$$

(1)

$${H_1}:E\left[ {L\left( {{\text{e}}_{i}^{1}} \right)} \right] \ne E\left[ {L\left( {{\text{e}}_{i}^{2}} \right)} \right]$$

(2)

where L is the loss function of the prediction error and $e_i^m$ ($m = 1, 2$) are the prediction errors of the two compared models at observation i.

The DM test statistic is defined as:

$$DM = \frac{\sum\nolimits_{i=1}^{T} \left( L\left(e_i^1\right) - L\left(e_i^2\right) \right) / T}{\sqrt{s^2 / T}}$$

(3)

where $s^2$ is an estimate of the variance of $d_i = L\left(e_i^1\right) - L\left(e_i^2\right)$ and $T$ is the number of prediction samples.
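
Eq. (3) can be computed directly from two error series. The sketch below assumes a squared-error loss and uses the plain sample variance of $d_i$ for $s^2$, which is appropriate for one-step-ahead forecasts whose loss differentials are not serially correlated; multi-step forecasts would need an autocorrelation-robust variance estimate. Treating $e^1$ as the comparison model and $e^2$ as the proposed model is also an assumption, under which a positive DM means the comparison model has the larger average loss. The error values and function names are illustrative.

```python
import math
from typing import Callable, Sequence


def squared_loss(e: float) -> float:
    return e * e


def dm_statistic(e1: Sequence[float], e2: Sequence[float],
                 loss: Callable[[float], float] = squared_loss) -> float:
    """DM = mean(d) / sqrt(s^2 / T) with d_i = L(e_i^1) - L(e_i^2), as in Eq. (3)."""
    if len(e1) != len(e2) or len(e1) < 2:
        raise ValueError("need two equal-length error series with T >= 2")
    T = len(e1)
    d = [loss(a) - loss(b) for a, b in zip(e1, e2)]
    d_bar = sum(d) / T
    s2 = sum((x - d_bar) ** 2 for x in d) / (T - 1)  # sample variance of d_i
    return d_bar / math.sqrt(s2 / T)


def dm_p_value(dm: float) -> float:
    """Two-sided p-value under the standard normal approximation: 2 * (1 - Phi(|DM|))."""
    return math.erfc(abs(dm) / math.sqrt(2))


# Illustrative errors (not from the paper): comparison model vs. proposed model.
e_comparison = [2.0, -3.0, 4.0, -2.0, 3.0]
e_proposed = [1.0, -1.0, 2.0, -1.0, 1.0]
dm = dm_statistic(e_comparison, e_proposed)
p = dm_p_value(dm)
```

For these illustrative errors the loss differentials are 3, 8, 12, 3, 8, giving DM ≈ 3.97 and a two-sided p-value well below 0.01.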

Finally, the DM value is compared with the critical value $Z_{\alpha/2}$ of the standard normal distribution at a given significance level α. If DM falls outside the interval $[-Z_{\alpha/2}, Z_{\alpha/2}]$, H₀ is rejected, which means that the predictive performance of the established model differs significantly from that of the comparison model; otherwise, H₀ cannot be rejected and the difference is not statistically significant.
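
The decision rule can be expressed as a short check against the standard normal quantiles for the three significance levels considered in this section (1%, 5%, and 10%). This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper names are hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha: float) -> float:
    """Two-sided critical value Z_{alpha/2} of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_h0(dm: float, alpha: float) -> bool:
    """Reject equal predictive accuracy when DM lies outside [-Z_{alpha/2}, Z_{alpha/2}]."""
    return abs(dm) > critical_value(alpha)


for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}: Z={critical_value(alpha):.3f}")
# alpha=0.01: Z=2.576
# alpha=0.05: Z=1.960
# alpha=0.10: Z=1.645
```

A DM value of 2.0, for example, rejects H₀ at the 5% and 10% levels but not at the 1% level, which is how a single statistic can be significant at some levels and not others.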

Table 5 lists the DM values of the proposed model against the five comparison models (MLP, SVM, ELM, RF, and GBRT) at three significance levels (1%, 5%, and 10%). The DM values for the MLP and SVM models exceeded the critical value at the 1% significance level, those for the ELM and RF models exceeded it at the 5% level, and that for the GBRT model exceeded it at the 10% level. The differences between the established model and every comparison model are therefore statistically significant, and because the established model also has the lowest prediction errors, this indicates that the hybrid model offers better prediction performance than the comparison models.

Table 5 DM test results of different models.

Prediction effectiveness

Prediction Effectiveness (PE) is a metric that evaluates the consistency between a model's predictions and the actual observed values. Its purpose is to quantify how effective the model is in the prediction task, to help determine whether the model reliably captures the patterns in the data, and to provide a basis for improving the model³⁶. A higher PE value generally indicates stronger predictive capability, whereas a lower PE value indicates that the model requires further optimization or adjustment. PE was therefore used to evaluate the predictive performance of the proposed model. Figure 8 presents the PE values of the different models: 0.79, 0.83, 0.86, 0.87, 0.89, and 0.93 for the MLP, SVM, ELM, RF, GBRT, and GS-XGBoost hybrid models, respectively. The PE value of the proposed model is higher than those of all comparison models, indicating that it delivers better predictive performance.
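
The section does not give a formula for PE. One formulation used in the combined-forecasting literature, assumed here and not confirmed as the one behind Fig. 8, defines a per-sample accuracy $A_t = 1 - |(y_t - \hat{y}_t)/y_t|$ (set to 0 when the relative error is at least 1) and combines the mean $m_1$ and standard deviation $\sigma$ of $A_t$, with equal weights, as $PE = m_1(1 - \sigma)$. The sketch below implements that assumed definition; the function name is hypothetical.

```python
import math
from typing import Sequence


def prediction_effectiveness(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """PE = m1 * (1 - sigma) of the per-sample accuracy series (ASSUMED definition).

    The paper does not state its PE formula; this follows one formulation
    from the combined-forecasting literature with equal sample weights.
    """
    acc = []
    for t, p in zip(y_true, y_pred):
        rel = abs((t - p) / t)                  # relative error; y_t must be non-zero
        acc.append(1.0 - rel if rel < 1.0 else 0.0)
    n = len(acc)
    m1 = sum(acc) / n                           # mean accuracy
    m2 = sum(a * a for a in acc) / n            # second moment of accuracy
    sigma = math.sqrt(max(m2 - m1 * m1, 0.0))   # std of accuracy; max() guards rounding error
    return m1 * (1.0 - sigma)
```

Under this definition a model is rewarded both for high average accuracy and for consistent accuracy: two models with the same mean accuracy differ in PE if one has more variable errors.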

SHAP analysis

Many machine learning models can achieve high prediction accuracy, but the reasons behind individual predictions and the relationships between model outputs and input features often remain unclear. This lack of transparency limits their practical application and their ability to guide production, so there is a growing demand for model interpretability. SHAP (SHapley Additive exPlanations) is therefore employed to analyze the influence of the input variables on the model output and to elucidate how the input parameters affect the final results, thereby enhancing the interpretability of the model.
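
SHAP attributes a prediction to the input features using Shapley values from cooperative game theory: each feature receives its weighted average marginal contribution over all subsets of the other features, and the contributions sum to the difference between the prediction and a reference output. The brute-force sketch below illustrates that definition on a hypothetical three-feature toy model, under the simplifying assumption that an "absent" feature is replaced by a single baseline value. It is not the algorithm the shap library uses for XGBoost (its `TreeExplainer` computes these values efficiently from the tree structure), and its cost grows as $2^M$ in the number of features M.

```python
import itertools
import math
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[list], float], x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, treating absent features as set to baseline."""
    m = len(x)
    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):  # subset sizes 0 .. m-1 of the other features
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in itertools.combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(m)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(m)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Hypothetical toy model: two positive main effects, one negative effect, one interaction.
def toy_model(v: list) -> float:
    return 3 * v[0] + 2 * v[1] - v[2] + v[0] * v[1]


phi = exact_shapley(toy_model, x=[2, 1, 4], baseline=[0, 0, 0])
```

Here phi is approximately [7, 3, -4]: the interaction term (worth 2) is split equally between the first two features, and the values sum to 6, the difference between the model output at x and at the baseline.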

The absolute SHAP values are used to quantify the impact of the different input parameters on electricity consumption, as illustrated in Fig. 9; the larger the absolute SHAP value, the greater the influence of a parameter on electricity consumption. The factors affecting electricity consumption in the crude oil pipeline, ranked in descending order of influence, are daily transport volume, average pump-out pressure, next station inlet pressure, average converging pressure, ground temperature, next station inlet temperature, and average outlet temperature. Figure 10 further illustrates the relationship between the SHAP values of electricity consumption and the individual influencing factors. As shown in Fig. 10, daily transport volume exhibits a significant positive correlation with electricity consumption: a larger daily transport volume requires the oil pumps to deliver more power to move the crude oil through the pipeline, which increases energy usage. However, the relationship is not simply linear. As the transport volume increases, the growth rate of electricity consumption may gradually slow down, possibly because the pipeline system operates more efficiently at higher throughput. Second, average pump-out pressure is positively correlated with electricity consumption, because a higher pump-out pressure requires the oil pumps to overcome greater pipeline resistance and static pressure differences. This relationship is not entirely linear either: in the low-pressure range, electricity consumption grows slowly, whereas in the high-pressure range the growth may accelerate, particularly as the pump approaches its maximum operating pressure.

Next station inlet pressure and average converging pressure are negatively correlated with electricity consumption. When these pressures are higher, the upstream pump station needs to supply relatively less pressure to maintain the flow, which reduces energy consumption. This relationship is also nonlinear: in the low-pressure range, electricity consumption is more sensitive to changes in these pressures, whereas in the high-pressure range it tends to level off. Ground temperature typically shows a negative correlation with electricity consumption. A higher ground temperature improves the fluidity of the crude oil, lowering its viscosity and the pipeline resistance, and therefore reduces energy consumption. Because ground temperature varies considerably with the seasons, lower ground temperatures in winter are often associated with higher electricity consumption, and higher ground temperatures in summer with relatively lower consumption. Average outlet temperature and next station inlet temperature also generally show a negative correlation with electricity consumption, for the same reason: higher oil temperatures reduce viscosity and pipeline resistance. Again, the relationship is not entirely linear; electricity consumption is more sensitive to temperature changes in the low-temperature range and tends to level off in the high-temperature range.
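
A global ranking such as the one in Fig. 9 is commonly obtained by averaging the absolute SHAP values of each feature over all samples. The sketch below assumes that convention (the section does not state how the per-sample values were aggregated) and uses made-up SHAP values for three of the paper's input features purely for illustration; the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(shap_values: Sequence[Sequence[float]],
                          feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP| over samples (rows = samples, columns = features)."""
    n = len(shap_values)
    importance = [sum(abs(row[k]) for row in shap_values) / n for k in range(len(feature_names))]
    return sorted(zip(feature_names, importance), key=lambda item: item[1], reverse=True)


features = ["daily transport volume", "average pump-out pressure", "ground temperature"]
# Illustrative per-sample SHAP values, not results from the paper.
shap_matrix = [
    [120.0, -40.0, 10.0],
    [-90.0, 60.0, -5.0],
    [150.0, 20.0, 15.0],
]
ranking = rank_by_mean_abs_shap(shap_matrix, features)
```

Here daily transport volume ranks first (mean |SHAP| = 120), followed by average pump-out pressure (40) and ground temperature (10). Taking absolute values before averaging keeps positive and negative contributions from cancelling, so the ranking reflects magnitude of influence; the direction of each effect is read from dependence plots such as Fig. 10.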

The proposed model has performed satisfactorily in practical application, but it has so far been applied only to one crude oil pipeline system in southwestern China. Future work will extend its applicability by deploying it across multiple long-distance crude oil pipeline systems throughout the Southwest region.