A deep learning-based meteorological factors forecast and dynamic risk assessment approach for built cultural heritage

With the advancement of information technology, it has become possible to investigate preventive conservation for built heritage using the Internet of Things (IoT), big data, artificial neural networks, and related techniques. Forecasting meteorological factors and obtaining reliable risk indicator values remain urgent and challenging tasks. In this study, a data-driven preventive conservation framework for early risk warning on historical buildings in large-scale areas is proposed. The details of the framework are shown in Fig. 1. Initially, a large volume of meteorological data corresponding to the locations of the heritage sites was collected and screened with correlation analysis. The most representative factors selected from the original data are then used to calculate the risk value caused by fluctuations in meteorological factors.

Next, by integrating quantitative models for the vulnerability and exposure of built heritage, a comprehensive risk assessment method for historical buildings in the Yangtze River Delta region based on dynamic meteorological factors was established. The principal distinction between the present method and existing approaches lies in its dynamic nature: dynamic meteorological variables are incorporated into the vulnerability and exposure quantification models to construct a time-series-based dynamic risk assessment framework. In this framework, the selected meteorological factor values are not directly used to compute risk; rather, they serve as input data for the proposed deep learning model, which outputs predicted values for these factors. The meteorological risk indicator values are determined by adjusting the model parameters based on forecast values for different future periods. Therefore, the risk early warning provided by the proposed approach is both proactive and dynamic. Finally, visualization techniques are employed to generate risk maps, which offer a clear display to assist heritage protection researchers and managers in making informed decisions. The methods employed in the hybrid conservation framework contribute to its intelligence and user-friendliness. The details of the framework, from data collection and pre-processing to the deep learning-based forecast model and the graphical visualization of risk indicators, are presented in this section.


Data collection

The data used in this study to test the proposed approach were collected within the Yangtze River Delta region in China, which includes Shanghai City, Jiangsu Province, Zhejiang Province, and Anhui Province. The location of this area is shown on the left of Fig. 2. Given its significant historical role in cultural and economic development, this region stands out as one of the most vibrant and open areas in China. Due to the rich intangible cultural heritage resources in this area and the pressing conservation needs, we obtained funding for this research.

The State Administration of Cultural Heritage has listed 6 batches of “National protected cultural heritages”, as published on the official government website of the People’s Republic of China. After selection and statistical analysis, a total of 157 built cultural heritage sites were identified in the Yangtze River Delta region; they are marked as red triangles in the right part of Fig. 2 (an enlarged view of the Yangtze River Delta region). Geographically, this region is a plain river network area with abundant rainfall and poor water discharge, belonging to a subtropical monsoon climate. It is prone to flooding due to frequent occurrences of plum rains, typhoons, and storm surges, especially in the middle and lower reaches of the Yangtze River.

The data for 24 meteorological factors from 133 sampling points over a total of 3653 days (from Jan. 1, 2011, to Dec. 31, 2020) in the Yangtze River Delta region were obtained from the WheatA website. These points are evenly distributed across Jiangsu, Anhui, Zhejiang, and Shanghai, with at least one point in each city. On average, there are 3 to 4 points in each city, with Yancheng having the highest number at 9 points. For example, Table 1 shows the data for the point (116.75°E, 30.75°N) in Huaining County, Anqing City, Anhui Province, on January 1, 2012.

Table 1 Meteorological data of Huaining County, Jan. 1, 2012

The scoring data for each sub-indicator used to quantify the vulnerability and exposure of built cultural heritage were estimated, recorded, and compiled by the team involved in this study according to the standards described in Tables 2 and 3, which are derived from literature, public databases, historical archives, and field surveys. For example, indicators such as the annual average air pollution index and the status of biological erosion were obtained from official environmental monitoring data and field observations, while records of modifications and historical disasters were derived from architectural archives and literature analyses. Hydrological and geological disaster data were sourced from the Yangtze River Delta Science Data Center, which is part of the National Earth System Science Data Sharing Infrastructure and the National Science & Technology Infrastructure of China³⁴. All data were archived in the conservation sheets for built heritage in the Yangtze River Delta region and subsequently processed using the vulnerability and exposure quantification methods.

Table 2 Scoring standards and weights of each sub-indicator for the vulnerability

Table 3 Scoring standards and weights of each sub-indicator for the exposure

Meteorological data screening and correlation analysis

According to the instruction of the Ministry of Urban and Rural Construction and Environmental Protection of the People’s Republic of China, the Office of the Central-South Regional Building Standard Design Cooperation Group and the Climate Data Office of the Beijing Meteorological Center of the National Meteorological Administration jointly compiled the “Standard for building meteorological parameters”³⁵. This standard provides statistical methods and standards for various meteorological parameters, including atmospheric pressure, dry bulb temperature, relative humidity, precipitation, wind speed, sunshine duration, solar radiation intensity, ground temperature, and permafrost.

Research on the damage to buildings caused by meteorological factors and disasters is closely linked to the study of temperature, precipitation, wind speed, and humidity^3,36,37. Excessive temperature and humidity generally lead to the deformation or distortion of wooden buildings, accelerate the deterioration and aging of materials, and increase the activity of molds and fungi. This aggravates the decay and collapse of wooden buildings and causes erosion to masonry buildings. Water damage is one of the most common and destructive types of disasters faced by ancient buildings. Daily rainfall may cause roof leakage, damp walls, and wood rot. Heavy rainfall may cause floods, mudslides, and foundation settlement, resulting in structural damage. Sustained low-level winds can intensify water erosion by allowing water droplets to penetrate stone, thereby exacerbating the weathering of masonry buildings.

Table 1 shows that many of the meteorological factors take very similar values. Using all 24 factors for risk prediction would make the model unnecessarily large and waste computing resources. Therefore, to reduce the number of factors, we conducted a correlation analysis among the 24 meteorological factors using data from Huaining County from 2011 to 2020. Based on the results, the required meteorological factors can be screened further.

The Pearson correlation is chosen as the method for correlation analysis in our study. The correlation coefficient can be calculated using the following equation.

$${\rho }_{x,y}=\frac{\,\text{cov}\,(X,Y)}{{\sigma }_{x}{\sigma }_{y}}=\frac{E\left(\left(X-{\mu }_{x}\right)\left(Y-{\mu }_{y}\right)\right)}{{\sigma }_{x}{\sigma }_{y}}$$

(1)

which can be simplified as:

$${r}_{xy}=\frac{n\sum {x}_{i}{y}_{i}-\sum {x}_{i}\sum {y}_{i}}{\sqrt{n\sum {x}_{i}^{2}-{\left(\sum {x}_{i}\right)}^{2}}\sqrt{n\sum {y}_{i}^{2}-{\left(\sum {y}_{i}\right)}^{2}}}$$

(2)

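where ρ_x,y and r_xy represent the correlation coefficient in its population and sample forms, respectively, and cov(X, Y) is the covariance of sequences X and Y. E(·) denotes the expectation, μ_x and μ_y are the means, and σ_x and σ_y are the standard deviations of sequences X and Y, respectively. x_i and y_i are the i-th values of the two sequences, and n is the number of values in each sequence.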

With this method, the correlations among the 24 meteorological factors can be obtained, and redundant information shared by different factors can be identified clearly. A small number of representative factors that drive the risk to built cultural heritage can then be chosen, which reduces the data-processing load in the following steps.
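The screening step can be illustrated with a minimal sketch. It evaluates Eq. (2) directly and then lists factor pairs whose absolute correlation reaches a cut-off. The function names, the 0.9 threshold, and the toy series are assumptions made for illustration; the study does not state its screening threshold, and the numbers below are not taken from its dataset.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation using the computational form of Eq. (2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denom = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    if denom == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return (n * sxy - sx * sy) / denom


def redundant_pairs(factors, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold.

    The threshold is an assumed value, not one reported in the study.
    """
    names = list(factors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(factors[a], factors[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical toy series, used only to exercise the functions.
toy = {
    "t_mean": [3.1, 4.0, 6.2, 9.8, 12.5, 15.1],
    "t_max": [7.0, 8.1, 10.3, 14.0, 16.4, 19.2],
    "wind": [2.4, 1.1, 3.0, 1.6, 2.9, 1.2],
}
for a, b, r in redundant_pairs(toy):
    print(f"{a} ~ {b}: r = {r:.3f}")
```

In a pair flagged this way, one factor carries almost the same information as the other, so keeping a single representative is a plausible way to shrink the model input.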

The proposed LSTM-based deep learning model

A long short-term memory (LSTM) neural network is well suited to forecasting future values of the required meteorological factors, because it is designed to learn dependencies in sequential data. An LSTM-based deep learning model is therefore proposed in this study. The architecture of this meteorological factors forecast model is depicted in Fig. 3.

There are a total of 4 layers in the model. The first 2 are LSTM layers for feature extraction, followed by 2 fully connected (FC) layers that reconstruct the feature vectors and generate the outputs. The activation functions for the LSTM layers and FC layers are tanh and ReLU, respectively. The mean square error (MSE) is used as the loss function. Adam serves as the optimizer of the network.

The observed values of the meteorological factors are the input of the model, and the forecast values are its output. As shown in Fig. 3, the input consists of m factors, while the output consists of n factors. These numbers are set by the user according to the application; in this study, they are determined by the correlation analysis in the previous step. For each factor, the values of the last 30 days form the input sequence, and the model outputs the factor values for the following D_s days, where D_s is the forecast days span. The number of nodes in each layer is adjusted dynamically according to the forecast days span. This is intended to improve the generalization of the model and to reduce the performance degradation caused by a longer forecast span. The number of nodes in the 4 layers is given by the following 4 equations.

$${N}_{L1}=2{D}_{s}+64$$

(3)

$${N}_{L2}=2{D}_{s}+32$$

(4)

$${N}_{F1}={D}_{s}+10$$

(5)

$${N}_{F2}={D}_{s}$$

(6)

where D_s is the forecast days span, and N_L1, N_L2, N_F1, and N_F2 are the numbers of nodes in the 4 layers, in order. Because the last layer is the output layer, its number of nodes equals the forecast days span.
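As a rough illustration of how the layer widths and the training samples depend on the forecast span, the sketch below computes the four node counts for a given D_s and slices one factor's daily series into 30-day inputs with D_s-day targets. The function names and the stride of one day between windows are assumptions; the study does not describe how its samples were cut.

```python
def layer_sizes(forecast_days):
    """Node counts of the four layers for a forecast span D_s."""
    if forecast_days < 1:
        raise ValueError("forecast span must be at least one day")
    return {
        "lstm1": 2 * forecast_days + 64,  # Eq. (3)
        "lstm2": 2 * forecast_days + 32,  # Eq. (4)
        "fc1": forecast_days + 10,        # Eq. (5)
        "fc2": forecast_days,             # output layer: one node per forecast day
    }


def make_windows(series, forecast_days, history_days=30):
    """Split a daily series into (input, target) pairs.

    Each input holds `history_days` consecutive values and each target
    holds the `forecast_days` values that follow it. Windows advance by
    one day, which is an assumed stride.
    """
    samples = []
    last_start = len(series) - history_days - forecast_days
    for start in range(last_start + 1):
        split = start + history_days
        samples.append((series[start:split], series[split:split + forecast_days]))
    return samples


print(layer_sizes(15))
print(len(make_windows(list(range(3653)), forecast_days=15)))
```

For the longest span considered in the study (15 days), the widths come out as 94, 62, 25 and 15 nodes.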

The collected dataset consists of values of different meteorological factors, so it must be normalized before being used as the input of the model. The widely used min-max normalization method is applied to the values of each factor. For every factor, the minimum value is mapped to 0, the maximum value to 1, and every other value to a decimal between 0 and 1. This can be expressed by the following equation.

$$f^{\prime} =\frac{f-MinValue}{MaxValue-MinValue}$$

(7)

where f and $f^{\prime}$ represent the factor value before and after the normalization. At the end of the process, denormalization is also needed to get the forecast values of the meteorological factors.
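A minimal sketch of the normalization in Eq. (7) and its inverse is given below. The function names are illustrative. Whether the minimum and maximum are taken over the whole record or over a training subset is not stated in the text, so the caller supplies the values used for fitting.

```python
def fit_min_max(values):
    """Return (MinValue, MaxValue) of one factor's values."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("a constant factor cannot be min-max normalized")
    return lo, hi


def normalize(values, lo, hi):
    """Eq. (7): map each value f to f' = (f - MinValue) / (MaxValue - MinValue)."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Inverse of Eq. (7): recover f from f'."""
    return [s * (hi - lo) + lo for s in scaled]


temps = [2.0, 4.0, 10.0]          # hypothetical values of one factor
lo, hi = fit_min_max(temps)
scaled = normalize(temps, lo, hi)
print(scaled)                      # [0.0, 0.25, 1.0]
print(denormalize(scaled, lo, hi))
```

The same `lo` and `hi` must be kept for each factor so that the model's outputs can be mapped back to physical units.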

To evaluate the effectiveness of the proposed model, forecast accuracy (ACC), mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and processing time are used. The first 4 evaluation criteria are defined by the following equations.

$$ACC=1-\frac{\mathop{\sum }\nolimits_{i = 1}^{n}\left(\frac{\left\vert {v}_{r,i}-{v}_{p,i}\right\vert }{{v}_{r,i}}\right)}{n}$$

(8)

$$MSE=\frac{\mathop{\sum }\nolimits_{i = 1}^{n}{({v}_{r,i}-{v}_{p,i})}^{2}}{n}$$

(9)

$$RMSE=\sqrt{\frac{\mathop{\sum }\nolimits_{i = 1}^{n}{({v}_{r,i}-{v}_{p,i})}^{2}}{n}}$$

(10)

$$MAE=\frac{\mathop{\sum }\nolimits_{i = 1}^{n}\left\vert {v}_{r,i}-{v}_{p,i}\right\vert }{n}$$

(11)

where v_r,i and v_p,i represent the actual value of the meteorological factor and the forecast one, respectively. The number of forecast values is n.
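The four error measures can be computed together, as in the sketch below. The function name is illustrative. MAE is taken as the mean of the absolute errors, which is its standard definition, and ACC follows Eq. (8), so it is undefined whenever an actual value is zero (for instance precipitation on a dry day); the sketch raises an error in that case rather than guessing how the study handled it.

```python
import math


def forecast_metrics(actual, predicted):
    """Return ACC, MSE, RMSE and MAE for paired actual/forecast values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in actual):
        raise ValueError("ACC in Eq. (8) is undefined for zero actual values")
    errors = [r - p for r, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    return {
        "ACC": 1 - sum(abs(e) / r for e, r in zip(errors, actual)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
    }


# Hypothetical values, used only to exercise the function.
print(forecast_metrics([10.0, 20.0, 40.0], [12.0, 18.0, 40.0]))
```

In this toy case the errors are -2, 2 and 0, so they cancel in a plain sum but not in the absolute sum, which is why MAE is 4/3 rather than 0.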

Risk assessment method

In the proposed approach, the risk assessment includes three parts: the meteorological risk, vulnerability and exposure. The meteorological risk assessment is derived from the forecast values of the deep learning model. The vulnerability and exposure quantification methods were developed based on the characteristics of built heritage.

Once the forecast values of the representative meteorological factors are available, the meteorological risk assessment can be conducted. Meteorological data are difficult to quantify directly as a risk indicator, and the rationality of such direct processing cannot be guaranteed. Thus, a practical and simplified risk assessment method is needed for early risk warning for built cultural heritage in large-scale areas.

After the correlation analysis and selection, the meteorological risk indicator is derived from the selected representative meteorological factors. Because the factors differ in both meaning and numerical range, the first task is normalization. Owing to the characteristics of meteorological data, some extreme values can distort the numerical distribution, so the min-max normalization method is not suitable for this purpose in our study. Instead, we put forward a risk assessment method for each factor based on heritage protection practice and a large number of references^{2,25,38,39,40,41}. The risk indicator value of each factor is divided into 6 levels from 0 to 1 (0, 0.2, 0.4, 0.6, 0.8 and 1). Taking air temperature as an example, excessively high or low temperatures pose a high risk to historical buildings, while relatively mild temperatures correspond to low risk.

In this study, the weighting of the meteorological factors is based on the fuzzy integrated evaluation method and also takes into account the extreme indicator values of each factor. The general risk indicator can be calculated by the following equation:

$${R}_{m}=\mathop{\sum }\limits_{i=1}^{n}{w}_{i}\times {r}_{i}(f)$$

(12)

where r_i(f) gives the indicator value of the i-th factor and f represents the value of that factor. The indicator value takes one of the 6 levels listed for each factor in Table 4. The factors are ordered by their risk indicator values, so the factor with the maximum value is ranked first and the one with the minimum value last. w_i is the weight of the i-th meteorological factor, and n is the total number of meteorological factors, which is 5 in this study. The weights are set as 0.30, 0.25, 0.20, 0.15 and 0.10 from the first factor to the last, respectively. In this way, factors with higher risks are given higher weights than those with lower risks, which is consistent with the aim of highlighting high-risk points for early risk warning.
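The rank-based weighting in Eq. (12) can be sketched as follows. The function name is illustrative, and the indicator values passed in are hypothetical; in the study they would come from the level tables for each factor.

```python
# Weights from the text, applied from the highest-risk factor to the lowest.
RANK_WEIGHTS = (0.30, 0.25, 0.20, 0.15, 0.10)


def meteorological_risk(indicators):
    """Eq. (12): weighted sum of indicator values ranked in descending order."""
    if len(indicators) != len(RANK_WEIGHTS):
        raise ValueError("expected one indicator value per weight")
    if any(not 0.0 <= r <= 1.0 for r in indicators):
        raise ValueError("indicator values must lie between 0 and 1")
    ranked = sorted(indicators, reverse=True)  # maximum first, minimum last
    return sum(w * r for w, r in zip(RANK_WEIGHTS, ranked))


# Hypothetical indicator levels for the 5 factors at one sampling point.
print(round(meteorological_risk([0.2, 1.0, 0.4, 0.0, 0.6]), 2))  # 0.56
```

Because the weights follow the ranking rather than a fixed factor order, a single factor at level 1 always contributes 0.30 to R_m, whichever factor it is.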

Table 4 Risk indicator values of meteorological factors

The vulnerability of built cultural heritage is a comprehensive indicator for assessing the risk of damage to heritage buildings under natural or anthropogenic environmental influences. Based on the material characteristics, structural integrity, and historical condition of built cultural heritage in the Yangtze River Delta region, as well as previous research works of architectural vulnerability^28,42,43,44, we have developed an integrated quantitative method encompassing three primary indicators: “Material Degradation Sensitivity”, “Structural Integrity”, and “Historical State”, each of which is further divided into multiple sub-indicators. The weight of each indicator is determined using the Delphi method, which will be introduced later in this subsection.

The comprehensive vulnerability quantification formula is expressed as follows:

$${R}_{v}={w}_{m}\times M+{w}_{s}\times S+{w}_{h}\times H$$

(13)

where R_v represents the comprehensive vulnerability of built cultural heritage. M, S, and H represent the quantified scores of these three primary indicators. w_m, w_s, and w_h denote the weights of the three primary indicators: material degradation sensitivity, structural integrity, and historical state, respectively, shown as the values in the last column in Table 2.

Material degradation sensitivity assesses the risk of material deterioration in the natural environment due to the material characteristics of the heritage structure. The quantification formula is:

$$M=\mathop{\sum }\limits_{i=1}^{n}\left({w}_{M,i}\times \frac{{M}_{i}}{{M}_{i,max}}\right)$$

(14)

where M_i represents the score of the i-th sub-indicator, including material type, surface color, annual air pollution index, biological erosion status, complexity of decoration, and types of restoration materials (the second column in Table 2). M_i,max is the maximum possible score for the i-th sub-indicator, and w_M,i represents the weight of the i-th sub-indicator, satisfying $\mathop{\sum }\nolimits_{i = 1}^{n}{w}_{M,i}=1$.

Structural integrity reflects the vulnerability of built cultural heritage in terms of structural stability. The quantification formula is:

$$S=\mathop{\sum }\limits_{j=1}^{m}\left({w}_{S,j}\times \frac{{S}_{j}}{{S}_{j,max}}\right)$$

(15)

where S_j represents the score of the j-th sub-indicator, including structural simplicity, number of floors, foundation/roof type, and load distribution uniformity (the second column in Table 2). S_j,max is the maximum possible score for the j-th sub-indicator, and w_S,j represents the weight, satisfying $\mathop{\sum }\nolimits_{j = 1}^{m}{w}_{S,j}=1$.

The historical state evaluates the vulnerability of built cultural heritage based on construction age, historical disaster experience, and modifications. The quantification formula is:

$$H=\mathop{\sum }\limits_{k=1}^{p}\left({w}_{H,k}\times \frac{{H}_{k}}{{H}_{k,max}}\right)$$

(16)

where H_k represents the score of the k-th sub-indicator, including modifications, construction period, and historical disaster records (the second column in Table 2). H_k,max is the maximum possible score for the k-th sub-indicator, and w_H,k represents the weight, satisfying $\mathop{\sum }\nolimits_{k = 1}^{p}{w}_{H,k}=1$.
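Eqs. (14) to (16) share one form, a weighted sum of scores scaled by their maximum, and Eq. (13) combines the three results. The sketch below reflects that structure. All scores and weights in the example are hypothetical placeholders, since the actual values belong to the study's scoring tables, and the function names are illustrative.

```python
def weighted_index(scores, max_scores, weights):
    """Shared form of Eqs. (14)-(16): sum of w_i * score_i / max_score_i."""
    if not (len(scores) == len(max_scores) == len(weights)):
        raise ValueError("scores, max_scores and weights must align")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("sub-indicator weights must sum to 1")
    return sum(w * s / m for w, s, m in zip(weights, scores, max_scores))


def vulnerability(m, s, h, w_m, w_s, w_h):
    """Eq. (13): combine the three primary indicators."""
    return w_m * m + w_s * s + w_h * h


# Hypothetical sub-indicator scores, maxima and weights for one site.
M = weighted_index([2, 4], [4, 4], [0.5, 0.5])        # material degradation
S = weighted_index([1, 3], [2, 6], [0.6, 0.4])        # structural integrity
H = weighted_index([5, 5, 5], [5, 5, 5], [0.2, 0.3, 0.5])  # historical state
print(M, S, H, vulnerability(M, S, H, 0.5, 0.3, 0.2))
```

Dividing each score by its maximum keeps M, S and H between 0 and 1 as long as the sub-indicator weights sum to 1.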

Exposure is a crucial metric for evaluating the potential threat level of built cultural heritage due to natural environments, human activities, or disasters. This study dynamically quantifies the risk associated with natural environments. Thus, the exposure quantification mainly considers human activities and disaster risks arising from the location of heritage sites. The relative magnitude of exposure is determined through the comprehensive analysis of these risk indices. The exposure is calculated as follows:

$${R}_{e}={W}_{a}\times A+{W}_{d}\times D$$

(17)

where R_e represents the comprehensive exposure, A is the human activity impact index, and D is the disaster risk index from the location of the heritage site. The consideration of extreme disaster risks here differs from the historical disaster records in the vulnerability quantification method, as it primarily focuses on future risk scenarios. The data is derived from the flood hazard index dataset and the earthquake hazard distribution dataset of the Yangtze River Delta region. W_a and W_d denote the weights of each indicator, shown as the values in the last column in Table 3.

The human activity impact index measures the direct anthropogenic pressure on built cultural heritage, and it is calculated as follows:

$$A={W}_{t}\times T/{T}_{max}+{W}_{u}\times U/{U}_{max}$$

(18)

where T represents the tourism development level of the heritage site, and U denotes the industrial activity intensity surrounding the heritage site, and T_max and U_max mean the maximum values of possible scores. W_t and W_u denote the weights of each sub-indicator, shown in Table 3.

In this study, only the sub-indicators related to meteorological factors from the vulnerability quantification method are extracted. While extreme disasters are associated with meteorological factors, they are considered low-probability events and are thus retained in the exposure quantification method. The extreme disaster risk index is calculated as follows:

$$D={W}_{f}\times F/{F}_{max}+{W}_{g}\times G/{G}_{max}$$

(19)

where F represents the flood risk index, and G denotes the geological disaster risk index. F_max and G_max mean the maximum values of possible scores. W_f and W_g denote the weights of each sub-indicator, shown in Table 3.
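The exposure calculation in Eqs. (17) to (19) nests two weighted pairs inside a third. A compact sketch is shown below; the function names, scores, maxima and weights are hypothetical and stand in for the values defined in the study's scoring tables.

```python
def two_term_index(first, second):
    """Weighted sum of two scaled scores; each term is (weight, score, max_score)."""
    return sum(w * score / max_score for w, score, max_score in (first, second))


def exposure(tourism, industry, flood, geology, w_a, w_d):
    """Eq. (17), built from the human activity and disaster risk indices."""
    a = two_term_index(tourism, industry)  # Eq. (18): A
    d = two_term_index(flood, geology)     # Eq. (19): D
    return w_a * a + w_d * d


# Hypothetical (weight, score, max_score) triples for one heritage site.
r_e = exposure(
    tourism=(0.6, 3, 5),
    industry=(0.4, 1, 5),
    flood=(0.7, 4, 5),
    geology=(0.3, 2, 5),
    w_a=0.5,
    w_d=0.5,
)
print(round(r_e, 2))  # 0.56
```

With these placeholder inputs, A is 0.44 and D is 0.68, giving an exposure of 0.56.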

The weights for vulnerability calculation (w_m, w_s, and w_h) and their corresponding sub-indicators (w_M,i, w_S,j, and w_H,k), as well as the weights for exposure calculation (W_a, W_d, W_t, W_u, W_f, and W_g), are all determined using the Delphi method.

In this study, eight experts participated in the weight assessment process, including three architects with practical experience in built cultural heritage restoration, four scholars specializing in built cultural heritage conservation and restoration research, and one staff member from a cultural heritage protection organization. Experts rated the relative importance of each sub-indicator concerning built cultural heritage vulnerability and exposure, and the final weight distribution results were obtained through multiple rounds of feedback and consistency analysis, shown in Table 5.

Table 5 Experts’ opinions on vulnerability and exposure

Based on the risk quantification methods introduced above, the values of meteorological risk (R_m), vulnerability (R_v) and exposure (R_e) can be derived. To guide preventive protection practices in built cultural heritage, we need to make a risk level classification based on these values. For the three risk categories, high, medium, and low levels are classified based on their quantitative values according to the rules in Table 6. Subsequently, a comprehensive risk classification for overall risk can be determined by applying the rules shown in Table 7. Thus, the risk level of each heritage site can be obtained dynamically according to values of forecast meteorological factors, vulnerability, and exposure, for various future time periods (a maximum of 15 days).

Table 6 Risk level classification

Table 7 Overall risk level classification

Visualization method

With the indicator values of meteorological risk assessment, it is possible to indicate where the high-risk points caused by meteorological factors are, but the raw data is stored in tables with different points marked by longitude and latitude values. Under these circumstances, users without a background in data science may find it difficult to carry out the analysis. Therefore, a visualization method is required to facilitate cultural heritage protection practices.

To fulfill this aim, two-dimensional interpolation of risk values on the map is calculated using the minimum curvature method, which is an interpolation method for terrain surfaces or other data. It uses the principle of minimum curvature to determine the value of an unknown point. The method determines the point with the lowest curvature by calculating the curvature at each location one by one. The value of that point is then the value of the desired interpolation point. It is more efficient than other interpolation methods, such as the triangular interpolation method, and is suitable for interpolating irregular grid data.

The minimum curvature method of 2D interpolation is based on the following curvature minimization principle: given any two points between which a surface is depicted, its curvature is always minimized as one follows that curve. An irregular grid shape can be generated by sampling a few points with elevation values and then interpolating the smoothed values of the unknown points within it. The minimum curvature method is used to generate smooth ground surfaces, making it particularly useful for applications such as generating accurate digital elevation models (DEMs). The minimum curvature method quantifies curvature by calculating the distance between data points and finding the minimum value of curvature within that distance. The smallest data point is found, and thus the value of the required interpolation point is determined.

In this method, the Euclidean distance formula to calculate the distance between any two points is given by the following equation:

$$D=\sqrt{{({x}_{1}-{x}_{2})}^{2}+{({y}_{1}-{y}_{2})}^{2}}$$

(20)

where (x₁, y₁) and (x₂, y₂) are the two points.

The distance from the interpolation point to each sampling point and the angle between them can be calculated as follows,

$${d}_{i}=\sqrt{{({x}_{i}-x)}^{2}+{({y}_{i}-y)}^{2}}$$

(21)

$${\theta }_{i}=arctan(({y}_{i}-y)/({x}_{i}-x))$$

(22)

where (x_i, y_i) and (x, y) represent the i-th sampling point and the interpolation point, respectively.
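The geometric quantities in Eqs. (21) and (22) can be computed as in the sketch below, which treats (x, y) as the location being interpolated and (x_i, y_i) as a sampling point, in line with the sentence that introduces the two equations. One deliberate deviation is assumed here: `math.atan2` is used in place of a plain arctangent of the ratio, because it gives the same angle when x_i > x and stays defined when x_i = x. The function name is illustrative.

```python
import math


def distance_and_angle(point, sample):
    """Return (d_i, theta_i) from an interpolation point to one sampling point."""
    x, y = point
    xi, yi = sample
    d = math.hypot(xi - x, yi - y)      # Eq. (21)
    theta = math.atan2(yi - y, xi - x)  # Eq. (22), quadrant-aware variant
    return d, theta


# Hypothetical coordinates in arbitrary planar units.
print(distance_and_angle((0.0, 0.0), (3.0, 4.0)))
```

Longitude and latitude would normally be projected to planar coordinates before applying a Euclidean distance; the text does not say which projection, if any, was used.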

Finally, the interpolation value is given by

$$z=\mathop{\sum }\limits_{i=1}^{n}\alpha {z}_{i}{d}_{i}$$

(23)

where n is the number of sampling points, z_i is the value of the sampling point, and α is the smoothing coefficient, set as 2 in our study.

The overall risk levels are assessed according to the characteristics of each heritage site and the meteorological risk of the area where it is located. Thus, to show the overall risk graphically, the heritage sites are displayed on the map at their coordinates. Unlike the red triangles used in Fig. 2, the sites in the risk map are drawn as large circles whose color ranges from white to deep red, indicating a risk from very low to very high.
