A deep-learning approach to parameter fitting for a lithium metal battery cycling model, validated with experimental cell cycling time series

In this Section, we apply our approach to the experimental profiles described in the “Experimental materials and methods” Section. Here again, the goal is to estimate the parameters \(D\) and \({r}_{\text{max}}\) and then to compare the corresponding NN predictions for the voltage profiles with the original experimental data used as input for our CNN-LSTM. Towards this goal, some remarks are needed.

i) Data normalization. In experimental data, the voltage jumps that follow switching between charge and discharge conditions are affected by the ohmic drop, which is a property of the cell rather than of the electrode. Since our key focus is on electrode properties, we omitted this effect from our model; it could be introduced in a very straightforward way, but it would unnecessarily enlarge the parameter space. To cope with this factor, in this work we simply normalized the experimental profiles, since this operation preserves the shape of the voltage transient and still enables identification of the \(D\) and \({r}_{\text{max}}\) parameters, which is our aim.

ii) Selection of the charging half-cycle. In general, experimental profiles do not show the perfect symmetry between charge and discharge half-cycles exhibited by simulated data. To reduce parameter estimation uncertainty, without appreciable loss of electrochemical information, in our numerical pipeline we consider only the charging half-cycles in the experimental data from Section “Experimental materials and methods”, Fig. 2. Concretely, for the simulated time series in the dataset \(D_{0}\) described in Section “Dataset definition”, we extract the first half-cycle of size \(\ell = 1800\) from each selected voltage cycle and normalize it to \(\left[ {0, 1} \right]\). For this reason, we have re-trained our NN model, with the same NN architecture defined in Section “Dataset definition” and the same hyper-parameters and dataset splitting reported in Table 1 and Section S1 of the SI. As in the previous application, the labels consist of the parameters \(D\) and \(r_{{{\text{max}}}}\) associated with each profile (normalized as in Eq. (8)).
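This preprocessing of each simulated cycle can be illustrated with a minimal numpy sketch; the function name is ours, not from the paper's code:

```python
import numpy as np

def extract_and_normalize(cycle, ell=1800):
    """Extract the first half-cycle of length ell from a full voltage cycle
    and min-max normalize it to [0, 1], as done for the simulated dataset.
    A sketch; `cycle` is a 1-D array holding one charge-discharge cycle."""
    half = np.asarray(cycle, dtype=float)[:ell]
    vmin, vmax = half.min(), half.max()
    return (half - vmin) / (vmax - vmin)
```

The normalization deliberately discards the absolute voltage scale (and hence the ohmic drop), keeping only the transient shape, consistent with remark (i) above.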

To perform the parameter estimation for the selection of experimental profiles described in Section “Experimental materials and methods” and below, we have performed the following computational steps:

i) we train our CNN-LSTM network for increasing values of the learning rate \(\alpha\) and for two different choices of the seed \(\tau\) responsible for the random weight initialization in the ADAM optimization algorithm. As reported in Table 3, in total we consider six cases \({\text{NN}}_{i} , i = 1, \ldots , 6\);

Table 3 NN fitting for experimental data: hyper-parameters and metrics.

ii) for each set \({\mathbf{EXP}}_{j} , j = 1, \ldots , 4,\) of experimental data \(\bf{z}_{j} \in {\mathbb{R}}^{\ell } , \ell = 1800\), obtained by interpolation of the raw data (see discussion below), we apply all the trained \({\text{NN}}_{i}\) to obtain the predicted parameter pairs \({\varvec{p}}_{ij} = \left( {D,r_{{{\text{max}}}} } \right)\) in the original physical ranges \({\Omega }_{D}\) and \({\Omega }_{{r_{max} }}\) introduced in Section “Dataset definition”;

iii) in the post-processing step, we solve the PDE system (1)–(6) for each identified parameter pair \({\varvec{p}}_{ij}\) to find the predicted voltage profile \({\varvec{\xi}}_{ij} = {\Delta }\phi_{pred} \left( {{\varvec{p}}_{{{\varvec{ij}}}} } \right)\) for each data set and for each \({\text{NN}}_{i}\);

iv) we report in Fig. 8 the predicted half-cycles \({\varvec{\xi}}_{ij}\) in time and we compare them with the experimental data \({\varvec{z}}_{j}\) , using a suitable weighted distance \(H_{W}^{2}\) based on the \(H^{2}\) Sobolev norm42,43, defined below;

Fig. 8

Predictions and errors on experimental data. In each “EXP” column, along the row subplots we report: the experimental profile (red); the \({{\text{NN}}_{i}}\) predictions (blue) for \(i = 1, \ldots ,6\); the least-squares optimal solution LS included in the dataset; and the errors of Eq. (15) in the weighted Sobolev norm \(H_{W}^{2}\) and in the \(L^{2}\) norm for all fitted profiles in the same column. In green we highlight the best predictions, obtained with the \(H_{W}^{2}\) norm: NN2 for the \(\mathbf{EXP}_{j}\) shapes \(j = 2, 3, 4\) and NN3 for EXP1 provide the best performance.

v) finally, for each \({\mathbf{EXP}}_{j} , j = 1, \ldots ,4\), we select the best predicted parameter set \({\varvec{p}}_{j}^{*}\) and the best predicted voltage profile \({\varvec{\xi}}_{j}^{*}\) by calculating the minimum of the \(H_{W}^{2}\) distance.
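The selection loop of steps (ii)–(v) can be sketched as follows; `nn_models`, `solve_pde`, and `distance` are placeholders (names are ours) for the trained CNN-LSTMs, a solver of system (1)–(6), and the \(H_{W}^{2}\) distance defined below:

```python
import numpy as np

def select_best_prediction(nn_models, solve_pde, distance, z):
    """Apply each trained NN to the data profile z, simulate the voltage
    profile for each predicted (D, r_max) pair, and keep the pair whose
    simulated profile is closest to z under `distance`. A sketch of steps
    (ii)-(v); all callables are placeholders for the actual pipeline."""
    best_p, best_xi, best_err = None, None, np.inf
    for nn in nn_models:
        p = nn(z)            # predicted parameter pair (D, r_max)
        xi = solve_pde(p)    # predicted voltage profile for that pair
        err = distance(xi, z)
        if err < best_err:
            best_p, best_xi, best_err = p, xi, err
    return best_p, best_xi, best_err
```

Any distance can be plugged in, which is what allows the comparison between the \(L^{2}\) and \(H_{W}^{2}\) criteria discussed below.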

In Table 3, we report the MAE residual values and the \(R^{2}\) metrics computed on the test set obtained by each \({\text{NN}}_{i}, i=1, \dots, 6\). Section S6 in the SI reports further information about the NN training processes.
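The MAE and \(R^{2}\) metrics of Table 3 are standard regression metrics; a generic numpy sketch (function and variable names are ours, not from the paper's code):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Mean absolute error and coefficient of determination R^2 between
    true and NN-predicted parameter values on a test set (generic sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = np.mean(np.abs(y_true - y_pred))
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, r2
```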

To refine our comparisons, in addition to the predicted voltage profiles \({\varvec{\xi}}_{ij} , i = 1, \ldots ,6\), we also compute, for each \({\mathbf{EXP}}_{j}, j=1,\dots, 4\), the “LS” profile \({\varvec{s}}_{LS}^{j}\), i.e. the profile in our dataset with minimum least-squares distance from the data profile \({\varvec{z}}_{j}\). We will show explicitly that, quite predictably, the \(L^{2}\) distance is not suitable for identifying the profile that best describes the behavior of the experimental data. In particular, we show that in some cases the optimal LS solution \({\varvec{s}}_{LS}\) fails to follow the trend and shape of some profiles, while the optimal NN profile predictions \({\varvec{\xi}}^{*}\) do, even though they may have a greater LS distance from the data.
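Extracting the LS profile amounts to an argmin over the dataset; a minimal sketch (names are ours):

```python
import numpy as np

def ls_profile(dataset, z):
    """Return the profile in `dataset` (shape: n_profiles x ell) with
    minimum least-squares (L^2) distance from the data profile z."""
    dataset = np.asarray(dataset, dtype=float)
    z = np.asarray(z, dtype=float)
    dists = np.linalg.norm(dataset - z, axis=1)  # one L^2 distance per profile
    return dataset[np.argmin(dists)]
```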

To this aim, we propose a new distance based on the \(H^{2}\) Sobolev norm42,43. By definition, this distance captures properties of a given function that are ignored by the \(L^{2}\) distance: it takes into account the first and second derivatives of the function, so that maxima, minima, saddle points and convexity properties are tracked. Hence, for our purposes, let us define the discrete \(H^{2}\) Sobolev distance between \({\varvec{u}}, {\varvec{v}} \in {\mathbb{R}}^{s}\) as follows:

$$H^{2} \left( {{\varvec{u}},{\varvec{v}}} \right) = \left( {\left\| {{\varvec{u}} - {\varvec{v}}} \right\|_{{L^{2} }}^{2} + \left\| {{\varvec{u}}^{\prime } - {\varvec{v}}^{\prime } } \right\|_{{L^{2} }}^{2} + \left\| {{\varvec{u}}^{\prime \prime } - {\varvec{v}}^{\prime \prime } } \right\|_{{L^{2} }}^{2} } \right)^{\frac{1}{2}} \quad \forall {\varvec{u}},{\varvec{v}},$$

(12)

where \({\varvec{u}}^{\prime } , {\varvec{v}}^{\prime }\) and \({\varvec{u}}^{\prime \prime } , {\varvec{v}}^{\prime \prime }\) are the finite-difference approximations of the first and second derivatives of \({\varvec{u}}\) and \({\varvec{v}}\), respectively. For our application, we actually consider the discrete weighted \(H^{2}\) distance given by:

$$H_{W}^{2} \left( {{\varvec{u}},{\varvec{v}}} \right) = \left( {\left\| {{\varvec{u}} - {\varvec{v}}} \right\|_{{L_{W}^{2} }}^{2} + \left\| {{\varvec{u}}^{\prime } - {\varvec{v}}^{\prime } } \right\|_{{L_{W}^{2} }}^{2} + \left\| {{\varvec{u}}^{\prime \prime } - {\varvec{v}}^{\prime \prime } } \right\|_{{L_{W}^{2} }}^{2} } \right)^{\frac{1}{2}} \quad \forall {\varvec{u}},{\varvec{v}},$$

(13)

where \(\left\| {{\varvec{u}} - {\varvec{v}}} \right\|_{{L_{W}^{2} }} = \left( {\left( {{\varvec{u}} - {\varvec{v}}} \right)^{T} W\left( {{\varvec{u}} - {\varvec{v}}} \right)} \right)^{\frac{1}{2}}\) is a weighted \(L^{2}\) norm, with \(W\) the diagonal matrix \(W = {\text{diag}}\left( {{\varvec{w}}_{{d_{1} }} \circ {\varvec{w}}_{{d_{2} }} } \right)\), where \(\circ\) is the Hadamard product between the vectors \({\varvec{w}}_{{d_{1} }}\) and \({\varvec{w}}_{{d_{2} }}\). The weights include information about the first and second (discrete) derivatives as follows:

$$\left( {\varvec{w}}_{d_{1}} \right)_{i} = \begin{cases} 1, & \text{if } \operatorname{sign}\left( {\varvec{u}}_{i}^{\prime} \right) \ne \operatorname{sign}\left( {\varvec{v}}_{i}^{\prime} \right) \\ a, & \text{otherwise} \end{cases} \qquad \left( {\varvec{w}}_{d_{2}} \right)_{i} = \begin{cases} 1, & \text{if } \operatorname{sign}\left( {\varvec{u}}_{i}^{\prime\prime} \right) \ne \operatorname{sign}\left( {\varvec{v}}_{i}^{\prime\prime} \right) \\ a, & \text{otherwise} \end{cases} \qquad \text{for } i = 1, \ldots, s,$$

(14)

where \(a > 0\) is a small number. Specifically, for \(a < 1\) the weighted norm gives more importance to the derivatives with respect to the function values; in our application we used \(a = 0.1\). It can easily be proved that \(H_{W}^{2}\) is still a distance. This definition of \(W\) in terms of (14) allows us to penalize the discrepancy between the first and second derivatives of \({\varvec{u}}\) and \({\varvec{v}}\): the \(H_{W}^{2}\) distance (13) takes a higher value wherever the two profiles \({\varvec{u}}\) and \({\varvec{v}}\) do not match the sign of the derivatives. In the following numerical results, we will thus compute the errors:

$${\text{err}}_{ij}^{H} = H_{W}^{2} \left( {{\varvec{\xi}}_{ij} ,{\varvec{z}}_{j} } \right),$$

(15)

where \({\varvec{z}}_{j} , j = 1, \ldots , 4\) are the data of the \({\mathbf{EXP}}_{j}\) cases discussed below and \({\varvec{\xi}}_{ij} , j = 1, \ldots ,4, i = 1, \ldots ,6\) are the NN predicted profiles. Thus, for each experiment \(j\), we report the errors in Fig. 8 (last row of subplots) and then extract the best NN prediction \({\varvec{\xi}}_{j}^{*}\) as the NN voltage profile with minimum \(H_{W}^{2}\) value (green profiles in Fig. 8). For completeness, for all NN profiles predicted in each column \(j\) of subplots, we also report the errors in the \(L^{2}\) norm (see last row of subplots, in red): of course, the least-squares LS solution is always the one with minimum \(L^{2}\) error, even though it does not preserve the shape of the experimental profile.
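Putting together the definitions (12)–(14), the weighted Sobolev distance can be sketched in a few lines of numpy, assuming unit grid spacing (the function name is ours):

```python
import numpy as np

def hw2_distance(u, v, a=0.1):
    """Weighted discrete H^2 Sobolev distance of Eqs. (13)-(14), a sketch.

    The diagonal weight W = diag(w_d1 * w_d2) keeps weight 1 where the signs
    of the (finite-difference) derivatives of u and v disagree and shrinks it
    to a < 1 where they agree, so shape mismatches dominate the distance."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    u1, v1 = np.gradient(u), np.gradient(v)    # first derivatives
    u2, v2 = np.gradient(u1), np.gradient(v1)  # second derivatives
    w1 = np.where(np.sign(u1) != np.sign(v1), 1.0, a)
    w2 = np.where(np.sign(u2) != np.sign(v2), 1.0, a)
    w = w1 * w2                                # Hadamard product, Eq. (14)
    wl2_sq = lambda d: float(np.sum(w * d**2)) # squared weighted L^2 norm
    return np.sqrt(wl2_sq(u - v) + wl2_sq(u1 - v1) + wl2_sq(u2 - v2))
```

Note that a vertically shifted copy of a profile (same shape, matching derivative signs everywhere) incurs only the damped weight \(a^{2}\), while a profile with the opposite trend is penalized at full weight.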

Numerical results

In this Section, we carry out our parameter identification tasks for the four typical charge–discharge cycles presented in Section “Experimental materials and methods” with the NNs described above. The half-cycles we study are extracted from the experimental time series shown in Fig. 2.

For each set of raw data \({\mathbf{EXP}}_{j} , j = 1, \ldots , 4\), we selected the half-charge cycle \({\varvec{z}}_{j}^{e} \in {\mathbb{R}}^{{\ell_{j} }}\), then interpolated it with the Piecewise Cubic Hermite Interpolating Polynomial (the pchip command in MATLAB) on a uniform grid, in order to obtain a profile of the same size as the simulated data used for the NN dataset. We then rescaled the result between 0 and 1 to obtain the interpolated, normalized voltage profile \({\varvec{z}}_{j} \in {\mathbb{R}}^{\ell }\) with \(\ell = 1800\) points.
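In Python, this preprocessing could be reproduced with SciPy's shape-preserving PCHIP interpolant, the counterpart of MATLAB's pchip; a sketch with names of our own choosing:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def resample_half_cycle(z_raw, ell=1800):
    """Resample a raw half-charge cycle onto a uniform grid of ell points
    using shape-preserving cubic Hermite interpolation, then min-max
    normalize to [0, 1]. A sketch of the preprocessing described above;
    the original analysis used MATLAB's pchip."""
    z_raw = np.asarray(z_raw, dtype=float)
    t_raw = np.linspace(0.0, 1.0, z_raw.size)  # raw samples on a unit grid
    t_new = np.linspace(0.0, 1.0, ell)         # uniform target grid
    z = PchipInterpolator(t_raw, z_raw)(t_new)
    return (z - z.min()) / (z.max() - z.min())
```

PCHIP is preferred over an ordinary cubic spline here because it preserves monotonicity and does not overshoot, so the resampled profile keeps the morphological features (minima, peaks) that the \(H_{W}^{2}\) distance is designed to compare.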

In the next Subsections, we implement for each \({\mathbf{EXP}}_{j} , j = 1, \ldots ,4\), the strategy discussed in points (i)–(v) above and discuss the results thereby obtained.

Parameter identification for transient shape EXP1

The experimental profile denoted “EXP 1” comes from a cycle \({\varvec{z}}_{1}^{e} \in {\mathbb{R}}^{{\ell_{1} }} , \ell_{1} = 1202\), in the experiment shown in Fig. 2a. By using the CNN-LSTM networks \({\text{NN}}_{i} , i = 1, \ldots ,6\) described above in this Section, according to step (ii) of our computational strategy, we obtained the predicted parameters reported in the first column of Table 4. To compare the experimental profile with those corresponding to all the NN predicted parameters, we follow steps (iii) and (iv) above. Hence, we show our results in the first column of subplots in Fig. 8: we report the experimental data (red) and the NN predicted profiles (blue) together with the weighted Sobolev error \(H_{W}^{2}\) in Eq. (15). According to point (v), we can identify the best NN profile (highlighted in green) \({\varvec{\xi}}_{1}^{*}\) and the corresponding best parameter set \({\varvec{p}}_{1}^{*}\), reported in bold in Table 4. For completeness, we also show the LS solution \({\varvec{s}}_{LS}^1\), extracted from the dataset, in black, with its \(H_{W}^{2}\) error as a full circle in the error subplot. In the same subplot, for the sake of comparison, we also show the classical \(L^{2}\) errors for all predicted profiles.

Table 4 Parameter predictions and errors for experimental profiles.

In this case, we can conclude that the profile with the smallest Sobolev error is the LS profile (\({\text{err}} = 0.7741\)), which is indeed very similar to the profile \({\varvec{\xi}}_{1}^{*}\) obtained by NN3 (\({\text{err}} = 0.7853\)). Nevertheless, even though the corresponding identified values of \(D\) are very similar, the predicted \(r_{{{\text{max}}}}\) values are rather different, as can be noted from Table 4.

Parameter identification for transient shape EXP2

The experimental profile \({\varvec{z}}_{2}^{e} \in {\mathbb{R}}^{{\ell_{2} }} , \ell_{2} = 14\), denoted “EXP 2”, is extracted from region 1 of the experiment in Fig. 2d. The parameters predicted with all \({\text{NN}}_{i} , i = 1, \ldots ,6\) are reported in the second column of Table 4. In the second column of subplots of Fig. 8, we again show the experimental data (red) and the NN predicted profiles (blue); the LS solution is also shown. The weighted Sobolev errors \(H_{W}^{2}\) between the profiles and the classical \(L^{2}\) errors are reported in the last subplot.

It is worth noting that, in this case, the identified LS profile is significantly different from the original data profile: \({\varvec{s}}_{LS}^2\) has a monotonically increasing trend and, unlike the experimental data, it does not exhibit a minimum and two peaks. This example therefore shows that the traditional \(L^{2}\) distance is not effective for comparing experimental and simulated profiles, because it can lose important morphological features and thus fail to capture the main characteristics of the cycle shape. Moreover, in this example, by visual inspection we can conclude that all our CNN-LSTMs succeed in providing voltage profiles with the same qualitative shape as the experimental data, even if large errors can be present. Nevertheless, looking at the \(H_{W}^{2}\) distances in the last subplot, we find that at least two NNs are quantitatively better than the LS solution: the best performance is provided by NN2, whose predicted profile is highlighted in green and whose corresponding parameter set \({\varvec{p}}_{2}^{*}\) is reported in bold in Table 4.

Parameter identification for transient shape EXP3

“EXP 3” denotes the experimental half-cycle \({\varvec{z}}_{3}^{e} \in {\mathbb{R}}^{{\ell_{3} }} , \ell_{3} = 14\) extracted from region 2 of Fig. 2d. The predicted parameters found by the \({\text{NN}}_{i} , i = 1, \ldots ,6\) are shown in Table 4. The outcomes of steps (iii)–(v) are depicted in the third column of Fig. 8, with the same notation as in the previous case. With this particular dataset, we have a monotonically increasing profile, but different shapes are found in the predicted profiles, including the LS solution. The Sobolev \(H_{W}^{2}\) errors in (15) and the \(L^{2}\) errors in (10) for all the above profiles are reported in the last plot of the third column: these indicate that the smallest distance is achieved by NN2. The corresponding best prediction \({\varvec{\xi}}_{3}^{*}\) is highlighted in green and the corresponding parameter set \({\varvec{p}}_{3}^{*}\) is printed in boldface in Table 4.

Parameter identification for transient shape EXP4

In this case, the “EXP 4” dataset corresponds to the charging half-cycle \({\varvec{z}}_{4}^{e} \in {\mathbb{R}}^{{\ell_{4} }} , \ell_{4} = 14\) of region 3 in Fig. 2d. The parameters identified by all \({\text{NN}}_{i} , i = 1, \ldots ,6\) are shown in Table 4. The results of our post-processing and error analysis are presented in the last column of Fig. 8. The weighted Sobolev errors allow us to discriminate among the very similar predicted profiles, including the LS one, and it is easy to see that, again, NN2 provides the best prediction (profile \({\varvec{\xi}}_{4}^{*}\) highlighted in green, best parameter predictions in bold in Table 4).

In conclusion, we note that in most of the investigated cases the best predictions are provided by NN2, i.e. the CNN-LSTM exhibiting the best metrics in Table 3. In addition, it is worth noting that the identified \(D\) and \(r_{{{\text{max}}}}\) values, on the one hand, lie in the physically meaningful range and, on the other hand, enable the experimental transients to be followed very well. Moreover, notwithstanding the fact that the modelling approach of, for example, ref.44 is very different from that of our work, the diffusivity values we identify are fully compatible with values found in the modelling literature, even though the latter were obtained with conceptually and algorithmically very different approaches44.
