 Original research
 Open Access
 Published:
An explainable framework for load forecasting of a regional integrated energy system based on coupled features and multitask learning
Protection and Control of Modern Power Systems volumeÂ 7, ArticleÂ number:Â 24 (2022)
Abstract
To extract strong correlations between different energy loads and improve the interpretability and accuracy for load forecasting of a regional integrated energy system (RIES), an explainable framework for load forecasting of an RIES is proposed. This includes the load forecasting model of RIES and its interpretation. A coupled feature extracting strategy is adopted to construct coupled features between loads as the input variables of the model. It is designed based on multitask learning (MTL) with a long shortterm memory (LSTM) model as the sharing layer. Based on SHapley Additive exPlanations (SHAP), this explainable framework combines global and local interpretations to improve the interpretability of load forecasting of the RIES. In addition, an input variable selection strategy based on the global SHAP value is proposed to select input feature variables of the model. A case study is given to verify the effectiveness of the proposed model, constructed coupled features, and input variable selection strategy. The results show that the explainable framework intuitively improves the interpretability of the prediction model.
1 Introduction
A regional integrated energy system (RIES) is defined as a region with various forms of energy supply, conversion and storage devices, where the consumption of different energies such as electricity, cooling, and heating are coupled with each other. An RIES is generally a unified management area, such as industrial parks, commercial buildings, campuses, and other multienergy joint operation systems, areas which allow for unified access and analysis of different energy data. Accurate and reliable load forecasting is the foundation for optimal dispatching and energy management of an RIES, as it is used to provide data support for developing energy trading plans, energy system security assessment, and demandside management. Therefore, an effort to improve the accuracy of load forecasting is meaningful for the development of an RIES.
Because of the strong correlation between different types of energy consumption, the planning and operation of an RIES are more complex than a traditional single type of energy system [1, 2]. Two types of correlations exist in an RIES: intracoupled relation (the correlations between load and its own power) and intercoupled relation (the correlations between two different energy loads). This means that the demand for different types of loads is not only related to their own historical loads, but also to the coupling relations (the correlations and influences of two or more elements or systems) between other energy type loads. This brings new challenges and ideas for load forecasting of different energy types.
For RIES load forecasting, traditional methods are mainly based on the forecasting methods of electric load such as wavelet neural networks [3], autoregressive integrated moving average (ARIMA) [3], Bayesian regression model [4], and Grey model [5], etc. However, these studies predict cold, heat, and electric loads separately, and lack the exploration and utilization of the coupling relations between different load forecasting tasks. Therefore, the impact that changes in one type of energy demand exert on other types of energy demand cannot be determined, and this has a negative impact on the accuracy of load forecasting for the RIES. In recent years, deep learning has been widely used in load forecasting with its strong learning capability and data feature extraction ability [6], with techniques such as convolutional neural networks (CNNs) [7, 8], long shortterm memory (LSTM) [9, 10], recurrent neural networks (RNNs) [11], and deep belief networks (DBNs) [12], etc.
To extract the coupling relations between load features, reference [8] uses a CNN to excavate and fuse the load features in highdimensional space, and feeds the fused features into LSTM in a timeseries manner. A dynamic time series feature extraction method based on LSTM is proposed to excavate the relevance among the series in different times by reconstructing the input sequence through the encoderdecoder model in [10]. Experimental results of the methods proposed in [8] and [10] show that the prediction accuracy can be improved by using coupling information between different energy loads. However, these methods usually build the model based on original features (OFRs) obtained from the historical load data directly as the input of the load forecasting model, but rarely consider the internal coupling relations between OFRs. The load forecasting model is then unable to make full use of the intrinsic relations at the feature level to obtain better performance. To fully extract and quantify the coupling relations, a method of constructing coupled feature representation (CFR) is proposed to quantify the global dependence of continuous features in [13]. In [14], coupled features of different image data based on CFR are proposed for multimodal Alzheimerâ€™s disease diagnosis. This achieves higher diagnostic accuracy than existing methods. Inspired by these studies and considering the continuity of loads data and strong correlations between cold, heat, and electric loads, this paper introduces CFR to extract the intrinsic relations between OFRs.
To extract coupling relations between different load forecasting tasks, attention is mainly paid to MTL, which puts multiple related tasks together to learn and realize information sharing to make use of the coupling relations between different tasks. Both [15] and [16] achieve good prediction results based on MTL, and this proves the advantage of MTL in the field of multitype load prediction. However, deep learning's opaque structure might hinder further analysis or improvement of the model.
The planning and operation of an RIES highly depend on accurate and reliable load forecasting. Interpretability means finding out the key features that affect the prediction results and the basis of prediction decision. This can then provide a reference for the review of the prediction results and model improvement. Previous work on the interpretability of the deep learning model include sensitivity analysis [17], surrogate model [18], and SHAP [19]. The sensitivity analysis and surrogate model are local interpretation methods, which can only interpret the output results of an individual sample. Thus, they prevent analysis of the importance of input features to the model based on global samples. In addition, the effect of LIME is limited by the surrogate model itself, and the similarity between the surrogate model and target model is also difficult to measure. SHAP, proposed in [19], is an additive feature attribution method based on game theory, and can be used for local and global interpretation of arbitrary models. Some studies on the interpretability of load forecasting have been carried out. For example, based on SHAP, an explainable framework is built in [20] to explain the random forest prediction model for failure modes of reinforced concrete members, while [21] establishes an attribution analysis framework for transient voltage stability evaluation. Experimental results show that the SHAP value can quantify the relationships between input features and output results, and facilitate the understanding of model decisions and complex nonlinear relationships in the model [20,21,22].
In this paper, for better performance and interpretability of the load forecasting model of an RIES, an explainable framework is proposed which consists of two main parts: the load forecasting model and its interpretation. To extract the strong coupling relations between different energy loads at the feature level, CFR proposed in [13] is introduced. Its main idea is to quantify the coupling relations between OFRs by constructing the representation containing intracoupled relation and intercoupled relation as part of the input variables of the prediction model. The MTL model with LSTM as sharing layer (LSTMMTL) is adopted as the base model to excavate coupling relations between different energy loads at the forecasting task levels. To improve the interpretability of the load forecasting model, SHAP is introduced to provide global and local interpretation. The global interpretation can reveal the relationships between the values of features and prediction results, and measure the contribution of features to the different load forecasting tasks. This can provide a basis for selecting input features of the load forecasting model. For the local interpretation, it is used to provide details of the contributions of input feature variables to the load prediction of an individual sample.
Experiments based on load datasets provided by Arizona State University Integrated Energy System validate the feasibility of the framework designed in this paper.
The main contributions of this paper are as follows:

(1)
By introducing the calculation representation of the coupled features as the model input variables and constructing the LSTMMTL model based on hard parameter sharing, the coupling relations between different energy loads can be excavated at the feature level and the forecasting task level, respectively.

(2)
Global and local interpretation based on SHAP is proposed to analyze the difference in the contribution of each feature to prediction results and the relationships between input features and prediction results, so as to improve the interpretability of the load forecasting model of an RIES.

(3)
An input variable selection strategy based on the global SHAP value is proposed to select input feature variables. This helps eliminate the overfitting and overcomputation of the model by removing the input features that contribute less to the prediction model.
The rest of this paper is organized as follows. In Sect.Â 2, the details of the CFR construction method and LSTMMTL load forecasting model are provided. The interpretation method of the LSTMMTL model and its input variable selection method based on SHAP are provided in Sect.Â 3, while Sect.Â 4 summarizes features selection and the experimental process of the explainable framework. Results of the proposed explainable framework are presented in Sect.Â 5, where the implementation of the framework is discussed. SectionÂ 6 concludes the paper and discusses future directions.
2 Proposed RIES load forecasting model
In order to extract the coupling relations between different energy loads in an RIES from the perspectives of input features and forecasting tasks, a coupled feature construction method based on CFR is introduced to extract the featurelevel intrinsic relations between input loads. It combines the representation of intracoupled and intercoupled relations as part of the input variables of the load forecasting model. The load prediction model based on LSTMMTL is built to excavate the coupling relations between different forecasting tasks. Given that the LSTM model has a strong dynamic feature extraction ability of time series and the fully connected network is suitable for the final fitting of the model after feature extraction, LSTM is adopted as the sharing layer, and the fully connected network is used for each subtask in the LSTMMTL model as shown in Fig.Â 1.
2.1 Coupled features representation (CFR)
CFR can capture the global dependence of continuous features by constructing intracoupled relations and intercoupled relations [13]. In this paper, the coupled features between cold, heat, and electric loads are constructed based on CFR. The process of constructing coupled features is listed as follows (the superscripts \(C, \, H,\) and \(P\) represent cold, heat, and electric loads, respectively).

(1)
Suppose that there are M samples including cold, heat, and electric loads, and \(z_{ij}\)\((1 \le i \le M, \, j \in \left\{ {C, \, H, \, P} \right\})\) represents the OFR vector of the ith sample. The expansion matrix \({\mathbf{Z}}\) of the cold, heat, and electric loads is defined as:
$${\mathbf{Z}} = \left( {\begin{array}{*{20}c} {z_{1C}^{1} } & \cdots & {z_{1C}^{E} } & {z_{1H}^{1} } & \cdots & {z_{1H}^{E} } & \cdots & {z_{1P}^{E} } \\ {z_{2C}^{1} } & \cdots & {z_{2C}^{E} } & {z_{2H}^{1} } & \cdots & {z_{2H}^{E} } & \cdots & {z_{2P}^{E} } \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \cdots & \vdots \\ {z_{iC}^{1} } & \cdots & {z_{iC}^{E} } & {z_{iH}^{1} } & \cdots & {z_{iH}^{E} } & \cdots & {z_{iP}^{E} } \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \cdots & \vdots \\ {z_{MC}^{1} } & \cdots & {z_{MC}^{E} } & {z_{MH}^{1} } & \cdots & {z_{MH}^{E} } & \cdots & {z_{MP}^{E} } \\ \end{array} } \right)$$(1)where \({\text{z}}_{iC} {\text{, z}}_{iH}\), and \({\text{z}}_{iP}\) represent the cold, heat, and electric load values of the ith sample, respectively. \(z_{ij}^{e}\) represents the eth \((e = 1,2, \cdot \cdot \cdot ,E)\) power of \(z_{ij}\), e.g., \(z_{ij}^{2}\) represents the square of \(z_{ij}\).\(E\) is the maximal expansion number.

(2)
Calculate the intracoupled relation of cold, heat, and electric loads, and the intracoupled relation matrix is defined as (taking the cold load as an example):
$$\Theta (C) = \left( {\begin{array}{*{20}c} {\Theta_{11}^{C} } & {\Theta_{12}^{C} } & \cdots & {\Theta_{1E}^{C} } \\ {\Theta_{21}^{C} } & {\Theta_{22}^{C} } & \cdots & {\Theta_{2E}^{C} } \\ \vdots & \vdots & \ddots & \vdots \\ {\Theta_{E1}^{C} } & {\Theta_{E2}^{C} } & \cdots & {\Theta_{EE}^{C} } \\ \end{array} } \right)$$(2)where \(\Theta_{{e_{1} e_{2} }}^{C}\) is the Pearsonâ€™s correlation coefficient (PCC) between \({\mathbf{Z}}_{iC}^{{e_{1} }}\) and \({\mathbf{Z}}_{iC}^{{e_{2} }}\). \({\mathbf{\rm Z}}_{iC}^{e} { = }[z_{1C}^{e} ,z_{2C}^{e} , \cdots ,z_{MC}^{e} ]^{{\text{T}}}\) is the eth column vector of expansion matrix \({\mathbf{\rm Z}}\).
Suppose that there are \(M\) data pairs \(\left( {x_{i} ,y_{i} } \right)\). The PCC is calculated as:
$$\rho = \frac{{M\sum\limits_{i = 1}^{M} {x_{i} y_{i} }  \sum\limits_{i = 1}^{M} {x_{i} \sum\limits_{i = 1}^{M} {y_{i} } } }}{{\sqrt {M\sum\limits_{i = 1}^{M} {x_{i}^{2} }  (\sum\limits_{i = 1}^{M} {x_{i} } )^{2} } \sqrt {M\sum\limits_{i = 1}^{M} {y_{i}^{2} }  (\sum\limits_{i = 1}^{M} {y_{i} } )^{2} } }}$$(3)where \(\left( {x_{i} ,y_{i} } \right)\) represents \(({\mathbf{Z}}_{iC}^{{e_{1} }} ,{\mathbf{Z}}_{iC}^{{e_{2} }} )\). \(\rho \in [  1,1]\). The closer the value of \(\rho^{2}\) is to 1, the stronger the correlations between the two variables, and the positive and negative values of \(\rho\) represent positive and negative correlation, respectively.

(3)
Calculate the intercoupled relation between cold, heat, and electric loads, and the intercoupled relation matrix is defined as (taking cold load as an example):
$$\Lambda (C) = \left( {\begin{array}{*{20}c} {\Lambda_{11}^{HC} } & \cdots & {\Lambda_{1E}^{HC} } & {\Lambda_{11}^{PC} } & \cdots & {\Lambda_{1E}^{PC} } \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ {\Lambda_{E1}^{HC} } & \cdots & {\Lambda_{EE}^{HC} } & {\Lambda_{E1}^{PC} } & \cdots & {\Lambda_{EE}^{PC} } \\ \end{array} } \right)$$(4)where \(\Lambda_{{e_{1} e_{2} }}^{HC}\) is the PCC between \({\mathbf{Z}}_{iH}^{{e_{1} }}\) and \({\mathbf{Z}}_{{{\text{i}}C}}^{{e_{2} }}\)\((e_{1} ,e_{2} = 1,2, \cdot \cdot \cdot ,E)\) of expansion matrix \({\mathbf{Z}}\).

(4)
For the cold load in the i^{th} sample, its intracoupled extension matrix is defined as:
$$Z_{{{\text{int}} ra}}^{C} (i) = [z_{iC}^{1} ,z_{iC}^{2} , \cdots ,z_{iC}^{E} ]$$(5) 
(5)
Its intercoupled extension matrix with heat and electric loads is defined as:
$$Z_{{{\text{int}} er}}^{C} (i) = [z_{iH}^{1} ,z_{iH}^{1} , \cdots ,z_{iH}^{E} ,z_{iP}^{1} ,z_{iP}^{2} , \cdots ,z_{iP}^{E} ]$$(6) 
(6)
The coefficient matrix \({\mathbf{w}}\) is introduced to set the weights of PCC between different vectors. For the i^{th} sample, \({\mathbf{w}} = [(1/1!),(1/2!), \cdots ,(1/E!)]\) and the CFR of cold load can be defined as:
$${\mathbf{u}}_{i} (C) = Z_{intra}^{C} (i) \odot {\mathbf{w}} \cdot \Theta (C) + Z_{{{\text{int}} er}}^{C} (i) \odot [{\mathbf{w}},{\mathbf{w}}] \cdot [\Lambda (C)]^{{\text{T}}}$$(7)where \(\odot\) is Hadamard product. \({\mathbf{u}}_{i} (C)\) uses a Taylorlike series to quantify the global dependency, since any analytic function can be approximated by a Taylor polynomial [13]. Although the above steps are illustrated for cold load, they can also be applied directly to heat load and electric load to get \({\mathbf{u}}_{i} (H)\) and \({\mathbf{u}}_{i} (P)\).

(7)
The CFR of the ith sample can be represented as:
$${\mathbf{u}}_{i} = [{\mathbf{u}}_{i} (C),{\mathbf{u}}_{i} (H),{\mathbf{u}}_{i} (P)]^{{\text{T}}} \in R^{3 \times E}$$(8)where \({\mathbf{u}}_{i}\) includes \(3 \times E\) coupled feature variables (CF). \({\mathbf{u}}_{i} (C)\), \({\mathbf{u}}_{i} (H)\), and \({\mathbf{u}}_{i} (P)\) represent the CFR of cold load, heat load, and electric load, respectively.
The coupled feature construction algorithm has the following characteristics: (1) The algorithm is an unsupervised learning method with strong generalization ability; (2) The robustness of the algorithm can be improved using global data; (3) The size of the power can be set flexibly according to the actual situation to extract the coupling relations; (4) The coupled features have physical interpretability and can be quantified to provide data support for subsequent interpretability research.
Previous study has shown that the maximal expansion number Eâ€‰=â€‰3 or 4 is large enough to capture the global couplings of attributes [13]. Thus, in the following experiments, the extended power \(E\) is taken as 3 to avoid excessive computation. A total of 9 coupled features are extracted, called CFR19. CFR13 mainly describe the coupling relations between cold load and other loads, while CFR46 and CFR79 describe the coupling relations of heat load and electric load, respectively.
2.2 LSTMMTL model
A load forecasting model based on LSTMMTL is proposed to learn the complex coupling information between cold, heat, and electric load forecasting tasks. The LSTMMTL model adopts a hard parameter sharing mechanism, which is less prone to overfitting in the case of strong coupling of the loads in the RIES prediction model with many model parameters and complex structure [23]. The core concept of LSTM is gate structure, which decides whether data should be discarded, controls information to be added to the LSTM unit, and outputs unit state information by introducing forget gate \(f_{t}\), input gate \(i_{t}\), and output gate \(o_{t}\), respectively [24]. The formulas involved are:
where the meanings of the relevant variables in (9â€“14) are described in [24].
In the LSTMMTL load forecasting model, the input vector is considered as \(X = \left( {x_{1} ,x_{2} , \cdot \cdot \cdot ,x_{t} } \right)\), and is transmitted to the LSTM sharing layer, where the coupling relations among the three load prediction subtasks are captured through sharing underlying parameters. Then, the output vector \(H = \left( {h_{1} ,h_{2} , \cdot \cdot \cdot ,h_{t} } \right)\) is connected to the linear layer of the three subtasks and the output result is defined as:
where \(Relu\) is the activation function, \(W\) is the weight matrix, and \(b\) is the bias. \(y_{i}^{k}\) is the kth (kâ€‰=â€‰1, 2, and 3 represents cold, heat, and electric load forecasting task, respectively) subtask output result, and the different toplevel parameters can be obtained through the linear layer of each subtask.
Finally, the loss function of the kth subtask is defined as \(L^{k} (\theta^{{{\text{share}}}} ,\theta^{k} )\), where \(\theta^{{{\text{share}}}}\) and \(\theta^{k}\) represent the sharing parameters among subtasks and the unique parameters of the k^{th} subtask, respectively. The integral optimization loss function of LSTMMTL is defined as:
where Tâ€‰=â€‰3 represents that the model has three subtasks in total, and \(a^{k}\) is the weight coefficient of the k^{th} subtask.
Finding the suitable value of \(a^{k}\) to balance the contribution of each task is the key to MTL. The three prediction tasks of cold, heat, and electric are strongly coupled, and the goal of all the three prediction tasks is to minimize the prediction error with the same evaluation index MAE, which makes the three prediction tasks similar. In the following experiments, the goal is to minimize the overall prediction error of the three tasks, and we preset the \(a^{k}\) of each task to 0.4, 0.2, and 0.4, respectively, according to the peak ratio of the three loads in the RIES. In addition, the prediction errors are compared by setting different weights as shown in Table 6 in the Appendix. By training the model, the optimal parameters \(\theta^{{{\text{share}}}}\) and \(\theta^{k}\) are obtained, which helps extract the coupling relations between different subtasks and learn the differences between subtasks, respectively.
To summarize, CFR is introduced to extract the featurelevel intrinsic relations between input loads, and the prediction model based on LSTMMTL is built to determine the complex coupling information between cold, heat, and electric load forecasting tasks. Therefore, the coupling relations between loads can be extracted from the perspectives of input features and forecasting tasks to improve the model performance.
3 Explainable framework
The LSTMMTL model of RIES load forecasting described above is a complex integrated model, and its lack of interpretability can lead to user distrust in model prediction results and hinder further analysis or improvement of the model. To improve the interpretability of the prediction model, this section uses SHAP to provide an explainable framework for the LSTMMTL prediction model, and an input variable selection strategy based on the global SHAP value is proposed to avoid overfitting and overcomputation.
3.1 SHAP (Shapley additive explanation)
SHAP was proposed by Lundberg and SuIn Lee in 2017 based on cooperative game theory to describe the contribution of a feature to the machinelearning model by calculating the average of the marginal contributions. An overview of how to use SHAP to interpret the predictions of the LSTMMTL model is shown in Fig.Â 2. The process of calculating the SHAP value is as follows.
Supposing \({\varvec{x}} = (x_{1} ,x_{2} , \cdot \cdot \cdot x_{M} )\) is the input variable vector of the prediction model, where \(M\) is the number of input variables, an explanation model (linear and additive model) \(g\left( {\user2{x^{\prime}}} \right)\) is applied to fit the original model (the cold, heat, and electric load forecasting model based on LSTMMTL) \(f({\varvec{x}})\) as:
where \(\phi_{0}\) is the average of the explanation model, and \(\phi_{i}\) is the SHAP value of the ith feature. \(\user2{x^{\prime}} \in \left\{ {0,1} \right\}^{M}\) is the coalition vector (simplified features), which is the binary mapping variable of the input features. Inputs \({\varvec{x}}\) and \(\user2{x^{\prime}}\) are related through a mapping function \({\varvec{x}}{ = }h(\user2{x^{\prime}}),\) where the presence of 1 or 0 at the ith position of \(\user2{x^{\prime}}\) means the presence or absence of the i^{th} feature of \({\varvec{x}}\), respectively. For example, there are five features of a, b, c, d, and e in input feature vector \({\varvec{x}}_{1}\), which can be represented as \(\user2{x^{\prime}}_{1}\)â€‰=â€‰[1, 1, 1, 1, 1], while there are only three features of a, d, and e in feature vector \({\varvec{x}}_{2}\), which can be represented as \(\user2{x^{\prime}}_{2}\)â€‰=â€‰[1,0,0,1,1] by function \(h(\user2{x^{\prime}})\).
The linear and additive model \(g\left( {\user2{x^{\prime}}} \right)\) is fitted by optimizing the loss function \(L\) as:
where \(X\) is the set of coalition vector \(\user2{x^{\prime}}\).
There may be more than one valid feature in the input feature \(\user2{x^{\prime}}\). This will affect the evaluation of the contribution of an individual feature. In order to reduce this impact, the weight of loss function \(L\) is defined as:
where \({\kern 1pt} \left {\user2{x^{\prime}}} \right\) represents the number of valid input features in \(\user2{x^{\prime}}\).
The details of calculating the SHAP value of input features \({\varvec{x}}\) are shown below.

(1)
Create the training data set \(X\) in random ways, \(X\) is the set of coalition vectors \(\user2{x^{\prime}}_{k} \in \left\{ {0,1} \right\}^{M}\), \(k \in \left\{ {1,2, \cdot \cdot \cdot ,K} \right\}\). \(K\) represents the number of coalition vectors \(\user2{x^{\prime}}\) in \(X\).

(2)
Convert each \(\user2{x^{\prime}}_{k}\) to original feature space using \({\varvec{x}}_{k} { = }h(\user2{x^{\prime}}_{k} ),\) and obtain the prediction using \(f(h(\user2{x^{\prime}}_{k} )\).

(3)
Calculate the weight \(\pi_{x} (\user2{x^{\prime}}_{k} )\) of each \({\varvec{x}}_{k}\) using (19).

(4)
Fit the explanation model \(g\left( {\user2{x^{\prime}}} \right)\) by optimizing the loss function \(L\).

(5)
Return \(\phi_{i}\) (the SHAP value of the ith feature in \({\varvec{x}}\)).
Compared with other methods, SHAP has a solid theoretical foundation, is fast, and has the advantage of local accuracy and consistency [29]. This indicates that the SHAP value of each feature is uniquely determined and can correctly reflect the contribution of features to the model. References [25,26,27] prove that the SHAP value is the only attribution method that satisfies all the above properties at the same time.
3.2 Principles of interpretation
Applying SHAP to the interpretation of the cold, heat, and electric load forecasting model and fitting the load forecasting model with an interpretable linear model, the contribution of each feature variable to cold, heat, and electric load prediction results can be obtained based on the SHAP value. Using (17â€“19), the SHAP values in the kth (kâ€‰=â€‰1, 2, and 3 represents cold, heat, and electric load forecasting tasks, respectively) prediction subtask can be obtained as:
where \(y_{i}^{k}\) is the prediction value of the ith sample in the kth prediction subtask, \(y_{base}^{k}\) is the average of the kth prediction subtask, and \(M\) is the number of input features. \(x_{ij}\) is the jth feature of the ith sample and \(f^{k} (x_{ij} )\) is its SHAP value, which indicates the contribution of the feature to the kth prediction subtask. \(f^{k} (x_{ij} ) > 0\) indicates that the feature increases the prediction value, while \(f^{k} (x_{ij} ) < 0\) indicates that it decreases the prediction value.
As shown in Fig.Â 3, the explainable framework of the load forecasting model for RIES includes two parts: loads prediction and interpretation. The left side of Fig.Â 3 shows the relationships between the SHAP model and the LSTMMTL model, while the SHAP value calculation is built on the basis of the LSTMMTL model. The right side of Fig.Â 3 shows the interpretability application based on SHAP, where the typical daily sample and the global sample are input into the SHAP model to obtain local interpretability and global interpretability, respectively.
3.2.1 Global interpretation
The global interpretation is generated by analyzing the average contribution of input feature variables to the prediction results based on the global samples and can be obtained by calculating the average SHAP value of each feature variable with global samples. Therefore, the SHAP value of the feature variables in the cold, heat, and electric load forecasting tasks can be calculated separately with global samples to measure the importance of each feature variable in different forecasting tasks. By weighting and summing the SHAP values of cold, heat, and electric loads at a certain proportion, the importance of feature variables to the whole prediction model can be obtained, and an input variable selection strategy based on the global SHAP value is proposed to improve the prediction model by removing the input features that contribute less to the prediction model.
3.2.2 Local interpretation
The local interpretation is generated by analyzing the contribution of input feature variables to the load prediction of an individual sample, and the local interpretation of the ith sample can be obtained using (20). In this paper, typical daily samples in winter and summer are selected to analyze the relationships between the SHAP distribution of the feature variables and the load prediction results in different seasons. This provides a further explanation of the decision basis of the prediction model and enhances the credibility of the load forecasting model.
4 Experimental setup
In this section, the details of prediction and interpretation are shown, including the selection of input variables for the LSTMMTL prediction model, the evaluation method of the prediction model performance, and the main experimental process of interpretable analysis and application for load forecasting of RIES.
4.1 Input variables of load forecasting model
Selecting rational feature variables is the key to obtaining accurate RIES load prediction results. In addition to the historical loads and the constructed coupled feature variables (CF), load demand is also affected by other external factors such as season, week, hour, etc. Table 1 shows the feature variables selected in this paper. The time window length of the LSTMMTL model is a very important parameter in deciding the size of input features. Considering the time series characteristics of the load, the time windows from 24Â h, 48Â h, 72Â h, and 96Â h are chosen, as shown in Table 7 in the Appendix. According to the method shown in Fig.Â 4, the highest prediction accuracy is achieved when the time window is 72Â h, and thus, it is set as the time window of the LSTMMTL model, which predicts the load at the current moment based on the input variables 72Â h before.
4.2 Evaluation metric for forecasting accuracy
Mean absolute percentage error (MAPE) and root mean square error (RMES) are used as error evaluation metrics to evaluate the prediction method described above. The relevant formulas are:
where \(x_{i}\) and \(y_{i}\) are the true value and prediction value, respectively, and \(n\) is the number of samples.
In this paper, the RIES studied is located in the hot Phoenix, where the cold load and electric load demand is higher than the heat load demand. Thus, the importance of cold, heat, and electric loads is weighted to 0.4, 0.2, and 0.4, respectively, according to the ratio of cold, heat, and electric peak loads. The overall errors of the RIES load forecasting model are:
4.3 Prediction and interpretable analysis process
This section introduces the main experimental process of cold, heat, and electric load prediction of RIES and its interpretable analysis. Since the training time of the LSTMMTL model is long, the experimental process is divided into two parts: offline training and online application as shown in Fig.Â 4.
4.3.1 Offline training

Step 1 Collect historical data of cold, heat, and electric loads and other external data in Table 1 as part of the input variable set.

Step 2 Construct the coupled features by using (1â€“8).

Step 3 Build the LSTMMTL load forecasting model, and determine some hyperparameters such as learning rate, optimizer, time window, and weight based on experience and task characteristics.

Step 4 Train the model until the number of iterations reaches the preset value or the accuracy improvement is less than the preset value.

Step 5 Repeat Step 4 and update the hyperparameters in step 3 according to the prediction accuracy.

Step 6 Fit the explanation model and obtain the SHAP value to provide local interpretation, global interpretation, and input variable selection strategy for the prediction model.
4.3.2 Online application

Step 7 Obtain the RIES realtime data and calculate the coupled features.

Step 8 Input the feature variables into the improved model to obtain the load prediction results of RIES and provide explainable analysis for the prediction results.
5 Results and discussion
5.1 Data set
The load data is collected by the campus of Arizona State University in Tempe, USA, where the minimum daily average heat load in summer is about 60% of the annual average heat load, and the minimum daily average cold load in winter is 24% of the annual average cold load. The units of cold load, heat load, and electric load are Ton/h, mmBtu/h, and kW, respectively, and their unit conversion formula is:
The temporal granularity of this data set is 1Â h, i.e., the daily load curve includes 24 points and spans from January 1, 2019 to July 15, 2020, totaling 562Â days. For the modeling part, the training, validation, and test datasets account for 70, 15 and 15%, respectively. The temperature data is obtained from the actual data at the Phoenix weather station [28].
5.2 Coupled feature extraction and load prediction results
Table 2 shows the CF extraction results and the PCC between coupled features and cold, heat, and electric loads. The prediction results of the LSTMMTL model with CF (CFLSTMMTL) for RIES load prediction are shown in Fig.Â 5, and the distribution of the hourly MAPE values is shown in Fig.Â 6. By comparing the prediction errors of the prediction model with or without CF and MTL, the advantages of the coupled features and prediction model constructed in this paper are verified as shown in Table 3.
As shown in Fig.Â 6, most MAPE values of the CFLSTMMTL prediction model for cold, heat, and electric loads are less than 5%. From Table 3, the prediction accuracy of both heat and electric loads are improved significantly after adding CF into the LSTM model (CFLSTM) and the prediction accuracy of all the three loads is significantly improved after introducing MTL and CF together. From the results, we can conclude:

(1)
The construction of coupled features and the multitask learning model can improve the accuracy of the prediction model and avoid modeling the three loads simultaneously. This greatly improves the practical value.

(2)
Nine coupled features are extracted and the PCC between coupled features and loads are very high (PCCâ€‰>â€‰0.8), which satisfies the requirements of the traditional input variable selection method based on correlation. Excess coupled features may lead to overfitting and overcomputation in load prediction and reduce the prediction accuracy. An input variable selection strategy based on SHAP value is proposed in a later experiment to improve the prediction model.
5.3 Global interpretation
The results of the global SHAP value distribution of RIES load forecasting and the contribution of different features to the prediction of cold, heat, and electric loads are shown in this section.
The global SHAP values of feature variables are ranked from large to small as shown in Fig.Â 7, where each point in the scatter plot represents a sample, the horizontal coordinate represents the SHAP value, and the darker color of the sample point represents the larger feature value.
As shown in Fig.Â 7, the feature variables have different contributions in different forecasting tasks, and the top six important features of cold, heat, and electric load forecasting are shown in Table 4. The following conclusions can be drawn:

(1)
In all three load predictions, the SHAP values of their own loads are large, indicating that their own loads have significant effects on the selfprediction results.

(2)
There are some coupled features with high SHAP in all three load predictions. Taking heat load prediction as an example, the SHAP values of coupled features such as CFR6 and CFR1 are larger than those of cold and electric loads. This indicates that some coupled features contribute highly to the prediction results and validate the effectiveness of the coupled features.

(3)
Month has a significant effect on the cold and heat load predictions. Taking the cold load prediction as an example, the SHAP values of â€˜largeâ€™ months (July, August, and September) are positive. This promotes the growth of cold load prediction results, whereas the SHAP values of â€˜smallâ€™ months (January, February, etc.) are negative. These decrease cold load prediction results. These are consistent with the actual situation that local cold load level is low in cold months and high in hot months.

(4)
In heat load prediction, SHAP values of temperature and hour are small and many external factors have no significant effect on heat load prediction. This is caused by the fact that Arizonaâ€™s overall temperature is high and its heat load remains stable.
In summary, global SHAP value distribution can visualize the relationships between the distribution of SHAP value and input features. These are consistent with objective understanding and laws, and validate the effectiveness of the constructed coupled features.
5.4 Local interpretation
The global interpretation mainly shows the contribution of each feature to different forecasting tasks based on the global sample. In this section, we further explain the basis of an individual sample prediction and analyze the contributions of different features to different prediction samples. Considering the limited space available in the paper, typical daily samples in summer and winter are selected for comparison and analysis.
FiguresÂ 8, 9 and 10 show the local interpretation results of the cold, heat, and electric loads, respectively. In these figures, the length of arrow represents the absolute SHAP value of the corresponding feature, the red arrow represents a positive SHAP value that drives predictions higher, and the blue arrow represents a negative SHAP value that makes predictions lower.
Local SHAP value distribution shows that there are significant seasonal differences in the contribution of each feature to the prediction results of cold, heat, and electric loads. Taking temperature and historical load as examples, we make the following analysis:

(1)
For the influence of historical load in the prediction of cold load, the SHAP value of heat load is a large negative value in both summer and winter as shown in Fig.Â 8. Similarly, in the prediction of heat load, the SHAP value of cold load is also a large negative value as shown in Fig.Â 9. These indicate that there is a strong negative correlation between the cold and heat loads, which is consistent with the existence of the strong complementarity between the energy use curves of cold and heat loads.

(2)
For the influence of temperature in summer, the SHAP value of temperature is positive in the prediction of both cold and electric loads as shown in Figs.Â 8a and 10a. This indicates that higher temperature can contribute to a higher load prediction value. In winter, the SHAP of temperature is negative in both heat and electric load predictions, which indicates that higher temperature can contribute to lower load prediction value. In terms of physical significance, high summer temperatures lead to higher cooling load demand and frequent use of electric cooling equipment, while low winter temperatures lead to higher heat load demand and frequent use of electric heating equipment.
In summary, local SHAP value distribution can visualize the relationships between input features and individual prediction results, and these visualizations are consistent with objective understanding and laws and improve the transparency of the prediction model and the credibility of the prediction results.
5.5 Features selection
According to the ratio of cold, heat, and electric peak loads, the global SHAP value of features in cold, heat, and electric load predictions are weighted to 0.4, 0.2, and 0.4, respectively, and the sum is shown in Fig.Â 11. It can be seen that the SHAP values of hour, historical load, CFR8, and CFR4 are at high levels, while the SHAP values of CFR7, CFR3, and CFR5 are at very low levels, which indicates that the three coupled features have less contribution to the prediction results. Then, an input variable selection strategy is applied to improve the model by removing the three coupled features from the input features, and the errors of the original model and improved model are shown in Table 5.
By comparison, it can be seen that MAPE and RMSE are reduced after applying the input variable selection strategy, which shows the strategy is effective and can improve the prediction accuracy of the model. In addition, the strategy can reduce the data dimension, which helps reduce the modeling effort and increase the practical value of the model.
6 Conclusion
In this paper, a coupled feature construction method is introduced to construct CFR between loads. It constructs the representation of intracoupled and intercoupled relations, and integrates the representation by a Taylorlike expansion as part of the input feature variables. It can extract and quantify the intrinsic relations between input loads. An LSTMMTL loads prediction model is proposed to determine the coupling relations between different load forecasting tasks based on a hard parameter sharing mechanism. The coupling relations between loads in the RIES are then fully used from the perspectives of input features and forecasting tasks. In order to improve the interpretability of the load forecasting model, global and local interpretations are provided. The global interpretation can show the important features of the prediction model, and be used to analyze the relationships between the SHAP value and input features. Local interpretation is used to interpret the load prediction results of an individual sample, and show the prediction basis of the prediction model. In addition, an input variable selection strategy based on the global interpretation results is proposed to simplify the prediction model and avoid overfitting and overcomputation by selecting input features according to their importance.
The case study has shown that the proposed model and input variable selection strategy can effectively improve the accuracy of load forecasting in an RIES, and these methods can be extended to the application of the prediction model in different scenarios. From the global and local SHAP value distributions of cold, heat, and electric load prediction, we can intuitively obtain the relationships among input features, SHAP value, and prediction results. The experiments show that the interpretation results are consistent with the characteristics of the load prediction. They can improve the credibility of the prediction results and provide a basis for users to undertake comprehensive energy optimization scheduling and management.
With the continuous development of RIES, main energy users will be extended to commercial areas, residential communities, office buildings, etc. As the objects of energy use become more diversified, the subsequent research needs more refined data support, including on the flow of customers in commercial areas, daily sales, residential area user types, etc. The data helps achieve a more threedimensional portrayal of the coupling relations between loads. Therefore, based on the proposed method in this paper, the load forecasting model and interpretation analysis can be further extended to these areas in the future.
Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
 RIES:

Regional integrated energy system
 CFR:

Coupled feature representation
 MTL:

Multitask learning
 LSTM:

Long shortterm memory
 SHAP:

SHapley additive exPlanations
 OFRs:

Original features
 PCC:

Pearsonâ€™s correlation coefficient
 CF:

Coupled feature variables
References
Qian, A. I., & Ran, H. (2018). Key technologies and challenges for multienergy complementarity and optimization of integrated energy system. Automation of Electric Power Systems, 42(4), 2â€“10.
Zhu, M., Xu, C., Dong, S., et al. (2021). An integrated multienergy flow calculation method for electricitygasthermal integrated energy systems. Protection and Control of Modern Power Systems, 6(1), 1â€“12.
Liang, R., Wang, H., Wu, K., et al. (2020). Shortterm forecasting of cooling, heating and power loads based on neural network and arima model. Proceedings of the CSUEPSA, 32(3), 52â€“58.
Bozorg, M., Bracale, A., Caramia, P., et al. (2020). Bayesian bootstrap quantile regression for probabilistic photovoltaic power forecasting. Protection and Control of Modern Power Systems, 5(1), 1â€“12.
Tian, H., Han, A., Yu, L., et al. (2020). Research on multiload shortterm forecasting model of regional integrated energy system based on GRALSTM neural network. Guangdong Electric Power, 33(5), 44â€“51.
Peng, S., Huang, S., Li, B., et al. (2020). Charging pile load prediction based on deep learning quantile regression model. Power System Protection and Control, 48(2), 44â€“50.
Li, J. Research and implementation of news text classification system based on deep learning. (Doctoral dissertation, Beijing University of Posts and Telecommunications).
Li, R., Sun, F., Ding, X., et al. (2020). Ultra shortterm load forecasting for userlevel integrated energy system considering multienergy spatiotemporal coupling. Power System Technology, 44(11), 4121â€“4134.
Hu, W., Zhang, X., Li, Z., et al. (2022). Shortterm load forecasting based on an optimized VMDmRMRLSTM model. Power System Protection and Control, 50(1), 88â€“97.
Wang, X., Wang, S., Chen, H., et al. (2020). Multienergy load forecasting for regional integrated energy systems considering temporal dynamic and coupling characteristics. Energy, 195, 116964â€“116976.
Shi, D. Study and implementation of power load forecasting based on neural networks. (Doctoral dissertation, Shandong University).
Ryu, S., Noh, J., & Kim, H. (2017). Deep neural network based demand side short term load forecasting. Energies, 10(1), 3.
Wang, C., She, Z., & Cao, L. (2013). Coupled attribute analysis on numerical data. In Twentythird international joint conference on artificial intelligence.
Shi, Y., Suk, H. I., Gao, Y., Lee, S. W., & Shen, D. (2019). Leveraging coupled interaction for multimodal Alzheimerâ€™s disease diagnosis. IEEE transactions on neural networks and learning systems, 31(1), 186â€“200.
Tan, Z., De, G., Li, M., et al. (2020). Combined electricityheatcoolinggas load forecasting model for integrated energy system based on multitask learning and least square support vector machine. Journal of Cleaner Production, 248, 119252.
Shi, J. Energy supplydemand forecasting and optimal operating technology of localarea integrated energy system. (Doctoral dissertation, North China Electric Power University).
Saltelli, A., Tarantola, S., Campolongo, F., & Ratto, M. (2004). Sensitivity analysis in practice: A guide to assessing scientific models (Vol. 1). Wiley.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). " Why should i trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135â€“1144).
Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. In Proceedings of the 31st international conference on neural information processing systems (pp. 4768â€“4777).
Mangalathu, S., Hwang, S. H., & Jeon, J. S. (2020). Failure mode and effects analysis of RC members based on machinelearningbased SHapley Additive exPlanations (SHAP) approach. Engineering Structures, 219, 110927.
Zhou, T., Yang, J., Zhan, X., et al. (2021). A datadriven method for transient voltage stability assessment and its interpretability analysis. Power System Technology, 45(11), 4416â€“4425.
Wang, M., Zheng, K., Yang, Y., & Wang, X. (2020). An explainable machine learning framework for intrusion detection systems. IEEE Access, 8, 73127â€“73141.
Caruana, R. (1993). Multitask learning: A knowledgebased source of inductive bias. Machine Learning Proceedings, 10(1), 41â€“48.
Greff, K., Srivastava, R. K., KoutnÃk, J., Steunebrink, B. R., & Schmidhuber, J. (2016). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222â€“2232.
Lundberg, S. M., Erion, G. G., & Lee, S. I. (2018). Consistent individualized feature attribution for tree ensembles. arXiv preprint arXiv:1802.03888.
Shapley, L. S. (1953). A value for nperson games. Contributions to Theory Games, 2(28), 307â€“317.
Young, H. P. (1985). Monotonic solutions of cooperative games. International Journal of Game Theory, 14(2), 65â€“72.
National Oceanic and Atmospheric Administration. Weather Data. http://www.noaa.gov/web.html. Accessed July 2020
Acknowledgements
I would like to show my gratitude to Arizona State University for providing necessary data.
Author's information
Kailang Wu (1999), male, Major in the AIrelated applications in load forecasting and Nonintrusive load monitoring.
Jie Gu (1971), female, PHD and Associate Professor at Shanghai Jiao Tong University, Major in power system planning, power markets, smart grids, microgrids, and intelligent algorithms.
Lu Meng (1997), female, M.S, Major in the power system planning and integrated energy system.
Honglin Wen (1996), male, PHD, Major in machine learning applications in power system, time series prediction, uncertainty quantification, and decision making under uncertainty.
Jinghuan Ma (1991), male, PHD, assistant professor at Shanghai Jiao Tong University, research interests include applied mathematics for electrical engineering, fundamentals of power system and power system economics, and analysis of power signals and data.
Funding
This work was supported in part by the National Key Research Program of China (2016YFB0900100) and Key Project of Shanghai Science and Technology Committee (18DZ1100303).
Author information
Authors and Affiliations
Contributions
Demand research, literature survey, experiment design, data collection, mathematical modeling, case study, and paper writing. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wu, K., Gu, J., Meng, L. et al. An explainable framework for load forecasting of a regional integrated energy system based on coupled features and multitask learning. Prot Control Mod Power Syst 7, 24 (2022). https://doi.org/10.1186/s4160102200245y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s4160102200245y
Keywords
 Load forecasting
 Regional integrated energy system
 Coupled feature
 SHapley additive exPlanations
 Interpretability of deep learning