 Original research
 Open access
Distribution network state estimation based on attention-enhanced recurrent neural network pseudo-measurement modeling
Protection and Control of Modern Power Systems volume 8, Article number: 31 (2023)
Abstract
Because there is insufficient measurement data when implementing state estimation in distribution networks, this paper proposes an attention-enhanced recurrent neural network (ARNN)-based pseudo-measurement modeling method. First, based on analyzing the power series at the source and load ends in the time and frequency domains, a period-dependent extrapolation model is established to characterize the power series in those domains. The complex mapping functions in the model are automatically represented by ARNNs to obtain an ARNN-based period-dependent pseudo-measurement generation model. A distributed dynamic state estimation model of the distribution network is then established, and the pseudo-measurement data generated by the model in real time are used, together with the measurement data, as the input of the state estimation model. The experimental results show that the proposed method can explore in depth the complex sequence characteristics of the measurement data, so that the accuracy of the pseudo-measurement data is further improved. The results also show that the state estimation accuracy of a distribution network is very poor when measurement data are lacking, but is greatly improved by adding the pseudo-measurement data generated by the proposed model.
1 Introduction
Power system state estimation is the process of determining the internal state of an energy system (e.g., node voltage vectors) by "fusing" a mathematical model with input/output data measurements. State estimation is fundamental to many analysis, monitoring, and energy management tasks, and its accuracy has an important impact on power system security and stability [1]. The amount of measurement resources is an important factor affecting estimation accuracy. Adequate measurement data help to obtain high-accuracy estimation results, whereas with inadequate measurement data the estimation results may not reliably reflect the real state of the system, or the state estimation problem may even become unsolvable [2].
In practice, the measurement system of a distribution network (DN) cannot provide sufficient resources because of the constraints of cost, communication delay, and uneven distribution of measurement points [3]. To solve this problem, researchers have proposed pseudo-measurement generation, i.e., artificially generating "measurement data" for the current moment from existing data (e.g., load power, distributed power output, etc.) without adding extra measurement equipment [3, 4]. Pseudo-measurement generation for DN state estimation has been widely studied. The literature in this field can be roughly categorized into two groups: statistical and probabilistic methods, and machine-learning methods. Statistical and probabilistic methods can explicitly characterize the distribution of the data. Reference [5] considers the non-normal distribution properties of loads and the correlation of loads at different nodes, and fits a load density function satisfying lognormal and beta distributions based on hourly historical load data. However, the static characteristics of the load cannot be fully described by the mean and variance alone, and the types of load pseudo-measurement that can be generated are limited.
Compared with statistical and probabilistic methods, machine-learning-based pseudo-measurement generation does not need an explicit mathematical model, and can directly mine and learn hidden characteristics from the data. This offers stronger feature characterization capability. In [7], historical load data, load differences between adjacent time intervals, real-time temperature, humidity, and date are jointly used as inputs to train a Gaussian process regression model that produces load pseudo-measurements. Reference [8] proposes a distributed game-theoretic relevance vector machine model to capture the uncertainty of node injection power and generate pseudo-measurements. However, Gaussian process regression models and relevance vector machines have very limited nonlinear fitting capability and cannot dig deeper into sequence features, and it is difficult for ordinary neural networks to mine long-term correlations in time-series data (such as load and DG power). Recurrent neural networks (RNNs) can solve this problem, because RNNs retain a memory of what has already been processed and can thus learn from previous steps during training. For example, reference [9] takes advantage of this by training RNN models on historical measurement data to generate more accurate pseudo-measurement data.
Although RNNs have superior temporal-correlation feature learning capability, their initial attention to each element in the input dataset is the same, which is not conducive to fast screening of high-value features. The attention mechanism can help to capture the dimensional relationships in the model's hidden layers. It was initially used mainly for tasks such as natural language processing, where it focuses on the features needed for the target scenario. Reference [10] proposes a short-term load forecasting method using a dual attention mechanism to improve the traditional gated recurrent unit (GRU). This adaptively weights the influence of each input feature on the grid load and enhances the ability of the RNN to capture the long-term dependence of the load data. Application of the algorithm shows that prediction accuracy is improved, to different degrees, compared with previous models. Therefore, this paper considers combining RNNs with an attention mechanism to generate pseudo-measurement data with higher accuracy.
For DN state estimation, a well-developed and widely used estimation algorithm is weighted least squares (WLS). However, when WLS is directly applied to the state estimation problem, there are significant drawbacks in terms of computational speed and estimation accuracy. In terms of computational speed, the time complexity of WLS grows as a power of the number of state variables, so centralized WLS state estimation of all state variables in the DN takes a long time, making it difficult to provide timely data support for other real-time tasks in dispatch management. In terms of estimation accuracy, the dynamic uncertainty of the DN state is greatly increased by the randomness and fluctuation of the power of active components such as renewable energy, while WLS only uses the current measurement information to obtain the optimal estimate and thus struggles to capture the dynamic change characteristics of the system. This causes the final estimation results to deviate from the real situation, so they cannot provide reliable state data for dispatch management. To address these problems, this paper combines distributed estimation with a dynamic state estimation algorithm, known as distributed dynamic state estimation. As shown in Fig. 1, the distributed estimation approach divides a high-dimensional state variable into multiple low-dimensional state variables for parallel estimation, which significantly reduces the computational complexity of the estimation. As shown in Fig. 2, the dynamic state estimation algorithm differs from static estimation: it uses not only the current measurement information but also the historical state information to portray the trend of system state changes, improving estimation accuracy.
The main contributions of this paper are as follows:

(i)
Introducing a distributed dynamic state estimation model to achieve fast state estimation of a large-scale distribution network.

(ii)
Establishing a periodic-correlation extrapolation model to characterize the time–frequency domain of power series.

(iii)
Introducing a multi-head attention mechanism to focus on data with high relevance to the target, and combining it with RNNs to construct attention-enhanced RNNs (ARNNs). These can generate more accurate pseudo-measurement data to improve the accuracy of state estimation.
The remainder of this paper is organized as follows. In Sect. 2, the basic principles and algorithms of distributed dynamic state estimation are introduced, while in Sect. 3 the proposed periodic-correlation pseudo-measurement generation model based on ARNNs is described. Section 4 summarizes data sources and experiment implementation, while results of the proposed model are presented in Sect. 5, where comparisons with existing models are discussed. Section 6 concludes the paper.
2 Distributed dynamic state estimation model
To address the problem of long estimation time caused by the DN's high-dimensional state variables, and to further improve the accuracy of distribution network state estimation, distributed dynamic state estimation is adopted. Distributed estimation greatly reduces the dimension of the state variables and shortens the estimation time, while dynamic estimation increases the effective use of state transition information and improves estimation accuracy.
2.1 Distributed dynamic state estimation model and problem description
From a limited amount of measurement data which contain noise, it is difficult to obtain the true system state. However, multiple sources of valid information (e.g., measurement data and historical state estimates) can create the conditions for computing optimal or suboptimal estimates of the true state. Specifically, the state transition characteristics and their noise distribution obtained from the historical state estimates provide a priori information that reflects the trend of the system, while the measurement data and their error statistics help measure the likelihood that the estimated state is consistent with the true state. Based on the state transition properties, measurement data, and noise statistics, the dynamic estimation algorithm can find the optimal or suboptimal estimate of the system state [16].
2.1.1 State transition equation for distributed dynamic state estimation
Let a DN have s subareas, where the k^{th} area has \(n_{k}\) nodes. Denote the set of nodes constituting the k^{th} area as \(A_{k}\), and its state variable at time t as \(x_{t}^{(k)} = [\theta_{t,1}^{(k)} , \ldots ,\theta_{{t,n_{k} }}^{(k)} ,V_{t,1}^{(k)} , \ldots ,V_{{t,n_{k} }}^{(k)} ]^{T}\), where \(\theta_{t,i}^{(k)}\) and \(V_{t,i}^{(k)}\) represent the phase angle and magnitude of the voltage of the i^{th} node at time \(t\), respectively. The state transition trend of \(x_{t}^{(k)}\) can be described with a discrete first-order Markov model, as:

\(x_{t + 1}^{(k)} = f^{(k)} (x_{t}^{(k)} ) + w_{t}^{(k)} \quad (1)\)
where \(w_{t}^{(k)}\) is the state transition noise, whose distribution is described by a known probability distribution function (pdf), called the prior pdf, i.e., \(w_{t}^{(k)} \sim p(x_{t + 1} | x_{t} )\). The most commonly used Gaussian pdf is chosen to characterize \(w_{t}^{(k)}\): let \(w_{t}^{(k)}\) follow a Gaussian distribution with mean 0 and covariance matrix \(W_{t}\), denoted as \(w_{t}^{(k)} \sim N(0,W_{t} )\). \(f^{(k)} ( \cdot )\) is the state transition function of \(x_{t}^{(k)}\), and in general it is nonlinear. To reduce computational complexity, \(f^{(k)} ( \cdot )\) is usually linearized to be explicitly expressed [20], i.e.:

\(\tilde{x}_{t}^{(k)} = F_{t}^{(k)} \hat{x}_{t - 1}^{(k)} + G_{t}^{(k)} \quad (2)\)
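As a concrete illustration of the linearized transition model above, the following sketch propagates a toy two-dimensional state (one phase angle, one voltage magnitude) through one step of Eq. (2) with additive Gaussian transition noise. The matrices and numeric values here are illustrative stand-ins, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def transition(x_prev, F, G, W):
    """One step of the linearized state transition
    x_t = F @ x_{t-1} + G + w, with w ~ N(0, W).
    F, G, W follow the notation of Eq. (2); the values used
    below are illustrative only."""
    w = rng.multivariate_normal(np.zeros(len(x_prev)), W)
    return F @ x_prev + G + w

# toy 2-dimensional state: [phase angle (rad), voltage magnitude (p.u.)]
x0 = np.array([0.0, 1.0])
F = np.eye(2)                  # near-identity dynamics
G = np.array([0.001, 0.0])     # small drift term
W = 1e-6 * np.eye(2)           # state-transition noise covariance
x1 = transition(x0, F, G, W)
```

Because the noise covariance is tiny, the propagated state stays very close to the deterministic part `F @ x0 + G`.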
Let \(\tilde{x}_{t - 1}^{(k)}\) be the state value calculated by the state transition equation and \(\hat{x}_{t - 1}^{(k)}\) the final state estimate at time \(t - 1\). \(F_{t}^{(k)}\) and \(G_{t}^{(k)}\) in (2) can then be obtained using the Holt-Winters exponential smoothing method, as:
where γ and ε are artificial smoothing parameters, \(a_{t - 1}^{(k)}\) is the level component of the state change, and \(b_{t - 1}^{(k)}\) is the trend component of the state change at time \(t - 1\). Their values at time t are updated as:
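The level/trend updates described above follow the standard Holt two-parameter exponential smoothing recursion. The sketch below implements that generic scalar form; variable names follow the text, but the exact matrix expressions of \(F_{t}^{(k)}\) and \(G_{t}^{(k)}\) in the paper's equations are not reproduced here.

```python
def holt_update(x_tilde_prev, x_hat_prev, a_prev, b_prev, gamma, eps):
    """One update of Holt's two-parameter exponential smoothing,
    which the paper uses to derive F_t and G_t in Eq. (2).
    gamma, eps are the smoothing parameters; a is the level
    component and b the trend component of the state change."""
    a_t = gamma * x_hat_prev + (1.0 - gamma) * x_tilde_prev  # level update
    b_t = eps * (a_t - a_prev) + (1.0 - eps) * b_prev        # trend update
    x_tilde_t = a_t + b_t   # one-step-ahead predicted state value
    return x_tilde_t, a_t, b_t
```

For a constant series (prediction, estimate, and level all equal, zero trend), the recursion correctly predicts the same constant and keeps the trend at zero.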
2.1.2 Measurement equations for distributed dynamic state estimation
The mathematical relationship between the measurement data and the system state \(x_{t}^{(k)}\) can be expressed using the measurement equations. Let all the measurement data that can be received at time \(t\) form the measurement vector \(z_{t}^{(k)}\), and assume that all the measurement noise terms \(r_{t}^{(k)}\) are independent of each other. The mathematical relationship between \(x_{t}^{(k)}\) and \(z_{t}^{(k)}\) can then be described as:

\(z_{t}^{(k)} = h^{(k)} (x_{t}^{(k)} ) + r_{t}^{(k)} \quad (7)\)
Similar to \(w_{t}^{(k)}\) in (1), \(r_{t}^{(k)}\) is also described by a given pdf, called the likelihood pdf \(p(z_{t} | x_{t} )\), i.e., \(r_{t}^{(k)} \sim p(z_{t} | x_{t} )\). Similarly, the most commonly used Gaussian pdf is chosen, i.e., \(r_{t}^{(k)}\) obeys a Gaussian distribution with mean 0 and covariance matrix \(R_{t}\), denoted as \(r_{t}^{(k)} \sim N(0,R_{t} )\). \(h^{(k)} ( \cdot )\) represents the nonlinear measurement function, which is determined by the type of measurement data and the state variable.
2.1.3 Distributed dynamic state estimation problem and solution
The objective of DN state estimation is to minimize the estimation errors of all nodes in the DN, i.e., the difference between the estimated state and the true state. According to [11], the optimization objective of the distributed dynamic estimation problem at time \(t\) can be expressed as:
where \(E( \cdot )\) is the expectation function, while \(x_{t}^{(k)}\) and \(\hat{x}_{t}^{(k)}\) represent the true and estimated states, respectively. \(S_{t}^{(k)}\) is a custom weight function, which is usually set based on the measurement error. The constraints of the DN state estimation problem are the state transition equation of (1) and the measurement equation of (7).
Distributed dynamic state estimation for the DN is accomplished quickly through three steps: local state estimation, boundary state consistency, and local state update for global estimation. The local state estimation step and the local state update step can be done based on existing dynamic estimation algorithms, while the boundary state consistency step relies on boundary consistency estimation methods that can compute global estimates of the boundary nodes. In the following paragraphs, the three steps of the distributed dynamic estimation method are introduced, using the solution of the global estimate of subarea k at time \(t + 1\) as an example.
(1) Local state estimation: the computational process of local state estimation can be expressed as a whole as:

\(\hat{x}_{t + 1}^{l,(k)} = b_{t + 1}^{(k)} (\hat{x}_{t}^{(k)} ,\hat{\Lambda }_{t + 1}^{(k)} )\)
where \(b_{t + 1}^{(k)} ( \cdot )\) is the mapping function between the global state estimate \(\hat{x}_{t}^{(k)}\) at time \(t\) of subarea \(k\) and \(\hat{x}_{t + 1}^{l,(k)}\) at time \(t + 1\). \(\hat{\Lambda }_{t + 1}^{(k)}\) is the set of parameters of function \(b_{t + 1}^{(k)} ( \cdot )\), which includes process noise, measurement noise, local measurement data, etc.
(2) Boundary state consistency: \(\hat{x}_{t + 1}^{l,(k)}\) is the estimation result obtained for subarea \(k\) based on locally valid information, without taking into account external valid information. This makes it different from the corresponding global estimation results. Thus, subarea \(k\) needs to perform consistent estimation of the boundary state, i.e., exchanging information with all neighboring subareas and obtaining all external information related to its boundary nodes, so that the local estimate of the boundary nodes in \(\hat{x}_{t + 1}^{l,(k)}\) is corrected to the global estimate.
(3) Local state update: after obtaining the global estimation result \(\hat{x}_{t + 1}^{(k,ib)}\) of the boundary nodes in subarea k, it is used as the basis for correcting the local estimate \(\hat{x}_{t + 1}^{l,(k)}\) of the internal nodes (all nodes except the boundary nodes) in subarea k, and finally the global state estimation result of subarea k is obtained as:

\(\hat{x}_{t + 1}^{(k)} = e_{t + 1}^{(k)} (\hat{x}_{t + 1}^{l,(k)} ,\hat{x}_{t + 1}^{(k,ib)} ,\Upsilon_{t + 1}^{(k)} )\)
where \(e_{t + 1}^{(k)} ( \cdot )\) is the mapping function of subarea k with its local state estimate \(\hat{x}_{t + 1}^{l,(k)}\), boundary global estimate \(\hat{x}_{t + 1}^{(k,ib)}\) and subarea k global state estimate \(\hat{x}_{t + 1}^{(k)}\), after achieving the boundary state consistency. \(\Upsilon_{t + 1}^{(k)}\) is the set of parameters of function \(e_{t + 1}^{(k)} ( \cdot )\), which includes locally valid information for the subarea as well as externally valid information obtained from other subareas.
2.2 Distributed state estimation algorithm
In existing studies, the theoretical basis for solving dynamic state estimation problems is Bayesian filtering [12]. According to (1) and (7), the power system is a dynamic system with first-order Markov properties, and the measurement at any moment depends only on the state at that moment, independent of the measurements and states at other moments. Let the initial state of the power system be \(x_{0}\), the state at time t be \(x_{t}\), and the measurement data be \(z_{t}\). The prior pdf is denoted as \(p(x_{t} | x_{t - 1} )\), and the likelihood pdf as \(p(z_{t} | x_{t} )\). According to Bayesian theory, the posterior pdf, which contains the complete statistical information of \(x_{t}\), can be obtained from the prior pdf and the likelihood pdf. The statistical properties of the state variables characterized by the posterior pdf are considered approximately equal to those of the real state, i.e., the optimal estimate of the state variables can be obtained from the posterior pdf. Based on the state transition equation and the measurement equation, Bayesian filtering obtains the posterior pdf of \(x_{t}\) through a prediction step and an update step.
(1) Prediction step: using the measurements of the previous \(t - 1\) moments \(z_{1:t - 1} = [z_{1} ,z_{2} , \ldots ,z_{t - 1} ]\), the posterior pdf \(p(x_{t - 1} | z_{1:t - 1} )\) at time \(t - 1\), and the prior pdf \(p(x_{t} | x_{t - 1} )\) at time t, the predicted probability density at time t can be calculated as:

\(p(x_{t} | z_{1:t - 1} ) = \int_{ - \infty }^{ + \infty } {p(x_{t} | x_{t - 1} )p(x_{t - 1} | z_{1:t - 1} )} dx_{t - 1}\)
It can be seen that the prediction step of Bayesian filtering is actually a fusion of the statistical properties of the state variables from the posterior information at time \(t - 1\) and the prior information at time t.
(2) Update step: on the basis of the prediction step, the measurement \(z_{t}\) at time t and the corresponding likelihood pdf \(p(z_{t} | x_{t} )\) are incorporated to obtain the posterior pdf of the state variables at time t:

\(p(x_{t} | z_{1:t} ) = \frac{1}{\eta }p(z_{t} | x_{t} )p(x_{t} | z_{1:t - 1} )\)
where \(\eta = \int_{ - \infty }^{ + \infty } {p(z_{t} | x_{t} )} p(x_{t} | z_{1:t - 1} )dx_{t}\) is a constant unrelated to \(x_{t}\). Usually, the expectation of the posterior pdf is taken as the state estimation result at time t:

\(\hat{x}_{t} = E(x_{t} | z_{1:t} ) = \int_{ - \infty }^{ + \infty } {x_{t} p(x_{t} | z_{1:t} )} dx_{t}\)
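In the linear-Gaussian case, the prediction and update steps above reduce in closed form to the Kalman filter. The sketch below shows one prediction + update cycle under that assumption; `H` is a linear stand-in for the nonlinear measurement function \(h(\cdot)\), and all numeric values in the usage are illustrative.

```python
import numpy as np

def kalman_step(x_hat, P, z, F, G, H, W, R):
    """One Bayesian-filtering cycle in the linear-Gaussian case
    (the Kalman filter). F, G come from the linearized transition
    of Eq. (2); H is a linear stand-in for h(.); W and R are the
    transition and measurement noise covariances."""
    # --- prediction step: propagate the posterior through the prior pdf
    x_pred = F @ x_hat + G
    P_pred = F @ P @ F.T + W
    # --- update step: fuse the prediction with the measurement z
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new

# illustrative 2-state example: with near-perfect measurements
# (tiny R), the posterior mean should land on the measurement.
x_hat, P = np.zeros(2), np.eye(2)
z = np.array([1.0, 2.0])
x_new, P_new = kalman_step(x_hat, P, z, np.eye(2), np.zeros(2),
                           np.eye(2), 0.01 * np.eye(2), 1e-8 * np.eye(2))
```

With a very small measurement covariance the update step trusts the measurement almost completely, which is a quick sanity check on the gain computation.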
The flow chart and the calculation procedure of the distributed dynamic state estimation of the DN are shown in Figs. 3 and 4, respectively. It can be seen that the solution of the state estimation problem is influenced by the quality of the measurement data in each subregion of the DN. However, because of economic constraints, only a small number of PMUs are installed in a DN subregion, and most of the available measurement data are still provided by traditional measurement systems, including SCADA systems and Advanced Metering Infrastructure (AMI). Although SCADA systems can provide voltage amplitude and node injection power measurements every second, the number of measurements obtained may be fewer than the number of state variables because of communication delays and limited measurement points, while the data upload period of AMI is 5 or 15 min, which cannot meet the real-time requirements of state estimation [17]. The DN subregion therefore suffers from a scarcity of second-level measurements. In this paper, a periodic-correlation pseudo-measurement generation model based on ARNNs is thus used to generate pseudo-measurement data in real time and alleviate the measurement scarcity problem in the DN subregion.
3 A periodic-correlation pseudo-measurement generation model based on attention-enhanced recurrent neural networks
This section analyzes and models the time–frequency domain characteristics of load and distributed power output.
3.1 Periodiccorrelation extrapolation model
Generally, a source-load power series has a strong correlation with the data that lags itself by several time steps. The autocorrelation of different time series can be described by the maximum lag time step and the minimum autocorrelation function (ACF) value, where the maximum lag time step is chosen manually and the minimum ACF value is the ACF value corresponding to the maximum lag time step. The larger the minimum ACF value, the stronger the correlation between the historical power data within the maximum lag time step and the power data at the current moment.
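As a sketch of how such lag screening can be done, the following computes a sample ACF on a synthetic daily-periodic "load" series; the series, lag range, and threshold are illustrative choices, not the paper's data.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function of a 1-D series for lags
    1..max_lag. Used to pick the maximum lag step whose ACF value
    stays above a chosen threshold."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / var
                     for k in range(1, max_lag + 1)])

# a strongly periodic toy "load" series with a 24-step (daily) cycle
t = np.arange(24 * 30)
load = 1.0 + 0.5 * np.sin(2 * np.pi * t / 24)
r = acf(load, 25)   # r[23] is the ACF at lag 24
```

For this series the ACF is close to +1 at the daily lag (24 steps) and close to −1 at the half-day lag, which is exactly the kind of structure the maximum-lag screening exploits.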
In addition to the autocorrelation of power series demonstrated by the ACF, power data are also related to external factors such as weather and date. In this subsection, the Jensen-Shannon (JS) divergence is used to quantify the similarity between load, PV, and wind power data and their corresponding external factor series. The JS value is 0 when the two time series are distributed independently and 1 in the opposite case.
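One common way to compute a bounded JS value between two raw series is to histogram them onto a shared support and compare the resulting discrete distributions with the base-2 JS divergence, which lies in [0, 1]. How the paper maps series onto distributions is not specified, so the histogram step below is an assumption.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Base-2 Jensen-Shannon divergence between two discrete
    distributions p and q; bounded in [0, 1]."""
    p = np.asarray(p, float); q = np.asarray(q, float)
    p = p / p.sum(); q = q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / (b[mask] + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def series_js(x, y, bins=20):
    """Histogram both series over a common range, then compare.
    The binning scheme is an illustrative assumption."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    px, _ = np.histogram(x, bins=bins, range=(lo, hi))
    py, _ = np.histogram(y, bins=bins, range=(lo, hi))
    return js_divergence(px + 1, py + 1)   # +1: Laplace smoothing

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)   # toy stand-in for a power series
y = rng.normal(5.0, 1.0, 2000)   # toy stand-in for an external factor
```

Identical series give a JS value of (numerically) zero, while series with nearly disjoint value distributions give a value close to 1.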
The power series not only has autocorrelation and external similarity in the time domain, but also exhibits periodic fluctuations in the frequency domain. These can be derived from Fourier spectrum analysis.
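The strong periods used later (\(T_{1}, \ldots, T_{k}\)) can be read off as the highest-energy peaks of the Fourier spectrum. The sketch below does this on a toy series with daily (24-step) and weekly (168-step) components; the series and sampling interval are illustrative.

```python
import numpy as np

def strongest_periods(x, n_periods=2):
    """Return the n_periods with highest Fourier energy, excluding
    the DC component. Mirrors the paper's use of spectrum analysis
    to pick the strong periods T_1 ... T_k."""
    x = np.asarray(x, float) - np.mean(x)
    amp = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)            # cycles/sample
    idx = np.argsort(amp[1:])[::-1][:n_periods] + 1   # skip DC bin
    return [round(1.0 / freqs[i]) for i in idx]

# toy series: dominant daily cycle plus a weaker weekly cycle
t = np.arange(24 * 70)   # 70 days, hourly resolution
series = np.sin(2 * np.pi * t / 24) + 0.4 * np.sin(2 * np.pi * t / 168)
periods = strongest_periods(series)
```

Because the record length is an integer multiple of both cycles, the two spectral peaks fall exactly on FFT bins and the recovered periods are 24 and 168.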
Combining the above analysis, a periodic-correlation extrapolation model is constructed to map the time–frequency domain characteristics of the power series. The independent variables of this model consist of two parts: the autocorrelated periodic series \(X_{t,T}\), which reflects the autocorrelation and periodic volatility of the power series, and the externally correlated periodic series \(E_{t,T}\), which reflects the similarity and periodic volatility of the power series and the external factor series. They are collectively called the periodic-correlation input. The periodic-correlation extrapolation model characterizing the target power \(x_{t}\) at time \(t\) with its periodic-correlation input is given as:
where \(b_{r}\) is the residual coefficient, \(W_{S1}\) and \(W_{E1}\) are the weight matrices of the mapping functions \(\phi_{S} ( \cdot )\) and \(\phi_{E} ( \cdot )\), respectively, and \(W_{S2}\) and \(W_{E2}\) are the weight matrices of \(X_{t,T}\) and \(E_{t,T}\), respectively. The autocorrelated periodic series \(X_{t,T}\) is based on the power series data \(\{ x\}\) and can be expressed as:
where \(X_{t,S} = [x_{t - 1} , \ldots ,x_{t - m} ]\) is the autocorrelated sequence of the m lagged time steps of \(x_{t}\), and \(X_{t,T_{i}}\ (i = 1, \ldots ,k)\) is a periodic cycle sequence sharing a strong period \(T_{i}\) with \(x_{t}\) and \(X_{t,T}\), i.e.:
where the j^{th} \((j = 1, \ldots ,c)\) element of \(X_{t,T_{i}}\) is \(X_{t,T_{i} ,j} = [x_{t - jT_{i}} ,x_{t - 1 - jT_{i}} , \ldots ,x_{t - m - jT_{i}} ]\), and c is the number of cycles of \(X_{t,T_{i}}\). The external correlation periodic sequence \(E_{t,T}\) is based on the external factor sequence \(\{ \varepsilon \}\), and its construction is similar to the above process; it can be expressed as:
Thus, after determining the autocorrelation maximum lag term m, the k strong periods \(T_{1} ,T_{2} , \ldots ,T_{k}\), and the number of cycles c, the periodic-correlation input of the power series can be constructed from the historical power data and external factor data. If the optimal weight matrices \(\{ \hat{W}_{S1} ,\hat{W}_{S2} ,\hat{W}_{E1} ,\hat{W}_{E2} \}\), residual \(\hat{b}_{r}\), and mapping functions \(\hat{\phi }_{S} ( \cdot )\) and \(\hat{\phi }_{E} ( \cdot )\) exist, let
then the power data can be predicted using the periodic-correlation input and the periodic-correlation extrapolation model.
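The construction of the autocorrelated periodic series can be sketched as follows: a lag window \(X_{t,S}\) plus, for each strong period \(T_i\) and each cycle \(j = 1..c\), a lagged window shifted back by \(jT_i\). The exact indexing is our reading of the equations above, and the toy series is illustrative.

```python
import numpy as np

def periodic_correlation_input(x, t, m, periods, c):
    """Assemble the autocorrelated periodic series X_{t,T} for
    target x_t: the lag window X_{t,S} = [x_{t-1}, ..., x_{t-m}]
    plus, for each strong period T and cycle j = 1..c, the window
    [x_{t-jT}, x_{t-1-jT}, ..., x_{t-m-jT}]. The indexing is our
    reading of the construction in the text."""
    x = np.asarray(x, float)
    blocks = [x[t - m:t][::-1]]            # x_{t-1} ... x_{t-m}
    for T in periods:
        for j in range(1, c + 1):
            base = t - j * T
            blocks.append(x[base - m:base + 1][::-1])  # x_{t-jT} ... x_{t-m-jT}
    return np.concatenate(blocks)

# toy series where x[i] = i makes the indexing easy to verify
x = np.arange(100.0)
out = periodic_correlation_input(x, t=90, m=3, periods=[24], c=2)
```

With `x[i] = i` the output is `[89, 88, 87, 66, 65, 64, 63, 42, 41, 40, 39]`: the three most recent lags, then the windows one and two daily cycles back.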
However, the mapping functions \(\phi_{S} ( \cdot )\) and \(\phi_{E} ( \cdot )\) are usually nonlinear and may be either explicit or implicit functions, making it difficult for traditional algorithms to find their optimal representations. Deep learning, as an emerging data-driven optimization approach, can mine hidden shallow and deep features from a large amount of data through multiple neuron computation nodes and multiple layers of network operations, so as to approximate the correlation between input and output as closely as possible and complete the automated representation of complex mappings. Therefore, a suitable deep learning framework is considered to achieve the optimization objective described in (21).
3.2 Pseudo-measurement data generation model
The powerful temporal-correlation learning capability and nonlinear fitting ability of RNNs can approximate the periodic-correlation extrapolation model proposed in (21). However, when an RNN model takes the autocorrelated periodic sequence \(X_{t,T}\) as input, it focuses equally on historical data with different lag steps, incorporating invalid information from data with large lag time steps into the feature representation, thus wasting computational resources or even reducing the accuracy of the fit. Therefore, RNNs should focus more on data with high relevance to the target, in order to obtain more critical detailed information and suppress useless information.
For this reason, this paper introduces a multi-head attention mechanism to focus on data with high relevance to the target and combines it with RNNs to construct ARNNs that better fit (21).
3.2.1 Multi-head attention mechanism
The attention mechanism is an information focusing technique that imitates human cognitive attention, helping RNNs focus on the small amount of important information in the input. The core idea is to train the generation of weight coefficients corresponding to the input under a given target, so as to determine the importance of information. As shown on the left side of Fig. 5, the attention mechanism represents the input as a "key-value pair", with the "key" K used to calculate the attention distribution and the "value" V used to calculate the aggregated attention value. The attention value of V is then obtained using the scaled dot-product scoring function and the query vector Q.
The multi-head attention mechanism (MAM) uses multiple attention mechanisms to obtain multiple sub-feature spaces, thereby focusing on important features of the original information from different aspects. As shown on the right side of Fig. 5, it decomposes the input into multiple uncorrelated sub-feature spaces, each with its own "key" \(K_{i}\), "value" \(V_{i}\), and query vector \(Q_{i}\). The attention result of each subspace is calculated, and all results are concatenated to obtain the final multi-head attention result. MAM is consistent with the proposed idea of mining power data features from multiple perspectives in the time–frequency domain. Therefore, this paper uses a neural network combining MAM and RNNs to approximate the periodic-correlation extrapolation model.
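The split-compute-concatenate scheme above can be sketched as follows. For brevity this sketch omits the learned per-head projection matrices that a full MAM applies to Q, K, and V (an assumption noted in the comments); each head simply operates on its own slice of the feature dimension.

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, n_heads):
    """Scaled dot-product attention split across n_heads
    sub-feature spaces, then concatenated. NOTE: real MAM learns
    projection matrices for Q, K, V per head; they are omitted
    here for brevity (an assumption of this sketch)."""
    d = Q.shape[-1]
    assert d % n_heads == 0
    dh = d // n_heads
    outs = []
    for h in range(n_heads):
        sl = slice(h * dh, (h + 1) * dh)
        Qh, Kh, Vh = Q[:, sl], K[:, sl], V[:, sl]
        weights = softmax(Qh @ Kh.T / np.sqrt(dh))  # rows sum to 1
        outs.append(weights @ Vh)
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
out = multi_head_attention(Q, K, np.ones((4, 8)), n_heads=2)
```

Because each row of attention weights sums to 1, the output rows are convex combinations of the value rows; with an all-ones V the output is exactly all ones, a simple correctness check.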
3.2.2 Attention-enhanced recurrent neural networks and pseudo-measurement generation model
ARNNs, which combine MAM with the RNN framework, can obtain more critical detailed information from the input and suppress other, useless information. As shown in Fig. 6, an ARNN containing one RNN layer and one MAM layer is used as an example to introduce the model.
(1) Input layer: at time \(t\), the time series \(X_{t} = [x_{t,1} ,x_{t,2} , \ldots ,x_{t,n} ]^{T}\) with n samples of \(d_{k}\) dimensions is received.
(2) RNN layer: this layer receives the time series \(X_{t}\) from the input layer and uses it as the input to the RNN structure. Using the ability of RNNs to process and memorize long-range time-series correlations, the sequence feature characterization results are obtained without differentiating their importance, i.e.:

\(y_{t,\mathrm{RNNs}} = \mathrm{RNNs}^{G_{\mathrm{RNNs}}} (X_{t} )\)
where \(\mathrm{RNNs}( \cdot )\) represents the input-to-output mapping of the RNN layer, which can be a traditional RNN (TRNN), LSTM, or GRU. The superscript \(G_{\mathrm{RNNs}}\) represents the corresponding set of training parameters.
(3) MAM layer: this layer obtains all the feature information \(y_{t,\mathrm{RNNs}}\) about \(X_{t}\) from the RNN layer, and then extracts the key information highly associated with the target values while discarding useless information, so that subsequent networks can focus on characterizing the target data using the key information. The input–output relationship of this layer is expressed as:

\(y_{t,\mathrm{MAM}} = \mathrm{MAM}^{G_{\mathrm{MAM}}} (y_{t,\mathrm{RNNs}} )\)
where \(\mathrm{MAM}( \cdot )\) represents the input-to-output mapping of the MAM layer, and \(G_{\mathrm{MAM}}\) represents the corresponding set of training parameters.
(4) Fully connected layer: this layer integrates the information computed in the MAM layer that is significantly associated with the target data and produces the final output, as:

\(y_{t} = \mathrm{ReLU}(W_{fc} y_{t,\mathrm{MAM}} + b_{fc} )\)
where \(W_{fc}\) and \(b_{fc}\) are the parameters to be trained in the fully connected layer, and \(\mathrm{ReLU}( \cdot )\) represents an activation function operating element-wise on a vector \(v_{k}\), expressed as:

\(\mathrm{ReLU}(v_{k} ) = \max (0,v_{k} )\)
All the training parameters of the input layer, RNN layer, MAM layer, and fully connected layer of an ARNN can be obtained using the backpropagation-through-time algorithm.
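Putting the four layers together, a forward pass of the ARNN can be sketched in plain numpy. All weights below are random stand-ins (a real model trains them with backpropagation through time), and the attention layer is reduced to a single head for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(v):
    return np.maximum(v, 0.0)

def rnn_layer(X, Wx, Wh, b):
    """Vanilla RNN: h_t = tanh(Wx x_t + Wh h_{t-1} + b).
    Returns the hidden state at every step (the feature sequence)."""
    h = np.zeros(Wh.shape[0])
    H = []
    for x_t in X:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        H.append(h)
    return np.stack(H)

def arnn_forward(X, params):
    """Forward pass of the ARNN sketched in the text:
    RNN layer -> attention layer (one head here, an
    assumption) -> fully connected ReLU output layer."""
    H = rnn_layer(X, params["Wx"], params["Wh"], params["b"])
    scores = H @ params["q"]            # relevance of each step to a query
    w = np.exp(scores - scores.max())
    w = w / w.sum()                     # softmax attention weights
    context = w @ H                     # attention-weighted feature summary
    return relu(params["Wfc"] @ context + params["bfc"])

n_in, n_hid, n_out, T = 3, 8, 1, 10
params = {
    "Wx": 0.1 * rng.standard_normal((n_hid, n_in)),
    "Wh": 0.1 * rng.standard_normal((n_hid, n_hid)),
    "b": np.zeros(n_hid),
    "q": rng.standard_normal(n_hid),
    "Wfc": 0.1 * rng.standard_normal((n_out, n_hid)),
    "bfc": np.zeros(n_out),
}
y = arnn_forward(rng.standard_normal((T, n_in)), params)
```

The output has the shape of the fully connected layer and, because of the final ReLU, is always non-negative.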
Based on the above ARNNs, a pseudo-measurement generation model can be constructed to fit the periodic-correlation extrapolation model of the power series; its basic structural framework is shown in Fig. 7. Corresponding to (14), the power autocorrelation series and the external factor correlation series are used as inputs to fit the functions \(\phi_{S} ( \cdot )\) and \(\phi_{E} ( \cdot )\) with separate ARNNs, and the results of both are then integrated using a fully connected layer to obtain the final pseudo-measurement generation results. This model is called the periodic-correlation pseudo-measurement generation model based on ARNNs, abbreviated as ARNNs-PC. If the RNNs are LSTM models, it is abbreviated as ALSTM-PC, and similarly for other variants. The schematic diagram of distributed dynamic state estimation with pseudo-measurements is shown in Fig. 8.
4 Case study
In this section, cases based on the SimBench database are analyzed to verify the effectiveness of the proposed ARNNs-PC in the time domain and frequency domain. The impact of the generated pseudo-measurement data on the state estimation accuracy of the DN is then shown. All experiments are run on a computer with an Intel i5-10400F CPU and 16 GB memory, and the programs are implemented in Python.
4.1 Data description and assessment metrics
The German SimBench project is a subproject of the German Federal Government's "Research for an environmentally friendly, reliable and orderly energy supply" energy program, which aims to create a benchmark database for researchers to perform power flow simulation, transient and steady-state analysis, planning, etc. The SimBench database, built from the actual German power system, contains network topology and parameter information of various transmission and distribution networks, as well as historical series data of generators, distributed power sources, and customer loads [13]. In this paper, cases are selected from the SimBench database to verify the effectiveness of the proposed algorithm.
Corresponding to the time–frequency domain characterization, the root mean square error (RMSE) is adopted as the time-domain accuracy metric. In the frequency domain, the energy spectrum similarity (ESS) index is defined as:
where \(A_{\mathrm{real}} (i)\) and \(A_{\mathrm{pre}} (i)\) represent the energy amplitudes of the Fourier spectra of the real sequence and the pseudo-measurement sequence at frequency \(f_{i}\), respectively. The smaller the ESS value, the closer the spectrum of the pseudo-measurement sequence is to that of the real sequence.
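The two metrics can be sketched as follows. RMSE is standard; since the paper's exact ESS formula is not reproduced here, the RMS-of-amplitude-differences form below is only one plausible choice consistent with "smaller = closer spectra", and should be treated as an assumption.

```python
import numpy as np

def rmse(real, pred):
    """Root mean square error -- the paper's time-domain metric."""
    real = np.asarray(real, float)
    pred = np.asarray(pred, float)
    return np.sqrt(np.mean((real - pred) ** 2))

def ess(real, pred):
    """Frequency-domain comparison of Fourier energy amplitudes
    A_real(i), A_pre(i). NOTE: the paper's exact ESS formula is
    not reproduced here; this RMS-of-amplitude-differences form
    is an assumed stand-in (smaller = closer spectra)."""
    A_real = np.abs(np.fft.rfft(real))
    A_pre = np.abs(np.fft.rfft(pred))
    return np.sqrt(np.mean((A_real - A_pre) ** 2))
```

Both metrics vanish when the pseudo-measurement sequence equals the real sequence, and grow as the two diverge in the time or frequency domain, respectively.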
4.2 Benchmark and parameter setting
Referring to [18, 19], six commonly used machine-learning-based pseudo-measurement generation models are chosen as benchmarks: gradient boosting decision tree (GBDT), random forest (RF), back-propagation neural network (BPNN), TRNN, LSTM, and GRU. The inputs of these six benchmark models are the commonly adopted sequential inputs, i.e., continuous historical data lagging the target values by \(M\) time points, whereas the inputs of ARNNs-PC are determined by the maximum autocorrelated lag \(m\) of its periodic-correlation inputs, the strong periods \(T_{1} ,T_{2} , \cdots ,T_{k}\), and the number of cycles \(c\). The ARNNs-PC models tested in the experiments are A-TRNN-PC, A-LSTM-PC, and A-GRU-PC. The maximum autocorrelated lag \(m\) in the periodic-correlation input is set to the average maximum lag at which the autocorrelation of the historical source-load data still exceeds 0.85, while the strong periods are selected as the two frequency points with the highest energy in the Fourier spectrum of the historical source-load data.
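The input-selection procedure just described (maximum lag with autocorrelation above 0.85, strong periods from the highest-energy Fourier frequencies) can be sketched as follows. The function name `select_inputs` and the synthetic one-week test series are illustrative assumptions, not from the paper.

```python
import numpy as np

def select_inputs(series, acf_threshold=0.85, k=2):
    """Pick m = the maximum lag whose sample autocorrelation still exceeds
    the threshold, and the k strong periods corresponding to the highest
    FFT energies (the zero-frequency/DC term is excluded)."""
    x = np.asarray(series, float)
    x = x - x.mean()
    n = len(x)
    # sample autocorrelation at lags 0..n-1
    acf = np.correlate(x, x, mode="full")[n - 1:] / (np.var(x) * n)
    above = np.nonzero(acf[1:] > acf_threshold)[0] + 1
    m = int(above.max()) if above.size else 1
    # strong periods from the amplitude spectrum (skip DC at index 0)
    amp = np.abs(np.fft.rfft(x))
    idx = np.argsort(amp[1:])[-k:] + 1
    periods = sorted(n / i for i in idx)
    return m, periods

t = np.arange(96 * 7)               # one week of 15-min sampling points
daily = np.sin(2 * np.pi * t / 96)  # dominant daily cycle (period = 96 points)
m, periods = select_inputs(daily)
print(m, periods)                   # the strongest period recovered is ~96 points
```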
The hyperparameters of all the pseudo-measurement generation models are shown in Table 1; their values are those with the smallest errors after multiple trials. Here, n_estimator is the number of GBDT/RF estimators, max_depth is the maximum tree depth of GBDT, learning_rate is the learning rate, and hidden_layer and hidden_size are the number of hidden layers of the neural network and the corresponding number of hidden neurons, respectively. heads is the number of heads of the multi-head attention, and drop_out is the neuron dropout rate, used to prevent the neural network from overfitting. For the neural-network-based models, i.e., BPNN, TRNN, LSTM, GRU, A-TRNN-PC, A-LSTM-PC, and A-GRU-PC, Adam is adopted as the optimizer and the batch size (number of samples per training step) is 1000. In particular, learning_rate is given as a range rather than a fixed value in Table 1 because the best learning rate varies with the dataset and usually takes several trials to determine; the range listed is the optimal range found after extensive testing, and a suitable learning rate can be chosen within it for a given dataset. The ratio of training data to test data is 7:3.
5 Results and discussion
5.1 Analysis and modeling of load and PV output time–frequency domain characteristics
Three sets of PV output curves and six sets of load power curves with a sampling period of 15 min are selected at random from the SimBench database as examples to explore the characteristics of source-load power in the time and frequency domains. The three photovoltaic (PV) output curves are recorded as PV1, PV2, and PV3, and the six load power curves as Load1, Load2, Load3, Load4, Load5, and Load6.
The autocorrelation diagrams of the series are shown in Fig. 9. Although the autocorrelation of PV output and load power varies widely, each source-load power series is strongly correlated with the data lagging itself by several time steps. This conclusion is also confirmed in [21, 22].
According to [14], the external factors associated with power data generally include temporal and climate data. Here, time and temperature are selected as the external factors affecting load power, and time and sunshine duration as those affecting PV output, and the JS divergences between the power series and the external-factor series are calculated separately. The results, shown in Table 2, indicate that a power sequence has different degrees of similarity with different external factors: the load power is significantly similar to the time factor but insensitive to the climate factor, while the PV output is insensitive to the time factor but highly similar to the sunshine duration.
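The JS-divergence comparison between a power series and an external-factor series can be sketched with SciPy. The common-range histogramming and the synthetic series below are assumptions for illustration; the paper's exact binning is not specified.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_divergence(series_a, series_b, bins=20):
    """Jensen-Shannon divergence between the empirical distributions of two
    series (cf. Table 2). Both are histogrammed over a shared range so the
    bins align; a smaller value means the distributions are more similar."""
    a, b = np.asarray(series_a, float), np.asarray(series_b, float)
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    # scipy returns the JS *distance* (square root of the divergence)
    return jensenshannon(p, q) ** 2

rng = np.random.default_rng(0)
load = rng.normal(0.5, 0.1, 1000)   # stand-in for a normalized load series
temp = rng.normal(0.5, 0.1, 1000)   # similarly distributed factor -> small JS
wind = rng.uniform(0, 1, 1000)      # differently distributed factor -> larger JS
print(js_divergence(load, temp), js_divergence(load, wind))
```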
The power series not only has autocorrelation and external similarity in the time domain, but also exhibits periodic fluctuations in the frequency domain, which can be revealed by Fourier spectrum analysis. Figure 10 shows the Fourier spectra of the nine power series. Both the PV output and the load power have multiple significant frequencies; for example, PV1 has higher energy at \(\frac{1}{24}\,{\text{h}}^{-1}\), \(\frac{1}{12}\,{\text{h}}^{-1}\), and \(\frac{1}{8}\,{\text{h}}^{-1}\), and Load1 has higher energy at \(\frac{1}{24}\,{\text{h}}^{-1}\) and \(\frac{1}{12}\,{\text{h}}^{-1}\).
5.2 Accuracy of the ARNNsPC model
In this section, a 0.4 kV DN numbered "1-LV-rural3--0-no_sw" in the SimBench dataset is used as the test system. The network has 129 nodes and 127 lines, of which 118 nodes carry load and 17 nodes are equipped with PV. Detailed network topology and data can be found on the official SimBench project website [15]. The data contain the time series of load power and PV output in the test system from January 1, 2016 to December 31, 2016, with a collection period of 15 min. The network topology is shown in Fig. 12 in the "Appendix".
To test the accuracy of the load power and PV output pseudo-measurements generated by ARNNs-PC, the PV output data numbered "PV1" and "PV4" are chosen and renumbered as "PV1" and "PV2", and the load power data numbered "H0A" and "H0G" are chosen and renumbered as "Load1" and "Load2". In addition, following the analysis in Sect. 5.1, time and sunshine duration are selected as the external factors of load and PV, respectively. The time series is the numbering sequence of sampling points in one day, from 1 to 96 in chronological order. Based on the PV installation locations provided in [13], sunshine duration data with a collection period of 10 min are obtained from the website of the German Meteorological Office for the corresponding locations, and are interpolated into a sunshine duration sequence with a collection period of 15 min. To avoid differences in magnitude between different data and to eliminate the effect of different value spans on the accuracy of pseudo-measurement generation, all raw data are converted to the range [0, 1] by the maximum-minimum normalization method.
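The two preprocessing steps above, maximum-minimum normalization and interpolating the 10-min sunshine data onto the 15-min grid, can be sketched as follows (the function names are illustrative; a constant series would need a guard against division by zero):

```python
import numpy as np

def minmax_normalize(x):
    """Maximum-minimum normalization: map raw values onto [0, 1]."""
    x = np.asarray(x, float)
    return (x - x.min()) / (x.max() - x.min())

def resample_10min_to_15min(values_10min, hours=24):
    """Linearly interpolate a 10-min sunshine-duration series onto the
    15-min sampling grid used by the power data (illustrative sketch)."""
    t_src = np.arange(0, hours * 60, 10)   # source timestamps in minutes
    t_dst = np.arange(0, hours * 60, 15)   # target timestamps in minutes
    return np.interp(t_dst, t_src, values_10min)

raw = np.array([3.0, 7.0, 5.0, 11.0])
norm = minmax_normalize(raw)

sun = np.linspace(0, 10, 144)              # 24 h of 10-min samples
sun15 = resample_10min_to_15min(sun)
print(norm, len(sun15))
```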
Based on the time-domain index RMSE and the frequency-domain index ESS, the accuracy of the pseudo-measurements generated by each model for PV output and active load in the DN is examined to verify the effectiveness of the ARNNs-PC model. All results are averages over 10 repeated tests. The pseudo-measurement generation accuracies for PV output (PV1 and PV2) and active load (Load1 and Load2) under the nine pseudo-measurement generation models are shown in Table 3.
In terms of the time-domain metric RMSE, the accuracies of the pseudo-measurements generated by GBDT and RF are similar, with slight differences across datasets. However, their accuracies are generally inferior to those of the neural-network-based sequence models, because GBDT and RF lack the ability to learn deeper features of the data. Among the neural network models, BPNN has the lowest accuracy in generating pseudo-measurements because it cannot mine the temporal correlation properties of time-series data. The RNN models, i.e., TRNN, LSTM, and GRU, compensate for this shortcoming, but the time-domain accuracy of TRNN is worse than those of LSTM and GRU because of the gradient vanishing/explosion problem that may occur when TRNNs deal with long time series. Comparing the ARNNs-PC models with the RNN models, the average RMSEs of A-TRNN-PC, A-LSTM-PC, and A-GRU-PC on each source-load dataset are reduced by 6.28%, 6.24%, and 6.96% relative to TRNN, LSTM, and GRU, respectively. Clearly, the proposed ARNNs-PC model generates pseudo-measurement data closer to the real data. Analyzing the structural differences between RNNs and ARNNs-PC, the effective cooperation of the recurrent structure with the attention mechanism, together with the periodic-correlation inputs that pick out historical autocorrelated and externally correlated data with positive gain on the target value, enhances the network's ability to characterize the data. This enables ARNNs-PC to focus on learning data features from the valid information and thus generate pseudo-measurement data with higher accuracy.
In addition to the time domain, the performance of the ARNNs-PC model can be analyzed in the frequency domain using the ESS index. The ESS values in Table 3 show that the pseudo-measurement data obtained from the ARNNs-PC models are the closest to the frequency-domain characteristics of the real data, for both PV output and active load. A-LSTM-PC and A-GRU-PC have the highest frequency-domain accuracy, A-TRNN-PC is the next most accurate, and the worst performers are GBDT and RF. A-TRNN-PC, A-LSTM-PC, and A-GRU-PC achieve average ESS reductions of 10.96%, 11.81%, and 9.71% on each source-load dataset compared with TRNN, LSTM, and GRU, respectively. For example, the average ESS of A-LSTM-PC is reduced by 26.14%, 21.86%, and 15.82% compared with GBDT, RF, and BPNN, respectively. Therefore, the proposed ARNNs-PC model captures more frequency-domain feature information from the source-load historical data, which makes the generated pseudo-measurements close to the real data in the frequency spectrum and thus improves their frequency-domain accuracy.
In summary, ARNNs-PC captures the time–frequency domain characteristics of source-load power data better than RNNs and other machine-learning-based pseudo-measurement generation models such as GBDT, RF, and BPNN, and can therefore provide more accurate pseudo-measurement data for the state estimation of the DN.
5.3 Number of training parameters and training time of ARNNsPC model
The ARNNs-PC model can be encapsulated into a fixed model after training, receive data from the measurement system, and generate the corresponding pseudo-measurement data in combination with the existing historical data in the database. Since the model parameters are already determined, the computation time required to generate pseudo-measurements is on the order of microseconds, giving good real-time performance. However, the ARNNs-PC model requires a large number of parameters, which occupies considerable storage space in the computing device. In addition, the model may need to spend some time at regular intervals retraining and updating its parameters to adapt to the changing characteristics of the source-load power data.
Figure 11 shows the numbers of training parameters of ARNNs-PC, RNNs, and BPNN, as well as the running time of one epoch (i.e., one forward pass and one back-propagation over all the data). The number of training parameters and the training time of BPNN are the smallest, but its accuracy is also lower than those of ARNNs-PC and RNNs. ARNNs-PC has slightly fewer training parameters than RNNs, which saves storage space. This is because ARNNs-PC trains two sub-RNN models on the power data and the external-factor data separately, and its periodic-correlation input method selects only the data with positive gain on the output as the inputs of the two sub-RNNs, which greatly reduces the number of input-layer neurons. In contrast, under the sequential input method, RNNs splice the power data and external-factor data into a one-dimensional sequence, so not only are more input-layer neurons used, but the number of weight parameters between the input and hidden layers also increases. In addition, ARNNs-PC and RNNs complete an epoch at very similar rates, but the former generates pseudo-measurements with higher accuracy. This shows that ARNNs-PC can mine more effective information from the original source-load data without increasing the training parameters or training time, and quickly build a pseudo-measurement generation model with superior performance.
5.4 Effect of pseudomeasurement on the accuracy of state estimation
In this subsection, the network numbered "1-EHVHV-mixed-all-0-no_sw" is selected from the SimBench database as the test system. According to the electrical connections it is divided into one 380 kV network, two 220 kV networks, and two 110 kV networks, with each subnetwork renumbered. The 380 kV network, whose topology is shown in Fig. 13 in the "Appendix", is denoted D380; it has 291 nodes, 354 branches, 222 generators distributed over 94 nodes, and 170 clean-energy generators over 118 nodes, including 19 PV and 123 wind generators. The 220 kV network, shown in Fig. 14 in the "Appendix", consists of two relatively independent subnetworks, D220-1 and D220-2, divided according to the electrical connections. D220-1 has 223 nodes, 277 branches, 101 generators distributed over 52 nodes, and 6 clean-energy generators over 6 nodes, whereas D220-2 is smaller, with only 57 nodes, 67 branches, 23 generators distributed over 13 nodes, and 13 clean-energy generators, all wind generators, distributed over 8 different nodes, as shown in Fig. 15 in the "Appendix". The two 110 kV networks are D110-1 and D110-2. D110-1 has 61 nodes, 59 branches, no conventional generators, and 41 wind turbines distributed over 41 different nodes; its topology is shown in Fig. 16 in the "Appendix". D110-2 has 81 nodes, 82 branches, no conventional generators, and 19 wind turbines distributed over 19 different nodes; its topology is shown in Fig. 17 in the "Appendix".
Since the pseudo-measurement generation model proposed in this paper targets active power at the source and load sides, it does not generate line power or node power pseudo-measurements of other kinds. The measurement systems in Tables 4 and 5 only adjust the node power measurements of the original measurement system, i.e., the ratio of nodal active/reactive power measurements is reduced and nodal active pseudo-measurements are added. To analyze the impact of pseudo-measurements on distributed dynamic estimation accuracy, three measurement cases are considered:

Case 1: Network measurements are sufficient and the measurement system is set up as shown in Table 4.

Case 2: D110-1 and D110-2 are measurement-poor; the PMU and SCADA measurement ratios are set as in Table 5, and no pseudo-measurement information is added.

Case 3: D110-1 and D110-2 are measurement-poor, and the measurement ratio and pseudo-measurement ratio of the measurement system are as shown in Table 5.
Distributed dynamic state estimation is performed for this DN model in the three measurement cases. The comparisons of voltage magnitude and voltage phase angle estimation accuracy and the lifting degree are shown in Tables 6 and 7, respectively. The lifting degree represents the factor by which a model's results are superior to those of the standard model. Let \(v\) be the RMSE of the model's results and \(v_{0}\) the RMSE of the standard model's results; the lifting degree \(lift_{{v,v_{0} }}\) can then be expressed as:
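The lifting-degree equation itself was rendered as an image in the original article and did not survive extraction. One form consistent with the surrounding description (positive exactly when the model's RMSE \(v\) is below the standard model's \(v_{0}\), zero when they are equal, negative when it is larger) is, as an assumption:

$$lift_{{v,v_{0} }} = \frac{{v_{0} - v}}{{v_{0} }}$$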
When the RMSE of the model results is taken as the indicator, a lifting degree greater than 0 indicates that the model outperforms the standard model, a lifting degree equal to 0 indicates performance identical to the standard model, and a lifting degree below 0 indicates results worse than the standard model.
In Case 3, even though pseudo-measurements are added to D110-1 and D110-2, they are limited to the active power of load, PV, and wind power, and reactive power data are still lacking. The accuracy of the pseudo-measurements is therefore poorer than that of real measurements, so the global estimation accuracy in Case 3 is inferior to that in Case 1 (as shown in Tables 6 and 7). However, compared with Case 2, the inclusion of pseudo-measurements in Case 3 significantly improves the estimation accuracy of D110-1 and D110-2, which has a positive effect on the boundary-consistent estimation results and leads to a small increase in the estimation accuracy of the other subareas, as can also be seen in Tables 6 and 7.
In summary, although the pseudo-measurement data obtained by the ARNN pseudo-measurement generation model can hardly raise the state estimation to the accuracy achievable with full real measurement, they improve the estimation without increasing the economic cost and alleviate the negative impact of local measurement shortage on the global estimation of the DN.
6 Conclusion
To alleviate the negative impact of insufficient DN measurement on state estimation accuracy, and following the idea of generating pseudo-measurements, this paper analyzes the time–frequency domain characteristics of the source-load historical power sequence: the autocorrelation of the sequence and its probabilistic similarity with external factors are examined using the autocorrelation function and the JS divergence, and the periodic fluctuation of the sequence is verified in the frequency domain using the Fourier spectrum. ARNNs are then chosen to automatically represent the mapping functions of the model, so that an ARNNs-PC pseudo-measurement generation model can be established to mine the time–frequency domain characteristics of the source-load power. Finally, the generated pseudo-measurement data are used as inputs to the distributed dynamic state estimation model.
The experiments validate the performance of the proposed ARNNs-PC pseudo-measurement generation model on the load and PV output temporal data of the distribution network in the SimBench database. The results show that ARNNs-PC is more accurate in both the time and frequency domains than six commonly used machine-learning-based pseudo-measurement generation models: RF, GBDT, BPNN, TRNN, LSTM, and GRU. It is also verified that ARNNs-PC obtains more accurate pseudo-measurements with training parameter counts and training times similar to those of RNNs. The generated pseudo-measurement data can improve the accuracy of distributed dynamic state estimation without additional economic burden.
Considering the non-synchronism of measurement data caused by the different sampling frequencies and communication delays of measurement equipment in distribution network state estimation, as well as the three-phase imbalance in some regions caused by unbalanced line parameters, single-phase load access, etc., future work will explore the handling of asynchronous measurement data in state estimation and distributed dynamic state estimation under three-phase imbalance of a local network.
Availability of data and materials
The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.
References
Meng, Y., Fan, S., Zheng, X., Xiao, J., Zhou, H., & He, G. (2022). Optimal operation of virtual power plant considering powertohydrogen systems. In: 2022 IEEE power & energy society general meeting (PESGM), Denver, CO, USA, pp. 1–5
Do Coutto Filho, M. B., & Stacchini de Souza, J. C. (2009). Forecastingaided state estimation—part I: Panorama. IEEE Transactions on Power Systems, 24(4), 1667–1677.
Angioni, A., Schlösser, T., Ponci, F., & Monti, A. (2016). Impact of pseudomeasurements from new power profiles on state estimation in lowvoltage grids. IEEE Transactions on Instrumentation and Measurement, 65(1), 70–77.
Wu, Z., Xu, X., Ji, Y., Hu, Q., Dou, X., & Gu, W. (2017). Review of active distribution network state estimation technology. Automation of Electric Power Systems, 13, 182–191.
Zhao, J., Zhang, G., Dong, Z. Y., & La Scala, M. (2018). Robust forecasting aided power system state estimation considering state correlations. IEEE Transactions on Smart Grid, 9(4), 2658–2666.
Jikang, L., Danfeng, S., Xianghong, T., & Yangsheng, L. (2019). Stochastic state estimation based on a new method of adaptive sparse pseudo spectral approximation model and algorithm. Chinese Journal of Electrical Engineering, 39(01), 192–203.
Dobbe, R., van Westering, W., Liu, S., Arnold, D., Callaway, D., & Tomlin, C. (2020). Linear single and threephase voltage forecasting and Bayesian state estimation with limited sensing. IEEE Transactions on Power Systems, 35(3), 1674–1683.
Dehghanpour, K., Yuan, Y., Wang, Z., & Bu, F. (2019). A Gametheoretic datadriven approach for pseudomeasurement generation in distribution system state estimation. IEEE Transactions on Smart Grid, 10(6), 5942–5951.
Zhang, L., Wang, G., & Giannakis, G. B. (2019). Distribution system state estimation via datadriven and physicsaware deep neural networks. IEEE Data Science Workshop (DSW), 2019, 258–262.
Zhenghao, L. I., & Mengfan, L. I. (2020). Smart load forecasting method based on deep learning. Smart Power, 48(10), 78–85.
Yuan, L., Ma, J., Gu, J., Wen, H., & Jin, Z. (2020). Featuring periodic correlations via dual granularity inputs structured RNNs ensemble load forecaster. International Transactions on Electrical Energy Systems, 30, e12571.
Simon, D. (2006). Optimal state estimation (pp. 123–140). New York: Wiley.
Fox, V., Hightower, J., Liao, L., et al. (2003). Bayesian filtering for location estimation. IEEE Pervasive Computing, 2(3), 24–33. https://doi.org/10.1109/MPRV.2003.1228524
Meinecke, S., Drauz, S., Klettke, A., et al. (2020). SimBench Documentation [M/OL], Germany. https://simbench.de/wp-content/uploads/2021/09/simbench_documentation_en_1.1.0.pdf
Aprillia, H., Yang, H.T., & Huang, C.M. (2021). Statistical load forecasting using optimal quantile regression random forest and risk assessment index. IEEE Transactions on Smart Grid, 12(2), 1467–1480.
SimBench. Complete list of SimBench codes and their grid data [DB/OL]. https://simbench.de/en/download/datasets/
Primadianto, A., & Lu, C. (2017). A review on distribution system state estimation. IEEE Transactions on Power Systems, 32(5), 3875–3883. https://doi.org/10.1109/TPWRS.2016.2632156
Monticelli, A. (2000). Electric power system state estimation. Proceedings of the IEEE, 88(2), 262–282. https://doi.org/10.1109/5.824004
Chen, B., Li, H., & Su, X. (2019). Dynamic state estimation of distribution network based on pseudo measurement modeling and UPF. IEEE Innovative Smart Grid Technologies–Asia (ISGT Asia), 2019, 54–59. https://doi.org/10.1109/ISGTAsia.2019.8881190
Qiang, Q., Guoqiang, S., Wei, X., Minghui, Y., Zhinong, W., & Haixiang, Z. (2018). Distribution system state estimation based on pseudo measurement modeling using convolutional neural network. China International Conference on Electricity Distribution (CICED), 2018, 2416–2420. https://doi.org/10.1109/CICED.2018.8592565
Guo, Z., Li, S., Wang, X., et al. (2016). Distributed pointbased gaussian approximation filtering for forecastingaided state estimation in power systems. IEEE Transactions on Power Systems, 31(4), 2597–2608. https://doi.org/10.1109/TPWRS.2015.2477285
Zhu, X., & Genton, M. G. (2012). Shortterm wind speed forecasting for power system operations. International Statistical Review, 80(1), 2–23.
Shi, H., Xu, M., & Li, R. (2017). Deep learning for household load forecasting—A novel pooling deep RNN. IEEE Transactions on Smart Grid, 9(5), 5271–5280.
Acknowledgements
Not applicable.
Funding
This work was supported in part by the National Key Research Program of China (2016YFB0900100) and Key Project of Shanghai Science and Technology Committee (18DZ1100303).
Author information
Contributions
Demand research, literature survey, experiment design, data collection, mathematical modeling, case study, and paper writing. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, Y., Gu, J. & Yuan, L. Distribution network state estimation based on attention-enhanced recurrent neural network pseudo-measurement modeling. Prot Control Mod Power Syst 8, 31 (2023). https://doi.org/10.1186/s41601-023-00306-w