  • Original research
  • Open access

Power system transient stability assessment based on the multiple paralleled convolutional neural network and gated recurrent unit


To accurately evaluate power system stability in a timely manner after faults and to further improve the feature extraction ability of the model, this paper presents an improved transient stability assessment (TSA) method, CNN + GRU, which combines a convolutional neural network (CNN) with a gated recurrent unit (GRU). CNN extracts features from micro short-term time sequences, while GRU extracts the characteristics contained in macro long-term time sequences. The two are integrated to comprehensively extract the high-order features contained in a transient process. To reduce the misclassification of difficult to classify samples, a multiple parallel (MP) CNN + GRU, with multiple CNN + GRU models connected in parallel, is created. Additionally, an improved focal loss (FL) function that adjusts itself adaptively during neural network training is introduced to guide model training. Finally, the proposed methods are verified on the IEEE 39-bus and 145-bus systems. The simulation results indicate that the proposed methods achieve better TSA performance than other existing methods.

1 Introduction

Given the development of power systems and the large-scale integration of intermittent renewable energy resources [1, 2], the existing power system faces various challenges and is more prone to faults [3]. As the most destructive fault type, the three-phase short-circuit fault may lead to power system transient instability. It is therefore important to have an applicable transient stability assessment (TSA) method.

General TSA methods include time domain simulation (TDS) [4], the transient energy function (TEF) [5], and the extended equal area criterion (EEAC) [6]. The TDS method is computationally intensive and time-consuming, the TEF method requires some state variables that are unavailable, and the EEAC method has only a limited range of analysis when applied to modern power systems. Recently, progress in machine learning (ML), such as artificial neural networks (ANNs) [7, 8], support vector machines (SVMs) [9], decision trees (DT) [10] and GRU [11], and its application to power systems have diversified TSA methods. Although ML can learn TSA rules from data, it has limited ability to extract features from multi-dimensional data and is prone to under-fitting.

In recent years, deep learning (DL) has contributed to TSA. Starting from the data itself, DL excels at capturing the internal patterns of large amounts of data and generalizes robustly, which overcomes the shortcomings of ML methods. In addition, TSA using a DL algorithm can bypass the modeling and solution of high-order nonlinear equations and directly learn the mapping between input features and stability labels. Researchers have applied DL algorithms such as long short-term memory (LSTM) [12, 13], stacked autoencoder (SAE) [14, 15], deep belief network (DBN) [16,17,18] and CNN [19,20,21,22] to TSA. Reference [12] uses LSTM to obtain a temporal self-adaptive TSA scheme that balances TSA accuracy against rapidity and mines temporal data dependencies. An LSTM-based model which can decrease the predictive value with an invariant time step is proposed in [13], while [14] proposes an algorithm that clusters a multi-branch stacked denoising autoencoder (MSDAE) with a fusion layer and a logistic regression (LR) classifier, which together enhance feature mining. Reference [15] builds an ensemble classifier in which SAE is combined with a fusion layer to classify the state of the power system. Su et al. [16] integrate DBN with the reference-point-based nondominated sorting genetic algorithm to develop a novel preventive control scheme, whereas [17] presents an advanced DBN that takes the structural features of the power system into account when constructing the loss function. In Liu et al. [18], the number of nodes in each layer of DBN is determined by a particle swarm optimization algorithm, and the integrated algorithm achieves higher TSA accuracy. However, the input of DBN is limited to one-dimensional data, its training process is very slow, and its parameters are difficult to select. As a result, DBN easily falls into a local optimum, which significantly limits its application to TSA.

In contrast, CNN can adapt to input data of various dimensions and improves the data-fitting degree through parameter sharing and weight reduction. In Gao et al. [19], a one-dimensional convolutional neural network (1D-CNN) with four convolutional-pooling layers is applied to TSA, meeting the requirements of end-to-end time sequence feature extraction and TSA classification. However, [19] uses only a single CNN to extract the features and does not study the misclassification problem in depth. To address the misclassification problem, an MP CNN is proposed in [20], in which the classification results of several CNNs are combined according to a synthesis rule. In Shi et al. [21], a CNN-based classification model that distinguishes between two instability modes (aperiodic and oscillatory instability) is created. It takes the bus voltage phasors provided by phasor measurement units (PMUs) as the original input and outputs three classes: stable, aperiodic unstable or oscillatory unstable. However, the parameters of its FL function must be tuned through repeated experiments. In Zhao and Shi [22], a CNN is designed that uses multi-size convolutional kernels instead of a single kernel size in order to extract abstract features at multiple time scales.

Although the above studies using CNN achieve good results, there exist the following problems:

  1. A single CNN cannot effectively extract the high-order features contained in a macro long-term time sequence. This deficiency, which has not been addressed from the perspective of algorithm fusion, severely restrains the TSA capability of CNN.

  2. Existing studies have not examined in depth the misclassification of difficult to classify samples. When a single neural network is applied to TSA, samples around the classification threshold are intrinsically prone to misclassification, which restricts classification accuracy.

  3. The conventional FL function requires a time-consuming parameter adjustment procedure involving many experiments, which lowers operational efficiency and limits its applicability in practical engineering.

Given the above problems, three solutions are proposed. The main contributions of the paper are:

  1. To overcome the shortcomings of a single CNN in macro feature extraction, an integrated network composed of CNN and GRU, called CNN + GRU, is proposed. CNN extracts the high-order features contained in a local short-term time sequence while, most importantly, GRU fully mines the abstract characteristics hidden in a macro long-term time sequence. The two complement each other to extract features more comprehensively.

  2. To solve the misclassification problem of difficult to classify samples and improve classification accuracy, multiple CNN + GRU models are connected in parallel to form the MP CNN + GRU. It outputs multiple TSA results simultaneously and improves the classification of samples around the classification threshold.

  3. To avoid unnecessary parameter adjustment, an improved FL function is proposed. It adjusts itself adaptively during neural network training and has stronger engineering applicability.

The rest of the paper is organized as follows: Sect. 2 introduces CNN, GRU and CNN + GRU and presents the basic structure of MP CNN + GRU. Section 3 describes the MP CNN + GRU-based TSA, including feature selection, normalization and the stability criterion. Simulation verification is discussed in Sect. 4, and conclusions are drawn in Sect. 5.

2 The structure of the multiple paralleled CNN + GRU

MP CNN + GRU, the algorithm proposed in this paper, can not only effectively extract the macro long-term and local short-term features of the input information, but also partly address the misclassification of difficult to classify samples. The network is described in the following subsections.

2.1 The structure of CNN

As illustrated in Fig. 1, CNN [19, 23] is composed of input, hidden and output layers, where the hidden layer consists of convolutional layers and pooling layers. The convolutional kernels in a convolutional layer perform convolution to extract features from local information, whereas the pooling layers retain the most representative features from the convolutional layer and eliminate redundant information. The convolutional and pooling layers are stacked alternately to extract high-order features.

Fig. 1 The structure of CNN

Defining X = [x1, x2, x3, …, xt, …, xs], the original input of the input layer can be written as \(X \in R^{s \times d}\), where s and d are the length of the time sequence and the feature dimension, respectively. X first enters the convolutional layer, where the convolution in (1) is applied; the pooling layer then applies the max-pooling operation in (2).

$$a_{c,k} = \mathrm{ReLU}(W_{c,k} * X + b_{c,k} )$$
$$a_{p} = \max (a_{c,k} )\quad i,j = 1,2, \ldots ,n$$

The output layers have the same structure as those of a common fully connected neural network, as expressed by (3). For the TSA problem, the last output layer has a single neuron, and its output is given by (4).

$$a_{fc} = \mathrm{ReLU}(a_{p} W_{fc} + b_{fc} )$$
$$\tilde{y} = \sigma (a_{fc} W_{fc} + b_{fc} )$$

where σ(·) is the sigmoid activation function and \(\tilde{y}\), the classification result, indicates the class probability of the sample.
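To make (1)-(4) concrete, the NumPy sketch below runs one convolution-pooling stage followed by a fully connected layer and a single sigmoid output unit. All sizes, the random weights and the pooling size of 2 are hypothetical. Because (4) reuses the symbols W_fc and b_fc, separate output-layer weights named W_o and b_o are assumed here.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d(X, W, b):
    """Valid 1-D convolution along the time axis (cross-correlation, as in most DL libraries).
    X: (s, d) input; W: (k, d, c) kernels of width k; b: (c,) biases -> (s - k + 1, c)."""
    k = W.shape[0]
    steps = X.shape[0] - k + 1
    out = np.empty((steps, W.shape[2]))
    for i in range(steps):
        window = X[i:i + k]  # (k, d) local short-term window scanned by the kernels
        out[i] = np.tensordot(window, W, axes=([0, 1], [0, 1])) + b
    return out

def max_pool1d(A, size):
    """Non-overlapping max pooling along time; an incomplete tail window is dropped."""
    n = A.shape[0] // size
    return A[:n * size].reshape(n, size, A.shape[1]).max(axis=1)

def cnn_forward(X, p):
    a_c = relu(conv1d(X, p["W_c"], p["b_c"]))   # Eq. (1)
    a_p = max_pool1d(a_c, 2).ravel()             # Eq. (2), flattened for the dense layer
    a_fc = relu(a_p @ p["W_fc"] + p["b_fc"])     # Eq. (3)
    return sigmoid(a_fc @ p["W_o"] + p["b_o"])   # Eq. (4): one sigmoid output unit

rng = np.random.default_rng(0)
s, d, k, c, hidden = 20, 4, 3, 8, 16             # hypothetical sizes
params = {
    "W_c": rng.normal(0, 0.1, (k, d, c)), "b_c": np.zeros(c),
    "W_fc": rng.normal(0, 0.1, (((s - k + 1) // 2) * c, hidden)), "b_fc": np.zeros(hidden),
    "W_o": rng.normal(0, 0.1, (hidden, 1)), "b_o": np.zeros(1),
}
y_tilde = cnn_forward(rng.normal(size=(s, d)), params)
```

A real model would stack several convolution-pooling stages and learn the weights by back-propagation; the loop-based convolution is written for readability, not speed.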

2.2 The structure of GRU

GRU [11, 24] memorizes and forgets long-term features of the input data through its update gate and reset gate. Compared with an LSTM network, GRU has a simpler structure and shorter training time. In addition, compared with a standard recurrent neural network (RNN), it alleviates the vanishing gradient problem. The structure of GRU is shown in Fig. 2, and its data processing is given by:

$$R_{t} = \sigma (W_{r} \cdot [X_{t} ,H_{t - 1} ])$$
$$Z_{t} = \sigma (W_{z} \cdot [X_{t} ,H_{t - 1} ])$$
$$\tilde{H}_{t} = \tanh (W_{h} \cdot [X_{t} ,(R_{t} \otimes H_{t - 1} )])$$
$$H_{t} = Z_{t} \otimes \tilde{H}_{t} + (1 - Z_{t} ) \otimes H_{t - 1}$$

where Xt and Ht−1 denote the input at time t and the hidden state of the previous time step, respectively. Wr, Wz and Wh are trainable weight matrices, and Rt and Zt are the outputs of the reset gate and the update gate, respectively. σ(·) keeps Rt and Zt between 0 and 1. The candidate hidden state \(\tilde{H}_t\) is computed using Rt. In the extreme cases, Rt = 0 discards the previous hidden state entirely when forming the candidate, and Rt = 1 retains all of it. Ht is the new hidden state, and Zt weighs the proportions of \(\tilde{H}_t\) and Ht−1 in Ht.
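The GRU update in (5)-(8) can be sketched in NumPy as below. As in the equations, no bias terms are used; the sizes and random weights are hypothetical, and the final hidden state is what a GRU branch would pass on for fusion.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_r, W_z, W_h):
    """One GRU update following Eqs. (5)-(8); biases are omitted, as in the equations."""
    xh = np.concatenate([x_t, h_prev])                          # [X_t, H_{t-1}]
    r = sigmoid(W_r @ xh)                                       # reset gate R_t
    z = sigmoid(W_z @ xh)                                       # update gate Z_t
    h_cand = np.tanh(W_h @ np.concatenate([x_t, r * h_prev]))   # candidate state
    return z * h_cand + (1.0 - z) * h_prev                      # new hidden state H_t

def gru_forward(X, W_r, W_z, W_h):
    """Scan an (s, d) sequence step by step and return the final hidden state."""
    h = np.zeros(W_r.shape[0])
    for x_t in X:
        h = gru_step(x_t, h, W_r, W_z, W_h)
    return h

rng = np.random.default_rng(1)
d, n_h = 4, 6                                                   # hypothetical sizes
W_r, W_z, W_h = (rng.normal(0, 0.3, (n_h, d + n_h)) for _ in range(3))
H_s = gru_forward(rng.normal(size=(20, d)), W_r, W_z, W_h)
```

Because each Ht is a convex combination of a tanh output and Ht−1 (starting from zero), every component of the hidden state stays strictly between -1 and 1.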

Fig. 2 The structure of GRU

2.3 The basic structure of MP CNN + GRU

To clearly illustrate MP CNN + GRU, CNN + GRU is described first. CNN + GRU, whose structure is depicted in Fig. 3a, is a dual-branch network that comprehensively acquires the features of the original input from both local short-term and macro long-term perspectives.

Fig. 3 The structure of MP CNN + GRU

After entering the input layer, the original input flows simultaneously into the CNN branch and the GRU branch. The convolutional kernels in CNN focus mainly on local short-term information within their scanning range, and each convolutional operation is independent of the next. CNN is therefore more sensitive to the abstract features contained in the voltage magnitude and phase angle, which have relatively short variation periods. By contrast, GRU has outstanding long-term memory and forgetting capability from the macro long-term perspective: it relates each new input to all previously scanned inputs, linking earlier and later information. GRU is therefore adopted to extract the high-order features implied in the active and reactive power, which have long variation periods. The two networks thus complement each other. The representative information produced by the two branches is then fused through the fully connected layer to obtain the classification result.
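The fusion step can be sketched as concatenating the two branch outputs and passing them through a fully connected layer and one sigmoid unit. This is a minimal illustration, assuming the CNN branch yields pooled feature maps and the GRU branch yields its final hidden state; the arrays below are random stand-ins, and the layer sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_and_classify(cnn_feat, gru_h, W_fc, b_fc, w_o, b_o):
    """Concatenate the two branch outputs, apply one ReLU dense layer, then a sigmoid unit."""
    fused = np.concatenate([np.ravel(cnn_feat), np.ravel(gru_h)])
    hidden = np.maximum(fused @ W_fc + b_fc, 0.0)
    return float(sigmoid(hidden @ w_o + b_o))

rng = np.random.default_rng(2)
cnn_feat = rng.normal(size=(9, 8))   # stand-in for pooled CNN feature maps (hypothetical)
gru_h = rng.normal(size=6)           # stand-in for the final GRU hidden state (hypothetical)
n_in, n_hidden = 9 * 8 + 6, 16
p_unstable = fuse_and_classify(cnn_feat, gru_h,
                               rng.normal(0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden),
                               rng.normal(0, 0.1, n_hidden), 0.0)
```

Concatenation keeps both feature sets intact and lets the dense layer learn how much weight each branch deserves; other fusion schemes are possible.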

A single neural network used for classification inevitably misclassifies some samples. The basic reason is that the network has an intrinsic deviation when classifying difficult to classify samples around the classification threshold. When the deviation is large, the classification result crosses the threshold, for example from the stable to the unstable assessment region, resulting in misclassification. Such errors can cause enormous damage to power system operation.

To resolve this issue, MP CNN + GRU is proposed in this paper. It consists of multiple CNN + GRU sub-models connected in parallel, as shown in Fig. 3b. The method exploits the randomness of neural network training, which gives each independently trained sub-model a different deviation. The classification results of the CNN + GRU sub-models are synthesized to reduce the misclassification caused by intrinsic deviation, and an output processing unit produces the final classification result as follows:

$$P_{Z} (C_{k} |X) = \frac{{\sum_{i = 1}^{n} {P_{i} (C_{k} |X)} }}{n},\quad k = 0,1$$

where k = 0 represents a stable sample and k = 1 an unstable sample. Pi(Ck|X) is the probability that sample X is identified as category Ck by the ith CNN + GRU. PZ(Ck|X), the final classification result, is the average of the category probabilities output by the CNN + GRU sub-models and denotes the probability that X is finally identified as category Ck after comprehensive analysis.

Considering the real-time and effectiveness requirements of TSA, the number of CNN + GRU sub-models is set to 3, i.e., n = 3.
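The output processing unit in (9) is a plain average over sub-models. The sketch below assumes each sub-model's sigmoid output is P_i(C_1|X), the unstable-class probability, so P_i(C_0|X) = 1 − P_i(C_1|X). It uses the three outputs reported for the IEEE 39-bus sample in Sect. 4.5.1 and a 0.5 threshold on the averaged stable-class probability.

```python
import numpy as np

def mp_output(p_unstable_per_model):
    """Eq. (9): average the class probabilities of the n sub-models.
    Input: P_i(C_1|X) from each sub-model; P_i(C_0|X) is taken as 1 - P_i(C_1|X)."""
    p1 = np.asarray(p_unstable_per_model, dtype=float)
    probs = np.stack([1.0 - p1, p1], axis=1)   # row i: [P_i(C_0|X), P_i(C_1|X)]
    return probs.mean(axis=0)                  # [P_Z(C_0|X), P_Z(C_1|X)]

P_Z = mp_output([0.488, 0.524, 0.510])         # n = 3 sub-models
label = 0 if P_Z[0] >= 0.5 else 1              # stable only if P_Z(C_0|X) >= 0.5
```

Here P_Z(C_1|X) = 1.522 / 3 ≈ 0.507, so the averaged result is unstable (label 1), even though one of the three sub-models alone would have called the sample stable.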

3 TSA based on the MP CNN + GRU

3.1 Feature selection and arrangement of original input

Selecting proper features makes a real difference to the TSA performance of the model, so three dominant factors are taken into consideration:

  1. Human subjectivity should be significantly reduced.

  2. Selected features should reliably summarize the transient fault information of the system [19, 22].

  3. The arrangement of the original input should be chosen carefully, because it affects how readily the model can read the data during training.

Therefore, as expressed in (10), four kinds of representative and objective features are selected and arranged in the order of bus voltage magnitude, bus phase angle, and active and reactive power of the transmission lines:

$$\begin{aligned}X &= [V_{1} ,V_{2} , \ldots ,V_{m} ,\theta_{1} ,\theta_{2} , \ldots ,\theta_{m} ,P_{1} ,P_{2} , \ldots ,P_{t},\\ &\qquad Q_{1} ,Q_{2} , \ldots ,Q_{t} ]\end{aligned}$$

where m is the number of buses in the network and t is the number of transmission lines. Each element of X is a vector of dimension d, the number of sampling points.
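Arranging the input as in (10) amounts to stacking the four feature groups row-wise. In the sketch below, the system size (m, t, d) and the random values are hypothetical. The transpose at the end is an assumption for networks that expect time on the first axis.

```python
import numpy as np

def arrange_input(V, theta, P, Q):
    """Stack features in the order of Eq. (10): V and theta (m buses each), then P and Q (t lines each).
    Each row is one feature's time series of d sampling points."""
    return np.vstack([V, theta, P, Q])         # shape (2m + 2t, d)

m, t, d = 3, 2, 5                              # hypothetical small system
rng = np.random.default_rng(3)
V, theta = rng.normal(size=(m, d)), rng.normal(size=(m, d))
P, Q = rng.normal(size=(t, d)), rng.normal(size=(t, d))
X = arrange_input(V, theta, P, Q)
X_time_major = X.T                             # (d, 2m + 2t) if a network expects time on axis 0
```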

3.2 Normalization preprocessing of original input

Common normalization methods include min-max normalization and mean-standard deviation (z-score) normalization. In this paper, mean-standard deviation normalization is adopted to preprocess the original input:

$$x_{normal} = \frac{{x - x_{mean} }}{{x_{std} }}$$

Normalization (11) is applied to each vector element of (10), such as V1, θ1, P1 and Q1. x is an element of the vector, and xnormal is the normalized result of x. xmean and xstd are the mean and standard deviation of all elements in the corresponding vector. Taking P1 as an example, P1 is a vector of d sampling points, and xmean and xstd are the mean and standard deviation of these d active power values. After normalization, the values of P1, i.e., the active power of the transmission line at each sampling time, have zero mean and unit standard deviation; they are brought to a common scale but are not confined to the range 0-1.
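Applied row by row to X from (10), (11) can be sketched as follows. Two choices here are assumptions not stated in the text: the population standard deviation (NumPy's default, ddof=0) is used, and constant rows are mapped to zeros to avoid dividing by zero.

```python
import numpy as np

def zscore_per_vector(X):
    """Eq. (11) applied row by row: each feature vector uses its own mean and standard deviation.
    Constant rows (std = 0) are mapped to zeros (an added assumption)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)         # population std (ddof=0)
    return (X - mean) / np.where(std > 0, std, 1.0)

X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 10.0, 10.0, 10.0]])
Xn = zscore_per_vector(X)
```

The first row becomes roughly [-1.34, -0.45, 0.45, 1.34], which illustrates that z-score values can be negative or exceed 1.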

3.3 Stability criterion

TSA using a neural network is essentially a classification problem, which requires every sample in a large data set to be labeled. Labels 0 and 1 denote stable and unstable samples, respectively. For a system with many generators subjected to a large disturbance, the post-disturbance power angles of the generators can be used to compute the transient stability index (TSI) [10, 16, 19]. The TSI formula and labeling rule are:

$${\text{TSI}} = \frac{{360^{\circ } - \left| {\Delta \delta_{\max } } \right|}}{{360^{\circ } + \left| {\Delta \delta_{\max } } \right|}}$$
$${\text{Label}} = \left\{ {\begin{array}{*{20}c} 1 & {{\text{TSI}} < 0} \\ 0 & {{\text{TSI}} > 0} \\ \end{array} } \right.$$

where Δδmax is the maximal power angle difference between any two generators. If |Δδmax| is greater than 360° (TSI < 0), the power system loses stability, and the sample is labeled 1. Otherwise, the sample is labeled 0.

To ensure the correctness and reliability of sample labeling during simulation, the power angles of generators are selected at least 8 s after fault removal to compute TSI.
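Labeling with (12)-(13) can be sketched from a matrix of simulated generator angles. The sketch assumes that Δδmax is the largest spread over every instant of the post-fault window and that the boundary case TSI = 0, which (13) leaves undefined, is treated as stable. The angle trajectories are made-up toy values.

```python
import numpy as np

def tsi_and_label(delta_deg):
    """delta_deg: (T, G) generator power angles in degrees over the post-fault window.
    Returns (TSI, label) following Eqs. (12)-(13)."""
    delta = np.asarray(delta_deg, dtype=float)
    # The largest angle difference between any two generators at one instant is max - min;
    # the worst instant over the whole window is used (an assumption about the window scan).
    d_max = np.max(delta.max(axis=1) - delta.min(axis=1))
    tsi = (360.0 - d_max) / (360.0 + d_max)
    return tsi, (1 if tsi < 0 else 0)          # TSI == 0 treated as stable here (assumption)

tsi_u, label_u = tsi_and_label([[0.0, 100.0], [0.0, 400.0]])   # spread reaches 400 degrees
tsi_s, label_s = tsi_and_label([[0.0, 50.0], [10.0, 120.0]])   # spread peaks at 110 degrees
```

The first toy case gives TSI = −40/760 < 0 (label 1); the second gives TSI = 250/470 > 0 (label 0).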

3.4 The improved FL function

To clearly explain the improved FL function, the FL function and the binary cross entropy (BCE) function are introduced. The BCE function is the foundation of the FL function and is expressed by:

$$L_{BCE} = \left\{ {\begin{array}{*{20}l} { - \ln y^{\prime}} \hfill & {y = 1} \hfill \\ { - \ln (1 - y^{\prime})} \hfill & {y = 0} \hfill \\ \end{array} } \right.$$

where y is the true label and y' is the predicted probability that the sample is unstable (y = 1). Labels 0 and 1 represent stable and unstable samples, respectively.

The FL function in (15) extends the BCE function: it reduces the weight of easily classified samples and improves the fit to difficult to classify samples.

$$l_{fl} = \left\{ {\begin{array}{*{20}l} { - \alpha (1 - y^{\prime})^{\gamma } \ln y^{\prime}} \hfill & {y = 1} \hfill \\ { - y^{{\prime}\gamma } \ln (1 - y^{\prime})} \hfill & {y = 0} \hfill \\ \end{array} } \right.$$

In (15), α and γ are introduced into lfl. γ addresses the unequal classification difficulty of different samples. When y' is very close to the classification threshold, the sample is regarded as difficult to classify and is more prone to misclassification; when y' is far from the threshold, the sample is regarded as easily classified. γ increases the weight of difficult to classify samples in the loss function and reduces that of easily classified samples, so that the difficult to classify samples are fitted better.
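The per-sample loss in (15) can be written directly in NumPy, as sketched below. Predictions are clipped away from 0 and 1 to keep the logarithms finite (an implementation detail assumed here). With γ = 0 and α = 1 it reduces to the BCE loss (14); with γ = 2, an easy unstable sample predicted at 0.95 keeps only (1 − 0.95)² = 0.0025 of its BCE loss.

```python
import numpy as np

def focal_loss(y, p, alpha, gamma, eps=1e-7):
    """Per-sample focal loss of Eq. (15); p is the predicted probability of y = 1 (unstable)."""
    y = np.asarray(y)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    loss_unstable = -alpha * (1.0 - p) ** gamma * np.log(p)   # y = 1
    loss_stable = -(p ** gamma) * np.log(1.0 - p)              # y = 0
    return np.where(y == 1, loss_unstable, loss_stable)

bce = focal_loss(1, 0.95, alpha=1.0, gamma=0.0)   # reduces to BCE, Eq. (14)
fl = focal_loss(1, 0.95, alpha=1.0, gamma=2.0)    # easy sample, strongly down-weighted
```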

α balances the numbers of stable and unstable samples. In the TSA problem, stable samples outnumber unstable samples. To balance the samples, previous studies [19, 22] set a fixed α, which requires many time-consuming experiments to find an appropriate value. Moreover, a fixed α restricts the generalization ability of the model and lowers engineering efficiency. To address this problem, the improved FL function is defined as:

$$\begin{aligned}L_{fl} &= - \frac{1}{S}\sum\limits_{i = 1}^{S} [\alpha y(1 - y^{\prime})^{\gamma } \ln y^{\prime}\\ &\quad+ y^{{\prime}\gamma } (1 - y)\ln (1 - y^{\prime})]\end{aligned}$$
$$\alpha = S_{1} /S_{2}$$
$$S = S_{2} + S_{1}$$

Here, mini-batch training [25], a simplified form of the stochastic gradient descent (SGD) algorithm [26], is adopted, and α is adjusted adaptively in each mini-batch. The parameters are updated each time a mini-batch containing a fixed number of training samples has been processed, which can improve the fit of the neural network to the data. S1 and S2 are the numbers of stable and unstable samples in each mini-batch, respectively. Therefore, α is adjusted automatically according to the proportion of stable and unstable samples in each mini-batch, which greatly reduces unnecessary parameter adjustment.
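A minimal NumPy sketch of (16)-(17) for one mini-batch follows. The fallback α = 1 for a batch with no unstable samples is an assumption, since (17) is undefined when S2 = 0. This version only evaluates the loss; actual training would express the same formula in an automatic-differentiation framework.

```python
import numpy as np

def improved_focal_loss(y, p, gamma, eps=1e-7):
    """Mini-batch loss of Eq. (16) with alpha = S1 / S2 recomputed per batch, Eq. (17)."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    s1 = np.count_nonzero(y == 0)            # stable samples in this mini-batch
    s2 = np.count_nonzero(y == 1)            # unstable samples in this mini-batch
    alpha = s1 / s2 if s2 > 0 else 1.0       # fallback when S2 = 0 (assumption)
    terms = (alpha * y * (1.0 - p) ** gamma * np.log(p)
             + p ** gamma * (1.0 - y) * np.log(1.0 - p))
    return -terms.mean(), alpha              # mean over the S = S1 + S2 samples

loss, alpha = improved_focal_loss([0, 0, 0, 1], [0.1, 0.2, 0.1, 0.7], gamma=2.0)
```

In this toy batch, three stable samples and one unstable sample give α = 3, so the unstable sample's term is weighted three times as heavily without any manual tuning of α.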

3.5 Model evaluation index

In this paper, four kinds of typical indices are taken as TSA performance indices, and the corresponding formulas are given by:

$$A_{cc} = \frac{{T_{P} + T_{N} }}{{T_{P} + F_{P} + T_{N} + F_{N} }}$$
$$P_{re} = \frac{{T_{N} }}{{T_{N} + F_{N} }}$$
$$R_{ec} = \frac{{T_{N} }}{{T_{N} + F_{P} }}$$
$$F_{1} = \frac{{2P_{re} R_{ec} }}{{P_{re} + R_{ec} }}$$

Accuracy, denoted by Acc, is the most commonly used TSA index and directly evaluates the classification ability of the model. Precision, Pre, measures the proportion of truly unstable samples among the samples classified as unstable, while recall, Rec, measures the proportion of unstable samples in the data set that are correctly classified. Because Pre and Rec usually trade off against each other, F1 is used to balance the two indices.

TP and TN denote correctly classified stable and unstable samples, respectively, whereas FP (false stable) denotes unstable samples misclassified as stable and FN (false unstable) denotes stable samples misclassified as unstable. The relationship between them is shown in Table 1.

Table 1 The confusion matrix

If an unstable sample is misclassified as stable, the consequences can be disastrous. In contrast, if a stable sample is misclassified as unstable, a false alarm occurs but does not cause serious damage to power grid operation. Therefore, reducing FP is more important than reducing FN.
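The four indices of this subsection can be computed from confusion-matrix counts as sketched below. Note the convention used here: TN counts correctly classified unstable samples, so Pre, Rec and F1 refer to the unstable class. The counts in the example are hypothetical.

```python
def tsa_metrics(TP, TN, FP, FN):
    """TP/TN: correctly classified stable/unstable samples;
    FP: unstable samples classified as stable; FN: stable samples classified as unstable.
    Pre, Rec and F1 therefore refer to the unstable class."""
    acc = (TP + TN) / (TP + FP + TN + FN)
    pre = TN / (TN + FN)
    rec = TN / (TN + FP)
    f1 = 2 * pre * rec / (pre + rec)
    return acc, pre, rec, f1

# Hypothetical counts, for illustration only
acc, pre, rec, f1 = tsa_metrics(TP=90, TN=8, FP=1, FN=1)
```

With these counts, Acc = 0.98 while Pre = Rec = F1 = 8/9 ≈ 0.889, which shows why Acc alone can hide weak performance on the minority unstable class.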

3.6 Classification threshold

The choice of classification threshold strongly affects Pre and Rec. Because reducing FP is more important than reducing FN, it is crucial to keep the number of FP low. Raising the classification threshold makes the model more conservative and reduces the number of FP, thereby improving the Rec of unstable samples. The threshold rule is:

$$y = \left\{ {\begin{array}{*{20}c} 0 & {P_{Z} (C_{0} |X) \ge \gamma ,\;P_{Z} (C_{1} |X) \le 1 - \gamma } \\ 1 & {P_{Z} (C_{0} |X) < \gamma ,\;P_{Z} (C_{1} |X) > 1 - \gamma } \\ \end{array} } \right.$$

The initial threshold γ is 0.5; this γ is the classification threshold, not the focusing parameter of the FL function. This paper improves the recall of unstable samples by manually adjusting γ.
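The sketch below follows the reading in Sects. 3.6 and 4.7 that γ thresholds the averaged stable-class probability P_Z(C_0|X): a sample is labeled stable only when that probability reaches γ. Raising γ above 0.5 therefore flags more samples as unstable, which raises Rec at the cost of Pre. The probabilities are hypothetical.

```python
import numpy as np

def classify(p_stable, gamma=0.5):
    """Label 0 (stable) only if P_Z(C_0|X) >= gamma; otherwise label 1 (unstable)."""
    return np.where(np.asarray(p_stable, dtype=float) >= gamma, 0, 1)

p_stable = [0.493, 0.55, 0.9]              # hypothetical averaged stable-class probabilities
labels_default = classify(p_stable, 0.5)   # -> [1, 0, 0]
labels_strict = classify(p_stable, 0.6)    # -> [1, 1, 0]: the 0.55 sample is now flagged unstable
```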

4 Simulation verification

4.1 Data set acquisition

In order to verify the effectiveness of the proposed methods, simulation verification is carried out on the IEEE 39-bus system and IEEE 145-bus system. The IEEE 39-bus system is composed of 39 bus nodes and 46 branches, while the IEEE 145-bus system consists of 145 bus nodes and 453 branches.

To obtain a representative data set with sufficient data, the statistical concept of degrees of freedom is introduced and previous work [19, 22] is studied. First, the degree of freedom is used to guide the amount of data to be generated. Applied to the TSA problem, it refers to the minimum number of samples required to obtain a model that fully fits the data set. Here, the training data set is regarded as control points that steer the adjustment of network parameters, and the model parameters are regarded as observation points that observe and output the final TSA results. The ratio is 0.928 on the IEEE 39-bus system and 0.933 on the IEEE 145-bus system; both exceed 0.9, which ensures that the obtained data set contains enough samples. Second, based on [19, 22], four factors (branches, fault locations, fault durations and load levels) are varied to ensure that the obtained data set is representative.

The TDS parameter settings of the IEEE 39-bus system for Power System Simulator/Engineering (PSS/E) are shown in Table 8 in the Appendix. The simulation settings of the IEEE 145-bus system are the same, except that the fault lines differ from those in Table 8. During TDS, the Python API of PSS/E is used to call PSS/E repeatedly for batch transient simulation. The data set obtained on the IEEE 39-bus system contains 14,280 samples, of which 11,200 are stable and 3080 are unstable, a stable-to-unstable ratio of approximately 3.6:1. The data set obtained on the IEEE 145-bus system contains 169,260 samples with a stable-to-unstable ratio of 5:1. To ensure correct and reliable labeling, each sample label is finally determined from the TSI value at 10 s. Each data set is divided into training, cross validation and test data sets in the ratio 3:1:1.
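The 3:1:1 division can be sketched as a shuffled index split. A plain random shuffle with a fixed seed is assumed here, since the text does not say whether the split is stratified by stability label.

```python
import numpy as np

def split_3_1_1(n_samples, seed=0):
    """Shuffle sample indices and split them 3:1:1 into training, validation and test sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = n_samples * 3 // 5
    n_val = n_samples // 5
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_3_1_1(14_280)   # IEEE 39-bus data set size
```

For the 14,280 IEEE 39-bus samples this yields 8568 training, 2856 validation and 2856 test samples. A stratified split would additionally keep the roughly 3.6:1 stable-to-unstable ratio in each subset.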

4.2 TSA procedure

The overall TSA procedure, shown in Fig. 4, consists of two stages: offline training and online application. In the offline training stage, the obtained data set is divided into training, cross validation and test data sets, which are used to optimize the MP CNN + GRU. The training data set is used to fit and adjust the parameters of the original model, while the cross validation data set is used to estimate the TSA performance of the trained model and, more importantly, to complete parameter adjustment. In the online application stage, the trained model is applied to the test data set.

Fig. 4 The overall TSA process

4.3 Model training process

Figures 5 and 6 show the learning curves on the test and training data sets of the IEEE 39-bus and IEEE 145-bus systems. In the last stage of training, the value of the blue line is higher than that of the red line, and the loss of the red line is very low, approximately 0. On the test data set, however, the loss of the red line is conversely higher than that of the blue line. Over-fitting therefore occurs, but it can be greatly alleviated by dropout. After many experiments, the dropout rate is set to 0.2 for both the IEEE 39-bus and IEEE 145-bus systems. Notably, the over-fitting also indicates that the amount of generated sample data is sufficient for both systems.

Fig. 5 Learning curve on the IEEE 39-bus system

Fig. 6 Learning curve on the IEEE 145-bus system

The parameter settings of the MP CNN + GRU on the IEEE 39-bus system and IEEE 145-bus system are shown in Tables 9 and 10 in the Appendix, respectively.

4.4 Classification performance comparison

To demonstrate the TSA capability of MP CNN + GRU, the proposed model is compared with other ML methods, including ANN, SVM, DT, random forest (RF), GRU [11], 1D-CNN [19] and CNN + GRU. For models that take a one-dimensional vector as input, such as SVM, the two-dimensional original input is transformed into a one-dimensional vector. The training and test data sets of each model are the same as those of MP CNN + GRU. The dropout rate of ANN is also 0.2 and its activation function is ReLU, consistent with MP CNN + GRU. The ANN structure with the best data-fitting performance is 300-300-150-100-1. Because computation diverged for SVM, DT and RF during simulation, principal component analysis (PCA) is adopted for dimension reduction. The TSA results of the models are shown in Tables 2 and 3.
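The input transformation and PCA step for the vector-input baselines can be sketched with NumPy's SVD. The assumptions here are that each 2-D sample is flattened row by row and that the principal axes are fitted on the training set only and then reused for the test set; the data shapes and the number of components are hypothetical.

```python
import numpy as np

def pca_reduce(X_train, X_test, n_components):
    """Flatten 2-D samples, then project them onto the top principal axes of the training set."""
    A = X_train.reshape(len(X_train), -1)
    B = X_test.reshape(len(X_test), -1)
    mean = A.mean(axis=0)
    _, _, Vt = np.linalg.svd(A - mean, full_matrices=False)   # rows of Vt: principal axes
    W = Vt[:n_components].T                                   # (n_features, n_components)
    return (A - mean) @ W, (B - mean) @ W

rng = np.random.default_rng(4)
X_train = rng.normal(size=(50, 4, 6))   # hypothetical: 50 samples, 4 features x 6 sampling points
X_test = rng.normal(size=(10, 4, 6))
Z_train, Z_test = pca_reduce(X_train, X_test, n_components=5)
```

Fitting the projection on the training set alone keeps test information out of the preprocessing; the retained components are ordered by decreasing explained variance.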

Table 2 TSA performance on the IEEE 39-bus system
Table 3 TSA performance on the IEEE 145-bus system

The following observations can be drawn from the simulation results:

  1. ANN, SVM, DT and RF have inferior TSA performance compared with the other four methods. Among them, ANN has the highest Acc, 96.95% and 96.83% on the two systems, respectively. However, its Pre values of 90.28% and 91.32% are unacceptable, which indicates that ANN does not classify unstable samples accurately enough. Although the DT-based RF algorithm performs slightly better than DT, its Rec values of 88.10% on the IEEE 39-bus system and 88.55% on the IEEE 145-bus system are too low.

  2. CNN + GRU has outstanding TSA performance. On the IEEE 39-bus system, compared with GRU [11] and 1D-CNN [19], Rec shows the greatest improvement, 3.75%. On the IEEE 145-bus system, Rec also shows the largest improvement, rising from 93.20% to 96.91%.

  3. MP CNN + GRU performs well on both test systems and further improves the TSA performance of CNN + GRU, with accuracies of 99.40% and 99.32%, both above 99%.

4.5 MP CNN + GRU’s TSA performance

To understand why MP CNN + GRU improves TSA performance over CNN + GRU, we study the classification ability of the proposed model on the IEEE 39-bus and IEEE 145-bus systems from two perspectives: the TSA result of an individual sample and the TSA results on the test data set.

4.5.1 TSA result analysis of a single difficult to classify sample

Tables 4 and 5 demonstrate the TSA result of a single difficult to classify sample on the IEEE 39-bus system and IEEE 145-bus system, where I and II represent CNN + GRU and MP CNN + GRU, respectively.

Table 4 TSA result of a single sample on the IEEE 39-bus system
Table 5 TSA result of a single sample on the IEEE 145-bus system

The true label of the examined sample on the IEEE 39-bus system is 1. However, the single CNN + GRU outputs 0, which differs from the true label. This misclassification could have serious consequences for power system operation. By comparison, the multi-parallel structure of MP CNN + GRU simultaneously outputs three TSA results, 0.488, 0.524 and 0.510, and the final result is 1. MP CNN + GRU thus avoids misclassifying this unstable sample as stable, ensuring the reliability and correctness of TSA.

Similarly, on the IEEE 145-bus system, MP CNN + GRU avoids misclassifying the examined sample as unstable. In summary, for the TSA of a single sample, the superiority of MP CNN + GRU lies mainly in its strong ability to classify difficult to classify samples around the classification threshold.

4.5.2 TSA result analysis of difficult to classify samples on the test data set

After screening, 40 difficult to classify samples are found on the test data set of the IEEE 39-bus system, about 1.40% of its samples, and 406 on that of the IEEE 145-bus system. Figures 7 and 8 show the corresponding TSA results of MP CNN + GRU and CNN + GRU. MP CNN + GRU greatly reduces the number of misclassified samples and improves TSA accuracy: FN and FP are reduced from 15 to 7 and from 16 to 8 on the IEEE 39-bus system, and from 238 to 21 and from 48 to 8 on the IEEE 145-bus system. This demonstrates the distinctive classification capability of MP CNN + GRU in practical application.

Fig. 7 TSA analysis results on the IEEE 39-bus system

Fig. 8 TSA analysis results on the IEEE 145-bus system

4.6 TSA results analysis of the improved FL function

To verify the effectiveness of the improved FL, MP CNN + GRU is trained once with the standard FL and once with the improved FL as the loss function, while the training and test data sets remain unchanged. Figures 9 and 10 show the confusion matrices of the two models on the test data set. Unlike the standard FL, the improved FL requires no tuning of α; only γ needs to be tuned.
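For reference, the standard binary focal loss is FL(p_t) = −α_t (1 − p_t)^γ log p_t, where p_t is the probability the model assigns to the true class. The plain-Python sketch below implements only this standard form; the improved FL adjusts α adaptively during training, but its update rule is not reproduced in this section, so here α is simply an argument. The function name `binary_focal_loss` is hypothetical.

```python
import math
from typing import Sequence


def binary_focal_loss(y_true: Sequence[int], p_pred: Sequence[float],
                      alpha: float = 0.25, gamma: float = 2.0,
                      eps: float = 1e-7) -> float:
    """Mean standard binary focal loss; labels are 0/1, p_pred is P(label = 1)."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("y_true and p_pred must be non-empty and equally long")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)             # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p              # probability given to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


# With gamma = 0 and alpha = 0.5 the loss reduces to half the binary cross-entropy.
print(binary_focal_loss([1], [0.8], alpha=0.5, gamma=0.0))
# A confidently correct sample contributes far less loss than a borderline one.
print(binary_focal_loss([1], [0.95]), binary_focal_loss([1], [0.55]))
```

The factor (1 − p_t)^γ is what lets γ shift training effort toward hard samples near the decision boundary, while α only rebalances the two classes.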

Fig. 9 TSA results on the test data set of the IEEE 39-bus system

Fig. 10 TSA results on the test data set of the IEEE 145-bus system

The results show that the improved FL balances the two classes and improves the fit to unstable samples. Reducing FP substantially mitigates potential damage to the power system. The simulations also show that, because α needs no tuning, the improved FL simplifies the tedious experimental process and meaningfully improves engineering efficiency.

4.7 TSA results with different classification thresholds

Because the power system is highly nonlinear, the classification information that the TSA model must learn is highly complex, and in practice no trained model achieves 100% classification accuracy. The proposed MP CNN + GRU achieves very high TSA accuracy thanks to its parallel structure, which extracts short-term and long-term features simultaneously. After training, the classification threshold γ can be adjusted manually to reduce misclassification and improve the recall rate.

Figure 11 illustrates the TSA results with different thresholds on the IEEE 39-bus system and IEEE 145-bus system.

  1. As γ increases, Pre decreases and Rec increases, while Acc and F1 first increase slightly and then decrease. At γ = 0.5, Acc and F1 both reach their maximum values: 99.40% and 0.9831 on the IEEE 39-bus system, and 99.32% and 0.9799 on the IEEE 145-bus system, respectively.

  2. A γ that is too high or too low reduces Acc, causing frequent misclassification and weaker classification ability, which is unsuitable for online TSA. Notably, Pre reaches its lowest value at γ = 0.9, and an unacceptably low Pre leads to frequent false alarms in the power system. Rec reaches its minimum at γ = 0.1, and too low a recall rate can have disastrous consequences. It can be inferred from Fig. 11 that Pre reaches 100% when γ is 0 and Rec reaches 100% when γ is 1, but neither 0 nor 1 is a useful classification threshold for online application.

  3. Varying γ has a large impact on Pre and Rec but only a small impact on Acc and F1. The variation ranges of Pre and Rec are approximately 6.30% and 5.00% on the IEEE 39-bus system, and 6.6% and 4.7% on the IEEE 145-bus system, respectively. Acc remains almost unchanged, varying by only 0.9% on the IEEE 39-bus system and 1.1% on the IEEE 145-bus system, and the variation range of F1 is approximately 0.023 on the IEEE 39-bus system and 0.020 on the IEEE 145-bus system.

Fig. 11 TSA performance with different classification thresholds

Thus, once MP CNN + GRU has been trained to high accuracy, manually adjusting the classification threshold can improve the recall of unstable samples, reduce misclassification and keep the model conservative in practical application.

4.8 Visualization analysis of MP CNN + GRU’s classification ability

To make the model's TSA behaviour more interpretable, the visualization algorithm t-distributed stochastic neighbor embedding (t-SNE) [27, 28] is used to show how the samples are processed. After t-SNE, highly similar points lie close together in the low-dimensional space, whereas dissimilar points lie far apart.

The t-SNE algorithm converts the Euclidean distances between high-dimensional data points into conditional probabilities that express the similarity between samples. The conditional probability is:

$$P(j|i) = \frac{S(x_i, x_j)}{\sum_{k = 1,\, k \ne i}^{n} S(x_i, x_k)}, \quad j \ne i,\quad i = 1,2,\ldots,n$$

where $S(x_i, x_j)$ denotes the similarity between samples i and j. PCA is then used to reduce the data dimension while retaining the most representative characteristics of each sample. The conditional probability between samples after dimension reduction is:

$$Q(j|i) = \frac{S'(z_i, z_j)}{\sum_{k = 1,\, k \ne i}^{n} S'(z_i, z_k)}, \quad j \ne i,\quad i = 1,2,\ldots,n$$

where $S'(z_i, z_j)$ denotes the similarity between samples i and j after dimension reduction. The closer two samples are, the more similar they are.
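The conditional probability P(j|i) normalizes the similarities of sample i over all other samples k ≠ i. The section does not define S; a common choice, used in standard t-SNE [27], is a Gaussian kernel. The sketch below assumes a single shared bandwidth σ for brevity, whereas real t-SNE tunes a separate σ_i per point from the perplexity and uses a Student-t kernel in the low-dimensional space. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def conditional_probabilities(points: Sequence[Sequence[float]],
                              sigma: float = 1.0) -> List[List[float]]:
    """P[i][j] = S(x_i, x_j) / sum_{k != i} S(x_i, x_k), with P[i][i] = 0.

    S is assumed to be a Gaussian kernel with one shared bandwidth sigma.
    """
    n = len(points)
    probs: List[List[float]] = []
    for i in range(n):
        sims = []
        for j in range(n):
            if j == i:
                sims.append(0.0)  # a point is not its own neighbour
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            sims.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        denom = sum(sims)  # sum over k != i
        probs.append([s / denom for s in sims])
    return probs


pts = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
P = conditional_probabilities(pts)
print([round(p, 4) for p in P[0]])  # the near point [1, 0] dominates row 0
```

Each row sums to 1, and nearer points receive larger probabilities; for very widely separated points the fixed-σ kernel can underflow to zero, which per-point bandwidth tuning avoids.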

In the TSA problem, stable samples share similar properties, whereas stable and unstable samples differ. Hence, after t-SNE, stable samples lie relatively close to one another, while stable and unstable samples lie far apart (Figs. 12 and 13).

Fig. 12 t-SNE result on the IEEE 39-bus system

Fig. 13 t-SNE result on the IEEE 145-bus system

4.9 Computational speed analysis of MP CNN + GRU

To further verify the TSA performance of MP CNN + GRU, this section analyses its computational speed, comparing it with the 1D-CNN in [19] and the GRU in [11]. The computational efficiency and TSA accuracy results are shown in Tables 6 and 7.

Table 6 Comparison of computation speed on the IEEE 39-bus system
Table 7 Comparison of computation speed on the IEEE 145-bus system

In these tables, III denotes the number of samples in the test data set, and IV and V denote the TSA time for all test samples and the TSA time per sample, respectively. Tables 6 and 7 show that the computational speed of all models fully meets TSA requirements. Although GRU and 1D-CNN are faster than CNN + GRU and MP CNN + GRU, their Acc is lower. CNN + GRU is slightly slower than GRU and 1D-CNN because of its more complex dual structure. MP CNN + GRU achieves better TSA performance at the cost of some computational speed, within a reasonable range. Its computational efficiency on the IEEE 145-bus system is half of that on the IEEE 39-bus system, because the much higher input dimension of the IEEE 145-bus system makes data analysis considerably harder. Even so, MP CNN + GRU remains a practical method for engineering application.
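The quantities III, IV and V are related by V = IV / III. A minimal Python timing harness in that spirit is sketched below; `time_tsa` and the stand-in model are hypothetical, and the loop times one sample at a time, which may differ from how the paper measured batched inference.

```python
import time
from typing import Any, Callable, Sequence, Tuple


def time_tsa(predict: Callable[[Any], Any], samples: Sequence[Any]) -> Tuple[float, float]:
    """Return (IV, V): total and per-sample TSA time in seconds over III = len(samples)."""
    if not samples:
        raise ValueError("need at least one sample")
    start = time.perf_counter()          # monotonic, high-resolution clock
    for x in samples:
        predict(x)                       # one sample at a time, since V is a per-sample time
    total = time.perf_counter() - start
    return total, total / len(samples)


# Usage with a stand-in model (hypothetical): replace with the trained network's predict call.
total, per_sample = time_tsa(lambda x: sum(x) > 0, [[0.1, -0.2, 0.3]] * 1000)
print(f"IV = {total:.6f} s, V = {per_sample:.2e} s")
```

When the model runs on a GPU, the device usually has to be synchronized before reading the clock, otherwise asynchronous kernel launches make the measured time too short.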

5 Conclusions

This paper proposes CNN + GRU, a TSA method that combines CNN and GRU. MP CNN + GRU is then formed by connecting multiple CNN + GRU models in parallel to improve the classification accuracy on difficult-to-classify samples. Finally, an improved FL function with self-adaptive adjustment is proposed to guide model training. The proposed methods are verified by simulations on the IEEE 39-bus and IEEE 145-bus systems. The conclusions are as follows:

  1. Compared with other AI algorithms and with standalone CNN and GRU, CNN + GRU fully extracts high-order features from both the micro short-term and macro long-term perspectives and builds the mapping between the original inputs and the system stability labels, giving better TSA performance. Its TSA accuracy reaches 98.91% on the IEEE 39-bus system and 98.83% on the IEEE 145-bus system.

  2. Through its multi-parallel structure, MP CNN + GRU provides multiple TSA results simultaneously. For a single sample, it has a certain error-correction ability; on a large number of samples, it significantly improves the precision and recall for unstable samples, which reach 98.40% and 98.21% on the IEEE 39-bus system, and 98.11% and 97.88% on the IEEE 145-bus system. Thus, MP CNN + GRU delivers strong TSA performance.

  3. The improved FL function not only avoids the cumbersome parameter-tuning process and improves engineering efficiency in practical application, but also improves TSA accuracy on unstable samples and mitigates the disastrous impact of misclassification on the power system. The simulations also confirm that α requires no manual tuning.

However, this paper has not considered TSA performance when one or more of the original inputs are noisy. Future research will focus on maintaining high TSA accuracy with noisy inputs, and on TSA methods for optimal PMU configuration of the power system that account for project cost.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.



Transient stability assessment


Convolutional neural network


Gated recurrent unit


Multiple paralleled


Focal loss


Time domain simulation


Transient energy function


Extended equal area criterion


Machine learning


Artificial neural network


Support vector machine


Decision tree


Deep learning


Long short-term memory


Stacked autoencoder


Deep belief network


Multi-branch stacked denoising autoencoder


Logistic regression


One dimensional-convolutional neural network


Phasor measurement units


Recurrent neural network


Transient stability index


Binary cross entropy


Stochastic gradient descent


Power system simulator/engineering


Random forest


Principal component analysis


t-Distributed stochastic neighbor embedding


  1. O’Shaughnessy, E., Heeter, J., Shah, C., & Koebrich, S. (2021). Corporate acceleration of the renewable energy transition and implications for electric grids. Renewable and Sustainable Energy Reviews, 146, 111160.

  2. Erdiwansyah, M., Husin, H., Nasaruddin, M. Z., & Muhibbuddin, A. (2021). A critical review of the integration of renewable energy sources with various technologies. Protection and Control of Modern Power Systems, 6(1), 34–57.

  3. Telukunta, V., Pradhan, J., Agrawal, A., Singh, M., & Srivani, S. G. (2017). Protection challenges under bulk penetration of renewable energy resources in power systems: A review. CSEE Journal of Power and Energy Systems, 3(4), 365–379.

  4. Cecati, C., & Latafat, H. (2012). Time domain approach compared with direct method of Lyapunov for transient stability analysis of controlled power system. In: International symposium on power electronics, electrical drives, automation and motion (pp. 695–699).

  5. Chiang, H. (2010). Direct methods for stability analysis of electric power systems: Theoretical foundation, BCU methodologies, and applications (pp. 6–8). New Jersey: Wiley.

  6. Xue, Y., Wehenkel, L., Belhomme, R., Rousseaux, P., Pavella, M., Euxibie, E., Heilbronn, B., & Lesigne, J.-F. (1992). Extended equal area criterion revisited (EHV power systems). IEEE Transactions on Power Systems, 7(3), 1012–1022.

  7. Zhou, Z., Pu, G., Ma, S., Wang, G., Shao, D., Xu, Y., & Dang, J. (2021). Assessment and optimization of power system transient stability based on feature-separated neural networks. Power System Technology, 45(9), 3658–3667.

  8. Desai, J. P., & Makwana, V. H. (2021). A novel out of step relaying algorithm based on wavelet transform and a deep learning machine model. Protection and Control of Modern Power Systems, 6(4), 500–511.

  9. You, D., Wang, K., Ye, L., Wu, J., & Huang, R. (2013). Transient stability assessment of power system using support vector machine with generator combinatorial trajectories inputs. International Journal of Electrical Power and Energy Systems, 44(1), 318–325.

  10. Rahmatian, M., Chen, Y. C., Palizban, A., Moshref, A., & Dunford, W. G. (2017). Transient stability assessment via decision trees and multivariate adaptive regression splines. Electric Power Systems Research, 142, 320–328.

  11. Chen, Q., & Wang, H. (2021). Time-adaptive transient stability assessment based on gated recurrent unit. International Journal of Electrical Power and Energy Systems, 133, 107156.

  12. Yu, J. J. Q., Hill, D. J., Lam, A. Y. S., Gu, J., & Li, V. O. K. (2018). Intelligent time-adaptive transient stability assessment system. IEEE Transactions on Power Systems, 33(1), 1049–1058.

  13. Chen, Q., Wang, H., & Lin, N. (2021). Imbalance correction method based on ratio of loss function values for transient stability assessment. CSEE Journal of Power and Energy Systems.

  14. Zhu, Q., Chen, J., Zhu, L., Shi, D., Bai, X., Duan, X., & Liu, Y. (2018). A deep end-to-end model for transient stability assessment with PMU data. IEEE Access, 6, 65474–65487.

  15. Tan, B., Yang, J., Tang, Y., Jiang, S., Xie, P., & Yuan, W. (2019). A deep imbalanced learning framework for transient stability assessment of power system. IEEE Access, 7, 81759–81769.

  16. Su, T., Liu, Y., Zhao, J., & Liu, J. (2022). Deep belief network enabled surrogate modeling for fast preventive control of power system transient stability. IEEE Transactions on Industrial Informatics, 18(1), 315–326.

  17. Wu, S., Zheng, L., Hu, W., Yu, R., & Liu, B. (2020). Improved deep belief network and model interpretation method for power system transient stability assessment. Journal of Modern Power Systems and Clean Energy, 8(1), 27–37.

  18. Liu, W., Hao, D., Zhang, S., & Zhang, Y. (2021). Power system transient stability assessment based on PSO-DBN. In: 2021 6th international conference on power and renewable energy (ICPRE) (pp. 333–337).

  19. Gao, K., Yang, S., Liu, S., & Li, X. (2019). Transient stability assessment for power system based on one-dimensional convolutional neural network. Automation of Electric Power Systems, 43(12), 18–26.

  20. Tian, F., Zhou, X., Shi, D., Chen, Y., Huang, Y., & Yu, Z. (2019). Power system transient stability assessment based on comprehensive convolutional neural network model and steady-state feature. Proceedings of the CSEE, 39(14), 4025–4032.

  21. Shi, Z., Yao, W., Zeng, L., Wen, J., Fang, J., Ai, X., & Wen, J. (2020). Convolutional neural network-based power system transient stability assessment and instability mode prediction. Applied Energy, 263, 114586.

  22. Zhao, K., & Shi, L. (2021). Transient stability assessment of power system based on improved one-dimensional convolutional neural network. Power System Technology, 45(8), 2945–2957.

  23. Kim, Y. (2014). Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing (pp. 1746–1751).

  24. Pan, E., Ma, Y., Dai, X., Fan, F., Huang, J., Mei, X., & Ma, J. (2019). GRU with spatial prior for hyperspectral image classification. In: IGARSS 2019–2019 IEEE international geoscience and remote sensing symposium (pp. 967–970).

  25. Gou, P., & Yu, J. (2018). A nonlinear ANN equalizer with mini-batch gradient descent in 40Gbaud PAM-8 IM/DD system. Optical Fiber Technology, 46, 113–117.

  26. Rios, D., & Jüttler, B. (2022). LSPIA, (stochastic) gradient descent, and parameter correction. Journal of Computational and Applied Mathematics, 406, 113921.

  27. van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

  28. Gisbrecht, A., Schulz, A., & Hammer, B. (2015). Parametric nonlinear dimensionality reduction using kernel t-SNE. Neurocomputing, 147(1), 71–82.



The authors would like to thank the referees and editors of the journal for their valuable and constructive comments.

Authors information

Not applicable.


This research was funded by the National Natural Science Foundation of China under Grant No. 51607105.

Author information


The entire research work has been carried out by ZY, YL and XZ under guidance of SC. The individual contributions of the authors are specified as follows: Methodology, SC and ZY; Data set acquisition and Validation, YL and XZ; Writing-Original Draft Preparation, ZY; Writing-Review and Editing, SC and XZ; Funding Acquisition, SC and YL. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shan Cheng.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.



See Tables 8, 9 and 10.

Table 8 The parameter setting of PSS/E batch transient simulation on the IEEE 39-bus system
Table 9 The parameter setting of MP CNN + GRU on the IEEE 39-bus system
Table 10 The parameter setting of MP CNN + GRU on the IEEE 145-bus system

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Cheng, S., Yu, Z., Liu, Y. et al. Power system transient stability assessment based on the multiple paralleled convolutional neural network and gated recurrent unit. Prot Control Mod Power Syst 7, 39 (2022).
