 Original Research
Research on practical power system stability analysis algorithm based on modified SVM
Protection and Control of Modern Power Systems volume 3, Article number: 11 (2018)
Abstract
Stable and safe operation of power grids is an important guarantee for economic development. Stability analysis based on the support vector machine (SVM) is an important method that emerged in the last century. However, the SVM method has several drawbacks, e.g., low accuracy near the hyperplane and a heavy computational burden when dealing with large amounts of data. To tackle these problems, the algorithm proposed in this paper improves the SVM model in three respects. First, the grey area of the SVM model is identified from its probability output, and the samples falling in this area are processed separately. Second, the samples in the grey area are clustered, which improves the low training accuracy of the SVM model in the grey area while reducing the sample size and improving efficiency. Finally, by adjusting the penalty factor of the SVM models trained on the clustered samples, the number of unstable samples misjudged as stable is reduced. Test results on the IEEE 118-bus test system verify the proposed method.
Introduction
In China's current social development, the scale of the power grid and the demand for electrical energy have been increasing continuously. Safe and stable operation of the power grid underpins the stable development of the whole society and is one of the most important research topics in power systems. The traditional power system stability analysis methods are the direct method and time-domain simulation [1]. However, faced with large-scale power grids and ever-growing amounts of data, these traditional methods encounter significant challenges and struggle to meet the requirements of speed and accuracy. Following the development of computers and data mining technology in the early 1970s, researchers began to use data mining methods to analyze the security and stability of power systems, e.g., the support vector machine (SVM) [2, 3], artificial neural networks [4,5,6,7] and decision trees [8,9,10,11].
SVM is a supervised binary classification model. By searching for an optimal hyperplane in the sample space, it divides the samples into two categories, and it has the advantages of a simple model and good classification performance. It has been widely studied and applied [12,13,14]. Existing studies, however, focus either on optimizing a single SVM model or on analyzing the same sample region with multiple SVM models, whose outputs are combined by voting to obtain the final stability classification and thus reduce errors. The multiple-model approach increases the computational complexity of the model.
This paper proposes a two-segment SVM algorithm that aims to improve the classification accuracy of security region analysis for power grids, to handle the grey space of the SVM model, and to reduce the risk to the power system caused by misjudging unstable samples as stable. The algorithm improves the classification accuracy of a single SVM classifier and, by using the K-means clustering algorithm [15, 16] and segmented processing, reduces the amount of computation in the second stage. It therefore improves not only the accuracy but also the training, optimization and classification speed of the model.
Methods
Fundamentals of the SVM and K-means models
SVM model
SVM is a binary classification model and a supervised machine learning method. It classifies samples by finding an optimal hyperplane in the sample space. Among the hyperplanes that separate the two classes, there are two parallel boundary hyperplanes, each touching the nearest samples of one class. The optimal hyperplane lies midway between them and maximizes the distance (margin) between the nearest samples on its two sides.
For a set of data (x_i, y_i), i = 1, 2, …, l, x_i ∈ R^n, two cases can be distinguished: linearly separable and linearly nonseparable data. In the linearly separable case, the samples are divided into two categories by the optimal hyperplane:
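In the usual textbook notation, assumed here, with weight vector w, bias b and labels y_i ∈ {−1, +1} (the paper later encodes stable/unstable as 1/0), the separating hyperplane can be written as follows; samples with w^T x_i + b > 0 fall in one category and the rest in the other.

```latex
\mathbf{w}^{\top}\mathbf{x} + b = 0 \tag{1}
```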
The optimal hyperplane is found by solving:
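A standard hard-margin formulation, assumed here with labels y_i ∈ {−1, +1}, is shown below. Because the margin between the two boundary hyperplanes equals 2/‖w‖, minimizing ‖w‖² maximizes the margin.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{s.t.}\quad y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1,\quad i = 1, 2, \dots, l
\tag{2}
```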
In the linearly nonseparable case, SVM maps the sample data into a high-dimensional space through a kernel function, so that a problem that is not linearly separable in the original sample space can be solved linearly in the high-dimensional space.
Mapping the sample space into a high-dimensional space through the kernel function does not guarantee that the data become separable. Because the data may contain noise, some points (outliers) may still deviate from their normal positions after the mapping. To reduce the effect of noise on the hyperplane, the SVM model tolerates such outliers when constructing the hyperplane, i.e., it allows some classification errors. A slack variable ξ_i ≥ 0 is introduced for each sample and the constraint is relaxed to:
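In the standard soft-margin form, writing φ(·) for the feature map induced by the kernel (a notational assumption) and labels y_i ∈ {−1, +1}, the relaxed constraint reads:

```latex
y_i\left(\mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1, 2, \dots, l
\tag{3}
```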
However, the slack variables must be kept under control. Because they express the classification error, the objective function of the SVM becomes:
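A standard C-SVM objective consistent with the penalty factor C defined next (labels y_i ∈ {−1, +1}, feature map φ assumed) is:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{l}\xi_i
\quad\text{s.t.}\quad y_i\left(\mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0
\tag{4}
```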
where C > 0 is the penalty factor, which balances the width of the margin against the classification error.
The SVM model thus transforms the classification problem into the optimization problem (4). In the standard C-SVM model, this problem is converted into its dual problem, for which the Lagrange function is introduced as:
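With multipliers α_i ≥ 0 for the margin constraints and μ_i ≥ 0 for the constraints ξ_i ≥ 0 (standard notation, assumed here), the Lagrange function of (4) is:

```latex
L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu})
= \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{l}\xi_i
- \sum_{i=1}^{l}\alpha_i\left[y_i\left(\mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b\right) - 1 + \xi_i\right]
- \sum_{i=1}^{l}\mu_i\xi_i
\tag{5}
```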
From the Lagrange function, the dual problem of the model is derived as:
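In the standard derivation, setting the derivatives of L with respect to w, b and ξ_i to zero gives w = Σ α_i y_i φ(x_i), Σ α_i y_i = 0 and C − α_i − μ_i = 0. Substituting back, with the kernel k(x_i, x_j) = φ(x_i)^T φ(x_j), yields the dual (labels y_i ∈ {−1, +1} assumed):

```latex
\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j\, k(\mathbf{x}_i, \mathbf{x}_j)
\quad\text{s.t.}\quad \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C
\tag{6}
```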
Solving the dual problem yields the optimal multiplier vector α^∗ = (α_1^∗, α_2^∗, …, α_l^∗)^T.
Choosing a component α_j^∗ that lies in the open interval (0, C), the bias b^∗ is calculated as:
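In the standard derivation, an index j with 0 < α_j^∗ < C has μ_j > 0, hence ξ_j = 0, and its margin constraint is active, so the sample lies exactly on the margin. This gives (labels y_i ∈ {−1, +1} assumed):

```latex
b^{*} = y_j - \sum_{i=1}^{l}\alpha_i^{*} y_i\, k(\mathbf{x}_i, \mathbf{x}_j)
\tag{8}
```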
Finally, the decision function f(x), which serves as the classification rule, is constructed from α^∗ and b^∗ as:
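In standard form, assuming labels y_i ∈ {−1, +1}, the rule assigns a sample to the class given by the sign of its kernel expansion:

```latex
f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{l}\alpha_i^{*} y_i\, k(\mathbf{x}_i, \mathbf{x}) + b^{*}\right)
\tag{9}
```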
K-means model
Cluster analysis is a common data mining technique and an unsupervised learning method. A clustering algorithm divides the samples into clusters such that the samples within each cluster are similar to one another.
The K-means clustering algorithm is an iterative optimization algorithm. Each iteration consists of the following two steps:

(1) Assignment of samples. Given K center points (the mean points of the clusters), each sample is assigned to the cluster whose mean point is nearest to it in Euclidean distance, which divides the samples into K clusters. Each sample x_p belongs to exactly one cluster S_i^(t), and the assignment minimizes the within-cluster sum of squares, i.e., the sum of squared Euclidean distances to the cluster centers. In the t-th iteration, the cluster S_i^(t) with center m_i^(t) therefore contains every sample whose nearest center is m_i^(t).

(2) Update of the mean points. The mean of the samples in each cluster is taken as the new cluster center, i.e., the new mean point m_i^(t+1) is the centroid of S_i^(t).
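In the usual notation for this iteration (assumed here), step (1) and step (2) read as follows; ties in (10) are broken arbitrarily so that each sample lies in exactly one cluster.

```latex
S_i^{(t)} = \left\{\mathbf{x}_p : \lVert\mathbf{x}_p - \mathbf{m}_i^{(t)}\rVert^{2} \le \lVert\mathbf{x}_p - \mathbf{m}_j^{(t)}\rVert^{2}\ \ \forall j,\ 1 \le j \le K\right\}
\tag{10}

\mathbf{m}_i^{(t+1)} = \frac{1}{\lvert S_i^{(t)}\rvert}\sum_{\mathbf{x}_j \in S_i^{(t)}}\mathbf{x}_j
\tag{11}
```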
Two key elements of the K-means algorithm are the choice of the initial mean points and the distance used in the least-squares criterion. In the original algorithm, the initial mean points are chosen randomly or near the center of the data, and the Euclidean distance is used as the distance function; later variants of K-means also use the absolute error as the distance function. For unknown samples, the value of K cannot be estimated accurately in advance, and an inappropriate K adversely affects the subsequent SVM training. In this study, the maximum cluster radius (in Euclidean distance) is used as the index for evaluating the clustering effect and determining a suitable K.
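A minimal sketch of the two-step iteration and of the maximum-cluster-radius index, in plain Python with the standard library only. The function names, the random choice of K distinct samples as initial mean points, and the convergence test (assignments unchanged between iterations) are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kmeans(samples, k, max_iter=100, seed=0):
    """Lloyd-style K-means; returns (centers, labels).

    Initial mean points are k distinct samples drawn at random (an assumption).
    """
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(samples, k)]
    labels = None
    for _ in range(max_iter):
        # Step (1): allocate every sample to the cluster with the nearest mean point.
        new_labels = [min(range(k), key=lambda i: euclidean(x, centers[i]))
                      for x in samples]
        # Step (2): move each mean point to the centroid of its cluster.
        for i in range(k):
            members = [x for x, lab in zip(samples, new_labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
    return centers, labels


def max_cluster_radius(samples, centers, labels):
    """Cluster index: the largest distance from any sample to its own cluster center."""
    return max(euclidean(x, centers[lab]) for x, lab in zip(samples, labels))
```

For two well-separated groups such as [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)] with k = 2, the iteration separates the groups and the maximum cluster radius is √5/3.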
Results
Power system security region analysis method based on a two-segment SVM model
Compared with single-SVM and multiple-SVM approaches, the proposed algorithm improves the accuracy of a single SVM classifier. At the same time, K-means clustering and segmented processing reduce the computation required by the multiple SVMs of the second stage, which improves not only the accuracy but also the speed of training, optimization and classification. To reduce the number of unstable operating conditions mistaken for stable ones, this paper adjusts the penalty factor of the second-stage SVM models. Adjusting the penalty factor does not change the computational cost of model training and classification, and, combined with K-means clustering, it reduces the number of unstable states misjudged as stable. The training process of the algorithm is shown in Fig. 1.
First stage SVM
In the first stage, the SVM should be as accurate as possible so that the proportion of samples passed on to the second stage is small, which improves the efficiency of the security region analysis. Grid search is first used to optimize the parameters of the SVM model, giving the model with the best classification accuracy. The probability output of this first-stage SVM is then used to estimate the fiducial probability (confidence) of each sample's classification.
The first-stage SVM uses the radial basis function (RBF) kernel:
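The standard RBF kernel, written with the kernel parameter γ that is tuned by grid search, is:

```latex
k(\mathbf{x}_i, \mathbf{x}) = \exp\left(-\gamma\,\lVert\mathbf{x}_i - \mathbf{x}\rVert^{2}\right),\qquad \gamma > 0
\tag{12}
```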
Equations (4) and (12) contain the two parameters to be optimized: the penalty factor C and the kernel parameter γ. Grid search specifies a range for C and γ, divides it into a grid with a given step size, evaluates every grid point by cross validation, and selects the point with the highest accuracy as the optimal parameters. Cross validation splits the SVM samples into groups, using some for training and holding out another for validation. The most common form is N-fold cross validation (N-CV): the samples are divided into N groups, each group in turn serves as the test set while the remaining N − 1 groups form the training set, and the N results are combined to evaluate the accuracy of the parameter pair. After the whole grid has been traversed, the classification accuracies of all parameter pairs are compared.
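The grid traversal with N-fold cross validation can be sketched as below, assuming the fold results are averaged. `fit_and_score` is a hypothetical caller-supplied function (for example, a wrapper around an SVM library) that trains an RBF-SVM with the given C and γ and returns its test accuracy; the folds are contiguous, so the samples are assumed to be shuffled beforehand.

```python
import itertools


def n_fold_indices(n_samples, n_folds):
    """Split indices 0..n_samples-1 into n_folds contiguous groups of near-equal size."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search(samples, c_values, gamma_values, n_folds, fit_and_score):
    """Return the (C, gamma) pair with the highest mean N-fold accuracy, plus all scores.

    fit_and_score(train, test, C, gamma) is a hypothetical caller-supplied function that
    trains an RBF-SVM on `train` and returns its accuracy on `test`.
    """
    folds = n_fold_indices(len(samples), n_folds)
    results = {}
    for C, gamma in itertools.product(c_values, gamma_values):
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [s for i, s in enumerate(samples) if i not in held]
            test = [samples[i] for i in held_out]
            scores.append(fit_and_score(train, test, C, gamma))
        results[(C, gamma)] = sum(scores) / len(scores)  # average over the N folds
    best = max(results, key=results.get)
    return best, results
```

As an example of an exponential grid in the spirit of the LIBSVM practical guide, one could pass c_values = [2 ** e for e in range(-5, 16, 2)] and gamma_values = [2 ** e for e in range(-15, 4, 2)].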
After C and γ are selected, the model identifies the grey space from the fiducial probability of each sample's classification result. In the SVM sample set, x_i ∈ R^m is the input feature vector of a sample, consisting of the active and reactive power of the generators and lines, the bus voltage magnitudes and phase angles, and the active and reactive power of the loads, and y ∈ {1, 0} is the stability label (1 for stable, 0 for unstable). The samples are mapped into the high-dimensional space by the RBF kernel and the hyperplane is constructed. The function g(x), which measures the distance between a sample and the optimal hyperplane, is constructed as:
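A standard form of this function, assuming labels encoded as y_i ∈ {−1, +1} (stable = +1) and multipliers α_i from the dual problem, is given below. Strictly, the geometric distance to the hyperplane is g(x)/‖w‖, so g(x) is proportional to it.

```latex
g(\mathbf{x}) = \sum_{i=1}^{l}\alpha_i y_i\, k(\mathbf{x}_i, \mathbf{x}) + b
\tag{13}
```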
where α_i is the Lagrange multiplier of the i-th training sample and k(x_i, x) is the RBF kernel given in (12).
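The kernel and the decision value g(x) can be evaluated directly once the dual problem has been solved. In this sketch the multipliers `alphas` and bias `b` are assumed to be given (for example by an SVM solver), labels are encoded as ±1, and only samples with nonzero multipliers (the support vectors) need to be passed in.

```python
import math


def rbf_kernel(xi, x, gamma):
    """k(x_i, x) = exp(-gamma * ||x_i - x||^2) for equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, x)))


def decision_value(x, support_vectors, alphas, labels, b, gamma):
    """g(x) = sum_i alpha_i * y_i * k(x_i, x) + b, with labels y_i in {-1, +1}.

    alphas and b are assumed to come from an already solved dual problem.
    """
    return sum(a * y * rbf_kernel(sv, x, gamma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + b
```

The sign of g(x) gives the predicted class; its magnitude is what the probability output below turns into a confidence.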
The distance from a sample to the hyperplane indicates how likely the sample is to belong to each output class. The probability that the sample belongs to class y = 0 is:
and the probability that it belongs to class y = 1 is:
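One common way to obtain these probabilities, assumed here rather than taken from the paper, is Platt scaling (used, for example, by LIBSVM), which fits a sigmoid with parameters A and B to held-out decision values g(x), with the stable class (y = 1) on the positive side of the hyperplane:

```latex
P(y = 0 \mid \mathbf{x}) = 1 - \frac{1}{1 + \exp\left(A\,g(\mathbf{x}) + B\right)}
\tag{14}

P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + \exp\left(A\,g(\mathbf{x}) + B\right)}
\tag{15}
```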
The SVM outputs the class with the larger of the two probabilities, and this larger probability is taken as the fiducial probability of the sample's classification result. Experiments showed that for samples whose output probability exceeds 0.99, the result agreed with the true stability state, whereas below this value missed or false judgements may occur. Therefore, samples whose first-stage SVM output probability is below 0.99 are defined as grey-space samples and are processed further to improve the classification accuracy.
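The grey-space rule can be sketched as follows, assuming a Platt-style sigmoid whose parameters A and B have already been fitted (the function names and any A, B values used with them are illustrative). A sample is flagged as grey when the larger of the two class probabilities is below the 0.99 threshold.

```python
import math


def platt_probability_stable(g, A, B):
    """P(y = 1 | x) = 1 / (1 + exp(A*g + B)) for a decision value g (Platt scaling)."""
    z = A * g + B
    # Evaluate the logistic function without overflow for large |z|.
    if z >= 0:
        return math.exp(-z) / (1.0 + math.exp(-z))
    return 1.0 / (1.0 + math.exp(z))


def classify_with_confidence(g, A, B, threshold=0.99):
    """Return (label, confidence, in_grey_space) for one sample's decision value g."""
    p1 = platt_probability_stable(g, A, B)
    p0 = 1.0 - p1
    label = 1 if p1 >= p0 else 0  # 1 = stable, 0 = unstable
    confidence = max(p0, p1)      # fiducial probability of the chosen class
    return label, confidence, confidence < threshold
```

With A = −5 and B = 0, for instance, a decision value of 2.0 gives a confident "stable" result, while 0.1 lands in the grey space.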
K-means clustering
When K-means clustering is applied to the grey-space data, the cluster index is based on the Euclidean distance. The initial center points are selected randomly in the sample space, the clustering results are observed, and a suitable K value and set of center points are then chosen. As K first increases, the cluster index falls quickly and the clustering improves. Beyond a certain K, however, the index falls only slowly, and a further increase of K may even increase the index, which indicates that the algorithm has converged to a local optimum. Such local optima are avoided by repeating the clustering several times and keeping the best result.
In general, a larger K gives a smaller cluster index and a better clustering effect. However, a larger K also increases the number of second-stage SVM models, the model complexity and the amount of computation. In this model, K is chosen at the point where the cluster index starts to decline slowly, and the center points of the best clustering result over the repeated runs are recorded.
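The repeated-run and K-selection procedure can be sketched as below. `run_once` is a hypothetical caller-supplied function that runs one randomly initialised clustering and returns (cluster_index, centers). The rule "declines slowly", taken here as a drop smaller than `slow_ratio` times the first drop, is an assumed concrete threshold, not the paper's.

```python
def best_of_restarts(run_once, k, n_restarts=10):
    """Repeat a randomly initialised clustering and keep the run with the smallest index.

    run_once(k, seed) is a hypothetical function returning (cluster_index, centers).
    """
    runs = [run_once(k, seed) for seed in range(n_restarts)]
    return min(runs, key=lambda run: run[0])


def select_k(index_by_k, slow_ratio=0.1):
    """Pick the K at which the cluster index starts to decline slowly (elbow rule).

    index_by_k maps K = 1, 2, ... to the best cluster index found for that K.
    A decline counts as slow once it is below slow_ratio times the first decline;
    an increase of the index also counts as slow.
    """
    ks = sorted(index_by_k)
    if len(ks) < 2:
        return ks[0]
    first_drop = index_by_k[ks[0]] - index_by_k[ks[1]]
    for k, k_next in zip(ks, ks[1:]):
        if index_by_k[k] - index_by_k[k_next] < slow_ratio * first_drop:
            return k
    return ks[-1]
```

For an index curve {1: 10.0, 2: 6.0, 3: 4.0, 4: 3.0, 5: 2.8, 6: 2.75}, this rule returns K = 4, the last K before the curve flattens.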
When the first-stage SVM judges a new sample to lie in the grey space, the Euclidean distance between the sample and each center point is calculated. The sample is assigned to the cluster of the nearest center point, and the corresponding second-stage SVM model classifies it.
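The routing step at prediction time can be sketched as follows. `first_stage` and the entries of `second_stage_models` are hypothetical callables standing in for the trained SVMs: the first returns (label, confidence) and the others return a label. A confident first-stage result is kept, and a grey-space sample goes to the model of its nearest cluster center.

```python
import math


def nearest_center(x, centers):
    """Index of the cluster center closest to x in Euclidean distance."""
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))


def two_stage_predict(x, first_stage, centers, second_stage_models, threshold=0.99):
    """Classify x with the first-stage SVM, or with its cluster's second-stage SVM.

    first_stage(x) -> (label, confidence); second_stage_models[i](x) -> label.
    Both are hypothetical stand-ins for trained models.
    """
    label, confidence = first_stage(x)
    if confidence >= threshold:
        return label  # outside the grey space: keep the first-stage result
    return second_stage_models[nearest_center(x, centers)](x)
```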
Second stage SVM
The second-stage SVMs classify the clustered grey-space data. Because power system security assessment must be conservative, the probability of missed alarms (unstable states classified as stable) should be minimized. Therefore, class-dependent penalty factors are introduced to modify the second-stage SVM model.
In the second-stage SVM model, the samples are (x_i, y_i), i = 1, 2, …, l, x_i ∈ R^n. In the optimization problem (4), the first term corresponds to maximizing the margin between the classes, the second term collects the slack variables ξ_i introduced for the outlying points, and C is the penalty factor. The term \( C\sum \limits_{i=1}^{l}{\xi}_i \) represents the size of the error, and the larger C is, the more heavily errors are penalized.
The standard C-SVM model uses a single penalty factor C and does not distinguish missed alarms from false alarms. To handle the case of unstable power system states being mistaken for stable ones, this model uses different penalty factors, C_1 for samples with y_i = 1 and C_0 for samples with y_i = 0, so that the two kinds of error are weighted differently. The error term is defined as:
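Writing I_1 and I_0 for the index sets of stable (y = 1) and unstable (y = 0) training samples, a natural class-weighted form of the error term, assumed here as consistent with the description, is:

```latex
E = C_1\sum_{i \in I_1}\xi_i + C_0\sum_{i \in I_0}\xi_i
\tag{16}
```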
With this error term, the optimization problem becomes:
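With I_1 and I_0 the index sets of stable and unstable samples, and the labels encoded as +1 (stable) and −1 (unstable) inside the margin constraints (a standard convention assumed here), the problem can be written as:

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C_1\sum_{i \in I_1}\xi_i + C_0\sum_{i \in I_0}\xi_i
\quad\text{s.t.}\quad y_i\left(\mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0
\tag{17}
```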
To solve the optimization problem, the Lagrange function is introduced as:
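A standard Lagrange function for the class-weighted problem, with multipliers α_i, μ_i ≥ 0, labels encoded as ±1 (stable = +1) and a per-sample penalty C_i introduced as shorthand (an assumption), is:

```latex
L = \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + \sum_{i=1}^{l}C_i\xi_i
- \sum_{i=1}^{l}\alpha_i\left[y_i\left(\mathbf{w}^{\top}\varphi(\mathbf{x}_i) + b\right) - 1 + \xi_i\right]
- \sum_{i=1}^{l}\mu_i\xi_i,
\qquad C_i = \begin{cases} C_1, & i \in I_1\\ C_0, & i \in I_0 \end{cases}
\tag{18}
```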
The corresponding dual problem is:
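Compared with the single-C dual, stationarity in ξ_i now gives C_i − α_i − μ_i = 0, so only the upper bound on each multiplier changes (standard result, labels ±1, C_i equal to C_1 for stable and C_0 for unstable samples):

```latex
\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j\, k(\mathbf{x}_i, \mathbf{x}_j)
\quad\text{s.t.}\quad \sum_{i=1}^{l}\alpha_i y_i = 0,\quad 0 \le \alpha_i \le C_i
\tag{19}
```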
Solving this dual problem yields the optimal multiplier vector α^∗ = (α_1^∗, α_2^∗, …, α_l^∗)^T.
The b^{∗} component is then calculated as:
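As in the standard case, an index j whose multiplier lies strictly inside its bounds corresponds to a sample on the margin, which gives (labels ±1, C_j the per-sample penalty C_1 or C_0 of that sample):

```latex
b^{*} = y_j - \sum_{i=1}^{l}\alpha_i^{*} y_i\, k(\mathbf{x}_i, \mathbf{x}_j),\qquad \text{for any } j \text{ with } 0 < \alpha_j^{*} < C_j
\tag{21}
```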
Finally, the new decision function f(x), built from α^∗ and b^∗ obtained under the different penalty factors, is used as the classification rule:
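In standard form the rule is unchanged; the class-dependent bounds affect only α^∗ and b^∗ (labels ±1 assumed, with +1 read as stable):

```latex
f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{l}\alpha_i^{*} y_i\, k(\mathbf{x}_i, \mathbf{x}) + b^{*}\right)
\tag{22}
```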
By increasing the error penalty C_0 of the samples with y_i = 0, i.e., the penalty for misjudging an unstable state, the number of unstable states misjudged as stable is reduced.
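The effect of a larger C_0 can be seen in a small numerical sketch of the class-weighted error term. Slacks are computed as hinge losses ξ_i = max(0, 1 − t_i g(x_i)), with t_i = +1 for stable (y = 1) and −1 for unstable (y = 0) samples; the function names are illustrative.

```python
def hinge_slacks(decision_values, labels):
    """xi_i = max(0, 1 - t_i * g(x_i)), with t_i = +1 for y = 1 (stable), -1 for y = 0."""
    return [max(0.0, 1.0 - (1 if y == 1 else -1) * g)
            for g, y in zip(decision_values, labels)]


def weighted_error_term(decision_values, labels, c1, c0):
    """C1 * (slacks of stable samples) + C0 * (slacks of unstable samples)."""
    xi = hinge_slacks(decision_values, labels)
    return sum((c1 if y == 1 else c0) * s for s, y in zip(xi, labels))
```

For decision values [2.0, 0.5, 0.5] with labels [1, 1, 0], the unstable sample sits on the stable side of the hyperplane; raising c0 from 1.0 to 5.0 (with c1 = 1.0) raises the error term from 2.0 to 8.0, so an optimizer minimizing this objective is pushed harder to move such samples back to the unstable side.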
Discussion
Case study
In this paper, the PSAT simulation toolbox is used for power flow calculation and time-domain simulation. The IEEE 54-machine 118-bus standard test system is modeled in PSAT; its single-line diagram is shown in Fig. 2. The system consists of three regions. In an actual power grid, the lines between regions are usually heavily loaded, so the fault is placed on an inter-regional tie line, and the power flow is calculated and simulated in the time domain. The stability of the system under the fault is judged from the simulation results, and the power flow data and stability results of the IEEE 118-bus system form the data set for the subsequent analysis of the system security region under that fault. Starting from the initial power flow, the output of each node is set randomly within ±40% of its rated output, and the load at each node fluctuates within ±10% of its rated level. Three-phase short-circuit faults are applied on the inter-regional tie lines and cleared after a period of time. For each case, the initial power flow (active and reactive power of the generators and lines, bus voltage magnitudes and phase angles, and active and reactive load power at each node) and the final stability of the 54-machine 118-bus system are recorded. A total of 8000 such samples are generated randomly.
In the test case, the fault is applied on the inter-regional tie line 30–38; the fault settings are given in Table 1.
Of the 8000 samples, 6000 are randomly selected as training samples and the remaining 2000 are used as test samples.
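The random sampling of operating points and the 6000/2000 split described above can be sketched as follows. The helper names and the uniform distribution within the bands are assumptions, and the PSAT power flow and time-domain simulation that turn each operating point into features and a stability label are not reproduced here.

```python
import random


def sample_operating_point(rated_gen, rated_load, rng, gen_band=0.40, load_band=0.10):
    """One random operating point: outputs within ±40 %, loads within ±10 % of rated values."""
    gens = [p * rng.uniform(1 - gen_band, 1 + gen_band) for p in rated_gen]
    loads = [p * rng.uniform(1 - load_band, 1 + load_band) for p in rated_load]
    return gens, loads


def generate_cases(rated_gen, rated_load, n_cases, seed=0):
    """n_cases independent operating points (simulation and labelling not shown)."""
    rng = random.Random(seed)
    return [sample_operating_point(rated_gen, rated_load, rng) for _ in range(n_cases)]


def random_split(cases, n_train, seed=0):
    """Shuffle a copy of the cases and split it into training and test sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

For example, generate_cases(rated_gen, rated_load, 8000) followed by random_split(cases, 6000) reproduces the sizes of the training and test sets used here.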
First, the SVM parameters are optimized by grid search with cross validation; the optimization results are shown in Fig. 3.
The first-stage SVM model is trained with the optimal parameters and evaluated on the test set. Based on the fiducial probability, 807 test samples fall outside the grey space and 1193 fall inside it. To avoid local convergence, the grey-space samples are clustered 10 times and the best clustering result is kept. The variation of the cluster index with K is shown in Fig. 4; accordingly, K is set to 9, and 9 second-stage SVMs are trained on the resulting clusters.
After clustering and penalty-factor adjustment, the classification performance of the modified SVM model is compared with that of the standard SVM model in Table 2.
The case study on the IEEE 118-bus system verifies the effectiveness of the proposed security and stability analysis algorithm for analyzing security regions under faults on inter-regional tie lines.
Conclusion
Building on the standard SVM model, this paper incorporates cluster analysis and adjusts the penalty factor, establishing a two-stage SVM model based on clustering analysis and penalty-factor adjustment. The analysis of the IEEE 118-bus system shows that the proposed model effectively improves the classification accuracy of the SVM model, while K-means clustering reduces the sample size of the second stage and thus improves the training speed of the model.
This study shows that the probability output of the SVM can effectively identify the grey space of the SVM model: the fiducial probability of each classification result allows ambiguous samples to be recognized and passed on for further processing. Combining cluster analysis with SVM models effectively improves the classification accuracy of the grey-space samples, and adjusting the penalty factor reduces the proportion of unstable cases mistaken for stable ones. In addition, the method helps to deal with the training difficulty caused by the imbalance between the two classes of samples within each cluster, which further improves the classification accuracy.
References
1. Kundur, P. (1994). Power system stability and control. New York: McGraw-Hill.
2. Weiling, Z., Wei, H., Yong, M., et al. (2016). Conservative online transient stability assessment in power system based on concept of stability region. Power Syst Technol, 40(4), 992–998.
3. Gomez, F. R., Rajapakse, A. D., Annakkage, U. D., & Fernando, I. T. (2011). Support vector machine-based algorithm for post-fault transient stability status prediction using synchronized measurements. IEEE Trans Power Syst, 26(3), 1474–1483.
4. Zhou, Q., Davidson, J., & Fouad, A. (1994). Application of artificial neural networks in power system security and vulnerability assessment. IEEE Trans Power Syst, 9(1), 525–532.
5. Almasri, A., Kadir, M., Hizam, H., & Mariun, N. (2013). A novel implementation for generator rotor angle stability prediction using an adaptive artificial neural network application for dynamic security assessment. IEEE Trans Power Syst, 28(3), 2516–2525.
6. Amjady, N., & Majedi, S. (2007). Transient stability prediction by a hybrid intelligent system. IEEE Trans Power Syst, 22(3), 1275–1283.
7. Bahbah, A. G., & Girgis, A. A. (2004). New method for generators' angles and angular velocities prediction for transient stability assessment of multi-machine power systems using recurrent artificial neural network. IEEE Trans Power Syst, 19(2), 1015–1022.
8. Guo, T., & Milanovic, J. V. (2016). Online identification of power dynamic signature using PMU measurements and data mining. IEEE Trans Power Syst, 31(3), 1760–1768.
9. He, M., Vittal, V., & Zhang, J. (2013). Online dynamic security assessment with missing PMU measurements: A data mining approach. IEEE Trans Power Syst, 28(2), 1969–1977.
10. Sun, K., Likhate, S., Vittal, V., Kolluri, V. S., & Mandal, S. (2007). An online dynamic security assessment scheme using phasor measurements and decision trees. IEEE Trans Power Syst, 22(4), 1935–1943.
11. He, M., Zhang, J., & Vittal, V. (2013). Robust online dynamic security assessment using adaptive ensemble decision-tree learning. IEEE Trans Power Syst, 28(4), 4089–4098.
12. Li, D. H., & Cao, Y. J. (2006). Power system transient stability analysis based on PMU and hybrid support vector machine. Power Syst Technol, 30(9), 46–52.
13. Iang, L. X., Wang, X. H., Ang, J. W., et al. (2007). Feature selection for SVM based transient stability classification. Relay, 35(9), 17–21.
14. Ai, Y. D., Chen, L., Zhang, W. L., et al. (2016). Power system transient stability assessment based on multi-support vector machines. Proc CSEE, 36(5), 1173–1180.
15. Xu, T., Chiang, H., Liu, G., & Tan, C. (2017). Hierarchical K-means method for clustering large-scale advanced metering infrastructure data. IEEE Trans Power Delivery, 32(2), 609–616.
16. Lin, Y. (2011). Using K-means clustering and parameter weighting for partial-discharge noise suppression. IEEE Trans Power Delivery, 26(4), 2380–2390.
Funding
This work was supported by China's National Key Research and Development Program (2017YFB0902201), the National Natural Science Foundation of China under Grant 51777104, and the Science and Technology Project of the State Grid Corporation of China.
Author information
Contributions
HK carried out the theoretical studies and algorithm guidance and drafted the manuscript. SG participated in the theoretical analysis and guided the research method. WH participated in the research method and simulation analysis. ZL carried out the theoretical studies and algorithm research and drafted the manuscript. WS participated in the design of the study and performed the simulation analysis. ZQ participated in the design of the model research and performed the simulation analysis. WH conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Wei Hu.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Keywords
 Security region analysis
 Support vector machine
 K-means clustering