- Original Research
- Open Access

# Research on practical power system stability analysis algorithm based on modified SVM

Kaiyuan Hou^{1}, Guanghui Shao^{1}, Haiming Wang^{2}, Le Zheng^{3}, Qiang Zhang^{3}, Shuang Wu^{3} and Wei Hu^{3}

**3**:11

https://doi.org/10.1186/s41601-018-0086-0

© The Author(s) 2018

**Received:** 28 December 2017

**Accepted:** 23 April 2018

**Published:** 8 May 2018

## Abstract

Stable and safe operation of the power grid is an important guarantee for economic development. Stability analysis based on the Support Vector Machine (SVM) is a significant method that originated in the last century. However, the SVM method has several drawbacks, e.g. low accuracy around the hyperplane and a heavy computational burden when dealing with large amounts of data. To tackle these problems, the algorithm proposed in this paper is optimized in three respects. Firstly, the gray area of the SVM model is identified by the probability output and the corresponding samples are processed. Secondly, the samples in the gray area are clustered, which improves the low training accuracy of the SVM model in the gray area while reducing the sample size and improving efficiency. Finally, by adjusting the penalty factor of the SVM model after the samples are clustered, the number of unstable samples misjudged as stable is reduced. Test results on the IEEE 118-bus test system verify the proposed method.

## Keywords

- Security region analysis
- Support vector machine
- K-means clustering

## 1 Introduction

In China’s current social development, the scale of the power grid and the demand for electrical energy have been increasing continuously. Safe and stable operation of the power grid lays the foundation for the stable development of the whole society, and is among the most important research topics in power systems. The traditional power system stability analysis methods are the direct method and the time-domain simulation method [1]. However, faced with large-scale power grids and increasing amounts of data, traditional calculation methods have encountered significant challenges and struggle to meet the requirements for speed and accuracy. With the development of computers and data mining technology in the early 1970s, researchers began to use data mining methods to analyze the security and stability of the power system, e.g. the support vector machine (SVM) [2, 3], artificial neural networks [4–7], decision trees [8–11] and so on.

SVM is a supervised binary classification model. By searching for an optimal hyperplane in the sample space, this method divides the samples into two categories, and it has the advantages of a simple model and good classification performance. It has been widely studied and applied by researchers [12–14]. However, the existing literature focuses either on optimizing a single SVM model or on analyzing the same sample area with multiple SVM models. In the latter case, the final stability classification results are obtained through voting, which reduces the errors but increases the computational complexity of the model.

This paper proposes a two-segment SVM algorithm, mainly aimed at improving the classification accuracy of security region analysis in the power grid, dealing with the grey space of the SVM model, and reducing the harm to the power system caused by unstable samples being misjudged as stable. The algorithm improves the classification accuracy of a single SVM classifier, and also reduces the amount of computation in the second stage by using the K-means clustering algorithm [15, 16] and segmented processing. It improves not only the accuracy but also the training, optimization and classification speed of the model.

## 2 Methods

### 2.1 The foundation of SVM and K-means model

#### 2.1.1 SVM model

SVM is a two-class (binary) classification model and a supervised machine learning method. It classifies the samples by finding an optimal hyperplane in the sample space. Among the hyperplanes that separate the two classes, there are two that touch the respective classes of data. The optimal hyperplane lies between them and maximizes the distance to the nearest samples on either side.

For a sample set \( (x_i, y_i) \), \( i = 1, 2, \cdots, l \), \( x_i \in R^{n} \), the samples can be either linearly separable or linearly non-separable. In the linearly separable case, the optimal hyperplane is shown as follows, and the samples can be divided into two categories:
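In the standard textbook formulation, the separating hyperplane and its margin constraints can be written as below. The weight vector \( w \), the bias \( b \) and the label convention \( y_i \in \{-1, +1\} \) are assumptions of this sketch rather than notation taken from the surrounding text, which later encodes the classes as \( y \in \{1, 0\} \).

```latex
w^{T} x + b = 0, \qquad y_i \left( w^{T} x_i + b \right) \ge 1, \quad i = 1, 2, \cdots, l
```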

In the linear non-separable case, SVM maps the sample data to a high dimensional space through a kernel function and subsequently solves the linear non-separable problem in the original sample space in high dimensional space.

A slack variable \( \xi_i \) is constructed and the constraint condition is changed to:

where \( C \) is a penalty factor.
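A common way to write the resulting soft-margin (C-SVM) problem is sketched below. The symbols \( w \) and \( b \), the feature map \( \varphi \) induced by the kernel, and the label convention \( y_i \in \{-1, +1\} \) are assumptions of this sketch; the sum runs over the \( l \) samples.

```latex
\min_{w, b, \xi} \ \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{l} \xi_i
\quad \text{s.t.} \quad
y_i \left( w^{T} \varphi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \cdots, l
```

The first term favors a wide margin, while the second term charges each margin violation \( \xi_i \) at the rate \( C \).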

In the \( C \)-SVM model, the optimization problem is transformed into a duality problem. In this transformation, the Lagrange function is introduced as:

The solution of the dual problem is \( \alpha^{*} \), which can be expressed as:

where \( \alpha_i \) takes a value in the interval \( (0, C) \), and the component \( b^{*} \) can be calculated from \( \alpha_i \) as:

The decision function \( f(x) \) is constructed as the classification rule from \( \alpha^{*} \) and \( b^{*} \) as:
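To make the role of \( \alpha^{*} \) and \( b^{*} \) concrete, the following minimal sketch evaluates a kernel SVM decision rule for one sample. It assumes the usual RBF kernel form \( \exp(-\gamma \lVert x - z \rVert^{2}) \) and labels in \( \{-1, +1\} \); the function names and the toy values are hypothetical and are not taken from the paper.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2) (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, labels, alphas, b, gamma):
    """g(x) = sum_i alpha_i * y_i * k(x_i, x) + b, with y_i in {-1, +1}."""
    return b + sum(
        alpha * y * rbf_kernel(sv, x, gamma)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    )


def classify(x, support_vectors, labels, alphas, b, gamma):
    """f(x) = sign(g(x)); a value of exactly 0 is assigned to +1 here."""
    g = decision_value(x, support_vectors, labels, alphas, b, gamma)
    return 1 if g >= 0 else -1


# Toy model with two support vectors (illustrative values only).
SUPPORT_VECTORS = [(0.0, 0.0), (2.0, 2.0)]
LABELS = [-1, 1]
ALPHAS = [1.0, 1.0]
```

With these toy values, a sample close to `(2.0, 2.0)` receives a positive decision value and a sample close to `(0.0, 0.0)` a negative one.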

#### 2.1.2 K-means model

Cluster analysis is a common data mining technique and an unsupervised learning method. A clustering algorithm usually groups the samples into different clusters such that the samples within each cluster share a certain similarity. The K-means algorithm alternates between two steps:

- (1) Assignment of samples. Given K center points (the mean points of the cluster samples), each sample is assigned to the cluster represented by the mean point nearest to it in Euclidean distance, so that the samples are divided into K clusters. Each sample belongs to exactly one cluster, chosen so as to minimize the within-cluster sum of squares, i.e., each sample \( x_p \) can only be allocated to one cluster \( S_i^{(t)} \). The sum of squares here is the squared Euclidean distance. In the \( t \)-th iteration, a cluster \( S_i^{(t)} \) whose cluster center is \( m_i^{(t)} \) can be represented as:

- (2) Updating the mean points. The mean of the samples in each cluster is used as the new cluster center. The new mean point is shown as:
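Under the usual K-means definitions, the two steps above correspond to the following formulas. This is a reconstruction in standard notation, assuming K clusters indexed by \( i \) and \( j \).

```latex
S_i^{(t)} = \left\{ x_p : \lVert x_p - m_i^{(t)} \rVert^{2} \le \lVert x_p - m_j^{(t)} \rVert^{2} \ \ \forall j,\ 1 \le j \le K \right\}

m_i^{(t+1)} = \frac{1}{\lvert S_i^{(t)} \rvert} \sum_{x_j \in S_i^{(t)}} x_j
```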

In the K-means algorithm, there are two key elements: the choice of the initial mean points and the distance function whose sum of squares is minimized. In the original algorithm, the initial mean points are chosen randomly or near the center point, and the Euclidean distance is usually used as the distance function. In later variants of the K-means algorithm, the absolute error is used as the distance function.

For unknown samples, the K value in the K-means algorithm cannot be accurately estimated in advance. Selecting an inappropriate K value will have an adverse effect on the subsequent SVM training. In this study, the maximum cluster radius (Euclidean distance) is used as an index to evaluate the clustering effect and determine a suitable K value.
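The two alternating steps and the maximum-cluster-radius index can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the random initialization from the samples and the fixed iteration cap are all assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(samples, centers):
    """Step 1: put each sample into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for s in samples:
        nearest = min(range(len(centers)), key=lambda i: euclidean(s, centers[i]))
        clusters[nearest].append(s)
    return clusters


def update(centers, clusters):
    """Step 2: move each center to the mean of its cluster."""
    new_centers = []
    for center, cluster in zip(centers, clusters):
        if not cluster:  # keep the center of an empty cluster unchanged
            new_centers.append(center)
            continue
        dim = len(center)
        new_centers.append(
            tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))
        )
    return new_centers


def kmeans(samples, k, iterations=100, seed=0):
    """Return (centers, clusters) after alternating the two steps."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)  # random initial mean points
    for _ in range(iterations):
        new_centers = update(centers, assign(samples, centers))
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, assign(samples, centers)


def max_cluster_radius(centers, clusters):
    """Largest distance from any sample to its own cluster center."""
    return max(
        euclidean(p, c) for c, cluster in zip(centers, clusters) for p in cluster
    )
```

Running `kmeans` for several values of K and comparing `max_cluster_radius` gives the kind of index the text uses to judge the clustering effect.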

## 3 Results

### 3.1 Power system security regions analysis method based on two-segment SVM model

### 3.2 First stage SVM

In the first stage of SVM training, it is necessary to keep the accuracy as high as possible and to reduce the proportion of samples passed to the second stage, so as to improve the efficiency of the security region analysis. In this model, Grid Search is first adopted to optimize the parameters of the SVM model, and the SVM model with the best classification accuracy is obtained. The probability output of the SVM is then used to judge the fiducial probability of each sample in the first-stage SVM model.

In (4) and (12), two parameters are subject to optimization: the penalty factor *C* and the kernel function parameter *γ*. The Grid Search method takes a range for *C* and *γ*, divides it into a grid with a given step size, evaluates each grid point by cross validation, and selects the point with the highest accuracy as the optimal parameters. Cross validation splits the training samples of the SVM model into groups, some of which are used as the actual training set while the others serve as the validation set. The most commonly used form is N-fold cross validation (N-CV): the samples are divided into N groups, each group is used in turn as the test set with the rest as the training set, and the accuracy of a parameter pair is evaluated by combining the results of all N runs. After the grid has been traversed, the classification accuracy of all parameter pairs is collected.
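The grid traversal with N-fold cross validation can be sketched as follows. The callback `train_and_score` is hypothetical: it stands for whatever routine trains an SVM with the given \( C \) and \( \gamma \) on the training indices and returns its accuracy on the test indices. The interleaved fold assignment is likewise an assumption of this sketch.

```python
import itertools


def n_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) for N-fold cross validation."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        test = indices[fold::n_folds]  # every n_folds-th sample
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test


def grid_search(c_values, gamma_values, n_samples, n_folds, train_and_score):
    """Return (best_c, best_gamma, best_accuracy, all_results).

    train_and_score(c, gamma, train_idx, test_idx) is a user-supplied
    (hypothetical) function returning the accuracy on the test fold.
    """
    results = {}
    for c, gamma in itertools.product(c_values, gamma_values):
        scores = [
            train_and_score(c, gamma, train, test)
            for train, test in n_fold_splits(n_samples, n_folds)
        ]
        results[(c, gamma)] = sum(scores) / len(scores)  # mean over folds
    best_c, best_gamma = max(results, key=results.get)
    return best_c, best_gamma, results[(best_c, best_gamma)], results
```

A typical call passes exponentially spaced candidates, for example `[2 ** k for k in range(-5, 16, 2)]` for \( C \); the spacing is a common convention, not something the text prescribes.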

After the optimal \( C \) and \( \gamma \) are selected, the model determines the grey space in the sample by the fiducial probability of the classification result of each sample. It then structures the SVM sample set \( (x_i, y_i) \), where \( x_i \in R^{m} \) represents the input features of the sample, namely the active and reactive power of the generators and lines, the bus voltage magnitudes and phase angles, and the active and reactive power of the loads, and \( y \in \{1, 0\} \) represents the stability classification output. The samples are mapped to the high dimensional space by the RBF kernel function, and the hyperplane is constructed. The function \( g(x) \) represents the distance between a sample and the optimal hyperplane as

where \( a_i \in R^{m} \) is the Lagrange multiplier and \( k(x_i, x) \) is shown in (12).

The probability of \( y = 0 \) for this sample is:

The probability of \( y = 1 \) for this sample is:

The SVM outputs the class with the larger of the two probabilities, and in this model that larger value is taken as the fiducial probability of the sample's classification result. Repeated experiments show that for samples whose output probability is larger than 0.99, the result is consistent with the true stability state; otherwise, missed or false judgements may occur. Therefore, a sample whose SVM output probability is less than 0.99 is defined as a grey space sample in the first stage. These samples are processed further to improve the classification accuracy.
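The grey-space rule can be expressed as a small function. The sketch below assumes the two class probabilities are already available from the first-stage SVM; the function name and the tie-breaking choice (a tie goes to the unstable class) are assumptions, while the 0.99 threshold is the value reported above.

```python
GREY_THRESHOLD = 0.99  # threshold reported in the text


def first_stage_decision(p_unstable, p_stable, threshold=GREY_THRESHOLD):
    """Return (label, confidence, is_grey) from the two class probabilities.

    label is 1 for stable and 0 for unstable, matching y in {1, 0}.
    A sample is grey when the larger probability is below the threshold.
    """
    if p_stable > p_unstable:
        label, confidence = 1, p_stable
    else:  # ties are resolved conservatively as unstable (assumption)
        label, confidence = 0, p_unstable
    return label, confidence, confidence < threshold
```

For example, a sample with probabilities 0.004 and 0.996 is accepted as stable in the first stage, while one with 0.7 and 0.3 is labelled unstable but flagged as grey and passed on.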

### 3.3 K-means clustering

When K-means clustering is used to process the data in the grey space, the Euclidean distance is selected as the cluster index. The center points are randomly selected in the sample space for K-means clustering, and the clustering results are observed. After clustering, the appropriate K value and center points are selected. When the K value initially increases, the cluster index declines quickly and the clustering effect improves. However, once K has increased to a certain degree, the cluster index declines slowly, and a further increase in K may sometimes even increase the cluster index, which indicates that the algorithm may have converged to a local optimum. In the K-means clustering algorithm, such local optima are eliminated by repeating the calculation and selecting the best result.

Regarding the choice of K, the larger the K value is, the smaller the cluster index is and the better the clustering effect. However, increasing K also increases the number of SVM models in the second stage, the complexity of the model and the amount of computation required. In this model, K is selected at the point where the cluster index starts to decline slowly, and the center points of the best clustering result are recorded after repeated calculation.
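One way to make "starts to decline slowly" operational is to stop at the first K after which the relative drop of the cluster index falls below a threshold. The threshold of 10%, the function name and the numbers in the usage note are assumptions for illustration only; the text does not specify a numeric criterion.

```python
def choose_k(index_by_k, min_relative_drop=0.1):
    """Pick the K at which the cluster index starts to decline slowly.

    index_by_k maps K to the cluster index (e.g. maximum cluster radius).
    Returns the first K whose successor improves the index by less than
    min_relative_drop, or the largest K if the index keeps dropping fast.
    """
    ks = sorted(index_by_k)
    for prev_k, next_k in zip(ks, ks[1:]):
        prev_val = index_by_k[prev_k]
        if prev_val <= 0:  # nothing left to improve
            return prev_k
        drop = (prev_val - index_by_k[next_k]) / prev_val
        if drop < min_relative_drop:
            return prev_k
    return ks[-1]
```

With the made-up index values `{2: 10.0, 3: 6.0, 4: 4.0, 5: 3.8, 6: 3.9}`, the relative drops are 40%, about 33% and 5%, so the function stops at K = 4.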

When the first-stage SVM judges a new sample to be in the grey space, the Euclidean distance between the sample and each center point is calculated. The sample is assigned to the cluster of the nearest center point, and the second-stage SVM model of that cluster is used to classify it.
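The routing of a grey-space sample to its second-stage model can be sketched as below. The per-cluster models are represented as plain callables, which is an assumption of this sketch; the function names are hypothetical.

```python
import math


def nearest_center(sample, centers):
    """Index of the center closest to the sample in Euclidean distance."""
    return min(range(len(centers)), key=lambda i: math.dist(sample, centers[i]))


def classify_grey_sample(sample, centers, second_stage_models):
    """Route a grey-space sample to the model of its nearest cluster.

    second_stage_models[i] is a callable standing in for the second-stage
    SVM trained on cluster i; it returns 1 (stable) or 0 (unstable).
    """
    cluster = nearest_center(sample, centers)
    return cluster, second_stage_models[cluster](sample)
```

Because only one small model is evaluated per grey-space sample, the classification cost does not grow with the number of clusters beyond the distance computations.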

### 3.4 Second stage SVM

The second stage SVM is used to deal with the grey space data after clustering. Because power system security assessment should be conservative, the possibility of missed judgements, i.e. unstable states judged as stable, is to be minimized. Thus, the penalty factor is adjusted to modify the constraints of the second stage SVM.

In the second stage SVM model, the sample set is \( (x_i, y_i) \), \( i = 1, 2, \cdots, l \), \( x_i \in R^{n} \). In the optimization problem shown in (4), the first term represents the maximization of the margin between the samples, the second term contains the slack variable \( \xi_i \) introduced to account for deviating points, and \( C \) is the penalty factor. In (4), \( C\sum \limits_{i=1}^n{\xi}_i \) represents the size of the error term. The greater \( C \) is, the more weight the error term carries.

The standard SVM model uses a single \( C \) parameter as a penalty factor and does not distinguish between missed and misjudged errors. In order to deal with the situation in which an unstable state of the power system is mistaken for a stable one, this model selects different penalty factors \( C_1 \) and \( C_0 \) to distinguish between errors due to different causes. The error term is defined as:

The solution \( \alpha^{*} \) can be expressed as:

The \( b^{*} \) component is then calculated as:

The decision function \( f(x) \) is constructed as the classification rule from \( \alpha^{*} \) and \( b^{*} \), whose values depend on the different penalty factors, i.e.

By increasing the error penalty for the \( y_i = 0 \) part of the samples \( (x_i, y_i) \), i.e. the penalty for misjudging the unstable state, the error of misjudging unstable states as stable states is reduced.
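The effect of class-dependent penalty factors can be illustrated with a weighted error term of the form \( C_1 \sum_{y_i = 1} \xi_i + C_0 \sum_{y_i = 0} \xi_i \). This form, the hinge-style slack and the function names are assumptions of the sketch below; it only shows how a larger \( C_0 \) makes violations by unstable samples more expensive.

```python
def hinge_slack(y_signed, decision_value):
    """Slack xi = max(0, 1 - y * g(x)) for a label in {-1, +1}."""
    return max(0.0, 1.0 - y_signed * decision_value)


def weighted_error_term(labels, decision_values, c1, c0):
    """C_1 * sum(xi_i for y_i = 1) + C_0 * sum(xi_i for y_i = 0).

    labels use the encoding of the text: 1 = stable, 0 = unstable.
    """
    total = 0.0
    for y, g in zip(labels, decision_values):
        y_signed = 1 if y == 1 else -1  # map {1, 0} to {+1, -1}
        penalty = c1 if y == 1 else c0
        total += penalty * hinge_slack(y_signed, g)
    return total
```

For one stable and one unstable sample that both have decision value 0.5, the slacks are 0.5 and 1.5. With equal penalties of 1 the error term is 2.0, while raising \( C_0 \) to 4 gives 6.5, so the optimizer is pushed to move the unstable sample to the correct side first.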

## 4 Discussion

### 4.1 Case study

The fault configuration

Fault bus | Tripped line | Fault duration | Stable samples | Unstable samples |
---|---|---|---|---|
Bus38 | Line30–38 | 0.214 s | 5846 | 2154 |

Of the 8000 samples, 6000 are randomly selected as training samples and the remaining 2000 are used as test samples.

Comparison of classification effect between original SVM and the proposed method

Algorithm | GR accuracy | Erroneous judgement samples | Missing judgement samples | Classification accuracy |
---|---|---|---|---|
Standard SVM | 90.78% | 110 | 70 | 94.50% |
Modified SVM | 93.96% | 72 | 37 | 96.40% |

The case study on the IEEE 118-bus system verifies the effectiveness of the proposed security and stability analysis algorithm for analyzing the security regions during faults on interregional tie lines.
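The reported classification accuracies can be checked against the sample counts. The short calculation below assumes that the erroneous-judgement count is the total number of misclassified samples among the 2000 test samples; the table does not define its columns, so this reading is an assumption.

```python
TEST_SAMPLES = 2000  # 8000 samples minus the 6000 used for training


def accuracy_percent(erroneous, total=TEST_SAMPLES):
    """Classification accuracy in percent for a number of misclassified samples."""
    return 100.0 * (total - erroneous) / total


standard_svm = accuracy_percent(110)  # 94.5
modified_svm = accuracy_percent(72)   # 96.4
```

Under this reading both table entries are reproduced, and the missed judgements (unstable judged as stable) fall from 70 to 37 between the two methods.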

## 5 Conclusion

Starting from the standard SVM model, cluster analysis is incorporated and the penalty factor is adjusted, establishing a two-stage SVM model based on clustering analysis and penalty factor adjustment. The analysis of the IEEE 118-bus system validates that the proposed model can effectively improve the classification accuracy of the SVM model. At the same time, the K-means clustering method is used to reduce the sample size of the second stage and thus improve the training speed of the model.

This study shows that the probability output of the SVM can effectively identify the samples in the gray space of the SVM model. The fiducial probability of the classification results allows ambiguous samples to be recognized and passed on for subsequent processing. The classification accuracy of the samples in the gray space can be effectively improved by combining clustering analysis with the SVM model. The penalty factor adjustment reduces the proportion of unstable results mistaken for stable ones. In addition, this method can effectively handle the training difficulties caused by the imbalance between the two classes of samples within each cluster, thereby improving the classification accuracy.

## Declarations

### Funding

This work was supported by the National Key Research and Development Program of China (2017YFB0902201), the National Natural Science Foundation of China under Grant 51777104, and the Science and Technology Project of the State Grid Corporation of China.

### Authors’ contributions

HK carried out the theoretical studies, provided algorithm guidance and drafted the manuscript. SG participated in the theoretical analysis and guided the research method. WH participated in the research method and simulation analysis. ZL carried out the theoretical studies and algorithm research and drafted the manuscript. WS participated in the design of the study and performed the simulation analysis. ZQ participated in the design of the model research and performed the simulation analysis. WH conceived of the study, participated in its design and coordination, and helped to draft the manuscript. All authors read and approved the final manuscript.

### Competing interests

The authors declare that they have no competing interests.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## Authors’ Affiliations

## References

1. Kundur, P. (1994). *Power system stability and control*. New York: McGraw-Hill.
2. Weiling, Z., Wei, H., Yong, M., et al. (2016). Conservative online transient stability assessment in power system based on concept of stability region. *Power Syst Technol, 40*(4), 992–998.
3. Gomez, F. R., Rajapakse, A. D., Annakkage, U. D., & Fernando, I. T. (2011). Support vector machine-based algorithm for post-fault transient stability status prediction using synchronized measurements. *IEEE Trans Power Syst, 26*(3), 1474–1483.
4. Zhou, Q., Davidson, J., & Fouad, A. (1994). Application of artificial neural networks in power system security and vulnerability assessment. *IEEE Trans Power Syst, 9*(1), 525–532.
5. Al-masri, A., Kadir, M., Hizam, H., & Mariun, N. (2013). A novel implementation for generator rotor angle stability prediction using an adaptive artificial neural network application for dynamic security assessment. *IEEE Trans Power Syst, 28*(3), 2516–2525.
6. Amjady, N., & Majedi, S. (2007). Transient stability prediction by a hybrid intelligent system. *IEEE Trans Power Syst, 22*(3), 1275–1283.
7. Bahbah, A. G., & Girgis, A. A. (2004). New method for generators’ angles and angular velocities prediction for transient stability assessment of multimachine power systems using recurrent artificial neural network. *IEEE Trans Power Syst, 19*(2), 1015–1022.
8. Guo, T., & Milanovic, J. V. (2016). Online identification of power dynamic signature using PMU measurements and data mining. *IEEE Trans Power Syst, 31*(3), 1760–1768.
9. He, M., Vittal, V., & Zhang, J. (2013). Online dynamic security assessment with missing PMU measurements: A data mining approach. *IEEE Trans Power Syst, 28*(2), 1969–1977.
10. Sun, K., Likhate, S., Vittal, V., Kolluri, V. S., & Mandal, S. (2007). An online dynamic security assessment scheme using phasor measurements and decision trees. *IEEE Trans Power Syst, 22*(4), 1935–1943.
11. He, M., Zhang, J., & Vittal, V. (2013). Robust online dynamic security assessment using adaptive ensemble decision-tree learning. *IEEE Trans Power Syst, 28*(4), 4089–4098.
12. Li, D. H., & Cao, Y. J. (2006). Power system transient stability analysis based on PMU and hybrid support vector machine. *Power Syst Technol, 30*(9), 46–52.
13. Iang, L. X., Wang, X. H., Ang, J. W., et al. (2007). Feature selection for SVM based transient stability classification. *Relay, 35*(9), 17–21.
14. Ai, Y. D., Chen, L., Zhang, W. L., et al. (2016). Power system transient stability assessment based on multi-support vector machines. *Proc CSEE, 36*(5), 1173–1180.
15. Xu, T., Chiang, H., Liu, G., & Tan, C. (2017). Hierarchical K-means method for clustering large-scale advanced metering infrastructure data. *IEEE Trans Power Delivery, 32*(2), 609–616.
16. Lin, Y. (2011). Using K-means clustering and parameter weighting for partial-discharge noise suppression. *IEEE Trans Power Delivery, 26*(4), 2380–2390.