### 3.1 Power system security regions analysis method based on two-segment SVM model

The proposed algorithm improves on both single-SVM and multiple-SVM approaches. It raises the accuracy of the single SVM classifier, while the K-means clustering algorithm and segmented processing reduce the computational burden of the multiple SVMs in the second stage, improving not only the accuracy but also the training, optimization and classification speed of the model. To address the problem of unstable system conditions being mistaken for stable ones, this paper adjusts the penalty factor of the second-stage SVM model. Adjusting the penalty factor does not change the computational cost of model training and classification, and combined with the K-means clustering algorithm it reduces the number of unstable system states misjudged as stable. The training process of the algorithm is shown in Fig. 1.

### 3.2 First stage SVM

In the first stage of SVM training, the accuracy rate should be as high as possible, and the proportion of samples passed to the second stage should be kept small, to improve the efficiency of security region analysis. In this model, Grid Search is first adopted to optimize the parameters of the SVM model, yielding the SVM model with the best classification accuracy. The probability output of the SVM is then used to judge the fiducial probability of each sample under the first-stage SVM model.

The SVM model used here adopts the RBF kernel, given by the kernel function:

$$ k\left({x}_i,x\right)=\exp \left(-\gamma {\left\Vert {x}_i-x\right\Vert}^2\right),\kern1.75em \gamma >0 $$

(12)

As shown in (4) and (12), two parameters must be optimized: the penalty factor *C* and the kernel parameter *γ*. Grid Search specifies a range for *C* and *γ*, divides it into a grid with a given step size, traverses each grid point with cross validation, and selects the point with the highest accuracy as the optimal parameter pair. Cross validation splits the training samples into groups, using some as the actual training set and the rest as the validation set. The most common variant is N-fold cross validation (N-CV), which divides the samples into N groups, uses each group in turn as the test set with the remaining groups as the training set, and evaluates the accuracy of the parameter pair by combining all N results. After the grid is traversed, the classification accuracy of every parameter combination has been collected.
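The grid search with N-fold cross validation described above can be sketched with scikit-learn as follows. This is a minimal illustration: the synthetic dataset and the grid values are placeholders, not the paper's actual power system samples or parameter ranges.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder stability samples (the paper uses power system features).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Grid over penalty factor C and RBF parameter gamma, scored by
# 5-fold (N-CV with N = 5) cross-validation accuracy.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)           # optimal (C, gamma) on the grid
print(round(search.best_score_, 3))  # best cross-validated accuracy
```

`GridSearchCV` traverses every grid point and retains the parameter pair with the highest mean cross-validation accuracy, exactly the selection rule described in the text.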

After selecting the parameters *C* and *γ*, the model determines the grey space among the samples from the fiducial probability of each sample's classification result. The SVM sample set is then constructed, where *x*_{i} ∈ *R*^{m} represents the input features of a sample (active and reactive power of the generators and lines, bus voltage magnitude and phase angle, and active and reactive power of the load), and *y* ∈ {1, 0} represents the stability classification output. The samples are mapped to a high-dimensional space by the RBF kernel function, and a separating hyperplane is constructed. The function *g*(*x*) represents the distance between a sample and the optimal hyperplane as

$$ g(x)=\sum \limits_{i=1}^l{a}_i{y}_ik\left({x}_i,x\right)+b $$

(13)

where *a*_{i} ∈ *R* is the Lagrange multiplier and *k*(*x*_{i}, *x*) is given in (12).

The distance from a sample to the hyperplane indicates the probability that the sample belongs to each output class. The probability of *y* = 0 for the sample is:

$$ P\left({C}_0\left|x\right.\right)=1/\left(1+{e}^{g(x)}\right) $$

(14)

and the probability of *y* = 1 in this sample is:

$$ P\left({C}_1\left|x\right.\right)=1/\left(1+{e}^{-g(x)}\right) $$

(15)
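Equations (14) and (15) are the two branches of a logistic sigmoid applied to the signed distance *g*(*x*), so the two class probabilities always sum to one. A small sketch (the value of *g* here is arbitrary, for illustration only):

```python
import math

def class_probabilities(g):
    """Map the signed distance g(x) from (13) to the class
    probabilities of (14) and (15)."""
    p0 = 1.0 / (1.0 + math.exp(g))    # P(C0 | x), eq. (14)
    p1 = 1.0 / (1.0 + math.exp(-g))   # P(C1 | x), eq. (15)
    return p0, p1

p0, p1 = class_probabilities(2.3)
print(round(p0 + p1, 10))  # the two probabilities always sum to 1
```

A positive *g*(*x*) pushes the sample toward class *C*_{1}, a negative one toward *C*_{0}; samples near the hyperplane (*g* ≈ 0) get probabilities near 0.5 on both sides.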

The SVM selects the larger of the two probabilities as its output, and in this model that maximum is taken as the confidence of the sample's classification result. Repeated experiments show that for samples whose output probability exceeds 0.99, the result is consistent with the true stable state; otherwise, missed or false judgements may occur. Therefore, samples whose first-stage SVM output probability is below 0.99 are defined as grey-space samples, and are processed further to improve classification accuracy.
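The 0.99 grey-space rule can be sketched as follows, using scikit-learn's Platt-scaled probability output as a stand-in for (14)–(15); the dataset is a synthetic placeholder:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=1)

# probability=True enables sigmoid-calibrated class probabilities,
# analogous to eqs. (14)-(15).
clf = SVC(kernel="rbf", probability=True, random_state=1).fit(X, y)

proba = clf.predict_proba(X)       # columns: P(C0|x), P(C1|x)
confidence = proba.max(axis=1)     # the larger of the two probabilities
grey_mask = confidence < 0.99      # samples deferred to the second stage
print(int(grey_mask.sum()), "of", len(X), "samples enter the grey space")
```

High-confidence samples keep the first-stage label; only the `grey_mask` subset is clustered and re-classified, which is what keeps the second stage cheap.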

### 3.3 K-means clustering

When K-means clustering is used to process the grey-space data, Euclidean distance is selected as the cluster index. Center points are randomly selected in the sample space for K-means clustering, and the clustering results are observed; after clustering, an appropriate K value and center points are selected. As K first increases, the cluster index declines quickly and the clustering effect improves. Once K exceeds a certain value, however, the cluster index declines slowly, and further increases of K may even raise the cluster index, indicating convergence to a local optimum. In the K-means algorithm, such local optima are eliminated by repeating the calculation and selecting the best result.

Regarding the choice of the cluster index: the larger the K value, the smaller the cluster index and the better the clustering effect. However, increasing K also increases the number of second-stage SVM models, the complexity of the model, and the required computation. In this model, K is chosen at the point where the cluster index starts to decline slowly, and the center points of the best clustering run are recorded after repeated calculation.

When the first-stage SVM judges a new sample to lie in the grey space, the Euclidean distance between the sample and each center point is calculated. The sample's category is determined by the nearest center point, and the corresponding second-stage SVM model is used to classify it.
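The nearest-center routing step can be sketched as below. The center coordinates, the test point, and the per-cluster second-stage models are all hypothetical placeholders:

```python
import numpy as np

def route_to_cluster(x, centers):
    """Return the index of the K-means center nearest to x
    (Euclidean distance), i.e. the second-stage SVM to use."""
    dists = np.linalg.norm(centers - x, axis=1)
    return int(np.argmin(dists))

# Hypothetical cluster centers recorded from the best K-means run.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
x_new = np.array([4.2, 4.9])   # a grey-space sample

idx = route_to_cluster(x_new, centers)
print(idx)  # -> 1 : the sample goes to second-stage SVM #1
```

Each cluster index maps to one trained second-stage SVM, so classification after routing is a single model evaluation rather than a pass through all K models.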

### 3.4 Second stage SVM

The second-stage SVM deals with the grey-space data after clustering. To keep the security analysis conservative, the possibility of missed detections (unstable states judged as stable) must be minimized. Thus, class-dependent penalty factors are introduced to modify the constraints of the second-stage SVM.

In the second-stage SVM model, the samples are (*x*_{i}, *y*_{i}), *i* = 1, 2, ⋯, *l*, with *x*_{i} ∈ *R*^{n}. In the optimization problem (4), the first term maximizes the margin between samples, the slack variables *ξ*_{i} are introduced to account for deviating points, and *C* is the penalty factor. In (4), \( C\sum \limits_{i=1}^n{\xi}_i \) represents the size of the error term: the larger *C* is, the more heavily errors are weighted.

The standard C-SVM model uses a single *C* parameter as the penalty factor and does not distinguish between missed and false detections. To address unstable power system states being mistaken for stable ones, this model selects different penalty factors *C*_{1} and *C*_{0} for errors of different causes. The error term is defined as:

$$ {C}_1\sum \limits_{i\left|{y}_i=1\right.}{\xi}_i+{C}_0\sum \limits_{i\left|{y}_i=0\right.}{\xi}_i $$

(16)

With the interval error, the optimization problem can be changed to:

$$ {\displaystyle \begin{array}{l}\min \frac{1}{2}{\left\Vert w\right\Vert}^2+{C}_1\sum \limits_{i\left|{y}_i=1\right.}{\xi}_i+{C}_0\sum \limits_{i\left|{y}_i=0\right.}{\xi}_i\\ {}s.t.\kern0.75em {y}_i\left({w}^T{x}_i+b\right)\ge 1-{\xi}_i,\kern0.75em i=1,\cdots, n\\ {}\kern1.75em {\xi}_i\ge 0,\kern0.5em i=1,\cdots, n\end{array}} $$

(17)

In solving the optimization problem, the Lagrange function is introduced as:

$$ {\displaystyle \begin{array}{l}L\left(w,b,\xi, \alpha, \beta \right)=\frac{1}{2}{\left\Vert w\right\Vert}^2+{C}_1\sum \limits_{i\left|{y}_i=1\right.}{\xi}_i+{C}_0\sum \limits_{i\left|{y}_i=0\right.}{\xi}_i\\ {}-\sum \limits_{i=1}^n{a}_i\left({y}_i\left(\left(w\cdot \Phi \left({x}_i\right)\right)+b\right)-1+{\xi}_i\right)-\sum \limits_{i=1}^n{\beta}_i{\xi}_i\end{array}} $$

(18)

By duality, the optimization problem can be transformed into:

$$ {\displaystyle \begin{array}{l}\underset{\alpha }{\min}\frac{1}{2}\sum \limits_{i=1}^n\sum \limits_{j=1}^n{\alpha}_i{\alpha}_j{y}_i{y}_jk\left({x}_i,{x}_j\right)-\sum \limits_{i=1}^n{\alpha}_i\\ {}s.t.\kern0.5em \sum \limits_{i=1}^n{y}_i{\alpha}_i=0\\ {}\begin{array}{cc}& 0\le {\alpha}_i\end{array}\le {C}_1,\kern0.5em {y}_i=1\\ {}\begin{array}{cc}& 0\le {\alpha}_i\end{array}\le {C}_0,\kern0.5em {y}_i=0\end{array}} $$

(19)

After the solution, *α*^{∗} can be expressed as:

$$ {\displaystyle \begin{array}{l}{\alpha}^{\ast }={\left({\alpha}_1^{\ast },{\alpha}_2^{\ast },\cdots, {\alpha}_n^{\ast}\right)}^T\\ {}\left\{\begin{array}{c}{\alpha}_i\in \left(0,{C}_1\right)\kern0.5em {y}_i=1\\ {}\begin{array}{cc}{\alpha}_i\in \left(0,{C}_0\right)& {y}_i=0\end{array}\end{array}\right.\end{array}} $$

(20)

The component *b*^{∗} is then calculated from any support vector *x*_{j} whose *α*_{j}^{∗} satisfies (20), as:

$$ {b}^{\ast }={y}_j-\sum \limits_{i=1}^n{y}_i{\alpha}_i^{\ast }k\left({x}_i,{x}_j\right) $$

(21)

Finally, a new decision function *f*(*x*) is constructed as the classification rule by *α*^{∗} and *b*^{∗}, which are valued by different penalty factors, i.e.

$$ f(x)=\operatorname{sgn}\left(\sum \limits_{i=1}^n{y}_i{\alpha}_i^{\ast }k\left({x}_i,x\right)+{b}^{\ast}\right) $$

(22)
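The decision function (22) can be verified against a fitted model: scikit-learn's `SVC` exposes *y*_{i}*α*_{i}^{∗} as `dual_coef_`, the support vectors as `support_vectors_`, and *b*^{∗} as `intercept_`, so *g*(*x*) and *f*(*x*) can be reassembled by hand. The dataset and kernel parameters are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=6, random_state=2)
clf = SVC(kernel="rbf", gamma=0.1, C=1.0).fit(X, y)

# k(x_i, x) for every sample against the support vectors, eq. (12).
K = rbf_kernel(X, clf.support_vectors_, gamma=0.1)

# g(x) = sum_i y_i * alpha_i^* * k(x_i, x) + b^*, eqs. (13)/(22).
g = K @ clf.dual_coef_.ravel() + clf.intercept_
f = np.sign(g)  # the sgn(.) of eq. (22)

print(np.allclose(g, clf.decision_function(X)))  # manual g matches the library
```

This confirms that the classifier produced by the dual solution (19)–(21) is entirely determined by the support vectors, their multipliers, and *b*^{∗}.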

By increasing the error penalty on the *y*_{i} = 0 part of the samples (*x*_{i}, *y*_{i}), i.e. the penalty for misjudging the unstable state, the error of misjudging unstable states as stable ones is reduced.