Based on the aforementioned basic framework of cascading failure, an enhanced model is proposed in this section. Compared with the basic model, the corresponding improvements mainly focus on emerging prevention and control measures, subsequent failure search strategy and results analysis.
Emerging prevention and control measures
In this paper, both load shedding and generator tripping are considered. When the power flow diverges, either the generator tripping or the load shedding strategy is activated, and the choice between them depends on the total generation output and the total load. Both strategies are based on a power flow tracing technique [20]. Because the stability and control measures applied in the two strategies are similar, only the load shedding strategy is used here to illustrate the implementation details.
The power flow tracing technique is an effective method for obtaining the relationship between power sources and loads. With it, the source generators and destination loads of failed lines can be identified and the degree of influence on each can be calculated, which is essential for identifying the influenced nodes.
In the proposed load shedding strategy, the power flow tracing technique is used to find the nodes influenced by the failures in the previous layer, and an index named “Degree of Failure Impact (DFI)” is defined to evaluate the degree of influence:
$$ \mathrm{D}\mathrm{F}\mathrm{I}={\omega}_1\frac{P}{P_l}+{\omega}_2\frac{P}{P_f} $$
(1)
where $P$ and $P_l$ are the influenced load power and the current load power of the studied node, respectively; $P_f$ is the total load power of the previous failure line; $\omega_1$ and $\omega_2$ are coefficients related to the power system, which can be determined by the variation coefficient method. The two terms of Eq. (1) are thus the ratio of the influenced load power to the current load power of the studied node and its ratio to the total load power of the previous failure line, respectively.
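As a minimal sketch of Eq. (1), the function below evaluates the DFI of one node. The function name and the equal weights of 0.5 are placeholders chosen for illustration; in the paper the weights come from the variation coefficient method, which is not reproduced here, and the input values are invented.

```python
def degree_of_failure_impact(p_influenced, p_load, p_failed_line, w1=0.5, w2=0.5):
    """Eq. (1): DFI = w1 * P / P_l + w2 * P / P_f.

    w1 and w2 are placeholder weights (assumption); the paper derives them
    with the variation coefficient method.
    """
    if p_load <= 0 or p_failed_line <= 0:
        raise ValueError("P_l and P_f must be positive")
    return w1 * p_influenced / p_load + w2 * p_influenced / p_failed_line


# Illustrative values only: 30 MW of a 100 MW node load is influenced,
# and the previous failure line carried 60 MW of load power in total.
dfi = degree_of_failure_impact(p_influenced=30.0, p_load=100.0, p_failed_line=60.0)
print(round(dfi, 3))
```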
In this strategy, if the DFI index of a node reaches a given threshold, this “influenced node” performs load shedding first. All influenced nodes are divided into several groups according to their DFI level, and the group with the larger DFI value sheds load first. After load shedding has been performed for all influenced nodes, the under specific voltage load shedding strategy is applied to the remaining nodes.
In addition, considering the reactions of an actual power system, load shedding should be divided into several rounds instead of shedding all loads at once. The operations in each round have exactly the same form, and the detailed procedure shown in Fig. 2 can be described as follows:

1) Subgroup: Sort the load nodes by DFI value in descending order and divide them into groups at regular intervals; the first group is defined as the current “Shedding Group”.

2) Shedding load: For the nodes in the current “Shedding Group”, shed load at the preset proportion and recalculate the power flow. If the power flow still diverges, go to 3); otherwise go to 4).

3) Change Shedding Group: Check whether all groups in this round have finished load shedding. If not, set the next group as the current “Shedding Group” and return to 2). If so, move to the next round, set the first group as the current “Shedding Group” and return to 2); if all rounds are over and the power flow still diverges, the subregion has collapsed, so go to 4).

4) Record: Save the load shedding record and stop the process.
In our work, three rounds are used, with shedding proportions of 50%, 30% and 20%, respectively, and load shedding is subject to a 5% minimum restriction at each time period. In each round, load shedding is performed in the order described above until the power system returns to a normal operating condition.
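The round-based procedure in steps 1) to 4) can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the paper: `converges` is a hypothetical callback standing in for a power flow calculation, groups are formed by binning DFI values with a fixed width `interval`, each round's proportion is taken relative to the node's original load, and the 5% minimum restriction is not modelled.

```python
def group_by_dfi(dfi, interval):
    """Step 1: bin nodes by DFI (bin width `interval`), largest DFI first."""
    bins = {}
    for node in sorted(dfi, key=dfi.get, reverse=True):
        bins.setdefault(int(dfi[node] // interval), []).append(node)
    return [bins[key] for key in sorted(bins, reverse=True)]


def shed_in_rounds(loads, dfi, converges, interval=0.1, proportions=(0.5, 0.3, 0.2)):
    """Steps 2-4: shed group by group, round by round, until `converges` is True.

    `converges(loads)` is a hypothetical stand-in for a power flow solver.
    Returns (remaining loads, shedding record, whether convergence was restored).
    """
    original = dict(loads)
    remaining = dict(loads)
    record = []
    groups = group_by_dfi(dfi, interval)
    for round_no, proportion in enumerate(proportions, start=1):
        for group in groups:
            for node in group:
                shed = proportion * original[node]  # assumption: share of original load
                remaining[node] -= shed
                record.append((round_no, node, shed))
            if converges(remaining):
                return remaining, record, True
    return remaining, record, False  # all rounds used up: the subregion collapses


def toy_converges(state):
    """Toy criterion (assumption): the system is feasible below 200 MW of load."""
    return sum(state.values()) <= 200.0


loads = {"A": 100.0, "B": 80.0, "C": 60.0}
dfi = {"A": 0.82, "B": 0.45, "C": 0.41}
final_loads, record, recovered = shed_in_rounds(loads, dfi, toy_converges)
print(final_loads, recovered)
```

In this toy case only the first group sheds load, because the feasibility criterion is already met after node A sheds 50% of its load.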
The simulation results support the effectiveness of the proposed strategy. For example, for the failure chain $l_1$, $l_{14}$, $l_{44}$, $l_{26}$, $l_6$, $l_7$ and $l_8$ obtained in the IEEE 39-bus test system, the load loss is 1138.2 MW with the under specific voltage load shedding strategy but only 446.5 MW with the proposed load shedding strategy. The stability of the power system is also improved: when the voltage variance after load shedding is used to evaluate system stability for the same example, the value obtained with the proposed strategy is only 10% of that obtained with the under specific voltage load shedding strategy.
Subsequent failure search strategy
In most conventional cascading failure models, the number of failure lines in the same layer is restricted to 1 or 2. In practice, the number of failures occurring at the same time is uncertain, although actual blackout events show that it is still limited. Hence in this section only the probability of failure lines is discussed, and the number of failures in the same layer can be restricted to an appropriate value according to the needs of the studied power system.
In this model, if the number of failures in the same layer is restricted, the failure lines are selected in descending order of probability. The selection stops when the number of selected lines reaches the limit or when every candidate line has been examined.
For the overload failure, an index named “Line Load Ratio (LLR)” is defined to evaluate the degree of overload:
$$ \mathrm{L}\mathrm{L}\mathrm{R}={T}_{lo}/{T}_{lr} $$
(2)
where $T_{lo}$ and $T_{lr}$ are the normal and rated transmission power of the studied line, respectively.
In conventional models, only the degree of overload in the current layer is considered. In fact, the probability that a line fails due to overload is closely related to both the degree and the duration of the overload. The following piecewise function is therefore proposed to describe the overload probability:
$$ {P}_n=\left\{\begin{array}{ll}{P}_{n-1}+0.2 & {L}_0\le \mathrm{LLR}\le {L}_1\\ {P}_{n-1}+0.4 & \mathrm{LLR}\ge {L}_1\\ 0 & \mathrm{LLR}\le {L}_0\end{array}\right. $$
(3)
where $P_n$ is the overload probability in the $n$th layer (the initial value $P_0$ is 0, and if $P_n$ is greater than 1 the failure is certain to occur), and $L_0$ and $L_1$ are LLR thresholds that depend on the system. In this way, the overload probability increases with the duration of the overload.
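The layer-by-layer accumulation in Eq. (3) can be sketched as below. The thresholds `l0 = 1.0` and `l1 = 1.3` are placeholder values, since the paper only states that they depend on the system. Two further assumptions are made: the boundary cases LLR = $L_0$ and LLR = $L_1$ are assigned to the lower branch, and the result is capped at 1 because any larger value already means a certain failure.

```python
def update_overload_probability(p_prev, llr, l0=1.0, l1=1.3):
    """Eq. (3): overload probability of the current layer from the previous one.

    l0 and l1 are placeholder thresholds (assumption). Boundary values are
    assigned to the lower branch and the result is capped at 1 (assumption).
    """
    if llr <= l0:
        return 0.0  # no overload: the accumulated probability is reset
    step = 0.2 if llr <= l1 else 0.4
    return min(p_prev + step, 1.0)


# A line that stays moderately overloaded, becomes heavily overloaded,
# and is finally relieved (illustrative LLR values).
history = []
p = 0.0  # P_0 = 0
for llr in (1.1, 1.1, 1.4, 0.9):
    p = update_overload_probability(p, llr)
    history.append(p)
print(history)
```

The sequence shows the intended behaviour: the probability grows while the overload persists and returns to zero once the line is no longer overloaded.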
For hidden failure, in the enhanced model, the power transfer will be taken into account to extend the possible line selection.
In regard to the distance, considering a line which may be near several previous failures, an index named “Equivalent Distance (ED)” is defined as:
$$ \mathrm{E}\mathrm{D}=\frac{1}{\frac{1}{D_1}+\frac{1}{D_2}+\cdots } $$
(4)
where $D_1, D_2, \ldots$ are the distances between the line and the nearby failures (only distances less than 3 are considered).
An example based on Fig. 3 illustrates the equivalent distance. Suppose that the previous failure lines are line 3–5 and line 1–2 (marked with thick lines in Fig. 3). The equivalent distance of line 2–3 is then 1/2 and that of line 4–5 is 2/3.
The shift amount of power flow is used to evaluate the degree of power transfer as follows:
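The two values in this example can be reproduced with a short sketch of Eq. (4). Exact fractions are used so that the results match the text. The individual distances (1 and 1 for line 2–3, 1 and 2 for line 4–5) are inferred from the stated results, since Fig. 3 is not reproduced here, and the function name is illustrative.

```python
from fractions import Fraction


def equivalent_distance(distances, cutoff=3):
    """Eq. (4): ED = 1 / (1/D_1 + 1/D_2 + ...), using only distances below `cutoff`."""
    near = [Fraction(d) for d in distances if 0 < d < cutoff]
    if not near:
        return None  # no previous failure is close enough to matter
    return 1 / sum(1 / d for d in near)


# Distances inferred from the example in the text (assumption).
ed_line_2_3 = equivalent_distance([1, 1])
ed_line_4_5 = equivalent_distance([1, 2])
print(ed_line_2_3, ed_line_4_5)
```

Like parallel resistances, the equivalent distance is smaller than any single distance, so a line close to several previous failures is treated as closer than a line near only one of them.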
$$ S=\frac{\left| P-{P}_{norm}\right|}{P_{norm}} X $$
(5)
where $P$ and $P_{norm}$ are the power flows in the current and previous layers, respectively. Because the power system (and the studied line in particular) is affected much more when the power flow increases, a power flow change coefficient $X$ is introduced. In this model, $X$ is 2 when the power flow increases and 1 when it decreases.
The final probability is defined as the ratio of the shift amount of power flow to the equivalent distance. For hidden failures it is also restricted to an upper limit $P_{hm}$:
$$ {P}_h=\frac{S}{\mathrm{ED}} $$
(6)
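Eqs. (5) and (6) can be combined in a short sketch. The function names, the numerical inputs and the value of the upper limit `p_hm` are illustrative assumptions; power flows are assumed to be positive magnitudes so that "increase" simply means a larger value than in the previous layer.

```python
def power_flow_shift(p_current, p_previous):
    """Eq. (5): S = |P - P_norm| / P_norm * X, with X = 2 on increase, 1 on decrease."""
    if p_previous <= 0:
        raise ValueError("the previous-layer power flow must be positive")
    x = 2.0 if p_current > p_previous else 1.0
    return abs(p_current - p_previous) / p_previous * x


def hidden_failure_probability(p_current, p_previous, ed, p_hm):
    """Eq. (6): P_h = S / ED, restricted to the upper limit p_hm."""
    return min(power_flow_shift(p_current, p_previous) / ed, p_hm)


# Illustrative values: the flow rises from 100 MW to 120 MW on a line whose
# equivalent distance is 2/3; p_hm = 0.5 is a placeholder limit (assumption).
p_h = hidden_failure_probability(120.0, 100.0, ed=2 / 3, p_hm=0.5)
print(p_h)
```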
Sequential pattern mining model
As mentioned above, conventional statistical analysis of the simulation results struggles to reveal some key information. In this paper, the sequential pattern mining technique [21] is employed to analyze the failure chains.
Sequential pattern mining is a kind of association analysis that is mainly used to find sequential patterns. By sorting all the events associated with an object in increasing order of their timestamps, a sequence for the object is obtained.
The failures in a chain occur in a time sequence. Therefore, in this model the failure chains are treated as sequences so that the sequential pattern mining technique can be applied.
The subsequences obtained are usually measured in terms of their support and confidence. For a subsequence XY (meaning that failure line X triggers failure line Y), support measures how often the subsequence appears, while confidence measures how frequently it appears in the failure chains that contain line X. Considering the actual behavior of cascading failures, the confidence value is the main measure, while the support value is used to ensure that the number of occurrences of a studied subsequence is not too small.
The traditional confidence value is defined as follows:
$$ {C}_{i j}={N}_{i j}/{N}_i $$
(7)
where $C_{ij}$ is the confidence value of subsequence $ij$, and $N_{ij}$ and $N_i$ are the numbers of appearances of this subsequence and of line $i$, respectively.
Considering the propagation process of a cascading failure, this subsequence cannot occur in chains in which line j has already failed before line i fails. After removing these cases, the association probability of a subsequence is proposed to evaluate the correlation between lines:
$$ {P}_{ij}={N}_{ij}/{N}_i^{\prime } $$
(8)
where $N_i^{\prime}$ is the number of appearances of line $i$ after removing these cases.
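The difference between Eqs. (7) and (8) can be seen in a small counting sketch. It assumes that each failure chain is an ordered list in which a line appears at most once, and that the subsequence $ij$ is counted whenever line $j$ fails at any later position than line $i$, not only in the next layer. The chains are invented for illustration.

```python
def confidence_and_association(chains, i, j):
    """Return (C_ij, P_ij) from Eqs. (7) and (8) for ordered failure chains."""
    n_i = n_ij = n_i_valid = 0
    for chain in chains:
        if i not in chain:
            continue
        n_i += 1
        pos_i = chain.index(i)
        if j in chain[:pos_i]:
            continue  # j failed before i, so i -> j is impossible in this chain
        n_i_valid += 1
        if j in chain[pos_i + 1:]:
            n_ij += 1
    confidence = n_ij / n_i if n_i else 0.0
    association = n_ij / n_i_valid if n_i_valid else 0.0
    return confidence, association


# Invented chains of line indices, ordered by failure time.
chains = [[1, 14, 44], [14, 1, 26], [1, 26, 14], [6, 7, 8]]
c_ij, p_ij = confidence_and_association(chains, i=1, j=14)
print(c_ij, p_ij)
```

Here line 1 appears in three chains, but in one of them line 14 had already failed, so the association probability is computed over the two remaining chains and is higher than the traditional confidence.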
In order to evaluate the losses of related lines, an index named “Sequence Load Loss (SLL)” is defined as the average load loss of failure chains which include the studied sequence. This proposed index can help to identify some key sequences.
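A minimal sketch of the SLL index is given below, assuming that a chain "includes" a sequence when the lines of the sequence appear in the chain in the same order, though not necessarily consecutively. The chains and load losses are invented for illustration.

```python
def contains_sequence(chain, sequence):
    """True if `sequence` occurs in `chain` in order (gaps allowed)."""
    remaining = iter(chain)
    return all(line in remaining for line in sequence)


def sequence_load_loss(chains_with_loss, sequence):
    """SLL: average load loss of the failure chains that include `sequence`."""
    losses = [loss for chain, loss in chains_with_loss if contains_sequence(chain, sequence)]
    return sum(losses) / len(losses) if losses else None


# Invented (chain, load loss in MW) pairs.
chains_with_loss = [([1, 14, 44], 400.0), ([14, 1, 26], 100.0), ([1, 26, 14], 600.0)]
sll = sequence_load_loss(chains_with_loss, (1, 14))
print(sll)
```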
In addition to the conventional sequence pattern search, the relation between subsequences and results is also studied. Here, a result relation search strategy is proposed. After classification, the load loss is appended to the sequence as the last subitem so that the relation can be revealed directly.
Cluster analysis model
In the data analysis of cascading failures, a large part of the data, mainly the intermediate data, is actually ignored. In this paper, the cluster analysis technique [21] is employed to analyze the relationship between the intermediate data and the result data.
Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. This technique can group and evaluate data without knowing possible relationship in advance.
For cascading failure data, the relationship between the intermediate data and the result data cannot be obtained in advance. By using the cluster analysis technique, this kind of relationship can be found and evaluated.
Firstly, the data to be studied should be determined. For the result data, the load loss amount can be determined directly. For the intermediate data, the indices related to a single failure layer are considered in this paper. Here, the load loss, the offset of low voltage and the shift amount of power flow of a single layer are introduced to evaluate its state.
The load loss of a single layer is the difference of remaining load between the previous layer and the studied layer:
$$ P{L}_i={P}_{i-1}-{P}_i $$
(9)
where $P_i$ is the remaining load of the $i$th layer (the initial value $P_0$ is the load in the initial state).
The offset of low voltage is used to measure the degree of low voltage in the studied power system:
$$ U{O}_i={\displaystyle \sum_{i=1}^n\varDelta {U}_i} $$
(10)
where $n$ is the number of nodes currently in operation, and $\Delta U_i$ is the low-voltage offset of a single node, defined (in p.u.) as:
$$ \varDelta {U}_i=\left\{\begin{array}{ll}0 & {U}_i\ge 0.95\\ 0.95-{U}_i & {U}_i<0.95\end{array}\right. $$
(11)
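The two layer indices in Eqs. (9) to (11) reduce to a few lines of arithmetic, sketched below with invented values. The function names are illustrative, and the list `remaining_load` is assumed to be indexed by layer with entry 0 holding the load of the initial state.

```python
def layer_load_loss(remaining_load, i):
    """Eq. (9): PL_i = P_{i-1} - P_i, where remaining_load[0] is the initial load."""
    if i < 1:
        raise ValueError("layer index must be at least 1")
    return remaining_load[i - 1] - remaining_load[i]


def low_voltage_offset(voltages_pu, threshold=0.95):
    """Eqs. (10)-(11): sum of (0.95 - U) over the operating nodes with U < 0.95 p.u."""
    return sum(max(0.0, threshold - u) for u in voltages_pu)


# Invented data: remaining load per layer (MW) and node voltages (p.u.).
remaining_load = [6000.0, 5800.0, 5350.0]
voltages_pu = [1.02, 0.95, 0.93, 0.90]
loss_layer_2 = layer_load_loss(remaining_load, 2)
offset = low_voltage_offset(voltages_pu)
print(loss_layer_2, round(offset, 3))
```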
The shift amount of power flow in a single layer is the sum of the positive shift amounts over all lines in operation.
Secondly, the method of cluster analysis should be determined. In this model, the agglomerative hierarchical clustering method is first used to determine the number of clusters and the initial centroids, and the K-means algorithm is then used to carry out the detailed classification. Because the result of the K-means algorithm depends on the initial centroids, the program is run repeatedly and the best solution is chosen from multiple sets of initial centroids.
Finally, the clusters obtained need to be evaluated. The silhouette coefficient, which combines both cohesion and separation, is commonly used. It evaluates an individual point in terms of how close it is to its own cluster relative to the other clusters. In this model, the average silhouette coefficient of all points is used to evaluate the clustering results, and the number of points with large silhouette coefficients is also a useful reference.
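Only the evaluation step is sketched here, using the standard definition of the silhouette coefficient, $s = (b - a)/\max(a, b)$, where $a$ is the mean distance from a point to the other points of its own cluster and $b$ is the smallest mean distance to the points of any other cluster. The sketch assumes Euclidean distance, at least two clusters, and a score of 0 for single-point clusters; the data points are invented and do not come from the paper.

```python
from math import dist


def silhouette_coefficients(points, labels):
    """Silhouette coefficient of every point (Euclidean distance, >= 2 clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    scores = []
    for idx, label in enumerate(labels):
        own = [k for k in clusters[label] if k != idx]
        if not own:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        a = sum(dist(points[idx], points[k]) for k in own) / len(own)
        b = min(
            sum(dist(points[idx], points[k]) for k in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


# Invented two-dimensional data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficients(points, [0, 0, 1, 1])
bad = silhouette_coefficients(points, [0, 1, 0, 1])
avg_good = sum(good) / len(good)
avg_bad = sum(bad) / len(bad)
print(round(avg_good, 3), round(avg_bad, 3))
```

A grouping that matches the structure of the data gives an average close to 1, while a grouping that cuts across it gives a negative average, which is why the average value is a convenient single score for comparing clustering results.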