- Original research
- Open Access
An enhanced cascading failure model integrating data mining technique
© The Author(s) 2017
Received: 15 December 2016
Accepted: 24 January 2017
Published: 8 February 2017
This paper proposes an enhanced cascading failure model that integrates data mining techniques. To better simulate the propagation of cascading failures and to analyze the relationships between failure chains, several improvements are made to the basic cascading failure framework described in this paper: the emerging prevention and control measures, the subsequent failure search strategy, and the statistical analysis of failure chains. In particular, a sequential pattern mining model is employed to find associations among the obtained failure chains. In addition, a cluster analysis model is applied to evaluate the relationship between the intermediate data and the consequence of each obtained failure chain, which helps predict potential propagation paths of cascading failure and thus reduce the risk of catastrophic events. Finally, case studies are conducted on the IEEE 10-machine-39-bus test system as a benchmark to demonstrate the validity and effectiveness of the proposed model, and some preliminary conclusions are drawn.
In recent years, many blackouts have occurred around the world as modern power systems have grown in complexity and scale. Since 2000 there have been more than 10 large-scale blackout events, such as the US-Canadian blackout of August 14, 2003, the UCTE blackout of November 4, 2006, the blackout in the Brazilian power grid on November 10, 2009, and the India blackouts of July 30 and July 31, 2012. It is generally acknowledged that cascading failure is one of the main root causes of severe blackout events.
Several theories and methods have been proposed to investigate cascading failure. Some are based on self-organized criticality (SOC) theory, including the OPA model, which considers the SOC of the growth of load demand and power supply, the CASCADE model [6, 7], which simulates cascading failure from an initial disturbance and load growth, and the branching process model. Others are based on network topology analysis [9–12] and study cascading failure by identifying topological features of the power system, such as small-world and scale-free networks. A third group uses pattern search strategies [13–19], which aim to reveal cascading failure directly by searching for failure chains according to a previously given strategy. Traditional pattern search strategies can already simulate the process of cascading failure to a certain extent. However, weaknesses in several important procedures keep them from fully reflecting the actual process and response of cascading failure. First, the emerging prevention and control measures in traditional models commonly consider only one means, and the measures taken are usually hard to put into operation. Second, the restrictions on subsequent failure search are quite strict, and some key factors are not considered, such as the duration of the overload state and the distance to previous failures. Third, the data obtained from statistical methods describe only some basic characteristics of cascading failure, whereas key information such as the correlation between a preceding event chain and the subsequent one is difficult to reveal. Meanwhile, data from the intermediate process of a cascade has not been used at all. It is therefore imperative to explore and exploit the data that exists in, and is generated during, cascading failure analysis.
In this paper, an enhanced cascading failure model is proposed. In this model, new emerging prevention and control measures that consider both effect and operability are proposed so that these actions are as close to the practical situation as possible. Additionally, an improved subsequent failure search strategy covering overload failure and hidden failure is introduced to search for failures more realistically. In particular, a sequential pattern mining model is employed to analyze the obtained failure chains comprehensively, from which the association between failure lines can be obtained. Besides, a cluster analysis model is employed to analyze the relationship between the intermediate data and the result data. The relationships obtained are beneficial to cascading failure prediction. Finally, simulations on the IEEE 10-machine-39-bus test system are conducted to demonstrate the effectiveness of the proposed model. The relevance within cascading failures is also analyzed, and some useful information is drawn.
The rest of the paper is organized as follows: In Section 2, the basic framework of cascading failure search is introduced. The detailed enhanced cascading failure model, including emerging prevention and control measures, subsequent failure search strategy and data mining technique (including sequential pattern mining model and cluster mining model) is discussed in Section 3. The case studies with different simulation scenarios are carried out in Section 4. Finally, conclusions can be found in Section 5.
Basic framework of cascading failure
1) Initial line outage: Set a line as the initial failure of a chain under normal operating conditions. The initial outage line can be generated randomly or set specifically.
2) System partition and power flow calculation: Divide the system into several sub-regions based on the current network topology and choose the regions that can operate independently. Calculate the power flow in each region.
3) Emerging prevention and control measures: If the power flow calculation in any sub-region diverges, stability measures are activated, the most common of which is load shedding. Currently, the widely used load shedding methods are mainly the overall load shedding strategy and the under specific voltage load shedding strategy.
4) Blackout judgement: Calculate the load losses of the whole test system. If all loads are lost, a blackout event has occurred and the search is stopped.
5) Subsequent failure search: Determine whether any subsequent failures exist, including overload failure and hidden failure. If there are none, stop searching.
6) Cascading failure record: Record the search process once it has finished, and conduct statistical analysis of the simulation results.
Discussion and Methods
Based on the aforementioned basic framework of cascading failure, an enhanced model is proposed in this section. Compared with the basic model, the corresponding improvements mainly focus on emerging prevention and control measures, subsequent failure search strategy and results analysis.
Emerging prevention and control measures
In this paper, both load shedding and generator tripping are considered. When the power flow diverges, either the generator tripping strategy or the load shedding strategy is activated, and the choice between them depends on the total generation output and the total load. Both strategies are based on a power flow tracing technique. Since the stability and control measures applied in the two strategies are similar, only the load shedding strategy is used here to illustrate the implementation details.
Power flow tracing is an effective method for obtaining the relationship between power sources and loads. With this method, the source generators and destination loads of failure lines can be distinguished and the specific degree of influence can be calculated, which is essential for identifying the influenced nodes.
where P and P_l are the influenced load power and the current load power of the studied node, respectively; P_f is the total load power of the previous failure line; ω_1 and ω_2 are coefficients related to the power system, which can be determined by the variation coefficient method. The physical meaning of Eq. (1) is the proportion of the influenced load power relative to the total load power and relative to all influenced load power of the studied node, respectively.
In this strategy, if the DFI index of a node reaches a given threshold, that “influenced node” sheds load first. The nodes are divided into several groups according to their DFI level, and groups with larger DFI values shed load first. After load shedding has been performed for all influenced nodes, the under specific voltage load shedding strategy is applied to the remaining nodes.
1) Subgroup: Divide the load nodes into groups according to DFI value, from large to small at regular intervals. The first group is defined as the current “Shedding Group”.
2) Shedding load: For the nodes in the current “Shedding Group”, shed load at a preset proportion and calculate the power flow. If the power flow still diverges, go to 3); otherwise go to 4).
3) Change Shedding Group: Check whether all groups in this round have finished load shedding. If not, set the next group as the current “Shedding Group” and return to 2). If so, start the next round with the first group as the current “Shedding Group” and return to 2); if all rounds are over and the power flow still diverges, the sub-region is considered to have collapsed, so go to 4).
4) Record: Save the load shedding record and stop the process.
In this work, a total of three rounds are set, with proportions of 50%, 30% and 20%, respectively, and load shedding is subject to a 5% minimum each time it is performed. In each round, load shedding is performed in the order described above until the power system returns to the normal operating condition.
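The grouping and round-based procedure described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the case studies: the function names, the `converges` callback standing in for a power flow solver, and the grouping interval are hypothetical, each round's proportion is assumed to apply to a node's original load, and the 5% minimum restriction is omitted.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def group_by_dfi(dfi: Dict[str, float], interval: float) -> List[List[str]]:
    """Step 1: split nodes into groups by DFI, largest DFI first, at regular intervals."""
    if not dfi:
        return []
    top = max(dfi.values())
    buckets: Dict[int, List[str]] = {}
    for node, value in dfi.items():
        buckets.setdefault(int((top - value) // interval), []).append(node)
    return [sorted(buckets[k]) for k in sorted(buckets)]


def shed_load(
    groups: List[List[str]],
    loads: Dict[str, float],
    converges: Callable[[Dict[str, float]], bool],  # hypothetical power flow check
    rounds: Sequence[float] = (0.5, 0.3, 0.2),
) -> Tuple[Dict[str, float], bool]:
    """Steps 2 to 4: shed group by group, round by round, until the power flow converges.

    Returns the shed amount per node and whether the sub-region recovered.
    """
    shed = {node: 0.0 for node in loads}
    for proportion in rounds:          # step 3: move to the next round
        for group in groups:           # step 3: move to the next group
            for node in group:         # step 2: shed the preset proportion
                shed[node] += proportion * loads[node]
            remaining = {n: loads[n] - shed[n] for n in loads}
            if converges(remaining):
                return shed, True      # step 4: record and stop
    return shed, False                 # all rounds used: sub-region collapses


if __name__ == "__main__":
    dfi = {"a": 0.9, "b": 0.5, "c": 0.1}
    loads = {"a": 100.0, "b": 100.0, "c": 100.0}
    groups = group_by_dfi(dfi, interval=0.25)
    # Toy convergence rule: the region is solvable once total load is at most 220 MW.
    print(shed_load(groups, loads, lambda rem: sum(rem.values()) <= 220.0))
```

In a real study the `converges` callback would run a power flow calculation on the sub-region with the reduced loads.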
Simulation results demonstrate the effectiveness of the proposed strategy. For example, for the obtained failure chain l_1, l_14, l_44, l_26, l_6, l_7 and l_8 in the IEEE-39 test system, the load losses are 1138.2 MW with the under specific voltage load shedding strategy but only 446.5 MW with the proposed load shedding strategy. The stability of the power system is also improved. For the same example, the voltage variance after load shedding is used to evaluate system stability, and its value with the proposed strategy is only 10% of the value with the under specific voltage load shedding strategy.
Subsequent failure search strategy
In most conventional cascading failure models, the number of failure lines in the same layer is restricted to 1 or 2. In practice, the number of failures occurring at the same time is uncertain, although actual blackout events show that it is still limited. Hence, in this section only the probability of failure lines is discussed. The number of failures in the same layer can be restricted to an appropriate number according to the needs of the studied power system.
In this model, if the number of failures in the same layer is restricted, the failure lines are selected according to their probability, sorted in descending order. The selection procedure stops when the number of selected lines reaches the restriction or when every candidate line has been considered.
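One possible reading of this selection rule is sketched below. It is an assumption-laden illustration: the function name `select_failure_lines` is hypothetical, and each candidate line is assumed to be accepted by an independent random draw against its failure probability, which the text does not state explicitly.

```python
import random
from typing import Dict, List, Optional


def select_failure_lines(
    prob: Dict[int, float],
    limit: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick failure lines for one layer, visiting candidates by descending probability.

    Stops when `limit` lines are selected or every candidate has been considered.
    """
    rng = rng or random.Random()
    selected: List[int] = []
    for line, p in sorted(prob.items(), key=lambda kv: kv[1], reverse=True):
        if len(selected) >= limit:
            break
        if rng.random() < p:  # assumed acceptance rule: Bernoulli draw per line
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Illustrative probabilities only; not taken from the case studies.
    candidates = {5: 0.9, 14: 0.6, 22: 0.2, 37: 0.05}
    print(select_failure_lines(candidates, limit=3, rng=random.Random(1)))
```

Setting `limit` to 1 or 2 reproduces the restriction used in conventional models, while a larger value corresponds to a looser layer restriction.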
For hidden failure, in the enhanced model, the power transfer will be taken into account to extend the possible line selection.
Sequential pattern mining model
As mentioned above, conventional statistical methods applied to the simulation results can hardly reveal some key information. In this paper, the sequential pattern mining technique is employed to analyze the failure chains.
Sequential mining technique is a kind of association analysis, which is mainly used to find sequential patterns. By sorting all the events associated with an object in increasing order of their timestamps, a sequence for the object is obtained.
Failures in a cascade occur in time order. Therefore, in this model each failure chain is treated as a sequence so that the sequential pattern mining technique can be applied.
Subsequences are usually measured in terms of their support and confidence. For a subsequence X-Y (meaning that failure line X triggers failure line Y), support measures how often the subsequence appears among all failure chains, while confidence measures how frequently it appears in the failure chains that contain line X. Considering the actual situation of cascading failure, the confidence value is used as the main measure, while the support value is used to ensure that a studied subsequence occurs often enough to be meaningful.
In order to evaluate the losses of related lines, an index named “Sequence Load Loss (SLL)” is defined as the average load loss of failure chains which include the studied sequence. This proposed index can help to identify some key sequences.
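The three measures can be illustrated with a short Python sketch. The function names and the toy chains below are hypothetical, and the sketch assumes that a chain "includes" a sequence when the items appear in the same order, not necessarily adjacently; confidence is taken relative to chains containing the sequence without its last item, which reduces to "chains containing X" for a pair X-Y.

```python
from typing import List, Sequence, Tuple


def contains_in_order(chain: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if every item of `pattern` appears in `chain` in the same order."""
    it = iter(chain)
    return all(item in it for item in pattern)


def sequence_metrics(
    chains: List[Sequence[str]],
    load_losses: List[float],
    pattern: Sequence[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, SLL) of `pattern` over the failure chains."""
    matched = [loss for chain, loss in zip(chains, load_losses)
               if contains_in_order(chain, pattern)]
    antecedent = [chain for chain in chains
                  if contains_in_order(chain, pattern[:-1])]
    support = len(matched) / len(chains) if chains else 0.0
    confidence = len(matched) / len(antecedent) if antecedent else 0.0
    sll = sum(matched) / len(matched) if matched else 0.0  # average load loss
    return support, confidence, sll


if __name__ == "__main__":
    # Toy data for illustration only; load losses in MW.
    chains = [["1", "14", "44"], ["1", "14"], ["1", "26"], ["6", "7"]]
    losses = [400.0, 200.0, 100.0, 50.0]
    print(sequence_metrics(chains, losses, ["1", "14"]))
```

In this toy example the subsequence 1-14 appears in two of four chains and in two of the three chains that contain line 1, and the chains that include it lose 300 MW on average.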
In addition to the conventional sequence pattern search, the relation between subsequences and results is also studied. Here, a result relation search strategy is proposed: after classification, the load loss is appended to the sequence as its last sub-item so that the relation can be revealed directly.
Cluster analysis model
In the data analysis of cascading failure, a large part of the data, mainly the intermediate data, is usually ignored. In this paper, the cluster analysis technique is employed to analyze the relationship between the intermediate data and the result data.
Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. This technique can group and evaluate data without knowing possible relationship in advance.
For data of cascading failure, the relationship between the intermediate data and result data cannot be obtained in advance. Through using cluster analysis technique, this kind of relationship can be found and evaluated.
Firstly, the data to be studied should be determined. For the result data, the load loss amount is used. For the intermediate data, this paper considers indices related to a single failure layer. Here, the load loss, the offset of low voltage and the shift amount of power flow of a single layer are introduced to evaluate its state.
The shift amount of power flow in a single layer is the sum of the positive shift amounts over all lines in operation.
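As a small worked illustration of this index, the sketch below sums the positive changes in line flow over the lines that remain in service after a layer. The function name is hypothetical, and measuring the shift on absolute flow magnitudes is an assumption not stated in the text.

```python
from typing import Dict


def layer_flow_shift(before: Dict[int, float], after: Dict[int, float]) -> float:
    """Sum of positive flow shifts over lines still in operation after the layer.

    `before` and `after` map line number to power flow (MW); tripped lines are
    simply absent from `after`.
    """
    total = 0.0
    for line, flow_after in after.items():
        if line in before:
            shift = abs(flow_after) - abs(before[line])
            if shift > 0:
                total += shift
    return total


if __name__ == "__main__":
    # Toy flows: line 3 trips, line 1 picks up 30 MW, line 2 is relieved by 10 MW.
    print(layer_flow_shift({1: 100.0, 2: 50.0, 3: 80.0}, {1: 130.0, 2: 40.0}))
```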
Secondly, the method of cluster analysis should be determined. In this model, the agglomerative hierarchical clustering method is first used to determine the number of clusters and the initial centroids, and the K-means algorithm is then used to carry out the detailed classification. Since the result of the K-means algorithm depends on the initial centroids, the program is run repeatedly and the best solution over multiple sets of initial centroids is chosen.
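The repeated K-means step can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: the hierarchical clustering stage is not reproduced, so the initial centroids are drawn at random from the data instead, the "optimal" run is taken to be the one with the smallest sum of squared errors, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


def kmeans(points: Sequence[Point], centroids: Sequence[Point],
           iterations: int = 100) -> Tuple[List[int], List[Point], float]:
    """Plain K-means from the given initial centroids; returns labels, centroids, SSE."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centroids[k])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, sse


def best_of_restarts(points: List[Point], k: int, restarts: int = 10,
                     seed: int = 0) -> Tuple[List[int], List[Point], float]:
    """Run K-means from several initial centroid sets and keep the lowest-SSE result."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        result = kmeans(points, rng.sample(points, k))
        if best is None or result[2] < best[2]:
            best = result
    return best


if __name__ == "__main__":
    # Toy 2-D points; in the model each point would hold layer indices and final load loss.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centers, sse = best_of_restarts(data, k=2)
    print(labels, centers, round(sse, 3))
```

Replacing the random initial centroids with those produced by a hierarchical clustering pass would bring the sketch closer to the two-stage procedure described in the text.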
Finally, the clusters obtained need to be evaluated. The silhouette coefficient, which combines cohesion and separation, is commonly used for this purpose. The silhouette coefficient evaluates an individual point in terms of its closeness to its own cluster. In this model, the average silhouette coefficient over all points is used to evaluate the cluster results, and the number of points with large silhouette coefficients is also a good reference.
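For reference, the silhouette coefficient of a point is s = (b - a) / max(a, b), where a is the mean distance to the other points of its own cluster and b is the smallest mean distance to the points of any other cluster. The sketch below computes it in plain Python; the function name is hypothetical, Euclidean distance is assumed, and points in singleton clusters are given the value 0 by convention.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette_values(points: Sequence[Point], labels: Sequence[int]) -> List[float]:
    """Silhouette coefficient of every point, using Euclidean distance."""
    values: List[float] = []
    for i, p in enumerate(points):
        distances: Dict[int, List[float]] = {}
        for j, q in enumerate(points):
            if i != j:
                distances.setdefault(labels[j], []).append(math.dist(p, q))
        own = distances.pop(labels[i], None)
        if not own or not distances:
            values.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)                                   # cohesion
        b = min(sum(d) / len(d) for d in distances.values())      # separation
        values.append((b - a) / max(a, b))
    return values


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    vals = silhouette_values(pts, [0, 0, 1, 1])
    average = sum(vals) / len(vals)
    high = sum(v >= 0.8 for v in vals)  # count of points with a large coefficient
    print(round(average, 3), high)
```

Values close to 1 indicate points that sit well inside their cluster, while negative values indicate points that would fit a neighbouring cluster better.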
To reveal the effects of the layer failure restriction, two similar simulation scenarios, S1 and S2, are designed. In S1, only one overload failure and one hidden failure are allowed per layer of a failure chain, while in S2 the number is set to 3. Considering the actual propagation of cascading failure, the number of failure layers is limited to at most 6 in S1 and 5 in S2.
A total of 10,000 trials are conducted in these two simulation scenarios, and some meaningful conclusions can be drawn from the following analysis results.
From Fig. 4, it can be seen that the results from S1 and S2 are similar, which indicates that the failure chain group is not related to the number of failures per layer. Additionally, most chains belong to B-chain while a few belong to C-chain. These results are consistent with the actual situation, since blackout events are rare.
To analyze the failure chains in detail, statistical analysis is used to obtain their basic information. From this section on, only failure chains belonging to B-chain and C-chain are considered in the statistical results.
Statistical data pertinent to failure type
Average layers of failure chains
Average number of overload failure in a layer of failure chains
Average number of overload failure in a layer of failure chains whose load losses are over 50%
Average number of hidden failure in failure chains
Average number of hidden failure in failure chains whose load losses are over 50%
Average number of hidden failure in failure chains whose load losses are over 90%
Sequence pattern mining analysis
In this section, the association probability and the SLL index mentioned above are used to identify the related lines and to evaluate the corresponding losses, respectively. A subsequence is regarded as a set of related lines only when its association probability is greater than 0.5.
Partial results of related lines (Case 1)
6(O),5(H)-14(O) (compared with 6(O),5(H) in same layer)
Partial results of related lines (Case 2)
42-19-14 (compared with 42-19)
From Tables 3 and 4, it can be seen that various kinds of related lines are found, including failures in different layers and even across multiple layers, and that all types of failure, including initial failure, overload failure and hidden failure, appear in Case 1. The related failures from S1 and S2 show some differences but are generally consistent. Besides, the association probability and the sequence load loss obtained for related failures help to identify the importance of sequences. For example, according to Table 3, the failure chain 44(I)-22(O) in S1 should be handled most carefully.
On the other hand, sequence pattern mining can also be performed according to special needs. For example, suppose that the hidden failure of line 10 is very important in failure chains. To get more information about this failure, the chains containing it can be analyzed separately. It is found that the sequences 3(I)-9(H), 10(H)-5(O), 37(H) appear frequently when the failure 10(H) happens. This kind of conclusion offers guidance for designing countermeasures against blackout events.
Finally, the simulation parameters given in Table 1 are changed to further verify the sequence pattern mining model. Parameters L0, L1 and Phm are changed to 1.4, 1.8 and 0.2 so that failures are less likely to occur. The sequence pattern mining results show that most related lines with high relevancy are retained, such as sequences 46-6 and 1-14. The related lines that disappear, such as 20(H)-26(O), did not have a high association probability originally. It can be concluded that related failure lines with strong correlation are not affected by the simulation parameters.
Cluster mining analysis
In this section, the layer whose load losses are the highest is chosen to conduct the analysis.
Firstly, the layer number of the chosen layer is analyzed. The results show that the higher the layer number, the more often that layer is selected. This indicates that cascading failures tend to evolve toward more serious conditions.
From Figs. 6 and 7, it can be seen that most of the studied points fit their clusters well. In fact, the silhouette values of nearly half of the points reach 0.8, and those of over 75% of the points reach 0.6. These results show that there is a strong relationship between the layer with the highest load losses and the whole failure chain. The cluster results from S1 and S2 are similar, which shows that this relationship does not depend on the simulation settings.
Coordinates of cluster centers (S2)
Layer load loss
Shift amount of power flow
Final load loss
From Table 5, it can be seen that the higher the final load losses, the higher the intermediate amounts of the studied layer.
The sequence pattern mining analysis and the cluster mining analysis show that the internal stages of cascading failures are correlated. This correlation can provide useful suggestions and guidance for the prevention and mitigation of cascading failure.
An enhanced cascading failure model integrating data mining techniques is proposed in this paper. Significant improvements to the emerging prevention and control measures and to the subsequent failure search strategy are proposed so that the simulation is closer to the actual situation. Furthermore, a sequence pattern mining model and a cluster mining model are applied to analyze the failure chains deeply and comprehensively. By performing simulations on the IEEE 39-bus test system, some related failure lines are obtained with the proposed model and method, and the relationship between the layer with the highest load losses and the whole chain is studied. Additionally, some useful conclusions are drawn: the proposed emerging prevention and control measures can decrease load losses and improve system stability, and severe failure chains are more likely to involve more hidden failures. Comparative analysis shows that the related failure lines and the cluster relationships are not influenced by the simulation parameters. Future work is under way to further improve the proposed enhanced model.
This work was supported by the National Basic Research Program of China, 973 program (2013CB228203).
QS wrote the manuscript and performed the experiments; LS conceived and designed the framework; YN reviewed and edited this manuscript; DS instructed the load shedding strategy; JZ reviewed and edited this manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- U.S.-Canada Power system outage task force (2004). Final report. [Online]. Available: http://www.epa.gov/region1/npdes/merrimackstation/pdfs/ar/AR-1165.pdf.
- Final report system Disturbance on 4 November 2006. [Online]. Available: http://www.ucte.org/_library/otherreports/Final-Report-20070130.pdf. 2007.
- ANEEL Report on Nov 10, 2009 Brazil Blackout. [Online]. Available: http://www.aneel.gov.br/aplicacoes/noticias_area/dsp_detalheNoticia.cfm?idNoticia=3338&idAreaNoticia=347. 2010.
- Report on the Grid Disturbance on 30th July 2012 and Grid Disturbance on 31st July 2012. [Online]. Available: http://www.cercind.gov.in/2012/orders/Final_Report_Grid_Disturbance.pdf. 2012.
- Mei, S., He, F., Zhang, X., et al. (2009). An improved OPA model and blackout risk assessment. IEEE Transactions on Power Systems, 24(2), 814–823.
- Dobson, I., Carreras, B. A., & Newman, D. E. (2005). A loading-dependent model of probabilistic cascading failure. Probability in the Engineering and Informational Sciences, 19(1), 15–32.
- Wu, H., & Dobson, I. (2013). Analysis of induction motor cascading stall in a simple system based on the cascade model. IEEE Transactions on Power Systems, 28(3), 3184–3193.
- Kim, J., & Dobson, I. (2010). Approximating a loading-dependent cascading failure model with a branching process. IEEE Transactions on Reliability, 59(4), 691–699.
- Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440–442.
- Bompard, E., Wu, D., & Xue, F. (2011). Structural vulnerability of power systems: a topological approach. Electric Power Systems Research, 81(7), 1334–1340.
- Chang, L., & Wu, Z. (2011). Performance and reliability of electrical power grids under cascading failures. International Journal of Electrical Power & Energy Systems, 33(8), 1410–1419.
- Dey, P., Mehra, R., Kazi, F., et al. (2016). Impact of topology on the propagation of cascading failure in power grid. IEEE Transactions on Smart Grid, 7(4), 1970–1978.
- Wang, A., Luo, Y., Tu, G., et al. (2011). Vulnerability assessment scheme for power system transmission networks based on the fault chain theory. IEEE Transactions on Power Systems, 26(1), 442–450.
- Rahnamay-Naeini, M., & Hayat, M. (2016). Cascading failures in interdependent infrastructures: an interdependent Markov-chain approach. IEEE Transactions on Smart Grid, 7(4), 1997–2006.
- Nedic, D. P., Dobson, I., Kirschen, D. S., et al. (2006). Criticality in a cascading failure blackout model. Electrical Power and Energy Systems, 28(9), 627–633.
- Shi, Z., Shi, L., Nin, Y., et al. (2011). Identifying chains of events during power system cascading failure. In: Power and Energy Engineering Conference (APPEEC). New York: Institute of Electrical and Electronics Engineers (IEEE).
- Song, J., Cotilla-Sanchez, E., Ghanavati, G., et al. (2016). Dynamic modeling of cascading failure in power systems. IEEE Transactions on Power Systems, 31(3), 2085–2095.
- Chen, J., Thorp, J. S., & Dobson, I. (2005). Cascading dynamics and mitigation assessment in power system disturbances via a hidden failure model. Electrical Power and Energy Systems, 27(4), 318–326.
- Zhang, L. Y., Ding, L. J., Xiao, X. Y., et al. (2012). Risk assessment of power system cascading failure considering hidden failures and violation of temperature. Advanced Materials Research, 354, 1083–1087.
- Li, C., Wang, J., & Yang, J. (2013). Analytical algorithm for tracing power flow. Proceedings of the CSU-EPSA, 25(3), 119–123.
- Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. New Jersey: Addison Wesley.