
Reliability of IEC 61850 based substation communication network architecture considering quality of repairs and common cause failures

Abstract

Mission-critical IEC 61850 system architectures are designed to tolerate hardware failures to achieve the highest reliability performance. Hence, multi-channel systems are used in such systems within industrial facilities to isolate machinery when process abnormalities occur. Inevitably, multi-channel systems introduce Common Cause Failure (CCF), since the subsystems can rarely be fully independent. This paper integrates CCF into the Markov reliability model, enhancing the model's flexibility, to investigate the reliability performance of a synchronous generator intra-bay SCN architecture considering the quality of repairs and CCF. The Markov process enables the impact of CCF factors on system performance to be integrated into the model. The case study results indicate that CCF, coupled with imperfect repairs, significantly reduces system reliability performance. High sensitivity is observed at low levels of CCF, whereas the impact is greatest when the system diagnostic coverage is 99% (the high diagnostic coverage level of ISO 13849-1) and decreases as the diagnostic coverage level decreases. It is therefore concluded that the severity of CCF depends more on the system diagnostic coverage level than on the repair efficiency, although both factors affect overall system performance. Hence, CCF should be considered when determining the reliability performance of mission-critical communication networks in power distribution centres.

Introduction

Digitalisation of substations is increasing as industries gain confidence in applying Substation Communication Networks (SCNs) for process automation. SCNs offer many advantages, such as easier system diagnostics, reduced copper wiring, shorter installation time, increased monitoring, and simpler implementation of system design changes [1, 2]. IEC 61850 is the latest standard for SCNs. It enables peer-to-peer communication between substation devices, providing a faster communication platform for Intelligent Electronic Devices (IEDs) to share critical interlocking and protection messages [3]. The standard also supports bay-distributed functions. This enhances reliability because the loss of communication resulting from a switch failure at the station level does not render the intra-bay schemes inoperable. However, the industry is still cautious about the reliability of IEC 61850-based SCNs for executing mission-critical functions in the power distribution centres of industrial facilities during process abnormalities [3, 4].

The ability of IEC 61850-based SCNs to let substation devices share information makes them preferable to legacy communication protocols. Those protocols allow only master–slave communication configurations and do not support the peer-to-peer communication required for distributed mission-critical signal exchange [1, 5]. The standard also addresses the challenges arising from multiple substation communication protocols, including proprietary protocols that make the integration of substation devices even more difficult [1, 2]. The reliability of IEC 61850-based SCN architectures has been explored at both component and system levels. Many of these approaches use combinatorial analysis methods to investigate the composite reliability of the system, such as reliability block diagrams, and failure mode and effects analysis in the form of a state-space transition approach. However, these approaches fall short of establishing some of the requirements of IEC 61508, the safety-related standard for Electrical, Electronic and Programmable Electronic (E/E/PE) devices. Complete digitalisation of SCNs in industrial facilities, including power utilities, requires SCNs to interface with IEC 61508-based safety-related systems to exchange the mission-critical messages that ensure a safe and reliable power generation process [6,7,8].

Mission-critical IEC 61850 systems are designed to tolerate hardware failures to achieve the highest reliability performance, which is a prerequisite of the IEC 60870-4 standard [2, 8, 9]. Hence, multi-channel systems are used in mission-critical systems within industrial facilities to isolate machinery in the event of process abnormalities. These systems offer higher reliability than single-channel systems, provided that failures in the different channels are independent. However, multi-channel systems introduce Common Cause Failure (CCF), since the subsystems are rarely independent. CCF factors reduce subsystem independence in multi-channel architectures [10,11,12,13,14]. Incorporating CCF in reliability models is therefore essential to obtain meaningful and realistic results [11, 15]. A CCF is defined as a single point of failure that simultaneously causes more than one subsystem of a system to become non-functional. The failure could be caused by one or more components failing within a specified time, resulting in the whole system becoming inoperable [11, 12, 16].

Even though dependent failures arise primarily from CCF and cascading failures, both types are modelled as CCF in the literature [14,15,16,17,18]. Dependent failures occur as a result of common stressors that affect multiple subsystems or components within a system [10, 11, 16]. Common causes can result from root causes or coupling factors. Root causes relate to system design and engineering, manufacturing and installation, testing and commissioning, and operation and maintenance. Coupling factors, on the other hand, can be associated with the same physical location and design, the same hardware and/or software, or the same installation and maintenance teams [12, 14, 16, 18, 19]. Root causes are the main reason for component failures, whereas coupling factors make several components susceptible to the same root cause. Hence, mitigating root causes does not necessarily eliminate coupling factors, which makes the modelling of CCF complicated. Consequently, CCF is commonly modelled using a fixed-proportion approach, in which a fixed fraction of the subsystem's overall failure rate is attributed to CCF. This approach does not require system-specific CCF data [16, 20].

Treating CCFs as hazards that lead to system failure requires their careful evaluation in system reliability studies. Otherwise, the reliability performance of the system may be overstated, since these hazards tend to increase the joint probability of system failure. Neglecting them therefore leads to inaccurate system reliability evaluation [12, 13, 17]. Explicit modelling and analysis of the impact of CCF on the reliability and availability of a system can be challenging when the failure probabilities due to CCF are used to develop the system reliability models [16, 20, 21]. Hence, various reliability models have been developed to ease the quantification and modelling of CCF. Although their approaches differ, the models share one main objective: to quantify the contributions of both dependent and independent failures [11, 15, 16, 20]. The contributions of this research are as follows:

(a) Integrating CCF in the Markov process reliability model of mission-critical applications considering the quality of repairs.

(b) Analysing IEC 61850-based SCN architecture reliability performance considering the quality of repairs for executing mission-critical functions.

(c) Investigating SCN architecture’s responsiveness to increasing CCF levels based on the sensitivity and elasticity of mean state transitions.

The remainder of the paper is organised as follows. Section 2 presents a critical review of IEC 61850 SCN architecture reliability studies. Section 3 provides an overview of the synchronous generator protection system architecture and the study basis. The β-factor model is presented in Sect. 4. Section 5 presents the modelling of CCF in systems with imperfect repairs and limited diagnostic coverage based on a Markov process. Section 6 discusses system reliability considering CCF, based on mean system state transitions using the absorbing Markov chain process and matrix calculus. Case study results and discussions are presented in Sect. 7, and the findings and conclusions are discussed in Sect. 8.

A critical review of IEC 61850 SCN architecture reliability

Reliability and availability performance studies

In [22], the transmission performance of different data streams is investigated using the Optimised Network Engineering Tool (OPNET), with a focus on the network architecture. Other OPNET-based investigations have focused on the end-to-end delays of messages on the network [23,24,25]. A comparative study of IEC 61850 Editions I and II is presented in [26]. It highlights the reliability enhancements in Edition II based on the Parallel Redundancy Protocol (PRP) and the High-availability Seamless Redundancy (HSR) protocol. PRP and HSR are considered deterministic because they provide zero switchover time when a link fails [27]. The accuracy of frame detection and discarding is discussed in depth in [28]. Even though the architectural analysis presented in [26] is comprehensive, it does not address the quality of repairs or the associated CCFs. In [29], the application of IEC 61850-based Remote Terminal Units (RTUs) to integrate legacy devices in Substation Automation Systems (SAS) is demonstrated. However, no reliability assessment is presented, even though the selected architecture is claimed to be reliable. Security issues concerning IEC 61850 SCNs, based on IEC 62351-7 for network and system management, are addressed in [30]; these issues are critical for the overall dependability of SCNs. Strategies and methods for improving IEC 61850-based SCNs are addressed in [31]. That work identifies cost as the main hindrance to employing fully redundant systems. It also agrees that PRP and HSR offer high reliability and effectiveness, with HSR being more affordable than PRP. However, it does not address the impact of the quality of repairs or of the CCF associated with architecture complexity. Integration of circuit measurements using Conventional Instrument Transformers (CIT) and Non-Conventional Instrument Transformers (NCIT) in SASs is addressed in [32].
In [33, 34], the reliability of the two architectures is evaluated using the Reliability Block Diagram (RBD) method. These studies conclude that NCITs offer higher reliability than CITs for PRP and HSR protocol architectures. In [35], the reliability performance of the star, ring, star-ring and redundant ring architectures is comprehensively investigated using the RBD method. The advantages and disadvantages of the architectures, and their communication efficiency obtained using OPNET, are summarised. While RBD-based mathematical analysis enables detailed evaluation of network reliability performance, it does not consider the quality of repairs [2, 8]. Another RBD-based analysis of IEC 61850 SCN architectures is presented in [36]. It does not consider the quality of repairs associated with the architecture when a device failure occurs. Moreover, it assumes zero network switchover time when network links fail, even though the Rapid Spanning Tree Protocol (RSTP) is applied. This cannot be achieved in practice, because RSTP needs a finite time to reconfigure the network. In [37], an algorithm for minimising traffic congestion in HSR is presented, although the impact of the quality of repairs is not discussed. Reference [38] employs Monte Carlo Simulation to investigate the reliability of different IEC 61850-based SCN architectures. The method is flexible in evaluating various impacts of failures and repairs. However, the study considers only the reliability and availability of the SCN architectures, assumes a non-constant, Weibull-distributed failure rate, and does not account for the quality of repairs. In [7], RBD is adopted to analyse system reliability at the bay level, while a state-space approach is used to construct a transition probability matrix. This state transition matrix is similar to the Markov transition probability matrix, but their similarity is not discussed.
In addition, repair quality is not considered, and all repairs are assumed to be perfect. Therefore, although these studies provide comprehensive research on the reliability of IEC 61850-based SCN architectures, they do not consider the role of repair quality (viz. imperfect repairs and system diagnostic coverage) in determining the reliability of SCN architectures.

Advanced IEC 61850 SCN architecture reliability studies

Reference [3] investigates the application of IEC 61850 SCNs in mission-critical safety-related systems using the Markov process, and the results demonstrate that IEC 61850 can be considered for executing safety-related missions. In [39], the performance of various SCN architectures is investigated using the Markov process and found to be acceptable and economical. Reference [4] investigates IEC 61850-based SCNs for executing safety-related mission-critical commands based on IEC 61508, the standard for safety-related systems. It concludes that the IEC 61850 standard can be considered for executing safety-related functional requirements. In addition, the research presented in [5] reveals that the IEC 61850 standard meets all the qualitative dependability requirements of IEC 61508 as prescribed in IEC 61784-3. The impact of the quality of repairs on the performance of SCN architectures, and the basis for parameter optimisation, are investigated in [40, 41]. The responsiveness of the architectures’ mean time to failure, based on the mean system state transitions, is investigated in [42, 43]. However, the impact of CCF is not considered in these studies.

Suitability and flexibility of evaluation methods

The reliability performance of a mission-critical system needs to be modelled with high accuracy to verify its performance, and combinatorial analysis methods cannot achieve this [2, 8]. The review in [44] states that the Markov process, Petri Nets and Monte Carlo Simulation methods can all be considered for investigating the reliability of a mission-critical system. Although all three methods offer high accuracy, their flexibility, complexity and ease of implementation in modelling system reliability must also be considered. Petri Nets offer both state and transition modelling using places and arcs [45, 46]. However, the basic method does not consider time and must be translated into stochastic Petri Nets to simulate discrete systems. In contrast, the Markov process naturally models both discrete and continuous time. Moreover, information on integrating Petri Net applications is still limited, whereas the Markov process is commonly used to investigate the reliability of safety-related systems [44, 45, 47].

In contrast to the Markov process, the Monte Carlo Simulation method can model various individual parameter failure distributions by sampling multiple parameter values for computation, which makes it more flexible than the Markov process. Nevertheless, this flexibility is not needed during a system’s useful life, when only the exponential distribution (i.e. a constant failure rate) is considered for E/E/PE systems. In addition, the Markov process offers greater insight into system dynamics through its transition probability matrix. The matrix supports various theoretical concepts for investigating behavioural characteristics, including transient and asymptotic system responses to parameter changes [41,42,43]. The seamless transformation of the transition probability diagram into a transition probability matrix allows varied system parameters to be integrated. This enables a holistic study, through Systems Thinking, of the interaction between a system’s subsystems, its environment and human intervention [8, 45, 48, 49]. Moreover, Monte Carlo Simulation requires a large number of simulations to obtain statistically meaningful results. The Markov process instead relies on mathematical analysis of the transition probability matrix using dynamical systems and calculus methods [41,42,43]. Hence, the Markov process is the most suitable method for studying the reliability of mission-critical safety-related systems during their useful life. It is flexible and accurate, and simpler to implement than Petri Nets and Monte Carlo Simulation.

Overview of synchronous generator protection system SCN architecture and study basis

A simplified single-line diagram of a synchronous generator with a ‘one-out-of-two’ IEC 61850-based protection scheme is presented. The scheme channels are based on star-configured SCN architectures, in which Merging Units (MUs) at the process bus interface the Conventional Instrument Transformer (CIT) measurements to the respective scheme channels. Such schemes commonly also cover the auxiliary and generator step-up transformers. This paper focuses on the generator only, because the SCN architecture concepts are similar. Figure 1 depicts the configuration of the SCN architectures on the generator, and Table 1 presents the Mean Time To Failure (MTTF) of the SCN devices. The Mean Time To Repair (MTTR) of each device is 8 h [36].
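The Markov model assumes constant failure and repair rates during the useful life, so the MTTF and MTTR values convert directly into rates: \(\lambda = 1/\mathrm{MTTF}\) and \(\mu = 1/\mathrm{MTTR}\). The sketch below applies this conversion to the stated 8 h MTTR. Table 1 is not reproduced here, so the MTTF used is a hypothetical placeholder, and the helper names are illustrative only.

```python
import math

MTTR_HOURS = 8.0  # MTTR of each SCN device, as stated in the text [36]
# Hypothetical MTTF (Table 1 values are not reproduced here): 50 years in hours.
MTTF_HOURS_HYPOTHETICAL = 50.0 * 8760.0


def rate_from_mean(mean_hours: float) -> float:
    """Exponential model: the constant rate is the reciprocal of the mean time."""
    return 1.0 / mean_hours


def prob_within(rate_per_hour: float, dt_hours: float) -> float:
    """Probability that an exponentially distributed event occurs within dt."""
    return 1.0 - math.exp(-rate_per_hour * dt_hours)


mu = rate_from_mean(MTTR_HOURS)                # repair rate, 1/h
lam = rate_from_mean(MTTF_HOURS_HYPOTHETICAL)  # failure rate, 1/h (hypothetical)

print(f"mu = {mu} per hour")                   # mu = 0.125 per hour
print(f"lambda = {lam:.3e} per hour")          # lambda = 2.283e-06 per hour
print(f"P(repair completed within 8 h) = {prob_within(mu, 8.0):.3f}")  # 0.632
```

The 0.632 figure is \(1 - e^{-1}\). Under the exponential assumption, a repair with an 8 h mean is completed within 8 h only about 63% of the time.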

Fig. 1 Synchronous generator IEC 61850 SCN-based power distribution system [51,52,53]

Table 1 MTTF data of substation devices [36]

The RBD of the protection scheme architecture is depicted in Fig. 2 and assumes that the individual scheme channels are independent. To incorporate the impact of the quality of repairs, the scheme in Fig. 2 is remodelled using a Markov process, as depicted in Fig. 3 [8, 40]. In Fig. 3, \(\lambda\), \(\mu\), \(r_{eff}\), \(e_{dc}\) and \(\beta\) represent the failure rate, repair rate, repair efficiency, diagnostic coverage and common cause failure factor, respectively. State S-1 represents the fully functional state of the protection scheme. States S-2 and S-3 represent conditions in which only one of the scheme channels is available, and state S-4 represents a complete scheme failure. Consequently, the system availability is the sum of the probabilities of states S-1, S-2 and S-3 [40]. The following section uses the β-factor model to integrate the impact of CCF. This CCF arises from the scheme location, engineering, design, manufacturing, installation, testing, commissioning, operation and maintenance.

Fig. 2 Reliability Block Diagram of ‘one-out-of-two’ scheme [36]

Fig. 3 Markov process ‘one-out-of-two’ voting scheme

The beta-factor model

The β-factor model is the most widely used parametric method for evaluating the impact of CCF in ‘one-out-of-two’ system configurations [10, 11, 16]. The model is also presented in the IEC 61508 standard as one of the recommended methods for determining the effect of CCF in multi-channel systems. Modelling CCF aims to determine its effect on system reliability and availability, and to support the development of strategies against its impact [16, 21]. Parametric models can be classified into shock and non-shock models. Shock models incorporate the basic CCF mechanisms, while non-shock models are based only on the failure probabilities of CCFs. The β-factor model is based on historical time-to-failure data and is broadly applied. However, it is simplified, since it does not explicitly account for individual sub-factors [50].

Nevertheless, only the level of CCF is needed to determine the impact of common causes on system reliability, and the channels under consideration are identical. Under these conditions, the β-factor model can be used to model CCF in ‘one-out-of-two’ system configurations, because it is simple to understand and apply. It also reduces the effort needed to analyse the results [11, 15, 16]. As a single-parameter model, the β-factor model assumes that a constant fraction β of the system, subsystem or component failure rate can be attributed to CCF [15, 16]. Thus, the total system failure rate \(\lambda_{T}\) is given by:

$$\lambda_{{\mathrm{T}}} = \lambda _{{{\mathrm{CCF}}}} + \lambda _{{{\mathrm{IND}}}}$$
(1)

where \(\lambda_{{{\mathrm{CCF}}}}\) represents the failure rate due to CCF while \(\lambda_{{{\mathrm{IND}}}}\) represents the failure rate due to independent components [20], which are given respectively as:

$$\lambda_{{{\mathrm{CCF}}}} = \beta f\left( {\lambda_{A} ,\lambda_{B} } \right)$$
(2)

and [10, 12]:

$$\lambda_{{{\mathrm{IND}}}} = \left( {1 - \beta } \right)\lambda_{A} + \left( {1 - \beta } \right)\lambda_{B}$$
(3)
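The function below is a minimal sketch of (1) to (3). It splits the failure rates of a ‘one-out-of-two’ system into a CCF part and an independent part. It assumes the averaging form of \(f(\lambda_A,\lambda_B)\) that the text adopts for Fig. 5, taken here as the arithmetic mean. The function names and the numerical rates are hypothetical.

```python
def f_average(lam_a: float, lam_b: float) -> float:
    """CCF rate function f(lambda_A, lambda_B), assumed here to be the arithmetic mean."""
    return 0.5 * (lam_a + lam_b)


def beta_factor_rates(lam_a: float, lam_b: float, beta: float):
    """Return (lambda_CCF, lambda_IND, lambda_T) following Eqs. (1)-(3)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    lam_ccf = beta * f_average(lam_a, lam_b)           # Eq. (2)
    lam_ind = (1 - beta) * lam_a + (1 - beta) * lam_b  # Eq. (3)
    return lam_ccf, lam_ind, lam_ccf + lam_ind         # Eq. (1)


# Hypothetical identical channels, 2e-6 failures per hour each, beta = 10 %.
lam_ccf, lam_ind, lam_t = beta_factor_rates(2e-6, 2e-6, beta=0.1)
print(f"lambda_CCF = {lam_ccf:.2e}, lambda_IND = {lam_ind:.2e}, lambda_T = {lam_t:.2e}")
# lambda_CCF = 2.00e-07, lambda_IND = 3.60e-06, lambda_T = 3.80e-06
```

Consider identical channels (\(\lambda_A = \lambda_B = \lambda\)) with this averaging \(f\). Equation (1) then reduces to \(\lambda_T = (2-\beta)\lambda\), so the total rate equals \(2\lambda\) only when \(\beta = 0\).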

The estimation of the β-factor is based on system diversity or properties, as well as the architecture [21]. Figure 4 depicts an RBD model of a ‘one-out-of-two’ multi-channel system comprising subsystems A and B, where \(\lambda_{A}\) and \(\lambda_{B}\) are their respective failure rates. Notably, a failure described by the CCF rate function (2) causes the overall mission to fail, regardless of the state of the individual channels. Hence, the RBD model makes the application of the β-factor model easy to understand. The model of Fig. 4 is redesigned using the Markov process introduced in Sect. 3 and described in [8, 41, 43]. This allows CCF and imperfect repairs to be integrated into the reliability model [2, 8, 49, 54, 55].

Fig. 4 Reliability block diagram model of ‘one-out-of-two’ system incorporating CCF based on the β-factor model

Figure 5 depicts the ‘one-out-of-two’ Markov state transition diagram model integrating the β-factor [21]. The CCF rate function \(f\left( {\lambda_{A} ,\lambda_{B} } \right)\) in (2) is assumed to be an averaging function of the two subsystems’ failure rates. The CCF rate is therefore the fraction β of the averaged failure rate. Unlike the model presented in Sect. 3, the model depicted in Fig. 5 allows a direct transition from state S-1 to S-4 due to CCF, at the rate given by (2). The complete state transition probability matrix of the ‘one-out-of-two’ system model depicted in Fig. 5 is:

$$P_{\beta} = \begin{bmatrix} 1-(1-\beta)\lambda_{A}-(1-\beta)\lambda_{B}-\beta f(\lambda_{A},\lambda_{B}) & (1-\beta)\lambda_{A} & (1-\beta)\lambda_{B} & \beta f(\lambda_{A},\lambda_{B}) \\ 0 & 1-\lambda_{B} & 0 & \lambda_{B} \\ 0 & 0 & 1-\lambda_{A} & \lambda_{A} \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
(4)
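The sketch below builds matrix (4) numerically and checks that it is a valid stochastic matrix. It treats the rates as per-step transition probabilities, i.e. a unit time step with \(\lambda \ll 1\); this is an assumption of the illustration. It uses the same arithmetic-mean \(f\). States are indexed 0 to 3 for S-1 to S-4. Names and values are hypothetical.

```python
def f_average(lam_a, lam_b):
    """Assumed CCF rate function: arithmetic mean of the two channel rates."""
    return 0.5 * (lam_a + lam_b)


def p_beta(lam_a, lam_b, beta):
    """One-step transition matrix of Eq. (4).

    States: 0 = S-1 (both channels up), 1 = S-2 (A failed, B up),
    2 = S-3 (B failed, A up), 3 = S-4 (absorbing system failure).
    """
    ccf = beta * f_average(lam_a, lam_b)  # direct S-1 -> S-4 (CCF)
    p12 = (1 - beta) * lam_a              # independent failure of A
    p13 = (1 - beta) * lam_b              # independent failure of B
    return [
        [1 - p12 - p13 - ccf, p12, p13, ccf],
        [0.0, 1 - lam_b, 0.0, lam_b],
        [0.0, 0.0, 1 - lam_a, lam_a],
        [0.0, 0.0, 0.0, 1.0],
    ]


def is_stochastic(p, tol=1e-12):
    """True if every entry lies in [0, 1] and every row sums to 1."""
    return all(
        all(0.0 <= x <= 1.0 for x in row) and abs(sum(row) - 1.0) < tol
        for row in p
    )


P = p_beta(1e-3, 1e-3, beta=0.1)           # hypothetical per-step probabilities
print(is_stochastic(P))                    # True
print(p_beta(1e-3, 1e-3, beta=0.0)[0][3])  # 0.0: no direct S-1 -> S-4 path without CCF
```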
Fig. 5 Markov state transition diagram of ‘one-out-of-two’ system incorporating CCF based on the β-factor model

The Markov state transition β-factor model and its state transition matrix are used to enhance the ‘one-out-of-two’ Markov state model depicted in Fig. 3. The enhanced model is used to investigate the impact of CCF on system reliability performance when imperfect repairs are considered. The integration of the CCF effect into the ‘one-out-of-two’ model with imperfect repairs and limited system diagnostic coverage is presented in Sect. 5.

Modelling imperfect repairs and CCFs

The ‘one-out-of-two’ system model presented in Sect. 3 is enhanced by incorporating CCF using the β-factor model described by (4) for investigating the impact of imperfect repairs at different CCF levels. Figure 6 depicts the Markov ‘one-out-of-two’ system transition probability diagram with imperfect repairs and CCF [8, 45, 56]. The associated transition matrix of the model depicted in the transition diagram of Fig. 6 is given by:

$$P = \begin{bmatrix}
1-(1-\beta)\lambda_{A}-(1-\beta)\lambda_{B}-\beta f(\lambda_{A},\lambda_{B}) & (1-\beta)\lambda_{A} & (1-\beta)\lambda_{B} & \beta f(\lambda_{A},\lambda_{B}) \\
\mu_{A}e_{dcA}r_{effA} & 1-\mu_{A}e_{dcA}r_{effA}-\left(\lambda_{B}+\mu_{A}(1-e_{dcA})\right) & 0 & \lambda_{B}+\mu_{A}(1-e_{dcA}) \\
\mu_{B}e_{dcB}r_{effB} & 0 & 1-\mu_{B}e_{dcB}r_{effB}-\left(\lambda_{A}+\mu_{B}(1-e_{dcB})\right) & \lambda_{A}+\mu_{B}(1-e_{dcB}) \\
0 & 0 & 0 & 1
\end{bmatrix}$$
(5)

Equation (5) enables system reliability performance to be analysed by observing the mean number of system state transitions at various CCF levels, which are set by the selected value of β [8, 21]. Because the model can incorporate various factors, the effect of CCF on system reliability performance can be determined at different levels of imperfect repairs (viz. quality of repairs, as discussed in [8]).
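Below is a minimal numerical sketch of (5). As before, it assumes per-step probabilities and the arithmetic-mean \(f\). It builds \(P\), propagates the state distribution from S-1, and reports the availability as the sum of the S-1, S-2 and S-3 probabilities (Sect. 3). Following (5), the uncovered part of the repair rate, \(\mu(1-e_{dc})\), leads to S-4. A covered but ineffective repair, \(\mu e_{dc}(1-r_{eff})\), leaves the system in its degraded state. All parameter values and function names are hypothetical.

```python
def p_repair_ccf(lam_a, lam_b, mu_a, mu_b, edc_a, edc_b, reff_a, reff_b, beta):
    """One-step transition matrix of Eq. (5); states 0..3 are S-1..S-4."""
    ccf = beta * 0.5 * (lam_a + lam_b)       # beta * f, with f assumed to be the mean
    p12, p13 = (1 - beta) * lam_a, (1 - beta) * lam_b
    rep_a = mu_a * edc_a * reff_a            # S-2 -> S-1: covered, effective repair of A
    rep_b = mu_b * edc_b * reff_b            # S-3 -> S-1: covered, effective repair of B
    down_a = lam_b + mu_a * (1 - edc_a)      # S-2 -> S-4
    down_b = lam_a + mu_b * (1 - edc_b)      # S-3 -> S-4
    return [
        [1 - p12 - p13 - ccf, p12, p13, ccf],
        [rep_a, 1 - rep_a - down_a, 0.0, down_a],
        [rep_b, 0.0, 1 - rep_b - down_b, down_b],
        [0.0, 0.0, 0.0, 1.0],
    ]


def step(dist, p):
    """Advance a row vector of state probabilities by one step (dist @ P)."""
    n = len(p)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


def availability(p, n_steps):
    """P(S-1) + P(S-2) + P(S-3) after n_steps, starting fully operational in S-1."""
    dist = [1.0, 0.0, 0.0, 0.0]
    for _ in range(n_steps):
        dist = step(dist, p)
    return sum(dist[:3])


# Hypothetical per-step parameters for two identical channels.
params = dict(lam_a=1e-4, lam_b=1e-4, mu_a=0.1, mu_b=0.1,
              edc_a=0.9, edc_b=0.9, reff_a=0.95, reff_b=0.95)
for beta in (0.0, 0.1):
    P = p_repair_ccf(**params, beta=beta)
    print(f"beta = {beta}: availability after 1000 steps = {availability(P, 1000):.6f}")
```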

Fig. 6 System transition diagram of the ‘one-out-of-two’ system with imperfect repairs and CCF based on Markov process

Henceforth, the subsystems are assumed not to be entirely independent, in order to improve the accuracy of the reliability performance evaluation. The exception is the special case in which β is set to zero to represent the absence of CCF in the system [21].

Sensitivity and elasticity of system performance to common cause failures

The sensitivity of the system reliability performance to CCF can be determined by investigating how the fundamental matrix responds to different CCF levels. Given the transition probability matrix \({\varvec{P}}\), the fundamental matrix \({\varvec{N}}\) is given by [8, 42, 45, 57]:

$${\varvec{N}} = \left( {{\varvec{I}} - {\varvec{Q}}} \right)^{ - 1}$$
(6)

Here, \({\varvec{I}}\) is the identity matrix whose dimension equals the number of transient system states, and \({\varvec{Q}}\) contains the transition probabilities among the transient system states [41, 58, 59]. Each entry \(N_{ij}\) is the expected number of visits to transient state \(j\) before absorption, starting from transient state \(i\). Using matrix calculus methods [58, 60, 61], it can be shown that the sensitivity and elasticity of the fundamental matrix are given by (7) and (8). Here, \(R\) is a vector of the parameters of interest, \(\otimes\) denotes the Kronecker product, and \({\mathcal{D}}\left( X \right)\) is a matrix whose diagonal entries are the elements of vector \(X\).

$$\frac{{d{\mathrm{vec }}{\varvec{N}}}}{{d{\mathrm{vec }}R^{{\mathrm{T}}} }} = ({\varvec{N}}^{{\mathrm{T}}} \otimes {\varvec{N}})\frac{{d{\mathrm{vec }}{\varvec{Q}}}}{{d{\mathrm{vec }}R^{{\mathrm{T}}} }}$$
(7)
$$\frac{{\upvarepsilon {\mathrm{vec }}{\varvec{N}}}}{{\upvarepsilon R^{{\mathrm{T}}} }} = {\mathcal{D}}\left( {{\mathrm{vec }}{\varvec{N}}} \right)^{ - 1} \frac{{d{\mathrm{vec }}{\varvec{N}}}}{{dR^{{\mathrm{T}}} }} {\mathcal{D}}\left( R \right)$$
(8)
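Below is a minimal sketch of (6) to (8) for the single parameter β. For a scalar parameter, (7) can be evaluated in the equivalent matrix form \(d\varvec{N}/d\beta = \varvec{N}\,(d\varvec{Q}/d\beta)\,\varvec{N}\), which follows from \(\mathrm{vec}(ABC) = (C^{\mathrm{T}} \otimes A)\,\mathrm{vec}\,B\). Equation (8) then reduces to the entry-wise elasticity \((\beta/N_{ij})\,dN_{ij}/d\beta\). For simplicity, the code uses the transient block of (5) for identical channels, with the arithmetic-mean \(f\) (so \(f = \lambda\)) and hypothetical per-step values. It checks the analytic sensitivity against a central finite difference. The text applies the same procedure to the transient matrix (10), so the numbers differ.

```python
def mat_inv(a):
    """Inverse of a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-15:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mat_mul(a, b):
    """Matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical per-step values for identical channels (not the paper's data).
LAM, MU, EDC, REFF = 1e-3, 0.1, 0.9, 0.95


def q_of_beta(beta):
    """Transient block (S-1..S-3) of Eq. (5) with lam_A = lam_B = LAM and f = LAM."""
    rep = MU * EDC * REFF          # S-2/S-3 -> S-1
    down = LAM + MU * (1 - EDC)    # S-2/S-3 -> S-4
    return [
        [1 - 2 * (1 - beta) * LAM - beta * LAM, (1 - beta) * LAM, (1 - beta) * LAM],
        [rep, 1 - rep - down, 0.0],
        [rep, 0.0, 1 - rep - down],
    ]


def dq_dbeta():
    """Analytic dQ/dbeta for q_of_beta: only the S-1 row depends on beta."""
    return [[LAM, -LAM, -LAM], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]


def fundamental(q):
    """N = (I - Q)^-1, Eq. (6)."""
    k = len(q)
    return mat_inv([[float(r == c) - q[r][c] for c in range(k)] for r in range(k)])


def sensitivity(beta):
    """dN/dbeta = N (dQ/dbeta) N, the matrix form of Eq. (7)."""
    n = fundamental(q_of_beta(beta))
    return mat_mul(mat_mul(n, dq_dbeta()), n)


def elasticity(beta):
    """(beta / N_ij) * dN_ij/dbeta, i.e. Eq. (8) for a scalar parameter."""
    n, s = fundamental(q_of_beta(beta)), sensitivity(beta)
    return [[beta * s[i][j] / n[i][j] for j in range(3)] for i in range(3)]


beta, h = 0.1, 1e-6
N = fundamental(q_of_beta(beta))
t = [sum(row) for row in N]  # mean steps before absorption from S-1, S-2, S-3
n_hi, n_lo = fundamental(q_of_beta(beta + h)), fundamental(q_of_beta(beta - h))
fd = [[(n_hi[i][j] - n_lo[i][j]) / (2 * h) for j in range(3)] for i in range(3)]
an = sensitivity(beta)
print("mean steps to S-4 from S-1:", round(t[0], 1))
print("max |analytic - finite difference|:",
      max(abs(an[i][j] - fd[i][j]) for i in range(3) for j in range(3)))
print("elasticity of N[0][0] w.r.t. beta:", round(elasticity(beta)[0][0], 4))
```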

The stochastic matrix \({\varvec{P}}\) of the system depicted in Fig. 6 is given in expanded form in (9). In this form, the transition probabilities out of each transient state are normalised by the total rate of leaving that state. The transient matrix \({\varvec{Q}}\) is obtained from (9) by retaining the rows and columns of the transient states S-1, S-2 and S-3, and is given in (10).

$${\varvec{P}} = \begin{bmatrix}
\dfrac{1-{\varvec{P}}_{12}-{\varvec{P}}_{13}-{\varvec{P}}_{14}}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} &
\dfrac{(1-\beta)\lambda_{A}}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} &
\dfrac{(1-\beta)\lambda_{B}}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} &
\dfrac{\beta f(\lambda_{A},\lambda_{B})}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} \\[2ex]
\dfrac{\mu_{A}e_{dcA}r_{effA}}{\mu_{A}e_{dcA}r_{effA}+(\lambda_{B}+\mu_{A}-\mu_{A}e_{dcA})} &
\dfrac{1-{\varvec{P}}_{21}-{\varvec{P}}_{24}}{\mu_{A}e_{dcA}r_{effA}+(\lambda_{B}+\mu_{A}-\mu_{A}e_{dcA})} &
0 &
\dfrac{\lambda_{B}+\mu_{A}(1-e_{dcA})}{\lambda_{B}+\mu_{A}(1-e_{dcA})+\mu_{A}e_{dcA}r_{effA}} \\[2ex]
\dfrac{\mu_{B}e_{dcB}r_{effB}}{\mu_{B}e_{dcB}r_{effB}+(\lambda_{A}+\mu_{B}-\mu_{B}e_{dcB})} &
0 &
\dfrac{1-{\varvec{P}}_{31}-{\varvec{P}}_{34}}{\lambda_{A}+\mu_{B}(1-e_{dcB})+\mu_{B}e_{dcB}r_{effB}} &
\dfrac{\lambda_{A}+\mu_{B}(1-e_{dcB})}{\lambda_{A}+\mu_{B}(1-e_{dcB})+\mu_{B}e_{dcB}r_{effB}} \\[2ex]
0 & 0 & 0 & 1
\end{bmatrix}$$
(9)
$${\varvec{Q}} = \begin{bmatrix}
\dfrac{1-{\varvec{P}}_{12}-{\varvec{P}}_{13}-{\varvec{P}}_{14}}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} &
\dfrac{(1-\beta)\lambda_{A}}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} &
\dfrac{(1-\beta)\lambda_{B}}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} \\[2ex]
\dfrac{\mu_{A}e_{dcA}r_{effA}}{\mu_{A}e_{dcA}r_{effA}+(\lambda_{B}+\mu_{A}-\mu_{A}e_{dcA})} &
\dfrac{1-{\varvec{P}}_{21}-{\varvec{P}}_{24}}{\mu_{A}e_{dcA}r_{effA}+(\lambda_{B}+\mu_{A}-\mu_{A}e_{dcA})} &
0 \\[2ex]
\dfrac{\mu_{B}e_{dcB}r_{effB}}{\mu_{B}e_{dcB}r_{effB}+(\lambda_{A}+\mu_{B}-\mu_{B}e_{dcB})} &
0 &
\dfrac{1-{\varvec{P}}_{31}-{\varvec{P}}_{34}}{\lambda_{A}+\mu_{B}(1-e_{dcB})+\mu_{B}e_{dcB}r_{effB}}
\end{bmatrix}$$
(10)

It follows that the vectorised form of the transient matrix \({\varvec{Q}}\), obtained by stacking its columns, is given by [42, 43, 58]:

$$\mathrm{vec}\,{\varvec{Q}} = \begin{bmatrix}
\dfrac{1-{\varvec{P}}_{12}-{\varvec{P}}_{13}-{\varvec{P}}_{14}}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} \\[2ex]
\dfrac{\mu_{A}e_{dcA}r_{effA}}{\mu_{A}e_{dcA}r_{effA}+(\lambda_{B}+\mu_{A}-\mu_{A}e_{dcA})} \\[2ex]
\dfrac{\mu_{B}e_{dcB}r_{effB}}{\mu_{B}e_{dcB}r_{effB}+(\lambda_{A}+\mu_{B}-\mu_{B}e_{dcB})} \\[2ex]
\dfrac{(1-\beta)\lambda_{A}}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} \\[2ex]
\dfrac{1-{\varvec{P}}_{21}-{\varvec{P}}_{24}}{\mu_{A}e_{dcA}r_{effA}+(\lambda_{B}+\mu_{A}-\mu_{A}e_{dcA})} \\[2ex]
0 \\[2ex]
\dfrac{(1-\beta)\lambda_{B}}{(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})} \\[2ex]
0 \\[2ex]
\dfrac{1-{\varvec{P}}_{31}-{\varvec{P}}_{34}}{\lambda_{A}+\mu_{B}(1-e_{dcB})+\mu_{B}e_{dcB}r_{effB}}
\end{bmatrix}$$
(11)

Differentiating (11) with respect to the CCF factor β gives:

$$\frac{d\,\mathrm{vec}\,{\varvec{Q}}}{d\beta} = \begin{bmatrix}
0 \\ 0 \\ 0 \\[1ex]
\dfrac{-\lambda_{A} Q_{dn} + \lambda_{A}(1-\beta)\left(\lambda_{A}+\lambda_{B}-f(\lambda_{A},\lambda_{B})\right)}{Q_{dn}^{2}} \\[2ex]
0 \\ 0 \\[1ex]
\dfrac{-\lambda_{B} Q_{dn} + \lambda_{B}(1-\beta)\left(\lambda_{A}+\lambda_{B}-f(\lambda_{A},\lambda_{B})\right)}{Q_{dn}^{2}} \\[2ex]
0 \\ 0
\end{bmatrix}$$
(12)

where \(Q_{dn}\), the common denominator of the transition probabilities out of state S-1, is:

$$Q_{dn}=(1-\beta)\lambda_{A}+(1-\beta)\lambda_{B}+\beta f(\lambda_{A},\lambda_{B})$$
(13)
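The fourth entry of (12) follows from the quotient rule applied to (1 − β)λA / Q_dn. A central finite difference offers a quick way to check it numerically. The sketch below uses hypothetical rate values and holds f(λA, λB) fixed, since it does not depend on β.

```python
def q_dn(beta, lam_a, lam_b, f):
    """Common denominator Q_dn of (13); f stands for the value f(lambda_A, lambda_B)."""
    return (1 - beta) * lam_a + (1 - beta) * lam_b + beta * f


def p12(beta, lam_a, lam_b, f):
    """Fourth entry of vec Q in (11): the S-1 -> S-2 transition probability."""
    return (1 - beta) * lam_a / q_dn(beta, lam_a, lam_b, f)


def dp12_dbeta(beta, lam_a, lam_b, f):
    """Fourth entry of (12), obtained with the quotient rule."""
    d = q_dn(beta, lam_a, lam_b, f)
    return (-lam_a * d + lam_a * (1 - beta) * (lam_a + lam_b - f)) / d**2


# Hypothetical rates per hour; f is held fixed because it does not depend on beta.
lam_a, lam_b, f = 2e-4, 1e-4, 1.5e-4
beta, h = 0.1, 1e-6
analytic = dp12_dbeta(beta, lam_a, lam_b, f)
numeric = (p12(beta + h, lam_a, lam_b, f) - p12(beta - h, lam_a, lam_b, f)) / (2 * h)
print(analytic, numeric)
```

The two values agree to many significant digits. The seventh entry of (12) has the same form, with λA in the numerator replaced by λB.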

Substituting (17) into (12) and (13) allows the sensitivity and elasticity of the system reliability performance to CCFs to be evaluated. The notation and the matrix calculus techniques applied in this paper are discussed in [42, 43, 58].
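One way to carry out this evaluation is to propagate d vec Q/dβ through the fundamental matrix N = (I − Q)^-1 using the matrix-calculus identity d vec N = (Nᵀ ⊗ N) d vec Q, in the spirit of [58]. The Python sketch below illustrates this under stated, hypothetical assumptions. It uses symmetric subsystems (λA = λB = λ, μA = μB = μ) and f(λA, λB) = λ. It also reads (10) as an embedded chain and takes e_edc = e_dc. It is not the authors' code, and the rates are illustrative.

```python
import numpy as np


def q_matrix(beta, lam, mu, dc, r_eff):
    """Symmetric one-out-of-two transient matrix (S-1, S-2, S-3); embedded-chain reading of (10).

    Hypothetical assumptions: lambda_A = lambda_B = lam, f(lambda_A, lambda_B) = lam,
    mu_A = mu_B = mu and e_edc = e_dc = dc. Under this reading the diagonal is zero.
    """
    q_dn = 2 * (1 - beta) * lam + beta * lam                            # (13) with f = lam
    p_fail = (1 - beta) * lam / q_dn                                    # P12 = P13
    p_rep = mu * dc * r_eff / (mu * dc * r_eff + lam + mu * (1 - dc))   # P21 = P31
    return np.array([[0.0, p_fail, p_fail],
                     [p_rep, 0.0, 0.0],
                     [p_rep, 0.0, 0.0]])


def dvec_q_dbeta(beta, lam):
    """d vec(Q)/d beta from (12): only entries 4 (P12) and 7 (P13) are non-zero."""
    q_dn = 2 * (1 - beta) * lam + beta * lam
    # lambda_A + lambda_B - f = 2 * lam - lam under the assumptions above.
    d = (-lam * q_dn + lam * (1 - beta) * (2 * lam - lam)) / q_dn**2
    out = np.zeros(9)
    out[3] = out[6] = d
    return out


def sensitivity_elasticity(beta, lam, mu, dc, r_eff):
    """Return N, its sensitivity dN/d beta and its elasticity (beta / N) dN/d beta."""
    n = np.linalg.inv(np.eye(3) - q_matrix(beta, lam, mu, dc, r_eff))
    dvec_n = np.kron(n.T, n) @ dvec_q_dbeta(beta, lam)   # d vec N = (N^T kron N) d vec Q
    sens = dvec_n.reshape(3, 3, order="F")               # undo the column-stacking vec
    return n, sens, beta * sens / n


n, sens, elas = sensitivity_elasticity(0.1, 1e-4, 0.125, 0.99, 0.95)
print(np.round(sens, 2))
print(np.round(elas, 3))
```

A finite-difference derivative of N with respect to β is a convenient cross-check of the Kronecker-product result.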

Case study results and discussions

This section presents the results and analysis of the impact of CCF on the reliability performance of the ‘one-out-of-two’ system configuration depicted in Fig. 6. The impact of CCF is investigated for three of the diagnostic coverage levels defined in ISO 13849-1 (low, medium and high). Table 2 presents the different system diagnostic coverage levels [62,63,64,65].

Table 2 Levels of diagnostic coverage and range
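For reference, ISO 13849-1 groups diagnostic coverage into four denotations. These are none (DC < 60%), low (60% ≤ DC < 90%), medium (90% ≤ DC < 99%) and high (DC ≥ 99%). The 99%, 90% and 60% values used in the case studies are therefore the lower bounds of the high, medium and low bands. The hypothetical helper below sketches that mapping; confirm the band limits against the edition of the standard in use.

```python
def dc_denotation(dc: float) -> str:
    """Map a diagnostic coverage fraction to its ISO 13849-1 denotation (hypothetical helper)."""
    if not 0.0 <= dc <= 1.0:
        raise ValueError("diagnostic coverage must be a fraction in [0, 1]")
    if dc < 0.60:
        return "none"
    if dc < 0.90:
        return "low"
    if dc < 0.99:
        return "medium"
    return "high"


for dc in (0.99, 0.90, 0.60):
    print(f"{dc:.0%} -> {dc_denotation(dc)}")
# 99% -> high
# 90% -> medium
# 60% -> low
```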

The following assumptions are made to simplify the analysis of the case study results. Different repair efficiency and diagnostic coverage levels could also be simulated for each subsystem, for example in a system with a partial failure that leaves either subsystem A or B unavailable.

  (a) The two subsystems are of the same technology; hence, they have the same diagnostic capability.

  (b) Identical resources support both subsystems, so equal repair efficiencies are applied to them.

  (c) The system is operational and without partial failures at the beginning of the simulation.

Although the system is assumed to be operational and free of partial failures at the start of the simulation, any system state can be selected as the initial state to model a partial failure of subsystem A or B. Figure 7 depicts the system transition probability heatmap at 90% diagnostic coverage and 95% repair efficiency. A repair efficiency below 100% is selected because 100% repair efficiency is unlikely to be achieved in practice. The CCF level β is set at 10% to illustrate the system's characteristic behaviour.

Fig. 7

Transition probability matrix heatmap with CCFs

In contrast to the system configuration discussed in Sect. 3, the system under consideration can transition from the initial state S-1 into state S-2, S-3 or S-4, with equal probabilities of transitioning into S-2 and S-3. From S-2 or S-3, the system transitions back to S-1 unless it moves into state S-4, which is the system's failsafe absorbing state. The probability of moving directly from S-1 to S-4 is relatively low, at about 0.05. The system is therefore likely to move between states S-1, S-2 and S-3 before it reaches state S-4.
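As a worked illustration of the S-1 row of (10), suppose λA = λB = λ, which is what equal probabilities of entering S-2 and S-3 imply. Also take the hypothetical choice f(λA, λB) = λ, since f is not restated here. Then:

$$P_{12}=P_{13}=\frac{(1-\beta)\lambda}{2(1-\beta)\lambda+\beta\lambda}=\frac{1-\beta}{2-\beta},\qquad P_{14}=\frac{\beta\lambda}{2(1-\beta)\lambda+\beta\lambda}=\frac{\beta}{2-\beta}$$

At β = 0.1 this gives P12 = P13 = 0.9/1.9 ≈ 0.47 and P14 = 0.1/1.9 ≈ 0.053. This is in line with the value of about 0.05 read from Fig. 7. It is a consistency check under the assumed f, not a derivation of the figure.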

However, the system can transition to state S-4 at any time if one or more dependent failures occur. The performance analysis therefore uses the mean number of state transitions before failure, which characterises the system's behaviour while it remains in its transient states.
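The reliability curves and the mean state transitions can be linked as follows. Assume that the reliability after n transitions is the probability of not yet having been absorbed in S-4, R(n) = e₁ᵀQⁿ1. Its sum over all n then equals the mean number of transitions before absorption, which is the row sum of the fundamental matrix N = (I − Q)^-1. The Python sketch below checks this numerically for a hypothetical Q; the numbers are illustrative and are not taken from the figures.

```python
import numpy as np


def reliability_curve(q: np.ndarray, start: int, steps: int) -> np.ndarray:
    """R(n) for n = 0..steps: probability that the chain has not yet been absorbed in S-4."""
    dist = np.zeros(q.shape[0])
    dist[start] = 1.0                 # initial state, e.g. S-1 under assumption (c)
    out = np.empty(steps + 1)
    for n in range(steps + 1):
        out[n] = dist.sum()           # probability mass still in S-1, S-2 or S-3
        dist = dist @ q               # advance the chain by one transition
    return out


def mean_transitions(q: np.ndarray, start: int) -> float:
    """Expected number of transitions before absorption: row sum of N = (I - Q)^-1."""
    n = np.linalg.inv(np.eye(q.shape[0]) - q)
    return float(n[start].sum())


# Hypothetical transient matrix for the symmetric one-out-of-two system (S-1, S-2, S-3).
q = np.array([[0.00, 0.45, 0.45],
              [0.98, 0.00, 0.00],
              [0.98, 0.00, 0.00]])
curve = reliability_curve(q, start=0, steps=600)
print(curve[:5])
print(mean_transitions(q, 0), curve.sum())
```

Because R(n) only ever loses probability mass to S-4, the curve is non-increasing, and a faster decay corresponds directly to fewer mean transitions before failure.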

High diagnostic coverage

The system diagnostic coverage level is assumed to be 99%, and its repair efficiency is 95%. Figure 8 depicts the reliability of the ‘one-out-of-two’ system shown in Fig. 6 for different levels of CCF represented by the β-factor. It can be observed from Fig. 8 that the system has the highest reliability when the β-factor is zero. A zero β-factor represents subsystems A and B that are entirely independent of each other, so simultaneous failure of both subsystems is improbable. Nevertheless, the system reliability decreases rapidly with increasing CCF. This is because the probability of failure due to CCF, represented by the direct transition from state S-1 to S-4, increases.

Fig. 8

Reliability performance at high system diagnostic coverage level and 95% repair efficiency

The results also indicate that the reliability performance is sensitive to changes at low levels of the β-factor. Moreover, each change in the reliability curves can be associated with a different mean number of state transitions, which in turn represents a change in the system reliability level. Figure 9 depicts the reliability of the system when its subsystems have low repair efficiencies. This much-reduced repair effectiveness represents a high level of incomplete and/or incorrect repairs carried out on the system. The objective of this scenario is to investigate the impact of CCF on system reliability performance when the quality of repairs is deficient; hence, the repair efficiency of each subsystem is set to 50% for simulation purposes.

Fig. 9

Reliability performance at high system diagnostic coverage level and 50% repair efficiency

It is noticeable that changes in the β-factor have a relatively lower impact than in the previous scenario. As with 95% repair efficiency, the impact also reduces as the level of CCF increases. As expected, the system reliability becomes zero after fewer time steps, as seen in Fig. 9 for the different levels of CCF represented by the β-factor. However, CCFs appear to have a smaller impact on system reliability at low repair efficiency than at high repair efficiency. This behaviour can be attributed to the reduced effective repair rates of the subsystems. These reduce the likelihood of the system moving from states S-2 and S-3 back to S-1 and increase the likelihood of it moving to state S-4.
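The mechanism described above can be seen directly in the S-2 row of (10). The Python sketch below compares the one-step probabilities of returning to S-1 (P21) and of moving to S-4 (P24) at 95% and 50% repair efficiency. It assumes e_edc = e_dc and that the remaining exit rate λB + μA(1 − e_dcA) leads to S-4. The rates are hypothetical.

```python
def s2_row(lam_other: float, mu: float, dc: float, r_eff: float) -> tuple[float, float]:
    """(P21, P24) for state S-2 under an embedded-chain reading of (10).

    Hypothetical assumptions: e_edc = e_dc, and the remaining exit rate
    lam_other + mu * (1 - dc) leads to the absorbing state S-4.
    """
    repair = mu * dc * r_eff            # detected and effectively repaired
    other = lam_other + mu - mu * dc    # second failure or undetected fault
    return repair / (repair + other), other / (repair + other)


# Hypothetical rates per hour at 99% diagnostic coverage.
for r_eff in (0.95, 0.50):
    p21, p24 = s2_row(lam_other=1e-4, mu=0.125, dc=0.99, r_eff=r_eff)
    print(f"r_eff = {r_eff:.2f}: P21 = {p21:.4f}, P24 = {p24:.4f}")
```

With these illustrative numbers, halving the repair efficiency nearly doubles the one-step probability of moving from S-2 to S-4.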

Figure 10 depicts the mean state transitions at various levels of CCF. The mean number of state transitions is highly sensitive to changes in the β-factor, particularly at low levels of β. This indicates that the presence of CCF significantly reduces system performance regardless of the CCF level. The same behaviour is observed at both repair efficiency levels.

Fig. 10

Mean system state transitions with imperfect repairs—99% diagnostic coverage

Medium diagnostic coverage

The system diagnostic coverage is assumed to be 90%, and the repair efficiency remains unchanged at 95%. Figure 11 depicts the reliability of the system for different levels of CCF represented by the β-factor. Again, Fig. 11 shows that the system has its highest reliability when the β-factor is zero, as in the 99% coverage scenario. As expected, the reliability decreases with increasing CCF level as the system failure probability increases. Because more system faults remain hidden, the reliability becomes zero after far fewer state transitions than at the high diagnostic coverage of 99%.
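The Python sketch below shows why lower coverage shortens the transient phase. It evaluates the mean number of transitions before absorption from S-1 at β = 0. It uses the same hypothetical symmetric rates and embedded-chain reading of (10), with e_edc = e_dc. At β = 0 the chain cycles S-1 → S-2 or S-3 → S-1 with return probability p = P21 = P31, so the mean reduces to the closed form 2/(1 − p). The values are illustrative and are not intended to reproduce Figs. 8, 11 or 14.

```python
import numpy as np


def mean_transitions_beta0(dc, lam=1e-4, mu=0.125, r_eff=0.95):
    """Mean transitions before absorption from S-1 at beta = 0 (hypothetical symmetric rates)."""
    p_rep = mu * dc * r_eff / (mu * dc * r_eff + lam + mu * (1 - dc))  # P21 = P31
    # At beta = 0 with equal failure rates, S-1 moves to S-2 or S-3 with probability 1/2 each.
    q = np.array([[0.0, 0.5, 0.5],
                  [p_rep, 0.0, 0.0],
                  [p_rep, 0.0, 0.0]])
    n = np.linalg.inv(np.eye(3) - q)               # fundamental matrix N
    return float(n[0].sum()), 2.0 / (1.0 - p_rep)  # numerical value and closed form


for dc in (0.99, 0.90, 0.60):
    numeric, closed = mean_transitions_beta0(dc)
    print(f"DC = {dc:.0%}: {numeric:.1f} mean transitions (closed form {closed:.1f})")
```

Lower coverage routes more of the exit rate of S-2 and S-3 to S-4, so p falls and the mean number of transitions before failure drops sharply.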

Fig. 11

Reliability performance at medium system diagnostic coverage level and 95% repair efficiency

Figure 12 depicts the system reliability when its subsystems have low repair efficiencies of 50%. The impact of CCF is relatively uniform across the β-factor levels, and its relative impact is less than in the scenario with 95% repair efficiency. As with 95% repair efficiency, the impact also reduces uniformly as the level of CCF increases.

Fig. 12

Reliability performance at medium system diagnostic coverage level and 50% repair efficiency

As expected, the system reliability becomes zero after fewer time steps, as depicted in Fig. 12 for the different levels of CCF represented by the β-factor. However, CCF appears to have a smaller effect on system reliability at low repair efficiency than at the high repair efficiency of 95%.

This behaviour can be attributed to the reduced subsystem repair rates. These reduce the likelihood of the system moving from states S-2 and S-3 back to S-1 and increase the likelihood of it moving to state S-4. Figure 13 depicts the mean system state transitions at various CCF levels. As expected, the mean number of state transitions is only marginally sensitive to changes in the β-factor. As with the high coverage of 99%, this observation holds for both repair efficiency levels. However, the number of transitions is significantly reduced, particularly at low levels of β.

Fig. 13

Mean system state transitions with imperfect repairs – 90% diagnostic coverage

Low diagnostic coverage

The system diagnostic coverage level is assumed to be 60% for this case study. Initially, the repair efficiency is 95%, as in the previous case studies. Figure 14 depicts the reliability of the system for different levels of CCF represented by the β-factor levels. It is noticeable again that the system has the highest reliability performance level when the β-factor is zero, as in the previous case studies with 99% and 90% diagnostic coverage levels.

Fig. 14

Reliability performance at low system diagnostic coverage level and 95% repair efficiency

In contrast to the results obtained at diagnostic coverages of 99% and 90%, the system reliability decreases only marginally with increasing CCF.

Moreover, for β = 0 the reliability becomes zero after only 20 transitions, compared with 950 and 90 transitions at diagnostic coverages of 99% and 90%, respectively, because more system faults remain hidden. In addition, the system has low sensitivity to changes in β. Figure 15 depicts the system reliability when its subsystems have low repair efficiencies of 50%.

Fig. 15

Reliability performance at low system diagnostic coverage level and 50% repair efficiency

Changes in the β-factor have a relatively lower impact than with 95% repair efficiency. Again, the impact reduces as the level of CCF increases. The system reliability becomes zero after fewer time steps, as depicted in Fig. 15 for the different levels of CCF represented by the β-factor. Moreover, the impact of CCF on system reliability appears to be proportionally the same at both repair efficiency levels. This behaviour can be attributed to the reduced repair rates of the subsystems. These reduce the likelihood of the system moving from states S-2 and S-3 back to S-1 and increase the likelihood of it moving to state S-4. Figure 16 depicts the mean state transitions at various CCF levels, indicating that they are relatively insensitive to changes in the β-factor.

Fig. 16

Mean system state transitions with imperfect repairs – 60% diagnostic coverage

Sensitivity of system reliability

This section presents the results of the sensitivity and elasticity analysis of the system performance. The analysis is based on the mean number of state transitions of an absorbing Markov chain and on matrix calculus. The symbol Sxy denotes the mean transitions into state S-y when the system's initial state is S-x.
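The formulas are not restated in this section. Assume, consistently with the matrix-calculus approach of [58], that the reported values are derivatives of the entries n_xy of the fundamental matrix. The sensitivity s_xy and elasticity ε_xy of the mean transitions from S-x into S-y can then be written as:

$${\varvec{N}}=({\varvec{I}}-{\varvec{Q}})^{-1},\qquad \frac{d\,\mathrm{vec}\,{\varvec{N}}}{d\beta}=\left({\varvec{N}}^{\mathsf T}\otimes{\varvec{N}}\right)\frac{d\,\mathrm{vec}\,{\varvec{Q}}}{d\beta},\qquad s_{xy}=\frac{\partial n_{xy}}{\partial\beta},\qquad \varepsilon_{xy}=\frac{\beta}{n_{xy}}\,\frac{\partial n_{xy}}{\partial\beta}$$

Elasticity is a proportional measure because it divides by n_xy. The ranking of states by sensitivity and by elasticity therefore need not coincide, which is consistent with the transitions into S-1 having the largest sensitivity but the smallest elasticity in the results below.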

High diagnostic coverage

Figure 17 depicts the system responsiveness to CCF, in terms of sensitivity and elasticity, when the diagnostic coverage is 99% for β = 0.1 and β = 0.5. Fig. 17a shows that the mean transitions into state S-1 are the most sensitive, at −139.7, when the CCF level is 10%.

Fig. 17

Sensitivity and elasticity of system to CCF—High diagnostic coverage

The negative sign indicates that an incremental increase in the CCF level decreases the system's mean state transitions, which implies that the system reliability performance decreases. Again, state S-1 is the most sensitive when β = 0.5, as depicted in Fig. 17b; however, the magnitude of its sensitivity is reduced to −7.6. Although state S-1 has the highest sensitivity, its elasticity is lower than those of the transitions into S-2 and S-3. This observation holds for both CCF levels. Overall, the results in Fig. 17 indicate that the system reliability performance is most sensitive at low β-factor levels when the diagnostic coverage is high.

Medium diagnostic coverage

Figure 18 depicts the system responsiveness to CCF, in terms of sensitivity and elasticity, when the coverage is 90% for β = 0.1 and β = 0.5. Fig. 18a shows that the mean transitions into state S-1 are the most sensitive, at −21.5, when β = 0.1. Again, the negative sign indicates that an incremental increase in the CCF level decreases the mean system state transitions. This implies that the system reliability performance decreases as the level of CCF increases. As in the previous scenario, Fig. 18b shows that state S-1 remains the most sensitive when β = 0.5, even though the diagnostic coverage is reduced.

Fig. 18

Sensitivity and elasticity of system to CCF—Medium diagnostic coverage

However, the magnitude of its sensitivity is reduced further, to −4.9. The elasticity of the transitions into S-1 is again lower than those of the transitions into S-2 and S-3, for both β = 0.1 and β = 0.5. The results confirm that the system performance is most sensitive at low β-factor levels when the diagnostic coverage is medium.

Low diagnostic coverage

Figure 19 depicts the system responsiveness to CCF, in terms of sensitivity and elasticity, when the coverage is 60% for β = 0.1 and β = 0.5. The mean transitions into state S-1 are the most sensitive, at −1.7, when β = 0.1, as depicted in Fig. 19a. Again, the system reliability performance decreases as the level of CCF increases. As in the two previous scenarios, Fig. 19b shows that state S-1 remains the most sensitive when β = 0.5, even though the diagnostic coverage is reduced further to 60%.

Fig. 19

Sensitivity and elasticity of system to CCF—low diagnostic coverage

However, the magnitude of its sensitivity is only marginally reduced, to −1.4. The elasticity of the transitions into S-1 is consistently lower than those of the transitions into S-2 and S-3, for both β = 0.1 and β = 0.5. Again, the results confirm that the system performance is most sensitive at low β-factor levels.

Conclusions

Integrating the β-factor model into the Markov reliability model makes the model flexible enough to investigate various system cases. It enables the impact of CCF to be studied at different levels of imperfect repair (i.e. repair efficiency and system diagnostic coverage). The Markov process provides a comprehensive method of evaluating the responsiveness of system performance to incremental CCF levels through sensitivity and elasticity analysis. The case study results indicate that the existence of CCF significantly reduces system reliability performance. The system is most sensitive at low levels of CCF, that is, to small changes in the β-factor. The impact is greatest when the system diagnostic coverage is 99% (high, based on ISO 13849-1), and it reduces as the diagnostic coverage level reduces. The characteristic impact of CCF is relatively similar for a given level of system diagnostic coverage and repair efficiency, as demonstrated by the case study results. Therefore, it is concluded that the severity of CCF depends more on the system diagnostic coverage level than on the repair efficiency, as evidenced by the sensitivity and elasticity studies, even though both factors impact the overall system performance.

This system response is evident from the case study results. Assuming 99% system diagnostic coverage, the sensitivity of the system's mean state transitions to a CCF level of 10% is −281, while its elasticity is −2.74. Similar behaviour is observed when the diagnostic coverage is 90%, where the system sensitivity is −45.5 at 10% CCF while its elasticity is −1.11. With 60% diagnostic coverage, the system sensitivity is −4.5 and its elasticity is −0.33. Overall, the system sensitivity decreases by 84% when the diagnostic coverage is reduced from 99 to 90%, and by 90% when it is reduced from 90 to 60%. The system response is similar when the CCF level is 50%. In that case, the sensitivity decreases by 32% when the diagnostic coverage is reduced from 99 to 90%, and by 63% when it is reduced from 90 to 60%. The system elasticity indicates how effective managing the CCF level is, as presented in the results.
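The percentage reductions quoted above for 10% CCF follow directly from the reported sensitivities, and the short Python check below recomputes them. The 50% CCF sensitivities are not restated here, so only the 10% case is recomputed.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction in magnitude when a value changes from `before` to `after`."""
    return (abs(before) - abs(after)) / abs(before)


# Sensitivities at 10% CCF reported above, keyed by diagnostic coverage.
sens_10pct = {0.99: -281.0, 0.90: -45.5, 0.60: -4.5}
print(round(100 * relative_reduction(sens_10pct[0.99], sens_10pct[0.90])))  # 84
print(round(100 * relative_reduction(sens_10pct[0.90], sens_10pct[0.60])))  # 90
```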

Hence, the impact of CCF must be considered when developing reliability models of mission-critical systems if their performance is to be determined accurately. Future research will consider diversifying the scheme channels to minimise the impact of CCF on scheme reliability. It will also employ a multiple beta-factor model to determine the impact of the individual channels. The research will also consider the use of global sensitivity analysis methods and will focus on generalising the findings to K-out-of-N (KooN) systems.

Availability of data and materials

Not applicable.

Abbreviations

CCF: Common Cause Failure

CIT: Conventional Instrument Transformers

E/E/PE: Electrical, Electronic and Programmable Electronic

HSR: High-availability Seamless Redundancy

IED: Intelligent Electronic Device

MTTR: Mean Time To Repair

MTTF: Mean Time To Failure

MU: Merging Unit

NCIT: Non-Conventional Instrument Transformers

OPNET: Optimised Network Engineering Tool

PRP: Parallel Redundancy Protocol

RBD: Reliability Block Diagram

RSTP: Rapid Spanning Tree Protocol

SCN: Substation Communication Network

References

  1. Yang, X., Das, N., & Islam, S. (2014). Analysis of IEC 61850 for a reliable communication system between substations. In 2013 Australasian Universities Power Engineering Conference (AUPEC) (pp. 1–6). https://doi.org/10.1109/aupec.2013.6725482.

  2. Mathebula, V. C., & Saha, A. K. (2020). Mission critical safety functions in IEC-61850 based substation automation system: A reliability review. International Journal of Engineering Research in Africa, 48, 149–161. https://doi.org/10.4028/www.scientific.net/jera.48.149

  3. Brand, K. P., Ostertag, M., & Wimmer, W. (2003). Safety related, distributed functions in substations and the standard IEC 61850. In 2003 IEEE Bologna PowerTech—conference proceedings, 2003 (Vol. 2, No. July, pp. 260–264). https://doi.org/10.1109/PTC.2003.1304319.

  4. Magro, M. C., Pinceti, P., & Rocca, L. (2016). Can we use IEC 61850 for safety related functions? In EEEIC 2016—international conference on environment and electrical engineering (pp. 1–6). https://doi.org/10.1109/EEEIC.2016.7555402.

  5. Caserza Magro, M., Pinceti, P., Rocca, L., & Rossi, G. (2019). Safety related functions with IEC 61850 GOOSE messaging. International Journal of Electrical Power & Energy Systems, 104, 515–523. https://doi.org/10.1016/j.ijepes.2018.07.033

  6. Lloyd, M. H., & Reeve, P. J. (2009). IEC 61508 and IEC 61511 assessments—some lessons learned. In 4th IET international conference on systems safety 2009. incorporating the SaRS annual conference. IET, London. https://doi.org/10.1049/cp.2009.1540.

  7. Zhang, Y., Sprintson, A., & Singh, C. (2012). An integrative approach to reliability analysis of an IEC 61850 digital substation. IEEE Power and Energy Society General Meeting. https://doi.org/10.1109/PESGM.2012.6345699

  8. Mathebula, V. C., & Saha, A. K. (2020). Reliability and availability of multi-channel IEC-61850 substation communication networks for mission-critical applications. International Journal of Engineering Research in Africa, 51, 199–216. https://doi.org/10.4028/www.scientific.net/JERA.51.199

  9. De Klerk, M. L., & Saha, A. K. (2020). A review of the methods used to model traffic flow in a substation communication network. IEEE Access, 8, 204545–204562. https://doi.org/10.1109/access.2020.3037143

  10. Bukowski, J. V., & Chalupa, R. (2010). Calculating an appropriate multiplier for βλ when modeling common cause failure in triplex systems. In Proceedings—annual reliability and maintainability symposium (RAMS). San Jose, CA, USA. https://doi.org/10.1109/RAMS.2010.5447996.

  11. Belland, J. R. (2017). Modeling common cause failures in diverse components with fault tree applications. In 2017 Proceedings—annual reliability and maintainability symposium (RAMS). IEEE, Orlando, FL, USA. https://doi.org/10.1109/RAM.2017.7889659

  12. Xing, L., & Wang, W. (2008). Probabilistic common-cause failures analysis. In 2008 Proceedings—annual reliability and maintainability symposium. IEEE, Las Vegas, NV, USA. https://doi.org/10.1109/RAMS.2008.4925821.

  13. Kumar, D., Pahuja, G. L., & Quamara, J. K. (2018). Chemical reactor safety system reliability under common cause failure. In 2018 3rd IEEE international conference on recent trends in electronics, information & communication technology (RTEICT) (pp. 2534–2537). IEEE, Bangalore, India. https://doi.org/10.1109/RTEICT42901.2018.9012319.

  14. Kumar, M., Kabra, A., Karmakar, G., & Marathe, P. P. (2015). A review of defences against common cause failures in reactor protection systems. In 2015 4th international conference on reliability, Infocom technologies and optimization: trends and future directions, ICRITO 2015 (pp. 1–6). IEEE, Noida, India. https://doi.org/10.1109/ICRITO.2015.7359232.

  15. Muhammad, Q., Amjad, N., & Zubair, M. (2014). Modeling of common cause failures (CCFs) by using beta factor parametric model. IEEE, Islamabad, Pakistan. https://doi.org/10.1109/ICESP.2014.7347004.

  16. Pourali, M. (2014). Incorporating common cause failures in mission-critical facilities reliability analysis. IEEE Transactions on Industry Applications, 50(4), 2883–2890. https://doi.org/10.1109/TIA.2013.2295472

  17. Qin, J., Gu, R., & Li, G. (2017). Reliability modeling of incomplete common cause failure systems subject to two common causes. In 2017 IEEE international conference on industrial engineering and engineering management (IEEM). IEEE, Singapore, Singapore (pp. 1906–1910). https://doi.org/10.1109/IEEM.2017.8290223.

  18. Hokstad, P., & Rausand, M. (2008). Common cause failure modeling: Status and trends. Springer.

  19. Zhang, A., Srivastav, H., Barros, A., & Liu, Y. (2021). Study of testing and maintenance strategies for redundant final elements in SIS with imperfect detection of degraded state. Reliability Engineering and System Safety. https://doi.org/10.1016/j.ress.2020.107393

  20. Shao, Q., Yang, S., Bian, C., & Gou, X. (2020). Formal analysis of repairable phased-mission systems with common cause failures. IEEE Transactions on Reliability, 10, 1–12. https://doi.org/10.1109/tr.2020.3032178

  21. Winkovich, T., & Eckardt, D. (2005). Reliability analysis of safety systems using Markov-chain modelling. In 2005 European conference on power electronics and applications. IEEE, Dresden, Germany. https://doi.org/10.1109/epe.2005.219620.

  22. Das, N., & Islam, S. (2015). Analysis of power system communication architectures between substations using IEC 61850. In 5th Brunei international conference on engineering and technology (BICET 2014). https://doi.org/10.1049/cp.2014.1060.

  23. Das, N., Ma, W., & Islam, S. (2015). Analysis of end-to-end delay characteristics for various packets in IEC 61850 substation communications system. In 2015 Australasian universities power engineering conference (AUPEC). IEEE, Wollongong, NSW, Australia. https://doi.org/10.1109/AUPEC.2015.7324831.

  24. Wong, T. J., & Das, N. (2014). Modelling and analysis of IEC 61850 for end-to-end delay characteristics with various packet sizes in modern power substation systems. https://doi.org/10.1049/cp.2014.1073.

  25. Kumar, S., Das, N., & Islam, S. (2014). Performance analysis of substation automation systems architecture based on IEC 61850. https://doi.org/10.1109/AUPEC.2014.6966532.

  26. Khavnekar, A., Wagh, S., & More, A. (2015). Comparative analysis of IEC 61850 edition-I and II standards for substation automation. In 2015 IEEE international conference on computational intelligence and computing research (ICCIC) (pp. 1–6). https://doi.org/10.1109/ICCIC.2015.7435756.

  27. Suhail Hussain, S. M., Aftab, M. A., & Ali, I. (2016). A novel PRP based deterministic, redundant and resilient IEC 61850 substation communication architecture. Perspectives in Science, 8, 747–750. https://doi.org/10.1016/j.pisc.2016.06.077

  28. Araujo, J. Á., Lázaro, J., Astarloa, A., Zuloaga, A., & Gárate, J. I. (2015). PRP and HSR for high availability networks in power utility automation: A method for redundant frames discarding. Journal of Research, 6(5), 2325–2332.

  29. Mnukwa, S., & Saha, A. K. (2020). SCADA and substation automation systems for the Port of Durban power supply upgrade. In 2020 international SAUPEC/RobMech/PRASA conference, SAUPEC/RobMech/PRASA 2020. IEEE, Cape Town, South Africa. https://doi.org/10.1109/SAUPEC/RobMech/PRASA48453.2020.9041078.

  30. Albarakati, A. et al. (2019). Security monitoring of IEC 61850 substations using IEC 62351-7 network and system management. https://doi.org/10.1109/SmartGridComm.2019.8909710.

  31. Pereira, A. T. A., Lisboa, L. A. C., & Lima, A. M. N. (2016). Strategies and techniques applied to IEC 61850 based DSAS architectures. https://doi.org/10.1049/cp.2016.0009.

  32. Stark, J., Wimmer, W., & Majer, K. (2013). Switchgear optimization using IEC 61850-9-2.

  33. Kumar, S., Das, N., & Islam, S. (2015). High performance communication redundancy in a digital substation based on IEC 62439-3 with a station bus configuration. https://doi.org/10.1109/AUPEC.2015.7324838.

  34. Yunus, B., Musa, A., Ong, H. S., Khalid, A. R., & Hashim, H. (2008). Reliability and availability study on substation automation system based on IEC 61850. https://doi.org/10.1109/PECON.2008.4762462.

  35. Rahat, R. M., Imam, M. H., & Das, N. (2019). Comprehensive analysis of reliability and availability of sub-station automation system with IEC 61850. https://doi.org/10.1109/ICREST.2019.8644416.

  36. Kanabar, M. G., & Sidhu, T. S. (2009). Reliability and availability analysis of IEC 61850 based substation communication architectures. In 2009 IEEE power and energy society general meeting (pp. 1–8). https://doi.org/10.1109/PES.2009.5276001.

  37. Ngo, H. et al. (2014). An improved high-availability seamless redundancy (HSR) for dependable substation automation system. https://doi.org/10.1109/ICACT.2014.6779094.

  38. Mekkanen, M., Virrankoski, R., Elmusrati, M., & Antila, E. (2013). Reliability evaluation and comparison for next-generation substation function based on IEC 61850 using Monte Carlo simulation. https://doi.org/10.1109/ICCSPA.2013.6487306.

  39. Andersson, L., Brand, K. P., Brunner, C., & Wimmer, W. (2005). Reliability investigations for SA communication architectures based on IEC 61850. In 2005 IEEE Russia Power Tech, PowerTech (pp. 1–7). https://doi.org/10.1109/PTC.2005.4524707.

  40. Mathebula, V. C., & Saha, A. K. (2021). Multi-state IEC-61850 substation communication network based on markov partitions and symbolic dynamics. Sustainable Energy Grids and Networks. https://doi.org/10.1016/j.segan.2021.100466

  41. Mathebula, V. C., & Saha, A. K. (2021). Impact of imperfect repairs and diagnostic coverage on the reliability of multi-channel IEC-61850 substation communication network. IEEE Access, 9, 2758–2769. https://doi.org/10.1109/ACCESS.2020.3047781

  42. Mathebula, V. C., & Saha, A. K. (2021). Responsiveness of multi-channel IEC-61850 substation communication network reliability performance to changes in repair factors. IEEE Access, 9, 789–800. https://doi.org/10.1109/ACCESS.2020.3046950

  43. Mathebula, V. C., & Saha, A. K. (2021). Sensitivity and elasticity of multi-channel IEC-61850 substation communication networks to imperfect repairs. Sustainable Energy Grids and Networks, 26, 20. https://doi.org/10.1016/j.segan.2021.100443

  44. Ding, L., Wang, H., Jiang, J., & Xu, A. (2017). SIL verification for SRS with diverse redundancy based on system degradation using reliability block diagram. Reliability Engineering and System Safety, 165(114), 170–187. https://doi.org/10.1016/j.ress.2017.03.005

  45. Billinton, R., & Allan, R. N. (1984). Reliability evaluation of power systems (2nd ed.). Plenum Publishing Corporation.

  46. Mo, H., Wang, W., Xie, M., & Xiong, J. (2017). Modeling and analysis of the reliability of digital networked control systems considering networked degradations. IEEE Transactions on Automation Science and Engineering, 14(3), 1491–1503. https://doi.org/10.1109/TASE.2015.2443132

  47. Belusso, C. L. M., Sawicki, S., Roos-frantz, F., & Frantz, R. Z. (2016). A study of petri nets, Markov chains and queueing theory as mathematical modelling languages aiming at the simulation of enterprise application integration solutions: a first step. Procedia Comput. Sci., 100, 229–236. https://doi.org/10.1016/j.procs.2016.09.147

  48. Meadows, D. H. (2009). Thinking in systems. Earthscan.

  49. Mkandawire, B. O., Ijumba, N. M., & Saha, A. K. (2015). Component risk trending based on systems thinking incorporating Markov and Weibull inferences. IEEE Systems Journal, 9(4), 1185–1196. https://doi.org/10.1109/JSYST.2014.2363384

  50. Smith, R., Modarres, M. (2020). A physics of failure approach to common cause failure considering component degradation. In 2020 Proceedings—annual reliability and maintainability symposium. IEEE, Palm Springs, CA, USA (pp. 1–6). https://doi.org/10.1109/RAMS48030.2020.9153651.

  51. Mathebula, V. C. (2019). Application of bus transfer schemes of stabilise power supply in a coal fired power plant unit auxiliary reticulation, Durban, South Africa, SE-08, 2019. https://researchspace.ukzn.ac.za/handle/10413/17029.

  52. Mathebula, V. C., Saha, A. K. (2019). Coal fired power plant in-phase bus transfer simulation of forced and induced draught fan motors. In Proceedings—2019 Southern African universities power engineering conference/robotics and mechatronics/pattern recognition association of South Africa, SAUPEC/RobMech/PRASA 2019. IEEE, Bloemfontein, South Africa (pp. 293–298). https://doi.org/10.1109/RoboMech.2019.8704820.

  53. Mathebula, V. C., & Saha, A. K. (2019). Development of In-Phase Bus Transfer Scheme Using Matlab Simulink. In Proceedings—2019 Southern African Universities power engineering conference/robotics and mechatronics/pattern recognition association of South Africa, SAUPEC/RobMech/PRASA 2019, no. 6. IEEE, Bloemfontein, South Africa (pp. 275–280). https://doi.org/10.1109/RoboMech.2019.8704815.

  54. Mkandawire, B. O., Ijumba, N., & Saha, A. K. (2015). Transformer risk modelling by stochastic augmentation of reliability-centred maintenance. Electric Power Systems Research, 119, 471–477. https://doi.org/10.1016/j.epsr.2014.11.005

  55. Yi, H., Cui, L., & Gao, H. (2020). Reliabilities of some multistate consecutive κ systems. IEEE Transactions on Reliability, 69(2), 414–429. https://doi.org/10.1109/TR.2019.2897726

  56. Bukowski, J. V., & Van Beurden, I. (2009). Impact of proof test effectiveness on safety instrumented system performance. In Proceedings—annual reliability and maintainability symposium (pp. 157–163). https://doi.org/10.1109/RAMS.2009.4914668.

  57. Torres, E. S., Sriramula, S., Celeita, D., & Ramos, G. (2020). Reliability model and sensitivity analysis for electrical/electronic/programmable electronic safety-related systems. IEEE Transactions on Industry Applications, 56(4), 3422–3430. https://doi.org/10.1109/TIA.2020.2990583

  58. Caswell, H. (2019). Sensitivity analysis: Matrix methods in demography and ecology. Springer.

  59. Caswell, H. (2013). Sensitivity analysis of discrete Markov chains via matrix calculus. Linear Algebra and its Applications, 438(4), 1727–1745. https://doi.org/10.1016/j.laa.2011.07.046

  60. Spiegel, M. R. (1983). Advanced mathematics for engineers and scientists, S.I.ed. Singapore: Schaum’s Outline Series.

  61. Mathebula, V. C., & Saha, A. K. (2021). Impact of quality of repairs and common cause failures on the reliability performance of intra-bay IEC 61850 substation communication network architecture based on Markov and linear dynamical systems. IEEE Access, 9, 112805–112820. https://doi.org/10.1109/ACCESS.2021.3104020

  62. Porras-Vázquez, A., & Romero-Pérez, J. A. (2018). A new methodology for facilitating the design of safety-related parts of control systems in machines according to ISO 13849:2006 standard. Reliability Engineering and System Safety, 174, 60–70. https://doi.org/10.1016/j.ress.2018.02.018

  63. Fukuda, T., Hirayama, M., Kasai, N., & Sekine, K. (2007). Evaluation of operative reliability of safety-related part of control system of machine and safety level. In Proceedings of the SICE annual conference (pp. 2480–2483). https://doi.org/10.1109/SICE.2007.4421406.

  64. Lerévérend, P. (2008). Inside the standardization jungle: IEC 62061 and ISO 13849-1, complementary or competing? In 2008 5th petroleum and chemical industry conference Europe—electrical and instrumentation applications. IEEE, Weimar, Germany. https://doi.org/10.1109/PCICEUROPE.2008.4563534.

  65. Srivastav, H., Barros, A., & Lundteigen, M. A. (2020). Modelling framework for performance analysis of SIS subject to degradation due to proof tests. Reliability Engineering and System Safety, 195, 106702. https://doi.org/10.1016/j.ress.2019.106702

Acknowledgements

The University of KwaZulu-Natal, School of Engineering, supported this research work.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Contributions

VCM: Conceptualization, Methodology, Investigation, Software, Formal analysis, Writing—Original Draft, Review and Editing, Writing—Revised Manuscript, Review and Final Editing. AKS: Supervision, Conceptualization, Writing—Review and Editing. All the authors read and approved the final manuscript.

Corresponding author

Correspondence to Vonani Clive Mathebula.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Mathebula, V.C., Saha, A.K. Reliability of IEC 61850 based substation communication network architecture considering quality of repairs and common cause failures. Prot Control Mod Power Syst 7, 13 (2022). https://doi.org/10.1186/s41601-022-00234-1

  • DOI: https://doi.org/10.1186/s41601-022-00234-1

Keywords

  • IEC 61850
  • Substation communication network (SCN)
  • Reliability
  • Common cause failure (CCF)
  • Diagnostic coverage
  • Repair efficiency
  • Sensitivity
  • Elasticity
  • Markov
  • Matrix calculus