A denoising-classification neural network for power transformer protection

Abstract

Artificial intelligence (AI) can potentially improve the reliability of transformer protection by fusing multiple features. However, owing to the scarcity of inrush current and internal fault data, existing methods suffer from poor generalizability. In this paper, a denoising-classification neural network (DCNN) is proposed, one which integrates a convolutional auto-encoder (CAE) and a convolutional neural network (CNN), and is used to develop a reliable transformer protection scheme by identifying the exciting voltage-differential current curve (VICur). In the DCNN, the CAE shares its encoder with the CNN, which combines the encoder and a classifier. Through the interaction of the CAE reconstruction process and the CNN classification process, the CAE regards the saturated features of the VICur as noise and removes them accurately, thereby guiding the CNN to focus on the unsaturated features of the VICur. The unsaturated part of the VICur approximates an ellipse and differs significantly between a healthy and a faulty transformer. Therefore, the unsaturated features extracted by the CNN help decrease the data ergodicity requirement of AI and improve generalizability. Finally, the CNN trained within the DCNN is used to develop a protection scheme. PSCAD simulations and dynamic model experiments verify its superior performance.

Introduction

A power transformer is a critical element in a power system. The core issue of transformer protection is discrimination between inrush current and an internal fault. Because of its simplicity and rapid response, differential protection configured with second harmonic restraint [1] has been widely used in power systems. However, it can no longer meet the reliability requirement of the increasingly complex power system, because the second harmonic content of inrush current and internal fault current is not consistent [1]. With the rapid development of artificial intelligence (AI) [2, 3], many AI-based protection schemes that fuse multiple features have emerged. Recent work is summarized below.

  1. The first category directly uses differential current as input to a machine learning (ML) algorithm to identify the operating state. The adopted ML algorithms include artificial neural networks (ANNs) [4,5,6,7], radial basis function neural networks [8, 9], evolving neural networks [10], probabilistic neural networks [11, 12], the hidden Markov model (HMM) [13], decision trees (DT) [14, 15], random forests (RF) [16], etc. In recent years, deep neural networks have gained much attention for developing transformer protection. Examples include the accelerated convolutional neural network (CNN) presented in [17], and the new CLGNN structure in [18] combining a CNN and a light-gated recurrent unit.

  2. The second category first extracts data features from the differential current, and then uses them as input to an ML algorithm to identify the operating state. In [19,20,21,22,23,24], various wavelet features of the differential current are extracted by wavelet analysis and used as input to ML algorithms to build transformer protection, such as the support vector machine (SVM) [19, 20], ANN [21], DT [22], Gaussian mixture model [23], and k-nearest neighbors algorithm [24]. Similarly, reference [25] uses the amplitude features of the primary and secondary currents as input to a finite impulse response artificial neural network to build transformer protection.

Generally, AI demands that the training samples cover almost all scenarios in a power system. However, inrush current and internal fault events in on-site transformers are small-probability events whose recorded samples are scarce. Therefore, the on-site transformers in a real power system cannot meet the ergodicity requirement of the training samples, and it is difficult for AI-based protection schemes to perform satisfactorily in a real power system. In AI applications, to improve the classifier's performance, many approaches first use a convolutional auto-encoder (CAE) [26] to extract the main features of the input data before training the classifier. However, the features extracted by the CAE are not always helpful, because they also face the generalizability problem when training samples are scarce.

In summary, to improve the generalizability of AI-based protection schemes, it is critical to decrease the data ergodicity requirement of AI. In this paper, a novel deep neural network called a denoising-classification neural network (DCNN) is proposed and used to develop an AI-based transformer protection scheme by identifying the exciting voltage-differential current curve (VICur) [27,28,29]. Typical VICurs are shown in Fig. 1, including normal operation, healthy transformer energization (inrush current), internal fault, and faulty transformer energization (superposition of inrush and fault currents).

Fig. 1

VICurs under various operating states (experiments)

As shown in Fig. 1, the VICurs of internal fault and normal operation approximate ellipses with different features. When the transformer is energized, iron core saturation distorts the ellipse irregularly. Thus, the VICurs of both healthy and faulty transformer energization have an unsaturated and a saturated part, where the unsaturated parts exhibit the same features as normal operation and internal fault, respectively. Clearly, the unsaturated features of a VICur differ significantly between a healthy and a faulty transformer. If the adopted ML algorithm can focus on the unsaturated part of the VICur and avoid the influence of the saturated part, the extracted features can be used to identify the operating states of the transformer reliably, and help decrease the ergodicity requirement of the training samples.

The proposed DCNN is a new deep structure integrating a CAE and a CNN. The CAE extracts the unsaturated features of the VICur by reconstructing it as its unsaturated part while regarding the saturated part as noise to be removed. It shares its encoder with the CNN, which combines the shared encoder and a classifier to realize the data classification. During training, the DCNN achieves the interaction of the CAE reconstruction and the CNN classification through the shared encoder, so the CAE effectively guides the CNN to focus on the unsaturated part of the VICur. Finally, by attending to the unsaturated part of the VICur, the CNN develops strong generalizability and is used to build an AI-based transformer protection scheme. To a certain extent, the protection scheme developed in this paper avoids the influence of the saturated features of inrush current and decreases the ergodicity requirement of the training samples. PSCAD simulations and dynamic model experiments verify the superior performance of the proposed transformer protection scheme through comparisons with existing work.

Proposed denoising-classification neural network

The comprehensive features of the VICur are exhibited by its image. A CNN has a strong ability to mine and classify the deep features of the VICur image, and the CAE can guide its encoder to extract the unsaturated features and remove the saturated ones through the reconstruction of the input. We propose the DCNN structure to realize the interaction of the CNN and the CAE. Through the guidance of the CAE, the CNN develops the ability to focus on the unsaturated features of the VICur image, and the extracted features complement each other to reliably identify the operating states. Finally, the CNN trained by the DCNN is used to build the protection scheme.

Input of DCNN

The input to the DCNN is a VICur image. Its acquisition process involves calculating and normalizing the exciting voltage and differential current, and converting the discrete data to a grayscale image. First, the exciting voltage and differential current are calculated. The exciting voltage U is approximately equal to the primary voltage and the differential current I is the sum of the primary and secondary currents, as:

$$\begin{aligned} U & = [u_{1} , \ldots ,u_{k} , \ldots ,u_{n} ]^{{\text{T}}} \\ I & = [i_{1} , \ldots ,i_{k} , \ldots ,i_{n} ]^{{\text{T}}} \\ \end{aligned}$$
(1)

where \(u_{k}\) and \(i_{k}\) are the kth instantaneous values, and n is the sample number.
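As a minimal sketch of (1) (the variable names are illustrative, and the turns-ratio referral of the secondary current is an implementation detail the equation does not show):

```python
import numpy as np

def exciting_quantities(u_primary, i_primary, i_secondary):
    """Per (1): the exciting voltage U is approximated by the primary
    voltage, and the differential current I is the per-sample sum of
    the primary and secondary currents."""
    U = np.asarray(u_primary, dtype=float)
    I = np.asarray(i_primary, dtype=float) + np.asarray(i_secondary, dtype=float)
    return U, I
```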

The exciting voltage and differential current are then normalized. This aims to limit the VICur into a fixed range without changing the graphic features. Specifically, the exciting voltage and differential current are normalized by using the same maximum and minimum values, as:

$$\left\{ \begin{gathered} u_{k}^{\prime } = 2 \times \left( {u_{k} - q_{\min } } \right)/\left( {q_{\max } - q_{\min } } \right) \hfill \\ i_{k}^{\prime } = 2 \times \left( {i_{k} - q_{\min } } \right)/\left( {q_{\max } - q_{\min } } \right) \hfill \\ \end{gathered} \right.$$
(2)

where \(u_{k}^{{\prime }}\) and \(i_{k}^{{\prime }}\) are the normalized values, \(q_{\min }\) and \(q_{\max }\) are the minimum and maximum values of the vector Q shown as:

$$Q = \left[ {i_{1} , \ldots ,i_{k} , \ldots ,i_{n} ,u_{1} , \ldots ,u_{k} , \ldots ,u_{n} } \right]^{{\text{T}}}$$
(3)

Finally, the normalized VICur in (2) is converted into a grayscale image of size m × m, which contains the classification information.
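A minimal NumPy sketch of the normalization in (2)–(3) and the rasterization to an m × m grayscale image. The rasterization details, such as plotting the curve as isolated sample points, are assumptions for illustration; the paper only specifies the image size:

```python
import numpy as np

def vicur_to_image(u, i, m=50):
    """Normalize exciting voltage u and differential current i with the
    shared min/max of (2)-(3), then rasterize the VICur as an m x m
    grayscale image (1 = curve pixel, 0 = background)."""
    q = np.concatenate([np.asarray(i, float), np.asarray(u, float)])  # vector Q in (3)
    q_min, q_max = q.min(), q.max()
    u_n = 2 * (np.asarray(u, float) - q_min) / (q_max - q_min)  # u'_k in (2), range [0, 2]
    i_n = 2 * (np.asarray(i, float) - q_min) / (q_max - q_min)  # i'_k in (2)
    img = np.zeros((m, m))
    # map each (i'_k, u'_k) sample to a pixel of the m x m grid
    cols = np.clip((i_n / 2 * (m - 1)).round().astype(int), 0, m - 1)
    rows = np.clip((u_n / 2 * (m - 1)).round().astype(int), 0, m - 1)
    img[rows, cols] = 1.0
    return img
```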

Overview of the proposed DCNN

The proposed DCNN integrates a CAE block and a CNN block which share the encoder, as shown in Fig. 2. During the training process of the DCNN, the CAE block is trained with the objective of minimizing the reconstruction loss, while the CNN block is trained with the objective of minimizing the classification loss. They are then used to build the transformer protection scheme. The details are provided below.

Fig. 2

DCNN structure illustrations

CAE block of DCNN

During the feature extraction of the VICur image, the CAE block regards the saturated features as noise for removal and extracts the unsaturated features. Its structure is shown in Fig. 3.

Fig. 3

CAE block illustrations

The CAE block consists of an encoder and a decoder. \(X_{{\text{U - I}}}^{{\prime }}\) and \(\tilde{X}_{{\text{U - I}}}^{{\prime }}\) are its input and target output, respectively. The target output is the unsaturated part of the input image. Unit k of the encoder involves calculations of convolution, batch normalization, and activation, and the result is written as \(o_{{{\text{e}}k}}\).

The output of the encoder can be represented by (4), where the symbol “*” indicates the convolution calculation, and r is the encoder’s unit number.

$$h_{{{\text{en}}}} \left( {X_{{\text{U - I}}}^{{\prime }} } \right) = o_{{{\text{e1}}}} * \cdots * o_{{{\text{e}}k}} * \cdots * o_{{{\text{e}}r}}$$
(4)

Likewise, unit k of the decoder involves calculations of deconvolution [30, 31], batch normalization, and activation, and the result is represented as \(o_{{{\text{d}}k}}\). The output of the decoder can be represented by (5), where the symbol “**” indicates the deconvolution calculation, and s is the decoder’s unit number.

$$\begin{aligned} \tilde{X}_{{\text{U - I}}}^{{\prime }} & = h_{{{\text{de}}}} \left( {h_{{{\text{en}}}} \left( {X_{{\text{U - I}}}^{{\prime }} } \right)} \right) \\ & = o_{{{\text{d1}}}} * * \cdots * * o_{{{\text{d}}k}} * * \cdots * * o_{{{\text{d}}s}} \\ \end{aligned}$$
(5)

Based on (4) and (5), the reconstruction loss of the CAE block is defined by the mean square error between the actual output and the target output, as:

$$L_{{{\text{CAE}}}} = \frac{1}{{{\text{Dm}}^{2} }}\sum\limits_{{d = 1}}^{{\text{D}}} {\sum\limits_{{u = 1}}^{{\text{m}}} {\sum\limits_{{v = 1}}^{{\text{m}}} {\left( {x^{\prime\prime}_{{duv}} - \tilde{x}^{\prime}_{{duv}} } \right)^{2} } } }$$
(6)

where d indicates the dth training sample, and D is the number of training samples. \(\tilde{x}^{\prime}_{duv}\) and \(x^{\prime\prime}_{duv}\) refer to the pixel values of the target output and actual output in the uth row and vth column, respectively. By minimizing the reconstruction loss in (6), the CAE block reconstructs the VICur image as its unsaturated part, and consequently, the encoder extracts the unsaturated features. Meanwhile, the CNN block of the DCNN is guided by this reconstruction process to develop the ability to focus on the unsaturated part.

CNN block of DCNN

The CNN block realizes the data classification to identify the operating states of the power transformer. Specifically, it performs a binary classification between the healthy transformer (normal operation/external fault and healthy transformer energization) and the faulty transformer (internal fault and faulty transformer energization). Its structure is shown in Fig. 4.

Fig. 4

CNN block illustrations

The CNN block consists of the encoder of the CAE block and a classifier. Unit k of the classifier also involves the calculations of convolution, batch normalization, and activation. The result is designated as \(o_{{{\text{c}}k}}\), and therefore the classifier’s output can be written as:

$$\begin{aligned} S\left( {G\left( {h_{{{\text{en}}}} \left( {X^{\prime}_{{\text{U - I}}} } \right)} \right)} \right) & = S\left( {o_{{{\text{c1}}}} * \cdots * o_{{{\text{c}}k}} * \cdots * o_{{{\text{c}}r}} } \right) \\ & = \left[ {P_{d0} ;P_{d1} } \right] \\ \end{aligned}$$
(7)

where the function S(x) is the softmax function, and the output is a 2-dimensional vector whose elements indicate the probabilities that the dth input image belongs to the healthy or faulty transformer class, respectively. r is the classifier's unit number.

Based on (7), the classification loss of the CNN block is defined by the cross-entropy loss, as:

$$L_{{{\text{CNN}}}} = - \sum\limits_{{d \in {\text{M}}_{{\text{h}}} }} {\log (P_{d0} )} - \sum\limits_{{d \in {\text{M}}_{{\text{f}}} }} {\log (P_{d1} )}$$
(8)

where \({\text{M}}_{{\text{h}}}\) and \({\text{M}}_{{\text{f}}}\) refer to the training samples of healthy and faulty transformer, respectively. With the guidance of the CAE block and the objective of minimizing the classification loss in (8), the CNN block develops the ability to focus on the VICur image's unsaturated part.

Loss function of the proposed DCNN

The loss function of the DCNN is the weighted sum of the reconstruction loss and the classification loss, given as:

$$\begin{aligned} L_{{{\text{DCNN}}}} & = \frac{\alpha }{{{\text{Dm}}^{2} }}\sum\limits_{d = 1}^{{\text{D}}} {\sum\limits_{u = 1}^{{\text{m}}} {\sum\limits_{v = 1}^{{\text{m}}} {\left( {x^{\prime\prime}_{duv} - \tilde{x}^{\prime}_{duv} } \right)^{2} } } } \\ & \quad {\kern 1pt} - \beta \left[ {\sum\limits_{{d \in {\text{M}}_{{\text{h}}} }} {\log (P_{d0} )} + \sum\limits_{{d \in {\text{M}}_{{\text{f}}} }} {\log (P_{d1} )} } \right] \\ \end{aligned}$$
(9)

where α and β are the weights of the reconstruction loss and the classification loss, respectively.

Based on the loss function (9), the CAE block and the CNN block interact through the shared encoder. Through the reconstruction process of the DCNN, the CAE block guides the CNN block to focus on the unsaturated part of the VICur image. Conversely, according to the classification results, the CNN block tests the unsaturated features extracted by the CAE block. Thus, the encoder parameters are determined by the CAE block and the CNN block together, and the features extracted by the encoder suit both an ideal reconstruction process and a satisfactory classification process. Therefore, they are the optimal features for identifying the operating states of the power transformer. Finally, the CNN block trained by the DCNN has improved generalization ability and is used to build a reliable protection scheme.
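A NumPy sketch of the weighted loss in (9), assuming the per-sample reconstructions and softmax probabilities have already been produced by the network (this sketches only the loss arithmetic, not the convolutional blocks):

```python
import numpy as np

def dcnn_loss(x_recon, x_target, p_faulty, labels, alpha=1.5, beta=1.0):
    """Weighted DCNN loss of (9).
    x_recon, x_target: (D, m, m) actual and target CAE outputs
    p_faulty: (D,) softmax probability P_d1 of the faulty class
    labels:   (D,) 0 = healthy transformer, 1 = faulty transformer"""
    D, m, _ = x_recon.shape
    # reconstruction term: mean square error of (6)
    l_cae = np.sum((x_recon - x_target) ** 2) / (D * m ** 2)
    # classification term: cross-entropy of (8)
    p_healthy = 1.0 - p_faulty  # P_d0 of the 2-class softmax
    l_cnn = -(np.sum(np.log(p_healthy[labels == 0]))
              + np.sum(np.log(p_faulty[labels == 1])))
    return alpha * l_cae + beta * l_cnn
```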

Proposed power transformer protection scheme

Figure 5 shows the procedure of the transformer protection scheme, including the following steps.

  1. Calculate the differential current and the exciting voltage according to (1).

  2. Identify whether a disturbance occurs through the start-up criterion. The fault components of the exciting voltage and differential current are used to construct the start-up criterion, as:

    $$\begin{aligned} \Delta u_{k} & = \left| {|u_{k} - u_{k - h} | - |u_{k - h} - u_{k - 2h} |} \right| > K_{{\text{U}}} \;or \\ \Delta i_{k} & = \left| {|i_{k} - i_{k - h} | - |i_{k - h} - i_{k - 2h} |} \right| > K_{{\text{I}}} \\ \end{aligned}$$
    (10)

    where k is the sample index, h is the number of samples in one cycle, and \(K_{{\text{U}}}\) and \(K_{{\text{I}}}\) are thresholds.

  3. Obtain the VICur image. Supposing the start-up criterion (10) is met at the sth sample, the obtained VICur can be represented as (11), where n is the number of samples in the adopted data window. The VICur in (11) is then normalized by the method in (2) and converted into a grayscale image.

    $$X_{s} = \left[ {\left( {i_{s + 1} ,u_{s + 1} } \right), \ldots ,\left( {i_{s + k} ,u_{s + k} } \right), \ldots ,\left( {i_{s + n} ,u_{s + n} } \right)} \right]^{{\text{T}}}$$
    (11)
  4. Identify the operating states of the power transformer. The VICur image of each phase is used as input to the CNN block to determine the operating state of each phase. When at least one phase is identified as "faulty transformer", the differential relay sends a tripping signal.
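Steps 2 and 3 above can be sketched as follows. The threshold values and the per-cycle sample count are placeholders, since the scheme leaves \(K_{{\text{U}}}\), \(K_{{\text{I}}}\), and the sampling rate to the implementation:

```python
import numpy as np

def startup_index(u, i, h, k_u, k_i):
    """Step 2: return the first sample index at which the start-up
    criterion (10) is met, or None if no disturbance is detected.
    h is the number of samples in one cycle; k_u, k_i are thresholds."""
    for k in range(2 * h, len(u)):
        du = abs(abs(u[k] - u[k - h]) - abs(u[k - h] - u[k - 2 * h]))
        di = abs(abs(i[k] - i[k - h]) - abs(i[k - h] - i[k - 2 * h]))
        if du > k_u or di > k_i:
            return k
    return None

def vicur_window(u, i, s, n):
    """Step 3: gather the n samples following start-up, as in (11),
    ready for the normalization of (2) and rasterization."""
    return np.stack([i[s + 1:s + n + 1], u[s + 1:s + n + 1]], axis=1)
```

For a purely periodic waveform the inner differences in (10) cancel, so the criterion only fires when a new transient breaks the cycle-to-cycle symmetry.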

Fig. 5

Logic diagram of transformer protection scheme

Case study

The training samples are collected from PSCAD simulation systems. To improve and verify the generalizability of the proposed protection scheme, the validation samples are obtained from a simulation system whose parameters differ from those of the training samples, while the test samples are collected in dynamic model experiments. Figure 6 illustrates the equivalent model of the PSCAD simulations and dynamic model experiments.

Fig. 6

Model of simulations and experiments

Sample collection

All training samples are obtained from the step-down transformer in the simulation system. The simulation conditions are considered comprehensively, as listed in Table 5 in the "Appendix". In Table 5, NO, EF, HTE, IF, and FTE refer to normal operation, external fault, healthy transformer energization, internal fault, and faulty transformer energization, respectively. For example, the energization time is one of the decisive factors for saturation occurrence and duration, while the faulty turns are an essential factor determining the differential current, the transformer loss, and the terminal voltage. In addition, the magnetization curves, another factor deciding the saturation features, are provided in Table 6 in the "Appendix". The sample numbers are 564, 478, 760, and 760 for normal operation/external fault, healthy transformer energization, internal fault, and faulty transformer energization, respectively.

The validation samples are obtained from a three-winding transformer whose operational conditions are also shown in Table 5, and its magnetization curve is provided in Table 6. The sample numbers are 225, 80, 160, and 160 for normal operation/external fault, healthy transformer energization, internal fault, and faulty transformer energization, respectively.

The test samples are obtained from an experimental transformer consisting of three single transformers. Table 7 in the “Appendix” provides the transformer parameters and the experimental scenarios, e.g., the internal faults are conducted on the primary or secondary side by connecting the contact terminals; the occurrence times of external fault, internal fault, and transformer energization are set randomly; the minimum turn ratio of internal fault is 2.3%, etc. In Table 7, NO, EF, HTE, IF, and FTE refer to normal operation, external fault, healthy transformer energization, internal fault, and faulty transformer energization, respectively. The sample numbers are 48, 58, 63, and 54 for normal operation/external fault, healthy transformer energization, internal fault, and faulty transformer energization, respectively.

Before training the DCNN and building the protection scheme, the raw samples collected in the simulations and the experiments are processed according to the methods in Sect. 2.1. Because the saturation duration of the iron core is no longer than 10 ms when residual flux is not considered, a data window of 12–15 ms contains sufficient unsaturated features. Herein, a data window of 13 ms is adopted, and the size of the VICur image is 50 × 50. The target output of the CAE block is the image of the unsaturated part; the saturated parts of the input images are deleted manually.

Selection of the optimal DCNN

Considering the reconstruction loss, the classification loss, and the classification accuracy of VICur images comprehensively, we select the relatively optimal structure shown in Fig. 7. The meanings of the characters w, c, s, and p are also given in Fig. 7. "Conv2d", "ConvTranspose2d", "ReLU", and "BatchNorm" refer to 2-dimensional convolution, 2-dimensional deconvolution, the ReLU function, and batch normalization, respectively.

Fig. 7

Structure of a relatively optimal DCNN

Table 1 summarizes the reconstruction and classification performance of the DCNN with different loss weights. In Table 1, ACC, RecLoss, and ClaLoss refer to accuracy, reconstruction loss, and classification loss, respectively. The weight β of the classification loss is set to 1.0, and the weight α of the reconstruction loss ranges from 0 to 2.0. As can be seen from Table 1, the DCNNs with the following weights perform the best (results are given in the order of reconstruction loss, classification loss, and classification accuracy):

  1. α = 0.4, β = 1.0. The results of the training samples are 0.3967, 1.7331, and 99.88%, respectively. The results of the validation samples are 0.2299, 2.8638, and 99.84%, respectively.

  2. α = 1.5, β = 1.0. The results of the training samples are 0.2226, 0.2600, and 99.96%, respectively. The results of the validation samples are 0.1702, 5.0990, and 99.68%, respectively.

  3. α = 1.7, β = 1.0. The results of the training samples are 0.9992, 0.9991, and 99.92%, respectively. The results of the validation samples are 0.1803, 0.6478, and 99.84%, respectively.

Table 1 Effects of weights on the performance of the DCNN

The CNN blocks of the above DCNNs are then used as alternatives to classify the test samples. As expected, they show strong generalization ability, with accuracies of 96.38%, 96.84%, and 96.59%, respectively. Therefore, any one of them is promising for building a reliable protection scheme. In the following section, the DCNN with loss weights of 1.5 and 1.0 is taken as an example to detail the training process and the advantages of the proposed protection scheme.

Training and test process of the selected DCNN

Figure 8 details the training process of the selected DCNN with weights of 1.5 and 1.0. According to Fig. 8a, the reconstruction and classification losses decrease gradually as the iterations proceed. As shown in Fig. 8b, by the 100th iteration, the total losses of the training and validation samples drop below 0.1 and 20, respectively.

Fig. 8

Training and validation results

Figure 9 shows the reconstruction results of several VICur images in the validation samples, indicating that the CAE block of the DCNN extracts the unsaturated features effectively. The classification accuracies of the training and validation samples increase to 99.96% and 99.68%, respectively, indicating that the CNN block of the DCNN has the desired classification performance. The test samples are then used to test the generalizability of the CNN.

Fig. 9

Reconstruction results of validation samples

Affected by the complicated transient environment of the experiments, the test samples have different features from the training samples. Because of this, the CAE block of the DCNN fails to completely remove the saturated parts of the test samples, as can be seen from the reconstruction results of some VICur images in Fig. 10. However, the reconstructed images still contain sufficient unsaturated features, because the DCNN has the ability to focus on the unsaturated part of the VICur image. In particular, for normal operation/external fault, healthy transformer energization, and internal fault, the CAE block of the DCNN has satisfactory reconstruction performance. However, since the differential current of faulty transformer energization is the superposition of fault current and inrush current, which have certain similarities, there are no distinct dividing points between the unsaturated and saturated parts. As a result, the CAE block inevitably performs worse for faulty transformer energization. Nevertheless, the faulty and saturated features still differ significantly from the unsaturated features of the healthy transformer, so the reconstruction results of faulty transformer energization do not affect the correct identification of the CNN block.

Fig. 10

Reconstruction results of test samples

Equation (12) details the classification results, where ACC is the classification accuracy of the test samples, TP and TN are the numbers of healthy and faulty transformer scenarios identified correctly, and FP and FN are the numbers of faulty and healthy transformer scenarios identified wrongly, respectively.

$$\begin{aligned} {\text{ACC}} & = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}}{\kern 1pt} \\ & = \frac{99 + 117}{{99 + 117 + 0 + 7}} \times 100\% = 96.86\% \\ \end{aligned}$$
(12)
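The accuracy in (12) is a standard confusion-matrix computation; reproducing the reported figures:

```python
def classification_accuracy(tp, tn, fp, fn):
    """ACC of (12): correctly identified healthy (TP) and faulty (TN)
    scenarios over all test scenarios."""
    return (tp + tn) / (tp + tn + fp + fn)

# the values reported in (12): 216 of 223 test scenarios correct
acc = classification_accuracy(tp=99, tn=117, fp=0, fn=7)  # ≈ 0.9686
```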

In (12), all wrongly identified scenarios are related to healthy transformer energization, owing to the longer saturation duration resulting from remanence. Without remanence, the 13 ms data window adopted in the proposed protection scheme contains sufficient unsaturated features for operating state identification. However, it is difficult for the dynamic model experiments to fully eliminate the effects of remanence when collecting the test samples. Affected by remanence, the unsaturated features may be insufficient for some test samples, as in the example shown in Fig. 11.

Fig. 11

An example of healthy transformer identified wrongly

In Fig. 11a, after the transformer is energized, the iron core is saturated between 0.6786 and 0.6908 s in the first cycle, so the differential current saturation lasts 12.2 ms. Consequently, the VICur image in Fig. 11b contains only 0.8 ms of unsaturated features, and it is inevitably identified wrongly as a "faulty transformer" by the CNN block.

The training samples fully consider various simulation conditions, but the simulated scenarios are far from covering all possible scenarios. In particular, the validation and test samples have different operational conditions from the training samples. Nevertheless, from the reconstruction and classification results, the proposed DCNN effectively extracts the unsaturated features and reliably identifies the operating states of the power transformers. This demonstrates that the DCNN helps the CNN block decrease the ergodicity requirement. Therefore, the CNN block is promising for building an AI-based transformer protection scheme with strong generalizability.

Comparisons with common neural networks

Comparisons with common neural networks are made to verify the improved generalizability of the proposed DCNN, including:

  1. CNN. The VICur images are directly used as input to a CNN to identify the operating states of the power transformer.

  2. CAE and classifier. The CAE extracts features by reconstructing the VICur image as its unsaturated part. These features are then used as input to a classifier to identify the operating states of the power transformer.

These two neural networks adopt the same structure, same initial values, and same samples as the DCNN in Sect. 4.3.

Comparison with CNN

Figure 12 shows the training process of the CNN. As can be seen, the classification accuracies of the training and validation samples increase gradually as the classification loss decreases. By the 70th iteration, the classification loss and accuracy become stable. Finally, the classification accuracies of the training and validation samples reach 99.84% and 99.52%, respectively. It seems that the CNN is trained well and performs well on the simulation samples. The test samples are also used to test the generalizability of the CNN. From the test results, the classification accuracy is only 92.83%, which indicates the CNN's poor generalizability. In contrast, the CNN block of the DCNN develops the ability to focus on the unsaturated part under the guidance of the CAE block. It therefore has better generalizability and classifies the training, validation, and test samples reliably, with classification accuracies of 99.96%, 99.68%, and 96.86%, respectively.

Fig. 12

Training and validation results of CNN

Comparison with CAE and classifier

Figure 13 shows the training processes of the CAE and the classifier. The final reconstruction loss shown in Fig. 13a is lower than 0.2, so the CAE appears to have extracted sufficient main features of the VICur images. The output of the CAE encoder is then used as input to train the classifier. During the training process shown in Fig. 13b, the classifier's parameters are updated and the encoder parameters are fine-tuned to decrease the classification loss. As the classification loss decreases, the classification accuracies of the training and validation samples increase gradually and finally reach 100% and 99.68%, respectively. Hence, the classifier based on the feature extraction of the CAE has good classification performance on both the training and validation samples.

Fig. 13

Training and validation results

Further, the generalizability of the CAE and the classifier is tested on the test samples. Compared with the CNN in Sect. 4.4.1, the classifier has better generalizability, with a classification accuracy of 95.07%, owing to the feature extraction process of the CAE. However, because the training processes of the encoder and the classifier are independent, the training of the encoder does not consider the effects of the updated parameters on the classification performance of the classifier. Therefore, the features extracted by the encoder are not optimal for the sample classification of the classifier. In contrast, the proposed DCNN realizes the interaction of the CAE block's reconstruction process and the CNN block's classification process during training. Therefore, the features extracted by the encoder of the DCNN are optimal for the sample classification of the CNN block. From the results, the CNN block of the DCNN has the best generalization ability, with a classification accuracy of 96.86% for the test samples.

Comparisons with different data size

The size of the training set is adjusted further to compare the classification performance of the three neural networks. The new training samples are randomly selected from the original training samples in proportions of 15%, 30%, 45%, 60%, 75%, and 90%. The validation and test samples, the neural networks, and the initial parameters are unchanged. Table 2 compares the classification accuracies after the training and test processes, together with the determined weights of the DCNN. From Table 2, it can be seen that as the size of the training set increases, the classification performance of the three neural networks improves noticeably. They perform similarly and generalize well to the validation samples, but perform quite differently on the test samples. Comparatively, the CNN block of the DCNN with suitable weights has the best generalizability for any size of training set. For instance, when the size of the training set is 75%, the CNN block of the DCNN with weights of 1.6 and 1.0 performs the best with a classification accuracy of 95.07% for the test samples, whereas the classification accuracies of the other two neural networks are only 91.93% and 93.72%. The results in Table 2 demonstrate that the CNN block determined by the proposed DCNN has significantly improved generalizability.

Table 2 Comparisons under different training sample sizes

Comparisons with other methods

This section compares the proposed protection scheme with other schemes, briefly described below, and highlights its advantages.

  1. Traditional second harmonic restraint (Scheme 1). The threshold of the harmonic restraint is 15%.

  2. ANN-based protection scheme (Scheme 2). The differential current in one cycle is used as input to the ANN to identify the operating states of the power transformer. The ANN adopts a double-hidden-layer structure, with 550 and 10 neurons in the first and second hidden layers, respectively.

  3. RF-based protection scheme (Scheme 3). The differential current in one cycle is used as input to the RF to identify the operating states of each phase. The selected RF has 50 decision trees with a maximum depth of 20, and node splitting in each tree is based on the Gini index.

  4. Wavelet transform and SVM-based protection scheme (Scheme 4) [20]. The detail components (d2–d4) of the differential current, extracted with the db4 mother wavelet up to decomposition level 4, are divided into four equal sections. The average energy of the three phases in each section is then computed and used as input to an SVM to identify the operating states of the power transformer.

  5. Geometric features of the VICur and SVM-based protection scheme (Scheme 5) [27]. The inclination angle, major axis, and ellipticity of the VICur are calculated by the methods in [27], and the three features are combined into a feature vector used as input to the SVM to identify the operating states. The parameters c and g are 18.3792 and 0.3789, respectively.
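For reference, the classic restraint logic of Scheme 1 can be sketched as below. The 10 kHz sampling rate, 50 Hz fundamental, and one-cycle window follow the comparison setup; the half-rectified "inrush-like" waveform is only a crude stand-in for a real inrush current:

```python
import numpy as np

def second_harmonic_ratio(i_diff):
    """Ratio of 2nd-harmonic to fundamental magnitude. With exactly one
    fundamental cycle in the window, DFT bin k holds the k-th harmonic."""
    spectrum = np.abs(np.fft.rfft(i_diff))
    return spectrum[2] / spectrum[1]

def harmonic_restraint_blocks(i_diff, threshold=0.15):
    """Scheme 1: restrain (block tripping) when the 2nd-harmonic content
    exceeds 15% of the fundamental."""
    return second_harmonic_ratio(i_diff) > threshold

fs, f0 = 10_000, 50                      # 10 kHz sampling, 50 Hz system
t = np.arange(fs // f0) / fs             # exactly one 20 ms cycle
fault = np.sin(2 * np.pi * f0 * t)       # near-sinusoidal internal fault current
inrush = np.clip(fault, 0.0, None)       # half-rectified, inrush-like waveform
print(harmonic_restraint_blocks(fault))   # low 2nd harmonic: trip permitted
print(harmonic_restraint_blocks(inrush))  # high 2nd harmonic: restrained
```

As the paper notes, this single-feature criterion fails when an internal fault itself contains significant harmonics, which is what motivates the multi-feature AI-based schemes.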

The above five schemes adopt the same training samples as the proposed protection scheme, as shown in Table 5, and the classification accuracy on the test scenarios in Table 7 is used to compare classification performance. As these schemes require the training and test scenarios to use the same sampling rate, the sampling rate is adjusted to 10 kHz in this comparison. Table 3 summarizes the classification results of Schemes 1–5 and the proposed protection scheme after training and testing.

Table 3 Comparison results of classification accuracy (%)

From Table 3, Scheme 1 performs better than Schemes 2–5 in identifying internal faults and healthy transformer energization, with an accuracy of 94.21%. However, it has the highest rejection rate, 11.11%, when an internal fault occurs, because of harmonics that cannot be ignored. In addition, Scheme 1 is clearly unsuitable for identifying faulty transformer energization and normal operation/external faults. Schemes 2–4 perform satisfactorily for faulty transformer energization, normal operation/external faults, and internal faults, but they have higher malfunction rates when a healthy transformer is energized: their identification accuracies for healthy transformer energization are only 77.59%, 75.86%, and 81.03%, respectively. Except for the normal operation/external fault scenarios, Scheme 5 identifies healthy transformer energization, internal faults, and faulty transformer energization best among Schemes 1–5. The proposed protection scheme shows the best classification performance across all scenarios: when an internal fault occurs or a healthy transformer is energized, its rejection and malfunction rates are the lowest, with classification accuracies of 100% and 87.93%, respectively, and it also identifies faulty transformer energization and normal operation/external faults reliably. In summary, the proposed protection scheme is superior, indicating that the proposed DCNN improves the performance of AI-based transformer protection.

Run time test

To test the runtime of the proposed protection scheme, it is implemented in Python and deployed on the NVIDIA® Jetson AGX Xavier™ developer kit. The test platform is shown in Fig. 14 and the parameters of the developer kit are given in Table 4.

Fig. 14
figure 14

Platform for runtime test

Table 4 Parameters of developer kit

The runtime test on this platform covers only the computation time t of the protection procedure, comprising the data window length \(t_{{{\text{dw}}}}\), the data processing time \(t_{{{\text{dp}}}}\), and the computation time \(t_{{{\text{ct}}}}\) of the CNN block. From the test results, the total computation time of the proposed protection scheme is 25.92 ms, as:

$$\begin{aligned} t & = t_{{{\text{dw}}}} + t_{{{\text{dp}}}} + t_{{{\text{ct}}}} \\ & = (13 + 4.14 + 8.78)\,{\text{ms}} \\ & = 25.92\,{\text{ms}} \\ \end{aligned}$$
(13)

Although the runtime reaches 25.92 ms, it still meets the requirements of relay protection. Given the classification accuracy on the test samples, the proposed DCNN-based transformer protection scheme is practicable.
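As a sanity check on the latency budget, the components above sum as follows (the 40 ms ceiling is an assumed two-cycle deadline used here for illustration, not a figure from the paper):

```python
# Measured components from the Jetson AGX Xavier runtime test (ms)
t_dw, t_dp, t_ct = 13.0, 4.14, 8.78   # data window, data processing, CNN block
t_total = t_dw + t_dp + t_ct
# 40 ms = assumed two-cycle (50 Hz) protection deadline, illustration only
print(f"total: {t_total:.2f} ms, within budget: {t_total < 40.0}")
```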

Conclusion

A power transformer plays an essential role in a power system, so its protection is a critical issue. The traditional scheme, differential protection configured with a second harmonic restraint, has been widely used in power systems. However, as power systems become increasingly complex, this differential protection can no longer meet reliability requirements. Meanwhile, previously proposed AI-based protection schemes have not been accepted in power systems because of the ergodicity requirement on their training samples.

In this paper, a new deep structure called the DCNN is proposed to decrease the ergodicity requirement of the training samples, and a reliable transformer protection scheme is developed by using the DCNN to identify the VICur image. The DCNN integrates a CAE block and a CNN block, where the CAE block shares its encoder with the CNN block. The DCNN uses the CAE block to reconstruct the VICur image as its unsaturated part and uses the CNN block to classify the training samples. Through the interaction in the shared encoder, the CAE block guides the CNN block to focus on the unsaturated part of the VICur image. Because the unsaturated part of the VICur approximates an ellipse and differs distinctly between a healthy and a faulty transformer, the ergodicity requirement of the training samples is decreased significantly. The CNN block trained by the DCNN is therefore well suited to building an AI-based transformer protection scheme with strong generalization ability. PSCAD simulations and dynamic model experiments show that the proposed protection scheme has promising prospects for power systems.

Availability of data and materials

Please contact the corresponding author for data and material requests.

Abbreviations

AI:

Artificial intelligence

CAE:

Convolutional auto-encoder

CNN:

Convolutional neural network

DCNN:

Denoising-classification neural network

VICur:

Exciting voltage-differential current curve

ML:

Machine learning

ANN:

Artificial neural network

HMM:

Hidden Markov model

DT:

Decision tree

RF:

Random forest

SVM:

Support vector machine

NO:

Normal operation

EF:

External fault

HTE:

Healthy transformer energization

IF:

Internal fault

FTE:

Faulty transformer energization

References

  1. Medeiros, R. P., Costa, F. B., Silva, K. M., Muro, J. D. J. C., Júnior, J. R. L., & Popov, M. (2022). A clarke-wavelet-based time-domain power transformer differential protection. IEEE Transactions on Power Delivery, 37(1), 317–328.

  2. Haenlein, M., & Kaplan, A. (2019). A brief history of artificial intelligence: On the past, present, and future of artificial intelligence. California Management Review, 61(4), 5–14.

  3. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

  4. Perez, L. G., Flechsig, A. J., Meador, J. L., & Obradovic, Z. (1994). Training an artificial neural network to discriminate between magnetizing inrush and internal faults. IEEE Transactions on Power Delivery, 9(1), 434–441.

  5. Balaga, H., Gupta, N., & Vishwakarma, D. N. (2015). GA trained parallel hidden layered ANN based differential protection of three phase power transformer. International Journal of Electrical Power & Energy Systems, 67, 286–297.

  6. Segatto, E. C., & Coury, D. V. (2006). A differential relay for power transformers using intelligent tools. IEEE Transactions on Power Systems, 21(3), 1154–1162.

  7. Geethanjali, M., Slochanal, S. M. R., & Bhavani, R. (2008). PSO trained ANN-based differential protection scheme for power transformers. Neurocomputing, 71(4–6), 904–918.

  8. Moravej, Z., Vishwakarma, D. N., & Singh, S. P. (2003). Application of radial basis function neural network for differential relaying of a power transformer. Computers & Electrical Engineering, 29(3), 421–434.

  9. Tripathy, M., Maheshwari, R. P., & Verma, H. K. (2008). Radial basis probabilistic neural network for differential protection of power transformer. IET Generation Transmission & Distribution, 2(1), 43–52.

  10. Moravej, Z. (2005). Evolving neural nets for protection and condition monitoring of power transformer. Electric Power Components and Systems, 33(11), 1229–1236.

  11. Tripathy, M., Maheshwari, R. P., & Verma, H. K. (2010). Power transformer differential protection based on optimal probabilistic neural network. IEEE Transactions on Power Delivery, 25(1), 102–112.

  12. Tripathy, M., Maheshwari, R. P., & Verma, H. K. (2007). Probabilistic neural-network-based protection of power transformer. IET Electric Power Applications, 1(5), 793–798.

  13. Ma, X. X., & Shi, J. (2000). A new method for discrimination between fault and magnetizing inrush current using HMM. Electric Power Systems Research, 56(1), 43–49.

  14. Samantaray, S. R., & Dash, P. K. (2011). Decision tree based discrimination between inrush currents and internal faults in power transformer. International Journal of Electrical Power & Energy Systems, 33(4), 1043–1048.

  15. Ozgonenel, O., & Karagol, S. (2014). Power transformer protection based on decision tree approach. IET Electric Power Applications, 8(7), 251–256.

  16. Shah, A. M., & Bhalja, B. R. (2016). Fault discrimination scheme for power transformer using random forest technique. IET Generation Transmission & Distribution, 10(6), 1431–1439.

  17. Afrasiabi, S., Afrasiabi, M., Parang, B., & Mohammadi, M. (2020). Integration of accelerated deep neural network into power transformer differential protection. IEEE Transactions on Industrial Informatics, 16(2), 865–876.

  18. Afrasiabi, S., Afrasiabi, M., Parang, B., & Mohammadi, M. (2020). Designing a composite deep learning based differential protection scheme of power transformers. Applied Soft Computing, 87, 105975.

  19. Shah, A. M., & Bhalja, B. R. (2013). Discrimination between internal faults and other disturbances in transformer using the support vector machine-based protection scheme. IEEE Transactions on Power Delivery, 28(3), 1508–1515.

  20. Jazebi, S., Vahidi, B., & Jannati, M. (2011). A novel application of wavelet based SVM to transient phenomena identification of power transformers. Energy Conversion and Management, 52(2), 1354–1363.

  21. Mao, P. L., & Aggarwal, R. K. (2001). A novel approach to the classification of the transient phenomena in power transformers using combined wavelet transform and neural network. IEEE Transactions on Power Delivery, 16(4), 654–660.

  22. Bagheri, S., Moravej, Z., & Gharehpetian, G. B. (2018). Classification and discrimination among winding mechanical defects, internal and external electrical faults, and inrush current of transformer. IEEE Transactions on Industrial Informatics, 14(2), 484–493.

  23. Jazebi, S., Vahidi, B., Hosseinian, S. H., & Faiz, J. (2009). Magnetizing inrush current identification using wavelet based Gaussian mixture models. Simulation Modelling Practice and Theory, 17(6), 991–1010.

  24. Thote, P. B., Daigavane, M. B., Daigavane, P. M., & Gawande, S. P. (2017). An intelligent hybrid approach using KNN-GA to enhance the performance of digital protection transformer scheme. Canadian Journal of Electrical and Computer Engineering-Revue Canadienne de Genie Electrique et Informatique, 40(3), 151–161.

  25. Orille, A. L., Khalil, N., & Valencia, J. A. V. (1999). A transformer differential protection based on finite impulse response artificial neural network. Computers & Industrial Engineering, 37(1–2), 399–402.

  26. Yu, J. B., & Zhou, X. K. (2020). One-dimensional residual convolutional autoencoder based characteristic learning for gearbox fault diagnosis. IEEE Transactions on Industrial Informatics, 16(10), 6347–6358.

  27. Jiao, Z. B., & Li, Z. B. (2018). Novel magnetization hysteresis-based power-transformer protection algorithm. IEEE Transactions on Power Delivery, 33(5), 2562–2570.

  28. Li, Z. B., Jiao, Z. B., & He, A. Y. (2020). Knowledge-based artificial neural network for power transformer protection. IET Generation Transmission & Distribution, 14(24), 5782–5791.

  29. Li, Z. B., Jiao, Z. B., & He, A. Y. (2021). Knowledge-based convolutional neural networks for transformer protection. CSEE Journal of Power and Energy Systems, 7(2), 270–278.

  30. Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. In 2015 IEEE International Conference on Computer Vision.

  31. Zeiler, M. D., Krishnan, D., Taylor, G. W., & Fergus, R. (2010). Deconvolutional networks. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

Acknowledgements

Not applicable.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No.: 20210333).

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the research, and read and approved the manuscript. ZJ proposed the initial concept of the VICur, analyzed its feasibility in an AI-based transformer protection scheme, and gave technical guidance throughout the research. ZL modeled the simulation system, conducted the experiments, proposed the new neural network in this paper, achieved its successful application to transformer protection, and wrote the manuscript. AH checked the training and test results and built the platform for testing the runtime of the proposed protection scheme. NX completed the comparison work, wrote the response letter, and corrected the manuscript format.

Corresponding author

Correspondence to Zongbo Li.

Ethics declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

See Tables 5, 6 and 7.

Table 5 Training and validation samples in PSCAD
Table 6 Magnetization curves of training samples (p.u.)
Table 7 Test samples in experiments

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, Z., Jiao, Z., He, A. et al. A denoising-classification neural network for power transformer protection. Prot Control Mod Power Syst 7, 52 (2022). https://doi.org/10.1186/s41601-022-00273-8

Keywords

  • Transformer protection
  • Exciting voltage-differential current curve
  • Convolutional auto-encoder
  • Convolutional neural network
  • Denoising-classification neural network