A denoising-classification neural network for power transformer protection

Artificial intelligence (AI) can potentially improve the reliability of transformer protection by fusing multiple features. However, owing to the scarcity of inrush current and internal fault data, existing methods suffer from poor generalizability. In this paper, a denoising-classification neural network (DCNN) is proposed which integrates a convolutional auto-encoder (CAE) and a convolutional neural network (CNN), and is used to develop a reliable transformer protection scheme by identifying the exciting voltage-differential current curve (VICur). In the DCNN, the CAE shares its encoder with the CNN, which combines the encoder and a classifier. Through the interaction of the CAE reconstruction process and the CNN classification process, the CAE treats the saturated features of the VICur as noise and removes them accurately, thereby guiding the CNN to focus on the unsaturated features of the VICur. The unsaturated part of the VICur approximates an ellipse and differs significantly between a healthy and a faulty transformer. Therefore, the unsaturated features extracted by the CNN help to decrease the data ergodicity requirement of AI and improve generalizability. Finally, the CNN trained by the DCNN is used to develop a protection scheme. PSCAD simulations and dynamic model experiments verify its superior performance.


Introduction
A power transformer is a critical element in a power system. The core issue of transformer protection is the discrimination between inrush current and an internal fault. Because of its simplicity and rapid response, differential protection configured with the second harmonic restraint [1] has been widely used in power systems. However, it can no longer meet the reliability requirement of the increasingly complex power system because the second harmonic content does not reliably distinguish inrush current from an internal fault [1]. With the rapid development of artificial intelligence (AI) [2,3], many AI-based protection schemes have emerged which fuse multiple features. Recent work is summarized below.
(1) The first category directly uses differential current as input to a machine learning (ML) algorithm to identify the operating state. The adopted ML algorithms include artificial neural networks (ANN) [4][5][6][7], radial basis function neural networks [8,9], evolving neural networks [10], probabilistic neural networks [11,12], hidden Markov models (HMM) [13], decision trees (DT) [14,15], random forests (RF) [16], etc. In recent years, deep neural networks have gained much attention for developing transformer protection. Examples include the accelerated convolutional neural network (CNN) presented in [17], and the new structure CLGNN in [18] combining a CNN and a light-gated recurrent unit. (2) The second category first extracts data features from the differential current, and then uses them as input to an ML algorithm to identify the operating state. In [19][20][21][22][23][24], various wavelet features of the differential current are extracted by wavelet analysis and used as input to ML algorithms to build transformer protection, such as support vector machines (SVM) [19,20], ANN [21], DT [22], Gaussian mixture models [23], and the k-nearest neighbors algorithm [24]. Similarly, reference [25] uses the amplitude features of the primary and secondary currents as input to a finite impulse response artificial neural network to build transformer protection.
Generally, AI demands that the training samples cover almost all the scenarios in a power system. However, the inrush current and internal fault of on-site transformers are small-probability events whose recorded samples are scarce. Therefore, the on-site transformers in a real power system cannot meet the ergodicity requirement of the training samples. As a result, it is difficult for AI-based protection schemes to perform satisfactorily in a real power system. In AI applications, to improve the classifier's performance, many schemes first use a convolutional auto-encoder (CAE) [26] to extract the main features of the input data before training the classifier. However, the features extracted by the CAE are not always helpful because they also face the problem of generalizability when the training samples are scarce.
In summary, to improve the generalizability of AI-based protection schemes, it is critical to decrease the data ergodicity requirement of AI. In this paper, a novel deep neural network called a denoising-classification neural network (DCNN) is proposed and used to develop an AI-based transformer protection scheme by identifying the exciting voltage-differential current curve (VICur) [27][28][29]. Typical VICurs are shown in Fig. 1, including normal operation, healthy transformer energization (inrush current), internal fault, and faulty transformer energization (superposition of inrush and fault currents).
From Fig. 1, the VICurs of internal fault and normal operation approximate ellipses with different features. When the transformer is energized, iron core saturation causes the ellipse to distort irregularly. Thus, the VICurs of both healthy and faulty transformer energization have both an unsaturated and a saturated part, where the unsaturated parts exhibit the same features as normal operation and internal fault. Clearly, the unsaturated features of a VICur differ significantly between a healthy and a faulty transformer. If the adopted ML algorithm can focus on the unsaturated part of the VICur and avoid the influence of the saturated part, the extracted features can be used as the basis for identifying the operating states of the transformer reliably, and are useful for decreasing the ergodicity requirement of the training samples.
The proposed DCNN is a new deep structure integrating a CAE and a CNN. The CAE extracts the unsaturated features of the VICur by reconstructing it as the unsaturated part while regarding the saturated part as noise for removal. It shares its encoder with the CNN, so the CNN combines the shared encoder and a classifier and realizes the data classification. During the training process, the DCNN achieves the interaction of the CAE reconstruction and the CNN classification through the shared encoder. Therefore, the CAE effectively guides the CNN to focus on the unsaturated part of the VICur. Finally, by paying attention to the unsaturated part of the VICur, the CNN develops strong generalizability and is used to build an AI-based transformer protection scheme. To a certain extent, the protection scheme developed in this paper can avoid the influence of the saturated features of inrush current and decrease the ergodicity requirement of the training samples. PSCAD simulations and dynamic model experiments verify the superior performance of the proposed transformer protection scheme through comparisons with existing work.

Proposed denoising-classification neural network
The comprehensive features of the VICur can be exhibited by its image. A CNN has a strong ability to mine and classify the deep features of the VICur image, and the CAE can guide its encoder to extract the unsaturated features and remove the saturated features through the reconstruction process of the input. We propose a DCNN structure to realize the interaction of the CNN and the CAE. Through the guidance of the CAE, the CNN develops the ability to focus on the unsaturated features of the VICur image. The extracted comprehensive features complement each other to reliably identify the operating states. Finally, the CNN trained by the DCNN is used to build the protection scheme.

Input of DCNN
The input to the DCNN is a VICur image. Its acquisition involves calculating and normalizing the exciting voltage and differential current, and converting the discrete data into a grayscale image. First, the exciting voltage and differential current are calculated. The exciting voltage U is approximately equal to the primary voltage, and the differential current I is the sum of the primary and secondary currents, as:

U = (u_1, u_2, ..., u_n), I = (i_1, i_2, ..., i_n)  (1)

where u_k and i_k are the kth instantaneous values, and n is the sample number. The exciting voltage and differential current are then normalized. This limits the VICur to a fixed range without changing its graphic features. Specifically, the exciting voltage and differential current are normalized using the same maximum and minimum values, as:

u'_k = (u_k - q_min)/(q_max - q_min), i'_k = (i_k - q_min)/(q_max - q_min)  (2)

where u'_k and i'_k are the normalized values, and q_min and q_max are the minimum and maximum values of the vector Q given by:

Q = (u_1, ..., u_n, i_1, ..., i_n)  (3)

Finally, the normalized VICur in (2) is converted into a grayscale image of size m × m, which carries the classification information.
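As a rough illustration of this pre-processing, the joint normalization in (2)-(3) and the conversion to an m × m grayscale image can be sketched as follows. The function names and the simple pixel-marking rasterization are illustrative assumptions, not the paper's exact implementation:

```python
def normalize_vicur(u, i):
    """Normalize u and i with the SAME min/max so the curve's shape is kept."""
    q = list(u) + list(i)                      # vector Q of Eq. (3)
    q_min, q_max = min(q), max(q)
    span = q_max - q_min
    u_n = [(x - q_min) / span for x in u]      # Eq. (2)
    i_n = [(x - q_min) / span for x in i]
    return u_n, i_n

def rasterize(u_n, i_n, m=50):
    """Convert the normalized curve into an m x m grayscale image (0/1 pixels)."""
    img = [[0.0] * m for _ in range(m)]
    for x, y in zip(u_n, i_n):
        col = min(int(x * (m - 1)), m - 1)
        row = min(int(y * (m - 1)), m - 1)
        img[row][col] = 1.0                    # mark the pixel the curve visits
    return img

# A toy elliptical VICur (illustrative values only)
u = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
i = [1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0, 0.5]
u_n, i_n = normalize_vicur(u, i)
img = rasterize(u_n, i_n, m=50)
```

Because both sequences are normalized with the same q_min and q_max, the aspect ratio of the curve is preserved, which is what keeps the elliptical shape of the unsaturated part intact.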

Overview of the proposed DCNN
The proposed DCNN integrates a CAE block and a CNN block which share the encoder, as shown in Fig. 2. During the training process of the DCNN, the CAE block is trained with the objective of minimizing the reconstruction loss, while the CNN block is trained with the objective of minimizing the classification loss. The trained CNN block is then used to build the transformer protection scheme. The details are provided below.

CAE block of DCNN
During the feature extraction of the VICur image, the CAE block regards the saturated features as noise for removal and extracts the unsaturated features. Its structure is shown in Fig. 3.
The CAE block consists of an encoder and a decoder. X_U-I and X'_U-I are its input and target output, respectively, where the target output is the unsaturated part of the input image. Unit k of the encoder involves calculations of convolution, batch normalization, and activation, and its result is written as o_ek.
The output of the encoder can be represented by (4), where the symbol "*" indicates the convolution calculation, w_ek and b_ek are the convolution kernel and bias of unit k, BN(·) denotes batch normalization, g(·) is the activation function, and r is the encoder's unit number:

o_ek = g(BN(w_ek * o_e(k-1) + b_ek)), k = 1, ..., r, with o_e0 = X_U-I  (4)

Likewise, unit k of the decoder involves calculations of deconvolution [30,31], batch normalization, and activation, and its result is represented as o_dk. The output of the decoder can be represented by (5), where the symbol "**" indicates the deconvolution calculation and s is the decoder's unit number:

o_dk = g(BN(w_dk ** o_d(k-1) + b_dk)), k = 1, ..., s, with o_d0 = o_er  (5)
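A minimal sketch of one encoder unit of (4) is given below, assuming a single-channel "valid" convolution, a batch normalization simplified to standardizing one feature map, and ReLU activation. All names, shapes, and kernel values are illustrative assumptions:

```python
def conv2d_valid(x, k):
    """2-D 'valid' convolution of image x with kernel k (the '*' in Eq. (4))."""
    n, m = len(x), len(x[0])
    kn, km = len(k), len(k[0])
    out = []
    for r in range(n - kn + 1):
        row = []
        for c in range(m - km + 1):
            row.append(sum(x[r + a][c + b] * k[a][b]
                           for a in range(kn) for b in range(km)))
        out.append(row)
    return out

def batch_norm(x, eps=1e-5):
    """Standardize a feature map to zero mean / unit variance (simplified BN)."""
    flat = [v for row in x for v in row]
    mu = sum(flat) / len(flat)
    var = sum((v - mu) ** 2 for v in flat) / len(flat)
    return [[(v - mu) / (var + eps) ** 0.5 for v in row] for row in x]

def relu(x):
    return [[max(0.0, v) for v in row] for row in x]

def encoder_unit(x, k):
    """One unit of Eq. (4): convolution -> batch norm -> activation."""
    return relu(batch_norm(conv2d_valid(x, k)))

x = [[float((r + c) % 4) for c in range(6)] for r in range(6)]  # toy 6x6 image
k = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]        # Laplacian-like kernel
o = encoder_unit(x, k)  # 4x4 non-negative feature map
```

In the actual DCNN these units are stacked r times with learned multi-channel kernels; the sketch only shows the per-unit composition of operations.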
Based on (4) and (5), the reconstruction loss of the CAE block is defined by the mean square error between the actual output and the target output, as:

L_rec = (1/D) Σ_{d=1}^{D} (1/m²) Σ_{u=1}^{m} Σ_{v=1}^{m} (x'_duv - x''_duv)²  (6)

where d indicates the dth training sample, and D is the number of training samples. x'_duv and x''_duv refer to the pixel values of the target output and actual output in the uth row and vth column, respectively. By minimizing the reconstruction loss in (6), the CAE block reconstructs the VICur image as its unsaturated part, and consequently the encoder extracts the unsaturated features. Meanwhile, the CNN block of the DCNN is guided by this reconstruction process to develop the ability to focus on the unsaturated part.
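Assuming the per-image error is averaged over all m × m pixels (an assumption about the exact normalization of the mean square error in (6)), the reconstruction loss can be sketched as:

```python
def reconstruction_loss(targets, outputs):
    """Mean square error of Eq. (6) over D images of size m x m.
    targets[d][u][v] = x'_duv (target), outputs[d][u][v] = x''_duv (actual)."""
    D = len(targets)
    m = len(targets[0])
    total = 0.0
    for tgt, out in zip(targets, outputs):
        for row_t, row_o in zip(tgt, out):
            for t, o in zip(row_t, row_o):
                total += (t - o) ** 2
    return total / (D * m * m)

tgt = [[[1.0, 0.0], [0.0, 1.0]]]       # one 2x2 target image (illustrative)
out = [[[0.5, 0.0], [0.0, 0.5]]]       # its imperfect reconstruction
loss = reconstruction_loss(tgt, out)   # (0.25 + 0.25) / 4 = 0.125
```

A perfect reconstruction gives a loss of exactly zero, which is what drives the decoder output toward the unsaturated target image.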

CNN block of DCNN
The CNN block realizes the data classification to identify the operating states of the power transformer. Specifically, it deals with a binary classification task: the healthy transformer class, comprising normal operation/external fault and healthy transformer energization, and the faulty transformer class, comprising internal fault and faulty transformer energization. Its structure is shown in Fig. 4.
The CNN block consists of the encoder of the CAE block and a classifier. Unit k of the classifier also involves the calculations of convolution, batch normalization, and activation. Its result is designated as o_ck, and the classifier's output can therefore be written as:

y_d = S(o_cr) = (y_d^h, y_d^f)  (7)

where S(x) is the softmax function and r is the classifier's unit number. The output is a 2-dimensional vector whose elements y_d^h and y_d^f indicate the probability that the dth input image belongs to a healthy or a faulty transformer, respectively.
Based on (7), the classification loss of the CNN block is defined by the cross-entropy loss, as:

L_cla = -(1/(M_h + M_f)) [ Σ_{d=1}^{M_h} log y_d^h + Σ_{d=1}^{M_f} log y_d^f ]  (8)

where M_h and M_f refer to the numbers of training samples of the healthy and faulty transformer, respectively. With the guidance of the CAE block and the objective of minimizing the classification loss in (8), the CNN block develops the ability to focus on the unsaturated part of the VICur image.
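The softmax output of (7) and the cross-entropy loss of (8) can be sketched as follows; the logits and labels are illustrative, and the label encoding (0 = healthy, 1 = faulty) is an assumption:

```python
import math

def softmax(z):
    """S(x) in Eq. (7): turn two logits into class probabilities."""
    e = [math.exp(v - max(z)) for v in z]  # subtract max for numerical stability
    s = sum(e)
    return [v / s for v in e]

def classification_loss(logits, labels):
    """Cross-entropy of Eq. (8); labels: 0 = healthy, 1 = faulty transformer."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = softmax(z)
        total -= math.log(p[y])            # penalize low probability on the true class
    return total / len(logits)

logits = [[2.0, -1.0], [-0.5, 1.5]]        # two illustrative samples
labels = [0, 1]                            # healthy, faulty
loss = classification_loss(logits, labels)
```

Confident correct predictions drive the loss toward zero, while a confident wrong prediction is penalized heavily; this is the signal that, through the shared encoder, complements the reconstruction loss.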

Loss function of the proposed DCNN
The loss function of the DCNN is the weighted sum of the reconstruction loss and the classification loss, given as:

L = α L_rec + β L_cla  (9)

where α and β are the weights of the reconstruction loss and the classification loss, respectively.
Based on the loss function (9), the CAE block and the CNN block interact through the shared encoder. From the reconstruction process of the DCNN, the CAE block guides the CNN block to focus on the unsaturated part of the VICur image. Conversely, according to the classification results, the CNN block tests the unsaturated features extracted by the CAE block. Thus, the encoder parameters are determined by the CAE block and the CNN block together. The features extracted by the encoder are suitable for both an ideal reconstruction process and a satisfactory classification process. Therefore, they are the optimal features for identifying the operating states of the power transformer. Finally, the CNN block trained by the DCNN has an improved generalization ability and is used to build a reliable protection scheme.


Proposed power transformer protection scheme
Figure 5 shows the procedure of the transformer protection scheme, including the following steps.
(1) Calculate the differential current and the exciting voltage according to (1).
(2) Identify whether a disturbance occurs through the start-up criterion. The fault components of the exciting voltage and differential current are used to construct the start-up criterion, as:

|u_k - u_(k-h)| > K_U or |i_k - i_(k-h)| > K_I  (10)

where k denotes the kth sampling data, h is the sample number in one cycle, and K_U and K_I are thresholds.
(3) Obtain the VICur image. Suppose the start-up criterion (10) is met at the sth sampling data; the obtained VICur can then be represented as:

X_s = ((i_(s+1), u_(s+1)), ..., (i_(s+k), u_(s+k)), ..., (i_(s+n), u_(s+n)))  (11)

where n is the sample number in the adopted data window. The VICur in (11) is normalized through the method in (2), and is then converted into a grayscale image.
(4) Identify the operating states of the power transformer. The VICur image of each phase is used as input to the CNN block to determine the operating state of each phase. When at least one phase is identified as "faulty transformer", the differential relay sends a tripping signal.
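The start-up and windowing steps above can be sketched as follows. As an assumption (the paper's exact fault-component extraction is not reproduced here), the fault component at sample k is approximated by the change relative to one cycle (h samples) earlier; all signals and thresholds are illustrative:

```python
import math

def startup_index(u, i, h, K_U, K_I):
    """Return the first sample s at which the fault component of the exciting
    voltage or differential current exceeds its threshold, else None."""
    for k in range(h, len(u)):
        du = abs(u[k] - u[k - h])      # assumed fault component of exciting voltage
        di = abs(i[k] - i[k - h])      # assumed fault component of differential current
        if du > K_U or di > K_I:
            return k
    return None

def vicur_window(u, i, s, n):
    """Eq. (11): the n samples following the start-up instant s."""
    return [(i[k], u[k]) for k in range(s + 1, s + n + 1)]

h, n = 20, 26                          # samples per cycle, window length (toy values)
u = [math.sin(2 * math.pi * k / h) for k in range(3 * h)]
i = [0.0] * (3 * h)
i[30:] = [1.0 + math.sin(2 * math.pi * k / h) for k in range(30, 3 * h)]  # disturbance at k = 30

s = startup_index(u, i, h, K_U=0.1, K_I=0.1)
window = vicur_window(u, i, s, n)      # the VICur handed to the pre-processing step
```

Because the voltage stays periodic, only the differential-current channel trips the criterion in this toy case, exactly at the sample where the disturbance is injected.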

Case study
The training samples are collected from PSCAD simulation systems. To improve and verify the generalizability of the proposed protection scheme, the validation samples are obtained from a simulation system whose parameters differ from those of the training samples. Test samples are collected from the dynamic model experiments. Figure 6 illustrates the equivalent model of the PSCAD simulations and dynamic model experiments.

Sample collection
All training samples are obtained from the step-down transformer in the simulation system. The simulation conditions, which have been given full consideration, are provided in Table 5 in the "Appendix". In Table 5, NO, EF, HTE, IF, and FTE refer to normal operation, external fault, healthy transformer energization, internal fault, and faulty transformer energization, respectively. For example, the energization time is one of the decisive factors for the saturation occurrence and duration; the faulty turn is an essential factor that determines the differential current, the transformer loss, and the terminal voltage. In addition, the magnetization curves, another factor deciding the saturation features, are provided in Table 6 in the "Appendix". The sample numbers are 564, 478, 760, and 760 for normal operation/external fault, healthy transformer energization, internal fault, and faulty transformer energization, respectively. The validation samples are obtained from a three-winding transformer whose operational conditions are also shown in Table 5, and its magnetization curve is provided in Table 6. The sample numbers are 225, 80, 160, and 160 for normal operation/external fault, healthy transformer energization, internal fault, and faulty transformer energization, respectively. The test samples are obtained from an experimental transformer consisting of three single transformers. Table 7 in the "Appendix" provides the transformer parameters and the experimental scenarios, e.g., the internal faults are conducted on the primary or secondary side by connecting the contact terminals; the occurrence times of external fault, internal fault, and transformer energization are set randomly; the minimum turn ratio of internal fault is 2.3%, etc.
The sample numbers are 48, 58, 63, and 54 for normal operation/external fault, healthy transformer energization, internal fault, and faulty transformer energization, respectively.
Before training the DCNN and building the protection scheme, the raw samples collected in the simulations and the experiments are processed according to the methods in Sect. 2.1. Because the saturation duration of the iron core is no larger than 10 ms when the residual flux is not considered, a data window of 12-15 ms can contain sufficient unsaturated features. Herein, a data window of 13 ms is adopted, and the size of the VICur image is 50 × 50. The target output of the CAE block is the image of the unsaturated part, and the saturated parts of the input images are deleted manually.

Selection of the optimal DCNN
Considering the reconstruction and classification losses and the classification accuracy of VICur images comprehensively, we select a relatively optimal structure, shown in Fig. 7. The meanings of the characters w, c, s, and p are also given in Fig. 7. "Conv2d", "ConvTranspose2d", "ReLU", and "BatchNorm" refer to 2-dimensional convolution, 2-dimensional deconvolution, the ReLU function, and batch normalization, respectively. Table 1 summarizes the reconstruction and classification performance of the DCNN with different loss weights. In Table 1, ACC, RecLoss, and ClaLoss refer to accuracy, reconstruction loss, and classification loss, respectively. The weight β of the classification loss is set to 1.0, and the weight α of the reconstruction loss ranges from 0 to 2.0. As can be seen from Table 1, three weight settings perform the best in terms of the reconstruction loss, classification loss, and classification accuracy. The CNN blocks of these DCNNs are then used as the alternatives to classify the test samples. As expected, they show strong generalization ability, with accuracies of 96.38%, 96.84%, and 96.59%, respectively. Therefore, any one of them is promising for building a reliable protection scheme. In the following section, the DCNN with loss weights of 1.5 and 1.0 is taken as an example to detail the training process and the advantages of the proposed protection scheme. As shown in Fig. 8a, the reconstruction and classification losses decrease gradually as the iteration number increases. As shown in Fig. 8b, when the iteration progresses to the 100th step, the total losses of the training and validation samples drop below 0.1 and 20, respectively. Figure 9 shows the reconstruction results of several VICur images in the validation samples, indicating that the CAE block of the DCNN extracts the unsaturated features effectively.
The classification accuracies of the training and validation samples increase to 99.96% and 99.68%, respectively. This indicates that the CNN block of the DCNN has the desired classification performance. The test samples are then used to test the generalizability of the CNN block.

Training and test process of the selected DCNN
Since they are affected by the complicated transient environment of the experiments, the test samples have different features from the training samples. Because of these different features, the CAE block of the DCNN fails to completely remove the saturated parts of the test samples, as can be seen from the reconstruction results of some of the VICur images in Fig. 10. However, the reconstructed images still contain sufficient unsaturated features because the DCNN has the ability to focus on the unsaturated part of the VICur image. In particular, for normal operation/external fault, healthy transformer energization, and internal fault, the CAE block of the DCNN has satisfactory reconstruction performance. However, since the differential current of faulty transformer energization is the superposition of fault current and inrush current, which have certain similarities, there are no distinct dividing points between the unsaturated and saturated parts. As a result, it is inevitable that the reconstructed images of faulty transformer energization retain some saturated features. All the scenarios identified wrongly are related to healthy transformer energization, owing to the longer saturation duration resulting from remanence. With no consideration of remanence, the data window of 13 ms adopted in the proposed protection scheme contains sufficient unsaturated features for operating state identification. However, it is difficult for the dynamic model experiments to fully eliminate the effects of remanence when collecting the test samples. Affected by the remanence, the unsaturated features may be insufficient for some test samples, as in the example shown in Fig. 11. In Fig. 11a, after the transformer is energized, the iron core is saturated between 0.6786 and 0.6908 s in the first cycle. Therefore, the duration of differential current saturation is 12.2 ms. Consequently, the VICur image in Fig. 11b contains only 0.8 ms of unsaturated features. Inevitably, it is wrongly identified as a "faulty transformer" by the CNN block.
The training samples fully consider various simulation conditions but the simulated scenarios are far from covering all possible scenarios. In particular, the validation and test samples have different operational conditions from the training samples. However, from the reconstruction and classification results, the proposed DCNN can effectively extract the unsaturated features and reliably identify the operating states of the power transformers. It effectively proves that the DCNN helps the CNN block decrease the ergodicity requirements. Therefore, the CNN block is promising for building an AI-based transformer protection scheme with a strong generalizability.

Comparisons with common neural networks
Comparisons with common neural networks are made to verify the improved generalizability of the proposed DCNN, including: (1) CNN. The VICur images are directly used as input to a CNN to identify the operating states of the power transformer. (2) CAE and classifier. The CAE extracts the features by reconstructing the VICur image as its unsaturated part, and the extracted features are then used as input to a classifier to identify the operating states of the power transformer.
These two neural networks adopt the same structure, the same initial values, and the same samples as the DCNN in Sect. 4.3. Figure 12 shows the training process of the CNN. As can be seen, the classification accuracies of the training and validation samples increase gradually as the classification loss decreases. When iterating to the 70th step, the classification loss and accuracy become stable. Finally, the classification accuracies of the training and validation samples reach 99.84% and 99.52%, respectively. It appears that the CNN is trained well and performs well on the simulation samples. The test samples are also used to test the generalizability of the CNN. From the test results, the classification accuracy of the test samples is only 92.83%, which indicates the CNN's poor generalizability. In contrast, the CNN block of the DCNN develops the ability to focus on the unsaturated part under the guidance of the CAE block. Therefore, it has better generalizability and can classify the training, validation, and test samples reliably, with classification accuracies of 99.96%, 99.68%, and 96.86%, respectively.

Figure 13 shows the training processes of the CAE and the classifier. The final reconstruction loss shown in Fig. 13a is lower than 0.2, so the CAE appears to have extracted sufficient main features of the VICur images. The output of the CAE encoder is then used as input to train the classifier. During the training process shown in Fig. 13b, the classifier's parameters are updated, and the encoder parameters are fine-tuned to decrease the classification loss. With the decrease of the classification loss, the classification accuracies of the training and validation samples increase gradually and finally reach 100% and 99.68%, respectively. Hence, the classifier based on the feature extraction of the CAE has good classification performance on both the training and validation samples.

Comparison with CAE and classifier
Further, the generalizability of the CAE and the classifier is tested on the test samples. Compared with the CNN in Sect. 4.4.1, the classifier has better generalizability, with a classification accuracy of 95.07%, due to the feature extraction process of the CAE. However, because the training processes of the encoder and the classifier are independent, the training of the encoder does not consider the effects of the updated parameters on the classification performance of the classifier. Therefore, the features extracted by the encoder are not optimal for the sample classification of the classifier. Comparatively, the proposed DCNN realizes the interaction of the CAE block's reconstruction process and the CNN block's classification process during the training process of the DCNN. Therefore, the features extracted by the encoder of the DCNN are optimal for the sample classification of the CNN block. From the results, the CNN block of the DCNN has the best generalization ability, with a classification accuracy of 96.86% for the test samples.

Comparisons with different data size
The size of the training samples is adjusted further to compare the classification performance of the three neural networks. The new training samples are randomly selected from the original training samples in the proportions of 15%, 30%, 45%, 60%, 75%, and 90%. The validation and test samples, the neural networks, and the initial parameters are unchanged. Table 2 compares the classification accuracies after the training and test processes, together with the determined weights of the DCNN. From Table 2, it can be seen that as the size of the training samples increases, the classification performance of the three neural networks improves noticeably. They perform similarly, with good generalizability on the validation samples, but perform quite differently on the test samples. Comparatively, the CNN block of the DCNN with suitable weights has the best generalizability for any size of training samples. For instance, when the size of the training samples is 75%, the CNN block of the DCNN with weights of 1.6 and 1.0 performs the best, with a classification accuracy of 95.07% for the test samples, whereas the classification accuracies of the other two neural networks are only 91.93% and 93.72%. The results in Table 2 demonstrate that the CNN block determined by the proposed DCNN has significantly improved generalizability.

Comparisons with other methods
This section compares the proposed protection scheme with other schemes, briefly described below, to highlight its advantages.
(1) Traditional second harmonic restraint (Scheme 1). The threshold of the harmonic restraint is 15%.
(2) ANN-based protection scheme (Scheme 2). The differential current in one cycle is used as input to the ANN to identify the operating states of the power transformer. It adopts a structure of two hidden layers, where the neuron numbers of the first and second hidden layers are 550 and 10, respectively.
(3) RF-based protection scheme (Scheme 3). The differential current in one cycle is used as input to the RF to identify the operating states of each phase. The decision tree number of the selected RF is 50, with a maximum depth of 20. The node splitting of each decision tree is based on the Gini index.
(4) Wavelet transform and SVM based protection scheme (Scheme 4) [20]. The detail components (d2-d4) of the differential current, extracted with a db4 mother wavelet over level d4, are divided into four equal sections. The average energy of the three phases in each section is then computed and used as input to an SVM to identify the operating states of the power transformer.
(5) Geometric features of the VICur and SVM-based protection scheme (Scheme 5) [27]. The inclination angle, major axis, and ellipticity of the VICur are calculated by the methods provided in [27]. The three features are combined into a feature vector used as input to the SVM to identify the operating states. The parameters c and g are 18.3792 and 0.3789, respectively.
The above five schemes adopt the same training samples as the proposed protection scheme, as shown in Table 5. The classification accuracy of the test scenarios in Table 7 is used to compare the classification performance. As these schemes require the training and test scenarios to adopt the same sampling rate, the sampling rate is thus adjusted to 10 kHz in this comparison. Table 3 summarizes the classification results of Schemes 1-5 after the training and test processes, and the proposed protection scheme.
From Table 3, Scheme 1 performs better than Schemes 2-5, with an accuracy of 94.21% for the identification of internal faults and healthy transformer energization. However, it has the highest rejection rate of 11.11% when an internal fault occurs, because of the harmonics which cannot be ignored. In addition, Scheme 1 is obviously unsuitable for identifying faulty transformer energization and normal operation/external fault. Schemes 2-4 have satisfactory performance for faulty transformer energization, normal operation/external faults, and internal faults. However, they perform worse, with a higher malfunction rate, when a healthy transformer is energized; specifically, their identification accuracies of healthy transformer energization are only 77.59%, 75.86%, and 81.03%. Except for the scenarios of normal operation/external fault, Scheme 5 performs the best for the identification of healthy transformer energization, internal faults, and faulty transformer energization compared with Schemes 1-4. Comparatively, the proposed protection scheme shows the best classification performance for all scenarios. When an internal fault occurs or a healthy transformer is energized, its rejection rate and malfunction rate are the lowest, with classification accuracies of 100% and 87.93%, respectively. It can also identify faulty transformer energization and normal operation/external fault reliably. In summary, the proposed protection scheme is superior, indicating that the proposed DCNN improves the performance of AI-based transformer protection schemes.

Run time test
To test the runtime of the proposed protection scheme, it is implemented in Python and deployed on an NVIDIA® Jetson AGX Xavier™ developer kit. The test platform is shown in Fig. 14, and the parameters of the developer kit are provided in Table 4.
The runtime test on the platform only involves the computation time t of the protection procedure, comprising the data window length t_dw, the data processing time t_dp, and the computation time t_ct of the CNN block. From the test results, the computation time of the proposed protection scheme is 25.92 ms, as:

t = t_dw + t_dp + t_ct = (13 + 4.14 + 8.78) ms = 25.92 ms  (13)

Although the runtime reaches 25.92 ms, it still meets the requirements of relay protection. Considering the classification accuracy of the test samples, the proposed DCNN-based transformer protection scheme has a certain practicability.
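The runtime composition above (data window + data processing + CNN computation) is a simple sum and can be checked directly; the three component times are taken from the test results:

```python
# Runtime components of the protection procedure, in milliseconds.
t_dw = 13.0    # data window length
t_dp = 4.14    # data processing time
t_ct = 8.78    # computation time of the CNN block

t = t_dw + t_dp + t_ct   # total computation time: 25.92 ms
```

The data window dominates the total, so shortening the window (at the cost of fewer unsaturated features) would be the main lever for faster operation.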

Conclusion
The power transformer plays an essential role in a power system, and its protection is therefore a critical issue. The traditional protection scheme is differential protection configured with a second harmonic restraint, which has been widely used in power systems. However, as the power system becomes increasingly complex, this differential protection can no longer meet the requirements. Meanwhile, previously proposed AI-based protection schemes have not been widely accepted in power systems because of the ergodicity requirement of the training samples.
In this paper, a new deep structure called DCNN is proposed to decrease the ergodicity requirement of the training samples, and a reliable transformer protection scheme is developed by using the DCNN to identify the VICur image. The DCNN is an integration of a CAE block and a CNN block, where the CAE block shares its encoder with the CNN block. The DCNN uses the CAE block to reconstruct the VICur image as its unsaturated part and uses the CNN block to classify the training samples. Because of the interaction process in the encoder, the CAE block guides the CNN block to focus on the unsaturated part of the VICur image. Because the unsaturated part of the VICur approximates an ellipse and differs distinctly between a healthy and faulty transformer, the ergodicity requirement of the training samples is decreased significantly. Therefore, the CNN block trained by the DCNN is helpful for building an AI-based transformer protection scheme with strong generalization ability. PSCAD simulations and dynamic model experiments show that the proposed protection scheme is a promising prospect for power systems.