http://ijpeds.iaescore.com

A long short-term memory based prediction model for transformer fault diagnosis using dissolved gas analysis with digital twin

ABSTRACT (2022)
The most significant tool for fault diagnostics in transformers is dissolved gas analysis (DGA). Time series prediction of dissolved gas levels in oil, combined with dissolved gas analysis, provides a foundation for transformer fault diagnosis and early warning. In this paper, a long short-term memory (LSTM) based prediction model is developed to train a digital twin that identifies the essential fault in the transformer via DGA. The model is fed with three different gas concentrations as input. Performance is evaluated in terms of validation accuracy. The suggested model exhibits a significant validation accuracy of 99.83%, as indicated by the analyses, thereby aiding the early prediction of transformer maintenance. The results validate that the LSTM model for fault identification and analysis using dissolved gas in the transformer has considerable research potential.


INTRODUCTION
Transformers are the heart of electric power systems, and their operational state determines whether or not the power network is well regulated. Electrical, mechanical, and thermal stresses cause some of the gases created during operation to dissolve in the insulating oil [1]-[3]. The dissolved amounts of these gases are examined as a possible approach to diagnose and anticipate the transformer's performance, both internally and externally [4]-[7]. The outcome of a dissolved gas analysis (DGA) offers sufficient information to diagnose the state of the transformer's operation. Over the last several decades, two different sorts of procedures for obtaining dissolved gas analysis data have been proposed for predictive maintenance of transformers, which is an alternative to breakdown (corrective) maintenance. Predictive maintenance increases the operational availability of transformers, precludes downtime due to unscheduled maintenance, minimizes costs, and maximizes safety. Data-driven methodologies are superior to model-based predictive maintenance solutions because they attempt to learn predictive models from the data automatically, which makes them suitable for a wide range of such problems. Machine learning has begun to play an important role in predicting transformer failures and, as a result, in contributing to predictive maintenance. Artificial neural networks (ANN), support vector machines (SVM), relevance vector machines (RVM), and fuzzy theory [8]-[11] are examples of common machine learning approaches that have been implemented for DGA. All of these methods have specific flaws, such as the need to choose a large number of parameters for accuracy, overfitting, and long training duration. Deep learning approaches such as long short-term memory (LSTM), which can extract features automatically, have been used with promising results. To be effective, however, deep learning techniques require a large amount of data.
When compared to other modern methodologies, many deep learning algorithms proposed for predictive maintenance problems [12], such as estimating the remaining useful life of transformers, have shown encouraging results. For greater accuracy, an LSTM requires a significant amount of labelled data to be adequately trained. The paper's main contributions are: i) to diagnose transformer defects with precision, a large DGA data set consisting of 960 samples of fault-free training and 500 samples of fault-free testing, together with faulty training and faulty testing samples, is pre-processed with the proposed high-pass filter, scaling, and windowing approaches; ii) an LSTM network based prediction model is developed to train the digital twin for precisely diagnosing the essential fault in the transformer through DGA with the highest validation accuracy. The rest of this paper is structured as follows: section 2 describes the LSTM network, section 3 explains the methodology with the framework for the LSTM-based digital twin training approach, section 4 presents results and discussion, and section 5 discusses the conclusion.

LSTM NETWORK
Hochreiter and Schmidhuber proposed LSTM as a recurrent neural network (RNN) based technique [13]. When it comes to processing time series, RNNs outperform other neural networks. However, a plain RNN possesses only a short-term memory, and LSTM was designed to overcome this limitation. Compared with an RNN, LSTM adds a memory unit to determine whether information is helpful and has a more sophisticated dynamic structure. The model can handle the challenge of long-term sequence prediction since it has a long-term memory function. In addition, during long-sequence training, LSTM can tackle gradient vanishing and gradient explosion [14], [15]. LSTM networks are well suited to the classification of sequence data. For time-series data, LSTM networks are useful because, in order to categorize new signals, they remember the characteristics of earlier signals. An LSTM network allows users to feed sequence data into it and make predictions based on the discrete time steps in the data.
A gating cell is added to the network architecture in the LSTM model, giving it a "long time memory function" that makes it suited for long-term nonlinear series prognosis problems. Memory cells with a gating mechanism replace the hidden layer neurons of a regular RNN in LSTM. The forget gate determines which parts of the current state move through to the next, the input gate modifies the current input before it is added to the new state, and the output gate modifies the values that pass from the current state to the output. Figure 1 depicts the basic structure of a memory cell [16], [17].
The gate computations are:

f_t = σ(W_f x_t + W_hf h_{t-1} + b_f)
i_t = σ(W_i x_t + W_hi h_{t-1} + b_i)
o_t = σ(W_o x_t + W_ho h_{t-1} + b_o)

where f_t, i_t, and o_t are the state computation results of the forget, input, and output gates respectively; W_f, W_hf, W_i, W_hi, W_o, W_ho and b_f, b_i, b_o are, respectively, the weight matrices and offset terms of the corresponding gates; and σ denotes the sigmoid activation function.

The memory cell state c_t and hidden layer state h_t are the outputs of the memory cell at time t. The formulas are:

c̃_t = tanh(W_c x_t + W_hc h_{t-1} + b_c)
c_t = f_t ∘ c_{t-1} + i_t ∘ c̃_t
h_t = o_t ∘ tanh(c_t)

where c̃_t denotes the memory cell's candidate state input at time t, and tanh denotes the hyperbolic tangent activation. The state weight matrices and offset term of the cell state are represented by W_c, W_hc and b_c, respectively. Element-by-element multiplication is denoted by the symbol ∘.
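A single time step of the memory cell described above can be sketched as follows. This is a minimal illustrative NumPy implementation, not the paper's actual network: the dimensions are toy values, and the per-gate input and hidden weight matrices are folded into one matrix acting on the concatenated [h_{t-1}, x_t], which is mathematically equivalent to the separate matrices in the equations above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following the gate equations above.

    Each W[k] maps the concatenated [h_prev, x_t] to a gate
    pre-activation; b[k] is the corresponding offset term.
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # element-wise state update
    h_t = o_t * np.tanh(c_t)                 # hidden state output
    return h_t, c_t

# Toy dimensions: 3 gas inputs, hidden size 4 (illustrative only)
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1 for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_cell_step(rng.standard_normal(n_in), h, c, W, b)
```

Because h_t = o_t ∘ tanh(c_t) with o_t in (0, 1), every component of the hidden state stays inside (-1, 1), which is what keeps the recurrence numerically well behaved over long sequences.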

METHODOLOGY
The LSTM network outperforms well-established models in predicting transformer oil dissolved gases and can successfully deal with the challenge of prediction of nonlinear sequences. Figure 2 depicts the framework for the LSTM-based digital twin training approach, and the actions that follow are carried out.

DGA data acquisition
The three dissolved gases (C2H2, C2H4, CH4) in transformer oil (Grade-1), obtained using a Hydran sensor setup, together with different gas intensity percentage combinations from the Duval triangle, were used to form four ensemble files, namely Faultfreetraining.csv, Faultfreetesting.csv, Faultytraining.csv, and Faultytesting.csv. In order to affirm the efficacy of the proposed prediction work, this paper analyzes time series data using online monitoring of a 500 kVA transformer as an example, shown in Figure 3. The fault-free data covers a period of 5 years (2013-2017) and the faulty data covers 3 years (2018-2021). The specifications of the test transformer are given in Table 1.

DGA data set
The DGA dataset, taken using the Hydran DGA sensor on a 500 kVA, 11000/430 V test transformer from 2013 to 2021, is divided into a fault-free training set, a fault-free testing set, a faulty training set, and a faulty testing set. The practical DGA data in this case consists of 960 samples of fault-free training with a data size of 480000x55, 500 samples of fault-free testing with a data size of 250000x55, and faulty training and faulty testing with a data size of 960000x55. Table 2 summarizes the data set; a similar procedure applies to the faulty data set. The columns of each data frame include the following variables:
− The fault type is shown in column 1 (fault code), which ranges from 0 to 20. A fault code of 0 indicates that the system is fault-free, whereas fault codes 1 through 20 indicate distinct fault categories based on the percentage gas intensity combinations obtained from the Duval triangle.
− The number of times the simulation ran to acquire complete data is given in column 2 (simulationRun). The number of runs in the training and test data sets ranges from 1 to 500, with each value representing a unique random generator state for the simulation across all fault codes.
− The number of times each variable is recorded per simulation is indicated in column 3 (sample).
− The measured variables from both the Hydran sensor and the Duval triangle are found in columns 4-55. The Duval triangle method is used in this paper because it determines a transformer defect with a 96% accuracy rate, according to a review based on the IEC data bank of inspected transformer failures and several other reports [19], [20].
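The Duval triangle coordinates referenced above are the relative percentages of the three gases in their combined total. A minimal sketch of that computation follows; note it only derives the percentage coordinates and does not reproduce the triangle's fault-zone boundaries, which the paper takes from the Duval method itself.

```python
def duval_percentages(ch4_ppm, c2h4_ppm, c2h2_ppm):
    """Relative gas percentages used as coordinates in Duval's triangle.

    Each percentage is that gas's share of the CH4 + C2H4 + C2H2 total,
    so the three values always sum to 100.
    """
    total = ch4_ppm + c2h4_ppm + c2h2_ppm
    return (100.0 * ch4_ppm / total,
            100.0 * c2h4_ppm / total,
            100.0 * c2h2_ppm / total)

# Illustrative concentrations in ppm (not taken from the paper's data)
p_ch4, p_c2h4, p_c2h2 = duval_percentages(60.0, 30.0, 10.0)
```

These three percentages are the quantities stored alongside the raw ppm readings in columns 4-55 of each data frame.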

Data preprocessing
The following steps are carried out during DGA data preprocessing:
− Clean the data: inconsistent values are removed using a high-pass filter. In both the training and testing data sets, data entries with the fault codes 3, 9, and 15 are deleted. These fault codes are unrecognizable, and the simulation findings that go with them are incorrect. The entire DGA set is translated to the frequency domain and its dimensionality is lowered, which is necessary for such a big data collection.
− Divide the data: by retaining 20% of the training data for validation, the training data is divided into training and validation data. A validation data set makes it possible to assess the model's fit on the training data set while adjusting the hyperparameters of the model. Data splitting is a typical technique for preventing overfitting and underfitting in networks.
− Network design and preprocessing: in this process, the sample train and sample test data are preprocessed to find the network parameters, viz. Xtrain, Ytrain, Xtest, Ytest, Xval and Yval.
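The cleaning and splitting steps above can be sketched as follows. This is an illustrative NumPy version with toy data, assuming labels are stored as an integer fault-code array; the function name and the shuffling scheme are our own, not the paper's.

```python
import numpy as np

def clean_and_split(X, y, drop_faults=(3, 9, 15), val_fraction=0.2, seed=0):
    """Drop the unrecognizable fault codes, then hold out a validation share.

    Mirrors the preprocessing described in the text: fault codes 3, 9 and 15
    are deleted, and 20% of the remaining training data is kept for validation.
    """
    keep = ~np.isin(y, drop_faults)          # mask out bad fault codes
    X, y = X[keep], y[keep]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))            # shuffle before splitting
    n_val = int(len(y) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Toy example: 100 samples, 5 features, fault codes cycling through 0..20
X = np.arange(500, dtype=float).reshape(100, 5)
y = np.arange(100) % 21
Xtrain, Ytrain, Xval, Yval = clean_and_split(X, y)
```

Shuffling before the split ensures the validation set samples all fault codes rather than only the last simulation runs.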

Identifying the condition indicators
Normalize data sets: normalization is a technique for converting the numeric values in a data set to a similar scale without distorting range disparities. This strategy ensures that a variable with a higher value does not dominate the training variables. It also converts numeric data from a larger range to a smaller range without sacrificing any crucial training information. Using data from all simulations in the training data set, the mean and standard deviation for the 52 signals are determined.
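A minimal sketch of this normalization step follows, assuming z-score scaling (subtract the mean, divide by the standard deviation), with the statistics computed from the training data only and then reused for the test data so that no test information leaks into training. The function name and toy arrays are illustrative.

```python
import numpy as np

def zscore_normalize(train, test):
    """Scale both sets using mean/std computed from the training data only."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    sigma[sigma == 0] = 1.0              # guard against constant signals
    return (train - mu) / sigma, (test - mu) / sigma

# Two toy signals with very different ranges, as in the motivation above
train = np.array([[1.0, 10.0],
                  [3.0, 30.0]])
test = np.array([[2.0, 20.0]])
train_n, test_n = zscore_normalize(train, test)
```

After scaling, both signals contribute on a comparable scale regardless of their original ppm ranges.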
Visualize data: there are 400 fault-free simulations in the Xtrain data set, followed by 6800 faulty simulations. A plot of the fault-free data is created first to visualize the fault-free and faulty data. A total of 10 signals are labeled in the Xtrain data set for this plot in order to construct an easy-to-read image. Signals 1 to 3 are the gas concentrations of C2H2, C2H4 and CH4 in ppm respectively. Signals 4 to 6 are the equivalent percentages of C2H2, C2H4 and CH4 respectively, obtained from Duval's triangle method. Signals 7 to 10 are the fault code, simulation run, sample number, and abnormal condition for the total failure of oil respectively. The visualization of non-faulty (fault-free) data and faulty data for 10 signals of the Xtrain data set is plotted in Figure 4 and Figure 5 respectively. As shown in Figure 5, the LSTM is completely trained after around 130 samples (time steps) and is ready for validation. There are a fixed number of time steps in the final data set, which includes training, validation, and testing data. As a result, the signal or sequence must be assigned to the correct fault number, making this a sequence classification challenge.
An LSTM network, a fully connected linear layer, and a softmax layer are used in the proposed prediction model. The input layer size is 52 and the number of classes is 18. The fully connected layer's outputs equal the number of DGA classes to be categorized. In deep neural network models that predict a multinomial probability distribution, the softmax function is utilized as the activation function in the output layer. The network has 3 hidden layers with unit sizes of 52, 40 and 25 respectively. Training runs for 30 epochs with a minimum batch size of 50 and a dropout of 0.2. Figure 6 depicts the LSTM's internal loop structure.
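The classification head described above (fully connected layer followed by softmax over the 18 DGA classes) can be sketched in NumPy as follows. The weights are random placeholders standing in for trained parameters; the last hidden size of 25 and the class count of 18 are taken from the text.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax producing a probability distribution."""
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(h_final, W_fc, b_fc):
    """Fully connected layer applied to the last LSTM hidden state,
    followed by softmax over the DGA fault classes."""
    return softmax(W_fc @ h_final + b_fc)

n_hidden, n_classes = 25, 18   # last hidden layer size and class count
rng = np.random.default_rng(1)
probs = classify(rng.standard_normal(n_hidden),
                 rng.standard_normal((n_classes, n_hidden)) * 0.1,
                 np.zeros(n_classes))
pred_class = int(np.argmax(probs))   # predicted fault class index
```

Because the fully connected layer has exactly as many outputs as DGA classes, the softmax output can be read directly as per-class probabilities, and argmax yields the predicted fault code.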

RESULTS AND DISCUSSION
To demonstrate the effectiveness of the suggested prediction model, the case study in this work examines time series data from online monitoring of a 500 kVA transformer. Based on training and testing data for the fault-free state (2013-17) and the faulty condition (2018-21), the dissolved gas content of the test transformer oil was calculated, as elaborated in section 3.2. The suggested high-pass filter approach eliminates unexpected and unwanted intensity in the CH4, C2H2, and C2H4 gases. The scaling and windowing strategies applied to the filtered DGA data result in a more apparent output signal by lowering its dimensionality. Dividing the data for training and testing and normalizing the data for identifying health indicators are done to train the LSTM model on the given DGA data set. The details of the LSTM network layers and number of epochs, including the approach to network design and preprocessing, are given in sections 3.3 to 3.5.

Accuracy of fault diagnosis evaluation
The performance accuracy ratio (PR) is used to measure fault diagnostic accuracy [23]. The criterion for the overall performance of the model is defined as the ratio of the number of correct predictions (n) to the total number of predictions (N) and is given by:

Validation accuracy (VA) = (n / N) * 100%

This criterion demonstrates the neural network's high accuracy in correctly identifying the fault type of unseen signals with few errors. As a result, the higher the accuracy, the better the network. The validation accuracy and loss curves obtained on the test data for the LSTM algorithm against the training data are shown in Figure 7. The efficacy of a classification network is evaluated using a confusion matrix [24]. The confusion matrix aids in identifying a model's correct predictions as well as its errors for particular classes. The confusion matrix is made up of columns with the LSTM network's predicted values and rows with the true values. As seen in Figure 8, the main diagonal of the LSTM confusion matrix carries the large values, while the off-diagonal elements are almost zero. This indicates that the trained network is efficient, classifying over 99 percent of signals correctly. The validation accuracy in terms of precision achieved by the most commonly used convolutional neural network (CNN) model and the machine learning based support vector machine (SVM) model is 96.04% and 97.11% respectively [25]. For condition monitoring, we employed a unique DGA data set acquired over an 8-year period to properly train the LSTM network, so that the validation accuracy attained in this investigation is 99.83%, an effective index for evaluating the fault diagnosis. As a result, the suggested model's performance can be objectively measured for research purposes. A comparison of validation accuracy across the different models is shown in Figure 9.
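The two evaluation tools used above, the VA formula and the confusion matrix, can be sketched together. This is an illustrative implementation on toy labels, not the paper's evaluation code; with real data, y_true and y_pred would be the 18-class fault codes.

```python
import numpy as np

def validation_accuracy(y_true, y_pred):
    """VA = (n / N) * 100: the share of predictions equal to the labels."""
    return 100.0 * np.mean(np.asarray(y_true) == np.asarray(y_pred))

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows hold the true classes, columns the predicted classes,
    matching the layout described in the text."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy 3-class example: one of six predictions is wrong
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]
va = validation_accuracy(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, 3)
```

A well-trained network concentrates the counts on the main diagonal of cm (correct predictions), which is exactly the pattern reported for Figure 8.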

Digital twin training
The use of a digital twin is an important step towards health management, as it introduces a new paradigm for fault diagnosis [26], [27]. The conceptual model of the digital twin is shown in Figure 10. Data from the functioning asset can be used to tune both data-driven and physics-based models, resulting in a digital twin. The digital twin technology is implemented on Jetson Nano hardware with a graphics processing unit (GPU) based on the NVIDIA Maxwell architecture with 128 NVIDIA CUDA® cores. It is trained using the LSTM network with the help of the CUDA toolkit.
The experimental setup of the Jetson Nano hardware integrated with MATLAB is shown in Figure 11. The Jetson Nano hardware is used to deploy and compute the DNN for training and testing. High validation accuracy has been achieved by running the LSTM network on the GPU under parallel computation as a novel approach. The GPU, hosting a digital twin of the LSTM network, acts as an autonomous device to precisely predict the fault, monitor the condition, and anticipate the remaining useful life of the transformer.

CONCLUSION
To testify to the efficacy of the proposed prediction model, this paper analyzed time series data from continuous monitoring of a 500 kVA transformer. Generally, the amount of sampled dissolved gas data is quite limited and the accuracy of multistep prediction is not high. This paper focused on the application of large-scale data using a promising model, an LSTM based digital