Effects of shorter phase-resolved partial discharge duration on PD classification accuracy

Received Jul 17, 2019; Revised Oct 2, 2019; Accepted Nov 23, 2019

Partial discharge (PD) pattern recognition is useful for diagnosing insulation condition. PD measurement data is commonly represented in phase-resolved partial discharge (PRPD) format. PRPD is useful because it provides a visible pattern for each PD source, and various features can be extracted from it for PD pattern recognition. A shorter PRPD duration yields more training data from the same measurement, but each data sample carries less information, and vice versa. This work aims to investigate the effects of using very short duration PRPD data on the accuracy of PD pattern recognition. The results show that machine learning models such as Artificial Neural Network (ANN) and Support Vector Machine (SVM) are robust enough that reducing the PRPD duration from 15 seconds to 1 second causes a drop of less than 5 % in classification accuracy. However, this holds only under noise-free conditions. When the same PD data is overlapped with random noise, the classification accuracy drops significantly, by up to 19 %. Therefore, a longer PRPD duration is recommended to withstand the effects of noise contamination.


INTRODUCTION
Insulation failure in electrical power system components can cause catastrophic damage. Therefore, it is important to monitor the insulation quality frequently. Since PD measurement is a nondestructive test, it is widely used for insulation condition assessment [1][2][3]. According to IEC 60270, PD is defined as an electrical discharge that only partially bridges the insulation [4]. Despite only partially bridging the insulation, PD will eventually cause insulation breakdown if left undetected. If PD can be detected at an incipient stage, utility companies can avoid expensive electrical equipment failures [5,6]. Each insulation defect has its own unique discharge attributes, which can be used to train machine learning models to identify the defect type based on the measured PD pattern [7]. Such PD classifiers greatly facilitate insulation condition monitoring of electrical power components in a low-cost and efficient manner.
PRPD is the most widely used representation for PD [8,9]. To obtain a PRPD representation, a PD detector is required to measure a continuous stream of PD pulses. Each individual PD pulse is quantified by its phase angle (ϕ) and charge magnitude (q), and the number of PD occurrences (n) at each phase and charge is counted. Because of this, PRPD is also known as the ϕ-q-n pattern [10]. PRPD can be represented as a 3-dimensional data array, a 3D figure, or a 2D image with color contour. The PD measurement duration used to generate a single PRPD representation is not standardized, and researchers have used different durations in PD classification studies. Examples include 300 seconds [11], 120 seconds [12], 60 seconds [9,13], 50 seconds [14], and as little as 3 seconds [15]. This work aims to investigate the effects of using very short PRPD duration on PD classification accuracy. Durations from 1 second to 15 seconds were chosen to test how robustly machine learning models recognize the PD source when provided with just 1 to 15 seconds of PRPD pattern.
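The ϕ-q-n accumulation described above can be sketched as follows. This is a minimal illustration, not the PD detector's own processing: the pulse tuple layout `(time_s, phase_deg, charge_pc)`, the function name `build_prpd`, and the bin counts and `q_max` charge range are all assumptions made for the example.

```python
from collections import Counter

def build_prpd(pulses, duration_s, phase_bins=360, charge_bins=100, q_max=1000.0):
    """Accumulate PD pulses recorded before `duration_s` into a phi-q-n map.

    `pulses` is an iterable of (time_s, phase_deg, charge_pc) tuples
    (hypothetical layout). The result maps (phase_bin, charge_bin) -> n,
    the number of pulses that fell into that cell.
    """
    prpd = Counter()
    for time_s, phase_deg, charge_pc in pulses:
        if time_s >= duration_s:
            continue  # pulse lies outside the chosen PRPD duration
        # Phase bin over [0, 360) degrees.
        p = int((phase_deg % 360.0) * phase_bins / 360.0)
        p = min(p, phase_bins - 1)
        # Signed charge clamped to [-q_max, q_max], then binned.
        clamped = max(-q_max, min(q_max, charge_pc))
        q = int((clamped + q_max) * charge_bins / (2.0 * q_max))
        q = min(q, charge_bins - 1)
        prpd[(p, q)] += 1
    return prpd

pulses = [(0.1, 45.0, 100.0), (0.5, 45.2, 100.0), (2.0, 225.0, -300.0)]
short_prpd = build_prpd(pulses, duration_s=1.0)   # only the first two pulses
long_prpd = build_prpd(pulses, duration_s=15.0)   # all three pulses
```

With the same pulse stream, a shorter duration simply leaves fewer counts in the map, which is the trade-off this work examines.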
Two groups of PD data sources were used for this work. The first group consists of 3 lab fabricated insulation materials, which provide a more consistent PRPD pattern, while the second group consists of 5 PRPD patterns measured from Cross-linked Polyethylene (XLPE) cable joint defects, which provide a more inconsistent PRPD pattern. Comparing the results of both groups gives a more comprehensive view of the effect of reducing the PRPD duration.
The PRPD duration directly correlates with the number of PD events occurring per measurement. When features extracted from the PRPD pattern are used for classification, the accuracy depends on a variety of factors. Since this work focuses on examining the effects of shorter PRPD duration, the other factors are kept constant. In other words, the type of feature extraction performed, the number of training and test data, and the classifier hyperparameters remain the same, while only the PRPD duration is varied to observe its effect on PD pattern recognition accuracy.
The remainder of the paper is organized as follows: Section 2 describes the overall experiment setup, which covers the PD measurement setup, PD source preparation, and the random noise data used. Section 3 describes the PD classification procedure, which includes feature extraction and the PD classifier. The results and discussion are presented in Section 4, while Section 5 provides the conclusion for this work.

RESEARCH METHOD
2.1. PD measurement setup
A commercial PD detector, which complies with the IEC 60270 standard, was used in this work. The PD detector is able to display the PRPD pattern in real time and the data can be exported to a PC for further processing. A block diagram of the measurement setup is shown in Figure 1.
The HV source is a step-up transformer capable of supplying up to 200 kV. The measuring capacitor measures the voltage supplied. Whenever a PD occurrence causes a voltage drop across the test object, the coupling capacitor transfers an apparent charge to the test object to stabilize the voltage. This data is passed to the PD detector, and the USB controller handles the data transfer between the PD detector and the PC.

2.2. PD source preparation
Two groups of PD sources were prepared and compared in this work. PD Group 1 consists of 3 classes of PD which are void, corona and surface discharge measured from lab fabricated low-density polyethylene (LDPE). The details of the sample preparation and measurement condition can be found in [16]. PD Group 2 consists of 5 classes of PD source measured from XLPE cable joints with artificial defects. The artificial defects include incision on insulation layer, metallic particles on insulation layer, rough edges at semiconductor layer, air gap at semiconductor layer and off-axis joint installation. More information about the sample preparation can be found in [17].
PD Group 1 consists of 66 PD data samples, 22 for each of its 3 classes. PD Group 2 consists of 100 PD data samples, 20 for each of its 5 classes. Figure 2 shows one example of a void PRPD from PD Group 1, while Figure 3 shows one example of an incision defect from PD Group 2, both at 1-second and 15-second durations. The x-axis represents the phase angle of the PD occurrence and the y-axis represents the charge magnitude. The rationale for using these two groups of PD sources is to better observe the impact of using shorter PRPD duration.
PD Group 1 has a more consistent PD pattern and just 3 different classes. Conversely, PD Group 2 has a more inconsistent PD pattern and 5 different classes. With the same PRPD duration, PD Group 1 is expected to be easier to classify, and hence SVM and ANN should achieve higher classification accuracy on it. When the PRPD duration is reduced for PD Group 1, the PD intensity differs between 1 second and 15 seconds, but the general shape remains similar. The opposite is true for PD Group 2, which makes it more challenging to classify when the PRPD duration is reduced.
To investigate the robustness of the PD classifier under noise contamination, random noise measured from ground interference was overlapped onto the clean PRPD pattern. For example, a PRPD of duration T is overlapped with random noise of the same duration T to generate a noise contaminated PRPD data sample. The PD classifier is trained using clean PRPD data but tested against contaminated PRPD data. An example of random noise PRPD is shown in Figure 4.
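One simple way to realize the overlap step is to add the noise counts to the clean counts cell by cell. The text does not specify how the overlap was implemented, so the sketch below is an assumption; the name `overlap_noise` and the dictionary representation of a PRPD as `(phase_bin, charge_bin) -> n` are hypothetical.

```python
from collections import Counter

def overlap_noise(clean_prpd, noise_prpd):
    """Return a new phi-q-n map whose counts are clean + noise, cell by cell.

    Both inputs map (phase_bin, charge_bin) -> n and are assumed to cover
    the same duration T. The clean input is left unmodified.
    """
    contaminated = Counter(clean_prpd)
    contaminated.update(noise_prpd)  # Counter.update adds counts per key
    return contaminated

clean = {(45, 55): 2}
noise = {(45, 55): 1, (10, 50): 4}
noisy = overlap_noise(clean, noise)
```

Because the clean map is copied rather than modified, the same clean data can still be used for training while only the contaminated copy is used for testing.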

PD CLASSIFICATION
3.1. Feature extraction
The raw PD data and PRPD are too large to be used directly as input features to train the PD classifier. Hence, feature extraction is required to obtain a useful representation of the PRPD pattern. The extracted features are also known as the "PD fingerprint". The PRPD can be sorted into two primary distributions, Hn(φ) and Hqn(φ). Hn(φ) is a 2-D plot of PD intensity vs phase occurrence, while Hqn(φ) is a 2-D plot of PD charge magnitude vs phase occurrence. Each of these two distributions is then divided into two separate distributions based on the positive and negative half of the phase cycle. Four statistical features (Mean, Variance, Kurtosis, and Skewness) are calculated from each of the four distributions to form a total of 16 features for each PRPD data sample. The Kurtosis and Skewness are calculated as statistical moments of each discrete distribution, where N is the total data size, f(x_i) is the function of interest, and x_i is the individual discrete value of the distribution. A complete mathematical description of Kurtosis and Skewness can be found in [18][19][20].
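A weighted-moment form that is common in the PD literature and consistent with the symbols N, x_i and f(x_i) is given below. It is an assumption here, not necessarily the exact expression used in this work, and in particular the subtraction of 3 in the Kurtosis is a convention that may differ. All sums run over i = 1..N.

  μ = Σ x_i f(x_i) / Σ f(x_i),   σ² = Σ (x_i − μ)² f(x_i) / Σ f(x_i)
  Skewness = Σ (x_i − μ)³ f(x_i) / (σ³ Σ f(x_i))
  Kurtosis = Σ (x_i − μ)⁴ f(x_i) / (σ⁴ Σ f(x_i)) − 3

Under these assumptions, the 16-feature fingerprint could be sketched as follows. The names `weighted_stats` and `prpd_fingerprint` are hypothetical, x_i is taken as the bin index, and the distributions are taken as lists of non-negative values per phase bin.

```python
import math

def weighted_stats(f):
    """Mean, variance, skewness and kurtosis of a discrete distribution.

    `f` holds f(x_i) for x_i = 1..N (bin index). Degenerate inputs
    (all zero, or zero variance) return zeros for undefined moments.
    """
    total = sum(f)
    if total == 0:
        return (0.0, 0.0, 0.0, 0.0)
    xs = range(1, len(f) + 1)
    mean = sum(x * w for x, w in zip(xs, f)) / total
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, f)) / total
    if var == 0:
        return (mean, 0.0, 0.0, 0.0)
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 * w for x, w in zip(xs, f)) / (var * sd * total)
    kurt = sum((x - mean) ** 4 * w for x, w in zip(xs, f)) / (var * var * total) - 3.0
    return (mean, var, skew, kurt)

def prpd_fingerprint(hn, hqn):
    """Build 16 features: 4 statistics x (Hn, Hqn) x (positive, negative half)."""
    half = len(hn) // 2
    features = []
    for dist in (hn, hqn):
        for part in (dist[:half], dist[half:]):
            features.extend(weighted_stats(part))
    return features

mean, var, skew, kurt = weighted_stats([1, 2, 1])
fingerprint = prpd_fingerprint([1, 2, 1, 1, 2, 1], [5, 9, 5, 4, 8, 4])
```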

3.2. PD classifier
Two commonly used machine learning classifiers were used as the PD classifier in this work: ANN [21][22][23] and SVM [24][25][26][27]. Usually, the total input data is divided into training and testing data. The classifier is trained using the training data and tested using the testing data. For this work, the performance of the PD classifier was evaluated using K-fold cross-validation. The input data were randomly divided into K sets. The first set is used for testing while the other sets are used for training. This process is repeated K times so that each set is used exactly once as testing data, and the average classification accuracy is then calculated. For PD Group 1, 11-fold cross-validation was used, while 10-fold cross-validation was used for PD Group 2. These values of K were chosen so that each fold contains the same number of data from each class: with 22 data per class and K = 11 for PD Group 1, and 20 data per class and K = 10 for PD Group 2, every fold holds 2 data from each class.
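The fold construction can be sketched as below. This is an illustrative stratified split, assuming every class size is divisible by K as in the setup above; the function name `stratified_folds` and the fixed random seed are hypothetical choices, not details from this work.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with equal per-class counts.

    Raises ValueError if a class size is not divisible by k.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        if len(indices) % k != 0:
            raise ValueError("class size must be divisible by k")
        rng.shuffle(indices)  # random assignment within each class
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal indices round-robin
    return folds

# PD Group 1 layout: 3 classes with 22 data each, 11 folds.
group1_labels = ["void"] * 22 + ["corona"] * 22 + ["surface"] * 22
group1_folds = stratified_folds(group1_labels, k=11)
```

Each of the 11 folds then holds 6 data, 2 from each class, which matches the fold sizes implied by the choice of K.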
The benefit of using K-fold cross-validation is that it reduces the risk of overfitting and selection bias. To observe the performance of the PD classifier on noise contaminated data, the classifier was trained using clean input data, and the test data was overlapped with noise data prior to testing. This properly gauges the capability of the PD classifier to recognize contaminated input data that was not seen during the training process.
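The train-on-clean, test-on-contaminated procedure can be sketched as the loop below. The nearest-centroid classifier is only a stand-in so the example runs without external libraries; it is not the ANN or SVM used in this work. The names `cross_validate`, `fit_centroids`, `predict_centroid`, and the `contaminate` callback are hypothetical.

```python
def cross_validate(samples, labels, folds, fit, predict, contaminate=None):
    """Average accuracy over folds, training on clean data only.

    If `contaminate` is given, it is applied to each held-out sample
    just before prediction, never to the training samples.
    """
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = 0
        for i in test_idx:
            x = contaminate(samples[i]) if contaminate else samples[i]
            correct += predict(model, x) == labels[i]
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)

def fit_centroids(xs, ys):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_centroid(model, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))

# Toy one-feature data, purely illustrative.
samples = [[0.0], [0.1], [1.0], [1.1]]
labels = ["a", "a", "b", "b"]
folds = [[0, 2], [1, 3]]
clean_acc = cross_validate(samples, labels, folds, fit_centroids, predict_centroid)
noisy_acc = cross_validate(samples, labels, folds, fit_centroids, predict_centroid,
                           contaminate=lambda x: [v + 0.7 for v in x])
```

Keeping the contamination inside the evaluation loop guarantees that the classifier never sees contaminated data during training, which is the condition described above.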

RESULTS AND DISCUSSION
The effects of reducing PRPD duration on both PD Group 1 and PD Group 2, as well as the effects of noise contamination, are shown in Figure 5 and Figure 6, where the x-axis represents the PRPD duration and the y-axis represents the average classification accuracy. Under noise-free conditions, using a shorter PRPD duration barely affects the ANN classification accuracy of PD Group 1. The average accuracy is 94 % with a standard deviation of only 1.36 %. For PD Group 2, the classification accuracy suffers a minor reduction of 5 % when the PRPD duration is reduced from 15 seconds to 1 second. This difference can be explained by the relatively consistent PRPD pattern of PD Group 1: a shorter PRPD duration affects its classification accuracy less than that of PD Group 2. The overall small reduction in classification accuracy shows the robustness of ANN and SVM in dealing with shorter PRPD duration as input data.
When noise contamination is taken into consideration, the ANN classification accuracy of PD Group 1 deteriorates more severely than that of PD Group 2. Due to the low variation of the PRPD pattern in PD Group 1, the PD classifier for PD Group 1 generalizes poorly. Hence, any variation in the input data causes a larger reduction in classification accuracy. For PD Group 2, there is an obvious trend where a longer PRPD duration results in better classification accuracy.
A similar behavior is observed for the SVM classifier, where a shorter PRPD duration affects PD Group 2 more severely than PD Group 1. However, the overall accuracy of SVM is lower than that of ANN. Under noise contamination, the SVM accuracy is on average 13 % lower than ANN for PD Group 1 and 19 % lower for PD Group 2.

CONCLUSION
The effects of using shorter duration PRPD for PD classification have been investigated. Based on the results obtained, it can be concluded that the classification accuracy for PD sources measured from lab fabricated insulation materials is not significantly affected by using a shorter PRPD duration. However, this is only true for lab fabricated materials. For more realistic and practical PD measured from power system components such as XLPE cable joints, using a longer PRPD duration can improve the classification accuracy of ANN and SVM. Using a longer PRPD duration also makes the PD classifier less susceptible to accuracy reduction under noise contamination.