Detection of power transmission lines faults based on voltages and currents values using K-nearest neighbors

ABSTRACT


INTRODUCTION
Presently, the cost of developing electric power transmission lines is higher than ever before because of the growing demand for electricity. This cost covers power generation, transmission, and distribution, as shown in Figure 1 [1]. The performance of these transmission lines is affected by heavy and continuous use, as well as by other external factors [2]. Undetected faults can be a major obstacle to the functioning of any power system, as they can stop the operation of the entire electrical system [3]-[5]. The faults that occur in transmission lines can be categorized as either symmetrical or asymmetrical. Examples include phase faults such as the phase-to-ground (phase-to-earth) fault, the phase-to-phase fault, and the three-phase fault. Nevertheless, their presence does not always affect the functionality of the power system. Other faults, such as overlapping faults and circuit faults, are regarded as less important than the aforementioned ones. Traditionally, these lines are maintained using a megger device, which facilitates the detection of faults. Alternatively, faults are detected through physical inspection of the lines [6]-[12]. Both methods are time-consuming, and as such, countries around the world are exploring new ways of detecting and addressing faults within a short period of time. In this work, the ability of the k-nearest neighbor (KNN) algorithm to detect faults is explored using MATLAB software, and the specific kind of fault, whether phase-to-ground or phase-to-phase, is identified. The quantities that should be considered when a fault is detected in any type of power system include voltage, current, resistance, power factor, and frequency.
Several fault detection techniques reveal the presence of a fault by comparing the post-fault values of the system with its pre-fault values.

LITERATURE SURVEY
A wide range of approaches and techniques have been used to detect faults in electric power transmission lines. Muir and Lopatto [1] introduced a method for detecting faults in digital-relay-based power systems through the use of Petri nets. The authors used Petri nets for modelling and for fault location, and with the proposed technique the power system is monitored in a hierarchical manner. Their experimental results revealed that the use of Petri nets reduced the time required to process information and increased the precision of fault detection. As early as 1994, Barros and Drake [13] employed microprocessors to detect faults in real time, based on the estimation of the three-phase voltage phasors by means of a set of Kalman filters and on the calculation of the fault probability. In 2004, the wavelet transform was proposed in [3] for the detection of faults in a transformer by measuring neutral currents. The wavelet analysis was carried out with the Morlet wavelet as the mother wavelet. It was concluded that wavelet analysis significantly improved the fault detection sensitivity in the assessment of impulse tests on transformers. Earlier efforts geared towards fault detection were made by Bracho and Martinez [14], who used a dynamic power supply current test in 1997. In 1998, Chowdhury and Aravena [15] introduced a modular methodology for fault detection. The method, which is relatively flexible, also allows faults in the power system to be classified: upon detection of a fault, the fault indicator is processed by a Kohonen network for classification.
In a 2021 conference paper, Abed and AlRikabi [8] focused on the detection of faults in underground cables used as transmission lines, employing IoT applications to monitor the cables and detect their faults. Majd et al. [16] investigated the protection and control of power systems and presented a KNN-based approach for the detection and classification of transmission line faults. Similarly, Samet et al. [17] developed a technique for the detection and classification of transmission line faults using an improved alienation coefficients method. Gafoor and Rao proposed a wavelet-based technique that is able to detect and classify faults and to determine their location in the transmission lines [18], [19].

THE PROPOSED METHOD
The dataset used in this work was generated by modeling a power system in MATLAB and simulating it for fault analysis, as shown in Figure 2. The dataset can be accessed through Kaggle [20]-[22]. Its input features are the voltages (Va, Vb, Vc) and currents (Ia, Ib, Ic) of the three phases; descriptive statistics of the input features are given in Table 1, and their histogram distributions are presented in Figure 3 (see Appendix). The dataset also contains the outputs (G, A, B, C), each of which takes just two values: 0 denotes no fault, while 1 denotes the presence of a fault. In this work, an additional output parameter (S) has been added for the entire system. Figure 4 (see Appendix) shows the distribution of the output features. The dataset is summarized by the correlation matrix in Table 2, which presents the correlation between all pairs of features: a value of -100% represents a perfectly negative linear relationship between two features, a value of 0% means there is no linear relationship between them, and a value of 100% denotes a perfectly positive linear relationship.
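The percentage scale of the correlation matrix can be illustrated with a short sketch. It assumes that the entries of Table 2 are Pearson correlation coefficients multiplied by 100, which the text implies but does not state. It also assumes, purely for illustration, that the added output S equals 1 whenever any of G, A, B, or C equals 1; the paper does not give the exact definition of S. The function names and the toy values are hypothetical and are not taken from the Kaggle dataset.

```python
import math


def pearson_percent(x, y):
    """Pearson correlation of two equal-length sequences, in percent."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return 100.0 * cov / math.sqrt(var_x * var_y)


def system_fault_label(g, a, b, c):
    """ASSUMED definition of S: 1 if any of G, A, B, C is 1, else 0."""
    return int(any((g, a, b, c)))


# Hypothetical toy samples, not real measurements.
ia = [1.0, 2.0, 3.0, 4.0]
va = [8.0, 6.0, 4.0, 2.0]   # falls linearly as ia rises
vb = [2.0, 4.0, 6.0, 8.0]   # rises linearly with ia

corr_ia_va = pearson_percent(ia, va)  # perfectly negative relationship
corr_ia_vb = pearson_percent(ia, vb)  # perfectly positive relationship
```

Under these assumptions, a pair of features that move in exactly opposite directions scores -100%, and a pair that move together scores 100%.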

K-nearest neighbor
The KNN algorithm is a simple and efficient supervised machine learning method employed in regression and classification tasks [23]-[25]. Because the algorithm carries out classification directly from the training examples, it is categorized as example-based or case-based classification [26]. The algorithm classifies a point based on a similarity criterion expressed as a distance measure. Here, K denotes the number of neighbors considered, an integer value that ranges from 3 to 10. Odd values are mostly preferred over even values when seeking a good prediction, since they avoid tied votes between two classes. A class is selected by majority vote among the nearest neighboring points. The neighbors are assigned weights so that a nearer neighbor contributes more than a farther one; the weights are based on the Euclidean distance [27]. A flowchart for KNN algorithm modeling is shown in Figure 5. The detection of faults in transmission lines is carried out in five stages. In the first stage, faults in phase A are detected, followed by the second stage, which involves the detection of faults in phase B. In the third stage, faults in phase C are detected, followed by the detection of ground faults in the fourth stage, and lastly, the overall faults in the entire system are detected. The detection is performed on the values of the currents and voltages, which are applied in the following feature combinations: phase A only, phase B only, phase C only, phases A and B, phases A and C, phases B and C, voltages only, currents only, and all features. The KNN technique was applied with weights based on the Euclidean distance and K = 3 neighbors.
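The distance-weighted vote described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' MATLAB implementation. It assumes that each neighbor's weight is the inverse of its Euclidean distance to the query point, which is one common choice; the paper states only that the weights are based on the Euclidean distance. The function names and toy samples are hypothetical.

```python
import math
from collections import defaultdict


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label for `query` by a distance-weighted vote of k neighbors."""
    # Keep the k training samples closest to the query, nearest first.
    neighbors = sorted(
        ((euclidean(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbors:
        if dist == 0.0:
            # An exact match decides the class outright (avoids dividing by zero).
            return label
        # ASSUMPTION: inverse-distance weighting, so nearer neighbors count more.
        votes[label] += 1.0 / dist
    return max(votes, key=votes.get)


# Hypothetical two-feature samples: label 0 = no fault, label 1 = fault.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]

near_origin = knn_predict(train_x, train_y, (0.1, 0.1), k=3)
near_cluster = knn_predict(train_x, train_y, (5.0, 5.1), k=3)
```

With weighting, a single very close neighbor can outvote two distant ones, which is the behavior the text describes when it says that nearer neighbors contribute more.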
The dataset was divided into two parts, with 70% designated for training the algorithm and 30% for testing it. Table 3 shows the feature combinations that were used in this paper.
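The nine feature combinations and the 70/30 split can be sketched as follows. The mapping of each combination to column names is an assumption based on the feature names Va, Vb, Vc, Ia, Ib, and Ic given earlier (for example, that "phase A only" means Va and Ia); Table 3 remains the authoritative list. The random shuffle and the fixed seed are also assumptions, since the paper does not say how the samples were assigned to each part.

```python
import random

FEATURES = ("Va", "Vb", "Vc", "Ia", "Ib", "Ic")

# ASSUMED mapping from the nine combinations named in the text to columns.
FEATURE_SETS = {
    "phase A only": ("Va", "Ia"),
    "phase B only": ("Vb", "Ib"),
    "phase C only": ("Vc", "Ic"),
    "phases A and B": ("Va", "Vb", "Ia", "Ib"),
    "phases A and C": ("Va", "Vc", "Ia", "Ic"),
    "phases B and C": ("Vb", "Vc", "Ib", "Ic"),
    "voltages only": ("Va", "Vb", "Vc"),
    "currents only": ("Ia", "Ib", "Ic"),
    "all features": FEATURES,
}


def select_features(row, names):
    """Extract the chosen columns from one sample stored as a dict."""
    return tuple(row[name] for name in names)


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of `rows` and split it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical choice
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical sample with made-up values.
sample = {"Va": 0.4, "Vb": -0.1, "Vc": -0.3, "Ia": 12.0, "Ib": -7.5, "Ic": -4.5}
currents = select_features(sample, FEATURE_SETS["currents only"])

train_rows, test_rows = split_train_test(range(100))
```

Each of the five detection stages can then be run once per entry of `FEATURE_SETS`, using the same training and testing parts so that the combinations are compared on equal terms.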

RESULTS AND DISCUSSION
Most of the parameters used to measure the performance of the algorithm are based on the confusion matrix, whose entries are classified as 'True' when the prediction matches reality (TP and TN) and 'False' when it does not (FP and FN) [29].
− True positive (TP) means that the actual and predicted outcomes both fall under the "faults" class.
− True negative (TN) means that the actual and predicted outcomes both fall under the "no fault" class.
− False positive (FP) means that the predicted outcome is in the "faults" class whereas the actual outcome is in the "no fault" class.
− False negative (FN) means that the predicted outcome is in the "no fault" class whereas the actual outcome is in the "faults" class.
The proposed models were evaluated based on parameters derived from the confusion matrix, including accuracy, sensitivity, specificity, and precision. Accuracy is the ratio of the total number of correct fault and no-fault predictions to the sample size [29]. Sensitivity (recall) is the proportion of fault points that are correctly detected [30]. Specificity is the proportion of no-fault points that are correctly detected [30]. Precision, or confidence, is the proportion of predicted faults that are actual faults [31]. These metrics were calculated for the results of the methods used in this work. The results are presented in Tables 6 and 7.
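The four metrics follow directly from the confusion-matrix counts, as the sketch below shows. It uses the standard definitions that match the descriptions above, with label 1 for "faults" and label 0 for "no fault". The function names and the toy label vectors are hypothetical and do not reproduce any result from the paper.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = fault, 0 = no fault)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 0:
            tn += 1
        elif a == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, and precision from the counts."""
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # faults correctly detected
        "specificity": ratio(tn, tn + fp),  # no-fault points correctly detected
        "precision": ratio(tp, tp + fp),    # predicted faults that are real
    }


# Hypothetical labels for ten test points.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(actual, predicted)
scores = evaluate(*counts)
```

For these toy labels the counts are TP = 3, TN = 4, FP = 2, and FN = 1, giving an accuracy of 0.7, a sensitivity of 0.75, a specificity of about 0.67, and a precision of 0.6.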
For the detection of faults in phase A, optimal results were obtained when only currents were used as input features, and the results are the same when all features are used as inputs. Very good results were obtained when the features of phase A were used (phase A only, phase A and phase B, phase A and phase C). Also, the result obtained with voltages only is better than the results obtained with the features of phase B and phase C.
For the detection of faults in phase B, optimal results were obtained when only currents were used as input features, and the results are the same when all features are used as inputs. Very good results were obtained when the features of phase B were used (phase B only, phase A and phase B, phase B and phase C). Also, the result obtained with voltages only is better than the results obtained with the features of phase A and phase C.
For the detection of faults in phase C, optimal results were obtained when only currents were used as input features, and the results are the same when all features are used as inputs. Very good results were obtained when the features of phase C were used (phase C only, phase A and phase C, phase B and phase C). Also, the result obtained with voltages only is better than the results obtained with the features of phase A and phase B.
For ground fault detection, the best results were obtained by using only currents as inputs, and the results are the same when all features are used as inputs. Better results were obtained with voltages only than with the features of phase A, phase B, or phase C alone. Using the features of two phases at the same time yielded better results than using a single phase.
For the detection of faults in the entire system, optimal results were obtained using only currents as input features, and the results are the same when all features are used as inputs.
Optimal results were also obtained by using the features of two phases at the same time as inputs; these results were better than those obtained with a single phase or with voltages only. Generally, it was found that better results were achieved in the detection of ground faults and of faults in the entire system. Moreover, the least performance was recorded in the detection of ground faults. High values of sensitivity and specificity were achieved in the case of ground fault detection. This reveals that the algorithm is able to accurately differentiate fault points from no-fault points, indicating that it can be used reliably for fault detection based on the values of the voltages and currents of the transmission lines.

CONCLUSION
In this study, fault detection in transmission lines was performed in five stages. The proposed algorithm successfully detected faults in phase A, phase B, phase C, the ground, and the whole system. Detection was carried out with a K-nearest neighbor model on a simulated power system made up of 11 kV generators, using the values of the voltages and currents of the transmission lines in different combinations. The algorithm's performance was evaluated using different parameters from the confusion matrix, including accuracy, sensitivity, precision, and specificity. Analysis and discussion of the findings have been presented, showing the best feature combinations for the detection of faults in electrical transmission lines, as well as the worst combination.