Artificial neural network based short term electrical load forecasting

Received Mar 29, 2021; Revised Jan 30, 2022; Accepted Feb 6, 2022
In power generation, the 24-hour load profile can vary significantly throughout the day. Therefore, power generation must be adjusted to reduce the financial loss caused by excess generation. This paper presents a short-term load forecasting (STLF) system design using an artificial neural network (ANN). As ANNs come in many different configurations, this paper searches for the best ANN configuration by trial and error. To train the ANN, historical load data from 2016 to 2018 of Power South Energy Cooperative (AEC) is used. A simple feedforward ANN with one hidden layer is implemented, with 48 neurons at the input layer. For the hidden layer, 50 neurons are chosen arbitrarily, and 24 neurons at the output layer generate a day-ahead 24-hour load profile. To find the best activation function for the STLF application, four non-linear activation functions are tested, and the best one is used to build two- and three-hidden-layer ANN architectures. Finally, the performance of the two new networks is compared against the one-hidden-layer model. From the results obtained, the best-performing model is the two-hidden-layer ANN with Tanh as its hidden-layer activation function, with a testing mean absolute percentage error (MAPE) of 8.9%.


INTRODUCTION
Electrical energy is a crucial resource for modern society, as it powers the industries that satisfy human needs. However, electricity is difficult to produce and distribute, especially over large areas. In addition, electrical energy storage systems are not widely deployed for economic reasons, so most of the generated electricity must be consumed immediately. Therefore, to keep power system distribution running smoothly, an efficient load forecasting system is required [1]. In general, electrical load forecasting is divided into three types: short-term load forecasting (STLF), medium-term load forecasting (MTLF) and long-term load forecasting [2]. The types are distinguished by their forecasting ranges. An inaccurate forecast causes a mismatch between the demand and generation of electrical power, which eventually results in a significant financial loss. Thus, the load forecasting method must be selected to suit the specific load forecasting type it is applied to.
Load forecasting methods can be categorized into three major groups: traditional forecasting techniques, modified traditional forecasting techniques and soft computing techniques. All types of load forecasting methods have their own advantages and disadvantages, and their usage is dependent on the load pattern, type of  [3], [4], multiple-regression [5] and exponential smoothing [6], [7] is usually used. Of all the available traditional forecasting methods, the multiple-regression technique is the most popular and has been widely used to forecast loads that are affected by numerous factors, such as meteorological effects, electricity prices and economic growth. The modified traditional forecasting methods are designed by modifying the traditional methods so that the parameters of the forecasting model are corrected automatically under changing environmental conditions. Some of the techniques used in modified traditional forecasting are adaptive load forecasting [8], [9], stochastic time series [10] and support vector machines [11], [12]. Among the methods in this category, adaptive load forecasting has the most advantages, as the parameters of the demand forecasting model are corrected automatically to keep track of changing load conditions, which enables the prediction system to be used on-line.
Recently, soft computing techniques have emerged as a flexible approach to forecasting the electrical load in power systems. These techniques mimic human reasoning in that they produce approximate rather than exact reasoning. They use algorithms such as fuzzy logic [13]-[15], artificial neural networks (ANN) [16]-[19] and evolutionary algorithms such as genetic algorithms [20]-[22] and particle swarm optimization [23]-[25]. In soft computing methods, each factor affecting the forecast is treated as a cost, and the method explores the possibilities to find a potential solution based on the computed costs.
Each algorithm in soft computing has its own advantages and disadvantages. In fuzzy logic based methods, the knowledge must be captured accurately in fuzzy rules, as the quality of the forecasting system depends mainly on those rules. In ANN based methods, the parameters used in training the models must be chosen carefully, as each parameter affects the performance of the forecasting system. Finally, solutions generated by evolutionary algorithms often fall into a local minimum, which yields low-quality electrical load predictions for power systems. Therefore, implementing a soft computing technique requires a careful design process to create an efficient load forecasting system.
As ANNs come in different configurations, this paper analyzes the effect of ANN parameters on a short-term load forecasting system. A multilayer feedforward network is used to predict the 24-hour electrical load a day ahead, using historical load data, day of week, week of month and month of year as inputs. Different numbers of hidden layers and different activation function types are analyzed to find the best-performing parameters for short-term load forecasting.
Figure 1 shows the overall process for short-term load forecasting. A densely connected feedforward ANN with the backpropagation learning algorithm is used to implement the forecast. Based on Figure 1, the STLF process starts with data initialization, where the historical data is loaded. Then the data is preprocessed; the encoding process is executed in this stage. After the data has been loaded, the model is trained with different activation functions to find the best activation function for the model. Finally, the hidden layer experiment is conducted, where different numbers of hidden layers are tested to analyze their effect on prediction quality. In this paper, 48 input neurons are used and 50 neurons per hidden layer were chosen arbitrarily. The output layer has 24 neurons, and mean absolute error is used as the loss function, which is minimized with the gradient descent algorithm using the Keras SGD class. As a result, this architecture requires a total of 3674 trainable parameters for the single-hidden-layer model, 6224 for the two-hidden-layer model and 8774 for the three-hidden-layer model.
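The trainable parameter counts quoted above can be checked by hand: each dense layer contributes one weight per input-output pair plus one bias per output neuron. The short sketch below reproduces the arithmetic; the helper name `dense_param_count` is illustrative and not part of the paper's code.

```python
def dense_param_count(layer_sizes):
    """Count the weights and biases of a densely connected feedforward network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


# 48 inputs, 50 neurons per hidden layer, 24 outputs (values from the paper)
single_layer = dense_param_count([48, 50, 24])
two_layers = dense_param_count([48, 50, 50, 24])
three_layers = dense_param_count([48, 50, 50, 50, 24])

print(single_layer, two_layers, three_layers)  # 3674 6224 8774
```

Each additional 50-neuron hidden layer adds 50 × 50 + 50 = 2550 parameters, which accounts for the constant step between the three totals.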

Inputs selection
For input selection, the historical load data, day of week, week of month and month of year are used for electrical load prediction [26]. Week of month refers to the position of the week within the month: first week, second week and so on. Month of year refers to the position of the month within the year: first month, second month and so on. The historical data of the previous day ($t-1$) is used to forecast the load on the forecast day ($t+1$), which is the day after the present day ($t$), as shown in Figure 2. A total of 24 input neurons are allocated to the load data of hour 1 to hour 24 of the previous day. To feed the calendar data into the ANN, a bit encoding method is used. Seven input neurons are used for the day of week; for instance, if it is Sunday, the first neuron of the day-of-week group receives 1 as input. Five neurons are used for the week of month, and the month of year uses 12 input neurons, following the same method. Note that the day of week, week of month and month of year inputs must correspond to the date of the 24-hour load input day. For instance, if the forecast day is 5 September 2018 and the input day is 3 September 2018, the day of week is Monday, encoded as 0100000; it is the second week of the month, encoded as 01000; and it is the ninth month of the year, encoded as 000000001000. The 24-hour load input data is then taken from 3 September 2018 to predict the load on 5 September 2018.
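The bit encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, weeks are assumed to start on Sunday (which is consistent with the worked example for 3 September 2018), and a rare sixth calendar week is clamped to the fifth neuron, which is an assumption the paper does not address.

```python
from datetime import date


def one_hot(index, size):
    """Return a list of `size` bits with a single 1 at position `index`."""
    bits = [0] * size
    bits[index] = 1
    return bits


def week_of_month(day):
    """Calendar week within the month (1-based), assuming weeks start on Sunday."""
    # date.weekday() gives Monday=0 ... Sunday=6; shift so that Sunday=0.
    first_weekday = (day.replace(day=1).weekday() + 1) % 7
    week = (day.day + first_weekday - 1) // 7 + 1
    return min(week, 5)  # assumption: fold a sixth week into the fifth neuron


def encode_calendar(day):
    """Encode day of week (7 bits), week of month (5 bits), month of year (12 bits)."""
    day_of_week = one_hot((day.weekday() + 1) % 7, 7)  # Sunday is the first neuron
    week = one_hot(week_of_month(day) - 1, 5)
    month = one_hot(day.month - 1, 12)
    return day_of_week + week + month


def build_input(normalized_loads, day):
    """Concatenate 24 normalized hourly loads with the 24 calendar bits."""
    if len(normalized_loads) != 24:
        raise ValueError("expected 24 hourly load values")
    return list(normalized_loads) + encode_calendar(day)


bits = encode_calendar(date(2018, 9, 3))
print("".join(str(b) for b in bits))  # 010000001000000000001000
```

The printed string is the concatenation of 0100000, 01000 and 000000001000 from the example in the text, and `build_input` yields the 48 values that match the 48 input neurons.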

Activation function and data preprocessing
To determine the output of the ANN, a non-linear activation function is used in the hidden layer, while a rectified linear activation is used in the output layer. The hidden-layer candidates are the exponential, Tanh, sigmoid and softsign activation functions. For the output layer, a rectified linear unit (ReLU) is used for all ANN models. For data preprocessing, all input data is normalized into the range 0 to 1 before it is fed into the ANN, except for the bit-encoded data, whose values are already either 0 or 1. For normalization, min-max feature scaling is used, as described in (1).
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (1)$$
In (1), $x$ is the original value of the data, $x'$ is the normalized value, $x_{\max}$ represents the maximum value of the dataset and $x_{\min}$ represents the minimum value of the dataset. The maximum load value used is 1048 MWh and the minimum load value used is 291 MWh. These are the highest and lowest loads from 2016 to 2017. In addition, any load value from 2018 that falls outside this range is removed. Finally, each candidate configuration is evaluated by its mean absolute percentage error (MAPE). The daily best and worst absolute percentage errors are recorded for performance analysis. The testing MAPE is plotted against the training MAPE for overfitting analysis. The overfitting analysis is described in Figure 3.
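As a small illustration of the min-max scaling and the range filter, the sketch below uses the 291 MWh and 1048 MWh bounds stated in the text. The sample load values are made up, the function names are hypothetical, and filtering individual hourly values (rather than whole days) is an assumption, since the paper does not say at which granularity the out-of-range data is removed.

```python
LOAD_MIN = 291.0   # MWh, lowest load in 2016-2017 (from the paper)
LOAD_MAX = 1048.0  # MWh, highest load in 2016-2017 (from the paper)


def normalize(load, lo=LOAD_MIN, hi=LOAD_MAX):
    """Min-max feature scaling: map a load in [lo, hi] to [0, 1]."""
    return (load - lo) / (hi - lo)


def denormalize(value, lo=LOAD_MIN, hi=LOAD_MAX):
    """Inverse of normalize: map a value in [0, 1] back to MWh."""
    return value * (hi - lo) + lo


def within_range(load, lo=LOAD_MIN, hi=LOAD_MAX):
    """True if the load lies inside the range seen in 2016-2017."""
    return lo <= load <= hi


# Made-up 2018 sample values; the first and last fall outside the range.
loads_2018 = [250.0, 291.0, 669.5, 1048.0, 1100.0]
kept = [x for x in loads_2018 if within_range(x)]
scaled = [normalize(x) for x in kept]

print(kept)    # [291.0, 669.5, 1048.0]
print(scaled)  # [0.0, 0.5, 1.0]
```

Fixing the bounds from the 2016 to 2017 data keeps the scaling independent of the 2018 data, at the cost of discarding any 2018 value outside those bounds.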
For the overfitting analysis, a secondary training loop has been set up based on Figure 3. Every 50 training epochs, the training MAPE is inspected to see whether it has fallen within one of the ranges given by the conditions shown in Figure 3. If it has, the current model is stored. The original attempt was to train the models to a very low training MAPE, as low as 3%. However, overfitting occurred much sooner and more often than expected. Therefore, this secondary training loop is an attempt to capture a model just before over-training occurs.
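The secondary training loop can be pictured roughly as follows. This is only a sketch under stated assumptions: the actual MAPE ranges come from Figure 3 and are not reproduced here, so the ranges below are hypothetical placeholders, as are the function names, the toy "model" and the choice to keep only the first snapshot per range.

```python
import copy


def train_with_snapshots(model, train_epoch, train_mape, mape_ranges,
                         max_epochs, check_every=50):
    """Train and store a copy of the model when the train MAPE enters a range.

    train_epoch(model) runs one epoch in place; train_mape(model) returns the
    current training MAPE in percent. Both are supplied by the caller.
    """
    snapshots = {}
    for epoch in range(1, max_epochs + 1):
        train_epoch(model)
        if epoch % check_every != 0:
            continue  # inspect the training MAPE only every `check_every` epochs
        mape = train_mape(model)
        for low, high in mape_ranges:
            # assumption: keep only the first model that lands in each range
            if low <= mape < high and (low, high) not in snapshots:
                snapshots[(low, high)] = (epoch, mape, copy.deepcopy(model))
    return snapshots


# Toy stand-in for a network: the "training MAPE" decays by 1% per epoch.
toy_model = {"mape": 20.0}


def toy_epoch(model):
    model["mape"] *= 0.99


def toy_mape(model):
    return model["mape"]


hypothetical_ranges = [(8.0, 10.0), (6.0, 8.0), (4.0, 6.0)]  # placeholders
snapshots = train_with_snapshots(toy_model, toy_epoch, toy_mape,
                                 hypothetical_ranges, max_epochs=200)
for (low, high), (epoch, mape, _) in sorted(snapshots.items()):
    print(f"range [{low}, {high}): stored at epoch {epoch}, train MAPE {mape:.2f}%")
```

With a real Keras model, `train_epoch` would wrap a single-epoch fit call and the snapshot would more likely be a saved weights file than a deep copy; those details are left out because the paper does not describe them.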

RESULTS AND DISCUSSION
This section discusses the results and analysis of forecasting the electrical load using the data from 2016 to 2018. First, the activation function with the best testing MAPE is chosen based on a single-hidden-layer model. Then, the chosen activation function is used for further analysis, in which hidden layers are added to investigate the best ANN configuration for electrical load forecasting.

Best testing MAPE
In this paper, the test MAPE is calculated as the mean of the hourly absolute percentage errors. The calculation uses 334 days of load data, with a value for every hour of each day, so 334 days multiplied by 24 hours gives 8,016 hourly data points. The testing MAPE is the sum of the hourly absolute percentage errors of every day, divided by the total number of data points, which is 8,016. Table 1 shows the test MAPE and the corresponding train MAPE.
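The MAPE calculation described above can be written out as a short sketch. The function names and the two-day toy data are illustrative only; in the paper the same computation runs over 334 days of 24 hourly values.

```python
def absolute_percentage_errors(actual, forecast):
    """Hourly absolute percentage errors, in percent, for one day."""
    return [100.0 * abs(a - f) / abs(a) for a, f in zip(actual, forecast)]


def overall_mape(actual_days, forecast_days):
    """Sum every hourly absolute percentage error and divide by the count."""
    errors = []
    for actual, forecast in zip(actual_days, forecast_days):
        errors.extend(absolute_percentage_errors(actual, forecast))
    return sum(errors) / len(errors)


# Toy data: two days of 24 hourly loads (MWh), made up for illustration.
actual_days = [[500.0] * 24, [500.0] * 24]
forecast_days = [[550.0] * 24, [500.0] * 24]  # 10% error, then 0% error

print(overall_mape(actual_days, forecast_days))  # 5.0
print(334 * 24)  # 8016 hourly data points in the paper's test set
```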
From the table, the ANN model with the Tanh hidden-layer activation function produced the best test MAPE, with 8.9% error, while the model with sigmoid has the worst MAPE, with 15.76% error. Therefore, the Tanh function has been selected to investigate the effect of the number of hidden layers on the prediction performance for load forecasting, which is presented later in this section.

Overfitting analysis
In ANNs, overfitting occurs when a model tries to fit a trend in data that is too noisy to be predicted. This happens when the model is overly complex and has many parameters. If the model is overfitted, the forecast output becomes inaccurate because the learned trend does not reflect the current data pattern. Figure 4 shows the comparison between the test MAPE and the train MAPE, where Figure 4 (a) is for Tanh, Figure 4 (b) is for sigmoid, Figure 4 (c) is for softsign and Figure 4 (d) is for exponential.
The Tanh activation function has the most consistent curve, which indicates a consistent relationship between its test and train MAPE. For the sigmoid activation function, the relationship between test and train MAPE is very inconsistent, as the graph shows a zig-zag pattern. Softsign is slightly worse than Tanh in terms of its test and train MAPE curve and ranks second. For the exponential function, the relationship starts off linear, and as the train MAPE decreases further past the point of overfitting, the test MAPE appears to regain its linear relationship with the train MAPE.

Best and worst forecast sample
In choosing the best ANN parameters for electrical load forecasting, two prediction perspectives are considered, the best day and the worst day, to measure the effectiveness of each activation function throughout the forecast. The day samples are taken from each individual day's average absolute percentage error. To obtain them, the hourly loads of all 334 available days are used to compute each day's average absolute percentage error, giving 334 daily values. The days with the lowest and highest daily average absolute percentage error are then selected, and their hourly load forecasts are plotted against the actual load. Table 2 shows the average absolute percentage error for the best and the worst day in electrical load prediction. According to the table, the lowest average absolute percentage error for the best individual day comes from Tanh, followed by softsign, exponential and sigmoid. However, the lowest average absolute percentage error for the worst individual day comes from softsign, followed by Tanh, exponential and sigmoid. Although Tanh falls into second place there, the difference is small, only about 0.06%. Based on this analysis, the Tanh activation function has been selected for the ANN model used to investigate the effect of the number of hidden layers on forecasting performance, which is explained in the next section.
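The selection of the best and worst forecast days can be sketched as follows. The function names and the three-day toy data are illustrative assumptions; the paper applies the same idea to 334 daily values.

```python
def daily_mean_ape(actual_day, forecast_day):
    """Average absolute percentage error, in percent, over one day's hours."""
    errors = [100.0 * abs(a - f) / abs(a)
              for a, f in zip(actual_day, forecast_day)]
    return sum(errors) / len(errors)


def best_and_worst_days(actual_days, forecast_days):
    """Return (best_index, worst_index, daily_errors) over all days."""
    daily = [daily_mean_ape(a, f) for a, f in zip(actual_days, forecast_days)]
    best = min(range(len(daily)), key=daily.__getitem__)   # lowest daily error
    worst = max(range(len(daily)), key=daily.__getitem__)  # highest daily error
    return best, worst, daily


# Toy data: three days of 24 hourly loads (MWh), made up for illustration.
actual_days = [[400.0] * 24, [400.0] * 24, [400.0] * 24]
forecast_days = [[404.0] * 24, [440.0] * 24, [420.0] * 24]

best, worst, daily = best_and_worst_days(actual_days, forecast_days)
print(best, worst)  # 0 1
```

The two indices identify the days whose hourly forecasts would then be plotted against the actual load.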

Effect of ANN hidden layers to the forecasting performance
In this analysis, different numbers of hidden layers are tested to choose the best ANN configuration. Figure 5 shows the comparison between the test MAPE and the train MAPE, where Figure 5 (a) and Figure 5 (b) are for two and three hidden layers respectively. The two-hidden-layer model shows a performance similar to that of the single-hidden-layer model shown in Figure 4; however, the three-hidden-layer model starts overfitting at a train MAPE of around 7.12%. Therefore, the three-hidden-layer model shows worse forecast stability than the two-hidden-layer model.
Table 3
Figure 6 shows the best day forecast sample, where Figure 6 (a) is for two hidden layers and Figure 6 (b) is for three hidden layers. Figure 7 shows the worst day forecast sample, where Figure 7 (a) is for two hidden layers and Figure 7 (b) is for three hidden layers. Table 4 shows the average absolute percentage error for the best and worst forecast days for two and three hidden layers. The table shows that the two-hidden-layer model performs better than the three-hidden-layer model. Therefore, it is highly recommended to use two hidden layers with the Tanh activation function in predicting the short-term electrical load.

CONCLUSION
The objective of this paper is to investigate the best ANN configuration for short-term load forecasting. To find the best-performing ANN configuration, the performance of different ANN activation functions was compared: single-hidden-layer models with the Tanh, sigmoid, softsign and exponential functions were trained, and their performance was measured. Based on the comparison, the Tanh activation function shows the best performance, with 1.38% best individual day average absolute percentage error and 48% worst individual day average absolute percentage error. Because it performed best in load forecasting with a single hidden layer, the Tanh activation function was chosen to investigate the performance of different numbers of hidden layers. Based on the results, the ANN with two hidden layers performs better than the ANN with three hidden layers. Therefore, a higher number of hidden layers does not necessarily give better forecasting performance. For future work, it is suggested that the inputs be extended to include weather data to improve the load forecasting performance under different climate conditions.