Performance comparison of artificial intelligence techniques in short term current forecasting for photovoltaic system

Received Nov 27, 2018; Revised Jan 24, 2019; Accepted Mar 13, 2019

This paper presents two artificial intelligence approaches, artificial neural network (ANN) and random forest (RF), used to perform short-term photovoltaic (PV) output current forecasting (STPCF) for the next 24 hours. The input data for ANN and RF consists of multiple time lags of hourly solar irradiance, temperature, hour, power and current, which capture the movement pattern of data that have been denoised using wavelet decomposition. The Levenberg-Marquardt optimization technique is used as the back-propagation algorithm for ANN, and the bagging-based bootstrapping technique is used in the RF to improve the forecasting results. The PV output current data is obtained from the Green Energy Research Centre (GERC), University Technology Mara Shah Alam, Malaysia, and is used as the case study in estimating the PV output current for the next 24 hours. The results show that both proposed techniques are able to forecast the future hourly PV output current with low error.


INTRODUCTION
The past few years have shown remarkable growth in the use of solar energy in the residential, commercial, and industrial sectors. Global solar PV capacity had already reached 178 GW in 2014 and was estimated to reach 540 GW in 2019 [1,2]. In recent years, solar PV systems have developed rapidly because PV is maintenance free, long lasting, and environmentally friendly [3][4][5][6][7][8]. However, a PV system operates as a non-stationary random process because of the variability of solar irradiance and other environmental factors that may affect the output current.
In general, artificial neural network (ANN), support vector machine (SVM), and fuzzy logic have been used as forecasting methods due to several advantages [9,10]. These methods have been used for solar irradiance forecasting because of the increasing demand for accurate output. Besides the basic AI techniques, combinations of several AI techniques can also produce accurate forecasts of future solar irradiance. Time series with ANN, fuzzy logic with ANN, and wavelet-based ANN are well-known examples of such combinations for ANN.
In particular, ANN is an alternative model that is capable of handling the uncertainty of solar irradiance [11,12]. The main advantage of ANN is its ability to solve complex modelling problems, especially non-linear models [9]. Random forest (RF) is another advanced AI technique used for forecasting. RF uses ensemble machine learning that consists of many decision tree models for classification and regression [13]. The construction of a tree does not depend on the previous tree, since the trees are created independently using the bootstrap aggregation (bagging) technique [14,15]. RF has the advantages of resisting overfitting and of running fast and efficiently on large datasets, which gives it superior predictive performance.
This paper presents the ANN and RF methods used to perform short-term PV output current forecasting for the next 24 hours. Little research has been done on PV output current forecasting using RF. The input data used for these methods, namely current, irradiance, hour and temperature, pass through a filtration process using wavelet decomposition to eliminate the noise in each data series. Then, multiple time lags are used because of their capability to identify the pattern and behavior of the filtered data while improving it for accurate estimation of the PV output current over the next 24 hours [16][17][18][19][20]. The case study uses PV output current, temperature, irradiance and hour data from 2015, with a total of 7460 hourly data points obtained from the Green Energy Research Center (GERC), University Technology Mara Shah Alam, Malaysia. The robustness of the two models in forecasting is compared by referring to the mean square error (MSE), the mean absolute percentage error (MAPE) and the regression between the forecasted and actual (targeted) values.

RESEARCH METHOD
This section explains the concept of feature extraction, or data preparation, for the ANN and RF models used in the PV output current forecasting [21][22][23][24][25]. The structure used for this process is shown in Figure 1, wherein the STPCF process begins with the original data selection. In this case, the hourly PV output current, temperature, irradiance and hour are selected as the input data. After data selection, the input and target data are prepared using the wavelet decomposition and multiple time lags techniques. Subsequently, the forecasting models for ANN and RF are designed. Finally, the training and testing procedures are performed to obtain the forecasting outcome from ANN and RF.

Input data of chronological parameter
The data used in forecasting is acquired from the Green Energy Research Centre (GERC) of UiTM Shah Alam, Malaysia. The data is obtained in MATLAB format and consists of five parameters recorded in 2015. In the GERC laboratory, this information is collected by a data logger every 5 minutes. This data is analyzed, and the hour, irradiance, temperature, power and current with maximum parameter value are used in forecasting, as shown in Table 1. The data preparation for the ANN and RF models is shown in Figure 2 and its procedure is elaborated as follows.
-Collect the raw data consisting of 7460 hourly records of hour, irradiance, temperature, power and current.
-Perform the filtering process on the input data using the wavelet decomposition to reduce any noise inside it.
-Normalize all of the filtered data by dividing by its maximum value in order to reduce data redundancy and bring the data within the range of 0 to 1.
-Perform the multiple time lags to improve the data so that the movement pattern of every input data series required by the ANN and RF techniques can be determined.
-Use the multiple time lags to relate the future variable to the lagged (past period) variable that will evolve in the future [9].
The input data improved by the multiple time lags is able to determine the movement pattern of data in the neural network based on (1). The total number of time interval lags is K=24 hours.
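The filtering and normalization steps above can be illustrated with a minimal sketch. The paper does not state the wavelet family, decomposition level or threshold, so the single-level Haar transform with soft thresholding used here, and the function names `haar_denoise` and `normalize_by_max`, are assumptions chosen only for illustration.

```python
import math


def haar_denoise(signal, threshold):
    """Single-level Haar wavelet denoising with soft thresholding.

    Assumption: the Haar wavelet, one decomposition level and a fixed
    threshold are illustrative choices, not the settings of the paper.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Decompose neighbouring pairs into approximation and detail coefficients.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Soft-threshold the detail coefficients, which carry most of the noise.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    # Reconstruct the signal from the approximation and shrunken detail.
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def normalize_by_max(values):
    """Divide every value by the series maximum so the largest becomes 1."""
    peak = max(values)
    if peak == 0:
        raise ValueError("cannot normalize a series whose maximum is zero")
    return [v / peak for v in values]
```

With a threshold of zero the signal is reconstructed unchanged, while a large threshold replaces each pair of samples by its average, which shows how the threshold controls the amount of smoothing.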
In (1), the total number of time interval lags used is K=24, where the value of K is set equal to the time interval of the forecasted variable. The value of K is fixed to 24 for forecasting the next 24 hours of PV output current. The input data is in the form of a k-by-t matrix in which each column is used to forecast the PV output current for the next 24 hours. The first column of training data, Lagk, is used to forecast the target data of X48. The input data arrangement for training and target data is shown in Figure 3. Two sets of data, training data and target data, are created after all data have been converted into multiple time lags. The arrangement of training data for each line is a combination of hour, temperature, irradiance and current. The last part of the current data is used as target data. The training and target data formed are used for forecasting with the ANN and RF methods. The chronological arrangement of input data is shown in Table 2.
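The lag arrangement described above can be sketched as follows. This is one reading of the text, in which the first K values of a series form the first input column and the value K hours after the last lag (X48 when K=24) is its target. The function name `make_lagged_samples` and the exact indexing are assumptions, not the authors' implementation.

```python
def make_lagged_samples(series, k=24):
    """Build (lags, target) pairs from a single hourly series.

    Assumption: each sample holds k consecutive values, and its target is
    the value k hours after the last lag, so the first sample X1..X24 is
    paired with X48 when k=24.
    """
    n_samples = len(series) - 2 * k + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested lag")
    samples = []
    for start in range(n_samples):
        lags = list(series[start:start + k])
        target = series[start + 2 * k - 1]
        samples.append((lags, target))
    return samples
```

In the paper, the lagged hour, temperature, irradiance and current are combined in each column, so in this sketch the function would be applied to each variable and the resulting lag lists concatenated.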

Artificial neural network (ANN)
The artificial neural network (ANN) is used in this paper because it is well suited to handling the uncertainty of solar energy. The ANN uses the Levenberg-Marquardt technique to forecast the PV output current for the next 24 hours [4].
In this case study, the ANN model for forecasting the PV output current consists of one input layer, two hidden layers, and one output layer. The output layer of the ANN consists of one neuron, which provides the predicted PV output current for the next 24 hours. The Levenberg-Marquardt technique is used in the ANN model as the back-propagation algorithm for optimization during the training process. This technique is commonly used for training ANNs because it strikes a compromise between accuracy and stability while descending toward the minimum error. The ANN model for forecasting PV output current is shown in Figure 4 and the ANN procedure is explained below.
-Divide the input data into three sets for training, testing and validating with the multiple time lags of K=24 hours.
-In the training process, minimize the error between the actual output and the targeted output by adjusting the synapses, regulated by the learning rate and momentum.
-Select the number of hidden layers. One hidden layer is sufficient to estimate any function; two hidden layers are used in this ANN model to provide more precise results with minimum RMS error in forecasting the next 24 hours of PV output current.
-Repeat the error minimization process until the optimization converges, yielding the smallest error in the output. The training procedure is terminated once the minimum error has plateaued for several iterations of the optimization process.
-Verify the strength of the ANN in producing correct STPCF results by conducting the testing and then the validation processes using different sets of input data.
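For reference, the standard Levenberg-Marquardt weight update can be written as below. The symbols are generic and are not taken from the paper: $\mathbf{w}$ is the weight vector, $\mathbf{e}$ the vector of errors between network output and target, $\mathbf{J}$ the Jacobian of $\mathbf{e}$ with respect to $\mathbf{w}$, and $\mu$ a damping factor.

```latex
\mathbf{w}_{n+1} = \mathbf{w}_{n}
  - \left( \mathbf{J}^{\top}\mathbf{J} + \mu \mathbf{I} \right)^{-1}
    \mathbf{J}^{\top}\mathbf{e}
```

A small $\mu$ makes the step approach the Gauss-Newton step, whereas a large $\mu$ makes it approach a short steepest-descent step, which is the source of the compromise between speed and stability.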

Random forest (RF)
Random Forest is a model comprising two significant components: tree bagging and random decision trees [6]. The TreeBagger, denoted B, contains the number of trees (NTrees), with X and Y as the inputs of the ensemble function used for creating the decision trees. Each decision tree uses the input X to predict the target response Y. The procedure of Random Forest is explained below.
-Draw N bootstrap samples randomly from the training data of the RF model and create a regression tree for each sample. Each bootstrap sample has the same size as the original training data.
-Perform the bagging technique, which divides the bootstrap sample into two sets of data: two-thirds is In-Bag while the remaining data is Out-Of-Bag (OOB).
-Use the In-Bag data to create a forest in which the tree growth technique produces the best leaves. The OOB data is used to compute the unbiased prediction error as trees are added to the forest during the tree growth phase using the In-Bag data. The primary role of the OOB data in the tree growth technique is to compare its estimation with the predicted values obtained from the In-Bag data to find the best leaves with minimal error rate in every tree.
-Halt the growth of the tree once the final node of the best leaf in every tree is obtained. Upon finishing the final nodes, the prediction value from the final node of the best leaf is collected from every tree, and the average prediction is calculated over the final node leaves of all trees.
Figure 5 shows the structure of RF.
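The bootstrap, In-Bag/OOB and averaging steps above can be sketched as follows. This is a minimal illustration, not the TreeBagger implementation, and the function names `bootstrap_split` and `bagged_prediction` are hypothetical. Drawing N indices with replacement leaves roughly 63% of the distinct observations In-Bag on average, which is close to the two-thirds figure quoted above.

```python
import random


def bootstrap_split(n, rng):
    """Draw n row indices with replacement and list the rows never drawn.

    Returns (in_bag, oob): in_bag has length n and may contain repeats,
    while oob holds the indices that were not selected at all.
    """
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    oob = [i for i in range(n) if i not in drawn]
    return in_bag, oob


def bagged_prediction(tree_predictions):
    """Average the predictions of the individual regression trees."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    return sum(tree_predictions) / len(tree_predictions)
```

For example, `bootstrap_split(100, random.Random(0))` returns 100 In-Bag indices together with the OOB indices that a tree grown on that sample has never seen, so the OOB rows can serve as an error check for that tree.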

RESULTS AND ANALYSIS
This section discusses the STPCF results determined using the ANN and RF models. The hourly solar irradiance, temperature, hour, and current data obtained from the Green Energy Research Centre (GERC), University Technology Mara Shah Alam, Malaysia are used for the case study of STPCF. The input data undergoes the wavelet decomposition to eliminate the noise inside the data, and then the multiple time lags of K=24 hours are applied to the filtered data. The data size used in the ANN and RF procedures is 17520 columns. The data is divided into three sets, wherein the training data has 5785 columns, the testing data has 720 columns and the validation data has 720 columns.
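The three-way division can be sketched as below, using the set sizes quoted above. The text does not say how the columns are assigned to each set, so the contiguous, chronological ordering (training, then testing, then validation) and the function name `split_ranges` are assumptions.

```python
def split_ranges(n_train, n_test, n_val):
    """Return contiguous column ranges for training, testing and validation.

    Assumption: the sets are taken in chronological order, which the
    paper does not state explicitly.
    """
    train = range(0, n_train)
    test = range(n_train, n_train + n_test)
    val = range(n_train + n_test, n_train + n_test + n_val)
    return train, test, val


# Set sizes as reported in the text.
train_cols, test_cols, val_cols = split_ranges(5785, 720, 720)
```

Keeping the sets contiguous avoids mixing future hours into the training set, which matters for time-lagged data.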

Artificial neural network (ANN)
The input data of the ANN is the combination of multiple time lags of hour, temperature, irradiance and current. The training and testing procedures of the ANN are performed with input data having multiple time lags of K = 24 hours. In the ANN model, the number of neurons is 20 for the first hidden layer and 10 for the second hidden layer. The learning rate and momentum are both 0.3. The numbers of neurons for the first and second hidden layers, the learning rate and the momentum are selected by performing a sensitivity analysis, where the selected values are those giving the minimum RMSE of the output.

Figures 6(a) and 6(b) show the forecasted PV output current versus the actual targeted values and the regression of the output results obtained during the testing procedure of the ANN, respectively. In Figure 6(a), the actual targeted output is shown in blue and the forecasted PV output current in red. The forecasted pattern of hourly PV output current is almost the same as the pattern of actual targeted values at certain hours. However, there are inconsistencies, with several large errors between the forecasted and targeted PV output current for the next 24 hours.
Figure 6. STPCF for the next 24 hours using ANN: (a) forecasted PV output current versus actual targeted values, (b) regression of forecasted versus actual PV output current

Random forest (RF)
The multiple time lags of hour, temperature, irradiance and current are used as the input data of RF. The training, testing and validating processes of RF are conducted using the input data with multiple time lags of K=24 hours. The RF is run for three different cases of 1, 5 and 10 trees (Ntrees), with every tree consisting of 5 leaves. The selection of the number of trees is based on the fact that a single tree is sufficient to estimate any function. Therefore, using more than one tree will provide more precision in determining the minimum error. The mean square error (MSE) is obtained from the training procedure. In the testing procedure, however, the output is automatically compared with the targeted data at each leaf in every tree. These comparisons are performed until the finest tree expansion is achieved, giving the minimum average RMS error for the final node leaf of all trees. The numbers of trees chosen for the sensitivity analysis to determine the best prediction of PV output current with minimum RMSE for RF are shown in Table 4.

Figure 7. STPCF for the next 24 hours using RF: (a) forecasted PV output current versus actual targeted values, (b) regression of forecasted versus actual PV output current

Tables 3 and 4 show the results of ANN and RF, respectively. The output results are obtained with multiple time lags of K = 24 hours during the testing procedure for both techniques. The testing procedure for both techniques with multiple time lags is further investigated by comparing the MAPE results of PV output current.

MAPE comparison between the performance of ANN and RF
Referring to the MAPE results of ANN and RF in Table 5, it can be observed that the ANN model produces a higher MAPE value of 5.0836%, in contrast with the MAPE of 0.0579% obtained by the RF on day two. It is clear from Table 5 that the RF provides the most accurate prediction, with the minimum average MAPE in forecasting the PV output current, as compared to the ANN. The results indicate that the bagging technique improves the training and testing processes of RF in obtaining the best results with minimum forecasting error. They also suggest that the ANN is more complicated than the RF in terms of interpreting and understanding its weights, is more prone to over-fitting, and is less predictable in its performance.
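The error measures used in this comparison follow their standard definitions and can be sketched as below. Since the PV output current is zero at night, the percentage error is undefined for those hours. Skipping zero-valued actual readings in `mape` is an assumption made for this sketch, as the paper does not state how such hours are handled.

```python
import math


def mse(actual, forecast):
    """Mean square error between actual and forecasted values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(mse(actual, forecast))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumption: hours with a zero actual value are skipped, because the
    percentage error is undefined there.
    """
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to evaluate")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)
```

For instance, actual values of 100 and 200 with forecasts of 110 and 180 each deviate by 10%, so the MAPE is 10%.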

CONCLUSION
The application of artificial neural network (ANN) and random forest (RF) with wavelet denoising and multiple time lags of K=24 in performing short-term photovoltaic current forecasting (STPCF) has been discussed in this paper. The results show that the models proposed for the case study provide accurate STPCF results. However, the RF results show the importance of choosing an appropriate number of trees and leaves, as this affects the performance of RF. The results show that the RF method is able to forecast the PV output current for the next 24 hours and provides more accurate STPCF results with lower error than the ANN.