Adaptive dynamic programming based optimal control for a robot manipulator

Received Sep 2, 2019. Revised Nov 9, 2019. Accepted Feb 4, 2020.

In this paper, the optimal control problem of a nonlinear robot manipulator in the absence of holonomic constraint forces is presented from the point of view of adaptive dynamic programming (ADP). To begin with, the manipulator dynamics are transformed by exact linearization. Then a framework combining ADP and the Robust Integral of the Sign of the Error (RISE) is developed. The ADP algorithm employs a neural-network technique to tune the actor and critic networks simultaneously, approximating the control policy and the cost function, respectively. Weight convergence as well as the position tracking control problem are established by theoretical analysis. Finally, a numerical example illustrates the effectiveness of the proposed control design.


INTRODUCTION
In recent years, control methodologies for robotic systems have been widely developed not only in practical applications [1, 2], but also in theoretical analysis [3-6]. The main challenges of control design that have been considered include the robust adaptive control problem, motion/force control, input saturation and full state constraints [7, 8], and the path planning problem [9]. Several control techniques have been employed for manipulators to tackle input saturation by adding extra terms to the control input designed under the assumption of no input constraint [4, 5, 10-13]. In [4], the authors proposed a new reference model for the control system to account for input saturation. The additional term is computed from the derivative of the previous Lyapunov candidate function along the state trajectory under the saturated control input [4].
Furthermore, the authors in [5] gave a new approach that addresses input constraints while also handling disturbances; the proposed sliding surface employs the saturation function of the joint variables. To address the drawback of state constraints in manipulators, the authors in [7, 8] proposed a framework based on a barrier Lyapunov function, the Moore-Penrose inverse, and a fuzzy neural network technique. An equivalent sliding mode control algorithm was designed, and the boundedness of the control input was then estimated. The advantage of this approach is that the input bound can be adjusted directly by selecting a few parameters.
The work in [10-13] presents a technique to handle the input constraint using a modified Lyapunov candidate function. Because of actuator saturation, the Lyapunov function is augmented with a quadratic term in the difference between the control input produced by the controller and the actual signal applied to the plant. The control design is obtained by considering the derivative of this Lyapunov function along the state trajectory [7, 8, 10-13]. Optimization techniques such as the genetic algorithm (GA) and particle swarm optimization (PSO) have been applied to the path planning problem [9]. Model predictive control (MPC), a special case of optimal control design, has been investigated for linear motors using both an online min-max technique [14, 15] and an offline algorithm [16]. For robot manipulators, an optimal control algorithm yields a design that can handle input and state constraints by solving the optimization problem in the presence of those constraints. An asymptotically optimal control design was presented in [3] by directly solving the Riccati equation for linear systems. However, it is difficult to find an explicit solution of the Riccati equation, or of the partial differential Hamilton-Jacobi-Bellman (HJB) equation in the general case. Approximate/adaptive dynamic programming (ADP) has therefore received much attention for optimal control in recent years, since it avoids solving the Riccati equation for linear systems and the HJB equation for nonlinear systems directly. Thanks to the Kronecker product technique, the authors in [17] proposed an online solution for linear systems without knowledge of the system matrix, based on the least-squares solution obtained from a sufficient number of acquired data points. In [18], Zhong-Ping Jiang et al. extended this online solution to the case of completely unknown dynamics, requiring neither matrix A nor matrix B of the linear system.
In [18], the Riccati equation was considered in more detail with respect to computation and data acquisition, and the exploration noise over the learning time interval was explicitly treated in the proposed algorithm. Instead of employing the Kronecker product as in the linear case, a neural network approximation of the cost function was used to implement an online adaptive algorithm with an actor/critic structure for continuous-time nonlinear systems [19].
However, the proposed algorithm requires knowledge of the input-to-state dynamics to update the control policy, and the persistence of excitation condition was not considered [19]. The weight parameters of the neural network were tuned to minimize the objective in the least-squares sense [19]. The theoretical analysis of the convergence of the cost function and control input in adaptive/approximate dynamic programming (ADP) extends the work in [20]. Building on the theoretical analysis of neural network approximation, the authors in [21] presented a novel online ADP algorithm that tunes both the actor and the critic neural networks simultaneously. The weight training of the critic neural network (NN) was implemented by a modified Levenberg-Marquardt algorithm to minimize the squared residual error. Moreover, the actor and critic weight updates depend on each other to guarantee weight convergence. It is worth noting that the persistence of excitation (PE) condition must be satisfied, and Lyapunov stability theory was employed to analyze the convergence problem [21]. Extending the work in [21] through an analysis of the approximate Bellman error, the algorithm proposed in [22] can be implemented online with simultaneous tuning and without knowledge of the drift dynamics. In [23], an identifier with an adaptation law, described by a neural network, approximates the dynamic uncertainties of the nonlinear model. An extension using a special cost function was proposed in [24, 25] to handle input constraints. A framework combining the ADP technique and classical sliding mode control was presented to design an optimal controller for an inverted pendulum [26]. However, the effectiveness of ADP for a robot manipulator has not been considered in the aforementioned research.
This work proposes a control algorithm combining exact linearization, the Robust Integral of the Sign of the Error (RISE [3]), and the ADP technique for manipulators in the absence of holonomic constraints. The ADP technique is implemented with a simultaneous tuning method to guarantee weight convergence and stability.

DYNAMIC MODEL OF A ROBOT MANIPULATOR AND CONTROL OBJECTIVE
Consider the robot manipulator without constraints described by the dynamic model (1). Several appropriate assumptions [3] will be adopted to develop the control design in the following sections.
It should be noted that the manipulator is considered in the absence of holonomic constraint forces. The control objective is to find a control algorithm, within the framework of exact linearization, RISE, and the ADP technique, that achieves position tracking for the manipulator control system, as shown in Figure 1. The ADP algorithm used to implement the optimal control design is described in the next section.
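Since the dynamic model (1) is not reproduced in this extract, the sketch below assumes the standard two-link Euler-Lagrange form M(q)q̈ + C(q, q̇)q̇ + G(q) = τ commonly used for such manipulators; the link masses, lengths, and gravity constant are illustrative assumptions, not the paper's values.

```python
import numpy as np

# Standard two-link planar manipulator dynamics in the Euler-Lagrange form
#   M(q) q_dd + C(q, q_d) q_d + G(q) = tau
# Parameters below are illustrative assumptions.
m1, m2, l1, l2, g = 1.0, 1.0, 1.0, 1.0, 9.81

def inertia(q):
    # Symmetric positive definite inertia matrix M(q)
    c2 = np.cos(q[1])
    m11 = (m1 + m2) * l1**2 + m2 * l2**2 + 2 * m2 * l1 * l2 * c2
    m12 = m2 * l2**2 + m2 * l1 * l2 * c2
    m22 = m2 * l2**2
    return np.array([[m11, m12], [m12, m22]])

def coriolis(q, qd):
    # Coriolis/centrifugal matrix C(q, q_d)
    s2 = np.sin(q[1])
    h = -m2 * l1 * l2 * s2
    return np.array([[h * qd[1], h * (qd[0] + qd[1])],
                     [-h * qd[0], 0.0]])

def gravity(q):
    # Gravity torque vector G(q)
    g1 = (m1 + m2) * l1 * g * np.cos(q[0]) + m2 * l2 * g * np.cos(q[0] + q[1])
    g2 = m2 * l2 * g * np.cos(q[0] + q[1])
    return np.array([g1, g2])

def forward_dynamics(q, qd, tau):
    # q_dd = M(q)^{-1} (tau - C(q, q_d) q_d - G(q))
    return np.linalg.solve(inertia(q), tau - coriolis(q, qd) @ qd - gravity(q))
```

The symmetry and positive definiteness of M(q) are the structural properties the later control design relies on.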

ADAPTIVE DYNAMIC PROGRAMMING APPROACH FOR A ROBOT MANIPULATOR

3.1. ADP algorithm
In [3], applying the control input (4) to the manipulator (1) with the nonlinear function (5) yields the exactly linearized error dynamics. The control objective is now to design a control law u that guarantees not only the stabilization (9) but also the minimization of the following infinite-horizon quadratic cost function:

J(x) = ∫_0^∞ ( Q(x) + u^T R u ) dτ, (10)

where Q(x) is a positive definite function of x and R is a symmetric positive definite matrix. This work presents an approximate approach, called adaptive dynamic programming (ADP), to the optimal control design. Following [21, 22], consider an affine nonlinear system with the cost function defined as in (10). The following definition, given in [17, 18], shows that the optimal control solution is sought within the set of admissible controls.
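As a minimal numerical sketch, the infinite-horizon cost (10) can be approximated along a sampled trajectory by a Riemann sum. Here Q(x) is taken as the common quadratic special case x^T Q x; the matrices, trajectory, and step size in the usage below are illustrative assumptions.

```python
import numpy as np

# Finite-horizon Riemann-sum approximation of J = integral of (x^T Q x + u^T R u) dt,
# the quadratic special case of the cost (10). Inputs are sampled states xs,
# sampled controls us, weight matrices Q, R, and the sampling step dt.
def cost(xs, us, Q, R, dt):
    return sum((x @ Q @ x + u @ R @ u) * dt for x, u in zip(xs, us))
```

For example, a single sample x = [1, 0], u = [2] with Q = I, R = I and dt = 0.5 contributes (1 + 4) * 0.5 = 2.5.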
The Hamiltonian and the optimal cost function are defined in the standard way, and the optimal policy is obtained by policy iteration:

Step 1: Policy evaluation. Solve the nonlinear Lyapunov equation (17) for the cost V_i associated with the current policy u_i.

Step 2: Policy improvement. Update the policy according to u_{i+1} = -(1/2) R^{-1} g^T ∇V_i, where n_max is the maximum number of iterations and ε_v is an arbitrarily small positive stopping tolerance.

It is proved in [21] that each control policy u_i produced by this algorithm is an admissible control.
The cost function V_i decreases at each step and converges to the optimal cost, while u_i converges to the optimal policy. However, the nonlinear Lyapunov equation (17) is hard to solve directly. Therefore, in recent years many researchers have pursued indirect ways to solve this equation [20-25]. In the next steps, two neural networks, called the actor and the critic (AC), are trained simultaneously to solve the HJB equation approximately.
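For linear dynamics (such as the exactly linearized error system), policy iteration reduces to Kleinman's algorithm: policy evaluation solves a Lyapunov equation and policy improvement updates the feedback gain. A minimal sketch, assuming a generic stabilizable pair (A, B) and an initial stabilizing gain K0:

```python
import numpy as np

def solve_lyap(Ac, Qc):
    # Solve Ac^T P + P Ac = -Qc by Kronecker-product vectorization.
    n = Ac.shape[0]
    M = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
    return np.linalg.solve(M, -Qc.flatten()).reshape(n, n)

def policy_iteration(A, B, Q, R, K0, iters=50):
    # Kleinman's algorithm: u = -K x; K0 must stabilize A - B K0.
    K = K0
    for _ in range(iters):
        Ac = A - B @ K                        # closed-loop matrix
        P = solve_lyap(Ac, Q + K.T @ R @ K)   # policy evaluation
        K = np.linalg.solve(R, B.T @ P)       # policy improvement
    return P, K
```

For the double integrator with Q = I and R = 1, the iteration converges to the known Riccati solution P = [[√3, 1], [1, √3]] and gain K = [1, √3].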
The cost function can be represented by a neural network (NN) as V(x) = W^T φ(x) + ε(x), where φ(x) is the vector of NN activation functions, usually chosen as polynomials, Gaussians, sigmoid functions, and so on, and ∇φ denotes ∂φ/∂x. The approximate optimal cost function and optimal policy are then V̂(x) = Ŵ_c^T φ(x) and û(x) = -(1/2) R^{-1} g^T ∇φ^T Ŵ_a. Note that to approximate the HJB solution only the term Ŵ_c is needed. However, to stabilize the closed-loop system both Ŵ_a and Ŵ_c are employed, and this flexibility helps handle the stability of the system during the learning process. By replacing the optimal policy and the optimal cost function in the HJB equation (17) with the actor-critic networks, the HJB (Bellman) error is obtained.
The tuning law for Ŵ_c is given in (24). To guarantee the convergence of Ŵ_c under the update law (24), the regressor σ(x) must satisfy the persistence of excitation (PE) condition [21]:
β_1 I ≤ ∫_t^{t+T} σ σ^T dτ ≤ β_2 I for some positive numbers β_1, β_2, and T. On the other hand, (22) is a nonlinear equation in Ŵ_a. Therefore, the tuning law for Ŵ_a is formulated as a gradient descent (GD) algorithm minimizing the squared error, together with a projection operator [22] that ensures the boundedness of the update law.
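A toy scalar example (an illustration of the flavor of simultaneous tuning, not the paper's manipulator or the exact laws (24) and the actor law of [22]) is sketched below: the critic is updated by normalized gradient descent on the Bellman error while the actor is pulled toward the critic. For the assumed system ẋ = -x + u with cost x² + u² and critic basis φ(x) = x², the true optimal critic weight is √2 − 1 ≈ 0.414.

```python
import numpy as np

# Toy scalar actor-critic tuning: system x' = -x + u, Q(x) = x^2, R = 1,
# critic basis phi(x) = x^2, so V*(x) = p x^2 with p = sqrt(2) - 1.
# Learning rates and state sampling are illustrative assumptions.
rng = np.random.default_rng(0)
Wc, Wa = 0.0, 0.0                 # critic / actor weights
lr_c, lr_a = 0.5, 0.1

for epoch in range(500):
    for x in rng.uniform(-2.0, 2.0, size=40):     # probing the state space
        u = -Wa * x                               # actor: -(1/2) R^-1 g dphi Wa
        sigma = 2.0 * x * (-x + u)                # dphi/dx * (f + g u)
        delta = Wc * sigma + x**2 + u**2          # Bellman (HJB) error
        Wc -= lr_c * sigma * delta / (1.0 + sigma**2) ** 2   # normalized GD
        Wa += lr_a * (Wc - Wa)    # crude coupling pulling actor toward critic
```

Sampling states across an interval plays the role of the probing needed for the PE condition; without it the regressor σ would not be exciting enough for Ŵ_c to converge.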
Note that the learning-rate parameters of the two NN update laws, a_c, a_{a1}, and a_{a2}, must be selected to satisfy certain conditions [22] to ensure the stability of the closed-loop system. The complete proof of parameter convergence and system stability can be found in [22].

3.2. RISE feedback control design
In [3], the control term μ(t) is designed based on the RISE framework.

Remark 1: In contrast to the work in [3], the ADP algorithm here is used to find the intermediate optimal control input in the absence of dynamic uncertainty. Furthermore, the ADP techniques considered in [20-26] have not previously been applied to a robotic manipulator.
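The RISE expression itself is not reproduced in this extract; a discrete-time sketch of the standard RISE term from the literature (e.g. [3]) might look as follows, with all gains chosen illustratively:

```python
import numpy as np

# Discrete-time sketch of the standard RISE feedback term:
#   mu(t) = (ks + 1) e2(t) - (ks + 1) e2(0)
#           + integral_0^t [ (ks + 1) a2 e2(s) + beta * sign(e2(s)) ] ds
# where e2 is the filtered tracking error. Gains ks, a2, beta are assumptions.
class RISE:
    def __init__(self, ks=5.0, a2=2.0, beta=1.0, dt=1e-3):
        self.ks, self.a2, self.beta, self.dt = ks, a2, beta, dt
        self.integral = None
        self.e2_0 = None

    def __call__(self, e2):
        e2 = np.atleast_1d(np.asarray(e2, dtype=float))
        if self.e2_0 is None:                 # latch e2(0) on the first call
            self.e2_0 = e2.copy()
            self.integral = np.zeros_like(e2)
        self.integral += ((self.ks + 1) * self.a2 * e2
                          + self.beta * np.sign(e2)) * self.dt
        return (self.ks + 1) * (e2 - self.e2_0) + self.integral
```

The sign term inside the integral is what gives RISE its robustness to bounded disturbances while keeping the applied control continuous.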
Remark 2: Compared with the work of Dixon [3], which designs the optimal control by solving a Riccati equation, this work requires partial knowledge of the manipulator dynamics, namely the matrices M and C. Moreover, with the ADP approach the optimal control problem is addressed in the general case for any given cost function of the form (10), without constraints.

OFFLINE SIMULATION RESULTS
Consider an offline simulation of a two-link manipulator control system using the ADP technique and the RISE algorithm.
The general dynamics of the two-link manipulator are represented by (1). The update laws (23) and (26) are implemented with the NN activation function selected as a polynomial basis, approximating the HJB solution as shown in [3]. Figures 1 and 2 show the convergence of the weights; to satisfy the PE condition (25), a probing signal is added to the system input. Moreover, the evolution of the system error is shown in Figure 3, confirming the stability of the control system, and the evolution of the states is shown in Figure 4.
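The probing signal used to satisfy the PE condition is typically a small sum of sinusoids; a hedged sketch (frequencies and amplitude are assumptions, the paper's actual signal is not reproduced here):

```python
import numpy as np

# Sum-of-sinusoids probing signal commonly used to satisfy the PE condition
# during the learning phase; frequencies and amplitude are illustrative.
def probing_signal(t, amp=0.1, freqs=(1.0, 3.0, 7.0, 11.0)):
    return amp * sum(np.sin(w * t) for w in freqs)
```

The signal is bounded by amp times the number of frequencies, so it perturbs the input enough for excitation without dominating the control action; it is switched off once the weights have converged.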

CONCLUSION
This paper addressed the optimal control design problem for a manipulator in combination with RISE and exact linearization. With the ADP technique, the solution of the HJB equation was found by an iterative algorithm, yielding a controller that achieves not only weight convergence but also position tracking. Offline simulations were implemented to validate the performance and effectiveness of the optimal control for manipulators.