Optimal tracking control for a quadrotor with unknown dynamics using an Off-policy Reinforcement Learning (RL) algorithm
An Off-policy Reinforcement Learning Algorithm for Optimal Tracking Control Problem
Full report: link
In this project, the optimal tracking control problem (OTCP) for a quadrotor, a highly coupled system with completely unknown dynamics, is addressed in a data-driven way using reinforcement learning (RL). The proposed Off-policy RL algorithm does not need any knowledge of the quadrotor model. The OTCP is solved by collecting data (the states and control inputs of the quadrotor system) and using actor-critic neural networks (NNs) to approximate the optimal controller. Finally, simulation results are provided to illustrate the effectiveness of the proposed method.
- Non-linear affine system:

$$\dot{X}(t) = F(X(t)) + G(X(t))u(t)$$

- The cost function:

$$V(X(t))=\int_{t}^{\infty}e^{-\lambda(\tau-t)}\left[X(\tau)^T Q X(\tau)+u(\tau)^T R u(\tau)\right]d\tau$$
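As a quick illustration, the discounted cost can be approximated numerically from a sampled trajectory. The snippet below is a minimal sketch, assuming state and input samples stored as NumPy arrays on a uniform time grid (all names here are illustrative, not from the report):

```python
import numpy as np

def discounted_cost(X, U, Q, R, lam, dt):
    """Approximate V(X(t)) = int exp(-lam*(tau-t)) [X'QX + u'Ru] d(tau)
    with a Riemann sum over sampled states X[k] and inputs U[k]."""
    K = X.shape[0]
    t = np.arange(K) * dt
    # Stage cost X'QX + u'Ru at every sample time.
    stage = np.einsum('ki,ij,kj->k', X, Q, X) + np.einsum('ki,ij,kj->k', U, R, U)
    return np.sum(np.exp(-lam * t) * stage) * dt
```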
The RL algorithm comprises three steps (a sketch of the iteration loop is given after this list):

- Step 1: Initialization. Start with a stable control signal $u_0$ and add a noise component $u_e$ to ensure the persistent excitation (PE) condition. Collect data and choose a convergence threshold $\epsilon$.
- Step 2: Policy Evaluation and Policy Improvement.
- Step 3: Convergence check. Stop iterating if $\| u^{i+1}-u^i \| < \epsilon$; otherwise, update $u^i = u^{i+1}$ and return to Step 2.
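The following is a minimal Python sketch of this iteration, assuming the policy evaluation and policy improvement steps are supplied as callables acting on the logged data (their contents depend on the off-policy Bellman equation given in the full report); it is an illustrative outline, not the report's implementation:

```python
import numpy as np

def off_policy_iteration(data, u0, evaluate, improve, eps, max_iters=50):
    """Iterate policy evaluation / improvement on logged (state, input) data
    until the controller parameters stop changing (Steps 2 and 3)."""
    u = u0
    for _ in range(max_iters):
        # Step 2a: evaluate the current policy from data (critic update).
        V = evaluate(data, u)
        # Step 2b: improve the policy using the evaluated value (actor update).
        u_next = improve(data, V)
        # Step 3: stop when successive policies are close enough.
        if np.linalg.norm(u_next - u) < eps:
            return u_next
        u = u_next
    return u
```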
A quadrotor can be described by the following dynamic equations:
Where:
- The position of the center of mass is $p = [p_x, p_y, p_z]^T \in \mathbb{R}^3$.
- The Euler angles are $\Theta = [\phi, \theta, \psi]^T$.
- $e_{i,j}$ is the $i$-dimensional vector whose entries are all zero except for a 1 in the $j^{th}$ position.
- $J = \mathrm{diag}(J_x, J_y, J_z)$ is the inertia matrix.
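The full dynamic equations are given in the report; as a reference point, a standard quadrotor rigid-body model (an assumption here, the report's exact form may differ) is:

$$\ddot{p} = -g\, e_{3,3} + \frac{T}{m} R(\Theta)\, e_{3,3}, \qquad J\dot{\omega} = -\omega \times J\omega + \tau$$

where $T$ is the total thrust, $m$ the mass, $R(\Theta)$ the body-to-inertial rotation matrix, $\omega$ the body angular rate, and $\tau$ the control torques.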
A typical control scheme for a quadrotor consists of a Position controller, which generates the desired attitude references for the inner control loop, and an Attitude controller, which tracks the desired attitude angles obtained from the outer loop.
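A minimal sketch of this cascade, with the two controllers and their interfaces as illustrative assumptions (not the report's code):

```python
import numpy as np

def cascade_step(p, v, Theta, omega, p_des, position_ctrl, attitude_ctrl):
    """One step of the outer/inner cascade: the position controller maps the
    position error to thrust and desired attitude; the attitude controller
    maps the attitude error to body torques."""
    # Outer loop: position controller -> total thrust and desired Euler angles.
    thrust, Theta_des = position_ctrl(p_des - p, v)
    # Inner loop: attitude controller -> torques tracking Theta_des.
    tau = attitude_ctrl(Theta_des - Theta, omega)
    return thrust, tau
```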
Rewrite the position equation in the form of an affine system:
where:
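For reference, one common choice (an assumption here; the report's augmented state may differ) is to stack the position tracking error and its derivative:

$$X_p = \begin{bmatrix} p - p_d \\ \dot{p} - \dot{p}_d \end{bmatrix}, \qquad \dot{X}_p = F_p(X_p) + G_p(X_p)\, u_p$$

where $p_d$ is the desired position and $u_p$ is the virtual control produced by the position loop. The attitude subsystem can be put in the same affine form with a state built from the Euler-angle tracking error.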
Rewrite the attitude equation in the form of an affine system:
where:
The Actor-Critic Neural Network structure is introduced to estimate the value function (critic) and the optimal control policy (actor).
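A minimal sketch of such a structure, assuming linear-in-the-weights approximators with a user-supplied basis $\phi(X)$ (a common choice in this literature; the report's exact basis and update laws may differ):

```python
import numpy as np

class ActorCriticApproximator:
    """Linear-in-the-weights critic V(X) ~ Wc' phi(X) and
    actor u(X) ~ Wa' phi(X), with a user-supplied basis phi."""

    def __init__(self, phi, n_basis, n_inputs):
        self.phi = phi                            # basis function: X -> R^n_basis
        self.Wc = np.zeros(n_basis)               # critic weights
        self.Wa = np.zeros((n_basis, n_inputs))   # actor weights

    def value(self, X):
        return self.Wc @ self.phi(X)

    def control(self, X):
        return self.Wa.T @ self.phi(X)
```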
Consider a quadrotor whose desired trajectory is a spiral.
In the first stage, we use two PID controllers, one for the outer loop and one for the inner loop, to collect data for the next stage of training, which produces the optimal controllers. Note that noise is added to the system to guarantee the PE condition. The position and attitude tracking errors in this stage are illustrated in the figures below.
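A sketch of this data-collection stage, assuming hypothetical `pid_outer` and `pid_inner` controllers and a `simulate_step` function standing in for the quadrotor simulator (all names are illustrative):

```python
import numpy as np

def collect_data(x0, pid_outer, pid_inner, simulate_step,
                 steps=5000, dt=0.002, noise_std=0.05, seed=0):
    """Run the PID cascade, inject probing noise u_e for the PE condition,
    and log (state, input) pairs for the off-policy RL stage."""
    rng = np.random.default_rng(seed)
    x, log = x0, []
    for _ in range(steps):
        u = pid_inner(x, pid_outer(x))                     # stabilizing input u_0
        u = u + noise_std * rng.standard_normal(u.shape)   # exploration noise u_e
        log.append((x.copy(), u.copy()))
        x = simulate_step(x, u, dt)                        # advance the simulation
    return log
```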
In the second stage, we use the collected data as the input to the RL algorithm proposed in the previous section. The convergence of the weights is shown in the figures below.
After we obtain the weights, the estimated optimal controllers are applied to the quadrotor. The tracking performance is illustrated in the figures below.
The discount rate $\lambda$ has a major impact on the tracking error, as illustrated below:
In this project, a novel control strategy based on an Off-policy RL algorithm was proposed. By collecting data to train two actor-critic networks (NNs) that estimate the optimal controllers, a position controller and an attitude controller, this structure has the advantage of requiring no prior information about the highly coupled system. Finally, simulation results are provided to illustrate the tracking performance of the system on a sophisticated trajectory.