A combination of formation control and optimal control based on reinforcement learning for multiple SVs
Full report: link
This article presents a comprehensive approach to integrating formation tracking control and optimal control for a fleet of multiple surface vehicles (SVs), accounting for both the kinematic and dynamic models of each SV agent. The proposed control framework comprises two core components: a high-level displacement-based formation controller and a low-level reinforcement learning (RL)-based optimal control strategy for the individual SV agents. The high-level formation control law, employing a modified gradient method, guides the SVs toward the desired formation. Meanwhile, the low-level control structure, featuring time-varying references, incorporates the RL algorithm by transforming the time-varying closed-loop agent system into an equivalent autonomous system. The application of Lyapunov's direct method, along with the existence of the Bellman function, guarantees the stability and optimality of the proposed design. Through extensive numerical simulations, encompassing various comparisons and scenarios, this study demonstrates the efficacy of the novel formation control strategy for multiple-SV systems, showcasing its potential for real-world applications.
The high-level formation control law, employing a modified gradient method, translates the desired formation and reference trajectory into feasible velocity references for each agent.
Integrating these velocity references then yields the desired trajectory tracked by each agent's low-level controller.
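The two steps above can be sketched as follows. This is a minimal illustration of a displacement-based formation law driven by the gradient of the formation error, followed by forward-Euler integration to produce per-agent reference trajectories; the gain `k_f`, the graph encoding, and the desired displacements `d[(i, j)]` are assumptions for illustration, not the report's exact modified gradient method.

```python
import numpy as np

def formation_velocity(p, neighbors, d, k_f=1.0):
    """Reference velocity for each agent from the formation-error gradient.

    p         : (N, 2) array of current agent positions
    neighbors : dict mapping agent i -> list of neighbor indices
    d         : dict mapping (i, j) -> desired displacement p_i - p_j
    """
    v = np.zeros_like(p)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            # negative gradient of 0.5 * ||(p_i - p_j) - d_ij||^2 w.r.t. p_i
            v[i] -= k_f * ((p[i] - p[j]) - d[(i, j)])
    return v

def integrate(p0, neighbors, d, dt=0.01, steps=1000):
    """Forward-Euler integration of the reference velocities,
    yielding the reference trajectory handed to the low-level controller."""
    traj = [p0.copy()]
    p = p0.copy()
    for _ in range(steps):
        p = p + dt * formation_velocity(p, neighbors, d)
        traj.append(p.copy())
    return np.array(traj)
```

For two agents with a desired relative displacement of `[2, 0]`, the integrated references converge to positions separated by exactly that displacement.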
We approximate the Bellman function and the optimal controller using a critic NN and an actor NN, respectively:
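A minimal sketch of this critic/actor structure, in the standard adaptive-dynamic-programming form V(x) ≈ W_cᵀφ(x) and u(x) = -½R⁻¹g(x)ᵀ∇φ(x)ᵀW_a. The quadratic basis `phi`, the 2-state/1-input dimensions, and the TD-style critic update are illustrative assumptions, not the report's exact network architecture or tuning laws.

```python
import numpy as np

def phi(x):
    # quadratic basis for a 2-state system: [x1^2, x1*x2, x2^2]
    return np.array([x[0]**2, x[0] * x[1], x[1]**2])

def dphi(x):
    # Jacobian of phi w.r.t. x, shape (3, 2)
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 2 * x[1]]])

def actor(x, W_a, g, R_inv):
    # actor NN in the optimal-control form: u = -0.5 R^{-1} g^T (dphi^T W_a)
    grad_V = dphi(x).T @ W_a          # approximate value-function gradient
    return -0.5 * R_inv @ g.T @ grad_V

def critic_update(W_c, x, u, x_next, cost, lr=0.05):
    # one gradient step on the Bellman residual: V(x) ~ cost + V(x_next)
    delta = W_c @ phi(x) - (cost + W_c @ phi(x_next))
    return W_c - lr * delta * phi(x)
```

In practice the critic weights are tuned along the closed-loop trajectory until the Bellman residual vanishes, and the actor weights track the critic.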
The communication graph:
Tracking trajectories of four agents following a straight line:
The communication graph:
Tracking trajectories of four agents following a circular path:
The communication graph:
Tracking trajectories of eight agents following a circular path:
The metric is formulated as follows:
The cumulative cost with RL is consistently smaller than that without RL:
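A sketch of how such a comparison metric can be computed from logged simulation data, assuming the common accumulated quadratic cost J = Σₖ (eₖᵀQeₖ + uₖᵀRuₖ)·Δt over tracking errors e and control inputs u; the weights `Q`, `R` and this exact form are assumptions for illustration, and the report defines the precise metric.

```python
import numpy as np

def cumulative_cost(errors, inputs, Q, R, dt):
    """Running cumulative quadratic cost along a logged trajectory.

    errors : (T, n) tracking errors at each step
    inputs : (T, m) control inputs at each step
    Returns a length-T array; the last entry is the total cost J.
    """
    # stage cost e_k^T Q e_k + u_k^T R u_k for every time step k
    stage = np.einsum('ti,ij,tj->t', errors, Q, errors) \
          + np.einsum('ti,ij,tj->t', inputs, R, inputs)
    return np.cumsum(stage) * dt
```

Evaluating this curve for the RL-based controller and the non-RL baseline on the same scenario gives the comparison plotted above.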
Project development direction:
- The authors plan to conduct experimental validation and to extend the low-level tracking controller with model-free RL algorithms that do not require complete knowledge of the system dynamics.
- Directly applying RL algorithms to multi-agent control problems for nonlinear systems subject to uncertainty and disturbances is considered a feasible direction for further research.