A combination of formation control and optimal control based on reinforcement learning for multiple SVs
Full report: link
This article presents a comprehensive approach to integrating formation tracking control and optimal control for a fleet of multiple surface vehicles (SVs), accounting for both the kinematic and dynamic models of each SV agent. The proposed control framework comprises two core components: a high-level displacement-based formation controller and a low-level reinforcement learning (RL)-based optimal control strategy for the individual SV agents. The high-level formation control law, employing a modified gradient method, guides the SVs toward the desired formation. Meanwhile, the low-level control structure, featuring time-varying references, incorporates the RL algorithm by transforming the time-varying closed-loop agent system into an equivalent autonomous system. The application of Lyapunov's direct method, along with the existence of the Bellman function, guarantees the stability and optimality of the proposed design. Through extensive numerical simulations, encompassing various comparisons and scenarios, this study demonstrates the efficacy of the novel formation control strategy for multiple-SV systems, showcasing its potential for real-world applications.
The high-level formation control law, employing a modified gradient method, translates the desired formation and reference trajectory into feasible velocity references for each agent.
Integrating these velocity references then yields the desired trajectory tracked by each agent's low-level controller.
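The two steps above can be sketched as follows. This is a minimal illustration of a displacement-based formation law driven by the gradient of the formation error, followed by forward-Euler integration to produce per-agent reference trajectories; the gain `k_f`, the graph encoding, and the desired displacements `d[(i, j)]` are assumptions for illustration, not the report's exact modified gradient method.

```python
import numpy as np

def formation_velocity(p, neighbors, d, k_f=1.0):
    """Reference velocity for each agent from the formation-error gradient.

    p         : (N, 2) array of current agent positions
    neighbors : dict mapping agent i -> list of neighbor indices
    d         : dict mapping (i, j) -> desired displacement p_i - p_j
    """
    v = np.zeros_like(p)
    for i, nbrs in neighbors.items():
        for j in nbrs:
            # negative gradient of 0.5 * ||(p_i - p_j) - d_ij||^2 w.r.t. p_i
            v[i] -= k_f * ((p[i] - p[j]) - d[(i, j)])
    return v

def integrate(p0, neighbors, d, dt=0.01, steps=1000):
    """Forward-Euler integration of the reference velocities,
    yielding the reference trajectory handed to the low-level controller."""
    traj = [p0.copy()]
    p = p0.copy()
    for _ in range(steps):
        p = p + dt * formation_velocity(p, neighbors, d)
        traj.append(p.copy())
    return np.array(traj)
```

For two agents with a desired relative displacement of `[2, 0]`, the integrated references converge to positions separated by exactly that displacement.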
We approximate the Bellman function and the optimal controller using a critic NN and an actor NN, respectively:
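A minimal sketch of this critic/actor structure, in the standard adaptive-dynamic-programming form V(x) ≈ W_cᵀφ(x) and u(x) = -½R⁻¹g(x)ᵀ∇φ(x)ᵀW_a. The quadratic basis `phi`, the 2-state/1-input dimensions, and the TD-style critic update are illustrative assumptions, not the report's exact network architecture or tuning laws.

```python
import numpy as np

def phi(x):
    # quadratic basis for a 2-state system: [x1^2, x1*x2, x2^2]
    return np.array([x[0]**2, x[0] * x[1], x[1]**2])

def dphi(x):
    # Jacobian of phi w.r.t. x, shape (3, 2)
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 2 * x[1]]])

def actor(x, W_a, g, R_inv):
    # actor NN in the optimal-control form: u = -0.5 R^{-1} g^T (dphi^T W_a)
    grad_V = dphi(x).T @ W_a          # approximate value-function gradient
    return -0.5 * R_inv @ g.T @ grad_V

def critic_update(W_c, x, u, x_next, cost, lr=0.05):
    # one gradient step on the Bellman residual: V(x) ~ cost + V(x_next)
    delta = W_c @ phi(x) - (cost + W_c @ phi(x_next))
    return W_c - lr * delta * phi(x)
```

In practice the critic weights are tuned along the closed-loop trajectory until the Bellman residual vanishes, and the actor weights track the critic.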
The communication graph:
Tracking trajectories of four agents following a straight line:
The communication graph:
Tracking trajectories of four agents following a circular path:
The communication graph:
Tracking trajectories of eight agents following a circular path:
The metric is formulated as follows:
The cumulative cost with RL is consistently smaller than that without RL:
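A sketch of how such a comparison metric can be computed from logged simulation data, assuming the common accumulated quadratic cost J = Σₖ (eₖᵀQeₖ + uₖᵀRuₖ)·Δt over tracking errors e and control inputs u; the weights `Q`, `R` and this exact form are assumptions for illustration, and the report defines the precise metric.

```python
import numpy as np

def cumulative_cost(errors, inputs, Q, R, dt):
    """Running cumulative quadratic cost along a logged trajectory.

    errors : (T, n) tracking errors at each step
    inputs : (T, m) control inputs at each step
    Returns a length-T array; the last entry is the total cost J.
    """
    # stage cost e_k^T Q e_k + u_k^T R u_k for every time step k
    stage = np.einsum('ti,ij,tj->t', errors, Q, errors) \
          + np.einsum('ti,ij,tj->t', inputs, R, inputs)
    return np.cumsum(stage) * dt
```

Evaluating this curve for the RL-based controller and the non-RL baseline on the same scenario gives the comparison plotted above.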
Project development direction:
- The authors plan to conduct experimental validation and to extend the low-level tracking controller with model-free RL algorithms that do not require complete knowledge of the system dynamics.
- Directly applying RL algorithms to multi-agent control problems for nonlinear systems subject to uncertainty and disturbances is considered a feasible direction for further research.