Bibliographic details
Main authors: Wijayatunga, Minduli; Wallace, Nathan; Sukkarieh, Salah; Armellin, Roberto
Format: Preprint
Published: 2026
Online access: https://arxiv.org/abs/2602.00366
Contents:
  • Spacecraft rendezvous and proximity operations (RPO) pose safety risks to high-value assets, so formal safety guarantees are critical. Yet conservative safety controllers can reduce mission efficiency. We propose a unified two-stage reinforcement learning (RL) framework that addresses two complementary limitations of Input-Constrained Control Barrier Functions (ICCBFs) for safety-critical, fuel-limited spacecraft control. Given a certified safe set S, ICCBFs guarantee forward invariance of an inner set C* under input bounds, but the resulting per-step quadratic programme (QP) is greedy and fuel-inefficient within C*, and recoverable states outside C* are conservatively discarded. Stage 1 learns state-dependent class-K-infinity parameters that adapt the decay rates of the ICCBF and control Lyapunov function (CLF) constraints, embedding long-horizon cost awareness while preserving invariance in C*. Stage 2 learns a residual barrier h_RL(x) that certifies recoverability for a subset of S minus C*. At run time, the controller selects the appropriate barrier formulation (Stage 1 or Stage 2) and solves a lightweight zero-order-hold (ZOH) QP. Both stages are trained with proximal policy optimisation (PPO) using rewards that penalise constraint violations, control effort, and task metrics. We evaluate three benchmarks: cruise control, spacecraft rendezvous with a rotating target, and inspection that maximises observability subject to keep-in and keep-out zone constraints. Across test cases, the method reduces median fuel relative to ICCBF baselines by 12 to 25 percent and increases the fraction of trajectories that remain in S by 7 to 8 percent, while retaining real-time QP complexity.
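To make the role of the per-step QP and the class-K-infinity decay rate concrete, here is a minimal sketch of a barrier-function safety filter. It is not the paper's ICCBF construction: it assumes a plain control barrier function on a toy single integrator (x' = u) with barrier h(x) = x_max - x and a scalar input bound, so the QP has a closed-form solution. The names `cbf_qp_filter`, `simulate`, and `alpha_fn`, and all numeric values, are hypothetical. The function `alpha_fn` stands in for a state-dependent decay-rate parameter of the kind Stage 1 learns; here it is hand-written rather than learned.

```python
def cbf_qp_filter(x, u_nom, alpha, x_max=1.0, u_max=0.5):
    """Closed-form solution of the scalar QP
        min (u - u_nom)^2   s.t.   -u >= -alpha * h(x),   |u| <= u_max,
    for the toy system x' = u with barrier h(x) = x_max - x (an assumption).
    """
    h = x_max - x
    upper = min(u_max, alpha * h)   # barrier constraint combined with input bound
    lower = -u_max
    if upper < lower:
        # The barrier condition cannot be met with the available input.
        raise ValueError("QP infeasible under the input bound")
    # Project the nominal input onto the feasible interval [lower, upper].
    return max(lower, min(u_nom, upper))


def simulate(alpha_fn, x0=0.0, u_nom=0.5, dt=0.01, steps=1000):
    """Roll out the filtered controller with a zero-order hold on u."""
    x = x0
    xs = [x]
    for _ in range(steps):
        u = cbf_qp_filter(x, u_nom, alpha_fn(x))
        x += dt * u                 # input held constant over each step
        xs.append(x)
    return xs


# Two hand-picked decay rates: a small alpha brakes early, a large alpha brakes late.
xs_cautious = simulate(lambda x: 1.0)
xs_aggressive = simulate(lambda x: 5.0)
# A hypothetical state-dependent rate, standing in for a learned parameter.
xs_adaptive = simulate(lambda x: 1.0 + 4.0 * x)
```

In this sketch a smaller `alpha` makes the filter intervene further from the boundary, while a larger one lets the nominal input run longer before it is clipped. All three rollouts stay at or below `x_max` because `alpha * dt <= 1` at every step, which keeps the discrete-time barrier value non-negative. Shaping that trade-off per state is the kind of freedom the abstract attributes to Stage 1, and the infeasible branch illustrates why input bounds shrink the set of states that can be certified.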