Citation:
Abstract:
Designing scalable and interpretable control strategies for decentralized multi-agent systems remains a challenge in reinforcement learning (RL). This challenge is particularly evident in pursuit–evasion tasks, which require coordination under partial observability, without explicit communication or centralized guidance. Although deep RL methods achieve strong performance, they typically operate as black boxes, limiting trust and deployment in safety-critical domains. We propose a Multi-Head DDPG architecture that decomposes control into three interpretable force components (pursuit, cohesion, and separation), which are weighted adaptively to generate context-aware actions. This design enables emergent role differentiation and interpretable self-organization among agents. In grid-based pursuit–evasion benchmarks, our method outperforms DQN, PPO, and standard DDPG in success rate, convergence speed, and generalization, while also yielding transparent collective behaviors. Overall, the results show that weighted force-based behavioral decomposition provides a principled pathway toward multi-agent control that is both high-performing and explainable.
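
The weighted force decomposition described above can be illustrated with a minimal sketch. The abstract does not give the force definitions or the weighting mechanism, so everything here is an assumption made for illustration: the boids-style formulas for cohesion and separation, the softmax over weight logits, and the hypothetical helper names `force_components` and `combine`. In the proposed architecture the weights would presumably come from learned actor heads; here they are passed in by hand.

```python
import numpy as np

def unit(v, eps=1e-8):
    """Return v scaled to unit length, or a zero vector if v is (near) zero."""
    n = np.linalg.norm(v)
    return v / n if n > eps else np.zeros_like(v)

def force_components(pos, neighbors, target):
    """Assumed boids-style forces for one pursuer.

    pos:       (2,) position of this agent
    neighbors: (k, 2) positions of visible teammates (k may be 0)
    target:    (2,) position of the evader
    Returns a (3, 2) array with rows: pursuit, cohesion, separation.
    """
    pursuit = unit(target - pos)                    # head toward the evader
    if len(neighbors) == 0:
        zero = np.zeros_like(pos)
        return np.stack([pursuit, zero, zero])
    cohesion = unit(neighbors.mean(axis=0) - pos)   # head toward teammates' centroid
    diffs = pos - neighbors                         # vectors pointing away from teammates
    dists = np.linalg.norm(diffs, axis=1, keepdims=True)
    # Closer teammates repel more strongly (inverse-distance weighting).
    separation = unit((diffs / np.maximum(dists, 1e-8) ** 2).sum(axis=0))
    return np.stack([pursuit, cohesion, separation])

def softmax(logits):
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

def combine(forces, logits):
    """Blend the three forces using softmax-normalized weights (an assumption)."""
    weights = softmax(np.asarray(logits, dtype=float))
    return weights @ forces, weights

# Example: one teammate to the right, evader straight ahead.
pos = np.array([0.0, 0.0])
neighbors = np.array([[1.0, 0.0]])
target = np.array([0.0, 3.0])
forces = force_components(pos, neighbors, target)
action, weights = combine(forces, [0.0, 0.0, 0.0])  # equal weights
```

Because the weights are explicit and sum to one, they can be read directly as how much each behavior contributes to an action, which is the kind of interpretability the abstract attributes to the decomposition.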