A Novel Cooperative Deep Reinforcement Learning Method for Multi-Agent Collaborative Tasks
Date
2024
Abstract
Multi-Agent Reinforcement Learning (MARL) faces formidable challenges when tackling cooperative tasks due to the expansive state space. Traditional approaches, such as Independent Proximal Policy Optimization (IPPO), lack awareness of other agents, while centralized methods like Multi-Agent Proximal Policy Optimization (MAPPO) employ centralized learning with decentralized policies. This study introduces a novel communication-centric approach in which agents encode their state information alongside action messages, creating dynamic channels of information exchange. By facilitating information exchange among agents, our approach bridges the gap between individual decision-making and collaborative task completion. Through empirical evaluations, we demonstrate the effectiveness of our method in improving convergence and performance across diverse cooperative MARL scenarios, thus pushing the boundaries of decentralized policy learning within a centralized framework.
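The abstract's core idea, agents encoding local state into messages that peers consume alongside their own observations, can be illustrated with a minimal sketch. This is not the thesis's actual architecture; the class name `CommAgent`, the linear encoder, and all dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class CommAgent:
    """Hypothetical communication-centric agent: a linear encoder maps
    the local state to a bounded message broadcast to all peers."""
    def __init__(self, obs_dim, msg_dim, n_actions, n_peers):
        # Randomly initialized weights stand in for learned parameters.
        self.W_enc = rng.standard_normal((msg_dim, obs_dim)) * 0.1
        # The policy conditions on the agent's own observation plus
        # every incoming message from its peers.
        self.W_pi = rng.standard_normal((n_actions, obs_dim + n_peers * msg_dim)) * 0.1

    def encode(self, obs):
        # Message = encoded local state (tanh keeps values bounded).
        return np.tanh(self.W_enc @ obs)

    def act(self, obs, messages):
        # Concatenate own observation with received messages, then
        # pick the action with the highest policy logit.
        logits = self.W_pi @ np.concatenate([obs] + messages)
        return int(np.argmax(logits))

# One step of the exchange: encode, broadcast, then act on obs + messages.
obs_dim, msg_dim, n_actions, n_agents = 4, 2, 3, 3
agents = [CommAgent(obs_dim, msg_dim, n_actions, n_agents - 1) for _ in range(n_agents)]
obs = [rng.standard_normal(obs_dim) for _ in range(n_agents)]
msgs = [a.encode(o) for a, o in zip(agents, obs)]
actions = [a.act(o, [m for j, m in enumerate(msgs) if j != i])
           for i, (a, o) in enumerate(zip(agents, obs))]
print(actions)
```

In a full training setup the encoder and policy weights would be learned jointly (e.g. with a PPO-style objective under centralized training), so the messages come to carry whatever state information is useful for coordination.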
Keywords
Multi-Agent Reinforcement Learning, Deep Reinforcement Learning