A Novel Dual Actor Proximal Policy Optimization Algorithm for Single-Agent and Multi-Agent Humanoid Robots
Date
2024
Abstract
Single-agent and multi-agent systems are integral to the dynamic environmental processes of reinforcement learning in advanced humanoid robotic applications. This thesis introduces the Dual Actor Proximal Policy Optimization (DA-PPO) algorithm and its extension, Independent Dual Actor Proximal Policy Optimization (IDA-PPO), designed for robotic navigation and cooperative tasks using the ROBOTIS-OP3 humanoid robot. The study validates the effectiveness of DA-PPO and IDA-PPO across various scenarios, demonstrating significant improvements in both single-agent and multi-agent environments. DA-PPO excels in robotic navigation and movement tasks, outperforming established reinforcement learning methods in both complex environments and basic walking tasks. This success is attributed to its innovative architecture, efficient utilization of hardware resources such as the NVIDIA GeForce RTX 3050, and an effective reward function strategy. IDA-PPO, with its decentralized training and dual actor policy network, achieves higher mean rewards and faster learning than IPPO and MAPPO. IDA-PPO is 5.49 times faster than MAPPO and 8.22 times faster than IPPO, highlighting its superior efficiency and adaptability in multi-agent tasks. These findings underscore the importance of algorithmic innovation and hardware capabilities in advancing robotic performance, positioning DA-PPO and IDA-PPO as significant advancements in robotic learning.
Keywords
DA-PPO, IDA-PPO, Single Agent, Multi Agent, reinforcement learning, cooperative tasks, humanoid robots, robotic navigation