Georgia Institute of Technology
Humanoid robots are designed to perform diverse loco-manipulation tasks. However, they face challenges due to their high-dimensional and unstable dynamics, as well as the complex contact-rich nature of the tasks. Model-based optimal control methods offer precise and systematic control but are limited by high computational complexity and the need for accurate contact sensing. On the other hand, reinforcement learning (RL) provides robustness and handles high-dimensional spaces but suffers from inefficient learning, unnatural motion, and sim-to-real gaps. To address these challenges, we introduce Opt2Skill, an end-to-end pipeline that combines model-based trajectory optimization with RL to achieve robust whole-body loco-manipulation. We generate reference motions for the Digit humanoid robot using differential dynamic programming (DDP) and train RL policies to track these trajectories. Our results demonstrate that Opt2Skill outperforms pure RL methods in both training efficiency and task performance, and that reference trajectories that account for torque limits improve trajectory tracking. We successfully transfer our approach to real-world applications.
Opt2Skill aims to develop loco-manipulation controllers that enable the Digit humanoid robot to track model-based optimal trajectories. We start by generating whole-body reference motions that align with the robot's dynamics and meet specific motion targets using DDP via the Crocoddyl solver. These reference trajectories are then used during both the training and the deployment of our RL policy. The RL policy augments the reference joint trajectories with a residual term, and the augmented joint targets are fed into a low-level PD controller on the torque-controlled robot. Finally, we deploy the RL policies in both simulation and real-world scenarios.
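At deployment time, each control step can be sketched as follows. This is a minimal illustration, assuming a hypothetical `policy` callable and illustrative PD gains `kp`/`kd`; the exact observation contents, residual scaling, and gains follow the paper's implementation.

```python
import numpy as np

def compute_torque(policy, obs, q_ref, q, qd, kp, kd):
    """One control step: residual policy output + DDP reference -> PD torque.

    All names (policy, obs, kp, kd) are illustrative placeholders.
    """
    residual = policy(obs)                 # RL policy outputs a joint-space residual
    q_target = q_ref + residual            # augment the reference joint trajectory
    tau = kp * (q_target - q) - kd * qd    # low-level PD law on the torque-controlled robot
    return tau
```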
Our approach leverages differential dynamic programming (DDP) to generate whole-body motions that obey the robot's dynamics and task requirements.
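As a rough illustration of this step, the sketch below sets up and solves a shooting problem with Crocoddyl's DDP solver. The unicycle action model is a toy stand-in for the whole-body Digit action models (dynamics, contacts, and task costs built on Pinocchio), and the horizon and weights are arbitrary.

```python
import numpy as np
import crocoddyl

# Toy stand-in action model; the actual pipeline builds whole-body models of Digit.
model = crocoddyl.ActionModelUnicycle()
model.costWeights = np.array([10.0, 1.0])   # state / control cost weights (illustrative)

T = 50                                      # horizon length (illustrative)
x0 = np.array([-1.0, -1.0, 1.0])            # initial state
problem = crocoddyl.ShootingProblem(x0, [model] * T, model)
solver = crocoddyl.SolverDDP(problem)
solver.solve()

xs, us = solver.xs, solver.us               # reference state and control trajectories
```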
We study a rich set of loco-manipulation tasks, including walking, stair traversal, agile jumping, desk object reaching, and bulky-object handling. For box manipulation tasks, we design heuristic target goals for the robot's hand trajectory and account for the transitions between the contact phases with and without the box in hand by adding or removing the box's mass and inertia at the hands.
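Below is a hedged sketch of this payload handling, assuming a Pinocchio model of the robot; the hand joint indices and the point-mass approximation of the box are illustrative, not the paper's exact implementation.

```python
import numpy as np
import pinocchio as pin

def attach_box_payload(model, hand_joint_ids, box_mass):
    """Approximate the held box as a point mass split between the hand joints.

    hand_joint_ids and the point-mass split are illustrative assumptions.
    """
    originals = {jid: pin.Inertia(model.inertias[jid].mass,
                                  model.inertias[jid].lever.copy(),
                                  model.inertias[jid].inertia.copy())
                 for jid in hand_joint_ids}
    per_hand = pin.Inertia(box_mass / len(hand_joint_ids),
                           np.zeros(3), np.zeros((3, 3)))
    for jid in hand_joint_ids:
        model.inertias[jid] = model.inertias[jid] + per_hand  # holding phase: add payload
    return originals                                          # keep originals for release

def detach_box_payload(model, originals):
    for jid, inertia in originals.items():
        model.inertias[jid] = inertia                         # non-holding phase: remove payload
```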
We train RL-based motion imitation policies in the MuJoCo simulator.
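A minimal sketch of a trajectory-tracking reward over the DDP references is given below; the terms, weights, and scales are illustrative rather than the exact reward used in the paper.

```python
import numpy as np

def tracking_reward(q, qd, tau, q_ref, qd_ref, tau_ref,
                    w_q=0.5, w_qd=0.3, w_tau=0.2):
    """Exponential tracking terms over joint positions, velocities, and torques.

    Weights and scales are illustrative; the torque term corresponds to the
    torque-reference reward discussed later in the ablation.
    """
    r_q = np.exp(-5.0 * np.sum((q - q_ref) ** 2))          # joint position tracking
    r_qd = np.exp(-0.1 * np.sum((qd - qd_ref) ** 2))        # joint velocity tracking
    r_tau = np.exp(-1e-3 * np.sum((tau - tau_ref) ** 2))    # torque reference tracking
    return w_q * r_q + w_qd * r_qd + w_tau * r_tau
```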
We compare Opt2Skill with a pure RL method that learns from scratch on three tasks: Walking, Box Pickup, and Desk Object Reaching. The pure RL baseline excludes the reference trajectories from both the observation space and the reward design.
Opt2Skill achieves more accurate tracking, particularly in velocity for Walking and in end-effector positioning for Box Pickup and Desk Object Reaching. In contrast, pure RL exhibits larger velocity deviations, fails to lift the box, and fails to accurately reach the target object in the Desk Object Reaching task.
We demonstrate that both including the torque limit in the reference trajectory and incorporating the torque reference into the reward design contribute significantly to motion tracking performance. The torque limit ensures a high-quality and dynamically feasible reference trajectory, while the torque reference guides the robot to track the motion more precisely.
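One way to impose such torque limits during trajectory optimization is sketched below, assuming control bounds are handled with Crocoddyl's box-constrained FDDP solver; the actual constraint handling in the paper may differ.

```python
import crocoddyl

def solve_with_torque_limits(x0, running_models, terminal_model, tau_max):
    """Bound the controls (joint torques) and solve with a box-constrained solver.

    tau_max is an illustrative per-joint torque limit vector.
    """
    for model in running_models:
        model.u_lb = -tau_max            # lower torque bound
        model.u_ub = tau_max             # upper torque bound
    problem = crocoddyl.ShootingProblem(x0, running_models, terminal_model)
    solver = crocoddyl.SolverBoxFDDP(problem)
    solver.solve()
    return solver.xs, solver.us          # torque-feasible reference trajectories
```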
@article{liu2024opt2skill,
title={Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation},
author={Liu, Fukang and Gu, Zhaoyuan and Cai, Yilin and Zhou, Ziyi and Zhao, Shijie and Jung, Hyunyoung and Ha, Sehoon and Chen, Yue and Xu, Danfei and Zhao, Ye},
journal={arXiv preprint arXiv:2409.20514},
year={2024},
}
The authors would like to thank all the developers of Crocoddyl and Pinocchio for their open-source frameworks. We are especially grateful to Zhaoming Xie and Carlos Mastalli for their professional discussions and insightful feedback.