Humanoid robots are designed to perform diverse loco-manipulation tasks. However, they face challenges due to their high-dimensional and unstable dynamics, as well as the complex contact-rich nature of the tasks. Model-based optimal control methods offer flexibility to define precise motion but are limited by high computational complexity and the need for accurate contact sensing. On the other hand, reinforcement learning (RL) handles high-dimensional spaces with strong robustness but suffers from inefficient learning, unnatural motion, and sim-to-real gaps. To address these challenges, we introduce Opt2Skill, an end-to-end pipeline that combines model-based trajectory optimization with RL to achieve robust whole-body loco-manipulation. Opt2Skill generates dynamically feasible and contact-consistent reference motions for the Digit humanoid robot using differential dynamic programming (DDP) and trains RL policies to track these optimal trajectories. Our results demonstrate that Opt2Skill outperforms baselines that rely on human demonstrations and inverse kinematics-based references, both in motion tracking and task success rates. Furthermore, we show that incorporating trajectories with torque information improves contact force tracking in contact-involved tasks, such as wiping a table.

Diverse walking modes

Opt2Skill enables the Digit humanoid robot to perform diverse walking modes, including forward, backward, sideways, and turning gaits.

In addition, it can handle rough terrain, such as stairs, ramps, and outdoor surfaces like grass.

Diverse loco-manipulation tasks

Opt2Skill also enables the Digit humanoid robot to perform multi-contact whole-body loco-manipulation tasks, such as pushing an object on a desk beyond its support polygon. To maintain balance, the robot stabilizes itself by leaning on the table with one elbow while reaching out with the opposite arm to push the box. The legs coordinate with upper-body motion to support this forward-reaching behavior, demonstrating the need for whole-body coordination. This example showcases Opt2Skill’s capability to handle high-dimensional loco-manipulation tasks.

In shelf manipulation scenarios, the Digit robot squats to pick up a plastic box from a lower shelf and places it on a higher one. Its hands can reach heights ranging from 0.4 m to 1.8 m. Opt2Skill demonstrates accurate tracking of hand positions throughout these complex motion sequences, achieving an average tracking error of less than 4 cm without online trajectory adaptation.
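As an illustration of how an average hand tracking error like the one above could be computed, the following minimal sketch averages the Euclidean distance between reference and measured hand positions. The function name `mean_tracking_error` and the sample positions are hypothetical and are not data from the experiments; the exact metric definition used in the evaluation may differ.

```python
import math


def mean_tracking_error(reference, measured):
    """Mean Euclidean distance (meters) between paired 3D hand positions."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("trajectories must be non-empty and equally long")
    total = sum(math.dist(r, m) for r, m in zip(reference, measured))
    return total / len(reference)


# Hypothetical samples in meters (assumption, not experimental data).
reference = [(0.50, 0.20, 0.40), (0.50, 0.20, 1.00), (0.50, 0.20, 1.80)]
measured = [(0.53, 0.20, 0.40), (0.50, 0.22, 1.00), (0.50, 0.20, 1.76)]

error = mean_tracking_error(reference, measured)
print(f"mean hand tracking error: {error * 100:.1f} cm")
```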

Additionally, Opt2Skill enables other real-world tasks, such as heavy box pickup, pickup+walk, door opening, and drawing+wiping.

In the heavy box pickup task, the robot squats down and successfully lifts a box weighing up to 4.9 kg. This task requires precise hand positioning and force control to lift the box without losing balance or dropping it.

In the pickup+walk task, the robot picks up a box from a standing pose and walks forward while carrying it. This requires the robot to maintain balance and apply sufficient contact force to hold the box, demanding accurate hand positioning and force control. The task demonstrates Opt2Skill’s ability to adapt to dynamic load changes and maintain stability during locomotion.

In the door opening task, the robot approaches two heavy fire doors and pushes them open using both arms while walking. This requires maintaining balance and applying sufficient force, demonstrating the robot's ability to handle heavy objects and adapt to dynamic environments.

In the drawing+wiping task, the robot uses one hand to draw lines on a whiteboard and the other hand to wipe them off. This task requires precise hand positioning and force control to ensure the lines are drawn clearly and erased effectively.

These results illustrate that our framework maintains compliant contact, adapts effectively to environmental variations, and enables the robot to perform everyday loco-manipulation behaviors. Together, these examples further highlight the versatility and generalization capability of the Opt2Skill framework across diverse real-world scenarios.

Opt2Skill

Opt2Skill aims to develop an RL-based whole-body controller that enables a humanoid robot to track model-based optimal trajectories. These trajectories contain torque references that enable a high success rate in multi-contact loco-manipulation tasks. We begin by generating dynamically feasible, task-specific motions using DDP. These trajectories serve as high-quality reference motions that encode contact-rich dynamics, torque limits, and task-solving strategies. We generate a diverse set of such trajectories offline and use them to guide the training of RL policies that directly predict joint-level target positions. By leveraging physically consistent references, the policy learns robust control strategies that track the diverse reference motions and transfer effectively to real hardware.
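A policy that outputs joint-level target positions needs a low-level loop to turn those targets into motor torques. The text above does not describe that loop, so the sketch below assumes a standard joint-space PD law with torque saturation, which is a common choice but not necessarily the one used on Digit. The function name `pd_torques` and all gains and limits are hypothetical.

```python
def pd_torques(q_target, q, qd, kp, kd, torque_limit):
    """Joint-space PD law (assumed, not confirmed by the source).

    tau_i = kp_i * (q_target_i - q_i) - kd_i * qd_i, clipped to +/- torque_limit_i.
    """
    torques = []
    for qt, qi, qdi, kpi, kdi, lim in zip(q_target, q, qd, kp, kd, torque_limit):
        tau = kpi * (qt - qi) - kdi * qdi
        # Saturate so the command respects the actuator torque limit.
        torques.append(max(-lim, min(lim, tau)))
    return torques


# Hypothetical two-joint example: the second joint saturates at its limit.
tau = pd_torques(
    q_target=[0.1, 1.0],
    q=[0.0, 0.0],
    qd=[0.0, 0.0],
    kp=[100.0, 100.0],
    kd=[2.0, 2.0],
    torque_limit=[50.0, 50.0],
)
print(tau)
```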

Whole-body Reference Motion Generation

To support robust policy learning across a wide range of loco-manipulation behaviors, we generate reference trajectories for diverse tasks, including walking, jumping, stair climbing, object manipulation, drawing+wiping, and door opening. For each task, we systematically vary key motion parameters such as gait phase, contact mode, walking speed, foot clearance, center of mass position, and target object location. Leveraging the flexibility of our trajectory optimization (TO) framework, we efficiently sample these parameters to produce large datasets of motions that satisfy full-body dynamics, contact constraints, and torque limits.
These reference trajectories include joint torques and contact forces, which are incorporated directly into the policy observations and reward design. This facilitates the learning of robust and transferable contact-rich loco-manipulation behaviors.
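The parameter variation described above can be pictured as building one optimization problem per combination of motion parameters. The sketch below is only an illustration: the parameter names, value ranges, and the function `sample_task_parameters` are hypothetical, and the actual sampling scheme and the DDP solver that consumes these problems are not shown.

```python
import itertools
import random

# Hypothetical parameter ranges (assumptions, not the values used for Digit).
WALKING_SPEEDS = [0.0, 0.3, 0.6]  # m/s
FOOT_CLEARANCES = [0.05, 0.10]    # m
CONTACT_MODES = ["double_support", "left_stance", "right_stance"]


def sample_task_parameters(num_targets, seed=0):
    """Return one parameter dict per trajectory-optimization problem."""
    rng = random.Random(seed)
    grid = itertools.product(WALKING_SPEEDS, FOOT_CLEARANCES, CONTACT_MODES)
    problems = []
    for speed, clearance, mode in grid:
        for _ in range(num_targets):
            problems.append({
                "walking_speed": speed,
                "foot_clearance": clearance,
                "contact_mode": mode,
                # Target object location drawn uniformly from a hypothetical box (meters).
                "target_xyz": (
                    rng.uniform(0.3, 0.6),
                    rng.uniform(-0.2, 0.2),
                    rng.uniform(0.4, 1.8),
                ),
            })
    return problems


problems = sample_task_parameters(num_targets=2)
print(len(problems), "optimization problems")
```

Each dictionary would then be passed to the trajectory optimizer, which returns the joint positions, joint torques, and contact forces used as references.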

Training in MuJoCo

We train RL-based motion imitation policies in the MuJoCo simulator.

How do the quality and source of motion datasets affect the performance and generalization of humanoid RL policies?

We compare Opt2Skill with baselines that rely on human demonstrations and inverse kinematics-based references. The results show that the physical feasibility and task adaptability of TO-based references contribute to both local tracking accuracy and long-term global stability.

Opt2Skill exhibits a higher success rate on varying terrains. This highlights the advantages of training with offline-generated, dynamically feasible trajectories compared to fixed human or purely kinematic references.

What role does torque information play in learning physically consistent and contact-rich humanoid behaviors?

We conduct additional simulation experiments to investigate the effect of torque information in contact-rich tasks. Specifically, we focus on a wiping task that requires controlled contact force between the end effector and a desk surface. We evaluate four ablation baselines:
Pos: tracks end-effector positions without any contact or torque information in the observation or reward.
Pos+F: adds reference contact force to the observation and includes a contact force tracking reward.
Pos+T: adds reference joint torques to the observation and includes a torque tracking reward.
Pos+F+T: adds both reference contact force and joint torques to the observation, along with corresponding force and torque tracking rewards.
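The four variants differ only in which reference signals enter the observation and the reward, which can be expressed as two flags. The sketch below assumes an exponential tracking kernel of the form exp(-scale * squared error), a common choice in motion imitation; the actual reward terms, weights, and observation layout are not specified here, so the scales and dictionary keys are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    use_force: bool   # reference contact force in observation and reward
    use_torque: bool  # reference joint torques in observation and reward


VARIANTS = [
    Variant("Pos", False, False),
    Variant("Pos+F", True, False),
    Variant("Pos+T", False, True),
    Variant("Pos+F+T", True, True),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kernel(sq_error, scale):
    """Assumed exponential tracking kernel; equals 1 at zero error."""
    return math.exp(-scale * sq_error)


def tracking_reward(variant, ref, meas):
    """Sum of the tracking terms enabled for this variant (hypothetical scales)."""
    reward = kernel(sq_dist(ref["hand_pos"], meas["hand_pos"]), 100.0)
    if variant.use_force:
        reward += kernel(sq_dist(ref["contact_force"], meas["contact_force"]), 0.01)
    if variant.use_torque:
        reward += kernel(sq_dist(ref["joint_torque"], meas["joint_torque"]), 0.01)
    return reward


def observation(variant, proprio, ref):
    """Concatenate proprioception with the references enabled for this variant."""
    obs = list(proprio) + list(ref["hand_pos"])
    if variant.use_force:
        obs += list(ref["contact_force"])
    if variant.use_torque:
        obs += list(ref["joint_torque"])
    return obs


# Hypothetical reference sample: 3D hand position, 3D contact force, 4 joint torques.
ref = {
    "hand_pos": (0.5, 0.0, 0.9),
    "contact_force": (0.0, 0.0, 10.0),
    "joint_torque": (1.0, 2.0, 3.0, 4.0),
}
for v in VARIANTS:
    print(v.name, len(observation(v, [0.0] * 6, ref)), tracking_reward(v, ref, ref))
```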

We demonstrate that joint torque information, which is available only from trajectory optimization (TO), enhances tracking performance in contact-rich scenarios. Torque information plays a crucial role by guiding when and how force should be applied during motion, and the reference torques provided by TO help anchor the policy to physically grounded and consistent behaviors. Combining both torque and contact force information yields the best tracking performance in loco-manipulation tasks.

Conclusion

In this paper, we present a TO-guided RL pipeline for humanoid loco-manipulation. We show that RL tracking performance is affected by the quality of the motion reference. TO based on full-body dynamics provides high-quality, dynamically feasible trajectories. Motion imitation based on such trajectories yields better tracking performance, especially when torque information is used. We demonstrate our sim-to-real results on the humanoid robot Digit with versatile loco-manipulation skills, including dynamic stair traversing, multi-contact box manipulation, and door traversing.

BibTeX

@article{liu2024opt2skill,
  title={Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation},
  author={Liu, Fukang and Gu, Zhaoyuan and Cai, Yilin and Zhou, Ziyi and Zhao, Shijie and Jung, Hyunyoung and Ha, Sehoon and Chen, Yue and Xu, Danfei and Zhao, Ye},
  journal={arXiv preprint arXiv:2409.20514},
  year={2024},
}