Ours
Existing motion generation methods based on mocap data are often limited by data quality and coverage. In this work, we propose a framework that generates diverse, physically feasible full-body human reaching and grasping motions using only brief walking mocap data. The framework builds on two observations: walking data captures valuable movement patterns that transfer across tasks, and advanced kinematic methods can generate diverse grasping poses, which can then be interpolated into motions that serve as task-specific guidance. Our approach incorporates an active data generation strategy to maximize the utility of the generated motions, along with a local feature alignment mechanism that transfers natural movement patterns from walking data to improve both the success rate and the naturalness of the synthesized motions. By combining the fidelity and stability of natural walking with the flexibility and generalizability of task-specific generated data, our method demonstrates strong performance and robust adaptability in diverse scenes and with unseen objects.
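As a rough illustration of how a generated grasping pose can be turned into a motion by interpolation, the sketch below blends a start pose into a grasp pose frame by frame. It is not our implementation: the flat joint-angle representation, the function name `interpolate_pose`, and the use of plain linear blending are assumptions made for brevity (joint rotations would more commonly be interpolated on quaternions).

```python
def interpolate_pose(start_pose, grasp_pose, num_frames):
    """Linearly blend two joint-angle vectors into a short motion clip.

    Hypothetical helper: poses are flat lists of joint angles in radians.
    """
    if len(start_pose) != len(grasp_pose):
        raise ValueError("poses must have the same number of joints")
    if num_frames < 2:
        raise ValueError("need at least a start frame and an end frame")

    frames = []
    for i in range(num_frames):
        # Blend weight runs from 0.0 (start pose) to 1.0 (grasp pose).
        w = i / (num_frames - 1)
        frames.append([(1.0 - w) * s + w * g
                       for s, g in zip(start_pose, grasp_pose)])
    return frames


standing = [0.0, 0.0, 0.0, 0.0]   # illustrative 4-joint pose
grasp = [0.4, -0.2, 1.0, 0.6]     # illustrative target grasp pose
clip = interpolate_pose(standing, grasp, num_frames=5)
print(len(clip), clip[0], clip[-1])
```

The resulting clip starts exactly at the first pose and ends exactly at the grasp pose. Such a clip is purely kinematic, so on its own it carries no guarantee of physical feasibility.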
Grasping Objects on High & Low Tables
Ours
Fullbody PPO
ASE
AMP
AMP* (Adding generated data)
PMP (2-Part: Upper and Lower)
PMP (5-Part: Torso and Five Limbs)
PSE (2-Part: Upper and Lower)
PSE (5-Part: Torso and Five Limbs)
We generate diverse reaching and grasping motions conditioned on scenes with different table heights (0.0-1.6 m), table widths (0.6-1.2 m), and initial positions, achieving a high success rate and natural movement. The ability to generalize across scenes, particularly across table heights, largely stems from the diversity of our generated data. Our dataset covers almost all possible table heights, providing task-specific guidance that facilitates grasping in various scenarios.
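To make the scene conditioning concrete, the following sketch samples scene configurations from the table height and width ranges stated above. The uniform sampling, the `Scene` container, and the `start_radius_m` bound on the initial position are illustrative assumptions, not details taken from our pipeline.

```python
import random
from dataclasses import dataclass

# Ranges quoted in the text above.
TABLE_HEIGHT_RANGE_M = (0.0, 1.6)
TABLE_WIDTH_RANGE_M = (0.6, 1.2)


@dataclass
class Scene:
    table_height_m: float
    table_width_m: float
    start_xy_m: tuple  # initial character position relative to the table


def sample_scene(rng, start_radius_m=2.0):
    """Draw one scene; start_radius_m is a hypothetical bound, not from the paper."""
    return Scene(
        table_height_m=rng.uniform(*TABLE_HEIGHT_RANGE_M),
        table_width_m=rng.uniform(*TABLE_WIDTH_RANGE_M),
        start_xy_m=(
            rng.uniform(-start_radius_m, start_radius_m),
            rng.uniform(-start_radius_m, start_radius_m),
        ),
    )


rng = random.Random(0)  # fixed seed so the sampled scenes are reproducible
scenes = [sample_scene(rng) for _ in range(100)]
print(scenes[0])
```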
Our policy successfully generalizes to various objects, including unseen categories, producing natural movements with a high success rate.
At low data ratios, task completion improves rapidly as the ratio increases. However, when the ratio exceeds 100%, the character struggles with natural turning, and beyond 200%, the character shifts focus to balancing between generated demos, hindering effective walking.
Data Ratio: 0%
Data Ratio: 5%
Data Ratio: 10%
Data Ratio: 20%
Data Ratio: 50%
Data Ratio: 100%
Data Ratio: 200%
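The sketch below shows one way such a data ratio could be realized when assembling a training set. It assumes the ratio is the number of generated frames divided by the number of walking mocap frames; that definition, the function name `mix_datasets`, and the in-order clip selection are assumptions for illustration only.

```python
def mix_datasets(walking_clips, generated_clips, data_ratio):
    """Add generated clips until they reach data_ratio times the walking frames.

    Assumption: each clip is a list of frames, and data_ratio is
    generated frames / walking frames (1.0 corresponds to 100%).
    """
    walking_frames = sum(len(clip) for clip in walking_clips)
    budget = data_ratio * walking_frames

    selected, used = [], 0
    for clip in generated_clips:
        if used + len(clip) > budget:
            break  # adding this clip would exceed the frame budget
        selected.append(clip)
        used += len(clip)

    achieved = used / walking_frames if walking_frames else 0.0
    return walking_clips + selected, achieved


walking = [[0] * 100]                      # one 100-frame walking clip
generated = [[1] * 10 for _ in range(30)]  # thirty 10-frame generated clips

for ratio in (0.0, 0.5, 1.0, 2.0):
    mixed, achieved = mix_datasets(walking, generated, ratio)
    print(ratio, len(mixed) - len(walking), achieved)
```

Because whole clips are added, the achieved ratio can fall slightly below the requested one when clip lengths do not divide the budget evenly.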
We conduct several ablation studies to validate the effectiveness of feature alignment. The results show that it improves motion naturalness and stability during recovery.
Local feature alignment enhances the refinement of the grasping pose. For better comparison, we include a pause in the video during the grasping phase. A more detailed explanation is presented in the figure below.
w/o features
Zero+First-layer features
More detailed comparison
w/o features
Torso-feature only
Limb-features only
Torso-feature + First-layer feature
Zero+First-layer features
Zero+First+Second-layer features
Feature alignment enhances overall stability: with feature alignment, the agent swiftly raises its left hand and quickly steps back with its left foot to maintain balance when grasping low objects. This coordinated movement is crucial for dynamic recovery.
w/o feature align
Adding Zero/First-Layer feature align
Please contact us at liyitang22@mails.tsinghua.edu.cn if you have any questions.