Humanoid Whole-Body Manipulation via Active Spatial Brain and Generalizable Action Cerebellum

Zhizhao Liang^*, Yi-Lin Wei^*, Xuhang Chen^*, Mu Lin, Yi-Xiang He, Zhexi Luo, Jun-Hui Liu, Kun-Yu Lin, Wei-Shi Zheng^†

School of Computer Science and Engineering, Sun Yat-sen University

* Equal contribution. † Corresponding author.

Paper Video

Reachability Analysis

Different Positions and Heights

The four videos below demonstrate operations under different object positions and heights. The heatmap underneath compares reachability across different methods.

Long-Horizon Task

Long-Horizon

This long-horizon task demonstrates how the system can autonomously plan, invoke active visual perception, and execute manipulation actions to complete user-provided instructions that contain multiple subtasks.

Planning

Adaptive Planning

The planner leverages execution feedback to enable closed-loop control.
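The idea of feeding execution results back into the planner can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the names `Feedback`, `run_closed_loop`, `plan_fn`, `execute_fn`, and `max_replans` are hypothetical, and the toy planner and executor stand in for the large-model planner and the real robot.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    """Hypothetical execution result reported back to the planner."""
    success: bool
    message: str = ""


History = List[Tuple[str, Feedback]]


def run_closed_loop(
    instruction: str,
    plan_fn: Callable[[str, History], List[str]],
    execute_fn: Callable[[str], Feedback],
    max_replans: int = 3,
) -> Tuple[bool, History]:
    """Execute subtasks in order; on failure, replan using the history so far."""
    history: History = []
    plan = list(plan_fn(instruction, history))
    replans = 0
    while plan:
        subtask = plan.pop(0)
        feedback = execute_fn(subtask)
        history.append((subtask, feedback))
        if not feedback.success:
            if replans >= max_replans:
                return False, history
            replans += 1
            # Closed loop: the planner sees what happened and proposes a new plan.
            plan = list(plan_fn(instruction, history))
    return True, history


# Toy stand-ins (assumptions): a fixed subtask list and one simulated failure.
def toy_planner(instruction: str, history: History) -> List[str]:
    done = {subtask for subtask, fb in history if fb.success}
    steps = ["walk to table", "grasp cup", "place cup on shelf"]
    return [s for s in steps if s not in done]


attempts = {}


def toy_executor(subtask: str) -> Feedback:
    attempts[subtask] = attempts.get(subtask, 0) + 1
    if subtask == "grasp cup" and attempts[subtask] == 1:
        return Feedback(False, "grasp slipped")
    return Feedback(True)


ok, history = run_closed_loop("put the cup on the shelf", toy_planner, toy_executor)
```

In this toy run the first grasp attempt fails, the planner is called again with the recorded history, and the remaining subtasks are retried from the failed step.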

Trajectory Generation

Action Primitives

Action primitives are incorporated to generate executable manipulation trajectories.

Push

Pull

Place

Rotate
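One way to picture how the four primitives above could map to executable trajectories is a small dispatch table from primitive name to a waypoint generator. This is a sketch under stated assumptions, not the paper's method: the waypoint format `(x, y, z, yaw)`, the function names, and the parameters `distance`, `drop`, `angle`, and `steps` are all hypothetical, and straight-line interpolation stands in for whatever trajectory generation the system actually uses.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical end-effector waypoint: x, y, z in meters, yaw in radians.
Waypoint = Tuple[float, float, float, float]


def _interpolate(start: Waypoint, end: Waypoint, steps: int) -> List[Waypoint]:
    """Linearly interpolate from start to end, inclusive (steps + 1 waypoints)."""
    return [
        tuple(s + (e - s) * i / steps for s, e in zip(start, end))
        for i in range(steps + 1)
    ]


def push(start: Waypoint, distance: float = 0.1, steps: int = 5) -> List[Waypoint]:
    """Move forward along the current yaw direction."""
    x, y, z, yaw = start
    end = (x + distance * math.cos(yaw), y + distance * math.sin(yaw), z, yaw)
    return _interpolate(start, end, steps)


def pull(start: Waypoint, distance: float = 0.1, steps: int = 5) -> List[Waypoint]:
    """Move backward along the current yaw direction."""
    return push(start, -distance, steps)


def place(start: Waypoint, drop: float = 0.05, steps: int = 5) -> List[Waypoint]:
    """Lower the end effector vertically by `drop`."""
    x, y, z, yaw = start
    return _interpolate(start, (x, y, z - drop, yaw), steps)


def rotate(start: Waypoint, angle: float = math.pi / 2, steps: int = 5) -> List[Waypoint]:
    """Rotate in place about the vertical axis by `angle`."""
    x, y, z, yaw = start
    return _interpolate(start, (x, y, z, yaw + angle), steps)


PRIMITIVES: Dict[str, Callable[..., List[Waypoint]]] = {
    "push": push,
    "pull": pull,
    "place": place,
    "rotate": rotate,
}


def generate_trajectory(name: str, start: Waypoint, **params) -> List[Waypoint]:
    """Look up a primitive by name and generate its waypoints."""
    if name not in PRIMITIVES:
        raise ValueError(f"unknown primitive: {name}")
    return PRIMITIVES[name](start, **params)


trajectory = generate_trajectory("push", (0.0, 0.0, 1.0, 0.0), distance=0.2, steps=4)
```

A higher-level planner would then only need to choose a primitive name and its parameters, leaving the waypoint details to the primitive itself.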

Spatial Understanding

Two Spatial Understanding Tasks

The two tasks below demonstrate obstacle avoidance and active exploration for spatial understanding.

Obstacle Avoidance

Active Exploration

Abstract

In this paper, we explore the task of spatially aware humanoid whole-body manipulation. Compared with tabletop settings, this task poses two key challenges: 1) Spatial understanding is challenging in complex 3D environments with diverse spatial relations. 2) Action generation is difficult to generalize, as limited and costly real-robot data restricts the generalization of data-driven models. To address these challenges, we propose a generalizable humanoid loco-manipulation framework that leverages the spatial perception and action generation capabilities of multi-agent large models. Specifically, our framework includes two components: Active Spatial Brain for active spatial perception and decision-making, and Generalizable Action Cerebellum for executable robot action generation. The first component actively perceives the spatial scene and makes decisions on task planning and subtask decomposition. The second component generates executable robot actions based on the decisions made by the first component, without the need for task-specific real-robot data. To benchmark our framework, we design a set of spatial manipulation tasks from two perspectives: evaluating spatial perception and understanding, and assessing real-robot task performance. The results demonstrate strong performance on both aspects across diverse tasks and environments.

BibTeX

@misc{liang2026humanoid,
  title={Humanoid Whole-Body Manipulation via Active Spatial Brain and Generalizable Action Cerebellum},
  author={Zhizhao Liang and Yi-Lin Wei and Xuhang Chen and Mu Lin and Yi-Xiang He and Zhexi Luo and Jun-Hui Liu and Kun-Yu Lin and Wei-Shi Zheng},
  year={2026},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  eprint={2605.21133},
  url={https://arxiv.org/abs/2605.21133}
}
