Xingxin on Bug

HKUST PhD Chronicle, Week 21, Extrinsic Dexterity

January 6, 2026

Project Progress

This week, I focused on evaluating the sim2real gap. Thanks to my earlier setup, which ensures the camera produces point clouds using the correct intrinsic parameters, the model performs very well on real-world data.
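To make the role of the intrinsics concrete, here is a minimal sketch of pinhole back-projection from a depth image to a point cloud. The function name `depth_to_point_cloud` and the toy values are my own assumptions for illustration, not the actual pipeline code; a real camera SDK would normally handle this step (and lens distortion) for you.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into an (N, 3) camera-frame point cloud.

    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    Assumes an ideal pinhole model with no lens distortion.
    """
    h, w = depth.shape
    # u holds the pixel column index, v the pixel row index, for every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Pixels with zero depth are treated as invalid and dropped.
    return points[points[:, 2] > 0]


# Toy example: a flat surface 1 m in front of the camera.
depth = np.ones((2, 2))
good = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
bad = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
# Doubling the focal length halves the lateral extent of the cloud.
print(np.ptp(good[:, 0]), np.ptp(bad[:, 0]))
```

A wrong focal length rescales the object laterally, which is exactly the kind of distortion a model trained on simulated point clouds would never have seen.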

In 2D packing, the model shows excellent results.

predict_tetris_6d_pose_packing.webp

However, the 3D stacking scenario is still less promising. Even so, the current pipeline is clearly viable. Since I have chosen 📄Point Transformer V3: Simpler, Faster, Stronger as my backbone, my next step is to scale up the training data and feed this data-hungry beast.

predict_tetris_6d_pose_stacking.webp

How to pick it up?

Beyond 6D pose estimation, I have been thinking about the next step: robot manipulation. Initially, I wasn’t sure how to formulate the specific problem I was encountering.

Let me describe the scenario. I want to use a Franka Research 3 gripper to pick up a 3D Tetris piece, specifically an “L” shape.


The "L" shape:

██
██
█████

If the goal is to place it as a “J” shape, the piece has to be turned over: “L” is chiral in the plane, so no rotation on the table surface turns it into a “J”. A standard gripper cannot simply flip the piece while holding it.


The "J" shape:

   ██
   ██
█████
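The chirality claim can be checked with a small sketch. I model each flat piece by the grid cells of its footprint seen from above; the cell coordinates and helper names below are assumptions made up for this illustration. Rotating on the table never produces the “J”, while turning the piece over (which mirrors its footprint) does.

```python
def normalize(cells):
    """Shift cells so the footprint's bounding box starts at (0, 0)."""
    cells = list(cells)
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def rotate90(cells):
    """Rotate the footprint 90 degrees counterclockwise in the table plane."""
    return normalize((-y, x) for x, y in cells)


def in_plane_rotations(cells):
    """All four orientations reachable by spinning the piece on the table."""
    orientations = set()
    current = normalize(cells)
    for _ in range(4):
        orientations.add(current)
        current = rotate90(current)
    return orientations


def flip_over(cells):
    """Turn the piece over; seen from above, the footprint is mirrored in x."""
    return normalize((-x, y) for x, y in cells)


# Footprints as (x, y) cells: a vertical bar of three with a one-cell foot.
L_PIECE = [(0, 0), (1, 0), (0, 1), (0, 2)]  # foot points right
J_PIECE = [(0, 0), (1, 0), (1, 1), (1, 2)]  # foot points left

print(normalize(J_PIECE) in in_plane_rotations(L_PIECE))             # False
print(normalize(J_PIECE) in in_plane_rotations(flip_over(L_PIECE)))  # True
```

So the task really does need an out-of-plane flip, which is what makes it hard for a single parallel gripper.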

To solve this, I can think of two options:

  1. Use a second robot arm to receive the piece and turn it into a “J”.
  2. Place the “L” on the table, change its orientation by “orbiting” or sliding it against the surface, and then regrasp it.

I was curious whether a specific field studies this. After some reading, I found that these topics are well established, and I am finally starting to see exactly where my problem fits.


Key concepts I discovered.


Bimanual Manipulation

Bimanual manipulation is also known as dual-arm manipulation. Interestingly, one of the top papers in Google's search results, 📄Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware, is by Tony Z. Zhao. He co-founded Sunday Robotics with Cheng Chi, the first author of 📄Diffusion Policy: Visuomotor Policy Learning via Action Diffusion.


Reorientation

Changing an object's pose, for example turning the “L” into a “J”, is called reorientation. The following video shows an example of in-hand reorientation, where the robot uses only its fingers to reorient an object without external surfaces.


Extrinsic Dexterity

Extrinsic dexterity is a fascinating concept initiated by Home. It describes how a robot manipulates objects by leveraging resources outside its own hand or fingers, such as gravity, friction, or a table surface, to flip or turn an object.

Leonidas Guibas’s lab also mentioned a paper in this field, Learning to Regrasp by Learning to Place.


Solution?

Generally, solutions in this field fall into two categories.

  1. Classic Mechanics: using geometric or physics-based solvers.
  2. Reinforcement Learning: training a policy π to handle the movement.

For example, the paper 📄Learning Extrinsic Dexterity with Parameterized Manipulation Primitives uses RL to teach a robot how to flip objects effectively.
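To give a flavor of the classic-mechanics category, here is a tiny quasi-static check of whether a horizontal push makes a block tip over its far bottom edge or slide on the table. This is a textbook rigid-body sketch under my own simplifying assumptions (uniform density, purely horizontal push, Coulomb friction, no gripper model); it is not taken from any of the papers above, and the function name and numbers are hypothetical.

```python
def first_motion(width, push_height, mu, mass=1.0, g=9.81):
    """Return which motion starts first as a horizontal push force grows from zero.

    width:       block extent along the push direction (m)
    push_height: height of the push point above the table (m), must be > 0
    mu:          friction coefficient between block and table
    """
    weight = mass * g
    # Sliding starts once the push exceeds the maximum friction force.
    force_to_slide = mu * weight
    # Tipping about the far bottom edge starts once the push torque
    # balances the weight's restoring torque: F * push_height = weight * width / 2.
    force_to_tip = weight * (width / 2.0) / push_height
    if force_to_tip < force_to_slide:
        return "tip", force_to_tip
    return "slide", force_to_slide


# Pushing high on a narrow block: it tips.
print(first_motion(width=0.02, push_height=0.06, mu=0.5))
# Pushing low on a wide block: it slides.
print(first_motion(width=0.06, push_height=0.01, mu=0.3))
```

In this simplified model the block tips first exactly when width / (2 * push_height) < mu, so choosing where to push is already a way to select between flipping and sliding.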