HKUST PhD Chronicle, Week 21, Extrinsic Dexterity

Project Progress

This week, I focused on evaluating the sim2real gap. Thanks to my previous setup, which ensured the camera captures point clouds with the correct intrinsic configurations, the model performs very well on real-world data.

In 2D packing, the model shows excellent results.

However, the 3D stacking scenario is still less promising. Despite this, the current pipeline is definitely viable. Since I have chosen 📄Point Transformer V3: Simpler, Faster, Stronger as my backbone, my next step is to scale up the training data and feed this data-hungry beast.

How to pick it up?

Beyond 6D pose estimation, I have been thinking about the next step: robot manipulation. Initially, I wasn’t sure how to formulate the specific problem I was encountering.

Let me describe the scenario. I want to use a Franka Research 3 gripper to pick up a 3D Tetris piece, specifically an “L” shape.


The "L" shape:

██
██
█████

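Suppose the goal is to place it as a “J” shape. Because the “L” is chiral, its mirror image “J” cannot be reached by rotating the piece within the plane. The piece has to be flipped over, and a standard gripper cannot simply flip it while holding it.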


The "J" shape:

   ██
   ██
█████
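
The chirality claim can be checked with a small sketch. It assumes the piece is a flat, one-layer tetromino described by grid cells; the cell coordinates and the helper names (`normalize`, `rotate90`, `mirror`, `in_plane_rotations`) are made up for illustration and are not part of my pipeline. Rotating within the table plane never turns the “L” into the “J”, while a mirror does. For a flat piece, that mirror is the footprint you get after a 180-degree flip about an axis lying in the plane.

```python
from typing import FrozenSet, Iterable, List, Tuple

Cell = Tuple[int, int]
Piece = FrozenSet[Cell]


def normalize(cells: Iterable[Cell]) -> Piece:
    """Translate the cells so that the smallest x and y are both 0."""
    cells = list(cells)
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def rotate90(piece: Piece) -> Piece:
    """Rotate 90 degrees counterclockwise in the plane: (x, y) -> (-y, x)."""
    return normalize((-y, x) for x, y in piece)


def mirror(piece: Piece) -> Piece:
    """Mirror across the vertical axis: (x, y) -> (-x, y).

    For a flat piece this equals the footprint after a 180-degree flip
    about the in-plane y axis, since (x, y, z) -> (-x, y, -z).
    """
    return normalize((-x, y) for x, y in piece)


def in_plane_rotations(piece: Piece) -> List[Piece]:
    """Return the piece rotated by 0, 90, 180, and 270 degrees."""
    rotations = []
    current = normalize(piece)
    for _ in range(4):
        rotations.append(current)
        current = rotate90(current)
    return rotations


# Assumed cell layouts: a three-cell column plus one foot cell.
L_PIECE = normalize([(0, 0), (1, 0), (0, 1), (0, 2)])  # foot points right
J_PIECE = normalize([(0, 0), (1, 0), (1, 1), (1, 2)])  # foot points left

reachable_in_plane = J_PIECE in in_plane_rotations(L_PIECE)
reachable_by_flip = mirror(L_PIECE) == J_PIECE

print("J reachable by in-plane rotation:", reachable_in_plane)
print("J reachable by flipping the piece:", reachable_by_flip)
```

With these layouts, the in-plane check fails and the flip check succeeds, which is why the piece has to leave the table plane at some point.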

To solve this, we can make a couple of guesses:

Use a second robot arm to receive the piece and turn it into a “J”.
Place the “L” on the table, change its orientation by “orbiting” or sliding it against the surface, and then regrasp it.

I was curious whether a specific field studies this. After some digging, I found that these topics are well established. I am finally starting to identify exactly where my problem fits.

Key concepts I discovered.

Bimanual Manipulation

Bimanual manipulation is also known as dual-arm manipulation. Interestingly, one of the top results on Google, 📄Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware, is by Tony Z. Zhao. He co-founded Sunday Robotics with Cheng Chi, the first author of 📄Diffusion Policy: Visuomotor Policy Learning via Action Diffusion.

Reorientation

Changing an object from an “L” to a “J” is called reorientation. The following video shows an example of in-hand reorientation, where the robot uses only its fingers to reorient an object without external surfaces.

Extrinsic Dexterity

Extrinsic dexterity is a fascinating concept initiated by Home. It describes how a robot manipulates objects by leveraging resources outside of its own hands or fingers, such as gravity, friction, or a table surface, to flip or turn an object.

Leonidas Guibas’s lab also mentioned a paper in this field, Learning to Regrasp by Learning to Place.

Solution?

Generally, solutions in this field fall into two categories:

Classic Mechanics: using geometric or physics-based solvers.
Reinforcement Learning: training a policy $\pi$ to handle the movement.
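
For the RL category, "training a policy $\pi$" usually means maximizing expected discounted return. The formulation below is the generic textbook objective, written here as an assumption and not taken from any of the papers mentioned in this post. The symbols $s_t$ (state), $a_t$ (action), $r$ (reward), $\gamma$ (discount factor), and $T$ (horizon) are introduced only for this sketch.

$$
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right], \qquad \tau = (s_0, a_0, s_1, a_1, \dots, s_T, a_T)
$$

In the reorientation setting, one plausible choice would be a state $s_t$ containing the object pose and an action $a_t$ that is a gripper motion, with a reward for reaching the target orientation, but the exact design depends on the method.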

For example, the paper 📄Learning Extrinsic Dexterity with Parameterized Manipulation Primitives uses RL to teach a robot how to flip objects effectively.