Project Progress
This week, I focused on evaluating the sim2real gap. Because my earlier setup ensures the camera produces point clouds with the correct intrinsics, the model performs very well on real-world data.
In 2D packing, the model shows excellent results.
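To illustrate why the intrinsics matter for the sim2real gap, here is a minimal sketch of how a depth image is back-projected into a point cloud. It assumes an ideal pinhole camera with no lens distortion; the function name `backproject` and all numeric values (`fx`, `fy`, `cx`, `cy`, the tiny depth image) are made up for illustration and are not my actual camera configuration.

```python
def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into camera-frame 3D points.

    Assumes an ideal pinhole model: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as an invalid reading
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Hypothetical 2x2 depth image; the 0.0 entry stands for a missing reading.
depth = [[1.0, 1.0],
         [0.0, 2.0]]

# Same depth image, two different (made-up) focal lengths.
pts = backproject(depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
pts_wrong = backproject(depth, fx=4.0, fy=4.0, cx=0.5, cy=0.5)

print(pts)
print(pts_wrong)
```

With the doubled focal length, every point keeps its depth but its lateral coordinates shrink by half, so the object would look narrower to the model than it did in simulation. That kind of geometric mismatch is what a correct intrinsic configuration avoids.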

However, the 3D stacking scenario is still less promising. Despite this, the current pipeline is definitely viable. Since I have chosen 📄Point Transformer V3: Simpler, Faster, Stronger as my backbone, my next step is to scale up the data and feed this data-hungry beast.

How to pick it up?
Beyond 6D pose estimation, I have been thinking about the next step: robot manipulation. Initially, I wasn’t sure how to formulate the specific problem I was encountering.
Let me describe the scenario. I want to use a Franka Research 3 gripper to pick up a 3D Tetris piece, specifically an “L” shape.
The "L" shape:
██
██
█████
If the goal is to assemble it as a “J” shape, the piece has to be flipped over: within the plane the “L” is chiral, so no in-plane rotation turns it into a “J”. A standard gripper cannot simply flip the piece while holding it.
The "J" shape:
   ██
   ██
█████
To solve this, I can think of two candidate approaches:
- Use a second robot arm to receive the piece and turn it into a “J”.
- Place the “L” on the table, change its orientation by “orbiting” or sliding it against the surface, and then regrasp it.
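Both options amount to flipping the piece out of the plane, which no rotation within the plane can replace. A small sketch to check this on a grid, using a simplified four-cell “L” rather than the exact drawing above; the helper names (`normalize`, `rotate_in_plane`, `flip_out_of_plane`, `in_plane_orientations`) are my own and purely illustrative.

```python
def normalize(cells):
    """Translate cells so the minimum x and y are 0, for position-free comparison."""
    min_x = min(x for x, _ in cells)
    min_y = min(y for _, y in cells)
    return frozenset((x - min_x, y - min_y) for x, y in cells)


def rotate_in_plane(cells):
    """Rotate by 90 degrees within the table plane: (x, y) -> (-y, x)."""
    return {(-y, x) for x, y in cells}


def flip_out_of_plane(cells):
    """Rotate by 180 degrees about the in-plane y axis: (x, y, z) -> (-x, y, -z).

    The piece lies in z = 0, so only the x coordinate changes sign.
    """
    return {(-x, y) for x, y in cells}


def in_plane_orientations(cells):
    """All four in-plane orientations of a piece, normalized."""
    orientations = set()
    current = set(cells)
    for _ in range(4):
        orientations.add(normalize(current))
        current = rotate_in_plane(current)
    return orientations


# Simplified four-cell pieces: a three-cell stem plus a one-cell foot.
L_PIECE = {(0, 0), (0, 1), (0, 2), (1, 0)}  # foot to the right of the stem
J_PIECE = {(1, 0), (1, 1), (1, 2), (0, 0)}  # foot to the left of the stem

reachable_by_rotation = normalize(J_PIECE) in in_plane_orientations(L_PIECE)
reachable_by_flip = normalize(flip_out_of_plane(L_PIECE)) == normalize(J_PIECE)

print(reachable_by_rotation, reachable_by_flip)  # False True
```

So the robot needs a 180° rotation about an axis lying in the plane of the piece, which is exactly the motion a parallel gripper cannot perform on an object it is already holding.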
I was curious if there was a specific field researching this. After some research, I found that these topics are well-established. I am finally starting to identify exactly where my problem fits.
Key concepts I discovered:
Bimanual Manipulation
Bimanual manipulation is also known as dual-arm manipulation. Interestingly, one of the top papers in the Google results, 📄Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware, is by Tony Z. Zhao. He co-founded Sunday Robotics with Cheng Chi, the lead author of 📄Diffusion Policy: Visuomotor Policy Learning via Action Diffusion.
Reorientation
Changing an object from an “L” to a “J” is called reorientation. The following video shows an example of in-hand reorientation, where the robot uses only its fingers to reorient an object without external surfaces.
Extrinsic Dexterity
Extrinsic dexterity is a fascinating concept initiated by Home. It describes how a robot manipulates objects by leveraging resources outside of its own hands or fingers, such as gravity, friction, or a table surface, to flip or turn an object.
Leonidas Guibas’s lab also mentioned a paper in this field, Learning to Regrasp by Learning to Place.
Solution?
Generally, solutions in this field fall into two categories:
- Classic Mechanics: using geometric or physics-based solvers.
- Reinforcement Learning: training a policy to handle the movement.
For example, the paper 📄Learning Extrinsic Dexterity with Parameterized Manipulation Primitives uses RL to teach a robot how to flip objects effectively.