Finish All the Blender Tasks
This week marks a major milestone for me: I have successfully completed all the atomic tasks related to the blender.
| Open Blender Lid | Place Item in Blender | Close Blender Lid | Turn On Blender |
|---|---|---|---|
Just like Home said, the soul of robotics is manipulation. Of these four tasks, the most difficult is “Turn On Blender”, which requires contact-rich manipulation.
Meanwhile, relying only on the eye-in-hand camera view makes it hard to tell whether the task has actually been completed. As a result, the “reward” can be very difficult to define from images alone.
I think this points to a larger issue in our field. In a future world of physical AI, how would a robot realize that the blender’s cable is unplugged after pressing the power button, especially without sound or tactile feedback?
Synthetic Data with Motion Planning
As Jie Tan emphasized, we have to believe in the value of synthetic data. This week, I tried to push the boundaries of synthetic data even further. My recent experiment sheds some light on the possibility of using it to solve real-world problems.
Here’s what I did: I used the pipeline described in “GSWorld: Closed-Loop Photo-Realistic Simulation Suite for Robotic Manipulation” to build the simulation scene.
Next, I used motion planning to generate over 110 trajectories. Finally, I performed SFT on my policy using only this simulation data, aiming for what is known as “zero-shot” sim2real transfer.
It turns out that it works! The real robot successfully recognizes the physical scene and gradually moves its gripper to the top of the blender.
| Sim | Real |
|---|---|
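To make the “motion planning, then SFT on the planned trajectories” recipe in this section concrete, here is a minimal, self-contained toy sketch. It is not the GSWorld pipeline or my actual policy. The goal pose, step fraction, trajectory length, and all function names (`plan_trajectory`, `to_pairs`, `fit_gain`, `rollout`) are hypothetical. The “planner” is just a straight-line path that covers a fixed fraction of the remaining distance at each waypoint, and the “policy” is a one-parameter linear controller fitted by least squares in place of a vision-based model. Only the count of 110 trajectories comes from the experiment above.

```python
import random

# Hypothetical target: pose just above the blender (x, y, z in meters).
GOAL = (0.45, 0.10, 0.30)
ALPHA = 0.2   # hypothetical fraction of remaining distance covered per step
STEPS = 30    # hypothetical number of steps per trajectory


def plan_trajectory(start, goal=GOAL, alpha=ALPHA, steps=STEPS):
    """Toy planner: straight-line path to the goal, covering a fixed
    fraction of the remaining distance at each waypoint."""
    waypoints = [start]
    pos = start
    for _ in range(steps):
        pos = tuple(p + alpha * (g - p) for p, g in zip(pos, goal))
        waypoints.append(pos)
    return waypoints


def to_pairs(waypoints, goal=GOAL):
    """Turn waypoints into supervised (observation, action) pairs.
    Observation: offset from gripper to goal; action: displacement to next waypoint."""
    pairs = []
    for cur, nxt in zip(waypoints, waypoints[1:]):
        obs = tuple(g - c for g, c in zip(goal, cur))
        act = tuple(n - c for n, c in zip(nxt, cur))
        pairs.append((obs, act))
    return pairs


def fit_gain(pairs):
    """Supervised fit of a linear policy action = k * obs (closed-form least squares)."""
    num = sum(o * a for obs, act in pairs for o, a in zip(obs, act))
    den = sum(o * o for obs, _ in pairs for o in obs)
    return num / den


def rollout(k, start, goal=GOAL, steps=STEPS):
    """Run the fitted policy from a start pose and return the final pose."""
    pos = start
    for _ in range(steps):
        pos = tuple(p + k * (g - p) for p, g in zip(pos, goal))
    return pos


random.seed(0)
dataset = []
for _ in range(110):  # trajectory count from the post; the rest is made up
    start = tuple(random.uniform(-0.3, 0.3) for _ in range(3))
    dataset.extend(to_pairs(plan_trajectory(start)))

k = fit_gain(dataset)
final = rollout(k, start=(0.0, 0.0, 0.0))
print(f"fitted gain {k:.3f}, final pose {tuple(round(x, 3) for x in final)}")
```

Because the toy planner's actions are exactly proportional to the goal offset, the least-squares fit recovers the planner's step fraction, and rolling the fitted policy out from an unseen start brings the gripper close to the goal. A real setup would typically swap in a collision-aware motion planner and replace the linear fit with SFT of a visuomotor policy on rendered camera images.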