Starting this week, I am working on co-training with the RoboCasa dataset, and moving from simulation to the real world adds an entirely new level of complexity.
Touchpads vs. Mechanical Buttons
I am currently tackling a task involving a blender, and it highlights a very interesting design challenge.
The challenge: as technology advances, the interface of a modern blender is often no longer mechanical but capacitive.
Take the Ninja Professional Plus Blender as an example. To turn it on, its touchpad needs to sense the electrical capacitance of human skin, not just physical force. Unlike a mechanical button, it cannot simply be pushed with any solid object.

Ninja Professional Plus Blender ©️ Ninja
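One way to picture what such a touchpad does: a generic self-capacitance controller compares each raw reading against a slowly tracked baseline and reports a touch only when the increase crosses a threshold. The sketch below is hypothetical: `TouchChannel`, the thresholds, and the count values are illustrative assumptions, not the Ninja blender's firmware. A human fingertip adds a large capacitance change, while a non-conductive gripper pad (assumed here) barely moves the reading, so it never crosses the threshold no matter how hard it presses.

```python
from dataclasses import dataclass


@dataclass
class TouchChannel:
    """Hypothetical single touchpad channel; values are arbitrary sensor counts."""

    baseline: float        # tracked "no touch" reading
    on_threshold: float    # delta above baseline needed to report a touch
    off_threshold: float   # smaller delta at which a touch is released (hysteresis)
    alpha: float = 0.01    # how fast the baseline follows slow drift
    touched: bool = False

    def update(self, raw: float) -> bool:
        """Feed one raw reading and return whether the pad counts as touched."""
        delta = raw - self.baseline
        if self.touched:
            if delta < self.off_threshold:
                self.touched = False
        elif delta > self.on_threshold:
            self.touched = True
        if not self.touched:
            # Follow slow environmental drift only while nothing is touching the pad.
            self.baseline += self.alpha * (raw - self.baseline)
        return self.touched


pad = TouchChannel(baseline=1000.0, on_threshold=50.0, off_threshold=25.0)
print(pad.update(1005.0))  # False: small delta, e.g. an assumed non-conductive gripper pad
print(pad.update(1120.0))  # True: large delta, e.g. a human fingertip
```

Note that force never enters the model: only the change in capacitance matters, which is why pressing harder with a rigid, non-conductive fingertip does not help.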
This highlights exactly why humanoid robots are becoming so important. Our world, including our devices, architecture, and spaces, is designed specifically for human bodies.
In the RoboCasa365 dataset, you can see the Franka Research 3 trying to actuate these interfaces with the default Franka Hand, which completely lacks that human-like touch.
©️ RoboCasa
Grasping Circular Objects
Another major obstacle is grasping. I am using the default Franka Hand, a parallel-jaw gripper, so according to A Mathematical Introduction to Robotic Manipulation the contact type is essentially “point on plane”.
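However, most blender lids are circular. A parallel-jaw gripper cannot achieve “form closure” around a large circle: its two flat jaws touch the rim in only two small regions, so nothing geometrically locks the lid in place and the grasp depends entirely on friction. As a result, the lid easily slips out of the gripper! You can see this with the lid on the Martha Stewart Countertop Blender: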

Martha Stewart Countertop Blender ©️ Martha Stewart
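To make the friction argument concrete, here is a planar sketch, a deliberate simplification of a 3D lid grasp. Two flat jaws squeeze a circle along a horizontal chord at distance `offset` from the center. It uses the classic planar two-finger force-closure test (the line joining the contacts must lie inside both friction cones, each with half-angle atan(μ)), so the grasp holds only if asin(offset / radius) is below atan(μ). A second helper gives the minimum squeeze force per jaw, F_n ≥ mg / (2μ), for friction to carry the lid's weight. The function names, lid size, mass, and μ values are hypothetical. The model also ignores rotation about the line between the two contacts, which hard point contacts cannot resist at all and which is another way a lid can slip.

```python
import math

# Hypothetical planar model: two parallel jaws squeeze a circular lid
# (center at the origin) along a horizontal chord. Not from any library.


def in_friction_cone(normal, direction, mu):
    """True if unit vector `direction` lies strictly inside the friction cone
    of half-angle atan(mu) around the unit inward `normal`."""
    cos_angle = normal[0] * direction[0] + normal[1] * direction[1]
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.acos(cos_angle) < math.atan(mu)


def circle_grasp_is_force_closure(radius, offset, mu):
    """Two-finger planar force-closure test for jaws closing on a chord
    at distance `offset` (0 <= offset < radius) from the circle's center."""
    x = math.sqrt(radius**2 - offset**2)
    p1, p2 = (-x, offset), (x, offset)  # left and right contact points
    # Inward normals point from each contact toward the center.
    n1 = (-p1[0] / radius, -p1[1] / radius)
    n2 = (-p2[0] / radius, -p2[1] / radius)
    d = (1.0, 0.0)  # unit direction from p1 to p2 (the chord is horizontal)
    return in_friction_cone(n1, d, mu) and in_friction_cone(n2, (-d[0], -d[1]), mu)


def min_grip_force(mass_kg, mu, g=9.81):
    """Minimum normal force per jaw so that friction at two contacts
    (at most mu * F_n each) can carry the lid's weight."""
    return mass_kg * g / (2.0 * mu)


if __name__ == "__main__":
    # Hypothetical 10 cm lid with mu = 0.5 (friction cone half-angle ~26.6 deg).
    print(circle_grasp_is_force_closure(0.05, 0.0, 0.5))    # True: chord through center
    print(circle_grasp_is_force_closure(0.05, 0.035, 0.5))  # False: ~44.4 deg off-normal
    print(round(min_grip_force(0.2, 0.5), 2))               # 1.96 N per jaw for a 200 g lid
```

Because both verdicts hinge on μ, a simulator whose contact friction differs from the real jaw and lid materials can report a stable grasp that slips in reality.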
Interestingly, this type of grasp works perfectly fine in simulation. Simulators often lack the tiny physical imperfections of the real world and model friction limits and slip dynamics only approximately, which creates a false sense of success.
©️ RoboCasa
No wonder dexterous hands are considered one of the hardest problems in robotics.
Diving into VLA Models
Beyond the hardware challenges, it’s time for me to dive into the current state-of-the-art vision-language-action (VLA) models, which I haven’t thoroughly reviewed yet. I plan to read the papers and explore the codebases for these two: