HKUST PhD Chronicle, Week 22, From MuJoCo Stability to VLA

The following image was captured by an Intel® RealSense™ D435i while I was testing data recording with LeRobot. It was just a random shot, but I found the light, composition, and atmosphere surprisingly soothing.

Divergence

Recently, I have been trying to add more entropy to my dataset, which led me to use MuJoCo (see Overview - MuJoCo Documentation). This is my first time working with a physics simulation engine, and honestly, it is quite a challenge. I’ve run into an obstacle related to divergence and numerical integration: the simulation sometimes “explodes” or becomes unstable. I have neither solved the problem nor understood its root cause yet, so for now, I am simply filtering out the bad data.
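To make the "filter out the bad data" idea concrete, here is a minimal sketch of what such a filter could look like. It is not my actual pipeline and it does not call MuJoCo: `rollout_spring` is a hypothetical toy stand-in (explicit Euler on a unit-mass spring) that blows up when the timestep is too large, and `is_diverged` and its `max_abs` threshold are assumptions chosen for illustration.

```python
import math


def is_diverged(trajectory, max_abs=1e3):
    """Return True if any state value is NaN, infinite, or exceeds max_abs.

    max_abs is a hypothetical threshold; a real one depends on the robot
    and the units of the recorded state.
    """
    for state in trajectory:
        for value in state:
            if not math.isfinite(value) or abs(value) > max_abs:
                return True
    return False


def rollout_spring(dt, steps=200, stiffness=100.0):
    """Toy stand-in for a simulator: explicit Euler on a unit-mass spring."""
    x, v = 1.0, 0.0
    trajectory = []
    for _ in range(steps):
        a = -stiffness * x  # Hooke's law with mass = 1
        x, v = x + dt * v, v + dt * a  # explicit Euler update
        trajectory.append((x, v))
    return trajectory


# One episode per timestep; the large timestep is expected to explode.
episodes = {dt: rollout_spring(dt) for dt in (0.001, 0.5)}

# Keep only the episodes whose states stayed finite and bounded.
kept = [dt for dt, traj in episodes.items() if not is_diverged(traj)]
```

With the small timestep the state stays bounded over the rollout, while the large one grows by a constant factor every step and is dropped by the filter. This only hides the symptom, which is exactly where I am right now.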

LeRobot

I was asked to help my colleagues build a prototype using a Franka Research 3 to record a dataset for training a VLA model. The dataset follows the LeRobot schema. This is my first time working with VLAs, and it has been a lot of fun.

.
├── data
│   └── chunk-000
│       └── file-000.parquet
├── images
│   ├── observation.images.base
│   └── observation.images.wrist
├── meta
│   ├── episodes
│   │   └── chunk-000
│   │       └── file-000.parquet
│   ├── info.json
│   ├── stats.json
│   └── tasks.parquet
└── videos
    ├── observation.images.base
    │   └── chunk-000
    │       └── file-000.mp4
    └── observation.images.wrist
        └── chunk-000
            └── file-000.mp4
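The naming pattern in the tree above can be sketched as a small path helper. This is not part of the LeRobot API: `episode_paths` is a hypothetical function, and the three-digit zero padding and the two camera names are simply read off the listing.

```python
from pathlib import PurePosixPath


def episode_paths(chunk, file, cameras=("base", "wrist")):
    """Build the relative paths for one chunk/file pair, mirroring the tree.

    Hypothetical helper: the layout is inferred from the listing above,
    not taken from the LeRobot source.
    """
    stem = f"chunk-{chunk:03d}/file-{file:03d}"
    paths = {
        "data": PurePosixPath("data") / f"{stem}.parquet",
        "episodes_meta": PurePosixPath("meta/episodes") / f"{stem}.parquet",
    }
    for cam in cameras:
        key = f"observation.images.{cam}"
        paths[key] = PurePosixPath("videos") / key / f"{stem}.mp4"
    return paths


paths = episode_paths(0, 0)
for name, path in paths.items():
    print(name, "->", path)
```

A helper like this is handy for sanity-checking that a recording session produced every file the schema expects before moving on to training.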

Wrist(640x480)	Base(640x480)

I really appreciate the simplicity of the LeRobot API. Using it along with libfranka, the C++ library for Franka Robotics research robots, I was able to build this prototype in just 2 days. Although VLA is not my primary research focus, the idea of training and deploying one from scratch is exciting. There are so many open-source policies available now:

ACT
SmolVLA
π₀ (Pi0)
π₀-FAST (Pi0Fast)
π₀.₅ (Pi05)
NVIDIA GR00T N1.5
X-VLA
WALL-OSS

What a time to be alive!

ROS

To prepare for future features, like using robot perception to identify handles and open doors, I have decided to follow the DRY principle from The Pragmatic Programmer: From Journeyman to Master. I don’t want to reinvent the wheel.

Setting up ROS usually requires a significant time investment, but I know that doing it now will save me a ton of time later. I’ve chosen Isaac ROS because it comes with a great selection of prebuilt libraries for perception and acceleration. I hope to get the pipeline running quickly!