This week was a mix of wrestling with real robots and getting a small taste of new theory.
Franka GELLO
I received a single GELLO leader device this week and finally set up the Franka GELLO. I am grateful to my supervisor for approving the purchase.
The main reason I wanted this device is that my current data collection uses an action space based on operational space control (OSC), which commands the end effector rather than the individual joints.
If I want to use a pre-trained checkpoint or co-train with joint-space datasets such as DROID, my current setup makes this nearly impossible because the action spaces do not match. Therefore, having a reliable way to teleoperate directly in joint space is highly preferable.
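One way to see why an end-effector action space does not carry over to joint space: forward kinematics maps joint angles to a unique end-effector position, but the inverse map is not unique. The sketch below uses a planar 2-link arm with unit-length links as a toy stand-in for the 7-DoF Franka (an illustrative assumption, not my actual setup); for the same target it finds two different joint configurations. On a redundant arm like the Franka, with 7 joints for a 6-DoF pose, the ambiguity is even larger, so end-effector actions alone do not pin down joint-space actions.

```python
import math


def forward_kinematics(q1: float, q2: float, l1: float = 1.0, l2: float = 1.0) -> tuple[float, float]:
    """End-effector (x, y) of a toy planar 2-link arm with joint angles q1, q2 in radians."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def inverse_kinematics(x: float, y: float, l1: float = 1.0, l2: float = 1.0) -> list[tuple[float, float]]:
    """Return the joint configurations (q1, q2) that reach (x, y).

    Uses the law of cosines. A reachable target generally has two solutions,
    with the elbow bent one way or the other.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1.0 + 1e-12:
        return []  # target is out of reach
    c2 = max(-1.0, min(1.0, c2))  # guard against tiny rounding errors before acos
    solutions = []
    for q2 in (math.acos(c2), -math.acos(c2)):
        q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
        solutions.append((q1, q2))
    return solutions


if __name__ == "__main__":
    # One end-effector target, two different joint-space answers.
    for q in inverse_kinematics(1.0, 1.0):
        print(q, forward_kinematics(*q))
```

Both printed configurations land on (1, 1) up to floating-point error even though their joint angles differ, which is exactly the information an end-effector-only action throws away.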
Offline Goal-Conditioned Reinforcement Learning
For the project I am currently exploring, I have to learn the basics of a topic I have never touched before: Offline Goal-Conditioned Reinforcement Learning (GCRL).
This is a fairly new research area. I first heard about it from the group of reinforcement learning guru Sergey Levine. His student, Seohong Park, seems to be an expert in this field and has published several papers at top AI conferences such as ICLR and NeurIPS.
Until this week, my understanding of GCRL was quite limited. I only knew its basic problem formulation:
$$a \sim \pi(\cdot \mid s, g),$$
where $\pi$ is a stochastic policy that outputs an action $a$ conditioned on the state $s$ and the goal $g$. I plan to write a dedicated blog post after I dive deeper into Seohong Park’s 📄OGBench: Benchmarking Offline Goal-Conditioned RL.
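To make the formulation concrete, here is a minimal sketch of a goal-conditioned stochastic policy in a toy 1-D point environment. Everything in it is a hypothetical illustration rather than anything from OGBench: the linear-Gaussian policy (`GaussianGoalPolicy`), the dynamics $s' = s + a$, and the sparse reward (0 within a tolerance of the goal, -1 otherwise) are my own assumptions. The point is that the same state $s$ yields different action distributions for different goals $g$.

```python
import random
from dataclasses import dataclass


@dataclass
class GaussianGoalPolicy:
    """Hypothetical linear-Gaussian goal-conditioned policy pi(a | s, g) for a 1-D point."""
    gain: float = 0.5  # how strongly the mean action pulls the state toward the goal
    std: float = 0.1   # standard deviation of the action noise

    def sample(self, s: float, g: float, rng: random.Random) -> float:
        mean = self.gain * (g - s)        # the mean depends on both the state and the goal
        return rng.gauss(mean, self.std)  # stochastic: a ~ N(mean, std^2)


def reward(s: float, g: float, tol: float = 0.05) -> float:
    """Assumed sparse goal-reaching reward: 0 once within tol of the goal, -1 otherwise."""
    return 0.0 if abs(s - g) < tol else -1.0


def rollout(policy: GaussianGoalPolicy, s0: float, g: float,
            horizon: int = 50, seed: int = 0) -> tuple[float, float]:
    """Roll out the policy toward goal g; return the final state and undiscounted return."""
    rng = random.Random(seed)
    s, total = s0, 0.0
    for _ in range(horizon):
        a = policy.sample(s, g, rng)
        s = s + a  # toy dynamics: s' = s + a
        r = reward(s, g)
        total += r
        if r == 0.0:  # goal reached, end the episode
            break
    return s, total


if __name__ == "__main__":
    deterministic = GaussianGoalPolicy(gain=0.5, std=0.0)
    print(rollout(deterministic, s0=0.0, g=1.0))  # (0.96875, -4.0)
```

With `std=0.0` the policy is deterministic, so the rollout can be checked by hand: starting at 0 with goal 1, the distance to the goal halves every step, and the state enters the tolerance on the fifth step after four -1 rewards.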
Value Function
Earlier in my RL study journey, for example in How to Derive the Policy Gradient with Monte Carlo Sampling?, I intentionally skipped the concept of the value function. Well, it turns out that you cannot skip the hard stuff forever. With GCRL, I finally had to chew on this tough bone.
I found several books very useful for understanding value functions:
- Section 3.7 of Reinforcement Learning: An Introduction
- Chapter 7 of Algorithms for Decision Making
Again, I plan to turn these learnings into another blog post once I have fully digested them!
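A toy example that helped me: in the goal-conditioned setting, the value function simply takes the goal as an extra input, $V(s, g)$. Below is a minimal sketch of tabular value iteration on a hypothetical 1-D chain of states, assuming two deterministic actions (step left or right, clipped at the ends), a reward of -1 per step, and an absorbing goal with value 0. The chain, the reward, and the function name `goal_conditioned_value_iteration` are illustrative assumptions, not taken from the books above.

```python
def goal_conditioned_value_iteration(n_states: int, gamma: float = 1.0,
                                     iters: int = 100) -> dict[tuple[int, int], float]:
    """Compute V(s, g) for every state/goal pair on a toy chain 0..n_states-1."""
    V = {(s, g): 0.0 for s in range(n_states) for g in range(n_states)}
    for _ in range(iters):
        new_V = {}
        for g in range(n_states):
            for s in range(n_states):
                if s == g:
                    new_V[(s, g)] = 0.0  # already at the goal: absorbing, no further cost
                    continue
                # Moving left or right by one step; the ends of the chain clip the move.
                neighbors = (max(s - 1, 0), min(s + 1, n_states - 1))
                # Bellman optimality backup: -1 for this step plus the discounted best next value.
                new_V[(s, g)] = max(-1.0 + gamma * V[(s2, g)] for s2 in neighbors)
        V = new_V
    return V


if __name__ == "__main__":
    V = goal_conditioned_value_iteration(5)
    print(V[(0, 4)], V[(3, 1)], V[(2, 2)])  # -4.0 -2.0 0.0
```

With $\gamma = 1$ and this reward, $V(s, g)$ converges to minus the number of steps needed to reach $g$ from $s$, which gives one intuition for reading a goal-conditioned value function as a (negative) distance between states.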