HKUST PhD Chronicle, Week 18, Debugging PointNet++

This week, I have been struggling with poor performance when using PPR-Net++ (from the paper "PPR-Net++: Accurate 6-D Pose Estimation in Stacked Scenarios") to predict the 6D pose of 3D tetromino blocks.

To be honest, this is really frustrating. The results in the original paper were quite good, so there must be something wrong with my implementation. I came up with three hypotheses to investigate:

Backbone configuration: PPR-Net++ uses PointNet++ (from "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space") as its backbone. Am I using the correct parameters and settings?
Dataset quality: There might still be issues with my data. Specifically, is the entropy of my dataset high enough to train the model effectively?
Model compatibility: Perhaps PointNet++ is actually “overkill” for simple geometric shapes like these Tetris blocks, which could make training harder to converge.

These are my current theories. I plan to tackle each of these pain points next week.

Below are some brief notes on common terms I encountered while working with PointNet++.

What are $B,N,C$?

In code and research papers, you will often see tensor dimensions written as $B,N,C$ or $B,C,N$. Here $B$ is the batch size, $N$ is the number of points per cloud, and $C$ is the number of channels (features) per point; the two notations differ only in the order of the last two axes. It is important to check which layout a specific library expects, as some use $B,N,C$ while others use $B,C,N$.
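To make the two layouts concrete, here is a minimal sketch using NumPy. The toy sizes and the variable names `points_bnc` and `points_bcn` are my own choices for illustration, not taken from the PPR-Net++ or PointNet++ code.

```python
import numpy as np

# Toy batch (sizes are illustrative): B=2 clouds, N=4 points each,
# C=3 channels per point (e.g. x, y, z).
B, N, C = 2, 4, 3
points_bnc = np.arange(B * N * C, dtype=np.float32).reshape(B, N, C)

# Swap the last two axes to go from (B, N, C) to (B, C, N).
# Note: transpose, not reshape. Reshape would give the right shape
# but scramble which value belongs to which point and channel.
points_bcn = points_bnc.transpose(0, 2, 1)

print(points_bnc.shape)  # (2, 4, 3)
print(points_bcn.shape)  # (2, 3, 4)

# Channel c of point n in cloud b is the same value in both layouts.
b, n, c = 1, 3, 2
assert points_bnc[b, n, c] == points_bcn[b, c, n]
```

With PyTorch tensors the same swap can be written as `x.transpose(1, 2)` or `x.permute(0, 2, 1)`. A quick shape print like the one above, placed right before the first layer, is a cheap way to rule out a layout mismatch.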

What is farthest point sampling?

I have written a dedicated reflection on this topic. See: What is Farthest Point Sampling (FPS)?.

What is ball query and multi-scale grouping?

I have also documented this concept separately. See: What is Ball Query and Multi-Scale Grouping?.
