My First HPC Job
The effort has finally paid off!
I have spent considerable time familiarizing myself with the HPC cluster, and this week that investment proved its worth. I requested an AMD 128-core CPU node to generate a large synthetic dataset. On my workstation, where nproc reports only 24 cores, the generation process was painfully slow, consuming hours of compute time and blocking other work.
With the node’s 128-way parallelism, the same task finished in just 3 hours. More importantly, I could submit the job and forget about it, without worrying that other tasks on my local machine would interfere with it or crash it partway through.
This experience also deepened my appreciation for the elegance of Python, and specifically for the ease of use that modern package managers provide. Simple commands like:
uv sync
bun install
cargo install
handle dependencies seamlessly. Even setting up a CUDA environment is straightforward:
uv add nvidia-cuda-runtime-cu12
Once CUDA is configured, installing PyTorch becomes much simpler.
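As a quick sanity check after the install, a minimal sketch like the one below (assuming torch has already been added to the project, e.g. with uv add torch) confirms that PyTorch can actually see the CUDA runtime:

# check_cuda.py -- quick sanity check; assumes torch is already installed in the environment
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    # Report the first visible GPU so a driver/runtime mismatch is caught early
    print(f"Device 0: {torch.cuda.get_device_name(0)}")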
Here’s the Slurm job script I used for dataset generation:
#!/bin/bash
#SBATCH --job-name=data_gen
#SBATCH --nodes=1 # node count
#SBATCH --ntasks-per-node=1 # number of tasks per node (adjust when using MPI)
#SBATCH --cpus-per-task=128 # cpu-cores per task (>1 for multi-threaded tasks; adjust as needed)
#SBATCH --time=01:00:00 # total run time limit (HH:MM:SS)
#SBATCH --partition=amd # partition (queue: intel/amd/gpu-a30/gpu-l20) where the job runs
#SBATCH --account=???
#SBATCH --output=gen_cpu_%j.out
#SBATCH --error=gen_cpu_%j.err
#SBATCH --mail-user=???@connect.ust.hk
#SBATCH --mail-type=BEGIN,END,FAIL,REQUEUE
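# Use the core count Slurm allocated to this task; fall back to 1 if the
# variable is unset (e.g. when the script is run outside of Slurm)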
CPUS=$SLURM_CPUS_PER_TASK
if [ -z "$CPUS" ]; then
CPUS=1
fi
DATASET_NAME="teris"
echo "Generating dataset..."
uv run 1_pybullet_create_n_collect.py \
--start_cycle 1 \
--end_cycle 100 \
--mode direct \
--renderer tiny \
--workers $CPUS \
--dataset_name $DATASET_NAME \
--model_name teris \
--dropping packing \
--max_drop 80 \
--object_types I O J L S Z T \
> generation.log 2>&1
echo "Done!"Unit Test
I always remember my mentor David Veinberg’s advice: “Slow is smooth, and smooth is fast.” Last week, eager to generate data and train a model, I rushed ahead, only to end up with a model that performed poorly during inference. Upon investigation, I discovered the root cause: the synthetic dataset generation code I had adopted produced low-quality results.
Instead of pushing forward, I made myself stop. I took a step back to carefully consider the expected behavior and desired outcomes. This pause allowed me to design and write 6 comprehensive test cases that fully cover the dataset generation logic, and the process of designing them gave me a much deeper understanding of the system’s intended behavior. Now, no matter how much I refactor or improve the code, I can make changes confidently, knowing the tests will catch any regressions.
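To give a flavor of what these look like, here is a minimal sketch of one such test; the module dataset_generation, the function generate_packing_cycle, and its parameters are hypothetical placeholders rather than the actual project code:

# test_dataset_generation.py -- illustrative sketch only; generate_packing_cycle
# and its arguments are hypothetical stand-ins for the real generation code
import pytest
from dataset_generation import generate_packing_cycle  # hypothetical module

@pytest.mark.parametrize("object_type", ["I", "O", "J", "L", "S", "Z", "T"])
def test_samples_record_requested_object_type(object_type):
    # Every generated sample should carry the tetromino type it was asked for
    samples = generate_packing_cycle(object_types=[object_type], max_drop=5)
    assert all(sample["object_type"] == object_type for sample in samples)

def test_drop_count_respects_max_drop():
    # The generator must never emit more drops than the configured cap
    samples = generate_packing_cycle(object_types=["I", "O"], max_drop=10)
    assert len(samples) <= 10

Tests like these are cheap to run locally before submitting a full generation job to the cluster.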
As a small tip, I found a very useful plugin for speeding up Python unit testing: pytest-xdist. With a single flag, it parallelizes tests across all available CPU cores:
pytest -n auto
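With -n auto, pytest-xdist spawns roughly one worker process per available CPU core, so on the 128-core node the whole suite fans out automatically; in a uv-managed project the same thing can be run as uv run pytest -n auto, assuming pytest-xdist is added as a dev dependency.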