Understanding PD Controllers in Robotics and Robot Learning

If you have ever used physics simulators like Isaac Sim, ManiSkill, or MuJoCo, you have likely encountered the term PD controller.

If you’ve worked with hardware libraries like libfranka (the C++ library for Franka Robotics research robots), you might have also noticed parameters like “stiffness” and “damping”.

In this post, I want to share a practical, intuitive understanding of this topic, bridging the gap between control theory and robot learning.

Remark

Unless otherwise stated, the mathematical notation in this post follows Russ Tedrake’s conventions in Robotic Manipulation.

The Math Expression

Let’s start by looking at the math. A standard PID controller is expressed as:

$$\tau = \underbrace{k_p (q^d - q)}_{P} + \underbrace{k_i \int (q^d - q)\, dt}_{I} + \underbrace{k_d (\dot{q}^d - \dot{q})}_{D}$$

In practical robotics, we often drop the integral ( $I$ ) term. Accumulated past error can lead to dangerous, unpredictable movements in contact-rich tasks (a phenomenon known as integral windup). This simplifies our equation to a PD controller:

$$\tau = \underbrace{k_p (q^d - q)}_{P} + \underbrace{k_d (\dot{q}^d - \dot{q})}_{D}$$

where

$\tau$ : the command sent to the actuator (pulse-width modulation), simplified here as torque
$q$ : the current actual joint position
$q^d$ : the desired joint position (target)
$(q^d - q)$ : the position error
$\dot{q}$ : the current actual joint velocity
$\dot{q}^d$ : the desired joint velocity (target)
$(\dot{q}^d - \dot{q})$ : the velocity error
$k_p$ : the proportional gain (think of this as stiffness or a “spring”).
$k_d$ : the derivative gain (think of this as damping or “friction”).
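As a minimal sketch, the PD law above fits in a few lines of Python. The function name `pd_torque` and the default gains are chosen here for illustration only; they are not part of any library mentioned in this post.

```python
def pd_torque(q, q_dot, q_des, q_dot_des=0.0, kp=100.0, kd=10.0):
    """PD law: tau = kp * (q_des - q) + kd * (q_dot_des - q_dot)."""
    position_error = q_des - q          # the "spring" stretches with this
    velocity_error = q_dot_des - q_dot  # the "damper" reacts to this
    return kp * position_error + kd * velocity_error


# Joint at rest, 0.5 rad short of its target: only the spring term acts.
print(pd_torque(q=0.5, q_dot=0.0, q_des=1.0))
```

Keeping the two error terms as separate variables makes it easy to log them individually when tuning gains.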

Remark

In a multi-joint robot, $k_p$ and $k_d$ are typically diagonal matrices (one gain per joint), but treating them as scalar values works well for building intuition!
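With diagonal gain matrices, the matrix form reduces to applying the scalar law joint by joint. The sketch below assumes a 7-joint arm; the gain values are hypothetical and are not the gains of any real robot.

```python
def pd_torque_per_joint(q, q_dot, q_des, q_dot_des, kp, kd):
    """Apply the scalar PD law independently to every joint."""
    return [
        kp_i * (q_des_i - q_i) + kd_i * (q_dot_des_i - q_dot_i)
        for q_i, q_dot_i, q_des_i, q_dot_des_i, kp_i, kd_i
        in zip(q, q_dot, q_des, q_dot_des, kp, kd)
    ]


# Hypothetical diagonal entries: stiffer proximal joints, softer wrist joints.
kp = [100.0, 100.0, 100.0, 100.0, 50.0, 50.0, 50.0]
kd = [10.0, 10.0, 10.0, 10.0, 5.0, 5.0, 5.0]

q = [0.0] * 7          # current joint positions
q_dot = [0.0] * 7      # current joint velocities
q_des = [0.5] * 7      # desired joint positions
q_dot_des = [0.0] * 7  # desired joint velocities

tau = pd_torque_per_joint(q, q_dot, q_des, q_dot_des, kp, kd)
print(tau)  # [50.0, 50.0, 50.0, 50.0, 25.0, 25.0, 25.0]
```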

Remark

In control theory, a joint driven by a PD controller behaves like the classic mass-spring-damper model.
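To make that link concrete, here is a short derivation. It assumes a single joint that behaves like a pure inertia $m$ (a symbol introduced here for illustration), no gravity or friction, a constant target $q^d$ , and $\dot{q}^d = 0$ . Under those assumptions the PD law gives

$$
\begin{aligned}
m\ddot{q} &= k_p (q^d - q) + k_d (0 - \dot{q}) \\
m\ddot{e} + k_d \dot{e} + k_p e &= 0, \qquad e = q - q^d
\end{aligned}
$$

This is the equation of a mass $m$ on a spring of stiffness $k_p$ with a damper of coefficient $k_d$ . Its natural frequency and damping ratio are

$$
\omega_n = \sqrt{\frac{k_p}{m}}, \qquad \zeta = \frac{k_d}{2\sqrt{k_p\, m}}
$$

Under this simplified model, $\zeta < 1$ is underdamped (oscillates), $\zeta = 1$ is critically damped, and $\zeta > 1$ is overdamped (no oscillation, but slower).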

The Franka Example

Let’s step away from the equations and look at an example I made using a Franka Research 3.

When I use JointPositionControl in libfranka to move a robot, there are two levels of control happening simultaneously.

📌The High Level (trajectory generation)

First, I use a trajectory generator (like Ruckig, a motion generation library for robots and machines) to create a smooth path that respects the limits on velocity, acceleration, and jerk. The trajectory is sampled at the Franka’s 1 kHz control frequency: every millisecond ( $\Delta t=1\text{ms}$ ), we generate 7 new desired joint position values:

$t_1=1\text{ms}$ , $q^d(t_1)=\{\dots\}$
$t_2=2\text{ms}$ , $q^d(t_2)=\{\dots\}$
$t_3=3\text{ms}$ , $q^d(t_3)=\{\dots\}$

// ⭐️control runs every 1ms
robot.control([&](...) -> franka::JointPositions {
  //...
 
  franka::JointPositions pos = {/* 7 target joint values */};
  return pos;
});
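For readers without a trajectory generator at hand, the sampling idea can be sketched in Python. This is not Ruckig and not the generator used in this post: it is a plain quintic blend that starts and ends at rest and does not enforce any velocity, acceleration, or jerk limits. The function name `quintic_waypoints` and the joint values are made up for illustration.

```python
def quintic_waypoints(q_start, q_goal, duration_s, dt=0.001):
    """Sample a rest-to-rest quintic blend between two joint configurations.

    Returns one list of joint targets per control tick (dt = 1 ms by default).
    """
    n_steps = round(duration_s / dt)
    waypoints = []
    for k in range(1, n_steps + 1):
        s = k / n_steps  # normalized time in (0, 1]
        # Quintic time scaling: zero velocity and acceleration at both ends.
        blend = 10 * s**3 - 15 * s**4 + 6 * s**5
        waypoints.append([a + blend * (b - a) for a, b in zip(q_start, q_goal)])
    return waypoints


q_start = [0.0] * 7
q_goal = [0.1, -0.2, 0.0, 0.3, 0.0, 0.2, -0.1]  # hypothetical goal (rad)
targets = quintic_waypoints(q_start, q_goal, duration_s=2.0)
print(len(targets))  # 2000 targets, one per millisecond
```

In a real control callback, each tick would return the next entry of `targets` as the desired joint positions.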

Remark

If you want to know more about the joint position notation, see Understanding Franka Robot Control Parameters.

Remark

If you want to know more about how I control the Franka, see frankz.

📌The Low-Level (Hardware Control)

At the hardware level, the PD controller takes over. The pos I return in the C++ code becomes the desired joint position $q^d$ .

The robot’s internal controller calculates the required torque:

$$\tau = k_p (q^d - q) + k_d(\dot{q}^d - \dot{q})$$

libfranka hides this high-frequency loop from us. We feed it targets (desired joint positions), and the internal PD controller tracks them.

Example: One Joint, One Step

Let’s walk through a simple one-joint scenario to see how $k_p$ and $k_d$ interact.

📌0. Setup

Current state: $q = 0.5\text{ rad}$ , $\dot{q} = 0.0\text{ rad/s}$
Target state: $q^d = 1.0\text{ rad}$ , $\dot{q}^d = 0.0\text{ rad/s}$
Gains: $k_p = 100$ , $k_d = 10$

📌1. Compute Initial Torque

\begin{align} \tau &= k_p (q^d - q) + k_d(\dot{q}^d - \dot{q}) \\ &= 100(1 - 0.5) + 10(0.0-0.0)\\ &=50 \end{align}

The “spring” ( $k_p$ ) pulls the joint forward.

📌2. The World Dynamics

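The joint accelerates. As $q$ increases, the position error $(q^d - q)$ shrinks, reducing the spring torque. At the same time the joint picks up speed, so the D term $k_d(\dot{q}^d - \dot{q})$ turns negative and opposes the motion. The joint eventually decelerates and settles at the target.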
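The step above can be played forward in a tiny simulation. This is a sketch under strong assumptions that are not in the original example: the joint is a pure inertia of 1.0 (no gravity, no friction), and the state is integrated with semi-implicit Euler at 1 kHz. The function name `simulate_pd` is made up for illustration.

```python
def simulate_pd(kp, kd, q0=0.5, q_des=1.0, inertia=1.0, dt=0.001, steps=3000):
    """Integrate inertia * q_ddot = tau with semi-implicit Euler."""
    q, q_dot = q0, 0.0
    history = []
    for _ in range(steps):
        tau = kp * (q_des - q) + kd * (0.0 - q_dot)  # PD law with q_dot_des = 0
        q_dot += (tau / inertia) * dt                # update velocity first
        q += q_dot * dt                              # then position
        history.append((q, q_dot, tau))
    return history


history = simulate_pd(kp=100.0, kd=10.0)
peak_q = max(q for q, _, _ in history)
final_q = history[-1][0]
print(f"first torque: {history[0][2]:.1f}")
print(f"peak position: {peak_q:.3f}, final position: {final_q:.3f}")
```

With the assumed unit inertia, the joint overshoots the target slightly before settling; a heavier or lighter joint would respond differently with the same gains.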

📌3. Braking

Imagine the robot is almost at the target ( $q = 0.95$ ), but it’s moving way too fast ( $\dot{q} = 8.0\text{ rad/s}$ ). What happens?

\begin{align} \tau &= k_p (q^d - q) + k_d(\dot{q}^d - \dot{q}) \\ &= 100(1 - 0.95) + 10(0.0-8.0)\\ &=-75 \end{align}

The torque is heavily negative! This means the robot is braking. The “D” term ( $k_d$ ) acts as a damper to slow the robot down and prevent it from overshooting the target.

Why is $\dot{q}^d = 0$ ?

In pure joint position control, we often treat the desired velocity at the target as zero so that the robot comes to rest there.
Analogy: in libfranka you command either franka::JointPositions or franka::JointVelocities , not both at once.

This example gives us an intuition for $k_p$ and $k_d$ :

$k_p$ (Spring): Pulls the robot toward the target.
$k_d$ (Damper): Slows the robot down to prevent overshooting.

Exercise 1

Assume we have

$k_p = 100$
$k_d = 10$
$q^d=1.0$
$\dot{q}^d=0.0$

Calculate the torque for the following scenarios:

| | $q$ | $\dot{q}$ | $\tau$ |
|---|---|---|---|
| (a) | 1.0 | 0.0 | ? |
| (b) | 1.2 | 2.0 | ? |
| (c) | 0.0 | -5.0 | ? |

🗣Answer (a)

\begin{align} \tau &= k_p (q^d - q) + k_d(\dot{q}^d - \dot{q}) \\ &= 100(1 - 1) + 10(0.0-0.0)\\ &=0 \end{align}

The robot is at the target and not moving. No actuation needed.

🗣Answer (b)

\begin{align} \tau &= k_p (q^d - q) + k_d(\dot{q}^d - \dot{q}) \\ &= 100(1 - 1.2) + 10(0.0-2.0)\\ &=-40 \end{align}

The robot has overshot the target and is still moving away from it. The controller aggressively pulls it back and brakes.

🗣Answer (c)

\begin{align} \tau &= k_p (q^d - q) + k_d(\dot{q}^d - \dot{q}) \\ &= 100(1 - 0) + 10(0.0-(-5.0))\\ &=150 \end{align}

The robot is far from the target and currently moving in the wrong direction (negative velocity). Both the spring and the damper work together to aggressively push it forward.
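The three answers can be double-checked with a few lines of Python. The helper `pd_torque` is defined here only for this check, with the gains and targets from the exercise as defaults.

```python
def pd_torque(q, q_dot, q_des=1.0, q_dot_des=0.0, kp=100.0, kd=10.0):
    """PD law with the exercise's gains and targets as defaults."""
    return kp * (q_des - q) + kd * (q_dot_des - q_dot)


# (q, q_dot) for each scenario in the exercise table.
scenarios = {"a": (1.0, 0.0), "b": (1.2, 2.0), "c": (0.0, -5.0)}
results = {label: pd_torque(q, q_dot) for label, (q, q_dot) in scenarios.items()}

for label, tau in results.items():
    print(f"({label}) tau = {tau:.1f}")
```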

PD Control in Robot Learning

When we transition to robot learning, things get interesting. A neural network policy ( $\pi_\theta$ ) rarely outputs raw torques. Instead, it outputs commands in a specific action space.

Take ManiSkill, for example. It offers several controllers:

pd_joint_delta_pos
pd_ee_delta_pos
pd_joint_vel
…

So how do these action spaces work?

$q^d = q+a$ (pd_joint_delta_pos): The policy outputs a small action $a = \Delta q$ . The target becomes $q^d = q + \Delta q$ . The neural network directly adjusts individual joint angles.
$\displaystyle f^{-1}(T_{\text{base}}^{ee}+a)=q^d$ (pd_ee_delta_pose): The policy outputs a desired change in the end-effector (EE) pose, and the resulting target pose is an element of $SE(3)$ . Here “ $+$ ” is shorthand for applying that pose change. An inverse kinematics solver ( $f^{-1}$ ) converts the desired EE pose into target joint positions ( $q^d$ ).
$\dot{q}^d = a$ (pd_joint_vel): The policy directly outputs the desired joint velocities.
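To see how an action turns into a PD target, here is a toy sketch on a planar 2-link arm with unit link lengths. It is not ManiSkill's implementation: the function names are made up, the EE action is a 2-D position delta only (no rotation), and the IK is the textbook closed-form solution that always picks the branch with a non-negative second joint angle.

```python
import math


def joint_delta_target(q, action):
    """Joint-space delta: q_des = q + a."""
    return [q_i + a_i for q_i, a_i in zip(q, action)]


def forward_kinematics(q, l1=1.0, l2=1.0):
    """End-effector (x, y) of a planar 2-link arm."""
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return x, y


def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """Closed-form IK; unreachable targets are clamped to full extension."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))
    q2 = math.acos(c2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return [q1, q2]


def ee_delta_target(q, action):
    """EE-space delta: q_des = IK(FK(q) + a)."""
    x, y = forward_kinematics(q)
    return inverse_kinematics(x + action[0], y + action[1])


q = [0.3, 0.8]
print(joint_delta_target(q, [0.05, -0.05]))  # nudge each joint directly
print(ee_delta_target(q, [-0.1, 0.0]))       # "move the gripper 10 cm left"
```

In both cases the result is a joint target $q^d$ that the PD controller then tracks; the only difference is who does the kinematics.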

Tip

The choice of controller defines the action space your policy sees.

| Controller | Action dim | Policy outputs | Easier for Robot Learning? |
|---|---|---|---|
| `pd_joint_delta_pos` | 7 | small joint deltas | ok |
| `pd_ee_delta_pose` | 6 | position + rotation delta | easy |
| `pd_joint_vel` | 7 | joint velocities | hard |

EE-space controllers are typically much more sample-efficient. The policy only needs to learn “move gripper left 10cm,” rather than learning the complex kinematics required to coordinate 7 individual joints to achieve that same leftward motion.

Exercise 2

💬Question(1) You’re training an RL policy for a peg-in-hole task. The peg needs to safely comply with contact forces when it hits the edge of the hole. Which gains do you want?

(A) High $k_p$ Low $k_d$
(B) Low $k_p$ Low $k_d$
(C) Low $k_p$ High $k_d$

Remark

The “peg-in-hole” is the problem of inserting a (round) peg into a (round) hole.

©️Samuel Hunt Drake, “Using compliance in lieu of sensory feedback for automatic assembly.”, PhD thesis, Massachusetts Institute of Technology, 1978.

🗣Answer (1)

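The answer is (B). To see why, compare two cases:

high $k_p$ 📈: iron grip, stiff spring; the robot fights back when the peg hits the hole edge
low $k_p$ 📉: gentle, soft spring; the robot yields and the peg slides in when it hits the hole edge

With such a soft spring, a small $k_d$ is enough to keep the motion from oscillating, so the damping can stay low too.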

Tip

A common solution for this kind of task is impedance control.

💬Question(2) Your simulated Franka robot arm is oscillating wildly around its target position. What should you change?

(A) Increase $k_p$
(B) Increase $k_d$
(C) Decrease $k_d$

🗣Answer (2)

The answer is (B): increase $k_d$ . I have run into this issue in practice: the end-effector kept oscillating around a position and looked like it was vibrating. To fix it, we need more damping.

Here’s the intuition.

| Spring | Damper |
|---|---|
| overshoots, bounces back, overshoots again = oscillation | still a spring, but the energy is absorbed; approaches smoothly |

Tip

Key takeaway: Oscillating wildly? → You need MORE damping → INCREASE $k_d$ .
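The takeaway can be checked numerically. The sketch below assumes a single joint modeled as a unit inertia with no gravity or friction, integrated at 1 kHz; the function name `peak_overshoot` and the gain values are chosen for illustration and are not tuned for any real robot.

```python
def peak_overshoot(kp, kd, q_des=1.0, inertia=1.0, dt=0.001, steps=5000):
    """Drive the joint from rest at q = 0 to q_des; return how far it overshoots."""
    q, q_dot, peak = 0.0, 0.0, 0.0
    for _ in range(steps):
        tau = kp * (q_des - q) - kd * q_dot  # PD law with q_dot_des = 0
        q_dot += (tau / inertia) * dt
        q += q_dot * dt
        peak = max(peak, q)
    return max(0.0, peak - q_des)


low_damping = peak_overshoot(kp=100.0, kd=2.0)
high_damping = peak_overshoot(kp=100.0, kd=20.0)
print(f"kd = 2:  overshoot = {low_damping:.3f} rad")
print(f"kd = 20: overshoot = {high_damping:.3f} rad")
```

Keeping $k_p$ fixed and raising only $k_d$ removes most of the overshoot in this toy model, which is the same knob the answer above recommends.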

💬Question(3) You switch from pd_ee_delta_pos (3-dim) to pd_joint_delta_pos (7-dim) for the same pick-and-place task. Training now requires 5x more samples. Why?

🗣Answer (3)

The intuition is that a policy using pd_joint_delta_pos has much more to learn.

| | `pd_ee_delta_pos` | `pd_joint_delta_pos` |
|---|---|---|
| dimension | 3 | 7 |
| action | move gripper left | move gripper left by moving joints 1-7… |
| learning objective | what to do | what to do + how to do it (the IK solver's job) |

It’s not just “more dimensions”. It’s that the policy must now learn the kinematics of the robot, which the IK solver was doing for free.

💬Question(4)

Imagine a 1D robot joint that should go to position $q^d=1.0$ , starting at $q=0.0$ . We have the following observations:

behavior A: overshoots to $q=1.5$ , back to $q=0.7$ , then $q=1.2$ , slowly settles to $q=1.0$
behavior B: … reaches $q=1.0$ and stays perfectly, no overshoot
behavior C: … slowly to $q=0.3$ , … $q=0.5$ , … $q=0.7$ , takes forever to reach $q=1.0$

We have these gain settings:

(1) High $k_p$ Low $k_d$
(2) Low $k_p$ Low $k_d$
(3) Moderate $k_p$ High $k_d$

Which behavior goes with which gains?

🗣Answer(4)

Each of these behaviors has a name in control theory:

A: 1, High $k_p$ Low $k_d$ , underdamped: fast but risky, oscillation can damage your robot⚠️
B: 3, Moderate $k_p$ High $k_d$ , close to critically damped: the sweet spot
C: 2, Low $k_p$ Low $k_d$ , overdamped: safe but slow, typically frustrates robot learning. With a weak spring, even a small $k_d$ can be enough to overdamp the response.
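Which label applies depends on the ratio between the gains rather than on either gain alone. A minimal sketch, assuming the joint behaves like a single inertia (set to 1.0 here) so that the damping ratio is $\zeta = k_d / (2\sqrt{k_p \cdot \text{inertia}})$ ; the function names, tolerance, and gain values are illustrative.

```python
import math


def damping_ratio(kp, kd, inertia=1.0):
    """zeta = kd / (2 * sqrt(kp * inertia)) for a mass-spring-damper model."""
    return kd / (2.0 * math.sqrt(kp * inertia))


def classify(kp, kd, inertia=1.0, tol=0.05):
    """Label the response; tol widens the 'critically damped' band around 1."""
    zeta = damping_ratio(kp, kd, inertia)
    if zeta < 1.0 - tol:
        return "underdamped"
    if zeta > 1.0 + tol:
        return "overdamped"
    return "critically damped"


# Hypothetical gain settings (kp, kd).
for kp, kd in [(400.0, 4.0), (100.0, 20.0), (1.0, 4.0)]:
    print(kp, kd, round(damping_ratio(kp, kd), 2), classify(kp, kd))
```

The last setting has low gains on both axes and still comes out overdamped, because the spring is so weak relative to the damper.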

💬Question(5) You’re training a policy with pd_joint_delta_pos. During evaluation, the robot arm moves way too slowly even though the policy outputs large actions. Which is more likely: $k_p$ is too high, or $k_p$ is too low?

🗣Answer(5): $k_p$ is too low. The spring is not strong enough to pull the joint to the target.

Summary

PID controller = proportional + integral + derivative feedback loop. Robotics usually uses PD (drop the $I$ ).
$\displaystyle \tau= k_p (q^d - q) + k_d(\dot{q}^d - \dot{q})$ : $k_p$ is a spring (pulls toward the target) and $k_d$ is a damper (prevents overshoot).
tuning: high $k_p$ means stiff and fast but prone to oscillation; high $k_d$ means smooth but slow. The sweet spot is in between.
robot learning: the policy $\pi_{\theta}$ outputs actions. The choice of controller defines the action space!
delta_pos vs target_delta_pos: the former computes each target from the current state (forgiving), while the latter accumulates on the previous target (precise but risky).
why no $I$ term? It causes windup in contact-rich tasks.
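The delta_pos vs target_delta_pos point is easiest to see when the joint cannot move. The sketch below illustrates the idea only and is not ManiSkill's code; the function name `rollout_targets` and the mode strings are made up for this example.

```python
def rollout_targets(actions, measured_q, mode):
    """Return the PD targets produced by a sequence of delta actions.

    mode="delta":        target = measured position + action
    mode="target_delta": target = previous target + action
    """
    targets = []
    prev_target = measured_q[0]
    for action, q in zip(actions, measured_q):
        base = q if mode == "delta" else prev_target
        prev_target = base + action
        targets.append(prev_target)
    return targets


# The joint is blocked at 0.0 rad (say, by contact) while the policy
# keeps asking for +0.125 rad per step.
actions = [0.125] * 4
measured = [0.0] * 4

print(rollout_targets(actions, measured, "delta"))         # [0.125, 0.125, 0.125, 0.125]
print(rollout_targets(actions, measured, "target_delta"))  # [0.125, 0.25, 0.375, 0.5]
```

In the accumulating mode the target keeps running away from the blocked joint, so the position error (and with it the commanded torque) keeps growing; in the resetting mode it stays bounded.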
