Dog_torque: Learning Sim2Real Walking for the Unitree Go1

by aihoque in Circuits > Robots



We are creating a control policy for the Unitree Go1 in simulation that transfers to a real Unitree Go1. We train the robot in a MuJoCo environment, verify that the learned policy produces stable walking, and then test the policy on the real robot. We control the robot by sending torque commands directly to its joints, and our policy takes in a state defined as follows:


  1. Base linear velocity: the linear velocity of the robot's base in 3D space, v = [vx, vy, vz]
  2. Base angular velocity: the angular velocity of the base (roll, pitch, and yaw rates), ω = [ωx, ωy, ωz]
  3. Projected gravity vector: a 3D vector giving the direction of gravity in the robot's local (body) frame, typically derived by rotating the world gravity vector through the inverse of the robot's orientation quaternion
  4. Desired velocity command: the commanded linear velocity (e.g., from a user or planner), v_d = [vx_d, vy_d, vz_d]
  5. Joint positions (dim=12)
  6. Joint velocities (dim=12)
  7. Last action applied (dim=12)

Altogether this gives a 48-dimensional state space and a 12-dimensional action space (one torque command per joint).
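To make the 48-dimensional layout concrete, here is a minimal sketch of how the observation could be assembled, assuming the components are available as NumPy arrays (the function and argument names are illustrative, not the repo's actual ones):

import numpy as np

def build_observation(base_lin_vel, base_ang_vel, projected_gravity,
                      velocity_command, joint_pos, joint_vel, last_action):
    # Concatenate the state components: 3 + 3 + 3 + 3 + 12 + 12 + 12 = 48.
    obs = np.concatenate([
        base_lin_vel,        # (3,) base linear velocity in the body frame
        base_ang_vel,        # (3,) base angular velocity (roll/pitch/yaw rates)
        projected_gravity,   # (3,) gravity direction in the body frame
        velocity_command,    # (3,) desired linear velocity
        joint_pos,           # (12,) joint positions
        joint_vel,           # (12,) joint velocities
        last_action,         # (12,) previous torque action
    ]).astype(np.float32)
    assert obs.shape == (48,)
    return obs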

Supplies

Unitree Go1 Robot

A Linux machine for training reinforcement learning models (I used Ubuntu 22.04)

My repository: https://github.com/aihoque2/dog_torque

Set Up the Code

Clone the repository and install Python dependencies:

git clone https://github.com/aihoque2/dog_torque
cd dog_torque
pip install -r requirements.txt


Train Model With Cost Penalty

Edit your reward function, `_calc_reward()` in train_agent.py, so that it returns:

reward = rewards - 0.3 * costs
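
For context, here is a minimal sketch of what such a `_calc_reward()` could look like, assuming the environment gathers per-term values and weights them with the `reward_weights` and `cost_weights` dictionaries shown below (the `_compute_reward_terms`/`_compute_cost_terms` helpers are hypothetical, not the repo's exact functions):

def _calc_reward(self, action):
    # Weighted sum of the positive reward terms (hypothetical helper).
    reward_terms = self._compute_reward_terms(action)
    rewards = sum(self.reward_weights[name] * value for name, value in reward_terms.items())

    # Weighted sum of the penalty terms (hypothetical helper).
    cost_terms = self._compute_cost_terms(action)
    costs = sum(self.cost_weights[name] * value for name, value in cost_terms.items())

    # Cost-penalty variant: subtract a scaled total cost from the total reward.
    return rewards - 0.3 * costs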

I have attached a `go1_torque_env.py` file that already has these settings configured.

Then modify `healthy` in `reward_weights` to 0.0 and `terminal` in `cost_weights` to 1000.0, as shown below:


# reward and cost weights
self.reward_weights = {
    "linear_vel_tracking": 2.0,
    "angular_vel_tracking": 1.0,
    "healthy": 0.0,  # SET ME TO ZERO
    "feet_airtime": 1.0
}

self.cost_weights = {
    "torque": 0.0002,
    "vertical_vel": 2.0,
    "xy_angular_vel": 0.05,
    "action_rate": 0.01,
    "joint_limit": 10.0,
    "joint_velocity": 0.01,
    "joint_acceleration": 2.5e-7,
    "orientation": 1.0,
    "collision": 1.0,
    "default_joint_position": 0.1,
    "terminal": 1000.0  # SET ME TO 1000.0
}

Your training should then run like the gif below.

Train Model With Health Bonus

dawg_success.gif

Now edit your reward function, `_calc_reward()` in train_agent.py, so that it returns:

reward = max(0.0, rewards-costs)


Then modify `healthy` in `reward_weights` to 2.0 and `terminal` in `cost_weights` to 0.0, as shown below:


# reward and cost weights
self.reward_weights = {
    "linear_vel_tracking": 2.0,
    "angular_vel_tracking": 1.0,
    "healthy": 2.0,  # SET ME TO 2.0
    "feet_airtime": 1.0
}

self.cost_weights = {
    "torque": 0.0002,
    "vertical_vel": 2.0,
    "xy_angular_vel": 0.05,
    "action_rate": 0.01,
    "joint_limit": 10.0,
    "joint_velocity": 0.01,
    "joint_acceleration": 2.5e-7,
    "orientation": 1.0,
    "collision": 1.0,
    "default_joint_position": 0.1,
    "terminal": 0.0  # SET ME TO 0.0
}

Then, in train_agent.py, set

load_pretrained_policy = True

Hopefully, your policy ends up with the result in the gif attached below. I created that gif by taking someone else's trained model and running test_agent.py. Run test_agent.py yourself to see how well your own policy performs. If you get unstable results, set

load_pretrained_policy = False

and repeat step 2.

(TODO) Convert IMU to Projected Gravity Vector

We are now working on deploying our policy on a real robot. Create a program that reads data from the unitree_legged_sdk IMU and computes the projected gravity vector from the IMU's quaternion orientation (or its acceleration data). I have not gotten this far yet, but pseudocode is below:

function getProjectedGravityVector(quaternion q):
    // Step 1: Normalize the quaternion (if not already normalized)
    q = normalize(q)

    // Step 2: Convert the quaternion to the rotation matrix R of the body frame in world-frame coordinates
    R = quaternionToRotationMatrix(q)

    // Step 3: Define the unit gravity direction in the world frame (Z-up convention)
    g_world = [0, 0, -1]

    // Step 4: Rotate gravity into the robot's base frame
    g_body = transpose(R) * g_world

    // Step 5: Return the projected gravity vector in the body frame
    return g_body

Implement your function so that it reports the gravity direction relative to the robot's base frame. This code should be written in C++ to interface cleanly with unitree_legged_sdk.


We use projected_gravity instead of the quaternion as the orientation input to the state space; this reduces the state dimension and keeps the network less complex.
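
As a reference for the math, here is a minimal Python sketch of the same computation (the real version would be ported to C++ against the SDK). It assumes the IMU quaternion comes in [w, x, y, z] order and a Z-up world frame; double-check the ordering against the unitree_legged_sdk headers.

import numpy as np

def projected_gravity_from_quaternion(q):
    # q = [w, x, y, z], body-to-world orientation; returns the gravity direction in the body frame.
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)  # normalize first

    # Rotation matrix of the body frame expressed in world coordinates.
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

    g_world = np.array([0.0, 0.0, -1.0])  # unit gravity direction, Z-up world frame
    return R.T @ g_world                  # rotate gravity into the body frame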

(TODO) Write Low-Level Torque Controller

walk1.gif

Write a real-time control program for the Unitree Go1 using the unitree_legged_sdk. This program should:

  1. Read joint positions, joint velocities, and IMU data from LowState, and convert the IMU orientation to the projected gravity vector
  2. Use your trained policy to compute torque actions
  3. Send torque commands through LowCmd

After writing your code, your real robot should walk like the attached gif.
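
Here is a rough sketch of the control-loop structure, shown in Python for readability even though the real implementation would be C++ against unitree_legged_sdk. The `read_low_state` and `send_torque_command` callables are hypothetical placeholders for the SDK's LowState/LowCmd handling, and `build_observation`/`projected_gravity_from_quaternion` are the sketches from the earlier steps:

import time
import numpy as np

CONTROL_DT = 0.02  # 50 Hz loop rate (illustrative; match your policy's control frequency)

def control_loop(policy, read_low_state, send_torque_command, velocity_command):
    # Skeleton of the loop: sense -> build observation -> run policy -> send torques.
    last_action = np.zeros(12)
    while True:
        state = read_low_state()  # hypothetical: returns joint pos/vel, IMU quaternion, base velocities
        projected_gravity = projected_gravity_from_quaternion(state["imu_quaternion"])

        obs = build_observation(
            state["base_lin_vel"], state["base_ang_vel"], projected_gravity,
            velocity_command, state["joint_pos"], state["joint_vel"], last_action,
        )

        torques = policy(obs)          # 12 torque values from the trained network
        send_torque_command(torques)   # hypothetical: fill the LowCmd torque fields and send

        last_action = torques
        time.sleep(CONTROL_DT)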


I hope to come back to this project and get my policy working. I think I will have to repeat step 2 until my average reward falls within the interval (-10.0, 0.0), and then move on to step 3; I just need more time to train the model. I could also take the already-trained policy I got from someone else and try it on a Go1, but I first need to figure out how to convert it to a libtorch format so it can run on the real Go1.
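
For that last point, exporting the policy to TorchScript is the standard route into libtorch. A minimal sketch, assuming the trained actor is an ordinary torch.nn.Module saved as a whole module (the checkpoint path here is hypothetical):

import torch

# Load the trained actor network (path and checkpoint format are illustrative).
actor = torch.load("checkpoints/policy.pt", map_location="cpu")
actor.eval()

# Trace the network with a dummy 48-dimensional observation to get a TorchScript module.
example_obs = torch.zeros(1, 48)
traced = torch.jit.trace(actor, example_obs)

# Save it; the file can then be loaded from C++ with torch::jit::load("policy_traced.pt").
traced.save("policy_traced.pt")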