Dog_torque: Learning Sim2Real Walking for the Unitree Go1
by aihoque


We are creating a control policy for the Unitree Go1 in simulation that transfers to a real Unitree Go1. We train the robot in a MuJoCo environment, test the policy to confirm it produces stable walking, and then deploy it on the real robot. We control the robot by sending torque commands directly to its joints, and our policy takes in a state defined as follows:
- Base linear velocity: the linear velocity of the robot's base in 3D space, v = [vx, vy, vz]
- Base angular velocity: the roll, pitch, and yaw rates of the base, ω = [ωx, ωy, ωz]
- Projected gravity vector: a 3D vector giving the direction of gravity in the robot's local (body) frame, typically obtained by rotating the world gravity vector through the inverse of the robot's orientation quaternion
- Desired velocity command: the commanded linear velocity (e.g., from a user or planner), v_d = [vx_d, vy_d, vz_d]
- Joint positions (dim = 12)
- Joint velocities (dim = 12)
- Last action applied (dim = 12)
Altogether this gives a state space of dimension 48 and an action space of dimension 12.
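To make that layout concrete, here is a minimal sketch of how a 48-dimensional observation like this could be stacked together; the function and variable names are illustrative, not taken from the repository's code.

```python
import numpy as np

def build_observation(base_lin_vel, base_ang_vel, projected_gravity,
                      velocity_command, joint_pos, joint_vel, last_action):
    """Stack the state terms into one 48-dim vector: 3+3+3+3+12+12+12."""
    obs = np.concatenate([
        base_lin_vel,       # [vx, vy, vz]
        base_ang_vel,       # [wx, wy, wz]
        projected_gravity,  # gravity direction in the base frame
        velocity_command,   # [vx_d, vy_d, vz_d]
        joint_pos,          # 12 joint angles
        joint_vel,          # 12 joint velocities
        last_action,        # 12 torques applied on the previous step
    ])
    assert obs.shape == (48,)
    return obs
```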
Supplies
Unitree Go1 Robot
A Linux machine for training reinforcement learning models (I used Ubuntu 22.04)
My repository: https://github.com/aihoque2/dog_torque
Set Up the Code
Clone the repository and install Python dependencies:
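Something along these lines should do it (I am assuming the repository ships a `requirements.txt`; check its README for the exact dependency list):

```
git clone https://github.com/aihoque2/dog_torque.git
cd dog_torque
pip install -r requirements.txt
```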
Train Model With Cost Penalty
Edit your reward function, `_calc_reward()` in train_agent.py to:
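Something like the sketch below, though treat it only as an illustration of the structure: the exact terms live in the repository and in the attached `go1_torque_env.py`. The idea is a velocity-tracking reward plus a health bonus, minus weighted costs, where the `terminal` cost fires when the robot falls; the `tracking` and `torque` names here are my placeholders, not the project's actual names.

```python
import numpy as np

def calc_reward(base_lin_vel, command_vel, fell_over, action,
                reward_weights, cost_weights):
    """Illustrative structure: weighted reward terms minus weighted cost terms."""
    # Reward for tracking the commanded base velocity (1.0 when tracking perfectly)
    tracking = np.exp(-np.sum(np.square(command_vel - base_lin_vel)))
    reward = reward_weights["tracking"] * tracking \
           + reward_weights["healthy"] * float(not fell_over)
    # Costs: a large one-time penalty for falling, plus a small torque penalty
    cost = cost_weights["terminal"] * float(fell_over) \
         + cost_weights["torque"] * np.sum(np.square(action))
    return reward - cost
```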
I have attached a `go1_torque_env.py` file with these settings already in place. Then modify `healthy` in `reward_weights` to 0.0 and `terminal` in `cost_weights` to 1000.0, as shown below:
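Again, only the `healthy` and `terminal` values come from this step; the other entries are placeholders for whatever else the environment weighs.

```python
reward_weights = {"tracking": 1.0, "healthy": 0.0}     # no health bonus in this step
cost_weights   = {"terminal": 1000.0, "torque": 1e-4}  # heavy penalty for falling
```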
With these settings, your training should run like the GIF below.
Downloads
Train Model With Health Bonus

Edit your reward function, `_calc_reward()` in train_agent.py to:
Then modify `healthy` in `reward_weights` to 2.0 and `terminal` in `cost_weights` to 0.0, as shown below:
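As in the previous step, only `healthy` and `terminal` are the values that matter here; the rest are placeholders.

```python
reward_weights = {"tracking": 1.0, "healthy": 2.0}   # reward staying upright every step
cost_weights   = {"terminal": 0.0, "torque": 1e-4}   # no extra penalty at termination
```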
Then, in `train_agent.py`, set:
Hopefully, your policy ends up with the result shown in the GIF attached below. I created that GIF by taking someone else's trained model and running test_agent.py. Run test_agent.py yourself to see how well your policy performs. If you have unstable results, set
and repeat Step 2.
(TODO) Convert IMU to Projected Gravity Vector
We are now working on deploying our policy on a real robot. Create a program that reads data from the unitree_legged_sdk IMU and computes the projected gravity vector from the IMU's quaternion orientation (or its acceleration data). I have not reached this far yet, but the pseudocode is below:
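Here is a Python sketch of that math, assuming the IMU quaternion arrives in [w, x, y, z] order (verify the ordering against the SDK headers); the real program would do the same computation in C++.

```python
import numpy as np

def quat_rotate_inverse(q, v):
    """Rotate vector v by the inverse of the unit quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    q_vec = np.array([x, y, z])
    return (v * (2.0 * w**2 - 1.0)
            - 2.0 * w * np.cross(q_vec, v)
            + 2.0 * q_vec * np.dot(q_vec, v))

def projected_gravity(imu_quaternion):
    """Gravity direction expressed in the robot's base frame."""
    gravity_world = np.array([0.0, 0.0, -1.0])  # unit gravity vector in the world frame
    return quat_rotate_inverse(np.asarray(imu_quaternion, dtype=float), gravity_world)
```

When the robot stands level this returns roughly [0, 0, -1], and the vector tilts as the body pitches or rolls, which is the orientation information the policy needs.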
Implement your function to interpret the gravity direction relative to the robot's base frame. This code should be written in C++ to interface more directly with the unitree_legged_sdk.
We use the projected gravity vector instead of the quaternion in the state space to reduce the state dimension and keep the network less complex.
(TODO) Write Low-Level Torque Controller

Write a real-time control program for the Unitree Go1 using the unitree_legged_sdk. This program should:
- Read joint positions, joint velocities, and IMU data from LowState, and convert the IMU orientation to the projected gravity vector
- Use your trained policy to compute torque actions
- Send torque commands through LowCmd
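Until the C++ version exists, here is a rough Python sketch of the loop structure; `read_low_state()` and `send_torques()` are stand-ins for the actual unitree_legged_sdk calls, `projected_gravity()` is the function from the previous step, and the policy file name and control rate are assumptions.

```python
import time
import numpy as np
import torch

CONTROL_DT = 0.02                      # assumed 50 Hz control loop
policy = torch.jit.load("policy.pt")   # TorchScript export of the trained policy (see last step)
policy.eval()

command = np.array([0.4, 0.0, 0.0])    # desired base velocity [vx_d, vy_d, vz_d]
last_action = np.zeros(12)

def read_low_state():
    """Stand-in for reading LowState over the SDK; returns dummy values here."""
    return np.zeros(3), np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(12), np.zeros(12)

def send_torques(tau):
    """Stand-in for filling the torque fields of LowCmd and sending it over the SDK."""
    pass

while True:
    lin_vel, ang_vel, quat, joint_pos, joint_vel = read_low_state()
    gravity = projected_gravity(quat)                        # from the previous step
    obs = np.concatenate([lin_vel, ang_vel, gravity, command,
                          joint_pos, joint_vel, last_action])  # 48-dim state
    with torch.no_grad():
        action = policy(torch.from_numpy(obs).float().unsqueeze(0)).squeeze(0).numpy()
    send_torques(action)               # 12 joint torques
    last_action = action
    time.sleep(CONTROL_DT)
```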
After writing your code, your real robot should walk like the robot in the attached GIF.
I hope to come back to this project and get my policy working. I think I will have to repeat Step 2 until I can get my average reward within the interval (-10.0, 0.0), and then move on to Step 3. I just need more time to train the model. I could potentially take the already-trained policy I got from someone else and try it on a Go1, but I need to figure out how to convert it to a libtorch format in order to run it on a real Go1.
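For the libtorch conversion, the usual route is to export the PyTorch policy as TorchScript, which libtorch can then load from C++ with `torch::jit::load`. A minimal sketch, with a stand-in network in place of the real trained actor:

```python
import torch
import torch.nn as nn

# Stand-in for the trained actor network; replace with the real policy loaded from a checkpoint
policy_net = nn.Sequential(nn.Linear(48, 256), nn.ELU(),
                           nn.Linear(256, 256), nn.ELU(),
                           nn.Linear(256, 12))
policy_net.eval()

example_obs = torch.zeros(1, 48)                    # dummy 48-dim observation for tracing
scripted = torch.jit.trace(policy_net, example_obs)
scripted.save("policy.pt")                          # C++ side loads this with torch::jit::load("policy.pt")
```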