Dog_torque: Learning Sim2Real Walking for the Unitree Go1
by aihoque


We are creating a control policy for the Unitree Go1 in simulation that transfers to a real Unitree Go1. We train the robot in a MuJoCo environment, test the policy to confirm it produces stable walking, and then deploy it on the real robot. We control the robot by sending torque commands directly to its joints, and our policy takes in a state defined as follows:
- Base linear velocity: the linear velocity of the robot's base in 3D space, v = [vx, vy, vz]
- Base angular velocity: the roll, pitch, and yaw rates of the base, ω = [ωx, ωy, ωz]
- Projected gravity vector: a 3D vector giving the direction of gravity in the robot's local (body) frame, typically obtained by rotating the world gravity vector through the inverse of the robot's orientation quaternion
- Desired velocity command: the commanded linear velocity (e.g., from a user or planner), v_d = [vx_d, vy_d, vz_d]
- Joint positions (dim = 12)
- Joint velocities (dim = 12)
- Last action applied (dim = 12)
Altogether this gives a state space of dimension 48 and an action space of dimension 12.
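To make that layout concrete, here is a minimal sketch of how a 48-dimensional observation like this could be stacked together; the function and variable names are illustrative, not taken from the repository's code.

```python
import numpy as np

def build_observation(base_lin_vel, base_ang_vel, projected_gravity,
                      velocity_command, joint_pos, joint_vel, last_action):
    """Stack the state terms into one 48-dim vector: 3+3+3+3+12+12+12."""
    obs = np.concatenate([
        base_lin_vel,       # [vx, vy, vz]
        base_ang_vel,       # [wx, wy, wz]
        projected_gravity,  # gravity direction in the base frame
        velocity_command,   # [vx_d, vy_d, vz_d]
        joint_pos,          # 12 joint angles
        joint_vel,          # 12 joint velocities
        last_action,        # 12 torques applied on the previous step
    ])
    assert obs.shape == (48,)
    return obs
```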
Supplies
Unitree Go1 Robot
A Linux machine for training reinforcement learning models (I used Ubuntu 22.04)
My repository: https://github.com/aihoque2/dog_torque
Set Up the Code
Clone the repository and install Python dependencies:
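Something along these lines should do it (I am assuming the repository ships a `requirements.txt`; check its README for the exact dependency list):

```
git clone https://github.com/aihoque2/dog_torque.git
cd dog_torque
pip install -r requirements.txt
```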
Train Model With Cost Penalty
Edit your reward function, `_calc_reward()` in train_agent.py to:
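Something like the sketch below, though treat it only as an illustration of the structure: the exact terms live in the repository and in the attached `go1_torque_env.py`. The idea is a velocity-tracking reward plus a health bonus, minus weighted costs, where the `terminal` cost fires when the robot falls; the `tracking` and `torque` names here are my placeholders, not the project's actual names.

```python
import numpy as np

def calc_reward(base_lin_vel, command_vel, fell_over, action,
                reward_weights, cost_weights):
    """Illustrative structure: weighted reward terms minus weighted cost terms."""
    # Reward for tracking the commanded base velocity (1.0 when tracking perfectly)
    tracking = np.exp(-np.sum(np.square(command_vel - base_lin_vel)))
    reward = reward_weights["tracking"] * tracking \
           + reward_weights["healthy"] * float(not fell_over)
    # Costs: a large one-time penalty for falling, plus a small torque penalty
    cost = cost_weights["terminal"] * float(fell_over) \
         + cost_weights["torque"] * np.sum(np.square(action))
    return reward - cost
```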
I have attached a `go1_torque_env.py` file with these settings already in place. Then modify `healthy` in `reward_weights` to 0.0 and `terminal` in `cost_weights` to 1000.0, as shown below:
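Again, only the `healthy` and `terminal` values come from this step; the other entries are placeholders for whatever else the environment weighs.

```python
reward_weights = {"tracking": 1.0, "healthy": 0.0}     # no health bonus in this step
cost_weights   = {"terminal": 1000.0, "torque": 1e-4}  # heavy penalty for falling
```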
With these settings, your training should run like the GIF below.
Downloads
Train Model With Health Bonus

Edit your reward function, `_calc_reward()` in train_agent.py to:
Then modify `healthy` in `reward_weights` to 2.0 and `terminal` in `cost_weights` to 0.0, as shown below:
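As in the previous step, only `healthy` and `terminal` are the values that matter here; the rest are placeholders.

```python
reward_weights = {"tracking": 1.0, "healthy": 2.0}   # reward staying upright every step
cost_weights   = {"terminal": 0.0, "torque": 1e-4}   # no extra penalty at termination
```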
Then, in `train_agent.py`, set:
Hopefully, your policy ends up with the result shown in the GIF attached below. I created that GIF by taking someone else's trained model and running test_agent.py. Run test_agent.py yourself to see how well your policy performs. If you have unstable results, set
and repeat Step 2.
(TODO) Convert IMU to Projected Gravity Vector
We are now working on deploying our policy on a real robot. Create a program that reads data from the unitree_legged_sdk IMU and computes the projected gravity vector from the IMU's quaternion orientation (or its acceleration data). I have not reached this far yet, but the pseudocode is below:
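Here is a Python sketch of that math, assuming the IMU quaternion arrives in [w, x, y, z] order (verify the ordering against the SDK headers); the real program would do the same computation in C++.

```python
import numpy as np

def quat_rotate_inverse(q, v):
    """Rotate vector v by the inverse of the unit quaternion q = [w, x, y, z]."""
    w, x, y, z = q
    q_vec = np.array([x, y, z])
    return (v * (2.0 * w**2 - 1.0)
            - 2.0 * w * np.cross(q_vec, v)
            + 2.0 * q_vec * np.dot(q_vec, v))

def projected_gravity(imu_quaternion):
    """Gravity direction expressed in the robot's base frame."""
    gravity_world = np.array([0.0, 0.0, -1.0])  # unit gravity vector in the world frame
    return quat_rotate_inverse(np.asarray(imu_quaternion, dtype=float), gravity_world)
```

When the robot stands level this returns roughly [0, 0, -1], and the vector tilts as the body pitches or rolls, which is the orientation information the policy needs.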
Implement your function to interpret the gravity direction relative to the robot's base frame. This code should be written in C++ to interface more directly with the unitree_legged_sdk.
We use the projected gravity vector instead of the quaternion in the state space to reduce the state dimension and keep the network less complex.
(TODO) Write Low-Level Torque Controller

Write a real-time control program for the Unitree Go1 using the unitree_legged_sdk. This program should:
- Read joint positions, joint velocities, and IMU data from LowState, and convert the IMU orientation to the projected gravity vector
- Use your trained policy to compute torque actions
- Send torque commands through LowCmd
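Until the C++ version exists, here is a rough Python sketch of the loop structure; `read_low_state()` and `send_torques()` are stand-ins for the actual unitree_legged_sdk calls, `projected_gravity()` is the function from the previous step, and the policy file name and control rate are assumptions.

```python
import time
import numpy as np
import torch

CONTROL_DT = 0.02                      # assumed 50 Hz control loop
policy = torch.jit.load("policy.pt")   # TorchScript export of the trained policy (see last step)
policy.eval()

command = np.array([0.4, 0.0, 0.0])    # desired base velocity [vx_d, vy_d, vz_d]
last_action = np.zeros(12)

def read_low_state():
    """Stand-in for reading LowState over the SDK; returns dummy values here."""
    return np.zeros(3), np.zeros(3), np.array([1.0, 0.0, 0.0, 0.0]), np.zeros(12), np.zeros(12)

def send_torques(tau):
    """Stand-in for filling the torque fields of LowCmd and sending it over the SDK."""
    pass

while True:
    lin_vel, ang_vel, quat, joint_pos, joint_vel = read_low_state()
    gravity = projected_gravity(quat)                        # from the previous step
    obs = np.concatenate([lin_vel, ang_vel, gravity, command,
                          joint_pos, joint_vel, last_action])  # 48-dim state
    with torch.no_grad():
        action = policy(torch.from_numpy(obs).float().unsqueeze(0)).squeeze(0).numpy()
    send_torques(action)               # 12 joint torques
    last_action = action
    time.sleep(CONTROL_DT)
```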
After writing your code, your real robot should walk like the robot in the attached GIF.
I hope to come back to this project and get my policy working. I think I will have to repeat Step 2 until I can get my average reward within the interval (-10.0, 0.0), and then move on to Step 3. I just need more time to train the model. I could potentially take the already-trained policy I got from someone else and try it on a Go1, but I need to figure out how to convert it to a libtorch format in order to run it on a real Go1.
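For the libtorch conversion, the usual route is to export the PyTorch policy as TorchScript, which libtorch can then load from C++ with `torch::jit::load`. A minimal sketch, with a stand-in network in place of the real trained actor:

```python
import torch
import torch.nn as nn

# Stand-in for the trained actor network; replace with the real policy loaded from a checkpoint
policy_net = nn.Sequential(nn.Linear(48, 256), nn.ELU(),
                           nn.Linear(256, 256), nn.ELU(),
                           nn.Linear(256, 12))
policy_net.eval()

example_obs = torch.zeros(1, 48)                    # dummy 48-dim observation for tracing
scripted = torch.jit.trace(policy_net, example_obs)
scripted.save("policy.pt")                          # C++ side loads this with torch::jit::load("policy.pt")
```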