Robotics Gym & Experiments
This post describes my gym-so100-c project. As part of my journey into robotics, I’ve built a basic simulation environment that matches my chosen platform, the Standard Open Arm – SO101, and used it to explore a wide set of libraries and tools.
For this next stage, I wanted a simple but flexible sim-first setup where I could try different approaches, compare them side by side, and build on what I learned. Even though I had access to physical robots, I’m more of a software person — I enjoy the comfort of my home office (and the flexibility to keep working while traveling) more than spending long days in a lab.
This project brings together multiple libraries — Stable-Baselines3, imitation, lerobot — in the same environment without being tied to a single workflow, and it also helps demystify using lerobot for simulation-based learning, an area that still lacks clear examples.
What’s inside?
As of August 2025, the project includes:
Simulation Environment & Tasks
- MuJoCo-based SO101 simulation environment
- Multiple Gym task implementations (inspired by single-arm ALOHA setup)
Training integrations & scripts
- Reinforcement learning with Stable-Baselines3 (Soft Actor-Critic, which I found the most suitable algorithm for these manipulation tasks)
- Imitation learning with the `imitation` library (Behavior Cloning and other approaches)
- Training with lerobot (ACT, Diffusion, VLAs) — from simulation or real data
- Evaluation of policies trained with SAC, `imitation`, and lerobot — simulation-trained or real-robot policies (so far: simulation-only evaluation)
Control & Data Collection
- Teleoperation in simulation via keyboard or 8BitDo controller (that’s the one I had)
- Recording of simulation episodes (for imitation learning with `imitation` or `lerobot`)
- Custom dataset → `lerobot` dataset conversions and upload
Simulation Environment & Tasks
I wanted a simulation environment with a solid physics engine that I could use with a variety of methods, so I structured it as a Gym environment and chose MuJoCo as the physics engine. Given that, adapting gym-aloha (a MuJoCo-based Gym environment for a similar type of robot) was a great fit.
Why do you need a ‘gym’?
It’s not necessary for a simulation, but it is an industry standard for Reinforcement Learning.
OpenAI Gym (and its maintained fork Gymnasium) defines a standard interface for reinforcement learning environments.
The two core methods every Gym env implements are:
- `reset()` → returns the initial observation for a new episode.
- `step(action)` → applies an action, advances the simulation by one step, and returns `obs, reward, terminated, truncated, info`.
In addition, environments define:
- Observation space — what the agent sees (e.g., joint angles, object positions).
- Action space — what the agent can control (e.g., target joint positions, gripper commands).
- Reward function — the numeric feedback signal for learning.
This standardization means the same environment can be used with:
- RL libraries like Stable-Baselines3 (`SAC`, `PPO`, `HER`, …)
- Imitation learning libraries like `imitation`
- Custom training loops or evaluation pipelines
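To make the interface concrete, here is a minimal interaction loop with random actions (any registered environment works the same way; the registration itself is covered below):

```python
# Minimal Gymnasium interaction loop with a random policy.
import gymnasium as gym

import gym_so100  # registers the envs (details below)

env = gym.make("gym_so100/SO100CubeToBin-v0")
obs, info = env.reset(seed=0)            # initial observation for a new episode
episode_return = 0.0

for _ in range(1000):
    action = env.action_space.sample()   # stand-in for a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    if terminated or truncated:          # episode ended (success/failure or time limit)
        obs, info = env.reset()
        episode_return = 0.0

env.close()
```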
My gym-so100-c environment follows this interface closely, which makes it easy to switch between algorithms, compare results, and plug into tools like lerobot.
Background: ALOHA & gym-aloha
ALOHA (A Low-Cost Open-Source Hardware System for Bimanual Teleoperation) is a dual-arm teleoperation platform used in the ACT paper and follow-up work.
The open-source gym-aloha package replicates this setup in MuJoCo and provides ready-to-use Gym environments for imitation and reinforcement learning.
gym-aloha structure:
- Environment core: `env.py` — loads the MuJoCo scene, handles `reset`/`step`, defines obs/action spaces, and integrates teleop/recording.
- Task layers:
  - Joint-space tasks: `tasks/sim.py` — actions are target joint positions written to `data.ctrl`; MuJoCo’s position actuators move joints to match.
  - End-effector (mocap) tasks: `tasks/sim_end_effector.py` — actions are desired gripper poses applied to a mocap body; MuJoCo’s constraint solver adjusts joints to match.
| Aspect | Joint-space control | End-effector (mocap) control |
|---|---|---|
| Action | Target joint positions → `data.ctrl` | Target gripper pose (position + orientation) via mocap |
| Control loop | Actuators drive joints toward commanded positions | Constraints drive joints to match mocap pose |
| Abstraction | Low-level, robot-specific | Higher-level, task-centric |
| Kinematics | Direct mapping; no IK | Implicit IK via MuJoCo constraints |
| Policy learning | Exposes full dynamics | Operates in Cartesian space |
| Real-world transfer | Straightforward if HW supports pos. control | Needs IK/operational-space control on robot |
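As a rough sketch (using the scene files from my adaptation, described below), here is where each control style writes its action in MuJoCo:

```python
# Illustrative only: where each control style writes its action in MuJoCo.
import mujoco
import numpy as np

# Joint-space control: actions are target joint positions written to data.ctrl;
# position actuators then drive the joints toward those targets.
model = mujoco.MjModel.from_xml_path("assets/so100_transfer_cube.xml")
data = mujoco.MjData(model)
data.ctrl[:] = np.zeros(model.nu)        # one target per actuator
mujoco.mj_step(model, data)

# End-effector (mocap) control: actions move a mocap body; an equality/weld
# constraint pulls the arm so the gripper tracks that pose (implicit IK).
ee_model = mujoco.MjModel.from_xml_path("assets/so100_transfer_cube_ee.xml")
ee_data = mujoco.MjData(ee_model)
ee_data.mocap_pos[0] += np.array([0.0, 0.0, 0.01])   # nudge the target 1 cm up
mujoco.mj_step(ee_model, ee_data)
```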
Tasks in ALOHA
- Insertion — bimanual peg-in-hole task.
- Cube transfer — pass a cube from one arm to the other.
Each ALOHA task includes a scripted policy using inverse kinematics (IK) + noise to generate synthetic demonstrations.
This is great for imitation learning because it can quickly produce large, diverse datasets without human teleop.
My adaptation: gym-so100-c
I adapted gym-aloha for a single SO101 arm (5-DoF + gripper) to match my hardware target.
Changes:
- Replaced dual 6-DoF arms with one SO101.
- Removed dual-arm logic and simplified obs/action spaces.
- Ported only the joint-space control mode (clearer transfer to my target hardware).
- Added an experimental end-effector teleop scene (mocap-based), though not yet used for training.
Task implemented:
- Bin-a-cube:
- Cube starts at a random position on the table.
- Goal is to place it inside a fixed bin.
- Sparse reward for success, with optional shaping for approach, grasp, and alignment.
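To illustrate the shaping idea, a sketch along these lines (the site names, weights, and success bonus are illustrative assumptions, not the environment's actual reward code):

```python
# Hedged sketch of a shaped reward for the bin-a-cube task.
import numpy as np

def shaped_reward(data, success: bool) -> float:
    tip = data.site("gripper_tip").xpos   # assumed site names; see the Sites bullet below
    cube = data.site("cube").xpos
    goal = data.site("bin_goal").xpos

    reach = -np.linalg.norm(tip - cube)   # approach term: gripper close to the cube
    place = -np.linalg.norm(cube - goal)  # placement term: cube close to the bin
    bonus = 10.0 if success else 0.0      # sparse success term

    return 0.5 * reach + 1.0 * place + bonus
```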
MuJoCo environment
The environment is defined in assets/so100_transfer_cube.xml:
- Robot: SO101 single arm (5-DoF + gripper).
- Actuators: position actuators, one per joint; actions = target joint positions.
- Workspace: table, free-moving cube (red), goal bin (gray).
- Sites: for gripper tip, cube, and bin goal — used for shaping and success checks.
- Cameras: optional top/front views for logging or vision policies.
Environment registration
I register my tasks in gym_so100/__init__.py so they can be created with gym.make().
Because the register() calls live in __init__.py, you need to import gym_so100 at least once before using them.
```python
from gymnasium.envs.registration import register

register(
    id="gym_so100/SO100TouchCube-v0",
    entry_point="gym_so100.env:SO100Env",
    max_episode_steps=300,
    nondeterministic=True,
    kwargs={"obs_type": "so100_pixels_agent_pos", "task": "so100_touch_cube"},
)

register(
    id="gym_so100/SO100TouchCubeSparse-v0",
    entry_point="gym_so100.env:SO100Env",
    max_episode_steps=300,
    nondeterministic=True,
    kwargs={"obs_type": "so100_pixels_agent_pos", "task": "so100_touch_cube_sparse"},
)

register(
    id="gym_so100/SO100CubeToBin-v0",
    entry_point="gym_so100.env:SO100Env",
    max_episode_steps=700,
    nondeterministic=True,
    kwargs={"obs_type": "so100_pixels_agent_pos", "task": "so100_cube_to_bin"},
)
```
Once the package is installed:
```python
import gym_so100  # triggers env registration
import gymnasium as gym

env = gym.make("gym_so100/SO100CubeToBin-v0")
obs, info = env.reset()
```
Learning & Training Workflows
This project uses a mix of offline imitation learning and online reinforcement learning, built on top of the imitation library, Stable-Baselines3, and Hugging Face’s LeRobot.
Scripts I provide
- **`scripts/train_sac.py`** — my SAC training driver (parallel envs, callbacks, optional checkpoint resume, optional VecNormalize).
  Example I actually ran:

  ```bash
  python scripts/train_sac.py \
    --prefix bc_to_bin_2 \
    --task=gym_so100/SO100CubeToBin-v0 \
    --checkpoint outputs/checkpoints/bc_to_bin_simple_80000_steps.zip
  ```
- **`scripts/train_bc.py`** — Behavior Cloning from teleop demonstrations (with optional continuation using SAC).
  Example I actually ran:

  ```bash
  python scripts/train_bc.py \
    --demonstrations expert_demonstrations.pkl \
    --continue_with_sac \
    --sac_timesteps 50000
  ```
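For orientation, a hedged sketch of what BC with the `imitation` library looks like at the API level (not the actual `train_bc.py`; it assumes the pickled demos are already an `imitation` `Transitions` object):

```python
# Hedged sketch: Behavior Cloning with the `imitation` library.
import pickle

import gymnasium as gym
import numpy as np
from imitation.algorithms import bc

import gym_so100  # registers the envs

env = gym.make("gym_so100/SO100CubeToBin-v0")

with open("expert_demonstrations.pkl", "rb") as f:
    transitions = pickle.load(f)  # assumed: imitation.data.types.Transitions

bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=transitions,
    rng=np.random.default_rng(0),
    # for dict (pixel + state) observations, a suitable `policy=` would be passed here
)
bc_trainer.train(n_epochs=10)
bc_trainer.policy.save("outputs/bc_policy.pt")  # SB3-style policy; reusable as a SAC warm start
```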
What I tried & what I learned
- SAC (dense) vs SAC + HER (sparse): SAC with dense rewards performed better for me. HER made training much slower and didn’t beat dense SAC on my cube-to-bin task.
- Stability: SAC sometimes diverged or plateaued. Larger buffers/batches and careful entropy scheduling helped, but it wasn’t perfect.
- Observation resolution: I trained for a while on 64×48 inputs (fast!) but results were disappointing. I only noticed once I inspected the actual tensors going into the model (my rendered videos looked fine). Good reminder to verify inputs early.
- VecNormalize: Mixed feelings. It can help, but it also couples env stats with the model and adds setup complexity. I’m leaning toward doing normalization inside the model rather than via an env wrapper.
- Parallelism: Stable-Baselines3’s vectorized envs were genuinely useful for throughput and a bit of stability. Performance caveat: on macOS, with MuJoCo running on CPU, vectorized environments were surprisingly fast; on Colab, the same setup was painfully slow, likely due to weaker single-thread CPU performance and different MuJoCo builds.
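To make the parallelism and VecNormalize points concrete, a minimal Stable-Baselines3 setup might look like this (a sketch with assumed hyperparameters; the real `scripts/train_sac.py` adds callbacks, checkpoint resume, and more):

```python
# Hedged sketch: SAC with parallel envs and optional reward normalization.
import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv, VecNormalize


def make_env():
    import gym_so100  # registration must happen inside each worker process
    return gym.make("gym_so100/SO100CubeToBin-v0")


if __name__ == "__main__":  # required because SubprocVecEnv spawns worker processes
    vec_env = make_vec_env(make_env, n_envs=4, vec_env_cls=SubprocVecEnv)
    vec_env = VecNormalize(vec_env, norm_obs=False, norm_reward=True)  # optional; see caveat above

    model = SAC(
        "MultiInputPolicy",   # dict observations: pixels + agent_pos
        vec_env,
        buffer_size=50_000,   # image observations make the replay buffer memory-hungry
        batch_size=256,
        verbose=1,
    )
    model.learn(total_timesteps=100_000)
    model.save("outputs/sac_cube_to_bin")
```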
LeRobot (dataset-first)
For LeRobot, I:
- Recorded my own demonstrations in this sim.
- Converted them to LeRobot’s dataset format with my conversion script.
- Trained using LeRobot’s official training script/documentation.
Once the dataset was in the right format, training was straightforward. (I’m intentionally not adding any CLI here; see LeRobot’s docs for the exact commands to use.)
Control & Data Collection
While much of my training relied on scripted or learned policies, I also spent time building tools to manually control the SO100 in simulation — both to understand the robot’s kinematics and to produce demonstration data when needed.
This section covers:
- Joint-space teleop (`scripts/teleop.py`)
- End-effector teleop (`scripts/teleop_ee.py`)
- Gamepad input framework (`scripts/input_controller.py`)
- Recording teleop demonstrations (lerobot integration)
1. Joint-space teleop — scripts/teleop.py
Purpose:
A minimal MuJoCo viewer loop where the keyboard directly updates desired joint positions, which are then written to data.ctrl via the unnormalize_so100() helper.
Scene & setup:
- XML: `assets/so100_transfer_cube.xml`
- Uses `SO100_START_ARM_POSE` from `gym_so100.constants` for initialization.
Control mapping (key → joint delta):
- `← / →` → base rotation
- `↑ / ↓` → shoulder
- `+ / -` → elbow
- `V / B` → wrist pitch
- `G / H` → wrist rotation
- `5 / 6` → gripper open/close
Flow:
```python
import numpy as np

from gym_so100.constants import SO100_START_ARM_POSE
# normalize_so100 / unnormalize_so100 are the project's joint-range helpers

pose = normalize_so100(SO100_START_ARM_POSE)  # start from the home pose (normalized)
# modify pose[...] based on the key pressed (see the mapping above)
env_action = unnormalize_so100(pose)          # convert back to raw joint targets
np.copyto(data.ctrl, env_action)              # position actuators track these targets
```
This mode is direct and predictable, but only works in the robot’s joint space — no Cartesian constraints.
2. End-effector teleop — scripts/teleop_ee.py
Purpose: Control the gripper in Cartesian space by moving a mocap body; MuJoCo’s constraints drive the joints to follow. This is purely for exploration — no demonstration recording.
Scene & setup:
- XML: `assets/so100_transfer_cube_ee.xml`
- Mocap body index: `MOCAP_INDEX = 0`
Translation controls:
- `↑ / ↓` → `mocap_pos[0, 2]` ± 0.01
- `← / →` → `mocap_pos[0, 0]` ∓ 0.01
- `+ / -` → `mocap_pos[0, 1]` ± 0.01
Rotation controls (using pyquaternion):
- `Q / A` → ±10° around X-axis
- `W / S` → ±10° around Y-axis
- `E / D` → ±10° around Z-axis
Gripper (still joint-space):
- `5 / 6` → `data.ctrl[5]` ± 0.05
This approach is closer to pose-space control and could be paired with IK for scripted demos. Keyboard is awkward for 6-DoF — VR or tracked-hand controllers would make this far more intuitive.
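For reference, a condensed sketch of the update a single key press triggers (step sizes and the mocap index as listed above; the viewer loop and key handling are omitted):

```python
# Hedged sketch: one keyboard "tick" of the end-effector teleop.
import mujoco
import numpy as np
from pyquaternion import Quaternion

MOCAP_INDEX = 0
model = mujoco.MjModel.from_xml_path("assets/so100_transfer_cube_ee.xml")
data = mujoco.MjData(model)

# Translation: e.g. the Up arrow raises the target by 1 cm.
data.mocap_pos[MOCAP_INDEX, 2] += 0.01

# Rotation: e.g. Q applies +10 degrees around the X-axis.
# Both MuJoCo and pyquaternion store quaternions as (w, x, y, z).
current = Quaternion(*data.mocap_quat[MOCAP_INDEX])
data.mocap_quat[MOCAP_INDEX] = (Quaternion(axis=[1, 0, 0], angle=np.radians(10)) * current).elements

# Gripper stays in joint space.
data.ctrl[5] += 0.05

mujoco.mj_step(model, data)  # the constraint solver drags the joints toward the new pose
```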
3. Gamepad input framework — scripts/input_controller.py
To make control smoother, I implemented an input abstraction:
- `InputController` — base class for controllers that output motion deltas (dx, dy, dz), gripper commands, and episode control flags.
- `GamepadControllerHID` — HIDAPI-based driver supporting Logitech, Xbox, PS4/PS5, and 8BitDo devices.
  - Reads joystick axes → motion deltas (with deadzone filtering)
  - Button mapping for gripper, intervention, and episode end status
  - Example: `get_deltas()` returns `(delta_x, delta_y, delta_z)` in meters.
While not yet fully integrated into a recording pipeline, this makes it possible to swap keyboard for analog input without changing teleop logic.
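To illustrate the deadzone filtering, a small sketch (not the repo's actual implementation; the constants are assumptions):

```python
# Hedged sketch: mapping a joystick axis in [-1, 1] to a position delta in meters.
DEADZONE = 0.1    # ignore small stick noise around center
MAX_SPEED = 0.05  # full deflection moves the target 5 cm per second (assumed)


def axis_to_delta(axis_value: float, dt: float) -> float:
    """Return a signed position delta for one control tick of length dt seconds."""
    if abs(axis_value) < DEADZONE:
        return 0.0
    # Rescale so motion ramps up smoothly from the deadzone edge instead of jumping.
    magnitude = (abs(axis_value) - DEADZONE) / (1.0 - DEADZONE)
    return magnitude * MAX_SPEED * dt * (1.0 if axis_value > 0 else -1.0)
```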
4. Recording teleop demonstrations
For lerobot imitation learning, I recorded my own teleop demonstrations and then converted them into the lerobot dataset format.
Once in that format, I could use lerobot’s official training scripts directly.
This gave me a practical way to bootstrap policies without relying solely on scripted IK demos.
Summary: These tools let me explore the SO100 in both joint and end-effector spaces, experiment with input devices, and collect custom datasets for imitation learning. Even without recording, the teleop scripts were valuable for debugging the scene setup, reward shaping, and understanding the difference between action spaces in MuJoCo.
Integration & Evaluation
This stage connects collected demonstrations to the training and evaluation pipeline in LeRobot. Two scripts are involved:

- `upload_lerobot_demos.py` — convert demonstrations and upload them as a dataset to the Hugging Face Hub
- `evaluate_lerobot_policy.py` — run policy rollouts, measure total reward, and save videos
1. Uploading Expert Demonstrations — scripts/upload_lerobot_demos.py
This script converts locally recorded expert demonstrations into the LeRobot dataset format and optionally pushes them to the Hub.
Key steps:
- Load demonstrations from a `.pkl` file (recorded via teleop or scripted policies).
- Create a LeRobot dataset with a consistent `features` schema:
  - `observation.images.top` → RGB camera frames `(3, 480, 640)`
  - `observation.state` → robot joint positions `(6,)`
  - `action` → matching control commands `(6,)`
  - `next.reward`, `next.success`, `seed`, `timestamp`
- Iterate through episodes:
  - Squeeze batch dimensions where needed (e.g., `(1, 3, H, W)` → `(3, H, W)`).
  - Convert normalized floats to `uint8` for pixel data if necessary.
  - Track and log total reward per episode.
- Save locally and push to the Hugging Face Hub under `user_id/repo_id`.
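Roughly, the `features` schema above can be expressed as a dict like this (dtype strings and shape/name conventions depend on the lerobot version, so treat it as a sketch):

```python
# Hedged sketch of the dataset schema described above.
features = {
    "observation.images.top": {"dtype": "video", "shape": (3, 480, 640), "names": ["channels", "height", "width"]},
    "observation.state":      {"dtype": "float32", "shape": (6,), "names": None},
    "action":                 {"dtype": "float32", "shape": (6,), "names": None},
    "next.reward":            {"dtype": "float32", "shape": (1,), "names": None},
    "next.success":           {"dtype": "bool",    "shape": (1,), "names": None},
    "seed":                   {"dtype": "int64",   "shape": (1,), "names": None},
    "timestamp":              {"dtype": "float32", "shape": (1,), "names": None},
}
```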
Example:
```bash
python scripts/upload_lerobot_demos.py \
  --demonstrations expert_demonstrations.pkl \
  --env-name gym_so100/SO100CubeToBin-v0 \
  --user-id myhfusername \
  --repo-id cube_to_bin_demos \
  --root ./dataset
```
Why this matters: Once in LeRobot’s dataset format, your demonstrations can be used with any compatible policy without further preprocessing. This also makes sharing datasets trivial via the Hub.
2. Policy Evaluation and Video Recording — scripts/evaluate_lerobot_policy.py
Evaluates a trained LeRobot policy in a gym_so100 environment, recording both video rollouts and numeric reward metrics.
Supported policy types:
- `act`
- `diffusion`
- `pi0fast`
- `smolVLA`
Flow:
- Load the policy from a local path or the Hub.
- Verify I/O — prints the expected features from the policy config and the env spaces.
- Run episodes:
  - Formats the observation dict for policy input.
  - Calls `policy.select_action(...)`.
  - Steps through the env, collecting:
    - Per-step reward
    - Rendered frames
    - Accumulated total reward
- Report results:
  - `best_reward` and `average_reward` across all episodes.
  - Success/failure per episode.
- Save the video to MP4.
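The core of the rollout looks roughly like this (observation keys and tensor packing are assumptions; the actual script handles more policy types and edge cases):

```python
# Hedged sketch: one evaluation episode with a lerobot policy.
import torch


def run_episode(env, policy, device="cpu"):
    obs, info = env.reset()
    policy.reset()                              # clear the policy's internal action queue
    total_reward, frames, done = 0.0, [], False

    while not done:
        # Pack the gym observation into the batched dict format lerobot policies expect.
        state = torch.from_numpy(obs["agent_pos"]).float().unsqueeze(0).to(device)
        image = torch.from_numpy(obs["pixels"]["top"]).float().permute(2, 0, 1) / 255.0
        batch = {
            "observation.state": state,
            "observation.images.top": image.unsqueeze(0).to(device),
        }

        with torch.no_grad():
            action = policy.select_action(batch)

        obs, reward, terminated, truncated, info = env.step(action.squeeze(0).cpu().numpy())
        total_reward += float(reward)
        frames.append(env.render())             # assumes render_mode="rgb_array"; frames go to the MP4
        done = terminated or truncated

    return total_reward, frames
```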
Example:
```bash
python scripts/evaluate_lerobot_policy.py \
  --policy_type act \
  --policy_path myhfusername/cube_to_bin_policy \
  --num_episodes 5 \
  --normalize True \
  --video_output_dir ./videos
```
Why this matters: LeRobot’s built-in examples focus on qualitative outputs (videos). This script adds quantitative performance metrics — letting you track improvement, compare policies, or detect regressions.
Workflow summary:
```
(1) Record demos → (2) upload_lerobot_demos.py → Hub dataset
                                ↓
                      Train policy (LeRobot)
                                ↓
      (3) evaluate_lerobot_policy.py → video + reward metrics
```
Credits & Acknowledgements
Main influences
- lerobot — Hugging Face’s open-source robot learning platform, which inspired the dataset conversion and training pipeline integrations.
- Physical Intelligence Group — for their research on robotic manipulation and control, which shaped my choice of tasks and algorithms.
- CS285 (UC Berkeley Deep Reinforcement Learning) — a key learning resource that informed my reinforcement learning experiments.
Reference repositories
- Gym Aloha — introduced with the ACT paper; served as a design reference for my Gym task implementations.
- gym-so100 — port of SO100 to the Gym Aloha structure, used for understanding task setup.
- MuJoCo Menagerie: SO-ARM100 — official MuJoCo models of the SO101 arm, used as the base for my simulation environment.