Robotics Gym & Experiments
This post describes my gym-so100-c project. As part of my journey into robotics, I’ve built a basic simulation environment that matches my chosen platform, the Standard Open Arm – SO101, and used it to explore a wide set of libraries and tools.
For this next stage, I wanted a simple but flexible sim-first setup where I could try different approaches, compare them side by side, and build on what I learned. Even though I had access to physical robots, I’m more of a software person — I enjoy the comfort of my home office (and the flexibility to keep working while traveling) more than spending long days in a lab.
This project brings together multiple libraries — Stable-Baselines3, imitation, lerobot — in the same environment without being tied to a single workflow, and it also helps demystify using lerobot for simulation-based learning, an area that still lacks clear examples.
What’s inside?
As of August 2025, the project includes:
Simulation Environment & Tasks
- MuJoCo-based SO101 simulation environment
- Multiple Gym task implementations (inspired by single-arm ALOHA setup)
Training integrations & scripts
- Reinforcement learning with Stable-Baselines3 (Soft Actor-Critic, which I found the most suitable algorithm for these manipulation tasks)
- Imitation learning with the `imitation` library (Behavior Cloning and other approaches)
- Training with lerobot (ACT, Diffusion, VLAs) — from simulation or real data
- Evaluation of policies trained with SAC, `imitation`, and lerobot — simulation-trained or real-robot policies (so far: simulation-only evaluation)
Control & Data Collection
- Teleoperation in simulation via keyboard or 8BitDo controller (that’s the one I had)
- Recording of simulation episodes (for imitation learning with `imitation` or `lerobot`)
- Custom dataset → `lerobot` dataset conversions and upload
Simulation Environment & Tasks
I wanted a simulation environment with a solid physics engine that I could use with a variety of methods, so I structured it as a Gym environment and chose MuJoCo as the physics engine. Given that, adapting gym-aloha (a MuJoCo-based Gym environment for a similar type of robot) was a great fit.
Why do you need a ‘gym’?
It’s not necessary for a simulation, but it is an industry standard for Reinforcement Learning.
OpenAI Gym (and its maintained fork Gymnasium) defines a standard interface for reinforcement learning environments.
The two core methods every Gym env implements are:
- `reset()` → returns the initial observation for a new episode.
- `step(action)` → applies an action, advances the simulation by one step, and returns `obs, reward, terminated, truncated, info`.
In addition, environments define:
- Observation space — what the agent sees (e.g., joint angles, object positions).
- Action space — what the agent can control (e.g., target joint positions, gripper commands).
- Reward function — the numeric feedback signal for learning.
This standardization means the same environment can be used with:
- RL libraries like Stable-Baselines3 (`SAC`, `PPO`, `HER`, …)
- Imitation learning libraries like `imitation`
- Custom training loops or evaluation pipelines
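To make the interface concrete, here is a minimal interaction loop with random actions (any registered environment works the same way; the registration itself is covered below):

```python
# Minimal Gymnasium interaction loop with a random policy.
import gymnasium as gym

import gym_so100  # registers the envs (details below)

env = gym.make("gym_so100/SO100CubeToBin-v0")
obs, info = env.reset(seed=0)            # initial observation for a new episode
episode_return = 0.0

for _ in range(1000):
    action = env.action_space.sample()   # stand-in for a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    if terminated or truncated:          # episode ended (success/failure or time limit)
        obs, info = env.reset()
        episode_return = 0.0

env.close()
```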
My gym-so100-c environment follows this interface closely, which makes it easy to switch between algorithms, compare results, and plug into tools like lerobot.
Background: ALOHA & gym-aloha
ALOHA (A Low-Cost Open-Source Hardware System for Bimanual Teleoperation) is a dual-arm teleoperation platform used in the ACT paper and follow-up work.
The open-source gym-aloha package replicates this setup in MuJoCo and provides ready-to-use Gym environments for imitation and reinforcement learning.
gym-aloha structure:
- Environment core: `env.py` — loads the MuJoCo scene, handles `reset`/`step`, defines obs/action spaces, and integrates teleop/recording.
- Task layers:
  - Joint-space tasks: `tasks/sim.py` — actions are target joint positions written to `data.ctrl`; MuJoCo’s position actuators move joints to match.
  - End-effector (mocap) tasks: `tasks/sim_end_effector.py` — actions are desired gripper poses applied to a mocap body; MuJoCo’s constraint solver adjusts joints to match.
| Aspect | Joint-space control | End-effector (mocap) control |
|---|---|---|
| Action | Target joint positions → `data.ctrl` | Target gripper pose (position + orientation) via mocap |
| Control loop | Actuators drive joints toward commanded positions | Constraints drive joints to match mocap pose |
| Abstraction | Low-level, robot-specific | Higher-level, task-centric |
| Kinematics | Direct mapping; no IK | Implicit IK via MuJoCo constraints |
| Policy learning | Exposes full dynamics | Operates in Cartesian space |
| Real-world transfer | Straightforward if HW supports pos. control | Needs IK/operational-space control on robot |
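As a rough sketch (using the scene files from my adaptation, described below), here is where each control style writes its action in MuJoCo:

```python
# Illustrative only: where each control style writes its action in MuJoCo.
import mujoco
import numpy as np

# Joint-space control: actions are target joint positions written to data.ctrl;
# position actuators then drive the joints toward those targets.
model = mujoco.MjModel.from_xml_path("assets/so100_transfer_cube.xml")
data = mujoco.MjData(model)
data.ctrl[:] = np.zeros(model.nu)        # one target per actuator
mujoco.mj_step(model, data)

# End-effector (mocap) control: actions move a mocap body; an equality/weld
# constraint pulls the arm so the gripper tracks that pose (implicit IK).
ee_model = mujoco.MjModel.from_xml_path("assets/so100_transfer_cube_ee.xml")
ee_data = mujoco.MjData(ee_model)
ee_data.mocap_pos[0] += np.array([0.0, 0.0, 0.01])   # nudge the target 1 cm up
mujoco.mj_step(ee_model, ee_data)
```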
Tasks in ALOHA
- Insertion — bimanual peg-in-hole task.
- Cube transfer — pass a cube from one arm to the other.
Each ALOHA task includes a scripted policy using inverse kinematics (IK) + noise to generate synthetic demonstrations.
This is great for imitation learning because it can quickly produce large, diverse datasets without human teleop.
My adaptation: gym-so100-c
I adapted gym-aloha for a single SO101 arm (5-DoF + gripper) to match my hardware target.
Changes:
- Replaced dual 6-DoF arms with one SO101.
- Removed dual-arm logic and simplified obs/action spaces.
- Ported only the joint-space control mode (clearer transfer to my target hardware).
- Added an experimental end-effector teleop scene (mocap-based), though not yet used for training.
Task implemented:
- Bin-a-cube:
- Cube starts at a random position on the table.
- Goal is to place it inside a fixed bin.
- Sparse reward for success, with optional shaping for approach, grasp, and alignment.
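To illustrate the shaping idea, a sketch along these lines (the site names, weights, and success bonus are illustrative assumptions, not the environment's actual reward code):

```python
# Hedged sketch of a shaped reward for the bin-a-cube task.
import numpy as np

def shaped_reward(data, success: bool) -> float:
    tip = data.site("gripper_tip").xpos   # assumed site names; see the Sites bullet below
    cube = data.site("cube").xpos
    goal = data.site("bin_goal").xpos

    reach = -np.linalg.norm(tip - cube)   # approach term: gripper close to the cube
    place = -np.linalg.norm(cube - goal)  # placement term: cube close to the bin
    bonus = 10.0 if success else 0.0      # sparse success term

    return 0.5 * reach + 1.0 * place + bonus
```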
MuJoCo environment
The environment is defined in assets/so100_transfer_cube.xml:
- Robot: SO101 single arm (5-DoF + gripper).
- Actuators: position actuators, one per joint; actions = target joint positions.
- Workspace: table, free-moving cube (red), goal bin (gray).
- Sites: for gripper tip, cube, and bin goal — used for shaping and success checks.
- Cameras: optional top/front views for logging or vision policies.
Environment registration
I register my tasks in gym_so100/__init__.py so they can be created with gym.make().
Because the register() calls live in __init__.py, you need to import gym_so100 at least once before using them.
```python
from gymnasium.envs.registration import register

register(
    id="gym_so100/SO100TouchCube-v0",
    entry_point="gym_so100.env:SO100Env",
    max_episode_steps=300,
    nondeterministic=True,
    kwargs={"obs_type": "so100_pixels_agent_pos", "task": "so100_touch_cube"},
)

register(
    id="gym_so100/SO100TouchCubeSparse-v0",
    entry_point="gym_so100.env:SO100Env",
    max_episode_steps=300,
    nondeterministic=True,
    kwargs={"obs_type": "so100_pixels_agent_pos", "task": "so100_touch_cube_sparse"},
)

register(
    id="gym_so100/SO100CubeToBin-v0",
    entry_point="gym_so100.env:SO100Env",
    max_episode_steps=700,
    nondeterministic=True,
    kwargs={"obs_type": "so100_pixels_agent_pos", "task": "so100_cube_to_bin"},
)
```
Once the package is installed:
```python
import gym_so100  # triggers env registration
import gymnasium as gym

env = gym.make("gym_so100/SO100CubeToBin-v0")
obs, info = env.reset()
```
Learning & Training Workflows
This project uses a mix of offline imitation learning and online reinforcement learning, built on top of the imitation library, Stable-Baselines3, and Hugging Face’s LeRobot.
Scripts I provide
- **`scripts/train_sac.py`** — my SAC training driver (parallel envs, callbacks, optional checkpoint resume, optional VecNormalize).
  Example I actually ran:

  ```bash
  python scripts/train_sac.py \
    --prefix bc_to_bin_2 \
    --task=gym_so100/SO100CubeToBin-v0 \
    --checkpoint outputs/checkpoints/bc_to_bin_simple_80000_steps.zip
  ```
- **`scripts/train_bc.py`** — Behavior Cloning from teleop demonstrations (with optional continuation using SAC).
  Example I actually ran:

  ```bash
  python scripts/train_bc.py \
    --demonstrations expert_demonstrations.pkl \
    --continue_with_sac \
    --sac_timesteps 50000
  ```
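For orientation, a hedged sketch of what BC with the `imitation` library looks like at the API level (not the actual `train_bc.py`; it assumes the pickled demos are already an `imitation` `Transitions` object):

```python
# Hedged sketch: Behavior Cloning with the `imitation` library.
import pickle

import gymnasium as gym
import numpy as np
from imitation.algorithms import bc

import gym_so100  # registers the envs

env = gym.make("gym_so100/SO100CubeToBin-v0")

with open("expert_demonstrations.pkl", "rb") as f:
    transitions = pickle.load(f)  # assumed: imitation.data.types.Transitions

bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=transitions,
    rng=np.random.default_rng(0),
    # for dict (pixel + state) observations, a suitable `policy=` would be passed here
)
bc_trainer.train(n_epochs=10)
bc_trainer.policy.save("outputs/bc_policy.pt")  # SB3-style policy; reusable as a SAC warm start
```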
What I tried & what I learned
- SAC (dense) vs SAC + HER (sparse): SAC with dense rewards performed better for me. HER made training much slower and didn’t beat dense SAC on my cube-to-bin task.
- Stability: SAC sometimes diverged or plateaued. Larger buffers/batches and careful entropy scheduling helped, but it wasn’t perfect.
- Observation resolution: I trained for a while on 64×48 inputs (fast!) but results were disappointing. I only noticed once I inspected the actual tensors going into the model (my rendered videos looked fine). Good reminder to verify inputs early.
- VecNormalize: Mixed feelings. It can help, but it also couples env stats with the model and adds setup complexity. I’m leaning toward doing normalization inside the model rather than via an env wrapper.
- Parallelism: Stable-Baselines3’s vectorized envs were genuinely useful for throughput and a bit of stability. Performance caveat: on macOS, with MuJoCo running on CPU, vectorized environments were surprisingly fast; on Colab, the same setup was painfully slow, likely due to weaker single-thread CPU performance and different MuJoCo builds.
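To make the parallelism and VecNormalize points concrete, a minimal Stable-Baselines3 setup might look like this (a sketch with assumed hyperparameters; the real `scripts/train_sac.py` adds callbacks, checkpoint resume, and more):

```python
# Hedged sketch: SAC with parallel envs and optional reward normalization.
import gymnasium as gym
from stable_baselines3 import SAC
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv, VecNormalize


def make_env():
    import gym_so100  # registration must happen inside each worker process
    return gym.make("gym_so100/SO100CubeToBin-v0")


if __name__ == "__main__":  # required because SubprocVecEnv spawns worker processes
    vec_env = make_vec_env(make_env, n_envs=4, vec_env_cls=SubprocVecEnv)
    vec_env = VecNormalize(vec_env, norm_obs=False, norm_reward=True)  # optional; see caveat above

    model = SAC(
        "MultiInputPolicy",   # dict observations: pixels + agent_pos
        vec_env,
        buffer_size=50_000,   # image observations make the replay buffer memory-hungry
        batch_size=256,
        verbose=1,
    )
    model.learn(total_timesteps=100_000)
    model.save("outputs/sac_cube_to_bin")
```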
LeRobot (dataset-first)
For LeRobot, I:
- Recorded my own demonstrations in this sim.
- Converted them to LeRobot’s dataset format with my conversion script.
- Trained using LeRobot’s official training script/documentation.
Once the dataset was in the right format, training was straightforward. (I’m intentionally not adding any CLI here; see LeRobot’s docs for the exact commands to use.)
Control & Data Collection
While much of my training relied on scripted or learned policies, I also spent time building tools to manually control the SO100 in simulation — both to understand the robot’s kinematics and to produce demonstration data when needed.
This section covers:
- Joint-space teleop (`scripts/teleop.py`)
- End-effector teleop (`scripts/teleop_ee.py`)
- Gamepad input framework (`scripts/input_controller.py`)
- Recording teleop demonstrations (lerobot integration)
1. Joint-space teleop — scripts/teleop.py
Purpose:
A minimal MuJoCo viewer loop where the keyboard directly updates desired joint positions, which are then written to data.ctrl via the unnormalize_so100() helper.
Scene & setup:
- XML: `assets/so100_transfer_cube.xml`
- Uses `SO100_START_ARM_POSE` from `gym_so100.constants` for initialization.
Control mapping (key → joint delta):
- `← / →` → base rotation
- `↑ / ↓` → shoulder
- `+ / -` → elbow
- `V / B` → wrist pitch
- `G / H` → wrist rotation
- `5 / 6` → gripper open/close
Flow:
```python
import numpy as np

from gym_so100.constants import SO100_START_ARM_POSE
# normalize_so100 / unnormalize_so100 are the project's joint-range helpers

pose = normalize_so100(SO100_START_ARM_POSE)  # start from the home pose (normalized)
# modify pose[...] based on the key pressed (see the mapping above)
env_action = unnormalize_so100(pose)          # convert back to raw joint targets
np.copyto(data.ctrl, env_action)              # position actuators track these targets
```
This mode is direct and predictable, but only works in the robot’s joint space — no Cartesian constraints.
2. End-effector teleop — scripts/teleop_ee.py
Purpose: Control the gripper in Cartesian space by moving a mocap body; MuJoCo’s constraints drive the joints to follow. This is purely for exploration — no demonstration recording.
Scene & setup:
- XML: `assets/so100_transfer_cube_ee.xml`
- Mocap body index: `MOCAP_INDEX = 0`
Translation controls:
- `↑ / ↓` → `mocap_pos[0, 2]` ± 0.01
- `← / →` → `mocap_pos[0, 0]` ∓ 0.01
- `+ / -` → `mocap_pos[0, 1]` ± 0.01
Rotation controls (using pyquaternion):
- `Q / A` → ±10° around X-axis
- `W / S` → ±10° around Y-axis
- `E / D` → ±10° around Z-axis
Gripper (still joint-space):
- `5 / 6` → `data.ctrl[5]` ± 0.05
This approach is closer to pose-space control and could be paired with IK for scripted demos. Keyboard is awkward for 6-DoF — VR or tracked-hand controllers would make this far more intuitive.
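For reference, a condensed sketch of the update a single key press triggers (step sizes and the mocap index as listed above; the viewer loop and key handling are omitted):

```python
# Hedged sketch: one keyboard "tick" of the end-effector teleop.
import mujoco
import numpy as np
from pyquaternion import Quaternion

MOCAP_INDEX = 0
model = mujoco.MjModel.from_xml_path("assets/so100_transfer_cube_ee.xml")
data = mujoco.MjData(model)

# Translation: e.g. the Up arrow raises the target by 1 cm.
data.mocap_pos[MOCAP_INDEX, 2] += 0.01

# Rotation: e.g. Q applies +10 degrees around the X-axis.
# Both MuJoCo and pyquaternion store quaternions as (w, x, y, z).
current = Quaternion(*data.mocap_quat[MOCAP_INDEX])
data.mocap_quat[MOCAP_INDEX] = (Quaternion(axis=[1, 0, 0], angle=np.radians(10)) * current).elements

# Gripper stays in joint space.
data.ctrl[5] += 0.05

mujoco.mj_step(model, data)  # the constraint solver drags the joints toward the new pose
```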
3. Gamepad input framework — scripts/input_controller.py
To make control smoother, I implemented an input abstraction:
- `InputController` — base class for controllers that output motion deltas (dx, dy, dz), gripper commands, and episode control flags.
- `GamepadControllerHID` — HIDAPI-based driver supporting Logitech, Xbox, PS4/PS5, and 8BitDo devices.
  - Reads joystick axes → motion deltas (with deadzone filtering)
  - Button mapping for gripper, intervention, and episode end status
  - Example: `get_deltas()` returns `(delta_x, delta_y, delta_z)` in meters.
While not yet fully integrated into a recording pipeline, this makes it possible to swap keyboard for analog input without changing teleop logic.
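To illustrate the deadzone filtering, a small sketch (not the repo's actual implementation; the constants are assumptions):

```python
# Hedged sketch: mapping a joystick axis in [-1, 1] to a position delta in meters.
DEADZONE = 0.1    # ignore small stick noise around center
MAX_SPEED = 0.05  # full deflection moves the target 5 cm per second (assumed)


def axis_to_delta(axis_value: float, dt: float) -> float:
    """Return a signed position delta for one control tick of length dt seconds."""
    if abs(axis_value) < DEADZONE:
        return 0.0
    # Rescale so motion ramps up smoothly from the deadzone edge instead of jumping.
    magnitude = (abs(axis_value) - DEADZONE) / (1.0 - DEADZONE)
    return magnitude * MAX_SPEED * dt * (1.0 if axis_value > 0 else -1.0)
```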
4. Recording teleop demonstrations
For lerobot imitation learning, I recorded my own teleop demonstrations and then converted them into the lerobot dataset format.
Once in that format, I could use lerobot’s official training scripts directly.
This gave me a practical way to bootstrap policies without relying solely on scripted IK demos.
Summary: These tools let me explore the SO100 in both joint and end-effector spaces, experiment with input devices, and collect custom datasets for imitation learning. Even without recording, the teleop scripts were valuable for debugging the scene setup, reward shaping, and understanding the difference between action spaces in MuJoCo.
Integration & Evaluation
This stage connects collected demonstrations to the training and evaluation pipeline in LeRobot. Two scripts are involved:

- `upload_lerobot_demos.py` — convert demonstrations and upload them as a dataset to the Hugging Face Hub
- `evaluate_lerobot_policy.py` — run policy rollouts, measure total reward, and save videos
1. Uploading Expert Demonstrations — scripts/upload_lerobot_demos.py
This script converts locally recorded expert demonstrations into the LeRobot dataset format and optionally pushes them to the Hub.
Key steps:
- Load demonstrations from a `.pkl` file (recorded via teleop or scripted policies).
- Create a LeRobot dataset with a consistent `features` schema:
  - `observation.images.top` → RGB camera frames `(3, 480, 640)`
  - `observation.state` → robot joint positions `(6,)`
  - `action` → matching control commands `(6,)`
  - `next.reward`, `next.success`, `seed`, `timestamp`
- Iterate through episodes:
  - Squeeze batch dimensions where needed (e.g., `(1, 3, H, W)` → `(3, H, W)`).
  - Convert normalized floats to `uint8` for pixel data if necessary.
  - Track and log total reward per episode.
- Save locally and push to the Hugging Face Hub under `user_id/repo_id`.
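Roughly, the `features` schema above can be expressed as a dict like this (dtype strings and shape/name conventions depend on the lerobot version, so treat it as a sketch):

```python
# Hedged sketch of the dataset schema described above.
features = {
    "observation.images.top": {"dtype": "video", "shape": (3, 480, 640), "names": ["channels", "height", "width"]},
    "observation.state":      {"dtype": "float32", "shape": (6,), "names": None},
    "action":                 {"dtype": "float32", "shape": (6,), "names": None},
    "next.reward":            {"dtype": "float32", "shape": (1,), "names": None},
    "next.success":           {"dtype": "bool",    "shape": (1,), "names": None},
    "seed":                   {"dtype": "int64",   "shape": (1,), "names": None},
    "timestamp":              {"dtype": "float32", "shape": (1,), "names": None},
}
```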
Example:
```bash
python scripts/upload_lerobot_demos.py \
  --demonstrations expert_demonstrations.pkl \
  --env-name gym_so100/SO100CubeToBin-v0 \
  --user-id myhfusername \
  --repo-id cube_to_bin_demos \
  --root ./dataset
```
Why this matters: Once in LeRobot’s dataset format, your demonstrations can be used with any compatible policy without further preprocessing. This also makes sharing datasets trivial via the Hub.
2. Policy Evaluation and Video Recording — scripts/evaluate_lerobot_policy.py
Evaluates a trained LeRobot policy in a gym_so100 environment, recording both video rollouts and numeric reward metrics.
Supported policy types:
- `act`
- `diffusion`
- `pi0fast`
- `smolVLA`
Flow:
- Load the policy from a local path or the Hub.
- Verify I/O — prints the expected features from the policy config and the env spaces.
- Run episodes:
  - Formats the observation dict for policy input.
  - Calls `policy.select_action(...)`.
  - Steps through the env, collecting:
    - Per-step reward
    - Rendered frames
    - Accumulated total reward
- Report results:
  - `best_reward` and `average_reward` across all episodes.
  - Success/failure per episode.
- Save the video to MP4.
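The core of the rollout looks roughly like this (observation keys and tensor packing are assumptions; the actual script handles more policy types and edge cases):

```python
# Hedged sketch: one evaluation episode with a lerobot policy.
import torch


def run_episode(env, policy, device="cpu"):
    obs, info = env.reset()
    policy.reset()                              # clear the policy's internal action queue
    total_reward, frames, done = 0.0, [], False

    while not done:
        # Pack the gym observation into the batched dict format lerobot policies expect.
        state = torch.from_numpy(obs["agent_pos"]).float().unsqueeze(0).to(device)
        image = torch.from_numpy(obs["pixels"]["top"]).float().permute(2, 0, 1) / 255.0
        batch = {
            "observation.state": state,
            "observation.images.top": image.unsqueeze(0).to(device),
        }

        with torch.no_grad():
            action = policy.select_action(batch)

        obs, reward, terminated, truncated, info = env.step(action.squeeze(0).cpu().numpy())
        total_reward += float(reward)
        frames.append(env.render())             # assumes render_mode="rgb_array"; frames go to the MP4
        done = terminated or truncated

    return total_reward, frames
```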
Example:
```bash
python scripts/evaluate_lerobot_policy.py \
  --policy_type act \
  --policy_path myhfusername/cube_to_bin_policy \
  --num_episodes 5 \
  --normalize True \
  --video_output_dir ./videos
```
Why this matters: LeRobot’s built-in examples focus on qualitative outputs (videos). This script adds quantitative performance metrics — letting you track improvement, compare policies, or detect regressions.
Workflow summary:
```
(1) Record demos → (2) upload_lerobot_demos.py → Hub dataset
                                ↓
                      Train policy (LeRobot)
                                ↓
      (3) evaluate_lerobot_policy.py → video + reward metrics
```
Credits & Acknowledgements
Main influences
- lerobot — Hugging Face’s open-source robot learning platform, which inspired the dataset conversion and training pipeline integrations.
- Physical Intelligence Group — for their research on robotic manipulation and control, which shaped my choice of tasks and algorithms.
- CS285 (UC Berkeley Deep Reinforcement Learning) — a key learning resource that informed my reinforcement learning experiments.
Reference repositories
- Gym Aloha — introduced with the ACT paper; served as a design reference for my Gym task implementations.
- gym-so100 — port of SO100 to the Gym Aloha structure, used for understanding task setup.
- MuJoCo Menagerie: SO-ARM100 — official MuJoCo models of the SO101 arm, used as the base for my simulation environment.