Creating a Robotics Experimentation Environment: My Experience and Practical Lessons
Part 1 of a series on practical robotics experimentation
As part of my journey into robotics, I found myself facing a classic problem: I wanted to experiment with different learning approaches for robotic manipulation, but I needed a flexible playground where I could quickly test ideas without being locked into any single framework or workflow.
The result is gym-so100-c, a simulation environment built around the Standard Open Arm SO101 that bridges multiple machine learning libraries—Stable-Baselines3, the imitation library, and Hugging Face’s LeRobot—all in one cohesive environment.
The “Sim-First” Decision
Even though I had access to physical robots, I chose a sim-first approach for a simple reason: I’m more of a software person who enjoys the comfort of my home office and the flexibility to keep working while traveling, rather than spending long days in a lab.
This decision shaped everything about the project. I needed a simulation that was:
- Physically realistic enough to eventually transfer to hardware
- Fast enough for thousands of training episodes
- Flexible enough to work with different learning paradigms
- Simple enough that I could understand and modify every component
Why Build Another Gym Environment?
You might wonder: why not just use an existing simulation? I wanted something that was both flexible and a faithful simulated match for my hardware platform. The robotics world offers many compelling options.
The Simulation Landscape I Considered:
- Isaac Sim was tempting—NVIDIA’s powerhouse with photorealistic rendering and advanced physics. But it requires a proper GPU setup, and I wanted to start experimenting immediately rather than waiting for hardware upgrades.
- ManiSkill is an exciting newer option with great task diversity and modern ML integration. I’m actually quite excited to try this next—it seems to hit the sweet spot of realism and ease of use.
- Gazebo/ROS represents the traditional robotics stack: mature, well-supported, with endless plugins. But the learning curve felt steep for someone coming from a pure ML background, and I wanted to focus on learning algorithms rather than robotics middleware.
- PyBullet is similar to MuJoCo and is also popular for RL environments.
Why I Chose MuJoCo + gym-aloha: The answer came down to immediate productivity. I could adapt gym-aloha and start experimenting within days, not weeks. It’s proven in the papers I was trying to replicate, has excellent contact modeling for manipulation, and enjoys a large ecosystem of compatible tools.
The Gym Interface Standard: OpenAI Gym (now Gymnasium) defines a standard interface that every reinforcement learning environment implements:
obs, info = env.reset() # Start a new episode
obs, reward, terminated, truncated, info = env.step(action) # Take an action
This simple interface is incredibly powerful because it means the same environment can work with:
- RL libraries like Stable-Baselines3 (SAC, PPO, HER…)
- Imitation learning libraries like imitation
- Custom training loops or evaluation pipelines
- Any future framework that follows the standard
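To make this concrete, here is a minimal sketch (using a built-in Gymnasium task as a stand-in for my environment) of the same env object feeding both an off-the-shelf RL library and a hand-rolled evaluation loop:
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")  # stand-in; any Gymnasium env with a continuous action space works

# 1) Hand the env to an off-the-shelf RL library...
model = SAC("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=1_000)

# 2) ...or drive the same env from a custom evaluation loop.
obs, info = env.reset(seed=0)
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()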
The Evolution Plan: This environment is just the beginning. As I move toward more realistic scenarios, I’ll likely migrate to Isaac Sim or ManiSkill. But for rapid prototyping and algorithm comparison, this MuJoCo setup has been perfect.
Key insight: There are many good options. If your requirements are not very specific, pick something popular that already does something close to what you need and adapt it quickly.
Standing on the Shoulders of ALOHA
Rather than building from scratch, I adapted the gym-aloha project, which implements the dual-arm ALOHA platform used in several influential imitation learning papers.
The ALOHA Foundation: ALOHA (A Low-cost Open-source Hardware system for bimanual teleoperation) proved that effective manipulation learning was possible with relatively simple hardware. The gym-aloha implementation provided:
- MuJoCo physics foundation
- Well-designed observation and action spaces
My Adaptation: I modified gym-aloha for a single SO101 arm (5-DOF + gripper) to match my hardware target:
# Simple registration example
from gymnasium.envs.registration import register

register(
    id="gym_so100/SO100CubeToBin-v0",
    entry_point="gym_so100.env:SO100Env",
    max_episode_steps=700,
    nondeterministic=True,
    kwargs={"obs_type": "so100_pixels_agent_pos", "task": "so100_cube_to_bin"},
)
Once registered, creating the environment is straightforward:
import gym_so100 # triggers env registration
import gymnasium as gym
env = gym.make("gym_so100/SO100CubeToBin-v0")
obs, info = env.reset()
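A quick sanity check is to roll out random actions and confirm the observation and action spaces behave as expected (a minimal sketch; the spaces and episode length come from the registration above):
for _ in range(10):
    action = env.action_space.sample()  # random target joint positions
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()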
The Task: Cube-to-Bin
I focused on one fundamental manipulation task: cube-to-bin. A red cube starts at a random position on the table, and the goal is to place it inside a fixed gray bin.
This task is deceptively simple but covers the core challenges of manipulation:
- Perception: Locating the cube and understanding spatial relationships
- Planning: Approaching the cube from a graspable angle
- Control: Executing smooth, coordinated motion
- Manipulation: Grasping, lifting, and precise placement
The MuJoCo scene includes:
- Robot: SO101 single arm with position-controlled actuators
- Workspace: Table, free-moving cube, and goal bin
- Sensors: Joint positions, gripper state, and camera views
- Sites: Tracking points for reward computation and success detection
Control Paradigms: Joint vs. End-Effector Space
Following gym-aloha’s design, I implemented joint-space control where actions directly specify target joint positions:
| Aspect | Joint-space control |
|---|---|
| Action | Target joint positions → data.ctrl |
| Control loop | Actuators drive joints toward commanded positions |
| Learning | Policy learns in robot’s natural DOF |
| Transfer | Direct mapping to real hardware |
I chose joint-space over end-effector control for cleaner transfer to my target hardware. While end-effector control (where you specify gripper poses and let MuJoCo’s constraints solve for joint angles) can be more intuitive, it adds complexity that I wanted to avoid initially.
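In MuJoCo terms, joint-space control boils down to writing the commanded targets into data.ctrl and stepping the physics. Here is a rough sketch, where the model path and substep count are illustrative rather than taken from the repo:
import mujoco
import numpy as np

# Illustrative: load an MJCF scene with position-controlled actuators.
model = mujoco.MjModel.from_xml_path("scene.xml")
data = mujoco.MjData(model)

def step_joint_space(action: np.ndarray, n_substeps: int = 20) -> np.ndarray:
    """Command target joint positions and advance the simulation."""
    data.ctrl[:] = action                # position actuators track these targets
    for _ in range(n_substeps):
        mujoco.mj_step(model, data)      # advance physics toward the targets
    return data.qpos.copy()              # resulting joint positions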
What’s Inside the Environment
As of August 2025, gym-so100-c includes:
Simulation Environment & Tasks:
- MuJoCo-based SO101 simulation environment
- Cube-to-bin task with configurable reward shaping
Training Integration Scripts:
- Reinforcement learning with Stable-Baselines3 (SAC)
- Imitation learning with the imitation library
- Training integration with LeRobot (ACT, Diffusion, VLAs)
Control & Data Collection:
- Teleoperation via keyboard or gamepad
- Episode recording for demonstration datasets
- Dataset conversion to LeRobot format
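As a flavor of the data-collection side, episode recording in its simplest form just logs observation/action pairs while you drive the arm. Here is a stripped-down sketch, assuming the observation dict exposes an agent_pos key as in gym-aloha (teleop input is faked with random actions, and the LeRobot conversion is a separate step not shown):
import numpy as np
import gymnasium as gym
import gym_so100  # noqa: F401  (registers the envs)

env = gym.make("gym_so100/SO100CubeToBin-v0")
obs, info = env.reset()

agent_pos, actions = [], []
for _ in range(100):
    action = env.action_space.sample()   # stand-in for a real teleop command
    agent_pos.append(obs["agent_pos"])   # proprioceptive part of the observation
    actions.append(action)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        break

np.savez("episode_000.npz", agent_pos=np.array(agent_pos), actions=np.array(actions))
env.close()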
Early Design Lessons
Physics Engine Choice:
MuJoCo was the obvious choice for its speed, stability, and excellent contact modeling. It runs well on CPU, which made it perfect for my MacBook.
I’m excited about Isaac Sim, but I don’t have a powerful GPU yet, and I’m open to trying other engines in the future.
Observation Space Design:
I experimented with different observation combinations:
- pixels_agent_pos: Camera images + joint positions
- agent_pos: Joint positions only (for faster training)
- pixels: Camera images only (for vision-based policies)
The mixed approach (pixels_agent_pos) worked best, giving policies both rich visual information and precise proprioceptive feedback.
This is also what the ACT paper and most state-of-the-art approaches use.
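For reference, a pixels_agent_pos observation space expressed in Gymnasium terms looks roughly like this (the image resolution and joint count are illustrative):
import numpy as np
from gymnasium import spaces

# Illustrative: one 480x640 RGB camera plus 6 joint readings (5 DOF + gripper).
obs_space = spaces.Dict({
    "pixels": spaces.Box(low=0, high=255, shape=(480, 640, 3), dtype=np.uint8),
    "agent_pos": spaces.Box(low=-np.inf, high=np.inf, shape=(6,), dtype=np.float64),
})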
Reward Engineering:
In reinforcement learning, an agent collects rewards by interacting with the environment and tries to maximize them. The trick is to design rewards so that the agent does what you want and can actually learn from them. Shaping the reward structure (designing the incentives) is called reward engineering, and it’s surprisingly tricky to get right!
For example, if you reward getting close to the block too heavily, the agent might choose to hover over it and never grab it. If you give no reward until the task is complete and the task is too hard, there is no feedback to learn from.
I implemented both sparse rewards (success/failure only) and dense rewards with approach shaping. Dense rewards proved much more reliable for learning, though they required more careful tuning.
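To illustrate what dense reward with approach shaping can look like, here is a hedged sketch built on MuJoCo’s named site access; the site names and the success threshold are hypothetical, not the exact shaping used in the repo:
import numpy as np

def dense_reward(data, success_threshold=0.03):
    """Negative cube-to-bin distance plus a bonus on success (site names are illustrative)."""
    cube = data.site("cube_site").xpos   # hypothetical site on the cube
    bin_ = data.site("bin_site").xpos    # hypothetical site at the bin center
    dist = float(np.linalg.norm(cube - bin_))
    reward = -dist                       # approach shaping: closer is better
    if dist < success_threshold:         # sparse success bonus on top
        reward += 10.0
    return reward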
I also implemented HER (Hindsight Experience Replay), which sidesteps manual reward shaping by relabeling goals in past trajectories so the agent learns from what it actually achieved, but the results were mixed in my setup.
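For context, attaching HER in Stable-Baselines3 is mostly a matter of swapping in its replay buffer. A sketch, assuming a goal-conditioned version of the env exists (make_goal_conditioned_env is a hypothetical placeholder, and the env must expose achieved_goal/desired_goal keys plus a compute_reward method):
from stable_baselines3 import SAC, HerReplayBuffer

env = make_goal_conditioned_env()  # hypothetical: wraps the task as a goal-conditioned env
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=100_000)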
The Integration Challenge
The real value of this environment isn’t just the simulation — it’s the integration layer that lets me seamlessly move between different learning approaches. In the next post, I’ll dive into what I learned from training experiments with SAC, behavior cloning, and modern imitation learning methods.
But the foundation was crucial: having a single environment that could work with multiple learning paradigms, generate consistent datasets, and provide reliable evaluation metrics. Sometimes the unglamorous infrastructure work is what makes everything else possible.
Next up: Training experiments and what I learned about the practical differences between reinforcement learning and imitation learning approaches.
What’s your experience with simulation environments for robotics? Have you found certain design decisions that made experimentation much easier or harder? I’d love to hear about it.