# The Bipedal Skills Benchmark
The bipedal skills benchmark is a suite of reinforcement learning
environments implemented for the MuJoCo physics simulator. It aims to provide a
set of tasks that demand a variety of motor skills beyond locomotion, and is
intended for evaluating skill discovery and hierarchical learning methods. The
majority of tasks exhibit a sparse reward structure.
![Tasks Overview](https://raw.githubusercontent.com/facebookresearch/bipedal-skills/main/img/tasks.png)
This benchmark was introduced in [Hierarchical Skills for Efficient Exploration](https://facebookresearch.github.io/hsd3).
## Usage
In order to run the environments, a working MuJoCo setup (version 2.0 or higher) is required. You
can follow the respective [installation steps of
for that.
Afterwards, install the Python package with pip:
```sh
pip install bipedal-skills
```
To install the package from a working copy, do:
```sh
pip install .
```
All tasks are exposed and registered as Gym environments once the `bisk` module
is imported:
```python
import gym
import bisk

env = gym.make('BiskHurdles-v1', robot='Walker')
# Alternatively
env = gym.make('BiskHurdlesWalker-v1')
```
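The two calls above suggest that a robot-specific environment ID is formed by appending the robot name to the task name. The following is a minimal sketch of that naming pattern, inferred only from the two IDs shown here. `bisk_env_id` is a hypothetical helper and not part of the `bisk` package, and the sketch makes no claim about which task and robot combinations are actually registered.

```python
from typing import Optional


def bisk_env_id(task: str, robot: Optional[str] = None, version: int = 1) -> str:
    """Build an ID such as 'BiskHurdlesWalker-v1' (hypothetical helper).

    Assumes the pattern 'Bisk' + task + robot + '-v' + version, as inferred
    from the two example IDs in this README.
    """
    return f"Bisk{task}{robot or ''}-v{version}"


# Generic ID, to be combined with the `robot` keyword argument of gym.make
print(bisk_env_id("Hurdles"))
# Robot-specific ID
print(bisk_env_id("Hurdles", "Walker"))
```

The resulting string can then be passed to `gym.make` as in the snippet above.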
A detailed description of the tasks can be found in the [corresponding
## Evaluation Protocol
For evaluating agents, we recommend estimating returns on 50 environment
instances with distinct seeds.
This can be achieved in sequence or by using one of Gym's vector wrappers:
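```python
import gym
import bisk  # registers the Bisk environments

# Sequential evaluation
env = gym.make('BiskHurdlesWalker-v1')
retrns = []
for i in range(50):
    obs, _ = env.reset(seed=i)
    retrn = 0.0
    while True:
        # Retrieve `action` from agent; random actions are a stand-in here
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        retrn += reward
        if terminated or truncated:
            # End of episode: record the return and move on to the next seed
            retrns.append(retrn)
            break
print(f'Average return: {sum(retrns)/len(retrns)}')
```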
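Since the protocol averages over a finite number of seeds, it can be useful to report the spread of the estimate along with the mean. The sketch below is one possible way to do this with the standard library. The `summarize_returns` helper and the standard-error computation are illustrative additions rather than part of the benchmark, and the numbers are made up.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarize_returns(returns: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) over per-seed returns."""
    mean = statistics.fmean(returns)
    if len(returns) < 2:
        # The sample standard deviation is undefined for a single value.
        return mean, 0.0
    stderr = statistics.stdev(returns) / math.sqrt(len(returns))
    return mean, stderr


# Made-up returns for illustration only (not benchmark results)
example_returns = [0.0, 1.0, 2.0, 1.0]
mean, stderr = summarize_returns(example_returns)
print(f'Average return: {mean:.2f} +/- {stderr:.2f}')
```

In an actual evaluation, the list of per-seed returns collected by the loop would take the place of `example_returns`.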
```python
# Batched evaluation
import gym
import bisk  # registers the Bisk environments
import numpy as np
from gym.vector import SyncVectorEnv

n = 50
env = SyncVectorEnv([lambda: gym.make('BiskHurdlesWalker-v1')] * n)
retrns = np.zeros(n)
dones = np.zeros(n, dtype=bool)
obs, _ = env.reset(seed=0)
while not dones.all():
    # Retrieve `action` from agent; random actions are a stand-in here
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # Count rewards only for environments that have not finished an episode yet
    retrns += reward * np.logical_not(dones)
    dones |= (terminated | truncated)
print(f'Average return: {retrns.mean()}')
```
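The batched loop keeps stepping all environments until every one of them has finished an episode, so rewards from environments that are already done must not be counted again. The following sketch isolates that masking logic with NumPy only. The reward and episode-end arrays are hypothetical values chosen for illustration, and no simulator is involved.

```python
import numpy as np

# Hypothetical per-step data for 3 environments, shape (steps, envs)
rewards = np.ones((3, 3))
ends = np.array([
    [False, True, False],   # environment 1 finishes after step 1
    [True, False, False],   # environment 0 finishes after step 2
    [False, False, True],   # environment 2 finishes after step 3
])

returns = np.zeros(3)
dones = np.zeros(3, dtype=bool)
for reward, end in zip(rewards, ends):
    # Only environments whose first episode is still running accumulate reward
    returns += reward * np.logical_not(dones)
    # Once an environment has finished, it stays marked as done
    dones |= end

print(returns)
```

Each environment contributes exactly the rewards of its first episode, which keeps the batched estimate comparable to the sequential one under the assumptions stated here.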
## License
The bipedal skills benchmark is MIT licensed, as found in the LICENSE file.
Model definitions have been adapted from:
- [Gym](https://github.com/openai/gym) (HalfCheetah)
- [dm_control](https://github.com/deepmind/dm_control/) (Walker, Humanoid)