This repository contains the code for the paper "Active Vision Might Be All You Need: Exploring Active Vision in Bimanual Robotic Manipulation". You can visit the Project Page and check out the arXiv paper.
AV-ALOHA builds upon the ALOHA 2 system and introduces active vision for bimanual robotic manipulation. This repository includes:
- Teleoperation and data collection
- Training models with LeRobot
- Evaluation on both simulated and real-world AV-ALOHA setups
For the VR teleoperation and stereo camera passthrough functionality, refer to the Unity App Repo.
Note: The code is under active development, and a more organized codebase will be available in future updates.
AV-ALOHA extends ALOHA 2 by adding another ViperX 300 S robot arm. To install the additional arm, we used two 840mm 2020 extrusions with four L-brackets. The ZED Mini serves as the active vision camera, attached using custom 3D-printed parts available in `assets/3D_printed_parts`.
To set up the software:

- Install ROS Noetic and follow the ALOHA Setup Instructions for software and hardware setup, excluding their repo.
- Bind the active vision robot arm to `/dev/ttyDXL_puppet_middle` (see the udev rule sketch after this list).
- Clone this repository and build the ROS packages:

  ```bash
  cd ~/interbotix_ws/src
  git clone https://github.com/Soltanilara/av-aloha
  cd av-aloha
  git submodule init
  git submodule update
  # build ROS packages
  cd ~/interbotix_ws
  catkin_make
  ```
- Set up the Conda environment:

  ```bash
  conda create -y -n lerobot python=3.10
  conda activate lerobot
  conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
  ```
- Install the ZED Python API by following these instructions.
- Install additional dependencies:

  ```bash
  pip install -e gym_guided_vision
  pip install -e lerobot
  pip install -r requirements.txt
  ```
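ALOHA binds each arm's U2D2 serial adapter to a fixed device name with udev symlink rules. Below is a sketch of such a rule for the active vision arm; the serial number is a placeholder you must replace with your own adapter's, and the FTDI vendor/product IDs are an assumption about the U2D2 hardware:

```bash
# /etc/udev/rules.d/99-interbotix-udev.rules (sketch; replace the serial placeholder)
# Find your adapter's serial with: udevadm info -a -n /dev/ttyUSB0 | grep '{serial}'
SUBSYSTEM=="tty", ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6014", ATTRS{serial}=="FT89XXXX", SYMLINK+="ttyDXL_puppet_middle"
```

Then reload the rules so the symlink appears:

```bash
sudo udevadm control --reload && sudo udevadm trigger
```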
To enable WebRTC teleoperation through Firebase:

- Create a Firebase project and set up a Firestore database in the Firebase Console.
- In your Firestore database, set the rules as follows:

  ```
  rules_version = '2';
  service cloud.firestore {
    match /databases/{database}/documents {
      match /<your_password_for_webrtc>/{document=**} {
        allow read, write: if true;
      }
    }
  }
  ```
- In Project Settings -> Service Accounts, generate a new private key, name it `serviceAccountKey.json`, and place the file in the `data_collection_scripts` directory.
- Create a file named `signalingSettings.json` in `data_collection_scripts` and paste the following:

  ```json
  {
      "robotID": "<robot id for your robot (e.g. robot_1)>",
      "password": "<your password same as in firestore rules>",
      "turn_server_url": "<turn url>",
      "turn_server_username": "<turn username>",
      "turn_server_password": "<turn password>"
  }
  ```
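For reference, here is a minimal sketch of how a script could use these two files to exchange signaling messages through Firestore. It assumes the `firebase_admin` package; the collection layout and the `status` payload are illustrative, not the repo's actual schema:

```python
import json

import firebase_admin
from firebase_admin import credentials, firestore

# Authenticate with the service account key generated above
cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred)
db = firestore.client()

with open("signalingSettings.json") as f:
    settings = json.load(f)

# The Firestore rules above only open the collection named after your
# password, so signaling documents live there (illustrative layout).
doc = db.collection(settings["password"]).document(settings["robotID"])
doc.set({"status": "online"})  # illustrative payload
```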
Record and replay simulated episodes:

```bash
# in data_collection_scripts/
python record_sim_episodes.py --task_name sim_insert_peg --episode_idx 0
python replay_sim_episode.py --task_name sim_insert_peg --num_arms <2 or 3>
```
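A full dataset typically needs many demonstrations; a simple (hypothetical) shell loop over episode indices:

```bash
# in data_collection_scripts/
for i in $(seq 0 49); do
    python record_sim_episodes.py --task_name sim_insert_peg --episode_idx "$i"
done
```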
To record episodes on the real robot:

- In one terminal, launch the robot:

  ```bash
  # in data_collection_scripts/
  source launch_robot.sh
  ```

- In another terminal, activate the environment and record an episode:

  ```bash
  # in data_collection_scripts/
  source activate.sh
  python record_episodes.py --task_name occluded_insertion --episode_idx 0
  ```
Visualize recorded episodes with:

```bash
# in data_collection_scripts/
python visualize_episodes.py --hdf5_path path/to/your/hdf5
```
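To inspect an episode file directly, a short `h5py` sketch (assuming the ALOHA-style HDF5 layout with datasets such as `/action` and `/observations/images/<camera>`):

```python
import h5py

# Print every dataset's path and shape in the episode file
with h5py.File("path/to/your/hdf5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))
```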
To push a dataset to the Hugging Face Hub, log in and run:

```bash
# in repo root
huggingface-cli login
python lerobot/lerobot/scripts/push_dataset_to_hub.py \
    --raw-dir path/to/your/dataset \
    --repo-id <hf_id>/<dataset_name> \
    --raw-format aloha_hdf5
```
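After pushing, the dataset should be loadable through LeRobot. A minimal sketch, assuming the `LeRobotDataset` class from the LeRobot version bundled here:

```python
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Loads the dataset pushed above from the Hub (or local cache)
dataset = LeRobotDataset("<hf_id>/<dataset_name>")
print(len(dataset), "frames")
print(dataset[0].keys())  # observation/action keys for the first frame
```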
Visualize the uploaded dataset with:

```bash
# in repo root
python lerobot/lerobot/scripts/visualize_dataset.py \
    --repo-id <hf_id>/<dataset_name> \
    --episode-index 0
```
Ensure the config names are set correctly by modifying the files in `lerobot/lerobot/configs` (an illustrative env config is sketched after the training command below). Start training with:
```bash
# in repo root
python lerobot/lerobot/scripts/train.py \
    hydra.run.dir=outputs/train/sim_sew_needle_3arms_zed_static_wrist_act \
    hydra.job.name=sim_sew_needle_3arms_zed_static_wrist_act \
    device=cuda \
    env=sim_sew_needle_3arms \
    policy=zed_static_wrist_act \
    wandb.enable=true
```
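For orientation, a hypothetical minimal env config is sketched below. The exact fields and values depend on the LeRobot version pinned in this repo, so treat every entry here as illustrative rather than the actual schema:

```yaml
# Hypothetical lerobot/lerobot/configs/env/sim_sew_needle_3arms.yaml (illustrative only)
# @package _global_
fps: 50                       # control frequency (illustrative)
env:
  name: sim_sew_needle_3arms  # must match the `env=` override above
  task: SewNeedle-3Arms-v0    # gym task id (illustrative)
  state_dim: 21               # e.g. 3 arms x 7 DOF (illustrative)
  action_dim: 21
  episode_length: 400         # steps per episode (illustrative)
```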
Evaluate a checkpoint with:

```bash
# in repo root
python lerobot/lerobot/scripts/eval.py \
    -p outputs/train/sim_hook_package_2arms_wrist_act/checkpoints \
    --out-dir outputs/eval/sim_hook_package_2arms_wrist_act \
    eval.n_episodes=50 \
    eval.batch_size=10 \
    --save-video
```
To evaluate with the standalone scripts in `eval_scripts`:

- Save your model to Hugging Face:

  ```bash
  # in eval_scripts/
  python save_policy.py \
      --repo_id iantc104/sim_slot_insertion_3arms_zed_wrist_act \
      --checkpoint_dir outputs/train/sim_slot_insertion_3arms_zed_wrist_act/checkpoints/014000/pretrained_model
  ```

- Evaluate in simulation:

  ```bash
  # in eval_scripts/
  python eval.py \
      --policy iantc104/sim_slot_insertion_3arms_zed_wrist_act \
      --episode_len 300 \
      --num_episodes 50 \
      --sim_env gym_guided_vision/SlotInsertion-3Arms-v0
  ```
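The simulation environments come from the bundled `gym_guided_vision` package. A minimal sketch of stepping one directly, assuming the package registers its environments with Gymnasium when imported:

```python
import gymnasium as gym
import gym_guided_vision  # noqa: F401 -- assumed to register the envs on import

env = gym.make("gym_guided_vision/SlotInsertion-3Arms-v0")
obs, info = env.reset(seed=0)
for _ in range(10):
    # Random actions, just to exercise the environment loop
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```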
For real-robot evaluation:

```bash
# in eval_scripts/
python eval.py \
    --policy iantc104/real_occluded_key_insertion_3arms_zed_act \
    --episode_len 700 \
    --num_episodes 50
```
If you find this work useful, please cite:

```bibtex
@misc{chuang2024activevisionneedexploring,
    title={Active Vision Might Be All You Need: Exploring Active Vision in Bimanual Robotic Manipulation},
    author={Ian Chuang and Andrew Lee and Dechen Gao and Iman Soltani},
    year={2024},
    eprint={2409.17435},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2409.17435},
}
```