This is the PyTorch implementation of our paper:
Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation
Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. (Oral)
If you use this code, please cite:
@inproceedings{ke2019tactical,
title={Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation},
author={Ke, Liyiming and Li, Xiujun and Bisk, Yonatan and Holtzman, Ari and Gan, Zhe and Liu, Jingjing and Gao, Jianfeng and Choi, Yejin and Srinivasa, Siddhartha},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2019}
}
- Our code was developed with Anaconda-installed Python 3.6 and PyTorch 0.4.1, on a single TITAN Xp GPU.
- Install the Matterport3DSimulator. Then, download the pre-computed image features:
cd matterport3D
mkdir -p img_features/
cd img_features/
wget https://storage.googleapis.com/bringmeaspoon/img_features/ResNet-152-imagenet.zip -O ResNet-152-imagenet.zip
unzip ResNet-152-imagenet.zip
cd ..
# After this step, `img_features/` should contain `ResNet-152-imagenet.tsv`.
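If you want to sanity-check the download, here is a minimal Python sketch. The field names and the (36, 2048) feature shape are assumptions based on the standard precomputed-feature format used by the R2R / Speaker-Follower codebase, not something this repo documents:

```python
# Minimal sketch to inspect the downloaded features; the TSV layout
# (one row per panorama, base64-encoded float32 array of shape (36, 2048))
# is an assumption based on the standard R2R precomputed-feature format.
import base64
import csv
import sys

import numpy as np

csv.field_size_limit(sys.maxsize)  # each row carries a large base64 blob

fields = ['scanId', 'viewpointId', 'image_w', 'image_h', 'vfov', 'features']
with open('img_features/ResNet-152-imagenet.tsv') as f:
    reader = csv.DictReader(f, delimiter='\t', fieldnames=fields)
    row = next(reader)
    feats = np.frombuffer(base64.b64decode(row['features']),
                          dtype=np.float32).reshape(36, 2048)
    print(row['scanId'], row['viewpointId'], feats.shape, feats.mean())
```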
- Download this repo and extract the contents into `matterport3D/tasks/R2R`.
- Download the Room-to-Room dataset, the synthetic data, and the speaker model proposed by Speaker-Follower Models for Vision-and-Language Navigation (NIPS 2018):
./tasks/R2R/data/download.sh
./tasks/R2R/data/download_precomputed_augmentation.sh
./tasks/R2R/experiments/release/download_speaker_release.sh
- Install Python requirements:
pip install -r tasks/R2R/requirements.txt
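Before training, it may help to confirm the environment matches the versions noted above. A quick check (the expected values come from this README; adjust if you use a different setup):

```python
# Quick environment sanity check; this repo targets Python 3.6 / PyTorch 0.4.1.
import sys
import torch

print(sys.version_info)           # expect major=3, minor=6
print(torch.__version__)          # expect 0.4.1
print(torch.cuda.is_available())  # expect True with a TITAN Xp (or similar GPU)
```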
- Train a seq2seq follower agent. First, train the SMNA agent described in Self-Monitoring Navigation Agent, storing it in `./tasks/R2R/experiments/smna/`:
python tasks/R2R/train.py --use_pretraining --pretrain_splits train literal_speaker_data_augmentation_paths --feedback_method sample2step --experiment_name smna
Then, with only the trained agent (no reranker yet), launch the FAST framework and evaluate on the validation seen and unseen splits:
python tasks/R2R/run_search.py --job search --load_follower tasks/R2R/experiments/smna/snapshots/[name of the latest model] --max_episode_len 40 --K 20 --logit --experiment_name FAST_short --early_stop
Note that each snapshot's filename encodes its validation performance, so you will need to look up the name for your own agent (see the sketch below). One example is `smna/snapshots/follower_with_pretraining_cg_pm_sample_imagenet_mean_pooled_1heads_val_seen_iter_10_val_seen-success_rate=0.12`.
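To avoid hunting through the snapshots directory by hand, here is a small sketch that picks the snapshot with the highest success rate embedded in its filename. The directory path and the `success_rate=` suffix are assumptions based on the example above; adjust them if your filenames differ:

```python
# Hypothetical helper: pick the snapshot whose filename reports the highest
# success rate, assuming names end in 'success_rate=<float>' as in the
# example above.
import glob
import re

def best_snapshot(snapshot_dir='tasks/R2R/experiments/smna/snapshots'):
    scored = []
    for path in glob.glob(snapshot_dir + '/*success_rate=*'):
        match = re.search(r'success_rate=(\d+\.\d+)$', path)
        if match:
            scored.append((float(match.group(1)), path))
    return max(scored)[1]  # path of the best-scoring snapshot

print(best_snapshot())
```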
- Train a goal reranker. First, cache the search queue. The following command saves the cached JSON file to the root of the `matterport3D` folder:
python tasks/R2R/run_search.py --job cache --load_follower tasks/R2R/experiments/smna/snapshots/[name of the latest model] --max_episode_len 40 --K 20 --logit --experiment_name cacheFAST
Second, move `Training Reranker.ipynb` to the root of the `matterport3D` folder and run through it sequentially to produce a goal reranker under `tasks/R2R/experiments/candidates_ranker_{}`, where `{}` is the performance. Finally, launch the framework with the goal reranker:
python tasks/R2R/run_search.py --job search --load_follower tasks/R2R/experiments/smna/snapshots/[name of the latest model] --max_episode_len 40 --K 20 --logit --beam --experiment_name FAST-long --load_reranker tasks/R2R/experiments/[name of the reranker model]
- If you want to skip all the steps above, here are my trained SMNA model (`smna_model`) and the intermediate files (`cache_XXX.json`) that I used to produce the results in the paper: Google Drive.
The main entry point to the framework is `run_search.py`. Follower agents live in `follower.py`, and the core framework lives in `def _rollout_with_search()` within the `Seq2SeqAgent` class. The speaker agent is in `speaker.py`. Various PyTorch modules are in `attn.py` and `model.py`.
attn.py
env.py
eval.py
follower.py
model.py
paths.py
refine_search.py
run_search.py
running_mean_std.py
speaker.py
train.py
utils.py
vocab.py
- If you have problems installing the Matterport3DSimulator, Chih-Yao Ma, the author of Self-Monitoring Navigation Agent for Vision-and-Language Navigation (ICLR 2019), has a great installation guide.
- The code is taken from an actively developed repo. Some code might be redundant but is kept here to allow the program to run.
- The `--image_feature_type none` flag lets you verify that your script runs without loading the actual image features; see the example below.
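For example (assuming the flag combines with `train.py` the same way as the flags in the commands above; the experiment name here is just a placeholder):
python tasks/R2R/train.py --image_feature_type none --experiment_name smoke_test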
- If you need more help, file an issue or drop me an email (kayke@cs_dot_washington_dot_edu) and I'll get back to you as soon as I can!
This repository is built upon Speaker-Follower Models for Vision-and-Language Navigation (NIPS 2018) and reproduces Self-Monitoring Navigation Agent for Vision-and-Language Navigation (ICLR 2019).