📄 [Paper] | 🤗 [Hugging Face] | 📁 [Dataset]
We provide two ways to install this repository's dependencies with conda:
For the full set of dependencies with pinned versions (provided to ensure a degree of long-term reproducibility):
conda env create --file=env.yaml
For a minimal, less constrained set of dependencies (intended for future development/extensions):
conda env create --file=env_minimal.yaml
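After creating either environment, activate it before running the commands that follow. The environment name comes from the name: field of the chosen yaml file (namm here is only an assumption, not the actual name used by this repository):

conda activate namm  # replace 'namm' with the name: field from the env file you used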
Training following the incremental setup described in our work can be replicated via the following Hydra commands (an end-to-end sketch combining all three stages is given after the list):
Stage 1 training:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i1.yaml
Stage 2 training:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i2.yaml init_from='path/to/stage1/results/ckpt.pt'
Stage 3 training:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i3.yaml init_from='path/to/stage2/results/ckpt.pt'
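For reference, a minimal sketch chaining the three stages end to end; NUM_GPUS and the checkpoint paths are placeholders and should point at wherever each stage's run directory actually writes ckpt.pt:

# hypothetical end-to-end pipeline: adjust the GPU count and checkpoint paths
NUM_GPUS=8
torchrun --standalone --nproc_per_node=$NUM_GPUS main.py run@_global_=namm_bam_i1.yaml
torchrun --standalone --nproc_per_node=$NUM_GPUS main.py run@_global_=namm_bam_i2.yaml init_from='path/to/stage1/results/ckpt.pt'
torchrun --standalone --nproc_per_node=$NUM_GPUS main.py run@_global_=namm_bam_i3.yaml init_from='path/to/stage2/results/ckpt.pt'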
Evaluation of both trained NAMMs on the full set of LongBench tasks can be replicated with the following command:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval.yaml init_from='path/to/results/ckpt.pt'
Evaluation on the full set of ChouBun tasks can be replicated with the following command:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval_choubun.yaml init_from='path/to/results/ckpt.pt'
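As a sketch, both benchmark suites can also be run back to back against the same trained checkpoint (CKPT below is a placeholder, not a path produced by this repository):

CKPT='path/to/results/ckpt.pt'  # placeholder: point this at your trained NAMM checkpoint
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval.yaml init_from=$CKPT
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval_choubun.yaml init_from=$CKPT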
Logging results to wandb (enabled through the Hydra setting wandb_log=true) requires authenticating to the wandb server via the following command:
wandb login
and entering your account's API key (which you can find at https://wandb.ai/authorize)
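For non-interactive environments (e.g. cluster batch jobs), wandb also accepts the key directly, so the login can be scripted:

export WANDB_API_KEY=...        # paste your key from https://wandb.ai/authorize
wandb login "$WANDB_API_KEY"    # logs in without an interactive prompt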
Using gated models requires authenticating to the hugging face hub by running:
huggingface-cli login
and entering one of your account's access tokens (which you can find at https://huggingface.co/settings/tokens)
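Similarly, the Hugging Face CLI accepts the token as a flag, which is convenient for scripted setups:

export HF_TOKEN=...                          # paste an access token from https://huggingface.co/settings/tokens
huggingface-cli login --token "$HF_TOKEN"    # logs in without an interactive prompt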
To cite our work, you can use the following BibTeX entry:
@article{sakana2024memory,
  title={An Evolved Universal Transformer Memory},
  author={Edoardo Cetin and Qi Sun and Tianyu Zhao and Yujin Tang},
  year={2024},
  eprint={2410.13166},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2410.13166},
}