

An Evolved Universal Transformer Memory

📄 [Paper] | 🤗 [Hugging Face] | 📁 [Dataset]

Installation

We provide two conda environment files for installing this repository's dependencies:

For the full set of dependencies with fixed versions (provided to ensure some level of long-term reproducibility):

conda env create --file=env.yaml

For a more minimal and less constrained set of dependencies (for future development/extensions):

conda env create --file=env_minimal.yaml
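After creating the environment, activate it before running any of the commands below. The environment name is defined inside the chosen yaml file; evo-memory below is only an assumed name:

conda env list              # shows the name of the environment that was created
conda activate evo-memory   # assumed name; replace with the one listed above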

Usage

Training

Training with the incremental setup described in our work can be replicated via the following Hydra commands (set $NUM_OF_GPUs to the number of available GPUs):

stage 1 training:

torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i1.yaml

stage 2 training:

torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i2.yaml init_from='path/to/stage1/results/ckpt.pt'

stage 3 training:

torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i3.yaml init_from='path/to/stage2/results/ckpt.pt'
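The three stages are meant to be run back to back, with each stage initialized from the previous stage's checkpoint. A minimal shell sketch of the full pipeline is given below; the NUM_OF_GPUs value and the results/stage* checkpoint paths are placeholders to be replaced with your actual output locations:

# Sketch: run the three incremental training stages sequentially.
# NUM_OF_GPUs and the checkpoint paths are placeholders.
NUM_OF_GPUs=8
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i1.yaml
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i2.yaml init_from='results/stage1/ckpt.pt'
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i3.yaml init_from='results/stage2/ckpt.pt'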

Evaluation

Evaluation on the full set of LongBench tasks can be replicated for both trained NAMMs with the following command:

torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval.yaml init_from='path/to/results/ckpt.pt'

Evaluation on the full set of ChouBun tasks can be replicated with the following command:

torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval_choubun.yaml init_from='path/to/results/ckpt.pt'

Additional notes

Using wandb to log the results (through the hydra setting wandb_log=true) requires authenticating to the wandb server via the following command:

wandb login

and entering your account's API key (which you can find in your wandb account settings).
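For example, a LongBench evaluation run with logging enabled could then look as follows (the checkpoint path is a placeholder):

torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval.yaml init_from='path/to/results/ckpt.pt' wandb_log=true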

Gated models (e.g., Llama)

Using gated models requires authenticating to the Hugging Face Hub by running:

huggingface-cli login

and entering one of your account's access tokens (which you can find in your Hugging Face account settings).
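In non-interactive settings (e.g., batch jobs), the token can also be supplied directly or via an environment variable; this relies on standard huggingface_hub behavior rather than anything specific to this repository, and the token value below is a placeholder:

huggingface-cli login --token hf_your_token_here
# or, equivalently, export the token for huggingface_hub to pick up:
export HF_TOKEN=hf_your_token_here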

Bibtex

To cite our work, you can use the following:

@article{sakana2024memory,
       title={An Evolved Universal Transformer Memory},
       author={Edoardo Cetin and Qi Sun and Tianyu Zhao and Yujin Tang},
       year={2024},
       eprint={2410.13166},
       archivePrefix={arXiv},
       primaryClass={cs.LG},
       url={https://arxiv.org/abs/2410.13166},
}

About

Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.
