📄 [Paper] | 🤗 [Hugging Face] | 📁 [Dataset]
We provide two ways to install this repository's dependencies with conda:
For the full set of dependencies with pinned versions (provided to ensure a degree of long-term reproducibility):
conda env create --file=env.yaml
For a minimal, less constrained set of dependencies (intended for future development/extensions):
conda env create --file=env_minimal.yaml
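After creating either environment, activate it before running the commands that follow. The environment name comes from the name: field of the chosen yaml file (namm here is only an assumption, not the actual name used by this repository):

conda activate namm  # replace 'namm' with the name: field from the env file you used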
Training following the incremental setup described in our work can be replicated via the following Hydra commands (an end-to-end sketch combining all three stages is given after the list):
Stage 1 training:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i1.yaml
Stage 2 training:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i2.yaml init_from='path/to/stage1/results/ckpt.pt'
Stage 3 training:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_i3.yaml init_from='path/to/stage2/results/ckpt.pt'
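For reference, a minimal sketch chaining the three stages end to end; NUM_GPUS and the checkpoint paths are placeholders and should point at wherever each stage's run directory actually writes ckpt.pt:

# hypothetical end-to-end pipeline: adjust the GPU count and checkpoint paths
NUM_GPUS=8
torchrun --standalone --nproc_per_node=$NUM_GPUS main.py run@_global_=namm_bam_i1.yaml
torchrun --standalone --nproc_per_node=$NUM_GPUS main.py run@_global_=namm_bam_i2.yaml init_from='path/to/stage1/results/ckpt.pt'
torchrun --standalone --nproc_per_node=$NUM_GPUS main.py run@_global_=namm_bam_i3.yaml init_from='path/to/stage2/results/ckpt.pt'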
Evaluation of both trained NAMMs on the full set of LongBench tasks can be replicated with the following command:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval.yaml init_from='path/to/results/ckpt.pt'
Evaluation on the full set of ChouBun tasks can be replicated with the following command:
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval_choubun.yaml init_from='path/to/results/ckpt.pt'
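As a sketch, both benchmark suites can also be run back to back against the same trained checkpoint (CKPT below is a placeholder, not a path produced by this repository):

CKPT='path/to/results/ckpt.pt'  # placeholder: point this at your trained NAMM checkpoint
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval.yaml init_from=$CKPT
torchrun --standalone --nproc_per_node=$NUM_OF_GPUs main.py run@_global_=namm_bam_eval_choubun.yaml init_from=$CKPT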
Logging results to wandb (enabled through the Hydra setting wandb_log=true) requires authenticating to the wandb server via the following command:
wandb login
and entering your account's API key (which you can find at https://wandb.ai/authorize)
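For non-interactive environments (e.g. cluster batch jobs), wandb also accepts the key directly, so the login can be scripted:

export WANDB_API_KEY=...        # paste your key from https://wandb.ai/authorize
wandb login "$WANDB_API_KEY"    # logs in without an interactive prompt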
Using gated models requires authenticating to the hugging face hub by running:
huggingface-cli login
and entering one of your account's access tokens (which you can find at https://huggingface.co/settings/tokens)
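Similarly, the Hugging Face CLI accepts the token as a flag, which is convenient for scripted setups:

export HF_TOKEN=...                          # paste an access token from https://huggingface.co/settings/tokens
huggingface-cli login --token "$HF_TOKEN"    # logs in without an interactive prompt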
To cite our work, you can use the following BibTeX entry:
@article{sakana2024memory,
  title={An Evolved Universal Transformer Memory},
  author={Edoardo Cetin and Qi Sun and Tianyu Zhao and Yujin Tang},
  year={2024},
  eprint={2410.13166},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2410.13166},
}