🤗 Hugging Face(is coming) • 🤖 ModelScope(is coming) • 📄 Paper
This repository provides an implementation of CoCA (Collinear Constrained Attention). This implementation is based on 2 transformer models in Hugging Face.
- GPT-NeoX which is an EleutherAI's library for training large-scale language models on GPUs.
- LLaMA from Meta AI team.
We just point out those modifications which made to implement CoCA here. For more information about model training and inference, we recommend transformers.
For practicality, we enhanced CoCA's computational and spatial efficiency with opt_einsum, view this repository for more information.
Atorch is an optimized torch version by Ant Group, it's not available for opensource community yet. It will be opensource in near future. Before that, you may use origin torch version instead.
You can use raw data or tokenized data for training.
When using raw data, please ensure the data format as:
{"content" : "It is a sentence for training."}
using .jsonl
for saving data.
You can also use tokenized data saving in .bin
via GPT-NeoX tokenizer.
python ./data/tokenization/generate_dataset.py
notice to modify input_dict
, conver_type_list
, output_name
, seq_length
for your own dataset.
You can train a model from scratch as follows:
bash ./train/run_coca.sh 32 1 8 2
- first parameter means
per gpu batch size
- second parameter means
tensor parallel
(larger than 1 is not supported yet) - third parameter means
data parallel
, equals to the number of GPUs - last parameter means
train epochs
If you want to load a pre-trained model, set --pretrained_model_path $PRETRAINED_MODEL_PATH \
.
CoCA can be loaded using the transformers
functionality:
from model.gpt_neox.modeling_gpt_neox import GPTNeoXForCausalLM, GPTNeoXConfig
from transformers import AutoTokenizer
from transformers import GenerationConfig
config = GPTNeoXConfig.from_pretrained(checkpoint)
config.is_decoder = True
# If you want to inference out of training length,
# CoCA is compatible with NTK-aware scaled RoPE and performs much more better than original attention structure
rope_scaling= {"type": "dynamic", "factor": 4.0}
config.rope_scaling = rope_scaling
model = GPTNeoXForCausalLM.from_pretrained(checkpoint,
config=config,
device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(checkpoint, padding_side="left")
tokenizer.add_special_tokens({'eos_token': "<|endoftext|>"})
tokenizer.add_special_tokens({'pad_token': "<|pad|>"})
If you have found the CoCA library helpful in your work, you can cite this repository as
@inproceedings{zhu2024coca,
title={CoCA: Fusing Position Embedding with Collinear Constrained Attention in Transformers for Long Context Window Extending},
author={Shiyi Zhu and Jing Ye and Wei Jiang and Siqiao Xue and Qi Zhang and Yifan Wu and Jianguo Li},
booktitle = {Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics},
month = august,
year = {2024},
publisher = {Association for Computational Linguistics},
}
This repository hosts code of CoCA project. Copyright (c) 2023, Ant Group. Licensed under the Apache License:
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
This repository is based off code written by EleutherAI that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by EleutherAI maintain a EleutherAI copyright header. When the EleutherAI code has been modified from its original version, that fact is noted in the copyright header. All derivative works of this repository must preserve these headers under the terms of the Apache License.
This repository is based off code written by Meta AI that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by Meta AI maintain a Meta AI copyright header. When the Meta AI code has been modified from its original version, that fact is noted in the copyright header. All derivative works of this repository must preserve these headers under the terms of the Apache License.
This repository is based off code written by NVIDIA that is licensed under the Apache License, Version 2.0. In accordance with the Apache License, all files that are modifications of code originally written by NVIDIA maintain a NVIDIA copyright header. All files that do not contain such a header are the exclusive copyright of EleutherAI. When the NVIDIA code has been modified from its original version, that fact is noted in the copyright header. All derivative works of this repository must preserve these headers under the terms of the Apache License.
This repository also contains code written by a number of other authors. Such contributions are marked and the relevant licensing is included where appropriate.
For full terms, see the LICENSE
file. If you have any questions, comments, or concerns about licensing please email me at zhushiyi.zsy@antgroup.com.