The implementation of this model has been ported to `Microsoft.ML.GenAI.LLaMA`.

Inspired by pytorch-llama, this project implements LLaMA 2 from scratch with TorchSharp.
- git lfs
- .NET 6.0 SDK
- Access to one of the LLaMA 2 models
- Download the model weights. The weights are available from the Hugging Face model hub.
  - llama-2-7b: https://huggingface.co/meta-llama/Llama-2-7b
  - llama-2-7b-chat: https://huggingface.co/meta-llama/Llama-2-7b-chat

Note: please download the pth version (the one without the `-hf` suffix).
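Since git lfs is listed as a prerequisite, one way to fetch the weights is to clone the model repository directly. A hypothetical sketch (the target directory name is illustrative, and the gated repo requires a Hugging Face account with access granted):

```shell
# Make sure git-lfs is set up, then clone the gated weights repo.
# You will be prompted for your Hugging Face credentials.
git lfs install
git clone https://huggingface.co/meta-llama/Llama-2-7b llama-2-7b
```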
- Change the path in `Program.cs` to the folder where you downloaded the model weights.
- Determine the right TorchSharp runtime NuGet package for your platform:
  - use `TorchSharp-cuda-linux` if you are on Linux and have an NVIDIA GPU
  - use `TorchSharp-cuda-windows` if you are on Windows and have an NVIDIA GPU
  - use `TorchSharp-cpu` if you don't have an NVIDIA GPU
- Run the project using `dotnet run`.
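For example, the CPU package can be referenced in the project's `.csproj` with a fragment like the following (a hypothetical sketch; the version number is illustrative, so use whichever TorchSharp release the project targets):

```xml
<ItemGroup>
  <!-- Pick exactly one runtime package for your platform: -->
  <!-- TorchSharp-cuda-linux, TorchSharp-cuda-windows, or TorchSharp-cpu -->
  <PackageReference Include="TorchSharp-cpu" Version="0.101.5" />
</ItemGroup>
```

Alternatively, the package can be added from the command line with `dotnet add package TorchSharp-cpu`.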
This project uses a BPE tokenizer from `Microsoft.ML.Tokenizers` to tokenize the input text. You can find the `vocab.json` and `merges.txt` files under `torchsharp-llama`. To use a third-party tokenizer, simply replace `vocab.json` and `merges.txt` with your own tokenizer files.
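As a sketch of how those two files can be loaded, assuming the preview-era `Microsoft.ML.Tokenizers` API where a `Bpe` model is wrapped in a `Tokenizer` (newer releases of the package expose a different surface, e.g. `BpeTokenizer.Create`):

```csharp
using System;
using Microsoft.ML.Tokenizers;

class TokenizerDemo
{
    static void Main()
    {
        // Load the BPE vocabulary and merge rules shipped with the repo.
        // Replacing these two files swaps in a third-party BPE tokenizer.
        var tokenizer = new Tokenizer(new Bpe("vocab.json", "merges.txt"));

        // Encode a prompt and print the resulting token ids.
        var encoding = tokenizer.Encode("Hello, TorchSharp!");
        Console.WriteLine(string.Join(" ", encoding.Ids));
    }
}
```

The rest of the pipeline only consumes the token ids, which is why swapping the vocab/merges files is enough to change tokenizers.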
This project has only been tested with the LLaMA-2-7B model. I do hope I get the chance to test it with other models, but unfortunately the 7B model is already the largest I can afford to run on my machine. If you have the chance to test other models, please let me know whether they work. Thanks!

Also, this project doesn't come with any warranty. Use it at your own risk.
- Add support for loading `.safetensors` and native ckpt files, so that the model doesn't need to be converted to TorchSharp format. `.safetensors` support should be easy, but native ckpt support is a bit tricky (otherwise, why would the TorchSharp format exist in the first place).
- Add support for LoRA training.