Releases · 1b5d/llm-api
v0.1.2
v0.1.1
v0.1.0
- Introduce a generic Hugging Face model that can be used to run many popular models hosted on the Hugging Face Hub (see the example request after this list)
- Upgrade llama.cpp to support newer Llama models such as Llama 2
- Streamline the Docker images down to two: a default lightweight image and a GPU-enabled image with NVIDIA/CUDA support
- General fixes and stability improvements
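As a rough sketch of how the new generic Hugging Face model might be exercised once a container is running, the snippet below sends a single completion request. The endpoint path, port, and parameter names are assumptions for illustration only, not the project's documented API; check the repo README for the actual interface.

```python
# Hypothetical client request against a running llm-api container configured with
# the generic Hugging Face model. The "/generate" path, port 8000, and the payload
# fields ("prompt", "params") are assumptions for illustration.
import requests

response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "What is the capital of France?",
        "params": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```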
v0.0.4-gptq-llama-triton
- Separate models into their own subdirectories to prevent configs from being overwritten when switching models
- Add Triton support
v0.0.4
v0.0.3-gptq-llama-cuda
Bug fixes and stability improvements for safetensors models when using GPTQ-for-LLaMa
v0.0.2-gptq-llama-cuda
Rebuild the GPTQ CUDA image on a plain bullseye-slim base image
v0.0.1-gptq-llama-cuda
Support GPU inference for Llama-based models using GPTQ-for-LLaMa
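A minimal sanity check before relying on GPU inference is to confirm that a CUDA device is actually visible to PyTorch inside the container. The snippet below is a generic check, not part of llm-api itself.

```python
# Generic sanity check (not part of llm-api): confirm that PyTorch can see a CUDA
# device before attempting GPTQ-for-LLaMa GPU inference. Requires a CUDA-enabled
# PyTorch build and, inside Docker, the NVIDIA container runtime.
import torch

if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device visible; GPU inference will not work.")
```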