
Releases: 1b5d/llm-api

v0.1.2

13 Nov 14:37
c247d50
  • Added the autoawq model family
  • Published multiple Docker images supporting multiple BLAS backends (llama.cpp)
  • Removed config.yaml from the repo and added config.yaml.example in its place
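With config.yaml replaced by config.yaml.example, a local copy has to be created and edited before the server starts. A minimal sketch of what that copy might look like — the key names below (`model_family`, `setup_params`, `model_params`) and their values are illustrative assumptions, not the project's confirmed schema:

```yaml
# config.yaml — copied from config.yaml.example and edited locally
# (key names and values are illustrative assumptions)
models_dir: /models
model_family: llama          # e.g. the llama.cpp backend
setup_params:
  repo_id: someuser/somerepo # hypothetical Hugging Face repo for the weights
  filename: model.gguf
model_params:
  n_ctx: 2048
```

Since config.yaml.example is tracked and config.yaml is not, local edits no longer collide with upstream changes.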

v0.1.1

25 Oct 22:40
85e18f2
  • Upgraded llama-cpp-python to support the GGUF format
  • Upgraded GPTQ-for-LLaMa related libraries
  • Updated README.md

v0.1.0

23 Jul 21:21
928ec4a
  • Introduced a generic Hugging Face model family, which can run many popular models hosted on the Hugging Face Hub
  • Upgraded llama.cpp to run newer Llama models such as Llama 2
  • Streamlined the Docker images down to two: a lightweight default image and a GPU-enabled image with NVIDIA/CUDA support
  • General fixes and stability improvements
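With the images consolidated to a default build and a GPU build, a compose file is one way to wire in the config and model directory. A hedged sketch — the GPU image tag, the exposed port, and the in-container paths are assumptions for illustration, not taken from the project's documentation:

```yaml
# docker-compose.yml sketch (image tags, port, and mount paths are assumptions)
services:
  llm-api:
    image: 1b5d/llm-api:latest        # lightweight default image
    # image: 1b5d/llm-api:latest-gpu  # hypothetical tag for the CUDA-enabled image
    ports:
      - "8000:8000"
    volumes:
      - ./config.yaml:/llm-api/config.yaml  # assumed in-container config path
      - ./models:/models
```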

v0.0.4-gptq-llama-triton

16 Jun 22:13
  • Separated models into their own subdirectories to prevent overriding configs when switching models
  • Added Triton support

v0.0.4

08 Jun 21:57
  • Separated models into their own subdirectories to prevent overriding configs when switching models

v0.0.3-gptq-llama-cuda

05 May 21:41

Bug fixes and stability improvements for safetensors models in GPTQ-for-LLaMa

v0.0.2-gptq-llama-cuda

25 Apr 17:13

Rebuilt the GPTQ CUDA image on a plain bullseye-slim base image

v0.0.1-gptq-llama-cuda

23 Apr 21:59

Support for GPU inference of Llama-based models using GPTQ-for-LLaMa

v0.0.1

13 Apr 21:18

Initial release