
Releases: 1b5d/llm-api

v0.1.2

13 Nov 14:37
c247d50
  • Added the autoawq model family
  • Published multiple Docker images supporting multiple BLAS backends (llama.cpp)
  • Removed config.yaml from the repo and added config.yaml.example in its place
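With config.yaml replaced by config.yaml.example, a local copy has to be created and edited before the server starts. A minimal sketch of what that copy might look like — the key names below (`model_family`, `setup_params`, `model_params`) and their values are illustrative assumptions, not the project's confirmed schema:

```yaml
# config.yaml — copied from config.yaml.example and edited locally
# (key names and values are illustrative assumptions)
models_dir: /models
model_family: llama          # e.g. the llama.cpp backend
setup_params:
  repo_id: someuser/somerepo # hypothetical Hugging Face repo for the weights
  filename: model.gguf
model_params:
  n_ctx: 2048
```

Since config.yaml.example is tracked and config.yaml is not, local edits no longer collide with upstream changes.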

v0.1.1

25 Oct 22:40
85e18f2
  • Upgraded llama-cpp-python to support the GGUF format
  • Upgraded GPTQ-for-LLaMa related libraries
  • Updated README.md

v0.1.0

23 Jul 21:21
928ec4a
  • Introduced a generic Hugging Face model family, which can run many popular models hosted on the Hugging Face Hub
  • Upgraded llama.cpp to run newer Llama models such as Llama 2
  • Streamlined the Docker images down to two: a lightweight default image and a GPU-enabled image with NVIDIA/CUDA support
  • General fixes and stability improvements
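With the images consolidated to a default build and a GPU build, a compose file is one way to wire in the config and model directory. A hedged sketch — the GPU image tag, the exposed port, and the in-container paths are assumptions for illustration, not taken from the project's documentation:

```yaml
# docker-compose.yml sketch (image tags, port, and mount paths are assumptions)
services:
  llm-api:
    image: 1b5d/llm-api:latest        # lightweight default image
    # image: 1b5d/llm-api:latest-gpu  # hypothetical tag for the CUDA-enabled image
    ports:
      - "8000:8000"
    volumes:
      - ./config.yaml:/llm-api/config.yaml  # assumed in-container config path
      - ./models:/models
```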

v0.0.4-gptq-llama-triton

16 Jun 22:13
  • Separated models into their own subdirectories to prevent overriding configs when switching models
  • Added Triton support

v0.0.4

08 Jun 21:57
  • Separated models into their own subdirectories to prevent overriding configs when switching models

v0.0.3-gptq-llama-cuda

05 May 21:41

Bug fixes and stability improvements for safetensors models in GPTQ-for-LLaMa

v0.0.2-gptq-llama-cuda

25 Apr 17:13

Rebuilt the GPTQ CUDA image on a plain bullseye-slim base image

v0.0.1-gptq-llama-cuda

23 Apr 21:59

Support for GPU inference of Llama-based models using GPTQ-for-LLaMa

v0.0.1

13 Apr 21:18

Initial release