This is a simple, fun project for running your own LLM chat using llama.cpp.
- Clone the repo: https://github.com/robjsliwa/llama-cpp-python, which provides Python bindings for llama.cpp (a short sketch of using the bindings directly is at the end of this README).
- If you want the latest version of llama.cpp, go to the vendor folder and run git clone https://github.com/ggerganov/llama.cpp there, or update the hash, whichever is easier for you.
- Build the Docker image:
docker build -t llama-server .
- Run it with:
docker run --rm -it -p 8000:8000 -v /home/data/datasets/wizard-vicuna:/models -e MODEL=/models/Wizard-Vicuna-13B-Uncensored.ggml.q8_0.bin llama-server
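Once the container is up, you can talk to it from Python. This is a minimal sketch, assuming the container exposes the OpenAI-compatible /v1/chat/completions endpoint that llama-cpp-python's server provides on port 8000; the prompt and parameters are placeholders to adjust.

```python
# Minimal sketch: query the llama-cpp-python server started above.
# Assumes the OpenAI-compatible API is reachable on localhost:8000.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
# The response mirrors OpenAI's chat format: choices -> message -> content.
print(resp.json()["choices"][0]["message"]["content"])
```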
- Install the web UI dependencies with:
npm install
- Start the web UI with:
npm start
Note: You can find great models on Hugging Face here: https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GGML/tree/main
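If you want to skip Docker and the web UI, you can also call the Python bindings from the cloned repo directly, as referenced above. This is a minimal sketch, assuming llama-cpp-python is installed in your environment; the model path is a placeholder pointing at the same GGML file used in the docker run command.

```python
# Minimal sketch: load a GGML model and generate a completion with the
# llama-cpp-python bindings. The model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="/models/Wizard-Vicuna-13B-Uncensored.ggml.q8_0.bin")
output = llm(
    "Q: What is llama.cpp? A:",  # plain completion-style prompt
    max_tokens=64,
    stop=["Q:", "\n"],
)
print(output["choices"][0]["text"])
```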