[ English | 简体中文 ]
In AI application development, we often need to deploy multiple models for different tasks: an LLM for chat services, and Embedding and Reranker models for knowledge-base retrieval. Aris-AI-Model-Server was created to integrate these model services into a single server, giving users simple and convenient model access. The project is named after the character Aris from Blue Archive, shown below:
Aris: Character from Blue Archive
- [2024-07-13] Aris-AI-Model-Server officially open-sourced.
- [2024-06-23] We released the Aris-14B-Chat series models, which are based on Qwen1.5-14B-Chat and underwent SFT and DPO on our private dataset of 140K entries. When using these models, please comply with the Qwen open-source license.
- Sentence Transformers (Embedding)
- Sentence Transformers (Reranker)
- VLLM (LLM backend)
- MLX (LLM backend)
- FastAPI (API layer)
| Route | Request Method | Authentication | OpenAI Compatible | Description |
|---|---|---|---|---|
| / | GET | ❌ | ❌ | Root directory |
| /v1/embeddings | GET | ✅ | ❌ | Get all Embedding models |
| /v1/embeddings | POST | ✅ | ✅ | Call an Embedding model for text embedding |
| /v1/rerankers | GET | ✅ | ❌ | Get all Reranker models |
| /v1/rerankers | POST | ✅ | ❌ | Call a Reranker model for document reranking |
| /v1/models | GET | ✅ | ✅ | Get all LLMs |
| /v1/chat/completions | POST | ✅ | ✅ | Call an LLM for dialogue generation |
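As a sketch of how these routes are called, the snippet below builds request bodies for the three POST endpoints. The chat and embedding schemas follow the OpenAI API; the reranker field layout (`query`/`documents`) is an assumption modeled on common reranker APIs, since that route is not OpenAI-compatible. Host, port, and key are placeholders.

```python
import json
import urllib.request  # only needed when actually sending a request

# Placeholders: point these at your deployment and bearer token.
BASE_URL = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer sk-placeholder", "Content-Type": "application/json"}

def chat_body(model: str, prompt: str) -> dict:
    # POST /v1/chat/completions -- OpenAI-compatible schema
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def embedding_body(model: str, texts: list) -> dict:
    # POST /v1/embeddings -- OpenAI-compatible schema
    return {"model": model, "input": texts}

def rerank_body(model: str, query: str, documents: list) -> dict:
    # POST /v1/rerankers -- NOT OpenAI-compatible; this field layout is an
    # assumption, check the server's actual contract before relying on it.
    return {"model": model, "query": query, "documents": documents}

# Sending a request, e.g. for chat:
#   req = urllib.request.Request(
#       f"{BASE_URL}/v1/chat/completions",
#       data=json.dumps(chat_body("aris-14b-chat", "Hi")).encode(),
#       headers=HEADERS)
#   resp = json.loads(urllib.request.urlopen(req).read())
```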
```
.
├── assets
│   └── 110531412.jpg
├── config                 # Environment variables and model configuration
│   ├── .env.template
│   └── models.yaml.template
├── dockerfile
├── main.py
├── poetry.lock
├── pyproject.toml
├── scripts                # awq, gptq quantization scripts
│   ├── autoawq.py
│   ├── autoawq.sh
│   ├── autogptq.py
│   └── autogptq.sh
└── src
    ├── api                # OpenAI-compatible API
    │   ├── auth
    │   │   └── bearer.py
    │   ├── model
    │   │   ├── chat_cmpl.py
    │   │   ├── embedding.py
    │   │   ├── reranker.py
    │   │   └── root.py
    │   └── router
    │       ├── __init__.py
    │       ├── root.py
    │       └── v1
    │           ├── chat_cmpl.py
    │           ├── embedding.py
    │           ├── __init__.py
    │           └── reranker.py
    ├── config
    │   ├── arg.py         # Command-line arguments
    │   ├── env.py         # Environment variables
    │   ├── gbl.py         # Global variables
    │   ├── __init__.py
    │   └── model.py       # Model configuration
    ├── controller
    │   ├── controller.py  # Engine controller
    │   └── __init__.py
    ├── engine             # Model invocation engines
    │   ├── base.py
    │   ├── embedding.py
    │   ├── mlx.py
    │   ├── reranker.py
    │   └── vllm.py
    ├── logger             # Logging library
    │   └── __init__.py
    ├── middleware         # Middleware
    │   └── logger
    │       └── __init__.py
    └── utils
        ├── formatter.py   # Prompt formatter (referenced from llama-factory)
        └── template.py    # Prompt templates (referenced from llama-factory)
```
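The `src/engine` layout (`base.py` plus one module per backend) suggests a common engine interface that the controller dispatches to. A minimal sketch of that pattern follows; all class and method names here are illustrative guesses, not the project's actual API.

```python
from abc import ABC, abstractmethod

class BaseEngine(ABC):
    """Common interface a controller can drive regardless of backend."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    @abstractmethod
    def invoke(self, payload: dict) -> dict:
        """Run one request against the underlying model."""

class EmbeddingEngine(BaseEngine):
    """Would wrap sentence-transformers; here it returns dummy vectors."""

    def invoke(self, payload: dict) -> dict:
        texts = payload["input"]
        return {"model": self.model_name, "data": [[0.0] * 4 for _ in texts]}

# The controller would keep a registry of engines keyed by model name
# and route each API request to the matching engine.
engines = {"demo-embedding": EmbeddingEngine("demo-embedding")}
result = engines["demo-embedding"].invoke({"input": ["hello", "world"]})
```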
```shell
git clone https://github.com/hcd233/Aris-AI-Model-Server.git
cd Aris-AI-Model-Server
```
This step is optional, but make sure your Python environment is 3.11:

```shell
conda create -n aris python=3.11.0
conda activate aris
pip install poetry
```
| Dependency | Description | Command |
|---|---|---|
| base | Basic dependencies required to start the API | `poetry install` |
| reranker | Dependencies for deploying reranker models | `{{base}} + -E reranker` |
| embedding | Dependencies for deploying embedding models | `{{base}} + -E embedding` |
| vllm | Dependencies for the vllm backend | `{{base}} + -E vllm` |
| mlx | Dependencies for the mlx backend | `{{base}} + -E mlx` |
| awq | Dependencies for AWQ quantization | `{{base}} + -E awq` |
| gptq | Dependencies for GPTQ quantization | `{{base}} + -E gptq` |
Example: if you want to deploy an embedding model, use AWQ quantization, and serve models with vllm, install the dependencies with:

```shell
poetry install -E embedding -E awq -E vllm
```
Please refer to the template files for the specific values to modify:

```shell
cp config/models.yaml.template models.yaml
cp config/.env.template .env
```

Then start the server:

```shell
python main.py --config_path models.yaml
```
```shell
# AWQ quantization
bash scripts/autoawq.sh
# GPTQ quantization
bash scripts/autogptq.sh
```
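For reference, `scripts/autoawq.py` presumably wraps the public AutoAWQ API along these lines; the model paths and quantization config below are illustrative defaults, not the script's actual values.

```python
# Typical AWQ quantization flow with the AutoAWQ library (requires the awq
# extra and a GPU). Values here are common defaults, not the script's own.
QUANT_CONFIG = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

def quantize_awq(model_path: str, out_path: str) -> None:
    # Imported lazily so this module loads even without the awq extra installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model.quantize(tokenizer, quant_config=QUANT_CONFIG)  # runs calibration
    model.save_quantized(out_path)
    tokenizer.save_pretrained(out_path)

# Usage (downloads the model, so run only on a machine with enough VRAM/disk):
#   quantize_awq("Qwen/Qwen1.5-14B-Chat", "qwen1.5-14b-chat-awq")
```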
Not available yet
- Architecture: expand from the single-machine version to a Kubernetes-based distributed version
- Backends: support more model backends, such as Triton, ONNX, etc.

Due to a busy work schedule, progress on this project may be slow and updates occasional. PRs and Issues are welcome.