Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
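As a quick illustration of the core distributed runtime, here is a minimal sketch of Ray's task API (assuming `pip install ray`; the `square` function is a toy example, not from the project):

```python
# Minimal sketch of Ray's core task API.
import ray

ray.init()  # start a local Ray runtime

@ray.remote
def square(x: int) -> int:
    # Runs as a distributed task on the Ray cluster.
    return x * x

# Launch tasks in parallel and gather results.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```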
A high-throughput and memory-efficient inference and serving engine for LLMs
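A minimal offline-inference sketch with vLLM, assuming `pip install vllm` and a model that fits on the local GPU; the model id and prompt are placeholders:

```python
# Minimal offline-inference sketch with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any HF-compatible model id
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```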
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment of LLM applications).
Run any open-source LLM, such as Llama or Gemma, as an OpenAI-compatible API endpoint in the cloud.
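A sketch of how such an endpoint is typically consumed with the official `openai` client; the base URL, API key, and model name below are placeholder assumptions, not values from this project:

```python
# Calling a self-hosted OpenAI-compatible endpoint with the `openai` client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # your deployment's URL (placeholder)
    api_key="not-needed-for-local",       # many local servers ignore this
)
resp = client.chat.completions.create(
    model="llama-3-8b-instruct",  # hypothetical deployed model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```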
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
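A minimal service sketch using BentoML's class-based API (assuming BentoML >= 1.2; the `Summarizer` service and its stub logic are hypothetical):

```python
# Minimal BentoML service sketch.
import bentoml

@bentoml.service
class Summarizer:
    @bentoml.api
    def summarize(self, text: str) -> str:
        # A real service would call a model here; this is a stub.
        return text[:100]

# Serve locally with: bentoml serve service:Summarizer
```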
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
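A sketch of launching a job through SkyPilot's Python API, assuming configured cloud credentials; the script name, accelerator type, and cluster name are placeholders:

```python
# Launching a GPU job with SkyPilot's Python API.
import sky

task = sky.Task(run="python train.py")  # placeholder training script
task.set_resources(sky.Resources(accelerators="A100:1"))

# SkyPilot picks a cloud/region with available GPUs at the best price.
sky.launch(task, cluster_name="my-cluster")
```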
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
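A sketch of querying such a server with the `lorax` Python client, where each request can target a different fine-tuned adapter; the endpoint URL and adapter id are hypothetical:

```python
# Querying a multi-LoRA server; the adapter is swapped in per request.
from lorax import Client

client = Client("http://127.0.0.1:8080")  # placeholder endpoint
resp = client.generate(
    "Write a haiku about GPUs.",
    adapter_id="my-org/my-finetuned-lora",  # hypothetical adapter id
    max_new_tokens=64,
)
print(resp.generated_text)
```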
AICI: Prompts as (Wasm) Programs
RayLLM - LLMs on Ray
A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully utilize your hardware
A throughput-oriented high-performance serving framework for LLMs
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
LLM (Large Language Model) Fine-Tuning
Efficient AI Inference & Serving
Multi-node production GenAI stack. Run the best of open-source AI easily on your own servers. Add knowledge from documents and scraped websites. Create your own AI by fine-tuning open-source models. Integrate LLMs with APIs. Run gptscript securely on the server.
A suite of hands-on training materials showing how to scale CV, NLP, and time-series forecasting workloads with Ray.
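A small sketch of the pattern these materials teach, scaling a batch transform with Ray Data (assuming `pip install "ray[data]"`; the dataset and transform are toy stand-ins):

```python
# Scaling a batch transform across a cluster with Ray Data.
import ray

ds = ray.data.range(10_000)  # placeholder for a real dataset
# map_batches runs the function in parallel over batches of rows.
ds = ds.map_batches(lambda batch: {"id": batch["id"] * 2})
print(ds.take(3))
```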
Since the emergence of ChatGPT in 2022, accelerating large language models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference acceleration; related work will be added over time. Contributions welcome!
Fine-tune LLMs on K8s using Runbooks