- This repo covers different use cases related to Prompt Engineering and Large Language Models (LLMs).
- Exploration & Exploitation: it contains notebooks for experimenting with different Prompt Engineering techniques, and showcases LLM deployment using Databricks Model Serving with GPU support.
- It also ships with a demo frontend application built with Gradio.
As of 29/08/2023, you will find the following examples in the `notebooks` folder:
### 🙋🏻‍♂️ customer_service
| Artifact | Description |
|---|---|
| `hf_mlflow_crash_course` | 🤓 Provides a basic example of using Hugging Face to train an intent classification model with `distilbert-qa`. Also showcases foundational MLflow concepts such as experiment tracking, artifact logging, and model registration (see the tracking sketch after this table). |
| `primer` | 🎬 Mostly conceptual notebook. Explains Prompt Engineering and foundational concepts such as Top-K sampling, Top-p sampling, and temperature. |
| `basic_prompt_evaluation` | 🧪 Demonstrates basic Prompt Engineering with lightweight LLMs. Also showcases MLflow's newest LLM features, such as `mlflow.evaluate()` (see the evaluation sketch after this table). |
| `few_shot_learning` | 💉 Explores Few-Shot Learning with an instruction-based LLM (`mpt-7b-instruct`). |
| `active_prompting` | 🏃🏻‍♂️ Explores active prompting techniques. Also demonstrates how to leverage vLLM to achieve 7X-10X inference latency improvements (see the vLLM sketch after this table). |
| `llama2_mlflow_logging_inference` | 🚀 Shows how to log, register, and deploy a LLaMA V2 model with MLflow (see the logging sketch after this table). |
| `mpt_mlflow_logging_inference` | 🚀 Shows how to log, register, and deploy an MPT-Instruct model with MLflow. Unlike the LLaMA V2 example, model weights are loaded directly into the model serving endpoint when the endpoint is initialized, without uploading the artifacts to the MLflow Model Registry. |
| `frontend` | 🎨 End-to-end example of a frontend demo app, built with Gradio, that connects to one of the Model Serving Endpoints deployed in the previous notebook (see the Gradio sketch after this table). |
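For readers new to MLflow, here is a minimal tracking sketch in the spirit of `hf_mlflow_crash_course`; the experiment path, parameter, and metric values are illustrative, not the notebook's:

```python
import mlflow

# The experiment path below is an assumption; on Databricks, experiments
# live at workspace paths like this one.
mlflow.set_experiment("/Shared/intent-classification")

with mlflow.start_run():
    mlflow.log_param("base_model", "distilbert-qa")  # what was trained
    mlflow.log_metric("val_accuracy", 0.91)          # illustrative value
    # Files such as plots or tokenizer assets can be attached to the run:
    # mlflow.log_artifact("confusion_matrix.png")
```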
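The `mlflow.evaluate()` feature highlighted in `basic_prompt_evaluation` can be exercised roughly as follows. This is a hedged sketch assuming MLflow ≥ 2.4 and a previously logged text-generation model; the run id, column name, and prompts are placeholders:

```python
import mlflow
import pandas as pd

# Illustrative evaluation prompts; the "inputs" column name is an assumption
# about the logged model's signature.
eval_data = pd.DataFrame(
    {
        "inputs": [
            "Summarize the benefits of experiment tracking in one sentence.",
            "Explain what a model registry does.",
        ]
    }
)

with mlflow.start_run():
    results = mlflow.evaluate(
        model="runs:/<run_id>/model",  # placeholder URI of a logged model
        data=eval_data,
        model_type="text",  # enables MLflow's built-in text metrics
    )
    print(results.metrics)
```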
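The vLLM speedups mentioned for `active_prompting` come from batched offline inference with PagedAttention. A minimal sketch, assuming vLLM is installed on the cluster; the model id, prompts, and sampling values are illustrative:

```python
from vllm import LLM, SamplingParams

# Sampling values are illustrative, not the notebook's exact configuration.
sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

# vLLM batches and schedules requests internally, which is where the large
# latency/throughput gains over naive HF generation come from.
llm = LLM(model="mosaicml/mpt-7b-instruct", trust_remote_code=True)

prompts = [
    "Classify the sentiment of: 'The battery died after one day.'",
    "Classify the sentiment of: 'Support resolved my issue in minutes.'",
]
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```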
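Logging and registering an LLM, as in `llama2_mlflow_logging_inference`, uses MLflow's `transformers` flavor. A sketch assuming MLflow ≥ 2.3; the model id and registry name are placeholders (LLaMA V2 weights are gated on Hugging Face):

```python
import mlflow
from transformers import pipeline

# Placeholder model id; requires approved Hugging Face access to LLaMA V2.
pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

with mlflow.start_run():
    info = mlflow.transformers.log_model(
        transformers_model=pipe,
        artifact_path="model",
        registered_model_name="llama2-7b-chat",  # placeholder registry name
    )

# Load the logged model back as a generic pyfunc for inference.
loaded = mlflow.pyfunc.load_model(info.model_uri)
print(loaded.predict(["What is MLflow?"]))
```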
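Finally, the `frontend` pattern, a Gradio app calling a Databricks Model Serving endpoint over REST, looks roughly like the sketch below. The endpoint URL, token handling, and payload shape are assumptions that depend on the logged model's signature:

```python
import os

import gradio as gr
import requests

# Both values are placeholders: fill in your workspace host and endpoint name.
ENDPOINT_URL = "https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations"
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token


def ask_llm(prompt: str) -> str:
    response = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"inputs": [prompt]},  # payload shape depends on the model signature
        timeout=120,
    )
    response.raise_for_status()
    return str(response.json())


demo = gr.Interface(fn=ask_llm, inputs="text", outputs="text", title="LLM Demo")

if __name__ == "__main__":
    demo.launch()
```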
To start using this repo on Databricks, there are a few prerequisites:
- Create a GPU cluster with at least Databricks Machine Learning Runtime 13.2 GPU and an NVIDIA T4 GPU (an A10 or A100 is required for the steps involving vLLM).
- (only if using Databricks MLR < 13.2) Install the additional CUDA dependencies:
  - First, clone this repo into your workspace.
  - Then, configure an init script in your cluster by pointing the Init Script configuration to the following path:
    `/Repos/your_name@email.com/databricks-llm-prompt-engineering/init/init.sh`
- (only if using MPT models) Install the following Python packages on your cluster:
accelerate==0.21.0
einops==0.6.1
flash-attn==v1.0.5
ninja
tokenizers==0.13.3
transformers==4.30.2
xformers==0.0.20
- Once all dependencies finish installing and your cluster has successfully started, you should be good to go.
## 🎨 Frontend Web App Using Gradio
## 🚀 Model Deployment and Real-Time Inference
## 🔎 Retrieval Augmented Generation (RAG)
## 🛣️ MLflow AI Gateway