- This repo covers different use cases related to Prompt Engineering and Large Language Models (LLMs).
- Exploration & Exploitation: it contains notebooks for experimenting with different Prompt Engineering techniques, and showcases LLM deployment using Databricks Model Serving with GPU support.
- It also ships with a demo frontend application built with Gradio.
As of 29/08/2023, you will find the following examples in the `notebooks` folder:
### 🙋🏻‍♂️ customer_service
| Artifact | Description |
|---|---|
| `hf_mlflow_crash_course` | 🤓 Provides a basic example of using Hugging Face to train an intent classification model with `distilbert-qa`. Also showcases foundational MLflow concepts such as experiment tracking, artifact logging, and model registration (see the tracking sketch after this table). |
| `primer` | 🎬 Mostly conceptual notebook. Explains Prompt Engineering and foundational concepts such as Top-K sampling, Top-p sampling, and temperature. |
| `basic_prompt_evaluation` | 🧪 Demonstrates basic Prompt Engineering with lightweight LLMs. Also showcases MLflow's newest LLM features, such as `mlflow.evaluate()` (see the evaluation sketch after this table). |
| `few_shot_learning` | 💉 Explores Few-Shot Learning with an instruction-based LLM (`mpt-7b-instruct`). |
| `active_prompting` | 🏃🏻‍♂️ Explores active prompting techniques. Also demonstrates how to leverage vLLM to achieve 7X-10X inference latency improvements (see the vLLM sketch after this table). |
| `llama2_mlflow_logging_inference` | 🚀 Shows how to log, register, and deploy a LLaMA V2 model with MLflow (see the logging sketch after this table). |
| `mpt_mlflow_logging_inference` | 🚀 Shows how to log, register, and deploy an MPT-Instruct model with MLflow. Unlike the LLaMA V2 example, model weights are loaded directly into the model serving endpoint when the endpoint is initialized, without uploading the artifacts to the MLflow Model Registry. |
| `frontend` | 🎨 End-to-end example of a frontend demo app, built with Gradio, that connects to one of the Model Serving Endpoints deployed in the previous notebook (see the Gradio sketch after this table). |
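For readers new to MLflow, here is a minimal tracking sketch in the spirit of `hf_mlflow_crash_course`; the experiment path, parameter, and metric values are illustrative, not the notebook's:

```python
import mlflow

# The experiment path below is an assumption; on Databricks, experiments
# live at workspace paths like this one.
mlflow.set_experiment("/Shared/intent-classification")

with mlflow.start_run():
    mlflow.log_param("base_model", "distilbert-qa")  # what was trained
    mlflow.log_metric("val_accuracy", 0.91)          # illustrative value
    # Files such as plots or tokenizer assets can be attached to the run:
    # mlflow.log_artifact("confusion_matrix.png")
```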
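The `mlflow.evaluate()` feature highlighted in `basic_prompt_evaluation` can be exercised roughly as follows. This is a hedged sketch assuming MLflow ≥ 2.4 and a previously logged text-generation model; the run id, column name, and prompts are placeholders:

```python
import mlflow
import pandas as pd

# Illustrative evaluation prompts; the "inputs" column name is an assumption
# about the logged model's signature.
eval_data = pd.DataFrame(
    {
        "inputs": [
            "Summarize the benefits of experiment tracking in one sentence.",
            "Explain what a model registry does.",
        ]
    }
)

with mlflow.start_run():
    results = mlflow.evaluate(
        model="runs:/<run_id>/model",  # placeholder URI of a logged model
        data=eval_data,
        model_type="text",  # enables MLflow's built-in text metrics
    )
    print(results.metrics)
```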
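The vLLM speedups mentioned for `active_prompting` come from batched offline inference with PagedAttention. A minimal sketch, assuming vLLM is installed on the cluster; the model id, prompts, and sampling values are illustrative:

```python
from vllm import LLM, SamplingParams

# Sampling values are illustrative, not the notebook's exact configuration.
sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

# vLLM batches and schedules requests internally, which is where the large
# latency/throughput gains over naive HF generation come from.
llm = LLM(model="mosaicml/mpt-7b-instruct", trust_remote_code=True)

prompts = [
    "Classify the sentiment of: 'The battery died after one day.'",
    "Classify the sentiment of: 'Support resolved my issue in minutes.'",
]
outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```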
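Logging and registering an LLM, as in `llama2_mlflow_logging_inference`, uses MLflow's `transformers` flavor. A sketch assuming MLflow ≥ 2.3; the model id and registry name are placeholders (LLaMA V2 weights are gated on Hugging Face):

```python
import mlflow
from transformers import pipeline

# Placeholder model id; requires approved Hugging Face access to LLaMA V2.
pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

with mlflow.start_run():
    info = mlflow.transformers.log_model(
        transformers_model=pipe,
        artifact_path="model",
        registered_model_name="llama2-7b-chat",  # placeholder registry name
    )

# Load the logged model back as a generic pyfunc for inference.
loaded = mlflow.pyfunc.load_model(info.model_uri)
print(loaded.predict(["What is MLflow?"]))
```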
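Finally, the `frontend` pattern, a Gradio app calling a Databricks Model Serving endpoint over REST, looks roughly like the sketch below. The endpoint URL, token handling, and payload shape are assumptions that depend on the logged model's signature:

```python
import os

import gradio as gr
import requests

# Both values are placeholders: fill in your workspace host and endpoint name.
ENDPOINT_URL = "https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations"
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token


def ask_llm(prompt: str) -> str:
    response = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"inputs": [prompt]},  # payload shape depends on the model signature
        timeout=120,
    )
    response.raise_for_status()
    return str(response.json())


demo = gr.Interface(fn=ask_llm, inputs="text", outputs="text", title="LLM Demo")

if __name__ == "__main__":
    demo.launch()
```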
To start using this repo on Databricks, there are a few prerequisites:
- Create a GPU cluster with at least Databricks Machine Learning Runtime 13.2 GPU and an NVIDIA T4 GPU (an A10 or A100 is required for the steps involving vLLM).
- (only if using Databricks MLR < 13.2) Install the additional CUDA dependencies:
  - First, clone this repo into your workspace.
  - Then, configure an init script in your cluster by pointing the Init Script configuration to the following path:
    `/Repos/your_name@email.com/databricks-llm-prompt-engineering/init/init.sh`
- (only if using MPT models) Install the following Python packages on your cluster:
accelerate==0.21.0
einops==0.6.1
flash-attn==v1.0.5
ninja
tokenizers==0.13.3
transformers==4.30.2
xformers==0.0.20
- Once all dependencies finish installing and your cluster has successfully started, you should be good to go.
## 🎨 Frontend Web App Using Gradio
## 🚀 Model Deployment and Real-Time Inference
## 🔎 Retrieval Augmented Generation (RAG)
## 🛣️ MLflow AI Gateway