RAG with Vision Application Framework

Features

This repository provides an application framework for a Python-based retrieval-augmented generation (RAG) pipeline that uses both text and image content from MHTML documents to answer user queries. It builds on Azure AI Services, Azure AI Search, and Azure OpenAI Service, and provides the following features:

Ingestion flow: Ingests MHTML files into Azure AI Search using a newly developed enrichment pipeline.
Enrichment flow: Enhances ingested documents by classifying images based on their content, using a multi-modal LLM (MLLM) to generate image descriptions, and caching enrichment results to speed up the process.
RAG with vision pipeline: Utilizes enrichment data to search for images and incorporates the enrichment pipeline during inference.
Evaluation starter code: Assesses the performance of a particular RAG pipeline configuration using various metrics, including ROUGE recall and LLM-as-a-judge techniques.
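
The caching step of the enrichment flow can be illustrated with a minimal sketch. It assumes the cache is keyed by a hash of the image bytes so that an image seen twice is described only once. The names `EnrichmentCache` and `fake_describe` are hypothetical and are not taken from this repository; `fake_describe` stands in for a call to a multi-modal LLM, and the actual framework may store its cache differently.

```python
import hashlib
from typing import Callable, Dict


class EnrichmentCache:
    """Hypothetical in-memory cache keyed by the SHA-256 digest of the image bytes."""

    def __init__(self, describe: Callable[[bytes], str]) -> None:
        self._describe = describe          # function that produces an image description
        self._store: Dict[str, str] = {}   # digest -> cached description
        self.misses = 0                    # number of times the describer was called

    def get_description(self, image_bytes: bytes) -> str:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            # Cache miss: call the (potentially slow and costly) describer once.
            self.misses += 1
            self._store[key] = self._describe(image_bytes)
        return self._store[key]


def fake_describe(image_bytes: bytes) -> str:
    # Stand-in for an MLLM call (assumption for illustration only).
    return f"image of {len(image_bytes)} bytes"


cache = EnrichmentCache(fake_describe)
first = cache.get_description(b"abc")
second = cache.get_description(b"abc")  # served from the cache, no second call
```

Keying on the content hash rather than the file name means the same image embedded in several documents is enriched once, which is one plausible way such a cache speeds up ingestion.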

This repo is intended to be a starting point for RAG with vision, with the aim of enabling further experimentation to fine-tune the pipeline and best meet user needs for a given dataset.

Getting Started

Prerequisites and running the API

For more information on the prerequisites and how to run the RAG with Vision API locally, see here.

This repository also includes a devcontainer that can be used in VS Code with the ms-vscode-remote.remote-containers extension.

Understanding the architecture

The overall inference flow can be described via the following diagram:

For a full overview of the RAG with Vision architecture, including the document ingestion process and the image enrichment service, see this architecture document. An introduction to RAG pipeline evaluation and the starter evaluation flows provided in this repo, along with suggestions for collecting inner- and outer-loop feedback, can be found here.
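
As a rough illustration of the ROUGE recall metric named under Features, the sketch below computes unigram (ROUGE-1) recall between a reference answer and a generated answer. It assumes whitespace tokenization and lowercasing; the function name `rouge1_recall` is hypothetical, and the evaluation starter code in this repo may use a different ROUGE variant or a dedicated library.

```python
from collections import Counter


def rouge1_recall(reference: str, candidate: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        # An empty reference has nothing to recall.
        return 0.0
    # Clip each token's overlap at the number of times it occurs in the candidate.
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / total


score = rouge1_recall("the cat sat on the mat", "the cat on mat")
```

Recall rewards answers that cover the reference content but does not penalize extra text, which is one reason it is commonly paired with other signals such as LLM-as-a-judge scoring.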

Azure-Samples/rag-as-a-service-with-vision