This repository offers a Python framework for a retrieval-augmented generation (RAG) pipeline using text and images from MHTML documents, leveraging Azure AI and OpenAI services. It includes ingestion and enrichment flows, a RAG with Vision pipeline, and evaluation tools.

Azure-Samples/rag-as-a-service-with-vision

RAG with Vision Application Framework

Features

This repository provides an application framework for a Python-based retrieval-augmented generation (RAG) pipeline that uses both text and image content from MHTML documents to answer user queries. It builds on Azure AI Services, Azure AI Search, and Azure OpenAI Service, and provides the following features:

  • Ingestion flow: Ingests MHTML files into Azure AI Search using a newly developed enrichment pipeline.
  • Enrichment flow: Enhances ingested documents by classifying images based on their content, using a multi-modal LLM (MLLM) to generate image descriptions, and caching enrichment results to speed up the process.
  • RAG with vision pipeline: Uses enrichment data to search for images and incorporates the enrichment pipeline during inference.
  • Evaluation starter code: Assesses the performance of a particular RAG pipeline configuration using various metrics, including ROUGE recall and LLM-as-a-judge techniques.
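
The caching idea in the enrichment flow can be illustrated with a minimal sketch. This is not the repository's implementation: `EnrichmentCache`, `fake_describe`, and the in-memory dictionary are hypothetical names chosen for illustration, and the real service may key and store its cache differently. The sketch assumes the cache key is a hash of the image bytes, so that an identical image is sent to the multi-modal LLM only once.

```python
import hashlib
from typing import Callable, Dict


class EnrichmentCache:
    """Hypothetical in-memory cache of image descriptions, keyed by content hash."""

    def __init__(self, describe: Callable[[bytes], str]) -> None:
        # `describe` stands in for a call to a multi-modal LLM (assumption).
        self._describe = describe
        self._store: Dict[str, str] = {}
        self.misses = 0  # number of times the underlying describer was called

    def get_description(self, image_bytes: bytes) -> str:
        # Identical bytes always produce the same key, so duplicates hit the cache.
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._describe(image_bytes)
        return self._store[key]


def fake_describe(image_bytes: bytes) -> str:
    """Placeholder describer so the sketch runs without any external service."""
    return f"image of {len(image_bytes)} bytes"


if __name__ == "__main__":
    cache = EnrichmentCache(fake_describe)
    cache.get_description(b"abc")
    cache.get_description(b"abc")  # served from the cache, no second describe call
    print(cache.misses)
```

In a real deployment the dictionary would likely be replaced by a persistent store, and the key might also include the prompt or model version so that cached descriptions are invalidated when either changes.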

This repo is intended to be a starting point for RAG with vision, with the aim of enabling further experimentation to fine-tune the pipeline and best meet user needs for a given dataset.

Getting Started

Prerequisites and running the API

For more information on the prerequisites and how to run the RAG with Vision API locally, see here.

This repository also includes a devcontainer that can be used in VSCode with the ms-vscode-remote.remote-containers extension.

Understanding the architecture

The overall inference flow can be described via the following diagram:

[Diagram: Inference flow]

For a full overview of the RAG with Vision architecture, including the document ingestion process and the image enrichment service, see this architecture document. An introduction to RAG pipeline evaluation and the starter evaluation flows provided in this repo, along with suggestions for collecting inner- and outer-loop feedback, can be found here.
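
As a rough illustration of the ROUGE recall metric mentioned under the evaluation features, the sketch below computes unigram (ROUGE-1) recall: the fraction of reference tokens that also appear in the generated answer. It assumes lowercase whitespace tokenization and is not the repository's evaluation code, which may use a different ROUGE variant or an existing library. The function name `rouge1_recall` is hypothetical.

```python
from collections import Counter


def rouge1_recall(reference: str, candidate: str) -> float:
    """Unigram recall: overlapping tokens divided by reference token count."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens:
        return 0.0  # no reference tokens to recall
    ref_counts = Counter(ref_tokens)
    cand_counts = Counter(cand_tokens)
    # Clip each token's overlap at the number of times it occurs in the candidate.
    overlap = sum(min(count, cand_counts[tok]) for tok, count in ref_counts.items())
    return overlap / len(ref_tokens)


if __name__ == "__main__":
    print(rouge1_recall("the cat sat", "the cat"))
```

Recall rewards answers that cover the reference content but does not penalize extra or incorrect text, which is one reason to pair it with a complementary signal such as an LLM-as-a-judge score.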
