custom-llm-document-assistant

This project implements a document processing and querying system using a custom Language Model (LLM) and semantic search. It allows users to load PDF documents, process them into chunks, create embeddings, and then query the processed information through a Streamlit-based user interface.

Both model choices are swappable (see the sketch after this list):

  • Embedding models: You could try other models from the Hugging Face Hub, such as "distilbert-base-uncased" or "roberta-base".
  • Language models: Instead of Ollama's Llama, you could integrate other LLMs such as OpenAI's GPT models, Google's PaLM, or open-source alternatives like GPT-J or BLOOM.
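For example, switching the embedding model is a one-line change. A minimal sketch; the model name below is one of the alternatives mentioned above, not necessarily what the scripts default to:

```python
from sentence_transformers import SentenceTransformer

# Any Hugging Face checkpoint name works here; SentenceTransformer adds a
# mean-pooling layer automatically when the checkpoint is not a native
# sentence-transformers model.
model = SentenceTransformer("distilbert-base-uncased")  # or "roberta-base"
embeddings = model.encode(["a sample sentence"])
print(embeddings.shape)  # (1, 768) for distilbert-base-uncased
```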

Features

  • PDF Document Processing: Extracts text from PDF files and splits it into chunks.
  • Text Embedding Creation: Generates embeddings using SentenceTransformers.
  • Efficient Similarity Search: Uses a FAISS index for fast retrieval of relevant text chunks (see the sketch after this list).
  • Interactive User Interface: Streamlit-based UI for easy querying.
  • Custom LLM Integration: Uses Ollama to run a local LLM for generating responses.
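The retrieval pipeline behind these features can be sketched in a few lines. This is illustrative only: the model name and chunk data are placeholders, not taken from the repository's scripts:

```python
import faiss
from sentence_transformers import SentenceTransformer

chunks = ["First text chunk.", "Second text chunk.", "Third text chunk."]
model = SentenceTransformer("all-MiniLM-L6-v2")

# Embed the chunks and index them for exact L2 nearest-neighbour search.
embeddings = model.encode(chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Embed the query the same way and fetch the two closest chunks.
query = model.encode(["What does the second chunk say?"]).astype("float32")
distances, ids = index.search(query, 2)
print([chunks[i] for i in ids[0]])
```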

Project Structure

  • 00_data/: Stores input PDFs, processed chunks, and embeddings.
  • 01_load_pdf.py: Handles PDF loading and text chunking (sketched after this list).
  • 02_embedding_creation.py: Creates embeddings from processed text chunks.
  • 01_user_query.py: Streamlit app for user interaction and query processing.
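As a rough idea of what 01_load_pdf.py does. A sketch under assumptions: the actual script may use a different PDF library and a smarter splitting strategy:

```python
from pypdf import PdfReader  # assumption: any PDF text extractor would do

def load_and_chunk(pdf_path, max_chunk_length=1000):
    """Extract all text from a PDF and split it into fixed-size chunks."""
    reader = PdfReader(pdf_path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    # Naive fixed-width split; a real implementation might break on sentences.
    return [text[i:i + max_chunk_length] for i in range(0, len(text), max_chunk_length)]

chunks = load_and_chunk("00_data/document.pdf")  # hypothetical file name
```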

Installation

  1. Clone the repository: git clone https://github.com/neuromorphicsystems/custom-llm-document-assistant.git
  2. Install the required dependencies, or use the provided Docker image.
  3. Install Ollama by following the instructions at Ollama's official website (https://ollama.com).

Usage

  1. Place your PDF document in the 00_data/ directory.

  2. Process the PDF and create chunks: python 01_load_pdf.py

  3. Generate embeddings: python 02_embedding_creation.py

  4. Launch the Streamlit app: streamlit run 01_user_query.py (a sketch of such an app follows this list)

  5. In a separate terminal, run the Ollama model: ollama run llama3

  6. Open your web browser and go to the URL provided by Streamlit to use the query interface.
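Under the hood, steps 4 and 5 amount to something like the following. This is a sketch only: file names, the embedding model, and the retrieved-chunk count are assumptions; the Ollama call uses its standard local REST endpoint:

```python
import faiss
import requests
import streamlit as st
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
index = faiss.read_index("00_data/index.faiss")  # assumed artifact paths
chunks = open("00_data/chunks.txt").read().split("\n\n")

st.title("Document Q&A")
question = st.text_input("Ask a question about the document")
if question:
    # Embed the question and retrieve the three most similar chunks.
    query = model.encode([question]).astype("float32")
    _, ids = index.search(query, 3)
    context = "\n\n".join(chunks[i] for i in ids[0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # Ask the locally running Ollama server (started with `ollama run llama3`).
    reply = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
    ).json()["response"]
    st.write(reply)
```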

Configuration

You can customize various aspects of the system by modifying parameters in the scripts:

  • 01_load_pdf.py: Adjust max_chunk_length to change text chunk sizes.
  • 02_embedding_creation.py: Modify the Config class to change the embedding model, batch size, or index type (sketched after this list).
  • 01_user_query.py: Adjust the number of relevant chunks or modify the LLM prompt.
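For instance, the Config class described above might look something like this. Field names and defaults here are illustrative, not copied from the repository:

```python
from dataclasses import dataclass

@dataclass
class Config:
    embedding_model: str = "all-MiniLM-L6-v2"  # any SentenceTransformers model
    batch_size: int = 32                       # passed to model.encode(...)
    index_type: str = "IndexFlatL2"            # which FAISS index to build
```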

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
