Upload your PDF and talk to it in natural language to gain insights and knowledge, just as you would talk to an LLM such as ChatGPT or Bard.
Note: The app will only respond to questions related to the loaded PDF.
- PDF is divided up into small chunks
- Chunks are embedded using an embedding model
- Embeddings are stored in a vector store
- User asks a question
- Question is embedded using the same embedding model
- A similarity search is performed between the embedded question and the documents in the vector store
- Question + similar docs are sent to LLM
- LLM answers the question and the answer is shown to the user (see the sketch below)
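A minimal sketch of this pipeline, assuming LangChain with OpenAI embeddings, a FAISS vector store, and a "stuff" question-answering chain (the actual app may use different components):

```python
# Sketch only: assumes LangChain + OpenAI embeddings + FAISS,
# not necessarily the exact components used by app.py.
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains.question_answering import load_qa_chain

def answer_question(pdf_text: str, question: str) -> str:
    # Split the PDF text into small chunks
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_text(pdf_text)
    # Embed the chunks and store the embeddings in a vector store
    vector_store = FAISS.from_texts(chunks, OpenAIEmbeddings())
    # Embed the question with the same model and retrieve the most similar chunks
    similar_docs = vector_store.similarity_search(question)
    # Send the question plus the similar chunks to the LLM and return its answer
    chain = load_qa_chain(ChatOpenAI(), chain_type="stuff")
    return chain.run(input_documents=similar_docs, question=question)
```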
This project is written in Python 3.10.10.
- Copy `.env.example` and rename it to `.env`
- Add the `OPENAI_API_KEY` to `.env`
- To use Hugging Face models, add the `HUGGINGFACEHUB_API_TOKEN` to `.env`
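For reference, a filled-in `.env` might look like the following (placeholder values shown; the Hugging Face token is only needed if you use the `--hf` mode):

```
OPENAI_API_KEY=sk-...
HUGGINGFACEHUB_API_TOKEN=hf_...
```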
Install requirements:

```
pip install -r requirements.txt
```
To use Hugging Face models, uncomment the following lines in `requirements.txt`:

- `huggingface-hub`
- `InstructorEmbedding`
- `sentence-transformers`

and run:

```
pip install -r requirements.txt
```
Run the app:

```
streamlit run app.py
```
To use Hugging Face models:

```
streamlit run app.py -- --hf
```
Using `--hf` downloads the embedding model to your machine, so the embeddings are computed locally. The LLM is still accessed via the Hugging Face Inference API.
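A minimal sketch of how the `--hf` switch might select components (the model names `hkunlp/instructor-xl` and `google/flan-t5-xxl` are assumptions, not necessarily what the app uses):

```python
# Sketch only: Streamlit forwards everything after "--" to the script,
# so "streamlit run app.py -- --hf" puts "--hf" into sys.argv.
import sys
from langchain.embeddings import OpenAIEmbeddings, HuggingFaceInstructEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.llms import HuggingFaceHub

if "--hf" in sys.argv:
    # Embedding model is downloaded and run locally (InstructorEmbedding /
    # sentence-transformers); the LLM is called via the HF Inference API.
    embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
    llm = HuggingFaceHub(repo_id="google/flan-t5-xxl", model_kwargs={"temperature": 0.5})
else:
    embeddings = OpenAIEmbeddings()
    llm = ChatOpenAI()
```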