😑😑😑 Boring RAG 😑😑😑

(WIP) A retrieval-augmented generation (RAG) pipeline whose LLM generates boring text. Everything is implemented from scratch.

Ref:

TODO

  • Preprocessing + PDF Reader

  • Chunking, SentenceSplit

  • Embedding chunks

  • Save the embedding

  • Similarity Search

  • (TBD) Embedding pooling

  • BaseIndex Draft

  • Ingestion Pipeline: pack the splitter and the embedding step into IngestionPipeline

  • Storage Context

  • BaseEmbedding.similarity vs SimpleVectorStore.query vs RetrieverQueryEngine._query

  • Retrieval

  • Generation

  • Re-org demo_rag.py
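
The first few items above (sentence splitting, chunking, embedding, saving the embeddings, similarity search) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this repo: every function name here (`split_sentences`, `chunk_sentences`, `embed`, `cosine_similarity`, `top_k`) is hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import json
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_sentences(sentences, max_chars=80):
    """Greedily pack whole sentences into chunks.

    A chunk stays within max_chars unless a single sentence is longer,
    in which case that sentence becomes a chunk on its own.
    """
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text):
    """Toy embedding (assumption): sparse vector of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, embeddings, k=1):
    """Return the k (score, chunk) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine_similarity(q, e), c) for c, e in zip(chunks, embeddings)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    text = "Cats sleep all day. Dogs chase the ball. Embeddings map text to vectors."
    chunks = chunk_sentences(split_sentences(text), max_chars=25)
    # "Save the embedding": serialise to JSON, then load it back.
    saved = json.dumps([embed(c) for c in chunks])
    embeddings = json.loads(saved)
    print(top_k("Who chases a ball?", chunks, embeddings, k=1))
```

Storing the vectors as plain dicts keeps the save/load step to a single `json.dumps`/`json.loads` round trip. A dense embedding model would more likely store arrays and search with a matrix product.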

BUGs

  • PDF documents need post-processing after reading
  • Chunk metadata is not saved
  • Pydantic must be pinned to 1.10.14