
Conversational AI Chatbot for AGH University of Science and Technology


witek3100/chatAGH


  1. About
  2. Tech stack
  3. Project structure
  4. Project status and next steps
  5. Knowledge sources
  6. Run locally instruction
  7. Credits

About

ChatAGH is a conversational AI chatbot based on OpenAI's GPT-4, connected to knowledge about AGH University (statutes, websites, etc.) through retrieval-augmented generation (RAG). The aim of the project is to create a useful tool for the AGH academic community that allows quick retrieval of reliable information.

Check latest version here

Tech stack

Chatbot

  • The project is built on the LangChain LLM framework, which integrates the OpenAI GPT-4 model, and uses the Pinecone vector database to store embeddings of the knowledge sources, enabling RAG (Retrieval-Augmented Generation). A migration to MongoDB Atlas is planned to consolidate data storage and reduce costs.
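The retrieval step of RAG can be illustrated with a small, dependency-free sketch. It is not the project's implementation: a bag-of-words word count stands in for real embeddings (the project stores OpenAI-style embeddings in Pinecone via LangChain), and the functions `embed`, `cosine`, `retrieve` and `build_prompt` are hypothetical names. It shows the idea only: rank stored passages by similarity to the question, then place the best matches into the prompt sent to the model.

```python
"""Toy sketch of the retrieval step in retrieval-augmented generation.

Assumption: a bag-of-words vector stands in for a real embedding model;
the project itself uses LangChain with a Pinecone vector database.
"""
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k,
    # mimicking a similarity search in a vector database.
    q = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Retrieved passages go into the prompt so the LLM answers from them.
    joined = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    # Made-up sample passages, not real AGH content.
    docs = [
        "Passage A: the statute describes student rights.",
        "Passage B: notes about campus parking.",
        "Passage C: information about scholarships.",
    ]
    question = "What does the statute say about student rights?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

In a real setup the ranking is done by the vector database over dense embeddings, and the assembled prompt is what the chatbot passes to GPT-4.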

Web App

  • The web application, which serves as the user interface for the chatbot, is built with Flask and jQuery and stores its data in a MongoDB database. This combination keeps the interface simple for users and the code easy to develop.
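One way to picture the data the web app keeps in MongoDB is a chat stored as a single document with its list of messages. The sketch below is an assumption, not the project's schema: the `Chat` and `Message` classes and the field names `chat_id`, `created_at`, `messages`, `role` and `content` are hypothetical.

```python
"""Sketch of a chat represented as a MongoDB-ready document.

Assumption: class and field names are hypothetical; the actual schema
used by the ChatAGH web app may differ.
"""
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str


@dataclass
class Chat:
    chat_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    messages: list[Message] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        # Reject roles other than the two participants of a chat.
        if role not in ("user", "assistant"):
            raise ValueError(f"unknown role: {role}")
        self.messages.append(Message(role, content))

    def to_document(self) -> dict:
        # asdict() recursively turns nested dataclasses into plain dicts,
        # which is the form a MongoDB driver accepts for insertion.
        return asdict(self)
```

With pymongo, such a document could be saved with `collection.insert_one(chat.to_document())`; keeping all messages in one document fits the current one-time chat model.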

Deployment

  • The project is deployed as a Docker container on Google Cloud Run, which scales the service automatically with traffic. Deployment is automated with Terraform and GitHub Actions.
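A container on Cloud Run must listen on the port given in the `PORT` environment variable (8080 by default) and on all interfaces (`0.0.0.0`). The README does not show how the project's `app.py` starts its server, so the helper `get_port` below is a hypothetical sketch of that one requirement.

```python
"""Sketch: reading the port that Google Cloud Run assigns to a container.

Cloud Run passes the listening port in the PORT environment variable
(8080 by default). The helper name and entry point are assumptions.
"""
import os
from typing import Mapping, Optional


def get_port(environ: Optional[Mapping[str, str]] = None, default: int = 8080) -> int:
    # Fall back to the default when PORT is unset or empty (e.g. local runs).
    env = os.environ if environ is None else environ
    value = env.get("PORT", "")
    return int(value) if value else default


if __name__ == "__main__":
    # With Flask this would typically become
    # app.run(host="0.0.0.0", port=get_port()); binding to 0.0.0.0 lets the
    # container accept traffic from outside, not only from localhost.
    print(f"Listening on 0.0.0.0:{get_port()}")
```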

Project structure

   ├── src
   |     ├── infra      ## Terraform files, deployment automation
   |     |      ├── main.tf
   |     |      └── variables.tf
   |     |
   |     ├── chatbot     ## Chatbot implementation using langchain
   |     |      ├── prompts
   |     |      ├── skills
   |     |      ├── chat.py
   |     |      └── message.py
   |     |
   |     ├── configs          ## Project configuration files
   |     |      ├── config.json
   |     |      └── requirements.txt
   |     |
   |     ├── sources             ## Knowledge sources generator (web scraper, langchain documents loader, and pinecone index initialization)
   |     |      ├── sources.json
   |     |      ├── sources_loader.py
   |     |      └── domain_scraper.py
   |     |
   |     ├── tests      ## tests - TODO
   |     |
   |     ├── web-app                 ## Web application - user interface to chatbot
   |     |      ├── static        
   |     |      |     ├── assets
   |     |      |     ├── css
   |     |      |     |    └── chat_tab.css
   |     |      |     └── js
   |     |      |          └── chat_tab.js
   |     |      ├── templates
   |     |      |     ├── chat_tab.html
   |     |      |     └── home_tab.html
   |     |      └── app.py
   |     |
   |     └── utils.py      ## Utilities (logger, mongo client etc.)
   |
   └── README.md                 
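`utils.py` is described as holding shared utilities such as a logger. A minimal sketch of such a helper, assuming the hypothetical name `get_logger` and an arbitrary log format, could look like this:

```python
"""Sketch of a shared logger helper such as the one utils.py may provide.

Assumption: the function name and log format are hypothetical.
"""
import logging


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    logger = logging.getLogger(name)
    # Attach a handler only once, so repeated calls from different modules
    # do not add duplicate handlers that print each record several times.
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger
```

Modules such as `chat.py` or `app.py` would then call `get_logger(__name__)` instead of configuring logging themselves.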

Project status and next steps

ChatAGH is currently at an early stage of development, so bugs are expected and functionality is limited: at the moment, users can only create new, one-time chats. Next steps include:

  • Expanding the web app (among other things, by adding user accounts so that users can save and return to their chats),
  • Improving chat quality by broadening the knowledge sources and enhancing the chatbot itself (prompts, model parameters, etc.),
  • Fixing bugs and making many smaller changes.

Knowledge sources

The current approach to generating knowledge sources is fairly simple and needs improvement (expanding and filtering the sources). In short, web pages from several AGH-related domains are scraped and then loaded into Pinecone using LangChain.
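The scraping step can be sketched as collecting links from a page and keeping only those that belong to the allowed domains. This is an illustration under assumptions, not the contents of `domain_scraper.py`: it uses only the standard library, the names `LinkParser`, `in_domains` and `extract_links` are hypothetical, and `example.edu` is a placeholder rather than one of the project's source domains.

```python
"""Sketch of the link-collection step of a domain-restricted scraper.

Assumptions: stdlib only (domain_scraper.py may use other libraries);
function names and example domains are hypothetical placeholders.
"""
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    # Collects the href attribute of every <a> tag on a page.
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def in_domains(url: str, domains: set[str]) -> bool:
    # Accept a domain itself and its subdomains (www.example.edu for example.edu),
    # but not look-alikes such as evilexample.edu.
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)


def extract_links(html: str, base_url: str, domains: set[str]) -> list[str]:
    parser = LinkParser()
    parser.feed(html)
    links: list[str] = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so each page is kept once.
        url, _ = urldefrag(urljoin(base_url, href))
        is_web = urlparse(url).scheme in ("http", "https")
        if is_web and in_domains(url, domains) and url not in links:
            links.append(url)
    return links
```

A crawler would repeat this for every fetched page, queueing unseen links; the page texts collected this way are what gets split into chunks, embedded, and loaded into the vector index.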

List of domains:

Run locally instruction

Credits

https://www.fpgmaas.com/blog/deploying-a-flask-api-to-cloudrun