- Repository Name
- Title of the Project
- Short Description of the Project
- Objectives of the Project
- Name of the Dataset
- Description of the Dataset
- Goal of the Project using this Dataset
- Size of the dataset
- Algorithms which are used as part of our investigation
- Project Requirements
- Usage of the Project
- Which chatbot architecture should the users use
- Authors
smartchat-conversational-chatbot
SmartChat: A Context-Aware Conversational Agent
Develop a chatbot that can effectively adapt to context and topic shifts in a conversation, leveraging the Stanford Question Answering Dataset to provide informed and relevant responses, and thereby increasing user satisfaction and engagement.
Create a user-friendly web or app interface that enables users to have natural and coherent conversations with the chatbot, with a high satisfaction rating.
The dataset used in this project is the Stanford Question Answering Dataset (SQuAD).
Data Source: Kaggle
Type of the Dataset: Text
The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text, or span, from the corresponding reading passage. There are 100,000+ question-answer pairs on 500+ articles. More information can be found at: https://rajpurkar.github.io/SQuAD-explorer/
- The goal of the project is to develop a chatbot that can carry out multi-turn conversations, adapt to context, and handle a variety of topics.
- The dataset has two JSON files: one for training and one for evaluation (the dev set); a short inspection sketch follows this list.
- dev-v1.1.json – 4.9 MB
- train-v1.1.json – 30.3 MB
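Before preprocessing, it can help to confirm both files parse and to count what they contain. The following is a minimal sketch, assuming train-v1.1.json and dev-v1.1.json sit in the working directory; it is not part of the repository's notebooks.

```python
import json

# Count articles and question-answer pairs in each SQuAD v1.1 file.
# Assumption: both files are in the current working directory.
for path in ["train-v1.1.json", "dev-v1.1.json"]:
    with open(path, encoding="utf-8") as f:
        articles = json.load(f)["data"]        # list of Wikipedia articles
    n_pairs = sum(
        len(paragraph["qas"])                  # each paragraph carries several QA pairs
        for article in articles
        for paragraph in article["paragraphs"]
    )
    print(f"{path}: {len(articles)} articles, {n_pairs} question-answer pairs")
```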
- Two different architectures are used (a minimal loading sketch follows this list):
- GPT2-Medium architecture using LoRA and PEFT
- BERT (bert-base-uncased)
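A minimal sketch of how the two architectures can be instantiated with the transformers and peft libraries is shown below. The LoRA hyperparameters (r, alpha, dropout, target module) are illustrative assumptions, not the values used in the project notebooks.

```python
from transformers import AutoModelForCausalLM, AutoModelForQuestionAnswering, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

# Architecture 1: GPT2-Medium wrapped with a LoRA adapter via PEFT.
gpt2 = AutoModelForCausalLM.from_pretrained("gpt2-medium")
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                          # assumed rank
    lora_alpha=32,                # assumed scaling factor
    lora_dropout=0.1,             # assumed dropout
    target_modules=["c_attn"],    # GPT-2's fused attention projection
)
gpt2_lora = get_peft_model(gpt2, lora_cfg)
gpt2_lora.print_trainable_parameters()   # only the adapter weights are trainable

# Architecture 2: BERT with an extractive question-answering head.
bert = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
bert_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```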
- Python 3 (a quick dependency check is sketched after this list)
- datasets
- torch
- peft
- transformers
- evaluate
- safetensors
- numpy
- pandas
- matplotlib
- scikit-learn
- seaborn
- nltk
- rouge-score
- rouge
- gradio
- tqdm
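As a quick sanity check before running the notebooks, the snippet below verifies that the listed packages are importable; it is an optional helper, not part of the repository.

```python
import importlib.util

# Module names the listed requirements are imported under.
packages = [
    "datasets", "torch", "peft", "transformers", "evaluate", "safetensors",
    "numpy", "pandas", "matplotlib", "sklearn", "seaborn", "nltk",
    "rouge_score", "rouge", "gradio", "tqdm",
]
missing = [name for name in packages if importlib.util.find_spec(name) is None]
print("Missing packages:", ", ".join(missing) if missing else "none")
```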
- Go to the SQuAD Dataset Preprocessing notebook and ensure that the train-v1.1.json and dev-v1.1.json files are available.
- Open the SQuAD Dataset Preprocessing file and run all the cells (a sketch of the flattening it performs follows this list).
- To run and view the results of the BERT (bert-base-uncased) approach, follow the instructions in the SQuAD_chatbot_using_bert-base-uncased_README.md file.
- To run and view the results of the GPT (gpt2-medium using LoRA and PEFT) approach, follow the instructions in the SQuAD_chatbot_using_gpt2-medium_README.md file.
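The preprocessing notebook is the source of truth for the exact steps; the sketch below only illustrates the kind of flattening it performs, turning the nested SQuAD JSON into one row per question-answer pair. The column names are assumptions, not the notebook's exact schema.

```python
import json
import pandas as pd

def flatten_squad(path: str) -> pd.DataFrame:
    """Flatten SQuAD v1.1 JSON into one (id, question, context, answer) row per QA pair."""
    with open(path, encoding="utf-8") as f:
        articles = json.load(f)["data"]
    rows = []
    for article in articles:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                # Keep only the first reference answer for training purposes.
                answer = qa["answers"][0] if qa["answers"] else {"text": "", "answer_start": -1}
                rows.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    "answer_text": answer["text"],
                    "answer_start": answer["answer_start"],
                })
    return pd.DataFrame(rows)

train_df = flatten_squad("train-v1.1.json")
dev_df = flatten_squad("dev-v1.1.json")
print(train_df.shape, dev_df.shape)
```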
- Both chatbots run end to end.
- SQuAD_using_gpt2-medium generates answers, but the output frequently has quality issues.
- For more information on the observations and technical details, refer to the training and validation files.
- SQuAD_using_bert-base-uncased works reliably, as expected.
- Conclusion: either chatbot can be used, but for the most accurate answers, use SQuAD_using_bert-base-uncased (a minimal interface sketch follows this list).
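For users who want the web interface mentioned in the objectives, the sketch below wraps a fine-tuned BERT checkpoint in a Gradio app using the transformers question-answering pipeline. The checkpoint directory is a hypothetical placeholder; substitute the output folder produced by the bert-base-uncased training notebook.

```python
import gradio as gr
from transformers import pipeline

# Assumed local directory holding the fine-tuned bert-base-uncased weights.
CHECKPOINT = "outputs/bert-base-uncased-squad"

qa = pipeline("question-answering", model=CHECKPOINT, tokenizer=CHECKPOINT)

def answer(context: str, question: str) -> str:
    """Return the answer span extracted from the supplied passage."""
    return qa(question=question, context=context)["answer"]

demo = gr.Interface(
    fn=answer,
    inputs=[gr.Textbox(label="Context passage"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
    title="SmartChat: SQuAD question answering",
)

if __name__ == "__main__":
    demo.launch()
```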
| Author | Contact Details |
|---|---|
| Ravi Teja Kothuru | rkothuru@sandiego.edu |
| Soumi Ray | soumiray@sandiego.edu |
| Anwesha Sarangi | asarangi@sandiego.edu |