Medical-Question-Answering-System

System Architecture

The system consists of three main modules: a Knowledge Graph, a Keyword Extractor, and a BERT, BiLSTM, BiGRU semantic similarity model.

Knowledge Graph

A Neo4j-based graph database is adapted to store the medical information. Data is accessed through targeted Cypher queries and index adjacency, which increases query speed and eases subsequent retrievals.

Answer Extraction

The user provides an input question.
Key entities are extracted from the input question.
The question intention is derived.
The Disease & Symptom (D&S) Extractor is used to extract both the entities and the intention.
A Cypher query against the knowledge graph is built by combining the D&S entities with the user intention extracted in the previous stage.
The returned answers are cleaned and presented to the user.
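
The query-building step can be sketched as follows. This is a minimal illustration, not the repository's implementation: the node labels (`Disease`, `Symptom`), the relationship type `HAS_SYMPTOM`, the property names, and the helper `build_query` are all assumptions, since the actual graph schema is not described here. Only the question type names come from the classification table in this README.

```python
from typing import Dict, Tuple

# Hypothetical Cypher templates keyed by question type. Labels, relationship
# types, and property names are assumed for illustration only.
CYPHER_TEMPLATES: Dict[str, str] = {
    "Disease_symptom": (
        "MATCH (d:Disease {name: $name})-[:HAS_SYMPTOM]->(s:Symptom) "
        "RETURN s.name AS answer"
    ),
    "Symptom_disease": (
        "MATCH (d:Disease)-[:HAS_SYMPTOM]->(s:Symptom {name: $name}) "
        "RETURN d.name AS answer"
    ),
    "Disease_Cause": "MATCH (d:Disease {name: $name}) RETURN d.cause AS answer",
    "Disease_prevent": "MATCH (d:Disease {name: $name}) RETURN d.prevent AS answer",
}


def build_query(question_type: str, entity: str) -> Tuple[str, Dict[str, str]]:
    """Return a parameterized Cypher query and its parameter dictionary."""
    if question_type not in CYPHER_TEMPLATES:
        raise ValueError(f"unsupported question type: {question_type!r}")
    # The entity is passed as a query parameter rather than spliced into the
    # query text, so user input never becomes part of the Cypher itself.
    return CYPHER_TEMPLATES[question_type], {"name": entity}


query, params = build_query("Disease_symptom", "lung cancer")
print(query)
print(params)
```

With the official Neo4j Python driver, a query and parameter dictionary of this shape could be passed to `session.run(query, params)`. Whether the project uses that driver is not stated here.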

Knowledge Graph Question Classification

Question Type	Question Example
Disease_symptom	Symptoms of lung cancer?
Symptom_disease	What causes fever?
Disease_Cause	Lung cancer causes?
Disease_prevent	How to prevent cold?
Disease_cureaway	Medications for cold?
Disease_lasttime	SARS lifetime?
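
One simple way to assign a question to one of these types is to match a known entity and then look for cue words. The sketch below is an assumption-laden illustration: the entity lists, the cue words, and the function `classify` are hypothetical and far smaller than anything a real system would need, and the README does not say how the actual extractor works.

```python
from typing import Optional, Tuple

# Hypothetical, tiny entity dictionaries used only for this example.
DISEASES = ("lung cancer", "cold", "sars")
SYMPTOMS = ("fever", "cough")

# Hypothetical cue words per question type; the first matching entry wins.
CUES = (
    ("Disease_symptom", ("symptom",)),
    ("Disease_prevent", ("prevent",)),
    ("Disease_cureaway", ("medication", "treat", "cure")),
    ("Disease_lasttime", ("lifetime", "how long")),
    ("Disease_Cause", ("cause",)),
)


def classify(question: str) -> Optional[Tuple[str, str]]:
    """Return (question_type, entity), or None if nothing matches."""
    text = question.lower()
    disease = next((d for d in DISEASES if d in text), None)
    symptom = next((s for s in SYMPTOMS if s in text), None)
    if disease is None and symptom is not None:
        # A symptom with no disease: the user asks which diseases produce it.
        return "Symptom_disease", symptom
    if disease is not None:
        for question_type, cues in CUES:
            if any(cue in text for cue in cues):
                return question_type, disease
    return None


for q in ("Symptoms of lung cancer?", "What causes fever?", "How to prevent cold?"):
    print(q, "->", classify(q))
```

Substring matching like this is fragile (it ignores word boundaries and spelling variants), so it should be read as a picture of the idea rather than a usable classifier.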

Keyword Extraction

The keyword extraction layer sits in front of the similarity model. It serves as a preprocessing layer that filters the dataset questions down to those most relevant to the user question, and as an optimization layer that significantly speeds up question answering. The layer consists of two major parts: a populated SQL database that stores each dataset question's ID together with its keywords and their synonyms, and an extraction algorithm that retrieves the most relevant questions. As shown in the figure below, the extraction flow starts when a question is received and processed with NLTK to extract its keywords and their corresponding synonyms. A SQL selection query then selects all questions that match the extracted keywords. Finally, the questions with the most keyword occurrences are passed to the similarity model.
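
The selection step can be illustrated with an in-memory SQLite database. This is a sketch under stated assumptions: the table name `question_keywords`, its columns, the stopword list, and both helper functions are hypothetical, and a plain stopword filter stands in for the NLTK processing and synonym lookup described above.

```python
import sqlite3
from typing import List

# Stand-in for NLTK-based keyword extraction; the stopword list is illustrative.
STOPWORDS = {"what", "are", "the", "of", "is", "a", "how", "to", "for"}


def extract_keywords(question: str) -> List[str]:
    """Lowercase the question, strip punctuation, and drop stopwords."""
    words = [w.strip("?.,!").lower() for w in question.split()]
    return [w for w in words if w and w not in STOPWORDS]


def top_candidates(conn: sqlite3.Connection, keywords: List[str], limit: int = 2) -> List[int]:
    """Return IDs of the questions sharing the most keywords with the input."""
    if not keywords:
        return []
    placeholders = ",".join("?" for _ in keywords)
    sql = (
        "SELECT question_id, COUNT(*) AS hits FROM question_keywords "
        f"WHERE keyword IN ({placeholders}) "
        "GROUP BY question_id ORDER BY hits DESC, question_id LIMIT ?"
    )
    rows = conn.execute(sql, (*keywords, limit)).fetchall()
    return [row[0] for row in rows]


# Hypothetical schema: one row per (question, keyword or synonym) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question_keywords (question_id INTEGER, keyword TEXT)")
conn.executemany(
    "INSERT INTO question_keywords VALUES (?, ?)",
    [(1, "symptoms"), (1, "lung"), (1, "cancer"),
     (2, "prevent"), (2, "cold"),
     (3, "lung"), (3, "infection")],
)

keywords = extract_keywords("What are the symptoms of lung cancer?")
print(keywords)
print(top_candidates(conn, keywords))
```

Only the questions returned by the ranking query would then need to be scored by the similarity model, which is where the speed-up comes from.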

BERT, BiLSTM, BiGRU Semantic Similarity Model

Our similarity model is divided into several layers, as mentioned earlier. It takes two sentences as input. For each sentence, BERT produces the word embeddings. Each word embedding vector is sent once to the BiLSTM layer and once to the BiGRU layer. The output of each feature extraction layer is then sent to a max pooling layer and an average pooling layer. We concatenate the two max-pooled outputs and the two average-pooled outputs, and the result of the concatenation layer is sent to a dense layer as the final step.
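
The pooling and concatenation step can be shown with plain Python lists. This sketch makes several assumptions: the BiLSTM and BiGRU outputs are replaced by small hand-written sequences, pooling is taken over the token (time) axis, and the function names are hypothetical. It does not cover BERT, the recurrent layers, the dense layer, or how the two input sentences are combined, none of which are specified in enough detail here.

```python
from typing import List

Sequence = List[List[float]]  # one feature vector per token


def max_pool(seq: Sequence) -> List[float]:
    """Element-wise maximum over the token axis."""
    return [max(column) for column in zip(*seq)]


def avg_pool(seq: Sequence) -> List[float]:
    """Element-wise mean over the token axis."""
    return [sum(column) / len(column) for column in zip(*seq)]


def pooled_features(bilstm_out: Sequence, bigru_out: Sequence) -> List[float]:
    """Concatenate the two max-pooled and the two average-pooled vectors."""
    return (
        max_pool(bilstm_out) + max_pool(bigru_out)
        + avg_pool(bilstm_out) + avg_pool(bigru_out)
    )


# Stand-in outputs for a two-token sentence with two features per token.
bilstm_out = [[0.0, 2.0], [4.0, 1.0]]
bigru_out = [[1.0, 1.0], [3.0, 5.0]]

features = pooled_features(bilstm_out, bigru_out)
print(len(features), features)
```

If each feature extractor emits vectors of size H, the concatenated vector has size 4H regardless of sentence length, which gives the dense layer a fixed-size input.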

Environment setup

Download Neo4j Graph Database using the following link: https://neo4j.com/download/
Create a new folder in Neo4j and import the following database into it: https://drive.google.com/file/d/1aodIZ6Dl5qCPJZg7W5ki_UQug-gt5iET/view?usp=sharing
Install the dependencies listed in requirements.txt: pip install -r requirements.txt
Download the .h5 file: https://drive.google.com/file/d/1gNGI4nmKXp9g38-rrpraIweXaJhkaQLc/view?usp=sharing
Download google/bert_uncased_L-12_H-768_A-12: https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
Clone the main branch and run GUI.py

Experiments and Results

1. Exploration of different word embeddings

Word embeddings, also known as word representations, map words and phrases into a low-dimensional continuous space. The base model that we compare against throughout the experiments uses BERT. The first alternative embedding we explored was word2vec and the second was FastText. The results are shown in the table below.

Word Embedding	Evaluation Accuracy
word2vec	86.31%
FastText	82.8%
BERT	86.91%

2. Exploration of different feature extraction methods

Feature extraction is the process of reducing the data into more manageable groups for processing. The first method we tried was a basic CNN layer in place of the BiLSTM; however, the results were not promising, as the accuracy decreased. The second method was a BiGRU, which yielded an accuracy very close to that of the BiLSTM. The last method was the concatenation of the BiLSTM and BiGRU layers, which together form the feature extraction layer of the semantic similarity model. Below are the results:

Accuracy	Precision	Recall	F1
BiLSTM	86.3%	86.2%	88.7%
CNN	82.8%	80.9%	82.9%
BiGRU	86.9%	87.1%	87.2%
BBBSM	87.2%	87.2%	88%

The examples below illustrate the answers retrieved for sample questions that we used while evaluating our QA system.

Q1: What are the symptoms of lung cancer?

A1: The symptoms of lung cancer include: hemoptysis with chest pain stem in bloodshot couwith weight loss sticky or purulent sputum with, slightly, chest pain fever with cough planting spread lung nodules.

Q2: What is coronavirus?

A2: Coronavirus are a type of virus - there are many kinds, and some of them can cause disease. Examples of coronaviruses include the common cold, flu, to more severe diseases like Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV).

System Accuracy

Overall System	Accuracy
KG+BBBSM	86.48%

