NeuraLearn Academy

NeuraLearn Academy is an innovative educational platform that utilizes generative AI and Large Language Models (LLMs) to elevate the online learning experience. Through AI-driven workflows, the platform offers features such as intelligent feedback, automated question generation, and content summarization, providing value to both instructors and students.

NeuraLearn Platform Overview


System Overview

NeuraLearn Academy was developed as my graduation project at the Faculty of Computers and Artificial Intelligence, Fayoum University. The system architecture consists of multiple interconnected servers, communicating via APIs:

System Architecture


Usage


  1. Clone the repository:

    git clone https://github.com/Sh-31/NeuraLearn-ML-Server-QG-RAG-Pipeline.git
  2. Install the required dependencies:

    pip3 install -r requirements.txt
  3. Run the main script:

    python3 server.py

Important Notes:

  • Question Answering, Summarizer, and Transcript Models:
    These models use Hugging Face's serverless Inference API to reduce local system resource usage, making them suitable for demo purposes (an example call is sketched after these notes).

  • Question Generation Model:
    This model is loaded into the system's memory, requiring at least 6GB of GPU RAM to function efficiently. If you're limited on resources, I recommend using Lightning AI, which offers free GPU usage hours.
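
As a rough illustration of how such a serverless call works, the sketch below posts text to the Hugging Face Inference API with the requests library. The model ID and token are placeholders, not values taken from this repository.

    # Minimal sketch of a Hugging Face Inference API (serverless) call.
    # MODEL_ID and HF_TOKEN are placeholders -- substitute the model the
    # server actually uses and your own access token.
    import requests

    MODEL_ID = "facebook/bart-large-cnn"   # placeholder summarization model
    HF_TOKEN = "hf_..."                    # your Hugging Face access token

    API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"
    headers = {"Authorization": f"Bearer {HF_TOKEN}"}

    payload = {"inputs": "Long lecture transcript to summarize ..."}
    response = requests.post(API_URL, headers=headers, json=payload)
    print(response.json())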

NeuraLearn ML API Documentation

This API provides machine learning services for the NeuraLearn Academy platform, including summarization, transcript generation, question answering, answer comparison, and question generation.

Base URL

http://127.0.0.1:8080

Endpoints

1. Summarization

Generate a summary from the provided text input.

  • URL: /neuarlearn/ml/summarizer

  • Method: POST

  • Request Body:

    {
      "text": "string",
      "max_length": 100,
      "min_length": 50
    }
    • text: The text to be summarized (required).
    • max_length: The maximum length of the summary (optional).
    • min_length: The minimum length of the summary (optional).
  • Response:

    {
      "summary": "string"
    }
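  • Example request: the following is a minimal sketch using the Python requests library; the text and length values are illustrative placeholders.

    import requests

    payload = {
        "text": "HTML structures a web page, while CSS controls its presentation ...",
        "max_length": 100,
        "min_length": 50,
    }
    resp = requests.post("http://127.0.0.1:8080/neuarlearn/ml/summarizer", json=payload)
    print(resp.json()["summary"])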

2. Transcript Generation

Generate a transcript from an audio file.

  • URL: /neuarlearn/ml/transcript

  • Method: POST

  • Request Body: Raw audio bytes.

  • Response:

    {
      "transcript": "string"
    }
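  • Example request: because this endpoint expects raw audio bytes rather than JSON, a call could look like the sketch below. The file name is illustrative, and the content type header is an assumption about what the server accepts.

    import requests

    with open("lecture.wav", "rb") as f:   # illustrative audio file
        audio_bytes = f.read()

    resp = requests.post(
        "http://127.0.0.1:8080/neuarlearn/ml/transcript",
        data=audio_bytes,                                       # raw bytes in the body
        headers={"Content-Type": "application/octet-stream"},   # assumed content type
    )
    print(resp.json()["transcript"])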

3. Question Answering

Answer questions based on the provided context and chat history.

  • URL: /neuarlearn/ml/chat

  • Method: POST

  • Request Body:

    {
      "question": "string",
      "context": "string",
      "k": 5,
      "type": 1,
      "chat_history": ["string"],
      "do_splitting": true,
      "add_to_history": true,
      "chunk_size": 2000,
      "chunk_overlap": 1000
    }
    • question: The question to be answered (required).
    • context: The context for the question (optional).
    • k: The number of relevant passages to retrieve (optional).
    • type: The question type (optional).
    • chat_history: Previous chat history (optional).
    • do_splitting: Flag to split long documents (optional).
    • add_to_history: Flag to add the question to chat history (optional).
    • chunk_size: Size of chunks for splitting (optional).
    • chunk_overlap: Overlap between chunks (optional).
  • Response:

    {
      "answer": "HTML (Hyper Text Markup Language) is used for structuring the content of a web page, such as headings, paragraphs, images, links, etc. CSS (Cascading Style Sheets) is used for styling the content, like colors, fonts, layout, and animations. HTML provides the structure, while CSS provides the look and feel of a web page. In the provided context, the user is modifying CSS properties like font size, line height, and color to change the appearance of the web page.",
      "chat_history": [
        "What is the difference between HTML and CSS?",
        "HTML (Hyper Text Markup Language) is used for structuring the content of a web page, such as headings, paragraphs, images, links, etc. CSS (Cascading Style Sheets) is used for styling the content, like colors, fonts, layout, and animations. HTML provides the structure, while CSS provides the look and feel of a web page. In the provided context, the user is modifying CSS properties like font size, line height, and color to change the appearance of the web page."
      ]
    }
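  • Example request: a typical client passes the returned chat_history back into the next request so the model keeps conversational context. The sketch below is illustrative; the field values are not taken from the repository.

    import requests

    BASE = "http://127.0.0.1:8080"
    history = []

    payload = {
        "question": "What is the difference between HTML and CSS?",
        "context": "Lecture transcript covering HTML structure and CSS styling ...",
        "k": 5,
        "type": 1,
        "chat_history": history,
        "do_splitting": True,
        "add_to_history": True,
        "chunk_size": 2000,
        "chunk_overlap": 1000,
    }
    resp = requests.post(f"{BASE}/neuarlearn/ml/chat", json=payload).json()
    history = resp["chat_history"]   # reuse in the next request to keep context
    print(resp["answer"])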
    

4. Answer Comparison

Compare a student's answer with the correct answer.

  • URL: /neuarlearn/ml/CompareAnswers

  • Method: POST

  • Request Body:

    {
      "student_answer": "string",
      "correct_answer": "string",
      "question": "string"
    }
    • student_answer: The student's answer (required).
    • correct_answer: The correct answer (required).
    • question: The question being answered (required).
  • Response:

    {
      "comparison_result": "string"
    }
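  • Example request: a short illustrative sketch with placeholder values.

    import requests

    payload = {
        "student_answer": "CSS is used to style web pages.",
        "correct_answer": "CSS controls the presentation and styling of HTML content.",
        "question": "What is CSS used for?",
    }
    resp = requests.post("http://127.0.0.1:8080/neuarlearn/ml/CompareAnswers", json=payload)
    print(resp.json()["comparison_result"])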

5. Question Generation

Generate questions based on the provided transcript.

  • URL: /neuarlearn/ml/QuestionGeneration

  • Method: POST

  • Request Body:

    {
      "transcript": "string",
      "types": [1, 2],
      "chunk_size": 2000,
      "chunk_overlap": 1000
    }
    • transcript: The transcript text to generate questions from (required).
    • types: List of question types (required), where:
      • 1: Multiple Choice Questions (MCQ)
      • 2: True/False Questions
      • 3: Open-ended Questions
    • chunk_size: Size of chunks for processing (optional, default: 2000).
    • chunk_overlap: Overlap between chunks (optional, default: 1000).
  • Response:

   {
    "0": {
      "Question": "What is the maximum width for the website?",
      "Answer": "300 pixels",
      "Choices": null
    },
    "1": {
      "Question": "What can you do to make the website responsive?",
      "Answer": "A",
      "Choices": [
        "A. Test the website in different screen sizes to ensure responsiveness.",
        "B. Use appropriate selectors and pseudo-classes for better specificity.",
        "C. Decide whether to use the 12-column or 16-column grid, and adjust your HTML structure accordingly.",
        "D. Apply CSS styles to the respective elements."
      ]
    }
  }
  • You can use the test script at Test/request_test.py. Simply uncomment the desired endpoint, as shown below, to execute the corresponding test:

Test Script Example
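
In the same spirit as the test script, a standalone request to the question generation endpoint could look like the sketch below; the transcript text is an illustrative placeholder.

    import requests

    payload = {
        "transcript": "In this lecture we cover responsive design and CSS grid layouts ...",
        "types": [1, 2],          # 1 = MCQ, 2 = True/False, 3 = open-ended
        "chunk_size": 2000,
        "chunk_overlap": 1000,
    }
    resp = requests.post("http://127.0.0.1:8080/neuarlearn/ml/QuestionGeneration", json=payload)
    for idx, item in resp.json().items():
        print(idx, item["Question"], "->", item["Answer"])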


Modeling Details

Question Generation Model

  • Technique: We used QLoRA (Quantized Low-Rank Adaptation) to fine-tune the LLaMA-3-8B model, which allows adaptation of large language models with reduced computational and memory overhead (a generic configuration sketch follows this list).
  • Dataset: The fine-tuning was conducted using the QG_QuestionsData Dataset, which was created by combining various open-source datasets and applying a standardization schema.
  • Documentation: For detailed metadata and information about similar open-source datasets, refer to the DataSets MetaData.
  • Evaluation: Model performance was evaluated using the ROUGE metric and compared against other open-source LLMs. Evaluation metrics and results are available in the linked documentation.


  • Hyperparameters: Detailed hyperparameter settings are provided in the doc.
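
For readers unfamiliar with QLoRA, the sketch below shows the general shape of a 4-bit LoRA fine-tuning setup with the transformers, peft, and bitsandbytes libraries. The rank, alpha, and target modules are generic illustrations, not the values used for this project (those are in the linked documentation).

    # Generic QLoRA setup sketch -- illustrative hyperparameters, not the project's.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    model_id = "meta-llama/Meta-Llama-3-8B"     # gated checkpoint; requires access

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # quantize base weights to 4-bit
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

    lora_config = LoraConfig(
        r=16,                                   # low-rank adapter dimension (illustrative)
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],    # illustrative target modules
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)  # only the adapter weights are trained
    model.print_trainable_parameters()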

Summarization Model

  • Technique: We fine-tuned the LED-base-book-summary model, which employs the Longformer architecture. Longformer handles long documents efficiently thanks to its sparse attention mechanism (a loading sketch follows this list).
  • Dataset: Fine-tuning was performed on the PubMed Summarization dataset, which provides a range of medical abstracts and summaries useful for fine-tuning summarization models.
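
As a rough illustration, a LED-style summarizer can be loaded and run with the transformers pipeline as shown below. The checkpoint name is an assumption (the commonly published pszemraj/led-base-book-summary weights); the length limits mirror the API parameters above.

    # Illustrative inference with a LED (Longformer Encoder-Decoder) summarizer.
    from transformers import pipeline

    # Assumed checkpoint name for the base model described above.
    summarizer = pipeline("summarization", model="pszemraj/led-base-book-summary")

    long_text = "Full lecture transcript ..."
    result = summarizer(long_text, max_length=100, min_length=50)
    print(result[0]["summary_text"])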

Learned Lessons

  • Challenges in Question Generation:

    • Dataset Diversity: Many question generation datasets suffer from limited diversity. For example, a common issue is the prevalence of naive questions, such as "What is the aim of this paragraph?" This lack of variety can constrain the model's ability to generate nuanced questions.
    • Filtering Unimportant Details: Another challenge is the model's tendency to overlook or misinterpret unimportant details within transcripts, which can affect the relevance of the generated questions.
  • Observations and Improvements:

    • LLM Utilization: Using the LLM itself to filter unnecessary information from transcripts has been shown to improve performance. This empirical approach helps refine the data for more accurate question generation.
    • Synthetic Data and Sophisticated Filtering: Building synthetic data to enhance diversity or developing advanced methods to filter out unnecessary data points can significantly improve the quality of question generation.
