This repository will hold the supervised learning content that I teach at Europeia University in a course on Big Data and Data Analytics.
The program (in Portuguese) can be found here.
The notebooks in this repository are divided into theory and practice.
- Data Collection
  - 1.1. Data Sources
  - 1.2. Data Collection Considerations
- Data Exploration and Preparation
  - 2.1. Data Exploration
  - 2.2. Data Preparation/Cleaning
- Split Data into Training and Test Sets
  - 3.1. Holdout Method
  - 3.2. Cross Validation
  - 3.3. Data Leakage
  - 3.4. Best Practices
- Choose a Supervised Learning Algorithm
  - 4.1. Consider algorithm categories
  - 4.2. Evaluate algorithm characteristics
  - 4.3. Try multiple algorithms
- Train the Model
  - 5.1. Objective Function (Loss/Cost Function)
  - 5.2. Optimization Algorithms
  - 5.3. Overfitting and Underfitting
- Evaluate Model Performance
  - 6.1. Performance Metrics for Regression Models
  - 6.2. Performance Metrics for Classification Models
- Model Tuning and Selection
  - 7.1. Hyperparameter Tuning
  - 7.2. Ensemble Methods
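The ideas in topics 3.1 to 3.3 (holdout, cross validation, data leakage) fit in a small, dependency-free Python sketch. It makes no assumptions about how the course notebooks implement these steps. Library equivalents such as scikit-learn's `train_test_split` and `KFold` exist, but the helper names used here (`holdout_split`, `k_fold_indices`, `fit_standardizer`) are hypothetical. The sketch shows a seeded holdout split, k-fold index generation, and preprocessing statistics learned from training data only, so nothing from the held-out data leaks into the model.

```python
import random
from statistics import mean, pstdev


def holdout_split(rows, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle a copy of the rows, return (train, test)."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_indices(n, k=5):
    """Hypothetical helper: yield (train_idx, val_idx); each index is validated once.

    Folds are contiguous, so the rows should already be shuffled.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def fit_standardizer(values):
    """Hypothetical helper: learn mean/std from the given (training) values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a constant feature

    def transform(x):
        return (x - mu) / sigma

    return transform


data = [float(x) for x in range(10)]
train, test = holdout_split(data, test_size=0.2)

scale = fit_standardizer(train)         # statistics come from train only
test_scaled = [scale(x) for x in test]  # test is transformed, never fitted

for train_idx, val_idx in k_fold_indices(len(train), k=4):
    # Refit preprocessing inside each fold so the validation fold stays unseen.
    fold_scale = fit_standardizer([train[i] for i in train_idx])
    val_scaled = [fold_scale(train[i]) for i in val_idx]
```

Refitting the scaler inside every fold is the point of the loop: if it were fitted once on all of `train`, each validation fold would already have influenced the preprocessing, which is a mild form of data leakage.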
Clicking the links below opens the notebooks in Colab, where you can run them in the cloud.
Classification examples:
Make sure you have conda installed.
- Create a new environment: `conda create -n ml`
- Activate the new environment: `conda activate ml`
- Install Poetry with conda: `conda install poetry`
- Install all packages: `poetry install`
Much can be improved, so feel free to send a PR with suggestions.