Common Machine Learning algorithms implemented from Scratch
- Linear Regression using Normal Equation
- Linear Regression with Gradient Descent
- Logistic Regression with Gradient Descent
- Decision Trees
- Regression Tree
- Logistic Regression using Newton's method
- Linear Regression with Ridge Regularization
- Perceptron
- Autoencoder NN from scratch and using Tensorflow
- Classifier Neural Network (square loss) from scratch and using Tensorflow
- Classifier Neural Network (cross-entropy) from scratch and using Tensorflow
- Gaussian Discriminant Analysis
- Naive Bayesian Classifier
- Expectation Maximization
- AdaBoost
- AdaBoost with Active Learning
- AdaBoost with missing data (on UCI datasets)
- Error Correcting Output Codes
- Gradient Boosted Trees
- Feature Selection
- PCA for Feature Reduction
- Logistic Regression with regularization
- HAAR Image feature extraction
- Support Vector Machine
- Dual Perceptron
Linear Regression with a Mean Squared Error cost function. The weights are trained with the Normal Equation (closed-form solution).
Linear Regression for predicting Housing Price
Linear Regression for Email Spam detection
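A minimal sketch of the closed-form fit, assuming NumPy arrays `X` (features) and `y` (targets); the function names are illustrative:

```python
import numpy as np

def fit_normal_equation(X, y):
    """Closed-form least-squares fit: w = (X^T X)^{-1} X^T y."""
    X = np.column_stack([np.ones(len(X)), X])          # prepend a bias column
    # Solving the normal equations is more stable than forming an explicit inverse
    w, *_ = np.linalg.lstsq(X.T @ X, X.T @ y, rcond=None)
    return w

def predict(X, w):
    return np.column_stack([np.ones(len(X)), X]) @ w
```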
Linear Regression for House Price and Spam Email prediction using Batch Gradient Descent.
Linear Regression with Gradient Descent-Spambase
Cost function
ROC curve
Linear Regression with Gradient Descent-Housing
Cost function
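A sketch of the batch gradient descent loop on the MSE cost described above; the learning rate and epoch count are illustrative, and features are assumed to be standardized:

```python
import numpy as np

def fit_batch_gd(X, y, lr=0.01, epochs=1000):
    """Batch gradient descent on J(w) = (1/2n) * ||Xw - y||^2."""
    X = np.column_stack([np.ones(len(X)), X])        # bias column
    w = np.zeros(X.shape[1])
    costs = []
    for _ in range(epochs):
        residual = X @ w - y
        w -= lr * (X.T @ residual) / len(y)          # step along the negative gradient
        costs.append(0.5 * np.mean(residual ** 2))   # cost per iteration, as plotted above
    return w, costs
```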
Logistic Regression - Spambase dataset
Log Likelihood
ROC curve
Decision Tree to classify data points in the Spambase dataset.
Regression Tree to predict continuous-valued targets on the Housing price dataset.
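The core of the regression tree is choosing the split that most reduces target variance; a minimal sketch of that search (exhaustively trying the observed feature values as thresholds is an assumption about the implementation):

```python
import numpy as np

def best_regression_split(X, y):
    """Return the (feature, threshold) pair minimizing weighted target variance."""
    best_feature, best_threshold, best_score = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = len(left) * left.var() + len(right) * right.var()
            if score < best_score:
                best_feature, best_threshold, best_score = j, t, score
    return best_feature, best_threshold   # leaves predict the mean target on each side
```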
Train logistic regression using Newton's method (each Newton update has a closed-form solution)
Logistic Regression with Newton's method
Log likelihood
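A sketch of the Newton update (iteratively reweighted least squares), assuming a bias column has already been appended to `X` and `y` holds 0/1 labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg_newton(X, y, iters=10):
    """Newton's method: w <- w - H^{-1} g with g = X^T (p - y), H = X^T diag(p(1-p)) X."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)                      # gradient of the negative log-likelihood
        H = X.T @ (X * (p * (1 - p))[:, None])    # Hessian
        w -= np.linalg.solve(H, grad)             # each step solves a linear system in closed form
    return w
```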
Train Linear Regression with Ridge regularization to control the weights
Linear Regression Ridge regularization - Housing
Linear Regression Ridge regularization - Spambase
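A sketch of the ridge closed-form solution; leaving the bias unpenalized is an assumption, not necessarily what this implementation does:

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge fit: w = (X^T X + lam * I)^{-1} X^T y."""
    X = np.column_stack([np.ones(len(X)), X])
    I = np.eye(X.shape[1])
    I[0, 0] = 0.0                                  # do not shrink the bias weight
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)
```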
Single-layer perceptron to classify a 0/1-labelled dataset
Mistakes per Iteration
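A sketch of the perceptron training loop; mapping the 0/1 labels to -1/+1 internally is an assumption made for the update rule:

```python
import numpy as np

def train_perceptron(X, y01, epochs=50):
    """Classic perceptron: on every mistake, add y * x to the weight vector."""
    y = np.where(y01 == 1, 1, -1)                  # map 0/1 labels to -1/+1
    X = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(X.shape[1])
    mistakes_per_iter = []
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:                 # misclassified or on the boundary
                w += yi * xi
                mistakes += 1
        mistakes_per_iter.append(mistakes)         # matches the mistakes-per-iteration plot
        if mistakes == 0:
            break
    return w, mistakes_per_iter
```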
Implemented a multilayer perceptron autoencoder, both from scratch and using Tensorflow
Loss per Epoch
Loss per Epoch
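A sketch of the Tensorflow side, assuming flattened inputs and an illustrative 32-unit bottleneck (the layer sizes and activations are assumptions, not the exact architecture used here):

```python
import tensorflow as tf

def build_autoencoder(input_dim, hidden_dim=32):
    """One-hidden-layer autoencoder trained to reconstruct its own input."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(hidden_dim, activation="sigmoid"),   # encoder / bottleneck
        tf.keras.layers.Dense(input_dim, activation="sigmoid"),    # decoder
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Usage: history = build_autoencoder(784).fit(X, X, epochs=50)
# history.history["loss"] gives the loss-per-epoch curve plotted above.
```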
Implemented a Multilayer perceptron Neural Network for Classification from scratch and using Tensorflow. Uses sigmoid activation along with a square loss.
Classifier with Square Loss from scratch
Loss per Epoch
Classifier with Square Loss using Tensorflow
Loss per Epoch
Implemented a Multilayer perceptron Neural Network for Classification from scratch and using Tensorflow. Uses sigmoid activation with softmax at the output layer along with a cross entropy loss.
Classifier with Cross Entropy Loss from scratch
Loss per Epoch
Classifier with Cross Entropy Loss using Tensorflow
Loss per Epoch
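The key simplification in the from-scratch version is that softmax followed by cross-entropy gives an output-layer error of `probs - one_hot(y)`; a minimal single-hidden-layer training step (layer sizes and learning rate are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(X, Y, W1, b1, W2, b2, lr=0.1):
    """One gradient step: sigmoid hidden layer, softmax output, cross-entropy loss.
    X is (n, d) inputs and Y is (n, k) one-hot labels."""
    # Forward pass
    H = sigmoid(X @ W1 + b1)
    scores = H @ W2 + b2
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    P = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    # Backward pass: softmax + cross-entropy gives dL/dscores = P - Y
    d_scores = (P - Y) / len(X)
    dW2, db2 = H.T @ d_scores, d_scores.sum(axis=0)
    dH = (d_scores @ W2.T) * H * (1 - H)                 # back through the sigmoid
    dW1, db1 = X.T @ dH, dH.sum(axis=0)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return loss                                          # collected per epoch for the plot
```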
Implemented Gaussian Discriminant Analysis (GDA), which learns a distribution per class to form a discriminant function for prediction.
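A minimal GDA sketch with a shared covariance matrix (sharing the covariance is an assumption; per-class covariances work the same way):

```python
import numpy as np

def fit_gda(X, y):
    """Estimate class priors, per-class means, and a shared covariance matrix."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    means = {c: X[y == c].mean(axis=0) for c in classes}
    centered = np.vstack([X[y == c] - means[c] for c in classes])
    cov_inv = np.linalg.inv(centered.T @ centered / len(X))
    return classes, priors, means, cov_inv

def predict_gda(X, classes, priors, means, cov_inv):
    """Pick the class maximizing log prior minus half the Mahalanobis distance."""
    scores = []
    for c in classes:
        d = X - means[c]
        scores.append(np.log(priors[c]) - 0.5 * np.einsum("ij,jk,ik->i", d, cov_inv, d))
    return classes[np.argmax(np.column_stack(scores), axis=1)]
```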
Implemented a Naive Bayes Classifier which uses Bayes' rule with per-feature distributions (Gaussian in the base case) estimated from the data. A minimal sketch of the Gaussian variant follows the variants listed below.
Naive Bayes Classifier (Gaussian)
Naive Bayes Classifier with Bernoulli
Naive Bayes Classifier with features discretized into 9 bins
Naive Bayes Classifier with features discretized into 4 bins
Naive Bayes Classifier on Polluted Dataset
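A minimal sketch of the Gaussian variant mentioned above; the small variance floor is an assumption added for numerical stability:

```python
import numpy as np

def fit_gnb(X, y, eps=1e-6):
    """Per-class prior plus per-feature Gaussian mean and variance."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (np.mean(y == c), Xc.mean(axis=0), Xc.var(axis=0) + eps)
    return params

def predict_gnb(X, params):
    """argmax over classes of log prior plus the sum of per-feature log densities."""
    classes = list(params)
    log_post = np.column_stack([
        np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        for prior, mu, var in (params[c] for c in classes)
    ])
    return np.array(classes)[np.argmax(log_post, axis=1)]
```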
We handle missing data in Naive Bayes by ignoring the missing values in the Bernoulli probability calculations.
Given data drawn from a mixture of Gaussians, the EM algorithm accurately predicts which Gaussian each data point belongs to.
Expectation Maximization (mixture of Gaussian)
Expectation Maximization for Coin Flipping example
Given two biased coins and data points generated by picking one coin at random and flipping it d times, the EM algorithm predicts which coin was used to create each data point.
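A sketch of the two-coin EM loop, assuming `flips` is an (n, d) 0/1 array of outcomes and the coin is picked uniformly at random (so only the two biases need to be estimated):

```python
import numpy as np

def em_two_coins(flips, iters=50):
    """Alternate soft coin assignments (E-step) and bias re-estimation (M-step)."""
    n, d = flips.shape
    heads = flips.sum(axis=1)
    p = np.array([0.4, 0.6])                      # initial bias guesses for the two coins
    for _ in range(iters):
        # E-step: responsibility of each coin for each sequence of d flips
        ll = heads[:, None] * np.log(p) + (d - heads[:, None]) * np.log(1 - p)
        resp = np.exp(ll - ll.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate each coin's bias from its softly assigned flips
        p = (resp * heads[:, None]).sum(axis=0) / (resp.sum(axis=0) * d)
        p = np.clip(p, 1e-6, 1 - 1e-6)            # keep the logs finite
    return p, resp.argmax(axis=1)                 # biases and hard coin assignments
```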
Boosting is a methodology whereby combining multiple weak learners gives a strong model for prediction. I have used a simple 1-split decision tree (decision stump) as the weak learner for this AdaBoost implementation.
AdaBoost with Optimal Thresholding
Optimal thresholding means searching through all decision stumps, i.e. (feature, threshold) combinations, to find the one that gives the maximum improvement in predictions.
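A sketch of AdaBoost with this optimal-stump search, assuming labels in {-1, +1} and thresholds taken from the observed feature values:

```python
import numpy as np

def best_stump(X, y, w):
    """Search all (feature, threshold, polarity) stumps for minimum weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):                     # polarity of the stump
                pred = s * np.where(X[:, j] <= t, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost(X, y, rounds=50):
    """AdaBoost with optimal decision stumps; y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        j, t, s, err = best_stump(X, y, w)
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)            # up-weight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict_ensemble(X, ensemble):
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)
```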
Error at Each Round
Train/Test Error
AUC curve
ROC curve
AdaBoost with Random Thresholding
Random thresholding means picking a decision stump, i.e. a (feature, threshold) combination, at random.
Error at Each Round
Train/Test Error
AUC curve
ROC curve
Implemented AdaBoost with optimal decision stumps on the polluted dataset. Output is stored in ./logs/out_polluted.txt
Active learning is a technique in which we start with some percentage of randomly selected data in the training set and then keep adding the data points with the least error to the training set. I have used AdaBoost with optimal decision stumps to implement active learning, starting with 5, 10, 15, 20, 30, and 50 percent of randomly selected data.
Implemented AdaBoost on popular UCI datasets and handled missing data in both datasets. Performance was tested using a fixed percentage of the data selected at random.
Error Correcting Output Codes (ECOC) provide a way to use binary classifiers on a multi-class dataset via a coding matrix. Each column of the coding matrix represents a subset of labels and has its own model (in our case AdaBoost is the individual learner).
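A minimal ECOC sketch; the random {-1, +1} coding matrix and sklearn's AdaBoostClassifier as the per-column learner are illustrative stand-ins for the actual matrix and boosted stumps used here:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_ecoc(X, y, n_codes=20, seed=0):
    """Train one binary learner per column of a {-1, +1} coding matrix."""
    classes = np.unique(y)
    code = np.random.default_rng(seed).choice([-1, 1], size=(len(classes), n_codes))
    models = []
    for col in range(n_codes):
        y_bin = code[np.searchsorted(classes, y), col]   # relabel with this column's bits
        models.append(AdaBoostClassifier(n_estimators=50).fit(X, y_bin))
    return classes, code, models

def predict_ecoc(X, classes, code, models):
    """Predict the class whose codeword is closest in Hamming distance to the outputs."""
    bits = np.column_stack([m.predict(X) for m in models])
    dists = (bits[:, None, :] != code[None, :, :]).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]
```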
Bagging involves creating small bags of x% of the data picked randomly with replacement. A model is trained on each bag, and predictions are made by taking either the average or the mode of the predictions over all models, depending on the type of labels.
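A sketch of bagging for classification, using sklearn decision trees as the base model; the base learner, bag fraction, and integer-label assumption are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X, y, X_test, n_bags=25, frac=0.6, seed=0):
    """Train one tree per bootstrap bag, then take the mode of their predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_bags):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=True)   # with replacement
        preds.append(DecisionTreeClassifier().fit(X[idx], y[idx]).predict(X_test))
    preds = np.vstack(preds).astype(int)
    # Mode of the per-model predictions (use the mean instead for regression targets)
    return np.array([np.bincount(col).argmax() for col in preds.T])
```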
Select the most important features based on margin analysis.
PCA allows us to reduce the number of features in the dataset by creating new features that are linear combinations of the original ones. We used sklearn's PCA implementation to reduce the number of features in our dataset to 100 and then ran Naive Bayes on these features to obtain good results.
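A sketch of that sklearn pipeline (100 components as stated above; the train/test arrays are assumed to be loaded elsewhere):

```python
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Project onto 100 principal components, then fit Gaussian Naive Bayes on them
model = make_pipeline(PCA(n_components=100), GaussianNB())
model.fit(X_train, y_train)           # X_train, y_train assumed to be loaded already
print(model.score(X_test, y_test))    # accuracy on held-out data
```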
Logistic Regression with Ridge regularization
Logistic Regression with LASSO regularization
We extracted features from the MNIST dataset using the HAAR methodology:
- Create 100 rectangles randomly placed within the image dimensions.
- Extract 2 features per rectangle by superimposing these 100 rectangles on each image.
- We get 200 features per image.
- Feed these features and labels into a multi-class classifier to obtain predictions (in our case AdaBoost with ECOC); a sketch of the feature extraction follows this list.
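A sketch of the extraction above using integral images on 28x28 MNIST digits; taking the two features per rectangle as the top-minus-bottom and left-minus-right half differences is an assumption about the exact feature definition:

```python
import numpy as np

def haar_features(images, n_rects=100, seed=0):
    """images: (n, 28, 28) array -> (n, 2 * n_rects) HAAR-like features."""
    rng = np.random.default_rng(seed)
    h, w = images.shape[1:]
    # Random rectangles (top, left, bottom, right), at least 2x2 pixels each
    rects = []
    for _ in range(n_rects):
        t, l = rng.integers(0, h - 2), rng.integers(0, w - 2)
        b, r = rng.integers(t + 2, h + 1), rng.integers(l + 2, w + 1)
        rects.append((t, l, b, r))
    # Integral images let any rectangle sum be computed in O(1)
    ii = np.pad(images, ((0, 0), (1, 0), (1, 0))).cumsum(axis=1).cumsum(axis=2)
    def rect_sum(t, l, b, r):
        return ii[:, b, r] - ii[:, t, r] - ii[:, b, l] + ii[:, t, l]
    feats = []
    for t, l, b, r in rects:
        mi, mj = (t + b) // 2, (l + r) // 2
        feats.append(rect_sum(t, l, mi, r) - rect_sum(mi, l, b, r))   # top minus bottom half
        feats.append(rect_sum(t, l, b, mj) - rect_sum(t, mj, b, r))   # left minus right half
    return np.column_stack(feats)
```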
SVM using sklearn
SVM with HAAR features
Use the extracted HAAR features from the MNIST dataset in an SVM.
SVM with SMO from scratch
SVM on MNIST with HAAR features
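A sketch of the sklearn side, assuming the HAAR feature matrices from the previous step; the kernel and C are illustrative:

```python
from sklearn.svm import SVC

# Multi-class SVM on the extracted HAAR features (sklearn handles multi-class internally)
svm = SVC(kernel="rbf", C=1.0)
svm.fit(haar_train, y_train)            # HAAR features and labels assumed from the step above
print(svm.score(haar_test, y_test))     # accuracy on held-out MNIST digits
```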
Implemented KNN with different numbers of nearest neighbors, using different kernels such as Gaussian, Cosine, and Polynomial.
Implemented KNN with a probability density estimator
Implemented KNN with a fixed radius measured by Euclidean distance
Implemented KNN with feature selection, independently assessing the quality (weight) of each feature and adjusting the weights with each training instance
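A sketch of kernel-weighted KNN classification with a Gaussian kernel over Euclidean distances; k and the bandwidth are illustrative:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5, bandwidth=1.0):
    """Vote among the k nearest neighbors, weighted by a Gaussian kernel of the distance."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)          # Euclidean distances
        nn = np.argsort(d)[:k]                           # indices of the k nearest points
        weights = np.exp(-d[nn] ** 2 / (2 * bandwidth ** 2))
        votes = {}
        for label, w in zip(y_train[nn], weights):
            votes[label] = votes.get(label, 0.0) + w
        preds.append(max(votes, key=votes.get))
    return np.array(preds)
```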
Implemented the Dual Perceptron with linear and Gaussian kernels.
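A sketch of the dual (kernelized) perceptron, assuming labels in {-1, +1}; the Gaussian kernel width is illustrative:

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.1):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def train_dual_perceptron(X, y, epochs=20, kernel=gaussian_kernel):
    """Keep a mistake count alpha_i per training point instead of an explicit weight vector."""
    K = kernel(X, X)
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i in range(len(X)):
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:   # misclassified under current alphas
                alpha[i] += 1
    return alpha

def predict_dual(X_train, y_train, alpha, X_test, kernel=gaussian_kernel):
    return np.sign(kernel(X_test, X_train) @ (alpha * y_train))
```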