Common Machine Learning algorithms implemented from Scratch
- Linear Regression using Normal Equation
- Linear Regression with Gradient Descent
- Logistic Regression with Gradient Descent
- Decision Trees
- Regression Tree
- Logistic Regression using Newton's method
- Linear Regression with Ridge Regularization
- Perceptron
- Autoencoder NN from scratch and using Tensorflow
- Classifier Neural Network (square loss) from scratch and using Tensorflow
- Classifier Neural Network (cross-entropy) from scratch and using Tensorflow
- Gaussian Discriminant Analysis
- Naive Bayesian Classifier
- Expectation Maximization
- AdaBoost
- AdaBoost with Active Learning
- AdaBoost with missing data (on UCI datasets)
- Error Correcting Output Codes
- Gradient Boosted Trees
- Feature Selection
- PCA for Feature Reduction
- Logistic Regression with regularization
- HAAR Image feature extraction
- Support Vector Machine
- Dual Perceptron
Linear Regression with a Mean Squared Error cost function. The weights are trained with the Normal Equation (closed-form solution).
Linear Regression for predicting Housing Price
Linear Regression for Email Spam detection
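A minimal sketch of the closed-form fit, assuming NumPy arrays `X` (features) and `y` (targets); the function names are illustrative:

```python
import numpy as np

def fit_normal_equation(X, y):
    """Closed-form least-squares fit: w = (X^T X)^{-1} X^T y."""
    X = np.column_stack([np.ones(len(X)), X])          # prepend a bias column
    # Solving the normal equations is more stable than forming an explicit inverse
    w, *_ = np.linalg.lstsq(X.T @ X, X.T @ y, rcond=None)
    return w

def predict(X, w):
    return np.column_stack([np.ones(len(X)), X]) @ w
```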
Linear Regression for House Price and Spam Email prediction using Batch Gradient Descent.
Linear Regression with Gradient Descent-Spambase
Cost function
ROC curve
Linear Regression with Gradient Descent-Housing
Cost function
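A sketch of the batch gradient descent loop on the MSE cost described above; the learning rate and epoch count are illustrative, and features are assumed to be standardized:

```python
import numpy as np

def fit_batch_gd(X, y, lr=0.01, epochs=1000):
    """Batch gradient descent on J(w) = (1/2n) * ||Xw - y||^2."""
    X = np.column_stack([np.ones(len(X)), X])        # bias column
    w = np.zeros(X.shape[1])
    costs = []
    for _ in range(epochs):
        residual = X @ w - y
        w -= lr * (X.T @ residual) / len(y)          # step along the negative gradient
        costs.append(0.5 * np.mean(residual ** 2))   # cost per iteration, as plotted above
    return w, costs
```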
Logistic Regression - Spambase dataset
Log Likelihood
ROC curve
Decision Tree to classify data points in the Spambase dataset.
Regression Tree to predict continuous-valued targets on the Housing price dataset.
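The core of the regression tree is choosing the split that most reduces target variance; a minimal sketch of that search (exhaustively trying the observed feature values as thresholds is an assumption about the implementation):

```python
import numpy as np

def best_regression_split(X, y):
    """Return the (feature, threshold) pair minimizing weighted target variance."""
    best_feature, best_threshold, best_score = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = len(left) * left.var() + len(right) * right.var()
            if score < best_score:
                best_feature, best_threshold, best_score = j, t, score
    return best_feature, best_threshold   # leaves predict the mean target on each side
```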
Train logistic regression using Newton's method (each Newton update has a closed-form solution)
Logistic Regression with Newton's method
Log likelihood
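A sketch of the Newton update (iteratively reweighted least squares), assuming a bias column has already been appended to `X` and `y` holds 0/1 labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg_newton(X, y, iters=10):
    """Newton's method: w <- w - H^{-1} g with g = X^T (p - y), H = X^T diag(p(1-p)) X."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y)                      # gradient of the negative log-likelihood
        H = X.T @ (X * (p * (1 - p))[:, None])    # Hessian
        w -= np.linalg.solve(H, grad)             # each step solves a linear system in closed form
    return w
```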
Train Linear Regression with Ridge regularization to control the weights
Linear Regression Ridge regularization - Housing
Linear Regression Ridge regularization - Spambase
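A sketch of the ridge closed-form solution; leaving the bias unpenalized is an assumption, not necessarily what this implementation does:

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge fit: w = (X^T X + lam * I)^{-1} X^T y."""
    X = np.column_stack([np.ones(len(X)), X])
    I = np.eye(X.shape[1])
    I[0, 0] = 0.0                                  # do not shrink the bias weight
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)
```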
Single-layer perceptron to classify a 0/1-labelled dataset
Mistakes per Iteration
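A sketch of the perceptron training loop; mapping the 0/1 labels to -1/+1 internally is an assumption made for the update rule:

```python
import numpy as np

def train_perceptron(X, y01, epochs=50):
    """Classic perceptron: on every mistake, add y * x to the weight vector."""
    y = np.where(y01 == 1, 1, -1)                  # map 0/1 labels to -1/+1
    X = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(X.shape[1])
    mistakes_per_iter = []
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:                 # misclassified or on the boundary
                w += yi * xi
                mistakes += 1
        mistakes_per_iter.append(mistakes)         # matches the mistakes-per-iteration plot
        if mistakes == 0:
            break
    return w, mistakes_per_iter
```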
Implemented a multilayer perceptron autoencoder, both from scratch and using Tensorflow
Loss per Epoch
Loss per Epoch
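A sketch of the Tensorflow side, assuming flattened inputs and an illustrative 32-unit bottleneck (the layer sizes and activations are assumptions, not the exact architecture used here):

```python
import tensorflow as tf

def build_autoencoder(input_dim, hidden_dim=32):
    """One-hidden-layer autoencoder trained to reconstruct its own input."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(hidden_dim, activation="sigmoid"),   # encoder / bottleneck
        tf.keras.layers.Dense(input_dim, activation="sigmoid"),    # decoder
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Usage: history = build_autoencoder(784).fit(X, X, epochs=50)
# history.history["loss"] gives the loss-per-epoch curve plotted above.
```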
Implemented a Multilayer perceptron Neural Network for Classification from scratch and using Tensorflow. Uses sigmoid activation along with a square loss.
Classifier with Square Loss from scratch
Loss per Epoch
Classifier with Square Loss using Tensorflow
Loss per Epoch
Implemented a Multilayer perceptron Neural Network for Classification from scratch and using Tensorflow. Uses sigmoid activation with softmax at the output layer along with a cross entropy loss.
Classifier with Cross Entropy Loss from scratch
Loss per Epoch
Classifier with Cross Entropy Loss using Tensorflow
Loss per Epoch
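The key simplification in the from-scratch version is that softmax followed by cross-entropy gives an output-layer error of `probs - one_hot(y)`; a minimal single-hidden-layer training step (layer sizes and learning rate are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(X, Y, W1, b1, W2, b2, lr=0.1):
    """One gradient step: sigmoid hidden layer, softmax output, cross-entropy loss.
    X is (n, d) inputs and Y is (n, k) one-hot labels."""
    # Forward pass
    H = sigmoid(X @ W1 + b1)
    scores = H @ W2 + b2
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    P = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    loss = -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))
    # Backward pass: softmax + cross-entropy gives dL/dscores = P - Y
    d_scores = (P - Y) / len(X)
    dW2, db2 = H.T @ d_scores, d_scores.sum(axis=0)
    dH = (d_scores @ W2.T) * H * (1 - H)                 # back through the sigmoid
    dW1, db1 = X.T @ dH, dH.sum(axis=0)
    # Gradient descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return loss                                          # collected per epoch for the plot
```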
Implemented Gaussian Discriminant Analysis (GDA), which learns a distribution per class to form a discriminant function for prediction.
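A minimal GDA sketch with a shared covariance matrix (sharing the covariance is an assumption; per-class covariances work the same way):

```python
import numpy as np

def fit_gda(X, y):
    """Estimate class priors, per-class means, and a shared covariance matrix."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    means = {c: X[y == c].mean(axis=0) for c in classes}
    centered = np.vstack([X[y == c] - means[c] for c in classes])
    cov_inv = np.linalg.inv(centered.T @ centered / len(X))
    return classes, priors, means, cov_inv

def predict_gda(X, classes, priors, means, cov_inv):
    """Pick the class maximizing log prior minus half the Mahalanobis distance."""
    scores = []
    for c in classes:
        d = X - means[c]
        scores.append(np.log(priors[c]) - 0.5 * np.einsum("ij,jk,ik->i", d, cov_inv, d))
    return classes[np.argmax(np.column_stack(scores), axis=1)]
```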
Implemented a Naive Bayes Classifier which uses Bayes' rule with per-feature distributions (Gaussian in the base case) estimated from the data. A minimal sketch of the Gaussian variant follows the variants listed below.
Naive Bayes Classifier (Gaussian)
Naive Bayes Classifier with Bernoulli
Naive Bayes Classifier with features discretized into 9 bins
Naive Bayes Classifier with features discretized into 4 bins
Naive Bayes Classifier on Polluted Dataset
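A minimal sketch of the Gaussian variant mentioned above; the small variance floor is an assumption added for numerical stability:

```python
import numpy as np

def fit_gnb(X, y, eps=1e-6):
    """Per-class prior plus per-feature Gaussian mean and variance."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (np.mean(y == c), Xc.mean(axis=0), Xc.var(axis=0) + eps)
    return params

def predict_gnb(X, params):
    """argmax over classes of log prior plus the sum of per-feature log densities."""
    classes = list(params)
    log_post = np.column_stack([
        np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var, axis=1)
        for prior, mu, var in (params[c] for c in classes)
    ])
    return np.array(classes)[np.argmax(log_post, axis=1)]
```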
We handle missing data in Naive Bayes by ignoring the missing values in the Bernoulli probability calculations.
Given data drawn from a mixture of Gaussians, the EM algorithm accurately predicts which Gaussian each data point belongs to.
Expectation Maximization (mixture of Gaussian)
Expectation Maximization for Coin Flipping example
Given two biased coins and data points generated by picking one coin at random and flipping it d times, the EM algorithm predicts which coin was used to create each data point.
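A sketch of the two-coin EM loop, assuming `flips` is an (n, d) 0/1 array of outcomes and the coin is picked uniformly at random (so only the two biases need to be estimated):

```python
import numpy as np

def em_two_coins(flips, iters=50):
    """Alternate soft coin assignments (E-step) and bias re-estimation (M-step)."""
    n, d = flips.shape
    heads = flips.sum(axis=1)
    p = np.array([0.4, 0.6])                      # initial bias guesses for the two coins
    for _ in range(iters):
        # E-step: responsibility of each coin for each sequence of d flips
        ll = heads[:, None] * np.log(p) + (d - heads[:, None]) * np.log(1 - p)
        resp = np.exp(ll - ll.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate each coin's bias from its softly assigned flips
        p = (resp * heads[:, None]).sum(axis=0) / (resp.sum(axis=0) * d)
        p = np.clip(p, 1e-6, 1 - 1e-6)            # keep the logs finite
    return p, resp.argmax(axis=1)                 # biases and hard coin assignments
```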
Boosting is a methodology whereby combining multiple weak learners gives a strong model for prediction. I have used a simple 1-split decision tree (decision stump) as the weak learner for this AdaBoost implementation.
AdaBoost with Optimal Thresholding
Optimal thresholding means searching through all decision stumps, i.e. (feature, threshold) combinations, to find the one that gives the maximum improvement in predictions.
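A sketch of AdaBoost with this optimal-stump search, assuming labels in {-1, +1} and thresholds taken from the observed feature values:

```python
import numpy as np

def best_stump(X, y, w):
    """Search all (feature, threshold, polarity) stumps for minimum weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):                     # polarity of the stump
                pred = s * np.where(X[:, j] <= t, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, s, err)
    return best

def adaboost(X, y, rounds=50):
    """AdaBoost with optimal decision stumps; y must be in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        j, t, s, err = best_stump(X, y, w)
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        pred = s * np.where(X[:, j] <= t, 1, -1)
        w *= np.exp(-alpha * y * pred)            # up-weight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict_ensemble(X, ensemble):
    score = sum(a * s * np.where(X[:, j] <= t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)
```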
Error at Each Round
Train/Test Error
AUC curve
ROC curve
AdaBoost with Random Thresholding
Random thresholding means picking a decision stump, i.e. a (feature, threshold) combination, at random.
Error at Each Round
Train/Test Error
AUC curve
ROC curve
Implemented AdaBoost with optimal decision stumps on the polluted dataset. Output is stored in ./logs/out_polluted.txt
Active learning is a technique in which we start with some percentage of randomly selected data in the training set and then keep adding the data points with the least error to the training set. I have used AdaBoost with optimal decision stumps to implement active learning, starting with 5, 10, 15, 20, 30, and 50 percent of randomly selected data.
Implemented AdaBoost on popular UCI datasets and handled missing data in both datasets. Performance was tested using a fixed percentage of the data selected at random.
Error Correcting Output Codes (ECOC) provide a way to use binary classifiers on a multi-class dataset via a coding matrix. Each column of the coding matrix represents a subset of labels and has its own model (in our case AdaBoost is the individual learner).
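A minimal ECOC sketch; the random {-1, +1} coding matrix and sklearn's AdaBoostClassifier as the per-column learner are illustrative stand-ins for the actual matrix and boosted stumps used here:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def train_ecoc(X, y, n_codes=20, seed=0):
    """Train one binary learner per column of a {-1, +1} coding matrix."""
    classes = np.unique(y)
    code = np.random.default_rng(seed).choice([-1, 1], size=(len(classes), n_codes))
    models = []
    for col in range(n_codes):
        y_bin = code[np.searchsorted(classes, y), col]   # relabel with this column's bits
        models.append(AdaBoostClassifier(n_estimators=50).fit(X, y_bin))
    return classes, code, models

def predict_ecoc(X, classes, code, models):
    """Predict the class whose codeword is closest in Hamming distance to the outputs."""
    bits = np.column_stack([m.predict(X) for m in models])
    dists = (bits[:, None, :] != code[None, :, :]).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]
```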
Bagging involves creating small bags of x% of the data picked randomly with replacement. A model is trained on each bag, and predictions are made by taking either the average or the mode of the predictions over all models, depending on the type of labels.
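A sketch of bagging for classification, using sklearn decision trees as the base model; the base learner, bag fraction, and integer-label assumption are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X, y, X_test, n_bags=25, frac=0.6, seed=0):
    """Train one tree per bootstrap bag, then take the mode of their predictions."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_bags):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=True)   # with replacement
        preds.append(DecisionTreeClassifier().fit(X[idx], y[idx]).predict(X_test))
    preds = np.vstack(preds).astype(int)
    # Mode of the per-model predictions (use the mean instead for regression targets)
    return np.array([np.bincount(col).argmax() for col in preds.T])
```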
Select the most important features based on margin analysis.
PCA allows us to reduce the number of features in the dataset by creating new features that are linear combinations of the original ones. We used sklearn's PCA implementation to reduce the number of features in our dataset to 100 and then ran Naive Bayes on these features to obtain good results.
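A sketch of that sklearn pipeline (100 components as stated above; the train/test arrays are assumed to be loaded elsewhere):

```python
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Project onto 100 principal components, then fit Gaussian Naive Bayes on them
model = make_pipeline(PCA(n_components=100), GaussianNB())
model.fit(X_train, y_train)           # X_train, y_train assumed to be loaded already
print(model.score(X_test, y_test))    # accuracy on held-out data
```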
Logistic Regression with Ridge regularization
Logistic Regression with LASSO regularization
We extracted features from the MNIST dataset using the HAAR methodology:
- Create 100 rectangles randomly placed within the image dimensions.
- Extract 2 features per rectangle by superimposing these 100 rectangles on each image.
- We get 200 features per image.
- Feed these features and labels into a multi-class classifier to obtain predictions (in our case AdaBoost with ECOC); a sketch of the feature extraction follows this list.
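A sketch of the extraction above using integral images on 28x28 MNIST digits; taking the two features per rectangle as the top-minus-bottom and left-minus-right half differences is an assumption about the exact feature definition:

```python
import numpy as np

def haar_features(images, n_rects=100, seed=0):
    """images: (n, 28, 28) array -> (n, 2 * n_rects) HAAR-like features."""
    rng = np.random.default_rng(seed)
    h, w = images.shape[1:]
    # Random rectangles (top, left, bottom, right), at least 2x2 pixels each
    rects = []
    for _ in range(n_rects):
        t, l = rng.integers(0, h - 2), rng.integers(0, w - 2)
        b, r = rng.integers(t + 2, h + 1), rng.integers(l + 2, w + 1)
        rects.append((t, l, b, r))
    # Integral images let any rectangle sum be computed in O(1)
    ii = np.pad(images, ((0, 0), (1, 0), (1, 0))).cumsum(axis=1).cumsum(axis=2)
    def rect_sum(t, l, b, r):
        return ii[:, b, r] - ii[:, t, r] - ii[:, b, l] + ii[:, t, l]
    feats = []
    for t, l, b, r in rects:
        mi, mj = (t + b) // 2, (l + r) // 2
        feats.append(rect_sum(t, l, mi, r) - rect_sum(mi, l, b, r))   # top minus bottom half
        feats.append(rect_sum(t, l, b, mj) - rect_sum(t, mj, b, r))   # left minus right half
    return np.column_stack(feats)
```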
SVM using sklearn
SVM with HAAR features
Use the extracted HAAR features from the MNIST dataset in an SVM.
SVM with SMO from scratch
SVM on MNIST with HAAR features
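A sketch of the sklearn side, assuming the HAAR feature matrices from the previous step; the kernel and C are illustrative:

```python
from sklearn.svm import SVC

# Multi-class SVM on the extracted HAAR features (sklearn handles multi-class internally)
svm = SVC(kernel="rbf", C=1.0)
svm.fit(haar_train, y_train)            # HAAR features and labels assumed from the step above
print(svm.score(haar_test, y_test))     # accuracy on held-out MNIST digits
```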
Implemented KNN with different numbers of nearest neighbors, using different kernels such as Gaussian, Cosine, and Polynomial.
Implemented KNN with a probability density estimator
Implemented KNN with a fixed radius measured by Euclidean distance
Implemented KNN with feature selection, independently assessing the quality (weight) of each feature and adjusting the weights with each training instance
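A sketch of kernel-weighted KNN classification with a Gaussian kernel over Euclidean distances; k and the bandwidth are illustrative:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5, bandwidth=1.0):
    """Vote among the k nearest neighbors, weighted by a Gaussian kernel of the distance."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)          # Euclidean distances
        nn = np.argsort(d)[:k]                           # indices of the k nearest points
        weights = np.exp(-d[nn] ** 2 / (2 * bandwidth ** 2))
        votes = {}
        for label, w in zip(y_train[nn], weights):
            votes[label] = votes.get(label, 0.0) + w
        preds.append(max(votes, key=votes.get))
    return np.array(preds)
```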
Implemented the Dual Perceptron with linear and Gaussian kernels.
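A sketch of the dual (kernelized) perceptron, assuming labels in {-1, +1}; the Gaussian kernel width is illustrative:

```python
import numpy as np

def gaussian_kernel(A, B, gamma=0.1):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def train_dual_perceptron(X, y, epochs=20, kernel=gaussian_kernel):
    """Keep a mistake count alpha_i per training point instead of an explicit weight vector."""
    K = kernel(X, X)
    alpha = np.zeros(len(X))
    for _ in range(epochs):
        for i in range(len(X)):
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:   # misclassified under current alphas
                alpha[i] += 1
    return alpha

def predict_dual(X_train, y_train, alpha, X_test, kernel=gaussian_kernel):
    return np.sign(kernel(X_test, X_train) @ (alpha * y_train))
```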