In this study, the OJ dataset from the ISLR package is analyzed. The dataset contains 1070 purchase records for the Citrus Hill (CH) and Minute Maid (MM) brands of orange juice. After a basic exploratory data analysis, support-vector-based classification is used to predict which brand of orange juice the customer purchased. Because the two classes are unbalanced, stratified sampling with four-fold cross-validation is used. Three classifiers are compared: a support vector classifier (SVC), a support vector machine (SVM) with a radial kernel, and an SVM with a second-order polynomial kernel. Their performance is compared using the test error rate, the ROC curve, and the area under the ROC curve.
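The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code in GitHub_Proj11.R: it assumes the `e1071` package for the SVM fits, uses a hypothetical helper `stratified_folds()` written here for the example, leaves `cost` and the kernel parameters at their `e1071` defaults, and reports only the cross-validated error rate (no ROC curves).

```r
# Assumes the ISLR and e1071 packages are installed.
library(ISLR)   # provides the OJ data frame
library(e1071)  # provides svm()

# Hypothetical helper: assign fold labels 1..k separately within each class,
# so every fold keeps roughly the same CH/MM proportions as the full data.
stratified_folds <- function(y, k) {
  folds <- integer(length(y))
  for (cls in levels(y)) {
    idx <- which(y == cls)
    folds[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
  }
  folds
}

set.seed(1)  # arbitrary seed, for reproducibility only
k <- 4
folds <- stratified_folds(OJ$Purchase, k)

# Fit one of the three classifiers; hyperparameters are e1071 defaults
# (an assumption, since the tuning used in the report is not stated here).
fit_model <- function(name, train) {
  switch(name,
    svc        = svm(Purchase ~ ., data = train, kernel = "linear"),
    svm_radial = svm(Purchase ~ ., data = train, kernel = "radial"),
    svm_poly2  = svm(Purchase ~ ., data = train,
                     kernel = "polynomial", degree = 2)
  )
}

# Mean held-out misclassification rate over the k folds for one classifier.
cv_error <- function(name) {
  errs <- sapply(seq_len(k), function(i) {
    train <- OJ[folds != i, ]
    test  <- OJ[folds == i, ]
    fit   <- fit_model(name, train)
    mean(predict(fit, test) != test$Purchase)
  })
  mean(errs)
}

cv_errors <- sapply(c("svc", "svm_radial", "svm_poly2"), cv_error)
print(cv_errors)
```

Stratifying within each class keeps the per-fold counts of CH and MM within one observation of each other, so the error estimates are not distorted by a fold that happens to contain few purchases of the minority brand.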
GitHub_Proj11.pdf: Project report in PDF
GitHub_Proj11.R: R script
You can view the Project report in HTML by clicking here.