This was a data science contest hosted by Analytics Vidhya. Sales data for various products across different stores in different cities was provided as train and test datasets. I worked on this project during the Analytics Vidhya DataHack Hour. The project covered the fundamentals: data cleaning, preprocessing and visualization, outlier and missing value analysis, feature engineering, and fitting several models and tuning their hyperparameters to predict sales as accurately as possible.
Steps involved in this project:
- Imported the required libraries and the datasets.
- Performed Exploratory Data Analysis (EDA) on both numerical as well as categorical variables.
- Analyzed the distribution of different variables using univariate analysis.
- Analyzed the correlation of different variables to the target variable through bivariate analysis.
- Analyzed outliers and missing values, then treated both.
- Cleaned the data.
- Performed feature engineering.
- Converted all categorical variables into numeric variables.
- Finished with model building.
- Implemented several models: linear regression, ridge regression, decision trees and random forests.
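
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the data is synthetic, the column names (`price`, `store_size`, `city_tier`, `sales`) are hypothetical, and median imputation, IQR capping, one-hot encoding and RMSE are assumed choices rather than the ones actually used in the contest.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the contest data; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "price": rng.normal(100, 20, n),
    "store_size": rng.choice(["small", "medium", "large"], n),
    "city_tier": rng.choice(["tier1", "tier2"], n),
})
df["sales"] = 5000 - 20 * df["price"] + rng.normal(0, 100, n)
# Inject some missing values so the imputation step has something to do.
df.loc[df.sample(frac=0.1, random_state=0).index, "price"] = np.nan


def preprocess(frame, target="sales"):
    """Impute missing values, cap outliers and encode categoricals."""
    out = frame.copy()
    num_cols = out.select_dtypes(include="number").columns.drop(target)
    for col in num_cols:
        # Assumed choice: fill numeric gaps with the column median.
        out[col] = out[col].fillna(out[col].median())
        # Assumed choice: cap values outside 1.5 * IQR of the quartiles.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    for col in out.select_dtypes(exclude="number").columns:
        # Assumed choice: fill categorical gaps with the most frequent value.
        out[col] = out[col].fillna(out[col].mode()[0])
    # Convert categorical variables into numeric indicator columns.
    return pd.get_dummies(out, drop_first=True, dtype=float)


data = preprocess(df)
X, y = data.drop(columns="sales"), data["sales"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The four model families listed above, with mostly default settings.
models = {
    "linear_regression": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "decision_tree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    # RMSE on the held-out split (assumed metric).
    scores[name] = float(np.sqrt(mean_squared_error(y_val, pred)))

for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: validation RMSE = {rmse:.2f}")
```

Comparing all models on the same held-out split keeps the ranking fair. On real data, the hyperparameters set here by hand (`alpha`, `max_depth`, `n_estimators`) are the natural candidates for tuning.
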
A humble request to anyone who came here looking for code for this project: please don't copy the whole code and content. You have already spent time learning this subject, so try to write the code in your own way. It will strengthen your skills further. If you get stuck somewhere, this repository is always here to help you.