The main goal of this project is to predict employee attrition in a given company, i.e. the probability that an employee with certain characteristics will stay in or quit their current job at the firm.
The exploratory data analysis and predictive modelling are based on a small HR dataset containing the characteristics of several employees (demographics and more) together with their history in the company, including whether each employee is still with the company or not. This is therefore a binary classification problem, where the objective is to predict future attrition given the current status of employees in the company.
An extensive analysis was carried out on all variables in the dataset. It can be divided into the following steps:
- Exploratory Data Analysis (both in the Python/Jupyter notebook and in PowerBI)
- Outlier Analysis and Data Transformation
- Data Modelling (Clustering): unsupervised learning, exploring different types of segmentation
- Predictive Modelling using Feature Engineering and several models, each with several parameter settings
- Building new models to predict Attrition 1 year in advance (i.e. removing all employees who have been with the company for 1 year or less)
- Model evaluation, analysis and selection for each scenario.
The Python notebook and its outputs can be found in "main.ipynb", alongside the presentation files and the PowerBI file.
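
As a rough illustration of the "1 year in advance" scenario listed above, the sketch below drops employees with 1 year or less in the company and then fits a small logistic regression to estimate an attrition probability. It is not the code from "main.ipynb": the records, the field names (`years_at_company`, `overtime`, `left`) and the choice of logistic regression are all assumptions made for the example. It is written in plain Python so that it runs without dependencies; a library such as scikit-learn would be the more usual choice in practice.

```python
import math

# Hypothetical toy records: field names and values are illustrative only,
# they are NOT taken from the project's HR dataset.
employees = [
    {"years_at_company": 0.5, "overtime": 1, "left": 1},
    {"years_at_company": 1.0, "overtime": 0, "left": 0},
    {"years_at_company": 2.0, "overtime": 1, "left": 1},
    {"years_at_company": 2.0, "overtime": 0, "left": 1},
    {"years_at_company": 3.0, "overtime": 1, "left": 1},
    {"years_at_company": 3.0, "overtime": 0, "left": 0},
    {"years_at_company": 4.0, "overtime": 1, "left": 0},
    {"years_at_company": 5.0, "overtime": 0, "left": 0},
    {"years_at_company": 6.0, "overtime": 0, "left": 0},
    {"years_at_company": 8.0, "overtime": 1, "left": 1},
]


def filter_by_tenure(records, min_years=1.0):
    """Keep only employees with strictly more than `min_years` in the company."""
    return [r for r in records if r["years_at_company"] > min_years]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability that an employee with these features leaves."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit logistic regression with plain batch gradient descent on the log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


# Step 1: remove employees with 1 year or less in the company.
tenured = filter_by_tenure(employees)

# Step 2: build the feature matrix and the binary target (1 = left, 0 = stayed).
X = [[r["years_at_company"], r["overtime"]] for r in tenured]
y = [r["left"] for r in tenured]

# Step 3: fit the model and score a hypothetical employee.
weights, bias = fit_logistic(X, y)
p_leave = predict_proba(weights, bias, [2.0, 1])
print(f"Employees kept after tenure filter: {len(tenured)}")
print(f"Estimated attrition probability: {p_leave:.2f}")
```

The tenure filter is applied before any fitting so that the model only ever sees employees who have passed the 1-year mark, which is what the "1 year in advance" scenario requires.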