Geely Auto, a Chinese automobile company, aspires to enter the US market by setting up a manufacturing unit there and producing cars locally to compete with its US and European counterparts.
They have contracted an automobile consulting company to understand the factors on which the pricing of cars depends. Specifically, they want to understand the factors affecting the pricing of cars in the American market, since those may be very different from the Chinese market. The company wants to know:
- Which variables are significant in predicting the price of a car?
- How well do those variables describe the price of a car?
Based on various market surveys, the consulting firm has gathered a large dataset of different types of cars across the American market.
We are required to model the price of cars using the available independent variables. Management will use the model to understand how exactly prices vary with those variables, so that they can adjust the design of the cars, the business strategy, etc. to meet certain price levels. The model will also help management understand the pricing dynamics of a new market.
The following steps will be followed to reach our goal:
- Importing libraries
- Reading the concerned dataset
- Data understanding
- Data handling
- Data visualization
- Data preparation
- Splitting the data and feature scaling
- Building a linear regression model
- Residual analysis of the train data
- Making predictions using the final model
- Model evaluation
- Conclusion
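As a rough illustration of the splitting, scaling, model-building, residual-analysis, and evaluation steps above, here is a minimal sketch in plain Python. It assumes a single predictor and uses synthetic (engine size, price) pairs rather than the real dataset. The helper names (`split_rows`, `fit_simple_ols`, `r_squared`) and all numbers are illustrative, and the actual project would fit a multiple regression over many variables.

```python
import random


def split_rows(rows, test_fraction=0.3, seed=42):
    """Shuffle the rows reproducibly and return (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def min_max_scale(values, lo, hi):
    """Scale values with bounds learned on the training data only."""
    return [(v - lo) / (hi - lo) for v in values]


def fit_simple_ols(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by y_pred."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Synthetic (engine size, price) pairs: an assumption, NOT the real car data.
noise = random.Random(0)
rows = [(size, 5000 + 80 * size + noise.gauss(0, 500)) for size in range(60, 260, 5)]

train, test = split_rows(rows)

# Fit the scaler on the training rows, then reuse its bounds for the test rows.
lo = min(size for size, _ in train)
hi = max(size for size, _ in train)
x_train = min_max_scale([size for size, _ in train], lo, hi)
y_train = [price for _, price in train]

slope, intercept = fit_simple_ols(x_train, y_train)

# Residual analysis on the training data: OLS residuals should centre on zero.
residuals = [y - (slope * x + intercept) for x, y in zip(x_train, y_train)]

# Predictions and evaluation on the held-out rows.
x_test = min_max_scale([size for size, _ in test], lo, hi)
y_test = [price for _, price in test]
y_pred = [slope * x + intercept for x in x_test]
test_r2 = r_squared(y_test, y_pred)
```

Fitting the scaler on the training rows only keeps information from the test rows out of the model, which is why the split comes before the scaling step.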
For this project, I used Kaggle’s Red Wine Quality dataset to build various classification models to predict whether a particular red wine is “good quality” or not. Each wine in this dataset is given a “quality” score between 0 and 10. For the purpose of this project, I converted the output to a binary output where each wine is either “good quality” (a score of 7 or higher) or not (a score below 7). The quality of a wine is determined by 11 input variables:
- Fixed acidity
- Volatile acidity
- Citric acid
- Residual sugar
- Chlorides
- Free sulfur dioxide
- Total sulfur dioxide
- Density
- pH
- Sulfates
- Alcohol
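The conversion to a binary output described above can be sketched as follows. The threshold of 7 comes from the text; the function name `is_good_quality` and the sample scores are made up for illustration.

```python
def is_good_quality(score, threshold=7):
    """Return 1 for a "good quality" wine (score >= threshold), else 0."""
    return 1 if score >= threshold else 0


# Illustrative quality scores, not taken from the real dataset.
sample_scores = [3, 5, 6, 7, 8]
labels = [is_good_quality(score) for score in sample_scores]

# Share of good wines in the sample, useful for spotting class imbalance.
good_share = sum(labels) / len(labels)
```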
The objectives of this project are as follows:
- To experiment with different classification methods to see which yields the highest accuracy
- To determine which features are the most indicative of a good quality wine
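Comparing methods by accuracy comes down to scoring each candidate on the same held-out samples and keeping the best one. The sketch below shows only that comparison loop. The three "models" are hypothetical hand-written rules standing in for fitted classifiers, and the sample rows and thresholds are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical held-out rows: (alcohol, volatile acidity) with a good-quality label.
X_test = [(12.5, 0.30), (9.4, 0.70), (11.8, 0.35), (9.8, 0.60), (10.2, 0.40), (13.0, 0.28)]
y_test = [1, 0, 1, 0, 0, 1]

# Stand-ins for trained classifiers; each maps one row to a 0/1 prediction.
candidates = {
    "always_bad": lambda row: 0,
    "alcohol_rule": lambda row: 1 if row[0] >= 11.0 else 0,
    "acidity_rule": lambda row: 1 if row[1] <= 0.40 else 0,
}

# Score every candidate on the same held-out rows.
scores = {
    name: accuracy(y_test, [predict(row) for row in X_test])
    for name, predict in candidates.items()
}
best_name = max(scores, key=scores.get)
```

Including a trivial baseline such as `always_bad` is a useful sanity check here: because most wines fall below the quality threshold, a classifier has to beat the majority-class accuracy to be worth keeping.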
The project follows these steps:
- Importing libraries
- Loading data
- Understanding data
- Missing values
- Exploring variables (data analysis)
- Feature selection
- The proportion of good vs bad wines
- Preparing data for modelling
- Applying different models
- Choosing the right model
Hurray, you just completed the task!
The aim is to build a predictive model that estimates the sales of each product at a particular store, so that BigMart can analyze and predict product sales per outlet.
This is a good project for learning data analytics and applying machine learning algorithms (Linear Regression, Random Forest Regressor, XGBoost) to predict product sales per outlet.
BigMart has collected sales data from the year 2013 for 1559 products across 10 stores in different cities. The dataset consists of 12 attributes: Item Fat, Item Type, Item MRP, Outlet Type, Item Visibility, Item Weight, Outlet Identifier, Outlet Size, Outlet Establishment Year, Outlet Location Type, Item Identifier, and Item Outlet Sales. Of these, Item Outlet Sales is the response variable and the remaining attributes are used as predictor variables.
The dataset is also based on store-level and product-level hypotheses. Store-level hypotheses involve attributes such as city, population density, store capacity, and location; product-level hypotheses involve attributes such as brand, advertisement, and promotional offer.
The data has 8523 rows of 12 variables.
- Item_Identifier: Unique product ID
- Item_Weight: Weight of the product
- Item_Fat_Content: Whether the product is low fat or not
- Item_Visibility: The % of the total display area of all products in a store allocated to the particular product
- Item_Type: The category to which the product belongs
- Item_MRP: Maximum Retail Price (list price) of the product
- Outlet_Identifier: Unique store ID
- Outlet_Establishment_Year: The year in which the store was established
- Outlet_Size: The size of the store in terms of ground area covered
- Outlet_Location_Type: The type of city in which the store is located
- Outlet_Type: Whether the outlet is just a grocery store or some sort of supermarket
- Item_Outlet_Sales: Sales of the product in the particular store. This is the outcome variable to be predicted.
Install the required packages with pip: pip install pandas, pip install numpy, pip install matplotlib, pip install seaborn, pip install scikit-learn (the package that is imported as sklearn), pip install joblib, pip install flask
We will handle this problem in a structured way:
- Loading Packages and Data
- Data Structure and Content
- Exploratory Data Analysis
- Missing Value Treatment
- Feature Engineering
- Encoding Categorical Variables
- Label Encoding
- Preprocessing Data
- Modeling
- Deployment
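To make the missing value treatment and label encoding steps concrete, here is a small sketch in plain Python. It assumes, purely for illustration, that `Item_Weight` and `Outlet_Size` contain gaps, and it fills them with the mean and the most frequent value respectively. That is one common choice and not necessarily the treatment used in this project. The sample rows and the helper names `impute` and `label_encode` are hypothetical; only the column names come from the dataset description.

```python
from statistics import mean, mode

# Hypothetical rows; None marks a missing value.
rows = [
    {"Item_Weight": 9.3, "Outlet_Size": "Medium"},
    {"Item_Weight": None, "Outlet_Size": "Small"},
    {"Item_Weight": 17.5, "Outlet_Size": None},
    {"Item_Weight": 8.9, "Outlet_Size": "Medium"},
]


def impute(rows, column, fill_value):
    """Return copies of the rows with missing entries in `column` filled."""
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


def label_encode(values):
    """Map each distinct label to an integer code, in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Numeric column: fill gaps with the mean of the observed values.
observed_weights = [r["Item_Weight"] for r in rows if r["Item_Weight"] is not None]
filled = impute(rows, "Item_Weight", mean(observed_weights))

# Categorical column: fill gaps with the most frequent observed value.
observed_sizes = [r["Outlet_Size"] for r in rows if r["Outlet_Size"] is not None]
filled = impute(filled, "Outlet_Size", mode(observed_sizes))

# Label encoding turns the categorical column into integer codes.
size_codes, size_mapping = label_encode([r["Outlet_Size"] for r in filled])
```

Label encoding imposes an order on the categories, which suits ordinal columns such as store size but can mislead linear models when the categories have no natural order.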