King County is a county in the U.S. state of Washington. Its population was 2,149,970 according to a 2016 census estimate. It is the most populous county in Washington and the 13th-most populous in the United States. The county seat is Seattle, which is also the state’s largest city. King County is one of three Washington counties included in the Seattle-Tacoma-Bellevue metropolitan statistical area. About two-thirds of the county’s population lives in Seattle’s suburbs. As of 2011, King County was the 86th highest-income county in the United States. This document examines the factors that influence the sale prices of houses sold in King County between May 2014 and May 2015.
For this project, I use a dataset from Kaggle, ‘kc_house_data.csv’ (https://www.kaggle.com/harlfoxem/housesalesprediction). It contains sale prices for homes sold in King County between May 2014 and May 2015. The dataset combines several categorical independent variables with a continuous dependent variable (price), which makes it useful for evaluating simple regression models. The goal is to predict the sale price of houses in King County.
I performed data cleaning, data modelling, stepwise variable selection (forward and backward), a skewness check, a correlation analysis to see which variables are positively or negatively correlated with price, and k-fold cross-validation on the gradient boosting model.
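As a rough illustration of the skewness and correlation checks, here is a minimal Python sketch using pandas. It does not read the Kaggle file: the data is synthetic, and the column names (`price`, `sqft_living`, `house_age`) and the way the values are generated are assumptions made only for this example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data. The column names and the
# generating process are assumptions for illustration, not the real file.
rng = np.random.default_rng(0)
n = 500
sqft_living = np.clip(rng.normal(2000, 500, n), 400, None)
house_age = rng.uniform(0, 100, n)
noise = rng.normal(0, 0.4, n)

# Price is log-linear in the features, so the raw price is right-skewed.
price = np.exp(11 + 0.0006 * sqft_living - 0.01 * house_age + noise)

df = pd.DataFrame(
    {"price": price, "sqft_living": sqft_living, "house_age": house_age}
)

# Skewness of the target before and after a log transform.
skew_raw = df["price"].skew()
skew_log = np.log(df["price"]).skew()

# Pearson correlation of each predictor with price; the sign gives the
# direction of the association.
corr_with_price = df.corr()["price"].drop("price").sort_values(ascending=False)

print(f"skewness of price: {skew_raw:.2f}, of log(price): {skew_log:.2f}")
print(corr_with_price)
```

A strongly positive skewness for the raw price, together with a value near zero after the log transform, would suggest modelling log(price) instead of price. Whether that holds for the real data has to be checked on the actual file.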
Several models were tried in turn to compare their accuracy:
1. Linear regression
2. Decision tree
3. Random forest
4. Gradient boosting
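The comparison of these four models could be sketched as follows, assuming scikit-learn. This is not the project's actual code: the data comes from `make_regression` rather than from the Kaggle file, the hyperparameters are arbitrary, and the use of 5-fold cross-validation with R² as the score is an assumption for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the housing features and price.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# The four model families listed above, with illustrative settings.
models = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient boosting": GradientBoostingRegressor(random_state=0),
}

# Every model is scored on the same shuffled 5-fold split.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

Reusing one `KFold` object for all models keeps the folds identical, so differences in the scores reflect the models and not the split. On this synthetic, purely linear data the linear model has an advantage that it may not have on the real housing data.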