The Ames Housing dataset was compiled by Dean De Cock for use in data science education. It's a strong alternative for data scientists looking for a modernized and expanded version of the often-cited Boston Housing dataset.
The broad aims of this machine learning project are to understand the dataset and to build an ML pipeline that can predict the SalePrice of houses using various techniques.
The dataset and documentation can be found at these links: 1. Kaggle 2. Feature documentation by Dean De Cock
Some insights that I've gleaned from studying the instructor's notes in 'Ames, Iowa: Alternative to the Boston Housing Data as an End of Semester Regression Project' by Dean De Cock, Truman State University:
- The dataset contains 2930 observations and 80 explanatory variables (23 nominal, 23 ordinal, 14 discrete, and 20 continuous) involved in assessing home values.
- The 20 continuous variables relate to various area dimensions for each observation.
- The 14 discrete variables typically quantify the number of items occurring within the house. Most focus on the number of kitchens, bedrooms, and bathrooms.
- There are a large number of categorical variables (23 nominal, 23 ordinal) associated with this data set. They range from 2 to 28 classes, with the smallest being STREET (gravel or paved) and the largest being NEIGHBORHOOD (areas within the Ames city limits). The nominal variables typically identify various types of dwellings, garages, materials, and environmental conditions, while the ordinal variables typically rate various items within the property.
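The nominal/ordinal split matters once the categories have to be turned into numbers for a model. Here is a minimal sketch in plain Python of one common way to treat the two kinds differently. The level codes (`Po` to `Ex` for a quality rating, `Grvl`/`Pave` for the street surface) are assumptions for illustration and should be checked against the feature documentation. The helpers `encode_ordinal` and `encode_nominal` are hypothetical names, not part of any library.

```python
# Illustrative level codes; verify them against the feature documentation.
QUALITY_LEVELS = ["Po", "Fa", "TA", "Gd", "Ex"]  # worst to best (ordinal)
STREET_LEVELS = ["Grvl", "Pave"]                  # no natural order (nominal)


def encode_ordinal(value, levels):
    """Map an ordinal category to its rank so that the ordering is preserved."""
    return levels.index(value)


def encode_nominal(value, levels):
    """One-hot encode a nominal category: one 0/1 indicator per class."""
    if value not in levels:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == level else 0 for level in levels]


# Two made-up rows, not taken from the dataset.
rows = [
    {"quality": "Gd", "street": "Pave"},
    {"quality": "Fa", "street": "Grvl"},
]

# Each encoded row is [quality rank, street is Grvl, street is Pave].
encoded = [
    [encode_ordinal(r["quality"], QUALITY_LEVELS)]
    + encode_nominal(r["street"], STREET_LEVELS)
    for r in rows
]
print(encoded)
```

Under this scheme a 28-class variable such as NEIGHBORHOOD would expand into 28 indicator columns, so the high-cardinality nominal variables are worth a closer look before encoding.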
=====================
Ames dataset: House Price prediction for Kaggle competition (advanced regression, supervised ML)
https://www.kaggle.com/competitions/house-prices-advanced-regression-techniques/overview
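Assuming the competition scores submissions by the root mean squared error between the logarithm of the predicted price and the logarithm of the observed price (worth confirming on the page linked above), the metric can be sketched in plain Python as follows. `rmse_log` is a hypothetical helper name and the prices are made up.

```python
import math


def rmse_log(predicted, observed):
    """RMSE between log(predicted) and log(observed) sale prices."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log(p) - math.log(o)) ** 2 for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices, not taken from the dataset.
observed = [100_000.0, 200_000.0]
perfect = rmse_log(observed, observed)                      # no error at all
ten_pct_high = rmse_log([110_000.0, 220_000.0], observed)   # both 10% too high
```

Because the error is taken on logs, a 10% overprediction contributes the same amount (about log 1.1) whether the house is cheap or expensive, so errors on expensive houses do not dominate the score.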