This project uses dbt (data build tool) to transform and analyze Zillow Research data, including home values across various regions in the United States. The project is set up to connect to AWS Athena through dbt Cloud.
The primary data source for this project is the Zillow Home Value Index (ZHVI):
Zillow Home Value Index (ZHVI): A measure of the typical home value and market changes across a given region and housing type. It reflects the typical value for homes in the 35th to 65th percentile range. Available as a smoothed, seasonally adjusted measure and as a raw measure.
For more information, refer to the ZHVI User Guide.
The project consists of the following main components:
- Source Data: Raw data from Zillow Research, including regions and home values.
- Intermediate Models: Cleaned and prepared data from the raw sources.
- Mart Models: Analytics-ready models for various analyses.
regions
: Core dimension table containing information about various geographical regions.regions_home_values
: Fact table containing home value data for each region over time.regions_date_spine
: Date dimension table for time-based analysis.regions_size_rank_analysis
: Analysis of home values and growth rates by size rank.regions_volatility_analysis
: Analysis of price volatility in different regions.regions_geographic_comparison
: Comparison of home values and growth rates across geographic levels.regions_time_series
: Time series analysis of home values and growth rates.
The project is configured to use dbt Cloud with AWS Athena as the data warehouse. Key configuration details:
- Project Name: zillow_research
- dbt Version: 2.0.0
- Profile: zillow_research
- Target: AWS Athena
Here is a screenshot of the dbt Cloud docs for this project
The following image shows the lineage graph of the models in this project, illustrating the dependencies between different models:
To get started with this project:
- Clone the repository to your local machine.
- Download Zillow Research csv files and place in the data/ directory
- run python3 scripts/compile_regions.py to create source tables for Athena
- Upload source tables to S3 to match tables in models/sources.yml
- run glue crawlers or manually create raw data tables in an athena database named zillow_research
- Set up your dbt Cloud environment with AWS Athena credentials
Models could be expanded for additional Zillow Research datasets, and models could be forked for Athena alternatives.
This project is licensed under the MIT License.