Udacity Data Engineering project: Data Warehouse
Updated Feb 14, 2021 · Python
Udacity Data Engineering Nanodegree Project 3
AWS pipeline examples from the Udacity Data Engineering Nanodegree.
A data warehouse on Amazon Redshift using a star schema to facilitate the analysis of user behaviour on a music streaming app.
This project is a data warehousing solution for Sparkify, a music streaming service. It extracts data from JSON logs and stores it in a star-schema data model in Amazon Redshift.
Create a Redshift data warehouse on AWS using a snowflake schema and the columnar `COPY` operation, via SQL and Python
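A minimal sketch of how such a pipeline typically issues the columnar `COPY` command from Python; the connection details, table name, S3 paths, and IAM role below are placeholders, not values from any of these repositories:

```python
import psycopg2

# Hypothetical connection details -- replace with your own cluster endpoint and credentials.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-west-2.redshift.amazonaws.com",
    dbname="dev",
    user="awsuser",
    password="********",
    port=5439,
)

# COPY loads the JSON files from S3 into a staging table in parallel across slices.
copy_sql = """
    COPY staging_events
    FROM 's3://my-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    FORMAT AS JSON 's3://my-bucket/log_json_path.json'
    REGION 'us-west-2';
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```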
Udacity Course, Data Engineering Nanodegree, 3rd Project, Data Warehouse with Amazon Redshift
A repository focused on high-end parallel pipelines that perform ETL across various data sources
Warehousing with Redshift
A music streaming startup, Sparkify, has grown its user base and song database and wants to move its processes and data onto the cloud. The data resides in S3, in a directory of JSON logs on user activity on the app, as well as a directory with JSON metadata on the songs in their app. The objective of the project is to create an ETL pipeline t…
The data is collected from IMDb and then transformed before loading into the warehouse
This project builds a cloud-based ETL pipeline for Sparkify to move data to a cloud data warehouse. It extracts song and user activity data from AWS S3, stages it in Redshift, and transforms it into a star-schema data model with fact and dimension tables, enabling efficient querying to answer business questions.
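A hedged sketch of the kind of staging-to-star-schema transformation such a pipeline runs inside Redshift; the table and column names are illustrative (loosely following the Sparkify song-play model), not taken from the repository, and `conn` is the psycopg2 connection from the `COPY` sketch above:

```python
# Illustrative staging-to-fact-table load; names and join conditions are assumptions.
songplay_insert = """
    INSERT INTO songplays (start_time, user_id, level, song_id, artist_id,
                           session_id, location, user_agent)
    SELECT TIMESTAMP 'epoch' + e.ts / 1000 * INTERVAL '1 second' AS start_time,
           e.user_id,
           e.level,
           s.song_id,
           s.artist_id,
           e.session_id,
           e.location,
           e.user_agent
    FROM staging_events e
    JOIN staging_songs s
      ON e.song = s.title AND e.artist = s.artist_name
    WHERE e.page = 'NextSong';
"""

with conn, conn.cursor() as cur:
    cur.execute(songplay_insert)  # populate the fact table from the staging tables
```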
Data Pipeline with Apache Airflow
Implement an ETL data pipeline that reads data from an S3 bucket and loads it into AWS Redshift using Airflow
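A rough sketch of such an Airflow DAG using the Amazon provider's `S3ToRedshiftOperator`; the bucket, key, table, and connection IDs are placeholders rather than the project's actual configuration:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

# Placeholder bucket, table, and connection IDs -- adjust to your environment.
with DAG(
    dag_id="s3_to_redshift_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_events = S3ToRedshiftOperator(
        task_id="load_staging_events",
        s3_bucket="my-data-bucket",
        s3_key="log_data/",
        schema="public",
        table="staging_events",
        copy_options=["FORMAT AS JSON 'auto'"],
        aws_conn_id="aws_default",
        redshift_conn_id="redshift_default",
    )
```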
Queryable API analyzing the pay gap between the WNBA and NBA by scraping multiple data sources using Python and Beautiful Soup
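A small sketch of the requests + Beautiful Soup scraping pattern such a project relies on; the URL and table markup here are hypothetical:

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical salary page -- the real project scrapes several sources.
resp = requests.get("https://example.com/wnba-salaries", timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
salaries = []
for row in soup.select("table tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) >= 2:
        salaries.append({"player": cells[0], "salary": cells[1]})

print(salaries[:5])
```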
Database Schema & ETL pipeline for Song Play Analysis | Bosch AI Talent Accelerator Scholarship Program
Sparkify - Data Pipelines with Airflow - Udacity Data Engineering Expert Track.
Airflow-orchestrated ETL (running in Docker containers) that pulls batch data from an API to a local Postgres database, loads it to AWS S3/Redshift provisioned by Terraform, and visualizes it in QuickSight.
☁️ Creating a Redshift Cluster using the AWS Python SDK
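A minimal sketch of provisioning a cluster with the AWS Python SDK (boto3); the identifiers, credentials, and IAM role ARN are placeholders:

```python
import boto3

# Placeholder region, identifiers, and credentials -- never hard-code real secrets.
redshift = boto3.client("redshift", region_name="us-west-2")

response = redshift.create_cluster(
    ClusterIdentifier="sparkify-cluster",
    ClusterType="multi-node",
    NodeType="dc2.large",
    NumberOfNodes=4,
    DBName="sparkify",
    MasterUsername="awsuser",
    MasterUserPassword="ReplaceMe123",
    IamRoles=["arn:aws:iam::123456789012:role/myRedshiftRole"],
)
print(response["Cluster"]["ClusterStatus"])  # e.g. "creating"
```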
In this project, we use Spark and data lakes to build an ETL pipeline for a data lake hosted on S3. The pipeline loads data from S3, processes it into analytics tables using Spark, and writes them back to S3, with the Spark job deployed on a cluster in AWS.
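A hedged PySpark sketch of that extract-transform-load flow; the bucket paths and column names are illustrative, not taken from the project:

```python
from pyspark.sql import SparkSession

# Bucket paths and column names are assumptions for illustration only.
spark = (
    SparkSession.builder
    .appName("sparkify-data-lake-etl")
    .getOrCreate()
)

# Extract: read raw JSON song data from S3.
songs = spark.read.json("s3a://my-input-bucket/song_data/*/*/*/*.json")

# Transform: build a deduplicated songs dimension table.
songs_table = (
    songs.select("song_id", "title", "artist_id", "year", "duration")
         .dropDuplicates(["song_id"])
)

# Load: write the table back to S3 as partitioned Parquet.
songs_table.write.mode("overwrite") \
    .partitionBy("year", "artist_id") \
    .parquet("s3a://my-output-bucket/songs/")
```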