This repository has a collection of utilities for Glue Crawlers. These utilities come in the form of AWS CloudFormation templates or AWS CDK applications.
Updated Dec 21, 2021 - Python
This repository contains source code for the AWS Database Blog Post Reduce data archiving costs for compliance by automating RDS snapshot exports to Amazon S3
ETL data pipeline using AWS services
In this project, I used the Trending YouTube Video Statistics dataset from Kaggle, analyzing it and preparing it for use.
The project aims to establish a robust data pipeline for tracking and analyzing sales performance using several AWS services. The process involves creating a DynamoDB database, implementing Change Data Capture (CDC), streaming the changes through Kinesis, and finally storing and querying the data in Amazon Athena.
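A minimal sketch of the CDC step described above. It assumes that changes reach Kinesis through Amazon Kinesis Data Streams for DynamoDB and that a Lambda function consumes the stream. The handler decodes each base64 Kinesis payload and converts the DynamoDB-typed `NewImage` into a flat row that could be written to S3 and queried with Athena. The function names `unmarshal` and `flatten_change_records` are hypothetical, only the `S`, `N`, `BOOL`, `NULL`, `M`, and `L` attribute types are handled, and nothing here is taken from the project's code.

```python
import base64
import json
from decimal import Decimal
from typing import Any, Dict, List


def unmarshal(value: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-typed attribute value (e.g. {"S": "x"}) to a plain Python value."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        # DynamoDB sends numbers as strings; Decimal keeps their exact precision.
        return Decimal(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: unmarshal(v) for k, v in raw.items()}
    if type_tag == "L":
        return [unmarshal(v) for v in raw]
    # Set types (SS, NS, BS) and binary (B) are not handled in this sketch.
    raise ValueError(f"unsupported attribute type: {type_tag}")


def flatten_change_records(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a Lambda Kinesis event carrying DynamoDB change records into flat rows."""
    rows = []
    for record in event.get("Records", []):
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        image = payload.get("dynamodb", {}).get("NewImage")
        if image is None:
            # REMOVE events carry only OldImage; this sketch skips them.
            continue
        row = {name: unmarshal(attr) for name, attr in image.items()}
        row["_event_name"] = payload.get("eventName")  # INSERT or MODIFY
        rows.append(row)
    return rows
```

Using `Decimal` for `N` values mirrors how DynamoDB preserves numeric precision. A real pipeline would still need to choose an output format (for example JSON Lines or Parquet) before writing the rows to S3 for Athena.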
A pipeline on AWS that captures schema changes in S3 files and applies them to a database.
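The schema-comparison step at the heart of such a pipeline can be sketched as follows. This assumes schema snapshots come from the Glue Data Catalog, for example the `StorageDescriptor.Columns` list (entries with `Name` and `Type`) that the boto3 Glue client's `get_table` call returns after a crawler run. The helper names are hypothetical, and the step that updates the database is not shown.

```python
from typing import Dict, List

# Column name -> type string, e.g. {"id": "bigint"}.
Schema = Dict[str, str]


def columns_to_schema(columns: List[Dict[str, str]]) -> Schema:
    """Build a schema map from Glue-style column entries ({"Name": ..., "Type": ...})."""
    return {col["Name"]: col["Type"] for col in columns}


def diff_schemas(old: Schema, new: Schema) -> Dict[str, list]:
    """Report columns added, removed, or retyped between two schema snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(
        (col, old[col], new[col])
        for col in set(old) & set(new)
        if old[col] != new[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}
```

Sorting the results keeps the output deterministic, so consecutive runs can be compared or logged without noise from set ordering.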
Creating an audit table for a DynamoDB table using CloudTrail, Kinesis Data Streams, Lambda, S3, Glue, Athena, and CloudFormation.
In this project, you will build an end-to-end data engineering pipeline for real-time stock market data using Kafka. We will use technologies such as Python, Amazon Web Services (AWS), Apache Kafka, Glue, Athena, and SQL.
This project aims to carry out data collection, cataloging, governance, processing, and visualization.
Unveiling job market trends with Scrapy and AWS
Collecting song, album, and artist details from the Spotify music application at set intervals using the spotipy API, and performing ETL operations with AWS cloud services.
Automation framework to catalog AWS data sources using Glue
Terraform configuration that creates several AWS services, uploads data to S3, and starts the Glue Crawler and Glue Job.
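The project above does this with Terraform. To show what "start the Glue Crawler and Glue Job" involves at the API level, here is a Python sketch. It assumes `glue` behaves like `boto3.client("glue")` and uses that client's `start_crawler`, `get_crawler`, and `start_job_run` calls. The function name, polling interval, and timeout are hypothetical choices.

```python
import time
from typing import Any, Callable


def run_crawler_then_job(
    glue: Any,
    crawler_name: str,
    job_name: str,
    poll_seconds: float = 15.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start a Glue crawler, wait until it returns to READY, then start a Glue job.

    Returns the JobRunId of the started job run.
    """
    glue.start_crawler(Name=crawler_name)
    waited = 0.0
    while True:
        # Sleep first so the crawler has time to leave READY after StartCrawler.
        sleep(poll_seconds)
        waited += poll_seconds
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        # A crawl goes RUNNING -> STOPPING -> READY when it finishes.
        if crawler["State"] == "READY":
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"crawler {crawler_name} still {crawler['State']} after {waited}s")
    last_crawl = crawler.get("LastCrawl", {})
    if last_crawl.get("Status") == "FAILED":
        raise RuntimeError(f"crawler {crawler_name} failed: {last_crawl.get('ErrorMessage')}")
    return glue.start_job_run(JobName=job_name)["JobRunId"]
```

The `sleep` parameter is injectable so the polling loop can be exercised without waiting or calling AWS. In production, the default `time.sleep` applies.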
Developed an ETL pipeline for real-time ingestion of stock market data from the stock-market-data-manage.onrender.com API. Engineered the system to store data in Parquet format for optimized query processing and incorporated data quality checks to ensure accuracy prior to visualization.
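The description does not say which data quality checks run before visualization. The sketch below shows one way such a gate could look, applied to rows before they are written to Parquet. The field names (`symbol`, `timestamp`, `open`, `high`, `low`, `close`, `volume`) and the validation rules are hypothetical assumptions, not the project's actual schema.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field names for one stock quote row.
REQUIRED = ("symbol", "timestamp", "open", "high", "low", "close", "volume")


def check_quote(row: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one quote row (empty if it looks valid)."""
    problems = [f"missing {field}" for field in REQUIRED if row.get(field) is None]
    if problems:
        return problems
    if min(row["open"], row["high"], row["low"], row["close"]) <= 0:
        problems.append("non-positive price")
    if row["high"] < row["low"]:
        problems.append("high below low")
    elif not (row["low"] <= row["open"] <= row["high"] and row["low"] <= row["close"] <= row["high"]):
        problems.append("open/close outside high-low range")
    if row["volume"] < 0:
        problems.append("negative volume")
    return problems


def split_valid(rows: List[Dict[str, Any]]) -> Tuple[list, list]:
    """Separate rows that pass every check from rejected rows and their problems."""
    valid, rejected = [], []
    for row in rows:
        problems = check_quote(row)
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected
```

Keeping rejected rows together with their reasons, instead of silently dropping them, makes it possible to report or quarantine bad data separately from the rows that reach the dashboard.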
Implemented ETL pipeline on AWS for Playstore data using Lambda, Glue Crawlers, and Glue ETL Jobs. Orchestrated workflow with Step Functions and achieved seamless integration, optimal data merging, and enhanced data quality/accessibility.
Open data and cloud computing to answer the question: Are we losing our spring days?
An end-to-end data pipeline built with AWS S3, Glue, Glue Crawler, Athena, and Tableau for visualization.
Working with the Glue Data Catalog, triggering runs with S3 Event Notifications, and creating the entire stack with AWS CloudFormation.
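A sketch of event-driven triggering. The description does not say which component the S3 Event Notification triggers, so this assumes the notification invokes a Lambda function that starts a Glue crawler. The helper names and the `CRAWLER_NAME` environment variable are hypothetical. The calls to `start_crawler` and `CrawlerRunningException` are those of the boto3 Glue client.

```python
import os
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def changed_objects(event: Dict[str, Any]) -> List[str]:
    """Extract s3://bucket/key URIs from an S3 event notification delivered to Lambda."""
    uris = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (for example, spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        uris.append(f"s3://{bucket}/{key}")
    return uris


def handle_s3_event(event: Dict[str, Any], glue: Any, crawler_name: str) -> Dict[str, Any]:
    """Start the crawler for new objects; a crawl already in progress is not an error."""
    uris = changed_objects(event)
    if not uris:
        return {"started": False, "objects": []}
    try:
        glue.start_crawler(Name=crawler_name)
        started = True
    except glue.exceptions.CrawlerRunningException:
        # Several uploads in a burst can race; the running crawl will pick them up.
        started = False
    return {"started": started, "objects": uris}


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers above can be tested without AWS

    return handle_s3_event(event, boto3.client("glue"), os.environ["CRAWLER_NAME"])
```

Treating `CrawlerRunningException` as a no-op is a design choice: many objects landing at once would otherwise produce failed invocations even though the in-progress crawl will catalog the new data.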
Analyzed a multicategory e-commerce store using big data techniques on a Kaggle dataset with the help of AWS EC2, AWS S3, PySpark, AWS Glue ETL, AWS Athena, AWS CloudFormation, AWS Lambda and Power BI!
Deployment of AWS Athena, a Glue database, and a Glue Crawler on an existing S3 bucket through the Serverless (sls) Framework.