Banks MC ETL Pipeline

The Banks MC ETL Pipeline is a Python project that automates the extraction, transformation, and loading (ETL) of banks' market capitalization data. It scrapes the data with Requests and BeautifulSoup, transforms it with Pandas and NumPy, stores it in SQLite, and orchestrates the tasks with Apache Airflow. Docker is used to containerize Airflow, which simplifies deployment.

ETL Pipeline

Extract: Requests and BeautifulSoup scrape banks' market capitalization data from an archived Wikipedia page.
Transform: Pandas and NumPy convert the extracted market capitalization into other currencies, using conversion rates read from a predefined CSV file.
Load: The transformed data is written to both a CSV file and an SQLite database.
Apache Airflow orchestrates these tasks so that the entire ETL process runs automatically and in order.
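
The transform and load steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`Name`, `MC_USD_Billion`, `Currency`, `Rate`), the sample values, the table name, and the `transform` and `load` helpers are all assumptions made for the example. The small `scraped` frame stands in for the output of the extract step, and the `rates` frame stands in for the contents of the conversion-rate CSV.

```python
import os
import sqlite3
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the scraped table; the values are made up.
scraped = pd.DataFrame(
    {"Name": ["Bank A", "Bank B"], "MC_USD_Billion": [100.0, 250.5]}
)

# Hypothetical stand-in for the conversion-rate CSV; in practice this
# would likely be read with pd.read_csv(...).
rates = pd.DataFrame({"Currency": ["EUR", "GBP"], "Rate": [0.9, 0.8]})


def transform(df, rates):
    """Add one market-cap column per target currency, rounded to 2 places."""
    out = df.copy()
    for currency, rate in zip(rates["Currency"], rates["Rate"]):
        out[f"MC_{currency}_Billion"] = np.round(out["MC_USD_Billion"] * rate, 2)
    return out


def load(df, csv_path, conn, table_name):
    """Write the frame to a CSV file and to a SQLite table."""
    df.to_csv(csv_path, index=False)
    df.to_sql(table_name, conn, if_exists="replace", index=False)


if __name__ == "__main__":
    # Demo run against a temporary directory and an in-memory database.
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(":memory:")
        load(transform(scraped, rates), os.path.join(tmp, "banks.csv"), conn, "banks")
        print(pd.read_sql("SELECT * FROM banks", conn))
        conn.close()
```

Keeping `transform` free of file and database access makes it easy to check on its own, while `load` takes an open connection so the same function can target a file-backed database or an in-memory one.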

Tools & Libraries

Python
Airflow
SQLite
Requests
BeautifulSoup
Pandas
NumPy

