This project builds a data pipeline to process and visualize financial transaction data using Azure services. The pipeline covers data ingestion, validation, preprocessing, KPI calculation, and visualization in Power BI.
- Data Ingestion: Collect data files and store them in Azure Data Lake Storage.
- Data Validation and Preprocessing: Validate and preprocess the data with custom Python scripts before loading it into Azure SQL Database.
- Data Storage: Store the validated and cleaned data in Azure SQL Database.
- KPI Calculation: Create KPI tables based on the transactional data for various time periods.
- Data Visualization: Visualize the KPIs using Power BI.
Technologies Used:
- Azure Data Lake Storage: For storing raw and processed data files.
- Azure Data Factory: For orchestrating data ingestion and processing pipelines.
- Azure SQL Database: For storing cleaned and validated data.
- Python: For custom scripts to validate and preprocess data.
- Power BI: For visualizing the KPIs.
- Azure Synapse Analytics (optional): For advanced analytics and big data processing.
Data Ingestion:
- Store CSV files (transactions, employees, payments, products, customers) in Azure Data Lake Storage.
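
A minimal Python sketch of this upload step, using the azure-storage-file-datalake and azure-identity packages. The storage account URL, container name, and local folder are placeholders to adapt to your environment.

```python
# Upload the raw CSV files to an Azure Data Lake Storage Gen2 container.
# Account URL, container, and local folder below are placeholders, not project values.
import os
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = "https://<storage-account>.dfs.core.windows.net"  # placeholder account
FILESYSTEM = "raw"        # assumed container (file system) name
LOCAL_DIR = "data"        # assumed local folder holding the CSV files

CSV_FILES = ["transactions.csv", "employees.csv", "payments.csv",
             "products.csv", "customers.csv"]

service = DataLakeServiceClient(account_url=ACCOUNT_URL,
                                credential=DefaultAzureCredential())
filesystem = service.get_file_system_client(FILESYSTEM)

for name in CSV_FILES:
    file_client = filesystem.get_file_client(name)
    with open(os.path.join(LOCAL_DIR, name), "rb") as f:
        # overwrite=True keeps re-runs idempotent
        file_client.upload_data(f, overwrite=True)
    print(f"Uploaded {name} to {FILESYSTEM}/")
```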
Data Validation and Preprocessing:
- Use Python scripts to perform basic validation checks (e.g., missing values, duplicates).
- Preprocess the data (e.g., data type conversions, handling missing values).
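
A hedged example of what such a validation and preprocessing script might look like with pandas; the column names (transaction_date, amount) are illustrative assumptions, not the actual schema of the provided files.

```python
# Basic validation (missing values, duplicates) and preprocessing (type conversion,
# handling missing values) for one CSV file. Column names are assumptions.
import pandas as pd

def validate_and_clean(path: str) -> pd.DataFrame:
    df = pd.read_csv(path)

    # Validation checks: report missing values and duplicate rows
    print("Missing values per column:\n", df.isna().sum())
    print("Duplicate rows:", df.duplicated().sum())

    # Preprocessing: drop exact duplicates, coerce types, drop unusable rows
    df = df.drop_duplicates()
    if "transaction_date" in df.columns:      # assumed column name
        df["transaction_date"] = pd.to_datetime(df["transaction_date"], errors="coerce")
    if "amount" in df.columns:                # assumed column name
        df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df = df.dropna(subset=[c for c in ("transaction_date", "amount") if c in df.columns])
    return df

cleaned = validate_and_clean("data/transactions.csv")
cleaned.to_csv("data/transactions_clean.csv", index=False)
```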
Data Storage:
- Upload the cleaned data to Azure SQL Database using custom scripts or Azure Data Factory.
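
One possible custom load script, using pandas with SQLAlchemy and pyodbc to append the cleaned data into Azure SQL Database. The server, database, credentials, and target table name are placeholders; Azure Data Factory's Copy activity is the no-code alternative.

```python
# Load a cleaned CSV into Azure SQL Database via SQLAlchemy + pyodbc.
# Connection details and table name below are placeholders.
from urllib.parse import quote_plus

import pandas as pd
from sqlalchemy import create_engine

odbc_str = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;"
    "DATABASE=<database>;"
    "UID=<user>;PWD=<password>;"
    "Encrypt=yes;"
)
engine = create_engine(f"mssql+pyodbc:///?odbc_connect={quote_plus(odbc_str)}")

cleaned = pd.read_csv("data/transactions_clean.csv")  # output of the preprocessing step
# Append into an assumed staging table
cleaned.to_sql("transactions", engine, if_exists="append", index=False, chunksize=1000)
```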
KPI Calculation:
- Create stored procedures in Azure SQL Database to calculate KPIs such as the number of accounts opened, the number of transactions, and the total transaction amount for various time periods.
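
A sketch of one such stored procedure, created from Python via pyodbc: it rebuilds an assumed dbo.kpi_monthly table with monthly transaction counts and totals. The table and column names are illustrative, and equivalent procedures can be added for other KPIs (e.g., accounts opened) and other time grains.

```python
# Create a KPI table and a stored procedure that repopulates it with monthly KPIs.
# dbo.transactions, transaction_date, amount, and dbo.kpi_monthly are assumed names.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<database>;"
    "UID=<user>;PWD=<password>;Encrypt=yes;"
)
cursor = conn.cursor()

cursor.execute("""
IF OBJECT_ID('dbo.kpi_monthly') IS NULL
    CREATE TABLE dbo.kpi_monthly (
        period_start             DATE PRIMARY KEY,
        transaction_count        INT,
        total_transaction_amount DECIMAL(18, 2)
    );
""")

cursor.execute("""
CREATE OR ALTER PROCEDURE dbo.usp_RefreshMonthlyKpis
AS
BEGIN
    TRUNCATE TABLE dbo.kpi_monthly;
    INSERT INTO dbo.kpi_monthly (period_start, transaction_count, total_transaction_amount)
    SELECT DATEFROMPARTS(YEAR(transaction_date), MONTH(transaction_date), 1),
           COUNT(*),
           SUM(amount)
    FROM dbo.transactions
    GROUP BY DATEFROMPARTS(YEAR(transaction_date), MONTH(transaction_date), 1);
END
""")
conn.commit()
```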
Data Visualization:
- Connect Power BI to Azure SQL Database.
- Import KPI tables and create dashboards to visualize the data.
Steps to Run the Pipeline:

Set Up Azure Services:
- Set up Azure Data Lake Storage, Azure SQL Database, and Azure Data Factory.
Ingest Data:
- Upload the provided CSV files to Azure Data Lake Storage.
Validate and Preprocess Data:
- Run the provided Python scripts to validate and preprocess the data.
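
A small driver sketch for this step, assuming the validate_and_clean helper from the preprocessing section above has been saved in a local validate.py module (both names are hypothetical).

```python
# Apply the validation/cleaning step to each provided CSV and write cleaned copies.
from pathlib import Path

from validate import validate_and_clean  # hypothetical module/function from the sketch above

CSV_FILES = ["transactions.csv", "employees.csv", "payments.csv",
             "products.csv", "customers.csv"]

for name in CSV_FILES:
    src = Path("data") / name
    cleaned = validate_and_clean(str(src))
    out = src.with_name(src.stem + "_clean.csv")
    cleaned.to_csv(out, index=False)
    print(f"{name}: {len(cleaned)} rows after cleaning -> {out}")
```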
Store Data in Azure SQL Database:
- Use custom scripts or Azure Data Factory to upload the cleaned data to Azure SQL Database.
Calculate KPIs:
- Execute the stored procedures in Azure SQL Database to calculate the KPIs.
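
A minimal execution sketch, assuming the usp_RefreshMonthlyKpis procedure and kpi_monthly table from the KPI Calculation section above (both names are illustrative).

```python
# Execute the KPI stored procedure and sanity-check the refreshed KPI table.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<server>.database.windows.net;DATABASE=<database>;"
    "UID=<user>;PWD=<password>;Encrypt=yes;"
)
cursor = conn.cursor()

cursor.execute("EXEC dbo.usp_RefreshMonthlyKpis;")
conn.commit()

cursor.execute("SELECT COUNT(*) FROM dbo.kpi_monthly;")
print("KPI rows:", cursor.fetchone()[0])
```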
Visualize Data in Power BI:
- Connect Power BI to Azure SQL Database and create dashboards to visualize the KPIs.