Welcome to the Data Engineering course repository. I am taking this course in May 2023 and will be documenting everything I learn here.
In this course, you will delve into the concepts and practices of handling large volumes of data through extraction, transformation, and loading (ETL) processes. Additionally, you will learn to design and manage a Data Warehousing architecture using the latest technologies prevalent in today's market. You will implement data handling processes using the Python programming language, leveraging powerful libraries like Pandas. Moreover, you will work with SQL dialects and Amazon Redshift databases. By the end of this course, you will be equipped to administer, maintain, and optimize modern data infrastructures.
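To make the ETL idea concrete, here is a minimal sketch in Python with Pandas of the extract-transform-load flow described above. The file paths and column names are hypothetical placeholders, not course material.

```python
import pandas as pd

# Extract: read raw data from a source file (hypothetical path).
raw = pd.read_csv("data/raw_sales.csv")

# Transform: clean and reshape the data.
clean = (
    raw.drop_duplicates()                       # remove repeated rows
       .dropna(subset=["customer_id"])          # drop rows missing a key field
       .assign(total=lambda df: df["quantity"] * df["unit_price"])
)

# Load: write the result to a destination. Here it is a local CSV;
# in the course the target would be a warehouse such as Amazon Redshift.
clean.to_csv("data/clean_sales.csv", index=False)
```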
Upon completion of the Data Engineering course, you will have the necessary skills and knowledge to:
- Understand the concepts and challenges associated with the Big Data ecosystem.
- Design and implement solutions for efficiently managing and analyzing large datasets.
- Develop ETL processes for data extraction, transformation, and loading, enabling subsequent data analytics.
- Successfully administer a data warehouse.
To make the most of this course, it is recommended to have intermediate knowledge in the following areas:
- Intermediate SQL skills and data analysis. Specifically, you should be familiar with the following concepts:
  - Understanding primary keys and foreign keys.
  - Knowledge of normalization concepts, ideally with hands-on experience normalizing a database.
  - Proficiency in querying tables, aggregating results with aggregation functions, performing joins, creating new tables, and inserting, deleting, and updating records.
  - Familiarity with SQL statements such as JOIN, GROUP BY, HAVING, INSERT, UPDATE, DELETE, and CREATE (see the SQL refresher sketch after this list).
- Intermediate Python skills. Specifically, you should be familiar with the following concepts:
  - Executing Python scripts; handling numeric variables, strings, lists/arrays, and dictionaries; and writing and calling functions.
  - Understanding APIs and experience extracting data from websites through APIs, working with lists, dictionaries, and JSON (see the API sketch after this list).
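As a quick self-check for the SQL prerequisites, here is a self-contained refresher using Python's built-in sqlite3 module; the schema and data are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE: two tables linked by a primary key / foreign key pair.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
""")

# INSERT: add some sample rows.
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Luis")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 50.0), (2, 1, 75.0), (3, 2, 20.0)])

# UPDATE and DELETE.
cur.execute("UPDATE orders SET amount = 80.0 WHERE id = 2")
cur.execute("DELETE FROM orders WHERE id = 3")

# JOIN + GROUP BY + HAVING: customers whose total spend exceeds 100.
cur.execute("""
SELECT c.name, SUM(o.amount) AS total
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name
HAVING SUM(o.amount) > 100
""")
print(cur.fetchall())  # [('Ana', 130.0)]
conn.close()
```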
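And for the Python/API prerequisite, a minimal sketch of pulling JSON from a web API using only the standard library; the endpoint URL is a hypothetical placeholder.

```python
import json
import urllib.request

# Hypothetical endpoint returning a JSON array of user objects.
URL = "https://api.example.com/users"

with urllib.request.urlopen(URL) as response:
    users = json.load(response)  # parsed into ordinary Python lists/dicts

# Work with the result as lists and dictionaries.
names = [user["name"] for user in users]
print(names)
```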
The course covers the following topics:

- Introduction to the Unix terminal.
- Basic concepts of computer architecture.
- Python and SQL exercises.
- Exploring Big Data and its current challenges.
- Understanding the collaboration between Data Engineers, Data Analysts, and Data Scientists.
- Getting familiar with fundamental concepts in the field of Data Engineering.
- Reviewing a basic data architecture.
- Introduction to OLAP databases.
- Exploring MPP, clustering, and MapReduce techniques (see the MapReduce sketch after this list).
- Leveraging Amazon Redshift for data warehousing.
- Working with the Apache Parquet file format.
- Utilizing Pandas DataFrames for data manipulation.
- Applying transformation techniques to DataFrames (deduplication, merge, apply, etc.); see the Pandas/Parquet sketch after this list.
- Understanding the fundamentals of database security.
- Implementing security measures in Amazon Redshift.
- Performing manual backups to S3 (see the S3 backup sketch after this list).
- Introduction to containerization and virtual machines.
- Building Dockerfiles and creating Docker images.
- Practical exercises using Docker.
- Overview of Apache Airflow.
- Understanding the architecture of Airflow processes.
- Working with DAGs, Tasks, and Operators.
- Exploring advanced concepts: sensors, SubDAGs, XComs (see the Airflow DAG sketch after this list).
- Introduction to stream processing.
- Working with PubSub messaging.
- Theoretical introduction to Apache Kafka.
- Practical exercises using AWS Kinesis (see the Kinesis sketch after this list).
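To make the MapReduce item concrete, here is a toy word count expressed as map and reduce phases in plain Python; real frameworks distribute these phases across a cluster, but the logic is the same.

```python
from collections import Counter
from functools import reduce

documents = ["big data big ideas", "data pipelines move data"]

# "Map" phase: each document is processed independently into partial counts
# (in a real framework these would run in parallel on different nodes).
partials = [Counter(doc.split()) for doc in documents]

# "Reduce" phase: the partial results are merged into the final answer.
totals = reduce(lambda a, b: a + b, partials, Counter())
print(totals.most_common(2))  # [('data', 3), ('big', 2)]
```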
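For the Pandas and Parquet items, a minimal sketch of the transformation techniques listed (deduplication, merge, apply) plus a Parquet round trip; all data is invented, and writing Parquet from Pandas assumes the pyarrow (or fastparquet) package is installed.

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2, 2], "name": ["Ana", "Luis", "Luis"]})
orders = pd.DataFrame({"customer_id": [1, 2], "amount": [50.0, 20.0]})

# Deduplication: drop the repeated customer row.
customers = customers.drop_duplicates()

# Merge: join orders onto customers (analogous to a SQL JOIN).
enriched = customers.merge(orders, on="customer_id", how="left")

# Apply: derive a new column with a row-wise function.
enriched["amount_with_tax"] = enriched["amount"].apply(lambda x: x * 1.21)

# Parquet round trip (requires pyarrow or fastparquet).
enriched.to_parquet("enriched.parquet")
print(pd.read_parquet("enriched.parquet"))
```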
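The manual-backup item boils down to copying a file into S3. A minimal sketch with boto3 follows; the bucket name and paths are placeholders, and AWS credentials are assumed to be configured in the environment.

```python
import boto3

s3 = boto3.client("s3")  # credentials are taken from the environment/AWS config

# Upload a local dump file to a (hypothetical) backup bucket.
s3.upload_file(
    Filename="backups/warehouse_dump.sql",
    Bucket="my-backup-bucket",
    Key="redshift/2023-05/warehouse_dump.sql",
)
```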
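For the Airflow items, here is a minimal DAG sketch, assuming Apache Airflow 2.x; the DAG id, schedule, and task bodies are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling data from the source")  # placeholder task logic

def load():
    print("writing data to the warehouse")  # placeholder task logic

with DAG(
    dag_id="daily_etl",                # hypothetical pipeline name
    start_date=datetime(2023, 5, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task  # run extract before load
```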
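And for the streaming items, a sketch of publishing a record to an AWS Kinesis data stream with boto3; the stream name and region are placeholders, and the stream is assumed to already exist.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # assumed region

event = {"sensor_id": "s-42", "temperature": 21.5}

# Publish one record; records sharing a partition key go to the same shard.
kinesis.put_record(
    StreamName="sensor-events",  # hypothetical, pre-created stream
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["sensor_id"],
)
```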
Feel free to explore the contents of each module in the respective folders. I will be continuously updating this repository with my learning progress throughout the course.
If you have any questions, don't hesitate to create an issue in this repository. Happy learning! 🧑‍💻👩‍💻