Juanrii/data-engineering

Data Engineering Course

Welcome to the Data Engineering course repository. I am taking this course in May 2023 and will document everything I learn here.

Introduction

In this course, you will study the concepts and practices of handling large volumes of data through extract, transform, and load (ETL) processes. You will also learn to design and manage a data warehouse architecture with technologies in current use. You will implement data handling processes in Python with libraries such as pandas, and you will work with SQL dialects and Amazon Redshift databases. By the end of the course, you will be equipped to administer, maintain, and optimize modern data infrastructure.

Professional Profile

Upon completion of the Data Engineering course, you will have the necessary skills and knowledge to:

  • Understand the concepts and challenges associated with the Big Data ecosystem.
  • Design and implement solutions for efficiently managing and analyzing large datasets.
  • Develop ETL processes for data extraction, transformation, and loading, enabling subsequent data analytics.
  • Successfully administer a data warehouse.

Prerequisites

To make the most of this course, it is recommended to have intermediate knowledge in the following areas:

  • Intermediate SQL and data analysis skills. Specifically, you should be familiar with the following concepts:

    • Understanding primary keys and foreign keys.
    • Knowledge of normalization concepts and preferably experience in normalizing a database.
    • Proficiency in querying tables, aggregating results, utilizing aggregation functions, performing joins, creating new tables, and inserting, deleting, and updating records.
    • Familiarity with SQL statements and clauses such as JOIN, GROUP BY, HAVING, INSERT, UPDATE, DELETE, and CREATE.
  • Intermediate Python skills. Specifically, you should be familiar with the following concepts:

    • Executing Python scripts, handling numeric variables, strings, lists/arrays, dictionaries, and writing and executing functions.
    • Understanding APIs, with experience extracting data from web APIs and working with lists, dictionaries, and JSON.
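
As a rough self-check for the prerequisites above, the sketch below parses a JSON payload into Python lists and dictionaries, loads it into two related tables, and runs a query with JOIN, GROUP BY, and HAVING. It uses only the standard library (`json`, `sqlite3`). The payload, the table names, and the `load_orders` helper are made up for illustration and are not course material.

```python
import json
import sqlite3

# Hypothetical API response; a real one would come from an HTTP call.
payload = '''
{"orders": [
    {"customer": "Ana",  "amount": 50.0},
    {"customer": "Ana",  "amount": 70.0},
    {"customer": "Luis", "amount": 20.0}
]}
'''


def load_orders(raw):
    """Parse the JSON text and return a list of (customer, amount) tuples."""
    data = json.loads(raw)
    return [(order["customer"], order["amount"]) for order in data["orders"]]


# In-memory database with a primary key / foreign key relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
""")

for name, amount in load_orders(payload):
    # Insert the customer once, then attach the order to that customer's id.
    conn.execute("INSERT OR IGNORE INTO customers (name) VALUES (?)", (name,))
    conn.execute(
        "INSERT INTO orders (customer_id, amount) "
        "SELECT id, ? FROM customers WHERE name = ?",
        (amount, name),
    )

# Update one record, then aggregate per customer.
conn.execute("UPDATE orders SET amount = 25.0 WHERE amount = 20.0")

rows = conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING SUM(o.amount) > 30
    ORDER BY c.name
""").fetchall()

print(rows)
conn.close()
```

If each step here reads naturally, the SQL and Python prerequisites are probably covered.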

Modules

Class 1: Foundation Content (Optional)

  • Introduction to the Unix terminal.
  • Basic concepts of computer architecture.
  • Python and SQL exercises.

Class 2: Introduction to Data Engineering

  • Exploring Big Data and its current challenges.
  • Understanding the collaboration between Data Engineers, Data Analysts, and Data Scientists.
  • Getting familiar with fundamental concepts in the field of Data Engineering.
  • Reviewing a basic data architecture.

Class 3: Data Warehouse

  • Introduction to OLAP databases.
  • Exploring massively parallel processing (MPP), clustering, and MapReduce techniques.
  • Leveraging Amazon Redshift for data warehousing.
  • Working with the Apache Parquet file format.
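
To give a feel for the MapReduce idea mentioned above, here is a minimal single-process sketch of a word count in plain Python. A real MapReduce system distributes the map and reduce phases across many machines; the function names and sample documents here are assumptions chosen for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single count."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


# Made-up input documents.
documents = ["big data big ideas", "data warehouse"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```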

Class 4: ETLs

  • Using pandas DataFrames for data manipulation.
  • Applying transformation techniques to DataFrames (deduplication, merge, apply, etc.).
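
The three transformations named above can be sketched on a small example as follows. The sample data and column names are invented for illustration; the calls used (`drop_duplicates`, `merge`, `Series.apply`) are standard pandas methods.

```python
import pandas as pd

# Made-up sample data for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "customer_id": [10, 11, 11, 12],
    "amount": [50.0, 20.0, 20.0, 70.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11],
    "name": ["Ana", "Luis"],
})

# Deduplication: drop rows that repeat the same order_id.
deduped = orders.drop_duplicates(subset="order_id", keep="first")

# Merge: a left join keeps orders with no matching customer (name becomes NaN).
merged = deduped.merge(customers, on="customer_id", how="left")

# Apply: derive a new column from an existing one.
merged["size"] = merged["amount"].apply(
    lambda amount: "large" if amount >= 50 else "small"
)

print(merged)
```

Using `how="left"` here is a design choice: it makes unmatched customers visible as missing values instead of silently dropping their orders, which an inner join would do.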

Class 5: Security and Database Backup

  • Understanding the fundamentals of database security.
  • Implementing security measures in Amazon Redshift.
  • Performing manual backups to S3.

Class 6: Docker

  • Introduction to containerization and virtual machines.
  • Building Dockerfiles and creating Docker images.
  • Practical exercises using Docker.

Class 7: Apache Airflow

  • Overview of Apache Airflow.
  • Understanding the architecture of Airflow processes.
  • Working with DAGs, Tasks, and Operators.
  • Exploring advanced concepts: sensors, SubDAGs, and XComs.

Class 8: Stream Processing

  • Introduction to stream processing.
  • Working with publish/subscribe (pub/sub) messaging.
  • Theoretical introduction to Apache Kafka.
  • Practical exercises using Amazon Kinesis.

Feel free to explore the contents of each module in its respective folder. I will keep updating this repository with my progress throughout the course.

If you have any questions, don't hesitate to create an issue in this repository. Happy learning! 🧑‍💻👩‍💻
