Google Cloud (GCP) Data Engineering Course Content


This repository contains Cloud Data Engineering training materials developed by Myla Ram Reddy.

Please contact Renuka for training and certification at 8374899166 (WhatsApp).


01. Python
Python Basic Level
  1. Install Anaconda
  2. Understand Markdown language
  3. How to write Python code in plain Notepad
  4. How to write Python code in Spyder
  5. How to write Python code in Visual Studio Code
  6. How to write Python code in Jupyter / JupyterLab
  7. Different Python Objects
  8. int
  9. float
  10. complex
  11. str
  12. bool
  13. range
  14. Data Structures
  15. list
  16. dict
  17. tuple
  18. set
  19. Mutable vs Immutable
  20. Read items of str/list/dict/tuple/set/range, etc.
  21. index
  22. slice
  23. fancy
  24. Operators
  25. Comparison (>, <, >=, <=, ...)
  26. Logical/bool (and/or/not)
  27. NumPy logical (logical_and/logical_or/logical_not)
  28. Control Flows
  29. input
  30. if / elif / ... / else
  31. while loop
  32. break
  33. continue
  34. for loop
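
A minimal sketch touching several of the basics listed above (objects, data structures, indexing and slicing, operators, and control flow); the values are made up for illustration:

```python
# Basic objects
age = 30             # int
price = 9.99         # float
name = "Ram"         # str
active = True        # bool

# Data structures
scores = [70, 85, 90, 55]             # list (mutable)
person = {"name": name, "age": age}   # dict
point = (3, 4)                        # tuple (immutable)
unique = {1, 2, 2, 3}                 # set -> {1, 2, 3}

# Index, slice, and fancy-style selection
print(scores[0])                       # index -> 70
print(scores[1:3])                     # slice -> [85, 90]
print([scores[i] for i in (0, 2)])     # pick arbitrary positions -> [70, 90]

# Operators and control flow
for s in scores:
    if s >= 85:
        print(s, "distinction")
    elif s >= 60:
        print(s, "pass")
    else:
        print(s, "fail")
```
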
Advanced Python
  1. System_Defined_Functions
  2. create functions
  3. function parameters
  4. mandatory parameters
  5. optional parameters
  6. flexible parameters
  7. key-value flexible parameters
  8. LEGB_scope_of_objects_of_functions
  9. Methods
  10. Modules
  11. User_defined_packages
  12. system_defined_packages
  13. Iterables & Iterators
  14. Lambda_Functions
  15. Syntax Errors and Exceptions
  16. List comprehensions
  17. OOPs_Introduction_Classes_Objects_Attributes_Methods
  18. OOPs_Inheritance_and_MRO
  19. OOPs_Encapsulation
  20. OOPs_Polymorphism
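
A short sketch, not tied to any specific course notebook, illustrating mandatory/optional/flexible parameters, a lambda, a list comprehension, and a small class hierarchy with inheritance and MRO:

```python
def greet(name, greeting="Hello", *args, **kwargs):
    """name is mandatory, greeting is optional,
    *args and **kwargs are flexible / key-value flexible parameters."""
    extras = ", ".join(str(a) for a in args)
    options = ", ".join(f"{k}={v}" for k, v in kwargs.items())
    return f"{greeting}, {name}! {extras} {options}".strip()

print(greet("Ram"))
print(greet("Ram", "Hi", "welcome", course="GCP"))

# Lambda and list comprehension
square = lambda x: x * x
squares = [square(n) for n in range(5)]   # [0, 1, 4, 9, 16]

# Classes, inheritance, and method resolution order (MRO)
class Engineer:
    def __init__(self, name):
        self.name = name          # attribute

    def role(self):               # method
        return "engineer"

class DataEngineer(Engineer):     # inheritance
    def role(self):               # polymorphism via overriding
        return "data engineer"

de = DataEngineer("Renuka")
print(de.name, de.role())
print(DataEngineer.__mro__)       # method resolution order
```
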
02. BigData

BigData Introduction

  • What is BigData
  • BigData properties
  • When to choose BigData

BigData VM Installation

  • Oracle VirtualBox installation
  • Cloudera VM installation
  • WinSCP installation
  • PuTTY installation

Linux commands

  • Working with folders
  • create folder
  • remove folder with files
  • remove folder without files
  • understanding VI editor
  • working with Files
  • create a file
  • copy file
  • move file
  • remove file
  • cat command
  • understanding permissions
  • grep command
  • find command
  • ... etc
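
For practice from Python, the same folder and file operations can be sketched with the standard library, with grep and find invoked as real shell commands; the paths below are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path

work = Path("demo_dir")
work.mkdir(exist_ok=True)                     # mkdir demo_dir
(work / "notes.txt").write_text("hello\n")    # create a file (like vi/echo)

shutil.copy(work / "notes.txt", work / "copy.txt")              # cp
shutil.move(str(work / "copy.txt"), str(work / "moved.txt"))    # mv
print((work / "notes.txt").read_text())       # cat

# grep and find as shell commands, driven from Python
subprocess.run(["grep", "hello", str(work / "notes.txt")], check=True)
subprocess.run(["find", str(work), "-name", "*.txt"], check=True)

shutil.rmtree(work)                           # rm -r (remove folder with files)
```
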

HDFS

  • mkdir command
  • put command
  • get command
  • copyFromLocal command
  • copyToLocal command
  • rm command
  • getmerge command
  • ... etc
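
A hedged sketch that drives the same HDFS commands from Python with subprocess; it assumes the hdfs client is on the PATH (as in the Cloudera VM), and the local and HDFS paths are placeholders:

```python
import subprocess

def hdfs(*args):
    """Run an 'hdfs dfs' subcommand and fail loudly on errors."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", "/user/cloudera/demo")                     # mkdir
hdfs("-put", "local.txt", "/user/cloudera/demo/")               # put
hdfs("-copyFromLocal", "local.txt", "/user/cloudera/demo/copy.txt")
hdfs("-get", "/user/cloudera/demo/local.txt", "download.txt")   # get
hdfs("-getmerge", "/user/cloudera/demo", "merged.txt")          # merge part files
hdfs("-rm", "-r", "/user/cloudera/demo")                        # rm
```
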

Hive

  • Hive Metastore
  • Hive Managed Tables
  • Hive External Tables
  • Hive Operations
  • Hadoop File Formats and its Types
  • Different ways of connecting to Hive
  • Partitioning
  • Bucketing
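
As one of several ways of connecting to Hive, the sketch below pushes DDL for an external, partitioned, bucketed table through the hive CLI from Python; the table name, columns, and HDFS location are placeholders, and the hive CLI is assumed to be on the PATH:

```python
import subprocess

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS sales (
  order_id INT,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (order_id) INTO 4 BUCKETS
STORED AS ORC
LOCATION '/user/cloudera/warehouse/sales';
"""

# 'hive -e' runs a query string non-interactively; beeline is another option
subprocess.run(["hive", "-e", ddl], check=True)
```
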

Sqoop

  • Sqoop Introduction
  • Sqoop list-tables
  • Sqoop Eval
  • Sqoop Import
  • Sqoop Export
  • Import All Tables
  • Import a table from MySQL to Hive
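
A sketch of a Sqoop import driven from Python; the JDBC URL, credentials, and table names are placeholders, and Sqoop is assumed to be installed on the VM:

```python
import subprocess

cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://localhost/retail_db",
    "--username", "root",
    "--password", "cloudera",          # prefer --password-file in real setups
    "--table", "customers",
    "--target-dir", "/user/cloudera/customers",
    "--num-mappers", "1",
]
subprocess.run(cmd, check=True)

# Importing straight into Hive instead of HDFS adds:
#   --hive-import --hive-table default.customers
```
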

Pyspark

  • Spark Introduction
  • Spark Architecture
  • Spark Environment Setup (optional)
  • Spark RDD with Python
  • Spark RDD with Scala
  • Spark DF
  • Spark SQL
  • Spark Structured Streaming
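
A compact PySpark sketch covering the RDD API, DataFrames, and Spark SQL; the data is made up, and a local Spark installation is assumed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("intro").getOrCreate()

# RDD API
rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * 2).collect())     # [2, 4, 6, 8]

# DataFrame API
df = spark.createDataFrame([("Ram", 90), ("Renuka", 85)], ["name", "score"])
df.filter(df.score > 85).show()

# Spark SQL
df.createOrReplaceTempView("scores")
spark.sql("SELECT name, score FROM scores ORDER BY score DESC").show()

spark.stop()
```
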
03. GCP-Data-Engineering

Fundamentals of GCP-Data-Engineering

  1. What is Data Engineering
  2. Data Engineer Roles & Responsibilities
  3. Types of Data
  4. Streaming vs Batch Data

Cloud Storage

  1. Introduction of Cloud Storage
  2. Standard Storage
  3. Nearline Storage
  4. Coldline Storage
  5. Archive Storage
  6. Create Bucket
  7. Upload content to Bucket
  8. Understanding renaming of files
  9. Download, Share and Manage Objects
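
A sketch using the google-cloud-storage Python client to create a bucket, upload an object, and download it again; the bucket and file names are placeholders, and credentials are assumed to be configured (for example via gcloud auth application-default login):

```python
from google.cloud import storage   # pip install google-cloud-storage

client = storage.Client()

# Bucket names must be globally unique; this one is a placeholder
bucket = client.create_bucket("my-unique-demo-bucket", location="US")

# Upload local content to the bucket
blob = bucket.blob("data/sample.csv")
blob.upload_from_filename("sample.csv")

# Download it back and list what the bucket holds
blob.download_to_filename("sample_copy.csv")
for b in client.list_blobs(bucket):
    print(b.name, b.size)
```
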

Cloud SQL

  1. What is Cloud SQL
  2. Create a database of your interest (MySQL, SQL Server, or PostgreSQL)
  3. Write different queries
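
A sketch that connects to a Cloud SQL for MySQL instance over its public IP using PyMySQL and runs a query; the host, credentials, and database name are placeholders (the Cloud SQL Auth Proxy is the recommended path in production):

```python
import pymysql   # pip install pymysql

conn = pymysql.connect(
    host="34.0.0.1",        # Cloud SQL instance public IP (placeholder)
    user="appuser",
    password="change-me",
    database="demo",
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT NOW(), VERSION()")
        print(cur.fetchone())
finally:
    conn.close()
```
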

BigQuery

  1. Introduction to BigQuery Studio
  2. Create Dataset
  3. Create Table
  4. Load data from CSV file to BigQuery
  5. Load data from JSON file to BigQuery
  6. Analyse data with Queries
  7. Creating and using tables
  8. Introduction to partitioned tables
  9. Introduction to BigQuery ML
  10. Predefined roles and permissions
  11. Introduction to loading data
  12. Loading CSV data from Cloud Storage
  13. Exporting table data
  14. Create machine learning models in BigQuery ML
  15. Querying external data sources
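
A sketch with the google-cloud-bigquery client that creates a dataset, loads a CSV from Cloud Storage, and analyses it with a query; the dataset, table, and gs:// URI are placeholders:

```python
from google.cloud import bigquery   # pip install google-cloud-bigquery

client = bigquery.Client()                      # uses the default project
dataset_id = f"{client.project}.demo_ds"
client.create_dataset(dataset_id, exists_ok=True)

# Load CSV data from Cloud Storage into a table (schema autodetected)
table_id = f"{dataset_id}.sales"
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/sales.csv", table_id, job_config=job_config
)
load_job.result()                               # wait for completion

# Analyse the data with a query
query = f"SELECT COUNT(*) AS row_count FROM `{table_id}`"
for row in client.query(query).result():
    print(row["row_count"])
```
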

Dataflow

  1. Create a Dataflow pipeline using Python
  2. Create a streaming pipeline using a Dataflow template
  3. Build and run a Flex Template
  4. Deploy Dataflow pipelines
  5. Develop with notebooks
  6. Troubleshooting and debugging
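
A minimal Apache Beam pipeline in Python: it runs locally on the DirectRunner by default, and on Dataflow when --runner=DataflowRunner plus project, region, and temp_location options are passed. The bucket paths are placeholders:

```python
import apache_beam as beam   # pip install "apache-beam[gcp]"
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    options = PipelineOptions()   # Dataflow flags come from the command line
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.txt")
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "Pair" >> beam.Map(lambda word: (word, 1))
            | "Count" >> beam.CombinePerKey(sum)
            | "Format" >> beam.MapTuple(lambda word, n: f"{word},{n}")
            | "Write" >> beam.io.WriteToText("gs://my-bucket/output")
        )

if __name__ == "__main__":
    run()
```
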

Dataproc

  1. Overview of Dataproc Workflow Templates
  2. Dataproc on GKE Quickstart
  3. Configure Dataproc Hub
  4. Create a Dataproc Custom Image
  5. Write a MapReduce job with the BigQuery connector
  6. Use the Cloud Storage connector with Apache Spark
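
A sketch of a PySpark job that uses the Cloud Storage connector simply by reading gs:// paths; it would typically be submitted to a Dataproc cluster with gcloud dataproc jobs submit pyspark, and the cluster, region, and bucket names below are placeholders:

```python
# wordcount_gcs.py -- submit with, for example:
#   gcloud dataproc jobs submit pyspark wordcount_gcs.py \
#       --cluster=my-cluster --region=us-central1
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount-gcs").getOrCreate()

# Dataproc clusters ship with the Cloud Storage connector, so gs:// works directly
lines = spark.read.text("gs://my-bucket/input/*.txt")
words = lines.selectExpr("explode(split(value, ' ')) AS word")
counts = words.groupBy("word").count()

counts.write.mode("overwrite").csv("gs://my-bucket/output/wordcount")
spark.stop()
```
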

Cloud Data Fusion

  1. Create a data pipeline by using Cloud Data Fusion
  2. Creating a Cloud Data Fusion instance
  3. Creating a private instance
  4. Using JDBC drivers with Cloud Data Fusion
  5. Access control
  6. Enabling and disabling Cloud Data Fusion
  7. Granting service account user permission
  8. Viewing pipeline logs in Cloud Logging
  9. Using VPC Service Controls with Cloud Data Fusion

Composer (Airflow)

  1. Run an Apache Airflow DAG in Cloud Composer 1
  2. Features
  3. Creating environments
  4. Writing DAGs (workflows)
  5. Triggering DAGs (workflows)
  6. Monitoring environments
  7. Setting Environment Variables
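
A minimal Airflow DAG sketch of the kind uploaded to a Cloud Composer environment's dags/ folder; it assumes an Airflow 2 environment, and the DAG id, schedule, and tasks are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def say_hello():
    print("Hello from Cloud Composer")

with DAG(
    dag_id="composer_demo",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    hello = PythonOperator(task_id="hello", python_callable=say_hello)
    done = BashOperator(task_id="done", bash_command="echo finished")

    hello >> done   # hello runs before done
```
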

Bigtable

  1. Create an instance and write data with the cbt CLI
  2. Schema design best practices
  3. Create and manage tables
  4. Create and manage backups
  5. Integrations with other Google Cloud services
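
A sketch writing and reading one row with the google-cloud-bigtable Python client (the cbt CLI is the other path shown above); the project, instance, and table IDs are placeholders, and a table with a cf1 column family is assumed to exist:

```python
from google.cloud import bigtable   # pip install google-cloud-bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("user_events")

# Write one cell: row key design matters, so prefix by user then timestamp
row = table.direct_row(b"user123#20240101T120000")
row.set_cell("cf1", b"event", b"login")
row.commit()

# Read the row back
got = table.read_row(b"user123#20240101T120000")
if got is not None:
    for cell in got.cells["cf1"][b"event"]:
        print(cell.value, cell.timestamp)
```
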
