This repository contains Cloud Data Engineering training materials developed by Myla Ram Reddy.
Please contact Renuka for Training and Certification @ 8374899166 (WhatsApp).
01. Python
Python Basic Level
- Install Anaconda
- Understand Markdown language
- How to write Python code in plain Notepad
- How to write Python code in Spyder
- How to write Python code in Visual Studio Code
- How to write Python code in Jupyter / JupyterLab
- Different Python Objects
- int
- float
- complex
- str
- bool
- range
- Data Structures
- list
- Dict
- Tuple
- Set
- Mutable Vs Immutable
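A tiny sketch of the mutable vs immutable distinction (values are made up for illustration): lists can be changed in place, while strings and tuples cannot.

```python
# Mutable: a list can be modified in place.
nums = [1, 2, 3]
nums[0] = 99
print(nums)             # [99, 2, 3]

# Immutable: a string cannot be modified in place.
name = "python"
try:
    name[0] = "P"       # raises TypeError
except TypeError as err:
    print("immutable:", err)
```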
- Read items of str /list/Dict/Tuple/Set/range ..etc
- index
- slice
- fancy
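The reading/indexing topics above can be tried in a few lines of Python; a minimal sketch with made-up sample values (fancy and boolean indexing use NumPy):

```python
import numpy as np

text = "data engineering"
items = [10, 20, 30, 40, 50]
prices = {"apple": 30, "banana": 10}
point = (3, 4)

print(text[0], text[-1])        # index: first and last character
print(items[1:4])               # slice: elements 1..3 -> [20, 30, 40]
print(text[::2])                # slice with a step
print(prices["apple"])          # dict lookup by key
print(point[0])                 # tuple index

arr = np.array(items)
print(arr[[0, 2, 4]])           # fancy indexing with a list of positions -> [10 30 50]
print(arr[arr > 25])            # boolean-mask indexing -> [30 40 50]
```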
- Operators
- Comparison (>, <, >=, <=, ...)
- Logical/bool (and/or/not)
- NumPy logical (logical_and/logical_or/logical_not)
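A short sketch of the three operator families listed above, with sample values chosen only for illustration:

```python
import numpy as np

a, b = 10, 3
print(a > b, a <= b, a == b)              # comparison operators -> True False False

is_weekend, is_holiday = True, False
print(is_weekend and is_holiday)          # logical and -> False
print(is_weekend or is_holiday)           # logical or  -> True
print(not is_weekend)                     # logical not -> False

scores = np.array([35, 60, 85, 90])
# NumPy logical functions work element-wise on whole arrays:
print(np.logical_and(scores > 50, scores < 90))   # [False  True  True False]
print(np.logical_or(scores < 40, scores > 80))    # [ True False  True  True]
print(np.logical_not(scores > 50))                # [ True False False False]
```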
- Control Flows
- input
- if elif elif ... else
- while loop
- break
- continue
- for loop
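A minimal sketch tying the control-flow topics together; the input() call is left commented so the snippet also runs non-interactively:

```python
# number = int(input("Enter a number: "))
number = 7

if number < 0:
    print("negative")
elif number == 0:
    print("zero")
else:
    print("positive")

# while loop with break and continue
n = 0
while True:
    n += 1
    if n % 2 == 0:
        continue          # skip even numbers
    if n > 9:
        break             # stop once we pass 9
    print("odd:", n)

# for loop over a range
for i in range(3):
    print("iteration", i)
```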
Advanced Python
- System_Defined_Functions
- create functions
- function parameters
- mandatory parameters
- optional parameters
- flexible parameters
- key-value flexible parameters
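The four parameter styles above can all be seen in one function signature; the function and argument names here are purely illustrative:

```python
def book_ticket(passenger,             # mandatory parameter
                seat_class="economy",  # optional parameter (has a default)
                *stops,                # flexible parameters  -> tuple of extra positionals
                **preferences):        # key-value flexible parameters -> dict of keyword args
    print(passenger, seat_class, stops, preferences)

book_ticket("Ravi")
book_ticket("Ravi", "business")
book_ticket("Ravi", "business", "Hyderabad", "Delhi")
book_ticket("Ravi", "business", "Hyderabad", meal="veg", window=True)
```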
- LEGB_scope_of_objects_of_functions
- Methods
- Modules
- User_defined_packages
- system_defined_packages
- Iterables & Iterators
- Lambda_Functions
- Syntax Errors and Exceptions
- List comprehensions
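A compact sketch touching iterators, lambda functions, exception handling, and list comprehensions, using made-up sample data:

```python
numbers = [4, 9, 16, 25]

it = iter(numbers)          # an iterable produces an iterator
print(next(it), next(it))   # 4 9

square = lambda x: x * x    # lambda: small anonymous function
print(square(6))            # 36

squares = [n * n for n in numbers if n % 2 == 0]   # list comprehension with a filter
print(squares)              # [16, 256]

try:
    result = 10 / 0
except ZeroDivisionError as err:    # handling an exception instead of crashing
    print("cannot divide by zero:", err)
```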
- OOPs_Introduction_Classes_Objects_Attributes_Methods
- OOPs_Inheritance_and_MRO
- OOPs_Encapsulation
- OOPs_Polymorphism
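A small, self-contained sketch of the OOP topics above (the classes and values are invented for illustration): inheritance and MRO, encapsulation via a name-mangled attribute, and polymorphism through method overriding.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self.__balance = balance          # encapsulation: name-mangled "private" attribute

    def deposit(self, amount):
        self.__balance += amount

    def get_balance(self):
        return self.__balance

    def summary(self):
        return f"{self.owner}: {self.get_balance()}"


class SavingsAccount(Account):            # inheritance
    def summary(self):                    # polymorphism: overriding the parent method
        return "Savings -> " + super().summary()


acc = SavingsAccount("Ravi", 1000)
acc.deposit(500)
print(acc.summary())                      # Savings -> Ravi: 1500
print(SavingsAccount.__mro__)             # MRO: SavingsAccount -> Account -> object
```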
02. BigData
- What is BigData
- BigData properties
- When to choose BigData
- Oracle VirtualBox installation
- Cloudera VM installation
- WinSCP installation
- PuTTY installation
- Working with folders
- create folder
- remove folder with files
- remove folder without files
- understanding VI editor
- working with Files
- create a file
- copy file
- move file
- remove file
- cat command
- understanding permissions
- grep command
- find command
- ... etc
- mkdir command
- put command
- get command
- copyFromLocal command
- copyToLocal command
- rm command
- getmerge command
- ... etc
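The HDFS shell commands above can also be driven from Python; a minimal sketch assuming the `hdfs` client is on the PATH and that the local file and HDFS paths (all placeholders) exist:

```python
import subprocess

def hdfs(*args):
    """Run an `hdfs dfs` sub-command and raise if it fails."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", "/user/cloudera/sales")                          # mkdir
hdfs("-put", "sales.csv", "/user/cloudera/sales/")                    # put (local -> HDFS)
hdfs("-copyFromLocal", "-f", "sales.csv", "/user/cloudera/sales/")    # copyFromLocal
hdfs("-get", "/user/cloudera/sales/sales.csv", "sales_copy.csv")      # get (HDFS -> local)
hdfs("-getmerge", "/user/cloudera/sales", "merged.csv")               # getmerge: concatenate files
hdfs("-rm", "-r", "/user/cloudera/sales")                             # rm: remove recursively
```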
- Hive Metastore
- Hive Managed Tables
- Hive External Tables
- Hive Operations
- Hadoop File Formats and its Types
- Different ways of connecting to Hive
- Partitioning
- Bucketing
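One of several ways of connecting to Hive (alongside the hive CLI, beeline, and Hue) is PyHive from Python; a sketch assuming HiveServer2 is reachable on localhost:10000, the `pyhive` package is installed, and the table, columns, and HDFS location are placeholders:

```python
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000, username="cloudera", database="default")
cursor = conn.cursor()

# External table over files already sitting in HDFS, partitioned by date
cursor.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
        order_id INT, amount DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/cloudera/sales'
""")

cursor.execute("SHOW PARTITIONS sales_ext")
print(cursor.fetchall())
```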
- Sqoop Introduction
- sqoop list-tables
- Sqoop Eval
- Sqoop Import
- Sqoop Export
- Import All Tables
- Import a table from MySQL to Hive
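Sqoop is a command-line tool, but an import can be launched from Python via subprocess; a sketch assuming the `sqoop` client is on the PATH and using the Cloudera VM's sample retail_db database purely as an illustration:

```python
import subprocess

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://localhost:3306/retail_db",
    "--username", "retail_dba",
    "--password", "cloudera",
    "--table", "customers",
    "--hive-import",                 # land the data directly in a Hive table
    "--hive-table", "customers",
    "-m", "1",                       # single mapper is enough for a small demo table
], check=True)
```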
- Spark Introduction
- Spark Architecture
- Spark Environment Setup (optional)
- Spark RDD with Python
- Spark RDD with Scala
- Spark DF
- Spark SQL
- Spark Structured Streaming
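A minimal PySpark sketch covering the RDD, DataFrame, and Spark SQL APIs listed above (the sample rows are made up; assumes `pyspark` is installed):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-intro").getOrCreate()

# RDD API
rdd = spark.sparkContext.parallelize([1, 2, 3, 4, 5])
print(rdd.map(lambda x: x * x).collect())          # [1, 4, 9, 16, 25]

# DataFrame API
df = spark.createDataFrame([("Ravi", 30), ("Sita", 25)], ["name", "age"])
df.filter(df.age > 26).show()

# Spark SQL
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 26").show()

spark.stop()
```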
03. GCP-Data-Engineering
- What is Data Engineering
- Data Engineer Roles & Responsibilities
- Types of Data
- Streaming Vs Batch Data
- Introduction to Cloud Storage
- Standard Storage
- Nearline Storage
- Coldline Storage
- Archive Storage
- Create Bucket
- Upload content to Bucket
- Understanding renaming of files
- Download, Share and Manage Objects
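The Cloud Storage tasks above can be scripted with the google-cloud-storage Python client; a sketch assuming credentials are configured and that the bucket and file names (placeholders) are replaced with real ones:

```python
from google.cloud import storage

client = storage.Client()

bucket = client.create_bucket("my-training-bucket-12345", location="us-central1")  # create bucket

blob = bucket.blob("raw/sales.csv")
blob.upload_from_filename("sales.csv")                 # upload content to the bucket

# "Renaming" an object is really a copy + delete; rename_blob does both
bucket.rename_blob(blob, "raw/sales_2024.csv")

bucket.blob("raw/sales_2024.csv").download_to_filename("sales_download.csv")  # download
```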
- What is Cloud SQL
- Create a database of your interest: MySQL, SQL Server, or PostgreSQL
- Write different queries
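A sketch of querying a Cloud SQL MySQL instance from Python with pymysql; it assumes the Cloud SQL Auth Proxy is listening on 127.0.0.1:3306 (or the instance has an authorized public IP) and that the credentials, database, and table names are placeholders:

```python
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="root", password="my-password", database="retail")
with conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS customers (id INT PRIMARY KEY, name VARCHAR(50))")
    cur.execute("INSERT INTO customers VALUES (1, 'Ravi') ON DUPLICATE KEY UPDATE name = 'Ravi'")
    cur.execute("SELECT * FROM customers")
    print(cur.fetchall())
conn.commit()
conn.close()
```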
- Introduction to BigQuery Studio
- Create Dataset
- Create Table
- Load data from CSV file to BigQuery
- Load data from JSON file to BigQuery
- Analyse data with Queries
- Creating and using tables
- Introduction to partitioned tables
- Introduction to BigQuery ML
- Predefined roles and permissions
- Introduction to loading data
- Loading CSV data from Cloud Storage
- Exporting table data
- Create machine learning models in BigQuery ML
- Querying external data sources
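Several of the BigQuery tasks above (create a dataset, load a CSV from Cloud Storage, run a query) can be done with the google-cloud-bigquery Python client; the project, dataset, table, and GCS URI below are placeholders:

```python
from google.cloud import bigquery

client = bigquery.Client()

dataset_id = f"{client.project}.sales_ds"
client.create_dataset(dataset_id, exists_ok=True)             # create dataset

table_id = f"{dataset_id}.orders"
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,                                          # infer the schema from the file
)
load_job = client.load_table_from_uri(
    "gs://my-training-bucket-12345/raw/orders.csv", table_id, job_config=job_config
)
load_job.result()                                             # wait for the load to finish

for row in client.query(f"SELECT COUNT(*) AS n FROM `{table_id}`").result():
    print("rows loaded:", row.n)
```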
- Create a Dataflow pipeline using Python
- Create a streaming pipeline using a Dataflow template
- Build and run a Flex Template
- Deploy Dataflow pipelines
- Develop with notebooks
- Troubleshooting and debugging
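Dataflow pipelines are written with Apache Beam; a minimal word-count sketch that runs locally on the DirectRunner by default and can be deployed by passing DataflowRunner options (the GCS paths are placeholders; assumes `apache-beam[gcp]` is installed):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Add --runner=DataflowRunner --project=... --region=... --temp_location=gs://... to deploy
options = PipelineOptions()

with beam.Pipeline(options=options) as p:
    (
        p
        | "Read"  >> beam.io.ReadFromText("gs://my-training-bucket-12345/raw/notes.txt")
        | "Words" >> beam.FlatMap(lambda line: line.split())
        | "Pair"  >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Write" >> beam.io.WriteToText("gs://my-training-bucket-12345/output/word_counts")
    )
```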
- Overview of Dataproc Workflow Templates
- Dataproc on GKE Quickstart
- Configure Dataproc Hub
- Create a Dataproc Custom Image
- Write a MapReduce job with the BigQuery connector
- Use the Cloud Storage connector with Apache Spark
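On a Dataproc cluster the Cloud Storage connector is preinstalled, so a PySpark job can read and write gs:// paths directly; a sketch with placeholder bucket, file, and column names, submitted with `gcloud dataproc jobs submit pyspark`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-connector-demo").getOrCreate()

# Read a CSV straight from Cloud Storage via the connector
df = spark.read.csv("gs://my-training-bucket-12345/raw/orders.csv",
                    header=True, inferSchema=True)

# Aggregate and write the result back to Cloud Storage as Parquet
df.groupBy("status").count().write.mode("overwrite") \
  .parquet("gs://my-training-bucket-12345/output/orders_by_status")

spark.stop()
```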
- Create a data pipeline by using Cloud Data Fusion
- Creating a Cloud Data Fusion instance
- Creating a private instance
- Using JDBC drivers with Cloud Data Fusion
- Access control
- Enabling and disabling Cloud Data Fusion
- Granting service account user permission
- Viewing pipeline logs in Cloud Logging
- Using VPC Service Controls with Cloud Data Fusion
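As a hedged illustration, a deployed Data Fusion batch pipeline can be started through the instance's CDAP REST API; the endpoint URL and pipeline name below are placeholders, and the snippet assumes `google-auth` and `requests` are installed and application-default credentials are configured:

```python
import google.auth
import google.auth.transport.requests
import requests

credentials, _ = google.auth.default()
credentials.refresh(google.auth.transport.requests.Request())

# Placeholder: take the real value from the instance's API endpoint
cdap_endpoint = "https://my-instance-dot-usc1.datafusion.googleusercontent.com/api"
url = f"{cdap_endpoint}/v3/namespaces/default/apps/gcs_to_bq/workflows/DataPipelineWorkflow/start"

resp = requests.post(url, headers={"Authorization": f"Bearer {credentials.token}"})
resp.raise_for_status()
print("pipeline start requested:", resp.status_code)
```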
- Run an Apache Airflow DAG in Cloud Composer 1
- Features
- Creating environments
- Writing DAGs (workflows)
- Triggering DAGs (workflows)
- Monitoring environments
- Setting Environment Variables
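A minimal Airflow DAG of the kind Cloud Composer schedules; it assumes Airflow 2.x, uses placeholder DAG and task names, and is deployed by copying the file into the environment's dags/ folder:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def greet():
    print("hello from Cloud Composer")

with DAG(
    dag_id="demo_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting...")
    transform = PythonOperator(task_id="transform", python_callable=greet)

    extract >> transform      # run order: extract, then transform
```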
- Create an instance and write data with the cbt CLI
- Schema design best practices
- Create and manage tables
- Create and manage backups
- Integrations with Bigtable
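As an alternative to the cbt CLI, the same Bigtable operations can be done with the google-cloud-bigtable Python client; a sketch assuming the instance already exists and using placeholder instance, table, and column-family names:

```python
from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(admin=True)
instance = client.instance("my-bigtable-instance")

table = instance.table("sensor_readings")
if not table.exists():
    # one column family, keeping only the latest cell version
    table.create(column_families={"stats": column_family.MaxVersionsGCRule(1)})

# Row key built per the schema-design best practices above (entity#id#date)
row = table.direct_row(b"sensor#001#20240101")
row.set_cell("stats", b"temperature", b"21.5")
row.commit()

print([r.row_key for r in table.read_rows(limit=5)])
```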