Cloud Data Engineering

The Cloud Data Engineering training program is designed to give students a solid understanding of the processing components of diverse cloud platforms.

  • English
  • 25000
  • 50000
  • Course Includes
  • Live Classes
  • Continuous Assessment
  • Downloadable Course Materials
  • Real-time feedback


What you will learn

  • Deploy managed Hadoop apps on Google Cloud
  • Build deep learning models on the cloud using TensorFlow
  • Make informed decisions about Containers, VMs and App Engine
  • Use big data technologies such as Bigtable, Dataflow, Apache Beam and Pub/Sub

Requirements

  • Open for all

Description

The Cloud Data Engineering training program is designed to give students a solid understanding of the processing components of diverse cloud platforms. The program comprises modules that start with the basics of data engineering and progress to the data engineering and analytics services on cloud platforms such as AWS and Azure. It equips students with the skills required to take up projects in the cloud data engineering context.

Course Content

DATABASE CONCEPTS

Basics of Databases

DML Vs DDL Operations

SQL Vs PL/SQL

RDBMS Vs NoSQL

Basic database objects

Data Normalization concepts (1NF, 2NF, 3NF and BCNF)

Basics of Data Modeling

SQL

Select Statements

Restricting and Sorting data

Single row functions

Aggregating Using Group functions

Manipulating Data

Creating and Managing Tables

Joins

Including Constraints

Using SET Operators

Datetime Functions

Subqueries

PL/SQL

Declaring Variables

Writing executable statements

Writing control structures

Composite data types

Cursor

Creating Procedures

Creating Functions

Creating Triggers

Introduction to NoSQL

What is NoSQL?

CAP Theorem

BASE Concept

What are the Types of NoSQL Databases?

Intro to MongoDB

RDBMS Vs MongoDB

Key Value Pairs

CRUD operations

Data Warehousing

Data Warehousing basics

What is a Data Warehouse?

Data warehouse Vs OLTP System

Top Down approach

Bottom up approach

Enterprise Data Warehouse Vs Data Marts

Typical Data Warehouse Architecture

Logical Vs Physical Design

Star Schema

Snowflake Schema

Facts and Dimensions

Slowly changing dimensions

ETL/ Data Integration

Data Sources and Extraction

Data Transformation

Data Loading and Refreshing

Data Load time and Throughput

Mapping and Process scheduling

Data Load Administration and Monitoring

Lookups and other important transformations

Time Series analysis & data loading process for Slowly Changing Dimensions (SCD)

ETL Tool Walkthrough (Informatica or Talend)

OLAP/ Data Visualization/ Business Intelligence

Decision support systems

Modeling the data

Business Intelligence Overview

Data Quality

How is Data Analysed?

What is OLAP?

What is Data Mining?

Visualizing Data

Tabular Data, Charts and Dashboards

ROLAP and MOLAP

Report automation and scheduling

OLAP Tool walkthrough (Tableau or Power BI)

Big Data/Hadoop
What is Big Data, emergence of Big Data, Big Data scenarios
Introduction to Hadoop architecture, NameNode, DataNodes
Introduction to HDFS, Hive and HBase
Creating internal/external tables, data types, limitations
HDFS/Hive/HBase commands and hands-on
Sqoop - Introduction, importing and exporting data with Sqoop
Introduction to Kafka
Introduction to Pig scripting

Big Data Testing
Big Data Test Planning Approach
Test Data Creation approach
Test Execution approach

Testing in Data Engineering Context

Test Plan

Test cases & scenarios

Testing cycle

UTC

Integration Testing
System Testing
UAT
Tools
HP ALM - High level - Theory
Jira - High level demo

Introduction to Agile

Agile overview

Agile types

Agile methodologies

Agile methodology in testing

Introduction to Unix

Unix Basics

UNIX commands for various operations

UNIX file I/O operations and file permissions

Introduction to Cloud and Azure Fundamentals
-Introduction to Cloud Computing and Cloud Platforms
-Cloud Concepts - Principles of Cloud Computing
-Create an Azure account
-Core Cloud Services - Introduction to Azure
-Core Cloud Services - Azure architecture and service guarantees
-Core Cloud Services - Manage Services with the Azure portal
-Security, responsibility, and trust in Azure
-Apply and monitor infrastructure standards with Azure Policy
-Control and organize Azure resources with Azure Resource Manager
-Predict costs and optimize spending for Azure

Azure Storage
-What is Azure Storage
-Storage Types
-How Azure Storage works
-Blob data storage
-Azure Storage security
-Managing and Monitoring storage

Azure Data Catalog
-What is Azure Data Catalog
-Architecture of Azure Data Catalog
-Where do I use Azure Data Catalog?
-Introduction to Azure portal
-Creating a data catalog
-Registering Data sources
-Supported Data sources in Azure Data Catalog
-Publishing the data sources

Azure Data Factory
-What is Azure Data Factory
-Why Azure Data Factory (ADF)
-Key Concepts of ADF
-Linked Services, Activities, Pipelines
-Lab - Building First Pipeline

Azure Data Lake
-What is Azure Data Lake
-Introduction to Azure Data Lake
-Basics of U-SQL

Azure Synapse Analytics (formerly SQL DW) & PolyBase
-Overview
-Data Access and Querying
-Data Loading and Export
-Processing large-volume data loads for Big Data analysis

Tabular Model
-Learn and implement tabular models, which are Analysis Services databases that run in memory or in DirectQuery mode
-Accessing data directly from backend relational data sources

Power BI
-Power BI Overview
-Power BI Desktop
-Power BI Queries - Connect to data, common query tasks
-Power BI Queries - Parameters and custom tasks
-Power BI - Modelling
-DAX
-Power BI Visuals
-Power BI Service

Event Hub & Stream Analytics
-Event Hubs: What is Azure Event Hubs
-Event Hubs programming guide
-Application of Event Hubs
-Managing Event Hubs
-Stream Analytics: What is stream data?
-Stream Analytics Pattern
-Introducing Stream Analytics

Logic Apps

-Streaming data from Social media

Azure Databricks
-Overview of Azure Databricks
-Introducing Stream Analytics
-Creating an ETL pipeline using Azure Databricks
-Publish Azure Databricks pipeline

Azure Cosmos DB (DocumentDB)

-Provides insight into DocumentDB, Microsoft's NoSQL offering on the cloud.

HDInsight
-Big Data Analytics with HDInsight
-Create Hadoop, HBase, Storm or Spark clusters on Linux in HDInsight using the portal

Introduction to AWS

Introduction to Cloud computing & AWS

EC2
Launch BI Server with termination protection enabled
Monitor EC2 Instance
Modify Security Group of BI Server to allow access
Resize your BI Server
Test Termination Protection
Terminate EC2 Instance

AWS S3
Create S3 bucket
Add an object to S3 bucket
Create bucket policy for BI server
Configure S3 bucket versioning
Load data into S3, processing pipelines

AWS IAM Lab
Creating users and groups
Configuring IAM policy
Configuring IAM roles for BI server

AWS Lambda
Create Lambda function
Configure S3 bucket as a Lambda event source
Trigger Lambda function by uploading data to S3

AWS Redshift
Intro to Column-oriented database
Massively Parallel Processing (MPP) Concepts
Data types used in Amazon Redshift
Getting started with AWS Redshift
Creating an IAM role
Launching a Sample Amazon Redshift Cluster
Authorizing access to the cluster
Connecting to the cluster and running queries
Loading sample data from Amazon S3
Cleaning up

Basics of EMR

Amazon Elastic MapReduce (EMR) is an Amazon Web Services (AWS) tool for big data processing and analysis.

Basics of Glue

Fully managed extract, transform, and load (ETL) service

Intro to Amazon Kinesis Data Streams

Massively scalable, highly durable data ingestion and processing service optimized for streaming data.

Basics of DynamoDB

Fully managed proprietary NoSQL database service that supports key-value and document data structures

Basics of Athena

Serverless Interactive Query Service

Route 53 (DNS)
Route 53 - Register A Domain Name Lab
Route 53 Routing Policies Available On AWS
Simple Routing Policy Lab
Weighted Routing Policy Lab
Latency Routing Policy
Failover Routing Policy
Geolocation Routing Policy
Geoproximity Routing Policy (Traffic Flow Only)

VPCs
Introduction To VPCs
Benefits of VPCs
Build A Custom VPC 
Network Address Translation (NAT)
Access Control Lists (ACL)
Custom VPCs and ELBs
VPC Flow Logs
Bastions

High Availability
Load Balancers Theory
Direct Connect
Load Balancers And Health Checks Lab
Advanced Load Balancer Theory [SAA-C02]
Autoscaling Theory [SAA-C02]
Autoscaling Groups Lab
HA Architecture
HA WordPress Site
Setting Up EC2
Adding Resilience And Autoscaling

Project involving the creation of database objects, dimensional modeling, ETL transformations, mappings, and OLAP reports using cloud services

About the Instructor
