Hadoop Developer - Spark - 50 Hours

Course Introduction

Why Apache Hadoop?

Problems in Data-Driven Businesses
How Hadoop Solves Them and Why Big Data Solutions Are Needed

Hadoop Fundamentals

What Hadoop Comprises: Subprojects and Ecosystem

Core Hadoop Components
Apache Subprojects
Hadoop Ecosystem

HDFS

HDFS Features
HDFS Architecture - Non-HA
HDFS Architecture - HA
Writing and Reading Files in HDFS
NameNode Memory and Load Handling
Basic HDFS Security
HDFS CLI
HDFS UIs
Other storage Technologies
Hands-on: writing and reading files with HDFS, permissions, viewing blocks and other basic HDFS operations

Introduction to Python

Introduction to Functional Programming
Features of Functional Paradigm
Variables, Control structures, Functions and Objects
Mutable and Immutable Data
First Class Functions
Strings, Tuples and Named Tuples
Lists, Dicts and sets
Lambda Functions
Hands-on for all of the above

MapReduce and YARN - Basics

Why a Computational Framework
YARN Architecture
MapReduce Architecture and Hands-on
Spark Architecture
How YARN executes MR and Spark jobs
How to View YARN Applications in Web UIs and the Shell
YARN Application Logs
Hands-on for all of the above

Importing RDBMS Data to Hadoop

Introduction to Apache Sqoop
Sqoop2 Architecture
Using Sqoop to import RDBMS Table to HDFS
Change the Delimiter and File Format of imported Tables
Control which columns are imported
Sqoop Performance improvement
Sqoop Hands-on

Capturing Data with Flume

Introduction to Apache Flume
Flume Architecture
Flume Source, Sink, Channel
Flume Configurations
Ingesting weblogs using Flume
Hands-on - Flume Data Ingestion scenarios

Data Model and DWH using Hive and Impala/Tez

Introduction to Hive
Introduction to Impala/Tez
How to query Hive and Impala/Tez
How Hive and Impala/Tez differ from an RDBMS
Usage of Hive Metastore by Hive and Impala
HiveQL and Impala SQL for query operations
Managed and External Tables
Introduction to Hue
Create Tables using Hue
Load Data using Hive, Impala and Sqoop import to Hive tables
Hive, Impala/Tez Hands-on

Hadoop Data Formats

Introduction to Data Formats
Various Data Formats
Introduction to Avro
Parquet
Evolution of Avro Schema - Compatibility
Extracting Metadata and Data from an Avro data file
Using Avro with Hive and Sqoop
Using Parquet with Hive and Sqoop
Compression
Hands-on with Avro

Data Partitioning Concepts

Overview of Partitions
Partitions in Hive and Impala
Dealing with Hive Partition Tables
Hands-on - Hive partition tables

Spark - High-Level Entry to Development

Directed Acyclic Graph
Types of Spark CLI - spark-shell and pyspark
Functional Programming in Spark
Introduction to Spark RDD
Hands-on - Running Spark applications using the Spark CLI

Spark RDDs

How RDDs are created from files or data in memory
Handling File Formats
Additional Operations on RDD
Hands-on - Processing Data Files using Spark RDDs

Pair RDDs and Aggregations

Key Value Pair RDD
Other Pair RDD Concepts
Pair RDD to join Datasets
Hands-on - Using Pair RDDs to join Datasets in the Spark CLI

Writing and Deploying Spark Applications

How to write a Spark application - Scala and PySpark
Run Spark Applications on YARN
Access the Spark Application Web UI and control applications
Configuring application properties and Logging
Hands-on - Writing a Spark application - PySpark and Scala
Hands-on - Configuring a Spark application

Parallel Processing in Spark

RDD partitions
Partitions of File-Based RDDs
HDFS and Data Locality
Executing parallel operations
Stages and Tasks
Hands-on - Viewing stages and jobs in the Spark application UI

RDD Persistence

RDD Lineage
RDD Persistence
Distributed persistence
Hands-on - How to Persist an RDD

Spark Data Processing patterns

Iterative Algorithms in Spark
Graph Processing and Analysis
Hands-on - Implementation of an Iterative Algorithm with Spark

Spark DataFrames

Introduction to Spark DataFrames
DataFrame API
Load Data to DataFrames
Converting DataFrames to pair RDDs
Hands-on - Working with DataFrames

Spark SQLContext

Spark SQL Basics
Creating SparkSQL

Querying SparkSQL
Hands-on - Working with SparkSQL

Datasets

Typed API
Untyped API
Hands-on - Working with Datasets

Spark ML Libraries

Introduction to Machine Learning
Machine Learning with Spark
K-Means
Hands-on - Implementation of K-Means and one other ML use case

Spark Streaming

Spark Streaming overview
DStreams
Developing stream Applications
Multi Batch Operations
Time slicing and state operations
Sliding window Operations
Hands-on

Application Hands-on Related to Hadoop and Spark
Application Hands-on Related to Kafka and Spark Streaming
Conclusion

About the Trainer

5 Avg Rating

5 Reviews

6 Students

2 Courses

Navaneetha Babu Chellathurai

Official Cloudera Trainer

9 Years of Experience

Digital Big Data Transformation Expert | Technical Speaker

Contact Me for Big Data Consulting and Training Assignments - Not Full Time.

Cloudera Certified Apache Hadoop Instructor with 7 years of overall experience and 6 years of Big Data experience, with a demonstrated history of working in the financial services industry.
Expertise in Big Data transformation, including creating a Big Data lake using Hadoop for cold data storage and a Cassandra data lake for Customer360 hot storage.
Expertise in real-time data ingestion and data processing using Kafka-Streaming, Spark Streaming and Lightbend.
Evangelist in Big Data engineering across the whole stack, from installation to high-performance tuning.
Skilled in Hadoop, Hive, Spark, Cassandra, NiFi, MongoDB, Talend Big Data, R, YAML, Python, DataStage, statistical tools and IoT, with a strong background in data warehousing, Spark-based analytics model development and data analytics.
- Cloudera Certified Apache Hadoop Instructor
- Cloudera Trained and Certified Apache Hadoop Administrator
- Cloudera Trained and Certified Apache Hadoop Developer
- Cloudera Certified Apache Hadoop Data Analyst
- Mongo University Certified MongoDB DBA

Conducted 50-plus corporate Cloudera University batches (Admin, Data Analyst) and other Cloudera-based Developer batches.
Conducted 15 Hortonworks Data Platform-based Hadoop Developer and Admin batches.
Conducted 5 batches of Hortonworks Data Flow-based real-time data ingestion training using Apache Kafka and NiFi.
Delivered a Cassandra Administrator and Developer batch.
Expert in Talend Big Data and its integration with Big Data components.
Working on structuring a few university undergraduate and post-diploma Big Data Analytics courses and on book preparation, with the eventual aim of university-based Big Data tutoring and research for the student community.

Contact me for Hadoop (Cloudera and Hortonworks) training and consulting requirements. Let's connect, and happy Hadooping!

Reviews (5)

5 out of 5 (5 reviews)

Divakar P

Reviewed on 26 Jul, 2019

Big Data

"Very interactive and presentations were interesting, good slides and videos that kept us all engaged. A real eye-opener for us all. I feel better equipped to manage after completing the course. He is enthusiastic and really aware of what he is explaining."

Vignesh M

Reviewed on 26 Jul, 2019

Big Data

"Navaneetha Babu is a bigdata and Hadoop legend. He is a real-time expert trainer."

Vel

Reviewed on 26 Jul, 2019

Big Data

"Excellent teaching. The learning materials provided were simply amazing. The hands-on tasks given were typical. Worth every hour."

Amarnath S

Reviewed on 26 Jul, 2019

Big Data

"I was looking for a career change into Big Data technologies, and one of my friends referred me to Navaneeth's in-class course on Big Data. The course was practical and interactive. After successfully completing it I gained more knowledge of Big Data, and it helped me to crack a couple of interviews."
