Hadoop Developer - Spark - 50 Hours

- Course Introduction

- Why Apache Hadoop?
- Problems in Data-Driven Businesses
- How Hadoop Solves Them and Why Big Data Solutions Are Needed
- Hadoop Fundamentals
- What Hadoop Comprises: Subprojects and Ecosystem
- Core Hadoop Components
- Apache Subprojects
- Hadoop Ecosystem

- HDFS
- HDFS Features
- HDFS Architecture - Non-HA
- HDFS Architecture - HA
- Writing and Reading Files in HDFS
- NameNode Memory and Load Handling
- Basic HDFS Security
- HDFS CLI
- HDFS UIs
- Other Storage Technologies
- Hands-on: writing and reading files in HDFS, permissions, viewing blocks, and other basic HDFS operations

- Introduction to Python
- Introduction to Functional Programming
- Features of Functional Paradigm
- Variables, Control structures, Functions and Objects
- Mutable and Immutable Data
- First Class Functions
- Strings, Tuples and Named Tuples
- Lists, Dicts and sets
- Lambda Functions
- Hands-on for all of the above

- MapReduce and YARN - Basics
- Why a Computational Framework?
- YARN Architecture
- MapReduce Architecture and Hands-on
- Spark Architecture
- How YARN executes MR and Spark jobs
- How to View YARN Applications in the Web UIs and Shell
- YARN Application Logs
- Hands-on for all of the above

- Importing RDBMS Data to Hadoop
- Introduction to Apache Sqoop
- Sqoop2 Architecture
- Using Sqoop to import RDBMS Table to HDFS
- Change the Delimiter and File Format of imported Tables
- Controlling Which Columns Are Imported
- Sqoop Performance Improvement
- Sqoop Hands-on

- Capturing Data with Flume
- Introduction to Apache Flume
- Flume Architecture
- Flume Source, Sink, Channel
- Flume Configurations
- Ingesting weblogs using Flume
- Hands-on - Flume Data Ingestion Scenarios

- Data Model and DWH using Hive and Impala/Tez
- Introduction to Hive
- Introduction to Impala/Tez
- How to query Hive and Impala/Tez
- How Hive and Impala/Tez Differ from an RDBMS
- Use of the Hive Metastore by Hive and Impala
- HiveQL and Impala SQL for query operations
- Managed and External Tables
- Introduction to Hue
- Create Tables using Hue
- Load Data using Hive, Impala, and Sqoop Import into Hive Tables
- Hive, Impala/Tez Hands-on

- Hadoop Data Formats
- Introduction to Data Formats
- Various Data Formats
- Introduction to Avro
- Parquet
- Avro Schema Evolution - Compatibility
- Extracting Metadata and Data from an Avro Data File
- Using Avro with Hive and Sqoop
- Using Parquet with Hive and Sqoop
- Compression
- Hands-on with Avro

- Data Partitioning Concepts
- Overview of Partitions
- Partitions in Hive and Impala
- Working with Hive Partitioned Tables
- Hands-on - Hive Partitioned Tables

- Spark - High-Level Entry to Development
- Directed Acyclic Graph
- Types of Spark CLI - spark-shell and pyspark
- Functional Programming in Spark
- Introduction to Spark RDD
- Hands-on - Running Spark Applications using the Spark CLI

- Spark RDDs
- How RDDs are created from files or data in memory
- Handling File Formats
- Additional Operations on RDD
- Hands-on - Processing Data Files using Spark RDDs

- Pair RDDs and Aggregations
- Key Value Pair RDD
- Other Pair RDD Concepts
- Using Pair RDDs to Join Datasets
- Hands-on - Using Pair RDDs to Join Datasets in the Spark CLI

- Writing and Deploying Spark Applications
- How to Write a Spark Application - Scala and PySpark
- Run Spark Applications on YARN
- Access the Spark Application Web UI and Control Applications
- Configuring Application Properties and Logging
- Hands-on - Writing a Spark Application - PySpark and Scala
- Hands-on - Configuring a Spark Application

- Parallel Processing in Spark
- RDD partitions
- Partitions of File-Based RDDs
- HDFS and Data Locality
- Executing parallel operations
- Stages and Tasks
- Hands-on - Viewing Stages and Jobs in the Spark Application UI

- RDD Persistence
- RDD Lineage
- RDD Persistence
- Distributed persistence
- Hands-on - How to Persist an RDD

- Spark Data Processing Patterns
- Iterative Algorithms in Spark
- Graph Processing and Analysis
- Hands-on - Implementation of an Iterative Algorithm with Spark

- Spark DataFrames
- Introduction to Spark DataFrames
- DataFrame API
- Load Data to DataFrames
- Converting DataFrames to Pair RDDs
- Hands-on - Working with DataFrames

- Spark SQLContext
- Spark SQL Basics
- Creating SparkSQL
- Querying SparkSQL
- Hands-on - Working with SparkSQL

- Datasets
- Typed API
- Untyped API
- Hands-on - Working with Datasets

- Spark ML Libraries
- Introduction to Machine Learning
- Machine Learning with Spark
- K-Means
- Hands-on - Implementation of K-Means and One Other ML Use Case

- Spark Streaming
- Spark Streaming overview
- DStreams
- Developing Streaming Applications
- Multi-Batch Operations
- Time slicing and state operations
- Sliding window Operations
- Hands-on
- Application Hands-on Related to Hadoop and Spark
- Application Hands-on Related to Kafka and Spark Streaming
- Conclusion