Hadoop Cluster & Processes
What is a Hadoop Cluster?
A Hadoop cluster is a collection of one or more Linux boxes (machines). In a Hadoop cluster there is a single Master machine (Linux box), and all of the remaining machines are called Slaves (Linux boxes).
Hadoop start-up modes:
- Single Mode
- Pseudo distributed Mode
- Fully distributed Mode (requires a minimum of 3 Linux boxes)
Single Mode runs Hadoop on a single Master box and is generally used for testing purposes (mainly black-box testing).
Fully distributed Mode requires a minimum of 3 machines/boxes. It is the production mode of a Hadoop cluster.
For R&D (Research & Development) purposes, however, the Apache Software Foundation introduced another useful mode: Pseudo-distributed Mode. In this mode we get all the core functionality of a fully distributed cluster, meaning one single machine acts as both Master and Slave. We will configure our own Hadoop cluster in this mode.
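As a sketch of what pseudo-distributed configuration looks like (the property names are the Hadoop 1.x defaults; the port and file locations here are common choices, not requirements), the core idea is to point the file system at localhost and drop the replication factor to 1:

```shell
# Minimal pseudo-distributed settings for Hadoop 1.x (a sketch, not a full install guide).
# fs.default.name points HDFS at this single box; dfs.replication is 1 because
# there is only one DataNode available to hold each block.
cat > core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```

These files normally live in Hadoop's conf/ directory; port 9000 is simply a conventional pick for the NameNode RPC address.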
Hadoop Processes & Layers :
Hadoop runs 5 different processes (daemons), each with a different functionality:
- NameNode.
- Secondary NameNode.
- DataNode.
- JobTracker.
- TaskTracker.
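Once a pseudo-distributed cluster is up, one quick sanity check (assuming a JDK is installed, whose jps tool lists running Java processes) is that all five daemons appear:

```shell
# On a healthy pseudo-distributed Hadoop 1.x node, each of the five daemon
# names should appear somewhere in the jps listing.
jps | grep -E 'NameNode|SecondaryNameNode|DataNode|JobTracker|TaskTracker'
```

This requires a running cluster, so it is shown here only as an illustration.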
Hadoop also divides all the storage & analytics work between 2 layers. These are:
- HDFS Layer
NameNode
Secondary NameNode
DataNode
- Application / MapReduce Layer
JobTracker
TaskTracker
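This two-layer split shows up directly in how a Hadoop 1.x cluster is started. The script names below are the stock ones shipped in Hadoop's bin/ directory, used here as a sketch:

```shell
# Hadoop 1.x start scripts map one-to-one onto the two layers:
#   start-dfs.sh    brings up the HDFS layer      (NameNode, Secondary NameNode, DataNodes)
#   start-mapred.sh brings up the MapReduce layer (JobTracker, TaskTrackers)
start-dfs.sh
start-mapred.sh
# start-all.sh is a convenience wrapper that runs both, HDFS first.
```

Starting HDFS before MapReduce matters: the JobTracker needs the NameNode up to locate data when scheduling tasks.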
Note: The NameNode is called the single point of failure in a Hadoop cluster. Why? Because the NameNode has a higher priority than even the JobTracker: if the NameNode goes down, no data in HDFS can be located, and without data, what is the value of the application?
Hadoop Process orientation in a Cluster:
Brief functionality of each Hadoop process:
- NameNode (NN): The NN holds the metadata for the whole of HDFS (Hadoop Distributed File System). That means all the HDFS metadata (including the block reports sent in by each DataNode) is stored on the NameNode (NN).
- Secondary NameNode (SNN): This process acts as a kind of housekeeper for the NameNode (NN). It periodically captures snapshots of the NameNode's metadata by folding the log of NameNode activity into the filesystem image.
- DataNode (DN): The DN is responsible for writing and storing the actual data blocks in HDFS. The DN also periodically sends a block report to the NameNode (NN).
- JobTracker (JT): Each MapReduce job is distributed by this process. The JT hands the job's tasks out in parallel to all of the existing TaskTrackers (TT), with the help of the NN.
- TaskTracker (TT): This process actually executes the tasks distributed to it by the JT.
We can’t edit/modify data already written to HDFS. The rule is “Write Once, Read Many Times”. Yes, we can append new data, but we can’t edit what is already there.
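A sketch of what this rule means at the shell (hadoop fs is the Hadoop 1.x file-system CLI; the paths below are made-up examples): you can write a file once and read it back any number of times, but to change its contents you must delete it and write it again.

```shell
# Write once...
echo "hello hdfs" > local.txt
hadoop fs -put local.txt /user/demo/data.txt
# ...read many times.
hadoop fs -cat /user/demo/data.txt
hadoop fs -cat /user/demo/data.txt
# There is no in-place edit; to "modify" a file you remove it and re-upload.
hadoop fs -rm /user/demo/data.txt
hadoop fs -put local.txt /user/demo/data.txt
# (Append support is version-dependent: the appendToFile shell command appeared
# in later Hadoop releases; in 1.x, append was API-level and off by default.)
```

These commands need a live cluster, so they are illustrative only.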