Hello everyone, in this post I am going to share the differences between Hadoop 1 and Hadoop 2. The following are the differences based on architecture and other details.
1:- Difference in Configuration Files
Hadoop 1

Task        Configuration File
Core        HADOOP_INSTALL/conf/core-site.xml
HDFS        HADOOP_INSTALL/conf/hdfs-site.xml
MapReduce   HADOOP_INSTALL/conf/mapred-site.xml
YARN        N/A

Hadoop 2

Task        Configuration File
Core        HADOOP_INSTALL/etc/hadoop/core-site.xml
HDFS        HADOOP_INSTALL/etc/hadoop/hdfs-site.xml
MapReduce   HADOOP_INSTALL/etc/hadoop/mapred-site.xml
YARN        HADOOP_INSTALL/etc/hadoop/yarn-site.xml
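As an illustration, a minimal Hadoop 2 yarn-site.xml might look like the sketch below. The property names (yarn.resourcemanager.hostname, yarn.nodemanager.aux-services) are standard YARN properties; the hostname value is a placeholder:

```xml
<?xml version="1.0"?>
<!-- HADOOP_INSTALL/etc/hadoop/yarn-site.xml: minimal sketch, host is a placeholder -->
<configuration>
  <!-- Host running the ResourceManager daemon -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master-host</value>
  </property>
  <!-- Auxiliary service needed so MapReduce shuffle works on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```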
2:- Difference in Web Interface Port Numbers

Hadoop 1

Daemon                             Port Number
HDFS NameNode                      50070
MapReduce v1 JobTracker            50030
YARN ResourceManager               N/A
YARN MapReduce JobHistory Server   N/A

Hadoop 2

Daemon                             Port Number
HDFS NameNode                      50070 (same)
MapReduce v1 JobTracker            N/A
YARN ResourceManager               8088
YARN MapReduce JobHistory Server   19888
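The port tables above can be captured in a small lookup. This is only an illustrative sketch (the daemon keys and the helper function are my own naming, not Hadoop code); the ports are the defaults from the tables:

```python
# Default web UI ports per version, as listed in the tables above.
HADOOP1_PORTS = {
    "hdfs-namenode": 50070,
    "mapred-jobtracker": 50030,
}
HADOOP2_PORTS = {
    "hdfs-namenode": 50070,          # unchanged between versions
    "yarn-resourcemanager": 8088,
    "mapred-jobhistory": 19888,
}

def web_ui_url(daemon: str, host: str = "localhost", version: int = 2) -> str:
    """Build the web UI URL for a daemon using its default port."""
    ports = HADOOP2_PORTS if version == 2 else HADOOP1_PORTS
    return f"http://{host}:{ports[daemon]}"

print(web_ui_url("yarn-resourcemanager"))  # http://localhost:8088
```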
3:- Difference in Start/Stop Scripts

Hadoop 1

Task                    Script
To start HDFS           $ HADOOP_INSTALL/bin/start-dfs.sh
To start the NameNode   $ HADOOP_INSTALL/bin/hadoop-daemon.sh start namenode
To start MapReduce      $ HADOOP_INSTALL/bin/start-mapred.sh
To start everything     $ HADOOP_INSTALL/bin/start-all.sh

Hadoop 2

Task                    Script
To start HDFS           $ HADOOP_INSTALL/sbin/start-dfs.sh
To start the NameNode   $ HADOOP_INSTALL/sbin/hadoop-daemon.sh start namenode
To start YARN (MRv2)    $ HADOOP_INSTALL/sbin/start-yarn.sh
To start everything     $ HADOOP_INSTALL/sbin/start-all.sh
4:- Hadoop Command Split Difference

In v1 a single bin/hadoop executable is used for file system operations, administration, and MapReduce operations. In v2 these are handled by separate binaries.

Hadoop 1

Task                                 Command
File system operations               $ HADOOP_INSTALL/bin/hadoop dfs -ls
NameNode operations                  $ HADOOP_INSTALL/bin/hadoop namenode -format
File system administration commands  $ HADOOP_INSTALL/bin/hadoop dfsadmin -refreshNodes
MapReduce commands                   $ HADOOP_INSTALL/bin/hadoop job ...

Hadoop 2

Task                                 Command
File system operations               $ HADOOP_INSTALL/bin/hdfs dfs -ls
NameNode operations                  $ HADOOP_INSTALL/bin/hdfs namenode -format
File system administration commands  $ HADOOP_INSTALL/bin/hdfs dfsadmin -refreshNodes
MapReduce commands                   $ HADOOP_INSTALL/bin/mapred job ...
5:- Other Important Differences

Hadoop 1

HDFS
· NameNode (master) [one per cluster]
· Secondary NameNode
· DataNode (worker) [many per cluster, one per node]

Processing: MapReduce v1
· JobTracker (master) [one per cluster]
· TaskTracker (worker) [many per cluster, one per node]

Key limitations:
· Limited to around 4,000 nodes per cluster
· Only one NameNode/namespace per cluster
· Map and reduce slots are static
· MapReduce is the only kind of job that can run

MapReduce 1
- MapReduce 1 runs on top of the JobTracker and TaskTrackers
- The JobTracker schedules tasks and matches them with TaskTrackers
- The JobTracker manages MapReduce jobs and monitors progress
- The JobTracker recovers from errors and restarts failed and slow tasks
- MapReduce 1 has an inflexible slot-based memory management model:
  • Each TaskTracker is configured at start-up to have N slots
  • A task is executed in a single slot
  • Slots are configured with a maximum memory on cluster start-up
  • This model is likely to cause over- and under-utilization issues
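The over/under-utilization point can be shown with a toy calculation. This is not Hadoop code, and the slot size and task memory numbers are invented for illustration:

```python
# Toy illustration of why fixed-size MR1 slots waste memory (numbers invented).
SLOT_MB = 2048   # every slot has this fixed size, set at cluster start-up

def slot_waste(task_mbs):
    """Memory left idle when each task must occupy one fixed-size slot."""
    waste = 0
    for need in task_mbs:
        # A small task still burns a whole slot; a task bigger than a
        # slot cannot get more than SLOT_MB, so it contributes no waste
        # here but is under-provisioned instead.
        waste += max(SLOT_MB - need, 0)
    return waste

tasks = [512, 512, 1024, 2048]   # actual memory needs of four tasks
print(slot_waste(tasks))         # 4096 MB sit idle inside the slots
```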
Hadoop 2

HDFS
· NameNode (master) [multiple per cluster]
· Checkpoint Node (formerly Secondary NameNode)
· DataNode (worker) [many per cluster, one per node]

Processing: YARN (MRv2)
· ResourceManager [one per cluster]
· NodeManager [many per cluster, one per node]
· ApplicationMaster [many per cluster]

Key improvements:
· Scales to 10,000+ nodes per cluster
· Supports multiple namespaces for managing the cluster
· No static slots
· Any application that integrates with YARN can run on Hadoop 2
YARN (MapReduce 2)
- YARN addresses the shortcomings of MapReduce 1
- The JobTracker is split into two daemons:
  • ResourceManager - administers resources on the cluster
  • ApplicationMaster - manages applications such as MapReduce
- Fine-grained memory management model:
  • The ApplicationMaster requests resources by asking for "containers" with a certain memory limit (e.g. 2 GB)
  • YARN administers these containers and enforces memory usage
  • Each application/job has control of how much memory to request
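For example, container memory limits are driven by yarn-site.xml settings like the sketch below. The property names (yarn.nodemanager.resource.memory-mb, yarn.scheduler.minimum/maximum-allocation-mb) are standard YARN properties; the values are illustrative only:

```xml
<!-- Sketch of YARN memory settings in yarn-site.xml; values are illustrative -->
<configuration>
  <!-- Total memory a NodeManager may hand out as containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <!-- Smallest and largest container an ApplicationMaster may request -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
</configuration>
```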