Wednesday, 4 January 2017

Differences between Hadoop 1 and Hadoop 2

Hello guys, in this post I am going to share the differences between Hadoop 1 and Hadoop 2. The sections below cover the differences in architecture, configuration, and commands in detail.

Architecture-based differences

1:- Differences in the configuration files
Hadoop 1

Task         Configuration file
Core         HADOOP_INSTALL/conf/core-site.xml
HDFS         HADOOP_INSTALL/conf/hdfs-site.xml
MapReduce    HADOOP_INSTALL/conf/mapred-site.xml
YARN         NA

Hadoop 2

Task         Configuration file
Core         HADOOP_INSTALL/etc/hadoop/core-site.xml
HDFS         HADOOP_INSTALL/etc/hadoop/hdfs-site.xml
MapReduce    HADOOP_INSTALL/etc/hadoop/mapred-site.xml
YARN         HADOOP_INSTALL/etc/hadoop/yarn-site.xml
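To make the directory change concrete, here is a small sketch that builds the configuration file path for either version. The function name config_path and the install path /opt/hadoop are only assumptions for illustration; the directory and file names come from the tables above.

```python
from pathlib import PurePosixPath

# Configuration directory relative to the install root, per major version
# (taken from the tables above).
CONF_DIR = {1: "conf", 2: "etc/hadoop"}

# Component -> configuration file name.
CONF_FILE = {
    "core": "core-site.xml",
    "hdfs": "hdfs-site.xml",
    "mapreduce": "mapred-site.xml",
    "yarn": "yarn-site.xml",
}


def config_path(install_dir, version, component):
    """Return the config file path, or None when the file does not apply."""
    # Hadoop 1 has no YARN, so there is no yarn-site.xml to look for.
    if component == "yarn" and version == 1:
        return None
    return str(PurePosixPath(install_dir) / CONF_DIR[version] / CONF_FILE[component])


# /opt/hadoop is a hypothetical install location.
print(config_path("/opt/hadoop", 1, "hdfs"))
print(config_path("/opt/hadoop", 2, "yarn"))
```

The only real change for the existing files is the directory (conf versus etc/hadoop); the file names stay the same, and yarn-site.xml is new in Hadoop 2.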

2:- Differences in web interface port numbers

Hadoop 1

Daemon                              Port number
HDFS NameNode (same in both)        50070
MapReduce v1 JobTracker             50030
YARN ResourceManager                NA
YARN MapReduce JobHistory Server    NA

Hadoop 2

Daemon                              Port number
HDFS NameNode (same in both)        50070
MapReduce v1 JobTracker             NA
YARN ResourceManager                8088
YARN MapReduce JobHistory Server    19888
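As a quick illustration, the port tables above can be turned into a lookup that builds the web interface URL for a daemon. This is only a sketch: the function name web_ui_url, the short daemon keys, and the host name master are made up for the example, and the ports are the ones listed above, which a cluster may override in its configuration.

```python
# Web interface ports per major version, as listed in the tables above.
WEB_UI_PORTS = {
    1: {"namenode": 50070, "jobtracker": 50030},
    2: {"namenode": 50070, "resourcemanager": 8088, "jobhistory": 19888},
}


def web_ui_url(host, version, daemon):
    """Return the web UI URL, or None if the daemon does not exist in that version."""
    port = WEB_UI_PORTS[version].get(daemon)
    if port is None:
        return None
    return f"http://{host}:{port}/"


# "master" is a hypothetical host name.
print(web_ui_url("master", 1, "jobtracker"))
print(web_ui_url("master", 2, "resourcemanager"))
print(web_ui_url("master", 2, "jobtracker"))
```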

4:- Differences in start/stop scripts

Hadoop 1

Task                     Script
To start HDFS            $ HADOOP_INSTALL/bin/start-dfs.sh
                         $ HADOOP_INSTALL/bin/hadoop-daemon.sh start namenode
To start MapReduce       $ HADOOP_INSTALL/bin/start-mapred.sh
To start everything      $ HADOOP_INSTALL/bin/start-all.sh

Hadoop 2

Task                     Script
To start HDFS            $ HADOOP_INSTALL/sbin/start-dfs.sh
                         $ HADOOP_INSTALL/sbin/hadoop-daemon.sh start namenode
To start MapReduce       $ HADOOP_INSTALL/sbin/start-yarn.sh
To start everything      $ HADOOP_INSTALL/sbin/start-all.sh
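The two changes in the script tables above are the directory (bin versus sbin) and the script that starts the processing layer (start-mapred.sh versus start-yarn.sh). A rough sketch of that mapping, assuming a hypothetical helper named start_sequence and a made-up install path /opt/hadoop:

```python
# Directory holding the start scripts, per major version (from the tables above).
SCRIPT_DIR = {1: "bin", 2: "sbin"}

# Script that starts the processing layer: MapReduce v1 daemons or YARN.
PROCESSING_SCRIPT = {1: "start-mapred.sh", 2: "start-yarn.sh"}


def start_sequence(install_dir, version):
    """Return the scripts to run, in order: HDFS first, then the processing layer."""
    base = f"{install_dir.rstrip('/')}/{SCRIPT_DIR[version]}"
    return [f"{base}/start-dfs.sh", f"{base}/{PROCESSING_SCRIPT[version]}"]


# /opt/hadoop is a hypothetical install location.
for script in start_sequence("/opt/hadoop", 2):
    print(script)
```

The helper only builds the paths; it does not run anything, so it is safe to try on a machine without Hadoop installed.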

5:- Differences in the Hadoop command split

In v1, a single bin/hadoop executable is used for file system operations, administration, and MapReduce operations. In v2, these are handled by separate binaries.

Hadoop 1

Task                                  Script
File system operation                 $ HADOOP_INSTALL/bin/hadoop dfs -ls
NameNode operations                   $ HADOOP_INSTALL/bin/hadoop namenode -format
File system administration commands   $ HADOOP_INSTALL/bin/hadoop dfsadmin -refreshNodes
MapReduce commands                    $ HADOOP_INSTALL/bin/hadoop job ....

Hadoop 2

Task                                  Script
File system operation                 $ HADOOP_INSTALL/bin/hdfs dfs -ls
NameNode operations                   $ HADOOP_INSTALL/bin/hdfs namenode -format
File system administration commands   $ HADOOP_INSTALL/bin/hdfs dfsadmin -refreshNodes
MapReduce commands                    $ HADOOP_INSTALL/bin/mapred job ...
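If you have old scripts written for Hadoop 1, the command split above amounts to swapping the binary name while keeping the sub-command and its arguments. Here is a minimal sketch of that rewrite; the function name to_hadoop2 is hypothetical, and it only knows the four sub-commands from the tables above, leaving anything else untouched.

```python
import shlex

# Hadoop 1 sub-command -> Hadoop 2 binary that handles it (from the tables above).
V2_BINARY = {
    "dfs": "hdfs",
    "namenode": "hdfs",
    "dfsadmin": "hdfs",
    "job": "mapred",
}


def to_hadoop2(command):
    """Rewrite a Hadoop 1 style 'hadoop <subcommand> ...' command for Hadoop 2."""
    parts = shlex.split(command)
    # Expect "<path>/hadoop <subcommand> ..."; leave anything else unchanged.
    if len(parts) < 2 or not parts[0].endswith("hadoop"):
        return command
    binary = V2_BINARY.get(parts[1])
    if binary is None:
        return command
    # Keep the directory prefix and replace only the executable name.
    prefix = parts[0][: -len("hadoop")]
    return shlex.join([prefix + binary] + parts[1:])


print(to_hadoop2("bin/hadoop dfs -ls"))
print(to_hadoop2("bin/hadoop namenode -format"))
print(to_hadoop2("bin/hadoop job -list"))
```

Note that shlex.join needs Python 3.8 or newer.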


6:- Other important differences

Hadoop 1

HDFS
  • NameNode (master)
  • Secondary NameNode
  • DataNode (worker) [many per cluster, one per node]
Processing
  MapReduce v1
  o JobTracker (master) [one per cluster]
  o TaskTracker (worker) [many per cluster, one per node]

Limited to up to 4,000 nodes per cluster
Only one NameNode/namespace per cluster
Map and reduce slots are static
The only kind of job that can run is MapReduce
MapReduce1
  • MapReduce1 runs on top of the JobTracker and TaskTrackers
      • JobTracker schedules tasks and matches them with TaskTrackers
      • JobTracker manages MapReduce jobs and monitors progress
  • JobTracker recovers from errors and restarts failed and slow tasks
  • MapReduce1 has an inflexible slot-based memory management model
      • Each TaskTracker is configured at start-up to have N slots
      • A task is executed in a single slot
      • Slots are configured with maximum memory on cluster start-up
  • The model is likely to cause over- and under-utilization issues
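To see why static slots cause under-utilization, here is a toy model of a single TaskTracker. It assumes the node is split into a fixed number of map slots and reduce slots at start-up; the function name run_with_slots and the numbers are made up for illustration and are not how the real scheduler is implemented.

```python
def run_with_slots(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Return (running_maps, running_reduces, idle_slots) for one TaskTracker."""
    # A map task can only use a map slot, and a reduce task only a reduce slot.
    running_maps = min(map_slots, pending_maps)
    running_reduces = min(reduce_slots, pending_reduces)
    idle_slots = (map_slots - running_maps) + (reduce_slots - running_reduces)
    return running_maps, running_reduces, idle_slots


# 10 map tasks are waiting, but the 2 reduce slots cannot be used for them.
print(run_with_slots(4, 2, pending_maps=10, pending_reduces=0))  # (4, 0, 2)
```

In this example six map tasks stay queued while two slots sit idle, because the split between map and reduce slots was fixed when the TaskTracker started.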


Hadoop 2

HDFS
  • NameNode (master) [multiple per cluster]
  • Checkpoint Node (formerly Secondary NameNode)
  • DataNode (worker) [many per cluster, one per node]
Processing
  YARN (MRv2)
  • ResourceManager [one per cluster]
  • NodeManager [many per cluster, one per node]
  • ApplicationMaster [many per cluster]

Potentially 10,000+ nodes per cluster
Supports multiple namespaces for managing the cluster
No slots
Any application can integrate with Hadoop 2
YARN (MRv2)
  • YARN addresses the shortcomings of MapReduce1
  • JobTracker is split into 2 daemons
      • ResourceManager - administers resources on the cluster
      • ApplicationMaster - manages applications such as MapReduce
  • Fine-grained memory management model
      • ApplicationMaster requests resources by asking for “containers” with a certain memory limit (e.g. 2 GB)
      • YARN administers these containers and enforces memory usage
      • Each application/job has control of how much memory to request

