Tuesday, April 29, 2014

Starting HDP Services

Start all the Hadoop services in the following order:
  • HDFS
  • MapReduce
  • ZooKeeper
  • HBase
  • Hive Metastore
  • HiveServer2
  • WebHCat
  • Oozie
  • Ganglia
  • Nagios
Instructions
  1. Start HDFS.
    1. Execute this command on the NameNode host machine:
       su -l hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start namenode"
    2. Execute this command on the Secondary NameNode host machine:
       su -l hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start secondarynamenode"
    3. Execute this command on all DataNodes:
       su -l hdfs -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start datanode"
  2. Start MapReduce.
    1. Execute this command on the JobTracker host machine:
       su -l mapred -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start jobtracker; sleep 25"
    2. Execute this command on the JobTracker host machine:
       su -l mapred -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start historyserver"
    3. Execute this command on all TaskTrackers:
       su -l mapred -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start tasktracker"
  3. Start ZooKeeper. On the ZooKeeper host machine, execute:
       su - zookeeper -c "export ZOOCFGDIR=/etc/zookeeper/conf ; export ZOOCFG=zoo.cfg ; source /etc/zookeeper/conf/zookeeper-env.sh ; /usr/lib/zookeeper/bin/zkServer.sh start"
  4. Start HBase.
    1. Execute this command on the HBase Master host machine:
       su -l hbase -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf start master"
    2. Execute this command on all RegionServers:
       su -l hbase -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf start regionserver"
  5. Start the Hive Metastore. On the Hive Metastore host machine, execute:
       su -l hive -c "nohup hive --service metastore > $HIVE_LOG_DIR/hive.out 2> $HIVE_LOG_DIR/hive.log &"
     where $HIVE_LOG_DIR is the directory where Hive server logs are stored (for example: /var/log/hive).
  6. Start HiveServer2. On the HiveServer2 host machine, execute:
       sudo su hive -c "nohup /usr/lib/hive/bin/hiveserver2 -hiveconf hive.metastore.uris=\" \" > $HIVE_LOG_DIR/hiveServer2.out 2> $HIVE_LOG_DIR/hiveServer2.log &"
     where $HIVE_LOG_DIR is the directory where Hive server logs are stored (for example: /var/log/hive).
  7. Start WebHCat. On the WebHCat host machine, execute:
       su -l hcat -c "/usr/lib/hcatalog/sbin/webhcat_server.sh start"
  8. Start Oozie. On the Oozie server host machine, execute:
       sudo su -l oozie -c "cd $OOZIE_LOG_DIR/log; /usr/lib/oozie/bin/oozie-start.sh"
     where $OOZIE_LOG_DIR is the directory where Oozie log files are stored (for example: /var/log/oozie).
  9. Start Ganglia.
    1. Execute this command on the Ganglia server host machine:
       /etc/init.d/hdp-gmetad start
    2. Execute this command on all the nodes in your Hadoop cluster:
       /etc/init.d/hdp-gmond start
  10. Start Nagios. On the Nagios host machine, execute:
       service nagios start
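For review purposes, the HDFS and MapReduce steps above can be sketched as a small helper. This is a sketch, not part of the HDP distribution: it only builds and prints the hadoop-daemon start commands in dependency order, so you can inspect them (or feed them to ssh) before running each one on the appropriate host.

```shell
#!/bin/sh
# Sketch only: print the HDFS and MapReduce start commands from the steps
# above, in order. Nothing is executed here; each printed command must still
# be run on the matching host (NameNode, DataNodes, JobTracker, TaskTrackers).

hadoop_daemon_cmd() {
  # $1 = service user (hdfs or mapred), $2 = daemon to start
  echo "su -l $1 -c \"/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start $2\""
}

hadoop_daemon_cmd hdfs namenode            # on the NameNode host
hadoop_daemon_cmd hdfs secondarynamenode   # on the Secondary NameNode host
hadoop_daemon_cmd hdfs datanode            # on every DataNode
hadoop_daemon_cmd mapred jobtracker        # on the JobTracker host
hadoop_daemon_cmd mapred historyserver     # on the JobTracker host
hadoop_daemon_cmd mapred tasktracker       # on every TaskTracker
```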

Hadoop Ecosystem Default Ports

1. HDFS Ports

The following table lists the default ports used by the various HDFS services.
Table 2.1. HDFS Ports
NameNode WebUI (Master Nodes: the NameNode and any back-up NameNodes)
  • 50070 (http): Web UI to view the current status of HDFS and explore the file system. End-user access: Yes (typically admins, dev/support teams). Configuration: dfs.http.address
  • 50470 (https): Secure http service. Configuration: dfs.https.address

NameNode metadata service (Master Nodes: the NameNode and any back-up NameNodes)
  • 8020/9000 (IPC): File system metadata operations. End-user access: Yes (all clients that interact directly with HDFS). Configuration: embedded in the URI specified by fs.default.name

DataNode (All Slave Nodes)
  • 50075 (http): DataNode web UI to access status, logs, etc. End-user access: Yes (typically admins, dev/support teams). Configuration: dfs.datanode.http.address
  • 50475 (https): Secure http service. Configuration: dfs.datanode.https.address
  • 50010: Data transfer. Configuration: dfs.datanode.address
  • 50020 (IPC): Metadata operations. End-user access: No. Configuration: dfs.datanode.ipc.address

Secondary NameNode (the Secondary NameNode and any backup Secondary NameNodes)
  • 50090 (http): Checkpoint for NameNode metadata. End-user access: No. Configuration: dfs.secondary.http.address
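A quick way to confirm the HDFS daemons came up is to probe the ports above. The sketch below uses bash's /dev/tcp device; 127.0.0.1 is only a placeholder for your NameNode host.

```shell
# Sketch: probe the default NameNode ports from Table 2.1.
# Replace 127.0.0.1 with the NameNode hostname; requires bash (/dev/tcp).
port_open() {
  # return 0 if a TCP connection to host $1, port $2 succeeds
  ( exec 3<>"/dev/tcp/$1/$2" ) 2>/dev/null
}

host=127.0.0.1   # placeholder NameNode host
for port in 50070 50470 8020; do
  if port_open "$host" "$port"; then
    echo "$host:$port reachable"
  else
    echo "$host:$port not reachable"
  fi
done
```

The same port_open helper works for the DataNode and Secondary NameNode ports listed above.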

2. MapReduce Ports

The following table lists the default ports used by the various MapReduce services.
Table 2.2. MapReduce Ports
JobTracker WebUI (Master Nodes: the JobTracker node and any back-up JobTracker node)
  • 50030 (http): Web UI for the JobTracker. End-user access: Yes. Configuration: mapred.job.tracker.http.address

JobTracker (Master Nodes: the JobTracker node)
  • 8021 (IPC): For job submissions. End-user access: Yes (all clients that submit MapReduce jobs, including Hive, Hive server, and Pig). Configuration: embedded in the URI specified by mapred.job.tracker

TaskTracker Web UI and Shuffle (All Slave Nodes)
  • 50060 (http): TaskTracker web UI to access status, logs, etc. End-user access: Yes (typically admins, dev/support teams). Configuration: mapred.task.tracker.http.address

History Server WebUI
  • 51111 (http): Web UI for job history. End-user access: Yes. Configuration: mapreduce.history.server.http.address
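When scripting health checks against these services, a small lookup for the defaults in Table 2.2 saves hard-coding ports throughout. The service labels below are informal names invented for this sketch, not HDP configuration keys.

```shell
# Sketch: map an informal MapReduce service label to its default port
# (per Table 2.2). The labels are this sketch's own naming.
mapred_default_port() {
  case "$1" in
    jobtracker-webui)    echo 50030 ;;
    jobtracker-ipc)      echo 8021  ;;
    tasktracker-webui)   echo 50060 ;;
    historyserver-webui) echo 51111 ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}
```

For example, `curl -s "http://jobtracker-host:$(mapred_default_port jobtracker-webui)/"` fetches the JobTracker web UI front page.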

3. Hive Ports

The following table lists the default ports used by the various Hive services.
Note: Neither of these services is used in a standard HDP installation.
Table 2.3. Hive Ports
HiveServer2 (Hive Server machine, usually a utility machine)
  • 10000 (thrift): Service for programmatically (Thrift/JDBC) connecting to Hive. End-user access: Yes (clients that need to connect to Hive programmatically, or through SQL UI tools that use JDBC). Configuration: ENV variable HIVE_PORT

Hive Metastore
  • 9083 (thrift): Hive metastore service. End-user access: Yes (clients that run Hive, Pig, and potentially M/R jobs that use HCatalog). Configuration: hive.metastore.uris
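Clients reach these two services with well-known URI shapes: a hive2 JDBC URL for HiveServer2 and a thrift URI for the metastore. The helpers below just compose those strings from a hostname; hive.example.com is a placeholder.

```shell
# Sketch: compose the client connection strings implied by Table 2.3.
# hive.example.com is a placeholder hostname; "default" is Hive's default DB.
hive_jdbc_url() { echo "jdbc:hive2://$1:${2:-10000}/default"; }
hive_metastore_uri() { echo "thrift://$1:${2:-9083}"; }

hive_jdbc_url hive.example.com        # jdbc:hive2://hive.example.com:10000/default
hive_metastore_uri hive.example.com   # thrift://hive.example.com:9083
```

The metastore URI is exactly the form that hive.metastore.uris expects.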

4. HBase Ports

The following table lists the default ports used by the various HBase services.
Table 2.4. HBase Ports
HMaster (Master Nodes: the HBase Master node and any back-up HBase Master node)
  • 60000: End-user access: Yes. Configuration: hbase.master.port

HMaster Info Web UI (Master Nodes: the HBase Master node and back-up HBase Master node, if any)
  • 60010 (http): The port for the HBase Master web UI. Set to -1 if you do not want the info server to run. End-user access: Yes. Configuration: hbase.master.info.port

Region Server (All Slave Nodes)
  • 60020: End-user access: Yes (typically admins, dev/support teams). Configuration: hbase.regionserver.port
  • 60030 (http): Region Server web UI. End-user access: Yes (typically admins, dev/support teams). Configuration: hbase.regionserver.info.port

ZooKeeper (All ZooKeeper Nodes)
  • 2888: Port used by ZooKeeper peers to talk to each other. End-user access: No. Configuration: hbase.zookeeper.peerport
  • 3888: Port used by ZooKeeper peers to talk to each other (leader election). Configuration: hbase.zookeeper.leaderport
  • 2181: Property from ZooKeeper's zoo.cfg; the port at which clients connect. Configuration: hbase.zookeeper.property.clientPort
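The ZooKeeper client port (2181) is easy to smoke-test: ZooKeeper replies "imok" to the four-letter command ruok. A bash sketch, with a placeholder host:

```shell
# Sketch: send ZooKeeper's "ruok" four-letter command to the client port
# (2181 by default, per Table 2.4). Requires bash (/dev/tcp).
zk_ruok() {
  ( exec 3<>"/dev/tcp/$1/${2:-2181}" &&
    printf 'ruok' >&3 &&
    cat <&3 ) 2>/dev/null
}

# usage on a cluster node: zk_ruok zk.example.com   (prints "imok" when healthy)
```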

5. WebHCat Port

The following table lists the default ports used by the WebHCat service.
Table 2.5. WebHCat Port
WebHCat Server (any utility machine)
  • 50111 (http): Web API on top of HCatalog and other Hadoop services. End-user access: Yes. Configuration: templeton.port
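WebHCat (Templeton) exposes a REST status endpoint on templeton.port that makes a convenient liveness probe. The helper below only builds the URL; webhcat.example.com is a placeholder.

```shell
# Sketch: build the WebHCat (Templeton) status URL for the default port
# 50111 from Table 2.5.
webhcat_status_url() { echo "http://$1:${2:-50111}/templeton/v1/status"; }

# usage: curl -s "$(webhcat_status_url webhcat.example.com)"
```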

6. Ganglia Ports

The following table lists the default ports used by the various Ganglia services.

Table 2.6. Ganglia Ports
Ganglia server
  • 8660/8661/8662/8663: For gmond collectors
  • 8651: For gmetad
All Slave Nodes
  • 8660: For gmond agents

7. MySQL Ports

The following table lists the default ports used by the various MySQL services.
Table 2.7. MySQL Ports
MySQL (MySQL database server)
  • 3306