Hadoop Study Notes (5)

HDFS Details for Multimachine Clusters (Part 2)

Checking the NameNodes
    ${JAVA_HOME}/bin/jps  The first column of each output line is the PID of a Java process; the NameNode should appear in the list.
Checking the DataNodes
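    bin/slaves.sh jps | grep DataNode | sort  (jps prints the process name as DataNode, and grep is case-sensitive)
    If a slave fails during this check, you have to log in to that machine and read its log files. Won't that put too much pressure on the administrator?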
    In fact, I had half of a new cluster fail to start, and it took some time to realize that the newly installed machines had a default firewall that blocked the HDFS port.
    bin/hadoop dfsadmin -report  shows some information about the DataNodes that are currently online
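
One way to ease the administrator burden raised above is to let a script point out which slaves are missing a DataNode, so only those machines need a log check. The following is a minimal sketch, not part of Hadoop: the function name missing_datanodes is hypothetical, and it assumes that bin/slaves.sh prefixes every output line with "host: " and that the slaves file lists one host per line.

```shell
#!/bin/sh
# missing_datanodes SLAVES_FILE
# Reads the output of `bin/slaves.sh jps` on stdin and prints every host
# from SLAVES_FILE that did not report a DataNode process.
# Assumption: each stdin line looks like "host: pid ProcessName".
missing_datanodes() {
  slaves_file=$1
  jps_output=$(cat)
  while IFS= read -r host || [ -n "$host" ]; do
    # Skip blank lines and comment lines in the slaves file.
    case "$host" in
      ''|'#'*) continue ;;
    esac
    # Exact match on the "host:" prefix and the process name.
    if ! printf '%s\n' "$jps_output" |
        awk -v h="$host:" '$1 == h && $3 == "DataNode" { found = 1 } END { exit !found }'
    then
      echo "$host"
    fi
  done < "$slaves_file"
}

# Usage on a real cluster (run from the Hadoop install directory):
#   bin/slaves.sh jps | missing_datanodes conf/slaves
```

A host that is listed here may still be down for reasons outside Hadoop, such as the firewall case mentioned above, so the log files on that machine remain the place to look next.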
Tuning Factors
    The most important factors are network bandwidth and disk throughput. Memory use and CPU overhead for thread handling may also be issues.
    A large input-split size reduces the ratio of task setup time to task run time.
    Set the maximum number of requests in progress. The more requests in progress, the more contention there is for storage operations and network bandwidth, with a corresponding increase in memory requirements and CPU overhead for handling all of the outstanding requests.
    The right values for these factors differ from cluster to cluster.
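
The input-split point can be made concrete with a little arithmetic. The sketch below uses a deliberately simplified model in which every task pays a fixed setup cost and data is read at a constant rate; the function name and all of the numbers (12800 MB of input, 1 second of setup per task, 64 MB/s) are hypothetical and chosen only to keep the division exact.

```shell
#!/bin/sh
# setup_overhead_percent INPUT_MB SPLIT_MB SETUP_SEC MB_PER_SEC
# Prints the number of tasks and total setup time as a percentage of
# total run time, using integer arithmetic only.
setup_overhead_percent() {
  input_mb=$1
  split_mb=$2
  setup_sec=$3
  mb_per_sec=$4
  tasks=$(( (input_mb + split_mb - 1) / split_mb ))   # one task per split, rounded up
  setup_total=$(( tasks * setup_sec ))                # summed setup time
  run_total=$(( input_mb / mb_per_sec ))              # summed run time
  echo "$tasks tasks, setup/run = $(( 100 * setup_total / run_total ))%"
}

setup_overhead_percent 12800 64 1 64    # 200 tasks, setup/run = 100%
setup_overhead_percent 12800 256 1 64   # 50 tasks, setup/run = 25%
```

Under these assumptions, quadrupling the split size cuts the setup-to-run ratio to a quarter. Real clusters will not follow the model exactly, which is one reason the values have to be tuned per cluster.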
File Descriptors (http://en.wikipedia.org/wiki/File_descriptor)
    Any user that runs processes that access HDFS should have a large limit on file descriptor access, and all applications that open files need careful checking to make sure that the files are explicitly closed.
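
A quick way to check the limit for the current user is the shell's ulimit -n. The sketch below wraps that comparison in a small helper; the name fd_limit_ok and the threshold of 16384 are assumptions for illustration, not values recommended by Hadoop.

```shell
#!/bin/sh
# fd_limit_ok LIMIT MIN
# Succeeds when LIMIT (as printed by `ulimit -n`) is "unlimited"
# or is numerically at least MIN.
fd_limit_ok() {
  limit=$1
  min=$2
  [ "$limit" = "unlimited" ] && return 0
  [ "$limit" -ge "$min" ]
}

required=16384            # hypothetical threshold, adjust per cluster
current=$(ulimit -n)      # soft limit on open files for this shell
if fd_limit_ok "$current" "$required"; then
  echo "open-file limit $current is at least $required"
else
  echo "open-file limit $current is below $required"
fi

# On Linux, the descriptors a running process currently holds can be
# counted with:  ls /proc/<pid>/fd | wc -l
```

Running the count periodically against a long-lived process is a simple way to spot files that are opened but never closed: the number keeps growing instead of levelling off.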
