Hadoop's conf/hadoop-env.sh defines the following environment variables:
HADOOP_NAMENODE_OPTS
HADOOP_SECONDARYNAMENODE_OPTS
HADOOP_DATANODE_OPTS
HADOOP_BALANCER_OPTS
HADOOP_JOBTRACKER_OPTS
HADOOP_TASKTRACKER_OPTS
You can use them to pass remote-debugger options to the corresponding daemon, so that you can connect to and debug any of the above servers. Unfortunately, Hadoop tasks are started in separate JVMs spawned by the TaskTracker, so you cannot use this method to debug your map or reduce functions. (In fact, debugging a map or reduce task is much simpler than this method; see the pinned post in this column, 《Map/Reduce Task 远程调试详解》.)
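For example, to debug the NameNode you would set the corresponding variable in conf/hadoop-env.sh the same way. This is a minimal sketch: the port number 8000 is an arbitrary choice, and suspend=y/suspend=n controls whether the JVM blocks at startup until a debugger attaches.

# conf/hadoop-env.sh -- port 8000 is an arbitrary choice for this sketch
export HADOOP_NAMENODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n"

If you do want to debug the task JVMs themselves, the Hadoop 1.x property mapred.child.java.opts accepts the same JDWP flags; since every concurrently running task would then try to bind the same port, this is normally combined with limiting the TaskTracker to a single task slot.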
Taking the TaskTracker as an example, follow these steps:
1. Edit conf/hadoop-env.sh and set HADOOP_TASKTRACKER_OPTS:
export HADOOP_TASKTRACKER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5000,server=y,suspend=n"
2. Restart Hadoop so the new options take effect (bin/start-dfs.sh and bin/start-mapred.sh)
3. The TaskTracker JVM now listens on port 5000 for a debugger connection. With suspend=n it starts normally and you can attach at any time; set suspend=y if you want it to block at startup until the debugger connects
4. Attach Eclipse using a "Remote Java Application" launch in the Debug configurations dialog (host: the TaskTracker machine, port: 5000) and set your breakpoints
5. Run a MapReduce job (an example command follows this list)
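To exercise the breakpoints, you can submit any job, for instance one of the examples bundled with Hadoop. This is a sketch: the examples jar filename depends on your Hadoop release, and the input and output paths are placeholders.

# submit the bundled wordcount example; jar name and paths are assumptions
bin/hadoop jar hadoop-examples-*.jar wordcount /user/you/input /user/you/output

If you prefer a command-line debugger over Eclipse, the JDK's jdb can attach to the same JDWP port:

jdb -attach localhost:5000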