When reading the Hadoop source code, you often need to debug it. Almost every Hadoop process is ultimately started through the $HADOOP_HOME/bin/hadoop script. At first I simply added the JDWP debug options wherever I needed them, but after doing that a few too many times it felt too manual, so I made a small change to the hadoop script to let automated remote debugging take off.
The changes to the script are as follows:
In $HADOOP_HOME/bin/hadoop, add the following right after the line . "$bin"/hadoop-config.sh :

# pick the first free port starting from 11000
choose_debug_port() {
  debug_port_base=11000
  while [ -z "$DEBUG_PORT" ]
  do
    if [ "$(netstat -tln | grep $debug_port_base | wc -l)" -eq 0 ]
    then
      DEBUG_PORT=$debug_port_base
    else
      debug_port_base=$(($debug_port_base+1))
    fi
  done
}

debug_file="$bin/hadoop.debug"

choose_debug_port

# print a non-empty string if $1 is listed in hadoop.debug
is_debug_enabled() {
  if [ -f "$debug_file" ]
  then
    echo $(cat $debug_file | grep $1)
  fi
}
Then, after line 246 of $HADOOP_HOME/bin/hadoop (right before the # cygwin path translation comment), add:
if [ $(is_debug_enabled $COMMAND) ]
then
  echo "debug for $COMMAND is enabled, port:$DEBUG_PORT"
  export HADOOP_OPTS="$HADOOP_OPTS -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=$DEBUG_PORT,suspend=y,server=y"
fi
With this change, the hadoop script decides whether to enable remote debugging based on the COMMAND being run and the contents of $HADOOP_HOME/bin/hadoop.debug. Note that suspend=y means the JVM waits for a debugger to attach before it continues starting up.
For example, to debug the datanode, run: echo datanode > $HADOOP_HOME/bin/hadoop.debug
When Hadoop is started, only the datanode will have remote debugging enabled.
The console shows:
debug for datanode is enabled, port:11000
Listening for transport dt_socket at address: 11000
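Once the daemon is listening (and, with suspend=y, waiting), you can attach any JDWP client. A minimal sketch with jdb, assuming the daemon runs on the local machine and uses the port printed above (an IDE remote-debug configuration pointing at the same host and port works just as well):

# attach to the JVM listening on port 11000
jdb -connect com.sun.jdi.SocketAttach:hostname=localhost,port=11000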
To debug the namenode or any other process, simply echo the command you want to debug into $HADOOP_HOME/bin/hadoop.debug.
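Since is_debug_enabled simply greps hadoop.debug, you can list several commands in the file (one per line) to debug more than one daemon; each invocation of the hadoop script should then pick the next free port starting at 11000. Keep in mind the match is a substring match, so a line containing secondarynamenode also enables debugging for namenode. A sketch:

# debug both the namenode and the jobtracker
echo namenode > $HADOOP_HOME/bin/hadoop.debug
echo jobtracker >> $HADOOP_HOME/bin/hadoop.debug

# turn debugging off again
rm $HADOOP_HOME/bin/hadoop.debug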
Debugging can easily be enabled this way for every COMMAND the hadoop script supports.
For reference, here is the full list of hadoop commands:
hadoop@haitaoyao-laptop:~/hadoop/bin$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
--EOF--