hadoop debug

1. Monitoring, and even debugging, the namenode, datanode, jobtracker and tasktracker in a Hadoop cluster.
  Modify hadoop-env.sh:

export HADOOP_TASKTRACKER_OPTS=" -Dcom.sun.management.jmxremote.authenticate=false  -Dcom.sun.management.jmxremote.ssl=false  -Dcom.sun.management.jmxremote.port=9051 -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=9052 " 

export HADOOP_TASKTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_TASKTRACKER_OPTS" 

With this you can connect jconsole to tasktracker_server_ip:9051 to inspect the JVM's runtime state, including threads, heap, runtime arguments, etc.
You can also attach Eclipse Remote Debug to tasktracker_server_ip:9052, set breakpoints, and step through the TaskTracker code.
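The figures jconsole displays come from the JVM's platform MBeans. As a rough illustration (reading the local platform MBeanServer directly, rather than the remote one exposed on port 9051), the same thread and heap data can be pulled programmatically:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadMXBean;

public class JmxPeek {
    public static void main(String[] args) {
        // jconsole reads these same platform MBeans over RMI;
        // here we query the current JVM's MBeans directly for illustration.
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();

        System.out.println("live threads   : " + threads.getThreadCount());
        System.out.println("heap used (B)  : " + memory.getHeapMemoryUsage().getUsed());
        System.out.println("jvm input args : " + runtime.getInputArguments());
    }
}
```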

2. When mapred.job.reuse.jvm.num.tasks is set to -1, only map tasks belonging to the same job (and scheduled to the same server) run in the same JVM; whether a map and a reduce of the same job can share a JVM is still unverified.
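For context, JVM reuse is switched on with the following job property (a value of -1 means no limit on the number of tasks per JVM; the default of 1 disables reuse), in the same format as the properties below:

```xml
<property>
<name>mapred.job.reuse.jvm.num.tasks</name>
<value>-1</value>
</property>
```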

3. Both map and reduce tasks are launched by the TaskController (DefaultTaskController by default), which generates a taskjvm.sh file and executes it via Java's Process class. process.waitFor() returns the exit code: exitCode == 0 means the process ended normally, and any non-zero value means it exited abnormally. taskjvm.sh contains all the runtime parameters and variables (you can locate it by grepping the tasktracker.log). To analyze failures, check the task logs under ${hadoop.log.dir}/userlogs/${job_id}/${task_id}/, which holds stderr (error output), stdout (standard output), syslog (runtime log), and log.index (the sizes of the previous three files).
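The exit-code handling described above can be sketched with plain java.lang.Process. This is a simplified stand-in for what the TaskController does with taskjvm.sh; the failing shell command here is hypothetical:

```java
import java.io.IOException;

public class ExitCodeCheck {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Stand-in for running taskjvm.sh: a shell command that fails with code 3.
        ProcessBuilder pb = new ProcessBuilder("sh", "-c", "exit 3");
        Process process = pb.start();
        int exitCode = process.waitFor();   // blocks until the child exits
        if (exitCode == 0) {
            System.out.println("task JVM exited normally");
        } else {
            // Non-zero: inspect stderr/stdout/syslog under
            // ${hadoop.log.dir}/userlogs/${job_id}/${task_id}/
            System.out.println("task JVM exited abnormally, code=" + exitCode);
        }
    }
}
```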

4. If JVM reuse is enabled, a job can still be opened up for JMX monitoring and remote debugging by setting these job properties:
<property> 
<name>mapred.map.child.java.opts</name> 
<value>-Xmx512m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false  -Dcom.sun.management.jmxremote.ssl=false  -Dcom.sun.management.jmxremote.port=21000 -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=22000 </value> 
</property> 

<property> 
<name>mapred.reduce.child.java.opts</name> 
<value>-Xmx512m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false  -Dcom.sun.management.jmxremote.ssl=false  -Dcom.sun.management.jmxremote.port=10011 -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=10012 </value> 
</property> 

2012.01.19 
  1. Where to place hadoop -D parameters:
hadoop  jar hbase-json-test.jar -Dmapred.output.compress=true -Dmapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec /maxthon/mac_export/20120115/ duming/mac_hfile/20120121/ 
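The -D pairs above are picked up by Hadoop's GenericOptionsParser, which expects them before the input/output paths. A stdlib-only sketch of that parsing idea (the real GenericOptionsParser handles more option forms; class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class DashDParser {
    // Split command-line args into -Dkey=value properties and remaining args,
    // roughly the way hadoop's generic option parsing treats "-Dname=value".
    public static Properties parse(String[] args, List<String> rest) {
        Properties props = new Properties();
        for (String arg : args) {
            if (arg.startsWith("-D") && arg.contains("=")) {
                int eq = arg.indexOf('=');
                props.setProperty(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                rest.add(arg);   // positional args such as input/output paths
            }
        }
        return props;
    }

    public static void main(String[] args) {
        List<String> rest = new ArrayList<>();
        Properties p = parse(new String[] {
                "-Dmapred.output.compress=true", "/maxthon/mac_export/20120115/"
        }, rest);
        System.out.println(p.getProperty("mapred.output.compress")); // true
        System.out.println(rest); // [/maxthon/mac_export/20120115/]
    }
}
```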

2012.05.11 
  1. hadoop distcp 
hadoop distcp -Dmapred.map.child.java.opts=" -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=22000 " -skipcrccheck -update -m 1 hftp://namenode-bj.hadoop.maxthon.cn:50070/user/yangfeng/tttt.txt /user/hadoop/duming/ 
  Notes for data migration between different Hadoop versions:
  a. Use the hftp protocol for the source.
  b. When the destination cluster runs a newer version, it is best to set both the -skipcrccheck and -update options. -skipcrccheck skips the FileChecksum comparison, because a version upgrade can change checksum values; this is exactly the case between cdh3 and cdh4.

2012.06.01 
1. Loading data into an existing partition reported an error, yet the data was eventually moved over; the root cause turned out to be not running use views; first.

2012.06.09 
org.apache.hadoop.hive.ql.parse.SemanticException: org.apache.hadoop.hive.ql.metadata.HiveException: Invalid partition for table chukwa 
at org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:122)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) 
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) 
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125) 
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) 
at org.apache.hadoop.hive.ql.optimizer.pcr.PartitionConditionRemover.transform(PartitionConditionRemover.java:78)
at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:87) 
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7306) 
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430) 
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337) 
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889) 
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) 
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) 
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) 
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) 
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
at java.lang.reflect.Method.invoke(Method.java:597) 
at org.apache.hadoop.util.RunJar.main(RunJar.java:200) 
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Invalid partition for table chukwa 
at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByNames(Hive.java:1696) 
at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.pruneBySequentialScan(PartitionPruner.java:376) 
at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:233) 
at org.apache.hadoop.hive.ql.optimizer.pcr.PcrOpProcFactory$FilterPCR.process(PcrOpProcFactory.java:112)
... 21 more 
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Invalid partition for table chukwa 
at org.apache.hadoop.hive.ql.metadata.Partition.initialize(Partition.java:207) 
at org.apache.hadoop.hive.ql.metadata.Partition.<init>(Partition.java:106) 
at org.apache.hadoop.hive.ql.metadata.Hive.getPartitionsByNames(Hive.java:1691) 
... 24 more 
Caused by: MetaException(message:Invalid partition key & values) 
at org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:407) 
at org.apache.hadoop.hive.metastore.Warehouse.makePartName(Warehouse.java:392) 
at org.apache.hadoop.hive.ql.metadata.Partition.initialize(Partition.java:190) 
... 26 more 
The cause: in the metastore database, the PARTITION_KEY_VALS table was missing one logType record for the corresponding PART_ID.
Hive stores its metadata in a relational database; the class responsible is ObjectStore.

2012.07.03 
  hadoop  jar  hbase-promote-0.0.1-SNAPSHOT.jar com.maxthon.hbase.promote.etl.EtlPromoteDriver -Dmapred.map.child.java.opts="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=22000 " /hive/warehouse/maxthon.db/chukwa/logtype=com_ios.g.maxthon.com/logday=20120630/loghour=00 /user/yangfeng/promote_ios/01

2012.07.04 
Split-size parameters and where they take effect:
mapred.min.split.size                                   MxCombineFileInputFormatShim.getSplits
mapred.min.split.size.per.node                          MxCombineFileInputFormatShim.getSplits
mapred.max.split.size                                   256000000
mapreduce.input.fileinputformat.split.minsize.per.node  1
mapreduce.input.fileinputformat.split.maxsize           256000000  (the size of each block; checked in MxCombineFileInputFormatNew.getSplits and MxCombineFileInputFormatNew.OneFileInfo)
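A CombineFileInputFormat-style getSplits packs block lengths into splits capped at the max split size. A much-simplified sketch of that packing (the real MxCombineFileInputFormatNew / Hadoop implementation also honors the per-node minimum and rack locality):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPacking {
    // Greedily pack block lengths into splits of at most maxSplitSize bytes,
    // mirroring (very roughly) how CombineFileInputFormat groups many small
    // blocks into one split when split.maxsize is 256000000.
    public static List<Long> pack(long[] blockLengths, long maxSplitSize) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long len : blockLengths) {
            if (current + len > maxSplitSize && current > 0) {
                splits.add(current);   // close the current split
                current = 0;
            }
            current += len;
        }
        if (current > 0) {
            splits.add(current);       // last, possibly short, split
        }
        return splits;
    }

    public static void main(String[] args) {
        // Three 100-unit blocks with a 256-unit cap -> splits of 200 and 100.
        System.out.println(pack(new long[] {100, 100, 100}, 256));
    }
}
```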

2012.07.10 
  1. Changing parameters via -D from a main method; see DistCp for reference:
    JobConf job = new JobConf(DistCp.class);
    DistCp distcp = new DistCp(job);
    // ToolRunner parses generic options such as -D before invoking run()
    int res = ToolRunner.run(distcp, args);
