Problems Encountered in Hadoop Programs

1.

java.lang.Exception: java.lang.RuntimeException:
java.lang.NoSuchMethodException: Hadoo$MRMapper.<init>()
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)

Cause: a Hadoop Mapper or Reducer written as an inner class must be static; otherwise Hadoop's reflective instantiation finds no no-arg constructor, hence the NoSuchMethodException for <init>().

Fix: add the static modifier; a sketch follows.
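
A minimal sketch of the fix (class body and type parameters are illustrative, not taken from the original program):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class Hadoo {
        // "static" is required: Hadoop creates the mapper by reflection
        // through a no-arg constructor, and a non-static inner class has
        // no such constructor (it needs an enclosing instance).
        public static class MRMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(value, new IntWritable(1));
            }
        }
    }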


2.

 job_local737230221_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.NullPointerException
    at mapredutest.Mapreduce$MRmapper.map(Mapreduce.java:40)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at mapredutest.Mapreduce$MRmapper.run(Mapreduce.java:28)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.

Problem: this is most likely a type mismatch. The key/value type parameters declared on the Mapper class must match the parameter types of its map() method, the Reducer likewise, and the types declared on the Job must match both. A consistent setup is sketched below.
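
A sketch of a consistent set of declarations, assuming Text keys and IntWritable values for the map output (all names and types are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class Mapreduce {
        public static class MRmapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // The parameter types here must repeat the class-level type
                // arguments exactly; with different types this method would
                // overload rather than override, and the default identity
                // map() would run instead.
                context.write(new Text(value.toString()), new IntWritable(1));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration());
            job.setMapperClass(MRmapper.class);
            // The types declared on the Job must match the Mapper (and Reducer):
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
        }
    }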


3. Wanted the value read from the target file to be of type double, and got an error:

java.io.IOException: wrong value class: class org.apache.hadoop.io.DoubleWritable is not class org.apache.hadoop.io.IntWritable

Cause: the value emitted by map in this program is an IntWritable, while the Reducer emits a DoubleWritable, and the job declares job.setCombinerClass(...Reduce.class). A combiner's output types must equal the map output types, so the types clearly do not match, hence the error. Fix: use matching types, or keep the default of running no combiner by simply not setting one; both options are sketched below.
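
A sketch of the two fixes (class names are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Fix 1: a combiner is an optional optimization, so omit it entirely:
    // job.setCombinerClass(Reduce.class);   // <- delete this line

    // Fix 2: if a combiner is wanted, its output types must equal the
    // map output types (Text/IntWritable here), not the final
    // Text/DoubleWritable types of the real reducer:
    public static class IntSumCombiner
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    // job.setCombinerClass(IntSumCombiner.class);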


4.

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:9000/output, expected: file:///

Cause: the Configuration instance conf never found a value for fs.default.name, so it knows nothing about hdfs://localhost:9000 and falls back to the local default file:///.
Fix:

 // pick up config files off the classpath
 Configuration conf = new Configuration();
 // explicitly add further config files
 // PASS A Path, NOT A String!
 conf.addResource(new Path("/opt/hadoop/hadoop-2.2.0/etc/core-site.xml"));
 FileSystem fs = FileSystem.get(conf);
 // load files and so on below

Note that the call is conf.addResource(new Path(str)), not conf.addResource(str): the String overload interprets its argument as the name of a classpath resource, while the Path overload loads a file from the local file system. An alternative is sketched below.
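
An alternative sketch that sets the default file system directly instead of loading the XML file (the URI is the one from the error message above):

    Configuration conf = new Configuration();
    // fs.defaultFS is the Hadoop 2.x name of this property;
    // fs.default.name is the deprecated 1.x name and still works.
    conf.set("fs.defaultFS", "hdfs://localhost:9000");
    FileSystem fs = FileSystem.get(conf);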


5. A problem running GibbsSampling: the program sets the number of iterations, the point at which sample recording begins (begintorecord), and the recording interval (recordstep), and the tasks use these values by reading static fields of the job class. Debugged in Eclipse this works, but running the jar from the command line produces empty records; substituting the literal numbers for the variables gives the expected results, so extracting the variable values is clearly what fails. But why does Eclipse debugging work, with the values extracted correctly, while running the jar from the command line does not?

Cause: not yet determined. (A plausible explanation, not verified here: in Eclipse the job runs in the LocalJobRunner inside a single JVM, so the tasks can see the driver's static fields, whereas a command-line run executes the tasks in separate JVMs that never receive the driver's static state.)

Fix: pass the values through the job Configuration instead, setting them as configuration properties and reading them back in the tasks; the results are then correct.
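
A minimal sketch of that pattern; the property names (gibbs.*) and the numeric values are illustrative, not the original program's:

    // In the driver, before the job is submitted:
    Configuration conf = getConf();
    conf.setInt("gibbs.iterations", 1000);      // hypothetical key and value
    conf.setInt("gibbs.begintorecord", 800);    // hypothetical
    conf.setInt("gibbs.recordstep", 10);        // hypothetical
    Job job = Job.getInstance(conf, "GibbsSampling");

    // In the Mapper/Reducer, read the values back in setup():
    private int iterations, begintorecord, recordstep;

    @Override
    protected void setup(Context context) {
        Configuration c = context.getConfiguration();
        iterations    = c.getInt("gibbs.iterations", 1000);
        begintorecord = c.getInt("gibbs.begintorecord", 800);
        recordstep    = c.getInt("gibbs.recordstep", 10);
    }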


6. Still unclear where the mechanics of Eclipse debugging and command-line execution differ.

When running on the command line, path arguments written as HDFS-relative paths are recognized; when debugging in Eclipse with "run on hadoop", the full URI, including e.g. hdfs://localhost:9000, must be written, or the path is not recognized.

The cause has not been found yet. (It is presumably the same mechanism as in problem 4: on the command line the cluster configuration is on the classpath, so fs.default.name supplies a default file system for relative paths, while Eclipse starts from a Configuration that defaults to file:///.) The difference is sketched below.
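
A sketch of the difference (the paths are illustrative):

    // Works on the command line, where core-site.xml is on the classpath
    // and fs.default.name supplies the default file system:
    FileInputFormat.addInputPath(job, new Path("input"));

    // Needed when the Configuration lacks fs.default.name, as under
    // Eclipse's "run on hadoop": spell out the full URI.
    FileInputFormat.addInputPath(job,
            new Path("hdfs://localhost:9000/user/hadoop/input"));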

7. A problem similar to 6. startjob needs to run two MapReduce jobs, the second of which computes probabilities from the first's output, so it is written roughly as follows:

int res = ToolRunner.run(new Configuration(), new GibbsSamplingJob(), gibbsArgs);
// System.exit(res);
int resCP = ToolRunner.run(new Configuration(), new ComputeProbability(),
        computeProbabilityArges);
System.exit(resCP);

Command-line run: no error messages. res executes first and the normal map progress is shown; when it finishes, resCP executes, and both results come out correct; two applications appear in the browser UI.

Eclipse debugging: the expected results are produced, but the console shows only the second job's map phase, not the first's, and in between the following error is printed:

Connecting to datanode 127.0.0.1:50010
Error making BlockReader. Closing stale NioInetPeer(Socket[addr=/127.0.0.1,port=50010,localport=39052])
java.io.EOFException: Premature EOF: no length prefix available
	at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
	at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
	at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:131)
	at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1088)
	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:533)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:749)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:793)
	at java.io.DataInputStream.read(Unknown Source)
	at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:211)
	at org.apache.hadoop.util.LineReader.readLine(LineReader.java:174)
	at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:164)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

Starting flush of map output

Not yet resolved.




References:
http://www.tuicool.com/articles/YNFzem
http://answers.mapr.com/questions/6873/exception-wrong-value-class-class-orgapachehadoopiotext-is-not-class-orgapachehadoopiolongwritable
http://www.opensourceconnections.com/blog/2013/03/24/hdfs-debugging-wrong-fs-expected-file-exception/
