How to resolve third-party JAR dependencies when running a MapReduce job

I'm a Hadoop beginner, and in the spirit of hello world I set out to write a MapReduce job that copies the rows of one table into another. The pits I stepped into along the way... To spare you (and future me) the same pits, I'm writing down the problems and how I solved them.
Here is my code:
package handler;

import mapper.DBInputMapper;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import bean.User;

@SuppressWarnings("all")
public class Main implements Tool{
	
	private Configuration conf;

	public static void main(String[] args) throws Exception {
		int run = ToolRunner.run(new Main(), args);
		System.exit(run);
	}

	@Override
	public Configuration getConf() {
		return this.conf;
	}

	@Override
	public void setConf(Configuration conf) {
		this.conf = conf;
	}

	@Override
	public int run(String[] args) throws Exception {
		Configuration conf = this.getConf();
		DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver","jdbc:mysql://***:3306/cbh", "***", "***");
		Job job = Job.getInstance(conf);
		job.setJarByClass(Main.class);
		job.setMapperClass(DBInputMapper.class);
		job.setMapOutputKeyClass(User.class);
		job.setMapOutputValueClass(NullWritable.class);
		job.setOutputKeyClass(User.class);
		job.setOutputValueClass(NullWritable.class);
		job.setInputFormatClass(DBInputFormat.class);
		job.setOutputFormatClass(DBOutputFormat.class);
		// Column names
		String[] fields = { "id", "phone" };
		// The six arguments to setInput are:
		// 1. the Job; 2. the DBWritable class;
		// 3. the table name; 4. the WHERE conditions;
		// 5. the ORDER BY clause; 6. the column names
		DBInputFormat.setInput(job, User.class, "t_cbh_user", null, "id",
				fields);
		DBOutputFormat.setOutput(job, "t_user_hadoop", fields);
		// FileOutputFormat.setOutputPath(job, new
		// Path("/test/mysql2Mysql"));
		return job.waitForCompletion(true) ? 0 : 1;
	}

}
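The mapper.DBInputMapper and bean.User classes referenced above aren't shown in this post, so here is a minimal sketch of what they might look like. The field names and types are assumptions taken from the {"id", "phone"} columns; the real requirements are that User implement DBWritable (so DBInputFormat can read it and DBOutputFormat can write it) and Writable, and, because it is also the map output key, WritableComparable:

package bean;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class User implements WritableComparable<User>, DBWritable {

	private int id;
	private String phone;

	// DBWritable: populate the bean from one row of the SELECT issued by DBInputFormat.
	@Override
	public void readFields(ResultSet rs) throws SQLException {
		this.id = rs.getInt("id");
		this.phone = rs.getString("phone");
	}

	// DBWritable: bind the fields, in column order, to the INSERT built by DBOutputFormat.
	@Override
	public void write(PreparedStatement ps) throws SQLException {
		ps.setInt(1, id);
		ps.setString(2, phone);
	}

	// Writable: Hadoop's own serialization, used during the shuffle.
	@Override
	public void readFields(DataInput in) throws IOException {
		this.id = in.readInt();
		this.phone = in.readUTF();
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(id);
		out.writeUTF(phone);
	}

	// Needed because User is the map output key, which must be sortable.
	@Override
	public int compareTo(User other) {
		return Integer.compare(this.id, other.id);
	}
}

DBInputFormat hands each row to the mapper as a (LongWritable row index, User record) pair, so the mapper can simply pass the record through:

package mapper;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Mapper;

import bean.User;

public class DBInputMapper extends Mapper<LongWritable, User, User, NullWritable> {

	@Override
	protected void map(LongWritable key, User value, Context context)
			throws IOException, InterruptedException {
		// Emit the record as the key; DBOutputFormat writes keys into t_user_hadoop.
		context.write(value, NullWritable.get());
	}
}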

1. The code ran fine locally, but once submitted to the cluster it complained that the MySQL driver class could not be found. At that point the job had not started executing; it was still in the resource-submission phase, so I suspected that Hadoop's own classpath was missing the MySQL driver jar. I added the following to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/jars/*

/jars holds the third-party jars I use, including the MySQL driver. I ran the job again and... damn, the very same error, except the pit had moved: this time the error was thrown after the tasks had started executing. That's when it clicked: the error must be coming from the tasks running on YARN, which have their own classpath. So I added the following line:

job.addArchiveToClassPath(new Path("hdfs://es:9000/jars/mysql/driver/mysql-connector-java-5.1.41.jar"));

Ran it again, and this time it worked!
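For context, that call sits in run() alongside the other job setup, before the job is submitted, and it assumes the driver jar has already been uploaded to that HDFS path (for example with hadoop fs -put). A sketch:

// Inside run(), before submitting the job. The jar must already exist on HDFS;
// it is shipped to the YARN containers via the distributed cache and put on
// the tasks' classpath.
job.addArchiveToClassPath(new Path("hdfs://es:9000/jars/mysql/driver/mysql-connector-java-5.1.41.jar"));
// Job#addFileToClassPath(Path) does the same for plain files and also works
// for a single jar.
return job.waitForCompletion(true) ? 0 : 1;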

In the spirit of all roads leading to Rome, I then tried swapping
job.addArchiveToClassPath(new Path("hdfs://es:9000/jars/mysql/driver/mysql-connector-java-5.1.41.jar"));
for the -libjars approach. Since the job is launched through ToolRunner, the arguments already pass through GenericOptionsParser (the same effect as calling new GenericOptionsParser(conf, allArgs).getRemainingArgs() yourself), so -libjars is recognized out of the box:

hadoop jar xxxx.jar handler.Main -libjars /jars/mysql-connector-java-5.1.41.jar

I ran it, and the program still worked fine.
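One caveat worth spelling out: -libjars only works here because ToolRunner feeds the command-line arguments through GenericOptionsParser before run() is called. If a driver does not go through ToolRunner, that parsing has to be done by hand; a minimal sketch (the class name PlainMain is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

public class PlainMain {

	public static void main(String[] allArgs) throws Exception {
		Configuration conf = new Configuration();
		// GenericOptionsParser consumes generic options such as -libjars,
		// -files and -D key=value, applies them to conf, and returns the
		// application's own arguments.
		String[] args = new GenericOptionsParser(conf, allArgs).getRemainingArgs();
		// ... build and submit the Job with this conf, as in run() above ...
	}
}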
