Software versions:
Sqoop: 1.99.4; JDK: 1.7; Hadoop: 2.2
Environment:
1. Hadoop cluster:
node12: NameNode, ResourceManager, JobHistoryServer, DataNode, NodeManager; 1.5 GB RAM + 10 GB disk + 1 core
node13: SecondaryNameNode, DataNode, NodeManager; 1.5 GB RAM + 10 GB disk + 1 core
2. Project setup (jars needed on the build path):
(1) Locate sqoop-1.99.4-bin-hadoop200\server\webapps\sqoop.war and extract it; inside, find the directory sqoop-1.99.4-bin-hadoop200\server\webapps\sqoop\WEB-INF\lib and copy the jars under it to the project's build path (not required for running);
(2) Locate the directory sqoop-1.99.4-bin-hadoop200\shell\lib and copy the jars under it to the project's build path (required for running).
Basic Sqoop concepts:
1. Sqoop connector:
The Sqoop connector. After the sqoop server starts, run the command show connector in the client to list the connectors the server currently has; a Java equivalent is sketched below.
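A minimal Java sketch of the same listing, assuming the 1.99 client API exposes getConnectors() on SqoopClient and getUniqueName()/getVersion() on MConnector (treat these method names as an assumption for this exact release):

    package sqoop;

    import org.apache.sqoop.client.SqoopClient;
    import org.apache.sqoop.model.MConnector;

    public class ShowConnectors {
        public static void main(String[] args) {
            // point the client at the sqoop server's REST endpoint
            SqoopClient client = new SqoopClient("http://node13:12000/sqoop/");
            // print id, name and version of each registered connector,
            // mirroring the "show connector" shell command
            for (MConnector connector : client.getConnectors()) {
                System.out.println(connector.getPersistenceId() + "\t"
                        + connector.getUniqueName() + "\t"
                        + connector.getVersion());
            }
        }
    }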
2. Sqoop link:
A link is backed by a connector, so creating a link requires specifying a connector (by its id, as the code below shows). A link is a layer of wrapping around a connector on which parameters can be set: for mysql you can set the driver class name, connection string, username, password, and so on; for hdfs you can set the URI. The parameters that can be set are listed below:
hdfs:
Link configuration:
    linkConfig.uri, HDFS URI : null
From Job configuration:
    fromJobConfig.inputDirectory, Input directory : null
To Job configuration:
    toJobConfig.outputFormat, Output format : null
    toJobConfig.compression, Compression format : null
    toJobConfig.customCompression, Custom compression format : null
    toJobConfig.outputDirectory, Output directory : null

jdbc:
Link configuration:
    linkConfig.jdbcDriver, JDBC Driver Class : null
    linkConfig.connectionString, JDBC Connection String : null
    linkConfig.username, Username : null
    linkConfig.password, Password : null
    linkConfig.jdbcProperties, JDBC Connection Properties : null
From database configuration:
    fromJobConfig.schemaName, Schema name : null
    fromJobConfig.tableName, Table name : null
    fromJobConfig.sql, Table SQL statement : null
    fromJobConfig.columns, Table column names : null
    fromJobConfig.partitionColumn, Partition column name : null
    fromJobConfig.allowNullValueInPartitionColumn, Null value allowed for the partition column : null
    fromJobConfig.boundaryQuery, Boundary query : null
To database configuration:
    toJobConfig.schemaName, Schema name : null
    toJobConfig.tableName, Table name : null
    toJobConfig.sql, Table SQL statement : null
    toJobConfig.columns, Table column names : null
    toJobConfig.stageTableName, Stage table name : null
    toJobConfig.shouldClearStageTable, Should clear stage table : null
3. Sqoop job:
Creating a Sqoop job requires two things. First, the from and to endpoints, each given simply as the id of the corresponding link. Second, the concrete from/to parameters (meaning the ones that vary from job to job, such as the input directory on hdfs, or the table name and column list on the jdbc side).
Sqoop Java API:
0. Create the connector (nothing to do here; the connectors exist automatically once sqoop is configured)
1. Create the links
1) Before creating any link, check the existing ones in the sqoop client:
show link shows that there are none yet;
2) Create the hdfs link:

    long connectorId = 1;
    MLink link = createHDFSLink(client, connectorId);

3) Create the jdbc link (using mysql as the example):

    long connectorId = 2;
    MLink link = createMysqlLink(client, connectorId);

4) Check with show link (a Java equivalent is sketched below):
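A minimal sketch of listing the saved links from Java, assuming the client exposes getLinks() in this release as it does in later 1.99 versions (MLink.getConnectorId() appears in the complete code further down):

    package sqoop;

    import org.apache.sqoop.client.SqoopClient;
    import org.apache.sqoop.model.MLink;

    public class ShowLinks {
        public static void main(String[] args) {
            SqoopClient client = new SqoopClient("http://node13:12000/sqoop/");
            // print id, name and backing connector id of every saved link,
            // mirroring the "show link" shell command
            for (MLink link : client.getLinks()) {
                System.out.println(link.getPersistenceId() + "\t"
                        + link.getName() + "\t"
                        + link.getConnectorId());
            }
        }
    }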
2. Create the job:
0) Check the existing jobs: show job shows that there are none (the default);
1) Create the job:

    long fromLinkId = 1;
    long toLinkId = 2;
    String input = "/user/fansy/name.data";
    String tableName = "test_name_age";
    String columns = "name,age";
    int mappers = 2;
    MJob job = createJob(client, fromLinkId, toLinkId, input, tableName, columns, mappers);
    System.out.println(job.getEnabled());

2) Check the job with show job (or from Java, as sketched below):
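A minimal Java counterpart of show job, again assuming getJobs() is available on the client as in later 1.99 releases (getName()/getEnabled() are used by the complete code below):

    package sqoop;

    import org.apache.sqoop.client.SqoopClient;
    import org.apache.sqoop.model.MJob;

    public class ShowJobs {
        public static void main(String[] args) {
            SqoopClient client = new SqoopClient("http://node13:12000/sqoop/");
            // print id, name and enabled flag of every saved job,
            // mirroring the "show job" shell command
            for (MJob job : client.getJobs()) {
                System.out.println(job.getPersistenceId() + "\t"
                        + job.getName() + "\tenabled=" + job.getEnabled());
            }
        }
    }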
3. Start the job:

    long jobId = 1;
    startJob(client, jobId);
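startJob only submits the job; the MapReduce job then runs asynchronously on the cluster. A minimal polling sketch built on the same client.getJobStatus(jobId) call used in checkJobStatus below (the 5-second interval is an arbitrary choice):

    package sqoop;

    import org.apache.sqoop.client.SqoopClient;
    import org.apache.sqoop.model.MSubmission;

    public class WaitForJob {
        public static void main(String[] args) throws InterruptedException {
            SqoopClient client = new SqoopClient("http://node13:12000/sqoop/");
            long jobId = 1;
            client.startJob(jobId);
            // poll the submission status until the job leaves the RUNNING state
            MSubmission submission = client.getJobStatus(jobId);
            while (submission.getStatus().isRunning()) {
                if (submission.getProgress() != -1) {
                    System.out.println(String.format("Progress : %.2f %%",
                            submission.getProgress() * 100));
                }
                Thread.sleep(5000);
                submission = client.getJobStatus(jobId);
            }
            System.out.println("Final status: " + submission.getStatus().name());
        }
    }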
The complete code is shown below:
package sqoop;

import org.apache.sqoop.client.SqoopClient;
import org.apache.sqoop.model.*;
import org.apache.sqoop.submission.counter.Counter;
import org.apache.sqoop.submission.counter.CounterGroup;
import org.apache.sqoop.submission.counter.Counters;
import org.apache.sqoop.validation.Status;

import java.util.List;
import java.util.ResourceBundle;

/**
 * Created by fansy on 2016/3/7.
 */
public class Sqoop1_99_4_Test {

    public static void main(String[] args){
        String url = "http://node13:12000/sqoop/";
        SqoopClient client = new SqoopClient(url);

        // create the links (uncomment the step you want to run)
        // long connectorId = 1;
        // MLink link = createHDFSLink(client, connectorId);
        long connectorId = 2;
        // MLink link = createMysqlLink(client, connectorId);
        listInputAndOutputParameters(client, connectorId);

        // create the job
        // long fromLinkId = 1;
        // long toLinkId = 2;
        // String input = "/user/fansy/name.data";
        // String tableName = "test_name_age";
        // String columns = "name,age";
        // int mappers = 2;
        // MJob job = createJob(client, fromLinkId, toLinkId, input, tableName, columns, mappers);
        // System.out.println(job.getEnabled());

        // start the job
        // long jobId = 1;
        // startJob(client, jobId);
    }

    /**
     * Print each config of a connector with its input names and current values
     */
    static void describe(List<MConfig> configs, ResourceBundle resource) {
        for (MConfig config : configs) {
            System.out.println(resource.getString(config.getLabelKey()) + ":");
            List<MInput<?>> inputs = config.getInputs();
            for (MInput input : inputs) {
                System.out.println(input.getName() + "," + resource.getString(input.getLabelKey()) + " : " + input.getValue());
            }
            System.out.println();
        }
    }

    /**
     * Print the link and from/to job parameters of a connector
     * @param client
     * @param connectorId
     */
    static void listInputAndOutputParameters(SqoopClient client, long connectorId){
        // link config for the connector
        describe(client.getConnector(connectorId).getLinkConfig().getConfigs(),
                client.getConnectorConfigBundle(connectorId));
        // "from" job config for the connector
        describe(client.getConnector(connectorId).getFromConfig().getConfigs(),
                client.getConnectorConfigBundle(connectorId));
        // "to" job config for the connector
        describe(client.getConnector(connectorId).getToConfig().getConfigs(),
                client.getConnectorConfigBundle(connectorId));
    }

    /**
     * Create the mysql link
     * @param client
     * @param connectorId
     * @return
     */
    static MLink createMysqlLink(SqoopClient client, long connectorId){
        MLink link = client.createLink(connectorId);
        link.setName("mysql");
        link.setCreationUser("fansy");
        MLinkConfig linkConfig = link.getConnectorLinkConfig();
        // fill in the link config values
        linkConfig.getStringInput("linkConfig.connectionString").setValue("jdbc:mysql://192.168.0.1/test_fz");
        linkConfig.getStringInput("linkConfig.jdbcDriver").setValue("com.mysql.jdbc.Driver");
        linkConfig.getStringInput("linkConfig.username").setValue("fansy");
        linkConfig.getStringInput("linkConfig.password").setValue("fansy");
        // save the link object that was filled
        Status status = client.saveLink(link);
        if (status.canProceed()) {
            System.out.println("Created Link with Link Id : " + link.getPersistenceId());
        } else {
            System.out.println("Something went wrong creating the link");
        }
        return link;
    }

    /**
     * Create the HDFS link
     * @param client
     * @param connectorId
     * @return
     */
    static MLink createHDFSLink(SqoopClient client, long connectorId){
        MLink link = client.createLink(connectorId);
        link.setName("hdfs");
        link.setCreationUser("fansy");
        MLinkConfig linkConfig = link.getConnectorLinkConfig();
        // fill in the link config values
        linkConfig.getStringInput("linkConfig.uri").setValue("hdfs://node12:8020");
        // save the link object that was filled
        Status status = client.saveLink(link);
        if (status.canProceed()) {
            System.out.println("Created Link with Link Id : " + link.getPersistenceId());
        } else {
            System.out.println("Something went wrong creating the link");
        }
        return link;
    }

    /**
     * Create the hdfs-to-mysql job
     * @param client
     * @param fromLinkId
     * @param toLinkId
     * @param input
     * @param tableName
     * @param columns
     * @param mappers
     * @return
     */
    static MJob createJob(SqoopClient client, long fromLinkId, long toLinkId,
                          String input, String tableName, String columns, int mappers){
        MJob job = client.createJob(fromLinkId, toLinkId);
        job.setName("hdfs to mysql");
        job.setCreationUser("fansy");
        // set the "FROM" link job config values
        MFromConfig fromJobConfig = job.getFromJobConfig();
        fromJobConfig.getStringInput("fromJobConfig.inputDirectory").setValue(input);
        // set the "TO" link job config values
        MToConfig toJobConfig = job.getToJobConfig();
        toJobConfig.getStringInput("toJobConfig.tableName").setValue(tableName);
        toJobConfig.getStringInput("toJobConfig.columns").setValue(columns);
        // set the driver config values
        MDriverConfig driverConfig = job.getDriverConfig();
        // numExtractors is an Integer input, so use getIntegerInput rather than getStringInput
        // driverConfig.getStringInput("throttlingConfig.numExtractors").setValue(String.valueOf(mappers));
        driverConfig.getIntegerInput("throttlingConfig.numExtractors").setValue(mappers);
        Status status = client.saveJob(job);
        if (status.canProceed()) {
            System.out.println("Created Job with Job Id: " + job.getPersistenceId());
        } else {
            System.out.println("Something went wrong creating the job");
        }
        return job;
    }

    /**
     * Start the job
     * @param client
     * @param jobId
     */
    static void startJob(SqoopClient client, long jobId){
        MSubmission submission = client.startJob(jobId);
        System.out.println("Job Submission Status : " + submission.getStatus());
        if (submission.getStatus().isRunning() && submission.getProgress() != -1) {
            System.out.println("Progress : " + String.format("%.2f %%", submission.getProgress() * 100));
        }
        System.out.println("Hadoop job id :" + submission.getExternalId());
        System.out.println("Job link : " + submission.getExternalLink());
        Counters counters = submission.getCounters();
        if (counters != null) {
            System.out.println("Counters:");
            for (CounterGroup group : counters) {
                System.out.print("\t");
                System.out.println(group.getName());
                for (Counter counter : group) {
                    System.out.print("\t\t");
                    System.out.print(counter.getName());
                    System.out.print(": ");
                    System.out.println(counter.getValue());
                }
            }
        }
        if (submission.getExceptionInfo() != null) {
            System.out.println("Exception info : " + submission.getExceptionInfo());
        }
    }

    /**
     * Check the job status
     * @param client
     * @param jobId
     */
    static void checkJobStatus(SqoopClient client, long jobId){
        // check job status for a running job
        MSubmission submission = client.getJobStatus(jobId);
        if (submission.getStatus().isRunning() && submission.getProgress() != -1) {
            System.out.println("Progress : " + String.format("%.2f %%", submission.getProgress() * 100));
        }
        System.out.println("Job status:" + submission.getStatus().name());
    }
}
1. HDFS data:

    11kate,3
    fansy,22
    12kate,3
    tom,32
    1kate,3
    2kate,3

2. Mysql:
CREATE TABLE `test_name_age` (
  `name` varchar(255) DEFAULT NULL,
  `age` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
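After the job completes, the imported rows can be checked directly over JDBC. A small sketch reusing the connection string and credentials from createMysqlLink above (assumes the mysql driver jar is on the classpath):

    package sqoop;

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class CheckImport {
        public static void main(String[] args) throws Exception {
            // same connection string, user and password as in createMysqlLink
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://192.168.0.1/test_fz", "fansy", "fansy");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT name, age FROM test_name_age")) {
                // print each imported row; compare against the HDFS input above
                while (rs.next()) {
                    System.out.println(rs.getString("name") + "," + rs.getInt("age"));
                }
            }
        }
    }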
Summary:
1. Missing data:
The name column in the database lost one character at both the head and the tail of each value, even though the logs visible on port 8088 show that the data read in was complete. A plausible explanation (not verified here) is that the default CSVIntermediateDataFormat treats string fields as wrapped in single quotes and strips the first and last characters when unquoting, so unquoted input loses a character at each end.
2. The delimiter cannot be specified; Sqoop 1.99.4 has only one default delimiter implementation,
CSVIntermediateDataFormat
so if you need a different delimiter you have to implement one yourself. Sqoop 1.99.6 (the latest version at the time of writing) adds JSON and Avro implementations.
In short: the Sqoop 1.99 line is still riddled with bugs; for real work, stick with the 1.4 line!