Import data from SQLServer with Sqoop

author: luciuz

date: 2013/11/15

-----------------------------------------------------------------------------------------------------------------

假定: 用户database:InsideTSQL2008, tablename:Employees

注意:如果需要导入的table “No primary key could be found”, 则需要“specify one with --split-by or perform a quential import with '-m 1'.”

准备:从Microsoft官网下载jdbc驱动, 将sqljdbc_3.0/enu/下的sqljdbc4.jar和sqljdbc.jar包导入到$SQOOP_HOME/lib下。

    下载地址:http://www.microsoft.com/zh-cn/download/confirmation.aspx?id=2505

 

!!  以下代码中加"\"是便于查看,实际运行时注意断行的位置,最好将"\"去掉,因为有可能将'\'符号包含在引号中,比如: “ jdbc:sqlserver://172.16.102.30  \ ", 这样便会出错。


1. import all-tables

${Installation}/bin/sqoop import-all-tables\

--connect 'jdbc:sqlserver://172.16.102.30; username=sa;\ password=1qaz!QAZ;  database=InsideTSQL2008' -m 1 \

--target-dir /user/grid/mydb/

2. import some tables

${Installation}/bin/sqoop import --verbose \

--driver com.microsoft.sqlserver.jdbc.SQLServerDriver\

 --connect 'jdbc:sqlserver://172.16.102.30; username=sa;\

password=1qaz!QAZ;  database=InsideTSQL2008' \

--table Employees --target-dir /user/grid/Employees_db -m 1

3. list-tables

    $SQOOP_HOME/bin/sqoop list-tables \

--connect 'jdbc:sqlserver://172.16.102.30; username=sa; \

password=1qaz!QAZ;  database=InsideTSQL2008'

4. create-hive-table

sqoop  import --warehouse-dir "/user/hive/MyWareHouse" \

--hive-overwrite --connect 'jdbc:sqlserver://172.16.102.30;\ 

username=sa;password=1qaz!QAZ;  database=InsideTSQL2008' \

--table Employees --hive-table tblFoo --hive-import -m 1

注意:导入hive以后因为Hive官方默认的配置问题,是看不到已导入的tables的,需要做如下修改:

$HIVE_HOME/conf/hive-site.xml中修改配置如下:

  • 修改一

将:

1 <property>

2   <name>javax.jdo.option.ConnectionURL</name>

3   <value>jdbc:derby:;databaseName=metastore_db;create=true</value>

4   <description>JDBC connect string for a JDBC metastore</description>

5 </property>

修改为:

1 <property>

2   <name>javax.jdo.option.ConnectionURL</name>

3   <value>jdbc:derby:;databaseName=/home/grid/hive/metastore_db;create=true</value>

4   <description>JDBC connect string for a JDBC metastore</description>

5 </property>

 

注意:databaseName的地址为metastore_db存放的地方,默认在$HIVE_HOME/下面。

  • 修改二

    官方配置文档有错误,做如下修改:

将:

1  <property>

2    <name>hive.server2.thrift.sasl.qop</name>

3    <value>auth</auth>

 

跟正为:

1   <property>

2   <name>hive.server2.thrift.sasl.qop</name>

3   <value>auth</value>

修改以上两项后重启hive 即可。

5. list-databases

sqoop list-databases --connect 'jdbc:sqlserver://172.16.102.30;\

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008'

6. list-columns-of-a-table

sqoop import -connect 'jdbc:sqlserver://172.16.102.30; \

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008'\

--table Employees --columns "empid, lastname, firstname, title"\

-m 1

7. Storing data in SequenceFiles, and setting the generated class name to com.foo.Employee

sqoop import -connect 'jdbc:sqlserver://172.16.102.30; \

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008'\

--table Employees --class-name com.foo.Employee \

--as-sequencefile -m 1

 

8.Specifying the delimiters to use in a text-mode import

sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008' \

--table Employees --fields-terminated-by ',' \

--lines-terminated-by '\n' --optionally-enclosed-by '\"' -m

9. import data like "select ... where..." and with user-defined-delimiters

sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008' \

--table Employees --where "mgrid = 5" --fields-terminated-by ','\

--lines-terminated-by '\n'  --optionally-enclosed-by '\"' -m 1

10. Changing the splitting column from the default(default is the primary key)

sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008'\

--table Employees --where "mgrid = 5" --split-by empi\

--fields-terminated-by ',' --lines-terminated-by '\n' \

--optionally-enclosed-by '\"'

11. append data to an existed table

qoop import --connect 'jdbc:sqlserver://172.16.102.30;\

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008' \

--table Employees --where "mgrid <> 5" --fields-terminated-by ','\

--lines-terminated-by '\n'  --optionally-enclosed-by '\"' \

--split-by empid --append

12 uses validation to validate the import using the table row count and number of rows copied into HDFS

sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008'\

--table Employees --validate -m 1

13. Query Imports

sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008'\

--query "SELECT a.empid, a.lastname FROM Employees AS a WHERE\

a.empid > 2 AND \$CONDITIONS" -split-by a.empid \

--target-dir /user/grid/query_results

注意,--target-dir,-split-by(或m 1)不能少,$CONDITIONS必须有,且为保险起见加反斜杠;

程序执行时会将以上命令展开为:

SELECT a.empid, a.lastname FROM Employees AS a WHERE a.empid > 2 AND  (1 = 0)

14. delete-target-dir

sqoop import --connect 'jdbc:sqlserver://172.16.102.30;\

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008'\

--table Employees -m 1 --delete-target-dir

Delete the import target directory if it exists.

注意,不需要指定要删除哪个directory, “delete-target-dir”删除的就是需要import数据的那个directory

15. codegen

sqoop codegen --connect 'jdbc:sqlserver://172.16.102.30;\

username=sa; password=1qaz!QAZ;  database=InsideTSQL2008' \

--table Employees

注意 codegen的argument, 不含delete-target-dir, 请参考codegen的common argument, codegen arguments, Output line formatting arguments, Input parsing arguments和Hive arguments

 

你可能感兴趣的:(sqlserver)