Big Data Basics (2): Installing Hadoop, Maven, HBase, Hive, and Sqoop on Ubuntu 14.04.4, and Sqoop import/export with HDFS, Hive, and MySQL

Installing Hadoop, Maven, HBase, Hive, and Sqoop on Ubuntu 14.04.4

2016.05.15

Test environment:

Hadoop 2.6.2, Ubuntu 14.04.4 amd64, JDK 1.8

Versions installed:

Maven 3.3.9, HBase 1.1.5, Hive 1.2.1, Sqoop2 (1.99.6) and Sqoop1 (1.4.6)

This article also draws on several other posts; links to the originals are included where they are used.


Prerequisite: Hadoop must already be installed.
Reference: http://blog.csdn.net/xanxus46/article/details/45133977

The installs in this article are enough to support basic Hadoop log analysis; for a detailed walkthrough, see:

http://www.cnblogs.com/edisonchou/p/4449082.html


I. Maven
1. Install the JDK
2. Download:
http://maven.apache.org/download.cgi
wget http://mirrors.cnnic.cn/apache/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
3. Extract:
tar -xzf apache-maven-3.3.9-bin.tar.gz
4. Configure environment variables
vi ~/.bashrc
export MAVEN_HOME=/usr/local/maven/apache-maven-3.3.9
export PATH=$MAVEN_HOME/bin:$PATH
Apply it:
source ~/.bashrc
5. Verify
$mvn --version
Result:
root@spark:/usr/local/maven/apache-maven-3.3.9# mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /usr/local/maven/apache-maven-3.3.9
Java version: 1.8.0_65, vendor: Oracle Corporation
Java home: /usr/lib/java/jdk1.8.0_65/jre
Default locale: en_HK, platform encoding: UTF-8
OS name: "linux", version: "3.19.0-58-generic", arch: "amd64", family: "unix"
root@spark:/usr/local/maven/apache-maven-3.3.9# 
http://www.linuxidc.com/Linux/2015-03/114619.htm
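As an optional further check (the groupId and artifactId below are arbitrary placeholders, not from the original article), you can have Maven generate and build a throwaway quickstart project; it should download plugins from the central repository and finish with BUILD SUCCESS:

mvn archetype:generate -DgroupId=com.example -DartifactId=mvn-smoke-test -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false
cd mvn-smoke-test
mvn package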


II. HBase
1. Download:
http://mirrors.hust.edu.cn/apache/hbase/stable/
http://mirrors.hust.edu.cn/apache/hbase/stable/hbase-1.1.5-bin.tar.gz
2. Extract:
HBase can be installed in three modes: standalone, pseudo-distributed, and fully distributed; only the fully distributed mode is covered here. The prerequisite is that the Hadoop cluster and ZooKeeper are already installed and running correctly.
Step 1: download the package, extract it to a suitable location, and give ownership to the user that runs Hadoop (here, root).
The version downloaded here is hbase-1.1.5, running against Hadoop 2.6; extract it under /usr/local:
tar -zxvf hbase-1.1.5-bin.tar.gz
mkdir /usr/local/hbase
mv hbase-1.1.5 /usr/local/hbase
cd /usr/local
chmod -R 775 hbase
chown -R root: hbase
3. Environment variables
$vi ~/.bashrc
export HBASE_HOME=/usr/local/hbase/hbase-1.1.5
export PATH=$HBASE_HOME/bin:$PATH
source ~/.bashrc
4. Configuration files
4.1 JDK [a usable default JDK is picked up, so this can be left unchanged]
vim /usr/local/hbase/hbase-1.1.5/conf/hbase-env.sh
Set JAVA_HOME to the JDK install directory (see the example below).
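A one-line sketch, using the JDK path shown earlier in the mvn --version output (adjust to your own install):

export JAVA_HOME=/usr/lib/java/jdk1.8.0_65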
4.2 hbase-site.xml
/usr/local/hbase/hbase-1.1.5/conf/hbase-site.xml

<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://spark:9000/hbase</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
</configuration>

5. Verify
Start Hadoop first:
sbin/start-dfs.sh
sbin/start-yarn.sh
Then start HBase and open its shell:
start-hbase.sh
$ hbase shell
Result:
root@spark:/usr/local/hbase/hbase-1.1.5/bin# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hbase/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop-2.6.2/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 1.1.5, r239b80456118175b340b2e562a5568b5c744252e, Sun May  8 20:29:26 PDT 2016
hbase(main):001:0> 
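As a quick functional check (the table name smoke_test and column family cf are arbitrary examples I've added, not part of the original article), create a table, write one cell, and scan it back; the table should also show up under hdfs://spark:9000/hbase:

hbase(main):001:0> create 'smoke_test', 'cf'
hbase(main):002:0> put 'smoke_test', 'row1', 'cf:c1', 'hello'
hbase(main):003:0> scan 'smoke_test'
hbase(main):004:0> disable 'smoke_test'
hbase(main):005:0> drop 'smoke_test'

Outside the shell, hadoop fs -ls /hbase should list the HBase directories on HDFS.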
http://blog.csdn.net/xanxus46/article/details/45133977

Cluster install: http://blog.sina.com.cn/s/blog_6145ed810102vtws.html



III. Hive
1. Download: http://apache.fayea.com/hive/stable/
http://apache.fayea.com/hive/stable/apache-hive-1.2.1-bin.tar.gz
2. Extract:
tar xvzf apache-hive-1.2.1-bin.tar.gz 
3. Environment variables
root@spark:/home/alex/xdowns# vi ~/.bashrc
export HIVE_HOME=/usr/local/hive/apache-hive-1.2.1-bin
export PATH=$PATH:$HIVE_HOME/bin
root@spark:/home/alex/xdowns# source ~/.bashrc
4. Edit the configuration files
First copy hive-env.sh.template and hive-default.xml.template to hive-env.sh and hive-site.xml respectively.

Edit /usr/local/hive/apache-hive-1.2.1-bin/conf/hive-env.sh as follows:

export HADOOP_HEAPSIZE=1024

# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/hadoop/hadoop-2.6.2

# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/apache-hive-1.2.1-bin/conf

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/hive/apache-hive-1.2.1-bin/lib

Edit /usr/local/hive/apache-hive-1.2.1-bin/conf/hive-site.xml as follows:

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://spark:9000/hbase</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/usr/hadoop/hive/log</value>
  <description>Directory for Hive query logs</description>
</property>

5. Connect to MySQL [optional]
5.1 Remove the MySQL bind to localhost
A default MySQL install only accepts local logins, so comment out the bind-address line in the configuration file:
vi /etc/mysql/my.cnf
#bind-address           = 127.0.0.1
5.2 Restart MySQL: service mysql restart
5.3 Log in to MySQL: mysql -uroot -proot
Create a database named hive:
create database hive;
show databases;
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
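Optionally (the article itself connects as root; the user name and password below are placeholders I've added for illustration), create a dedicated MySQL account for the metastore and use it in the ConnectionUserName/ConnectionPassword properties set below:

mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hivepass';
mysql> GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
mysql> FLUSH PRIVILEGES;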
5.4 Edit hive-site.xml
Set the following properties:

<property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.10.180:3306/hive?characterEncoding=UTF-8</value>
</property>
<property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
</property>
<property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
</property>
<property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>alextong</value>
</property>
5.5 Copy the MySQL JDBC driver jar into Hive's lib directory
The version downloaded here is mysql-connector-java-5.0.8-bin.jar:
http://dev.mysql.com/downloads/connector/j/5.0.html
tar xvzf mysql-connector-java-5.0.8.tar.gz 
mv mysql-connector-java-5.0.8-bin.jar  apache-hive-1.2.1-bin/lib
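If Hive does not create the metastore tables on first start (this depends on the datanucleus auto-create settings in hive-site.xml), they can be created explicitly with Hive's schematool; treat this as an optional extra step, not something the original article ran:

$HIVE_HOME/bin/schematool -dbType mysql -initSchema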

6. Verify
6.1 Start Hadoop first
start-dfs.sh
start-yarn.sh
6.2 hive
$ hive
(startup messages omitted)
hive>
6.3 Create a table in Hive
6.3.1
hive> show databases;
OK
default
Time taken: 1.078 seconds, Fetched: 1 row(s)
hive> 
6.3.2 Create a test table
hive> create table test(id int, name string);
6.4 Verify in MySQL
Log in to MySQL and inspect the metastore metadata:
use hive;
show tables;
mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A


Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| DATABASE_PARAMS           |
| DBS                       |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| PARTITIONS                |
| PARTITION_KEYS            |
| PART_COL_STATS            |
| ROLES                     |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| VERSION                   |
+---------------------------+


select * from TBLS;
Success.
6.5 A more detailed check
6.5.1 Create a data file
root@spark:~# vi add.txt
5
2
:wq
6.5.2 Upload it to HDFS
root@spark:~# hadoop fs -put /home/alex/xdowns/add.txt /user
root@spark:~# hadoop fs -ls /user
-rw-r--r--   1 root supergroup        148 2016-05-15 16:03 /user/add.txt
6.5.3 Create a table in Hive
hive> create table tester(id int);
OK
Time taken: 0.301 seconds


6.5.4 Load the data into Hive
a. From a file on HDFS:
hive> load data inpath 'hdfs://spark:9000/user/add.txt' into table tester;
Loading data to table default.tester
Table default.tester stats: [numFiles=1, totalSize=3]
OK
After the load completes, the source file is removed from its original HDFS location (LOAD DATA INPATH moves it into the Hive warehouse).
6.5.5 Query the result in Hive
hive> select * from tester;
OK
5
2
Time taken: 0.313 seconds, Fetched: 2 row(s)
hive> 
6.5.6 Check the metadata in MySQL
mysql> SELECT * FROM TBLS;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
|      1 |  1463298658 |     1 |                0 | root  |         0 |     1 | test     | MANAGED_TABLE | NULL               | NULL               |
|      2 |  1463299661 |     1 |                0 | root  |         0 |     2 | testadd  | MANAGED_TABLE | NULL               | NULL               |
|      6 |  1463300857 |     2 |                0 | root  |         0 |     6 | testadd  | MANAGED_TABLE | NULL               | NULL               |
|     11 |  1463301301 |     1 |                0 | root  |         0 |    11 | test_add | MANAGED_TABLE | NULL               | NULL               |
|     12 |  1463301398 |     1 |                0 | root  |         0 |    12 | tester   | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
5 rows in set (0.01 sec)

b. From a local file:
hive> load data local inpath 'add.txt' into table testadd;
Exit with quit;
7. Errors and fixes
7.1 java.io.tmpdir
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
Fix:
http://blog.csdn.net/zwx19921215/article/details/42776589
Replace every entry in hive-site.xml that contains ${system:java.io.tmpdir} with an absolute path, e.g. /usr/local/hive/log:

<property>
    <name>hive.exec.local.scratchdir</name>
    <value>/usr/local/hive/log</value>
    <description>Local scratch space for Hive jobs</description>
</property>
<property>
    <name>hive.downloaded.resources.dir</name>
    <value>/usr/local/hive/log</value>
</property>
<property>
    <name>hive.querylog.location</name>
    <value>/usr/local/hive/log</value>
    <description>Location of Hive run time structured log file</description>
</property>

7.2 jline
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
Fix:
https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
http://stackoverflow.com/questions/28997441/hive-startup-error-terminal-initialization-failed-falling-back-to-unsupporte
vi ~/.bashrc
export HADOOP_USER_CLASSPATH_FIRST=true
source ~/.bashrc
7.3 Character set problem
For direct MetaStore DB connections, we don't support retries at the client level.


When creating a table in Hive, the following error appears:


create table years (year string, event string) row format delimited fields terminated by '\t';
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)


This is a character set problem; configure the MySQL character set:


mysql> alter database hive character set latin1;




IV. Sqoop2 installation (for Sqoop1 see part V below)
(It is best to install HBase and Hive before installing Sqoop.)
Sqoop 1.4.6 also supports Hadoop 2.6.2.
1. Download:
http://mirror.bit.edu.cn/apache/sqoop/1.99.6/
http://mirror.bit.edu.cn/apache/sqoop/1.99.6/sqoop-1.99.6-bin-hadoop200.tar.gz
2. Extract:
tar xvzf sqoop-1.99.6-bin-hadoop200.tar.gz
3. Environment variables
vi ~/.bashrc
export SQOOP_HOME=/usr/local/sqoop/sqoop-1.99.6-bin-hadoop200
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_HOME=$SQOOP_HOME/server  
export LOGDIR=$SQOOP_HOME/logs  
source ~/.bashrc
4. Configuration files
4.1 Edit ${SQOOP_HOME}/server/conf/catalina.properties [this also adds the Hive jars]
Find the common.loader line, remove the existing hadoop and hive jar paths, and add the jar paths of the local Hadoop 2 install [keep everything on one line, no line breaks]:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/yarn/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/yarn/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/hdfs/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/hdfs/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/mapreduce/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/mapreduce/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/tools/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/tools/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/common/lib/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/common/*.jar,/usr/local/hadoop/hadoop-2.6.2/share/hadoop/httpfs/tomcat/lib/*.jar,/usr/local/hive/apache-hive-1.2.1-bin/lib/*.jar
[If you also need to import into Hive or HBase, the corresponding jars must be added as well.
Because the added jars include log4j, rename Sqoop's own log4j jar to avoid a conflict:

[grid@hadoop6 sqoop-1.99.3]$ mv ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar ./server/webapps/sqoop/WEB-INF/lib/log4j-1.2.16.jar.bak]
4.2 Edit ${SQOOP_HOME}/server/conf/sqoop.properties
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/hadoop-2.6.2/etc/hadoop/
5. Replace @LOGDIR@ and @BASEDIR@ [optional], e.g. with:
/usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/base
/usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/logs
6. JDBC driver
Copy your database's JDBC driver into the sqoop lib directory, creating the directory if it does not exist.
Download the MySQL driver mysql-connector-java-5.1.16-bin.jar and place it in /usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/server/lib.
7. Start
7.1 Start Hadoop first:
./start-dfs.sh
./start-yarn.sh
7.2 Start Sqoop
7.2.1 Start the server: [root@db12c sqoop]# ./bin/sqoop.sh server start
Sqoop home directory: /home/likehua/sqoop/sqoop
Setting SQOOP_HTTP_PORT:     12000
Setting SQOOP_ADMIN_PORT:     12001
Using   CATALINA_OPTS:
Adding to CATALINA_OPTS:    -Dsqoop.http.port=12000 -Dsqoop.admin.port=12001
Using CATALINA_BASE:   /home/likehua/sqoop/sqoop/server
Using CATALINA_HOME:   /home/likehua/sqoop/sqoop/server
Using CATALINA_TMPDIR: /home/likehua/sqoop/sqoop/server/temp
Using JRE_HOME:        /usr/local/jdk1.7.0
Using CLASSPATH:       /home/likehua/sqoop/sqoop/server/bin/bootstrap.jar
(The Sqoop server is a service running on Tomcat.)
[To stop the Sqoop server: ./bin/sqoop.sh server stop]
7.2.2 Start the Sqoop client:
Note: if sqoop2-shell prints warnings about Hadoop jars, the jar paths from step 4.1 are incomplete or wrong; redo that configuration and make sure common.loader stays on a single line.
Also, Sqoop2 (1.99.x) drops some of the old commands; typing sqoop on its own, for example, no longer opens a shell.
[root@db12c sqoop]# bin/sqoop.sh client
Sqoop home directory: /home/likehua/sqoop/sqoop
Sqoop Shell: Type 'help' or '\h' for help.


sqoop:000> show version --all
(Show the version: show version --all. Show connectors: show connector --all. Create a connection: create connection --cid 1.)


client version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
server version:
  Sqoop 1.99.3 revision 2404393160301df16a94716a3034e31b03e27b0b
  Compiled by mengweid on Fri Oct 18 14:15:53 EDT 2013
Protocol version:
  [1]
sqoop:000>
Main references: http://www.th7.cn/db/nosql/201510/134172.shtml
http://www.cnblogs.com/likehua/p/3825489.html
Exit with exit.
8. Exporting data from Hive to MySQL with Sqoop
Start the client:
cd /usr/local/sqoop/sqoop-1.99.6-bin-hadoop200/bin
./sqoop2-shell 
Point the client at the server:
sqoop:000> set server --host spark --port 12000 --webapp sqoop
Server is set successfully
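With the server set, the connection can be checked from the same shell; both commands below are standard Sqoop2 shell commands (output omitted here):

sqoop:000> show version --all
sqoop:000> show connector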



V. Sqoop1 installation

Installing Sqoop 1.4.6 on Hadoop 2.6.2, with Hive, HDFS, and MySQL import/export


Environment:
Ubuntu 14.04.4 amd64, JDK 1.8, Hadoop 2.6.2, Hive, HBase


Note: Sqoop2 (1.99.6) is still a bit awkward to use and may be worth revisiting once it matures; Sqoop1 (1.4.6) is simpler to operate. Either one can be used.
Sqoop2 installation is covered in part IV above.


Reference: http://www.tuicool.com/articles/FZRJbuz
1. Download:
http://www.apache.org/dyn/closer.lua/sqoop/1.4.6
http://apache.fayea.com/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz


2. Extract:
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz


3. Configure
cd /usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/conf
cp sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
Add the following, adjusting the paths to your own installs [if you have Hive, HBase, and ZooKeeper they should all be set; ideally install all of them]:
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/home/hadoop/hadoop


#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop


#set the path to where bin/hbase is available
export HBASE_HOME=/home/hadoop/hbase


#Set the path to where bin/hive is available
export HIVE_HOME=/home/hadoop/hive


#Set the path for where zookeper config dir is
export ZOOCFGDIR=/home/hadoop/zookeeper


4. Add the MySQL connector jar
cp  ~/hive/lib/mysql-connector-java-5.1.30.jar   ~/sqoop/lib/
Or download one yourself and place it in that directory.


5. Add environment variables
vi ~/.bashrc
export SQOOP_HOME=/home/hadoop/sqoop
export PATH=$PATH:$SBT_HOME/bin:$SQOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$SQOOP_HOME/lib
source ~/.bashrc


6. Test the MySQL connection
sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P
This reports an error:
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
Add ZOOKEEPER_HOME:
vi ~/.bashrc
export ZOOKEEPER_HOME=/opt/zookeeper/zookeeper
export PATH=${ZOOKEEPER_HOME}/bin:$PATH
source ~/.bashrc
Test again:
sqoop list-databases --connect jdbc:mysql://127.0.0.1:3306/ --username root -P
There are still warnings about ACCUMULO_HOME and the like; ignore them and enter the MySQL password.
Enter password: 
2016-05-18 19:16:15,336 INFO  [main] manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hive
mysql
performance_schema
test_hdfs
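A similar check lists the tables inside one database; here the metastore database hive created earlier is used as the example:

sqoop list-tables --connect jdbc:mysql://127.0.0.1:3306/hive --username root -P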


7. Importing a MySQL table into HDFS
Note: start Hadoop first, otherwise the job fails with "connection refused".
root@spark:~# /usr/local/hadoop/hadoop-2.6.2/sbin/start-dfs.sh
root@spark:~# /usr/local/hadoop/hadoop-2.6.2/sbin/start-yarn.sh
Then:
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop import -m 1  --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --target-dir /user/test111 
Notes:
-m 1 is the number of map tasks.
--target-dir must not already exist, otherwise the job fails with "directory already exists"; use --delete-target-dir to have it removed automatically (see the sketch after the reference links below).
If you hit an error like "xxx streaming xxx .close()", add --driver com.mysql.jdbc.Driver.
References: http://stackoverflow.com/questions/26375269/sqoop-error-manager-sqlmanager-error-reading-from-database-java-sql-sqlexcept
http://www.cognoschina.net/home/space.php?uid=173321&do=blog&id=121081
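As an illustration of those notes (the connection string and table are the same placeholders as in step 7; --delete-target-dir and --fields-terminated-by are standard Sqoop 1.4.x options), an import that overwrites the target directory and writes tab-separated records might look like:

sqoop import -m 1 --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --delete-target-dir --target-dir /user/test111 --fields-terminated-by '\t'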


8. Importing a MySQL table into Hive
Just append --hive-import to the step-7 command; note that the data ends up under the warehouse path specified in hive-site.xml.
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop import -m 1  --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --target-dir /user/test222 --hive-import


9. Exporting from HDFS to MySQL
Use sqoop export; --export-dir is the HDFS path and the rest is the same as step 7. Note that --table must be an empty table created in MySQL beforehand.
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop export --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --export-dir /user/test111


10. Exporting from Hive to MySQL
Same as exporting HDFS to MySQL; again --table must be an empty table created in MySQL beforehand.
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop export --connect jdbc:mysql://127.0.0.1:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable --export-dir /user/test222
A second example:
root@spark:/usr/local/sqoop/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/lib# sqoop export --connect jdbc:mysql://192.168.10.180:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable3 --export-dir /hbase/tester
2016-05-18 20:57:29,927 INFO  [main] mapreduce.ExportJobBase: Transferred 125 bytes in 50.8713 seconds (2.4572 bytes/sec)
2016-05-18 20:57:29,968 INFO  [main] mapreduce.ExportJobBase: Exported 2 records.
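One caveat worth noting (my addition, not something the article ran into): tables written by Hive use \001 (Ctrl-A) as the default field delimiter, so when exporting a multi-column Hive table you typically have to tell Sqoop about it with the standard --input-fields-terminated-by option, roughly:

sqoop export --connect jdbc:mysql://192.168.10.180:3306/traincorpus --driver com.mysql.jdbc.Driver --username root -P --table testtable3 --export-dir /hbase/tester --input-fields-terminated-by '\001'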




11、Sqoop job
sqoop job --create myjob -- import --connect jdbc:mysql://192.168.10.180:3306/test --username root --password 123456 --table mytabs --fields-terminated-by '\t'
Here myjob is the job name. Even though the password is supplied when the job is created, by default Sqoop still prompts for it at execution time; to store it in the job so later runs need no password, uncomment sqoop.metastore.client.record.password in conf/sqoop-site.xml.
Other job commands: sqoop job --list to list saved jobs; sqoop job --delete myjob to delete a job. Examples of running a saved job follow below.
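For completeness (these are standard sqoop job options in 1.4.x), a saved job can be inspected and executed like this:

sqoop job --show myjob
sqoop job --exec myjob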


See also:
http://www.th7.cn/db/mysql/201405/54683.shtml



#########################################

Troubleshooting references:

Hive problems and fixes

1. After starting hiveserver2, beeline cannot connect.
Cause: permissions.
Fix:
/user/hive/warehouse
/tmp
/history (if a job history server is configured, /history needs adjusting as well)
Hive reads and writes under these three directories at run time, so open up their permissions:
hadoop fs -chmod -R 777 /tmp
hadoop fs -chmod -R 777 /user/hive/warehouse
2. beeline reports "connection refused".
Cause: a known upstream bug.
Fix: adjust the following properties in hive-site.xml:
hive.server2.long.polling.timeout
hive.server2.thrift.bind.host (set the host to your own hostname)
3. Character set problems, garbled text, and column display-length problems.
Cause: character set mismatch.
Fix: in the MySQL database configured in hive-site.xml, run: alter database hive character set latin1;
4. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)
In this case the cause was that MySQL was not on the local machine (a local database is assumed by default), so a remote metastore has to be configured:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://lza01:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

Then start the metastore service on the Hive server: hive --service metastore




5. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es) : Specified key was too long; max key length is 767 bytes
Change the MySQL character set:
alter database hive character set latin1;
Reproduced from 云帆大数据学院 (http://www.yfteach.com), "hive安装问题及解决方法".


1 Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT…


When starting Hive, the following error appears:


Caused by: javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.
NestedThrowables:
java.sql.SQLException: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.
This is caused by a misconfiguration of the MySQL database backing the Hive metastore; it can be fixed with:


mysql> set global binlog_format='MIXED';
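To make this survive MySQL restarts (my addition, not from the original post), the same setting can also go into /etc/mysql/my.cnf under the [mysqld] section:

binlog_format = MIXED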
2 For direct MetaStore DB connections, we don't support retries at the client level.

This is the same character set problem already covered in section 7.3 of the Hive part above; the fix is:

mysql> alter database hive character set latin1;
3 HiveConf of name hive.metastore.local does not exist


When running the Hive client, the following warning appears:


WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
The hive.metastore.local property is no longer used in Hive 0.10, 0.11 and later releases; simply remove it from hive-site.xml.


4 Permission denied: user=anonymous, access=EXECUTE, inode="/tmp"


When starting Hive, the following error is reported:


(Permission denied: user=anonymous, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------
This happens because Hive lacks permission on the HDFS /tmp directory; grant it:


hadoop dfs -chmod -R 777 /tmp
5 To be continued
http://blog.csdn.net/cjfeii/article/details/49363653
