Hive/HBase/Sqoop Installation Guide

 

HIVE INSTALL

1. Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
2. Upload it to the target directory on the Linux server and extract it:

mkdir hive 
mv apache-hive-2.3.3-bin.tar.gz hive
tar -zxvf apache-hive-2.3.3-bin.tar.gz
mv apache-hive-2.3.3-bin apache-hive-2.3.3

### Install directory: /app/hive/apache-hive-2.3.3 


3. Configure environment variables
sudo vi /etc/profile
Append the following:

export HIVE_HOME=/app/hive/apache-hive-2.3.3
export PATH=$PATH:$HIVE_HOME/bin

:wq   # save and exit
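
For the new variables to take effect in the current shell, reload the profile (or open a new login shell):

source /etc/profile
echo $HIVE_HOME   # should print /app/hive/apache-hive-2.3.3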


4. Edit the Hive configuration files:
hive-env.sh (edit the existing entries; add any that are missing):

cd /app/hive/apache-hive-2.3.3/conf
cp hive-env.sh.template hive-env.sh
### Add the following to the file -- remove the leading # and change the paths to your own directories
export HADOOP_HEAPSIZE=1024
export HADOOP_HOME=/app/hadoop/hadoop-2.7.7   # Hadoop install directory
export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
export HIVE_HOME=/app/hive/apache-hive-2.3.3
export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
export JAVA_HOME=/app/lib/jdk

  

Create the HDFS directories:

cd /app/hive/apache-hive-2.3.3
mkdir hive_site_dir
cd hive_site_dir
hdfs dfs -mkdir -p warehouse   # this assumes Hadoop is already installed and running
hdfs dfs -mkdir -p tmp
hdfs dfs -mkdir -p log
hdfs dfs -chmod -R 777 warehouse
hdfs dfs -chmod -R 777 tmp
hdfs dfs -chmod -R 777 log

Create a local temp directory:
cd /app/hive/apache-hive-2.3.3
mkdir tmp

  

hive-site.xml (edit the existing entries):
cp hive-default.xml.template hive-site.xml
vi hive-site.xml
>> Configure the metastore database settings: ConnectionURL/ConnectionUserName/ConnectionPassword/ConnectionDriverName



<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>szprd</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>szprd</value>
</property>
  

>> Configure the HDFS and scratch directories


<property>
  <name>hive.exec.scratchdir</name>
  <value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
  <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
</property>

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
  <description>Local scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>

<property>
  <name>hive.querylog.location</name>
  <value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>
    Enforce metastore schema version consistency.
    True: Verify that version information stored in metastore is compatible with one from Hive jars. Also disable automatic
          schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
          proper metastore schema migration. (Default)
    False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
  </description>
</property>

After editing the configuration file, save and exit with :wq

5. Download a matching MySQL JDBC driver and copy it into the lib directory of the Hive install directory:
https://dev.mysql.com/downloads/connector/j/

6. Initialize the metastore database (be sure to run this before starting Hive for the first time; if it fails, check that the database connection settings are correct):

cd /app/hive/apache-hive-2.3.3/bin
./schematool -initSchema -dbType mysql
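
If initialization succeeds, the metastore schema version can be checked with the same tool:

cd /app/hive/apache-hive-2.3.3/bin
./schematool -dbType mysql -info   # prints the Hive distribution and metastore schema versions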

  

7. Start Hive
hive     # with the environment variable set in /etc/profile, this can be run from any directory


8. To start Hive with real-time log output (run from the bin directory of the Hive install):

./hive -hiveconf hive.root.logger=DEBUG,console

 

 

 


HBASE INSTALL


1. Download the HBase package: http://hbase.apache.org/downloads.html


2. Extract it: tar -zxvf  hbase-1.2.6.1-bin.tar.gz


3. Configure environment variables (append at the end of /etc/profile):
vi /etc/profile

#HBase Setting
export HBASE_HOME=/app/hbase/hbase-1.2.6.1
export PATH=$PATH:$HBASE_HOME/bin

  

4. Edit the configuration file hbase-env.sh:

export HBASE_MANAGES_ZK=false   # false means HBase uses an external ZooKeeper instead of the built-in one
export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids   # create this directory first if it does not exist
export JAVA_HOME=/app/lib/jdk   # JDK install directory

 

Edit the configuration file hbase-site.xml.
Add the following properties inside the configuration node:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://192.168.1.202:9000/hbase</value>
</property>

<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/vc/dev/MQ/ZK/zookeeper-3.4.12</value>
</property>

<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase</value>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>false</value>
  <description>
    Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by a rootdir with the 'file://' scheme, but be mindful of the NOTE below.
    WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures. If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
  </description>
</property>


  

5. Start ZooKeeper
Go to the bin directory under the ZooKeeper install directory and run: ./zkServer.sh start
Then start the client: ./zkCli.sh
Once connected, run: create /hbase hbase

6. Start HBase
Go to the HBase bin directory and run: ./start-hbase.sh
./hbase shell  # once the shell starts you can run HBase commands
list  # lists the tables; if it returns without errors, the installation works
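
As a quick smoke test, create, write to and scan a throwaway table from the shell (the table and column family names here are just examples):

create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:msg', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'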

7. Access the HBase web UI: http://10.28.85.149:16010/master-status   # use the server's own IP; the port is 16010

 

 


SQOOP INSTALL
1. Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/


2. Extract it: tar -zxvf  sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz

Rename the directory: mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0


3. Configure environment variables in /etc/profile:

#Sqoop Setting
export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
export PATH=$PATH:$SQOOP_HOME/bin

  

4. Copy the MySQL driver jar into the lib directory of the Sqoop install directory:

https://dev.mysql.com/downloads/connector/j/

 

5. Edit the configuration file in the conf directory of the Sqoop install directory:
vi sqoop-env.sh

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7

#set the path to where bin/hbase is available
export HBASE_HOME=/app/hbase/hbase-1.2.6.1

#Set the path to where bin/hive is available
export HIVE_HOME=/app/hive/apache-hive-2.3.3

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12

  

6. Run the following commands to verify:

sqoop help      # lists the available sqoop commands

sqoop version   # shows the sqoop version
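
To verify the database connection as well, list the databases behind a JDBC URL (the host and credentials below are the example values used later in these notes; substitute your own):

sqoop list-databases \
  --connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
  --username root \
  --password Abcd1234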

 

 

PS:

To stop HBase, use stop-hbase.sh. If you hit a pid-related error, see this post: https://blog.csdn.net/xiao_jun_0820/article/details/35222699

Hadoop installation guide: http://note.youdao.com/noteshare?id=0cae2da671de0f7175376abb8e705406

ZooKeeper installation guide: http://note.youdao.com/noteshare?id=33e37b0967da40660920f755ba2c03f0

 

 

 

 

# Hadoop pseudo-distributed installation
# Prerequisite: the JDK is already installed

# Download hadoop 2.7.7
```
cd /home/vc/dev/hadoop

wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz 
```
# Extract it

```
 tar -zxvf hadoop-2.7.7.tar.gz 
```

## Configure the Hadoop environment variables by appending the following to /etc/profile

```
# hadoop home setting 

export HADOOP_HOME=/app/hadoop/hadoop-2.7.7
export HADOOP_INSTALL=${HADOOP_HOME}
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"

```

## Edit etc/hadoop/hadoop-env.sh under the Hadoop install directory

```
# The java implementation to use.
export JAVA_HOME=/home/vc/dev/jdk/jdk1.8.0_161
```
### etc/hadoop/core-site.xml under the Hadoop install directory


```


    
    
<configuration>
    <!-- Base directory for other temporary directories -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/vc/dev/hadoop/hadoop-2.7.7/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <!-- Default file system URI -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.1.202:9000</value>
    </property>
</configuration>

                    
```
 
### Configure HDFS: etc/hadoop/hdfs-site.xml

```

        
        
<configuration>
        <!-- NameNode metadata directory -->
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/name</value>
        </property>
        <!-- Number of block replicas; 1 is enough for pseudo-distributed mode -->
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <!-- DataNode data directory -->
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:///home/vc/dev/hadoop/hadoop-2.7.7/hdfs/data</value>
        </property>
</configuration>
```
### Set up passwordless SSH. Passwordless login between Hadoop cluster nodes must be working, otherwise all kinds of problems follow.

On a single node, test passwordless login with `ssh localhost`; if it fails, run the following commands:

```
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 0600 ~/.ssh/authorized_keys

```
### Pseudo-distributed mode does not require configuring /etc/hosts; a real distributed cluster needs the hostname-to-IP mappings of all nodes.


# Starting Hadoop in pseudo-distributed mode
## Set up HDFS
```
# The first start requires formatting HDFS; answer Y to any prompts
bin/hdfs namenode -format
# Start the NameNode daemon and DataNode daemon; this brings up HDFS on the single-node cluster
sbin/start-dfs.sh
```
After HDFS is started, the NameNode status page can be browsed on the web:
![](http://one17356s.bkt.clouddn.com/18-8-24/97813052.jpg)
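
To confirm the daemons are up from the command line (assuming a single-node setup), `jps` should list the HDFS processes:

```
jps
# expected among the output: NameNode, DataNode, SecondaryNameNode
```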

```
# Create a directory on HDFS with the hadoop command
hadoop fs -mkdir /test
# Or equivalently with the hdfs command
hdfs dfs -mkdir /user

# Upload a file (example; replace localfile.txt with an existing local file)
hdfs dfs -put ./localfile.txt /test
```
![](http://one17356s.bkt.clouddn.com/18-8-24/33727958.jpg)

## Stop HDFS

```
./sbin/stop-dfs.sh

```
## Configure YARN
### etc/hadoop/mapred-site.xml

```



 
    
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>


```
### etc/hadoop/yarn-site.xml
```




  
    
<configuration>
    <!-- Enable the MapReduce shuffle service on the NodeManager -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

```




![](http://one17356s.bkt.clouddn.com/18-8-24/53993777.jpg)
![](http://one17356s.bkt.clouddn.com/18-8-24/28989509.jpg)


## Start and stop YARN

```
./sbin/start-yarn.sh
./sbin/stop-yarn.sh

```
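
With YARN running, `jps` should additionally show the ResourceManager and NodeManager; the ResourceManager web UI normally listens on port 8088:

```
jps
# expected in addition to the HDFS daemons: ResourceManager, NodeManager
# ResourceManager web UI: http://<server-ip>:8088/
```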

## Check the cluster status

```
./bin/hadoop dfsadmin  -report   
```
# A test run in pseudo-distributed mode

```
# Create a directory on the server
 mkdir ~/input
 # Go into it and copy the Hadoop config files over as sample input data
 cd ~/input
 cp /app/hadoop/hadoop-2.7.7/etc/hadoop/*.xml ./
 # Create the target directory on HDFS and upload the files under input into it
 hdfs dfs -mkdir -p /one
 hdfs dfs -put ./* /one
 # Check the files uploaded to HDFS
 hdfs dfs -ls /one
 # Run the example jar; the result directory /output must NOT already exist on HDFS, otherwise the job fails
 hadoop jar /app/hadoop/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar grep /one /output 'dfs[a-z.]+'
 # Download the result directory from HDFS to the current directory (~/input)
hdfs dfs -get /output
# View the results
cat output/*

```
--- 

# ZooKeeper (ZK) installation
# Download and extract ZooKeeper (zookeeper-3.4.9.tar.gz)
# Set the environment variables
![](http://one17356s.bkt.clouddn.com/17-11-2/30838835.jpg)
# Edit the configuration file (it lives under $ZOOKEEPER_HOME/conf/; rename zoo_sample.cfg to zoo.cfg)
Parameter notes:
- tickTime: the heartbeat interval used between ZooKeeper servers and between clients and servers; one heartbeat is sent every tickTime.
- dataDir: the directory where ZooKeeper stores its data; by default the transaction log is written here as well.
- clientPort: the port clients use to connect to the ZooKeeper server; ZooKeeper listens on it for client requests.

![](http://one17356s.bkt.clouddn.com/18-7-8/79348236.jpg)
4.1 Standalone mode
- After downloading the ZooKeeper package, extract it to a suitable directory. In the conf subdirectory, create the configuration file from the template with `cp zoo_sample.cfg zoo.cfg` and set the following parameters (a sample zoo.cfg is shown below).
- tickTime=2000 
- dataDir=/home/vc/dev/MQ/ZK/data
- dataLogDir=/home/vc/dev/MQ/ZK/log
- clientPort=2181 
## What each parameter means

- tickTime: the basic time unit used by ZooKeeper, in milliseconds.
- dataDir: the data directory; can be any directory.
- dataLogDir: the log directory; can also be any directory. If not set, it defaults to the same value as dataDir.
- clientPort: the port to listen on for client connections.
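
Putting those values together, a minimal zoo.cfg for standalone mode looks like this (the paths are the ones used above; adjust them to your own layout):

```
tickTime=2000
dataDir=/home/vc/dev/MQ/ZK/data
dataLogDir=/home/vc/dev/MQ/ZK/log
clientPort=2181
```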


# Start ZooKeeper
`/dev/Zk/zookeeper-3.4.9/bin$ ./zkServer.sh start`
![](http://one17356s.bkt.clouddn.com/17-11-2/76638495.jpg)

# Check that it is running
Use the command: `netstat -antp | grep 2181`
![](http://one17356s.bkt.clouddn.com/17-11-2/15616237.jpg)

# Connect to the ZooKeeper service with zkCli.sh

```
 ./zkCli.sh -server localhost:2181   # connect to the local ZooKeeper service
 history                             # show the command history
 quit                                # disconnect the client from the server
 
```
![](http://one17356s.bkt.clouddn.com/18-8-27/4122129.jpg)

# Stop the ZooKeeper service
`./zkServer.sh stop`

---



# [HIVE / SQOOP / HBASE installation blog post](https://www.cnblogs.com/DFX339/p/9550213.html)

# HIVE-INSTALL
- Download the package: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.3.3/
- Upload it to the target directory on Linux and extract it:  
	
```
mkdir hive    
mv apache-hive-2.3.3-bin.tar.gz  hive
tar -zxvf apache-hive-2.3.3-bin.tar.gz
mv apache-hive-2.3.3-bin apache-hive-2.3.3
### Install directory: /app/hive/apache-hive-2.3.3
```

- Configure environment variables:

```
sudo vi /etc/profile
# append the following two lines:
export HIVE_HOME=/app/hive/apache-hive-2.3.3
export PATH=$PATH:$HIVE_HOME/bin
:wq    # save and exit
```

- Edit the Hive configuration files:
	- hive-env.sh (edit the existing entries; add any that are missing):	
	
    ```
    cd /app/hive/apache-hive-2.3.3/conf
    cp hive-env.sh.template   hive-env.sh
    # Add the following (remove the leading # and change the paths to your own directories)
    export HADOOP_HEAPSIZE=1024
    export HADOOP_HOME=/app/hadoop/hadoop-2.7.7   # Hadoop install directory
    export HIVE_CONF_DIR=/app/hive/apache-hive-2.3.3/conf
    export HIVE_HOME=/app/hive/apache-hive-2.3.3
    export HIVE_AUX_JARS_PATH=/app/hive/apache-hive-2.3.3/lib
    export JAVA_HOME=/app/lib/jdk
    ```

	
- Create the HDFS directories:

    ```
    cd /app/hive/apache-hive-2.3.3
    mkdir hive_site_dir
    cd hive_site_dir
    hdfs dfs -mkdir -p warehouse   # this assumes Hadoop is already installed and running
    hdfs dfs -mkdir -p tmp
    hdfs dfs -mkdir -p log
    hdfs dfs -chmod -R 777 warehouse
    hdfs dfs -chmod -R 777 tmp
    hdfs dfs -chmod -R 777 log
    # Create a local temp directory:
    cd  /app/hive/apache-hive-2.3.3
    mkdir  tmp
    ```

	
- hive-site.xml (edit the existing entries):	

    ```
    cp hive-default.xml.template   hive-site.xml 
    vi hive-site.xml
    ```

	- Configure the metastore database settings: ConnectionURL/ConnectionUserName/ConnectionPassword/ConnectionDriverName
	
    ```
    
    
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://10.28.85.149:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>szprd</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>szprd</value>
    </property>
    ```


- Configure the HDFS directories

    ```
    
    <property>
        <name>hive.exec.scratchdir</name>
        <value>/app/hive/apache-hive-2.3.3/hive_site_dir/tmp</value>
        <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/app/hive/apache-hive-2.3.3/hive_site_dir/warehouse</value>
    </property>

    <property>
        <name>hive.exec.local.scratchdir</name>
        <value>/app/hive/apache-hive-2.3.3/tmp/${system:user.name}</value>
        <description>Local scratch space for Hive jobs</description>
    </property>

    <property>
        <name>hive.downloaded.resources.dir</name>
        <value>/app/hive/apache-hive-2.3.3/tmp/${hive.session.id}_resources</value>
        <description>Temporary local directory for added resources in the remote file system.</description>
    </property>

    <property>
        <name>hive.querylog.location</name>
        <value>/app/hive/apache-hive-2.3.3/hive_site_dir/log/${system:user.name}</value>
        <description>Location of Hive run time structured log file</description>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
        <description>
          Enforce metastore schema version consistency.
          True: Verify that version information stored in metastore is compatible with one from Hive jars.  Also disable automatic
                schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
                proper metastore schema migration. (Default)
          False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
        </description>
    </property>
        
      
    ```

  
**After editing hive-site.xml, save and exit with :wq**
	
- Download a matching MySQL JDBC driver and put it into the lib directory of the Hive install directory:
	https://dev.mysql.com/downloads/connector/j/
	
- Initialize the metastore database (run this before starting Hive for the first time; if it fails, check that the database connection settings are correct):
	
    ```
    cd  /app/hive/apache-hive-2.3.3/bin
    ./schematool -initSchema -dbType mysql
    ```

	 
- Start Hive

	`hive   # with the environment variable set in /etc/profile, this can be run from any directory`
	
- To start Hive with real-time log output, run from the bin directory of the Hive install: `./hive -hiveconf hive.root.logger=DEBUG,console`
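
Once the CLI comes up, a quick smoke test confirms that the metastore connection works (the database name demo_db is just an example):

```
hive -e "create database if not exists demo_db; show databases;"
```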

--- 	 
	 
	 
# HBASE INSTALL
- [Download the HBase package](http://hbase.apache.org/downloads.html)


- Extract it: `tar -zxvf  hbase-1.2.6.1-bin.tar.gz`

- Configure environment variables (append at the end of /etc/profile):

    ```
    vi  /etc/profile
    #HBase Setting
    export HBASE_HOME=/app/hbase/hbase-1.2.6.1
    export PATH=$PATH:$HBASE_HOME/bin
    ```

- Edit the configuration file `hbase-env.sh`:
    
    ```
    # Defaults to true (use the built-in ZooKeeper); false means use an external ZooKeeper
    export HBASE_MANAGES_ZK=false
    export HBASE_PID_DIR=/app/hadoop/hadoop-2.7.7/pids   # create this directory first if it does not exist
    export JAVA_HOME=/app/lib/jdk   # JDK install directory
    ```

 
- Edit the configuration file `hbase-site.xml` and add the following properties inside the configuration node:

```


<!-- Replication factor (matches the HDFS setting) -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>

<!-- HBase data directory on HDFS -->
<property>
   <name>hbase.rootdir</name>
   <value>hdfs://10.28.85.149:9000/hbase</value>
</property>

<!-- ZooKeeper client port -->
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>

<!-- ZooKeeper data directory -->
<property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/app/zookeeper/data</value>
</property>

<!-- Root znode for HBase in ZooKeeper -->
<property>
    <name>zookeeper.znode.parent</name>
    <value>/hbase</value>
</property>

<!-- Run HBase in distributed mode -->
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>

<property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>true</value>
    <description>
        Controls whether HBase will check for stream capabilities (hflush/hsync). Disable this if you intend to run on LocalFileSystem, denoted by a rootdir with the 'file://' scheme, but be mindful of the NOTE below.
        WARNING: Setting this to false blinds you to potential data loss and inconsistent system state in the event of process and/or node failures. If HBase is complaining of an inability to use hsync or hflush it's most likely not a false positive.
    </description>
</property>
         


```


- Start ZooKeeper
Go to the bin directory under the ZooKeeper install directory and run:  `./zkServer.sh start`

Then start the client: `./zkCli.sh`

Once connected, run: `create /hbase hbase`

- Start HBase

Go to the HBase bin directory and run: `./start-hbase.sh`
		
```
./hbase shell   # once the shell starts you can run HBase commands
list  # list all tables in the HBase instance; no errors means the installation works
```
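
A couple of built-in shell commands give a quick health check of the cluster (run them inside `hbase shell`):

```
status    # number of live/dead region servers and average load
version   # the HBase version the shell is connected to
```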

					
- Access the HBase web UI: http://10.28.85.149:16010/master-status   (use the server's own IP; the port is 16010)
		
--- 



# SQOOP INSTALL

- [Download the package](https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/)


- Extract it: ` tar -zxvf  sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz`

    Rename the directory: `mv sqoop-1.4.7.bin__hadoop-2.6.0 sqoop-1.4.7_hadoop-2.6.0`
		
- Configure environment variables (in /etc/profile):

    ```
    #Sqoop Setting
    export SQOOP_HOME=/app/sqoop/sqoop-1.4.7_hadoop-2.6.0
    export PATH=$PATH:$SQOOP_HOME/bin
    ```


- Copy the MySQL driver jar into the lib directory of the Sqoop install directory
    Download: https://dev.mysql.com/downloads/connector/j/

- Edit the configuration file in the conf directory of the Sqoop install directory:

```
vi sqoop-env.sh

#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/app/hadoop/hadoop-2.7.7

#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/app/hadoop/hadoop-2.7.7

#set the path to where bin/hbase is available
export HBASE_HOME=/app/hbase/hbase-1.2.6.1

#Set the path to where bin/hive is available
export HIVE_HOME=/app/hive/apache-hive-2.3.3

#Set the path for where zookeper config dir is
export ZOOCFGDIR=/app/zookeeper/zookeeper-3.4.12
```


- Test the Sqoop installation
	- sqoop help  # lists the available sqoop commands
	
	- Test the connection: list all databases visible through the connection
	
    ```
    sqoop list-databases \
    --connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
    --username root \
    --password Abcd1234
    ```
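
From here, a typical next step is importing a table into Hive. A minimal sketch (the table name mysql_table_demo is hypothetical; the connection details are the example values used above):

```
sqoop import \
  --connect jdbc:mysql://10.28.85.148:3306/data_mysql2hive \
  --username root \
  --password Abcd1234 \
  --table mysql_table_demo \
  --hive-import \
  -m 1
```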



--- 

# Oozie installation
# Based on the oozie-4.0.0-cdh5.3.6.tar.gz release
Prerequisites:
- a working MySQL database
- a working Hadoop cluster
- the compiled Oozie package already contains a Tomcat environment in `oozie-server`, so no separate Tomcat install is needed
## Installation
- Download the pre-built package: `wget http://archive.cloudera.com/cdh5/cdh/5/oozie-4.0.0-cdh5.3.6.tar.gz`
- Extract it into the target directory: `tar -zxvf oozie-4.0.0-cdh5.3.6.tar.gz`; the directory used here is `/app/oozie`
- Set the global environment variables: `sudo vim /etc/profile`
```

#oozie setting
export OOZIE_HOME=/app/oozie/oozie-4.0.0-cdh5.3.6
export PATH=$PATH:$OOZIE_HOME/bin
```

- Set the environment variables in `<Oozie install dir>/conf/oozie-env.sh`.
The port of the Oozie web console is also set here:
`OOZIE_HTTP_PORT` sets the listening port of the Oozie web service; the default is 11000.
```

export OOZIE_CONF=${OOZIE_HOME}/conf
export OOZIE_DATA=${OOZIE_HOME}/data
export OOZIE_LOG=${OOZIE_HOME}/logs
export CATALINA_BASE=${OOZIE_HOME}/oozie-server
export CATALINA_HOME=${OOZIE_HOME}/oozie-server
```

- Create a libext folder in the Oozie root directory (`mkdir libext`) and move the third-party jars Oozie depends on into it:
    
    - Copy the downloaded ext-2.2 archive into libext: `cp ext-2.2.zip oozie-4.0.0-cdh5.3.6/libext/`
    - Copy the Hadoop jars into libext; from inside libext run `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/*.jar ./` and `cp /app/hadoop/hadoop-2.7.7/share/hadoop/*/lib/*.jar ./`
    - Add the JDBC driver for the MySQL database that stores the metadata (`mysql-connector-java-5.1.41.jar`)

- Configure the Oozie proxy user in Hadoop's core-site.xml.
Just replace xxx with the user name that submits Oozie jobs:
    - hadoop.proxyuser.**xxx**.hosts
    
    - hadoop.proxyuser.**xxx**.groups
```


<property>
    <name>hadoop.proxyuser.imodule.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.imodule.groups</name>
    <value>*</value>
</property>
```

- Install the Oozie sharelib (the shared jar directory) on HDFS.

The default Hadoop NameNode port is 8020; it was changed to 9000 here, so note the port in the URL below.

One problem encountered: the NameNode was in safe mode, and safe mode had to be turned off first: `hdfs dfsadmin -safemode leave`

```
 oozie-setup.sh sharelib create -fs hdfs://10.28.85.149:9000 -locallib oozie-sharelib-4.0.0-cdh5.3.6-yarn.tar.gz
```
- Build the Oozie war file

With the Hadoop jars, the MySQL driver and the ext archive already in the libext folder, run `oozie-setup.sh prepare-war` to build the war.


- Edit conf/oozie-site.xml under the Oozie install directory.

Set the oozie.service.HadoopAccessorService.hadoop.configurations property to the local Hadoop configuration directory:
```
 

    <property>
        <name>oozie.services</name>
        <value>
            org.apache.oozie.service.JobsConcurrencyService,
            org.apache.oozie.service.SchedulerService,
            org.apache.oozie.service.InstrumentationService,
            org.apache.oozie.service.MemoryLocksService,
            org.apache.oozie.service.CallableQueueService,
            org.apache.oozie.service.UUIDService,
            org.apache.oozie.service.ELService,
            org.apache.oozie.service.AuthorizationService,
            org.apache.oozie.service.UserGroupInformationService,
            org.apache.oozie.service.HadoopAccessorService,
            org.apache.oozie.service.URIHandlerService,
            org.apache.oozie.service.DagXLogInfoService,
            org.apache.oozie.service.SchemaService,
            org.apache.oozie.service.LiteWorkflowAppService,
            org.apache.oozie.service.JPAService,
            org.apache.oozie.service.StoreService,
            org.apache.oozie.service.CoordinatorStoreService,
            org.apache.oozie.service.SLAStoreService,
            org.apache.oozie.service.DBLiteWorkflowStoreService,
            org.apache.oozie.service.CallbackService,
            org.apache.oozie.service.ActionService,
            org.apache.oozie.service.ShareLibService,
            org.apache.oozie.service.ActionCheckerService,
            org.apache.oozie.service.RecoveryService,
            org.apache.oozie.service.PurgeService,
            org.apache.oozie.service.CoordinatorEngineService,
            org.apache.oozie.service.BundleEngineService,
            org.apache.oozie.service.DagEngineService,
            org.apache.oozie.service.CoordMaterializeTriggerService,
            org.apache.oozie.service.StatusTransitService,
            org.apache.oozie.service.PauseTransitService,
            org.apache.oozie.service.GroupsService,
            org.apache.oozie.service.ProxyUserService,
            org.apache.oozie.service.XLogStreamingService,
            org.apache.oozie.service.JvmPauseMonitorService
        </value>
    </property>

    <!-- Point Oozie at the local Hadoop configuration directory -->
    <property>
        <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
        <value>*=/app/hadoop/hadoop-2.7.7/etc/hadoop</value>
    </property>

    <!-- Let Oozie create its metastore schema automatically -->
    <property>
        <name>oozie.service.JPAService.create.db.schema</name>
        <value>true</value>
    </property>

    <!-- JDBC connection to the MySQL metastore -->
    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://10.28.85.148:3306/ooize?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>root</value>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>Abcd1234</value>
    </property>


```

- Start the Oozie service and check that the installation works:
`oozied.sh run` or `oozied.sh start` (the former runs in the foreground, the latter in the background)
- Stop the Oozie service: `oozied.sh stop`
- Check the Oozie web status from the command line (`oozie admin -oozie http://10.28.85.149:11000/oozie -status`); it should return `System mode: NORMAL`
- List the installed sharelibs with the shareliblist command: `oozie admin -shareliblist -oozie http://localhost:11000/oozie`
- Web UI: `http://10.28.85.149:11000/oozie/`
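
To avoid repeating the `-oozie` URL on every CLI call, the Oozie client also honors the OOZIE_URL environment variable (a small convenience, assuming the same host and port as above):

```
export OOZIE_URL=http://10.28.85.149:11000/oozie
oozie admin -status
oozie admin -shareliblist
```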

**One problem encountered**

```
Sep 03, 2018 4:36:47 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NullPointerException
        at org.apache.jsp.index_jsp._jspInit(index_jsp.java:25)
        at org.apache.jasper.runtime.HttpJspBase.init(HttpJspBase.java:52)
        at org.apache.jasper.servlet.JspServletWrapper.getServlet(JspServletWrapper.java:164)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:340)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:723)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.AuthFilter$2.doFilter(AuthFilter.java:154)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:594)
        at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:553)
        at org.apache.oozie.servlet.AuthFilter.doFilter(AuthFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.oozie.servlet.HostnameFilter.doFilter(HostnameFilter.java:84)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:620)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
        at java.lang.Thread.run(Thread.java:745)
```
The cause is that both the webapp's `WEB-INF/lib` directory and Tomcat's lib directory contain servlet-api.jar and jsp-api.jar.
`/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib` and `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/lib` hold the same jars, which causes the conflict. The directory `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server` is the Tomcat environment of oozie-server, and its lib directory holds Tomcat's runtime jars.

Fix: delete the three files servlet-api-2.5-6.1.14.jar, servlet-api-2.5.jar and jsp-api-2.1.jar from `/app/oozie/oozie-4.0.0-cdh5.3.6/oozie-server/webapps/oozie/WEB-INF/lib`.

After that, Oozie starts cleanly.
![](http://one17356s.bkt.clouddn.com/18-9-3/48205608.jpg)

---		

# Pig installation
# Prerequisites
### hadoop 2.7.7 is installed
### JDK 1.7+
# Install
```
tar -xzvf pig-0.17.0.tar.gz

# Pig setting
export PIG_HOME=/app/pig/pig-0.17.0
export PATH=$PATH:$PIG_HOME/bin
```
# Test
```
-- local mode
pig -x local
-- mapreduce mode
pig -x mapreduce
```
![](http://one17356s.bkt.clouddn.com/18-8-28/13040171.jpg)

---
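
Once the grunt shell is up, a tiny Pig Latin script verifies that Pig can read from HDFS (the path /one reuses the directory created in the Hadoop test above; the relation names are just examples):

```
A = LOAD '/one' USING PigStorage();
B = LIMIT A 5;
DUMP B;
```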

 

Reposted from: https://www.cnblogs.com/DFX339/p/9550213.html
