Setting up a pseudo-distributed learning environment on Ubuntu 16.04: Hadoop 2.7.2 + Spark 1.6.1 + MySQL + Hive 2.0.0


Just follow the steps:


(1)raini@biyuzhe:~$ gedit .bashrc

#java
export JAVA_HOME=/home/raini/app/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export PATH=${JAVA_HOME}/bin:$JRE_HOME/bin:$PATH

#scala
export SCALA_HOME=/home/raini/app/scala-2.10.6
export PATH=${SCALA_HOME}/bin:$PATH

#spark
export SPARK_HOME=/home/raini/spark1
export PATH=$PATH:$SPARK_HOME/bin

# hadoop
export HADOOP_PREFIX=/home/raini/hadoop2
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/:$HADOOP_PREFIX/bin:$PATH"
export HADOOP_PREFIX PATH CLASSPATH
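
After saving .bashrc, reload it and run a quick sanity check that the tools resolve (a minimal check, assuming the JDK, Scala, Spark and Hadoop archives are already unpacked at the paths above):

raini@biyuzhe:~$ source ~/.bashrc
raini@biyuzhe:~$ java -version      # should report 1.7.0_79
raini@biyuzhe:~$ scala -version     # should report 2.10.6
raini@biyuzhe:~$ hadoop version     # should report 2.7.2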


(2)raini@biyuzhe:~$ sudo apt-get install rsync

(3)raini@biyuzhe:~$ sudo apt-get install openssh-server

cd ~/.ssh/   # if this directory does not exist, run ssh localhost once first
ssh-keygen -t rsa   # just press Enter at every prompt
cat id_rsa.pub >> authorized_keys  # authorize the key
Try ssh localhost to check that you can now log in without a password.
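
If ssh localhost still asks for a password, the .ssh permissions are the usual culprit (an optional fix-up sketch):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost   # should now log in without a password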

(4)raini@biyuzhe:~$ sudo gedit /etc/hosts

127.0.0.1    localhost
127.0.1.1    biyuzhe
#10.155.243.206  biyuzhe
# Some guides say this line must be changed, otherwise connection-refused errors can appear later

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
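
An optional check that the hostname resolves the way the daemons will see it:

raini@biyuzhe:~$ hostname               # should print biyuzhe
raini@biyuzhe:~$ getent hosts biyuzhe   # should print 127.0.1.1 biyuzhe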

(5)Edit the configuration file etc/hadoop/hadoop-env.sh:

    export JAVA_HOME=/home/raini/app/jdk
    export HADOOP_COMMON_HOME=/home/raini/hadoop

(6)raini@biyuzhe:~$ gedit .bashrc

    Add the Hadoop bin and sbin directories to PATH, e.g.:
    export PATH="/home/raini/hadoop/bin:/home/raini/hadoop/sbin:$JAVA_HOME/:$HADOOP_PREFIX/bin:$PATH"

(7)Edit etc/hadoop/core-site.xml:


   
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/raini/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>

    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>

    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>

    <property>
        <name>hadoop.proxyuser.master.hosts</name>
        <value>*</value>
    </property>

    <property>
        <name>hadoop.proxyuser.master.groups</name>
        <value>*</value>
    </property>
</configuration>




(8)Edit etc/hadoop/hdfs-site.xml:


 
 
    
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/raini/hadoop/tmp/dfs/namenode</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/raini/hadoop/tmp/dfs/datanode</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
    




(9)Edit etc/hadoop/mapred-site.xml:



   
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>localhost:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>localhost:19888</value>
    </property>
</configuration>
    





(10)Edit etc/hadoop/yarn-site.xml:

 



   
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>localhost:8033</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>localhost:8088</value>
    </property>
</configuration>
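
Before starting the daemons (and after sourcing the environment in step 11), the effective settings can be spot-checked with hdfs getconf (an optional sanity check):

raini@biyuzhe:~/hadoop$ bin/hdfs getconf -confKey fs.defaultFS     # expect hdfs://localhost:9000
raini@biyuzhe:~/hadoop$ bin/hdfs getconf -confKey dfs.replication  # expect 1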
   





(11)    raini@biyuzhe:~$ source .bashrc
    
    raini@biyuzhe:~/hadoop$ sbin/start-dfs.sh

Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-namenode-biyuzhe.out
biyuzhe: starting datanode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-datanode-biyuzhe.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:7Th7Qu6av5WOqmmVLemv3YN+52LAcHw4BuFBNwBt5DU.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-secondarynamenode-biyuzhe.out
raini@biyuzhe:~/hadoop$ jps
14242 Jps
14106 SecondaryNameNode
13922 DataNode ------------------ (no NameNode!)


(12) The NameNode is missing because HDFS has not been formatted yet. Format it, then restart the daemons:

raini@biyuzhe:~/hadoop$ hdfs namenode -format

raini@biyuzhe:~/hadoop$ sbin/stop-dfs.sh
Stopping namenodes on [localhost]
localhost: no namenode to stop
biyuzhe: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode

raini@biyuzhe:~/hadoop$ sbin/start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-namenode-biyuzhe.out
biyuzhe: starting datanode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-datanode-biyuzhe.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/raini/app/hadoop-2.7.2/logs/hadoop-raini-secondarynamenode-biyuzhe.out

raini@biyuzhe:~/hadoop$ jps
14919 NameNode ----------------------- (NameNode is now running)
15407 Jps
15271 SecondaryNameNode
15073 DataNode


(13)raini@biyuzhe:~/hadoop$ sbin/start-yarn.sh  

starting yarn daemons
starting resourcemanager, logging to /home/raini/hadoop/logs/yarn-raini-resourcemanager-biyuzhe.out
biyuzhe: starting nodemanager, logging to /home/raini/app/hadoop-2.7.2/logs/yarn-raini-nodemanager-biyuzhe.out
raini@biyuzhe:~/hadoop$ jps
15625 NodeManager
14919 NameNode
15271 SecondaryNameNode
15073 DataNode
15937 Jps
15501 ResourceManager


(14)Verify in the browser:  YARN: http://localhost:8088/

    Hadoop NameNode UI: http://localhost:50070
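
If no browser is handy, both UIs can be probed from the shell as well (assuming curl is installed; a 200 or a 3xx redirect means the daemon is up):

raini@biyuzhe:~$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070
raini@biyuzhe:~$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8088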
        

Started: Sat Apr 23 14:04:17 CST 2016
Version: 2.7.2, rb165c4fe8a74265c792ce23f546c64604acf0e41
Compiled: 2016-01-26T00:08Z by jenkins from (detached from b165c4f)
Cluster ID: CID-b0ad8d51-6ea3-4bfc-a1d8-ee0cbc9a8ff6
Block Pool ID: BP-890697487-127.0.1.1-1461391390144


-------------------------------- Spark installation

(2)Configure the Spark environment variables:

export SPARK_HOME=/home/raini/spark
export PATH=${SPARK_HOME}/bin:$PATH

(3)Configure spark-env.sh:
export JAVA_HOME=/home/raini/app/jdk
export SCALA_HOME=/home/raini/app/scala
export SPARK_WORKER_MEMORY=4g
export SPARK_MASTER_IP=biyuzhe
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_WORKER_CORES=2
export HADOOP_CONF_DIR=/home/raini/hadoop/etc/hadoop

(4)cp slaves.template slaves

#localhost
biyuzhe


Start Spark: spark/sbin/start-all.sh
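
After start-all.sh, jps should additionally list a Master and a Worker, and a one-liner in spark-shell confirms that the cluster actually runs jobs (a minimal sketch; the master URL follows the spark-env.sh settings above):

raini@biyuzhe:~$ jps                                     # expect Master and Worker alongside the Hadoop daemons
raini@biyuzhe:~$ spark-shell --master spark://biyuzhe:7077
scala> sc.parallelize(1 to 1000).sum()                   // should return 500500.0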




------------------------- MySQL and Hive 2.0.0 installation

1) Install MySQL

$sudo apt-get install mysql-server

Log in to MySQL: $ mysql -u root -p

Create the hive database: mysql> create database hive;

                mysql> show databases;   -- confirm it was created

Be sure to change the character set of the hive database to latin1, and do it before Hive is started for the first time (otherwise delete operations will hang later):
                 mysql> alter database hive character set latin1;
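
The change can be confirmed with (optional check):
                 mysql> show create database hive;
                 (the output should show DEFAULT CHARACTER SET latin1)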

Create the hive user and grant privileges: mysql> grant all on hive.* to hive@'%' identified by 'hive';

(Alternative: mysql> DROP USER 'hive'@'%';

         mysql> create user 'hive'@'%' identified by 'hive';
         then grant privileges: mysql> grant all privileges on *.* to 'hive'@'%' with grant option;

     )

Reload the privilege tables: mysql> flush privileges;

 Check the MySQL version: mysql> select version();   -- 5.7.11-0ubuntu6 here
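
An optional check that the new account works before wiring it into Hive (the password 'hive' matches the grant above):

raini@biyuzhe:~$ mysql -u hive -phive -e 'show databases;'   # the hive database should be listed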

Download the MySQL JDBC driver from http://dev.mysql.com/downloads/connector/j/

Download mysql-connector-java-5.1.38.tar.gz and copy the MySQL JDBC driver jar into Hive's lib directory.
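
A sketch of that copy step (the exact file names depend on the version actually downloaded):

raini@biyuzhe:~$ tar -zxf mysql-connector-java-5.1.38.tar.gz
raini@biyuzhe:~$ cp mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar /home/raini/app/hive-2.0.0/lib/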

2) Install Hive

Download apache-hive-2.0.0-bin.tar.gz from http://hive.apache.org/ and extract it under the home directory (here /home/raini/app/hive-2.0.0).

Environment configuration

Add the following to .bashrc:
#Hive
export HIVE_HOME=/home/raini/app/hive-2.0.0
export PATH=$PATH:${HIVE_HOME}/bin
export CLASSPATH=$CLASSPATH:${HIVE_HOME}/lib

3) Configure hive-env.sh

Copy hive-env.sh.template to hive-env.sh and edit it.

Set HADOOP_HOME and HIVE_CONF_DIR as follows:

HADOOP_HOME=/home/.../hadoop

export HIVE_CONF_DIR=/home/.../hive/conf

# export HADOOP_HEAPSIZE=512

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/home/raini/app/hive-2.0.0/lib


4) Configure hive-site.xml

      Hive uses Hadoop, so:
 
    you must have Hadoop in your path OR
    export HADOOP_HOME=
 
In addition, you must create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) and set them chmod g+w in HDFS before you can create a table in Hive.
 
Commands to perform this setup (the directories need to be writable, hence the chmod below):

raini@biyuzhe:~$ hadoop fs -mkdir -p  /user/hive/tmp
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/log
raini@biyuzhe:~$ hadoop fs -mkdir -p /user/hive/warehouse
raini@biyuzhe:~$ hadoop fs -chmod g+w   /user/hive/tmp

raini@biyuzhe:~$ hadoop fs -chmod g+w   /user/hive/log
raini@biyuzhe:~$ hadoop fs -chmod g+w   /user/hive/warehouse
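
An optional check that the permissions took effect:

raini@biyuzhe:~$ hadoop fs -ls /user/hive
(log, tmp and warehouse should now show group write permission, e.g. drwxrwxr-x)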

You may find it useful, though it's not necessary, to set HIVE_HOME:
 
  $ export HIVE_HOME=
    export HIVE_HOME=/home/raini/app/hive

$ sudo /etc/init.d/mysql status

Make sure mysql-connector-java-5.1.38-bin.jar is present under hive/lib.

5) Hive configuration: store the metadata in MySQL. Hive keeps its metadata in an RDBMS; by default it is configured to use the embedded Derby database.






<configuration>
    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
        <description>Store the metadata in a MySQL server on this machine; a local MySQL server must be running.</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connection string for the metastore database (characterEncoding=UTF-8 can be appended to the URL if needed).</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>JDBC driver class used for the connection.</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>MySQL user name.</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive</value>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>Warehouse directory in HDFS (created above with hadoop fs -mkdir).</description>
    </property>

    <property>
        <name>hive.exec.scratchdir</name>
        <value>/user/hive/tmp</value>
        <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
    </property>

    <property>
        <name>hive.querylog.location</name>
        <value>/user/hive/log</value>
        <description>Directory used to store Hive query logs.</description>
    </property>

    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>



-------------------------------------------finish hive-site.xml

cp hive-log4j.properties.template  hive-log4j.properties

vi hive-log4j.properties

hive.log.dir=

This sets where Hive writes its log files at runtime.

(mine: hive.log.dir=/usr/hive/log/${user.name})

hive.log.file=hive.log

This is the name of the Hive log file.

The default is fine, as long as you can recognize it as the log file.

One setting, however, should be changed, otherwise a warning appears:

log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

Without this change you will see:

WARNING: org.apache.hadoop.metrics.EventCounter is deprecated.

please use org.apache.hadoop.log.metrics.EventCounter  in all the  log4j.properties files.

(Just change it as the warning suggests.)

-------------------------------------------------------finish all


Command to start the Hive metastore server:
hive --service metastore -p <port>

raini@biyuzhe:~/app/hive/tmp$ hive --service metastore > /tmp/hive_metastore.log 2>&1 &
[1] 26856


Here the Hive metastore (metadata store) runs in local mode, not remote mode.

Error:
    Exception in thread "main" java.lang.RuntimeException: Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql)


On first use the metastore schema must be initialized: raini@biyuzhe:~$ schematool -dbType mysql -initSchema
raini@biyuzhe:~$ schematool -initSchema -dbType mysql -userName=hive -passWord=hive



View the schema info after initialization: $ schematool -dbType mysql -info

Start the Hadoop services: $ sbin/start-dfs.sh and $ sbin/start-yarn.sh


Start Hive: raini@biyuzhe:~/app$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/raini/app/hive2.0.0/lib/hive-jdbc-2.0.0-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/raini/app/hive2.0.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/raini/app/spark-1.6.1-bin-hadoop2.6/lib/spark-assembly-1.6.1-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/raini/app/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/home/raini/app/hive2.0.0/lib/hive-common-2.0.0.jar!/hive-log4j2.properties
Sun Apr 24 11:25:41 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Apr 24 11:25:41 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Apr 24 11:25:41 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Apr 24 11:25:41 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Apr 24 11:25:43 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Apr 24 11:25:43 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Apr 24 11:25:43 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Sun Apr 24 11:25:43 CST 2016 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
hive (default)> show databases;
OK
default
Time taken: 1.017 seconds, Fetched: 1 row(s)
hive (default)>

hive (default)> create table test(id int, name string) row format delimited FIELDS TERMINATED BY ',';

This fails with: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:For direct MetaStore DB connections, we don't support retries at the client level.)


Start the metastore service:

raini@biyuzhe:~/app$ hive --service metastore
Starting Hive Metastore Server


hive (default)> create table test(id int, name string) row format delimited FIELDS TERMINATED BY ',';
OK
Time taken: 1.613 seconds
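
With the table created, a quick round trip verifies the whole chain (a throwaway example; /tmp/test.csv is just a file invented here):

raini@biyuzhe:~$ printf '1,alice\n2,bob\n' > /tmp/test.csv

hive (default)> LOAD DATA LOCAL INPATH '/tmp/test.csv' INTO TABLE test;
hive (default)> select * from test;    -- should print the two rows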

The table's metadata can now be seen in MySQL:

raini@biyuzhe:~$ mysql -u hive -p

mysql> select* from TBLS
    -> ;
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
|     41 |  1461469991 |     1 |                0 | raini |         0 |    41 | test     | MANAGED_TABLE | NULL               | NULL               |
+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+
1 row in set (0.00 sec)


Check the files created in HDFS:

raini@biyuzhe:~$ hdfs dfs -ls /user/hive/warehouse/
Found 1 items
drwxrwxrwx   - raini supergroup          0 2016-04-24 11:53 /user/hive/warehouse/test











