Hadoop + Kylin Cluster Deployment: Process and Issues

I. Pre-installation setup for Hadoop

Versions:

hadoop:2.7.1

hbase:1.3.1

hive:2.3.0  

kylin:2.0.0  

jdk:1.8.0

1. Create the user and group
> groupadd hadoop
> useradd -d /home/hadoop -g hadoop -m -s /bin/bash hadoop
> passwd hadoop

2. Set up passwordless SSH login for this user
> rm ~/.ssh/*
> cd ~/.ssh
> ssh-keygen -t rsa -P  ""
> cat id_rsa.pub >> authorized_keys
> chmod  644  authorized_keys
> chmod  700  ~/.ssh/
> ssh localhost
#Passwordless login on the local machine is done; next, push the key from this machine to the other nodes
scp ~/.ssh/id_rsa.pub hadoop@hadoop-03:~/
#Run on the hadoop-03 machine
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
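If ssh-copy-id is available on the source host, the same key distribution can be done in one step (equivalent to the scp/cat pair above), and the result verified right away:
> ssh-copy-id hadoop@hadoop-03
> ssh hadoop@hadoop-03 hostname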

 

3. Install the JDK and configure environment variables
> vi /etc/profile
#set java environment
JAVA_HOME=/opt/jdk1.8.0_144
JRE_HOME=/opt/jdk1.8.0_144/jre
CLASS_PATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_HOME JRE_HOME CLASS_PATH PATH
> source /etc/profile
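A quick sanity check that the JDK and the variables above are picked up:
> java -version
> echo $JAVA_HOME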

 

II. Hadoop installation

> su hadoop
1. Unpack Hadoop
> tar zxvf hadoop-2.7.1.tar.gz -C /opt/hadoop
2. Set environment variables
> vi ~/.bash_profile
# set hadoop path
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.1
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
> source ~/.bash_profile
3. Create directories
> mkdir /opt/hadoop/hadoop-2.7.1/tmp
> mkdir /opt/hadoop/hadoop-2.7.1/hdfs
> mkdir /opt/hadoop/hadoop-2.7.1/hdfs/data
> mkdir /opt/hadoop/hadoop-2.7.1/hdfs/name
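The four mkdir calls above are equivalent to a single command using bash brace expansion:
> mkdir -p /opt/hadoop/hadoop-2.7.1/{tmp,hdfs/data,hdfs/name}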

4. Edit the configuration files

a.core-site.xml

<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-02:9000</value>
    <description>HDFS URI, in the form filesystem://namenode-host:port</description>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/hadoop-2.7.1/tmp</value>
    <description>Local Hadoop temp directory on the namenode</description>
</property>
<property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>hadoop</value>
    <description>Required for remote JDBC connections to Hive; the "hadoop" in the property name and in the value are the Linux account/group allowed to operate Hadoop</description>
</property>
<property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
    <description>Controls which hosts may access Hadoop through the proxy user</description>
</property>


b.hadoop-env.sh

export JAVA_HOME=/opt/jdk1.8.0_144
export HADOOP_PID_DIR=/opt/hadoop/pid
export HADOOP_MAPRED_PID_DIR=/opt/hadoop/pid
Comment out the following block:
# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.
#for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
#  if [ "$HADOOP_CLASSPATH" ]; then
#    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
#  else
#    export HADOOP_CLASSPATH=$f
#  fi
#done


c.hdfs-site.xml

<property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/hadoop-2.7.1/hdfs/name</value>
    <description>Where the namenode stores HDFS namespace metadata</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/hadoop-2.7.1/hdfs/data</value>
    <description>Physical location of data blocks on the datanode</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Replica count; the default is 3 and it should not exceed the number of datanodes</description>
</property>

d.mapred-env.sh

export JAVA_HOME=/opt/jdk1.8.0_144
export HADOOP_MAPRED_PID_DIR=/opt/hadoop/pid


e.mapred-site.xml

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-02:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-02:19888</value>
</property>
<property>
    <name>mapreduce.jobhistory.joblist.cache.size</name>
    <value>15000</value>
</property>


f.yarn-env.sh

export JAVA_HOME=/opt/jdk1.8.0_144
export YARN_PID_DIR=/opt/hadoop/pid


g.yarn-site.xml

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.2.242:8099</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>

5. Format the namenode
> hdfs namenode -format

6. Start the daemons
> start-dfs.sh
> start-yarn.sh
# Kylin needs to connect to the job history server
> mr-jobhistory-daemon.sh start historyserver
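Besides jps (next step), cluster health can also be checked with the standard Hadoop CLI, for example:
> hdfs dfsadmin -report
> yarn node -list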

7. jps should show the following processes


21716 NodeManager
21270 DataNode
21144 NameNode
21437 SecondaryNameNode
21615 ResourceManager

III. Hive installation


1. Unpack Hive
> tar zxvf apache-hive-2.3.0-bin.tar.gz -C /opt/hadoop

2. Set environment variables
> vi ~/.bash_profile
export HIVE_HOME=/opt/hadoop/apache-hive-2.3.0-bin
export PATH=$PATH:$HIVE_HOME/bin
> source ~/.bash_profile

3. Install MySQL (steps omitted)
4. Copy the MySQL JDBC driver into Hive's lib directory
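For example (the connector jar name and version here are only illustrative; use the MySQL JDBC driver jar you actually downloaded):
> cp mysql-connector-java-5.1.44-bin.jar /opt/hadoop/apache-hive-2.3.0-bin/lib/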
5. Create the Hive config files from the templates
> cp hive-env.sh.template hive-env.sh
> cp hive-default.xml.template hive-site.xml
> cp hive-log4j2.properties.template hive-log4j2.properties
> cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties

6. Edit hive-env.sh
## Java path
export JAVA_HOME=/opt/jdk1.8.0_144
## Hadoop install path
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.1
## Hive install path
export HIVE_HOME=/opt/hadoop/apache-hive-2.3.0-bin
## Hive config path
export HIVE_CONF_DIR=/opt/hadoop/apache-hive-2.3.0-bin/conf

7. Create the following directories in HDFS and grant permissions
> hdfs dfs -mkdir -p /user/hive/warehouse
> hdfs dfs -mkdir -p /user/hive/tmp
> hdfs dfs -mkdir -p /user/hive/log
> hdfs dfs -chmod -R  777  /user/hive/warehouse
> hdfs dfs -chmod -R  777  /user/hive/tmp
> hdfs dfs -chmod -R  777  /user/hive/log

8. Edit hive-site.xml
<property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value>
</property>
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
</property>
<property>
    <name>hive.querylog.location</name>
    <value>/user/hive/log</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
</property>
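Since the MySQL installation itself was skipped in step 3, here is a minimal sketch of creating the account that matches the hive/hive credentials above (run in the MySQL client as an administrative user; adjust host and password to your environment):
> mysql -u root -p
mysql> CREATE USER 'hive'@'localhost' IDENTIFIED BY 'hive';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hive'@'localhost';
mysql> FLUSH PRIVILEGES;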

9. Create a local tmp directory
> mkdir /opt/hadoop/apache-hive-2.3.0-bin/tmp

Then, in hive-site.xml, replace every occurrence of ${system:java.io.tmpdir} with /opt/hadoop/apache-hive-2.3.0-bin/tmp, and every occurrence of ${system:user.name} with ${user.name}.
10. Initialize the Hive metastore

> schematool -dbType mysql -initSchema


This step may fail with: Duplicate key name 'PCS_STATS_IDX'
Drop the metastore database in MySQL and rerun the initialization, for example:
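Assuming the MySQL root account may drop the database:
> mysql -u root -p -e "DROP DATABASE metastore;"
> schematool -dbType mysql -initSchema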

IV. HBase installation

1. Unpack HBase
> tar zxvf hbase-1.3.1-bin.tar.gz -C /opt/hadoop

2. Set environment variables
> vi ~/.bash_profile
export HBASE_HOME=/opt/hadoop/hbase-1.3.1
export PATH=$HBASE_HOME/bin:$PATH
> source ~/.bash_profile

3. Edit hbase-env.sh
export JAVA_HOME=/opt/jdk1.8.0_144/
export HBASE_MANAGES_ZK=false
export HBASE_PID_DIR=/opt/hadoop/pid

4. Edit hbase-site.xml
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-02:9000/hbase</value>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>zookeeper-01</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>

5. Start HBase
> start-hbase.sh
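A quick way to confirm HBase is serving requests (standard HBase shell command) before moving on:
> echo "status" | hbase shell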

6. Check with jps


16498 DataNode
16950 NodeManager
17286 JobHistoryServer
17882 HQuorumPeer
17947 HMaster
18075 HRegionServer
16845 ResourceManager
16670 SecondaryNameNode
16367 NameNode

V. Kylin installation


1. Unpack Kylin
> tar zxvf apache-kylin-2.0.0-bin-hbase1x.tar.gz -C /opt/


2. Configure environment variables
export KYLIN_HOME=/opt/apache-kylin-2.0.0-bin
export CLASSPATH=$CLASSPATH:$KYLIN_HOME/lib
export PATH=$KYLIN_HOME/bin:$PATH
3. Generate the keystore file
cd /opt/apache-kylin-2.0.0-bin/tomcat/conf
keytool -genkey -alias tomcat -keyalg RSA -keystore .keystore -dname "CN=192.168.2.242, OU=192.168.2.242, O=192.168.2.242, L=ZH, ST=CN" -keypass changeit -storepass changeit

4. Start Kylin
> kylin.sh start
Note: if the installed Kylin is version 2.1 or later, startup fails with a "method not found" error related to joda; in that case edit
find-hive-dependency.sh as follows:
-hive_lib=`find -L ${hive_lib_dir} -name '*.jar' ! -name '*calcite*' -printf '%p:' | sed 's/:$//'`
+hive_lib=`find -L ${hive_lib_dir} -name '*.jar' ! -name '*calcite*' ! -name '*jackson-datatype-joda*' -printf '%p:' | sed 's/:$//'`

5. Visit http://hostname:7070/kylin. The default credentials are ADMIN/KYLIN.
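If the page does not load, the Kylin startup log is usually the quickest way to see why (path assumes the default log location under the install directory):
> tail -f $KYLIN_HOME/logs/kylin.log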

Cluster startup order:
> start-dfs.sh
> start-yarn.sh
> mr-jobhistory-daemon.sh start historyserver
> start-hbase.sh
> kylin.sh start
> /usr/bin/nohup hive --service metastore & # port 9083
> /usr/bin/nohup hive --service hiveserver2 & # ports 10000 and 10002
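To confirm that hiveserver2 and the hadoop.proxyuser.* settings from core-site.xml work end to end, a remote JDBC connection can be tested with beeline (it ships with Hive; host and port are the ones used above):
> beeline -u jdbc:hive2://hadoop-02:10000 -n hadoop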

Cluster shutdown order:
> kylin.sh stop
> stop-hbase.sh
> mr-jobhistory-daemon.sh stop historyserver
> stop-yarn.sh
> stop-dfs.sh
