This article walks through a basic big data environment for testing purposes. For big data theory, please study on your own; this is hands-on environment work, and the configuration shown is only enough to verify that the stack works end to end. Real production deployments need far more detailed and specific settings. I encourage you to try the steps yourself.
1. Machine and OS preparation
CentOS 7, minimal install, with networking and the firewall configured; the machines need outbound internet access.
Three machines: one master and two slaves, with hostnames master, slave01 and slave02.
Add the hosts entries (on all three machines):
cat /etc/hosts
192.168.1.10 master
192.168.1.11 slave01
192.168.1.12 slave02
Next, disable the firewall and SELinux:
setenforce 0
systemctl stop firewalld
systemctl disable firewalld
sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
Configure the yum repositories and install a few basic tools:
yum install wget -y
cd /etc/yum.repos.d/
wget http://mirrors.aliyun.com/repo/Centos-7.repo
wget http://mirrors.aliyun.com/repo/epel-7.repo
yum -y install epel-release
yum install net-tools -y
yum install tree -y
Configure passwordless SSH between the three machines.
Enable RSA/public-key authentication for sshd:
vi /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
Then restart sshd:
systemctl restart sshd
Create the hadoop group and user:
groupadd hadoop
useradd -m -g hadoop hadoop
echo "hadoop" |passwd --stdin hadoop 或者直接passwd hadoop 输入密码hadoop
Switch to the hadoop user:
su hadoop
cd /home/hadoop/
ssh-keygen -t rsa    # generates an RSA key pair; just press Enter at each prompt to accept the defaults
After the keys are generated, you will see:
.ssh
├── id_rsa
└── id_rsa.pub #the public key; the server uses its contents to verify the connecting client
cd .ssh/
touch authorized_keys
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
chmod 600 id_rsa
chmod 700 /home/hadoop/.ssh
Append the id_rsa.pub public keys from slave01 and slave02 to authorized_keys on master, as shown below.
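One way to do that, assuming the key pairs have already been generated on both slaves the same way as above (run on master as the hadoop user; each command prompts once for the hadoop password on the slave):
ssh hadoop@slave01 cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
ssh hadoop@slave02 cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys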
Then copy the authorized_keys file, which now contains all three machines' public keys, to the slave machines:
scp authorized_keys hadoop@slave01:/home/hadoop/.ssh/
scp authorized_keys hadoop@slave02:/home/hadoop/.ssh/
Then restart sshd on all machines: systemctl restart sshd
After that, the machines can SSH to each other without a password.
The basic environment above must be configured consistently across all three machines; make sure it is correct.
##############################################
2. Install the JDK and Hadoop
This walkthrough uses jdk-8u151-linux-x64.tar.gz, the latest JDK 8 at the time (downloaded from the Oracle site).
Unpack the JDK first (creating /usr/java requires root; the remaining steps are done as the hadoop user):
cd /usr/
mkdir java
cd java/
tar zxf jdk-8u151-linux-x64.tar.gz
Download Hadoop:
cd /home/hadoop/
mkdir bigdata
cd bigdata/
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
tar -zxf hadoop-2.7.5.tar.gz
mv hadoop-2.7.5 hadoop
Set the user environment variables:
vi /home/hadoop/.bashrc
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_USER_NAME=hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
source /home/hadoop/.bashrc    # load the variables
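A quick check that the variables took effect:
java -version      # should report 1.8.0_151
hadoop version     # should report Hadoop 2.7.5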
Next, edit the Hadoop configuration files.
Create the data directories:
## The JDK, the environment variables and these directories must also be set up on the slaves; once the Hadoop configuration is finished on master, copy everything over to the slave machines.
cd /home/hadoop/bigdata/
mkdir -p data/hadoop/tmp
mkdir -p data/hadoop/hdfs/datanode
mkdir -p data/hadoop/hdfs/namenode
vi /home/hadoop/bigdata/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/bigdata/data/hadoop/tmp</value>
  </property>
</configuration>
core-site.xml sets each node's temporary directory (hadoop.tmp.dir); it is best to create that directory yourself beforehand.
vi /home/hadoop/bigdata/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/bigdata/data/hadoop/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/bigdata/data/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
hdfs-site.xml sets the DataNode and NameNode data directories, which should also be created in advance; dfs.replication is set to 2 here to match the two DataNodes.
vi /home/hadoop/bigdata/hadoop/etc/hadoop/mapred-site.xml    # if this file does not exist, create it from mapred-site.xml.template
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
vi /home/hadoop/bigdata/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
vi /home/hadoop/bigdata/hadoop/etc/hadoop/slaves
slave01
slave02
The settings above are only the basic parameters; add resource-allocation and performance options according to your workload, for example the sketch below.
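A purely illustrative example of such tuning, added to yarn-site.xml and assuming roughly 4 GB of RAM per node; the property names are standard YARN settings, the values are placeholders:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>3072</value>
</property>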
Once master is configured, copy hosts, .bashrc, /home/hadoop/bigdata/hadoop and the data directory to the corresponding locations on slave01 and slave02 (/etc/hosts has to be copied as root, since the hadoop user cannot write to /etc):
scp /etc/hosts root@slave01:/etc/hosts
scp /etc/hosts root@slave02:/etc/hosts
#scp -r /usr/java/jdk1.8.0_151 hadoop@slave01:/usr/java/
#scp -r /usr/java/jdk1.8.0_151 hadoop@slave02:/usr/java/
scp /home/hadoop/.bashrc hadoop@slave01:/home/hadoop/
scp /home/hadoop/.bashrc hadoop@slave02:/home/hadoop/
scp -r /home/hadoop/bigdata/hadoop hadoop@slave01:/home/hadoop/bigdata/
scp -r /home/hadoop/bigdata/hadoop hadoop@slave02:/home/hadoop/bigdata/
Finally, on the slave machines, run: source /home/hadoop/.bashrc    # load the variables
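Before starting the cluster for the first time, format the NameNode on master (a one-time step; re-running it later would wipe the HDFS metadata):
hdfs namenode -format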
Then start the Hadoop cluster; on master run:
cd /home/hadoop/bigdata/hadoop/sbin
sh start-all.sh
[hadoop@master sbin]$ jps
2713 ResourceManager
2362 NameNode
5053 Jps
2558 SecondaryNameNode
[hadoop@slave01 sbin]$ jps
2769 NodeManager
3897 Jps
2565 DataNode
At this point the Hadoop cluster is up.
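Optionally, run a quick HDFS smoke test (the paths are arbitrary examples); the NameNode web UI at http://master:50070 and the ResourceManager UI at http://master:8088 should also be reachable:
hdfs dfsadmin -report            # both DataNodes should be listed
hadoop fs -mkdir -p /tmp/smoketest
hadoop fs -put /etc/hosts /tmp/smoketest/
hadoop fs -ls /tmp/smoketest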
##########################################
Hive deployment
wget https://mirrors.aliyun.com/apache/hive/stable/apache-hive-1.2.2-bin.tar.gz
cd /home/hadoop/bigdata/
tar zxf apache-hive-1.2.2-bin.tar.gz
mv apache-hive-1.2.2-bin hive
Edit the configuration:
cd /home/hadoop/bigdata/hive/conf
cp hive-default.xml.template hive-site.xml
cp hive-env.sh.template hive-env.sh
cp hive-log4j.properties.template hive-log4j.properties
vi hive-env.sh
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HIVE_CONF_DIR=/home/hadoop/bigdata/hive/conf
vi hive-log4j.properties
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/home/hadoop/bigdata/hive/log
hive.log.file=hive.log
vi hive-site.xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://master:9000/user/hive/warehouse</value>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>hdfs://master:9000/user/hive/scratchdir</value>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/hadoop/bigdata/hive/tmp</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/hadoop/bigdata/hive/tmp</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/home/hadoop/bigdata/hive/tmp</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/home/hadoop/bigdata/hive/logs</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hivemeta?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://master:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
The Hive configuration is where errors most often come up; adjust it according to the error messages you hit while starting and using Hive.
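Note that the hive-site.xml above points the metastore at a MySQL database on master; installing MySQL/MariaDB, creating that database and user, and dropping the MySQL JDBC driver into Hive's lib directory are prerequisites not shown above. A rough sketch (the driver jar version below is only an example):
# on master, as root: install and start MariaDB (CentOS 7 ships MariaDB in place of MySQL)
yum install -y mariadb-server
systemctl start mariadb && systemctl enable mariadb
# create the metastore database and the hive user referenced in hive-site.xml
mysql -uroot -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hivemeta;
GRANT ALL PRIVILEGES ON hivemeta.* TO 'hive'@'%' IDENTIFIED BY '123456';
FLUSH PRIVILEGES;
SQL
# copy the MySQL JDBC driver into Hive's lib directory (version is just an example)
cp mysql-connector-java-5.1.44.jar /home/hadoop/bigdata/hive/lib/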
hadoop fs -mkdir -p /user/hive/scratchdir
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /user/hive/scratchdir
hadoop fs -chmod g+w /user/hive/warehouse
Start the metastore and hiveserver2 services:
nohup hive --service metastore &
nohup hive --service hiveserver2 &
[hadoop@master bin]$ hive
Logging initialized using configuration in file:/home/hadoop/bigdata/hive/conf/hive-log4j.properties
hive> show databases;
OK
default
fucktime
Time taken: 1.14 seconds, Fetched: 2 row(s)
hive>
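To verify hiveserver2 as well, a beeline connection can be attempted (10000 is the default hiveserver2 port):
beeline -u jdbc:hive2://master:10000 -n hadoop
# then, at the beeline prompt:
show databases;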
##########################################
3. Spark, ZooKeeper and HBase deployment
Configure everything on master first, then scp it to the slaves.
First download the software, using the Aliyun mirror:
cd /home/hadoop/bigdata/
wget https://mirrors.aliyun.com/apache/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
wget https://mirrors.aliyun.com/apache/zookeeper/stable/zookeeper-3.4.10.tar.gz
wget https://mirrors.aliyun.com/apache/hbase/stable/hbase-1.2.6-bin.tar.gz
wget https://www.scala-lang.org/files/archive/scala-2.10.4.tgz
Extract everything into the bigdata directory:
cd /home/hadoop/bigdata/
tar zxf spark-2.2.1-bin-hadoop2.7.tgz
tar zxf scala-2.10.4.tgz
tar zxf zookeeper-3.4.10.tar.gz
tar zxf hbase-1.2.6-bin.tar.gz
mv spark-2.2.1-bin-hadoop2.7 spark
mv scala-2.10.4 scala
mv zookeeper-3.4.10 zk
mv hbase-1.2.6 hbase
Add the corresponding environment variables to the hadoop user's .bashrc:
export HIVE_HOME=/home/hadoop/bigdata/hive
export PATH=$PATH:$HIVE_HOME/bin
export SCALA_HOME=/home/hadoop/bigdata/scala
export PATH=$PATH:$SCALA_HOME/bin
export SPARK_HOME=/home/hadoop/bigdata/spark
export PATH=$PATH:$SPARK_HOME/bin
export ZK_HOME=/home/hadoop/bigdata/zk
export PATH=$PATH:$ZK_HOME/bin
export HBASE_HOME=/home/hadoop/bigdata/hbase
export PATH=$PATH:$HBASE_HOME/bin
source /home/hadoop/.bashrc
Copy it to the slave machines:
scp /home/hadoop/.bashrc hadoop@slave01:/home/hadoop/
scp /home/hadoop/.bashrc hadoop@slave02:/home/hadoop/
source /home/hadoop/.bashrc    # run this on each slave as well
******************************************************
Edit the Spark configuration:
cd /home/hadoop/bigdata/spark/conf
cp spark-env.sh.template spark-env.sh
vi spark-env.sh
export SCALA_HOME=/home/hadoop/bigdata/scala
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_CONF_DIR=/home/hadoop/bigdata/hadoop/etc/hadoop
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/home/hadoop/bigdata/spark
SPARK_DRIVER_MEMORY=512M
cp slaves.template slaves
vi slaves
slave01
slave02
Copy Spark to the slave machines:
cd /home/hadoop/bigdata/
scp -r spark hadoop@slave01:/home/hadoop/bigdata/
scp -r spark hadoop@slave02:/home/hadoop/bigdata/
cd /home/hadoop/bigdata/spark/sbin
sh start-all.sh
[hadoop@master sbin]$ jps
2713 ResourceManager
2362 NameNode
1268 Master
5053 Jps
2558 SecondaryNameNode
[hadoop@slave01 sbin]$ jps
2769 NodeManager
3897 Jps
25623 Worker
2565 DataNode
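To confirm the standalone cluster actually runs jobs, the bundled SparkPi example can be submitted to it; the master web UI at http://master:8080 should show the registered Workers and the completed application:
cd /home/hadoop/bigdata/spark
./bin/run-example --master spark://master:7077 SparkPi 10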
*********************************************
Edit the ZooKeeper configuration:
cd /home/hadoop/bigdata/zk/conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
dataDir=/home/hadoop/bigdata/zk/zkdata
dataLogDir=/home/hadoop/bigdata/zk/zkdatalog
server.1=master:2888:3888
server.2=slave01:2888:3888
server.3=slave02:2888:3888
echo "1" > /home/hadoop/bigdata/zkdata/myid
Copy zk to the slave machines:
cd /home/hadoop/bigdata/
scp -r zk hadoop@slave01:/home/hadoop/bigdata/
scp -r zk hadoop@slave02:/home/hadoop/bigdata/
On the slave machines, set each one's myid:
echo "2" > /home/hadoop/bigdata/zk/zkdata/myid    # on slave01
echo "3" > /home/hadoop/bigdata/zk/zkdata/myid    # on slave02
Start zkServer on each node:
cd /home/hadoop/bigdata/zk/bin/
./zkServer.sh start
Check the status:
sh zkServer.sh status
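One node should report Mode: leader and the other two Mode: follower. Optionally, connect with the bundled CLI to make sure the ensemble answers:
./zkCli.sh -server master:2181
# at the zk prompt:
ls /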
******************************************************
Edit the HBase configuration:
cd /home/hadoop/bigdata/hbase/conf
vi hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave01,slave02</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/hadoop/bigdata/zk/zkdata</value>
  </property>
</configuration>
vi regionservers
slave01
slave02
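Because ZooKeeper already runs as a separate service on all three nodes, HBase should be told not to manage its own ZooKeeper, otherwise start-hbase.sh would try to launch a second ensemble on the same machines. In the same conf directory (JAVA_HOME here matches the path used earlier):
vi hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HBASE_MANAGES_ZK=false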
Copy hbase to the slave machines:
cd /home/hadoop/bigdata/
scp -r hbase hadoop@slave01:/home/hadoop/bigdata/
scp -r hbase hadoop@slave02:/home/hadoop/bigdata/
Start HBase on master:
cd /home/hadoop/bigdata/hbase/bin
sh start-hbase.sh
Check the status:
hbase shell
status
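status should report one active master and two region servers. As a further check, a throwaway table can be created inside the shell (the table and column-family names are arbitrary):
create 't1', 'cf'
put 't1', 'row1', 'cf:a', 'value1'
scan 't1'
disable 't1'
drop 't1'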
Summary: very few problems came up during this walkthrough. When something does go wrong, it is almost always a matter of the XML configuration, and adjusting the configuration usually gets the environment working again. This article can also be used together with https://blog.51cto.com/superleedo/1894519 when building the environment.
Good luck with your own setup!