Hadoop/Spark Platform Setup

Add user

useradd ITS-Hadoop
passwd ITS-Hadoop

Passwordless SSH access

ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
chmod 700 ~/.ssh
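
A quick check that key-based login now works on this node (it should print the hostname without asking for a password):

ssh localhost hostname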

Switch to root

su
/root/script/scpfloder.sh /home/ITS-Hadoop/.ssh /home/ITS-Hadoop/
/root/script/runcommand.sh "chown -R ITS-Hadoop:ITS-Hadoop /home/ITS-Hadoop/.ssh"
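
The helper scripts scpfloder.sh and runcommand.sh are not reproduced in this document; a minimal sketch of what they presumably do is shown below, with the slave hostnames (hadoop5, hadoop6) being assumptions:

#!/bin/bash
# scpfloder.sh <src-dir> <dest-parent-dir>: hypothetical sketch, copies a directory tree to every other node
for host in hadoop5 hadoop6; do
    scp -r "$1" "$host":"$2"
done

#!/bin/bash
# runcommand.sh "<command>": hypothetical sketch, runs a command on every other node
for host in hadoop5 hadoop6; do
    ssh "$host" "$1"
done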

Configure the base software on the master node

Hadoop

chown -R ITS-Hadoop:ITS-Hadoop /usr/local/hadoop-2.6.0

Edit the configuration files

cd /usr/local/hadoop-2.6.0/etc/hadoop/
ls

capacity-scheduler.xml      hadoop-policy.xml           mapred-env.cmd
configuration.xsl           hdfs-site.xml               mapred-env.sh
container-executor.cfg      httpfs-env.sh               mapred-queues.xml.template
core-site.xml               httpfs-log4j.properties     mapred-site.xml
.core-site.xml.swn          httpfs-signature.secret     mapred-site.xml.template
.core-site.xml.swo          httpfs-site.xml             slaves
.core-site.xml.swp          kms-acls.xml                ssl-client.xml.example
hadoop-env.cmd              kms-env.sh                  ssl-server.xml.example
hadoop-env.sh               kms-log4j.properties        yarn-env.cmd
hadoop-metrics2.properties  kms-site.xml                yarn-env.sh
hadoop-metrics.properties   log4j.properties            yarn-site.xml

The files that need to be configured are:

# core-site.xml hadoop-env.sh hdfs-site.xml yarn-env.sh yarn-site.xml slaves
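
For reference, a minimal sketch of core-site.xml and hdfs-site.xml; it assumes hadoop7 is the NameNode host (port 9000 and the replication factor are also assumptions) and reuses the /home/ITS-Hadoop/tmp and /home/ITS-Hadoop/dfs directories created later in this document:

vi core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop7:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/ITS-Hadoop/tmp</value>
        </property>
    </configuration>

vi hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/ITS-Hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/ITS-Hadoop/dfs/data</value>
        </property>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
    </configuration>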

ZooKeeper

chown -R ITS-Hadoop:ITS-Hadoop /usr/local/zookeeper-3.4.6

Edit the configuration files

cd /usr/local/zookeeper-3.4.6/
vi conf/zoo.cfg

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial 
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between 
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just 
    # example sakes.
    dataDir=/usr/local/zookeeper-3.4.6/var/data
    dataLogDir=/usr/local/zookeeper-3.4.6/var/datalog
    # the port at which the clients will connect
    clientPort=2181
    server.1=hadoop5:2888:3888
    server.2=hadoop6:2888:3888
    server.3=hadoop7:2888:3888

vi var/data/myid

    3
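
The dataDir and dataLogDir referenced in zoo.cfg must exist before ZooKeeper starts, and each node's myid has to match its server.N line in zoo.cfg (hadoop5 gets 1, hadoop6 gets 2, and this master host, hadoop7 per server.3, gets 3). A short sketch:

mkdir -p /usr/local/zookeeper-3.4.6/var/data /usr/local/zookeeper-3.4.6/var/datalog
echo 3 > /usr/local/zookeeper-3.4.6/var/data/myid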

HBase

chown -R ITS-Hadoop:ITS-Hadoop /usr/local/hbase-1.1.4

Edit the configuration files

cd /usr/local/hbase-1.1.4/conf/
ls

hadoop-metrics2-hbase.properties  hbase-env.sh      hbase-site.xml    regionservers
hbase-env.cmd                     hbase-policy.xml  log4j.properties

The files that need to be configured are:

# hbase-env.sh hbase-site.xml regionservers
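
For reference, a minimal hbase-site.xml sketch; it assumes the hdfs://hadoop7:9000/hbase root and the /home/ITS-Hadoop/hbase/tmp directory created later, and the ZooKeeper quorum from zoo.cfg above. In hbase-env.sh, HBASE_MANAGES_ZK is set to false because ZooKeeper is started separately in this setup:

vi hbase-site.xml

    <configuration>
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://hadoop7:9000/hbase</value>
        </property>
        <property>
            <name>hbase.cluster.distributed</name>
            <value>true</value>
        </property>
        <property>
            <name>hbase.zookeeper.quorum</name>
            <value>hadoop5,hadoop6,hadoop7</value>
        </property>
        <property>
            <name>hbase.tmp.dir</name>
            <value>/home/ITS-Hadoop/hbase/tmp</value>
        </property>
    </configuration>

vi hbase-env.sh

    export JAVA_HOME=/usr/local/java/jdk1.7
    export HBASE_MANAGES_ZK=false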

Spark

chown -R ITS-Hadoop:ITS-Hadoop /usr/local/scala-2.10.4
chown -R ITS-Hadoop:ITS-Hadoop /usr/local/spark-1.4.1-bin-hadoop2.6

Edit the configuration files

cd /usr/local/spark-1.4.1-bin-hadoop2.6/conf/
ls

derby.log          log4j.properties             slaves
docker.properties  metrics.properties           spark-defaults.conf
fairscheduler.xml  metrics.properties.template  spark-env.sh

The files that need to be configured are:

# spark-defaults.conf spark-env.sh slaves
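
A minimal sketch of the Spark files; the master host (hadoop7), the worker memory and the slave list are assumptions, and the event-log directory reuses the /sparkLog HDFS path created later:

vi spark-env.sh

    export JAVA_HOME=/usr/local/java/jdk1.7
    export SCALA_HOME=/usr/local/scala-2.10.4
    export HADOOP_CONF_DIR=/usr/local/hadoop-2.6.0/etc/hadoop
    export SPARK_MASTER_IP=hadoop7
    export SPARK_WORKER_MEMORY=2g

vi spark-defaults.conf

    spark.master            spark://hadoop7:7077
    spark.eventLog.enabled  true
    spark.eventLog.dir      hdfs://hadoop7:9000/sparkLog

vi slaves

    hadoop5
    hadoop6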

Edit the environment variable configuration file

vi /etc/profile

export JAVA_HOME=/usr/local/java/jdk1.7
export JRE_HOME=/usr/local/java/jdk1.7/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_DEV_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

export ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH

export HBASE_HOME=/usr/local/hbase-1.1.4
export PATH=$HBASE_HOME/bin:$PATH

export SPARK_HOME=/usr/local/spark-1.4.1-bin-hadoop2.6
export PATH=$SPARK_HOME/bin:$PATH
export SPARK_EXAMPLES_JAR=$SPARK_HOME/lib/spark-examples-1.4.1-hadoop2.6.0.jar

export SCALA_HOME=/usr/local/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH

export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$SPARK_HOME/lib:$HIVE_HOME/lib:$HBASE_HOME/lib:$SCALA_HOME/lib
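
After editing /etc/profile, reload it and sanity-check the basic tools:

source /etc/profile
java -version
hadoop version
scala -version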

Clean up the original installation files

Omitted...

Copy the files to the other nodes (as the ITS-Hadoop user)

# copy each installation directory with scpfloder.sh, then fix its ownership on the remote nodes with runcommand.sh
~/script/scpfloder.sh /usr/local/hadoop-2.6.0 /usr/local
~/script/runcommand.sh "chown -R ITS-Hadoop:ITS-Hadoop /usr/local/hadoop-2.6.0"
~/script/scpfloder.sh /usr/local/zookeeper-3.4.6 /usr/local
~/script/runcommand.sh "chown -R ITS-Hadoop:ITS-Hadoop /usr/local/zookeeper-3.4.6"
~/script/scpfloder.sh /usr/local/hbase-1.1.4 /usr/local
~/script/runcommand.sh "chown -R ITS-Hadoop:ITS-Hadoop /usr/local/hbase-1.1.4"
~/script/scpfloder.sh /usr/local/scala-2.10.4 /usr/local
~/script/runcommand.sh "chown -R ITS-Hadoop:ITS-Hadoop /usr/local/scala-2.10.4"
~/script/scpfloder.sh /usr/local/spark-1.4.1-bin-hadoop2.6 /usr/local
~/script/runcommand.sh "chown -R ITS-Hadoop:ITS-Hadoop /usr/local/spark-1.4.1-bin-hadoop2.6"

Edit the ZooKeeper myid file on the other nodes

Omitted...

Create the required directories

mkdir /home/ITS-Hadoop/hbase /home/ITS-Hadoop/hbase/logs /home/ITS-Hadoop/hbase/tmp
~/script/runcommand.sh "mkdir /home/ITS-Hadoop/hbase /home/ITS-Hadoop/hbase/logs /home/ITS-Hadoop/hbase/tmp"
mkdir /home/ITS-Hadoop/dfs /home/ITS-Hadoop/dfs/name /home/ITS-Hadoop/dfs/log /home/ITS-Hadoop/data /home/ITS-Hadoop/tmp
~/script/runcommand.sh "mkdir /home/ITS-Hadoop/dfs /home/ITS-Hadoop/dfs/name /home/ITS-Hadoop/dfs/log /home/ITS-Hadoop/dfs/data /home/ITS-Hadoop/dfs/tmp"

Format HDFS

/usr/local/hadoop-2.6.0/bin/hdfs namenode -format 

Before re-formatting, first kill all the Java processes shown by jps (and delete the files under /tmp).

hadoop dfs -mkdir /hbase
hadoop dfs -mkdir /sparkLog
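
Once HDFS is up, a quick check that the directories exist and the DataNodes have registered (hdfs dfs is the non-deprecated form of hadoop dfs):

hdfs dfs -ls /
hdfs dfsadmin -report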

Install Thrift

The Thrift compiler is written in C++, so before building it make sure the system has a working C++ toolchain and install the dependency packages listed below.

yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel

Download and extract the Thrift source package

wget http://archive.apache.org/dist/thrift/0.9.1/thrift-0.9.1.tar.gz
tar xf thrift-0.9.1.tar.gz
cd thrift-0.9.1
./configure
make
make install
thrift --help

Generate the HBase Thrift bindings

thrift --gen py /usr/local/hbase-1.1.4-src/hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

This creates a gen-py directory under the current directory.
Hbase.py defines the methods available to an HBase client.
ttypes.py defines the data types used by the client for transport.
Copy the generated hbase package into Python's site-packages directory (the exact path depends on your Python installation):
on some systems it is /usr/lib/python2.7/site-packages/, on others /usr/local/python27/lib/python2.7/site-packages.

cp -r gen-py/hbase /usr/local/python27/lib/python2.7/site-packages  

Start HBase and the Thrift service

./bin/start-hbase.sh  
./bin/hbase-daemon.sh start thrift  
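
The HBase Thrift server listens on port 9090 by default; a quick check that it is up:

netstat -tlnp | grep 9090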

Install Redis

# yum install redis -y
wget http://download.redis.io/releases/redis-3.2.0.tar.gz
tar xf redis-3.2.0.tar.gz
cd redis-3.2.0
make

The redis-server command is installed under /usr/local/bin (after make install); if it is not found there, add the directory containing it to PATH.
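
A quick start-and-check, assuming the server is run from the src/ directory of the build tree (use the copies in /usr/local/bin instead if make install has been run):

src/redis-server --daemonize yes
src/redis-cli ping    # should print PONG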

Install the Redis Python package

Install easy_install

( https://pypi.python.org/pypi/setuptools#downloads )
wget --no-check-certificate https://bootstrap.pypa.io/ez_setup.py
python ez_setup.py --insecure
# output: Installing easy_install script to /usr/local/python27/bin

Create a symlink

ln -s /usr/local/python27/bin/easy_install /usr/local/bin/

Install the redis package

easy_install redis

Install pip

Install Ganglia

Install the Ganglia services

# install on all nodes
yum install ganglia-gmond -y
# install on the master node only; as with gmond, install EPEL first if the local repositories do not provide gmetad
yum install ganglia-gmetad -y  

Install gweb prerequisites (master node)

## first install Apache and PHP
yum install httpd php -y
vim /etc/php.d/json.ini

    extension=json.so

Install gweb (master node)

wget http://downloads.sourceforge.net/project/ganglia/ganglia-web/3.7.1/ganglia-web-3.7.1.tar.gz
tar -xf ganglia-web-3.7.1.tar.gz
cd ganglia-web-3.7.1

Edit the Makefile

vim Makefile
    # change the defaults:
    GDESTDIR = /var/www/html/ganglia2
    APACHE_USER = apache
    # Note: GDESTDIR and APACHE_USER must match the DocumentRoot and the apache user set in the Apache configuration file (/etc/httpd/conf/httpd.conf)

make install

Django

tar xf Django-1.6.11.tar
cd Django-1.6.11
python setup.py install

Copy the project directories

scp -r project/gugou 10.2.15.107:~/project
scp -r project/BYSJ 10.2.15.107:~/project

Start the cluster

#start hadoop
/usr/local/hadoop-2.6.0/sbin/start-dfs.sh
/usr/local/hadoop-2.6.0/sbin/start-yarn.sh

# start hbase
## start zookeeper
/usr/local/zookeeper-3.4.6/bin/zkServer.sh start
ssh hadoop5 "/usr/local/zookeeper-3.4.6/bin/zkServer.sh start"
ssh hadoop6 "/usr/local/zookeeper-3.4.6/bin/zkServer.sh start"
## start hbase
/usr/local/hbase-1.1.4/bin/start-hbase.sh
## if the regionservers did not start, start them explicitly
~/script/runcommand.sh "source /etc/profile;/usr/local/hbase-1.1.4/bin/hbase-daemon.sh start regionserver"
## start thrift
/usr/local/hbase-1.1.4/bin/hbase-daemon.sh start thrift

#start spark
/usr/local/spark-1.4.1-bin-hadoop2.6/sbin/start-all.sh
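
As a rough sanity check, run jps on each node afterwards. On the master you would typically expect NameNode, SecondaryNameNode, ResourceManager, QuorumPeerMain, HMaster, ThriftServer and the Spark Master; on the slaves DataNode, NodeManager, QuorumPeerMain, HRegionServer and Worker (the exact set depends on how the roles are laid out):

jps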

Start the services

# ganglia must be started on every node
service gmond start
# run on the master node
service gmetad start
# /usr/bin/nc localhost 8651 &

service mysqld start & 

redis-server &

Install the Python thrift package

easy_install thrift

Install the MySQLdb module

wget --no-check-certificate -O MySQL-python-1.2.3.tar.gz https://sourceforge.net/projects/mysql-python/files/mysql-python/1.2.3/MySQL-python-1.2.3.tar.gz/download
tar xf MySQL-python-1.2.3.tar.gz
cd MySQL-python-1.2.3
python setup.py install

Install the requests module

easy_install requests

Install the happybase module

easy_install happybase
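
A quick connectivity check from Python, assuming HBase and its Thrift server (port 9090) are already running on this host:

python -c "import happybase; conn = happybase.Connection('localhost', 9090); print(conn.tables())"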

Install JPype

wget --no-check-certificate https://pypi.python.org/packages/3c/94/b620c0e0143c864141ea572a7ad831d8233d84d5702cef692bc039f1c9c1/JPype1-0.6.1.tar.gz
tar xf JPype1-0.6.1.tar.gz 
cd JPype1-0.6.1
python setup.py install
# JayDeBeApi (JDBC access from Python, built on JPype)
pip install JayDeBeApi

Install rrdtool

yum install cairo-devel libxml2-devel pango-devel pango libpng-devel freetype freetype-devel libart_lgpl-devel
wget http://oss.oetiker.ch/rrdtool/pub/rrdtool-1.3.1.tar.gz
tar xf rrdtool-1.3.1.tar.gz
cd rrdtool-1.3.1
./configure --prefix=/usr/local/rrdtool && make && make install
ln -s /usr/local/rrdtool/bin/* /usr/bin/

Install python-rrdtool

wget --no-check-certificate https://pypi.python.org/packages/99/af/bf46df3104d78591f942278467a1016d056a887c808ed1127207a4e1ebaf/python-rrdtool-1.4.7.tar.gz
tar xf python-rrdtool-1.4.7.tar.gz
cd python-rrdtool-1.4.7
python setup.py install

Import data into the database

Omitted...

Start the application services

cd /home/ITS-Hadoop/project/BYSJ/wrapper
python ETLWrapperServer.py &

service iptables stop
python manage.py runserver 0.0.0.0:9999


Stop the cluster

# stop spark
/usr/local/spark-1.4.1-bin-hadoop2.6/sbin/stop-all.sh

# stop hbase
## stop thrift
/usr/local/hbase-1.1.4/bin/hbase-daemon.sh stop thrift
## stop regionservers
~/script/runcommand.sh "source /etc/profile;/usr/local/hbase-1.1.4/bin/hbase-daemon.sh stop regionserver"
## stop hbase
/usr/local/hbase-1.1.4/bin/stop-hbase.sh
## stop zookeeper
/usr/local/zookeeper-3.4.6/bin/zkServer.sh stop
ssh hadoop5 "/usr/local/zookeeper-3.4.6/bin/zkServer.sh stop"
ssh hadoop6 "/usr/local/zookeeper-3.4.6/bin/zkServer.sh stop"

# stop hadoop
/usr/local/hadoop-2.6.0/sbin/stop-dfs.sh
/usr/local/hadoop-2.6.0/sbin/stop-yarn.sh

Troubleshooting

RECEIVED SIGNAL 15: SIGTERM

2016-05-13 22:16:56,908 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
2016-05-13 22:16:56,912 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop7/127.0.0.1
************************************************************/
2016-05-16 09:46:47,963 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop7/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 2.6.0

Notice that "host = hadoop7/127.0.0.1" should show the LAN address rather than 127.0.0.1; this is caused by /etc/hosts.
Change the hostname mapped to ::1 so that it is no longer hadoop7, and map hadoop7 to its LAN address (see the sketch below).
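
A sketch of a corrected /etc/hosts; the LAN addresses below are placeholders, not the real ones:

vi /etc/hosts

    127.0.0.1    localhost localhost.localdomain
    ::1          localhost6 localhost6.localdomain6
    10.2.15.105  hadoop5
    10.2.15.106  hadoop6
    10.2.15.107  hadoop7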

Python upgrade

After upgrading Python, yum still needs Python 2.6, so its interpreter line has to be pointed back at the old version.
The upgraded Python environment also has problems: some module paths are wrong and need to be fixed by adding symlinks.

RegionServer nodes stop right after starting

The node clocks are probably out of sync; synchronizing the time fixes it:
vi /etc/crontab

0-59/10 * * * * root /usr/sbin/ntpdate ITS-Hadoop10

/home/ITS-Hadoop/script/./scpfile.sh /etc/crontab /etc
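
To confirm that the time server is reachable from a node, a query-only run (which does not change the clock) can be used first:

/usr/sbin/ntpdate -q ITS-Hadoop10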

Killing a Hadoop (YARN) application

/usr/local/hadoop-2.6.0/bin/yarn application -kill application_1464678840184_0026
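
To find the application id, list the running applications first:

/usr/local/hadoop-2.6.0/bin/yarn application -list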

Ganglia permission problem

There was an error collecting ganglia data (127.0.0.1:8652): fsockopen error: Permission denied
Reference: http://www.songyawei.cn/content/2064
setenforce 0
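
setenforce 0 only switches SELinux to permissive mode until the next reboot; to make the change persistent, edit /etc/selinux/config as well:

sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config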

Ganglia cannot retrieve host (hardware) metrics

Cause: newly installed gmetad versions handle hostnames case-insensitively, so the directories holding the generated RRD files are lower-case, while we query with the upper-case hostnames and therefore cannot find them; see the configuration file snippet below.

vi /etc/ganglia/gmetad.conf

    # In earlier versions of gmetad, hostnames were handled in a case
    # sensitive manner
    # If your hostname directories have been renamed to lower case,
    # set this option to 0 to disable backward compatibility.
    # From version 3.2, backwards compatibility will be disabled by default.
    # default: 1   (for gmetad < 3.2)
    # default: 0   (for gmetad >= 3.2)
    case_sensitive_hostnames 1

Download and install zlib

Official site: http://www.zlib.net/
http://prdownloads.sourceforge.net/libpng/zlib-1.2.8.tar.gz?download

wget -O zlib-1.2.8.tar.gz http://prdownloads.sourceforge.net/libpng/zlib-1.2.8.tar.gz?download
tar -xvzf zlib-1.2.8.tar.gz
cd zlib-1.2.8
./configure
make
sudo make install
