No. IP              hostname     second IP       user/password
1   192.168.101.120 cup-slave-4  192.168.101.150 user1/hadoop123
2   192.168.101.121 cup-slave-1  192.168.101.151 user1/hadoop123
3   192.168.101.122 cup-master-1 192.168.101.152 user1/hadoop123
4   192.168.101.123 cup-master-2 192.168.101.153 user1/hadoop123
5   192.168.101.124 cup-slave-3  192.168.101.154 user1/hadoop123
6   192.168.101.125 cup-slave-2  192.168.101.155 user1/hadoop123
Temporary file directories:
C:\ProgramFilesDev\CDH4\on cup-master-1\
C:\ProgramFilesDev\CDH4\install files\
Note: edit the configuration files with a tool such as UltraEdit; do not use WordPad or similar Windows editors, otherwise the line endings/encoding may cause errors on Linux.
/etc/sysconfig/network: (permanently set the hostname)
NETWORKING=yes
HOSTNAME=cup-master-1
GATEWAY=192.168.101.1
Apply on each node in turn. GATEWAY must be correct; you can run $ifconfig and check the Bcast field.
$source /etc/sysconfig/network
Run on each node in turn.
Change the hostname (run the matching command on each node): ## this step is mandatory, otherwise formatting the NN may fail with UnknownHostException: cup-master-1
$hostname cup-master-1
$hostname cup-master-2
$hostname cup-slave-1
$hostname cup-slave-2
$hostname cup-slave-3
$hostname cup-slave-4
Hosts configured in /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.122 cup-master-1
192.168.101.123 cup-master-2
192.168.101.121 cup-slave-1
192.168.101.125 cup-slave-2
192.168.101.124 cup-slave-3
192.168.101.120 cup-slave-4
Edit /etc/hosts on every node; the change takes effect immediately, no source command is needed.
DNS:
Add to /etc/resolv.conf:
search localdomain
nameserver 192.168.101.110 ## DNS server IP
nameserver 8.8.8.8
Apply on each node in turn.
Locale configuration:
/etc/sysconfig/i18n
LANG=en_US
$source /etc/sysconfig/i18n
Apply on each node in turn.
$echo $LANG to verify.
Stop the firewall: $sudo service iptables stop
Check the firewall: $sudo service iptables status
Run on each node in turn.
Disable permanently: $chkconfig iptables off
$iptables -F
$service iptables save
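A quick sketch to confirm the firewall stays off across reboots (chkconfig and the iptables service are the RHEL/CentOS 6 tools used throughout these notes):
$chkconfig --list iptables      ## every runlevel should show "off"
$sudo service iptables status   ## should report that the firewall is not running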
Uninstall the bundled OpenJDK:
1. rpm -qa|grep jdk
java-1.6.0-openjdk-1.6.0.0-1.41.1.10.4.el6.x86_64
2. rpm -e java-1.6.0-openjdk-1.6.0.0-1.41.1.10.4.el6.x86_64
Install the JDK
1. Java SE 1.6 or later; download from http://www.oracle.com/technetwork/java/javase/downloads/index.html
Download jdk-6u32-linux-x64.bin
2. cd /usr/jdk6
3. chmod 755 *.bin
4. ./jdk-6u32-linux-x64.bin
5. Configure environment variables
Append the following to the end of /etc/profile:
/etc/profile:
#set java environment
JAVA_HOME=/usr/jdk6/jdk1.6.0_32
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
JAVA_OPTS="$JAVA_OPTS -server"
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME JAVA_OPTS CLASSPATH PATH
#JAVA_OPTS="$JAVA_OPTS -server -Xms2g -Xmx12g -XX:NewSize=128m -XX:MaxNewSize=128m"
$source /etc/profile to make the variables take effect
ulimit: raise the limits on open files (file handles) and user processes
Check the current values with $ulimit -n (open files) and $ulimit -u (max user processes).
1. /etc/security/limits.conf
* soft nofile 655350
* hard nofile 655350
2. /etc/security/limits.d/90-nproc.conf
* soft nproc 10240
* hard nproc 60240
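The new limits apply to fresh login sessions; a quick verification sketch after logging in again:
$ulimit -n     ## expect 655350 from limits.conf above
$ulimit -u     ## expect the nproc value from 90-nproc.conf above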
6. hadoop user setup
In /etc/sudoers, below the line root ALL=(ALL) ALL, add:
root ALL=(ALL) ALL
hadoop ALL=(ALL) ALL
$groupadd hadoop
$useradd -g hadoop hadoop
$passwd hadoop
7. Log in to cup-master-1 as root and stop the firewall: $service iptables stop (repeat on every node)
8. As root, edit /etc/ssh/sshd_config:
change #UseLogin no to
UseLogin yes
Restart sshd: $service sshd restart
Otherwise you will get: -bash: ulimit: open files: cannot modify limit: Operation not permitted
8. Passwordless SSH from cup-master-1 to the other nodes:
Log in to cup-master-1 as hadoop
$mkdir .ssh ------ no need to create it on the master node
$ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
On cup-master-2, cup-slave-1, cup-slave-2, cup-slave-3 and cup-slave-4 create the .ssh directory: $mkdir .ssh
$scp .ssh/id_rsa.pub hadoop@cup-slave-1:/home/hadoop/sshcm1/    run for each node in turn (the later steps read the key from /home/hadoop/sshcm1/)
$scp .ssh/id_rsa.pub hadoop@cup-slave-2:/home/hadoop/sshcm1/
$scp .ssh/id_rsa.pub hadoop@cup-slave-3:/home/hadoop/sshcm1/
$scp .ssh/id_rsa.pub hadoop@cup-slave-4:/home/hadoop/sshcm1/
$scp .ssh/id_rsa.pub hadoop@cup-master-2:/home/hadoop/sshcm1/
Log in to cup-master-1 as hadoop and configure the local machine:
$cd ~/.ssh
$chmod 700 ~/.ssh
$cat id_rsa.pub >> authorized_keys
$chmod 600 .ssh/authorized_keys
Log in to cup-slave-1 as hadoop and configure it (repeat on the other machines):
$mkdir -p ~/.ssh
$chmod 700 ~/.ssh
$cat ~/sshcm1/id_rsa.pub >> ~/.ssh/authorized_keys
$chmod 600 ~/.ssh/authorized_keys
Log in as hadoop on each of the other nodes and run the same steps.
Log in to cup-master-1 as hadoop and test passwordless SSH: $ssh hadoop@cup-master-2 or $ssh cup-master-2 (test each node in turn)
Note:
The first connection prints a host-key confirmation prompt; type yes.
A known_hosts file is then created under ~/.ssh/.
If passwordless SSH misbehaves later, delete that file, regenerate the RSA key pair, and redo the remote SSH setup.
Example known_hosts entries:
cup-slave-1,192.168.98.225 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr5bf6Fe2TRprWmB+RK1ZeriV+wwlwsIKLv9Y1sneLoXgPqIA9RBi9RodiWogImu5J8Ht4KZ2UyXIb/w2/NQeZKYJExpGlpXGSdKfDjDe+8wzXi01FPhkwzClhjstGNHaPwZVnDKtGERX4PE985xq9wOuyGl1AFAhYz8neCTpKqRGA+/cquulTTdwQ8mLsWumZHKNcgkGtGU6MvqbVt4mDNwEJmUizeThp/h03bCoSlg2YG9Zqf/W71WA9ZqCPB2nWBRn9buhHOvNaUTn6/6dQna8Quzg8DC9WGYgecLNUIt6LMSnQUgsONl2AiNbVN+W7DHA4BkuCIafXj7g5Hj8ow==
cup-slave-2,192.168.98.227 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr5bf6Fe2TRprWmB+RK1ZeriV+wwlwsIKLv9Y1sneLoXgPqIA9RBi9RodiWogImu5J8Ht4KZ2UyXIb/w2/NQeZKYJExpGlpXGSdKfDjDe+8wzXi01FPhkwzClhjstGNHaPwZVnDKtGERX4PE985xq9wOuyGl1AFAhYz8neCTpKqRGA+/cquulTTdwQ8mLsWumZHKNcgkGtGU6MvqbVt4mDNwEJmUizeThp/h03bCoSlg2YG9Zqf/W71WA9ZqCPB2nWBRn9buhHOvNaUTn6/6dQna8Quzg8DC9WGYgecLNUIt6LMSnQUgsONl2AiNbVN+W7DHA4BkuCIafXj7g5Hj8ow==
9. Passwordless SSH from cup-master-2 to the other nodes:
Log in to cup-master-2 as hadoop
$mkdir .ssh
$ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
On cup-master-1, cup-slave-1, cup-slave-2, cup-slave-3 and cup-slave-4 create the .ssh directory: $mkdir .ssh
$scp .ssh/id_rsa.pub hadoop@cup-master-1:/home/hadoop/sshcm2/    run for each node in turn
$scp .ssh/id_rsa.pub hadoop@cup-slave-1:/home/hadoop/sshcm2/
$scp .ssh/id_rsa.pub hadoop@cup-slave-2:/home/hadoop/sshcm2/
$scp .ssh/id_rsa.pub hadoop@cup-slave-3:/home/hadoop/sshcm2/
$scp .ssh/id_rsa.pub hadoop@cup-slave-4:/home/hadoop/sshcm2/
Log in to cup-master-2 as hadoop and configure the local machine:
$cd ~/.ssh
$chmod 700 ~/.ssh
$cat id_rsa.pub >> authorized_keys
$chmod 600 .ssh/authorized_keys
Log in to cup-slave-1 as hadoop and configure it (repeat on the other machines):
$mkdir -p ~/.ssh
$chmod 700 ~/.ssh
$cat ~/sshcm2/id_rsa.pub >> ~/.ssh/authorized_keys
$chmod 600 ~/.ssh/authorized_keys
Log in as hadoop on each of the other nodes and run the same steps.
Log in to cup-master-2 as hadoop and test passwordless SSH: $ssh hadoop@cup-master-1 or $ssh cup-master-1 (test each node in turn)
Note:
~/.ssh/authorized_keys must have permission 600; if the permissions are too open, sshd rejects the key.
$cat ~/sshcm2/id_rsa.pub >> ~/.ssh/authorized_keys appends the copied public key to the end of ~/.ssh/authorized_keys (an append, not an overwrite).
1. Log in to cup-master-1 as hadoop
Install hadoop and deploy the namenode
Upload the hadoop package hadoop-2.0.0-cdh4.1.2.tar.gz
$tar zxvf hadoop-2.0.0-cdh4.1.2.tar.gz to unpack it
2. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/hadoop-env.sh
JAVA_HOME=/usr/jdk6/jdk1.6.0_32
2.1 /home/hadoop/.bash_profile:
# User specific environment and startup programs
HADOOP_HOME=/home/cup/hadoop-2.0.0-cdh4.2.1
HADOOP_MAPRED_HOME=$HADOOP_HOME
HADOOP_COMMON_HOME=$HADOOP_HOME
HADOOP_HDFS_HOME=$HADOOP_HOME
YARN_HOME=$HADOOP_HOME
HADOOP_CONF_HOME=${HADOOP_HOME}/etc/hadoop
YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
ANT_HOME=/home/cup/apache-ant-1.8.4
MAVEN_HOME=/home/cup/apache-maven-3.0.4
ZOOKEEPER_HOME=/home/cup/zookeeper-3.4.5-cdh4.2.1
HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1
HADOOP_HOME_WARN_SUPPRESS=1
HADOOP_CLASSPATH=$CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common:${HADOOP_HOME}/share/hadoop/common/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/hdfs:${HADOOP_HOME}/share/hadoop/hdfs/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/mapreduce:${HADOOP_HOME}/share/hadoop/mapreduce/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/tools/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/yarn:${HADOOP_HOME}/share/hadoop/yarn/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath`:$HADOOP_CLASSPATH
JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$ANT_HOME/bin:$MAVEN_HOME/bin:/home/cup/shell:$PATH
export JAVA_LIBRARY_PATH LD_LIBRARY_PATH HADOOP_CLASSPATH
export HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME
export ZOOKEEPER_HOME HBASE_HOME ANT_HOME MAVEN_HOME HADOOP_HOME_WARN_SUPPRESS PATH
# HIVE_HOME=/home/cup/hive-0.10.0-cdh4.2.1
# HADOOP_CLASSPATH=${HIVE_HOME}/lib:$HADOOP_CLASSPATH
# HIVE_CLASSPATH=$HBASE_HOME/conf
# PATH=$HIVE_HOME/bin:$PATH
# export HIVE_HOME HIVE_CLASSPATH HADOOP_CLASSPATH PATH
$source /home/hadoop/.bash_profile
Once the Hadoop cluster is installed, the first thing to do is set the daemon memory in hadoop-env.sh. For mainstream nodes with 32 GB of RAM, typical settings are:
NN: 15-25 GB
JT: 2-4 GB
DN: 1-4 GB
TT: 1-2 GB, Child VM 1-2 GB
The right values depend on how the cluster is used; for a cluster with many small files the NN needs at least 20 GB and each DN at least 2 GB.
3. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/core-site.xml
$hadoop fs -rmr /xxx/xxx does not delete data permanently; the deleted data is moved into the ".Trash" folder under the operating user's home directory.
The value is in minutes. With the trash enabled, add -skipTrash to delete a file immediately:
$hadoop fs -rm -skipTrash /xxxx
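The trash behaviour above is controlled by fs.trash.interval in core-site.xml (value in minutes, 0 disables the trash); a minimal sketch with an assumed one-day retention:
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>  <!-- keep deleted files in .Trash for one day; 0 disables the trash -->
</property>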
4. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/hdfs-site.xml
5. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/mapred-site.xml
at. If "local", then jobs are run in-process as a single map
and reduce task.
6. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/yarn-site.xml
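The bodies of mapred-site.xml and yarn-site.xml are not reproduced in these notes. A minimal MRv2-on-YARN sketch for orientation (property values as commonly used with Hadoop 2.0.0/CDH4; verify the aux-service name against your exact release):
mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>  <!-- run MapReduce jobs on YARN -->
</property>
yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>  <!-- CDH4/Hadoop 2.0.0 value; later releases use mapreduce_shuffle -->
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>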
7. On every node, log in as hadoop and create the Hadoop working directory
$mkdir /home/hadoop/hadoopworkspace
6. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/slaves
cup-slave-1
cup-slave-2
cup-slave-3
cup-slave-4
/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/masters (optional; the file may be absent)
cup-master-1
cup-master-2
Hadoop compression -----------------------------------------------------
7.0 Copy the native library files (libhadoop / hadoop-lzo / hadoop-snappy)
to /home/hadoop/hadoop-2.0.0-cdh4.1.2/lib/native/
and copy the corresponding hadoop-lzo / hadoop-snappy jar files.
hadoop-snappy is already integrated into hadoop-common, so there is no separate jar; what is needed is:
1) snappy's own shared libraries: /usr/local/lib/libsnappy*.*
2) the hadoop-common jar: hadoop-common-2.0.0-cdh4.2.0.jar
   (source: hadoop-2.0.0-cdh4.2.0\src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\compress\snappy)
3) the hadoop-common native libraries: libhadoop.a, libhadoop.so, libhadoop.so.1.0.0
   (source: hadoop-2.0.0-cdh4.2.0\src\hadoop-common-project\hadoop-common\src\main\native\src\org\apache\hadoop\io\compress\snappy)
snappy-1.1.0 # install as root
$./configure
$make
$make install
/usr/local/lib/libsnappy*.*
If make reports:
libtool: Version mismatch error. This is libtool 2.4.2 Debian-2.4.2-1ubuntu1, but the
libtool: definition of this LT_INIT comes from libtool 2.4.
libtool: You should recreate aclocal.m4 with macros from libtool 2.4.2 Debian-2.4.2-1ubuntu1
libtool: and run autoconf again.
then run:
$autoreconf -ivf
## $autoreconf --force --install
then run $make again
core-site.xml:
## Configure either LzoCodec or SnappyCodec, not both; pick the one that matches the compression you use
mapred-site.xml: compress MR output with snappy:
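The property bodies are stripped here; a hedged sketch of what the two files typically carry for snappy (the old-style mapred.* names match the SET commands used in the Hive section further down):
core-site.xml:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
mapred-site.xml:
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>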
7. DN multi-disk storage plan:
Before adding disks the system partition / was almost full (99% used);
after adding disks its usage dropped to roughly 80%-90%.
Keep watching to see whether it continues to drop.
To reclaim the system disk: stop one datanode first and let the cluster re-replicate its data automatically.
Optimisation plan:
1)stop the entire cluster
2)mv /home/cup/hadoopworkspace/dfs/data/current/* /cup/d0/dfs2/data/current/
3)add /cup/d0/dfs2/data into the dfs.datanode.data.dir
4)start the entire cluster
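Step 3 above refers to dfs.datanode.data.dir in hdfs-site.xml; a sketch of the multi-directory form using the paths from this plan:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/cup/hadoopworkspace/dfs/data,/cup/d0/dfs2/data</value>  <!-- comma-separated list; the DN spreads blocks across all listed directories -->
</property>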
7. Install hadoop and deploy the datanodes
As hadoop on cup-master-1:
$scp -rp hadoop-2.0.0-cdh4.1.2 hadoop@cup-master-2:/home/hadoop/    run for each node in turn
8. $hdfs namenode -format    the namenode must be formatted before the first start
./start-dfs.sh
./start-yarn.sh
./stop-dfs.sh
./stop-yarn.sh
These scripts start and stop the slave nodes automatically.
9. Open http://192.168.101.122:8088 in a browser to view the cluster (YARN) status;
http://192.168.101.122:50070 shows the namenode status.
10. $jps to check the processes
NN: ResourceManager NameNode SecondaryNameNode
DN: NodeManager DataNode
1. zookeeper/hbase install
2. As hadoop on cup-master-1:
unpack zookeeper-3.4.3-cdh4.1.2 and hbase-0.92.1-cdh4.1.2
1. Append to the end of /etc/profile:
(see above)
$source /etc/profile to make it take effect
2. zookeeper install
rename /home/hadoop/zookeeper-3.4.3-cdh4.1.2/conf/zoo_sample.cfg to zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/hadoopworkspace/zookeeper/data
dataLogDir=/home/hadoop/hadoopworkspace/zookeeper/log
clientPort=2181
server.1=cup-master-1:2888:3888
server.2=cup-slave-1:2888:3888
server.3=cup-slave-2:2888:3888
server.4=cup-slave-3:2888:3888
server.5=cup-slave-4:2888:3888
$mkdir /home/hadoop/hadoopworkspace/zookeeper/data    run on every node; ZK does not create it automatically
$mkdir /home/hadoop/hadoopworkspace/zookeeper/log     run on every node; ZK does not create it automatically
3. $scp -rp /home/hadoop/zookeeper-3.4.3-cdh4.1.2 hadoop@cup-slave-1:/home/hadoop/
4. Create a myid file in dataDir on every node
for cup-master-1, the content in myid file should be 1
for cup-slave-1, the content in myid file should be 2
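A sketch of how the myid files map to the server.N entries in zoo.cfg above:
on cup-master-1:  $echo 1 > /home/hadoop/hadoopworkspace/zookeeper/data/myid
on cup-slave-1:   $echo 2 > /home/hadoop/hadoopworkspace/zookeeper/data/myid
on cup-slave-2:   $echo 3 > /home/hadoop/hadoopworkspace/zookeeper/data/myid
(and so on for server.4 and server.5)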
4. Configure the ZK auto-purge policy
/home/hadoop/zookeeper-3.4.3-cdh4.1.2/conf/zoo.cfg
autopurge.purgeInterval=2
autopurge.snapRetainCount=10
5. /home/hadoop/zookeeper-3.4.3-cdh4.1.2/bin/
$ ./zkServer.sh start    start on every node in turn (the first server logs many errors at startup; this is harmless and only means a leader has not been elected yet)
6. $jps to check processes
Each node should now show an extra QuorumPeerMain process.
7. hbase install
/home/hadoop/hbase-0.92.1-cdh4.1.2/conf/hbase-env.sh
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
export HBASE_HOME=/home/hadoop/hbase-0.92.1-cdh4.1.2
export JAVA_HOME=/usr/jdk6/jdk1.6.0_32
export HBASE_MANAGES_ZK=false
export HBASE_HEAPSIZE=4000
/home/hadoop/hbase-0.92.1-cdh4.1.2/conf/hbase-site.xml
HBase compression -----------------------------------------------------
hbase-site.xml===============================
/home/hadoop/hbase-0.92.1-cdh4.1.2/conf/regionservers
cup-slave-1
cup-slave-2
cup-slave-3
cup-slave-4
8. $ scp -rp hbase-0.92.1-cdh4.1.2 hadoop@cup-slave-1:/home/hadoop/    run for each of the other slave nodes in turn
9. Note on time synchronisation: the master and all slaves must be time-synchronised (including the time zone); the clock skew must not exceed 30000 ms, otherwise the HBase regionserver fails to start with org.apache.hadoop.hbase.ClockOutOfSyncException.
9.1 Synchronise the time manually
Log in as root
$date -s 20130219
$date -s 14:37:00
$ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
9.2 Add to hbase-site.xml:
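The property body is not reproduced here; a hedged sketch of the setting usually used to tolerate a larger skew (hbase.master.maxclockskew, in milliseconds):
<property>
  <name>hbase.master.maxclockskew</name>
  <value>180000</value>  <!-- allow up to 3 minutes of skew instead of the 30000 ms default -->
</property>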
10. /home/hadoop/hbase-0.92.1-cdh4.1.2/bin/
$ ./start-hbase.sh    the slave nodes are started automatically
$ ./stop-hbase.sh     the slave nodes are stopped automatically
11. http://192.168.101.122:50070 shows the namenode status and the /hbase directory on HDFS;
http://192.168.101.122:60010 shows the HBase status.
12. Process check
NN:
13326 ResourceManager
18617 QuorumPeerMain
19630 Jps
12980 NameNode
13190 SecondaryNameNode
19411 HMaster
DN:
30404 Jps
30181 HRegionServer
27489 QuorumPeerMain
14014 DataNode
14148 NodeManager
Test snappy compression in HBase:
$hbase org.apache.hadoop.hbase.util.CompressionTest /home/cup/kv.txt snappy
HBase tuning parameters:
/etc/profile:
JAVA_OPTS="$JAVA_OPTS -server -Xms2g -Xmx12g -XX:NewSize=128m -XX:MaxNewSize=128m"
hbase-env.sh:
export HBASE_HEAPSIZE=4000
export HBASE_OPTS="$HBASE_OPTS -XX:NewSize=128m -XX:MaxNewSize=128m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-master-$(hostname).log"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx12g -Xms12g -XX:NewSize=256m -XX:MaxNewSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-regionserver-$(hostname).log"
export HBASE_OPTS="$HBASE_OPTS -Xms4g -Xmx4g -XX:NewSize=1g -XX:MaxNewSize=1g -XX:NewRatio=3 -XX:SurvivorRatio=6 -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=73 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-master-$(hostname).log"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms12g -Xmx12g -XX:NewSize=3g -XX:MaxNewSize=3g -XX:NewRatio=3 -XX:SurvivorRatio=6 -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=73 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-regionserver-$(hostname).log"
hbase-site.xml
hbase.client.write.buffer: 20MB
hbase.regionserver.handler.count: 100
hbase.hregion.memstore.flush.size: 384MB
hbase.hregion.max.filesize: 2GB
hbase.hstore.compactionThreshold: 3
hbase.hstore.blockingStoreFiles: 10
hbase.hstore.flush.thread: 20
hbase.hstore.compaction.thread: 15
hbase.master.distributed.log.splitting: false
zoo.cfg:
# The number of milliseconds of each tick
tickTime=30000
HBase's various timeout parameters must fall within [2*tickTime, 20*tickTime].
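For example, with tickTime=30000 the ZK ensemble clamps session timeouts to [60000 ms, 600000 ms]; a hedged hbase-site.xml sketch requesting a 3-minute session:
<property>
  <name>zookeeper.session.timeout</name>
  <value>180000</value>  <!-- must lie within [2*tickTime, 20*tickTime] of the ZK ensemble -->
</property>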
1. To add a new machine to the cluster, the existing nodes do not need to be restarted.
First do the base setup (passwordless SSH from the NN to the new machine, and so on),
then add the new machine's hostname to
/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/slaves
/home/hadoop/hbase-0.92.1-cdh4.1.2/conf/regionservers
and run the hadoop and hbase start commands again; the processes on the existing nodes are not affected.
2. The Hadoop Balancer rebalances the distribution of data blocks across the DataNodes according to a placement policy.
/home/hadoop/hadoop-2.0.0-cdh4.1.2/sbin/start-balancer.sh -t 10%
The -t argument is the disk-usage deviation at which HDFS is considered balanced.
If the disk-usage difference between machines is below 10%, the HDFS cluster is considered balanced.
1. Oozie install
/etc/profile:
OOZIE_HOME=/home/hadoop/oozie-3.2.0-cdh4.1.2
$OOZIE_HOME/oozie-server/bin/catalina.sh:
JAVA_HOME=/usr/jdk6/jdk1.6.0_32
CATALINA_HOME=/home/cup/oozie-3.3.0-cdh4.2.1/oozie-server
$OOZIE_HOME/bin/oozie-setup.sh:
$oozie-setup.sh -extjs /home/hadoop/ext-2.2.zip -hadoop 0.20.200 $HADOOP_HOME
$oozie-setup.sh -extjs /home/hadoop/ext-2.2.zip -hadoop 2.0 $HADOOP_HOME
2. $OOZIE_HOME/bin/oozie-run.sh to start Oozie
5. Oozie startup fails because the class org/apache/hadoop/util/ReflectionUtils cannot be found:
Copy /home/hadoop/oozie-3.2.0-cdh4.1.2/libtools/*.jar to /home/hadoop/oozie-3.2.0-cdh4.1.2/oozie-server/webapps/oozie/WEB-INF/lib
6. Oozie startup reports:
REASON: org.apache.oozie.service.ServiceException: E0103: Could not load service classes, Schema 'SA' does not exist {SELECT t0.bean_type, t0.conf, t0.console_url, t0.cred, t0.data, t0.error_code, t0.error_message, t0.external_child_ids, t0.external_id, t0.external_status, t0.name, t0.retries, t0.stats, t0.tracker_uri, t0.transition, t0.type, t0.user_retry_count, t0.user_retry_interval, t0.user_retry_max, t0.end_time, t0.execution_path, t0.last_check_time, t0.log_token, t0.pending, t0.pending_age, t0.signal_value, t0.sla_xml, t0.start_time, t0.status, t0.wf_id FROM WF_ACTIONS t0 WHERE t0.bean_type = ? AND t0.id = ?} [code=30000, state=42Y07]
7. $OOZIE_HOME/bin/ooziedb.sh create -sqlfile oozie.sql -run
Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE
Oozie DB has been created for Oozie version '3.2.0-cdh4.1.2'
The SQL commands have been written to: oozie.sql
The SQL script is saved to $OOZIE_HOME/bin/oozie.sql.
8. oozie-site.xml:
8. If startup fails with:
Error occurred during initialization of VM
Incompatible minimum and maximum heap sizes specified
oozie-env.sh:
export CATALINA_OPTS="$CATALINA_OPTS -Xms2g -Xmx4g"
8. $OOZIE_HOME/bin/oozie-run.sh to start Oozie
$OOZIE_HOME/bin/oozie-run.sh &    to start Oozie in the background
Newer approach:
$oozied.sh run
$ jps
28945 Bootstrap
9. $OOZIE_HOME/bin/oozie admin -oozie http://192.168.101.122:11000/oozie -status
System mode: NORMAL means the setup succeeded.
http://192.168.101.122:11000/oozie shows the Oozie admin console.
After a reboot the hostname changed and the cluster would not start:
2. The hostname changed; fix /etc/sysconfig/network
/etc/sysconfig/network: (permanently set the hostname)
NETWORKING=yes
HOSTNAME=cup-master-1
GATEWAY=192.168.101.1
Apply on each node in turn.
$source /etc/sysconfig/network
Apply on each node in turn.
3. Move the environment variables from /etc/profile into the hadoop user's profile.
5. Stop the firewall: $sudo service iptables stop
Check the firewall: $sudo service iptables status
6. /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.122 cup-master-1
192.168.101.123 cup-master-2
192.168.101.121 cup-slave-1
192.168.101.125 cup-slave-2
192.168.101.124 cup-slave-3
192.168.101.120 cup-slave-4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
If this line is not commented out, HBase cannot start.
7. Time synchronisation: date -s
HBase will not start:
ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
a. Stop the firewall: $sudo service iptables stop
b. In /etc/hosts comment out the ::1 localhost line, i.e. disable IPv6.
c. Synchronise the time across the cluster nodes.
mysql5.1.67
8. $ sudo /etc/init.d/mysqld start    to start MySQL (or $service mysqld start)
$ sudo service mysqld status
$ mysql    to enter the MySQL client
mysql>
mysql> exit    to return to the bash shell
$ /usr/bin/mysqladmin -u root password '123'    set the root password
$ /usr/bin/mysqladmin -u root -h cup-master-1 password '123'
1. Hive Install
1.1 .bash_profile
HIVE_HOME=/home/hadoop/hive-0.9.0-cdh4.1.2
export HIVE_HOME
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hadoop/hive-0.9.0-cdh4.1.2/lib:$CLASSPATH:$HADOOP_HOME/bin
1.2 $ cd /home/hadoop/hive-0.9.0-cdh4.1.2/conf
1.3 $ cp hive-default.xml.template hive-site.xml
1.4 hive-site.xml:
Add at the top:
hive.metastore.warehouse.dir: /home/hadoop/hive-0.9.0-cdh4.1.2/warehouse
hive.exec.scratchdir: /home/hadoop/hive-0.9.0-cdh4.1.2/hive-${user.name}
javax.jdo.option.ConnectionURL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
javax.jdo.option.ConnectionDriverName: com.mysql.jdbc.Driver
javax.jdo.option.ConnectionUserName: hive
javax.jdo.option.ConnectionPassword: hive
The following two places in the template have malformed description tags that must be repaired:
1) hive.optimize.union.remove at line 474
2) hive.mapred.supports.subdirectories at line 489
The following three places have malformed partition-dir tags that must be repaired:
1) hive.exec.list.bucketing.default.dir at line 561
2) hive.exec.list.bucketing.default.dir at line 562
3) hive.exec.list.bucketing.default.dir at line 563
hive-env.sh:
export HADOOP_HOME=/home/cup/hadoop-2.0.0-cdh4.2.1
export HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1
export JAVA_HOME=/usr/jdk6/jdk1.6.0_32
export HIVE_CLASSPATH=$HBASE_HOME/conf
####export HIVE_AUX_JARS_PATH=/home/cup/hive-0.10.0-cdh4.2.1/lib:$HADOOP_CLASSPATH
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
Why HIVE_AUX_JARS_PATH is commented out:
When Hive submits an MR job it reads the hive.aux.jars.path variable,
whose value should be file:///root/hive-0.10.0-cdh4.2.0/lib/hive-hbase-handler-0.10.0-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/hbase-0.94.2-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/zookeeper-3.4.5-cdh4.2.0.jar
and is configured in hive-site.xml; the
export HIVE_AUX_JARS_PATH in hive-env.sh must be commented out,
otherwise it fails with java.io.FileNotFoundException: File file:/home/hadoop/hive-0.10.0-cdh4.4.0/lib:***** does not exist
If you do keep it, it must at least be changed to
export HIVE_AUX_JARS_PATH=file:///home/cup/hive-0.10.0-cdh4.2.1/lib
## When a Hive script inserts into an external table mapped to a snappy-compressed HBase table, Hive locates the following jars through HIVE_AUX_JARS_PATH:
hive-hbase-handler-0.10.0-cdh4.2.0.jar
hbase-0.94.2-cdh4.2.0.jar
zookeeper-3.4.5-cdh4.2.0.jar
so in that case it needs to be set to HIVE_AUX_JARS_PATH=/root/hive-0.10.0-cdh4.2.0/lib/:$HADOOP_CLASSPATH
$HADOOP_CLASSPATH is appended because otherwise Hive cannot find the snappy classes when creating an external table linked to a snappy-compressed HBase table.
Copy the hadoop-common jar into /home/cup/hive-0.10.0-cdh4.2.1/lib,
otherwise:
Failed with exception java.io.IOException:java.io.IOException:
Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
or
Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.SnappyCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:134)
at org.apache.hadoop.io.compress.CompressionCodecFactory.
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 23 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.SnappyCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:127)
... 25 more
hive-log4j.properties:
hive.log.dir=/home/cup/hive-0.10.0-cdh4.2.1/logs
hive.log.file=hive.log
Restart MySQL
$ mysql -u root -p    enter the password 123
mysql>
mysql> create database hive;
## grant select on <database>.* to <user>@<host> identified by "<password>"
mysql> grant all on hive.* to 'hive'@'localhost' identified by 'hive';
mysql> grant all on hive.* to 'hive'@'%' identified by 'hive';
Copy mysql-connector-java-5.1.22-bin.jar to /home/hadoop/hive-0.9.0-cdh4.1.2/lib.
1.5
hive --service hwi &
http://192.168.98.20:9999/hwi
hive --service hiveserver &
[hadoop@cup-master-1 bin]$ Starting Hive Thrift Server
$ jps
29082 RunJar
$nohup hive --service hiveserver &
[hadoop@cup-master-1 bin]$ nohup: ignoring input and appending output to `nohup.out'
Alternatively, once HUE is installed, let HUE start these services together.
Hive integration with HBase:
hive>create external table snappy_hive(key int, value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping"=":key,cf:value")
tblproperties ("hbase.table.name"="snappy_table");
hive>create table hive (key int,value string) row format delimited fields terminated by ',';
hive>load data local inpath '/home/cup/kv.txt' into table hive;
hive>insert overwrite table snappy_hive select * from hive;
snappy --- HIVE
To enable Snappy compression for Hive output when creating SequenceFile outputs, use the following settings:
SET hive.exec.compress.output=true;
SET hive.exec.compress.intermediate=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET hive.cli.print.header=true;
SET hive.cli.print.current.db=true;
# JVM reuse
Hadoop normally launches each map or reduce task in a forked JVM.
JVM startup can add significant overhead, especially for jobs with hundreds or thousands of tasks, most of which have short execution times.
JVM reuse allows a JVM instance to be reused up to N times within the same job.
Configure it in mapred-site.xml:
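The property itself is not reproduced in these notes; a hedged sketch (mapred.job.reuse.jvm.num.tasks is the classic MRv1 name; -1 means unlimited reuse within one job):
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>10</value>  <!-- each task JVM may run up to 10 tasks of the same job; -1 = unlimited -->
</property>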
hive.exec.scratchdir:
/home/cup/hive-0.10.0-cdh4.2.1/hive-${user.name}
hive.metastore.warehouse.dir:
/home/cup/hive-0.10.0-cdh4.2.1/warehouse
Using Oracle as the Hive metastore database:
1) Manually run the Oracle version of the metastore schema script: hive-0.10.0-cdh4.2.1\scripts\metastore\upgrade\oracle\hive-schema-0.10.0.oracle.sql
2) Change the JDBC connection settings in hive-site.xml
3) nohup hive --service hiveserver &
Hive user permissions:
For another user to run Hive, configure the following:
.bash_profile
/home/hadoop/cdh42/cdhworkspace/tmp chmod 777
/home/hadoop/cdh42/hive-0.10.0-cdh4.2.0/logs chmod 777
hive>grant create/all on database default to user xhyt;
hive>show grant user xhyt on database default;
hive>grant select on table hive_t to user xhyt;
hive>grant select on table hive_t to group xhyt;
hbase-env.sh里面加了export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
hadoop-env.sh里面也加了export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
/etc/profile for the root user:
#set java environment
JAVA_HOME=/usr/jdk6/jdk1.6.0_32
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
JAVA_OPTS="$JAVA_OPTS -server -Xms1024m -Xmx4096m"
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME JAVA_OPTS CLASSPATH PATH
/home/hadoop/.bash_profile for the hadoop user:
# User specific environment and startup programs
HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
HADOOP_MAPRED_HOME=$HADOOP_HOME
HADOOP_COMMON_HOME=$HADOOP_HOME
HADOOP_HDFS_HOME=$HADOOP_HOME
YARN_HOME=$HADOOP_HOME
ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.3-cdh4.1.2
HBASE_HOME=/home/hadoop/hbase-0.92.1-cdh4.1.2
OOZIE_HOME=/home/hadoop/oozie-3.2.0-cdh4.1.2
CATALINA_HOME=$OOZIE_HOME/oozie-server
ANT_HOME=/home/hadoop/apache-ant-1.8.4
MAVEN_HOME=/home/hadoop/apache-maven-3.0.4
HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath`
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$OOZIE_HOME/bin:$CATALINA_HOME/bin:$ANT_HOME/bin:$MAVEN_HOME/bin:$PATH
export HADOOP_CLASSPATH HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME ZOOKEEPER_HOME HBASE_HOME OOZIE_HOME CATALINA_HOME ANT_HOME MAVEN_HOME PATH
3. JVM memory sizing in /etc/profile
export JAVA_OPTS="$JAVA_OPTS -server -Xms1024m -Xmx4096m"
$source /etc/profile
Run on each node in turn.
Hadoop rack awareness (improves network efficiency)
core-site.xml:
/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop/rackaware.sh
#!/bin/bash
# Map each host name / IP passed as an argument to a rack, using topology.data
HADOOP_CONF=/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop
while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done
$chmod 755 rackaware.sh
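The core-site.xml entry referenced above is not reproduced; a hedged sketch pointing the topology script at rackaware.sh (net.topology.script.file.name is the Hadoop 2 key; the older topology.script.file.name is also still accepted):
<property>
  <name>net.topology.script.file.name</name>
  <value>/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop/rackaware.sh</value>
</property>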
/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop/topology.data
cup-master-1 /default/rack1
cup-master-2 /default/rack1
cup-slave-1 /default/rack1
cup-slave-2 /default/rack1
cup-slave-3 /default/rack1
cup-slave-4 /default/rack1
cup-slave-5 /default/rack1
cup-slave-6 /default/rack1
cup-slave-7 /default/rack2
cup-slave-8 /default/rack2
cup-slave-9 /default/rack2
cup-slave-10 /default/rack2
cup-slave-11 /default/rack2
cup-slave-12 /default/rack2
10.204.193.10 /default/rack1
10.204.193.11 /default/rack1
10.204.193.20 /default/rack1
10.204.193.21 /default/rack1
10.204.193.22 /default/rack1
10.204.193.23 /default/rack1
10.204.193.24 /default/rack1
10.204.193.25 /default/rack1
10.204.193.26 /default/rack2
10.204.193.27 /default/rack2
10.204.193.28 /default/rack2
10.204.193.29 /default/rack2
10.204.193.30 /default/rack2
10.204.193.31 /default/rack2
1. hue install (hadoop user experience)
$python    enters the Python interpreter
Ctrl+D (or exit()) leaves the interpreter; Ctrl+Z only suspends it.
Required Dependencies:
gcc, g++,
libgcrypt-devel, libxml2-devel, libxslt-devel,
cyrus-sasl-devel, cyrus-sasl-gssapi,
mysql-devel, python-devel, python-setuptools, python-simplejson,
sqlite-devel, openldap-devel,
ant
libgcrypt-devel-1.4.5-9.el6.x86_64
libxslt-devel-1.1.26-2.el6.x86_64
cyrus-sasl-devel-2.1.23-13.el6.x86_64
mysql-devel-5.1.52.el6_0.1.x86_64
openldap-devel-2.4.23-20.el6.x86_64
install ant
install maven
$make
/home/hadoop/hue-2.1.0-cdh4.1.2/Makefile.vars:42: *** "Error: must have python development packages for 2.4, 2.5, 2.6 or 2.7. Could not find Python.h. Please install python2.4-devel, python2.5-devel, python2.6-devel or python2.7-devel". Stop.
/usr/include/python2.6/ contains only pyconfig-64.h; there is no Python.h.
Makefile.vars in /home/hadoop/hue-2.1.0-cdh4.1.2 performs this check;
the cause is that the python-devel package is not installed.
5. $ cd /home/hadoop/hue-2.1.0-cdh4.1.2
$ PREFIX=/home/hadoop/hue-2.1.0-cdh4.1.2-bin make install
$ sudo chmod 4750 apps/shell/src/shell/build/setuid
2. hadoop config
hdfs-site.xml:
core-site.xml:
httpfs-site.xml:
mapred-site.xml:
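The bodies of these files are not reproduced here. For HUE's file browser the usual additions are WebHDFS plus a proxy-user entry for the hue user (a hedged sketch; adjust hosts/groups to your environment):
hdfs-site.xml:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
core-site.xml:
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>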
3. $ cd /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue
$ cp desktop/libs/hadoop/java-lib/hue-plugins-*.jar /home/hadoop/hadoop-2.0.0-cdh4.1.2/share/hadoop/mapreduce/lib
If HUE is not installed on the same host as the Hadoop master, copy the jar with scp.
HUE uses this plugin jar to communicate with the JobTracker.
4. Restart the Hadoop cluster
5. config oozie for hue
oozie-site.xml:
6. Restart Oozie
7. Confirm the firewall is off (the HUE server listens on port 8888 by default)
9. /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/desktop/conf/hue.ini
[desktop]
http_host=0.0.0.0
http_port=8888
[[database]]
engine=mysql
host=cup-master-1
port=3306
user=hue
password=hue
name=hue
[[hdfs_clusters]]
fs_defaultfs=hdfs://cup-master-1:9000
webhdfs_url=http://cup-master-1:50070/webhdfs/v1
hadoop_hdfs_home=/home/hadoop/hadoop-2.0.0-cdh4.1.2
hadoop_bin=/home/hadoop/hadoop-2.0.0-cdh4.1.2/bin/hadoop
hadoop_conf_dir=/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop
[[mapred_clusters]]
jobtracker_host=cup-master-1
jobtracker_port=8021
thrift_port=9290
hadoop_mapred_home=/home/hadoop/hadoop-2.0.0-cdh4.1.2
hadoop_bin=/home/hadoop/hadoop-2.0.0-cdh4.1.2/bin/hadoop
hadoop_conf_dir=/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop
[[yarn_clusters]]
resourcemanager_host=cup-master-1
resourcemanager_port=8032
hadoop_mapred_home=/home/hadoop/hadoop-2.0.0-cdh4.1.2
hadoop_bin=/home/hadoop/hadoop-2.0.0-cdh4.1.2/bin/hadoop
hadoop_conf_dir=/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop
[liboozie]
oozie_url=http://cup-master-1:11000/oozie
[beeswax]
hive_home_dir=/home/hadoop/hive-0.9.0-cdh4.1.2
hive_conf_dir=/home/hadoop/hive-0.9.0-cdh4.1.2/conf
HUE uses a SQLite database by default:
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, or sqlite3
#
# Note that for sqlite3, 'name', below is a filename;
# for other backends, it is the database name.
engine=sqlite3
## host=
## port=
## user=
## password=
name=/home/cup/hue-2.2.0-cdh4.2.1-bin/hue/desktop/desktop.db
10. Initialisation
Restart MySQL
$ mysql -u root -p    enter the password 123
mysql>
mysql> create database hue;
## grant select on <database>.* to <user>@<host> identified by "<password>"
mysql> grant all on hue.* to 'hue'@'localhost' identified by 'hue';
Back up the existing data to /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/hue_dump.json:
$ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/hue dumpdata > /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/hue_dump.json
$ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/hue syncdb --noinput
$ mysql -u hue -p hue -e "DELETE FROM hue.django_content_type;"
Load (migrate) the previously backed-up data:
$ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/hue loaddata /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/hue_dump.json
11. .bash_profile:
HIVE_HOME=/home/hadoop/hive-0.9.0-cdh4.1.2
HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath`
HADOOP_CLASSPATH=/home/hadoop/hive-0.9.0-cdh4.1.2/lib:$HADOOP_CLASSPATH:$CLASSPATH:$HADOOP_HOME/bin
12. Start
$ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/supervisor
HUE starts Hive along with it.
*** When stopping, kill the RunJar process as root; if the cup user kills it,
it keeps restarting automatically.
13. Check
http://192.168.101.122:8888 hue/hue hadoop/hadoop
5. HUE shell configuration
Query the HUE supervisor processes: $ps -f -u cup
[cup@cup-master-1 ~]$ ps -f -u cup
UID PID PPID C STIME TTY TIME CMD
cup 7597 7594 0 17:18 ? 00:00:00 sshd: cup@pts/1
cup 7598 7597 0 17:18 pts/1 00:00:00 -bash
cup 7777 7598 0 17:19 pts/1 00:00:00 vim hive-site.xml
cup 7943 7940 0 17:21 ? 00:00:00 sshd: cup@pts/5
cup 7944 7943 0 17:21 pts/5 00:00:00 -bash
cup 9860 9857 0 17:32 ? 00:00:00 sshd: cup@pts/9
cup 9861 9860 0 17:32 pts/9 00:00:00 -bash
cup 10560 10558 0 17:36 ? 00:00:01 sshd: cup@pts/2
cup 10561 10560 0 17:36 pts/2 00:00:00 -bash
cup 10780 10560 0 17:38 ? 00:00:00 /usr/libexec/openssh/sftp-server
cup 11683 10561 0 17:47 pts/2 00:00:00 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 ./supervisor
cup 11687 11683 0 17:47 pts/2 00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/hue runspawningserver
cup 11689 11683 2 17:47 pts/2 00:00:17 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup 11743 11687 0 17:47 pts/2 00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 -c import sys; from spawning import spawning_child; spawning_child.main() 11687 3 15 s
cup 11874 11873 0 17:49 pts/1 00:00:00 bash
cup 11896 7944 9 17:49 pts/5 00:00:44 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup 12147 11874 4 17:50 pts/1 00:00:21 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup 12351 11874 0 17:54 pts/1 00:00:00 vim hive-site.xml
cup 12748 10561 4 17:57 pts/2 00:00:00 ps -f -u cup
cup 24208 1 2 Jul09 ? 00:30:54 /usr/jdk6/jdk1.6.0_32/bin/java -Dproc_namenode -Xmx2000m -Djava.net.preferIPv4Stack=true -Xmx128m -Xmx128m -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-
cup 24660 1 0 Jul09 ? 00:02:07 /usr/jdk6/jdk1.6.0_32/bin/java -Dproc_zkfc -Xmx2000m -Djava.net.preferIPv4Stack=true -Xmx128m -Xmx128m -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4
cup 24842 1 0 Jul09 ? 00:11:10 /usr/jdk6/jdk1.6.0_32/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dyarn.log.dir=/home/cup/hado
cup 25394 1 1 Jul09 ? 00:14:32 /usr/jdk6/jdk1.6.0_32/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx24000m -Xms24g -Xmx32g -XX:NewSize=1g -XX:MaxNewSize=1g -XX:NewRatio=3 -XX:Sur
cup 41822 41819 0 13:45 ? 00:00:00 sshd: cup
cup 51570 51568 0 Jul08 ? 00:00:00 sshd: cup@pts/3
cup 51571 51570 0 Jul08 pts/3 00:00:00 -bash
cup 56534 56531 0 Jul08 ? 00:00:01 sshd: cup@notty
cup 56535 56534 0 Jul08 ? 00:00:00 /usr/libexec/openssh/sftp-server
cup 58691 58688 0 09:46 ? 00:00:00 sshd: cup@pts/0
cup 58692 58691 0 09:46 pts/0 00:00:00 -bash
Of these,
cup 11683 10561 0 17:47 pts/2 00:00:00 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 ./supervisor
cup 11687 11683 0 17:47 pts/2 00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/hue runspawningserver
cup 11689 11683 2 17:47 pts/2 00:00:17 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup 11743 11687 0 17:47 pts/2 00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 -c import sys; from spawning import spawning_child; spawning_child.main() 11687 3 15 s
are the HUE-related processes.
To stop HUE, first kill -9 11689 (the RunJar process),
then stop 11687 (runspawningserver) and 11683 (supervisor).
If 11689 (the HUE RunJar) is left running, the next HUE start fails to create the sockets on ports 8002 and 8003.
HBase tuning parameters:
hbase-env.sh:
export HBASE_HEAPSIZE=4000
hbase-site.xml:
hbase.client.write.buffer: 20MB
hbase.regionserver.handler.count: 100
hbase.hregion.memstore.flush.size: 384MB
hbase.hregion.max.filesize: 2GB
hbase.hstore.compactionThreshold: 3
hbase.hstore.blockingStoreFiles: 10
hbase.hstore.flush.thread: 20
hbase.hstore.compaction.thread: 15
zoo.cfg:
# The number of milliseconds of each tick
tickTime=30000
HBase's various timeout parameters must fall within [2*tickTime, 20*tickTime].
hbase-site.xml (the property descriptions below are fragments from hbase-default.xml that correspond to the settings listed above):
zookeeper.session.timeout: 60 seconds. Clients must report in within this period else they are considered dead. HBase passes this to the zk quorum as suggested maximum time for a session. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions "The client sends a requested timeout, the server responds with the timeout that it can give the client." In milliseconds.
hbase.client.write.buffer: A bigger buffer takes more memory -- on both the client and server side since server instantiates the passed write buffer to process it -- but a larger buffer size reduces the number of RPCs made. For an estimate of server-side memory-used, evaluate hbase.client.write.buffer * hbase.regionserver.handler.count.
hbase.regionserver.handler.count: Same property is used by the Master for count of master handlers. Default is 10.
hbase.hregion.memstore.flush.size: Memstore will be flushed to disk if size of the memstore exceeds this number of bytes. Value is checked by a thread that runs every hbase.server.thread.wakefrequency.
hbase.hregion.max.filesize: Maximum HStoreFile size. If any one of a column families' HStoreFiles has grown to exceed this value, the hosting HRegion is split in two. Default: 256M.
hbase.hstore.compactionThreshold: If more than this number of HStoreFiles in any one HStore (one HStoreFile is written per flush of memstore) then a compaction is run to rewrite all HStoreFiles files as one. Larger numbers put off compaction but when it runs, it takes longer to complete.
hbase.hstore.blockingStoreFiles: If more than this number of StoreFiles in any one Store (one StoreFile is written per flush of MemStore) then updates are blocked for this HRegion until a compaction is completed, or until hbase.hstore.blockingWaitTime has been exceeded.
HADOOP2.0 HA (NO NN Federation)
1. Passwordless SSH configuration
2. Edit the hadoop configuration files (cup-master-1, cup-slave-1, cup-slave-2, cup-slave-3, cup-slave-4)
The configuration files are:
vi core-site.xml:
vi hdfs-site.xml
or
(property-description fragments from the stripped hdfs-site.xml entries:
dfs.ha.fencing.ssh.connect-timeout: SSH connection timeout, in milliseconds, to use with the builtin sshfence fencer.
dfs.datanode.max.transfer.threads: Specifies the maximum number of threads to use for transferring data in and out of the DN.)
[root@HA2kerberos conf]# vim slaves
cup-slave-1
cup-slave-2
cup-slave-3
cup-slave-4
3. Copy the hadoop directory from master-1 to master-2 with scp
4. Start the zookeepers on all nodes
5. Then, on one of the master nodes, run hdfs zkfc -formatZK to create the namespace in ZK
6. On each of the nodes listed in dfs.namenode.shared.edits.dir
(qjournal://cup-master-1:8485;cup-slave-1:8485;cup-slave-2:8485;cup-slave-3:8485;cup-slave-4:8485/mycluster)
start the journal service with ./hadoop-daemon.sh start journalnode
7. On the primary namenode, format the namenode and journalnode directories with hadoop namenode -format
8. On the primary namenode, start the namenode process with ./hadoop-daemon.sh start namenode (or ./start-dfs.sh)
9. On the standby namenode, run hdfs namenode -bootstrapStandby;
this formats its directories and copies the metadata over from the primary namenode,
then start the namenode process with ./hadoop-daemon.sh start namenode.
6. ./hadoop-daemon.sh start zkfc    on both the active and standby namenode
7. ./hadoop-daemon.sh start datanode    on every datanode
If you start the namenode before zkfc, the namenode cannot become active; once zkfc is started it does.
Do not change the order above: in my run, starting zkfc first kept the namenode from coming up.
The automated startup shows the same thing: zkfc is started last.
[hadoop@ClouderaHA1 sbin]$ ./start-dfs.sh
Starting namenodes on [ClouderaHA1 ClouderaHA2]
ClouderaHA1: starting namenode, logging to /app/hadoop/logs/hadoop-hadoop-namenode-ClouderaHA1.out
ClouderaHA2: starting namenode, logging to /app/hadoop/logs/hadoop-hadoop-namenode-ClouderaHA2.out
ClouderaHA3: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-ClouderaHA3.out
ClouderaHA1: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-ClouderaHA1.out
ClouderaHA2: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-ClouderaHA2.out
Starting ZK Failover Controllers on NN hosts [ClouderaHA1 ClouderaHA2]
ClouderaHA1: starting zkfc, logging to /app/hadoop/logs/hadoop-hadoop-zkfc-ClouderaHA1.out
ClouderaHA2: starting zkfc, logging to /app/hadoop/logs/hadoop-hadoop-zkfc-ClouderaHA2.out
A. Start the journalnode on each node first
hadoop-daemon.sh start journalnode
B. On the primary master run start-dfs.sh and start-yarn.sh
[hadoop@cup-master-1 ~]$ start-dfs.sh
Starting namenodes on [cup-master-1 cup-master-2]
hadoop@cup-master-1's password: cup-master-2: starting namenode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-namenode-cup-master-2.out
cup-master-1: starting namenode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-namenode-cup-master-1.out
cup-slave-4: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-4.out
cup-slave-1: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-1.out
cup-slave-3: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-3.out
cup-slave-2: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-2.out
Starting ZK Failover Controllers on NN hosts [cup-master-1 cup-master-2]
hadoop@cup-master-1's password: cup-master-2: starting zkfc, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-zkfc-cup-master-2.out
cup-master-1: starting zkfc, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-zkfc-cup-master-1.out
[hadoop@cup-master-1 ~]$
[hadoop@cup-master-1 ~]$ jps
30939 NameNode
28526 QuorumPeerMain
29769 JournalNode
31283 Jps
31207 DFSZKFailoverController
[hadoop@cup-master-1 ~]$
[hadoop@cup-master-2 ~]$ jps
13197 DFSZKFailoverController
12305 NameNode
15106 Jps
[hadoop@cup-master-2 ~]$
[hadoop@cup-master-1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-resourcemanager-cup-master-1.out
cup-slave-4: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-4.out
cup-slave-1: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-1.out
cup-slave-3: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-3.out
cup-slave-2: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-2.out
[hadoop@cup-master-1 ~]$
[hadoop@cup-master-1 ~]$ jps
30939 NameNode
28526 QuorumPeerMain
29769 JournalNode
31628 Jps
31207 DFSZKFailoverController
31365 ResourceManager
[hadoop@cup-master-1 ~]$
[hadoop@cup-master-2 ~]$ jps
13197 DFSZKFailoverController
12305 NameNode
17092 Jps
This shows that the HA setup applies only to HDFS; it does not involve MR2.
[hadoop@cup-slave-1 ~]$ jps
30692 JournalNode
31453 NodeManager
31286 DataNode
30172 QuorumPeerMain
31562 Jps
[hadoop@cup-slave-1 ~]$
HBASE HA CONF:
1. hbase-site.xml
2. Copy core-site.xml and hdfs-site.xml into hbase_home/conf/,
otherwise HBase cannot start because it does not recognise hdfs://mycluster.
If the HA setup fails,
you must do the following to reset it:
1. Empty the directories
on the NNs: /home/hadoop/cdh42/cdhworkspace/dfs/name
on the DNs: /home/hadoop/cdh42/cdhworkspace/dfs/data
on the JNs: /home/hadoop/cdh42/cdhworkspace/dfs/jn
2. Re-run the format
on the NNs: hdfs namenode -format
Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /home/hadoop/cdh42/cdhworkspace/dfs/name state: NOT_FORMATTED
on the NNs: hdfs namenode -format
Formatting requires the ZK and JournalNode processes to be running:
zkServer.sh start
hadoop-daemon.sh start journalnode
Incompatible namespaceID for journal Storage Directory /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster: NameNode has nsId 264369592 but storage has nsId 1178230309
Fix the namespaceID in /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster/current/VERSION.
Incompatible clusterID for journal Storage Directory /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster: NameNode has clusterId 'CID-34eabdd9-ca2c-48ff-9127-b6df81aded90' but storage has clusterId 'CID-c1012f1d-e2f1-4a0b-89f6-cafabef1cf7e'
Fix the clusterID in /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster/current/VERSION.
Incompatible clusterIDs in /home/hadoop/cdh42/cdhworkspace/dfs/data: namenode clusterID = CID-34eabdd9-ca2c-48ff-9127-b6df81aded90; datanode clusterID = CID-c1012f1d-e2f1-4a0b-89f6-cafabef1cf7e
Fix the clusterID in /home/hadoop/cdh42/cdhworkspace/dfs/data/current/VERSION.
Cause: every format generates a new namespaceID and clusterID,
while the old IDs are still stored under cdhworkspace/dfs/name, cdhworkspace/dfs/data and cdhworkspace/dfs/jn,
so clear all of these directories on all machines before formatting:
on the NNs: /home/hadoop/cdh42/cdhworkspace/dfs/name
on the DNs: /home/hadoop/cdh42/cdhworkspace/dfs/data
on the JNs: /home/hadoop/cdh42/cdhworkspace/dfs/jn
HBase: increase the limits
The open files limit shown by ulimit -a needs to be raised.
dfs.replication.interval
dfs.datanode.handler.count
dfs.namenode.handler.count
Hive integration with HBase requires copying the HBase configuration file into the Hadoop conf directory:
hbase -> hadoop:
copy hbase-0.94.2-cdh4.2.0/conf/hbase-site.xml to hadoop-2.0.0-cdh4.2.0/etc/hadoop/
Mount the ISO image:
mount -t iso9660 -o loop /*/*.iso /mnt
[contrib1]
name=Server
baseurl=file:///mnt/Server
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
1. After some research, the prevailing view is that a rowkey should be 10-100 bytes long; an over-long rowkey reduces memstore lookup efficiency and HFile storage efficiency and brings no benefit at all.
2. Combining this with our scenario and data model, the recommended lengths are:
recommended 8B=64b, 16B=128b, 24B=192b, 32B=256b, and no more than 32 bytes.
That is 8, 16, 24 or 32 bytes, all multiples of 8, because 64-bit machines align memory allocations on 8-byte boundaries.
3. Quantitative analysis:
8B = 64b = 2^64 = 1.844674407371 * 10^19 -- at most a 20-digit integer
16B = 128b = 2^128 = 3.4028236692094 * 10^38 -- at most a 39-digit integer
24B = 192b = 2^192 = 6.2771017353867 * 10^57 -- at most a 58-digit integer
32B = 256b = 2^256 = 1.1579208923732 * 10^77 -- at most a 78-digit integer
According to our design the CDR table rowkey is organised as:
6156911095 8534567490 11000 45000 1111111111111111111
reversed phone number: 10 digits
inverted timestamp: 10 digits
cell dimension: 10 digits
terminal dimension: 19 digits
That is 49 decimal digits in total, so the recommendation is to adopt this scheme with a 24-byte rowkey, which supports integers of up to 58 digits (use 57);
that still leaves about 8 spare digits, and if they are not needed, simply zero-pad when converting to bytes.
CDH4 (Hadoop 2.0) native lib compiling
Dependencies:
maven
apr-1.4.6.tar.gz
apr-util-1.5.1.tar.gz
httpd-2.2.23.tar.gz
php-5.3.18.tar.gz
rrdtool-1.4.7.tar.gz
pcre-8.31.tar.gz
libconfuse-2.6-2.el5.rf.x86_64.rpm
libconfuse-devel-2.6-2.el5.rf.x86_64.rpm
libxml2-devel rpmbuild glib2-devel dbus-devel freetype-devel fontconfig-devel
gcc-c++ expat-devel python-devel libXrender-devel
yum -y install apr-devel apr-util check-devel cairo-devel pango-devel
pcre-devel
tcl-devel
zlib-devel
bzip2-devel
libX11-devel
readline-devel
libXt-devel
tk-devel
tetex-latex
rhbase:
libboost-dev libboost-test-dev libboost-program-options-dev libevent-dev
automake libtool flex bison pkg-config g++ libssl-dev
1. Install lzo plus lzo-devel, zlib-devel, openssl-devel
dependencies: lzo-devel zlib-devel gcc autoconf automake libtool
2. install ProtocolBuffers: http://wiki.apache.org/hadoop/HowToContribute
3. $cd /home/hadoop/protobuf-2.5.0/ ## as root
$./configure
$make
$make install
4. $cd /home/hadoop/protobuf-2.5.0/java ## as hadoop
$mvn compile
$mvn install
5. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common
modify pom.xml: add
6. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
$mvn clean install -DskipTests -P native
****************** Note: hadoop-common-project/hadoop-common contains the snappy compression code,
so install snappy (e.g. snappy-1.1.0) before building the common native library; otherwise using snappy compression reports:
this version of libhadoop was built without snappy support
snappy-1.1.0.tar.
http://code.google.com/p/hadoop-snappy/
$ mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]
$mvn clean install -DskipTests -P native package -Dsnappy.prefix=SNAPPY_INSTALLATION_DIR
$mvn clean install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
## Without -Dsnappy.prefix=/root/snappy-1.1.0 you get:
snappy native library was compiled without snappy support
this version of libhadoop was built without snappy support
See http://code.google.com/p/hadoop-snappy/ for details.
copy to hadoop-common-project/hadoop-common----------------------------
7. copy /home/hadoop/protobuf-2.5.0/java/target/generated-sources/com/google/protobuf/DescriptorProtos.java to
/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/generated-sources/java/com/google/protobuf/
8. copy /home/hadoop/protobuf-2.5.0/java/src/main/java/com/google/protobuf/*.java to
/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/generated-sources/java/com/google/protobuf/
9. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
$mvn install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
Note: no clean here, otherwise the copied java files would be deleted.
main:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.427s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.986s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [0.933s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [0.852s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.246s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [0.645s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [0.827s]
[INFO] Apache Hadoop Common .............................. FAILURE [49.566s]
[INFO] Apache Hadoop Common Project ...................... SKIPPED
[INFO] Apache Hadoop HDFS ................................ SKIPPED
[INFO] Apache Hadoop HttpFS .............................. SKIPPED
[INFO] Apache Hadoop HDFS Project ........................ SKIPPED
[INFO] hadoop-yarn ....................................... SKIPPED
[INFO] hadoop-yarn-api ................................... SKIPPED
[INFO] hadoop-yarn-common ................................ SKIPPED
[INFO] hadoop-yarn-server ................................ SKIPPED
[INFO] hadoop-yarn-server-common ......................... SKIPPED
[INFO] hadoop-yarn-server-nodemanager .................... SKIPPED
[INFO] hadoop-yarn-server-web-proxy ...................... SKIPPED
[INFO] hadoop-yarn-server-resourcemanager ................ SKIPPED
[INFO] hadoop-yarn-server-tests .......................... SKIPPED
[INFO] hadoop-yarn-client ................................ SKIPPED
[INFO] hadoop-yarn-applications .......................... SKIPPED
[INFO] hadoop-yarn-applications-distributedshell ......... SKIPPED
[INFO] hadoop-mapreduce-client ........................... SKIPPED
[INFO] hadoop-mapreduce-client-core ...................... SKIPPED
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SKIPPED
[INFO] hadoop-yarn-site .................................. SKIPPED
[INFO] hadoop-yarn-project ............................... SKIPPED
[INFO] hadoop-mapreduce-client-common .................... SKIPPED
[INFO] hadoop-mapreduce-client-shuffle ................... SKIPPED
[INFO] hadoop-mapreduce-client-app ....................... SKIPPED
[INFO] hadoop-mapreduce-client-hs ........................ SKIPPED
[INFO] hadoop-mapreduce-client-jobclient ................. SKIPPED
[INFO] Apache Hadoop MapReduce Examples .................. SKIPPED
[INFO] hadoop-mapreduce .................................. SKIPPED
[INFO] Apache Hadoop MapReduce Streaming ................. SKIPPED
[INFO] Apache Hadoop Distributed Copy .................... SKIPPED
[INFO] Apache Hadoop Archives ............................ SKIPPED
[INFO] Apache Hadoop Rumen ............................... SKIPPED
[INFO] Apache Hadoop Gridmix ............................. SKIPPED
[INFO] Apache Hadoop Data Join ........................... SKIPPED
[INFO] Apache Hadoop Extras .............................. SKIPPED
[INFO] Apache Hadoop Pipes ............................... SKIPPED
[INFO] Apache Hadoop Tools Dist .......................... SKIPPED
[INFO] Apache Hadoop Tools ............................... SKIPPED
[INFO] Apache Hadoop Distribution ........................ SKIPPED
[INFO] Apache Hadoop Client .............................. SKIPPED
[INFO] Apache Hadoop Mini-Cluster ........................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 58.143s
[INFO] Finished at: Tue Apr 09 14:31:49 CST 2013
[INFO] Final Memory: 67M/1380M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake" (in directory "/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/native"): java.io.IOException: error=2, No such file or directory -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn
10. install cmake ## as root
$tar xvf cmake-*.*.*.tar.gz
$cd cmake-*.*.*
$./bootstrap
$make
$make install
11. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
$mvn install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
Note: no clean; only after this step is the following directory generated:
/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources
copy to hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common----------------------------
12. copy /home/hadoop/protobuf-2.5.0/java/target/generated-sources/com/google/protobuf/DescriptorProtos.java to
/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources/proto/
13. copy /home/hadoop/protobuf-2.5.0/java/src/main/java/com/google/protobuf/*.java to
/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources/proto/
14. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
$mvn install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
Note: no clean.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.302s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.861s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [0.765s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [1.010s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.230s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [0.614s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [0.741s]
[INFO] Apache Hadoop Common .............................. SUCCESS [23.666s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.075s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [31.895s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [2.411s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.076s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.265s]
[INFO] hadoop-yarn-api ................................... SUCCESS [6.371s]
[INFO] hadoop-yarn-common ................................ SUCCESS [1.907s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.107s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [1.211s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [2.975s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [0.324s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [0.634s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [0.367s]
[INFO] hadoop-yarn-client ................................ SUCCESS [0.194s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.108s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [0.344s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.098s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [1.496s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [0.231s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.200s]
[INFO] hadoop-yarn-project ............................... SUCCESS [0.172s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [6.503s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [0.391s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [3.133s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [1.250s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [3.092s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [0.900s]
[INFO] hadoop-mapreduce .................................. SUCCESS [0.105s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [0.706s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [1.513s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [0.828s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [1.201s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [1.040s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [0.409s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [0.545s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [9.772s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [0.467s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.059s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [0.228s]
[INFO] Apache Hadoop Client .............................. SUCCESS [0.624s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.247s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:56.489s
[INFO] Finished at: Tue Apr 09 15:28:54 CST 2013
[INFO] Final Memory: 87M/744M
[INFO] ------------------------------------------------------------------------
15. The compiled native files:
/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/native/target/usr/local/lib/
[hadoop@cup-master-1 src]$ find . -name *.a
./hadoop-hdfs-project/hadoop-hdfs/target/native/libposix_util.a
./hadoop-hdfs-project/hadoop-hdfs/target/native/libnative_mini_dfs.a
./hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib/libhdfs.a
./hadoop-common-project/hadoop-common/target/native/target/usr/local/lib/libhadoop.a
./hadoop-tools/hadoop-pipes/target/native/libhadooputils.a
./hadoop-tools/hadoop-pipes/target/native/libhadooppipes.a
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/libcontainer.a