Hadoop 2.0 CDH4.4.0 installation

No.  Host IP          Hostname (root/redhat)   Remote mgmt IP    Remote mgmt account/password
1 192.168.101.120 cup-slave-4 192.168.101.150 user1/hadoop123
2 192.168.101.121 cup-slave-1 192.168.101.151 user1/hadoop123
3 192.168.101.122 cup-master-1 192.168.101.152 user1/hadoop123
4 192.168.101.123 cup-master-2 192.168.101.153 user1/hadoop123
5 192.168.101.124 cup-slave-3 192.168.101.154 user1/hadoop123
6 192.168.101.125 cup-slave-2 192.168.101.155 user1/hadoop123

Temporary file directories:
C:\ProgramFilesDev\CDH4\on cup-master-1\
C:\ProgramFilesDev\CDH4\install files\

Note: edit the configuration files with a tool such as UltraEdit, not WordPad/Notepad; otherwise the line endings/encoding may cause errors on Linux.

/etc/sysconfig/network (permanently sets the hostname):
NETWORKING=yes
HOSTNAME=cup-master-1
GATEWAY=192.168.101.1

Apply on each node in turn. GATEWAY must be correct; run $ifconfig and check the Bcast field to verify.

$source /etc/sysconfig/network
Run on each node in turn.


Set the hostname (run the matching command on its own node):  ## this step is mandatory; otherwise formatting the NN may fail with UnknownHostException: cup-master-1
$hostname cup-master-1
$hostname cup-master-2
$hostname cup-slave-1
$hostname cup-slave-2
$hostname cup-slave-3
$hostname cup-slave-4


Hosts already configured in /etc/hosts:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.101.122  cup-master-1
192.168.101.123  cup-master-2
192.168.101.121  cup-slave-1
192.168.101.125  cup-slave-2
192.168.101.124 cup-slave-3
192.168.101.120 cup-slave-4



/etc/hosts takes effect immediately; no reload is needed. Apply the same entries on each node.


DNS:
Add to /etc/resolv.conf:
search localdomain
nameserver 192.168.101.110 ##dns ip
nameserver 8.8.8.8
Apply on each node in turn.


Locale configuration:
/etc/sysconfig/i18n
LANG=en_US
$source /etc/sysconfig/i18n
Apply on each node in turn.
$echo $LANG
to verify



Stop the firewall:    $sudo service iptables stop
Check the firewall:   $sudo service iptables status
Run on each node in turn.

Disable permanently: $chkconfig iptables off
                     $iptables -F
                     $service iptables save


Uninstall OpenJDK:
1. rpm -qa|grep jdk
   java-1.6.0-openjdk-1.6.0.0-1.41.1.10.4.el6.x86_64
2. rpm -e java-1.6.0-openjdk-1.6.0.0-1.41.1.10.4.el6.x86_64

Install the JDK
1. Java SE 1.6 or later, download from: http://www.oracle.com/technetwork/java/javase/downloads/index.html
   download jdk-6u32-linux-x64.bin
2. cd /usr/jdk6
3. chmod 755 *.bin
4. ./jdk-6u32-linux-x64.bin
5. Configure the environment variables


Append to the end of /etc/profile:
#set java environment
JAVA_HOME=/usr/jdk6/jdk1.6.0_32
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
JAVA_OPTS="$JAVA_OPTS -server"
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME JAVA_OPTS CLASSPATH PATH

#JAVA_OPTS="$JAVA_OPTS -server -Xms2g -Xmx12g -XX:NewSize=128m -XX:MaxNewSize=128m"
$source /etc/profile   # reload so the environment variables take effect
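
A minimal check that the variables took effect in the current shell (paths are the ones used above; adjust if your JDK lives elsewhere):
$echo $JAVA_HOME     # should print /usr/jdk6/jdk1.6.0_32
$java -version       # should report 1.6.0_32
$which java          # should resolve to $JAVA_HOME/bin/java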




ulimit: raise the maximum number of open files (file handles) and user processes
Check the current limits with $ulimit -n (open files) and $ulimit -u (processes).
1. /etc/security/limits.conf
* soft nofile 655350
* hard nofile 655350
2. /etc/security/limits.d/90-nproc.conf
*          soft    nproc     10240
*          hard    nproc     60240
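
These limits only apply to new login sessions. A quick check after logging in again as the hadoop user (expected values assume the settings above):
$ulimit -n    # expect 655350 (max open files)
$ulimit -u    # expect the nproc value from 90-nproc.conf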



6. Create and configure the hadoop user

   In /etc/sudoers, below the line root ALL=(ALL) ALL, add:
   root    ALL=(ALL) ALL
   hadoop    ALL=(ALL) ALL

   $groupadd hadoop
   $useradd -g hadoop hadoop
   $passwd hadoop

7. Log in to cup-master-1 as root and stop the firewall: $service iptables stop. Repeat on every node.

8. As root, edit /etc/ssh/sshd_config:
   change #UseLogin no to
   UseLogin yes
   Restart sshd: $service sshd restart
   Otherwise you get: -bash: ulimit: open files: cannot modify limit: Operation not permitted

8. Passwordless SSH from cup-master-1 to the other nodes:
   Log in to cup-master-1 as hadoop
   $mkdir .ssh      ------ not needed on the master node itself
   $ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
   On cup-master-2, cup-slave-1, cup-slave-2, cup-slave-3 and cup-slave-4 create the staging directory the key is copied into: $mkdir ~/sshcm1
   $scp .ssh/id_rsa.pub hadoop@cup-slave-1:/home/hadoop/sshcm1/   # repeat for each node
   $scp .ssh/id_rsa.pub hadoop@cup-slave-2:/home/hadoop/sshcm1/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-3:/home/hadoop/sshcm1/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-4:/home/hadoop/sshcm1/
   $scp .ssh/id_rsa.pub hadoop@cup-master-2:/home/hadoop/sshcm1/

   Log in to cup-master-1 as hadoop and configure the local machine:
   $cd ~/.ssh
   $chmod 700 ~/.ssh
   $cat id_rsa.pub >> authorized_keys
   $chmod 600 .ssh/authorized_keys
  
   Log in to cup-slave-1 as hadoop and configure the other machines:
   $mkdir .ssh
   $chmod 700 .ssh
   $cd .ssh
   $cat ~/sshcm1/id_rsa.pub >> ~/.ssh/authorized_keys
   $chmod 600 ~/.ssh/authorized_keys
  
   Repeat on the remaining nodes, logged in as hadoop.

   Log in to cup-master-1 as hadoop and test passwordless SSH: $ssh hadoop@cup-master-2  or $ssh cup-master-2; repeat for the other nodes.
   Notes:
   The first connection prints a confirmation prompt; answer yes.
   A known_hosts file is then created under ~/.ssh/.
   If passwordless SSH ever misbehaves later, delete that file, regenerate the RSA key pair and redo the remote SSH setup.


known_hosts file (example entries):
cup-slave-1,192.168.98.225 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr5bf6Fe2TRprWmB+RK1ZeriV+wwlwsIKLv9Y1sneLoXgPqIA9RBi9RodiWogImu5J8Ht4KZ2UyXIb/w2/NQeZKYJExpGlpXGSdKfDjDe+8wzXi01FPhkwzClhjstGNHaPwZVnDKtGERX4PE985xq9wOuyGl1AFAhYz8neCTpKqRGA+/cquulTTdwQ8mLsWumZHKNcgkGtGU6MvqbVt4mDNwEJmUizeThp/h03bCoSlg2YG9Zqf/W71WA9ZqCPB2nWBRn9buhHOvNaUTn6/6dQna8Quzg8DC9WGYgecLNUIt6LMSnQUgsONl2AiNbVN+W7DHA4BkuCIafXj7g5Hj8ow==

cup-slave-2,192.168.98.227 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAr5bf6Fe2TRprWmB+RK1ZeriV+wwlwsIKLv9Y1sneLoXgPqIA9RBi9RodiWogImu5J8Ht4KZ2UyXIb/w2/NQeZKYJExpGlpXGSdKfDjDe+8wzXi01FPhkwzClhjstGNHaPwZVnDKtGERX4PE985xq9wOuyGl1AFAhYz8neCTpKqRGA+/cquulTTdwQ8mLsWumZHKNcgkGtGU6MvqbVt4mDNwEJmUizeThp/h03bCoSlg2YG9Zqf/W71WA9ZqCPB2nWBRn9buhHOvNaUTn6/6dQna8Quzg8DC9WGYgecLNUIt6LMSnQUgsONl2AiNbVN+W7DHA4BkuCIafXj7g5Hj8ow==



9. Passwordless SSH from cup-master-2 to the other nodes:
   Log in to cup-master-2 as hadoop
   $mkdir .ssh
   $ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
   On cup-master-1, cup-slave-1, cup-slave-2, cup-slave-3 and cup-slave-4 create the staging directory: $mkdir ~/sshcm2
   $scp .ssh/id_rsa.pub hadoop@cup-master-1:/home/hadoop/sshcm2/   # repeat for each node
   $scp .ssh/id_rsa.pub hadoop@cup-slave-1:/home/hadoop/sshcm2/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-2:/home/hadoop/sshcm2/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-3:/home/hadoop/sshcm2/
   $scp .ssh/id_rsa.pub hadoop@cup-slave-4:/home/hadoop/sshcm2/

   Log in to cup-master-2 as hadoop and configure the local machine:
   $cd ~/.ssh
   $chmod 700 ~/.ssh
   $cat id_rsa.pub >> authorized_keys
   $chmod 600 .ssh/authorized_keys
  
   Log in to cup-slave-1 as hadoop and configure the other machines:
   $mkdir .ssh
   $chmod 700 .ssh
   $cd .ssh
   $cat ~/sshcm2/id_rsa.pub >> ~/.ssh/authorized_keys
   $chmod 600 ~/.ssh/authorized_keys
  
   Repeat on the remaining nodes, logged in as hadoop.

   Log in to cup-master-2 as hadoop and test passwordless SSH: $ssh hadoop@cup-master-1  or $ssh cup-master-1; repeat for the other nodes.
  
  
   Notes:
   ~/.ssh/authorized_keys must have mode 600; anything more permissive triggers a security error.
   $cat ~/sshcm2/id_rsa.pub >> ~/.ssh/authorized_keys appends sshcm2/id_rsa.pub to the end of ~/.ssh/authorized_keys.
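
   As an alternative to the manual scp/cat steps above, ssh-copy-id appends the key and sets sane permissions in one shot; a sketch run from cup-master-1 (or cup-master-2), assuming the key pair already exists:
   $for host in cup-master-2 cup-slave-1 cup-slave-2 cup-slave-3 cup-slave-4; do
       ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host     # asks for the hadoop password once per host
    done
   The first connection to each host still records its host key in ~/.ssh/known_hosts.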





1. Log in to cup-master-1 as hadoop
   Install hadoop and deploy the namenode
   Upload the hadoop tarball hadoop-2.0.0-cdh4.1.2.tar.gz

   $tar zxvf hadoop-2.0.0-cdh4.1.2.tar.gz   # unpack

2. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/hadoop-env.sh
   JAVA_HOME=/usr/jdk6/jdk1.6.0_32

2.1 /home/hadoop/.bash_profile:

# User specific environment and startup programs
HADOOP_HOME=/home/cup/hadoop-2.0.0-cdh4.2.1
HADOOP_MAPRED_HOME=$HADOOP_HOME
HADOOP_COMMON_HOME=$HADOOP_HOME
HADOOP_HDFS_HOME=$HADOOP_HOME
YARN_HOME=$HADOOP_HOME
HADOOP_CONF_HOME=${HADOOP_HOME}/etc/hadoop
YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

ANT_HOME=/home/cup/apache-ant-1.8.4
MAVEN_HOME=/home/cup/apache-maven-3.0.4

ZOOKEEPER_HOME=/home/cup/zookeeper-3.4.5-cdh4.2.1
HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1

HADOOP_HOME_WARN_SUPPRESS=1
HADOOP_CLASSPATH=$CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/common:${HADOOP_HOME}/share/hadoop/common/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/hdfs:${HADOOP_HOME}/share/hadoop/hdfs/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/mapreduce:${HADOOP_HOME}/share/hadoop/mapreduce/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/tools/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=${HADOOP_HOME}/share/hadoop/yarn:${HADOOP_HOME}/share/hadoop/yarn/lib:$HADOOP_CLASSPATH
HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath`:$HADOOP_CLASSPATH

JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib

PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$ANT_HOME/bin:$MAVEN_HOME/bin:/home/cup/shell:$PATH

export JAVA_LIBRARY_PATH LD_LIBRARY_PATH HADOOP_CLASSPATH
export HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME
export ZOOKEEPER_HOME HBASE_HOME ANT_HOME MAVEN_HOME HADOOP_HOME_WARN_SUPPRESS PATH


# HIVE_HOME=/home/cup/hive-0.10.0-cdh4.2.1
# HADOOP_CLASSPATH=${HIVE_HOME}/lib:$HADOOP_CLASSPATH
# HIVE_CLASSPATH=$HBASE_HOME/conf
# PATH=$HIVE_HOME/bin:$PATH
# export HIVE_HOME HIVE_CLASSPATH HADOOP_CLASSPATH PATH

$source /home/hadoop/.bash_profile
 


Once the Hadoop cluster is installed, the first thing to do is set the daemon heap sizes in etc/hadoop/hadoop-env.sh. With mainstream nodes carrying 32 GB of RAM, typical settings are:
NN: 15-25 GB
JT: 2-4 GB
DN: 1-4 GB
TT: 1-2 GB, child VM 1-2 GB
The right values depend on the workload; for example, a cluster with very many small files needs at least 20 GB for the NN and at least 2 GB per DN.
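
A hedged example of what such settings can look like in etc/hadoop/hadoop-env.sh, using the stock variable names (HADOOP_HEAPSIZE, HADOOP_NAMENODE_OPTS, HADOOP_DATANODE_OPTS); the numbers below are placeholders to tune per node role:
# hadoop-env.sh -- example heap sizes only
export HADOOP_HEAPSIZE=4096                                   # default daemon heap, in MB
export HADOOP_NAMENODE_OPTS="-Xmx20g $HADOOP_NAMENODE_OPTS"   # NameNode heap
export HADOOP_DATANODE_OPTS="-Xmx2g $HADOOP_DATANODE_OPTS"    # DataNode heap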




3. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/core-site.xml

     
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://cup-master-1:9000</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/hadoopworkspace/tmp</value>
</property>

<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
With the trash enabled, $hadoop fs -rmr /xxx/xxx no longer deletes data permanently; the deleted data is moved into the .Trash folder under the invoking user's home directory.
The value is in minutes. To bypass the trash and delete a file immediately, add the -skipTrash option:
$hadoop fs -rm -skipTrash /xxxx



4. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/hdfs-site.xml

  
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hadoop/hadoopworkspace/dfs/name</value>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/hadoopworkspace/dfs/data</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
  



5. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/mapred-site.xml


<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<property>
  <name>mapreduce.job.tracker</name>
  <value>hdfs://cup-master-1:9001</value>
  <final>true</final>
</property>

<property>
  <name>mapreduce.jobtracker.address</name>
  <value>cup-master-1:9002</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/home/hadoop/hadoopworkspace/mapred/system</value>
  <final>true</final>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/home/hadoop/hadoopworkspace/mapred/local</value>
  <final>true</final>
</property>


6. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/yarn-site.xml


<property>
  <name>yarn.resourcemanager.address</name>
  <value>cup-master-1:8080</value>
</property>

<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>cup-master-1:8081</value>
</property>

<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>cup-master-1:8082</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>


7. Log in as hadoop on every node and create the hadoop working directory (a one-shot loop is sketched below)
   $mkdir /home/hadoop/hadoopworkspace
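
   With the passwordless SSH from the earlier step in place, the directory can be created on all nodes from cup-master-1 in one loop (hostnames as listed in /etc/hosts above):
   $for host in cup-master-1 cup-master-2 cup-slave-1 cup-slave-2 cup-slave-3 cup-slave-4; do
       ssh hadoop@$host "mkdir -p /home/hadoop/hadoopworkspace"
    done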


6. /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/slaves

cup-slave-1
cup-slave-2
cup-slave-3
cup-slave-4

   /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/masters   (this file is optional; it may be omitted)

cup-master-1
cup-master-2




Hadoop compression -----------------------------------------------------
7.0 Copy the native libraries (libhadoop / hadoop-lzo / hadoop-snappy)
    into /home/hadoop/hadoop-2.0.0-cdh4.1.2/lib/native/
    and copy the corresponding hadoop-lzo / hadoop-snappy jars.
    hadoop-snappy is already integrated into hadoop-common, so there is no separate jar for it.

1). snappy's own shared libraries - /usr/local/lib/libsnappy*.*
2). the hadoop-common jar - hadoop-common-2.0.0-cdh4.2.0.jar
   source under hadoop-2.0.0-cdh4.2.0\src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\compress\snappy
3). the hadoop-common native libraries - libhadoop.a, libhadoop.so, libhadoop.so.1.0.0
   source under hadoop-2.0.0-cdh4.2.0\src\hadoop-common-project\hadoop-common\src\main\native\src\org\apache\hadoop\io\compress\snappy


    snappy-1.1.0   # install as root
    $./configure
    $make
    $make install
    /usr/local/lib/libsnappy*.*

    If make fails with:
    libtool: Version mismatch error.  This is libtool 2.4.2 Debian-2.4.2-1ubuntu1, but the
    libtool: definition of this LT_INIT comes from libtool 2.4.
    libtool: You should recreate aclocal.m4 with macros from libtool 2.4.2 Debian-2.4.2-1ubuntu1
    libtool: and run autoconf again.
    then run:
    $autoreconf -ivf
    ## $autoreconf --force --install
    and run $make again.

core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzoCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>

<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

<property>
  <name>io.compression.codec.snappy.class</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>


## Configure only one of LzoCodec and SnappyCodec, whichever compression you actually use.


mapred-site.xml: compress MR map/job output with snappy:

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>

<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

<property>
  <name>mapreduce.output.fileoutputformat.compress.type</name>
  <value>BLOCK</value>
</property>

<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>




7. Multi-disk storage for DN nodes:
Before adding disks the system disk (/) was almost full, at 99% utilization;
after adding disks its utilization dropped to roughly 80-90%.
Keep watching to see whether it keeps dropping.

To reclaim the system disk, stop one datanode at a time and let the cluster re-replicate the data automatically.

Preferred procedure (a shell sketch follows the steps):
1)stop the entire cluster
2)mv /home/cup/hadoopworkspace/dfs/data/current/* /cup/d0/dfs2/data/current/
3)add /cup/d0/dfs2/data into the dfs.datanode.data.dir
4)start the entire cluster
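
A sketch of the four steps above, run on each affected DataNode while the cluster is stopped (paths are the ones used in this document; the comma-separated dfs.datanode.data.dir value is edited in hdfs-site.xml by hand):
$stop-hbase.sh && stop-yarn.sh && stop-dfs.sh        # 1) stop the entire cluster (from the master)
$mkdir -p /cup/d0/dfs2/data/current
$mv /home/cup/hadoopworkspace/dfs/data/current/* /cup/d0/dfs2/data/current/   # 2) move the existing blocks
# 3) in hdfs-site.xml set:
#    dfs.datanode.data.dir = /home/cup/hadoopworkspace/dfs/data,/cup/d0/dfs2/data
$start-dfs.sh && start-yarn.sh && start-hbase.sh     # 4) start the entire cluster again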





7. Install hadoop and deploy the datanodes
   As hadoop on cup-master-1:
   $scp -rp hadoop-2.0.0-cdh4.1.2 hadoop@cup-master-2:/home/hadoop/   # repeat for each node


8. $hdfs namenode -format   # only needed the first time, to format the namenode
   ./start-dfs.sh
   ./start-yarn.sh
   ./stop-dfs.sh
   ./stop-yarn.sh
   These scripts start and stop the slave nodes automatically.
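
   A quick sanity check from the master after start-dfs.sh / start-yarn.sh (standard HDFS commands; -report lists the live datanodes and capacity):
   $hdfs dfsadmin -report
   $hadoop fs -mkdir /tmp-smoketest
   $hadoop fs -ls /
   $hadoop fs -rmr /tmp-smoketest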

9. Open http://192.168.101.122:8088 in a browser for the cluster (ResourceManager) status
        and http://192.168.101.122:50070 for the namenode status.

10. $jps lists the running processes:
   NN: ResourceManager NameNode SecondaryNameNode
   DN: NodeManager DataNode













1. zookeeper/hbase install

2. As hadoop on cup-master-1:
   unpack zookeeper-3.4.3-cdh4.1.2 and hbase-0.92.1-cdh4.1.2

1. Append to the end of /etc/profile:
   see above

$source /etc/profile   # reload so the environment variables take effect


2. zookeeper install
   rename /home/hadoop/zookeeper-3.4.3-cdh4.1.2/conf/zoo_sample.cfg to zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/hadoopworkspace/zookeeper/data
dataLogDir=/home/hadoop/hadoopworkspace/zookeeper/log
clientPort=2181
server.1=cup-master-1:2888:3888
server.2=cup-slave-1:2888:3888
server.3=cup-slave-2:2888:3888
server.4=cup-slave-3:2888:3888
server.5=cup-slave-4:2888:3888

$mkdir /home/hadoop/hadoopworkspace/zookeeper/data   # run on every node; ZK does not create it automatically
$mkdir /home/hadoop/hadoopworkspace/zookeeper/log    # run on every node; ZK does not create it automatically

3. $scp -rp /home/hadoop/zookeeper-3.4.3-cdh4.1.2 hadoop@cup-slave-1:/home/hadoop/

4. create myid in dataDir (on every node; see the sketch after this list)
for cup-master-1, the content in myid file should be 1
for cup-slave-1, the content in myid file should be 2
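
A sketch of creating the myid files from cup-master-1, using the server numbers from zoo.cfg above (server.1 ... server.5):
$echo 1 > /home/hadoop/hadoopworkspace/zookeeper/data/myid          # on cup-master-1
$ssh hadoop@cup-slave-1 "echo 2 > /home/hadoop/hadoopworkspace/zookeeper/data/myid"
$ssh hadoop@cup-slave-2 "echo 3 > /home/hadoop/hadoopworkspace/zookeeper/data/myid"
$ssh hadoop@cup-slave-3 "echo 4 > /home/hadoop/hadoopworkspace/zookeeper/data/myid"
$ssh hadoop@cup-slave-4 "echo 5 > /home/hadoop/hadoopworkspace/zookeeper/data/myid"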

4. Configure ZK automatic purging of old snapshots and logs
   /home/hadoop/zookeeper-3.4.3-cdh4.1.2/conf/zoo.cfg
autopurge.purgeInterval=2
autopurge.snapRetainCount=10

5. /home/hadoop/zookeeper-3.4.3-cdh4.1.2/bin/
   $ ./zkServer.sh start   # start on each node in turn (the first server logs many errors until a leader is elected; this is harmless)

6. $jps process check
   every node now shows an extra QuorumPeerMain process











7. hbase install
   /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/hbase-env.sh
export HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
export HBASE_HOME=/home/hadoop/hbase-0.92.1-cdh4.1.2
export JAVA_HOME=/usr/jdk6/jdk1.6.0_32
export HBASE_MANAGES_ZK=false
export HBASE_HEAPSIZE=4000
  
   /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/hbase-site.xml

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://cup-master-1:9000/hbase</value>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.master</name>
  <value>cup-master-1:60000</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>cup-master-1,cup-slave-1,cup-slave-2,cup-slave-3,cup-slave-4</value>
</property>

<property>
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>

<property>
  <name>hbase.master.port</name>
  <value>60000</value>
</property>

<property>
  <name>hbase.master.maxclockskew</name>
  <value>600000</value>
  <description>Time difference of regionserver from master</description>
</property>


HBase compression -----------------------------------------------------
hbase-site.xml===============================

<property>
  <name>hbase.regionserver.codecs</name>
  <value>snappy,lzo</value>
</property>






   /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/regionservers
cup-slave-1
cup-slave-2
cup-slave-3
cup-slave-4

8. $ scp -rp hbase-0.92.1-cdh4.1.2 hadoop@cup-slave-1:/home/hadoop/   # repeat for the other slave nodes

9. Keep the master and all slaves time-synchronized (including the time zone). The clock difference must stay below 30000 ms, otherwise the hbase regionserver fails to start with org.apache.hadoop.hbase.ClockOutOfSyncException.

9.1 Manual time sync
    as root:
    $date -s 20130219
    $date -s 14:37:00
    $ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
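
    Instead of setting the clock by hand, NTP keeps the nodes in sync automatically; a sketch as root, assuming an NTP server is reachable (the address below is only a placeholder):
    $ntpdate -u 192.168.101.110      # one-shot sync against a local NTP server (placeholder address)
    $chkconfig ntpd on               # keep it in sync across reboots
    $service ntpd start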

9.2 Add to hbase-site.xml:

<property>
  <name>hbase.master.maxclockskew</name>
  <value>180000</value>
  <description>Time difference of regionserver from master</description>
</property>


10. /home/hadoop/hbase-0.92.1-cdh4.1.2/bin/
   $ ./start-hbase.sh   # the slave nodes are started automatically
   $ ./stop-hbase.sh    # the slave nodes are stopped automatically

11. http://192.168.101.122:50070 shows the namenode status and the /hbase directory on HDFS
    http://192.168.101.122:60010 shows the HBase status

12. Process check
    NN:
13326 ResourceManager
18617 QuorumPeerMain
19630 Jps
12980 NameNode
13190 SecondaryNameNode
19411 HMaster
    DN:
30404 Jps
30181 HRegionServer
27489 QuorumPeerMain
14014 DataNode
14148 NodeManager

Test snappy compression with HBase:
$hbase org.apache.hadoop.hbase.util.CompressionTest /home/cup/kv.txt snappy


   
HBase tuning parameters:

/etc/profile:
JAVA_OPTS="$JAVA_OPTS -server -Xms2g -Xmx12g -XX:NewSize=128m -XX:MaxNewSize=128m"



hbase-env.sh:
export HBASE_HEAPSIZE=4000

export HBASE_OPTS="$HBASE_OPTS -XX:NewSize=128m -XX:MaxNewSize=128m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-master-$(hostname).log"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xmx12g -Xms12g -XX:NewSize=256m -XX:MaxNewSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-regionserver-$(hostname).log"


export HBASE_OPTS="$HBASE_OPTS -Xms4g -Xmx4g -XX:NewSize=1g -XX:MaxNewSize=1g -XX:NewRatio=3  -XX:SurvivorRatio=6 -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=73 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-master-$(hostname).log"

export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms12g -Xmx12g -XX:NewSize=3g -XX:MaxNewSize=3g -XX:NewRatio=3 -XX:SurvivorRatio=6 -XX:+HeapDumpOnOutOfMemoryError -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=73 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-hbase-hadoop-regionserver-$(hostname).log"




hbase-site.xml

hbase.client.write.buffer: 20MB
hbase.regionserver.handler.count: 100
hbase.hregion.memstore.flush.size: 384MB
hbase.hregion.max.filesize: 2GB
hbase.hstore.compactionThreshold: 3
hbase.hstore.blockingStoreFiles: 10
hbase.hstore.flush.thread: 20
hbase.hstore.compaction.thread: 15
hbase.master.distributed.log.splitting: false



zoo.cfg:
# The number of milliseconds of each tick
tickTime=30000


HBase's various timeout parameters should fall within [2*tickTime, 20*tickTime].



1. To add a machine to the cluster, the existing nodes do not need a restart.
   First do the base setup (passwordless SSH from the NN to the new machine, etc.),
   then add the new hostname to
   /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/slaves
   /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/regionservers
   and run the hadoop and hbase start scripts again; processes on the existing nodes are not affected.

2. Hadoop Balancer redistributes the data blocks across the DataNodes according to the chosen policy
   /home/hadoop/hadoop-2.0.0-cdh4.1.2/sbin/start-balancer.sh -t 10%
   The -t argument is the allowed deviation in disk utilization for HDFS to be considered balanced.
   If the utilization difference between machines is below 10%, the HDFS cluster is regarded as balanced.
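
   The balancer is throttled by dfs.datanode.balance.bandwidthPerSec; the limit can also be raised at runtime before a balancing run (a hedged example, value in bytes per second):
   $hdfs dfsadmin -setBalancerBandwidth 104857600    # allow ~100 MB/s per datanode for this run
   $start-balancer.sh -t 10%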








1. Oozie install

   /etc/profile:
   OOZIE_HOME=/home/hadoop/oozie-3.2.0-cdh4.1.2


   $OOZIE_HOME/oozie-server/bin/catalina.sh:
   JAVA_HOME=/usr/jdk6/jdk1.6.0_32
   CATALINA_HOME=/home/cup/oozie-3.3.0-cdh4.2.1/oozie-server

   $OOZIE_HOME/bin/oozie-setup.sh:
   $oozie-setup.sh -extjs /home/hadoop/ext-2.2.zip -hadoop 0.20.200 $HADOOP_HOME

   $oozie-setup.sh -extjs /home/hadoop/ext-2.2.zip -hadoop 2.0 $HADOOP_HOME

2. $OOZIE_HOME/bin/oozie-run.sh   # start oozie


5. oozie fails to start, complaining it cannot find the class org/apache/hadoop/utils/ReflectionUtils
   copy /home/hadoop/oozie-3.2.0-cdh4.1.2/libtools/*.jar to /home/hadoop/oozie-3.2.0-cdh4.1.2/oozie-server/webapps/oozie/WEB-INF/lib

6. oozie fails to start with:
REASON: org.apache.oozie.service.ServiceException: E0103: Could not load service classes, Schema 'SA' does not exist {SELECT t0.bean_type, t0.conf, t0.console_url, t0.cred, t0.data, t0.error_code, t0.error_message, t0.external_child_ids, t0.external_id, t0.external_status, t0.name, t0.retries, t0.stats, t0.tracker_uri, t0.transition, t0.type, t0.user_retry_count, t0.user_retry_interval, t0.user_retry_max, t0.end_time, t0.execution_path, t0.last_check_time, t0.log_token, t0.pending, t0.pending_age, t0.signal_value, t0.sla_xml, t0.start_time, t0.status, t0.wf_id FROM WF_ACTIONS t0 WHERE t0.bean_type = ? AND t0.id = ?} [code=30000, state=42Y07]

7. $OOZIE_HOME/bin/ooziedb.sh create -sqlfile oozie.sql -run

Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '3.2.0-cdh4.1.2'


The SQL commands have been written to: oozie.sql

The SQL script is written to $OOZIE_HOME/bin/oozie.sql.

8. oozie-site.xml:
   
   
   <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
        <value>*</value>
   </property>

   <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
        <value>*</value>
   </property>

   <property>
        <name>oozie.service.ProxyUserService.proxyuser.cup.hosts</name>
        <value>*</value>
   </property>

   <property>
        <name>oozie.service.ProxyUserService.proxyuser.cup.groups</name>
        <value>*</value>
   </property>
   



8.
Error occurred during initialization of VM
Incompatible minimum and maximum heap sizes specified

oozie-env.sh:
export CATALINA_OPTS="$CATALINA_OPTS -Xms2g -Xmx4g"


8. $OOZIE_HOME/bin/oozie-run.sh     # start oozie in the foreground
   $OOZIE_HOME/bin/oozie-run.sh &   # start oozie in the background

   Newer releases:
   $oozied.sh run

   $ jps
   28945 Bootstrap

9. $OOZIE_HOME/bin/oozie admin -oozie http://192.168.101.122:11000/oozie -status
   System mode: NORMAL means Oozie came up successfully.
   http://192.168.101.122:11000/oozie shows the Oozie web console.













After a reboot the hostname changed and the cluster would not start:

2. If the hostname changed, fix /etc/sysconfig/network
/etc/sysconfig/network (permanently sets the hostname):
NETWORKING=yes
HOSTNAME=cup-master-1
GATEWAY=192.168.101.1
Apply on each node in turn.

$source /etc/sysconfig/network
Apply on each node in turn.

3. Move the /etc/profile environment variables into the hadoop user's profile

5. Stop the firewall:  $sudo service iptables stop
   Check the firewall: $sudo service iptables status


6. /etc/hosts:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.122  cup-master-1
192.168.101.123  cup-master-2
192.168.101.121  cup-slave-1
192.168.101.125  cup-slave-2
192.168.101.124 cup-slave-3
192.168.101.120 cup-slave-4

#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
If this ::1 line is not commented out, HBase will not start.

7. Synchronize the time: date -s


HBase fails to start:
ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect

a. stop the firewall: $sudo service iptables stop
b. comment out the "::1  localhost" line in /etc/hosts, i.e. disable IPv6
c. synchronize the time across the cluster nodes










mysql 5.1.67
8. $ sudo /etc/init.d/mysqld start   # start mysql (or $service mysqld start)
   $ sudo service mysqld status
   $ mysql            # enter the mysql client
   mysql>
   mysql>exit         # back to the bash shell

   $ /usr/bin/mysqladmin -u root password '123'                     # set the root password
   $ /usr/bin/mysqladmin -u root -h cup-master-1 password '123'


1. Hive Install
1.1 .bash_profile
HIVE_HOME=/home/hadoop/hive-0.9.0-cdh4.1.2
export HIVE_HOME
HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/hadoop/hive-0.9.0-cdh4.1.2/lib:$CLASSPATH:$HADOOP_HOME/bin
1.2 $ cd /home/hadoop/hive-0.9.0-cdh4.1.2/conf
1.3 $ cp hive-default.xml.template hive-site.xml
1.4 hive-site.xml:

Add at the top:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///root/hive-0.10.0-cdh4.2.0/lib/hive-hbase-handler-0.10.0-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/hbase-0.94.2-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/zookeeper-3.4.5-cdh4.2.0.jar</value>
</property>


hive.metastore.warehouse.dir: /home/hadoop/hive-0.9.0-cdh4.1.2/warehouse
hive.exec.scratchdir: /home/hadoop/hive-0.9.0-cdh4.1.2/hive-${user.name}

javax.jdo.option.ConnectionURL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
javax.jdo.option.ConnectionDriverName: com.mysql.jdbc.Driver
javax.jdo.option.ConnectionUserName: hive
javax.jdo.option.ConnectionPassword: hive


The description tags at the following two places in the template have syntax errors and need to be fixed:
1) hive.optimize.union.remove at line 474
2) hive.mapred.supports.subdirectories at line 489
The partition-dir tags at the following three places have syntax errors and need to be fixed:
1) hive.exec.list.bucketing.default.dir at line 561
2) hive.exec.list.bucketing.default.dir at line 562
3) hive.exec.list.bucketing.default.dir at line 563


hive-env.sh:

export HADOOP_HOME=/home/cup/hadoop-2.0.0-cdh4.2.1
export HBASE_HOME=/home/cup/hbase-0.94.2-cdh4.2.1
export JAVA_HOME=/usr/jdk6/jdk1.6.0_32
export HIVE_CLASSPATH=$HBASE_HOME/conf
####export HIVE_AUX_JARS_PATH=/home/cup/hive-0.10.0-cdh4.2.1/lib:$HADOOP_CLASSPATH
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib


Why HIVE_AUX_JARS_PATH is commented out:
When Hive submits MR jobs it reads the hive.aux.jars.path variable,
whose value should be file:///root/hive-0.10.0-cdh4.2.0/lib/hive-hbase-handler-0.10.0-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/hbase-0.94.2-cdh4.2.0.jar,file:///root/hive-0.10.0-cdh4.2.0/lib/zookeeper-3.4.5-cdh4.2.0.jar
and is configured in hive-site.xml. The export HIVE_AUX_JARS_PATH in hive-env.sh must be commented out,
otherwise it fails with java.io.FileNotFoundException: File file:/home/hadoop/hive-0.10.0-cdh4.4.0/lib:***** does not exist
If you do keep it, it has to be changed to
export HIVE_AUX_JARS_PATH=file:///home/cup/hive-0.10.0-cdh4.2.1/lib


## When a Hive script inserts data into an external table mapped onto a snappy-compressed HBase table, Hive needs HIVE_AUX_JARS_PATH to find these jars:
hive-hbase-handler-0.10.0-cdh4.2.0.jar
hbase-0.94.2-cdh4.2.0.jar
zookeeper-3.4.5-cdh4.2.0.jar
so in that case configure HIVE_AUX_JARS_PATH=/root/hive-0.10.0-cdh4.2.0/lib/:$HADOOP_CLASSPATH
$HADOOP_CLASSPATH is appended because otherwise Hive cannot find the snappy classes when creating the external table tied to the snappy-compressed HBase table.


Copy the hadoop-common jar into /home/cup/hive-0.10.0-cdh4.2.1/lib,
otherwise:
Failed with exception java.io.IOException:java.io.IOException:
Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
or
Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.SnappyCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:134)
at org.apache.hadoop.io.compress.CompressionCodecFactory.(CompressionCodecFactory.java:174)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 23 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.io.compress.SnappyCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:127)
... 25 more


hive-log4j.properties:

hive.log.dir=/home/cup/hive-0.10.0-cdh4.2.1/logs
hive.log.file=hive.log


   Restart mysql
   $ mysql -u root -p    # password 123
   mysql>

   mysql> create database hive;
   ## grant select on <database>.* to <user>@<host> identified by "<password>"
   mysql> grant all on hive.* to 'hive'@'localhost' identified by 'hive';
   mysql> grant all on hive.* to 'hive'@'%' identified by 'hive';

   Copy mysql-connector-java-5.1.22-bin.jar into /home/hadoop/hive-0.9.0-cdh4.1.2/lib

1.5
hive --service hwi &
http://192.168.98.20:9999/hwi

hive --service hiveserver &
[hadoop@cup-master-1 bin]$ Starting Hive Thrift Server

$ jps
29082 RunJar

$nohup hive --service hiveserver &
[hadoop@cup-master-1 bin]$ nohup: ignoring input and appending output to `nohup.out'

Alternatively, after HUE is installed, let HUE start these services for you.


Integrating Hive with HBase:

hive>create external table snappy_hive(key int, value string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
with serdeproperties ("hbase.columns.mapping"=":key,cf:value")
tblproperties ("hbase.table.name"="snappy_table");

hive>create table hive (key int,value string) row format delimited fields terminated by ',';
hive>load data local inpath '/home/cup/kv.txt' into table hive;
hive>insert overwrite table snappy_hive select * from hive;


snappy --- HIVE
To enable Snappy compression for Hive output when creating SequenceFile outputs, use the following settings:
SET hive.exec.compress.output=true;
SET hive.exec.compress.intermediate=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
SET hive.cli.print.header=true;
SET hive.cli.print.current.db=true;


# JVM reuse
Hadoop will typically launch map or reduce tasks in a forked JVM.
the JVM startup may create significant overhead, especially when launching
jobs with hundreds or thousands of tasks, most of which have short execution times.
Reuse allows a JVM instance to be reused up to N times for the same job.
in mapred-site.xml:

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>10</value>
</property>



hive.exec.scratchdir:
/home/cup/hive-0.10.0-cdh4.2.1/hive-${user.name}
hive.metastore.warehouse.dir:
/home/cup/hive-0.10.0-cdh4.2.1/warehouse



Using Oracle as the Hive metastore:
1) manually run the Oracle schema script: hive-0.10.0-cdh4.2.1\scripts\metastore\upgrade\oracle\hive-schema-0.10.0.oracle.sql
2) change the JDBC connection settings in hive-site.xml
3) nohup hive --service hiveserver &


Hive user permissions:

For other users to run Hive, configure the following:
.bash_profile
/home/hadoop/cdh42/cdhworkspace/tmp               chmod 777
/home/hadoop/cdh42/hive-0.10.0-cdh4.2.0/logs      chmod 777

hive>grant create/all on database default to user xhyt;
hive>show grant user xhyt on database default;
hive>grant select on table hive_t to user xhyt;
hive>grant select on table hive_t to group xhyt;




hbase-env.sh also gets: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib
hadoop-env.sh also gets: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${HADOOP_HOME}/lib/native:/usr/lib64:/usr/local/lib




/etc/profile for the root user:
#set java environment
JAVA_HOME=/usr/jdk6/jdk1.6.0_32
CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
JAVA_OPTS="$JAVA_OPTS -server -Xms1024m -Xmx4096m"
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME JAVA_OPTS CLASSPATH PATH


/home/hadoop/.bash_profile for the hadoop user:
# User specific environment and startup programs
HADOOP_HOME=/home/hadoop/hadoop-2.0.0-cdh4.1.2
HADOOP_MAPRED_HOME=$HADOOP_HOME
HADOOP_COMMON_HOME=$HADOOP_HOME
HADOOP_HDFS_HOME=$HADOOP_HOME
YARN_HOME=$HADOOP_HOME
ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.3-cdh4.1.2
HBASE_HOME=/home/hadoop/hbase-0.92.1-cdh4.1.2
OOZIE_HOME=/home/hadoop/oozie-3.2.0-cdh4.1.2
CATALINA_HOME=$OOZIE_HOME/oozie-server
ANT_HOME=/home/hadoop/apache-ant-1.8.4
MAVEN_HOME=/home/hadoop/apache-maven-3.0.4
HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath`

PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$OOZIE_HOME/bin:$CATALINA_HOME/bin:$ANT_HOME/bin:$MAVEN_HOME/bin:$PATH
export HADOOP_CLASSPATH HADOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME ZOOKEEPER_HOME HBASE_HOME OOZIE_HOME CATALINA_HOME ANT_HOME MAVEN_HOME PATH



3. Adjust the JVM heap size in /etc/profile
   export JAVA_OPTS="$JAVA_OPTS -server -Xms1024m -Xmx4096m"
   $source /etc/profile
   Run on each node in turn.






Hadoop rack awareness - improves network locality

core-site.xml:

<property>
  <name>topology.script.file.name</name>
  <value>/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop/rackaware.sh</value>
</property>


/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop/rackaware.sh
#!/bin/bash

HADOOP_CONF=/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop

while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done

$chmod 755 rackaware.sh


/home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop/topology.data
cup-master-1  /default/rack1
cup-master-2  /default/rack1
cup-slave-1  /default/rack1
cup-slave-2  /default/rack1
cup-slave-3  /default/rack1
cup-slave-4  /default/rack1
cup-slave-5  /default/rack1
cup-slave-6  /default/rack1
cup-slave-7  /default/rack2
cup-slave-8  /default/rack2
cup-slave-9  /default/rack2
cup-slave-10 /default/rack2
cup-slave-11 /default/rack2
cup-slave-12 /default/rack2
10.204.193.10 /default/rack1
10.204.193.11 /default/rack1
10.204.193.20 /default/rack1
10.204.193.21 /default/rack1
10.204.193.22 /default/rack1
10.204.193.23 /default/rack1
10.204.193.24 /default/rack1
10.204.193.25 /default/rack1
10.204.193.26 /default/rack2
10.204.193.27 /default/rack2
10.204.193.28 /default/rack2
10.204.193.29 /default/rack2
10.204.193.30 /default/rack2
10.204.193.31 /default/rack2
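
The script can be tested from the command line before restarting the namenode; it should echo the rack for each argument and /default/rack for unknown hosts (a quick check, using entries from topology.data above):
$cd /home/cup/hadoop-2.0.0-cdh4.2.1/etc/hadoop
$./rackaware.sh cup-slave-1 10.204.193.26     # expect: /default/rack1 /default/rack2
$./rackaware.sh some-unknown-host             # expect: /default/rack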






1. hue install (hadoop user experience)
   $python enters the python interpreter
   exit() or Ctrl+D leaves it (Ctrl+Z only suspends the process)

   Required Dependencies:
   gcc, g++,
   libgcrypt-devel, libxml2-devel, libxslt-devel,
   cyrus-sasl-devel, cyrus-sasl-gssapi,
   mysql-devel, python-devel, python-setuptools, python-simplejson,
   sqlite-devel, openldap-devel,
   ant

libgcrypt-devel-1.4.5-9.el6.x86_64
libxslt-devel-1.1.26-2.el6.x86_64
cyrus-sasl-devel-2.1.23-13.el6.x86_64
mysql-devel-5.1.52.el6_0.1.x86_64
openldap-devel-2.4.23-20.el6.x86_64

   install ant
   install maven


$make
/home/hadoop/hue-2.1.0-cdh4.1.2/Makefile.vars:42: *** "Error: must have python development packages for 2.4, 2.5, 2.6 or 2.7. Could not find Python.h. Please install python2.4-devel, python2.5-devel, python2.6-devel or python2.7-devel".  Stop.

/usr/include/python2.6/ only contains pyconfig-64.h; there is no Python.h.
/home/hadoop/hue-2.1.0-cdh4.1.2/Makefile.vars checks for it.

This means the python-devel package is not installed.



5. $ cd /home/hadoop/hue-2.1.0-cdh4.1.2
   $ PREFIX=/home/hadoop/hue-2.1.0-cdh4.1.2-bin make install
   $ sudo chmod 4750 apps/shell/src/shell/build/setuid



2. hadoop config
hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

core-site.xml:

<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>

<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>

httpfs-site.xml:

<property>
  <name>httpfs.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>

<property>
  <name>httpfs.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>

<property>
  <name>httpfs.proxyuser.hue.hosts</name>
  <value>*</value>
</property>

<property>
  <name>httpfs.proxyuser.hue.groups</name>
  <value>*</value>
</property>

mapred-site.xml:

<property>
  <name>jobtracker.thrift.address</name>
  <value>0.0.0.0:9290</value>
</property>

<property>
  <name>mapred.jobtracker.plugins</name>
  <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
  <description>Comma-separated list of jobtracker plug-ins to be activated.</description>
</property>


3. $ cd /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue
   $ cp desktop/libs/hadoop/java-lib/hue-plugins-*.jar /home/hadoop/hadoop-2.0.0-cdh4.1.2/share/hadoop/mapreduce/lib
   If HUE is not installed on the same host as the hadoop master, copy the jar over with scp.
   HUE uses this plugin jar to communicate with the JobTracker.

4. Restart the hadoop cluster

5. config oozie for hue
oozie-site.xml:

<property>
    <name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name>
    <value>*</value>
</property>

<property>
    <name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>

<property>
    <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
    <value>*</value>
</property>

<property>
    <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
    <value>*</value>
</property>

<property>
  <name>oozie.service.AuthorizationService.security.enabled</name>
  <value>true</value>
</property>


6. Restart oozie

7. Make sure the firewall is off (the HUE server listens on port 8888 by default)



9. /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/desktop/conf/hue.ini
[desktop]
http_host=0.0.0.0
http_port=8888
[[database]]
engine=mysql
host=cup-master-1
port=3306
user=hue
password=hue
name=hue
[[hdfs_clusters]]
fs_defaultfs=hdfs://cup-master-1:9000
webhdfs_url=http://cup-master-1:50070/webhdfs/v1
hadoop_hdfs_home=/home/hadoop/hadoop-2.0.0-cdh4.1.2
hadoop_bin=/home/hadoop/hadoop-2.0.0-cdh4.1.2/bin/hadoop
hadoop_conf_dir=/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop
[[mapred_clusters]]
jobtracker_host=cup-master-1
jobtracker_port=8021
thrift_port=9290
hadoop_mapred_home=/home/hadoop/hadoop-2.0.0-cdh4.1.2
hadoop_bin=/home/hadoop/hadoop-2.0.0-cdh4.1.2/bin/hadoop
hadoop_conf_dir=/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop
[[yarn_clusters]]
resourcemanager_host=cup-master-1
resourcemanager_port=8032
hadoop_mapred_home=/home/hadoop/hadoop-2.0.0-cdh4.1.2
hadoop_bin=/home/hadoop/hadoop-2.0.0-cdh4.1.2/bin/hadoop
hadoop_conf_dir=/home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop
[liboozie]
oozie_url=http://cup-master-1:11000/oozie
[beeswax]
hive_home_dir=/home/hadoop/hive-0.9.0-cdh4.1.2
hive_conf_dir=/home/hadoop/hive-0.9.0-cdh4.1.2/conf


HUE uses a sqlite database by default:
  [[database]]
    # Database engine is typically one of:
    # postgresql_psycopg2, mysql, or sqlite3
    #
    # Note that for sqlite3, 'name', below is a filename;
    # for other backends, it is the database name.
    engine=sqlite3
    ## host=
    ## port=
    ## user=
    ## password=
    name=/home/cup/hue-2.2.0-cdh4.2.1-bin/hue/desktop/desktop.db

10. Initialization

   Restart mysql
   $ mysql -u root -p    # password 123
   mysql>

   mysql> create database hue;
   ## grant select on <database>.* to <user>@<host> identified by "<password>"
   mysql> grant all on hue.* to 'hue'@'localhost' identified by 'hue';


    Back up any existing data to /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/hue_dump.json:
    $ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/hue dumpdata > /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/hue_dump.json


    $ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/hue syncdb --noinput
    $ mysql -u hue -p hue -e "DELETE FROM hue.django_content_type;"

    Load the backed-up data after the migration:
    $ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/hue loaddata /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/hue_dump.json

11. .bash_profile:
HIVE_HOME=/home/hadoop/hive-0.9.0-cdh4.1.2
HADOOP_CLASSPATH=`$HBASE_HOME/bin/hbase classpath`
HADOOP_CLASSPATH=/home/hadoop/hive-0.9.0-cdh4.1.2/lib:$HADOOP_CLASSPATH:$CLASSPATH:$HADOOP_HOME/bin



12. Start HUE
    $ /home/hadoop/hue-2.1.0-cdh4.1.2-bin/hue/build/env/bin/supervisor

    HUE also starts Hive along with it.

    *** To stop it, kill the RunJar process as root; if the cup user kills it,
    it keeps restarting automatically.

13. Check
    http://192.168.101.122:8888   (hue/hue or hadoop/hadoop)



5. HUE shell configuration
   List the HUE supervisor processes with $ps -f -u cup

[cup@cup-master-1 ~]$ ps -f -u cup
UID        PID  PPID  C STIME TTY          TIME CMD
cup       7597  7594  0 17:18 ?        00:00:00 sshd: cup@pts/1 
cup       7598  7597  0 17:18 pts/1    00:00:00 -bash
cup       7777  7598  0 17:19 pts/1    00:00:00 vim hive-site.xml
cup       7943  7940  0 17:21 ?        00:00:00 sshd: cup@pts/5 
cup       7944  7943  0 17:21 pts/5    00:00:00 -bash
cup       9860  9857  0 17:32 ?        00:00:00 sshd: cup@pts/9 
cup       9861  9860  0 17:32 pts/9    00:00:00 -bash
cup      10560 10558  0 17:36 ?        00:00:01 sshd: cup@pts/2 
cup      10561 10560  0 17:36 pts/2    00:00:00 -bash
cup      10780 10560  0 17:38 ?        00:00:00 /usr/libexec/openssh/sftp-server
cup      11683 10561  0 17:47 pts/2    00:00:00 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 ./supervisor
cup      11687 11683  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/hue runspawningserver
cup      11689 11683  2 17:47 pts/2    00:00:17 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup      11743 11687  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 -c import sys; from spawning import spawning_child; spawning_child.main() 11687 3 15 s
cup      11874 11873  0 17:49 pts/1    00:00:00 bash
cup      11896  7944  9 17:49 pts/5    00:00:44 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup      12147 11874  4 17:50 pts/1    00:00:21 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup      12351 11874  0 17:54 pts/1    00:00:00 vim hive-site.xml
cup      12748 10561  4 17:57 pts/2    00:00:00 ps -f -u cup
cup      24208     1  2 Jul09 ?        00:30:54 /usr/jdk6/jdk1.6.0_32/bin/java -Dproc_namenode -Xmx2000m -Djava.net.preferIPv4Stack=true -Xmx128m -Xmx128m -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-
cup      24660     1  0 Jul09 ?        00:02:07 /usr/jdk6/jdk1.6.0_32/bin/java -Dproc_zkfc -Xmx2000m -Djava.net.preferIPv4Stack=true -Xmx128m -Xmx128m -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4
cup      24842     1  0 Jul09 ?        00:11:10 /usr/jdk6/jdk1.6.0_32/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dyarn.log.dir=/home/cup/hado
cup      25394     1  1 Jul09 ?        00:14:32 /usr/jdk6/jdk1.6.0_32/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx24000m -Xms24g -Xmx32g -XX:NewSize=1g -XX:MaxNewSize=1g -XX:NewRatio=3 -XX:Sur
cup      41822 41819  0 13:45 ?        00:00:00 sshd: cup       
cup      51570 51568  0 Jul08 ?        00:00:00 sshd: cup@pts/3 
cup      51571 51570  0 Jul08 pts/3    00:00:00 -bash
cup      56534 56531  0 Jul08 ?        00:00:01 sshd: cup@notty 
cup      56535 56534  0 Jul08 ?        00:00:00 /usr/libexec/openssh/sftp-server
cup      58691 58688  0 09:46 ?        00:00:00 sshd: cup@pts/0 
cup      58692 58691  0 09:46 pts/0    00:00:00 -bash



Of these,
cup      11683 10561  0 17:47 pts/2    00:00:00 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 ./supervisor
cup      11687 11683  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/hue runspawningserver
cup      11689 11683  2 17:47 pts/2    00:00:17 /usr/jdk6/jdk1.6.0_32/bin/java -Xmx2000m -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/home/cup/hadoop-2.0.0-cdh4.2.1/logs -Dhadoop.log.file=ha
cup      11743 11687  0 17:47 pts/2    00:00:02 /home/cup/hue-2.2.0-cdh4.2.1-bin/hue/build/env/bin/python2.6 -c import sys; from spawning import spawning_child; spawning_child.main() 11687 3 15 s
are the HUE-related processes.
To stop HUE, first kill -9 11689 (the RunJar/java process),
then stop 11687 (runspawningserver) and 11683 (supervisor); a sketch follows below.

If 11689 (the hue RunJar) is not stopped, the next HUE start fails because the sockets on ports 8002 and 8003 cannot be created.
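
A sketch of that stop sequence with the PIDs looked up by name rather than hard-coded (process names match the ps output above; the PID placeholders are to be filled in by hand):
$ps -f -u cup | grep -v grep | grep "runspawningserver"     # note the runspawningserver PID
$ps -f -u cup | grep -v grep | grep "supervisor"            # note the supervisor PID
$kill -9 <java/RunJar pid>        # the java child first (11689 in the example above)
$kill <runspawningserver pid>     # then runspawningserver (11687)
$kill <supervisor pid>            # then supervisor (11683)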










HBase tuning parameters:

hbase-env.sh:
export HBASE_HEAPSIZE=4000

hbase-site.xml:

hbase.client.write.buffer: 20MB
hbase.regionserver.handler.count: 100
hbase.hregion.memstore.flush.size: 384MB
hbase.hregion.max.filesize: 2GB
hbase.hstore.compactionThreshold: 3
hbase.hstore.blockingStoreFiles: 10
hbase.hstore.flush.thread: 20
hbase.hstore.compaction.thread: 15


zoo.cfg:
# The number of milliseconds of each tick
tickTime=30000


HBase's various timeout parameters should fall within [2*tickTime, 20*tickTime].
hbase-site.xml:

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://cup-master-1:9000/hbase</value>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.master</name>
  <value>cup-master-1:60000</value>
</property>

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>cup-master-1,cup-slave-1,cup-slave-2,cup-slave-3,cup-slave-4</value>
</property>

<property>
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>

<property>
  <name>hbase.master.port</name>
  <value>60000</value>
</property>

<property>
  <name>hbase.master.maxclockskew</name>
  <value>180000</value>
  <description>Time difference of regionserver from master</description>
</property>

<property>
  <name>hbase.rpc.timeout</name>
  <value>540000</value>
</property>

<property>
  <name>ipc.socket.timeout</name>
  <value>540000</value>
</property>

<property>
  <name>hbase.regionserver.lease.period</name>
  <value>540000</value>
  <description>HRegion server lease period in milliseconds. Default is
    60 seconds. Clients must report in within this period else they are
    considered dead.</description>
</property>

<property>
  <name>zookeeper.session.timeout</name>
  <value>540000</value>
  <description>ZooKeeper session timeout.
    HBase passes this to the zk quorum as suggested maximum time for a
    session.  See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
    "The client sends a requested timeout, the server responds with the
    timeout that it can give the client."
    In milliseconds.</description>
</property>

<property>
  <name>hbase.regionserver.restart.on.zk.expire</name>
  <value>true</value>
  <description>when timeout occurs, regionserver will be restarted but not to shut down</description>
</property>

<property>
  <name>hbase.client.write.buffer</name>
  <value>20971520</value>
  <description>Default size of the HTable client write buffer in bytes.
    A bigger buffer takes more memory -- on both the client and server
    side since server instantiates the passed write buffer to process
    it -- but a larger buffer size reduces the number of RPCs made.
    For an estimate of server-side memory-used, evaluate
    hbase.client.write.buffer * hbase.regionserver.handler.count</description>
</property>

<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
  <description>Count of RPC Server instances spun up on RegionServers.
    Same property is used by the Master for count of master handlers.
    Default is 10.</description>
</property>

<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>402653184</value>
  <description>Memstore will be flushed to disk if size of the memstore
    exceeds this number of bytes.  Value is checked by a thread that runs
    every hbase.server.thread.wakefrequency.</description>
</property>

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>2147483648</value>
  <description>Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.
    Default: 256M.</description>
</property>

<property>
  <name>hbase.hstore.compactionThreshold</name>
  <value>3</value>
  <description>If more than this number of HStoreFiles in any one HStore
    (one HStoreFile is written per flush of memstore) then a compaction
    is run to rewrite all HStoreFiles files as one.  Larger numbers
    put off compaction but when it runs, it takes longer to complete.</description>
</property>

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>10</value>
  <description>If more than this number of StoreFiles in any one Store
    (one StoreFile is written per flush of MemStore) then updates are
    blocked for this HRegion until a compaction is completed, or
    until hbase.hstore.blockingWaitTime has been exceeded.</description>
</property>

<property>
  <name>hbase.hstore.flush.thread</name>
  <value>20</value>
</property>

<property>
  <name>hbase.hstore.compaction.thread</name>
  <value>15</value>
</property>
 

 
 
 
 
 
 
 
 
 
 
 
HADOOP2.0 HA (NO NN Federation)

1. Configure passwordless SSH
2. Modify the hadoop configuration files (cup-master-1, cup-slave-1, cup-slave-2, cup-slave-3, cup-slave-4)

The configuration files are as follows:
vi core-site.xml:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

<property>
  <name>ha.zookeeper.quorum</name>
  <value>cup-master-1:2181,cup-slave-1:2181,cup-slave-2:2181,cup-slave-3:2181,cup-slave-4:2181</value>
</property>




vi hdfs-site.xml

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>

<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>cup-master-1:9000</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>cup-master-2:9000</value>
</property>

<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>cup-master-1:50070</value>
</property>

<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>cup-master-2:50070</value>
</property>

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://cup-master-1:8485;cup-slave-1:8485;cup-slave-2:8485;cup-slave-3:8485;cup-slave-4:8485/mycluster</value>
</property>

<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hadoop/hadoopworkspace/dfs/jn</value>
</property>

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>

or alternatively:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/exampleuser/.ssh/id_rsa</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
  <description>SSH connection timeout, in milliseconds, to use with the builtin sshfence fencer.</description>
</property>

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
  <description>Specifies the maximum number of threads to use for transferring data in and out of the DN.</description>
</property>

<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hadoop/hadoopworkspace/dfs/name</value>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/hadoopworkspace/dfs/data</value>
</property>

  


[root@HA2kerberos conf]# vim slaves
cup-slave-1
cup-slave-2
cup-slave-3
cup-slave-4

3. scp the hadoop installation from master-1 to master-2
4. Start all the zookeepers
5. On one of the master nodes run hdfs zkfc -formatZK to create the znode namespace

6. On every node listed in dfs.namenode.shared.edits.dir
   (qjournal://cup-master-1:8485;cup-slave-1:8485;cup-slave-2:8485;cup-slave-3:8485;cup-slave-4:8485/mycluster)
   start the journal daemon with ./hadoop-daemon.sh start journalnode

7. On the primary namenode run hadoop namenode -format to format the namenode and journalnode directories

8. On the primary namenode start the namenode: ./hadoop-daemon.sh start namenode  (or ./start-dfs.sh)

9. On the standby namenode run hdfs namenode -bootstrapStandby,
   which formats its directory and copies the metadata over from the primary namenode;
   then start it with ./hadoop-daemon.sh start namenode

6. ./hadoop-daemon.sh start zkfc on both namenode hosts
7. ./hadoop-daemon.sh start datanode on every datanode


If you start the namenode before zkfc, the namenode will not become active until zkfc is started.
Do not change the order above; in my run the namenode failed to come up because zkfc had been started first.
When the cluster is started automatically you can see that zkfc is the last thing started:
[hadoop@ClouderaHA1 sbin]$ ./start-dfs.sh
Starting namenodes on [ClouderaHA1 ClouderaHA2]
ClouderaHA1: starting namenode, logging to /app/hadoop/logs/hadoop-hadoop-namenode-ClouderaHA1.out
ClouderaHA2: starting namenode, logging to /app/hadoop/logs/hadoop-hadoop-namenode-ClouderaHA2.out
ClouderaHA3: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-ClouderaHA3.out
ClouderaHA1: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-ClouderaHA1.out
ClouderaHA2: starting datanode, logging to /app/hadoop/logs/hadoop-hadoop-datanode-ClouderaHA2.out
Starting ZK Failover Controllers on NN hosts [ClouderaHA1 ClouderaHA2]
ClouderaHA1: starting zkfc, logging to /app/hadoop/logs/hadoop-hadoop-zkfc-ClouderaHA1.out
ClouderaHA2: starting zkfc, logging to /app/hadoop/logs/hadoop-hadoop-zkfc-ClouderaHA2.out




A. First start the journalnode on each node:
   hadoop-daemon.sh start journalnode

B. Then on the primary master run start-dfs.sh and start-yarn.sh

[hadoop@cup-master-1 ~]$ start-dfs.sh
Starting namenodes on [cup-master-1 cup-master-2]
hadoop@cup-master-1's password: cup-master-2: starting namenode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-namenode-cup-master-2.out

cup-master-1: starting namenode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-namenode-cup-master-1.out
cup-slave-4: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-4.out
cup-slave-1: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-1.out
cup-slave-3: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-3.out
cup-slave-2: starting datanode, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-datanode-cup-slave-2.out
Starting ZK Failover Controllers on NN hosts [cup-master-1 cup-master-2]
hadoop@cup-master-1's password: cup-master-2: starting zkfc, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-zkfc-cup-master-2.out

cup-master-1: starting zkfc, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/hadoop-hadoop-zkfc-cup-master-1.out
[hadoop@cup-master-1 ~]$
[hadoop@cup-master-1 ~]$ jps
30939 NameNode
28526 QuorumPeerMain
29769 JournalNode
31283 Jps
31207 DFSZKFailoverController
[hadoop@cup-master-1 ~]$

[hadoop@cup-master-2 ~]$ jps
13197 DFSZKFailoverController
12305 NameNode
15106 Jps
[hadoop@cup-master-2 ~]$



[hadoop@cup-master-1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-resourcemanager-cup-master-1.out
cup-slave-4: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-4.out
cup-slave-1: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-1.out
cup-slave-3: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-3.out
cup-slave-2: starting nodemanager, logging to /home/hadoop/hadoop-2.0.0-cdh4.1.2/logs/yarn-hadoop-nodemanager-cup-slave-2.out
[hadoop@cup-master-1 ~]$
[hadoop@cup-master-1 ~]$ jps
30939 NameNode
28526 QuorumPeerMain
29769 JournalNode
31628 Jps
31207 DFSZKFailoverController
31365 ResourceManager
[hadoop@cup-master-1 ~]$

[hadoop@cup-master-2 ~]$ jps
13197 DFSZKFailoverController
12305 NameNode
17092 Jps

This shows that HA applies only to HDFS and is independent of MR2.

[hadoop@cup-slave-1 ~]$ jps
30692 JournalNode
31453 NodeManager
31286 DataNode
30172 QuorumPeerMain
31562 Jps
[hadoop@cup-slave-1 ~]$





HBASE HA CONF:

1. hbase-site.xml

<property>
  <name>hbase.rootdir</name>
  <value>hdfs://mycluster/hbase</value>
</property>

<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>

<property>
  <name>hbase.master</name>
  <value>cup-master-1:60000</value>
</property>


2. Copy core-site.xml and hdfs-site.xml into hbase_home/conf/,
   otherwise HBase cannot start because it does not recognize hdfs://mycluster. A sketch of the copy follows.
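
   Using this document's paths (run on every HBase node, or copy once and scp the conf directory out):
   $cp /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/core-site.xml /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/
   $cp /home/hadoop/hadoop-2.0.0-cdh4.1.2/etc/hadoop/hdfs-site.xml /home/hadoop/hbase-0.92.1-cdh4.1.2/conf/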





If HA setup fails

To start over you must
1. Empty the directories
on the NNs: /home/hadoop/cdh42/cdhworkspace/dfs/name
on the DNs: /home/hadoop/cdh42/cdhworkspace/dfs/data
on the JNs: /home/hadoop/cdh42/cdhworkspace/dfs/jn

2. Reformat
on the NNs: hdfs namenode -format

Cannot start an HA namenode with name dirs that need recovery. Dir: Storage Directory /home/hadoop/cdh42/cdhworkspace/dfs/name state: NOT_FORMATTED

on the NNs: hdfs namenode -format
Formatting requires the ZK and JournalNode processes to be running:
zkServer.sh start
hadoop-daemon.sh start journalnode



Incompatible namespaceID for journal Storage Directory /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster: NameNode has nsId 264369592 but storage has nsId 1178230309

Fix: edit the namespaceID in /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster/current/VERSION


Incompatible clusterID for journal Storage Directory /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster: NameNode has clusterId 'CID-34eabdd9-ca2c-48ff-9127-b6df81aded90' but storage has clusterId 'CID-c1012f1d-e2f1-4a0b-89f6-cafabef1cf7e'

Fix: edit the clusterID in /home/hadoop/cdh42/cdhworkspace/dfs/jn/mycluster/current/VERSION


Incompatible clusterIDs in /home/hadoop/cdh42/cdhworkspace/dfs/data: namenode clusterID = CID-34eabdd9-ca2c-48ff-9127-b6df81aded90; datanode clusterID = CID-c1012f1d-e2f1-4a0b-89f6-cafabef1cf7e

Fix: edit the clusterID in /home/hadoop/cdh42/cdhworkspace/dfs/data/current/VERSION

Cause: every format generates a new namespaceID and clusterID,
while the IDs stored under cdhworkspace/dfs/name, cdhworkspace/dfs/data and cdhworkspace/dfs/jn are the old ones,
so before formatting empty all of these directories on all machines (a cleanup sketch follows):
on the NNs: /home/hadoop/cdh42/cdhworkspace/dfs/name
on the DNs: /home/hadoop/cdh42/cdhworkspace/dfs/data
on the JNs: /home/hadoop/cdh42/cdhworkspace/dfs/jn
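
A sketch of the full reset, wiping the old IDs everywhere before reformatting (directory paths as above, node list per this cluster):
$for host in cup-master-1 cup-master-2 cup-slave-1 cup-slave-2 cup-slave-3 cup-slave-4; do
    ssh hadoop@$host "rm -rf /home/hadoop/cdh42/cdhworkspace/dfs/name/* \
                             /home/hadoop/cdh42/cdhworkspace/dfs/data/* \
                             /home/hadoop/cdh42/cdhworkspace/dfs/jn/*"
 done
$zkServer.sh start                        # on every ZK node
$hadoop-daemon.sh start journalnode       # on every JournalNode
$hdfs namenode -format                    # on the primary NN only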




Raise the limits for HBase
ulimit -a: the 'open files' limit needs to be increased



dfs.replication.interval
dfs.datanode.handler.count
dfs.namenode.handler.count


Integrating Hive with HBase requires copying the hbase configuration file into hadoop:
hbase -> hadoop:
copy hbase-0.94.2-cdh4.2.0/conf/hbase-site.xml to hadoop-2.0.0-cdh4.2.0/etc/hadoop/






Mount an ISO image:
mount -t iso9660 -o loop /*/*.iso /mnt


[contrib1]
name=Server
baseurl=file:///mnt/Server
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release








1. After some research, the prevailing advice is that a rowkey should be 10-100 bytes long.
   An overly long rowkey reduces memstore lookup efficiency and HFile storage efficiency, with no benefit at all.
2. For our scenario and data model I recommend the following lengths:
    8 B = 64 bits, 16 B = 128 bits, 24 B = 192 bits or 32 B = 256 bits, and never more than 32 bytes.
    That is 8, 16, 24 or 32 bytes, always a multiple of 8, because 64-bit machines align memory allocations on 8-byte boundaries.

3. Quantitative analysis:
8B  = 64b  = 2^64  = 1.844674407371  * 10^19   -- at most a 20-digit integer
16B = 128b = 2^128 = 3.4028236692094 * 10^38   -- at most a 39-digit integer
24B = 192b = 2^192 = 6.2771017353867 * 10^57   -- at most a 58-digit integer
32B = 256b = 2^256 = 1.1579208923732 * 10^77   -- at most a 78-digit integer

Our CDR table rowkey is laid out as follows ->
6156911095 8534567490 11000 45000 1111111111111111111
reversed phone number   10 digits
complemented timestamp  10 digits
cell dimension          10 digits
terminal dimension      19 digits
That is 49 digits in total, so the recommendation is to adopt this scheme directly: a 24-byte rowkey supports integers of up to 58 digits; using 57 of them
still leaves 8 digits of headroom, and if they are not needed, simply zero-pad when converting to bytes.








Hadoop 2.0 (CDH4.2.0) native library compilation
Dependencies::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
maven
apr-1.4.6.tar.gz
apr-util-1.5.1.tar.gz
httpd-2.2.23.tar.gz
php-5.3.18.tar.gz
rrdtool-1.4.7.tar.gz
pcre-8.31.tar.gz
libconfuse-2.6-2.el5.rf.x86_64.rpm
libconfuse-devel-2.6-2.el5.rf.x86_64.rpm
libxml2-devel rpmbuild glib2-devel dbus-devel freetype-devel fontconfig-devel
gcc-c++ expat-devel python-devel libXrender-devel
yum -y install apr-devel apr-util check-devel cairo-devel pango-devel

pcre-devel
tcl-devel
zlib-devel
bzip2-devel
libX11-devel
readline-devel   
libXt-devel  
tk-devel
tetex-latex

rhbase:
libboost-dev libboost-test-dev libboost-program-options-dev libevent-dev
automake libtool flex bison pkg-config g++ libssl-dev


1. install lzo and lzo-devel, plus zlib-devel and openssl-devel
   dependencies: lzo-devel  zlib-devel  gcc autoconf automake libtool

2. install ProtocolBuffers: http://wiki.apache.org/hadoop/HowToContribute
3. $cd /home/hadoop/protobuf-2.5.0/   ## as root
   $./configure
   $make
   $make install

4. $cd /home/hadoop/protobuf-2.5.0/java   ## as the hadoop user
   $mvn compile
   $mvn install

5. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common
   modify pom.xml: add

   <dependency>
     <groupId>com.google.protobuf</groupId>
     <artifactId>protobuf-java</artifactId>
     <version>2.5.0</version>
   </dependency>
  

6. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
   $mvn clean install -DskipTests -P native


****************** Note: hadoop-common-project/hadoop-common contains the snappy compression code,
so install snappy (e.g. snappy-1.1.0) before building the common native library; otherwise using snappy compression reports:
this version of libhadoop was built without snappy support
snappy-1.1.0.tar.

http://code.google.com/p/hadoop-snappy/
$ mvn package [-Dsnappy.prefix=SNAPPY_INSTALLATION_DIR]
$mvn clean install -DskipTests -P native package -Dsnappy.prefix=SNAPPY_INSTALLATION_DIR
$mvn clean install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0

## Without -Dsnappy.prefix=/root/snappy-1.1.0 the build reports
snappy native library was compiled without snappy support
this version of libhadoop was built without snappy support
as explained at http://code.google.com/p/hadoop-snappy/


copy to hadoop-common-project/hadoop-common----------------------------

7. copy /home/hadoop/protobuf-2.5.0/java/target/generated-sources/com/google/protobuf/DescriptorProtos.java to
   /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/generated-sources/java/com/google/protobuf/

8. copy /home/hadoop/protobuf-2.5.0/java/src/main/java/com/google/protobuf/*.java to
   /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/generated-sources/java/com/google/protobuf/

9. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
   $mvn install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
   Note: do not run clean here, otherwise the copied .java files would be deleted

main:
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.427s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.986s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [0.933s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [0.852s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.246s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [0.645s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [0.827s]
[INFO] Apache Hadoop Common .............................. FAILURE [49.566s]
[INFO] Apache Hadoop Common Project ...................... SKIPPED
[INFO] Apache Hadoop HDFS ................................ SKIPPED
[INFO] Apache Hadoop HttpFS .............................. SKIPPED
[INFO] Apache Hadoop HDFS Project ........................ SKIPPED
[INFO] hadoop-yarn ....................................... SKIPPED
[INFO] hadoop-yarn-api ................................... SKIPPED
[INFO] hadoop-yarn-common ................................ SKIPPED
[INFO] hadoop-yarn-server ................................ SKIPPED
[INFO] hadoop-yarn-server-common ......................... SKIPPED
[INFO] hadoop-yarn-server-nodemanager .................... SKIPPED
[INFO] hadoop-yarn-server-web-proxy ...................... SKIPPED
[INFO] hadoop-yarn-server-resourcemanager ................ SKIPPED
[INFO] hadoop-yarn-server-tests .......................... SKIPPED
[INFO] hadoop-yarn-client ................................ SKIPPED
[INFO] hadoop-yarn-applications .......................... SKIPPED
[INFO] hadoop-yarn-applications-distributedshell ......... SKIPPED
[INFO] hadoop-mapreduce-client ........................... SKIPPED
[INFO] hadoop-mapreduce-client-core ...................... SKIPPED
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SKIPPED
[INFO] hadoop-yarn-site .................................. SKIPPED
[INFO] hadoop-yarn-project ............................... SKIPPED
[INFO] hadoop-mapreduce-client-common .................... SKIPPED
[INFO] hadoop-mapreduce-client-shuffle ................... SKIPPED
[INFO] hadoop-mapreduce-client-app ....................... SKIPPED
[INFO] hadoop-mapreduce-client-hs ........................ SKIPPED
[INFO] hadoop-mapreduce-client-jobclient ................. SKIPPED
[INFO] Apache Hadoop MapReduce Examples .................. SKIPPED
[INFO] hadoop-mapreduce .................................. SKIPPED
[INFO] Apache Hadoop MapReduce Streaming ................. SKIPPED
[INFO] Apache Hadoop Distributed Copy .................... SKIPPED
[INFO] Apache Hadoop Archives ............................ SKIPPED
[INFO] Apache Hadoop Rumen ............................... SKIPPED
[INFO] Apache Hadoop Gridmix ............................. SKIPPED
[INFO] Apache Hadoop Data Join ........................... SKIPPED
[INFO] Apache Hadoop Extras .............................. SKIPPED
[INFO] Apache Hadoop Pipes ............................... SKIPPED
[INFO] Apache Hadoop Tools Dist .......................... SKIPPED
[INFO] Apache Hadoop Tools ............................... SKIPPED
[INFO] Apache Hadoop Distribution ........................ SKIPPED
[INFO] Apache Hadoop Client .............................. SKIPPED
[INFO] Apache Hadoop Mini-Cluster ........................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 58.143s
[INFO] Finished at: Tue Apr 09 14:31:49 CST 2013
[INFO] Final Memory: 67M/1380M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake" (in directory "/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/native"): java.io.IOException: error=2, No such file or directory -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :hadoop-common



10. install cmake  ## as root
    $tar xvf cmake-*.*.*.tar.gz
    $cd cmake-*.*.*
    $./bootstrap
    $make
    $make install

11. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
    $mvn install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
    Note: again no clean; only after this step is the
    /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources directory generated


copy to hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common----------------------------

12. copy /home/hadoop/protobuf-2.5.0/java/target/generated-sources/com/google/protobuf/DescriptorProtos.java to
   /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources/proto/

13. copy /home/hadoop/protobuf-2.5.0/java/src/main/java/com/google/protobuf/*.java to
   /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources/proto/
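
The same hedged cp pattern as in steps 7-8, now with the destination switched to the mapreduce-client-common proto output directory:
$GEN=/home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/target/generated-sources/proto
$cp /home/hadoop/protobuf-2.5.0/java/target/generated-sources/com/google/protobuf/DescriptorProtos.java $GEN/
$cp /home/hadoop/protobuf-2.5.0/java/src/main/java/com/google/protobuf/*.java $GEN/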

14. $cd /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src
   $mvn install -DskipTests -P native package -Dsnappy.prefix=/root/snappy-1.1.0
   Note: again, no clean


[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [1.302s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [0.861s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [0.765s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [1.010s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.230s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [0.614s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [0.741s]
[INFO] Apache Hadoop Common .............................. SUCCESS [23.666s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.075s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [31.895s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [2.411s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.076s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.265s]
[INFO] hadoop-yarn-api ................................... SUCCESS [6.371s]
[INFO] hadoop-yarn-common ................................ SUCCESS [1.907s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.107s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [1.211s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [2.975s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [0.324s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [0.634s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [0.367s]
[INFO] hadoop-yarn-client ................................ SUCCESS [0.194s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.108s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [0.344s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.098s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [1.496s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [0.231s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.200s]
[INFO] hadoop-yarn-project ............................... SUCCESS [0.172s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [6.503s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [0.391s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [3.133s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [1.250s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [3.092s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [0.900s]
[INFO] hadoop-mapreduce .................................. SUCCESS [0.105s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [0.706s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [1.513s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [0.828s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [1.201s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [1.040s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [0.409s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [0.545s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [9.772s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [0.467s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.059s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [0.228s]
[INFO] Apache Hadoop Client .............................. SUCCESS [0.624s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [0.247s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:56.489s
[INFO] Finished at: Tue Apr 09 15:28:54 CST 2013
[INFO] Final Memory: 87M/744M
[INFO] ------------------------------------------------------------------------



15. Native files produced by the build:
    /home/hadoop/cdh42/hadoop-2.0.0-cdh4.2.0/src/hadoop-common-project/hadoop-common/target/native/target/usr/local/lib/



[hadoop@cup-master-1 src]$ find . -name *.a
./hadoop-hdfs-project/hadoop-hdfs/target/native/libposix_util.a
./hadoop-hdfs-project/hadoop-hdfs/target/native/libnative_mini_dfs.a
./hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib/libhdfs.a
./hadoop-common-project/hadoop-common/target/native/target/usr/local/lib/libhadoop.a
./hadoop-tools/hadoop-pipes/target/native/libhadooputils.a
./hadoop-tools/hadoop-pipes/target/native/libhadooppipes.a
./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/target/native/libcontainer.a
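
A hedged follow-up, not in the original notes: the freshly built libraries can be copied into the runtime Hadoop installation's lib/native directory (here assumed to be /home/hadoop/hadoop-2.0.0-cdh4.2.0/lib/native; adjust to the actual layout):
$cp hadoop-common-project/hadoop-common/target/native/target/usr/local/lib/libhadoop.* /home/hadoop/hadoop-2.0.0-cdh4.2.0/lib/native/
$cp hadoop-hdfs-project/hadoop-hdfs/target/native/target/usr/local/lib/libhdfs.* /home/hadoop/hadoop-2.0.0-cdh4.2.0/lib/native/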
