hadoop-2.6.2 lzo 配置

Hadoop集群说明

    集群有三台主机,主机名分别是:bi10,bi12,bi13。我们的操作都在bi10上面进行。

安装依赖包

    安装lzo需要一些依赖包,如果你已经安装过了,那么可以跳过这一步。首先你需要切换到root用户下

yum install gcc gcc-c++ kernel-devel
yum install git

    除了以上两个之外,你还需要配置maven环境,下载之后直接解压并配置环境变量即可使用

wget http://apache.fayea.com/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
tar -xzf apache-maven-3.3.9-bin.tar.gz

    配置maven环境变量,maven软件包放置到/home/hadoop/work/apache-maven-3.3.9

[hadoop@bi10 hadoop-2.6.2]$ vim ~/.bash_profile
#init maven environment
export MAVEN_HOME=/home/hadoop/work/apache-maven-3.3.9
export PATH=$PATH:$MAVEN_HOME/bin

LZO安装

    下载lzo安装包

[hadoop@bi10 apps]$ wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.09.tar.gz

    解压并编译安装lzo到:/usr/local/hadoop/lzo/,安装时切换到root用户下

[hadoop@bi10 apps]$ tar -xzf lzo-2.09.tar.gz 
[hadoop@bi10 apps]$ cd lzo-2.09
[hadoop@bi10 apps]$ su root
[root@bi10 lzo-2.09]$ ./configure -enable-shared -prefix=/usr/local/hadoop/lzo/ 
[root@bi10 lzo-2.09]$ make && make test && make install

    查看安装目录

[hadoop@bi10 lzo-2.09]$ ls /usr/local/hadoop/lzo/
include  lib  share

HADOOP-LZO安装

    下载hadoop-lzo

git clone https://github.com/twitter/hadoop-lzo.git

    设置环境变量,并使用maven编译

[hadoop@bi10 hadoop-lzo]$ export CFLAGS=-m64
[hadoop@bi10 hadoop-lzo]$ export CXXFLAGS=-m64
[hadoop@bi10 hadoop-lzo]$ export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
[hadoop@bi10 hadoop-lzo]$ export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
[hadoop@bi10 hadoop-lzo]$ mvn clean package -Dmaven.test.skip=true

    将编译好的文件拷贝到hadoop的安装目录

[hadoop@bi10 hadoop-lzo]$ tar -cBf - -C target/native/Linux-amd64-64/lib . | tar -xBvf - -C $HADOOP_HOME/lib/native/
[hadoop@bi10 hadoop-lzo]$ cp target/hadoop-lzo-0.4.20-SNAPSHOT.jar $HADOOP_HOME/share/hadoop/common/
[hadoop@bi10 hadoop-lzo]$ scp target/hadoop-lzo-0.4.20-SNAPSHOT.jar bi12:$HADOOP_HOME/share/hadoop/common/
[hadoop@bi10 hadoop-lzo]$ scp target/hadoop-lzo-0.4.20-SNAPSHOT.jar bi13:$HADOOP_HOME/share/hadoop/common/

    将编译好的文件分别复制到集群其他机器对应的目录,其中native目录需要先打包再拷贝到集群的其他机器上,然后解压。

tar -czf hadoop-native.tar.gz /$HADOOP_HOME/lib/native/
scp hadoop-native.tar.gz bi12:/$HADOOP_HOME/lib
scp hadoop-native.tar.gz bi13:/$HADOOP_HOME/lib

修改hadoop配置文件

    修改hadoop-env.sh,增加一条

# The lzo library
export LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib

    修改core-site.xml

  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>

    修改mapred-site.xml

  <!-- lzo压缩 -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <name>mapred.child.env</name>
    <value>LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib</value>
  </property>

    拷贝三个配置文件到集群其他机器

scp etc/hadoop/hadoop-env.sh bi12:/home/hadoop/work/hadoop-2.6.2/etc/hadoop/
scp etc/hadoop/hadoop-env.sh bi13:/home/hadoop/work/hadoop-2.6.2/etc/hadoop/

scp etc/hadoop/core-site.xml bi12:/home/hadoop/work/hadoop-2.6.2/etc/hadoop/
scp etc/hadoop/core-site.xml bi13:/home/hadoop/work/hadoop-2.6.2/etc/hadoop/

scp etc/hadoop/mapred-site.xml bi12:/home/hadoop/work/hadoop-2.6.2/etc/hadoop/
scp etc/hadoop/mapred-site.xml bi13:/home/hadoop/work/hadoop-2.6.2/etc/hadoop/

测试hadoop lzo

    安装lzop,需要切换到root用户下

yum install lzop

    进入hadoop安装目录然后对LICENSE.txt执行lzo压缩,会生成一个lzo压缩文件LICENSE.txt.lzo

lzop LICENSE.txt

    上传压缩文件到hdfs

[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -mkdir /user/hadoop/wordcount/lzoinput
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -put LICENSE.txt.lzo /user/hadoop/wordcount/lzoinput
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls /user/hadoop/wordcount/lzoinput
Found 1 items
-rw-r--r--   2 hadoop supergroup       7773 2016-02-16 20:59 /user/hadoop/wordcount/lzoinput/LICENSE.txt.lzo

    对lzo压缩文件建立索引

hadoop jar ./share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar com.hadoop.compression.lzo.DistributedLzoIndexer /user/hadoop/wordcount/lzoinput/
[hadoop@bi10 hadoop-2.6.2]$ hdfs dfs -ls /user/hadoop/wordcount/lzoinput/
Found 2 items
-rw-r--r--   2 hadoop supergroup       7773 2016-02-16 20:59 /user/hadoop/wordcount/lzoinput/LICENSE.txt.lzo
-rw-r--r--   2 hadoop supergroup          8 2016-02-16 21:02 /user/hadoop/wordcount/lzoinput/LICENSE.txt.lzo.index

    对lzo压缩文件执行wordcount

hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount /user/hadoop/wordcount/lzoinput/ /user/hadoop/wordcount/output2


你可能感兴趣的:(hadoop,lzo)