hadoop-lzo安装教程请链接
https://github.com/twitter/hadoop-lzo
下载打包hadoop-lzo
https://github.com/twitter/hadoop-lzo/zipball/master
1.其中说明:首先要在本地安装lzo库,方法如下:
http://www.oberhumer.com/opensource/lzo/#download下载解压后,安装说明编译安装,建议指定安装路径如下:
tar -zxvf lzo-2.06.tar.gz -C /opt/tool/
cd /opt/tool/lzo-2.06/
mkdir /usr/local/lzo
./configure --enable-shared --prefix /usr/local/lzo
make & sudo makeinstall
2.本地库安装完成后,回头制作hadoop-lzo的jar包,即解压下载过来的hadoop-lzo-master.zip
unzip hadoop-lzo-master.zip
在目录下执行:
export CFLAGS=-m64
export CXXFLAGS=-m64
export LIBRARY_PATH=/usr/local/lzo/lib
C_INCLUDE_PATH=/usr/local/lzo/include \
LIBRARY_PATH=/usr/local/lzo/lib \
mvn clean package -Dmaven.test.skip=true
cd target/native/Linux-amd64-64
tar -cBf - -C lib . | tar -xBvf - -C ~
mv ~/libgplcompression* $HADOOP_HOME/lib/native/
3.将mvn打包的hadoop-lzo-0.4.20-SNAPSHOT.jar复制到hadoop的common目录下
cp hadoop-lzo-0.4.20-SNAPSHOT.jar $HADOOP_HOME/share/hadoop/common/
4.mapreduce中间压缩的配置
在core-site.xml中添加配置
io.compression.codecs
org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec
io.compression.codec.lzo.class
com.hadoop.compression.lzo.LzoCodec
在mapred.size中设置
mapred.compress.map.output
true
mapred.map.output.compression.codec
com.hadoop.compression.lzo.LzoCodec
5.测试
hadoop中给Lzo文件建立Index
hadoop jar $HADOOP_HOME/share/common/hadoop-lzo-0.4.20-SNAPSHOT.jar com.hadoop.compression.lzo.LzoIndexer /test/input
hadoop中对mr任务直接使用LzoCodec压缩
在inputformat中使用LzoTextInputFormat,可对lzo压缩格式的数据进行mr优化job
在hive中建表使用lzo压缩格式文件
create table lzo(id int,name string)
row format delimited fields terminated by '^'
stored as inputformat 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
load data local inpath '/home/hadoop/test/hive/lzo.txt.lzo' into table lzo;
select * from lzo;
后续问题:
6 .在给HIve使用TEZ引擎后,导致mapreduce on tez运行抛异常,hive on tez直接无法启动hive的cli抛异常,原因:LZO的jar包缺失
解决办法:
不要使用编译出来的tez-0.8.4.tar.gz,应当使用tez-dist/target/中的tez-0.8.4-minimal.tar.gz,在本地解压,设置环境变量
export TEZ_HOME=/opt/single/tez
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME
在hadoop-env.sh中加入
tez.lib.uris
hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz
tez.use.cluster.hadoop-libs
true
这是关于lzo问题修改的部分tez配置,其他配置参考其他文章
7.在配置的Spark on hadoop中,无法运行Spark的app
异常大概如下:
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:185)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:198)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
....
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
... 76 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
at org.apache.hadoop.io.compress.CompressionCodecFactory.(CompressionCodecFactory.java:180)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 81 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
... 83 more
在spark-env.sh中添加配置如下:
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/opt/single/spark/lib:/usr/local/lzo/lib
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/single/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar
export HADOOP_HOME=/opt/single/hadoop-2.7.2
export HADOOP_CONF_DIR=/opt/single/hadoop-2.7.2/etc/hadoop
其他不变