1. Prepare the environment:
Download the Avro jar files from the Avro website, using the latest version at the time, 1.7.4, as an example: avro-1.7.4.jar and avro-tools-1.7.4.jar. Also download the Jackson JSON jars, jackson-core-asl and jackson-mapper-asl. Place all four files in ${HADOOP_HOME}/lib (here /usr/local/hadoop/lib, which also keeps them handy for later Hadoop projects).
2. Define the schema:
Create a file named user.avsc with the following content:
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
3. Compile the schema:
Run the following command in the current directory:
java -jar ${HADOOP_HOME}/lib/avro-tools-1.7.4.jar compile schema user.avsc .
This generates the directory example/avro under the current directory, containing the generated source file User.java.
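The generated class is Avro-specific (it extends org.apache.avro.specific.SpecificRecordBase and embeds the schema), but its surface looks roughly like the plain-Java sketch below. This is a simplified illustration of the accessors and builder that the tutorial's Test.java relies on, not the actual generated code:

```java
// Simplified sketch of the API surface Avro's code generation produces
// for the User record. The real class lives in package example.avro,
// extends SpecificRecordBase, and carries the schema; only the accessor
// and builder pattern is illustrated here.
class User {
    private String name;
    private Integer favoriteNumber;  // ["int","null"]   -> nullable Integer
    private String favoriteColor;    // ["string","null"] -> nullable String

    public User() {}

    public User(String name, Integer favoriteNumber, String favoriteColor) {
        this.name = name;
        this.favoriteNumber = favoriteNumber;
        this.favoriteColor = favoriteColor;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Integer getFavoriteNumber() { return favoriteNumber; }
    public void setFavoriteNumber(Integer n) { this.favoriteNumber = n; }
    public String getFavoriteColor() { return favoriteColor; }
    public void setFavoriteColor(String c) { this.favoriteColor = c; }

    // Fluent builder, mirroring User.newBuilder() in the generated code.
    public static Builder newBuilder() { return new Builder(); }

    static class Builder {
        private final User u = new User();
        public Builder setName(String name) { u.setName(name); return this; }
        public Builder setFavoriteNumber(Integer n) { u.setFavoriteNumber(n); return this; }
        public Builder setFavoriteColor(String c) { u.setFavoriteColor(c); return this; }
        public User build() { return u; }
    }
}
```

Note how each union with "null" in the schema becomes a nullable boxed type rather than a primitive.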
4. Write the test program:
Create a file named Test.java with the following content:
/**
 * @Author wzw
 * @Date 2013.07.17
 */
import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

import example.avro.User;

public class Test {
    public static void main(String[] args) throws IOException {
        User user1 = new User();
        user1.setName("Arway");
        user1.setFavoriteNumber(3);
        user1.setFavoriteColor("green");

        User user2 = new User("Ben", 7, "red");

        // Construct with the builder
        User user3 = User.newBuilder()
                .setName("Charlie")
                .setFavoriteColor("blue")
                .setFavoriteNumber(100)
                .build();

        // Serialize user1, user2 and user3 to disk
        File file = new File("users.avro");
        DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
        DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter);
        dataFileWriter.create(user1.getSchema(), file);
        dataFileWriter.append(user1);
        dataFileWriter.append(user2);
        dataFileWriter.append(user3);
        dataFileWriter.close();

        // Deserialize users from disk
        DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
        DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader);
        User user = null;
        while (dataFileReader.hasNext()) {
            // Reuse the user object by passing it to next(). This saves
            // allocating and garbage-collecting many objects for files
            // with many items.
            user = dataFileReader.next(user);
            System.out.println(user);
        }
        dataFileReader.close();
    }
}
5. Write the compile script:
Create a file named compile.sh with the following content; note the classpath:
#!/usr/bin/env bash
javac -classpath /usr/local/hadoop/lib/avro-1.7.4.jar:/usr/local/hadoop/lib/avro-tools-1.7.4.jar:/usr/local/hadoop/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/lib/jackson-mapper-asl-1.9.13.jar example/avro/User.java Test.java
6. Write the run script:
Create a file named run.sh with the following content; note the classpath:
#!/usr/bin/env bash
java -classpath /usr/local/hadoop/lib/avro-1.7.4.jar:/usr/local/hadoop/lib/avro-tools-1.7.4.jar:/usr/local/hadoop/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/lib/jackson-mapper-asl-1.9.13.jar:User.jar:. Test
7. Test:
(1) Compile:
Run the compile.sh script to compile example/avro/User.java and Test.java into their class files.
(2) Package the User class files:
jar cvf User.jar example/
(3) Run:
Run the run.sh script and check the program's output.
(4) Observe the effect of Avro serialization:
Add a for loop to the writing section of Test.java to write more users (say, 100) into users.avro, then redirect run.sh's output into a plain-text file users.plain, and compare the sizes of users.avro and users.plain:
-rw-r--r-- 1 hadoop hadoop  245 2013-07-17 17:18 user.avsc
-rw-r--r-- 1 hadoop hadoop 5486 2013-07-17 18:39 User.jar
-rw-r--r-- 1 hadoop hadoop 1737 2013-07-17 19:11 users.avro
-rw-r--r-- 1 hadoop hadoop 6892 2013-07-17 19:12 users.plain
The listing above gives a direct feel for how compact Avro's serialized output is compared with a plain-text dump of the same records.
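Much of that saving comes from binary encoding: Avro stores field values without field names or decimal text. The stdlib-only sketch below (no Avro involved; the class and method names are illustrative, not from the tutorial) shows the same effect for a single int written as raw binary versus as text:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative only: compares the size of an int written as raw binary
// versus as decimal text. This is the same kind of saving Avro's binary
// encoding gives over the plain-text users.plain dump.
class EncodingSize {
    static int binarySize(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(value);  // fixed 4 bytes, regardless of magnitude
            out.close();
            return bytes.size();
        } catch (IOException e) {
            throw new RuntimeException(e);  // cannot happen with an in-memory stream
        }
    }

    static int textSize(int value) {
        // Decimal text plus a newline separator, as in users.plain.
        return (Integer.toString(value) + "\n").getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int v = 1234567890;
        // prints "binary: 4 bytes, text: 11 bytes"
        System.out.println("binary: " + binarySize(v) + " bytes, text: " + textSize(v) + " bytes");
    }
}
```

Avro goes further still, using variable-length zig-zag integers so small values like favorite_number = 3 take only one byte, but the binary-vs-text gap above is the core of the size difference observed in the listing.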
References:
http://avro.apache.org/docs/1.7.4/gettingstartedjava.html
http://blog.csdn.net/zhumin726/article/details/8467805
wzw0114
2013.07.17
This post is from the blog "wzw0114's tech blog"; please contact the author before reposting.