Implementing Kafka message serialization with the Avro serialization framework

Avro is a language-independent serialization format. Avro data is defined by a language-independent schema, which is described in JSON, although the data itself is usually encoded in a compact binary form. Avro needs the schema both when writing and when reading, so in Avro data files the schema is typically embedded in the file itself. A key feature of Avro: when the application writing messages switches to a new (compatible) schema, the applications reading messages can keep processing them without any changes. The drawback in a Kafka setting is that embedding the full schema in every record can easily double the record size; the usual remedy is a schema registry that stores all schemas centrally, so each record only needs to carry the ID of its schema.
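A schema is plain JSON. For example, here is the Student schema used throughout this post, pretty-printed (in the code below it appears as a single-line escaped string):

    {
      "type": "record",
      "name": "Student",
      "fields": [
        {"name": "id",   "type": "int"},
        {"name": "name", "type": "string"},
        {"name": "age",  "type": "int"}
      ]
    }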

Reference: Kafka: The Definitive Guide

Now let's implement Avro serialization step by step:

1. Create the project and add the dependencies to the pom file:


    <dependency>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro</artifactId>
        <version>1.8.2</version>
    </dependency>
    <dependency>
        <groupId>com.twitter</groupId>
        <artifactId>bijection-core_2.11</artifactId>
        <version>0.9.6</version>
    </dependency>
    <dependency>
        <groupId>com.twitter</groupId>
        <artifactId>bijection-avro_2.11</artifactId>
        <version>0.9.6</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.1.0</version>
    </dependency>

2. Create the producer. The schema defines the record format, much like a class we define with a set of fields:

package avrodemo;

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import com.twitter.bijection.Injection;
import com.twitter.bijection.avro.GenericAvroCodecs;

// Sends Avro-serialized messages
public class AvroProducer {
    public static void main(String[] args) throws Exception {
        String schemaStr = "{\"type\":\"record\",\"name\":\"Student\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"age\",\"type\":\"int\"}]}";
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.184.128:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The Avro-encoded bytes are sent as a plain byte array
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        Schema.Parser parser = new Schema.Parser();
        Schema schema = parser.parse(schemaStr);
        // Injection converts a GenericRecord to/from Avro binary
        Injection<GenericRecord, byte[]> recordInjection = GenericAvroCodecs.toBinary(schema);
        Producer<String, byte[]> producer = new KafkaProducer<>(props);
        GenericRecord avroRecord = new GenericData.Record(schema);
        avroRecord.put("id", 123);
        avroRecord.put("name", "jack");
        avroRecord.put("age", 18);
        byte[] avroRecordBytes = recordInjection.apply(avroRecord);
        ProducerRecord<String, byte[]> record = new ProducerRecord<>("wyh-avro-topic", avroRecordBytes);
        producer.send(record).get();  // block until the send completes
        producer.close();
    }
}
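As noted in the introduction, embedding or hard-coding the schema can be avoided with a schema registry. As a sketch only, not part of this demo: with Confluent's Schema Registry and the kafka-avro-serializer dependency on the classpath, the producer configuration would look roughly like this (the registry URL is a placeholder assumption):

    // Sketch only: assumes Confluent's kafka-avro-serializer dependency and a
    // running Schema Registry instance (the URL below is a placeholder).
    props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
    props.put("schema.registry.url", "http://localhost:8081");
    // The producer then accepts GenericRecord values directly; the registry
    // stores the schema and only its ID travels with each record.
    Producer<String, GenericRecord> registryProducer = new KafkaProducer<>(props);
    registryProducer.send(new ProducerRecord<>("wyh-avro-topic", avroRecord));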

3. Create the consumer:

package avrodemo;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import com.twitter.bijection.Injection;
import com.twitter.bijection.avro.GenericAvroCodecs;

// Deserializes Avro messages
public class AvroConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.184.128:9092");
        props.put("group.id", "wyh-avro-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("wyh-avro-topic"));
        // The consumer must use the same schema the producer wrote with
        String schemaStr = "{\"type\":\"record\",\"name\":\"Student\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"age\",\"type\":\"int\"}]}";
        Schema schema = new Schema.Parser().parse(schemaStr);
        Injection<GenericRecord, byte[]> recordInjection = GenericAvroCodecs.toBinary(schema);
        while (true) {
            ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, byte[]> record : records) {
                // invert() decodes the Avro bytes back into a GenericRecord
                GenericRecord genericRecord = recordInjection.invert(record.value()).get();
                System.out.println("value = [student.id = " + genericRecord.get("id")
                        + ", student.name = " + genericRecord.get("name")
                        + ", student.age = " + genericRecord.get("age") + "], "
                        + "partition = " + record.partition()
                        + ", offset = " + record.offset());
            }
        }
    }
}
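Since the Injection is a two-way mapping, the encode/decode pair can be sanity-checked locally without a broker. A minimal sketch, assuming the same imports and schemaStr as above (values are illustrative only):

    // Local round-trip check for the Injection (no Kafka involved).
    // Note: Avro decodes string fields as org.apache.avro.util.Utf8, so
    // compare via toString() rather than directly against a java.lang.String.
    Schema schema = new Schema.Parser().parse(schemaStr);
    Injection<GenericRecord, byte[]> inj = GenericAvroCodecs.toBinary(schema);
    GenericRecord in = new GenericData.Record(schema);
    in.put("id", 1);
    in.put("name", "test");
    in.put("age", 20);
    GenericRecord out = inj.invert(inj.apply(in)).get();
    System.out.println(out.get("name").toString().equals("test"));  // true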

Start the consumer first, then the producer, and check the console output:
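The consumer should print something like the following (partition and offset depend on the state of the topic):

    value = [student.id = 123, student.name = jack, student.age = 18], partition = 0, offset = 0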

That's all it takes to use the Avro serialization framework to send structured records of any shape through Kafka.
