Kafka: Ultimate Notes

Kafka production tuning parameters:

Producer:
    acks = all
    buffer.memory = 536870912
    compression.type = snappy
    retries = 100
    max.in.flight.requests.per.connection = 1
    batch.size = 10000                        (bytes, not a record count)
    max.request.size = 2097152
    request.timeout.ms = 360000               (greater than replica.lag.time.max.ms)
    metadata.fetch.timeout.ms = 360000
    timeout.ms = 360000
    linger.ms = 5s                            (not used in production)
    max.block.ms = 1800000
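A minimal sketch of wiring these settings into a producer, assuming a placeholder broker address and topic name (everything else uses the values above):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")           // placeholder address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("acks", "all")
    props.put("buffer.memory", "536870912")
    props.put("compression.type", "snappy")
    props.put("retries", "100")
    props.put("max.in.flight.requests.per.connection", "1")  // keep retries from reordering
    props.put("batch.size", "10000")                         // bytes, not a record count
    props.put("max.request.size", "2097152")
    props.put("request.timeout.ms", "360000")
    props.put("max.block.ms", "1800000")

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("onlinelogs", "a test message"))
    producer.close()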





Broker (CDH):
    message.max.bytes = 2560KB                (size limit for a single message)
    zookeeper.session.timeout.ms = 180000
    replica.fetch.max.bytes = 5M              (greater than message.max.bytes)
    num.replica.fetchers = 6
    replica.lag.max.messages = 6000
    replica.lag.time.max.ms = 15000
    log.flush.interval.messages = 10000
    log.flush.interval.ms = 5s
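These are set through Cloudera Manager rather than in code. A minimal sketch for reading the limits back, to verify message.max.bytes <= replica.fetch.max.bytes, assuming a Kafka version that ships the AdminClient API (0.11+) plus placeholder broker address and id:

    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.admin.AdminClient
    import org.apache.kafka.common.config.ConfigResource

    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")  // placeholder address

    val admin = AdminClient.create(props)
    val broker = new ConfigResource(ConfigResource.Type.BROKER, "186")  // placeholder broker id
    val config = admin.describeConfigs(Collections.singleton(broker)).all().get().get(broker)
    // check the invariant: message.max.bytes <= replica.fetch.max.bytes
    println("message.max.bytes       = " + config.get("message.max.bytes").value())
    println("replica.fetch.max.bytes = " + config.get("replica.fetch.max.bytes").value())
    admin.close()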




Consumer (Spark Streaming kafkaParams overrides; see
https://issues.apache.org/jira/browse/SPARK-22968):
    "max.partition.fetch.bytes" -> (5242880: java.lang.Integer)   // default: 1048576
    "request.timeout.ms" -> (90000: java.lang.Integer)            // default: 60000
    "session.timeout.ms" -> (60000: java.lang.Integer)            // default: 30000
    "heartbeat.interval.ms" -> (5000: java.lang.Integer)
    "receive.buffer.bytes" -> (10485760: java.lang.Integer)




Minor changes required for Kafka 0.10 and the new consumer compared to laughing_man's answer:

Broker:   No changes; you still need to increase the properties message.max.bytes and
          replica.fetch.max.bytes. message.max.bytes has to be equal to or smaller(*)
          than replica.fetch.max.bytes.
Producer: Increase max.request.size to send the larger message.
Consumer: Increase max.partition.fetch.bytes to receive larger messages.

(*) Read the comments to learn more about message.max.bytes <= replica.fetch.max.bytes.
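In other words, the three size settings have to line up across the layers. A minimal sketch with the example values from this section (the broker side is configured in Cloudera Manager, not in code):

    // Producer side: allow requests up to 2 MB.
    val producerProps = new java.util.Properties()
    producerProps.put("max.request.size", "2097152")

    // Consumer side: allow fetching up to 5 MB per partition.
    val consumerParams = Map[String, Object](
      "max.partition.fetch.bytes" -> (5242880: java.lang.Integer))

    // Broker side (set in Cloudera Manager):
    //   message.max.bytes (2560 KB) <= replica.fetch.max.bytes (5 MB)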








2. Consumer record values

ConsumerRecord(
    topic = onlinelogs, partition = 0,
    offset = 1452002, CreateTime = -1, checksum = 3849965367,
    serialized key size = -1, serialized value size = 305,
    key = null,
    value = {"hostname":"yws76","servicename":"namenode",
        "time":"2018-03-21 20:11:30,090","logtype":"INFO",
        "loginfo":"org.apache.hadoop.hdfs.server.namenode.FileJournalManager:
        Finalizing edits file /dfs/nn/current/edits_inprogress_0000000000001453017 -> /dfs/nn/current/edits_0000000000001453017-0000000000001453030"})
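The fields above map directly onto the ConsumerRecord accessors. A minimal sketch for printing them, assuming a placeholder broker address and group id:

    import java.util.{Collections, Properties}
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import scala.collection.JavaConverters._

    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")  // placeholder address
    props.put("group.id", "onlinelogs-group")       // placeholder group id
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("onlinelogs"))
    for (record <- consumer.poll(1000L).asScala) {
      // key is null here because the producer sent the records without a key
      println(s"partition=${record.partition()} offset=${record.offset()} " +
        s"key=${record.key()} value=${record.value()}")
    }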


2.1 This explains the curve chart discussed earlier.

2.2 key = null: the partitioning strategy

Key is not null: Utils.abs(key.hashCode) % numPartitions
key = null: see
http://www.2bowl.info/kafka%E6%BA%90%E7%A0%81%E8%A7%A3%E8%AF%BB-key%E4%B8%BAnulll%E6%97%B6kafka%E5%A6%82%E4%BD%95%E9%80%89%E6%8B%A9%E5%88%86%E5%8C%BApartition/
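A minimal sketch of that logic; per the linked article, the old producer picks a random partition for a null key and reuses it until the next metadata refresh. The numPartitions and cached-choice parameters are placeholders for state the real producer keeps internally:

    import scala.util.Random

    def choosePartition(key: Any, numPartitions: Int,
                        cachedRandomPartition: Option[Int]): Int = key match {
      case null =>
        // key = null: pick a random partition and reuse it until the next metadata
        // refresh (topic.metadata.refresh.interval.ms), per the linked article
        cachedRandomPartition.getOrElse(Random.nextInt(numPartitions))
      case k =>
        // key != null: Utils.abs(key.hashCode) % numPartitions
        // (Utils.abs masks the sign bit so Int.MinValue cannot slip through)
        (k.hashCode & 0x7fffffff) % numPartitions
    }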




3.1
Notes on building a custom Kafka parcel repository, and the troubleshooting that followed
when the Kafka service could not be installed through CDH:
http://blog.itpub.net/30089851/viewspace-2136372/


3.2
A power outage corrupted a Kafka topic.
Symptom: in the CDH web UI the Kafka process shows green. We usually assume green means
         the process is OK, but that is not necessarily true: producers and consumers
         could not work and kept throwing exceptions.

Troubleshooting: check the broker logs on the affected machine:

    kafka.common.NotAssignedReplicaException:
    Leader 186 failed to record follower 191's position -1
    since the replica is not recognized to be one of the assigned replicas 186
    for partition [__consumer_offsets,3].




Recovery:
1. Stop the service and delete the Kafka log directories on the broker nodes.
2. Delete Kafka's metadata in ZooKeeper.
3. Reinstall Kafka and recreate the topics.


Questions to think about:
1. After replaying the data, how do we handle duplicates?
   Use the HBase put API (insert + update, i.e. an idempotent upsert); see the sketch below.
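A minimal sketch, assuming a hypothetical table onlinelogs with column family f, and a rowkey derived from the Kafka partition and offset so a replayed record overwrites itself instead of duplicating:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes

    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("onlinelogs"))  // hypothetical table

    def writeRecord(partition: Int, offset: Long, value: String): Unit = {
      val rowKey = s"$partition-$offset"  // deterministic rowkey: a replay hits the same row
      val put = new Put(Bytes.toBytes(rowKey))
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("v"), Bytes.toBytes(value))
      table.put(put)  // put is insert + update in one call, so duplicates collapse
    }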


2. What if the data lands on HDFS instead? Think about it:
   Hive supports UPDATE starting from which version? With which parameters enabled?




3. Ordering is guaranteed within a partition; how can ordering be guaranteed across
   multiple partitions? (Kafka 0.11) Consider these operations:
   insert
   delete
   insert --> delete    (the order produced)
   delete --> insert    (the order that may be observed across partitions)
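Since Kafka only orders records within a partition, one common fix is to give every record for the same row the same key, so the insert and the delete land in the same partition and are consumed in order. A minimal sketch, assuming a hypothetical topic ops and a row id as the key:

    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    def sendInOrder(producer: KafkaProducer[String, String], rowId: String): Unit = {
      // same key => same partition => insert is consumed before delete
      producer.send(new ProducerRecord("ops", rowId, s"""{"op":"insert","id":"$rowId"}"""))
      producer.send(new ProducerRecord("ops", rowId, s"""{"op":"delete","id":"$rowId"}"""))
    }

With max.in.flight.requests.per.connection = 1 (as in the producer settings above), retries cannot reorder the two sends; Kafka 0.11 also added the idempotent producer for this purpose.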
