KafkaProducer is a thread-safe object, so the recommendation is to share a single instance across multiple threads; this is usually more efficient than creating one instance per thread. Before diving into the source code, let's look at a simple demo of a producer sending messages.
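To illustrate that point, here is a minimal sketch of several worker threads sharing one producer instance; the broker address, topic name and thread count are placeholders, not recommendations:
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SharedProducerSketch {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.80.132:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // one KafkaProducer instance shared by all worker threads
        final KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int id = t;
            pool.submit(() -> {
                for (int i = 0; i < 100; i++)
                    // send() may be called concurrently; records are batched internally per partition
                    producer.send(new ProducerRecord<>("kafka-topic-02", "thread-" + id, "msg-" + i));
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        producer.close();
    }
}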
package com.tanjie.kafka;
import java.util.List;
import java.util.Properties;
import java.util.UUID;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.PartitionInfo;
public class ProducerDemo2 {
public static void main(String[] args) {
boolean isAsync = args.length == 0 || !args[0].trim().equalsIgnoreCase("sync");
Properties props = new Properties();
props.put("bootstrap.servers", "192.168.80.132:9092");
props.put("client.id", "producerDemo2");
props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
KafkaProducer<Integer, String> producer = new KafkaProducer<>(props);
List<PartitionInfo> partitionInfos = producer.partitionsFor("kafka-topic-02");
for(final PartitionInfo partitionInfo : partitionInfos) {
System.out.println(partitionInfo.toString());
}
String topic = "kafka-topic-02";
for (int i = 0; i < 10; i++) {
String message = UUID.randomUUID().toString();
long startTime = System.currentTimeMillis();
if (isAsync) {
producer.send(new ProducerRecord<>(topic, i, message),
new ProducerAckCallback(startTime, i, message));
} else {
try {
/* block and wait for the send to complete */
producer.send(new ProducerRecord<>(topic, i, message)).get();
System.out.println("Sent message : " + "key:" + i + ";value:" + message);
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
}
}
producer.close();
}
static class ProducerAckCallback implements Callback {
private final Long startTime;
private final int key;
private final String value;
public ProducerAckCallback(Long startTime, int key, String value) {
this.startTime = startTime;
this.key = key;
this.value = value;
}
@Override
public void onCompletion(RecordMetadata metadata, Exception e) {
long spendTime = System.currentTimeMillis() - startTime;
if (null != metadata) {
System.out.println("消息(" + key + "," + value + ")send to partition(" + metadata.partition()
+ " and offest " + metadata.offset() + " and spend " + spendTime + " ms");
}
}
}
}
First, create a topic:
bin/kafka-topics.sh --create --zookeeper 192.168.80.132:2181 --replication-factor 3 --partitions 3 --topic kafka-topic-02
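If you want to double-check the layout of the new topic, you can describe it (assuming the same ZooKeeper address; the output is similar to what partitionsFor() prints below):
bin/kafka-topics.sh --describe --zookeeper 192.168.80.132:2181 --topic kafka-topic-02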
Running the code produces output like the following:
Partition(topic = kafka-topic-02, partition = 2, leader = 0, replicas = [0,1,2,], isr = [0,2,])
Partition(topic = kafka-topic-02, partition = 1, leader = 0, replicas = [0,1,2,], isr = [0,1,2,])
Partition(topic = kafka-topic-02, partition = 0, leader = 0, replicas = [0,1,2,], isr = [0,2,])
Message(2,c2f2d395-80af-4982-94fd-7ad517734cb9) sent to partition 2, offset 3, took 39 ms
Message(5,c55e3b06-ed35-421b-9a56-d02a4921a9ab) sent to partition 2, offset 4, took 37 ms
Message(6,1b4a6b27-a63e-499e-bf36-c83d3ad438ad) sent to partition 2, offset 5, took 37 ms
Message(0,8f670453-b31d-4b9f-b7c0-e80fbc06e26e) sent to partition 1, offset 4, took 106 ms
Message(3,c7015529-0b01-4bcb-8b07-2ebca01d2309) sent to partition 1, offset 5, took 39 ms
Message(4,03bc15b6-b8df-410f-a350-f7e205d378f9) sent to partition 1, offset 6, took 39 ms
Message(9,24285a99-f02f-4f98-80eb-8ffdfd61111a) sent to partition 1, offset 7, took 37 ms
Message(1,324d88cc-65e3-4dbe-8ea6-9aa09c362884) sent to partition 0, offset 3, took 43 ms
Message(7,ea6ab214-31ff-4438-985b-f06ff33dcbf7) sent to partition 0, offset 4, took 38 ms
Message(8,558b6b86-10d3-440d-9f9e-b6cf25fe90f3) sent to partition 0, offset 5, took 37 ms
The code is simple (and admittedly not very elegant); it is only meant to demonstrate the behavior. Now let's look at how a record produced by the producer actually flows to the Kafka server.
The whole flow breaks down roughly into the following steps:
1. Construct a KafkaProducer object and initialize the components it uses, such as the record buffer and the Sender thread.
2. If interceptors are configured, the outgoing record can be inspected or modified in a customizable way (a minimal interceptor sketch follows this list).
3. Serialize the key and value.
4. Choose a suitable partition for the record based on the input parameters; exactly how is analyzed later.
5. Append the record into the RecordAccumulator for staging, where records are grouped per partition.
6. When the background Sender thread is triggered, it pulls records from the RecordAccumulator and builds ClientRequests; how they are built is analyzed later.
7. The ClientRequest is handed to the NetworkClient, which prepares it for sending.
8. The NetworkClient puts the request into the corresponding KafkaChannel, performs the network I/O, and the request finally reaches the Kafka server.
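As a minimal illustration of step 2, here is a hedged sketch of a producer interceptor; the class name and package are made up, while the ProducerInterceptor interface and the interceptor.classes config key are real:
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AuditInterceptor implements ProducerInterceptor<Integer, String> {
    @Override
    public ProducerRecord<Integer, String> onSend(ProducerRecord<Integer, String> record) {
        // called before serialization and partitioning; the record may be replaced or modified here
        return new ProducerRecord<>(record.topic(), record.key(), "[audited] " + record.value());
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
        // called when the send is acknowledged or fails; keep this cheap, it runs on the I/O thread
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
It would be registered through the producer configuration, e.g. props.put("interceptor.classes", "com.tanjie.kafka.AuditInterceptor") (the fully qualified class name here is hypothetical).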
Based on the demo above, let's first look at how KafkaProducer is initialized, starting with its fields:
public class KafkaProducer<K, V> implements Producer<K, V> {
private static final Logger log = LoggerFactory.getLogger(KafkaProducer.class);
private static final AtomicInteger PRODUCER_CLIENT_ID_SEQUENCE = new AtomicInteger(1);
private static final String JMX_PREFIX = "kafka.producer";
private String clientId; // identifier for this client
private final Partitioner partitioner; // partitioner: decides which partition each record goes to, based on the input
private final int maxRequestSize; // maximum size of a single request (max.request.size)
private final long totalMemorySize; // total memory available for buffering records (buffer.memory)
private final Metadata metadata; // maintains the Kafka cluster metadata
private final RecordAccumulator accumulator; // buffer where records are staged before sending
private final Sender sender; // the Sender task that sends records
private final Metrics metrics; // metrics and statistics
private final Thread ioThread; // the thread that runs the Sender task
private final CompressionType compressionType; // compression type applied to records
private final Sensor errors; // sensor used to record errors
private final Time time;
private final Serializer<K> keySerializer; // key serializer
private final Serializer<V> valueSerializer; // value serializer
private final ProducerConfig producerConfig; // producer configuration
private final long maxBlockTimeMs; // maximum time send() may block waiting for metadata or buffer space (max.block.ms)
private final int requestTimeoutMs; // request timeout (request.timeout.ms)
private final ProducerInterceptors<K, V> interceptors; // producer interceptors
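Several of these fields map directly to producer configuration keys. A quick reference sketch, with values that are illustrative defaults or examples rather than recommendations:
Properties props = new Properties();
props.put("buffer.memory", "33554432");     // -> totalMemorySize: total buffer space (32 MB default)
props.put("max.request.size", "1048576");   // -> maxRequestSize
props.put("compression.type", "none");      // -> compressionType (none, gzip, snappy, lz4)
props.put("max.block.ms", "60000");         // -> maxBlockTimeMs
props.put("request.timeout.ms", "30000");   // -> requestTimeoutMs
props.put("batch.size", "16384");           // batch size used by the RecordAccumulator
props.put("linger.ms", "0");                // how long a non-full batch may wait before it becomes sendable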
Now the constructor. When reading source code I usually focus on the main path rather than every detail, so only the important parts are annotated below:
private KafkaProducer(ProducerConfig config, Serializer<K> keySerializer, Serializer<V> valueSerializer) {
try {
log.trace("Starting the Kafka producer");
Map<String, Object> userProvidedConfigs = config.originals();
this.producerConfig = config;
this.time = new SystemTime();
clientId = config.getString(ProducerConfig.CLIENT_ID_CONFIG);
if (clientId.length() <= 0)
clientId = "producer-" + PRODUCER_CLIENT_ID_SEQUENCE.getAndIncrement();
Map<String, String> metricTags = new LinkedHashMap<String, String>();
metricTags.put("client-id", clientId);
MetricConfig metricConfig = new MetricConfig().samples(config.getInt(ProducerConfig.METRICS_NUM_SAMPLES_CONFIG))
.timeWindow(config.getLong(ProducerConfig.METRICS_SAMPLE_WINDOW_MS_CONFIG), TimeUnit.MILLISECONDS)
.tags(metricTags);
List<MetricsReporter> reporters = config.getConfiguredInstances(ProducerConfig.METRIC_REPORTER_CLASSES_CONFIG,
MetricsReporter.class);
reporters.add(new JmxReporter(JMX_PREFIX));
this.metrics = new Metrics(metricConfig, reporters, time);
// initialize the partitioner
this.partitioner = config.getConfiguredInstance(ProducerConfig.PARTITIONER_CLASS_CONFIG, Partitioner.class);
long retryBackoffMs = config.getLong(ProducerConfig.RETRY_BACKOFF_MS_CONFIG);
// initialize the key serializer
if (keySerializer == null) {
this.keySerializer = config.getConfiguredInstance(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
Serializer.class);
this.keySerializer.configure(config.originals(), true);
} else {
config.ignore(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG);
this.keySerializer = keySerializer;
}
// initialize the value serializer
if (valueSerializer == null) {
this.valueSerializer = config.getConfiguredInstance(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
Serializer.class);
this.valueSerializer.configure(config.originals(), false);
} else {
config.ignore(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG);
this.valueSerializer = valueSerializer;
}
// load interceptors and make sure they get clientId
userProvidedConfigs.put(ProducerConfig.CLIENT_ID_CONFIG, clientId);
List<ProducerInterceptor<K, V>> interceptorList = (List) (new ProducerConfig(userProvidedConfigs)).getConfiguredInstances(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
ProducerInterceptor.class);
this.interceptors = interceptorList.isEmpty() ? null : new ProducerInterceptors<>(interceptorList);
ClusterResourceListeners clusterResourceListeners = configureClusterResourceListeners(keySerializer, valueSerializer, interceptorList, reporters);
// initialize the cluster metadata
this.metadata = new Metadata(retryBackoffMs, config.getLong(ProducerConfig.METADATA_MAX_AGE_CONFIG), true, clusterResourceListeners);
// initialize the maximum request size
this.maxRequestSize = config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG);
// initialize the total buffer memory
this.totalMemorySize = config.getLong(ProducerConfig.BUFFER_MEMORY_CONFIG);
// initialize the compression type
this.compressionType = CompressionType.forName(config.getString(ProducerConfig.COMPRESSION_TYPE_CONFIG));
/* check for user defined settings.
* If the BLOCK_ON_BUFFER_FULL is set to true,we do not honor METADATA_FETCH_TIMEOUT_CONFIG.
* This should be removed with release 0.9 when the deprecated configs are removed.
*/
if (userProvidedConfigs.containsKey(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG)) {
log.warn(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG + " config is deprecated and will be removed soon. " +
"Please use " + ProducerConfig.MAX_BLOCK_MS_CONFIG);
boolean blockOnBufferFull = config.getBoolean(ProducerConfig.BLOCK_ON_BUFFER_FULL_CONFIG);
if (blockOnBufferFull) {
this.maxBlockTimeMs = Long.MAX_VALUE;
} else if (userProvidedConfigs.containsKey(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG)) {
log.warn(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG + " config is deprecated and will be removed soon. " +
"Please use " + ProducerConfig.MAX_BLOCK_MS_CONFIG);
this.maxBlockTimeMs = config.getLong(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG);
} else {
this.maxBlockTimeMs = config.getLong(ProducerConfig.MAX_BLOCK_MS_CONFIG);
}
} else if (userProvidedConfigs.containsKey(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG)) {
log.warn(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG + " config is deprecated and will be removed soon. " +
"Please use " + ProducerConfig.MAX_BLOCK_MS_CONFIG);
this.maxBlockTimeMs = config.getLong(ProducerConfig.METADATA_FETCH_TIMEOUT_CONFIG);
} else {
this.maxBlockTimeMs = config.getLong(ProducerConfig.MAX_BLOCK_MS_CONFIG);
}
/* check for user defined settings.
* If the TIME_OUT config is set use that for request timeout.
* This should be removed with release 0.9
*/
if (userProvidedConfigs.containsKey(ProducerConfig.TIMEOUT_CONFIG)) {
log.warn(ProducerConfig.TIMEOUT_CONFIG + " config is deprecated and will be removed soon. Please use " +
ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG);
this.requestTimeoutMs = config.getInt(ProducerConfig.TIMEOUT_CONFIG);
} else {
this.requestTimeoutMs = config.getInt(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG);
}
// initialize the RecordAccumulator buffer
this.accumulator = new RecordAccumulator(config.getInt(ProducerConfig.BATCH_SIZE_CONFIG),
this.totalMemorySize,
this.compressionType,
config.getLong(ProducerConfig.LINGER_MS_CONFIG),
retryBackoffMs,
metrics,
time);
List<InetSocketAddress> addresses = ClientUtils.parseAndValidateAddresses(config.getList(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG));
this.metadata.update(Cluster.bootstrap(addresses), time.milliseconds());
ChannelBuilder channelBuilder = ClientUtils.createChannelBuilder(config.values());
// build the NetworkClient that performs network I/O with the Kafka brokers
NetworkClient client = new NetworkClient(
new Selector(config.getLong(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG), this.metrics, time, "producer", channelBuilder),
this.metadata,
clientId,
config.getInt(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION),
config.getLong(ProducerConfig.RECONNECT_BACKOFF_MS_CONFIG),
config.getInt(ProducerConfig.SEND_BUFFER_CONFIG),
config.getInt(ProducerConfig.RECEIVE_BUFFER_CONFIG),
this.requestTimeoutMs, time);
// initialize the Sender task that sends records; it runs in the ioThread
this.sender = new Sender(client,
this.metadata,
this.accumulator,
config.getInt(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION) == 1,
config.getInt(ProducerConfig.MAX_REQUEST_SIZE_CONFIG),
(short) parseAcks(config.getString(ProducerConfig.ACKS_CONFIG)),
config.getInt(ProducerConfig.RETRIES_CONFIG),
this.metrics,
new SystemTime(),
clientId,
this.requestTimeoutMs);
String ioThreadName = "kafka-producer-network-thread" + (clientId.length() > 0 ? " | " + clientId : "");
this.ioThread = new KafkaThread(ioThreadName, this.sender, true);
// start the I/O thread
this.ioThread.start();
this.errors = this.metrics.sensor("errors");
config.logUnused();
AppInfoParser.registerAppInfo(JMX_PREFIX, clientId);
log.debug("Kafka producer started");
} catch (Throwable t) {
// call close methods if internal objects are already constructed
// this is to prevent resource leak. see KAFKA-2121
close(0, TimeUnit.MILLISECONDS, true);
// now propagate the exception
throw new KafkaException("Failed to construct kafka producer", t);
}
}
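One practical note: the constructor starts a background I/O thread, so the producer should always be closed when you are done with it, otherwise buffered records may never be sent and network resources are not released. A minimal sketch, assuming the same props as in the demo above (Producer extends Closeable, so try-with-resources works):
try (KafkaProducer<Integer, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>("kafka-topic-02", 1, "hello"));
} // close() waits for in-flight records to complete and shuts down the Sender / ioThread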
Once the KafkaProducer has been initialized, send() is called to send records:
@Override
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
// intercept the record, which can be potentially modified; this method does not throw exceptions
ProducerRecord<K, V> interceptedRecord = this.interceptors == null ? record : this.interceptors.onSend(record);
return doSend(interceptedRecord, callback);
}
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
TopicPartition tp = null;
try {
// first make sure the metadata for the topic is available
// wait for the Kafka cluster metadata for this topic to be available/updated
ClusterAndWaitTime clusterAndWaitTime = waitOnMetadata(record.topic(), record.partition(), maxBlockTimeMs);
long remainingWaitMs = Math.max(0, maxBlockTimeMs - clusterAndWaitTime.waitedOnMetadataMs);
// the cluster information held by the metadata
Cluster cluster = clusterAndWaitTime.cluster;
// serialize the key
byte[] serializedKey;
try {
serializedKey = keySerializer.serialize(record.topic(), record.key());
} catch (ClassCastException cce) {
throw new SerializationException("Can't convert key of class " + record.key().getClass().getName() +
" to class " + producerConfig.getClass(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG).getName() +
" specified in key.serializer");
}
// serialize the value
byte[] serializedValue;
try {
serializedValue = valueSerializer.serialize(record.topic(), record.value());
} catch (ClassCastException cce) {
throw new SerializationException("Can't convert value of class " + record.value().getClass().getName() +
" to class " + producerConfig.getClass(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG).getName() +
" specified in value.serializer");
}
// determine which partition the record should be sent to
int partition = partition(record, serializedKey, serializedValue, cluster);
int serializedSize = Records.LOG_OVERHEAD + Record.recordSize(serializedKey, serializedValue);
ensureValidRecordSize(serializedSize);
tp = new TopicPartition(record.topic(), partition);
long timestamp = record.timestamp() == null ? time.milliseconds() : record.timestamp();
log.trace("Sending record {} with callback {} to topic {} partition {}", record, callback, record.topic(), partition);
// producer callback will make sure to call both 'callback' and interceptor callback
Callback interceptCallback = this.interceptors == null ? callback : new InterceptorCallback<>(callback, this.interceptors, tp);
// append the record to the RecordAccumulator
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey, serializedValue, interceptCallback, remainingWaitMs);
// if, after appending, the last RecordBatch is full or a new batch had to be created
if (result.batchIsFull || result.newBatchCreated) {
log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
// wake up the sender thread
this.sender.wakeup();
}
// return the future
return result.future;
// handling exceptions and record the errors;
// for API exceptions return them in the future,
// for other exceptions throw directly
} catch (ApiException e) {
// .................. omitted
}
}
Before analyzing waitOnMetadata, a word about the Kafka cluster metadata. Each topic has multiple partitions, each partition has multiple replicas, and one replica of each partition must be the leader; the other replicas only need to replicate data from the leader. Kafka's metadata records things such as which replicas a partition has, which broker hosts the leader replica, which brokers host the follower replicas, and which replicas are in the ISR (roughly, the follower replicas whose data is not far behind the leader). Kafka maintains this metadata mainly through the following classes.
Metadata has a field named cluster, and the Cluster class mainly maintains the following mappings:
private final boolean isBootstrapConfigured;
// all nodes in the Kafka cluster
private final List<Node> nodes;
private final Set<String> unauthorizedTopics;
private final Set<String> internalTopics;
// mapping from a TopicPartition to its detailed partition information
private final Map<TopicPartition, PartitionInfo> partitionsByTopicPartition;
// mapping from topic name to the partition information of that topic
private final Map<String, List<PartitionInfo>> partitionsByTopic;
// same mapping as above, but it only contains partitions that currently have a leader; the previous one may also include leaderless partitions
private final Map<String, List<PartitionInfo>> availablePartitionsByTopic;
// mapping from a node (by id) to the partitions hosted on it, so all partitions on a given node can be looked up
private final Map<Integer, List<PartitionInfo>> partitionsByNode;
// mapping from node id to the node itself
private final Map<Integer, Node> nodesById;
private final ClusterResource clusterResource;
The Node, TopicPartition and PartitionInfo classes are straightforward, so their fields are not repeated here.
With Cluster covered: Metadata holds a reference to a Cluster, and its main fields are as follows:
private final long refreshBackoffMs;
private final long metadataExpireMs;
// version number, incremented by 1 each time the metadata is updated
private int version;
// timestamp of the last update attempt
private long lastRefreshMs;
// timestamp of the last successful update
private long lastSuccessfulRefreshMs;
// the cluster metadata itself
private Cluster cluster;
// whether a forced update is needed; when true, the background sender thread will refresh the metadata
private boolean needUpdate;
/* Topics with expiry time */
// topics currently being tracked, with their expiry times
private final Map<String, Long> topics;
private final List<Listener> listeners;
private final ClusterResourceListeners clusterResourceListeners;
private boolean needMetadataForAllTopics;
private final boolean topicExpiryEnabled;
Now back to waitOnMetadata, to see how the metadata actually gets updated:
private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long maxWaitMs) throws InterruptedException {
// add topic to metadata topic list if it is not there already and reset expiry
metadata.add(topic);
// fetch the current cluster metadata
Cluster cluster = metadata.fetch();
// number of partitions of this topic (null if unknown)
Integer partitionsCount = cluster.partitionCountForTopic(topic);
// Return cached metadata if we have it, and if the record's partition is either undefined
// or within the known partition range
// if the partition info for this topic is already cached in the metadata (and the requested partition is in range), return it directly
if (partitionsCount != null && (partition == null || partition < partitionsCount))
return new ClusterAndWaitTime(cluster, 0);
long begin = time.milliseconds();
long remainingWaitMs = maxWaitMs;
long elapsed;
// Issue metadata requests until we have metadata for the topic or maxWaitTimeMs is exceeded.
// In case we already have cached metadata for the topic, but the requested partition is greater
// than expected, issue an update request only once. This is necessary in case the metadata
// is stale and the number of partitions for this topic has increased in the meantime.
do {
log.trace("Requesting metadata update for topic {}.", topic);
// set needUpdate to true, marking the metadata as needing an update, and remember the current version
int version = metadata.requestUpdate();
// wake up the sender thread
sender.wakeup();
try {
// wait for the sender thread to update the metadata
metadata.awaitUpdate(version, remainingWaitMs);
} catch (TimeoutException ex) {
// Rethrow with original maxWaitMs to prevent logging exception with remainingWaitMs
throw new TimeoutException("Failed to update metadata after " + maxWaitMs + " ms.");
}
cluster = metadata.fetch();
elapsed = time.milliseconds() - begin;
if (elapsed >= maxWaitMs)
throw new TimeoutException("Failed to update metadata after " + maxWaitMs + " ms.");
if (cluster.unauthorizedTopics().contains(topic))
throw new TopicAuthorizationException(topic);
remainingWaitMs = maxWaitMs - elapsed;
partitionsCount = cluster.partitionCountForTopic(topic);
} while (partitionsCount == null);
if (partition != null && partition >= partitionsCount) {
throw new KafkaException(
String.format("Invalid partition given with record: %d is not in the range [0...%d).", partition, partitionsCount));
}
return new ClusterAndWaitTime(cluster, elapsed);
}
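The wait here is a simple version-based wait/notify scheme: requestUpdate() marks the metadata stale and returns the current version, the Sender fetches fresh metadata and bumps the version, and awaitUpdate() blocks until the version it saw has been passed. A simplified sketch of that mechanism, not the real Metadata class (names and error handling are simplified):
class SimpleVersionedMetadata {
    private int version = 0;
    private boolean needUpdate = false;

    // called by the producer thread: mark the metadata stale and remember the current version
    public synchronized int requestUpdate() {
        this.needUpdate = true;
        return this.version;
    }

    // called by the producer thread: block until the version advances past lastVersion
    public synchronized void awaitUpdate(int lastVersion, long maxWaitMs) throws InterruptedException {
        long begin = System.currentTimeMillis();
        long remaining = maxWaitMs;
        while (this.version <= lastVersion) {
            if (remaining <= 0)
                throw new RuntimeException("Failed to update metadata after " + maxWaitMs + " ms.");
            wait(remaining);
            remaining = maxWaitMs - (System.currentTimeMillis() - begin);
        }
    }

    // called by the sender thread after it has fetched fresh metadata from a broker
    public synchronized void update(/* fresh cluster data */) {
        this.needUpdate = false;
        this.version += 1;
        notifyAll();
    }
}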
Once the metadata is up to date, the next step is to pick a partition for the record:
int partition = partition(record, serializedKey, serializedValue, cluster);
If the record already specifies a partition, that partition is returned directly.
private int partition(ProducerRecord<K, V> record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {
Integer partition = record.partition();
return partition != null ?
partition :
partitioner.partition(
record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
}
If no partition is specified, partitioner.partition() makes the decision. Kafka ships a default implementation, and you can of course plug in your own partitioning strategy:
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
// all partition information for the given topic
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
// number of partitions
int numPartitions = partitions.size();
// if the record has no key
if (keyBytes == null) {
int nextValue = counter.getAndIncrement();
// the "available" partitions of this topic are those that currently have a leader replica; for various reasons a partition may temporarily have none
List<PartitionInfo> availablePartitions = cluster.availablePartitionsForTopic(topic);
if (availablePartitions.size() > 0) {
// take the counter modulo the number of available partitions to pick an index
int part = Utils.toPositive(nextValue) % availablePartitions.size();
return availablePartitions.get(part).partition();
} else {
// no partitions are available, give a non-available partition
return Utils.toPositive(nextValue) % numPartitions;
}
// if the record has a key, hash the key and take it modulo the number of partitions
} else {
// hash the keyBytes to choose a partition
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
}
To summarize (a custom-partitioner sketch follows the list):
1. If you specify a partition, the record is sent only to that partition.
2. If you specify both a partition and a key, the record still goes to the specified partition.
3. If you specify a key but no partition, the key is hashed (murmur2) and taken modulo the current number of partitions to determine the target partition.
4. If you specify neither a partition nor a key, a round-robin scheme is used: an ever-incrementing counter taken modulo the number of partitions, so that successive records do not all land in the same partition.
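As a hedged example of plugging in your own strategy, here is a minimal custom Partitioner that sends all key-less records to partition 0 and keyed records by key hash. The class name is made up; the Partitioner interface, Utils.murmur2/toPositive and the partitioner.class config key are real:
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class MyPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null)
            return 0; // all key-less records go to partition 0 (purely for illustration)
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
It would be registered via props.put("partitioner.class", "com.tanjie.kafka.MyPartitioner") (the fully qualified class name is hypothetical).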
Having covered how the target partition is chosen, the next step is placing the record into the staging area, the RecordAccumulator. Let's look at the RecordAccumulator first.
From the relationship between the three classes involved (RecordAccumulator, RecordBatch and MemoryRecords) we can see that the RecordAccumulator holds a ConcurrentMap whose key is a TopicPartition and whose value is a double-ended queue; in other words, the RecordAccumulator buffers records per partition, so records for the same partition are grouped together. Each RecordBatch holds a reference to a MemoryRecords, and MemoryRecords is where the records are actually stored:
it supports compressing the records and keeps the data in a Java NIO ByteBuffer. When we append a record to the RecordAccumulator, the append method is used:
public RecordAppendResult append(TopicPartition tp,
long timestamp,
byte[] key,
byte[] value,
Callback callback,
long maxTimeToBlock) throws InterruptedException {
// We keep track of the number of appending thread to make sure we do not miss batches in
// abortIncompleteBatches().
// track the number of threads currently appending into the RecordAccumulator
appendsInProgress.incrementAndGet();
try {
// check if we have an in-progress batch
// step 1: look up the existing deque for this partition in batches; return it if present, otherwise create a new one
Deque<RecordBatch> dq = getOrCreateDeque(tp);
// ArrayDeque is not thread-safe, so synchronize on the deque
synchronized (dq) {
if (closed)
throw new IllegalStateException("Cannot send after the producer is closed.");
// step 2: try to append the record to the last RecordBatch in the deque
RecordAppendResult appendResult = tryAppend(timestamp, key, value, callback, dq);
// if the append succeeded, return directly
if (appendResult != null)
return appendResult;
}
// if the append above did not succeed, e.g. the record is too large and the last MemoryRecords in the deque has no room left for it
// we don't have an in-progress record batch try to allocate a new batch
// compute the buffer size needed for a new batch
int size = Math.max(this.batchSize, Records.LOG_OVERHEAD + Record.recordSize(key, value));
log.trace("Allocating a new {} byte message buffer for topic {} partition {}", size, tp.topic(), tp.partition());
// allocate a new buffer (this may block up to maxTimeToBlock)
ByteBuffer buffer = free.allocate(size, maxTimeToBlock);
synchronized (dq) {
// Need to check if producer is closed again after grabbing the dequeue lock.
if (closed)
throw new IllegalStateException("Cannot send after the producer is closed.");
// try appending again (another thread may have created a new batch in the meantime)
RecordAppendResult appendResult = tryAppend(timestamp, key, value, callback, dq);
if (appendResult != null) {
// release the buffer we just allocated
// Somebody else found us a batch, return the one we waited for! Hopefully this doesn't happen often...
free.deallocate(buffer);
return appendResult;
}
// if the retry also failed, build a new MemoryRecords on the allocated buffer
MemoryRecords records = MemoryRecords.emptyRecords(buffer, compression, this.batchSize);
// wrap it in a new RecordBatch
RecordBatch batch = new RecordBatch(tp, records, time.milliseconds());
// append the record to the newly created RecordBatch
FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, callback, time.milliseconds()));
// add the new RecordBatch to the tail of the deque
dq.addLast(batch);
incomplete.add(batch);
return new RecordAppendResult(future, dq.size() > 1 || batch.records.isFull(), true);
}
} finally {
// decrement the count of in-progress appends
appendsInProgress.decrementAndGet();
}
}
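The shape of append is worth calling out: it first tries to append under the deque lock, then allocates a new buffer outside the lock (allocation can block), and then re-checks under the lock before creating a new batch, because another thread may have created one in the meantime. A stripped-down, self-contained sketch of that double-checked pattern (not the real RecordAccumulator; the batch and buffer types are simplified):
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class MiniAccumulator {
    static class Batch {
        private final StringBuilder data = new StringBuilder();
        private final int capacity;
        Batch(int capacity) { this.capacity = capacity; }
        // returns false when the batch has no room left, mirroring RecordBatch.tryAppend returning null
        boolean tryAppend(String record) {
            if (data.length() + record.length() > capacity) return false;
            data.append(record);
            return true;
        }
    }

    private final ConcurrentMap<String, Deque<Batch>> batches = new ConcurrentHashMap<>();
    private final int batchCapacity = 64;

    void append(String partition, String record) {
        Deque<Batch> dq = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        synchronized (dq) {                       // first attempt under the deque lock
            Batch last = dq.peekLast();
            if (last != null && last.tryAppend(record)) return;
        }
        // the real accumulator allocates a ByteBuffer from a BufferPool here, possibly blocking;
        // the allocation is deliberately done OUTSIDE the lock
        Batch fresh = new Batch(batchCapacity);
        synchronized (dq) {                       // second attempt: another thread may have added a batch
            Batch last = dq.peekLast();
            if (last != null && last.tryAppend(record)) return; // discard "fresh" and use theirs
            fresh.tryAppend(record);
            dq.addLast(fresh);
        }
    }
}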
At this point the record is sitting in the buffer. Next, let's see how the Sender thread pulls records out of it and talks to the Kafka server.
void run(long now) {
// fetch the cluster metadata
Cluster cluster = metadata.fetch();
// get the list of partitions with data ready to send
// determine from the buffer which nodes have data ready to be sent; how this is decided is analyzed below
RecordAccumulator.ReadyCheckResult result = this.accumulator.ready(cluster, now);
// if any returned partition has no known leader replica, mark the metadata as needing an update
// if there are any partitions whose leaders are not known yet, force metadata update
if (!result.unknownLeaderTopics.isEmpty()) {
// The set of topics with unknown leader contains topics with leader election pending as well as
// topics which may have expired. Add the topic again to metadata to ensure it is included
// and request metadata update, since there are messages to send to the topic.
for (String topic : result.unknownLeaderTopics)
this.metadata.add(topic);
this.metadata.requestUpdate();
}
// remove any nodes we aren't ready to send to
Iterator<Node> iter = result.readyNodes.iterator();
long notReadyTimeout = Long.MAX_VALUE;
while (iter.hasNext()) {
Node node = iter.next();
// for each ready node, check whether its network connection is ready for sending; remove nodes that are not
if (!this.client.ready(node, now)) {
iter.remove();
notReadyTimeout = Math.min(notReadyTimeout, this.client.connectionDelay(node, now));
}
}
// collect the batches to send: the key is the target node (id), the value is all batches destined for that node
// create produce requests
Map<Integer, List<RecordBatch>> batches = this.accumulator.drain(cluster,
result.readyNodes,
this.maxRequestSize,
now);
if (guaranteeMessageOrder) {
// Mute all the partitions drained
for (List<RecordBatch> batchList : batches.values()) {
for (RecordBatch batch : batchList)
this.accumulator.mutePartition(batch.topicPartition);
}
}
List<RecordBatch> expiredBatches = this.accumulator.abortExpiredBatches(this.requestTimeout, now);
// update sensors
for (RecordBatch expiredBatch : expiredBatches)
this.sensors.recordErrors(expiredBatch.topicPartition.topic(), expiredBatch.recordCount);
sensors.updateProduceRequestMetrics(batches);
// build the ClientRequests to send; only one ClientRequest is built per node
List<ClientRequest> requests = createProduceRequests(batches, now);
// If we have any nodes that are ready to send + have sendable data, poll with 0 timeout so this can immediately
// loop and try sending more data. Otherwise, the timeout is determined by nodes that have partitions with data
// that isn't yet sendable (e.g. lingering, backing off). Note that this specifically does not include nodes
// with sendable data that aren't ready to send since they would cause busy looping.
long pollTimeout = Math.min(result.nextReadyCheckDelayMs, notReadyTimeout);
if (result.readyNodes.size() > 0) {
log.trace("Nodes with data ready to send: {}", result.readyNodes);
log.trace("Created {} produce requests: {}", requests.size(), requests);
pollTimeout = 0;
}
for (ClientRequest request : requests)
client.send(request, now);
// if some partitions are already ready to be sent, the select time would be 0;
// otherwise if some partition already has some data accumulated but not ready yet,
// the select time will be the time difference between now and its linger expiry time;
// otherwise the select time will be the time difference between now and the metadata expiry time;
// perform the actual network I/O, sending the requests to the brokers
this.client.poll(pollTimeout, now);
}
Several calls inside run() are key, and we will analyze them one by one. The first is ready(), which determines from the RecordAccumulator which nodes have data ready to be sent. So what counts as "ready"? See the method below:
public ReadyCheckResult ready(Cluster cluster, long nowMs) {
Set<Node> readyNodes = new HashSet<>();
long nextReadyCheckDelayMs = Long.MAX_VALUE;
Set<String> unknownLeaderTopics = new HashSet<>();
// whether the buffer pool is exhausted (there are threads queued waiting for memory)
boolean exhausted = this.free.queued() > 0;
// iterate over every per-partition deque
for (Map.Entry<TopicPartition, Deque<RecordBatch>> entry : this.batches.entrySet()) {
TopicPartition part = entry.getKey();
Deque<RecordBatch> deque = entry.getValue();
// the leader broker of this TopicPartition
Node leader = cluster.leaderFor(part);
synchronized (deque) {
// if the partition has no known leader but its deque is not empty (there are records waiting to be sent)
if (leader == null && !deque.isEmpty()) {
// This is a partition for which leader is not known, but messages are available to send.
// Note that entries are currently not removed from batches when deque is empty.
unknownLeaderTopics.add(part.topic());
// if this leader is not already in the ready set and the partition is not muted (muting preserves ordering within a single partition)
} else if (!readyNodes.contains(leader) && !muted.contains(part)) {
RecordBatch batch = deque.peekFirst();
if (batch != null) {
// whether this batch is in a retry backoff period, i.e. waiting before it is retried
boolean backingOff = batch.attempts > 0 && batch.lastAttemptMs + retryBackoffMs > nowMs;
// how long the batch has already waited
long waitedTimeMs = nowMs - batch.lastAttemptMs;
// how long it is supposed to wait (retry backoff or linger.ms)
long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
// time still left to wait
long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0);
// whether the batch is full
boolean full = deque.size() > 1 || batch.records.isFull();
// whether the batch has waited long enough (its linger/backoff time has passed)
boolean expired = waitedTimeMs >= timeToWaitMs;
boolean sendable = full || expired || exhausted || closed || flushInProgress();
// if the batch is full, or has waited long enough, or the pool is exhausted/closed, or a flush is in progress, and it is not backing off, add the leader to the ready set
if (sendable && !backingOff) {
readyNodes.add(leader);
} else {
// Note that this results in a conservative estimate since an un-sendable partition may have
// a leader that will later be found to have sendable data. However, this is good enough
// since we'll just wake up and then sleep again for the remaining time.
nextReadyCheckDelayMs = Math.min(timeLeftMs, nextReadyCheckDelayMs);
}
}
}
}
}
return new ReadyCheckResult(readyNodes, nextReadyCheckDelayMs, unknownLeaderTopics);
}
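In practice the knobs that most directly drive this readiness check are batch.size (when a batch counts as full), linger.ms (how long a non-full batch waits before it becomes sendable) and retry.backoff.ms (how long a failed batch backs off before being retried). A hedged tuning example, with purely illustrative values:
Properties props = new Properties();
props.put("batch.size", "32768");      // a batch counts as full (sendable) once it reaches 32 KB
props.put("linger.ms", "10");          // a non-full batch becomes sendable after waiting 10 ms
props.put("retry.backoff.ms", "100");  // a failed batch waits 100 ms before the Sender retries it
props.put("retries", "3");             // how many times a failed batch will be retried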
After the ready nodes are returned, the next call is drain(). Given the set of ready nodes from the previous step, drain() extracts from the buffer, for each node, the batches of those topic partitions whose leader resides on that node.
public Map<Integer, List<RecordBatch>> drain(Cluster cluster,
Set<Node> nodes,
int maxSize,
long now) {
if (nodes.isEmpty())
return Collections.emptyMap();
Map<Integer, List<RecordBatch>> batches = new HashMap<>();
// iterate over the ready nodes
for (Node node : nodes) {
int size = 0;
// all partitions whose leader is on this node
List<PartitionInfo> parts = cluster.partitionsForNode(node.id());
List<RecordBatch> ready = new ArrayList<>();
/* to make starvation less likely this loop doesn't start at 0 */
int start = drainIndex = drainIndex % parts.size();
do {
PartitionInfo part = parts.get(drainIndex);
TopicPartition tp = new TopicPartition(part.topic(), part.partition());
// Only proceed if the partition has no in-flight batches.
if (!muted.contains(tp)) {
// find the corresponding deque in the buffer by topic and partition id
Deque<RecordBatch> deque = getDeque(new TopicPartition(part.topic(), part.partition()));
if (deque != null) {
synchronized (deque) {
// look at the first batch in the deque
RecordBatch first = deque.peekFirst();
if (first != null) {
boolean backoff = first.attempts > 0 && first.lastAttemptMs + retryBackoffMs > now;
// Only drain the batch if it is not during backoff period.
if (!backoff) {
if (size + first.records.sizeInBytes() > maxSize && !ready.isEmpty()) {
// there is a rare case that a single batch size is larger than the request size due
// to compression; in this case we will still eventually send this batch in a single
// request
break;
} else {
RecordBatch batch = deque.pollFirst();
batch.records.close();
size += batch.records.sizeInBytes();
ready.add(batch);
batch.drainedMs = now;
}
}
}
}
}
}
this.drainIndex = (this.drainIndex + 1) % parts.size();
} while (start != drainIndex);
// in the returned map the key is a node id and the value is all batches that need to be sent to that node
batches.put(node.id(), ready);
}
return batches;
}
The next important piece is createProduceRequests, which the Sender uses to wrap the batches to be sent into ClientRequests:
List<ClientRequest> requests = createProduceRequests(batches, now);
private List<ClientRequest> createProduceRequests(Map<Integer, List<RecordBatch>> collated, long now) {
List<ClientRequest> requests = new ArrayList<ClientRequest>(collated.size());
for (Map.Entry<Integer, List<RecordBatch>> entry : collated.entrySet())
// only one ClientRequest is built per node
requests.add(produceRequest(now, entry.getKey(), acks, requestTimeout, entry.getValue()));
return requests;
}
private ClientRequest produceRequest(long now, int destination, short acks, int timeout, List<RecordBatch> batches) {
// mapping from a partition to the serialized record data (ByteBuffer) for that partition
Map<TopicPartition, ByteBuffer> produceRecordsByPartition = new HashMap<TopicPartition, ByteBuffer>(batches.size());
// mapping from a partition to its RecordBatch
final Map<TopicPartition, RecordBatch> recordsByPartition = new HashMap<TopicPartition, RecordBatch>(batches.size());
// group the RecordBatch list by partition, filling the two maps above as key/value pairs
for (RecordBatch batch : batches) {
TopicPartition tp = batch.topicPartition;
produceRecordsByPartition.put(tp, batch.records.buffer());
recordsByPartition.put(tp, batch);
}
ProduceRequest request = new ProduceRequest(acks, timeout, produceRecordsByPartition);
RequestSend send = new RequestSend(Integer.toString(destination),
this.client.nextRequestHeader(ApiKeys.PRODUCE),
request.toStruct());
// build the completion callback
RequestCompletionHandler callback = new RequestCompletionHandler() {
public void onComplete(ClientResponse response) {
handleProduceResponse(response, recordsByPartition, time.milliseconds());
}
};
return new ClientRequest(now, acks != 0, send, callback);
}