<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka_2.11</artifactId>
    <version>1.11.1</version>
</dependency>
FlinkKafkaConsumerBase:
The base class for all Flink Kafka Consumer data sources. It implements the behavior that is common across all Kafka versions.
@Override
public void open(Configuration configuration) throws Exception {
    // determine the offset commit mode
    this.offsetCommitMode = OffsetCommitModes.fromConfiguration(
            getIsAutoCommitEnabled(),
            enableCommitOnCheckpoints,
            ((StreamingRuntimeContext) getRuntimeContext()).isCheckpointingEnabled());

    // create the partition discoverer
    this.partitionDiscoverer = createPartitionDiscoverer(
            topicsDescriptor,
            getRuntimeContext().getIndexOfThisSubtask(),
            getRuntimeContext().getNumberOfParallelSubtasks());
    // initializes the discoverer's internal KafkaConsumer
    this.partitionDiscoverer.open();

    subscribedPartitionsToStartOffsets = new HashMap<>();
    // determine the topics and partitions this subtask will consume
    final List<KafkaTopicPartition> allPartitions = partitionDiscoverer.discoverPartitions();
    // check whether we are restoring from a checkpoint
    if (restoredState != null) {
        for (KafkaTopicPartition partition : allPartitions) {
            // if a discovered partition is not in the checkpointed state,
            // start reading it from EARLIEST_OFFSET
            if (!restoredState.containsKey(partition)) {
                restoredState.put(partition, KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET);
            }
        }

        for (Map.Entry<KafkaTopicPartition, Long> restoredStateEntry : restoredState.entrySet()) {
            // seed the partition discoverer with the union state while filtering out
            // restored partitions that should not be subscribed by this subtask;
            // assign() returns the index of the subtask a given partition belongs to
            if (KafkaTopicPartitionAssigner.assign(
                    restoredStateEntry.getKey(), getRuntimeContext().getNumberOfParallelSubtasks())
                    == getRuntimeContext().getIndexOfThisSubtask()) {
                // store each restored (topic, partition) with its start offset in
                // subscribedPartitionsToStartOffsets: the entry key is a partition of
                // some topic, the value is the offset to resume reading from
                subscribedPartitionsToStartOffsets.put(
                        restoredStateEntry.getKey(), restoredStateEntry.getValue());
            }
        }
        // drop partitions whose topic no longer matches the topicsDescriptor
        // (fixed topics or topic pattern) of the current execution
        if (filterRestoredPartitionsWithCurrentTopicsDescriptor) {
            subscribedPartitionsToStartOffsets.entrySet().removeIf(entry -> {
                if (!topicsDescriptor.isMatchingTopic(entry.getKey().getTopic())) {
                    LOG.warn(
                            "{} is removed from subscribed partitions since it is no longer associated with topics descriptor of current execution.",
                            entry.getKey());
                    return true;
                }
                return false;
            });
        }

        LOG.info("Consumer subtask {} will start reading {} partitions with offsets in restored state: {}",
                getRuntimeContext().getIndexOfThisSubtask(), subscribedPartitionsToStartOffsets.size(), subscribedPartitionsToStartOffsets);
    } else {
        // use the partition discoverer to fetch the initial seed partitions,
        // and set their initial offsets depending on the startup mode.
        // for SPECIFIC_OFFSETS and TIMESTAMP modes, we set the specific offsets now;
        // for other modes (EARLIEST, LATEST, and GROUP_OFFSETS), the offset is lazily determined
        // when the partition is actually read.
        // the startup mode decides each partition's starting offset; the default is
        // GROUP_OFFSETS: start from the offsets committed for this consumer group
        // in ZK / the Kafka brokers
        switch (startupMode) {
            // start from explicitly specified offsets per partition
            case SPECIFIC_OFFSETS:
                if (specificStartupOffsets == null) {
                    throw new IllegalStateException(
                            "Startup mode for the consumer set to " + StartupMode.SPECIFIC_OFFSETS +
                                    ", but no specific offsets were specified.");
                }

                for (KafkaTopicPartition seedPartition : allPartitions) {
                    Long specificOffset = specificStartupOffsets.get(seedPartition);
                    if (specificOffset != null) {
                        // since the specified offsets represent the next record to read, we subtract
                        // it by one so that the initial state of the consumer will be correct
                        subscribedPartitionsToStartOffsets.put(seedPartition, specificOffset - 1);
                    } else {
                        // default to group offset behaviour if the user-provided specific offsets
                        // do not contain a value for this partition
                        subscribedPartitionsToStartOffsets.put(seedPartition, KafkaTopicPartitionStateSentinel.GROUP_OFFSET);
                    }
                }
                break;
            // determine each partition's starting offset from a timestamp
            case TIMESTAMP:
                if (startupOffsetsTimestamp == null) {
                    throw new IllegalStateException(
                            "Startup mode for the consumer set to " + StartupMode.TIMESTAMP +
                                    ", but no startup timestamp was specified.");
                }

                for (Map.Entry<KafkaTopicPartition, Long> partitionToOffset
                        : fetchOffsetsWithTimestamp(allPartitions, startupOffsetsTimestamp).entrySet()) {
                    subscribedPartitionsToStartOffsets.put(
                            partitionToOffset.getKey(),
                            (partitionToOffset.getValue() == null)
                                    // if an offset cannot be retrieved for a partition with the given timestamp,
                                    // we default to using the latest offset for the partition
                                    ? KafkaTopicPartitionStateSentinel.LATEST_OFFSET
                                    // since the specified offsets represent the next record to read, we subtract
                                    // it by one so that the initial state of the consumer will be correct
                                    : partitionToOffset.getValue() - 1);
                }
                break;
            default:
                // EARLIEST, LATEST and GROUP_OFFSETS are recorded as sentinel values;
                // the real offsets are resolved lazily when the partitions are first read
                for (KafkaTopicPartition seedPartition : allPartitions) {
                    subscribedPartitionsToStartOffsets.put(seedPartition, startupMode.getStateSentinel());
                }
        }
        // log what this subtask will read, provided it was assigned any partitions at all:
        // how many partitions, and which topic/partition pairs
        if (!subscribedPartitionsToStartOffsets.isEmpty()) {
            switch (startupMode) {
                case EARLIEST:
                    LOG.info("Consumer subtask {} will start reading the following {} partitions from the earliest offsets: {}",
                            getRuntimeContext().getIndexOfThisSubtask(),
                            subscribedPartitionsToStartOffsets.size(),
                            subscribedPartitionsToStartOffsets.keySet());
                    break;
                case LATEST:
                    LOG.info("Consumer subtask {} will start reading the following {} partitions from the latest offsets: {}",
                            getRuntimeContext().getIndexOfThisSubtask(),
                            subscribedPartitionsToStartOffsets.size(),
                            subscribedPartitionsToStartOffsets.keySet());
                    break;
                case TIMESTAMP:
                    LOG.info("Consumer subtask {} will start reading the following {} partitions from timestamp {}: {}",
                            getRuntimeContext().getIndexOfThisSubtask(),
                            subscribedPartitionsToStartOffsets.size(),
                            startupOffsetsTimestamp,
                            subscribedPartitionsToStartOffsets.keySet());
                    break;
                case SPECIFIC_OFFSETS:
                    LOG.info("Consumer subtask {} will start reading the following {} partitions from the specified startup offsets {}: {}",
                            getRuntimeContext().getIndexOfThisSubtask(),
                            subscribedPartitionsToStartOffsets.size(),
                            specificStartupOffsets,
                            subscribedPartitionsToStartOffsets.keySet());

                    List<KafkaTopicPartition> partitionsDefaultedToGroupOffsets = new ArrayList<>(subscribedPartitionsToStartOffsets.size());
                    for (Map.Entry<KafkaTopicPartition, Long> subscribedPartition : subscribedPartitionsToStartOffsets.entrySet()) {
                        // collect the partitions whose offsets were not specified
                        // (or could not be used) and thus fall back to group offsets
                        if (subscribedPartition.getValue() == KafkaTopicPartitionStateSentinel.GROUP_OFFSET) {
                            partitionsDefaultedToGroupOffsets.add(subscribedPartition.getKey());
                        }
                    }

                    if (partitionsDefaultedToGroupOffsets.size() > 0) {
                        LOG.warn("Consumer subtask {} cannot find offsets for the following {} partitions in the specified startup offsets: {}" +
                                        "; their startup offsets will be defaulted to their committed group offsets in Kafka.",
                                getRuntimeContext().getIndexOfThisSubtask(),
                                partitionsDefaultedToGroupOffsets.size(),
                                partitionsDefaultedToGroupOffsets);
                    }
                    break;
                case GROUP_OFFSETS:
                    LOG.info("Consumer subtask {} will start reading the following {} partitions from the committed group offsets in Kafka: {}",
                            getRuntimeContext().getIndexOfThisSubtask(),
                            subscribedPartitionsToStartOffsets.size(),
                            subscribedPartitionsToStartOffsets.keySet());
            }
        } else {
            // this subtask was not assigned any partitions
            LOG.info("Consumer subtask {} initially has no partitions to read from.",
                    getRuntimeContext().getIndexOfThisSubtask());
        }
    }

    this.deserializer.open(
            RuntimeContextInitializationContextAdapters.deserializationAdapter(
                    getRuntimeContext(),
                    metricGroup -> metricGroup.addGroup("user")
            )
    );
}
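For reference, here is a sketch of how the startup modes handled in open() are selected from the user-facing API; the broker address, group id and topic name are assumed placeholders:

import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
props.setProperty("group.id", "demo-group");              // assumed consumer group

FlinkKafkaConsumer<String> consumer =
        new FlinkKafkaConsumer<>("demo-topic", new SimpleStringSchema(), props);

// exactly one startup mode is effective; the last setter wins
consumer.setStartFromGroupOffsets();               // default: GROUP_OFFSETS
// consumer.setStartFromEarliest();                // EARLIEST
// consumer.setStartFromLatest();                  // LATEST
// consumer.setStartFromTimestamp(1598918400000L); // TIMESTAMP (epoch millis, assumed value)
// consumer.setStartFromSpecificOffsets(specificOffsets); // SPECIFIC_OFFSETS (assumed map)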
OffsetCommitMode:
Describes how offsets are committed back externally to the Kafka brokers / ZooKeeper. Its exact value is determined at runtime in each consumer subtask.
In practice, Kafka is usually consumed with checkpointing enabled and offsets committed when checkpoints complete.
public static OffsetCommitMode fromConfiguration(
        boolean enableAutoCommit,
        boolean enableCommitOnCheckpoint,
        boolean enableCheckpointing) {

    if (enableCheckpointing) {
        // if checkpointing is enabled, the mode depends only on whether
        // committing on checkpoints is enabled
        return (enableCommitOnCheckpoint) ? OffsetCommitMode.ON_CHECKPOINTS : OffsetCommitMode.DISABLED;
    } else {
        // else, the mode depends only on whether auto committing is enabled
        // in the provided Kafka properties
        return (enableAutoCommit) ? OffsetCommitMode.KAFKA_PERIODIC : OffsetCommitMode.DISABLED;
    }
}
public enum OffsetCommitMode {

    /** Completely disable offset committing. */
    DISABLED,

    /** Commit offsets back to Kafka only when checkpoints are completed. */
    ON_CHECKPOINTS,

    /** Commit offsets periodically back to Kafka, using the auto commit functionality of internal Kafka clients. */
    KAFKA_PERIODIC;
}
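Putting the decision table together, a small sketch (the booleans are in the order enableAutoCommit, enableCommitOnCheckpoint, enableCheckpointing):

// checkpointing on,  commit-on-checkpoint on   -> ON_CHECKPOINTS
OffsetCommitMode a = OffsetCommitModes.fromConfiguration(true, true, true);
// checkpointing on,  commit-on-checkpoint off  -> DISABLED
OffsetCommitMode b = OffsetCommitModes.fromConfiguration(true, false, true);
// checkpointing off, enable.auto.commit=true   -> KAFKA_PERIODIC
OffsetCommitMode c = OffsetCommitModes.fromConfiguration(true, true, false);
// checkpointing off, enable.auto.commit=false  -> DISABLED
OffsetCommitMode d = OffsetCommitModes.fromConfiguration(false, true, false);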
/**
 * Creates the partition discoverer that is used to find new partitions for this subtask.
 *
 * @param topicsDescriptor Descriptor that describes whether we are discovering partitions
 *                         for fixed topics or a topic pattern.
 * @param indexOfThisSubtask The index of this consumer subtask.
 * @param numParallelSubtasks The total number of parallel consumer subtasks (the parallelism).
 * @return The instantiated partition discoverer
 */
protected abstract AbstractPartitionDiscoverer createPartitionDiscoverer(
        KafkaTopicsDescriptor topicsDescriptor,
        int indexOfThisSubtask,
        int numParallelSubtasks);
/**
 * Opens the partition discoverer, initializing all required Kafka connections.
 *
 * NOTE: thread-safety is not guaranteed.
 */
public void open() throws Exception {
    closed = false;
    initializeConnections();
}
/** Establish the required connections in order to fetch topics and partitions metadata. */
protected abstract void initializeConnections() throws Exception;

The Kafka implementation of the discoverer:

@Override
protected void initializeConnections() {
    // create the internal KafkaConsumer used only for metadata requests
    this.kafkaConsumer = new KafkaConsumer<>(kafkaProperties);
}
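For context, a metadata-only KafkaConsumer like the one above needs roughly the following properties (all values assumed; in the connector they are derived from the Properties handed to the FlinkKafkaConsumer constructor, which pins byte-array deserializers itself):

Properties kafkaProperties = new Properties();
kafkaProperties.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
kafkaProperties.setProperty("group.id", "demo-group");              // assumed group id
// a KafkaConsumer cannot be constructed without deserializers, even though
// partition discovery only issues metadata requests
kafkaProperties.setProperty("key.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");
kafkaProperties.setProperty("value.deserializer",
        "org.apache.kafka.common.serialization.ByteArrayDeserializer");

KafkaConsumer<byte[], byte[]> kafkaConsumer = new KafkaConsumer<>(kafkaProperties);
List<PartitionInfo> partitionInfos = kafkaConsumer.partitionsFor("demo-topic"); // assumed topic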
// the (topic, partition) -> start offset map this subtask subscribes to,
// initialized in open()
private Map<KafkaTopicPartition, Long> subscribedPartitionsToStartOffsets;

subscribedPartitionsToStartOffsets = new HashMap<>();
final List<KafkaTopicPartition> allPartitions = partitionDiscoverer.discoverPartitions();

The partition discovery procedure:
public List<KafkaTopicPartition> discoverPartitions() throws WakeupException, ClosedException {
    // proceed only if this discoverer is neither closed nor woken up
    if (!closed && !wakeup) {
        try {
            List<KafkaTopicPartition> newDiscoveredPartitions;

            // (1) get all possible partitions, based on whether we are subscribed
            // to fixed topics or a topic pattern; either way, the goal is to fetch
            // the partition metadata of the matching topics
            if (topicsDescriptor.isFixedTopics()) {
                newDiscoveredPartitions = getAllPartitionsForTopics(topicsDescriptor.getFixedTopics());
            } else {
                List<String> matchedTopics = getAllTopics();

                // retain topics that match the pattern
                Iterator<String> iter = matchedTopics.iterator();
                while (iter.hasNext()) {
                    if (!topicsDescriptor.isMatchingTopic(iter.next())) {
                        iter.remove();
                    }
                }

                if (matchedTopics.size() != 0) {
                    // get partitions only for matched topics
                    newDiscoveredPartitions = getAllPartitionsForTopics(matchedTopics);
                } else {
                    // no topics matched the pattern, so there are no partitions either
                    newDiscoveredPartitions = null;
                }
            }

            // (2) eliminate partitions that are old partitions or should not be
            // subscribed by this subtask
            if (newDiscoveredPartitions == null || newDiscoveredPartitions.isEmpty()) {
                throw new RuntimeException("Unable to retrieve any partitions with KafkaTopicsDescriptor: " + topicsDescriptor);
            } else {
                Iterator<KafkaTopicPartition> iter = newDiscoveredPartitions.iterator();
                KafkaTopicPartition nextPartition;
                while (iter.hasNext()) {
                    nextPartition = iter.next();
                    // keep the partition only if this subtask should subscribe to it
                    if (!setAndCheckDiscoveredPartition(nextPartition)) {
                        iter.remove();
                    }
                }
            }

            // the topics and partitions this subtask should subscribe to
            return newDiscoveredPartitions;
        } catch (WakeupException e) {
            // the actual topic / partition metadata fetching methods
            // may be woken up midway; reset the wakeup flag and rethrow
            wakeup = false;
            throw e;
        }
    } else if (!closed && wakeup) {
        // may have been woken up before the method call
        wakeup = false;
        throw new WakeupException();
    } else {
        throw new ClosedException();
    }
}
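When the job subscribes with a topic pattern instead of fixed topics, topicsDescriptor.isFixedTopics() is false and the regex branch above runs. A minimal sketch of such a subscription (the pattern and the properties are assumed, props as in the earlier sketch):

import java.util.regex.Pattern;

// subscribe to every topic matching the (assumed) pattern demo-topic-<n>
FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>(
        Pattern.compile("demo-topic-[0-9]+"),
        new SimpleStringSchema(),
        props);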
@Override
protected List<KafkaTopicPartition> getAllPartitionsForTopics(List<String> topics) throws WakeupException, RuntimeException {
    final List<KafkaTopicPartition> partitions = new LinkedList<>();

    try {
        for (String topic : topics) {
            // fetch the partition metadata for this topic
            final List<PartitionInfo> kafkaPartitions = kafkaConsumer.partitionsFor(topic);

            if (kafkaPartitions == null) {
                throw new RuntimeException(String.format("Could not fetch partitions for %s. Make sure that the topic exists.", topic));
            }

            for (PartitionInfo partitionInfo : kafkaPartitions) {
                partitions.add(new KafkaTopicPartition(partitionInfo.topic(), partitionInfo.partition()));
            }
        }
    } catch (org.apache.kafka.common.errors.WakeupException e) {
        // rethrow our own wakeup exception
        throw new WakeupException();
    }

    return partitions;
}
public boolean setAndCheckDiscoveredPartition(KafkaTopicPartition partition) {
    if (isUndiscoveredPartition(partition)) {
        discoveredPartitions.add(partition);
        // true only if the partition is assigned to this subtask
        return KafkaTopicPartitionAssigner.assign(partition, numParallelSubtasks) == indexOfThisSubtask;
    }
    return false;
}
// Returns the index of the target subtask that a specific Kafka partition is
// assigned to: partitions are distributed evenly across subtasks, round-robin.
public static int assign(KafkaTopicPartition partition, int numParallelSubtasks) {
    int startIndex = ((partition.getTopic().hashCode() * 31) & 0x7FFFFFFF) % numParallelSubtasks;

    // here, the assumption is that the id of Kafka partitions are always ascending
    // starting from 0, and therefore can be used directly as the offset clockwise from the start index
    return (startIndex + partition.getPartition()) % numParallelSubtasks;
}
public abstract class FlinkKafkaConsumerBase<T> extends RichParallelSourceFunction<T> implements
        CheckpointListener,
        ResultTypeQueryable<T>,
        CheckpointedFunction {
@Override
public final void initializeState(FunctionInitializationContext context) throws Exception {

    OperatorStateStore stateStore = context.getOperatorStateStore();

    this.unionOffsetStates = stateStore.getUnionListState(new ListStateDescriptor<>(OFFSETS_STATE_NAME,
            createStateSerializer(getRuntimeContext().getExecutionConfig())));

    // true when the job is recovering from a failure
    if (context.isRestored()) {
        restoredState = new TreeMap<>(new KafkaTopicPartition.Comparator());

        // populate actual holder for restored state
        for (Tuple2<KafkaTopicPartition, Long> kafkaOffset : unionOffsetStates.get()) {
            restoredState.put(kafkaOffset.f0, kafkaOffset.f1);
        }

        LOG.info("Consumer subtask {} restored state: {}.", getRuntimeContext().getIndexOfThisSubtask(), restoredState);
    } else {
        LOG.info("Consumer subtask {} has no restore state.", getRuntimeContext().getIndexOfThisSubtask());
    }
}
@Override
public void run(SourceContext<T> sourceContext) throws Exception {
    // the partitions and their start offsets must have been set in open()
    if (subscribedPartitionsToStartOffsets == null) {
        throw new Exception("The partitions were not set for the consumer");
    }

    // initialize commit metrics and default offset callback method;
    // the two counters track successful and failed Kafka offset commits
    this.successfulCommits = this.getRuntimeContext().getMetricGroup().counter(COMMITS_SUCCEEDED_METRICS_COUNTER);
    this.failedCommits = this.getRuntimeContext().getMetricGroup().counter(COMMITS_FAILED_METRICS_COUNTER);

    // the index of the current subtask
    final int subtaskIndex = this.getRuntimeContext().getIndexOfThisSubtask();

    // register a commit callback: on success, increment the success counter;
    // on failure, increment the failure counter
    this.offsetCommitCallback = new KafkaCommitCallback() {
        @Override
        public void onSuccess() {
            successfulCommits.inc();
        }

        @Override
        public void onException(Throwable cause) {
            LOG.warn(String.format("Consumer subtask %d failed async Kafka commit.", subtaskIndex), cause);
            failedCommits.inc();
        }
    };

    // mark the subtask as temporarily idle if there are no initial seed partitions;
    // once this subtask discovers some partitions and starts collecting records, the subtask's
    // status will automatically be triggered back to be active.
    // this matters for watermarks: a source's watermark is the minimum over its
    // partitions, so an idle subtask must be marked as such or it would hold
    // the watermark back forever
    if (subscribedPartitionsToStartOffsets.isEmpty()) {
        sourceContext.markAsTemporarilyIdle();
    }
LOG.info("Consumer subtask {} creating fetcher with offsets {}.",
getRuntimeContext().getIndexOfThisSubtask(), subscribedPartitionsToStartOffsets);
// from this point forward:
// - 'snapshotState' will draw offsets from the fetcher,
// instead of being built from `subscribedPartitionsToStartOffsets`
// - 'notifyCheckpointComplete' will start to do work (i.e. commit offsets to
// Kafka through the fetcher, if configured to do so)
//如果是快照状态则分区offsets直接从fetcher中获取
//如果是通知检查点完成状态则通过fetcher提交offsets
//创建一个KafkaFetcher,借助KafkaConsumer API从Kafka的broker拉取数据
this.kafkaFetcher = createFetcher(
sourceContext,
subscribedPartitionsToStartOffsets,
watermarkStrategy,
(StreamingRuntimeContext) getRuntimeContext(),
offsetCommitMode,
getRuntimeContext().getMetricGroup().addGroup(KAFKA_CONSUMER_METRICS_GROUP),
useMetrics);
//如果是非运行 则直接返回
if (!running) {
return;
}
// depending on whether we were restored with the current state version (1.3),
// remaining logic branches off into 2 paths:
// 1) New state - partition discovery loop executed as separate thread, with this
// thread running the main fetcher loop
// 2) Old state - partition discovery is disabled and only the main fetcher loop is executed
//根据分区发现间隔时间,来确定是否启动分区定时发现任务
//如果没有配置分区定时发现时间间隔,则直接启动获取数据任务;否则,启动定期分区发现任务和数据获取任务
if (discoveryIntervalMillis == PARTITION_DISCOVERY_DISABLED) {
//开启循环拉取数据
kafkaFetcher.runFetchLoop();
} else {
runWithPartitionDiscovery();
}
}
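The discovery interval is read from the consumer properties; setting it is what routes run() into runWithPartitionDiscovery(). A sketch (the interval value is assumed):

props.setProperty(
        FlinkKafkaConsumerBase.KEY_PARTITION_DISCOVERY_INTERVAL_MILLIS, // "flink.partition-discovery.interval-millis"
        "30000"); // check for new partitions every 30 seconds (assumed value)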
@Override
protected AbstractFetcher<T, ?> createFetcher(
        SourceContext<T> sourceContext,
        Map<KafkaTopicPartition, Long> assignedPartitionsWithInitialOffsets,
        SerializedValue<WatermarkStrategy<T>> watermarkStrategy,
        StreamingRuntimeContext runtimeContext,
        OffsetCommitMode offsetCommitMode,
        MetricGroup consumerMetricGroup,
        boolean useMetrics) throws Exception {

    // make sure that auto commit is disabled when our offset commit mode is ON_CHECKPOINTS;
    // this overwrites whatever setting the user configured in the properties
    adjustAutoCommitConfig(properties, offsetCommitMode);

    // ... (construction and return of the KafkaFetcher omitted in this excerpt)
}
Inside the fetcher's constructor (reached from createFetcher):

// the deserializer for incoming records
this.deserializer = deserializer;

// hands over data and exceptions between the consumer thread and the task thread
this.handover = new Handover();

// the thread that runs the KafkaConsumer and hands record batches to the fetcher
this.consumerThread = new KafkaConsumerThread<>(
        LOG,
        handover,
        kafkaProperties,
        unassignedPartitionsQueue,
        getFetcherName() + " for " + taskNameWithSubtasks,
        pollTimeout,
        useMetrics,
        consumerMetricGroup,
        subtaskMetricGroup);

// the collector that emits deserialized records downstream in bundles
this.kafkaCollector = new KafkaCollector();
private void runWithPartitionDiscovery() throws Exception {
    final AtomicReference<Exception> discoveryLoopErrorRef = new AtomicReference<>();

    // create and start the periodic partition discovery task
    createAndStartDiscoveryLoop(discoveryLoopErrorRef);

    // run the fetch loop
    kafkaFetcher.runFetchLoop();

    // make sure that the partition discoverer is waked up so that
    // the discoveryLoopThread exits
    partitionDiscoverer.wakeup();

    // wait for the discovery loop thread to finish
    joinDiscoveryLoopThread();

    // rethrow any fetcher errors
    final Exception discoveryLoopError = discoveryLoopErrorRef.get();
    if (discoveryLoopError != null) {
        throw new RuntimeException(discoveryLoopError);
    }
}
private void createAndStartDiscoveryLoop(AtomicReference<Exception> discoveryLoopErrorRef) {
    discoveryLoopThread = new Thread(() -> {
        try {
            // --------------------- partition discovery loop ---------------------

            // throughout the loop, we always eagerly check if we are still running before
            // performing the next operation, so that we can escape the loop as soon as possible
            while (running) {
                if (LOG.isDebugEnabled()) {
                    LOG.debug("Consumer subtask {} is trying to discover new partitions ...", getRuntimeContext().getIndexOfThisSubtask());
                }

                final List<KafkaTopicPartition> discoveredPartitions;
                try {
                    // run partition discovery; returns the newly discovered
                    // partitions this subtask should subscribe to
                    discoveredPartitions = partitionDiscoverer.discoverPartitions();
                } catch (AbstractPartitionDiscoverer.WakeupException | AbstractPartitionDiscoverer.ClosedException e) {
                    // the partition discoverer may have been closed or woken up before or during the discovery;
                    // this would only happen if the consumer was canceled; simply escape the loop
                    break;
                }

                // no need to add the discovered partitions if we were closed during the meantime
                if (running && !discoveredPartitions.isEmpty()) {
                    // hand the newly discovered partitions to the kafkaFetcher
                    kafkaFetcher.addDiscoveredPartitions(discoveredPartitions);
                }

                // do not waste any time sleeping if we're not running anymore
                if (running && discoveryIntervalMillis != 0) {
                    try {
                        Thread.sleep(discoveryIntervalMillis);
                    } catch (InterruptedException iex) {
                        // may be interrupted if the consumer was canceled midway; simply escape the loop
                        break;
                    }
                }
            }
        } catch (Exception e) {
            discoveryLoopErrorRef.set(e);
        } finally {
            // calling cancel will also let the fetcher loop escape
            // (if not running, cancel() was already called)
            if (running) {
                cancel();
            }
        }
    }, "Kafka Partition Discovery for " + getRuntimeContext().getTaskNameWithSubtasks());

    // start the periodic partition discovery thread
    discoveryLoopThread.start();
}
@Override
public void runFetchLoop() throws Exception {
    try {
        // kick off the actual Kafka consumer: the thread periodically hands the
        // batches it polls over to this fetcher via the handover object created
        // in createFetcher()
        consumerThread.start();

        while (running) {
            // this blocks until we get the next records
            // it automatically re-throws exceptions encountered in the consumer thread
            final ConsumerRecords<byte[], byte[]> records = handover.pollNext();

            // get the records for each topic partition
            for (KafkaTopicPartitionState<T, TopicPartition> partition : subscribedPartitionStates()) {
                // the records polled for this partition
                List<ConsumerRecord<byte[], byte[]>> partitionRecords =
                        records.records(partition.getKafkaPartitionHandle());

                partitionConsumerRecordsHandler(partitionRecords, partition);
            }
        }
    }
    finally {
        // this signals the consumer thread that no more work is to be done
        consumerThread.shutdown();
    }

    // on a clean exit, wait for the runner thread
    try {
        consumerThread.join();
    }
    catch (InterruptedException e) {
        // may be the result of a wake-up interruption after an exception.
        // we ignore this here and only restore the interruption state
        Thread.currentThread().interrupt();
    }
}
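The fetch loop above spends most of its time blocked in handover.pollNext(). Flink's Handover is essentially a one-element exchange point between the consumer thread and the fetcher thread; the sketch below illustrates only that blocking exchange (the real class additionally propagates errors via reportError() and supports wakeup/close):

// Simplified single-slot handover between a producer thread (KafkaConsumerThread)
// and a consumer thread (the fetch loop); a sketch, not Flink's actual class.
final class SimpleHandover<E> {
    private final Object lock = new Object();
    private E next; // the single slot

    public void produce(E element) throws InterruptedException {
        synchronized (lock) {
            while (next != null) {   // wait until the slot is free
                lock.wait();
            }
            next = element;
            lock.notifyAll();        // wake a blocked pollNext()
        }
    }

    public E pollNext() throws InterruptedException {
        synchronized (lock) {
            while (next == null) {   // block until an element arrives
                lock.wait();
            }
            E result = next;
            next = null;
            lock.notifyAll();        // wake a blocked produce()
            return result;
        }
    }
}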
@Override
public void run() {
    // early exit check
    if (!running) {
        return;
    }

    // this is the means to talk to FlinkKafkaConsumer's main thread
    final Handover handover = this.handover;

    // This method initializes the KafkaConsumer and guarantees it is torn down properly.
    // This is important, because the consumer has multi-threading issues,
    // including concurrent 'close()' calls.
    try {
        this.consumer = getConsumer(kafkaProperties);
    }
    catch (Throwable t) {
        handover.reportError(t);
        return;
    }

    // from here on, the consumer is guaranteed to be closed properly
    try {
        // register Kafka's very own metrics in Flink's metric reporters
        if (useMetrics) {
            // register Kafka metrics to Flink
            Map<MetricName, ? extends Metric> metrics = consumer.metrics();
            if (metrics == null) {
                // MapR's Kafka implementation returns null here.
                log.info("Consumer implementation does not support metrics");
            } else {
                // we have Kafka metrics, register them
                for (Map.Entry<MetricName, ? extends Metric> metric : metrics.entrySet()) {
                    consumerMetricGroup.gauge(metric.getKey().name(), new KafkaMetricWrapper(metric.getValue()));

                    // TODO this metric is kept for compatibility purposes; should remove in the future
                    subtaskMetricGroup.gauge(metric.getKey().name(), new KafkaMetricWrapper(metric.getValue()));
                }
            }
        }

        // early exit check
        if (!running) {
            return;
        }

        // the latest bulk of records. May carry across the loop if the thread is woken up
        // from blocking on the handover
        ConsumerRecords<byte[], byte[]> records = null;

        // reused variable to hold found unassigned new partitions.
        // found partitions are not carried across loops using this variable;
        // they are carried across via re-adding them to the unassigned partitions queue
        List<KafkaTopicPartitionState<T, TopicPartition>> newPartitions;

        // main fetch loop
        while (running) {

            // check if there is something to commit
            if (!commitInProgress) {
                // get and reset the work-to-be committed, so we don't repeatedly commit the same
                final Tuple2<Map<TopicPartition, OffsetAndMetadata>, KafkaCommitCallback> commitOffsetsAndCallback =
                        nextOffsetsToCommit.getAndSet(null);
                // ... (the remainder of the fetch loop — committing offsets, picking up
                // newly assigned partitions, polling records and handing them over to
                // the handover — is omitted in this excerpt)
protected void partitionConsumerRecordsHandler(
        List<ConsumerRecord<byte[], byte[]>> partitionRecords,
        KafkaTopicPartitionState<T, TopicPartition> partition) throws Exception {

    for (ConsumerRecord<byte[], byte[]> record : partitionRecords) {
        // deserialize the record and buffer it in kafkaCollector, ready to be emitted
        deserializer.deserialize(record, kafkaCollector);

        // emit the actual records. this also updates offset state atomically and emits
        // watermarks: the records go downstream, the offset is updated, and
        // timestamps and watermarks are generated
        emitRecordsWithTimestamps(
                kafkaCollector.getRecords(),
                partition,
                record.offset(),
                record.timestamp());

        // if the deserializer signalled the end of the stream, stop the fetch loop
        if (kafkaCollector.isEndOfStreamSignalled()) {
            // end of stream signaled
            running = false;
            break;
        }
    }
}
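The deserializer here is the user-provided KafkaDeserializationSchema; its isEndOfStream() result is what makes kafkaCollector.isEndOfStreamSignalled() return true above. A minimal sketch of such a schema (UTF-8 values, with an assumed "END" sentinel):

import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class StringWithEndMarkerSchema implements KafkaDeserializationSchema<String> {

    @Override
    public String deserialize(ConsumerRecord<byte[], byte[]> record) {
        return new String(record.value(), StandardCharsets.UTF_8);
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        return "END".equals(nextElement); // assumed end-of-stream sentinel
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return Types.STRING;
    }
}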
protected void emitRecordsWithTimestamps(
        Queue<T> records,
        KafkaTopicPartitionState<T, KPH> partitionState,
        long offset,
        long kafkaEventTimestamp) {
    // emit the records, using the checkpoint lock to guarantee
    // atomicity of record emission and offset state update
    synchronized (checkpointLock) {
        T record;
        while ((record = records.poll()) != null) {
            long timestamp = partitionState.extractTimestamp(record, kafkaEventTimestamp);
            // emit the record downstream
            sourceContext.collectWithTimestamp(record, timestamp);

            // this might emit a watermark, so do it after emitting the record
            partitionState.onEvent(record, timestamp);
        }
        // update the offset in the partition state
        partitionState.setOffset(offset);
    }
}
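partitionState.extractTimestamp() and onEvent() evaluate the per-partition watermark strategy the user attached to the consumer. A sketch of wiring one up (the 5-second out-of-orderness bound is assumed):

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

consumer.assignTimestampsAndWatermarks(
        WatermarkStrategy
                .<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((event, kafkaTimestamp) -> kafkaTimestamp));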
// accesses the offsets state in the operator state backend
private transient ListState<Tuple2<KafkaTopicPartition, Long>> unionOffsetStates;

// the offsets restored from a checkpoint, if the consumer recovered its state
private transient volatile TreeMap<KafkaTopicPartition, Long> restoredState;
@Override
public final void initializeState(FunctionInitializationContext context) throws Exception {

    OperatorStateStore stateStore = context.getOperatorStateStore();

    this.unionOffsetStates = stateStore.getUnionListState(new ListStateDescriptor<>(OFFSETS_STATE_NAME,
            createStateSerializer(getRuntimeContext().getExecutionConfig())));

    // check whether we are recovering from the last snapshot
    if (context.isRestored()) {
        // when restoring from a checkpoint, initialize restoredState,
        // which open() later reads
        restoredState = new TreeMap<>(new KafkaTopicPartition.Comparator());

        // populate actual holder for restored state: copy the snapshot data
        // into restoredState, matching what open() expects
        for (Tuple2<KafkaTopicPartition, Long> kafkaOffset : unionOffsetStates.get()) {
            restoredState.put(kafkaOffset.f0, kafkaOffset.f1);
        }

        LOG.info("Consumer subtask {} restored state: {}.", getRuntimeContext().getIndexOfThisSubtask(), restoredState);
    } else {
        LOG.info("Consumer subtask {} has no restore state.", getRuntimeContext().getIndexOfThisSubtask());
    }
}
/** Data for pending but uncommitted offsets: offsets whose checkpoint has not completed yet. */
private final LinkedMap pendingOffsetsToCommit = new LinkedMap();
@Override
public final void snapshotState(FunctionSnapshotContext context) throws Exception {
    if (!running) {
        LOG.debug("snapshotState() called on closed source");
    } else {
        // clear the state so it can be repopulated below
        unionOffsetStates.clear();

        final AbstractFetcher<?, ?> fetcher = this.kafkaFetcher;
        if (fetcher == null) {
            // the fetcher has not yet been initialized (it is created in run()),
            // which means we need to return the originally restored offsets
            // or the assigned partitions
            for (Map.Entry<KafkaTopicPartition, Long> subscribedPartition : subscribedPartitionsToStartOffsets.entrySet()) {
                // store the offsets that were determined in open()
                unionOffsetStates.add(Tuple2.of(subscribedPartition.getKey(), subscribedPartition.getValue()));
            }

            if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS) {
                // the map cannot be asynchronously updated, because only one checkpoint call can happen
                // on this function at a time: either snapshotState() or notifyCheckpointComplete()
                // remember the offsets of the in-flight checkpoint
                // (restoredState is a plain map here, not a state object)
                pendingOffsetsToCommit.put(context.getCheckpointId(), restoredState);
            }
        } else {
            // take the current per-partition offsets from the fetcher
            HashMap<KafkaTopicPartition, Long> currentOffsets = fetcher.snapshotCurrentState();

            if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS) {
                // the map cannot be asynchronously updated, because only one checkpoint call can happen
                // on this function at a time: either snapshotState() or notifyCheckpointComplete()
                // remember the offsets of the in-flight checkpoint
                pendingOffsetsToCommit.put(context.getCheckpointId(), currentOffsets);
            }

            for (Map.Entry<KafkaTopicPartition, Long> kafkaTopicPartitionLongEntry : currentOffsets.entrySet()) {
                // store the latest (topic, partition) -> offset mapping in the state
                unionOffsetStates.add(
                        Tuple2.of(kafkaTopicPartitionLongEntry.getKey(), kafkaTopicPartitionLongEntry.getValue()));
            }
        }

        if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS) {
            // truncate the map of pending offsets to commit, to prevent infinite growth
            while (pendingOffsetsToCommit.size() > MAX_NUM_PENDING_CHECKPOINTS) {
                pendingOffsetsToCommit.remove(0);
            }
        }
    }
}
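For completeness, here is a sketch of the job-level configuration (intervals assumed) under which snapshotState() actually runs and offsets are committed ON_CHECKPOINTS:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000); // snapshotState() is now invoked every 60s (assumed interval)

consumer.setCommitOffsetsOnCheckpoints(true); // the default; combined with checkpointing,
                                              // this yields OffsetCommitMode.ON_CHECKPOINTS
env.addSource(consumer).print();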
This concludes the walkthrough of the important methods.