Flink Job Submission Flow (After the Dispatcher)

Table of Contents

  • Flink Job Submission Flow (After the Dispatcher)
    • 1 Dispatcher
    • 2 ExecutionGraph
      • 2.1 ExecutionJobVertex
      • 2.2 ExecutionVertex
      • 2.3 Execution
      • 2.4 IntermediateResult
      • 2.5 ExecutionEdge
    • 3 Task Scheduling
      • 3.1 DataSourceTask
      • 3.2 StreamTask
        • 3.2.1 StreamOneInputProcessor
        • 3.2.2 StreamTwoInputProcessor
        • 3.2.3 StreamTwoInputSelectableProcessor
      • 3.3 DataSinkTask
    • 4 Summary

Flink Job Submission Flow (After the Dispatcher)

This article explains how a job is submitted and executed once the Dispatcher has started. It first looks at the role of the Dispatcher class, then focuses on how the ExecutionGraph is generated, and finally walks through the whole submission flow after the Dispatcher.

1 Dispatcher

The Dispatcher service exposes a REST interface that accepts job submissions from clients. It is responsible for starting the JobManager and submitting jobs, and it also serves the Web UI. The figure below illustrates the Dispatcher's role:
[Figure 1: The role of the Dispatcher]
The Dispatcher is created after the AppMaster comes up. The AppMaster's main class is YarnJobClusterEntrypoint (per-job mode) or YarnSessionClusterEntrypoint (session mode); the Dispatcher is ultimately created and started by the create method of AbstractDispatcherResourceManagerComponentFactory:

// AbstractDispatcherResourceManagerComponentFactory
public DispatcherResourceManagerComponent<T> create(
		Configuration configuration,
		...) throws Exception {
		
		// create and start the webMonitorEndpoint
		// create and start the resourceManager
		
		// create and start the dispatcher
		// per-job mode creates a MiniDispatcher, session mode a StandaloneDispatcher
		dispatcher = dispatcherFactory.createDispatcher(
			configuration,
			rpcService,
			highAvailabilityServices,
			resourceManagerGatewayRetriever,
			blobServer,
			heartbeatServices,
			jobManagerMetricGroup,
			metricRegistry.getMetricQueryServiceGatewayRpcAddress(),
			archivedExecutionGraphStore,
			fatalErrorHandler,
			historyServerArchivist);
		// this effectively just starts the RPC endpoint
		dispatcher.start();
	}

2 ExecutionGraph

As we know, submitting a Flink application involves three graph conversions:
[Figure 2: The three graph conversions]

1. The API calls first produce transformations, from which the StreamGraph is generated.
[Figure 3: StreamGraph]

2. Because some nodes can be packed together and scheduled as one unit by the JobManager, certain StreamNodes of the StreamGraph are chained together to form the JobGraph. Both of these conversions happen on the client.
[Figure 4: JobGraph]

3. Finally, the JobGraph is converted into the ExecutionGraph, which, unlike the JobGraph, carries the notion of parallelism. This step happens inside the JobManager.
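
To make the first two conversions concrete, here is a minimal client-side sketch that builds them explicitly. The demo job itself is made up; the API names are per Flink 1.9 and may differ slightly in other versions.

// GraphDemo: a hypothetical client-side example
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.graph.StreamGraph;

public class GraphDemo {
	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// each API call below records a Transformation on the environment
		env.fromElements("a", "b", "c")
			.map(String::toUpperCase)
			.print();

		// conversion 1: transformations -> StreamGraph (on the client)
		StreamGraph streamGraph = env.getStreamGraph();
		// conversion 2: StreamGraph -> JobGraph, chaining StreamNodes (still on the client)
		JobGraph jobGraph = streamGraph.getJobGraph();
		System.out.println("JobVertices after chaining: " + jobGraph.getNumberOfVertices());
	}
}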

The rest of this section focuses on the ExecutionGraph and how it is generated.

An ExecutionGraph is composed of ExecutionJobVertex, ExecutionVertex, and Execution instances.

2.1 ExecutionJobVertex

An ExecutionJobVertex corresponds one-to-one to a JobVertex in the JobGraph.

2.2 ExecutionVertex

One ExecutionJobVertex maps to n ExecutionVertex instances, where n is the operator's parallelism. An ExecutionVertex is one parallel subtask of the operator.

2.3 Execution

An Execution is one attempt at executing an ExecutionVertex, uniquely identified by an ExecutionAttemptId.

2.4 IntermediateResult

In the JobGraph, an IntermediateDataSet represents the output of a JobVertex; a JobVertex may have n (n >= 0) outputs. The corresponding concept in the ExecutionGraph is the IntermediateResult. Each IntermediateResult has numParallelProducers (the parallelism) producers, and each producer's output on that IntermediateResult is an IntermediateResultPartition. An IntermediateResultPartition therefore represents one output partition of an ExecutionVertex.

2.5 ExecutionEdge

An ExecutionEdge represents an input of an ExecutionVertex. It connects an ExecutionVertex to an IntermediateResultPartition, thereby linking different ExecutionVertex instances together.

The figure below shows how these core concepts relate:
[Figure 5: Relationships between the core ExecutionGraph concepts]

Next, let's see how the ExecutionGraph is constructed:

1. Build the JobInformation.

2. Build the ExecutionGraph.

3. Topologically sort the JobGraph to obtain the sortedTopology vertex list.

4. Build the ExecutionJobVertex instances, connecting the IntermediateResultPartitions to the ExecutionVertices.

5. Apply the checkpointing and metrics settings.

6. Return the ExecutionGraph.

// ExecutionGraphBuilder
	public static ExecutionGraph buildGraph(
		@Nullable ExecutionGraph prior,
		JobGraph jobGraph,
		...) throws JobExecutionException, JobException {
		// build the JobInformation
		
		// build the ExecutionGraph
		
		// topologically sort the JobGraph to obtain the sortedTopology vertex list
		List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();
		
		executionGraph.attachJobGraph(sortedTopology);
		
		// checkpointing settings
		
		// metrics settings

		return executionGraph;
	}
//ExecutionGraph
	public void attachJobGraph(List<JobVertex> topologiallySorted) throws JobException {
		for (JobVertex jobVertex : topologiallySorted) {
			// build the ExecutionJobVertex
			ExecutionJobVertex ejv = new ExecutionJobVertex(
					this,
					jobVertex,
					1,
					maxPriorAttemptsHistoryLength,
					rpcTimeout,
					globalModVersion,
					createTimestamp);
			// connect the IntermediateResultPartitions to the ExecutionVertices
			ejv.connectToPredecessors(this.intermediateResults);
		}
	}
// ExecutionJobVertex
	public void connectToPredecessors(Map<IntermediateDataSetID, IntermediateResult> intermediateDataSets) throws JobException {
		List<JobEdge> inputs = jobVertex.getInputs();
		
		for (int num = 0; num < inputs.size(); num++) {
			JobEdge edge = inputs.get(num);
			IntermediateResult ires = intermediateDataSets.get(edge.getSourceId());
			this.inputs.add(ires);
			int consumerIndex = ires.registerConsumer();
			
			for (int i = 0; i < parallelism; i++) {
				ExecutionVertex ev = taskVertices[i];
				ev.connectSource(num, ires, edge, consumerIndex);
			}
		}
	}
// ExecutionVertex
	public void connectSource(int inputNumber, IntermediateResult source, JobEdge edge, int consumerNumber) {

		final DistributionPattern pattern = edge.getDistributionPattern();
		final IntermediateResultPartition[] sourcePartitions = source.getPartitions();

		ExecutionEdge[] edges;

		switch (pattern) {
			// how the downstream JobVertex consumes its input partitions; forward and rescale map to POINTWISE
			case POINTWISE:
				edges = connectPointwise(sourcePartitions, inputNumber);
				break;
			// every parallel ExecutionVertex connects to all IntermediateResultPartitions produced by the source
			case ALL_TO_ALL:
				edges = connectAllToAll(sourcePartitions, inputNumber);
				break;

			default:
				throw new RuntimeException("Unrecognized distribution pattern.");

		}

		inputEdges[inputNumber] = edges;
		for (ExecutionEdge ee : edges) {
			ee.getSource().addConsumer(ee, consumerNumber);
		}
	}


	private ExecutionEdge[] connectPointwise(IntermediateResultPartition[] sourcePartitions, int inputNumber) {
		final int numSources = sourcePartitions.length;
		final int parallelism = getTotalNumberOfParallelSubtasks();

		// case A: one-to-one connection
		if (numSources == parallelism) {
			return new ExecutionEdge[] { new ExecutionEdge(sourcePartitions[subTaskIndex], this, inputNumber) };
		}
		// case B: each source partition feeds several consumers
		else if (numSources < parallelism) {

			int sourcePartition;

			if (parallelism % numSources == 0) {
				int factor = parallelism / numSources;
				sourcePartition = subTaskIndex / factor;
			}
			else {
				float factor = ((float) parallelism) / numSources;
				sourcePartition = (int) (subTaskIndex / factor);
			}

			return new ExecutionEdge[] { new ExecutionEdge(sourcePartitions[sourcePartition], this, inputNumber) };
		}
		// case C: several source partitions feed one consumer
		else {
			if (numSources % parallelism == 0) {
				int factor = numSources / parallelism;
				int startIndex = subTaskIndex * factor;

				ExecutionEdge[] edges = new ExecutionEdge[factor];
				for (int i = 0; i < factor; i++) {
					edges[i] = new ExecutionEdge(sourcePartitions[startIndex + i], this, inputNumber);
				}
				return edges;
			}
			else {
				float factor = ((float) numSources) / parallelism;

				int start = (int) (subTaskIndex * factor);
				int end = (subTaskIndex == getTotalNumberOfParallelSubtasks() - 1) ?
						sourcePartitions.length :
						(int) ((subTaskIndex + 1) * factor);

				ExecutionEdge[] edges = new ExecutionEdge[end - start];
				for (int i = 0; i < edges.length; i++) {
					edges[i] = new ExecutionEdge(sourcePartitions[start + i], this, inputNumber);
				}

				return edges;
			}
		}
	}


	private ExecutionEdge[] connectAllToAll(IntermediateResultPartition[] sourcePartitions, int inputNumber) {
		ExecutionEdge[] edges = new ExecutionEdge[sourcePartitions.length];

		for (int i = 0; i < sourcePartitions.length; i++) {
			IntermediateResultPartition irp = sourcePartitions[i];
			edges[i] = new ExecutionEdge(irp, this, inputNumber);
		}

		return edges;
	}

ALL_TO_ALL mode, which is equivalent to a shuffle:
[Figure 6: ALL_TO_ALL connection pattern]

POINTWISE mode:
(1) Source parallelism equals target parallelism:
[Figure 7: POINTWISE, equal parallelism]

(2) Source parallelism is smaller than target parallelism:
[Figure 8: POINTWISE, source parallelism smaller]

(3) Source parallelism is larger than target parallelism:
[Figure 9: POINTWISE, source parallelism larger]
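
The index arithmetic of connectPointwise is easiest to see with concrete numbers. Below is a small standalone sketch (a hypothetical helper, not Flink code) that reproduces the same mapping; it collapses Flink's divisible fast paths into the general float formulas, which yield the same indices.

// PointwiseDemo: hypothetical illustration of the POINTWISE index math
import java.util.Arrays;

public class PointwiseDemo {

	// returns the indices of the source partitions consumed by the given subtask
	static int[] sourcesFor(int subTaskIndex, int numSources, int parallelism) {
		if (numSources == parallelism) {
			// case A: one-to-one
			return new int[] { subTaskIndex };
		} else if (numSources < parallelism) {
			// case B: each source partition feeds several subtasks
			float factor = ((float) parallelism) / numSources;
			return new int[] { (int) (subTaskIndex / factor) };
		} else {
			// case C: each subtask consumes several source partitions
			float factor = ((float) numSources) / parallelism;
			int start = (int) (subTaskIndex * factor);
			int end = (subTaskIndex == parallelism - 1) ? numSources : (int) ((subTaskIndex + 1) * factor);
			int[] partitions = new int[end - start];
			for (int i = 0; i < partitions.length; i++) {
				partitions[i] = start + i;
			}
			return partitions;
		}
	}

	public static void main(String[] args) {
		// 2 source partitions, 4 consumer subtasks: subtasks 0,1 read partition 0; subtasks 2,3 read partition 1
		for (int i = 0; i < 4; i++) {
			System.out.println("subtask " + i + " <- partitions " + Arrays.toString(sourcesFor(i, 2, 4)));
		}
		// 4 source partitions, 2 consumer subtasks: subtask 0 reads 0,1; subtask 1 reads 2,3
		for (int i = 0; i < 2; i++) {
			System.out.println("subtask " + i + " <- partitions " + Arrays.toString(sourcesFor(i, 4, 2)));
		}
	}
}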

3 Task Scheduling

From the client to the Dispatcher, and from the Dispatcher on to the JobMaster, the JobGraph is essentially just passed through while a few services are started along the way. The real task scheduling starts from the JobMaster's startScheduling method, so that is where the analysis begins:

// JobMaster
	private void startScheduling() {
		checkState(jobStatusListener == null);
		jobStatusListener = new JobManagerJobStatusListener();
		schedulerNG.registerJobStatusListener(jobStatusListener);

		schedulerNG.startScheduling();
	}
// LegacyScheduler
	public void startScheduling() {
		executionGraph.scheduleForExecution();
	}

The executionGraph itself is created in the constructor of LegacyScheduler; the final build method was already covered in the ExecutionGraph section above.

// LegacyScheduler
	public LegacyScheduler(
			final Logger log,
			final JobGraph jobGraph,
			...) throws Exception {
		// ...
		this.executionGraph = createAndRestoreExecutionGraph(jobManagerJobMetricGroup, checkNotNull(shuffleMaster), checkNotNull(partitionTracker));
	}

	private ExecutionGraph createAndRestoreExecutionGraph(
			JobManagerJobMetricGroup currentJobManagerJobMetricGroup,
			ShuffleMaster<?> shuffleMaster,
			PartitionTracker partitionTracker) throws Exception {
		ExecutionGraph newExecutionGraph = createExecutionGraph(currentJobManagerJobMetricGroup, shuffleMaster, partitionTracker);
		// ...
		return newExecutionGraph;
	}

	private ExecutionGraph createExecutionGraph(
			JobManagerJobMetricGroup currentJobManagerJobMetricGroup,
			ShuffleMaster<?> shuffleMaster,
			final PartitionTracker partitionTracker) throws JobExecutionException, JobException {
		return ExecutionGraphBuilder.buildGraph(
			null,
			jobGraph,
			...);
	}

Now let's look at the scheduleForExecution method of ExecutionGraph:

// ExecutionGraph
	public void scheduleForExecution() throws JobException {
		final long currentGlobalModVersion = globalModVersion;
		if (transitionState(JobStatus.CREATED, JobStatus.RUNNING)) {
			final CompletableFuture<Void> newSchedulingFuture = SchedulingUtils.schedule(
				scheduleMode,
				getAllExecutionVertices(),
				this);
		}
		else {
			throw new IllegalStateException("Job may only be scheduled from state " + JobStatus.CREATED);
		}
	}

It then calls the schedule method of SchedulingUtils, which schedules the job as a batch or streaming job depending on the scheduleMode:

scheduleMode | Description
LAZY_FROM_SOURCES / LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST | Downstream tasks are scheduled only after their upstream tasks are ready; used for batch jobs
EAGER | All tasks are scheduled at once; used for streaming jobs
// SchedulingUtils
	public static CompletableFuture<Void> schedule(
			ScheduleMode scheduleMode,
			final Iterable<ExecutionVertex> vertices,
			final ExecutionGraph executionGraph) {

		switch (scheduleMode) {
			// schedule downstream tasks only after their upstream tasks are ready; used for batch jobs
			case LAZY_FROM_SOURCES:
			case LAZY_FROM_SOURCES_WITH_BATCH_SLOT_REQUEST:
				return scheduleLazy(vertices, executionGraph);
			// schedule all tasks at once; used for streaming jobs
			case EAGER:
				return scheduleEager(vertices, executionGraph);

			default:
				// IllegalStateException
		}
	}

Here we focus on scheduleMode = EAGER, i.e. the streaming case:

// SchedulingUtils
	public static CompletableFuture<Void> scheduleEager(
			final Iterable<ExecutionVertex> vertices,
			final ExecutionGraph executionGraph) {
		// iterate over the vertices and request slots

		// deploy the executions
		return allAllocationsFuture.thenAccept(
			(Collection<Execution> executionsToDeploy) -> {
				for (Execution execution : executionsToDeploy) {
					try {
						execution.deploy();
					} catch (Throwable t) {
						// CompletionException
					}
				}
			})
			// exception handling
	}
// Execution
	public void deploy() throws JobException {
		final LogicalSlot slot  = assignedResource;

		ExecutionState previous = this.state;
		// the state must be SCHEDULED or CREATED
		if (previous == SCHEDULED || previous == CREATED) {
			// transition the state to DEPLOYING
			if (!transitionState(previous, DEPLOYING)) {
				// IllegalStateException
			}
		}
		else {
			// IllegalStateException
		}

		try {
			// build the TaskDeploymentDescriptor
			final TaskDeploymentDescriptor deployment = TaskDeploymentDescriptorFactory
				.fromExecutionVertex(vertex, attemptNumber)
				.createDeploymentDescriptor(
					slot.getAllocationId(),
					slot.getPhysicalSlotNumber(),
					taskRestore,
					producedPartitions.values());
			taskRestore = null;
			final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();

			final ComponentMainThreadExecutor jobMasterMainThreadExecutor =
				vertex.getExecutionGraph().getJobMasterMainThreadExecutor();
			CompletableFuture.supplyAsync(() -> taskManagerGateway.submitTask(deployment, rpcTimeout), executor)
				// maybe markFailed
		}
		catch (Throwable t) {
			// markFailed
		}
	}

The Task is submitted through an RPC call to the TaskExecutor's submitTask method:

// RpcTaskManagerGateway
	public CompletableFuture<Acknowledge> submitTask(TaskDeploymentDescriptor tdd, Time timeout) {
		return taskExecutorGateway.submitTask(tdd, jobMasterId, timeout);
	}

(1) Load the jobInformation and taskInformation files, and initialize jobInformation and taskInformation.

(2) Construct the Task.

(3) Start the Task thread.

// TaskExecutor
	public CompletableFuture<Acknowledge> submitTask(
			TaskDeploymentDescriptor tdd,
			JobMasterId jobMasterId,
			Time timeout) {

		try {
			final JobID jobId = tdd.getJobId();
			final JobManagerConnection jobManagerConnection = jobManagerTable.get(jobId);

			try {
				// load the jobInformation and taskInformation files
				tdd.loadBigData(blobCacheService.getPermanentBlobService());
			} catch (IOException | ClassNotFoundException e) {
				// TaskSubmissionException
			}

			final JobInformation jobInformation;
			final TaskInformation taskInformation;
			try {
				jobInformation = tdd.getSerializedJobInformation().deserializeValue(getClass().getClassLoader());
				taskInformation = tdd.getSerializedTaskInformation().deserializeValue(getClass().getClassLoader());
			} catch (IOException | ClassNotFoundException e) {
				//TaskSubmissionException
			}

			// register the TaskMetricGroup

			// build the RpcInputSplitProvider

			TaskManagerActions taskManagerActions = jobManagerConnection.getTaskManagerActions();
			CheckpointResponder checkpointResponder = jobManagerConnection.getCheckpointResponder();
			GlobalAggregateManager aggregateManager = jobManagerConnection.getGlobalAggregateManager();

			LibraryCacheManager libraryCache = jobManagerConnection.getLibraryCacheManager();
			ResultPartitionConsumableNotifier resultPartitionConsumableNotifier = jobManagerConnection.getResultPartitionConsumableNotifier();
			PartitionProducerStateChecker partitionStateChecker = jobManagerConnection.getPartitionStateChecker();

			// build the TaskStateManager
			
			// construct the Task
			Task task = new Task(
				jobInformation,
				taskInformation,
				tdd.getExecutionAttemptId(),
				tdd.getAllocationId(),
				tdd.getSubtaskIndex(),
				tdd.getAttemptNumber(),
				tdd.getProducedPartitions(),
				tdd.getInputGates(),
				tdd.getTargetSlotNumber(),
				taskExecutorServices.getMemoryManager(),
				taskExecutorServices.getIOManager(),
				taskExecutorServices.getShuffleEnvironment(),
				taskExecutorServices.getKvStateService(),
				taskExecutorServices.getBroadcastVariableManager(),
				taskExecutorServices.getTaskEventDispatcher(),
				taskStateManager,
				taskManagerActions,
				inputSplitProvider,
				checkpointResponder,
				aggregateManager,
				blobCacheService,
				libraryCache,
				fileCache,
				taskManagerConfiguration,
				taskMetricGroup,
				resultPartitionConsumableNotifier,
				partitionStateChecker,
				getRpcService().getExecutor());

			boolean taskAdded;

			try {
				// add the task to the taskSlotTable
				taskAdded = taskSlotTable.addTask(task);
			} catch (SlotNotFoundException | SlotNotActiveException e) {
				// TaskSubmissionException
			}

			if (taskAdded) {
				// start the Task thread
				task.startTaskThread();

				setupResultPartitionBookkeeping(
					tdd.getJobId(),
					tdd.getProducedPartitions(),
					task.getTerminationFuture());
				return CompletableFuture.completedFuture(Acknowledge.get());
			} else {
				// TaskSubmissionException
			}
		} catch (TaskSubmissionException e) {
			return FutureUtils.completedExceptionally(e);
		}
	}

A Task is a thread running inside the TaskExecutor process. Let's look at its run method:

(1) Check the current state: normally it is CREATED; if it is FAILED or CANCELING, return immediately; any other state throws an exception.

(2) Load the DistributedCache files.

(3) Start the ResultPartitionWriters and InputGates.

(4) Register the partition writers with the taskEventDispatcher.

(5) Load and instantiate the class named by nameOfInvokableClass.

(6) Transition the state to RUNNING and call the invoke method.

For reference, here is the diagram of all possible state transitions of a Flink task:
[Figure 10: Task state transitions]
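
These transitions are performed with an atomic compare-and-swap so that a concurrent cancel cannot be lost. Below is a minimal sketch of the idea; the field layout is illustrative, not the actual Task internals.

// StateTransitionDemo: illustrative sketch of a CAS-based state transition
import java.util.concurrent.atomic.AtomicReference;

public class StateTransitionDemo {

	enum ExecutionState { CREATED, DEPLOYING, RUNNING, FINISHED, CANCELING, CANCELED, FAILED }

	private final AtomicReference<ExecutionState> executionState =
		new AtomicReference<>(ExecutionState.CREATED);

	// atomically moves from 'expected' to 'next'; returns false if another
	// thread (e.g. a concurrent cancel) changed the state first
	boolean transitionState(ExecutionState expected, ExecutionState next) {
		return executionState.compareAndSet(expected, next);
	}

	public static void main(String[] args) {
		StateTransitionDemo task = new StateTransitionDemo();
		System.out.println(task.transitionState(ExecutionState.CREATED, ExecutionState.DEPLOYING)); // true
		System.out.println(task.transitionState(ExecutionState.CREATED, ExecutionState.DEPLOYING)); // false, already DEPLOYING
	}
}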

// Task
	public void run() {
		doRun();
	}

	private void doRun() {
		// loop on the current state; normally it is CREATED
		while (true) {
			ExecutionState current = this.executionState;
			if (current == ExecutionState.CREATED) {
			// if the state is CREATED, transition to DEPLOYING and break out of the loop
				if (transitionState(ExecutionState.CREATED, ExecutionState.DEPLOYING)) {
					break;
				}
			}
			// if the state is FAILED, call notifyFinalState and return
			else if (current == ExecutionState.FAILED) {
				notifyFinalState();
				return;
			}
			// if the state is CANCELING, transition to CANCELED, call notifyFinalState, and return
			else if (current == ExecutionState.CANCELING) {
				if (transitionState(ExecutionState.CANCELING, ExecutionState.CANCELED)) {
					notifyFinalState();
					return;
				}
			}
			else {
				// IllegalStateException
			}
		}

		Map<String, Future<Path>> distributedCacheEntries = new HashMap<>();
		AbstractInvokable invokable = null;

		try {
			FileSystemSafetyNet.initializeSafetyNetForThread();

			blobService.getPermanentBlobService().registerJob(jobId);

			// obtain the executionConfig
			userCodeClassLoader = createUserCodeClassloader();
			final ExecutionConfig executionConfig = serializedExecutionConfig.deserializeValue(userCodeClassLoader);

			// override taskCancellationInterval with the value from the ExecutionConfig

			// override taskCancellationTimeout with the value from the ExecutionConfig

			// set up the ResultPartitionWriters and InputGates
			setupPartitionsAndGates(consumableNotifyingPartitionWriters, inputGates);

			// register the partition writers with the taskEventDispatcher
			for (ResultPartitionWriter partitionWriter : consumableNotifyingPartitionWriters) {
				taskEventDispatcher.registerPartition(partitionWriter.getPartitionId());
			}

			try {
				for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry :
						DistributedCache.readFileInfoFromConfig(jobConfiguration)) {
					Future<Path> cp = fileCache.createTmpFile(entry.getKey(), entry.getValue(), jobId, executionId);
					distributedCacheEntries.put(entry.getKey(), cp);
				}
			}
			catch (Exception e) {
				// Exception
			}

			TaskKvStateRegistry kvStateRegistry = kvStateService.createKvStateTaskRegistry(jobId, getJobVertexId());

			Environment env = new RuntimeEnvironment(
				jobId,
				vertexId,
				executionId,
				executionConfig,
				taskInfo,
				jobConfiguration,
				taskConfiguration,
				userCodeClassLoader,
				memoryManager,
				ioManager,
				broadcastVariableManager,
				taskStateManager,
				aggregateManager,
				accumulatorRegistry,
				kvStateRegistry,
				inputSplitProvider,
				distributedCacheEntries,
				consumableNotifyingPartitionWriters,
				inputGates,
				taskEventDispatcher,
				checkpointResponder,
				taskManagerConfig,
				metrics,
				this);

			executingThread.setContextClassLoader(userCodeClassLoader);
			// load and instantiate the class named by nameOfInvokableClass
			invokable = loadAndInstantiateInvokable(userCodeClassLoader, nameOfInvokableClass, env);

			this.invokable = invokable;
			// transition the state from DEPLOYING to RUNNING
			if (!transitionState(ExecutionState.DEPLOYING, ExecutionState.RUNNING)) {
				// CancelTaskException
			}
			// notify that the task execution state is now RUNNING
			taskManagerActions.updateTaskExecutionState(new TaskExecutionState(jobId, executionId, ExecutionState.RUNNING));
			// set the context classloader
			executingThread.setContextClassLoader(userCodeClassLoader);
			// actually execute the Task
			invokable.invoke();

			// finish the partition writers
			for (ResultPartitionWriter partitionWriter : consumableNotifyingPartitionWriters) {
				if (partitionWriter != null) {
					partitionWriter.finish();
				}
			}

			// mark the state as FINISHED
			if (!transitionState(ExecutionState.RUNNING, ExecutionState.FINISHED)) {
				throw new CancelTaskException();
			}
		}
		catch (Throwable t) {
			// exception handling and state transitions
		}
		finally {
			try {
				// set invokable to null
				this.invokable = null;
				// release resources
			}
			catch (Throwable t) {
				// notifyFatalError
			}
		}
	}

invokable.invoke() dispatches to different kinds of tasks depending on nameOfInvokableClass, including batch tasks, source tasks, sink tasks, and stream tasks. Below we look at the three streaming-related task types; batch tasks are not discussed here.
[Figure 11: Task implementations dispatched by invoke()]
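
All of these task classes extend AbstractInvokable, whose invoke() is the method called above. As a sketch, a minimal implementation looks like the following; the constructor signature is per the Flink 1.9 API, while the class itself is a made-up example.

// DemoInvokable: a made-up minimal task implementation
import org.apache.flink.runtime.execution.Environment;
import org.apache.flink.runtime.jobgraph.tasks.AbstractInvokable;

public class DemoInvokable extends AbstractInvokable {

	public DemoInvokable(Environment environment) {
		super(environment);
	}

	@Override
	public void invoke() throws Exception {
		// the task body: read from the input gates and write to the
		// result partitions exposed via getEnvironment()
	}
}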

3.1 DataSourceTask

DataSourceTask is the task corresponding to a data source, e.g. a Kafka source or a file source.

  1. Initialize the format and the output.
  2. Obtain the serializer.
  3. Obtain the input splits.
  4. Loop over the splits, reading records and emitting them downstream.
// DataSourceTask
	private List<RecordWriter<?>> eventualOutputs;
	// collects records and emits them downstream
	private Collector<OT> output;
	// the input format instance
	private InputFormat<OT, InputSplit> format;
	// the type serializer factory
	private TypeSerializerFactory<OT> serializerFactory;
	// the task configuration
	private TaskConfig config;
	// the chained tasks driven by this task
	private ArrayList<ChainedDriver<?, ?>> chainedTasks;
	// cancellation flag
	private volatile boolean taskCanceled = false;

	public void invoke() throws Exception {
		// initialize the format
		initInputFormat();
		try {
			// initialize the outputs
			initOutputs(getUserCodeClassLoader());
		} catch (Exception ex) {
			// RuntimeException
		}
		// create the runtime context
		RuntimeContext ctx = createRuntimeContext();
		
		// metrics
		
		if (RichInputFormat.class.isAssignableFrom(this.format.getClass())) {
			((RichInputFormat) this.format).setRuntimeContext(ctx);
			((RichInputFormat) this.format).openInputFormat();
		}

		ExecutionConfig executionConfig = getExecutionConfig();

		boolean objectReuseEnabled = executionConfig.isObjectReuseEnabled();
		// obtain the serializer
		final TypeSerializer<OT> serializer = this.serializerFactory.getSerializer();
		
		try {
			BatchTask.openChainedTasks(this.chainedTasks, this);
			
			// obtain the input splits
			final Iterator<InputSplit> splitIterator = getInputSplits();
			
			// loop over the splits
			while (!this.taskCanceled && splitIterator.hasNext())
			{
				// get start and end
				final InputSplit split = splitIterator.next();
				final InputFormat<OT, InputSplit> format = this.format;
				format.open(split);
				try {
					final Collector<OT> output = new CountingCollector<>(this.output, numRecordsOut);
					if (objectReuseEnabled) {
						OT reuse = serializer.createInstance();
						// as long as there is data to read
						while (!this.taskCanceled && !format.reachedEnd()) {
							OT returned;
							if ((returned = format.nextRecord(reuse)) != null) {
								// read a record and send it downstream
								output.collect(returned);
							}
						}
					} else {
						// as long as there is data to read
						while (!this.taskCanceled && !format.reachedEnd()) {
							OT returned;
							if ((returned = format.nextRecord(serializer.createInstance())) != null) {
								// read a record and send it downstream
								output.collect(returned);
							}
						}
					}
				} finally {
					format.close();
				}
				completedSplitsCounter.inc();
			} // end for all input splits
			this.output.close();
			BatchTask.closeChainedTasks(this.chainedTasks, this);
		}
		catch (Exception ex) {
			// exception
		} finally {
			BatchTask.clearWriters(eventualOutputs);
			if (this.format != null && RichInputFormat.class.isAssignableFrom(this.format.getClass())) {
				((RichInputFormat) this.format).closeInputFormat();
			}
		}
	}
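
The loop above just drives the generic InputFormat contract: open(split), then alternate reachedEnd()/nextRecord() until the split is exhausted, then close(). A toy format makes the contract explicit; this is a made-up example built on GenericInputFormat.

// RangeInputFormat: a made-up InputFormat emitting 0..9 per split
import java.io.IOException;
import org.apache.flink.api.common.io.GenericInputFormat;
import org.apache.flink.core.io.GenericInputSplit;

public class RangeInputFormat extends GenericInputFormat<Integer> {

	private int next;

	@Override
	public void open(GenericInputSplit split) throws IOException {
		super.open(split);
		next = 0;
	}

	@Override
	public boolean reachedEnd() {
		return next >= 10;
	}

	@Override
	public Integer nextRecord(Integer reuse) {
		// with object reuse enabled, 'reuse' could be mutated instead
		return next++;
	}
}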

3.2 StreamTask

StreamTask is the task for the intermediate processing between sources and sinks.

  1. Checkpoint-related setup: create the thread pool for asynchronous checkpoints, the stateBackend, the checkpointStorage, etc.
  2. Initialize the timerService, initialize the operatorChain, and obtain the head operator.
  3. Task-specific initialization.
  4. Initialize state and open all operators.
  5. Run the task, processing records and sending them downstream.
  6. Shutdown and cleanup.
// StreamTask
	public final void invoke() throws Exception {
		boolean disposed = false;
		try {
			// checkpoint-related setup: create the thread pool for asynchronous checkpoints, the stateBackend, the checkpointStorage, etc.

			// initialize the timerService
			
			// initialize the operatorChain and obtain the head operator
			operatorChain = new OperatorChain<>(this, recordWriters);
			headOperator = operatorChain.getHeadOperator();

			// task-specific initialization
			init();
			
			synchronized (lock) {
				// initialize state and open all operators
				initializeState();
				openAllOperators();
			}
			// set the running flag to true
			isRunning = true;
			// run the task
			run();

			// mark isRunning as false
			// close and dispose the operators
			// common cleanup
			// task-specific cleanup
		}
		finally {
			// mark isRunning as false
			// close and dispose the operators
			// cleanup
		}
	}

	private void run() throws Exception {
		final ActionContext actionContext = new ActionContext();
		while (true) {
			if (mailbox.hasMail()) {
				Optional<Runnable> maybeLetter;
				while ((maybeLetter = mailbox.tryTakeMail()).isPresent()) {
					Runnable letter = maybeLetter.get();
					if (letter == POISON_LETTER) {
						return;
					}
					letter.run();
				}
			}
			// process records
			performDefaultAction(actionContext);
		}
	}

	protected void performDefaultAction(ActionContext context) throws Exception {
		// call inputProcessor.processInput() to do the actual record processing
		if (!inputProcessor.processInput()) {
			context.allActionsCompleted();
		}
	}
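
The run loop above is the mailbox pattern: control actions (e.g. checkpoint triggers) are enqueued as letters and drained before the default action processes input. Here is a stripped-down sketch of the idea, not the actual Flink mailbox classes; the real implementation also blocks instead of busy-looping.

// MailboxLoopDemo: illustrative sketch of the mailbox execution model
import java.util.concurrent.LinkedBlockingQueue;

public class MailboxLoopDemo {

	private final LinkedBlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
	private volatile boolean running = true;

	// other threads enqueue control actions, e.g. a checkpoint trigger
	public void sendLetter(Runnable letter) {
		mailbox.add(letter);
	}

	// the task thread: drain all letters, then run the default action
	public void runLoop(Runnable defaultAction) {
		while (running) {
			Runnable letter;
			while ((letter = mailbox.poll()) != null) {
				letter.run();
			}
			defaultAction.run(); // e.g. inputProcessor.processInput()
		}
	}

	public void stop() {
		running = false;
	}
}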

3.2.1 StreamOneInputProcessor

StreamOneInputProcessor is the processor for operators with a single input and corresponds to OneInputStreamOperator, e.g. sort, project, and map operators.
StreamOneInputProcessor drives the OneInputStreamOperator's processing methods to handle each element: Watermarks, StreamStatus, LatencyMarkers, and the actual records.
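
As an illustration, here is a toy OneInputStreamOperator; the operator itself is made up, the interfaces are per the Flink 1.9 API, and AbstractStreamOperator supplies the watermark and latency-marker handling.

// UpperCaseOperator: a made-up OneInputStreamOperator
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

public class UpperCaseOperator extends AbstractStreamOperator<String>
		implements OneInputStreamOperator<String, String> {

	@Override
	public void processElement(StreamRecord<String> element) throws Exception {
		// called by StreamOneInputProcessor for every record
		output.collect(element.replace(element.getValue().toUpperCase()));
	}
}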

3.2.2 StreamTwoInputProcessor

StreamTwoInputProcessor is the processor for operators with two inputs and corresponds to TwoInputStreamOperator, e.g. join.
Each processing method on TwoInputStreamOperator comes in pairs, one per input; StreamTwoInputProcessor selects the variant matching the input channel's index to handle each element: Watermarks, StreamStatus, LatencyMarkers, and the actual records.
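
A toy TwoInputStreamOperator shows the paired methods; the operator is made up, and the Flink 1.9-era AbstractStreamOperator also provides the processWatermark1/2 variants.

// TagMergeOperator: a made-up TwoInputStreamOperator
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.TwoInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

public class TagMergeOperator extends AbstractStreamOperator<String>
		implements TwoInputStreamOperator<String, String, String> {

	@Override
	public void processElement1(StreamRecord<String> element) throws Exception {
		// chosen when the record arrived on the first input
		output.collect(element.replace("left: " + element.getValue()));
	}

	@Override
	public void processElement2(StreamRecord<String> element) throws Exception {
		// chosen when the record arrived on the second input
		output.collect(element.replace("right: " + element.getValue()));
	}
}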

3.2.3 StreamTwoInputSelectableProcessor

StreamTwoInputSelectableProcessor is similar to StreamTwoInputProcessor but differs in how it selects elements to process: when the inputMask includes both inputs and both are available, it picks one of them fairly; otherwise it reads whichever input is available, or waits until one of them becomes available.
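
An operator opts into this processor by implementing InputSelectable, telling the processor which input to read next. The sketch below follows the Flink 1.9 API; the operator itself is made up.

// PreferFirstOperator: a made-up input-selecting operator
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.InputSelectable;
import org.apache.flink.streaming.api.operators.InputSelection;
import org.apache.flink.streaming.api.operators.TwoInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

public class PreferFirstOperator extends AbstractStreamOperator<String>
		implements TwoInputStreamOperator<String, String, String>, InputSelectable {

	@Override
	public InputSelection nextSelection() {
		// always prefer the first input; returning InputSelection.ALL instead
		// would let the processor pick fairly between the two
		return InputSelection.FIRST;
	}

	@Override
	public void processElement1(StreamRecord<String> element) throws Exception {
		output.collect(element);
	}

	@Override
	public void processElement2(StreamRecord<String> element) throws Exception {
		output.collect(element);
	}
}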

3.3 DataSinkTask

DataSinkTask is the output task, e.g. a Kafka sink or a file sink.

  1. Initialize the outputFormat and the inputReaders.
  2. Initialize the MutableObjectIterator according to the inputLocalStrategy, which is either NONE or SORT (sorting).
  3. Open the OutputFormat.
  4. Write out the records.
// DataSinkTask
public void invoke() throws Exception {
		// initialize the outputFormat
		initOutputFormat();

		try {
			// initialize the inputReaders
			initInputReaders();
		} catch (Exception e) {
			// RuntimeException
		}

		RuntimeContext ctx = createRuntimeContext();
		
		if(RichOutputFormat.class.isAssignableFrom(this.format.getClass())){
			((RichOutputFormat) this.format).setRuntimeContext(ctx);
		}

		ExecutionConfig executionConfig = getExecutionConfig();

		boolean objectReuseEnabled = executionConfig.isObjectReuseEnabled();
		
		try {
			MutableObjectIterator<IT> input1;
			switch (this.config.getInputLocalStrategy(0)) {
			case NONE:
				localStrategy = null;
				input1 = reader;
				break;
			// sort
			case SORT:
				try {
					TypeComparatorFactory<IT> compFact = this.config.getInputComparator(0,
							getUserCodeClassLoader());
					if (compFact == null) {
						// Exception
					}
					
					UnilateralSortMerger<IT> sorter = new UnilateralSortMerger<IT>(
							getEnvironment().getMemoryManager(), 
							getEnvironment().getIOManager(),
							this.reader, this, this.inputTypeSerializerFactory, compFact.createComparator(),
							this.config.getRelativeMemoryInput(0), this.config.getFilehandlesInput(0),
							this.config.getSpillingThresholdInput(0),
							this.config.getUseLargeRecordHandler(),
							this.getExecutionConfig().isObjectReuseEnabled());
					
					this.localStrategy = sorter;
					input1 = sorter.getIterator();
				} catch (Exception e) {
					// RuntimeException
				}
				break;
			default:
				// RuntimeException
			}
						
			final TypeSerializer<IT> serializer = this.inputTypeSerializerFactory.getSerializer();
			final MutableObjectIterator<IT> input = input1;
			final OutputFormat<IT> format = this.format;
			
			// open format
			format.open(this.getEnvironment().getTaskInfo().getIndexOfThisSubtask(), this.getEnvironment().getTaskInfo().getNumberOfParallelSubtasks());
			if (objectReuseEnabled) {
				IT record = serializer.createInstance();
				while (!this.taskCanceled && ((record = input.next(record)) != null)) {
					// write out the record
					numRecordsIn.inc();
					format.writeRecord(record);
				}
			} else {
				IT record;
				while (!this.taskCanceled && ((record = input.next()) != null)) {
					// write out the record
					numRecordsIn.inc();
					format.writeRecord(record);
				}
			}
		}
		catch (Exception ex) {
			// exception handling
		}
		finally {
			// close the format
			// close the localStrategy
			// clean up the readers
		}
	}
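
The writing loop drives the plain OutputFormat contract: configure(), open(subtaskIndex, numSubtasks), writeRecord() per record, and close(). A toy format, given as a made-up example:

// StdoutOutputFormat: a made-up OutputFormat printing records to stdout
import org.apache.flink.api.common.io.OutputFormat;
import org.apache.flink.configuration.Configuration;

public class StdoutOutputFormat implements OutputFormat<String> {

	@Override
	public void configure(Configuration parameters) {
		// no configuration needed
	}

	@Override
	public void open(int taskNumber, int numTasks) {
		// e.g. open a connection or file handle per parallel subtask
	}

	@Override
	public void writeRecord(String record) {
		System.out.println(record);
	}

	@Override
	public void close() {
		// flush and release resources
	}
}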

To describe the whole flow from the Dispatcher to task execution, I drew the diagram below: each node is a method, and each outer box is the class it belongs to. It should give a clear picture of Flink's job submission flow.
[Figure 12: Method-level flow from the Dispatcher to Task execution]

4 Summary

This article covered Flink's job submission flow after the Dispatcher, noting along the way that the client uses the API to generate the StreamGraph and the JobGraph, and that the JobMaster then builds the ExecutionGraph from the JobGraph. Finally, the ExecutionGraph yields the executable graph, and its Executions are deployed as the Tasks that actually run.

