Flink 1.7.2 DataSet Parallel Computation Source Code Analysis

Overview

  • Understand the Flink processing flow: user program -> JobGraph -> ExecutionGraph -> JobVertex -> ExecutionVertex -> parallelism -> Task (DataSourceTask, BatchTask, DataSinkTask)
  • Understand how ExecutionVertex instances are built, how Tasks are built and executed, and the call relationships between tasks

How It Works

  • The user program is submitted as a JobGraph, and the JobGraph is ultimately converted into an ExecutionGraph for processing

  • The ExecutionGraph is split into ExecutionJobVertex instances for execution, one per task type (DataSourceTask, BatchTask, DataSinkTask):

```
0 = jobVertex = {InputFormatVertex@7675} "CHAIN DataSource (at com.opensourceteams.module.bigdata.flink.example.dataset.worldcount.WordCountRun$.main(WordCountRun.scala:19) (org.apache.flink.api.java.io.TextInp) -> FlatMap (FlatMap at com.opensourceteams.module.bigdata.flink.example.dataset.worldcount.WordCountRun$.main(WordCountRun.scala:23)) -> Map (Map at com.opensourceteams.module.bigdata.flink.example.dataset.worldcount.WordCountRun$.main(WordCountRun.scala:23)) -> Combine (SUM(1)) (org.apache.flink.runtime.operators.DataSourceTask)"

1 = jobVertex = {JobVertex@7695} "Reduce (SUM(1)) (org.apache.flink.runtime.operators.BatchTask)"

2 = jobVertex = {OutputFormatVertex@6665} "DataSink (collect()) (org.apache.flink.runtime.operators.DataSinkTask)"
```

  • An ExecutionJobVertex goes through CREATED -> DEPLOYING and is converted into corresponding Tasks (CREATED -> DEPLOYING -> RUNNING)

  • The default job schedule mode is LAZY_FROM_SOURCES: only source tasks are started at first, and a downstream task starts only when an upstream task begins sending it data

    • At the beginning, only the ExecutionJobVertex for the DataSourceTask has an empty jobVertex.inputs (0 elements), so only the DataSourceTask is scheduled, deployed, and run
    • As the DataSourceTask runs it produces intermediate data; its output is partitioned by key and delivered to the corresponding BatchTask partitions, at which point the BatchTask is scheduled, deployed, and run
    • As the BatchTask runs it produces intermediate data; its output is partitioned by key and delivered to the corresponding DataSinkTask partitions, at which point the DataSinkTask is scheduled, deployed, and run
    • Because each task depends on the one before it, not all tasks run from the start: a task starts only once upstream data is sent to it. In other words, downstream tasks stay idle until an upstream task sends them data, which saves compute resources
  • For each unit of parallelism, the ExecutionJobVertex is converted into one ExecutionVertex, and each ExecutionVertex is turned into a Task to run; the parallel instances are distinguished by subTaskIndex (the first has subTaskIndex = 0, the second subTaskIndex = 1); see the sketch below
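A minimal sketch of that fan-out, with abridged constructor arguments (the real ExecutionJobVertex constructor takes more parameters than shown here):

```
// Simplified sketch, not the actual Flink source: one ExecutionJobVertex
// fans out into `parallelism` ExecutionVertex instances, indexed by subTaskIndex.
ExecutionVertex[] taskVertices = new ExecutionVertex[parallelism];
for (int subTaskIndex = 0; subTaskIndex < parallelism; subTaskIndex++) {
	// each ExecutionVertex later becomes one Task (e.g. DataSourceTask 0 and 1)
	taskVertices[subTaskIndex] = new ExecutionVertex(this, subTaskIndex, producedDataSets, timeout);
}
```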

Input Data

c a a
b c a

Program

  • WordCountRun.scala performs the word count (the expected output is shown after the program)
package com.opensourceteams.module.bigdata.flink.example.dataset.worldcount

import com.opensourceteams.module.bigdata.flink.common.ConfigurationUtil
import org.apache.flink.api.scala.ExecutionEnvironment


/**
  * Batch processing: DataSet WordCount analysis
  */
object WordCountRun {


  def main(args: Array[String]): Unit = {

    //debugging: set timeouts
    val env: ExecutionEnvironment = ExecutionEnvironment.createLocalEnvironment(ConfigurationUtil.getConfiguration(true))
    env.setParallelism(2)

    val dataSet = env.readTextFile("file:/opt/n_001_workspaces/bigdata/flink/flink-maven-scala-2/src/main/resources/data/line.txt")


    import org.apache.flink.api.scala._
    val result = dataSet.flatMap(x => x.split(" ")).map((_,1)).groupBy(0).sum(1)

    result.print()
  }

}
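For the input above the counts are a = 3, c = 2, b = 1, so result.print() emits the three tuples below (with parallelism 2 the order in which they appear may vary):

```
(a,3)
(b,1)
(c,2)
```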

Source Code Analysis

JobMaster

JobMaster

  • new JobMaster()

  • Converts the JobGraph into an ExecutionGraph

    this.executionGraph = createAndRestoreExecutionGraph(jobManagerJobMetricGroup);
    
public JobMaster(
			RpcService rpcService,
			JobMasterConfiguration jobMasterConfiguration,
			ResourceID resourceId,
			JobGraph jobGraph,
			HighAvailabilityServices highAvailabilityService,
			SlotPoolFactory slotPoolFactory,
			JobManagerSharedServices jobManagerSharedServices,
			HeartbeatServices heartbeatServices,
			BlobServer blobServer,
			JobManagerJobMetricGroupFactory jobMetricGroupFactory,
			OnCompletionActions jobCompletionActions,
			FatalErrorHandler fatalErrorHandler,
			ClassLoader userCodeLoader) throws Exception {

		super(rpcService, AkkaRpcServiceUtils.createRandomName(JOB_MANAGER_NAME));

		final JobMasterGateway selfGateway = getSelfGateway(JobMasterGateway.class);

		this.jobMasterConfiguration = checkNotNull(jobMasterConfiguration);
		this.resourceId = checkNotNull(resourceId);
		this.jobGraph = checkNotNull(jobGraph);
		this.rpcTimeout = jobMasterConfiguration.getRpcTimeout();
		this.highAvailabilityServices = checkNotNull(highAvailabilityService);
		this.blobServer = checkNotNull(blobServer);
		this.scheduledExecutorService = jobManagerSharedServices.getScheduledExecutorService();
		this.jobCompletionActions = checkNotNull(jobCompletionActions);
		this.fatalErrorHandler = checkNotNull(fatalErrorHandler);
		this.userCodeLoader = checkNotNull(userCodeLoader);
		this.jobMetricGroupFactory = checkNotNull(jobMetricGroupFactory);

		this.taskManagerHeartbeatManager = heartbeatServices.createHeartbeatManagerSender(
			resourceId,
			new TaskManagerHeartbeatListener(selfGateway),
			rpcService.getScheduledExecutor(),
			log);

		this.resourceManagerHeartbeatManager = heartbeatServices.createHeartbeatManager(
				resourceId,
				new ResourceManagerHeartbeatListener(),
				rpcService.getScheduledExecutor(),
				log);

		final String jobName = jobGraph.getName();
		final JobID jid = jobGraph.getJobID();

		log.info("Initializing job {} ({}).", jobName, jid);

		final RestartStrategies.RestartStrategyConfiguration restartStrategyConfiguration =
				jobGraph.getSerializedExecutionConfig()
						.deserializeValue(userCodeLoader)
						.getRestartStrategy();

		this.restartStrategy = RestartStrategyResolving.resolve(restartStrategyConfiguration,
			jobManagerSharedServices.getRestartStrategyFactory(),
			jobGraph.isCheckpointingEnabled());

		log.info("Using restart strategy {} for {} ({}).", this.restartStrategy, jobName, jid);

		resourceManagerLeaderRetriever = highAvailabilityServices.getResourceManagerLeaderRetriever();

		this.slotPool = checkNotNull(slotPoolFactory).createSlotPool(jobGraph.getJobID());

		this.slotPoolGateway = slotPool.getSelfGateway(SlotPoolGateway.class);

		this.registeredTaskManagers = new HashMap<>(4);

		this.backPressureStatsTracker = checkNotNull(jobManagerSharedServices.getBackPressureStatsTracker());
		this.lastInternalSavepoint = null;

		this.jobManagerJobMetricGroup = jobMetricGroupFactory.create(jobGraph);
		this.executionGraph = createAndRestoreExecutionGraph(jobManagerJobMetricGroup);
		this.jobStatusListener = null;

		this.resourceManagerConnection = null;
		this.establishedResourceManagerConnection = null;
	}

ExecutionGraph

ExecutionGraph.scheduleForExecution()

  • Responsible for scheduling Executions: the ExecutionGraph is turned into ExecutionJobVertex instances, each ExecutionJobVertex into ExecutionVertex instances, and those into tasks; this is where the real execution logic starts
  • Updates the job status, i.e. the ExecutionGraph state, from CREATED to RUNNING:

```
transitionState(JobStatus.CREATED, JobStatus.RUNNING)
```

INFO-level log:

```
Job Flink Java Job at Mon Mar 11 18:57:37 CST 2019 (f24b82ed1ec3e1c90455c342a9dfc21e) switched from state CREATED to RUNNING.
```

  • The default schedule mode is LAZY_FROM_SOURCES:

    • LAZY_FROM_SOURCES: schedule tasks starting from the sources; a downstream task starts once its input data is ready (initially only the source tasks exist, downstream tasks have not started)
    • EAGER: schedule all tasks immediately

```
private ScheduleMode scheduleMode = ScheduleMode.LAZY_FROM_SOURCES;
```

  • Calls ExecutionGraph.scheduleLazy() // lazy scheduling
public void scheduleForExecution() throws JobException {

		final long currentGlobalModVersion = globalModVersion;

		if (transitionState(JobStatus.CREATED, JobStatus.RUNNING)) {

			final CompletableFuture<Void> newSchedulingFuture;

			switch (scheduleMode) {

				case LAZY_FROM_SOURCES:
					newSchedulingFuture = scheduleLazy(slotProvider);
					break;

				case EAGER:
					newSchedulingFuture = scheduleEager(slotProvider, allocationTimeout);
					break;

				default:
					throw new JobException("Schedule mode is invalid.");
			}

			if (state == JobStatus.RUNNING && currentGlobalModVersion == globalModVersion) {
				schedulingFuture = newSchedulingFuture;

				newSchedulingFuture.whenCompleteAsync(
					(Void ignored, Throwable throwable) -> {
						if (throwable != null && !(throwable instanceof CancellationException)) {
							// only fail if the scheduling future was not canceled
							failGlobal(ExceptionUtils.stripCompletionException(throwable));
						}
					},
					futureExecutor);
			} else {
				newSchedulingFuture.cancel(false);
			}
		}
		else {
			throw new IllegalStateException("Job may only be scheduled from state " + JobStatus.CREATED);
		}
	}

JobStatus

  • Job states: CREATED -> RUNNING -> FINISHED, etc.

/**
 * Possible states of a job once it has been accepted by the job manager.
 */
public enum JobStatus {

	/** Job is newly created, no task has started to run. */
	CREATED(TerminalState.NON_TERMINAL),

	/** Some tasks are scheduled or running, some may be pending, some may be finished. */
	RUNNING(TerminalState.NON_TERMINAL),

	/** The job has failed and is currently waiting for the cleanup to complete */
	FAILING(TerminalState.NON_TERMINAL),
	
	/** The job has failed with a non-recoverable task failure */
	FAILED(TerminalState.GLOBALLY),

	/** Job is being cancelled */
	CANCELLING(TerminalState.NON_TERMINAL),
	
	/** Job has been cancelled */
	CANCELED(TerminalState.GLOBALLY),

	/** All of the job's tasks have successfully finished. */
	FINISHED(TerminalState.GLOBALLY),
	
	/** The job is currently undergoing a reset and total restart */
	RESTARTING(TerminalState.NON_TERMINAL),

	/** The job has been suspended and is currently waiting for the cleanup to complete */
	SUSPENDING(TerminalState.NON_TERMINAL),

	/**
	 * The job has been suspended which means that it has been stopped but not been removed from a
	 * potential HA job store.
	 */
	SUSPENDED(TerminalState.LOCALLY),

	/** The job is currently reconciling and waits for task execution report to recover state. */
	RECONCILING(TerminalState.NON_TERMINAL);
	
	// ----------------------------

ScheduleMode

  • The job schedule mode, i.e. the ExecutionGraph schedule mode (LAZY_FROM_SOURCES, EAGER); a usage snippet follows the source below
  • LAZY_FROM_SOURCES: schedule tasks starting from the sources; a downstream task starts once its input data is ready (initially only the source tasks exist, downstream tasks have not started)
  • EAGER: schedule all tasks immediately
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.flink.runtime.jobgraph;

/**
 * The ScheduleMode decides how tasks of an execution graph are started.  
 */
public enum ScheduleMode {

	/** Schedule tasks lazily from the sources. Downstream tasks are started once their input data are ready */
	LAZY_FROM_SOURCES,

	/** Schedules all tasks immediately. */
	EAGER;
	
	/**
	 * Returns whether we are allowed to deploy consumers lazily.
	 */
	public boolean allowLazyDeployment() {
		return this == LAZY_FROM_SOURCES;
	}
	
}
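A quick usage sketch, grounded in the enum above (the class name ScheduleModeDemo is made up for illustration):

```
import org.apache.flink.runtime.jobgraph.ScheduleMode;

public class ScheduleModeDemo {
	public static void main(String[] args) {
		// LAZY_FROM_SOURCES is the only mode that permits lazy deployment of consumers
		System.out.println(ScheduleMode.LAZY_FROM_SOURCES.allowLazyDeployment()); // true
		System.out.println(ScheduleMode.EAGER.allowLazyDeployment());             // false
	}
}
```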

ExecutionGraph.scheduleLazy

  • The user program is submitted as a JobGraph, and the JobGraph is ultimately converted into an ExecutionGraph for processing

  • The ExecutionGraph is split into ExecutionJobVertex instances for execution, one per task type (DataSourceTask, BatchTask, DataSinkTask):

```
0 = jobVertex = {InputFormatVertex@7675} "CHAIN DataSource (at com.opensourceteams.module.bigdata.flink.example.dataset.worldcount.WordCountRun$.main(WordCountRun.scala:19) (org.apache.flink.api.java.io.TextInp) -> FlatMap (FlatMap at com.opensourceteams.module.bigdata.flink.example.dataset.worldcount.WordCountRun$.main(WordCountRun.scala:23)) -> Map (Map at com.opensourceteams.module.bigdata.flink.example.dataset.worldcount.WordCountRun$.main(WordCountRun.scala:23)) -> Combine (SUM(1)) (org.apache.flink.runtime.operators.DataSourceTask)"

1 = jobVertex = {JobVertex@7695} "Reduce (SUM(1)) (org.apache.flink.runtime.operators.BatchTask)"

2 = jobVertex = {OutputFormatVertex@6665} "DataSink (collect()) (org.apache.flink.runtime.operators.DataSinkTask)"
```

  • ExecutionJobVertex (flow: CREATED -> DEPLOYING) is converted into corresponding Tasks (flow: CREATED -> DEPLOYING -> RUNNING):

```
verticesInCreationOrder = {ArrayList@6145} size = 3

0 = {ExecutionJobVertex@6484}
stateMonitor = {Object@6608}
graph = {ExecutionGraph@5602}
jobVertex = {InputFormatVertex@6609} "CHAIN DataSource (at com.opensourceteams.module.bigdata.flink.example.dataset.worldcount.WordCountRun$.main(WordCountRun.scala:19) (org.apache.flink.api.java.io.TextInp) -> FlatMap (FlatMap at com.opensourceteams.module.bigdata.flink.example.dataset.worldcount.WordCountRun$.main(WordCountRun.scala:23)) -> Map (Map at com.opensourceteams.module.bigdata.flink.example.dataset.worldcount.WordCountRun$.main(WordCountRun.scala:23)) -> Combine (SUM(1)) (org.apache.flink.runtime.operators.DataSourceTask)"
operatorIDs = {Collections$UnmodifiableRandomAccessList@6610} size = 1
userDefinedOperatorIds = {Collections$UnmodifiableRandomAccessList@6611} size = 1
taskVertices = {ExecutionVertex[2]@6668}
producedDataSets = {IntermediateResult[0]@6669}
inputs = {ArrayList@6670} size = 1
parallelism = 2
slotSharingGroup = {SlotSharingGroup@6617} "SlotSharingGroup [9eb9f93248a641c71df925ec6245124a, b2e911c37d2ae462430e12812fb033ad, a2139d4cdf059b384d883251604a5f2e]"
coLocationGroup = null
inputSplits = null
maxParallelismConfigured = true
maxParallelism = 2
serializedTaskInformation = null
taskInformationBlobKey = null
taskInformationOrBlobKey = null
splitAssigner = null
```


private CompletableFuture<Void> scheduleLazy(SlotProvider slotProvider) {

		final ArrayList<CompletableFuture<Void>> schedulingFutures = new ArrayList<>(numVerticesTotal);
		// simply take the vertices without inputs.
		for (ExecutionJobVertex ejv : verticesInCreationOrder) {
			if (ejv.getJobVertex().isInputVertex()) {
				final CompletableFuture<Void> schedulingJobVertexFuture = ejv.scheduleAll(
					slotProvider,
					allowQueuedScheduling,
					LocationPreferenceConstraint.ALL, // since it is an input vertex, the input based location preferences should be empty
					Collections.emptySet());

				schedulingFutures.add(schedulingJobVertexFuture);
			}
		}

		return FutureUtils.waitForAll(schedulingFutures);
	}

ExecutionJobVertex.scheduleAll

  • Converts the ExecutionJobVertex into as many ExecutionVertex instances as there are units of parallelism
  • Calls ExecutionVertex.scheduleForExecution() on each of them
  • The Execution state is CREATED
	//---------------------------------------------------------------------------------------------
	//  Actions
	//---------------------------------------------------------------------------------------------

	/**
	 * Schedules all execution vertices of this ExecutionJobVertex.
	 *
	 * @param slotProvider to allocate the slots from
	 * @param queued if the allocations can be queued
	 * @param locationPreferenceConstraint constraint for the location preferences
	 * @param allPreviousExecutionGraphAllocationIds set with all previous allocation ids in the job graph.
	 *                                                 Can be empty if the allocation ids are not required for scheduling.
	 * @return Future which is completed once all {@link Execution} could be deployed
	 */
	public CompletableFuture<Void> scheduleAll(
			SlotProvider slotProvider,
			boolean queued,
			LocationPreferenceConstraint locationPreferenceConstraint,
			@Nonnull Set<AllocationID> allPreviousExecutionGraphAllocationIds) {

		final ExecutionVertex[] vertices = this.taskVertices;

		final ArrayList<CompletableFuture<Void>> scheduleFutures = new ArrayList<>(vertices.length);

		// kick off the tasks
		for (ExecutionVertex ev : vertices) {
			scheduleFutures.add(ev.scheduleForExecution(
				slotProvider,
				queued,
				locationPreferenceConstraint,
				allPreviousExecutionGraphAllocationIds));
		}

		return FutureUtils.waitForAll(scheduleFutures);
	}

ExecutionVertex.scheduleForExecution()

  • Delegates to Execution.scheduleForExecution
/**
	 * Schedules the current execution of this ExecutionVertex.
	 *
	 * @param slotProvider to allocate the slots from
	 * @param queued if the allocation can be queued
	 * @param locationPreferenceConstraint constraint for the location preferences
	 * @param allPreviousExecutionGraphAllocationIds set with all previous allocation ids in the job graph.
	 *                                                 Can be empty if the allocation ids are not required for scheduling.
	 * @return Future which is completed once the execution is deployed. The future
	 * can also completed exceptionally.
	 */
	public CompletableFuture<Void> scheduleForExecution(
			SlotProvider slotProvider,
			boolean queued,
			LocationPreferenceConstraint locationPreferenceConstraint,
			@Nonnull Set<AllocationID> allPreviousExecutionGraphAllocationIds) {
		return this.currentExecution.scheduleForExecution(
			slotProvider,
			queued,
			locationPreferenceConstraint,
			allPreviousExecutionGraphAllocationIds);
	}

Execution.scheduleForExecution

  • Allocates a slot for the Execution:

    final CompletableFuture<Execution> allocationFuture = allocateAndAssignSlotForExecution(
    				slotProvider,
    				queued,
    				locationPreferenceConstraint,
    				allPreviousExecutionGraphAllocationIds,
    				allocationTimeout);
    
  • Calls Execution.deploy() to deploy the Execution into the allocated slot

/**
	 * NOTE: This method only throws exceptions if it is in an illegal state to be scheduled, or if the tasks needs
	 *       to be scheduled immediately and no resource is available. If the task is accepted by the schedule, any
	 *       error sets the vertex state to failed and triggers the recovery logic.
	 *
	 * @param slotProvider The slot provider to use to allocate slot for this execution attempt.
	 * @param queued Flag to indicate whether the scheduler may queue this task if it cannot
	 *               immediately deploy it.
	 * @param locationPreferenceConstraint constraint for the location preferences
	 * @param allPreviousExecutionGraphAllocationIds set with all previous allocation ids in the job graph.
	 *                                                 Can be empty if the allocation ids are not required for scheduling.
	 * @return Future which is completed once the Execution has been deployed
	 */
	public CompletableFuture<Void> scheduleForExecution(
			SlotProvider slotProvider,
			boolean queued,
			LocationPreferenceConstraint locationPreferenceConstraint,
			@Nonnull Set<AllocationID> allPreviousExecutionGraphAllocationIds) {
		final Time allocationTimeout = vertex.getExecutionGraph().getAllocationTimeout();
		try {
			final CompletableFuture<Execution> allocationFuture = allocateAndAssignSlotForExecution(
				slotProvider,
				queued,
				locationPreferenceConstraint,
				allPreviousExecutionGraphAllocationIds,
				allocationTimeout);

			// IMPORTANT: We have to use the synchronous handle operation (direct executor) here so
			// that we directly deploy the tasks if the slot allocation future is completed. This is
			// necessary for immediate deployment.
			final CompletableFuture<Void> deploymentFuture = allocationFuture.thenAccept(
				(FutureConsumerWithException<Execution, Exception>) value -> deploy());

			deploymentFuture.whenComplete(
				(Void ignored, Throwable failure) -> {
					if (failure != null) {
						final Throwable stripCompletionException = ExceptionUtils.stripCompletionException(failure);
						final Throwable schedulingFailureCause;

						if (stripCompletionException instanceof TimeoutException) {
							schedulingFailureCause = new NoResourceAvailableException(
								"Could not allocate enough slots within timeout of " + allocationTimeout + " to run the job. " +
									"Please make sure that the cluster has enough resources.");
						} else {
							schedulingFailureCause = stripCompletionException;
						}

						markFailed(schedulingFailureCause);
					}
				});

			// if tasks have to scheduled immediately check that the task has been deployed
			if (!queued && !deploymentFuture.isDone()) {
				deploymentFuture.completeExceptionally(new IllegalArgumentException("The slot allocation future has not been completed yet."));
			}

			return deploymentFuture;
		} catch (IllegalExecutionStateException e) {
			return FutureUtils.completedExceptionally(e);
		}
	}


Execution.deploy()

  • The Execution state goes from SCHEDULED to DEPLOYING

  • Builds the deployment descriptor:

    	final TaskDeploymentDescriptor deployment = vertex.createDeploymentDescriptor(
    			attemptId,
    			slot,
    			taskRestore,
    			attemptNumber);
    
  • Calls TaskExecutor.submitTask

	/**
	 * Deploys the execution to the previously assigned resource.
	 *
	 * @throws JobException if the execution cannot be deployed to the assigned resource
	 */
	public void deploy() throws JobException {
		final LogicalSlot slot  = assignedResource;

		checkNotNull(slot, "In order to deploy the execution we first have to assign a resource via tryAssignResource.");

		// Check if the TaskManager died in the meantime
		// This only speeds up the response to TaskManagers failing concurrently to deployments.
		// The more general check is the rpcTimeout of the deployment call
		if (!slot.isAlive()) {
			throw new JobException("Target slot (TaskManager) for deployment is no longer alive.");
		}

		// make sure exactly one deployment call happens from the correct state
		// note: the transition from CREATED to DEPLOYING is for testing purposes only
		ExecutionState previous = this.state;
		if (previous == SCHEDULED || previous == CREATED) {
			if (!transitionState(previous, DEPLOYING)) {
				// race condition, someone else beat us to the deploying call.
				// this should actually not happen and indicates a race somewhere else
				throw new IllegalStateException("Cannot deploy task: Concurrent deployment call race.");
			}
		}
		else {
			// vertex may have been cancelled, or it was already scheduled
			throw new IllegalStateException("The vertex must be in CREATED or SCHEDULED state to be deployed. Found state " + previous);
		}

		if (this != slot.getPayload()) {
			throw new IllegalStateException(
				String.format("The execution %s has not been assigned to the assigned slot.", this));
		}

		try {

			// race double check, did we fail/cancel and do we need to release the slot?
			if (this.state != DEPLOYING) {
				slot.releaseSlot(new FlinkException("Actual state of execution " + this + " (" + state + ") does not match expected state DEPLOYING."));
				return;
			}

			if (LOG.isInfoEnabled()) {
				LOG.info(String.format("Deploying %s (attempt #%d) to %s", vertex.getTaskNameWithSubtaskIndex(),
						attemptNumber, getAssignedResourceLocation().getHostname()));
			}

			final TaskDeploymentDescriptor deployment = vertex.createDeploymentDescriptor(
				attemptId,
				slot,
				taskRestore,
				attemptNumber);

			// null taskRestore to let it be GC'ed
			taskRestore = null;

			final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();

			final CompletableFuture<Acknowledge> submitResultFuture = taskManagerGateway.submitTask(deployment, rpcTimeout);

			submitResultFuture.whenCompleteAsync(
				(ack, failure) -> {
					// only respond to the failure case
					if (failure != null) {
						if (failure instanceof TimeoutException) {
							String taskname = vertex.getTaskNameWithSubtaskIndex() + " (" + attemptId + ')';

							markFailed(new Exception(
								"Cannot deploy task " + taskname + " - TaskManager (" + getAssignedResourceLocation()
									+ ") not responding after a rpcTimeout of " + rpcTimeout, failure));
						} else {
							markFailed(failure);
						}
					}
				},
				executor);
		}
		catch (Throwable t) {
			markFailed(t);
			ExceptionUtils.rethrow(t);
		}
	}

ExecutionState

  • Execution states: CREATED -> SCHEDULED -> DEPLOYING -> RUNNING -> FINISHED, etc.


package org.apache.flink.runtime.execution;

/**
 * An enumeration of all states that a task can be in during its execution.
 * Tasks usually start in the state {@code CREATED} and switch states according to
 * this diagram:
 *
 * <pre>{@code
 *
 *     CREATED  -> SCHEDULED -> DEPLOYING -> RUNNING -> FINISHED
 *        |            |            |          |
 *        |            |            |   +------+
 *        |            |            V   V
 *        |            |         CANCELLING -----+----> CANCELED
 *        |            |                         |
 *        |            +-------------------------+
 *        |
 *        |                                   ... -> FAILED
 *        V
 *    RECONCILING  -> RUNNING | FINISHED | CANCELED | FAILED
 *
 * }</pre>
 *
 * <p>It is possible to enter the {@code RECONCILING} state from {@code CREATED}
 * state if job manager fail over, and the {@code RECONCILING} state can switch into
 * any existing task state.
 *
 * <p>It is possible to enter the {@code FAILED} state from any other state.
 *
 * <p>The states {@code FINISHED}, {@code CANCELED}, and {@code FAILED} are
 * considered terminal states.
 */
public enum ExecutionState {

	CREATED,

	SCHEDULED,

	DEPLOYING,

	RUNNING,

	/**
	 * This state marks "successfully completed". It can only be reached when a
	 * program reaches the "end of its input". The "end of input" can be reached
	 * when consuming a bounded input (fix set of files, bounded query, etc) or
	 * when stopping a program (not cancelling!) which make the input look like
	 * it reached its end at a specific point.
	 */
	FINISHED,

	CANCELING,

	CANCELED,

	FAILED,

	RECONCILING;

	public boolean isTerminal() {
		return this == FINISHED || this == CANCELED || this == FAILED;
	}
}

TaskExecutor.submitTask

  • Builds the Task; a Task's initial state is CREATED:

    Task task = new Task(
    			jobInformation,
    			taskInformation,
    			tdd.getExecutionAttemptId(),
    			tdd.getAllocationId(),
    			tdd.getSubtaskIndex(),
    			tdd.getAttemptNumber(),
    			tdd.getProducedPartitions(),
    			tdd.getInputGates(),
    			tdd.getTargetSlotNumber(),
    			taskExecutorServices.getMemoryManager(),
    			taskExecutorServices.getIOManager(),
    			taskExecutorServices.getNetworkEnvironment(),
    			taskExecutorServices.getBroadcastVariableManager(),
    			taskStateManager,
    			taskManagerActions,
    			inputSplitProvider,
    			checkpointResponder,
    			blobCacheService,
    			libraryCache,
    			fileCache,
    			taskManagerConfiguration,
    			taskMetricGroup,
    			resultPartitionConsumableNotifier,
    			partitionStateChecker,
    			getRpcService().getExecutor());
    
  • Calls task.startTaskThread(), which runs the task thread's run() method

// ----------------------------------------------------------------------
	// Task lifecycle RPCs
	// ----------------------------------------------------------------------

	@Override
	public CompletableFuture<Acknowledge> submitTask(
			TaskDeploymentDescriptor tdd,
			JobMasterId jobMasterId,
			Time timeout) {

		try {
			final JobID jobId = tdd.getJobId();
			final JobManagerConnection jobManagerConnection = jobManagerTable.get(jobId);

			if (jobManagerConnection == null) {
				final String message = "Could not submit task because there is no JobManager " +
					"associated for the job " + jobId + '.';

				log.debug(message);
				throw new TaskSubmissionException(message);
			}

			if (!Objects.equals(jobManagerConnection.getJobMasterId(), jobMasterId)) {
				final String message = "Rejecting the task submission because the job manager leader id " +
					jobMasterId + " does not match the expected job manager leader id " +
					jobManagerConnection.getJobMasterId() + '.';

				log.debug(message);
				throw new TaskSubmissionException(message);
			}

			if (!taskSlotTable.tryMarkSlotActive(jobId, tdd.getAllocationId())) {
				final String message = "No task slot allocated for job ID " + jobId +
					" and allocation ID " + tdd.getAllocationId() + '.';
				log.debug(message);
				throw new TaskSubmissionException(message);
			}

			// re-integrate offloaded data:
			try {
				tdd.loadBigData(blobCacheService.getPermanentBlobService());
			} catch (IOException | ClassNotFoundException e) {
				throw new TaskSubmissionException("Could not re-integrate offloaded TaskDeploymentDescriptor data.", e);
			}

			// deserialize the pre-serialized information
			final JobInformation jobInformation;
			final TaskInformation taskInformation;
			try {
				jobInformation = tdd.getSerializedJobInformation().deserializeValue(getClass().getClassLoader());
				taskInformation = tdd.getSerializedTaskInformation().deserializeValue(getClass().getClassLoader());
			} catch (IOException | ClassNotFoundException e) {
				throw new TaskSubmissionException("Could not deserialize the job or task information.", e);
			}

			if (!jobId.equals(jobInformation.getJobId())) {
				throw new TaskSubmissionException(
					"Inconsistent job ID information inside TaskDeploymentDescriptor (" +
						tdd.getJobId() + " vs. " + jobInformation.getJobId() + ")");
			}

			TaskMetricGroup taskMetricGroup = taskManagerMetricGroup.addTaskForJob(
				jobInformation.getJobId(),
				jobInformation.getJobName(),
				taskInformation.getJobVertexId(),
				tdd.getExecutionAttemptId(),
				taskInformation.getTaskName(),
				tdd.getSubtaskIndex(),
				tdd.getAttemptNumber());

			InputSplitProvider inputSplitProvider = new RpcInputSplitProvider(
				jobManagerConnection.getJobManagerGateway(),
				taskInformation.getJobVertexId(),
				tdd.getExecutionAttemptId(),
				taskManagerConfiguration.getTimeout());

			TaskManagerActions taskManagerActions = jobManagerConnection.getTaskManagerActions();
			CheckpointResponder checkpointResponder = jobManagerConnection.getCheckpointResponder();

			LibraryCacheManager libraryCache = jobManagerConnection.getLibraryCacheManager();
			ResultPartitionConsumableNotifier resultPartitionConsumableNotifier = jobManagerConnection.getResultPartitionConsumableNotifier();
			PartitionProducerStateChecker partitionStateChecker = jobManagerConnection.getPartitionStateChecker();

			final TaskLocalStateStore localStateStore = localStateStoresManager.localStateStoreForSubtask(
				jobId,
				tdd.getAllocationId(),
				taskInformation.getJobVertexId(),
				tdd.getSubtaskIndex());

			final JobManagerTaskRestore taskRestore = tdd.getTaskRestore();

			final TaskStateManager taskStateManager = new TaskStateManagerImpl(
				jobId,
				tdd.getExecutionAttemptId(),
				localStateStore,
				taskRestore,
				checkpointResponder);

			Task task = new Task(
				jobInformation,
				taskInformation,
				tdd.getExecutionAttemptId(),
				tdd.getAllocationId(),
				tdd.getSubtaskIndex(),
				tdd.getAttemptNumber(),
				tdd.getProducedPartitions(),
				tdd.getInputGates(),
				tdd.getTargetSlotNumber(),
				taskExecutorServices.getMemoryManager(),
				taskExecutorServices.getIOManager(),
				taskExecutorServices.getNetworkEnvironment(),
				taskExecutorServices.getBroadcastVariableManager(),
				taskStateManager,
				taskManagerActions,
				inputSplitProvider,
				checkpointResponder,
				blobCacheService,
				libraryCache,
				fileCache,
				taskManagerConfiguration,
				taskMetricGroup,
				resultPartitionConsumableNotifier,
				partitionStateChecker,
				getRpcService().getExecutor());

			log.info("Received task {}.", task.getTaskInfo().getTaskNameWithSubtasks());

			boolean taskAdded;

			try {
				taskAdded = taskSlotTable.addTask(task);
			} catch (SlotNotFoundException | SlotNotActiveException e) {
				throw new TaskSubmissionException("Could not submit task.", e);
			}

			if (taskAdded) {
				task.startTaskThread();

				return CompletableFuture.completedFuture(Acknowledge.get());
			} else {
				final String message = "TaskManager already contains a task for id " +
					task.getExecutionId() + '.';

				log.debug(message);
				throw new TaskSubmissionException(message);
			}
		} catch (TaskSubmissionException e) {
			return FutureUtils.completedExceptionally(e);
		}
	}

Task.run

  • This is where the Task actually starts processing; task states use the ExecutionState values

  • Updates the Task state from CREATED to DEPLOYING

  • Loads the Task's JAR files:

    // first of all, get a user-code classloader
    		// this may involve downloading the job's JAR files and/or classes
    		LOG.info("Loading JAR files for task {}.", this);
    
    		userCodeClassLoader = createUserCodeClassloader();
    		final ExecutionConfig executionConfig = serializedExecutionConfig.deserializeValue(userCodeClassLoader);
    
  • Builds the task runtime environment:

    Environment env = new RuntimeEnvironment(
    			jobId,
    			vertexId,
    			executionId,
    			executionConfig,
    			taskInfo,
    			jobConfiguration,
    			taskConfiguration,
    			userCodeClassLoader,
    			memoryManager,
    			ioManager,
    			broadcastVariableManager,
    			taskStateManager,
    			accumulatorRegistry,
    			kvStateRegistry,
    			inputSplitProvider,
    			distributedCacheEntries,
    			producedPartitions,
    			inputGates,
    			network.getTaskEventDispatcher(),
    			checkpointResponder,
    			taskManagerConfig,
    			metrics,
    			this);
    
    
  • Updates the current task state from DEPLOYING to RUNNING:

    transitionState(ExecutionState.DEPLOYING, ExecutionState.RUNNING)
    
  • Calls invoke() on the concrete task, here DataSourceTask.invoke()

/**
	 * The core work method that bootstraps the task and executes its code.
	 */
	@Override
	public void run() {

		// ----------------------------
		//  Initial State transition
		// ----------------------------
		while (true) {
			ExecutionState current = this.executionState;
			if (current == ExecutionState.CREATED) {
				if (transitionState(ExecutionState.CREATED, ExecutionState.DEPLOYING)) {
					// success, we can start our work
					break;
				}
			}
			else if (current == ExecutionState.FAILED) {
				// we were immediately failed. tell the TaskManager that we reached our final state
				notifyFinalState();
				if (metrics != null) {
					metrics.close();
				}
				return;
			}
			else if (current == ExecutionState.CANCELING) {
				if (transitionState(ExecutionState.CANCELING, ExecutionState.CANCELED)) {
					// we were immediately canceled. tell the TaskManager that we reached our final state
					notifyFinalState();
					if (metrics != null) {
						metrics.close();
					}
					return;
				}
			}
			else {
				if (metrics != null) {
					metrics.close();
				}
				throw new IllegalStateException("Invalid state for beginning of operation of task " + this + '.');
			}
		}

		// all resource acquisitions and registrations from here on
		// need to be undone in the end
		Map<String, Future<Path>> distributedCacheEntries = new HashMap<>();
		AbstractInvokable invokable = null;

		try {
			// ----------------------------
			//  Task Bootstrap - We periodically
			//  check for canceling as a shortcut
			// ----------------------------

			// activate safety net for task thread
			LOG.info("Creating FileSystem stream leak safety net for task {}", this);
			FileSystemSafetyNet.initializeSafetyNetForThread();

			blobService.getPermanentBlobService().registerJob(jobId);

			// first of all, get a user-code classloader
			// this may involve downloading the job's JAR files and/or classes
			LOG.info("Loading JAR files for task {}.", this);

			userCodeClassLoader = createUserCodeClassloader();
			final ExecutionConfig executionConfig = serializedExecutionConfig.deserializeValue(userCodeClassLoader);

			if (executionConfig.getTaskCancellationInterval() >= 0) {
				// override task cancellation interval from Flink config if set in ExecutionConfig
				taskCancellationInterval = executionConfig.getTaskCancellationInterval();
			}

			if (executionConfig.getTaskCancellationTimeout() >= 0) {
				// override task cancellation timeout from Flink config if set in ExecutionConfig
				taskCancellationTimeout = executionConfig.getTaskCancellationTimeout();
			}

			if (isCanceledOrFailed()) {
				throw new CancelTaskException();
			}

			// ----------------------------------------------------------------
			// register the task with the network stack
			// this operation may fail if the system does not have enough
			// memory to run the necessary data exchanges
			// the registration must also strictly be undone
			// ----------------------------------------------------------------

			LOG.info("Registering task at network: {}.", this);

			network.registerTask(this);

			// add metrics for buffers
			this.metrics.getIOMetricGroup().initializeBufferMetrics(this);

			// register detailed network metrics, if configured
			if (taskManagerConfig.getConfiguration().getBoolean(TaskManagerOptions.NETWORK_DETAILED_METRICS)) {
				// similar to MetricUtils.instantiateNetworkMetrics() but inside this IOMetricGroup
				MetricGroup networkGroup = this.metrics.getIOMetricGroup().addGroup("Network");
				MetricGroup outputGroup = networkGroup.addGroup("Output");
				MetricGroup inputGroup = networkGroup.addGroup("Input");

				// output metrics
				for (int i = 0; i < producedPartitions.length; i++) {
					ResultPartitionMetrics.registerQueueLengthMetrics(
						outputGroup.addGroup(i), producedPartitions[i]);
				}

				for (int i = 0; i < inputGates.length; i++) {
					InputGateMetrics.registerQueueLengthMetrics(
						inputGroup.addGroup(i), inputGates[i]);
				}
			}

			// next, kick off the background copying of files for the distributed cache
			try {
				for (Map.Entry<String, DistributedCache.DistributedCacheEntry> entry :
						DistributedCache.readFileInfoFromConfig(jobConfiguration)) {
					LOG.info("Obtaining local cache file for '{}'.", entry.getKey());
					Future<Path> cp = fileCache.createTmpFile(entry.getKey(), entry.getValue(), jobId, executionId);
					distributedCacheEntries.put(entry.getKey(), cp);
				}
			}
			catch (Exception e) {
				throw new Exception(
					String.format("Exception while adding files to distributed cache of task %s (%s).", taskNameWithSubtask, executionId), e);
			}

			if (isCanceledOrFailed()) {
				throw new CancelTaskException();
			}

			// ----------------------------------------------------------------
			//  call the user code initialization methods
			// ----------------------------------------------------------------

			TaskKvStateRegistry kvStateRegistry = network.createKvStateTaskRegistry(jobId, getJobVertexId());

			Environment env = new RuntimeEnvironment(
				jobId,
				vertexId,
				executionId,
				executionConfig,
				taskInfo,
				jobConfiguration,
				taskConfiguration,
				userCodeClassLoader,
				memoryManager,
				ioManager,
				broadcastVariableManager,
				taskStateManager,
				accumulatorRegistry,
				kvStateRegistry,
				inputSplitProvider,
				distributedCacheEntries,
				producedPartitions,
				inputGates,
				network.getTaskEventDispatcher(),
				checkpointResponder,
				taskManagerConfig,
				metrics,
				this);

			// now load and instantiate the task's invokable code
			invokable = loadAndInstantiateInvokable(userCodeClassLoader, nameOfInvokableClass, env);

			// ----------------------------------------------------------------
			//  actual task core work
			// ----------------------------------------------------------------

			// we must make strictly sure that the invokable is accessible to the cancel() call
			// by the time we switched to running.
			this.invokable = invokable;

			// switch to the RUNNING state, if that fails, we have been canceled/failed in the meantime
			if (!transitionState(ExecutionState.DEPLOYING, ExecutionState.RUNNING)) {
				throw new CancelTaskException();
			}

			// notify everyone that we switched to running
			taskManagerActions.updateTaskExecutionState(new TaskExecutionState(jobId, executionId, ExecutionState.RUNNING));

			// make sure the user code classloader is accessible thread-locally
			executingThread.setContextClassLoader(userCodeClassLoader);

			// run the invokable
			invokable.invoke();

			// make sure, we enter the catch block if the task leaves the invoke() method due
			// to the fact that it has been canceled
			if (isCanceledOrFailed()) {
				throw new CancelTaskException();
			}

			// ----------------------------------------------------------------
			//  finalization of a successful execution
			// ----------------------------------------------------------------

			// finish the produced partitions. if this fails, we consider the execution failed.
			for (ResultPartition partition : producedPartitions) {
				if (partition != null) {
					partition.finish();
				}
			}

			// try to mark the task as finished
			// if that fails, the task was canceled/failed in the meantime
			if (!transitionState(ExecutionState.RUNNING, ExecutionState.FINISHED)) {
				throw new CancelTaskException();
			}
		}
		catch (Throwable t) {

			// unwrap wrapped exceptions to make stack traces more compact
			if (t instanceof WrappingRuntimeException) {
				t = ((WrappingRuntimeException) t).unwrap();
			}

			// ----------------------------------------------------------------
			// the execution failed. either the invokable code properly failed, or
			// an exception was thrown as a side effect of cancelling
			// ----------------------------------------------------------------

			try {
				// check if the exception is unrecoverable
				if (ExceptionUtils.isJvmFatalError(t) ||
						(t instanceof OutOfMemoryError && taskManagerConfig.shouldExitJvmOnOutOfMemoryError())) {

					// terminate the JVM immediately
					// don't attempt a clean shutdown, because we cannot expect the clean shutdown to complete
					try {
						LOG.error("Encountered fatal error {} - terminating the JVM", t.getClass().getName(), t);
					} finally {
						Runtime.getRuntime().halt(-1);
					}
				}

				// transition into our final state. we should be either in DEPLOYING, RUNNING, CANCELING, or FAILED
				// loop for multiple retries during concurrent state changes via calls to cancel() or
				// to failExternally()
				while (true) {
					ExecutionState current = this.executionState;

					if (current == ExecutionState.RUNNING || current == ExecutionState.DEPLOYING) {
						if (t instanceof CancelTaskException) {
							if (transitionState(current, ExecutionState.CANCELED)) {
								cancelInvokable(invokable);
								break;
							}
						}
						else {
							if (transitionState(current, ExecutionState.FAILED, t)) {
								// proper failure of the task. record the exception as the root cause
								failureCause = t;
								cancelInvokable(invokable);

								break;
							}
						}
					}
					else if (current == ExecutionState.CANCELING) {
						if (transitionState(current, ExecutionState.CANCELED)) {
							break;
						}
					}
					else if (current == ExecutionState.FAILED) {
						// in state failed already, no transition necessary any more
						break;
					}
					// unexpected state, go to failed
					else if (transitionState(current, ExecutionState.FAILED, t)) {
						LOG.error("Unexpected state in task {} ({}) during an exception: {}.", taskNameWithSubtask, executionId, current);
						break;
					}
					// else fall through the loop and
				}
			}
			catch (Throwable tt) {
				String message = String.format("FATAL - exception in exception handler of task %s (%s).", taskNameWithSubtask, executionId);
				LOG.error(message, tt);
				notifyFatalError(message, tt);
			}
		}
		finally {
			try {
				LOG.info("Freeing task resources for {} ({}).", taskNameWithSubtask, executionId);

				// clear the reference to the invokable. this helps guard against holding references
				// to the invokable and its structures in cases where this Task object is still referenced
				this.invokable = null;

				// stop the async dispatcher.
				// copy dispatcher reference to stack, against concurrent release
				ExecutorService dispatcher = this.asyncCallDispatcher;
				if (dispatcher != null && !dispatcher.isShutdown()) {
					dispatcher.shutdownNow();
				}

				// free the network resources
				network.unregisterTask(this);

				// free memory resources
				if (invokable != null) {
					memoryManager.releaseAll(invokable);
				}

				// remove all of the tasks library resources
				libraryCache.unregisterTask(jobId, executionId);
				fileCache.releaseJob(jobId, executionId);
				blobService.getPermanentBlobService().releaseJob(jobId);

				// close and de-activate safety net for task thread
				LOG.info("Ensuring all FileSystem streams are closed for task {}", this);
				FileSystemSafetyNet.closeSafetyNetAndGuardedResourcesForThread();

				notifyFinalState();
			}
			catch (Throwable t) {
				// an error in the resource cleanup is fatal
				String message = String.format("FATAL - exception in resource cleanup of task %s (%s).", taskNameWithSubtask, executionId);
				LOG.error(message, t);
				notifyFatalError(message, t);
			}

			// un-register the metrics at the end so that the task may already be
			// counted as finished when this happens
			// errors here will only be logged
			try {
				metrics.close();
			}
			catch (Throwable t) {
				LOG.error("Error during metrics de-registration of task {} ({}).", taskNameWithSubtask, executionId, t);
			}
		}
	}

DataSourceTask.invoke()

  • Starts the chained transformation tasks:

    // start all chained tasks
    		BatchTask.openChainedTasks(this.chainedTasks, this);

```
this.chainedTasks = {ArrayList@7851} size = 3

0 = {ChainedFlatMapDriver@7850}
1 = {ChainedMapDriver@7988}
2 = {SynchronousChainedCombineDriver@7989}
```

  • Gets the input splits to read (the file block locations):

// get input splits to read
			final Iterator<InputSplit> splitIterator = getInputSplits();
  • The split location info, in the form path:start+length (byte offset plus length within the file):

    file:/opt/n_001_workspaces/bigdata/flink/flink-maven-scala-2/src/main/resources/data/line.txt:0+6
  • Loops over the splits, reading the data line by line:

    						while (!this.taskCanceled && !format.reachedEnd()) {
    							OT returned;
    							if ((returned = format.nextRecord(serializer.createInstance())) != null) {
    								output.collect(returned);
    							}
    						}
    
/**
	 * Create an Invokable task and set its environment.
	 *
	 * @param environment The environment assigned to this invokable.
	 */
	public DataSourceTask(Environment environment) {
		super(environment);
	}

	@Override
	public void invoke() throws Exception {
		// --------------------------------------------------------------------
		// Initialize
		// --------------------------------------------------------------------
		initInputFormat();

		LOG.debug(getLogString("Start registering input and output"));

		try {
			initOutputs(getUserCodeClassLoader());
		} catch (Exception ex) {
			throw new RuntimeException("The initialization of the DataSource's outputs caused an error: " +
					ex.getMessage(), ex);
		}

		LOG.debug(getLogString("Finished registering input and output"));

		// --------------------------------------------------------------------
		// Invoke
		// --------------------------------------------------------------------
		LOG.debug(getLogString("Starting data source operator"));

		RuntimeContext ctx = createRuntimeContext();

		final Counter numRecordsOut;
		{
			Counter tmpNumRecordsOut;
			try {
				OperatorIOMetricGroup ioMetricGroup = ((OperatorMetricGroup) ctx.getMetricGroup()).getIOMetricGroup();
				ioMetricGroup.reuseInputMetricsForTask();
				if (this.config.getNumberOfChainedStubs() == 0) {
					ioMetricGroup.reuseOutputMetricsForTask();
				}
				tmpNumRecordsOut = ioMetricGroup.getNumRecordsOutCounter();
			} catch (Exception e) {
				LOG.warn("An exception occurred during the metrics setup.", e);
				tmpNumRecordsOut = new SimpleCounter();
			}
			numRecordsOut = tmpNumRecordsOut;
		}
		
		Counter completedSplitsCounter = ctx.getMetricGroup().counter("numSplitsProcessed");

		if (RichInputFormat.class.isAssignableFrom(this.format.getClass())) {
			((RichInputFormat) this.format).setRuntimeContext(ctx);
			LOG.debug(getLogString("Rich Source detected. Initializing runtime context."));
			((RichInputFormat) this.format).openInputFormat();
			LOG.debug(getLogString("Rich Source detected. Opening the InputFormat."));
		}

		ExecutionConfig executionConfig = getExecutionConfig();

		boolean objectReuseEnabled = executionConfig.isObjectReuseEnabled();

		LOG.debug("DataSourceTask object reuse: " + (objectReuseEnabled ? "ENABLED" : "DISABLED") + ".");
		
		final TypeSerializer<OT> serializer = this.serializerFactory.getSerializer();
		
		try {
			// start all chained tasks
			BatchTask.openChainedTasks(this.chainedTasks, this);
			
			// get input splits to read
			final Iterator<InputSplit> splitIterator = getInputSplits();
			
			// for each assigned input split
			while (!this.taskCanceled && splitIterator.hasNext())
			{
				// get start and end
				final InputSplit split = splitIterator.next();

				LOG.debug(getLogString("Opening input split " + split.toString()));
				
				final InputFormat<OT, InputSplit> format = this.format;
			
				// open input format
				format.open(split);
	
				LOG.debug(getLogString("Starting to read input from split " + split.toString()));
				
				try {
					final Collector<OT> output = new CountingCollector<>(this.output, numRecordsOut);

					if (objectReuseEnabled) {
						OT reuse = serializer.createInstance();

						// as long as there is data to read
						while (!this.taskCanceled && !format.reachedEnd()) {

							OT returned;
							if ((returned = format.nextRecord(reuse)) != null) {
								output.collect(returned);
							}
						}
					} else {
						// as long as there is data to read
						while (!this.taskCanceled && !format.reachedEnd()) {
							OT returned;
							if ((returned = format.nextRecord(serializer.createInstance())) != null) {
								output.collect(returned);
							}
						}
					}

					if (LOG.isDebugEnabled() && !this.taskCanceled) {
						LOG.debug(getLogString("Closing input split " + split.toString()));
					}
				} finally {
					// close. We close here such that a regular close throwing an exception marks a task as failed.
					format.close();
				}
				completedSplitsCounter.inc();
			} // end for all input splits

			// close the collector. if it is a chaining task collector, it will close its chained tasks
			this.output.close();

			// close all chained tasks letting them report failure
			BatchTask.closeChainedTasks(this.chainedTasks, this);

		}
		catch (Exception ex) {
			// close the input, but do not report any exceptions, since we already have another root cause
			try {
				this.format.close();
			} catch (Throwable ignored) {}

			BatchTask.cancelChainedTasks(this.chainedTasks);

			ex = ExceptionInChainedStubException.exceptionUnwrap(ex);

			if (ex instanceof CancelTaskException) {
				// forward canceling exception
				throw ex;
			}
			else if (!this.taskCanceled) {
				// drop exception, if the task was canceled
				BatchTask.logAndThrowException(ex, this);
			}
		} finally {
			BatchTask.clearWriters(eventualOutputs);
			// --------------------------------------------------------------------
			// Closing
			// --------------------------------------------------------------------
			if (this.format != null && RichInputFormat.class.isAssignableFrom(this.format.getClass())) {
				((RichInputFormat) this.format).closeInputFormat();
				LOG.debug(getLogString("Rich Source detected. Closing the InputFormat."));
			}
		}

		if (!this.taskCanceled) {
			LOG.debug(getLogString("Finished data source operator"));
		}
		else {
			LOG.debug(getLogString("Data source operator cancelled"));
		}
	}

DelimitedInputFormat

DelimitedInputFormat.nextRecord

  • Calls DelimitedInputFormat.readLine()
	public OT nextRecord(OT record) throws IOException {
		if (readLine()) {
			return readRecord(record, this.currBuffer, this.currOffset, this.currLen);
		} else {
			this.end = true;
			return null;
		}
	}

DelimitedInputFormat.readLine()

  • The concrete logic for how file data is read lives here; a simplified standalone sketch of the idea follows the source below

protected final boolean readLine() throws IOException {
		if (this.stream == null || this.overLimit) {
			return false;
		}

		int countInWrapBuffer = 0;

		// position of matching positions in the delimiter byte array
		int delimPos = 0;

		while (true) {
			if (this.readPos >= this.limit) {
				// readBuffer is completely consumed. Fill it again but keep partially read delimiter bytes.
				if (!fillBuffer(delimPos)) {
					int countInReadBuffer = delimPos;
					if (countInWrapBuffer + countInReadBuffer > 0) {
						// we have bytes left to emit
						if (countInReadBuffer > 0) {
							// we have bytes left in the readBuffer. Move them into the wrapBuffer
							if (this.wrapBuffer.length - countInWrapBuffer < countInReadBuffer) {
								// reallocate
								byte[] tmp = new byte[countInWrapBuffer + countInReadBuffer];
								System.arraycopy(this.wrapBuffer, 0, tmp, 0, countInWrapBuffer);
								this.wrapBuffer = tmp;
							}

							// copy readBuffer bytes to wrapBuffer
							System.arraycopy(this.readBuffer, 0, this.wrapBuffer, countInWrapBuffer, countInReadBuffer);
							countInWrapBuffer += countInReadBuffer;
						}

						this.offset += countInWrapBuffer;
						setResult(this.wrapBuffer, 0, countInWrapBuffer);
						return true;
					} else {
						return false;
					}
				}
			}

			int startPos = this.readPos - delimPos;
			int count;

			// Search for next occurrence of delimiter in read buffer.
			while (this.readPos < this.limit && delimPos < this.delimiter.length) {
				if ((this.readBuffer[this.readPos]) == this.delimiter[delimPos]) {
					// Found the expected delimiter character. Continue looking for the next character of delimiter.
					delimPos++;
				} else {
					// Delimiter does not match.
					// We have to reset the read position to the character after the first matching character
					//   and search for the whole delimiter again.
					readPos -= delimPos;
					delimPos = 0;
				}
				readPos++;
			}

			// check why we dropped out
			if (delimPos == this.delimiter.length) {
				// we found a delimiter
				int readBufferBytesRead = this.readPos - startPos;
				this.offset += countInWrapBuffer + readBufferBytesRead;
				count = readBufferBytesRead - this.delimiter.length;

				// copy to byte array
				if (countInWrapBuffer > 0) {
					// check wrap buffer size
					if (this.wrapBuffer.length < countInWrapBuffer + count) {
						final byte[] nb = new byte[countInWrapBuffer + count];
						System.arraycopy(this.wrapBuffer, 0, nb, 0, countInWrapBuffer);
						this.wrapBuffer = nb;
					}
					if (count >= 0) {
						System.arraycopy(this.readBuffer, 0, this.wrapBuffer, countInWrapBuffer, count);
					}
					setResult(this.wrapBuffer, 0, countInWrapBuffer + count);
					return true;
				} else {
					setResult(this.readBuffer, startPos, count);
					return true;
				}
			} else {
				// we reached the end of the readBuffer
				count = this.limit - startPos;
				
				// check against the maximum record length
				if (((long) countInWrapBuffer) + count > this.lineLengthLimit) {
					throw new IOException("The record length exceeded the maximum record length (" + 
							this.lineLengthLimit + ").");
				}

				// Compute number of bytes to move to wrapBuffer
				// Chars of partially read delimiter must remain in the readBuffer. We might need to go back.
				int bytesToMove = count - delimPos;
				// ensure wrapBuffer is large enough
				if (this.wrapBuffer.length - countInWrapBuffer < bytesToMove) {
					// reallocate
					byte[] tmp = new byte[Math.max(this.wrapBuffer.length * 2, countInWrapBuffer + bytesToMove)];
					System.arraycopy(this.wrapBuffer, 0, tmp, 0, countInWrapBuffer);
					this.wrapBuffer = tmp;
				}

				// copy readBuffer to wrapBuffer (except delimiter chars)
				System.arraycopy(this.readBuffer, startPos, this.wrapBuffer, countInWrapBuffer, bytesToMove);
				countInWrapBuffer += bytesToMove;
				// move delimiter chars to the beginning of the readBuffer
				System.arraycopy(this.readBuffer, this.readPos - delimPos, this.readBuffer, 0, delimPos);

			}
		}
	}
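To make the scanning logic concrete, here is a minimal, self-contained sketch (an illustration of the same idea, not Flink code; all names are made up) that scans a byte buffer for the delimiter and emits each complete record, including the backtracking step for partially matched delimiters:

```
import java.nio.charset.StandardCharsets;

public class DelimiterScanDemo {

	public static void main(String[] args) {
		// the sample input used in this article: two lines of words
		byte[] buffer = "c a a\nb c a".getBytes(StandardCharsets.UTF_8);
		byte[] delimiter = "\n".getBytes(StandardCharsets.UTF_8);

		int recordStart = 0;
		int delimPos = 0;

		for (int readPos = 0; readPos < buffer.length; readPos++) {
			if (buffer[readPos] == delimiter[delimPos]) {
				delimPos++;
				if (delimPos == delimiter.length) {
					// found a full delimiter: emit the record that precedes it
					emit(buffer, recordStart, readPos - delimiter.length + 1);
					recordStart = readPos + 1;
					delimPos = 0;
				}
			} else {
				// partial delimiter match failed: back up and restart matching
				readPos -= delimPos;
				delimPos = 0;
			}
		}
		// emit the trailing record if the data does not end with a delimiter
		if (recordStart < buffer.length) {
			emit(buffer, recordStart, buffer.length);
		}
	}

	private static void emit(byte[] buf, int start, int end) {
		System.out.println(new String(buf, start, end - start, StandardCharsets.UTF_8));
	}
}
```

Run against the sample input, this prints "c a a" and then "b c a", mirroring how readLine() produces one record per delimiter.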