Recently I have been working on deploying the Flink jobs of our rule-engine project. Our existing infrastructure already includes a Hadoop cluster, so we want to run the Flink jobs on YARN. Flink on YARN currently supports two deployment modes:
1. Session mode (yarn-session): a long-running Flink cluster is started on YARN first, and jobs are then submitted into it;
2. Per-job mode: each job brings up its own Flink cluster on YARN.
What is the essential difference between the two? In session mode, once Flink has claimed its resources from YARN they are fixed; when a new job is submitted and the session lacks free capacity, it can only wait for an earlier job to finish and release its resources, otherwise the job cannot be submitted. In per-job mode, each job corresponds to its own application on YARN and requests resources for itself until it completes, so it does not affect the next job unless YARN itself has no resources left at all. This post analyzes Flink's per-job deployment mode from the source code (Flink version: 1.9.0).
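For reference, here is how the two modes are typically driven from the command line (a sketch; exact flags differ across versions, so check ./bin/yarn-session.sh -h and ./bin/flink run -h):
# session mode: start a long-running Flink cluster on YARN first, then submit jobs into it
./bin/yarn-session.sh -jm 1024m -tm 4096m
./bin/flink run ./examples/batch/WordCount.jar
# per-job mode: each submission brings up its own YARN application
./bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar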
To develop a Flink REST client, the first task is to get a clear picture of the submission flow of Flink per-job on YARN; only then can the development work proceed. Let's start with the flink script in Flink's bin directory.
./bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar
This is the example from the Flink documentation for submitting a single job to YARN.
Open the flink script in the bin directory and look at its last line:
# Add HADOOP_CLASSPATH to allow the usage of Hadoop file systems
exec $JAVA_RUN $JVM_ARGS "${log_setting[@]}" -classpath "`manglePathList "$CC_CLASSPATH:$INTERNAL_HADOOP_CLASSPATHS"`" org.apache.flink.client.cli.CliFrontend "$@"
So the entry class for submitting a job from the command line is org.apache.flink.client.cli.CliFrontend.
Let's look at the main method of org.apache.flink.client.cli.CliFrontend:
/**
* Submits the job based on the arguments.
*/
public static void main(final String[] args) {
EnvironmentInformation.logEnvironmentInfo(LOG, "Command Line Client", args);
// 1. find the configuration directory
final String configurationDirectory = getConfigurationDirectoryFromEnv();
// 2. load the global configuration
final Configuration configuration = GlobalConfiguration.loadConfiguration(configurationDirectory);
// 3. load the custom command lines
final List<CustomCommandLine<?>> customCommandLines = loadCustomCommandLines(
configuration,
configurationDirectory);
try {
final CliFrontend cli = new CliFrontend(
configuration,
customCommandLines);
SecurityUtils.install(new SecurityConfiguration(cli.configuration));
int retCode = SecurityUtils.getInstalledContext()
.runSecured(() -> cli.parseParameters(args));
System.exit(retCode);
}
catch (Throwable t) {
final Throwable strippedThrowable = ExceptionUtils.stripException(t, UndeclaredThrowableException.class);
LOG.error("Fatal error while running command line interface.", strippedThrowable);
strippedThrowable.printStackTrace();
System.exit(31);
}
}
This method mainly does three things: it locates the configuration directory and loads the global configuration, loads the custom command lines, and finally creates a CliFrontend object and invokes its parseParameters method inside the installed security context.
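Since our end goal is to submit jobs from our own client code, it is worth noting that this entry point can also be driven programmatically. Below is a minimal sketch, assuming FLINK_CONF_DIR is set and skipping the SecurityUtils setup of the real main (the class name EmbeddedSubmit is made up; the Flink calls match the 1.9 API):
import java.util.List;
import org.apache.flink.client.cli.CliFrontend;
import org.apache.flink.client.cli.CustomCommandLine;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.GlobalConfiguration;

public class EmbeddedSubmit {
    public static void main(String[] args) throws Exception {
        // mirror steps 1-3 of CliFrontend.main
        final String confDir = CliFrontend.getConfigurationDirectoryFromEnv();
        final Configuration config = GlobalConfiguration.loadConfiguration(confDir);
        final List<CustomCommandLine<?>> cmdLines = CliFrontend.loadCustomCommandLines(config, confDir);
        final CliFrontend cli = new CliFrontend(config, cmdLines);
        // equivalent to: ./bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar
        int ret = cli.parseParameters(new String[] {
            "run", "-m", "yarn-cluster", "./examples/batch/WordCount.jar"});
        System.exit(ret);
    }
}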
Next, let's see what parseParameters does:
/**
* Parses the command line arguments and starts the requested action.
*
* @param args command line arguments of the client.
* @return The return code of the program
*/
public int parseParameters(String[] args) {
......
// get action
String action = args[0];
// remove action from parameters
final String[] params = Arrays.copyOfRange(args, 1, args.length);
try {
// do action
switch (action) {
case ACTION_RUN:
run(params);
return 0;
case ACTION_LIST:
list(params);
return 0;
case ACTION_INFO:
info(params);
return 0;
case ACTION_CANCEL:
cancel(params);
return 0;
case ACTION_STOP:
stop(params);
return 0;
case ACTION_SAVEPOINT:
savepoint(params);
return 0;
case "-h":
case "--help":
CliFrontendParser.printHelp(customCommandLines);
return 0;
case "-v":
case "--version":
String version = EnvironmentInformation.getVersion();
String commitID = EnvironmentInformation.getRevisionInformation().commitId;
System.out.print("Version: " + version);
System.out.println(commitID.equals(EnvironmentInformation.UNKNOWN) ? "" : ", Commit ID: " + commitID);
return 0;
default:
System.out.printf("\"%s\" is not a valid action.\n", action);
System.out.println();
System.out.println("Valid actions are \"run\", \"list\", \"info\", \"savepoint\", \"stop\", or \"cancel\".");
System.out.println();
System.out.println("Specify the version option (-v or --version) to print Flink version.");
System.out.println();
System.out.println("Specify the help option (-h or --help) to get help on the command.");
return 1;
}
} catch (CliArgsException ce) {
return handleArgException(ce);
} catch (ProgramParametrizationException ppe) {
return handleParametrizationException(ppe);
} catch (ProgramMissingJobException pmje) {
return handleMissingJobException();
} catch (Exception e) {
return handleError(e);
}
}
This method essentially does one thing: it takes the first argument as the action type and dispatches to the corresponding handler. As the switch above shows, the valid actions are run, list, info, cancel, stop and savepoint, plus the help (-h/--help) and version (-v/--version) options.
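Each action is a subcommand of the same script; for example (standard CLI usage, <jobId> and the savepoint directory are placeholders):
./bin/flink list
./bin/flink info ./examples/batch/WordCount.jar
./bin/flink cancel <jobId>
./bin/flink savepoint <jobId> hdfs:///flink/savepoints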
This time we are interested in how a job gets submitted, so let's look at the run method:
/**
* Executes the run action.
*
* @param args Command line arguments for the run action.
*/
protected void run(String[] args) throws Exception {
LOG.info("Running 'run' command.");
final Options commandOptions = CliFrontendParser.getRunCommandOptions();
final Options commandLineOptions = CliFrontendParser.mergeOptions(commandOptions, customCommandLineOptions);
final CommandLine commandLine = CliFrontendParser.parse(commandLineOptions, args, true);
final RunOptions runOptions = new RunOptions(commandLine);
// evaluate help flag
if (runOptions.isPrintHelp()) {
CliFrontendParser.printHelpForRun(customCommandLines);
return;
}
if (!runOptions.isPython()) {
// Java program should be specified a JAR file
if (runOptions.getJarFilePath() == null) {
throw new CliArgsException("Java program should be specified a JAR file.");
}
}
final PackagedProgram program;
try {
LOG.info("Building program from JAR file");
program = buildProgram(runOptions);
}
catch (FileNotFoundException e) {
throw new CliArgsException("Could not build the program from JAR file.", e);
}
final CustomCommandLine<?> customCommandLine = getActiveCustomCommandLine(commandLine);
try {
runProgram(customCommandLine, commandLine, runOptions, program);
} finally {
program.deleteExtractedLibraries();
}
}
This method mainly does three things:
1. builds a PackagedProgram object from the jar submitted by the user;
2. obtains the currently active custom command line (CustomCommandLine);
3. calls runProgram.
Let's go through these three in turn. First we need to understand what the PackagedProgram class is for, so back to the source:
/**
* This class encapsulates a program, packaged in a jar file. It supplies
* functionality to extract nested libraries, search for the program entry point, and extract
* a program plan.
*/
public class PackagedProgram {
// ... (class body omitted) ...
}
Most of the class body is omitted above; what matters is the class-level comment. From it we know this class does three things: extract nested libraries, search for the program entry point, and extract the program plan.
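As a quick illustration, building one by hand looks roughly like this (a sketch against the 1.9 API; the jar path and arguments are placeholders, and the constructor throws ProgramInvocationException):
import java.io.File;
import org.apache.flink.client.program.PackagedProgram;

// build a PackagedProgram from a user jar, much like buildProgram(runOptions) does;
// the entry class is read from the jar's manifest when not specified explicitly
PackagedProgram program = new PackagedProgram(
    new File("./examples/batch/WordCount.jar"),   // placeholder jar path
    "--input", "in.txt", "--output", "out.txt");  // placeholder program arguments
System.out.println("Entry point: " + program.getMainClassName());
program.deleteExtractedLibraries();  // clean up nested jars extracted to temp directories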
CustomCommandLine is an interface; its implementations are DefaultCLI, DummyCustomCommandLine and FlinkYarnSessionCli. So which one does the run method actually end up with? Let's go back to CliFrontend's main method:
// 3. load the custom command lines
final List<CustomCommandLine<?>> customCommandLines = loadCustomCommandLines(
configuration,
configurationDirectory);
main calls loadCustomCommandLines, whose body looks like this:
public static List<CustomCommandLine<?>> loadCustomCommandLines(Configuration configuration, String configurationDirectory) {
List<CustomCommandLine<?>> customCommandLines = new ArrayList<>(2);
// Command line interface of the YARN session, with a special initialization here
// to prefix all options with y/yarn.
// Tips: DefaultCLI must be added at last, because getActiveCustomCommandLine(..) will get the
// active CustomCommandLine in order and DefaultCLI isActive always return true.
final String flinkYarnSessionCLI = "org.apache.flink.yarn.cli.FlinkYarnSessionCli";
try {
customCommandLines.add(
loadCustomCommandLine(flinkYarnSessionCLI,
configuration,
configurationDirectory,
"y",
"yarn"));
} catch (NoClassDefFoundError | Exception e) {
LOG.warn("Could not load CLI class {}.", flinkYarnSessionCLI, e);
}
customCommandLines.add(new DefaultCLI(configuration));
return customCommandLines;
}
At this point we know that, for our yarn-cluster submission, the CustomCommandLine used inside run is FlinkYarnSessionCli.
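Why FlinkYarnSessionCli and not DefaultCLI? getActiveCustomCommandLine walks the list in registration order and returns the first entry whose isActive(commandLine) is true. FlinkYarnSessionCli reports itself active when -m yarn-cluster or a YARN application id is present, while DefaultCLI is always active, which is exactly why the Tips comment above insists it must be added last. The selection loop looks roughly like this (paraphrased from the 1.9 source; details may differ):
public CustomCommandLine<?> getActiveCustomCommandLine(CommandLine commandLine) {
    for (CustomCommandLine<?> cli : customCommandLines) {
        if (cli.isActive(commandLine)) {
            return cli;
        }
    }
    throw new IllegalStateException("No command-line ran.");
}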
Next, let's see what the runProgram method invoked from run actually does.
private <T> void runProgram(
CustomCommandLine<T> customCommandLine,
CommandLine commandLine,
RunOptions runOptions,
PackagedProgram program) throws ProgramInvocationException, FlinkException {
final ClusterDescriptor<T> clusterDescriptor = customCommandLine.createClusterDescriptor(commandLine);
try {
final T clusterId = customCommandLine.getClusterId(commandLine);
final ClusterClient<T> client;
// directly deploy the job if the cluster is started in job mode and detached
if (clusterId == null && runOptions.getDetachedMode()) {
int parallelism = runOptions.getParallelism() == -1 ? defaultParallelism : runOptions.getParallelism();
final JobGraph jobGraph = PackagedProgramUtils.createJobGraph(program, configuration, parallelism);
final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
client = clusterDescriptor.deployJobCluster(
clusterSpecification,
jobGraph,
runOptions.getDetachedMode());
logAndSysout("Job has been submitted with JobID " + jobGraph.getJobID());
try {
client.shutdown();
} catch (Exception e) {
LOG.info("Could not properly shut down the client.", e);
}
} else {
final Thread shutdownHook;
if (clusterId != null) {
client = clusterDescriptor.retrieve(clusterId);
shutdownHook = null;
} else {
// also in job mode we have to deploy a session cluster because the job
// might consist of multiple parts (e.g. when using collect)
final ClusterSpecification clusterSpecification = customCommandLine.getClusterSpecification(commandLine);
client = clusterDescriptor.deploySessionCluster(clusterSpecification);
// if not running in detached mode, add a shutdown hook to shut down cluster if client exits
// there's a race-condition here if cli is killed before shutdown hook is installed
if (!runOptions.getDetachedMode() && runOptions.isShutdownOnAttachedExit()) {
shutdownHook = ShutdownHookUtil.addShutdownHook(client::shutDownCluster, client.getClass().getSimpleName(), LOG);
} else {
shutdownHook = null;
}
}
try {
client.setPrintStatusDuringExecution(runOptions.getStdoutLogging());
client.setDetached(runOptions.getDetachedMode());
int userParallelism = runOptions.getParallelism();
LOG.debug("User parallelism is set to {}", userParallelism);
if (ExecutionConfig.PARALLELISM_DEFAULT == userParallelism) {
userParallelism = defaultParallelism;
}
executeProgram(program, client, userParallelism);
} finally {
if (clusterId == null && !client.isDetached()) {
// terminate the cluster only if we have started it before and if it's not detached
try {
client.shutDownCluster();
} catch (final Exception e) {
LOG.info("Could not properly terminate the Flink cluster.", e);
}
if (shutdownHook != null) {
// we do not need the hook anymore as we have just tried to shutdown the cluster.
ShutdownHookUtil.removeShutdownHook(shutdownHook, client.getClass().getSimpleName(), LOG);
}
}
try {
client.shutdown();
} catch (Exception e) {
LOG.info("Could not properly shut down the client.", e);
}
}
}
} finally {
try {
clusterDescriptor.close();
} catch (Exception e) {
LOG.info("Could not properly close the cluster descriptor.", e);
}
}
}
This method carries a fair amount of business logic and is one of the key methods, so let's untangle its overall flow first:
1. create a ClusterDescriptor from the active custom command line;
2. if no clusterId was given and detached mode is enabled, build a JobGraph from the PackagedProgram and deploy a dedicated job cluster via clusterDescriptor.deployJobCluster;
3. otherwise, either retrieve the client of an existing cluster (clusterId != null) or deploy a new session cluster, then run the user program through executeProgram;
4. finally, shut down the client (and, for a freshly started non-detached cluster, the cluster itself) and close the cluster descriptor.
That is the whole execution flow of the method. Here we only follow the submission path; some of the details (for example, what exactly a JobGraph is) will get their own posts later. The two calls we care about are clusterDescriptor.deployJobCluster in detached mode and executeProgram in attached mode. Let's start with deployJobCluster; the clusterDescriptor here is in fact an instance of YarnClusterDescriptor.
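Before diving in, note that the detached branch first compiles the user program into a JobGraph without executing it. Isolated from the surrounding code, that step is just the following (program, configuration and parallelism are assumed to be in scope, as in runProgram):
import org.apache.flink.client.program.PackagedProgramUtils;
import org.apache.flink.runtime.jobgraph.JobGraph;

// compile the user program into a JobGraph without running it,
// the same call made in the detached per-job branch above
JobGraph jobGraph = PackagedProgramUtils.createJobGraph(program, configuration, parallelism);
System.out.println("JobID assigned at compile time: " + jobGraph.getJobID());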
As shown above, per-job deployment calls clusterDescriptor.deployJobCluster; here is its implementation:
@Override
public ClusterClient<ApplicationId> deployJobCluster(
ClusterSpecification clusterSpecification,
JobGraph jobGraph,
boolean detached) throws ClusterDeploymentException {
// this is required because the slots are allocated lazily
jobGraph.setAllowQueuedScheduling(true);
try {
return deployInternal(
clusterSpecification,
"Flink per-job cluster",
getYarnJobClusterEntrypoint(),
jobGraph,
detached);
} catch (Exception e) {
throw new ClusterDeploymentException("Could not deploy Yarn job cluster.", e);
}
}
deployJobCluster in turn calls deployInternal, which takes the following parameters:
1. clusterSpecification: the initial specification (memory, slots, number of TaskManagers) of the Flink cluster to deploy;
2. applicationName: the name of the YARN application to start ("Flink per-job cluster" here);
3. yarnClusterEntrypoint: the class name of the YARN cluster entry point (getYarnJobClusterEntrypoint());
4. jobGraph: the job graph deployed with the cluster, or null if there is none;
5. detached: whether the cluster should be started in detached mode.
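Putting the pieces together, a programmatic per-job deployment, which is exactly what a custom submission client needs, can be sketched as follows. The YarnClusterDescriptor constructor signature below matches 1.9 but should be verified against your version; flinkConfiguration, confDir and jobGraph are assumed to have been prepared as shown earlier:
import org.apache.flink.client.deployment.ClusterSpecification;
import org.apache.flink.client.program.ClusterClient;
import org.apache.flink.yarn.YarnClusterDescriptor;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// a YarnClient configured from the yarn-site.xml on the classpath
YarnClient yarnClient = YarnClient.createYarnClient();
YarnConfiguration yarnConfiguration = new YarnConfiguration();
yarnClient.init(yarnConfiguration);
yarnClient.start();

YarnClusterDescriptor descriptor = new YarnClusterDescriptor(
    flinkConfiguration, yarnConfiguration, confDir, yarnClient, false);

// the resource envelope that deployInternal later validates against the cluster
ClusterSpecification spec = new ClusterSpecification.ClusterSpecificationBuilder()
    .setMasterMemoryMB(1024)
    .setTaskManagerMemoryMB(2048)
    .setNumberTaskManagers(1)
    .setSlotsPerTaskManager(2)
    .createClusterSpecification();

// detached per-job deployment, the same call the CLI makes
ClusterClient<ApplicationId> client = descriptor.deployJobCluster(spec, jobGraph, true);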
Now let's look at the concrete implementation of deployInternal:
/**
* This method will block until the ApplicationMaster/JobManager have been deployed on YARN.
*
* @param clusterSpecification Initial cluster specification for the Flink cluster to be deployed
* @param applicationName name of the Yarn application to start
* @param yarnClusterEntrypoint Class name of the Yarn cluster entry point.
* @param jobGraph A job graph which is deployed with the Flink cluster, {@code null} if none
* @param detached True if the cluster should be started in detached mode
*/
protected ClusterClient<ApplicationId> deployInternal(
ClusterSpecification clusterSpecification,
String applicationName,
String yarnClusterEntrypoint,
@Nullable JobGraph jobGraph,
boolean detached) throws Exception {
// ------------------ Check if configuration is valid --------------------
validateClusterSpecification(clusterSpecification);
if (UserGroupInformation.isSecurityEnabled()) {
// note: UGI::hasKerberosCredentials inaccurately reports false
// for logins based on a keytab (fixed in Hadoop 2.6.1, see HADOOP-10786),
// so we check only in ticket cache scenario.
boolean useTicketCache = flinkConfiguration.getBoolean(SecurityOptions.KERBEROS_LOGIN_USETICKETCACHE);
UserGroupInformation loginUser = UserGroupInformation.getCurrentUser();
if (loginUser.getAuthenticationMethod() == UserGroupInformation.AuthenticationMethod.KERBEROS
&& useTicketCache && !loginUser.hasKerberosCredentials()) {
LOG.error("Hadoop security with Kerberos is enabled but the login user does not have Kerberos credentials");
throw new RuntimeException("Hadoop security with Kerberos is enabled but the login user " +
"does not have Kerberos credentials");
}
}
isReadyForDeployment(clusterSpecification);
// ------------------ Check if the specified queue exists --------------------
checkYarnQueues(yarnClient);
// ------------------ Add dynamic properties to local flinkConfiguraton ------
Map<String, String> dynProperties = getDynamicProperties(dynamicPropertiesEncoded);
for (Map.Entry<String, String> dynProperty : dynProperties.entrySet()) {
flinkConfiguration.setString(dynProperty.getKey(), dynProperty.getValue());
}
// ------------------ Check if the YARN ClusterClient has the requested resources --------------
// Create application via yarnClient
final YarnClientApplication yarnApplication = yarnClient.createApplication();
final GetNewApplicationResponse appResponse = yarnApplication.getNewApplicationResponse();
Resource maxRes = appResponse.getMaximumResourceCapability();
final ClusterResourceDescription freeClusterMem;
try {
freeClusterMem = getCurrentFreeClusterResources(yarnClient);
} catch (YarnException | IOException e) {
failSessionDuringDeployment(yarnClient, yarnApplication);
throw new YarnDeploymentException("Could not retrieve information about free cluster resources.", e);
}
final int yarnMinAllocationMB = yarnConfiguration.getInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 0);
final ClusterSpecification validClusterSpecification;
try {
validClusterSpecification = validateClusterResources(
clusterSpecification,
yarnMinAllocationMB,
maxRes,
freeClusterMem);
} catch (YarnDeploymentException yde) {
failSessionDuringDeployment(yarnClient, yarnApplication);
throw yde;
}
LOG.info("Cluster specification: {}", validClusterSpecification);
final ClusterEntrypoint.ExecutionMode executionMode = detached ?
ClusterEntrypoint.ExecutionMode.DETACHED
: ClusterEntrypoint.ExecutionMode.NORMAL;
flinkConfiguration.setString(ClusterEntrypoint.EXECUTION_MODE, executionMode.toString());
ApplicationReport report = startAppMaster(
flinkConfiguration,
applicationName,
yarnClusterEntrypoint,
jobGraph,
yarnClient,
yarnApplication,
validClusterSpecification);
String host = report.getHost();
int port = report.getRpcPort();
// Correctly initialize the Flink config
flinkConfiguration.setString(JobManagerOptions.ADDRESS, host);
flinkConfiguration.setInteger(JobManagerOptions.PORT, port);
flinkConfiguration.setString(RestOptions.ADDRESS, host);
flinkConfiguration.setInteger(RestOptions.PORT, port);
// the Flink cluster is deployed in YARN. Represent cluster
return createYarnClusterClient(
this,
validClusterSpecification.getNumberTaskManagers(),
validClusterSpecification.getSlotsPerTaskManager(),
report,
flinkConfiguration,
true);
}
This method is fairly long, but it does only a handful of things:
1. validates the cluster specification and, with Kerberos enabled, the login credentials, then checks that the deployment prerequisites and the specified YARN queue exist;
2. merges the dynamic properties into the local flinkConfiguration;
3. creates a YARN application through yarnClient and validates the requested resources against the cluster's maximum and currently free capacity (validateClusterResources);
4. records the execution mode (DETACHED or NORMAL) in the configuration and calls startAppMaster to launch the ApplicationMaster/JobManager;
5. writes the JobManager host and port reported by YARN back into the Flink configuration and wraps everything into a ClusterClient via createYarnClusterClient.
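For orientation, these inputs map directly onto the YARN-prefixed CLI flags (flag names per the 1.9 client; verify with ./bin/flink run -h):
# -yjm/-ytm/-ys feed the ClusterSpecification, -yqu is the queue checked by
# checkYarnQueues, and -yD entries become the dynamic properties merged into
# flinkConfiguration
./bin/flink run -m yarn-cluster -yjm 1024m -ytm 2048m -ys 2 -yqu default -yD env.java.opts="-XX:+UseG1GC" ./examples/batch/WordCount.jar
The heavy lifting, though, happens in startAppMaster, so let's look at it next: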
public ApplicationReport startAppMaster(
Configuration configuration,
String applicationName,
String yarnClusterEntrypoint,
JobGraph jobGraph,
YarnClient yarnClient,
YarnClientApplication yarnApplication,
ClusterSpecification clusterSpecification) throws Exception {
// ------------------ Initialize the file systems -------------------------
org.apache.flink.core.fs.FileSystem.initialize(
configuration,
PluginUtils.createPluginManagerFromRootFolder(configuration));
// initialize file system
// Copy the application master jar to the filesystem
// Create a local resource to point to the destination jar path
final FileSystem fs = FileSystem.get(yarnConfiguration);
final Path homeDir = fs.getHomeDirectory();
// hard coded check for the GoogleHDFS client because its not overriding the getScheme() method.
if (!fs.getClass().getSimpleName().equals("GoogleHadoopFileSystem") &&
fs.getScheme().startsWith("file")) {
LOG.warn("The file system scheme is '" + fs.getScheme() + "'. This indicates that the "
+ "specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values."
+ "The Flink YARN client needs to store its files in a distributed file system");
}
ApplicationSubmissionContext appContext = yarnApplication.getApplicationSubmissionContext();
Set<File> systemShipFiles = new HashSet<>(shipFiles.size());
for (File file : shipFiles) {
systemShipFiles.add(file.getAbsoluteFile());
}
//check if there is a logback or log4j file
File logbackFile = new File(configurationDirectory + File.separator + CONFIG_FILE_LOGBACK_NAME);
final boolean hasLogback = logbackFile.exists();
if (hasLogback) {
systemShipFiles.add(logbackFile);
}
File log4jFile = new File(configurationDirectory + File.separator + CONFIG_FILE_LOG4J_NAME);
final boolean hasLog4j = log4jFile.exists();
if (hasLog4j) {
systemShipFiles.add(log4jFile);
if (hasLogback) {
// this means there is already a logback configuration file --> fail
LOG.warn("The configuration directory ('" + configurationDirectory + "') contains both LOG4J and " +
"Logback configuration files. Please delete or rename one of them.");
}
}
addEnvironmentFoldersToShipFiles(systemShipFiles);
// Set-up ApplicationSubmissionContext for the application
final ApplicationId appId = appContext.getApplicationId();
// ------------------ Add Zookeeper namespace to local flinkConfiguraton ------
String zkNamespace = getZookeeperNamespace();
// no user specified cli argument for namespace?
if (zkNamespace == null || zkNamespace.isEmpty()) {
// namespace defined in config? else use applicationId as default.
zkNamespace = configuration.getString(HighAvailabilityOptions.HA_CLUSTER_ID, String.valueOf(appId));
setZookeeperNamespace(zkNamespace);
}
configuration.setString(HighAvailabilityOptions.HA_CLUSTER_ID, zkNamespace);
if (HighAvailabilityMode.isHighAvailabilityModeActivated(configuration)) {
// activate re-execution of failed applications
appContext.setMaxAppAttempts(
configuration.getInteger(
YarnConfigOptions.APPLICATION_ATTEMPTS.key(),
YarnConfiguration.DEFAULT_RM_AM_MAX_ATTEMPTS));
activateHighAvailabilitySupport(appContext);
} else {
// set number of application retries to 1 in the default case
appContext.setMaxAppAttempts(
configuration.getInteger(
YarnConfigOptions.APPLICATION_ATTEMPTS.key(),
1));
}
final Set<File> userJarFiles = (jobGraph == null)
// not per-job submission
? Collections.emptySet()
// add user code jars from the provided JobGraph
: jobGraph.getUserJars().stream().map(f -> f.toUri()).map(File::new).collect(Collectors.toSet());
// local resource map for Yarn
final Map<String, LocalResource> localResources = new HashMap<>(2 + systemShipFiles.size() + userJarFiles.size());
// list of remote paths (after upload)
final List<Path> paths = new ArrayList<>(2 + systemShipFiles.size() + userJarFiles.size());
// ship list that enables reuse of resources for task manager containers
StringBuilder envShipFileList = new StringBuilder();
// upload and register ship files
List<String> systemClassPaths = uploadAndRegisterFiles(
systemShipFiles,
fs,
homeDir,
appId,
paths,
localResources,
envShipFileList);
final List<String> userClassPaths = uploadAndRegisterFiles(
userJarFiles,
fs,
homeDir,
appId,
paths,
localResources,
envShipFileList);
if (userJarInclusion == YarnConfigOptions.UserJarInclusion.ORDER) {
systemClassPaths.addAll(userClassPaths);
}
// normalize classpath by sorting
Collections.sort(systemClassPaths);
Collections.sort(userClassPaths);
// classpath assembler
StringBuilder classPathBuilder = new StringBuilder();
if (userJarInclusion == YarnConfigOptions.UserJarInclusion.FIRST) {
for (String userClassPath : userClassPaths) {
classPathBuilder.append(userClassPath).append(File.pathSeparator);
}
}
for (String classPath : systemClassPaths) {
classPathBuilder.append(classPath).append(File.pathSeparator);
}
// Setup jar for ApplicationMaster
Path remotePathJar = setupSingleLocalResource(
"flink.jar",
fs,
appId,
flinkJarPath,
localResources,
homeDir,
"");
// set the right configuration values for the TaskManager
configuration.setInteger(
TaskManagerOptions.NUM_TASK_SLOTS,
clusterSpecification.getSlotsPerTaskManager());
configuration.setString(
TaskManagerOptions.TASK_MANAGER_HEAP_MEMORY,
clusterSpecification.getTaskManagerMemoryMB() + "m");
// Upload the flink configuration
// write out configuration file
File tmpConfigurationFile = File.createTempFile(appId + "-flink-conf.yaml", null);
tmpConfigurationFile.deleteOnExit();
BootstrapTools.writeConfiguration(configuration, tmpConfigurationFile);
Path remotePathConf = setupSingleLocalResource(
"flink-conf.yaml",
fs,
appId,
new Path(tmpConfigurationFile.getAbsolutePath()),
localResources,
homeDir,
"");
paths.add(remotePathJar);
classPathBuilder.append("flink.jar").append(File.pathSeparator);
paths.add(remotePathConf);
classPathBuilder.append("flink-conf.yaml").append(File.pathSeparator);
if (userJarInclusion == YarnConfigOptions.UserJarInclusion.LAST) {
for (String userClassPath : userClassPaths) {
classPathBuilder.append(userClassPath).append(File.pathSeparator);
}
}
// write job graph to tmp file and add it to local resource
// TODO: server use user main method to generate job graph
if (jobGraph != null) {
try {
File fp = File.createTempFile(appId.toString(), null);
fp.deleteOnExit();
try (FileOutputStream output = new FileOutputStream(fp);
ObjectOutputStream obOutput = new ObjectOutputStream(output);){
obOutput.writeObject(jobGraph);
}
final String jobGraphFilename = "job.graph";
flinkConfiguration.setString(JOB_GRAPH_FILE_PATH, jobGraphFilename);
Path pathFromYarnURL = setupSingleLocalResource(
jobGraphFilename,
fs,
appId,
new Path(fp.toURI()),
localResources,
homeDir,
"");
paths.add(pathFromYarnURL);
classPathBuilder.append(jobGraphFilename).append(File.pathSeparator);
} catch (Exception e) {
LOG.warn("Add job graph to local resource fail");
throw e;
}
}
final Path yarnFilesDir = getYarnFilesDir(appId);
FsPermission permission = new FsPermission(FsAction.ALL, FsAction.NONE, FsAction.NONE);
fs.setPermission(yarnFilesDir, permission); // set permission for path.
//To support Yarn Secure Integration Test Scenario
//In Integration test setup, the Yarn containers created by YarnMiniCluster does not have the Yarn site XML
//and KRB5 configuration files. We are adding these files as container local resources for the container
//applications (JM/TMs) to have proper secure cluster setup
Path remoteKrb5Path = null;
Path remoteYarnSiteXmlPath = null;
boolean hasKrb5 = false;
if (System.getenv("IN_TESTS") != null) {
File f = new File(System.getenv("YARN_CONF_DIR"), Utils.YARN_SITE_FILE_NAME);
LOG.info("Adding Yarn configuration {} to the AM container local resource bucket", f.getAbsolutePath());
Path yarnSitePath = new Path(f.getAbsolutePath());
remoteYarnSiteXmlPath = setupSingleLocalResource(
Utils.YARN_SITE_FILE_NAME,
fs,
appId,
yarnSitePath,
localResources,
homeDir,
"");
String krb5Config = System.getProperty("java.security.krb5.conf");
if (krb5Config != null && krb5Config.length() != 0) {
File krb5 = new File(krb5Config);
LOG.info("Adding KRB5 configuration {} to the AM container local resource bucket", krb5.getAbsolutePath());
Path krb5ConfPath = new Path(krb5.getAbsolutePath());
remoteKrb5Path = setupSingleLocalResource(
Utils.KRB5_FILE_NAME,
fs,
appId,
krb5ConfPath,
localResources,
homeDir,
"");
hasKrb5 = true;
}
}
// setup security tokens
Path remotePathKeytab = null;
String keytab = configuration.getString(SecurityOptions.KERBEROS_LOGIN_KEYTAB);
if (keytab != null) {
LOG.info("Adding keytab {} to the AM container local resource bucket", keytab);
remotePathKeytab = setupSingleLocalResource(
Utils.KEYTAB_FILE_NAME,
fs,
appId,
new Path(keytab),
localResources,
homeDir,
"");
}
final ContainerLaunchContext amContainer = setupApplicationMasterContainer(
yarnClusterEntrypoint,
hasLogback,
hasLog4j,
hasKrb5,
clusterSpecification.getMasterMemoryMB());
if (UserGroupInformation.isSecurityEnabled()) {
// set HDFS delegation tokens when security is enabled
LOG.info("Adding delegation token to the AM container..");
Utils.setTokensFor(amContainer, paths, yarnConfiguration);
}
amContainer.setLocalResources(localResources);
fs.close();
// Setup CLASSPATH and environment variables for ApplicationMaster
final Map<String, String> appMasterEnv = new HashMap<>();
// set user specified app master environment variables
appMasterEnv.putAll(Utils.getEnvironmentVariables(ResourceManagerOptions.CONTAINERIZED_MASTER_ENV_PREFIX, configuration));
// set Flink app class path
appMasterEnv.put(YarnConfigKeys.ENV_FLINK_CLASSPATH, classPathBuilder.toString());
// set Flink on YARN internal configuration values
appMasterEnv.put(YarnConfigKeys.ENV_TM_COUNT, String.valueOf(clusterSpecification.getNumberTaskManagers()));
appMasterEnv.put(YarnConfigKeys.ENV_TM_MEMORY, String.valueOf(clusterSpecification.getTaskManagerMemoryMB()));
appMasterEnv.put(YarnConfigKeys.FLINK_JAR_PATH, remotePathJar.toString());
appMasterEnv.put(YarnConfigKeys.ENV_APP_ID, appId.toString());
appMasterEnv.put(YarnConfigKeys.ENV_CLIENT_HOME_DIR, homeDir.toString());
appMasterEnv.put(YarnConfigKeys.ENV_CLIENT_SHIP_FILES, envShipFileList.toString());
appMasterEnv.put(YarnConfigKeys.ENV_SLOTS, String.valueOf(clusterSpecification.getSlotsPerTaskManager()));
appMasterEnv.put(YarnConfigKeys.ENV_DETACHED, String.valueOf(detached));
appMasterEnv.put(YarnConfigKeys.ENV_ZOOKEEPER_NAMESPACE, getZookeeperNamespace());
appMasterEnv.put(YarnConfigKeys.FLINK_YARN_FILES, yarnFilesDir.toUri().toString());
// https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md#identity-on-an-insecure-cluster-hadoop_user_name
appMasterEnv.put(YarnConfigKeys.ENV_HADOOP_USER_NAME, UserGroupInformation.getCurrentUser().getUserName());
if (remotePathKeytab != null) {
appMasterEnv.put(YarnConfigKeys.KEYTAB_PATH, remotePathKeytab.toString());
String principal = configuration.getString(SecurityOptions.KERBEROS_LOGIN_PRINCIPAL);
appMasterEnv.put(YarnConfigKeys.KEYTAB_PRINCIPAL, principal);
}
//To support Yarn Secure Integration Test Scenario
if (remoteYarnSiteXmlPath != null) {
appMasterEnv.put(YarnConfigKeys.ENV_YARN_SITE_XML_PATH, remoteYarnSiteXmlPath.toString());
}
if (remoteKrb5Path != null) {
appMasterEnv.put(YarnConfigKeys.ENV_KRB5_PATH, remoteKrb5Path.toString());
}
if (dynamicPropertiesEncoded != null) {
appMasterEnv.put(YarnConfigKeys.ENV_DYNAMIC_PROPERTIES, dynamicPropertiesEncoded);
}
// set classpath from YARN configuration
Utils.setupYarnClassPath(yarnConfiguration, appMasterEnv);
amContainer.setEnvironment(appMasterEnv);
// Set up resource type requirements for ApplicationMaster
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(clusterSpecification.getMasterMemoryMB());
capability.setVirtualCores(flinkConfiguration.getInteger(YarnConfigOptions.APP_MASTER_VCORES));
final String customApplicationName = customName != null ? customName : applicationName;
appContext.setApplicationName(customApplicationName);
appContext.setApplicationType(applicationType != null ? applicationType : "Apache Flink");
appContext.setAMContainerSpec(amContainer);
appContext.setResource(capability);
if (yarnQueue != null) {
appContext.setQueue(yarnQueue);
}
setApplicationNodeLabel(appContext);
setApplicationTags(appContext);
// add a hook to clean up in case deployment fails
Thread deploymentFailureHook = new DeploymentFailureHook(yarnClient, yarnApplication, yarnFilesDir);
Runtime.getRuntime().addShutdownHook(deploymentFailureHook);
LOG.info("Submitting application master " + appId);
yarnClient.submitApplication(appContext);
LOG.info("Waiting for the cluster to be allocated");
final long startTime = System.currentTimeMillis();
ApplicationReport report;
YarnApplicationState lastAppState = YarnApplicationState.NEW;
loop: while (true) {
try {
report = yarnClient.getApplicationReport(appId);
} catch (IOException e) {
throw new YarnDeploymentException("Failed to deploy the cluster.", e);
}
YarnApplicationState appState = report.getYarnApplicationState();
LOG.debug("Application State: {}", appState);
switch(appState) {
case FAILED:
case FINISHED:
case KILLED:
throw new YarnDeploymentException("The YARN application unexpectedly switched to state "
+ appState + " during deployment. \n" +
"Diagnostics from YARN: " + report.getDiagnostics() + "\n" +
"If log aggregation is enabled on your cluster, use this command to further investigate the issue:\n" +
"yarn logs -applicationId " + appId);
//break ..
case RUNNING:
LOG.info("YARN application has been deployed successfully.");
break loop;
default:
if (appState != lastAppState) {
LOG.info("Deploying cluster, current state " + appState);
}
if (System.currentTimeMillis() - startTime > 60000) {
LOG.info("Deployment took more than 60 seconds. Please check if the requested resources are available in the YARN cluster");
}
}
lastAppState = appState;
Thread.sleep(250);
}
// print the application id for user to cancel themselves.
if (isDetachedMode()) {
LOG.info("The Flink YARN client has been started in detached mode. In order to stop " +
"Flink on YARN, use the following command or a YARN web interface to stop " +
"it:\nyarn application -kill " + appId + "\nPlease also note that the " +
"temporary files of the YARN session in the home directory will not be removed.");
}
// since deployment was successful, remove the hook
ShutdownHookUtil.removeShutdownHook(deploymentFailureHook, getClass().getSimpleName(), LOG);
return report;
}
This method is also very, very long, but once untangled it does only three things:
1. uploads everything the cluster needs to the distributed file system and registers it as YARN local resources: the ship files and user jars, flink.jar, the rewritten flink-conf.yaml, the serialized job graph (job.graph), plus the keytab, krb5 and yarn-site files where applicable;
2. assembles the ApplicationMaster's ContainerLaunchContext: the classpath, the environment variables (the YarnConfigKeys.* entries), security tokens and the resource requirements (memory and vcores), and fills them into the ApplicationSubmissionContext;
3. submits the application via yarnClient.submitApplication and polls its state until it reaches RUNNING, throwing a YarnDeploymentException with diagnostics if it ends up FAILED, FINISHED or KILLED, and removing the deployment-failure shutdown hook on success.
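Once the application is up (or when deployment fails), the standard YARN commands referenced in the log messages above are the quickest way to inspect or stop it:
yarn application -list
yarn logs -applicationId <appId>     # requires log aggregation to be enabled
yarn application -kill <appId>       # stop a detached per-job cluster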