This article walks through, from the source code's perspective, how Spark submits an application to YARN for execution. If you spot any mistakes, please point them out. (Based on Spark 3.0.0.)
Spark on YARN has two deploy modes: YARN Client and YARN Cluster.
In this article we focus on YARN Cluster mode.
The main flow of YARN Cluster mode is shown in the figure above; below, we walk through this process in detail against the source code.
When you submit a Spark program with the spark-submit command, you can add a line near the end of the spark-class script (for example, an echo of the assembled command right before it is executed), so that the console shows what command is ultimately run:
[root@cdh6029 usr]# sudo -u hdfs spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster ./spark-examples_2.11-2.4.7.jar
/usr/java/jdk1.8.0_171/bin/java -cp XXXX (a long list of dependency jars) org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi ./spark-examples_2.11-2.4.7.jar
So what actually gets launched is a Java process whose main class is org.apache.spark.deploy.SparkSubmit. Find SparkSubmit's main method and follow it into the submit method. I personally dislike pasting large blocks of code, so only the important lines are discussed here.
//SparkSubmit.runMain()
val (childArgs, childClasspath, sparkConf, childMainClass) = prepareSubmitEnvironment(args) //line 871
The prepareSubmitEnvironment method parses the relevant arguments and configuration and returns childMainClass. In YARN Cluster mode, childMainClass = org.apache.spark.deploy.yarn.YarnClusterApplication. SparkSubmit then creates a YarnClusterApplication instance via reflection (sketched below, after the quoted line) and calls its start method.
//SparkSubmit.runMain()
app.start(childArgs.toArray, sparkConf) //line 928
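Roughly, the reflection-based instantiation looks like this (a condensed paraphrase of the surrounding lines of SparkSubmit.runMain, not standalone code):
// Condensed paraphrase of SparkSubmit.runMain()
val mainClass = Utils.classForName(childMainClass)
val app: SparkApplication =
  if (classOf[SparkApplication].isAssignableFrom(mainClass)) {
    // YarnClusterApplication extends SparkApplication, so this branch is taken
    mainClass.getConstructor().newInstance().asInstanceOf[SparkApplication]
  } else {
    // an ordinary main() class is wrapped so it exposes the same start() interface
    new JavaMainApplication(mainClass)
  }
// app.start(childArgs.toArray, sparkConf) is then the call quoted above (line 928)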
YarnClusterApplication.start() creates a Client and calls Client.run(), which in turn calls submitApplication().
//Client.run()
this.appId = submitApplication() //line 1177
This appId is the globally unique ID obtained after registering with YARN, and submitApplication() is the process by which the YARN client submits the application to the ResourceManager. The logic of this method is quite clear, and the important steps are annotated with comments below.
//Client.submitApplication()
def submitApplication(): ApplicationId = {
ResourceRequestHelper.validateResources(sparkConf)
var appId: ApplicationId = null
try {
launcherBackend.connect() // connect to the launcher server
yarnClient.init(hadoopConf) // initialize the YarnClient with the Hadoop configuration
yarnClient.start() // start the YarnClient service
// For the Hadoop service registry, see the official docs: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/registry/index.html
logInfo("Requesting a new application from cluster with %d NodeManagers"
.format(yarnClient.getYarnClusterMetrics.getNumNodeManagers))
// Get a new application from our RM
val newApp = yarnClient.createApplication()
val newAppResponse = newApp.getNewApplicationResponse()
appId = newAppResponse.getApplicationId()
// The app staging dir based on the STAGING_DIR configuration if configured
// otherwise based on the users home directory.
val appStagingBaseDir = sparkConf.get(STAGING_DIR)
.map { new Path(_, UserGroupInformation.getCurrentUser.getShortUserName) }
.getOrElse(FileSystem.get(hadoopConf).getHomeDirectory())
stagingDirPath = new Path(appStagingBaseDir, getAppStagingDir(appId))
new CallerContext("CLIENT", sparkConf.get(APP_CALLER_CONTEXT),
Option(appId.toString)).setCurrentContext()
// Verify whether the cluster has enough resources for our AM
verifyClusterResources(newAppResponse)
// Set up the appropriate contexts to launch our AM
val containerContext = createContainerLaunchContext(newAppResponse)
val appContext = createApplicationSubmissionContext(newApp, containerContext)
// Finally, submit and monitor the application
logInfo(s"Submitting application $appId to ResourceManager")
yarnClient.submitApplication(appContext)
launcherBackend.setAppId(appId.toString)
reportLauncherState(SparkAppHandle.State.SUBMITTED)
appId
} catch {
case e: Throwable =>
if (stagingDirPath != null) {
cleanupStagingDir()
}
throw e
}
}
After the client asks the RM to create a new application, the RM returns an application ID. With this ID in hand, the client performs a series of preparation steps for the AM container. createContainerLaunchContext(newAppResponse) mainly assembles the JVM launch command for the ApplicationMaster; this command is submitted to the RM, which then has some NodeManager run it to start the ApplicationMaster. createApplicationSubmissionContext(newApp, containerContext) builds the application submission context, which, besides holding the ContainerLaunchContext created in the previous step, mainly carries Spark-on-YARN related settings such as spark.app.name, spark.yarn.queue and spark.yarn.am.nodeLabelExpression. Finally, the client packages this submission context (appContext) into a request and submits it to the RM.
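To make the shape of the submission context concrete, here is a hypothetical, simplified sketch of the YARN API calls involved (the helper name and parameters are mine; the real createApplicationSubmissionContext copies many more settings out of SparkConf):
import org.apache.hadoop.yarn.api.records.{ApplicationSubmissionContext, ContainerLaunchContext, Resource}

object AppContextSketch {
  // Hypothetical helper, not Spark code: shows which knobs end up on the YARN submission context
  def buildAppContext(
      ctx: ApplicationSubmissionContext,   // from newApp.getApplicationSubmissionContext
      amContainer: ContainerLaunchContext, // the AM launch command/env/local resources built above
      appName: String,                     // spark.app.name
      queue: String,                       // spark.yarn.queue
      amMemoryMiB: Int,
      amCores: Int): ApplicationSubmissionContext = {
    ctx.setApplicationName(appName)
    ctx.setQueue(queue)
    ctx.setAMContainerSpec(amContainer)
    ctx.setApplicationType("SPARK")
    ctx.setResource(Resource.newInstance(amMemoryMiB, amCores))
    ctx
  }
}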
The most important part of createContainerLaunchContext(newAppResponse) is choosing the main class of the JVM launch command it assembles. Step into that method:
//Client.createContainerLaunchContext()
//line 978-983
val amClass =
if (isClusterMode) {
Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName
} else {
Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName
}
Since we are in cluster mode, the main class used to launch the ApplicationMaster is (in client mode, ExecutorLauncher is just a thin wrapper around ApplicationMaster, kept separate so the two are easy to tell apart in process listings)
org.apache.spark.deploy.yarn.ApplicationMaster
After the RM receives the client's request to start the application, it launches an ApplicationMaster on some NodeManager. So let's go straight to the org.apache.spark.deploy.yarn.ApplicationMaster class and its main method.
//ApplicationMaster.main()
//line 840-892
def main(args: Array[String]): Unit = {
SignalUtils.registerLogger(log)
val amArgs = new ApplicationMasterArguments(args)
val sparkConf = new SparkConf()
if (amArgs.propertiesFile != null) {
Utils.getPropertiesFromFile(amArgs.propertiesFile).foreach { case (k, v) =>
sparkConf.set(k, v)
}
}
// Set system properties for each config entry. This covers two use cases:
// - The default configuration stored by the SparkHadoopUtil class
// - The user application creating a new SparkConf in cluster mode
//
// Both cases create a new SparkConf object which reads these configs from system properties.
sparkConf.getAll.foreach { case (k, v) =>
sys.props(k) = v
}
val yarnConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(sparkConf))
master = new ApplicationMaster(amArgs, sparkConf, yarnConf)
val ugi = sparkConf.get(PRINCIPAL) match {
// We only need to log in with the keytab in cluster mode. In client mode, the driver
// handles the user keytab.
case Some(principal) if master.isClusterMode =>
val originalCreds = UserGroupInformation.getCurrentUser().getCredentials()
SparkHadoopUtil.get.loginUserFromKeytab(principal, sparkConf.get(KEYTAB).orNull)
val newUGI = UserGroupInformation.getCurrentUser()
if (master.appAttemptId == null || master.appAttemptId.getAttemptId > 1) {
// Re-obtain delegation tokens if this is not a first attempt, as they might be outdated
// as of now. Add the fresh tokens on top of the original user's credentials (overwrite).
// Set the context class loader so that the token manager has access to jars
// distributed by the user.
Utils.withContextClassLoader(master.userClassLoader) {
val credentialManager = new HadoopDelegationTokenManager(sparkConf, yarnConf, null)
credentialManager.obtainDelegationTokens(originalCreds)
}
}
// Transfer the original user's tokens to the new user, since it may contain needed tokens
// (such as those user to connect to YARN).
newUGI.addCredentials(originalCreds)
newUGI
case _ =>
SparkHadoopUtil.get.createSparkUser()
}
ugi.doAs(new PrivilegedExceptionAction[Unit]() {
override def run(): Unit = System.exit(master.run())
})
}
The main method does a few things: it parses the AM arguments (ApplicationMasterArguments), loads the properties file into a SparkConf and mirrors every entry into system properties, builds the YarnConfiguration, constructs the ApplicationMaster instance, prepares the UGI (in cluster mode it logs in with the keytab and, on a retry attempt, re-obtains delegation tokens), and finally calls master.run() inside ugi.doAs.
The run() method then does three main things: it sets the system properties shown below, registers a shutdown hook, and, since we are in cluster mode, calls runDriver() (in client mode it would call runExecutorLauncher()); a condensed paraphrase of that branch follows the quoted properties.
// Set the web ui port to be ephemeral for yarn so we don't conflict with
// other spark processes running on the same box
System.setProperty(UI_PORT.key, "0")
// Set the master and deploy mode property to match the requested mode.
System.setProperty("spark.master", "yarn")
System.setProperty(SUBMIT_DEPLOY_MODE.key, "cluster")
// Set this internal configuration if it is running on cluster mode, this
// configuration will be checked in SparkContext to avoid misuse of yarn cluster mode.
System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
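After these system properties are set, run() registers the shutdown hook and then branches on the deploy mode; a condensed paraphrase of that branch (error handling omitted, not the complete method):
// Condensed paraphrase of the end of ApplicationMaster.run()
if (isClusterMode) {
  runDriver()            // cluster mode: the AM hosts the Driver thread itself
} else {
  runExecutorLauncher()  // client mode: the driver lives in the spark-submit process
}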
Entering runDriver(), the AM starts a new thread in startUserApplication() to initialize the SparkContext. This is really just using reflection to call the main method of userClass, and userClass is exactly the --class argument passed to spark-submit at the very beginning. This newly started thread is the Driver thread; at this point our own Spark program finally starts running. (A condensed paraphrase of the whole method follows the quoted lines below.)
//ApplicationMaster.startUserApplication()
//line 718-719
val mainMethod = userClassLoader.loadClass(args.userClass)
.getMethod("main", classOf[Array[String]])
//line 728
mainMethod.invoke(null, userArgs.toArray)
//line 758
userThread.setName("Driver")
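Putting those quoted lines together, startUserApplication() boils down to the following (a condensed paraphrase with the exception handling trimmed):
// Condensed paraphrase of ApplicationMaster.startUserApplication()
val mainMethod = userClassLoader.loadClass(args.userClass)
  .getMethod("main", classOf[Array[String]])
userThread = new Thread {
  override def run(): Unit = {
    try {
      mainMethod.invoke(null, userArgs.toArray)  // our own Spark program runs here
      finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
    } finally {
      // ensure runDriver() is never left waiting if main() exits without creating a SparkContext
      sparkContextPromise.trySuccess(null)
    }
  }
}
userThread.setContextClassLoader(userClassLoader)
userThread.setName("Driver")
userThread.start()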
In the main method of our own Spark program, we inevitably create a SparkContext, for example:
val sc = new SparkContext(sparkConf)
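For concreteness, a minimal user program along the lines of the SparkPi example might look like this (my own sketch, not the code shipped with Spark); the new SparkContext(conf) line is what will eventually unblock runDriver():
import org.apache.spark.{SparkConf, SparkContext}

object MySparkPi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("MySparkPi")
    val sc = new SparkContext(conf)  // SparkContext initialization happens on the Driver thread
    val n = 100000
    val inside = sc.parallelize(1 to n).map { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      if (x * x + y * y <= 1) 1 else 0
    }.reduce(_ + _)
    println(s"Pi is roughly ${4.0 * inside / n}")
    sc.stop()
  }
}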
Back in runDriver(): after starting the new thread that initializes the SparkContext, the AM waits until the SparkContext initialization completes before executing the rest of runDriver().
//ApplicationMaster.scala
//line 499-500
val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
Duration(totalWaitTime, TimeUnit.MILLISECONDS))
Clearly, then, somewhere during (or right after) SparkContext initialization this sparkContextPromise must be completed so that runDriver() can proceed, and that somewhere must be inside the SparkContext constructor. At line 608 of SparkContext.scala:
//SparkContext.scala
//line 608
_taskScheduler.postStartHook()
Step into _taskScheduler.postStartHook():
//YarnClusterScheduler.scala
//line 31-35
override def postStartHook(): Unit = {
ApplicationMaster.sparkContextInitialized(sc)
super.postStartHook()
logInfo("YarnClusterScheduler.postStartHook done")
}
//ApplicationMaster.scala
//line 401-408
private def sparkContextInitialized(sc: SparkContext) = {
sparkContextPromise.synchronized {
// Notify runDriver function that SparkContext is available
sparkContextPromise.success(sc)
// Pause the user class thread in order to make proper initialization in runDriver function.
sparkContextPromise.wait()
}
}
//TaskSchedulerImpl.scala
//line 211-213
override def postStartHook(): Unit = {
waitBackendReady()
}
//line 891-912
private def waitBackendReady(): Unit = {
if (backend.isReady) {
return
}
while (!backend.isReady) {
// Might take a while for backend to be ready if it is waiting on resources.
if (sc.stopped.get) {
// For example: the master removes the application for some reason
throw new IllegalStateException("Spark context stopped while waiting for backend")
}
synchronized {
this.wait(100)
}
}
}
Clearly, sparkContextInitialized calls sparkContextPromise.success(sc), which is what lets the AM continue running runDriver. The same method also calls sparkContextPromise.wait(), so the thread that is initializing the SparkContext (i.e. the Driver thread) blocks until runDriver wakes it up. Back in runDriver(), line 518 calls resumeDriver(), and that is exactly where the Driver thread is woken up; immediately after, line 519 calls join() on the Driver thread, meaning runDriver waits until the Driver thread has finished (i.e. our entire Spark program has run to completion). A minimal standalone sketch of this handshake follows the quoted lines below.
//ApplicationMaster.scala
//line 518-519
resumeDriver()
userClassThread.join()
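The promise-plus-monitor handshake between the two threads can be reproduced with a minimal standalone sketch (plain Scala and JDK threads; the names are mine, not Spark's):
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object AmDriverHandshake {
  // plays the role of ApplicationMaster.sparkContextPromise
  private val contextPromise = Promise[String]()

  def main(args: Array[String]): Unit = {
    // "Driver" thread: completes the promise, then pauses until the AM resumes it
    val driverThread = new Thread("Driver") {
      override def run(): Unit = {
        contextPromise.synchronized {
          contextPromise.success("SparkContext stand-in")  // like sparkContextInitialized(sc)
          contextPromise.wait()                            // park the user code
        }
        println("Driver resumed, running the rest of the user program")
      }
    }
    driverThread.start()

    // "AM main" thread: waits for the context, prepares, then resumes the driver and joins it
    val ctx = Await.result(contextPromise.future, 10.seconds)
    println(s"AM got [$ctx]; registering the AM and creating the allocator ...")
    contextPromise.synchronized { contextPromise.notify() } // like resumeDriver()
    driverThread.join()                                     // like userClassThread.join()
  }
}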
In addition, once the Driver thread has been woken up it enters waitBackendReady() (shown above in TaskSchedulerImpl.postStartHook) and waits for the backend to become ready. This backend is the YarnClusterSchedulerBackend, the scheduling backend used later to communicate with the executors; as long as the SparkContext was initialized normally, backend.isReady is true at this point.
Let's pause here for a short summary.
Once we reach the point where the AM has been started, the AM process contains two main threads: the main AM thread, which executes run()/runDriver(), and the Driver thread started by startUserApplication(), which runs the main method of our user class.
Although they are two separate threads, their work proceeds almost in lock-step: while one thread is working, the other waits for it to finish before carrying on.
As described above, while executing runDriver() the main thread waits for the Driver thread to finish initializing the SparkContext and then performs a series of preparation steps. These steps are fairly involved and are the focus of this article. Let's start with steps four and five (of the figure at the beginning).
Back in runDriver(), the code between the startUserApplication() and resumeDriver() calls is exactly what carries out this series of preparation steps.
//ApplicationMaster.scala
//line 499-517
val sc = ThreadUtils.awaitResult(sparkContextPromise.future,
Duration(totalWaitTime, TimeUnit.MILLISECONDS))
if (sc != null) {
val rpcEnv = sc.env.rpcEnv
val userConf = sc.getConf
val host = userConf.get(DRIVER_HOST_ADDRESS)
val port = userConf.get(DRIVER_PORT)
registerAM(host, port, userConf, sc.ui.map(_.webUrl), appAttemptId)
val driverRef = rpcEnv.setupEndpointRef(
RpcAddress(host, port),
YarnSchedulerBackend.ENDPOINT_NAME)
createAllocator(driverRef, userConf, rpcEnv, appAttemptId, distCacheConf)
} else {
// Sanity check; should never happen in normal operation, since sc should only be null
// if the user app did not create a SparkContext.
throw new IllegalStateException("User did not initialize spark context!")
}
The AM calls registerAM() to register itself with the RM. This is done through a YarnRMClient, the client instance the AM uses to talk to the RM; all subsequent communication with the RM also goes through this YarnRMClient (a simplified sketch of the underlying AMRMClient calls follows the quoted method below).
//ApplicationMaster.scala
//line 417-430
private def registerAM(
host: String,
port: Int,
_sparkConf: SparkConf,
uiAddress: Option[String],
appAttempt: ApplicationAttemptId): Unit = {
val appId = appAttempt.getApplicationId().toString()
val attemptId = appAttempt.getAttemptId().toString()
val historyAddress = ApplicationMaster
.getHistoryServerAddress(_sparkConf, yarnConf, appId, attemptId)
client.register(host, port, yarnConf, _sparkConf, uiAddress, historyAddress)
registered = true
}
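Under the hood, YarnRMClient.register essentially drives Hadoop's AMRMClient. A simplified, hypothetical sketch of those calls (the host, port and URL values are made up, and the call only succeeds inside a real AM container, where the RM address and tokens are available):
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
import org.apache.hadoop.yarn.conf.YarnConfiguration

object RegisterAmSketch {
  def main(args: Array[String]): Unit = {
    val amClient = AMRMClient.createAMRMClient[ContainerRequest]()
    amClient.init(new YarnConfiguration())
    amClient.start()
    // host/port of the driver's RPC endpoint, plus the tracking (web UI) URL shown in the RM UI
    amClient.registerApplicationMaster("driver-host", 7077, "http://driver-host:4040")
  }
}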
Once registration is done, we move on to the createAllocator() method.
//ApplicationMaster.createAllocator()
//line 465-472
allocator = client.createAllocator(
yarnConf,
_sparkConf,
appAttemptId,
driverUrl,
driverRef,
securityMgr,
localResources)
//ApplicationMaster.createAllocator()
//line 479
allocator.allocateResources()
In this method the YarnRMClient creates an allocator, in preparation for requesting and distributing resources from the RM, and then allocator.allocateResources() is called to actually ask the RM for resources.
//YarnAllocator.scala
//line 254-284
def allocateResources(): Unit = synchronized {
updateResourceRequests()
val progressIndicator = 0.1f
// Poll the ResourceManager. This doubles as a heartbeat if there are no pending container
// requests.
val allocateResponse = amClient.allocate(progressIndicator)
val allocatedContainers = allocateResponse.getAllocatedContainers()
allocatorBlacklistTracker.setNumClusterNodes(allocateResponse.getNumClusterNodes)
if (allocatedContainers.size > 0) {
logDebug(("Allocated containers: %d. Current executor count: %d. " +
"Launching executor count: %d. Cluster resources: %s.")
.format(
allocatedContainers.size,
runningExecutors.size,
numExecutorsStarting.get,
allocateResponse.getAvailableResources))
handleAllocatedContainers(allocatedContainers.asScala)
}
val completedContainers = allocateResponse.getCompletedContainersStatuses()
if (completedContainers.size > 0) {
logDebug("Completed %d containers".format(completedContainers.size))
processCompletedContainers(completedContainers.asScala)
logDebug("Finished processing %d completed containers. Current running executor count: %d."
.format(completedContainers.size, runningExecutors.size))
}
}
amClient sends the resource request to the RM (this call doubles as a heartbeat), and the RM returns the list of containers currently allocated to us in the response. If the number of allocated containers (allocatedContainers) is greater than 0, they are handed to
handleAllocatedContainers(allocatedContainers.asScala)
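Before following handleAllocatedContainers, note that the ask itself is built in updateResourceRequests(), called at the top of allocateResources(); conceptually it boils down to AMRMClient calls like this hypothetical sketch (the sizes and priority are made-up values):
import org.apache.hadoop.yarn.api.records.{Priority, Resource}
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest
import scala.collection.JavaConverters._

object AllocateSketch {
  def requestExecutors(amClient: AMRMClient[ContainerRequest], missing: Int): Unit = {
    val capability = Resource.newInstance(2048 /* MiB */, 2 /* vcores */)
    (1 to missing).foreach { _ =>
      // null nodes/racks = no locality preference in this toy request
      amClient.addContainerRequest(new ContainerRequest(capability, null, null, Priority.newInstance(1)))
    }
    val response = amClient.allocate(0.1f)                 // heartbeat + pull newly granted containers
    val granted = response.getAllocatedContainers.asScala  // these feed handleAllocatedContainers(...)
    println(s"RM granted ${granted.size} containers")
  }
}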
Continuing on, let's step into the handleAllocatedContainers(allocatedContainers.asScala) method.
//YarnAllocator.scala
def handleAllocatedContainers(allocatedContainers: Seq[Container]): Unit = {
val containersToUse = new ArrayBuffer[Container](allocatedContainers.size)
// Match incoming requests by host
val remainingAfterHostMatches = new ArrayBuffer[Container]
for (allocatedContainer <- allocatedContainers) {
matchContainerToRequest(allocatedContainer, allocatedContainer.getNodeId.getHost,
containersToUse, remainingAfterHostMatches)
}
// Match remaining by rack. Because YARN's RackResolver swallows thread interrupts
// (see SPARK-27094), which can cause this code to miss interrupts from the AM, use
// a separate thread to perform the operation.
val remainingAfterRackMatches = new ArrayBuffer[Container]
if (remainingAfterHostMatches.nonEmpty) {
var exception: Option[Throwable] = None
val thread = new Thread("spark-rack-resolver") {
override def run(): Unit = {
try {
for (allocatedContainer <- remainingAfterHostMatches) {
val rack = resolver.resolve(allocatedContainer.getNodeId.getHost)
matchContainerToRequest(allocatedContainer, rack, containersToUse,
remainingAfterRackMatches)
}
} catch {
case e: Throwable =>
exception = Some(e)
}
}
}
thread.setDaemon(true)
thread.start()
try {
thread.join()
} catch {
case e: InterruptedException =>
thread.interrupt()
throw e
}
if (exception.isDefined) {
throw exception.get
}
}
// Assign remaining that are neither node-local nor rack-local
val remainingAfterOffRackMatches = new ArrayBuffer[Container]
for (allocatedContainer <- remainingAfterRackMatches) {
matchContainerToRequest(allocatedContainer, ANY_HOST, containersToUse,
remainingAfterOffRackMatches)
}
if (remainingAfterOffRackMatches.nonEmpty) {
logDebug(s"Releasing ${remainingAfterOffRackMatches.size} unneeded containers that were " +
s"allocated to us")
for (container <- remainingAfterOffRackMatches) {
internalReleaseContainer(container)
}
}
runAllocatedContainers(containersToUse)
logInfo("Received %d containers from YARN, launching executors on %d of them."
.format(allocatedContainers.size, containersToUse.size))
}
Given the list of containers allocated by the RM (allocatedContainers), the AM does not pick containers arbitrarily; it applies a locality-based matching strategy (first by host, then by rack, then ANY_HOST, as the code above shows). For the details of Spark's container selection strategy, see: Spark源码——Spark on YARN Container资源申请分配、Executor的启动_aof-CSDN博客.
I won't repeat that here.
With the containers to use (containersToUse) determined, we enter the runAllocatedContainers(containersToUse) method. The YarnAllocator uses a thread pool (launcherPool) to launch executors on these containers:
//YarnAllocator.scala
//line 553-590
if (runningExecutors.size() < targetNumExecutors) {
numExecutorsStarting.incrementAndGet()
if (launchContainers) {
launcherPool.execute(() => {
try {
new ExecutorRunnable(
Some(container),
conf,
sparkConf,
driverUrl,
executorId,
executorHostname,
executorMemory,
executorCores,
appAttemptId.getApplicationId.toString,
securityMgr,
localResources,
ResourceProfile.DEFAULT_RESOURCE_PROFILE_ID // use until fully supported
).run()
updateInternalState()
} catch {
case e: Throwable =>
numExecutorsStarting.decrementAndGet()
if (NonFatal(e)) {
logError(s"Failed to launch executor $executorId on container $containerId", e)
// Assigned container should be released immediately
// to avoid unnecessary resource occupation.
amClient.releaseAssignedContainer(containerId)
} else {
throw e
}
}
})
} else {
// For test only
updateInternalState()
}
}
Step into the ExecutorRunnable.run() method:
//ExecutorRunnable.scala
//line 63-69
def run(): Unit = {
logDebug("Starting Executor Container")
nmClient = NMClient.createNMClient()
nmClient.init(conf)
nmClient.start()
startContainer()
}
It first creates an NMClient (the NodeManager client); this NMClient is later used to tell the NodeManager to start the executor. Next, step into the startContainer() method:
//ExecutorRunnable.scala
//line 88-131
def startContainer(): java.util.Map[String, ByteBuffer] = {
val ctx = Records.newRecord(classOf[ContainerLaunchContext])
.asInstanceOf[ContainerLaunchContext]
val env = prepareEnvironment().asJava
ctx.setLocalResources(localResources.asJava)
ctx.setEnvironment(env)
val credentials = UserGroupInformation.getCurrentUser().getCredentials()
val dob = new DataOutputBuffer()
credentials.writeTokenStorageToStream(dob)
ctx.setTokens(ByteBuffer.wrap(dob.getData()))
val commands = prepareCommand()
ctx.setCommands(commands.asJava)
ctx.setApplicationACLs(
YarnSparkHadoopUtil.getApplicationAclsForYarn(securityMgr).asJava)
// If external shuffle service is enabled, register with the Yarn shuffle service already
// started on the NodeManager and, if authentication is enabled, provide it with our secret
// key for fetching shuffle files later
if (sparkConf.get(SHUFFLE_SERVICE_ENABLED)) {
val secretString = securityMgr.getSecretKey()
val secretBytes =
if (secretString != null) {
// This conversion must match how the YarnShuffleService decodes our secret
JavaUtils.stringToBytes(secretString)
} else {
// Authentication is not enabled, so just provide dummy metadata
ByteBuffer.allocate(0)
}
ctx.setServiceData(Collections.singletonMap("spark_shuffle", secretBytes))
}
// Send the start request to the ContainerManager
try {
nmClient.startContainer(container.get, ctx)
} catch {
case ex: Exception =>
throw new SparkException(s"Exception while starting container ${container.get.getId}" +
s" on host $hostname", ex)
}
}
This method creates a ContainerLaunchContext, setting the environment (env), the local resources (localResources) and the launch command (commands). It then checks spark.shuffle.service.enabled; if true, the container registers with the external shuffle service, an auxiliary service already running inside the NodeManager, so that shuffle files can still be served after the executor exits. I will cover the shuffle process in detail in a later article. Here we mainly care about commands, because for the NM to start an executor, the AM must hand it a launch command of the form /bin/java [mainClass]. Step into prepareCommand():
//ExecutorRunnable.scala
//line 204-217
val commands = prefixEnv ++
Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++
javaOpts ++
Seq("org.apache.spark.executor.YarnCoarseGrainedExecutorBackend",
"--driver-url", masterAddress,
"--executor-id", executorId,
"--hostname", hostname,
"--cores", executorCores.toString,
"--app-id", appId,
"--resourceProfileId", resourceProfileId.toString) ++
userClassPath ++
Seq(
s"1>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stdout",
s"2>${ApplicationConstants.LOG_DIR_EXPANSION_VAR}/stderr")
Here we find that the main class used to launch an executor is org.apache.spark.executor.YarnCoarseGrainedExecutorBackend. Find its main method:
//YarnCoarseGrainedExecutorBackend.scala
//line 72-83
def main(args: Array[String]): Unit = {
val createFn: (RpcEnv, CoarseGrainedExecutorBackend.Arguments, SparkEnv, ResourceProfile) =>
CoarseGrainedExecutorBackend = { case (rpcEnv, arguments, env, resourceProfile) =>
new YarnCoarseGrainedExecutorBackend(rpcEnv, arguments.driverUrl, arguments.executorId,
arguments.bindAddress, arguments.hostname, arguments.cores, arguments.userClassPath, env,
arguments.resourcesFileOpt, resourceProfile)
}
val backendArgs = CoarseGrainedExecutorBackend.parseArguments(args,
this.getClass.getCanonicalName.stripSuffix("$"))
CoarseGrainedExecutorBackend.run(backendArgs, createFn)
System.exit(0)
}
This calls CoarseGrainedExecutorBackend.run(backendArgs, createFn). run() first fetches the Spark configuration from the remote driver, then creates the SparkEnv (and with it the RpcEnv), and finally registers endpoints with the RpcEnv: "Executor" and, if a workerUrl was passed, "WorkerWatcher". "WorkerWatcher" simply watches the worker process that launched the executor and exits the JVM if that connection is lost (it is not used on YARN, where no workerUrl is supplied); "Executor" handles the communication between the executor and the driver, which is what we focus on here.
//CoarseGrainedExecutorBackend.scala
//line 331-337
val env = SparkEnv.createExecutorEnv(driverConf, arguments.executorId, arguments.bindAddress,
arguments.hostname, arguments.cores, cfg.ioEncryptionKey, isLocal = false)
env.rpcEnv.setupEndpoint("Executor",
backendCreateFn(env.rpcEnv, arguments, env, cfg.resourceProfile))
arguments.workerUrl.foreach { url =>
env.rpcEnv.setupEndpoint("WorkerWatcher", new WorkerWatcher(env.rpcEnv, url))
Step into env.rpcEnv.setupEndpoint("Executor", backendCreateFn(env.rpcEnv, arguments, env, cfg.resourceProfile)).
//NettyRpcEnv.scala
//line 135-137
override def setupEndpoint(name: String, endpoint: RpcEndpoint): RpcEndpointRef = {
dispatcher.registerRpcEndpoint(name, endpoint)
}
Step into dispatcher.registerRpcEndpoint(name, endpoint). The dispatcher holds two hash maps: one keyed by endpoint name, mapping to the messageLoop that serves that endpoint, and one mapping each endpoint to its endpointRef. When a new endpoint registers with the dispatcher, the dispatcher creates a messageLoop for it; a messageLoop is essentially a thread pool whose threads sit in an endless loop processing the messages this endpoint receives. An endpointRef is just a reference to the endpoint itself; the difference is that the endpoint is used to receive data while the endpointRef is used to send it.
//Dispatcher.scala
//line 55-88
def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = {
val addr = RpcEndpointAddress(nettyEnv.address, name)
val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv)
synchronized {
if (stopped) {
throw new IllegalStateException("RpcEnv has been stopped")
}
if (endpoints.containsKey(name)) {
throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
}
// This must be done before assigning RpcEndpoint to MessageLoop, as MessageLoop sets Inbox be
// active when registering, and endpointRef must be put into endpointRefs before onStart is
// called.
endpointRefs.put(endpoint, endpointRef)
var messageLoop: MessageLoop = null
try {
messageLoop = endpoint match {
case e: IsolatedRpcEndpoint =>
new DedicatedMessageLoop(name, e, this)
case _ =>
sharedLoop.register(name, endpoint)
sharedLoop
}
endpoints.put(name, messageLoop)
} catch {
case NonFatal(e) =>
endpointRefs.remove(endpoint)
throw e
}
}
endpointRef
}
Step into new DedicatedMessageLoop(name, e, this). In the DedicatedMessageLoop constructor, an Inbox and a thread pool (threadpool) are created, and a receiveLoopRunnable task is submitted to the pool once per thread.
//MessageLoop.scala
//line 165-178
private val inbox = new Inbox(name, endpoint)
override protected val threadpool = if (endpoint.threadCount() > 1) {
ThreadUtils.newDaemonCachedThreadPool(s"dispatcher-$name", endpoint.threadCount())
} else {
ThreadUtils.newDaemonSingleThreadExecutor(s"dispatcher-$name")
}
(1 to endpoint.threadCount()).foreach { _ =>
threadpool.submit(receiveLoopRunnable)
}
// Mark active to handle the OnStart message.
setActive(inbox)
receiveLoopRunnable is the endless loop mentioned above: it keeps taking inboxes from the active queue, that is, the messages the endpoint has received, and calls the appropriate method on the endpoint according to the type of each message.
//MessageLoop.scala
//line 40-42
protected val receiveLoopRunnable = new Runnable() {
override def run(): Unit = receiveLoop()
}
//MessageLoop.scala
//line 65-91
private def receiveLoop(): Unit = {
try {
while (true) {
try {
val inbox = active.take()
if (inbox == MessageLoop.PoisonPill) {
// Put PoisonPill back so that other threads can see it.
setActive(MessageLoop.PoisonPill)
return
}
inbox.process(dispatcher)
} catch {
case NonFatal(e) => logError(e.getMessage, e)
}
}
} catch {
case _: InterruptedException => // exit
case t: Throwable =>
try {
// Re-submit a receive task so that message delivery will still work if
// UncaughtExceptionHandler decides to not kill JVM.
threadpool.execute(receiveLoopRunnable)
} finally {
throw t
}
}
}
After receiveLoopRunnable starts, the first message it processes is always an OnStart message, because the Inbox constructor adds an OnStart message to its own message queue.
//Inbox.scala
//line 78-80
inbox.synchronized {
messages.add(OnStart)
}
In response, receiveLoopRunnable calls the endpoint's onStart() method. This endpoint is the YarnCoarseGrainedExecutorBackend created by the createFn function in the main method shown earlier; its onStart() method is inherited from CoarseGrainedExecutorBackend.
In onStart(), it first resolves the endpointRef of the remote driver via driverUrl; this endpointRef is used to send messages to the driver. Here it calls ask to send a message of type RegisterExecutor.
//CoarseGrainedExecutorBackend.scala
//line 82-101
override def onStart(): Unit = {
logInfo("Connecting to driver: " + driverUrl)
try {
_resources = parseOrFindResources(resourcesFileOpt)
} catch {
case NonFatal(e) =>
exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
}
rpcEnv.asyncSetupEndpointRefByURI(driverUrl).flatMap { ref =>
// This is a very fast action so we can use "ThreadUtils.sameThread"
driver = Some(ref)
ref.ask[Boolean](RegisterExecutor(executorId, self, hostname, cores, extractLogUrls,
extractAttributes, _resources, resourceProfile.id))
}(ThreadUtils.sameThread).onComplete {
case Success(_) =>
self.send(RegisteredExecutor)
case Failure(e) =>
exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
}(ThreadUtils.sameThread)
}
Now let's switch back to the driver side. The driver must likewise have an XXXBackend-style endpoint for communicating with the executors; this backend is the SchedulerBackend created during SparkContext initialization that we mentioned earlier.
Find the implementation class of SchedulerBackend:
org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
Find its receiveAndReply method: clearly, what CoarseGrainedSchedulerBackend receives here is the RegisterExecutor message sent by the executor. The method is long, but it mainly carries out the executor's registration and finally replies true to the executor:
context.reply(true)
//CoarseGrainedSchedulerBackend.scala
//line 207-256
case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls,
attributes, resources, resourceProfileId) =>
if (executorDataMap.contains(executorId)) {
context.sendFailure(new IllegalStateException(s"Duplicate executor ID: $executorId"))
} else if (scheduler.nodeBlacklist.contains(hostname) ||
isBlacklisted(executorId, hostname)) {
// If the cluster manager gives us an executor on a blacklisted node (because it
// already started allocating those resources before we informed it of our blacklist,
// or if it ignored our blacklist), then we reject that executor immediately.
logInfo(s"Rejecting $executorId as it has been blacklisted.")
context.sendFailure(new IllegalStateException(s"Executor is blacklisted: $executorId"))
} else {
// If the executor's rpc env is not listening for incoming connections, `hostPort`
// will be null, and the client connection should be used to contact the executor.
val executorAddress = if (executorRef.address != null) {
executorRef.address
} else {
context.senderAddress
}
logInfo(s"Registered executor $executorRef ($executorAddress) with ID $executorId")
addressToExecutorId(executorAddress) = executorId
totalCoreCount.addAndGet(cores)
totalRegisteredExecutors.addAndGet(1)
val resourcesInfo = resources.map{ case (k, v) =>
(v.name,
new ExecutorResourceInfo(v.name, v.addresses,
// tell the executor it can schedule resources up to numParts times,
// as configured by the user, or set to 1 as that is the default (1 task/resource)
taskResourceNumParts.getOrElse(v.name, 1)))
}
val data = new ExecutorData(executorRef, executorAddress, hostname,
0, cores, logUrlHandler.applyPattern(logUrls, attributes), attributes,
resourcesInfo, resourceProfileId)
// This must be synchronized because variables mutated
// in this block are read when requesting executors
CoarseGrainedSchedulerBackend.this.synchronized {
executorDataMap.put(executorId, data)
if (currentExecutorIdCounter < executorId.toInt) {
currentExecutorIdCounter = executorId.toInt
}
if (numPendingExecutors > 0) {
numPendingExecutors -= 1
logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
}
}
listenerBus.post(
SparkListenerExecutorAdded(System.currentTimeMillis(), executorId, data))
// Note: some tests expect the reply to come after we put the executor in the map
context.reply(true)
}
Back on the executor side, in the endpoint's onStart(): once the driver replies that registration succeeded, the executor sends itself a message of type RegisteredExecutor
case Success(_) =>
self.send(RegisteredExecutor)
case Failure(e) =>
exitExecutor(1, s"Cannot register with driver: $driverUrl", e, notifyDriver = false)
In other words, calling
self.send(RegisteredExecutor)
adds a RegisteredExecutor message to the inbox; the messageLoop then picks it up in its processing loop and calls the receive() method of the corresponding endpoint. Only at this point is the real Executor actually instantiated (the "Executor" mentioned so far refers to the endpoint of type CoarseGrainedExecutorBackend named "Executor", whose job is to talk to the driver; the Executor instance created here is the one that actually runs compute tasks), after which a LaunchedExecutor message is sent to the driver.
//CoarseGrainedExecutorBackend.scala
//line 148-157
case RegisteredExecutor =>
logInfo("Successfully registered with driver")
try {
executor = new Executor(executorId, hostname, env, userClassPath, isLocal = false,
resources = _resources)
driver.get.send(LaunchedExecutor(executorId))
} catch {
case NonFatal(e) =>
exitExecutor(1, "Unable to create executor due to " + e.getMessage, e)
}
Once the driver receives the LaunchedExecutor message from the executor, what follows is the splitting, scheduling and execution of tasks, which we will leave for the next article.
At this point the driver and executor environments are fully established. Let's wrap up with a summary. In the figure at the very beginning, we divided the Spark on YARN Cluster flow into 6 steps.
Step 6, starting the executors, begins at the main method of YarnCoarseGrainedExecutorBackend and breaks down into a few sub-steps: fetch the Spark configuration from the remote driver and create the SparkEnv/RpcEnv; register the "Executor" endpoint; in onStart(), ask the driver to RegisterExecutor; on a successful reply, send RegisteredExecutor to itself; in receive(), create the real Executor instance and notify the driver with LaunchedExecutor.
The RPC communication during this process can be summarized as follows (originally presented as a UML sequence diagram):
How the executor receives a message: when the rpcEnv receives a message, it enters NettyRpcEnv.receive(). The rpcEnv first does some light parsing and wrapping of the message and hands it to the dispatcher. The dispatcher uses the endpoint name carried in the message to look up the corresponding messageLoop in endpoints; each messageLoop is bound to a LinkedBlockingQueue of inboxes. The dispatcher posts the message to the endpoint's inbox and puts that inbox onto the queue, while the messageLoop keeps taking inboxes off the queue and, depending on the type of the message inside, calls the corresponding method on the endpoint.
How the executor sends a message: when an endpointRef needs to send a message, it goes through rpcEnv.send(). The rpcEnv handles the message differently depending on whether its destination is local or remote. If it is a local message, i.e. the endpoint behind the endpointRef lives in this process, the dispatcher handles it directly: since all local endpoints are registered in the dispatcher's endpoints map, the dispatcher simply puts the message into the corresponding inbox. If it is not local, the rpcEnv looks up the outbox for the destination; the outbox holds the transportClient used to talk to the remote endpoint, and the transportClient sends the message out.
Here is my personal understanding of the components involved in this RPC communication (a toy end-to-end sketch follows the list):
1. rpcEnv: the RPC communication environment; underneath it is a Netty server used to receive and send data.
2. endpoint: a communication endpoint. Within a process (e.g. an executor) an endpoint has a unique endpoint name and encapsulates the logic for handling received data (the receive method). Every endpoint has a corresponding endpointRef, which locally is maintained by the dispatcher. The lifecycle of an endpoint is
constructor -> onStart -> receive* -> onStop
3. endpointRef: a reference to a remote (or local) endpoint; messages are sent to that endpoint through its endpointRef.
4. messageLoop: every local endpoint has its own messageLoop (several endpoints may also share one). A messageLoop contains a blocking queue of inboxes and a thread pool whose threads act as consumers, continuously draining inboxes from the queue.
5. inbox: the mailbox that stores an endpoint's messages. When a messageLoop consumes an inbox, it really just asks that inbox to call the appropriate endpoint method (onStart, receive, onStop) based on the message type.
6. dispatcher: effectively a router. It keeps two HashMaps: endpoints, with key = endpoint name and value = messageLoop, and endpointRefs, with key = endpoint and value = that endpoint's endpointRef. When the rpcEnv receives a message it hands it to the dispatcher, which uses endpoints to find the messageLoop responsible for the message and lets it handle it.
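To tie these components together, here is a toy, self-contained model of the dispatcher -> inbox -> endpoint flow (my own simplified names and types, not Spark's classes):
import java.util.concurrent.{Executors, LinkedBlockingQueue}

object ToyRpcDemo {
  sealed trait Message
  case object OnStart extends Message
  final case class Envelope(body: String) extends Message

  trait Endpoint {
    def onStart(): Unit
    def receive(body: String): Unit
  }

  // A toy Inbox: a queue of messages plus the logic mapping a message to an endpoint callback
  final class Inbox(endpoint: Endpoint) {
    private val messages = new LinkedBlockingQueue[Message]()
    messages.put(OnStart) // like Spark's Inbox: OnStart is always the first message processed
    def post(m: Message): Unit = messages.put(m)
    def process(): Unit = messages.take() match {
      case OnStart        => endpoint.onStart()
      case Envelope(body) => endpoint.receive(body)
    }
  }

  def main(args: Array[String]): Unit = {
    val inbox = new Inbox(new Endpoint {
      def onStart(): Unit = println("endpoint started")
      def receive(body: String): Unit = println(s"endpoint got: $body")
    })

    // A toy message loop: one consumer thread draining the inbox forever
    val pool = Executors.newSingleThreadExecutor()
    pool.execute(() => try { while (true) inbox.process() } catch { case _: InterruptedException => () })

    inbox.post(Envelope("RegisteredExecutor")) // like self.send(RegisteredExecutor)
    Thread.sleep(200)
    pool.shutdownNow()
  }
}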
That finally wraps up this article. In the next one, I will look at how the Driver (SparkContext) splits and schedules tasks, and how the Executor receives and executes them.
If you spot any mistakes, please point them out~