How to use Future in a Scala for comprehension

In Scala, a Future represents an asynchronous computation, typically executed on a worker thread. By default, after importing

import ExecutionContext.Implicits.global

the asynchronous computation runs on the thread pool configured by ExecutionContext's global object (ExecutionContext.scala):

  object Implicits {
    /**
     * The implicit global `ExecutionContext`. Import `global` when you want to provide the global
     * `ExecutionContext` implicitly.
     *
     * The default `ExecutionContext` implementation is backed by a work-stealing thread pool. By default,
     * the thread pool uses a target number of worker threads equal to the number of
     * [[https://docs.oracle.com/javase/8/docs/api/java/lang/Runtime.html#availableProcessors-- available processors]].
     */
    implicit lazy val global: ExecutionContext = impl.ExecutionContextImpl.fromExecutor(null: Executor)
  }

The fromExecutor function is implemented as follows (ExecutionContextImpl.scala):

def fromExecutor(e: Executor, reporter: Throwable => Unit = ExecutionContext.defaultReporter): ExecutionContextImpl =
    new ExecutionContextImpl(Option(e).getOrElse(createDefaultExecutorService(reporter)), reporter)

Because global passes a null Executor, createDefaultExecutorService is called to create the Executor used to construct an ExecutionContextImpl instance. ExecutionContextImpl is the implementation class of ExecutionContext and provides the implementations of its interface (ExecutionContextImpl.scala):

private[scala] class ExecutionContextImpl private[impl] (val executor: Executor, val reporter: Throwable => Unit) extends ExecutionContextExecutor {
  require(executor ne null, "Executor must not be null")
  override def execute(runnable: Runnable) = executor execute runnable
  override def reportFailure(t: Throwable) = reporter(t)
}

The createDefaultExecutorService function creates the Executor as follows (ExecutionContextImpl.scala):

def createDefaultExecutorService(reporter: Throwable => Unit): ExecutorService = {
    def getInt(name: String, default: String) = (try System.getProperty(name, default) catch {
      case e: SecurityException => default
    }) match {
      case s if s.charAt(0) == 'x' => (Runtime.getRuntime.availableProcessors * s.substring(1).toDouble).ceil.toInt
      case other => other.toInt
    }

    def range(floor: Int, desired: Int, ceiling: Int) = scala.math.min(scala.math.max(floor, desired), ceiling)
    val numThreads = getInt("scala.concurrent.context.numThreads", "x1")
    // The hard limit on the number of active threads that the thread factory will produce
    val maxNoOfThreads = getInt("scala.concurrent.context.maxThreads", "x1")

    val desiredParallelism = range(
      getInt("scala.concurrent.context.minThreads", "1"),
      numThreads,
      maxNoOfThreads)

    // The thread factory must provide additional threads to support managed blocking.
    val maxExtraThreads = getInt("scala.concurrent.context.maxExtraThreads", "256")

    val uncaughtExceptionHandler: Thread.UncaughtExceptionHandler = new Thread.UncaughtExceptionHandler {
      override def uncaughtException(thread: Thread, cause: Throwable): Unit = reporter(cause)
    }

    val threadFactory = new ExecutionContextImpl.DefaultThreadFactory(daemonic = true,
                                                                      maxBlockers = maxExtraThreads,
                                                                      prefix = "scala-execution-context-global",
                                                                      uncaught = uncaughtExceptionHandler)

    new ForkJoinPool(desiredParallelism, threadFactory, uncaughtExceptionHandler, true)
  }

So the Executor created here is a ForkJoinPool. Two details in this function deserve attention:

  • When computing the ForkJoinPool's parallelism level, the relevant system properties are read, such as scala.concurrent.context.numThreads and scala.concurrent.context.maxThreads. If these properties are not set, the parallelism level passed to the ForkJoinPool constructor is the number of CPU cores.
  • When running a large number of Futures, if a Future performs a blocking operation, every thread on every core may end up blocked. Blocking operations should therefore be wrapped in blocking, so that the threadFactory can add extra threads to the pool; alternatively, replace the default ForkJoinPool with another thread pool better suited to the workload. See the sketch below.
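
A minimal sketch of wrapping a blocking call in scala.concurrent.blocking (slowQuery here is a hypothetical stand-in for any blocking I/O); the DefaultThreadFactory shown above can then spawn compensating threads, up to scala.concurrent.context.maxExtraThreads, so the pool is not starved:

object BlockingSketch extends App {
  import scala.concurrent._
  import ExecutionContext.Implicits.global

  // Hypothetical blocking call, e.g. a JDBC query or a file read.
  def slowQuery(): String = { Thread.sleep(1000); "rows" }

  val f = Future {
    // blocking tells the ForkJoinPool that this task will block,
    // so it may create an extra worker thread to keep other tasks running.
    blocking {
      slowQuery()
    }
  }
}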

for comprehension
When composing Futures in a for comprehension, pay attention to whether the Futures execute in parallel or sequentially; compare the difference between the two examples below.

object SerialFutureMap extends App {
  import scala.concurrent._
  import ExecutionContext.Implicits.global
  
  val serial = for {
    f1 <- Future {
      Thread.sleep(300)
      println("[Serial] exec f1 first.")
      "f1"
    }
    
    f2 <- Future {
      println("[Serial] exec f2 second.")
      "f2"
    }
  } yield {
    f1 + " and " + f2
  }
  
  serial foreach {
    case result => println("Serial result:" + result)
  }
}

object ParallelFutureMap extends App {
  import scala.concurrent._
  import ExecutionContext.Implicits.global
  
  val f1 = Future {
    Thread.sleep(300)
    println("[Parallel] exec f1 second as it is sleep some time.")
    "f1"
  }
  
  val f2 = Future {
    println("[Parallel] exec f2 first.")
    "f2"
  }
  
  val parallel = for {
    first <- f1
    second <- f2    
  } yield {
    first + " and " + second
  }
  
  parallel foreach {
    case result => println("Parallel result:" + result)
  }
}

Note the difference in how SerialFutureMap and ParallelFutureMap compose the Futures in the for comprehension: the former executes the Futures sequentially, while the latter executes them in parallel. In SerialFutureMap each Future is created inside the for comprehension, so f2 is not even constructed until f1 has completed; in ParallelFutureMap both Futures are created (and therefore started) before the for comprehension, so they run concurrently.
The output of SerialFutureMap is:

[Serial] exec f1 first though it sleeps for some time.
[Serial] exec f2 second.
Serial result:f1 and f2

From the output we can see that although f1 blocks for 300 milliseconds, that Future must finish before the logic in f2 runs.

In ParallelFutureMap, f1 and f2 execute concurrently; because f1 contains a blocking sleep, the log output is as follows:

[Parallel] exec f2 first.
[Parallel] exec f1 second as it sleeps for some time.
Parallel result:f1 and f2
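
Note: the global pool's threads are daemonic (daemonic = true above), so when these examples run as a standalone App the JVM may exit before the foreach callbacks get a chance to print. If you do not see the output, block the main thread at the end of the App; a minimal sketch, assuming the parallel value from ParallelFutureMap:

import scala.concurrent.Await
import scala.concurrent.duration._

// Keep the JVM alive until the composed Future completes,
// so the foreach callback has a chance to print its result.
Await.ready(parallel, 5.seconds)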

Note that in ParallelFutureMap, if f1 and f2 are defined as methods (def) rather than vals, the for comprehension executes them sequentially. In fact, defining just f2 as a method is enough to guarantee that f2 runs after f1:

object SerialFutureMap2 extends App {
  import scala.concurrent._
  import ExecutionContext.Implicits.global
  
  def f1 = Future {
    Thread.sleep(300)
    println("[Serial] exec f1 first though it sleep some time.")
    "f1"
  }
  
  def f2 = Future {
    println("[Serial] exec f2 second.")
    "f2"
  }
  
  val serial = for {
    first <- f1
    second <- f2
  } yield {
    first + " and " + second
  }
  
  serial foreach {
    case result => println("Serial result:" + result)
  }
}
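
The reason is how the for comprehension desugars: it becomes nested flatMap/map calls, so the expression that produces the second Future is not evaluated until the first Future has completed. A hand-desugared sketch of the comprehension in SerialFutureMap2 (not the compiler's exact output):

// f1 and f2 are defs, so each call creates and starts a new Future.
// f2 is only invoked inside the function passed to flatMap,
// i.e. after f1 has already completed.
val serial = f1.flatMap { first =>
  f2.map { second =>
    first + " and " + second
  }
}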

Patterns in async
The async library presents a similar scenario, but its semantics are clearer. See the examples below.

object AsyncSerialMap extends App {
  import scala.concurrent._
  import ExecutionContext.Implicits.global
  import scala.async.Async.{async, await}
  
  def f1(name : String) = Future {
    Thread.sleep(300)
    println("[Serial] exec f1 first though it sleep some time.")
    name
  }
  
  def f2(name : String) = Future {
    println("[Serial] exec f2 second.")
    name
  }
  
  val serial = async {
    await(f1("f1")) + " and " + await(f2("f2"))
  }
  
  serial foreach {
    case result => println("serial result:" + result)
  }
}

object AsyncParallelMap extends App {
  import scala.concurrent._
  import ExecutionContext.Implicits.global
  import scala.async.Async.{async, await}
  
  def f1(name: String) = Future {
    Thread.sleep(300)
    println("[parallel] exec f1 second as it sleep some time.")
    name
  }
  
  def f2(name: String) = Future {
    println("[parallel] exec f2 second.")
    name
  }
  
  val parallel = async {
    val result1 = f1("f1")
    val result2 = f2("f2")
    await(result1) + " and " + await(result2)
  }
  
  parallel foreach {
    case result => println("parallel result:" + result)
  }
}

In AsyncSerialMap, each Future is created inside an await call, so the code must wait for it to complete before the following logic runs; this follows from the semantics of await. In AsyncParallelMap, the two Futures are started concurrently first, and then the results of f1 and f2 are awaited in turn; although the results are awaited in order, the Futures themselves execute in parallel.
