ForkJoin Translation

ForkJoinTask

Translation

Abstract base class for tasks that run within a ForkJoinPool. A ForkJoinTask is a thread-like entity that is much lighter weight than a normal thread. Huge numbers of tasks and subtasks may be hosted by a small number of actual threads in a ForkJoinPool, at the price of some usage limitations.

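This is the abstract base class for every task that runs in a ForkJoinPool. A ForkJoinTask resembles a thread but is far lighter than a real one. A handful of threads in a ForkJoinPool can host a huge number of tasks and subtasks; the price is a few restrictions on how tasks may be used.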

A "main" ForkJoinTask begins execution when it is explicitly submitted to a ForkJoinPool, or, if not already engaged in a ForkJoin computation, commenced in the ForkJoinPool.commonPool() via fork, invoke, or related methods. Once started, it will usually in turn start other subtasks. As indicated by the name of this class, many programs using ForkJoinTask employ only methods fork and join, or derivatives such as invokeAll. However, this class also provides a number of other methods that can come into play in advanced usages, as well as extension mechanics that allow support of new forms of fork/join processing.

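A "main" ForkJoinTask starts running once it is explicitly submitted to a ForkJoinPool. If the caller is not already inside a ForkJoin computation, calling fork, invoke, or a related method starts the task in ForkJoinPool.commonPool() instead. Once running, the task usually starts further subtasks in turn. As the class name suggests, many programs use only fork and join, or derivatives such as invokeAll. The class also offers other methods for advanced use, plus extension mechanisms that support new forms of fork/join processing.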
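
The two ways of starting a main task can be sketched as follows. This is only an illustration: RangeSum, its cutoff of 1,000 elements, and the pool size of 4 are hypothetical choices, not part of the Javadoc.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task: sums the integers in [lo, hi) by recursive halving.
class RangeSum extends RecursiveTask<Long> {
    private final int lo, hi;

    RangeSum(int lo, int hi) { this.lo = lo; this.hi = hi; }

    @Override
    protected Long compute() {
        if (hi - lo <= 1_000) {                 // assumed cutoff: sum directly
            long s = 0;
            for (int i = lo; i < hi; i++) s += i;
            return s;
        }
        int mid = (lo + hi) >>> 1;
        RangeSum left = new RangeSum(lo, mid);
        left.fork();                                   // schedule the left half asynchronously
        long right = new RangeSum(mid, hi).compute();  // work on the right half directly
        return right + left.join();                    // wait for the left half
    }

    // Explicit submission to a pool that the caller owns.
    static long viaOwnPool(int n) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            return pool.invoke(new RangeSum(0, n));
        } finally {
            pool.shutdown();
        }
    }

    // Called from a thread outside any pool: subtasks forked by the
    // computation are handled by ForkJoinPool.commonPool().
    static long viaCommonPool(int n) {
        return new RangeSum(0, n).invoke();
    }
}
```

Both entry points return the same sum; the only difference is which pool hosts the subtasks.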

A ForkJoinTask is a lightweight form of Future. The efficiency of ForkJoinTasks stems from a set of restrictions (that are only partially statically enforceable) reflecting their main use as computational tasks calculating pure functions or operating on purely isolated objects. The primary coordination mechanisms are fork, that arranges asynchronous execution, and join, that doesn't proceed until the task's result has been computed. Computations should ideally avoid synchronized methods or blocks, and should minimize other blocking synchronization apart from joining other tasks or using synchronizers such as Phasers that are advertised to cooperate with fork/join scheduling. Subdividable tasks should also not perform blocking I/O, and should ideally access variables that are completely independent of those accessed by other running tasks. These guidelines are loosely enforced by not permitting checked exceptions such as IOExceptions to be thrown. However, computations may still encounter unchecked exceptions, that are rethrown to callers attempting to join them. These exceptions may additionally include RejectedExecutionException stemming from internal resource exhaustion, such as failure to allocate internal task queues. Rethrown exceptions behave in the same way as regular exceptions, but, when possible, contain stack traces (as displayed for example using ex.printStackTrace()) of both the thread that initiated the computation as well as the thread actually encountering the exception; minimally only the latter.

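A ForkJoinTask is a lightweight Future. Its efficiency comes from a set of restrictions, only some of which can be enforced statically, that reflect its main use: computing pure functions or operating on fully isolated objects. The two main coordination mechanisms are fork, which arranges asynchronous execution, and join, which does not proceed until the task's result has been computed. Computations should ideally avoid synchronized methods and blocks and keep other blocking synchronization to a minimum, apart from joining other tasks or using synchronizers such as Phaser that are documented to cooperate with fork/join scheduling. Subdividable tasks should not perform blocking I/O and should ideally touch only variables that no other running task accesses. These guidelines are loosely enforced by forbidding checked exceptions such as IOException. Unchecked exceptions can still occur and are rethrown to callers that try to join the task. They may include a RejectedExecutionException caused by internal resource exhaustion, for example a failure to allocate internal task queues. A rethrown exception behaves like a regular exception but, when possible, carries the stack traces (as shown by ex.printStackTrace()) of both the thread that initiated the computation and the thread that actually hit the exception; at minimum it carries the latter.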
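
A minimal sketch of an unchecked exception travelling from compute to the joining caller. FailingTask and joinAndReport are hypothetical names used only for this illustration.

```java
import java.util.concurrent.RecursiveTask;

// Hypothetical task whose computation always fails with an unchecked exception.
class FailingTask extends RecursiveTask<Integer> {
    @Override
    protected Integer compute() {
        // Checked exceptions such as IOException could not be thrown here:
        // compute() declares none.
        throw new IllegalStateException("boom");
    }

    // Forks the task, joins it, and reports the type of exception seen by the caller.
    static String joinAndReport() {
        FailingTask t = new FailingTask();
        t.fork();
        try {
            t.join();
            return "no exception";
        } catch (IllegalStateException ex) {
            return ex.getClass().getSimpleName();
        }
    }
}
```

The sketch checks only the exception type. When the failure happened in a different thread, the exception seen by the caller may be a fresh instance of the same class that wraps the original as its cause, so the message can differ.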

It is possible to define and use ForkJoinTasks that may block, but doing so requires three further considerations: (1) Completion of few if any other tasks should be dependent on a task that blocks on external synchronization or I/O. Event-style async tasks that are never joined (for example, those subclassing CountedCompleter) often fall into this category. (2) To minimize resource impact, tasks should be small; ideally performing only the (possibly) blocking action. (3) Unless the ForkJoinPool.ManagedBlocker API is used, or the number of possibly blocked tasks is known to be less than the pool's ForkJoinPool.getParallelism level, the pool cannot guarantee that enough threads will be available to ensure progress or good performance.

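ForkJoinTasks that may block can be defined and used, but three points need attention:

  1. Few, if any, other tasks should depend on the completion of a task that blocks on external synchronization or I/O. Event-style async tasks that are never joined, such as subclasses of CountedCompleter, often fit this description.
  2. To limit resource impact, keep such tasks small; ideally they perform only the (possibly) blocking action.
  3. Unless the ForkJoinPool.ManagedBlocker API is used, or the number of possibly blocked tasks is known to be below the pool's ForkJoinPool.getParallelism level, the pool cannot guarantee that enough threads are available to ensure progress or good performance.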
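
The third point can be illustrated with a small blocker that waits for one item from a queue. QueueTaker and takeManaged are hypothetical names; the sketch assumes the blocking call is BlockingQueue.take().

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ForkJoinPool;

// Hypothetical blocker that waits for a single item from a queue.
class QueueTaker<E> implements ForkJoinPool.ManagedBlocker {
    private final BlockingQueue<E> queue;
    private volatile E item;

    QueueTaker(BlockingQueue<E> queue) { this.queue = queue; }

    @Override
    public boolean block() throws InterruptedException {
        if (item == null) item = queue.take();  // may block
        return true;                            // no further blocking is needed
    }

    @Override
    public boolean isReleasable() {
        // Try to obtain the item without blocking.
        return item != null || (item = queue.poll()) != null;
    }

    // Runs the blocker through ForkJoinPool.managedBlock so that a pool
    // can compensate for the blocked thread if one is involved.
    static <E> E takeManaged(BlockingQueue<E> queue) {
        QueueTaker<E> taker = new QueueTaker<>(queue);
        try {
            ForkJoinPool.managedBlock(taker);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return taker.item;
    }
}
```

Calling takeManaged from inside a task lets the pool know that the worker is about to block, instead of silently losing a thread.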

The primary method for awaiting completion and extracting results of a task is join, but there are several variants: The Future.get methods support interruptible and/or timed waits for completion and report results using Future conventions. Method invoke is semantically equivalent to fork(); join() but always attempts to begin execution in the current thread. The "quiet" forms of these methods do not extract results or report exceptions. These may be useful when a set of tasks are being executed, and you need to delay processing of results or exceptions until all complete. Method invokeAll (available in multiple versions) performs the most common form of parallel invocation: forking a set of tasks and joining them all.

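join is the primary method for awaiting completion and extracting a task's result, but several variants exist. The Future.get methods support interruptible and/or timed waits and report results using Future conventions. invoke is semantically equivalent to fork(); join() but always attempts to begin execution in the current thread. The "quiet" forms of these methods neither extract results nor report exceptions, which helps when a set of tasks is running and results or exceptions should be handled only after all of them complete. invokeAll (available in several overloads) performs the most common kind of parallel invocation: it forks a set of tasks and joins them all.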
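
The variants can be compared side by side in a short sketch. Square and demo are hypothetical names, and the one-second timeout is an arbitrary assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical trivial task that squares a number.
class Square extends RecursiveTask<Integer> {
    private final int n;

    Square(int n) { this.n = n; }

    @Override
    protected Integer compute() { return n * n; }

    static int demo() {
        try {
            int a = new Square(2).invoke();            // begins execution in the current thread

            Square b = new Square(3);
            b.fork();
            int bResult = b.get(1, TimeUnit.SECONDS);  // timed, interruptible wait

            Square c = new Square(4);
            c.fork();
            c.quietlyJoin();                           // waits; returns nothing, throws nothing
            int cResult = c.isCompletedNormally() ? c.getRawResult() : 0;

            List<Square> batch = Arrays.asList(new Square(5), new Square(6));
            ForkJoinTask.invokeAll(batch);             // forks all tasks and joins them all

            int sum = a + bResult + cResult;
            for (Square s : batch) sum += s.join();    // tasks are already complete
            return sum;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After quietlyJoin the caller has to inspect the task itself, here through isCompletedNormally and getRawResult, because the quiet form reports nothing.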

In the most typical usages, a fork-join pair act like a call (fork) and return (join) from a parallel recursive function. As is the case with other forms of recursive calls, returns (joins) should be performed innermost-first. For example, a.fork(); b.fork(); b.join(); a.join(); is likely to be substantially more efficient than joining a before b.

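In the most typical usage, a fork-join pair acts like a call (fork) and a return (join) of a parallel recursive function. As with other recursive calls, returns (joins) should happen innermost-first. For example, a.fork(); b.fork(); b.join(); a.join(); is likely to be substantially more efficient than joining a before b.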
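
The innermost-first ordering, written out on a deliberately naive Fibonacci task. Fib is a hypothetical class chosen only because its recursion is easy to follow; it is not an efficient way to compute Fibonacci numbers.

```java
import java.util.concurrent.RecursiveTask;

// Hypothetical task: naive recursive Fibonacci, used only to show join order.
class Fib extends RecursiveTask<Long> {
    private final int n;

    Fib(int n) { this.n = n; }

    @Override
    protected Long compute() {
        if (n <= 1) return (long) n;
        Fib a = new Fib(n - 1);
        Fib b = new Fib(n - 2);
        a.fork();
        b.fork();
        long fb = b.join();   // forked last, joined first
        long fa = a.join();   // forked first, joined last
        return fa + fb;
    }
}
```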

The execution status of tasks may be queried at several levels of detail: isDone is true if a task completed in any way (including the case where a task was cancelled without executing); isCompletedNormally is true if a task completed without cancellation or encountering an exception; isCancelled is true if the task was cancelled (in which case getException returns a java.util.concurrent.CancellationException); and isCompletedAbnormally is true if a task was either cancelled or encountered an exception, in which case getException will return either the encountered exception or java.util.concurrent.CancellationException.

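The execution status of a task can be queried at several levels of detail:

  1. isDone is true if the task completed in any way, including being cancelled without ever executing.
  2. isCompletedNormally is true if the task completed without being cancelled and without encountering an exception.
  3. isCancelled is true if the task was cancelled, in which case getException returns a java.util.concurrent.CancellationException.
  4. isCompletedAbnormally is true if the task was cancelled or encountered an exception, in which case getException returns either the encountered exception or a java.util.concurrent.CancellationException.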
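
The four queries can be observed on three small tasks: one that finishes normally, one that is cancelled before running, and one that throws. StatusDemo and its nested classes are hypothetical names for this sketch.

```java
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveAction;

class StatusDemo {
    // Hypothetical action that completes without doing anything.
    static class Ok extends RecursiveAction {
        @Override protected void compute() { }
    }

    // Hypothetical action that always fails.
    static class Bad extends RecursiveAction {
        @Override protected void compute() {
            throw new IllegalArgumentException("bad input");
        }
    }

    // Order: isDone, isCompletedNormally, isCancelled, isCompletedAbnormally.
    static String describe(ForkJoinTask<?> t) {
        return t.isDone() + " " + t.isCompletedNormally() + " "
                + t.isCancelled() + " " + t.isCompletedAbnormally();
    }

    static ForkJoinTask<?> normal() {
        Ok t = new Ok();
        t.invoke();
        return t;
    }

    static ForkJoinTask<?> cancelled() {
        Ok t = new Ok();
        t.cancel(false);          // cancelled without ever executing
        return t;
    }

    static ForkJoinTask<?> failed() {
        Bad t = new Bad();
        try {
            t.invoke();
        } catch (IllegalArgumentException expected) {
            // the failure stays recorded in the task
        }
        return t;
    }
}
```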

The ForkJoinTask class is not usually directly subclassed. Instead, you subclass one of the abstract classes that support a particular style of fork/join processing, typically RecursiveAction for most computations that do not return results, RecursiveTask for those that do, and CountedCompleter for those in which completed actions trigger other actions. Normally, a concrete ForkJoinTask subclass declares fields comprising its parameters, established in a constructor, and then defines a compute method that somehow uses the control methods supplied by this base class.

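ForkJoinTask is not usually subclassed directly. Instead, subclass one of the abstract classes that support a particular style of fork/join processing: RecursiveAction for most computations that return no result, RecursiveTask for those that do, and CountedCompleter for those in which completed actions trigger other actions. A concrete subclass normally declares fields for its parameters, sets them in a constructor, and defines a compute method that uses the control methods supplied by this base class.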
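
A sketch of the usual subclass shape, using RecursiveAction: parameters held in fields, set in the constructor, and a compute method built on the base-class control methods. Doubler and its THRESHOLD of 1,000 are assumptions for illustration.

```java
import java.util.concurrent.RecursiveAction;

// Hypothetical action: doubles every element of array[lo, hi) in place.
class Doubler extends RecursiveAction {
    private static final int THRESHOLD = 1_000;  // assumed cutoff, tune per workload

    // Parameters of the task, established in the constructor.
    private final long[] array;
    private final int lo, hi;

    Doubler(long[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            for (int i = lo; i < hi; i++) array[i] *= 2;
            return;
        }
        int mid = (lo + hi) >>> 1;
        // invokeAll is a control method inherited from ForkJoinTask.
        invokeAll(new Doubler(array, lo, mid), new Doubler(array, mid, hi));
    }

    static long[] doubleAll(long[] data) {
        new Doubler(data, 0, data.length).invoke();
        return data;
    }
}
```

The two halves work on disjoint index ranges, so no task touches a variable that another running task accesses.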

Method join and its variants are appropriate for use only when completion dependencies are acyclic; that is, the parallel computation can be described as a directed acyclic graph (DAG). Otherwise, executions may encounter a form of deadlock as tasks cyclically wait for each other. However, this framework supports other methods and techniques (for example the use of Phaser, helpQuiesce, and complete) that may be of use in constructing custom subclasses for problems that are not statically structured as DAGs. To support such usages, a ForkJoinTask may be atomically tagged with a short value using setForkJoinTaskTag or compareAndSetForkJoinTaskTag and checked using getForkJoinTaskTag. The ForkJoinTask implementation does not use these protected methods or tags for any purpose, but they may be of use in the construction of specialized subclasses. For example, parallel graph traversals can use the supplied methods to avoid revisiting nodes/tasks that have already been processed. (Method names for tagging are bulky in part to encourage definition of methods that reflect their usage patterns.)

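join and its variants are appropriate only when completion dependencies are acyclic, that is, when the parallel computation can be described as a directed acyclic graph (DAG). Otherwise tasks may wait for each other in a cycle and deadlock. The framework does support other methods and techniques (for example Phaser, helpQuiesce, and complete) that can help when building custom subclasses for problems that are not statically structured as DAGs. To support such uses, a ForkJoinTask can be atomically tagged with a short value through setForkJoinTaskTag or compareAndSetForkJoinTaskTag, and the tag can be read through getForkJoinTaskTag. The ForkJoinTask implementation itself never uses these methods or tags, but they can be useful when building specialized subclasses. For example, a parallel graph traversal can use them to avoid revisiting nodes/tasks that have already been processed. (The tagging method names are bulky partly to encourage defining methods whose names reflect their usage patterns.)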
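
A small sketch of the graph-traversal idea: each node is itself a task, and compareAndSetForkJoinTaskTag decides which visitor gets to process it. GraphVisit, Node, the three-node graph, and the convention "tag 0 = unvisited, tag 1 = claimed" are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

class GraphVisit {
    // Hypothetical node that doubles as the task visiting it.
    static class Node extends RecursiveAction {
        final List<Node> neighbors = new ArrayList<>();
        final AtomicInteger visits;

        Node(AtomicInteger visits) { this.visits = visits; }

        @Override
        protected void compute() {
            visits.incrementAndGet();
            List<Node> next = new ArrayList<>();
            for (Node n : neighbors) {
                // Atomically claim the neighbor; only one visitor can succeed.
                if (n.compareAndSetForkJoinTaskTag((short) 0, (short) 1)) {
                    next.add(n);
                }
            }
            invokeAll(next);
        }
    }

    // Builds a graph containing a cycle (a -> b -> c -> a, plus a -> c)
    // and returns how many node visits the traversal performed.
    static int countVisits() {
        AtomicInteger visits = new AtomicInteger();
        Node a = new Node(visits), b = new Node(visits), c = new Node(visits);
        a.neighbors.add(b);
        a.neighbors.add(c);
        b.neighbors.add(c);
        c.neighbors.add(a);
        a.compareAndSetForkJoinTaskTag((short) 0, (short) 1);  // claim the root
        a.invoke();
        return visits.get();
    }
}
```

A wrapper method with a descriptive name, for example a hypothetical tryClaim(), would follow the Javadoc's hint about hiding the bulky tagging method names.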

Most base support methods are final, to prevent overriding of implementations that are intrinsically tied to the underlying lightweight task scheduling framework. Developers creating new basic styles of fork/join processing should minimally implement protected methods exec, setRawResult, and getRawResult, while also introducing an abstract computational method that can be implemented in its subclasses, possibly relying on other protected methods provided by this class.

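Most base support methods are final, which prevents overriding implementations that are intrinsically tied to the underlying lightweight task scheduling framework. Developers who create a new basic style of fork/join processing should, at a minimum, implement the protected methods exec, setRawResult, and getRawResult, and should also introduce an abstract computational method for subclasses to implement, possibly relying on other protected methods provided by this class.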
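
A minimal sketch of such a new task style. ValueTask, computeValue, and Answer are hypothetical names; the structure follows the three methods named above and is not presented as how any JDK class is written.

```java
import java.util.concurrent.ForkJoinTask;

// Hypothetical base class for a new style of result-bearing task.
abstract class ValueTask<V> extends ForkJoinTask<V> {
    private V result;

    // The abstract computational method that subclasses implement.
    protected abstract V computeValue();

    @Override
    public final V getRawResult() { return result; }

    @Override
    protected final void setRawResult(V value) { result = value; }

    @Override
    protected final boolean exec() {
        result = computeValue();
        return true;   // true means the task completed normally
    }
}

// Hypothetical concrete task built on the new style.
class Answer extends ValueTask<Integer> {
    @Override
    protected Integer computeValue() { return 6 * 7; }
}
```

Once exec returns true, the inherited final methods such as join and invoke hand back whatever getRawResult returns.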

ForkJoinTasks should perform relatively small amounts of computation. Large tasks should be split into smaller subtasks, usually via recursive decomposition. As a very rough rule of thumb, a task should perform more than 100 and less than 10000 basic computational steps, and should avoid indefinite looping. If tasks are too big, then parallelism cannot improve throughput. If too small, then memory and internal task maintenance overhead may overwhelm processing.

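ForkJoinTasks should perform relatively small amounts of computation. Split large tasks into smaller subtasks, usually by recursive decomposition. As a very rough rule of thumb, a task should perform more than 100 and fewer than 10000 basic computational steps and should avoid indefinite looping. If tasks are too big, parallelism cannot improve throughput. If they are too small, memory and internal task maintenance overhead may overwhelm the processing itself.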

This class provides adapt methods for Runnable and Callable, that may be of use when mixing execution of ForkJoinTasks with other kinds of tasks. When all tasks are of this form, consider using a pool constructed in asyncMode.

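This class provides adapt methods for Runnable and Callable, which can help when ForkJoinTasks are mixed with other kinds of tasks. When all tasks are of this form, consider using a pool constructed in asyncMode.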
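
A short sketch of the three adapt overloads together with a pool constructed in asyncMode. AdaptDemo, the parallelism of 2, and the sample values are assumptions for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.atomic.AtomicInteger;

class AdaptDemo {
    static String run() {
        AtomicInteger counter = new AtomicInteger();
        // Typed variables avoid ambiguity between adapt(Runnable) and adapt(Callable).
        Runnable increment = () -> counter.incrementAndGet();
        Callable<Integer> forty = () -> 40;

        ForkJoinTask<?> fromRunnable = ForkJoinTask.adapt(increment);
        ForkJoinTask<String> withResult = ForkJoinTask.adapt(increment, "done");
        ForkJoinTask<Integer> fromCallable = ForkJoinTask.adapt(forty);

        fromRunnable.invoke();   // runs the Runnable from the calling thread

        // The last constructor argument (true) selects asyncMode.
        ForkJoinPool asyncPool = new ForkJoinPool(
                2, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);
        try {
            String label = asyncPool.invoke(withResult);   // runs the Runnable again
            int value = asyncPool.invoke(fromCallable);
            return label + ":" + (counter.get() + value);
        } finally {
            asyncPool.shutdown();
        }
    }
}
```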

ForkJoinTasks are Serializable, which enables them to be used in extensions such as remote execution frameworks. It is sensible to serialize tasks only before or after, but not during, execution. Serialization is not relied on during execution itself.

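ForkJoinTasks are Serializable, so they can be used in extensions such as remote execution frameworks. Serialize tasks only before or after execution, not during it. Execution itself does not rely on serialization.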

Since: 1.7
Author: Doug Lea
