Giraph Aggregator Guide

Aggregator

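An Aggregator performs an aggregation over the values contributed by all vertices in a superstep.
Aggregators come in many kinds and do not always sum their inputs: LongSumAggregator sums Long values, LongMinAggregator keeps only the minimum of all values, LongMaxAggregator keeps only the maximum, and LongProductAggregator keeps the product of all aggregated values.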

 LongProductAggregator longProductAggregator = ...
 longProductAggregator.aggregate(new LongWritable(2));
 longProductAggregator.aggregate(new LongWritable(3));
 longProductAggregator.aggregate(new LongWritable(4));
 // The aggregated value is now 2 * 3 * 4 = 24.
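
To make the four behaviours concrete, here is a minimal sketch that mimics them on plain long values, so it runs without Giraph or Hadoop on the classpath. AggregatorKindsSketch and LongOp are hypothetical names invented for this illustration, not Giraph classes, and the starting values are simply the identity elements of each operation.

```java
import java.util.function.LongBinaryOperator;

// Hypothetical stand-in for the Long*Aggregator family; not a Giraph class.
final class AggregatorKindsSketch {

  /** One aggregation kind: a starting value plus a combining function. */
  enum LongOp {
    SUM(0L, Long::sum),
    MIN(Long.MAX_VALUE, Math::min),
    MAX(Long.MIN_VALUE, Math::max),
    PRODUCT(1L, (a, b) -> a * b);

    private final long initial;
    private final LongBinaryOperator combine;

    LongOp(long initial, LongBinaryOperator combine) {
      this.initial = initial;
      this.combine = combine;
    }

    /** Folds all values into the starting value, one at a time. */
    long aggregateAll(long... values) {
      long acc = initial;
      for (long v : values) {
        acc = combine.applyAsLong(acc, v);
      }
      return acc;
    }
  }

  public static void main(String[] args) {
    // Feed the same three values to every kind of aggregation.
    for (LongOp op : LongOp.values()) {
      System.out.println(op + " -> " + op.aggregateAll(2, 3, 4));
    }
  }
}
```

The same inputs give a different result for each kind, which is the only point of the sketch; the real aggregators wrap their value in a LongWritable.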
 

Initial value

Every Aggregator has an initial value. For LongSumAggregator the initial value is 0.

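import org.apache.giraph.aggregators.BasicAggregator;
import org.apache.hadoop.io.LongWritable;

/** Aggregator that sums long values. */
public class LongSumAggregator extends BasicAggregator<LongWritable> {
  @Override
  public void aggregate(LongWritable value) {
    // Add the incoming value to the running total.
    getAggregatedValue().set(getAggregatedValue().get() + value.get());
  }

  @Override
  public LongWritable createInitialValue() {
    // 0 is the identity element for addition.
    return new LongWritable(0);
  }
}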
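
The initial value matters because it is what the aggregator holds before any vertex has contributed, so it should leave the first aggregated value unchanged. The sketch below illustrates this on plain long values. LongAggregatorSketch, Sum and Product are hypothetical stand-ins that only mirror the aggregate / createInitialValue / reset shape; they are not the Giraph classes.

```java
// Hypothetical, simplified stand-in for an aggregator; not a Giraph class.
final class InitialValueSketch {

  abstract static class LongAggregatorSketch {
    private long value = createInitialValue();

    /** The value held before anything has been aggregated. */
    abstract long createInitialValue();

    /** Folds one more value into the current one. */
    abstract void aggregate(long v);

    long get() {
      return value;
    }

    void set(long v) {
      value = v;
    }

    /** Goes back to the initial value, e.g. before a new superstep. */
    void reset() {
      value = createInitialValue();
    }
  }

  static final class Sum extends LongAggregatorSketch {
    @Override
    long createInitialValue() {
      return 0L; // 0 + x == x
    }

    @Override
    void aggregate(long v) {
      set(get() + v);
    }
  }

  static final class Product extends LongAggregatorSketch {
    @Override
    long createInitialValue() {
      return 1L; // 1 * x == x; starting from 0 would pin the product at 0
    }

    @Override
    void aggregate(long v) {
      set(get() * v);
    }
  }

  public static void main(String[] args) {
    Product product = new Product();
    product.aggregate(2);
    product.aggregate(3);
    product.aggregate(4);
    System.out.println("product = " + product.get());
    product.reset();
    System.out.println("after reset = " + product.get());
  }
}
```

A product aggregator that started from 0 would never leave 0, which is why each kind of aggregator needs its own initial value.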

Serialization

The value type of an Aggregator must implement Writable so that it can be serialized and sent over the network.
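
As a rough illustration of what this requirement means, the sketch below writes a value to bytes and reads it back. To stay runnable without Hadoop it declares its own WritableSketch interface with the same two methods as Hadoop's Writable (write and readFields). SumAndCount is a hypothetical value type made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WritableRoundTripSketch {

  /** Stand-in with the same two methods as Hadoop's Writable interface. */
  interface WritableSketch {
    void write(DataOutput out) throws IOException;

    void readFields(DataInput in) throws IOException;
  }

  /** Hypothetical aggregated value: a running sum and a count. */
  static final class SumAndCount implements WritableSketch {
    long sum;
    long count;

    @Override
    public void write(DataOutput out) throws IOException {
      out.writeLong(sum);
      out.writeLong(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
      // Fields must be read in the same order they were written.
      sum = in.readLong();
      count = in.readLong();
    }
  }

  /** Serializes the value to bytes and rebuilds a fresh copy from them. */
  static SumAndCount roundTrip(SumAndCount original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(bytes)) {
        original.write(out);
      }
      SumAndCount copy = new SumAndCount();
      try (DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        copy.readFields(in);
      }
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    SumAndCount value = new SumAndCount();
    value.sum = 42L;
    value.count = 7L;
    SumAndCount copy = roundTrip(value);
    System.out.println(copy.sum + " / " + copy.count);
  }
}
```

The byte array stands in for the network: whatever a worker writes, the receiving side must be able to rebuild with readFields alone.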

Aggregator lifecycle

init at master

The master calls masterCompute.initialize(), which registers the aggregators.
Registered aggregators are put in reducerMap.

RandomWalkVertexMasterCompute.initialize

  @Override
  public void initialize() throws InstantiationException,
      IllegalAccessException {
    registerAggregator(RandomWalkComputation.NUM_DANGLING_VERTICES,
        LongSumAggregator.class);
    registerAggregator(RandomWalkComputation.CUMULATIVE_DANGLING_PROBABILITY,
        DoubleSumAggregator.class);
    registerAggregator(RandomWalkComputation.CUMULATIVE_PROBABILITY,
        DoubleSumAggregator.class);
    registerAggregator(RandomWalkComputation.L1_NORM_OF_PROBABILITY_DIFFERENCE,
        DoubleSumAggregator.class);
  }

MasterAggregatorHandler.prepareSuperstep

  1. Clear reducedMap.
  2. Move the aggregated values from reducerMap into reducedMap.
  3. Send the aggregators to the workers by name (each worker receives only part of the aggregators).
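
The three steps above can be sketched roughly as follows. This is a simplified simulation, not the Giraph implementation: the maps hold plain Long values instead of reducers and Writables, and ownerOf (hashing the aggregator name modulo the number of workers) is an assumption about how "by name" assigns an aggregator to a worker.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified simulation of the master side; names other than reducerMap
// and reducedMap are hypothetical.
final class MasterPrepareSketch {
  final Map<String, Long> reducerMap = new HashMap<>(); // values being reduced
  final Map<String, Long> reducedMap = new HashMap<>(); // values for this superstep

  /** Assumption: the owner is picked by hashing the aggregator name. */
  static int ownerOf(String name, int numWorkers) {
    return Math.floorMod(name.hashCode(), numWorkers);
  }

  /** Returns, per worker index, the aggregators that worker would receive. */
  List<Map<String, Long>> prepareSuperstep(int numWorkers) {
    reducedMap.clear();            // step 1
    reducedMap.putAll(reducerMap); // step 2

    List<Map<String, Long>> perWorker = new ArrayList<>();
    for (int i = 0; i < numWorkers; i++) {
      perWorker.add(new TreeMap<>());
    }
    // Step 3: each aggregator goes to exactly one worker, chosen by name.
    for (Map.Entry<String, Long> e : reducedMap.entrySet()) {
      perWorker.get(ownerOf(e.getKey(), numWorkers)).put(e.getKey(), e.getValue());
    }
    return perWorker;
  }

  public static void main(String[] args) {
    MasterPrepareSketch master = new MasterPrepareSketch();
    master.reducerMap.put("numDanglingVertices", 12L);
    master.reducerMap.put("maxDegree", 40L);
    master.reducerMap.put("edgeCount", 900L);
    List<Map<String, Long>> sent = master.prepareSuperstep(2);
    for (int w = 0; w < sent.size(); w++) {
      System.out.println("worker " + w + " receives " + sent.get(w));
    }
  }
}
```

Because the owner depends only on the name and the worker list, every participant can compute it locally without extra coordination.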

WorkerAggregatorHandler.prepareSuperstep

  1. Receive the aggregators sent by the master.
  2. Distribute those aggregators to the other workers.
  3. Receive the aggregators from the other workers and the master, and put them into broadcastedMap or reducerMap for this superstep.

WorkerAggregatorHandler.finishSuperstep (called once per superstep)

  1. Send each locally aggregated value to the worker that owns it, determined by the aggregator name and the worker list.
  2. Receive the aggregated values sent by the other workers.
  3. Send the reduced aggregators to the master.
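
The end-of-superstep exchange can be simulated in a single process as below. This is only a sketch under stated assumptions: every aggregator is a sum over Long values, network messages are replaced by map updates, and ownerOf (hash of the aggregator name modulo the worker count) is a hypothetical stand-in for the real owner lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Single-process simulation of the worker-side reduction; all names here
// are hypothetical.
final class WorkerFinishSketch {

  /** Assumption: the owner is picked by hashing the aggregator name. */
  static int ownerOf(String name, int numWorkers) {
    return Math.floorMod(name.hashCode(), numWorkers);
  }

  /**
   * @param partials partials.get(w) holds the values worker w aggregated locally
   * @return the reduced values the master would end up with
   */
  static Map<String, Long> finishSuperstep(List<Map<String, Long>> partials) {
    int numWorkers = partials.size();
    List<Map<String, Long>> owned = new ArrayList<>();
    for (int w = 0; w < numWorkers; w++) {
      owned.add(new HashMap<>());
    }
    // Steps 1 and 2: each worker "sends" every partial value to its owner,
    // and the owner reduces what it receives (here: a sum).
    for (Map<String, Long> local : partials) {
      for (Map.Entry<String, Long> e : local.entrySet()) {
        owned.get(ownerOf(e.getKey(), numWorkers))
            .merge(e.getKey(), e.getValue(), Long::sum);
      }
    }
    // Step 3: each owner "sends" its fully reduced values to the master.
    Map<String, Long> atMaster = new HashMap<>();
    for (Map<String, Long> reduced : owned) {
      atMaster.putAll(reduced);
    }
    return atMaster;
  }

  public static void main(String[] args) {
    List<Map<String, Long>> partials = List.of(
        Map.of("a", 1L, "b", 2L),
        Map.of("a", 10L),
        Map.of("b", 20L, "c", 5L));
    System.out.println(finishSuperstep(partials));
  }
}
```

Spreading ownership across workers means the master receives one already-reduced value per aggregator instead of one partial value per worker.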

ThreadLocal

Workers may use thread-local aggregators to speed up computation, because thread-local aggregators reduce synchronization between threads. When a compute thread finishes, it calls finishThreadComputation to reduce its local values into the worker's aggregators.
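
The idea can be sketched with plain Java threads: each thread accumulates into its own local variable without locking and merges into the shared value exactly once when it finishes. ThreadLocalAggregationSketch and mergeFromThread are hypothetical names; mergeFromThread only plays the role that finishThreadComputation has in the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of per-thread partial aggregation; not Giraph code.
final class ThreadLocalAggregationSketch {
  private long shared = 0L; // worker-wide aggregated value

  /** Called once per thread, so the lock is taken only numThreads times. */
  private synchronized void mergeFromThread(long partial) {
    shared += partial;
  }

  synchronized long get() {
    return shared;
  }

  long sumInParallel(long[] values, int numThreads) {
    List<Thread> threads = new ArrayList<>();
    for (int t = 0; t < numThreads; t++) {
      final int id = t;
      Thread thread = new Thread(() -> {
        long local = 0L; // thread-local partial value, no locking needed
        for (int i = id; i < values.length; i += numThreads) {
          local += values[i];
        }
        mergeFromThread(local); // single synchronized merge at the end
      });
      threads.add(thread);
      thread.start();
    }
    for (Thread thread : threads) {
      try {
        thread.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return get();
  }

  public static void main(String[] args) {
    long[] values = new long[1000];
    for (int i = 0; i < values.length; i++) {
      values[i] = i + 1;
    }
    System.out.println(new ThreadLocalAggregationSketch().sumInParallel(values, 4));
  }
}
```

With a shared aggregator every aggregate call would contend for the same lock; with per-thread partials the contention drops to one merge per thread.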
