The Coordinator is Druid's central coordination module. It decouples the other modules from direct dependencies on one another, is responsible for managing and distributing Segments, controls the loading and dropping of Segments on historical nodes, and keeps the Segment load balanced across the historical nodes.
The Coordinator is designed around periodically running tasks and contains several different kinds of tasks. It does not call the historical nodes directly; instead it uses Zookeeper as a bridge: the Coordinator writes instructions to Zookeeper, and the historical nodes pick those instructions up from Zookeeper to load or drop Segments.
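To make the Zookeeper bridge concrete, here is a minimal sketch (using Apache Curator, with hypothetical host, path, and payload values) of the load-queue idea: a coordinator-like process creates a znode under a server-specific load queue path, and the historical node watches that path and acts on each entry. The payload shown is a simplified placeholder, not Druid's exact wire format.
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import java.nio.charset.StandardCharsets;

public class LoadQueueSketch
{
  public static void main(String[] args) throws Exception
  {
    // Hypothetical values for illustration only.
    String zkConnect = "localhost:2181";
    String loadQueuePath = "/druid/loadQueue/historical-host:8083";
    String segmentId = "wikipedia_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_v1";

    CuratorFramework curator = CuratorFrameworkFactory.newClient(
        zkConnect, new ExponentialBackoffRetry(1000, 3)
    );
    curator.start();

    // Coordinator side of the idea: create a child znode under the server's load queue
    // whose payload describes the change. In Druid the payload is a serialized segment
    // change request; it is simplified here to a placeholder JSON string.
    byte[] payload = "{\"action\":\"load\",\"segmentId\":\"...\"}".getBytes(StandardCharsets.UTF_8);
    curator.create().creatingParentsIfNeeded().forPath(loadQueuePath + "/" + segmentId, payload);

    // A historical node watches its own load queue path, deserializes each request,
    // loads or drops the Segment, and then deletes the znode to acknowledge completion.
    curator.close();
  }
}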
The Coordinator decides when to load and drop Segments based on a set of rules, which can be configured through Druid's management console or configuration parameters. The supported rule types, covering both load and drop rules, are declared as follows:
@JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
@JsonSubTypes(value = {
    @JsonSubTypes.Type(name = "loadByPeriod", value = PeriodLoadRule.class),
    @JsonSubTypes.Type(name = "loadByInterval", value = IntervalLoadRule.class),
    @JsonSubTypes.Type(name = "loadForever", value = ForeverLoadRule.class),
    @JsonSubTypes.Type(name = "dropByPeriod", value = PeriodDropRule.class),
    @JsonSubTypes.Type(name = "dropByInterval", value = IntervalDropRule.class),
    @JsonSubTypes.Type(name = "dropForever", value = ForeverDropRule.class)
})
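As a quick illustration of how the "type" property selects a concrete rule class, the following sketch deserializes a rule with Jackson. It assumes the Druid rule classes are on the classpath; the JSON field name tieredReplicants follows common Druid rule definitions but may differ across versions, so treat the payload as an assumption rather than a reference.
import com.fasterxml.jackson.databind.ObjectMapper;
import io.druid.server.coordinator.rules.ForeverLoadRule;
import io.druid.server.coordinator.rules.Rule;

public class RuleDeserializationSketch
{
  public static void main(String[] args) throws Exception
  {
    ObjectMapper mapper = new ObjectMapper();
    // The "type" field is matched against the @JsonSubTypes names above,
    // so this JSON is bound to ForeverLoadRule.
    String json = "{\"type\":\"loadForever\",\"tieredReplicants\":{\"_default_tier\":2}}";
    Rule rule = mapper.readValue(json, Rule.class);
    System.out.println(rule instanceof ForeverLoadRule);  // expected: true
  }
}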
The entry point of the Coordinator code is io.druid.server.coordinator.DruidCoordinator.
DruidCoordinator pulls in several manager classes that provide information about Segments and the cluster, together with the ability to manage them. For example, the MetadataSegmentManager and MetadataRuleManager interfaces both retrieve rules and Segment metadata from the MySQL metadata store via SQL queries.
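Conceptually, these managers poll the metadata tables. The sketch below issues the kind of query involved directly over JDBC; the connection settings are hypothetical, and the table and column names (druid_segments, used, payload) follow Druid's default metadata schema, which may differ by version, so this is illustrative rather than the managers' actual SQL.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MetadataPollSketch
{
  public static void main(String[] args) throws Exception
  {
    // Hypothetical connection settings; requires the MySQL JDBC driver on the classpath.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:mysql://localhost:3306/druid", "druid", "diurd");
         Statement stmt = conn.createStatement();
         // Each row's payload column holds the JSON-serialized segment descriptor.
         ResultSet rs = stmt.executeQuery("SELECT payload FROM druid_segments WHERE used = true")) {
      while (rs.next()) {
        System.out.println(rs.getString("payload"));
      }
    }
  }
}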
The Coordinator starts from start(): it first elects a leader through Zookeeper's LeaderLatch, and that leader then periodically runs a set of tasks:
@LifecycleStart
public void start()
{
  synchronized (lock) {
    if (started) {
      return;
    }
    started = true;

    createNewLeaderLatch();
    try {
      leaderLatch.get().start();
    }
    catch (Exception e) {
      throw Throwables.propagate(e);
    }
  }
}
private LeaderLatch createNewLeaderLatch()
{
  final LeaderLatch newLeaderLatch = new LeaderLatch(
      curator, ZKPaths.makePath(zkPaths.getCoordinatorPath(), COORDINATOR_OWNER_NODE), self.getHostAndPort()
  );

  newLeaderLatch.addListener(
      new LeaderLatchListener()
      {
        @Override
        public void isLeader()
        {
          DruidCoordinator.this.becomeLeader();
        }

        @Override
        public void notLeader()
        {
          DruidCoordinator.this.stopBeingLeader();
        }
      },
      Execs.singleThreaded("CoordinatorLeader-%s")
  );

  return leaderLatch.getAndSet(newLeaderLatch);
}
Once leadership has been acquired, the periodic tasks can begin:
private void becomeLeader()
{
  synchronized (lock) {
    if (!started) {
      return;
    }

    log.info("I am the leader of the coordinators, all must bow!");
    log.info("Starting coordination in [%s]", config.getCoordinatorStartDelay());
    try {
      leaderCounter++;
      leader = true;
      metadataSegmentManager.start();
      metadataRuleManager.start();
      serverInventoryView.start();
      serviceAnnouncer.announce(self);
      final int startingLeaderCounter = leaderCounter;

      final List<Pair<? extends CoordinatorRunnable, Duration>> coordinatorRunnables = Lists.newArrayList();
      coordinatorRunnables.add(
          Pair.of(
              new CoordinatorHistoricalManagerRunnable(startingLeaderCounter),
              config.getCoordinatorPeriod()
          )
      );
      if (indexingServiceClient != null) {
        coordinatorRunnables.add(
            Pair.of(
                new CoordinatorIndexingServiceRunnable(
                    makeIndexingServiceHelpers(),
                    startingLeaderCounter
                ),
                config.getCoordinatorIndexingPeriod()
            )
        );
      }

      for (final Pair<? extends CoordinatorRunnable, Duration> coordinatorRunnable : coordinatorRunnables) {
        ScheduledExecutors.scheduleWithFixedDelay(
            exec,
            config.getCoordinatorStartDelay(),
            coordinatorRunnable.rhs,
            new Callable<ScheduledExecutors.Signal>()
            {
              private final CoordinatorRunnable theRunnable = coordinatorRunnable.lhs;

              @Override
              public ScheduledExecutors.Signal call()
              {
                if (leader && startingLeaderCounter == leaderCounter) {
                  theRunnable.run();
                }
                if (leader && startingLeaderCounter == leaderCounter) { // (We might no longer be leader)
                  return ScheduledExecutors.Signal.REPEAT;
                } else {
                  return ScheduledExecutors.Signal.STOP;
                }
              }
            }
        );
      }
    }
    catch (Exception e) {
      log.makeAlert(e, "Unable to become leader")
         .emit();
      final LeaderLatch oldLatch = createNewLeaderLatch();
      CloseQuietly.close(oldLatch);
      try {
        leaderLatch.get().start();
      }
      catch (Exception e1) {
        // If an exception gets thrown out here, then the coordinator will zombie out 'cause it won't be looking for
        // the latch anymore. I don't believe it's actually possible for an Exception to throw out here, but
        // Curator likes to have "throws Exception" on methods so it might happen...
        log.makeAlert(e1, "I am a zombie")
           .emit();
      }
    }
  }
}
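The scheduling loop above repeats each runnable at a fixed delay until the node loses leadership. As a self-contained sketch of that pattern (using the plain JDK scheduler rather than Druid's ScheduledExecutors, and a hypothetical isLeader flag), it amounts to:
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class LeaderTaskLoopSketch
{
  private static final AtomicBoolean isLeader = new AtomicBoolean(true);  // toggled by leader election

  public static void main(String[] args)
  {
    ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
    Runnable coordinationPass = () -> System.out.println("run one coordination pass");

    exec.scheduleWithFixedDelay(
        () -> {
          if (isLeader.get()) {
            // Still the leader: run one pass, analogous to returning Signal.REPEAT.
            coordinationPass.run();
          } else {
            // Leadership lost: stop rescheduling, analogous to returning Signal.STOP.
            exec.shutdown();
          }
        },
        30, 60, TimeUnit.SECONDS  // start delay and period, analogous to getCoordinatorStartDelay()/getCoordinatorPeriod()
    );
  }
}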
Of the runnables registered in becomeLeader(), CoordinatorHistoricalManagerRunnable and CoordinatorIndexingServiceRunnable are the most important.
CoordinatorHistoricalManagerRunnable bundles together a number of concrete tasks:
private class CoordinatorHistoricalManagerRunnable extends CoordinatorRunnable
{
  public CoordinatorHistoricalManagerRunnable(final int startingLeaderCounter)
  {
    super(
        ImmutableList.of(
            new DruidCoordinatorSegmentInfoLoader(DruidCoordinator.this),
            new DruidCoordinatorHelper()
            {
              @Override
              public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
              {
                // Display info about all historical servers
                Iterable<ImmutableDruidServer> servers = FunctionalIterable
                    .create(serverInventoryView.getInventory())
                    .filter(
                        new Predicate<DruidServer>()
                        {
                          @Override
                          public boolean apply(DruidServer input)
                          {
                            return input.isAssignable();
                          }
                        }
                    ).transform(
                        new Function<DruidServer, ImmutableDruidServer>()
                        {
                          @Override
                          public ImmutableDruidServer apply(DruidServer input)
                          {
                            return input.toImmutableDruidServer();
                          }
                        }
                    );

                if (log.isDebugEnabled()) {
                  log.debug("Servers");
                  for (ImmutableDruidServer druidServer : servers) {
                    log.debug("  %s", druidServer);
                    log.debug("    -- DataSources");
                    for (ImmutableDruidDataSource druidDataSource : druidServer.getDataSources()) {
                      log.debug("    %s", druidDataSource);
                    }
                  }
                }

                // Find all historical servers, group them by subType and sort by ascending usage
                final DruidCluster cluster = new DruidCluster();
                for (ImmutableDruidServer server : servers) {
                  if (!loadManagementPeons.containsKey(server.getName())) {
                    String basePath = ZKPaths.makePath(zkPaths.getLoadQueuePath(), server.getName());
                    LoadQueuePeon loadQueuePeon = taskMaster.giveMePeon(basePath);
                    log.info("Creating LoadQueuePeon for server[%s] at path[%s]", server.getName(), basePath);

                    loadManagementPeons.put(server.getName(), loadQueuePeon);
                  }

                  cluster.add(new ServerHolder(server, loadManagementPeons.get(server.getName())));
                }

                segmentReplicantLookup = SegmentReplicantLookup.make(cluster);

                // Stop peons for servers that aren't there anymore.
                final Set<String> disappeared = Sets.newHashSet(loadManagementPeons.keySet());
                for (ImmutableDruidServer server : servers) {
                  disappeared.remove(server.getName());
                }
                for (String name : disappeared) {
                  log.info("Removing listener for server[%s] which is no longer there.", name);
                  LoadQueuePeon peon = loadManagementPeons.remove(name);
                  peon.stop();
                }

                return params.buildFromExisting()
                             .withDruidCluster(cluster)
                             .withDatabaseRuleManager(metadataRuleManager)
                             .withLoadManagementPeons(loadManagementPeons)
                             .withSegmentReplicantLookup(segmentReplicantLookup)
                             .withBalancerReferenceTimestamp(DateTime.now())
                             .build();
              }
            },
            new DruidCoordinatorRuleRunner(DruidCoordinator.this),
            new DruidCoordinatorCleanupUnneeded(DruidCoordinator.this),
            new DruidCoordinatorCleanupOvershadowed(DruidCoordinator.this),
            new DruidCoordinatorBalancer(DruidCoordinator.this),
            new DruidCoordinatorLogger()
        ),
        startingLeaderCounter
    );
  }
}
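Each helper in this list implements DruidCoordinatorHelper: it receives the current DruidCoordinatorRuntimeParams, does its piece of work (loading segment info, applying rules, cleaning up, balancing, logging), and returns an updated params object for the next helper. A minimal sketch of that pipeline pattern, using simplified stand-in types rather than the Druid classes, looks like this:
import java.util.Arrays;
import java.util.List;

public class HelperPipelineSketch
{
  // Simplified stand-ins for DruidCoordinatorRuntimeParams and DruidCoordinatorHelper.
  static class Params { final String trace; Params(String trace) { this.trace = trace; } }

  interface Helper { Params run(Params params); }

  public static void main(String[] args)
  {
    List<Helper> helpers = Arrays.asList(
        p -> new Params(p.trace + " -> loadSegmentInfo"),
        p -> new Params(p.trace + " -> runRules"),
        p -> new Params(p.trace + " -> balance")
    );

    // The runnable threads the params object through every helper in order.
    Params params = new Params("start");
    for (Helper helper : helpers) {
      params = helper.run(params);
    }
    System.out.println(params.trace);  // start -> loadSegmentInfo -> runRules -> balance
  }
}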
Of these helpers, the balancer deserves special mention. Its goal is to spread Segments that are likely to be covered by the same query across different historical nodes, so that the full capacity of the cluster is used and queries do not pile up on a few machines.
Druid's load-balancing algorithm is implemented in the CostBalancerStrategy class.
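The core idea of the cost-based strategy is to assign a placement a cost that is high when two segments are close in time (and therefore likely to be hit by the same query) and sit on the same server, and then prefer the lowest-cost server. The sketch below illustrates that idea with a simplified exponential-decay cost over the gap between segment intervals; the decay constant and the formula are illustrative choices, not Druid's exact cost function.
import org.joda.time.Interval;

public class CostSketch
{
  // Illustrative decay constant: segments further apart than a few days contribute almost no cost.
  private static final double DECAY_MILLIS = 24 * 3600 * 1000d;  // one day

  // A simplified joint cost for co-locating two segments on one server:
  // high when their intervals touch or overlap, decaying exponentially with the gap.
  static double jointCost(Interval a, Interval b)
  {
    long gap;
    if (a.overlaps(b)) {
      gap = 0;
    } else if (a.isBefore(b)) {
      gap = b.getStartMillis() - a.getEndMillis();
    } else {
      gap = a.getStartMillis() - b.getEndMillis();
    }
    return Math.exp(-gap / DECAY_MILLIS);
  }

  public static void main(String[] args)
  {
    Interval day1 = Interval.parse("2015-09-12/2015-09-13");
    Interval day2 = Interval.parse("2015-09-13/2015-09-14");
    Interval day30 = Interval.parse("2015-10-12/2015-10-13");

    // Adjacent days are "expensive" to co-locate; far-apart days are nearly free.
    System.out.println(jointCost(day1, day2));   // 1.0 (adjacent)
    System.out.println(jointCost(day1, day30));  // close to 0.0
  }
}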