Problem
While upgrading from 0.12.2 to 0.15.0 we ran into a very strange problem: both real-time ingestion (via Tranquility) and Hadoop batch ingestion became extremely slow, and eventually the tasks started failing with errors.
Troubleshooting
First, look at the real-time ingestion logs. The real-time ingestion task showed entries like the following:
2019-08-09T19:00:12,919 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.plumber.RealtimePlumber - Submitting persist runnable for dataSource[page_event]
2019-08-09T19:00:12,921 INFO [page_event-incremental-persist] org.apache.druid.segment.realtime.plumber.RealtimePlumber - DataSource[page_event], Interval[2019-08-09T11:00:00.000Z/2019-08-09T12:00:00.000Z], Metadata [null] persisting Hydrant[FireHydrant{, queryable=page_event_2019-08-09T11:00:00.000Z_2019-08-09T12:00:00.000Z_2019-08-09T11:00:12.659Z, count=0}]
2019-08-09T19:00:12,922 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.plumber.RealtimePlumber - Submitting persist runnable for dataSource[page_event]
2019-08-09T19:00:12,923 INFO [page_event-incremental-persist] org.apache.druid.segment.IndexMergerV9 - Starting persist for interval[2019-08-09T11:00:00.000Z/2019-08-09T12:00:00.000Z], rows[1]
2019-08-09T19:00:13,036 INFO [page_event-incremental-persist] org.apache.druid.segment.IndexMergerV9 - Completed dim conversions in 78 millis.
2019-08-09T19:00:13,073 INFO [page_event-incremental-persist] org.apache.druid.segment.IndexMergerV9 - completed walk through of 1 rows in 12 millis.
2019-08-09T19:00:13,086 INFO [page_event-incremental-persist] org.apache.druid.segment.IndexMergerV9 - Completed time column in 13 millis.
The logs show that data was being persisted one row at a time (rows[1]). The only settings that control how much data is buffered before a persist are maxRowsInMemory=75000 and maxBytesInMemory=1/6 of the JVM heap (the latter was added in 0.13.0). Neither of these limits should have been reached.
The relevant code:
public boolean canAppendRow()
{
  final boolean countCheck = size() < maxRowCount;
  // if maxBytesInMemory = -1, then ignore sizeCheck
  final boolean sizeCheck = maxBytesInMemory <= 0 || getBytesInMemory().get() < maxBytesInMemory;
  final boolean canAdd = countCheck && sizeCheck;
  if (!countCheck && !sizeCheck) {
    outOfRowsReason = StringUtils.format(
        "Maximum number of rows [%d] and maximum size in bytes [%d] reached",
        maxRowCount,
        maxBytesInMemory
    );
  } else {
    if (!countCheck) {
      outOfRowsReason = StringUtils.format("Maximum number of rows [%d] reached", maxRowCount);
    } else if (!sizeCheck) {
      outOfRowsReason = StringUtils.format("Maximum size in bytes [%d] reached", maxBytesInMemory);
    }
  }
  return canAdd;
}
maxBytesInMemory is the parameter newly added in 0.13.0, and setting it to -1 (any value below 0 works) disables it:
public static long getMaxBytesInMemoryOrDefault(final long maxBytesInMemory)
{
  // In the main tuningConfig class constructor, we set the maxBytes to 0 if null to avoid setting
  // maxBytes to max jvm memory of the process that starts first. Instead we set the default based on
  // the actual task node's jvm memory.
  long newMaxBytesInMemory = maxBytesInMemory;
  if (maxBytesInMemory == 0) {
    newMaxBytesInMemory = TuningConfig.DEFAULT_MAX_BYTES_IN_MEMORY;
  } else if (maxBytesInMemory < 0) {
    newMaxBytesInMemory = Long.MAX_VALUE;
  }
  return newMaxBytesInMemory;
}
long DEFAULT_MAX_BYTES_IN_MEMORY = JvmUtils.getRuntimeInfo().getMaxHeapSizeBytes() / 6;
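As a quick sanity check, here is a minimal standalone sketch of how the configured value maps to the effective limit, mirroring getMaxBytesInMemoryOrDefault above. The class and method names are just for illustration, and the 2 GB task heap is an assumed example value, not the real cluster setting.
// Minimal sketch of how the configured maxBytesInMemory maps to the effective
// in-memory limit, mirroring getMaxBytesInMemoryOrDefault above.
// The 2 GB task heap is an assumed example value, not the real cluster setting.
public class MaxBytesInMemoryDemo
{
  private static final long ASSUMED_TASK_HEAP = 2L * 1024 * 1024 * 1024;

  static long effectiveMaxBytesInMemory(long configured)
  {
    if (configured == 0) {
      // unset (null) becomes 0 in the tuningConfig, which means "use the default": heap / 6
      return ASSUMED_TASK_HEAP / 6;
    } else if (configured < 0) {
      // -1 (or any negative value) disables the size-based persist trigger entirely
      return Long.MAX_VALUE;
    }
    return configured;
  }

  public static void main(String[] args)
  {
    System.out.println(effectiveMaxBytesInMemory(0));   // 357913941 (~341 MiB, heap / 6)
    System.out.println(effectiveMaxBytesInMemory(-1));  // 9223372036854775807, sizeCheck never fires
  }
}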
Trying this out in the configuration confirmed that this parameter was indeed the problem.
Root cause
Next, why does the default value of this new parameter not work here? First, let's look at how Druid estimates the size of the data.
org.apache.druid.segment.incremental.OnheapIncrementalIndex
long estimatedRowSize = estimateRowSizeInBytes(key, maxBytesPerRowForAggregators);
sizeInBytes.addAndGet(estimatedRowSize);
private long estimateRowSizeInBytes(IncrementalIndexRow key, long maxBytesPerRowForAggregators)
{
  return ROUGH_OVERHEAD_PER_MAP_ENTRY + key.estimateBytesInMemory() + maxBytesPerRowForAggregators;
}
As the code shows, the estimated size of a row has three parts: a fixed per-map-entry overhead, which is not large; the memory occupied by the dimensions; and the memory occupied by the aggregated metrics.
public long estimateBytesInMemory()
{
  long sizeInBytes = Long.BYTES + Integer.BYTES * dims.length + Long.BYTES + Long.BYTES;
  sizeInBytes += dimsKeySize;
  return sizeInBytes;
}

private static long getMaxBytesPerRowForAggregators(IncrementalIndexSchema incrementalIndexSchema)
{
  long maxAggregatorIntermediateSize = Integer.BYTES * incrementalIndexSchema.getMetrics().length;
  maxAggregatorIntermediateSize += Arrays.stream(incrementalIndexSchema.getMetrics())
                                         .mapToLong(aggregator -> aggregator.getMaxIntermediateSizeWithNulls()
                                                                  + Long.BYTES * 2)
                                         .sum();
  return maxAggregatorIntermediateSize;
}
The dimension part depends only on the number of dimensions, so it cannot be absurdly large either. That leaves only the aggregated metrics. Remembering that we use DataSketches, let's follow the code further (Druid's sketch aggregator delegates to the DataSketches library):
@Override
public int getMaxIntermediateSize()
{
  return SetOperation.getMaxUnionBytes(size);
}

public static int getMaxUnionBytes(int nomEntries) {
  int nomEnt = Util.ceilingPowerOf2(nomEntries);
  // 16 bytes per (power-of-2 rounded) nominal entry, plus a small preamble
  return (nomEnt << 4) + (Family.UNION.getMaxPreLongs() << 3);
}
So the estimate depends on the size configured for the sketch. The default is 16384, but ours was set to 8388608, 512 times larger, and the data source has 4 to 5 such metrics (see the rough arithmetic sketched below). Trying the default size fixed the problem.
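To get a feel for the scale, here is a rough back-of-the-envelope sketch based on getMaxUnionBytes and getMaxBytesPerRowForAggregators above. The preamble term (getMaxPreLongs() << 3) is a small constant and is ignored; the figure of 5 sketch metrics and the 2 GB task heap are assumptions for illustration only, as are the class and variable names.
// Rough per-row size estimate for theta sketch metrics, ignoring the small
// preamble constant in getMaxUnionBytes. The metric count (5) and the 2 GB
// task heap are illustrative assumptions, not exact production values.
public class SketchSizeEstimateDemo
{
  public static void main(String[] args)
  {
    long defaultSize    = 16384L;
    long configuredSize = 8388608L;

    long bytesPerSketchDefault    = defaultSize << 4;     // 262144 bytes, ~256 KiB per metric
    long bytesPerSketchConfigured = configuredSize << 4;  // 134217728 bytes, 128 MiB per metric

    long sketchMetrics  = 5;                                        // assumed metric count
    long perRowEstimate = sketchMetrics * bytesPerSketchConfigured; // ~640 MiB for a single row

    long assumedTaskHeap  = 2L * 1024 * 1024 * 1024;
    long maxBytesInMemory = assumedTaskHeap / 6;                    // default limit: ~341 MiB

    // A single row is already estimated at ~640 MiB, well above the ~341 MiB limit,
    // so canAppendRow() fails after every row and each row is persisted on its own,
    // which matches the "rows[1]" persists seen in the task log.
    System.out.println(bytesPerSketchDefault);
    System.out.println(bytesPerSketchConfigured);
    System.out.println(perRowEstimate + " vs " + maxBytesInMemory);
  }
}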
Summary
An unreasonably large DataSketches size setting broke the 0.13.0 feature that persists in-memory data based on estimated byte size (maxBytesInMemory).
It is still worth taking a closer look at the DataSketches implementation itself.