Problem
While upgrading from 0.12.2 to 0.15.0 we ran into a very strange problem: both real-time ingestion (via Tranquility) and Hadoop batch ingestion became extremely slow, and eventually the tasks started failing with errors.
Troubleshooting
First, look at the real-time ingestion logs. The real-time ingestion task showed entries like the following:
2019-08-09T19:00:12,919 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.plumber.RealtimePlumber - Submitting persist runnable for dataSource[page_event]
2019-08-09T19:00:12,921 INFO [page_event-incremental-persist] org.apache.druid.segment.realtime.plumber.RealtimePlumber - DataSource[page_event], Interval[2019-08-09T11:00:00.000Z/2019-08-09T12:00:00.000Z], Metadata [null] persisting Hydrant[FireHydrant{, queryable=page_event_2019-08-09T11:00:00.000Z_2019-08-09T12:00:00.000Z_2019-08-09T11:00:12.659Z, count=0}]
2019-08-09T19:00:12,922 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.plumber.RealtimePlumber - Submitting persist runnable for dataSource[page_event]
2019-08-09T19:00:12,923 INFO [page_event-incremental-persist] org.apache.druid.segment.IndexMergerV9 - Starting persist for interval[2019-08-09T11:00:00.000Z/2019-08-09T12:00:00.000Z], rows[1]
2019-08-09T19:00:13,036 INFO [page_event-incremental-persist] org.apache.druid.segment.IndexMergerV9 - Completed dim conversions in 78 millis.
2019-08-09T19:00:13,073 INFO [page_event-incremental-persist] org.apache.druid.segment.IndexMergerV9 - completed walk through of 1 rows in 12 millis.
2019-08-09T19:00:13,086 INFO [page_event-incremental-persist] org.apache.druid.segment.IndexMergerV9 - Completed time column in 13 millis.
The logs show that data was being persisted one row at a time (rows[1]). The only settings that control how much data is buffered before a persist are maxRowsInMemory=75000 and maxBytesInMemory=1/6 of the JVM heap (the latter was added in 0.13.0). Neither of these limits should have been reached.
The relevant code:
public boolean canAppendRow()
{
  final boolean countCheck = size() < maxRowCount;
  // if maxBytesInMemory = -1, then ignore sizeCheck
  final boolean sizeCheck = maxBytesInMemory <= 0 || getBytesInMemory().get() < maxBytesInMemory;
  final boolean canAdd = countCheck && sizeCheck;
  if (!countCheck && !sizeCheck) {
    outOfRowsReason = StringUtils.format(
        "Maximum number of rows [%d] and maximum size in bytes [%d] reached",
        maxRowCount,
        maxBytesInMemory
    );
  } else {
    if (!countCheck) {
      outOfRowsReason = StringUtils.format("Maximum number of rows [%d] reached", maxRowCount);
    } else if (!sizeCheck) {
      outOfRowsReason = StringUtils.format("Maximum size in bytes [%d] reached", maxBytesInMemory);
    }
  }
  return canAdd;
}
maxBytesInMemory is the parameter newly added in 0.13.0, and setting it to -1 (any value below 0 works) disables it:
public static long getMaxBytesInMemoryOrDefault(final long maxBytesInMemory)
{
  // In the main tuningConfig class constructor, we set the maxBytes to 0 if null to avoid setting
  // maxBytes to max jvm memory of the process that starts first. Instead we set the default based on
  // the actual task node's jvm memory.
  long newMaxBytesInMemory = maxBytesInMemory;
  if (maxBytesInMemory == 0) {
    newMaxBytesInMemory = TuningConfig.DEFAULT_MAX_BYTES_IN_MEMORY;
  } else if (maxBytesInMemory < 0) {
    newMaxBytesInMemory = Long.MAX_VALUE;
  }
  return newMaxBytesInMemory;
}
long DEFAULT_MAX_BYTES_IN_MEMORY = JvmUtils.getRuntimeInfo().getMaxHeapSizeBytes() / 6;
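As a quick sanity check, here is a minimal standalone sketch of how the configured value maps to the effective limit, mirroring getMaxBytesInMemoryOrDefault above. The class and method names are just for illustration, and the 2 GB task heap is an assumed example value, not the real cluster setting.
// Minimal sketch of how the configured maxBytesInMemory maps to the effective
// in-memory limit, mirroring getMaxBytesInMemoryOrDefault above.
// The 2 GB task heap is an assumed example value, not the real cluster setting.
public class MaxBytesInMemoryDemo
{
  private static final long ASSUMED_TASK_HEAP = 2L * 1024 * 1024 * 1024;

  static long effectiveMaxBytesInMemory(long configured)
  {
    if (configured == 0) {
      // unset (null) becomes 0 in the tuningConfig, which means "use the default": heap / 6
      return ASSUMED_TASK_HEAP / 6;
    } else if (configured < 0) {
      // -1 (or any negative value) disables the size-based persist trigger entirely
      return Long.MAX_VALUE;
    }
    return configured;
  }

  public static void main(String[] args)
  {
    System.out.println(effectiveMaxBytesInMemory(0));   // 357913941 (~341 MiB, heap / 6)
    System.out.println(effectiveMaxBytesInMemory(-1));  // 9223372036854775807, sizeCheck never fires
  }
}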
Trying this out in the configuration confirmed that this parameter was indeed the problem.
Root cause
Next, why does the default value of this new parameter not work here? First, let's look at how Druid estimates the size of the data.
org.apache.druid.segment.incremental.OnheapIncrementalIndex
long estimatedRowSize = estimateRowSizeInBytes(key, maxBytesPerRowForAggregators);
sizeInBytes.addAndGet(estimatedRowSize);
private long estimateRowSizeInBytes(IncrementalIndexRow key, long maxBytesPerRowForAggregators)
{
  return ROUGH_OVERHEAD_PER_MAP_ENTRY + key.estimateBytesInMemory() + maxBytesPerRowForAggregators;
}
As the code shows, the estimated size of a row has three parts: a fixed per-map-entry overhead, which is not large; the memory occupied by the dimensions; and the memory occupied by the aggregated metrics.
public long estimateBytesInMemory()
{
  long sizeInBytes = Long.BYTES + Integer.BYTES * dims.length + Long.BYTES + Long.BYTES;
  sizeInBytes += dimsKeySize;
  return sizeInBytes;
}

private static long getMaxBytesPerRowForAggregators(IncrementalIndexSchema incrementalIndexSchema)
{
  long maxAggregatorIntermediateSize = Integer.BYTES * incrementalIndexSchema.getMetrics().length;
  maxAggregatorIntermediateSize += Arrays.stream(incrementalIndexSchema.getMetrics())
                                         .mapToLong(aggregator -> aggregator.getMaxIntermediateSizeWithNulls()
                                                                  + Long.BYTES * 2)
                                         .sum();
  return maxAggregatorIntermediateSize;
}
The dimension part depends only on the number of dimensions, so it cannot be absurdly large either. That leaves only the aggregated metrics. Remembering that we use DataSketches, let's follow the code further (Druid's sketch aggregator delegates to the DataSketches library):
@Override
public int getMaxIntermediateSize()
{
  return SetOperation.getMaxUnionBytes(size);
}

public static int getMaxUnionBytes(int nomEntries) {
  int nomEnt = Util.ceilingPowerOf2(nomEntries);
  // 16 bytes per (power-of-2 rounded) nominal entry, plus a small preamble
  return (nomEnt << 4) + (Family.UNION.getMaxPreLongs() << 3);
}
So the estimate depends on the size configured for the sketch. The default is 16384, but ours was set to 8388608, 512 times larger, and the data source has 4 to 5 such metrics (see the rough arithmetic sketched below). Trying the default size fixed the problem.
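To get a feel for the scale, here is a rough back-of-the-envelope sketch based on getMaxUnionBytes and getMaxBytesPerRowForAggregators above. The preamble term (getMaxPreLongs() << 3) is a small constant and is ignored; the figure of 5 sketch metrics and the 2 GB task heap are assumptions for illustration only, as are the class and variable names.
// Rough per-row size estimate for theta sketch metrics, ignoring the small
// preamble constant in getMaxUnionBytes. The metric count (5) and the 2 GB
// task heap are illustrative assumptions, not exact production values.
public class SketchSizeEstimateDemo
{
  public static void main(String[] args)
  {
    long defaultSize    = 16384L;
    long configuredSize = 8388608L;

    long bytesPerSketchDefault    = defaultSize << 4;     // 262144 bytes, ~256 KiB per metric
    long bytesPerSketchConfigured = configuredSize << 4;  // 134217728 bytes, 128 MiB per metric

    long sketchMetrics  = 5;                                        // assumed metric count
    long perRowEstimate = sketchMetrics * bytesPerSketchConfigured; // ~640 MiB for a single row

    long assumedTaskHeap  = 2L * 1024 * 1024 * 1024;
    long maxBytesInMemory = assumedTaskHeap / 6;                    // default limit: ~341 MiB

    // A single row is already estimated at ~640 MiB, well above the ~341 MiB limit,
    // so canAppendRow() fails after every row and each row is persisted on its own,
    // which matches the "rows[1]" persists seen in the task log.
    System.out.println(bytesPerSketchDefault);
    System.out.println(bytesPerSketchConfigured);
    System.out.println(perRowEstimate + " vs " + maxBytesInMemory);
  }
}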
Summary
An unreasonably large DataSketches size setting broke the 0.13.0 feature that persists in-memory data based on estimated byte size (maxBytesInMemory).
It is still worth taking a closer look at the DataSketches implementation itself.