EsgynDB DELETE fails with org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException

Symptom

When a table has very wide rows, a bulk DELETE or INSERT may fail with org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException. A typical failure looks like this:

SQL>delete from TE_JZYY_TRADEDATA where TRANDATE <'20240101';
*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::nextRow returned error HBASE_ACCESS_ERROR(-706). Cause: java.util.concurrent.ExecutionException: java.io.IOException: performScan encountered Exception txID: 72339094786551175 Exception: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: TrxRegionEndpoint coprocessor: getScanner - scanner id 14, Expected nextCallSeq: 5, But the nextCallSeq received from client: 4 in region TRAFODION.ITLR_UAT.TE_JZYY_TRADEDATA,,1593580518362.85de9a4992f2600c868b0e1249ab15a2.,skey=null,ekey=null
java.util.concurrent.FutureTask.report(FutureTask.java:122)
java.util.concurrent.FutureTask.get(FutureTask.java:192)
org.trafodion.sql.HTableClient.fetchRows(HTableClient.java:1343) Caused by
java.io.IOException: performScan encountered Exception txID: 72339094786551175 Exception: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: TrxRegionEndpoint coprocessor: getScanner - scanner id 14, Expected nextCallSeq: 5, But the nextCallSeq received from client: 4 in region TRAFODION.ITLR_UAT.TE_JZYY_TRADEDATA,,1593580518362.85de9a4992f2600c868b0e1249ab15a2.,skey=null,ekey=null
org.apache.hadoop.hbase.client.transactional.TransactionalScanner.next(TransactionalScanner.java:391)
org.apache.hadoop.hbase.client.AbstractClientScanner.next(AbstractClientScanner.java:70)
org.trafodion.sql.HTableClient$ScanHelper.call(HTableClient.java:309)
org.trafodion.sql.HTableClient$ScanHelper.call(HTableClient.java:307)
java.util.concurrent.FutureTask.run(FutureTask.java:266)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748). [2020-07-09 10:32:07]

Solution

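The error is mainly caused by running out of memory (OutOfMemory). The JVM heap size of each mxosrvr is controlled by the JVM_MAX_HEAP_SIZE_MB parameter in ms.env. When a DELETE or INSERT statement runs inside a transaction, you must make sure that cache size * row width < the mxosrvr JVM heap size. The cache size can be seen in the execution plan shown by EXPLAIN: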
[Figure 1: EXPLAIN execution plan showing the cache size]
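In this example, table TE_JZYY_TRADEDATA has a 5 MB large column, so together with the other columns the total row width exceeds 5 MB. Cache size * row width is therefore 1024 * (5 MB+), i.e. more than 5 GB, which is larger than the mxosrvr JVM heap size and causes the OutOfMemory.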
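The sizing rule used here, that cache size * row width must stay below the mxosrvr JVM heap size, is simple arithmetic and can be checked before running a large DELETE. The sketch below is a minimal model under stated assumptions: it ignores every other heap consumer and any per-row overhead, treats the row width as exactly 5 MB (the real row is slightly wider), and uses the 512 MB heap (-Xmx512m) that jinfo reports for this mxosrvr. The function names are hypothetical helpers, not part of EsgynDB.

```python
# Simplified model from the text: one scan batch of cache_rows rows must fit
# in the mxosrvr JVM heap. Other heap consumers and per-row overhead are
# ignored (an assumption), so real limits are tighter.

MB = 1024 * 1024


def scan_batch_bytes(cache_rows: int, row_width_bytes: int) -> int:
    """Bytes needed to hold one batch of cache_rows rows of the given width."""
    return cache_rows * row_width_bytes


def fits_in_heap(cache_rows: int, row_width_bytes: int, heap_bytes: int) -> bool:
    """Apply the rule cache size * row width < heap size."""
    return scan_batch_bytes(cache_rows, row_width_bytes) < heap_bytes


if __name__ == "__main__":
    cache_rows = 1024   # default cache size
    row_width = 5 * MB  # lower bound: the example row is wider than 5 MB
    heap = 512 * MB     # -Xmx512m, as reported by jinfo for this mxosrvr
    need = scan_batch_bytes(cache_rows, row_width)
    # prints: need 5120 MB, heap 512 MB, fits: False
    print(f"need {need // MB} MB, heap {heap // MB} MB, "
          f"fits: {fits_in_heap(cache_rows, row_width, heap)}")
```

Even with the 5 MB lower bound, the default batch needs about ten times the available heap.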
We can also check the heap size of a particular mxosrvr with the following command (292801 is the process ID of that mxosrvr):

[trafodion@grcbperf207 ~]$ jinfo -flags 292801
Attaching to process ID 292801, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.191-b12
Non-default VM flags: -XX:CICompilerCount=18 -XX:CompressedClassSpaceSize=125829120 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=null -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912 -XX:MaxMetaspaceSize=134217728 -XX:MaxNewSize=178782208 -XX:MinHeapDeltaBytes=524288 -XX:NewSize=178782208 -XX:OldSize=358088704 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseParallelGC 
Command line:  -Xmx512m -XX:CompressedClassSpaceSize=128m -XX:MaxMetaspaceSize=128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/trafodion
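When many mxosrvr processes need to be checked, reading the limit out of the jinfo text by hand gets tedious. The sketch below parses output in the format shown above: it prefers -XX:MaxHeapSize=<bytes> from the "Non-default VM flags" line and falls back to -Xmx<n>[k|m|g] from the "Command line" line. The function name max_heap_bytes is hypothetical, and the sketch assumes you capture the jinfo -flags <pid> output as text yourself.

```python
import re
from typing import Optional

# Multipliers for the optional -Xmx size suffix (none, k, m, g).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_bytes(jinfo_output: str) -> Optional[int]:
    """Return the max heap size in bytes from `jinfo -flags` text, or None."""
    # Prefer the resolved value on the "Non-default VM flags" line.
    m = re.search(r"-XX:MaxHeapSize=(\d+)", jinfo_output)
    if m:
        return int(m.group(1))
    # Fall back to -Xmx on the "Command line" line, e.g. -Xmx512m.
    m = re.search(r"-Xmx(\d+)([kKmMgG]?)\b", jinfo_output)
    if m:
        return int(m.group(1)) * _UNITS[m.group(2).lower()]
    return None


if __name__ == "__main__":
    sample = (
        "Non-default VM flags: -XX:InitialHeapSize=536870912 "
        "-XX:MaxHeapSize=536870912 -XX:+UseParallelGC\n"
        "Command line:  -Xmx512m -XX:MaxMetaspaceSize=128m\n"
    )
    heap = max_heap_bytes(sample)
    print(heap // (1024 * 1024), "MB")  # prints: 512 MB
```

MaxHeapSize is preferred because it is the value the JVM actually resolved, while -Xmx only appears when it was passed explicitly.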

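Here MaxHeapSize=536870912 bytes (-Xmx512m), i.e. only 512 MB. There are therefore several ways to solve this problem: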

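  1. Reduce the column width. If the column does not really need 5 MB, shorten it; in most cases, however, the business requirements will not allow this.
  2. Reduce the cache size, which defaults to 1024, for example with cqd hbase_num_cache_rows_max '100';
  3. Increase the mxosrvr JVM heap size, for example by setting JVM_MAX_HEAP_SIZE_MB=1024 in ms.env. The database must be restarted for this to take effect.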
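To choose a value for option 2, or to judge whether option 3 alone is enough, the same rule can be inverted: the largest usable cache size is the largest n with n * row width < heap size. This is a sketch under the same simplified model (other heap consumers ignored, row width taken as exactly 5 MB although the real row is slightly wider). usable_percent is a hypothetical safety margin, not an EsgynDB setting.

```python
MB = 1024 * 1024


def max_cache_rows(heap_bytes: int, row_width_bytes: int,
                   usable_percent: int = 100) -> int:
    """Largest n with n * row_width_bytes < heap_bytes * usable_percent / 100.

    usable_percent is a hypothetical safety margin for other heap users;
    it is not an EsgynDB parameter.
    """
    budget = heap_bytes * usable_percent // 100
    # n * w < budget  <=>  n * w <= budget - 1  <=>  n <= (budget - 1) // w
    return max((budget - 1) // row_width_bytes, 0)


if __name__ == "__main__":
    row_width = 5 * MB  # lower bound for the example table's row width
    for heap_mb in (512, 1024):  # current -Xmx512m, and option 3's value
        rows = max_cache_rows(heap_mb * MB, row_width)
        # prints "at most 102 rows" for 512 MB, "at most 204 rows" for 1024 MB
        print(f"heap {heap_mb} MB -> at most {rows} rows")
```

Under this model, a 512 MB heap holds at most about 102 rows of 5 MB, so the '100' in option 2 leaves only a few MB of headroom. Raising JVM_MAX_HEAP_SIZE_MB to 1024 alone covers about 204 rows, far short of the default 1024, so options 2 and 3 may need to be combined.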
