[Err] 1064 - errCode = 2, detailMessage = Failed to find enough host in all backends. need: 3
Cause:
The statement specified PROPERTIES("replication_num" = "3");
but the cluster has only 2 BEs:
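The mismatch between the requested replica count and the available backends can be sketched as follows. This is a minimal illustration, not Doris code: the host names `be1` and `be2` and the helper names are hypothetical, and it assumes one replica needs one alive backend.

```python
def alive_backend_count(backends):
    """Count the backends that are marked alive."""
    return sum(1 for be in backends if be["alive"])


def check_replication(replication_num, backends):
    """Compare the requested replica count with the number of alive backends."""
    available = alive_backend_count(backends)
    if replication_num > available:
        return f"need: {replication_num}, alive backends: {available}"
    return "ok"


# Hypothetical cluster with two alive BEs, as in the case above.
backends = [
    {"host": "be1", "alive": True},
    {"host": "be2", "alive": True},
]
print(check_replication(3, backends))
print(check_replication(2, backends))
```

With only two BEs, either add a third BE or lower `replication_num` to 2.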
Check the log on the affected node:
==> ./be.WARNING.log.20200921-141304 <==
W1026 18:13:39.139992 19091 utils.cpp:101] fail to get master client from cache. host=192.168.6.143, port=9020, code=7
W1026 18:13:39.140386 19091 task_worker_pool.cpp:1185] finish report olap table state failed. status:-1, master host:192.168.6.143, port:9020
W1026 18:13:40.391201 19089 utils.cpp:101] fail to get master client from cache. host=192.168.6.143, port=9020, code=7
W1026 18:13:40.391471 19089 task_worker_pool.cpp:1060] finish report task failed. status:-1, master host:192.168.6.143port:9020
W1027 10:00:31.385262 2359 data_dir.cpp:128] open file filed, error: IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id
W1027 10:00:31.385926 2359 data_dir.cpp:95] _init_cluster_id failed, error: IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id
W1027 10:00:31.385958 2359 storage_engine.cpp:192] Store load failed, status=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id, path=/wyyt/software/doris/be/storage
W1027 10:00:31.386071 2353 storage_engine.cpp:148] _init_store_map failed, error: Internal error: init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;
W1027 10:00:31.386106 2353 storage_engine.cpp:96] open engine failed, error: Internal error: init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;
F1027 10:00:31.386186 2353 doris_main.cpp:189] fail to open StorageEngine, res=init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;
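In a long BE log, the line that matters is usually the fatal one (prefix `F`). The sketch below filters those lines; it assumes the glog-style layout seen above (severity letter, date, time, thread id, `file:line]`, message), and the function name is hypothetical.

```python
import re

# Severity letter, MMDD, time with microseconds, thread id, "file:line]", message.
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^\]]+)\] (?P<message>.*)$"
)


def fatal_lines(lines):
    """Return (source, message) pairs for every line with severity F."""
    found = []
    for line in lines:
        match = GLOG_LINE.match(line.strip())
        if match and match.group("level") == "F":
            found.append((match.group("source"), match.group("message")))
    return found


# Two lines copied from the log above.
sample = [
    "W1027 10:00:31.386106 2353 storage_engine.cpp:96] open engine failed, error: Internal error: init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;",
    "F1027 10:00:31.386186 2353 doris_main.cpp:189] fail to open StorageEngine, res=init path failed, error=IO error: failed to open cluster id file /wyyt/software/doris/be/storage/cluster_id;",
]
for source, message in fatal_lines(sample):
    print(source, "->", message)
```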
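Once the cause is found, fix it. In my case the file could not be opened, so try setting the permissions to 755 and then restart the BE node.
If the restart fails, delete be.pid and restart again.
Start the service as the same user that started it originally.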
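Before changing permissions it helps to see who owns the file and whether the current user can open it. A minimal sketch, assuming a Unix host; `describe_access` is a hypothetical helper, and the path is the one from the log above.

```python
import os
import pwd
import stat


def describe_access(path):
    """Report the owner, permission bits and access rights for the current user."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry (common inside containers).
        owner = str(st.st_uid)
    return {
        "owner": owner,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


path = "/wyyt/software/doris/be/storage/cluster_id"  # path from the log above
if os.path.exists(path):
    print(describe_access(path))
```

Run it as the user that starts the BE; if `readable` is false or the owner is a different user, that matches the IO error above.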
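Cause: the field lengths must not add up to more than 100,000. The limit can be changed through configuration, but that is not recommended.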
ErrorReason{code=errCode = 2, msg='failed to create task: errCode = 2, detailMessage = disk 6189104187500640169 on backend 11001 exceed limit usage'
This caused all tasks to pause.
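A quick way to confirm a "exceed limit usage" error is to measure how full the BE data disk is. This is only a sketch: the 95 percent threshold and the helper names are assumptions for illustration, not confirmed Doris defaults, so check the BE configuration for the real limit.

```python
import shutil


def usage_percent(total_bytes, used_bytes):
    """Return used space as a percentage of total space."""
    if total_bytes == 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def exceeds_limit(path, limit_percent):
    """True if the filesystem holding `path` is at or above the given limit."""
    usage = shutil.disk_usage(path)
    return usage_percent(usage.total, usage.used) >= limit_percent


ASSUMED_LIMIT_PERCENT = 95  # assumption, not taken from the source
print(exceeds_limit(".", ASSUMED_LIMIT_PERCENT))
```

Point `path` at the BE storage directory when running it on the affected node.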
create materialized view test_p_user_view as select user_id,user_name from test_p_user limit 8;
ERROR 1064 (HY000): errCode = 2, detailMessage = The materialized view is coming soon
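Fix: run this command on the master FE: ADMIN SET FRONTEND CONFIG ("enable_materialized_view" = "true");
At present materialized views only support duplicate key tables; 0.12 supports them only partially, and 0.13 will complete the support.
1. Create the corresponding table in Doris
2. Execute the statement
Importing a large table from HDFS brought down the BE node
Solution: set parameters on the FE
The task must specify its memory explicitly:
Check the BE log and the core file to see whether it was an OOM.
Reference: https://blog.csdn.net/weixin_42135997/article/details/80732658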
https://blog.csdn.net/qq_15437667/article/details/83934113?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2~all~sobaiduend~default-1-83934113.nonecase&utm_term=linux%20%E6%80%8E%E4%B9%88%E7%9C%8Bcore%E6%96%87%E4%BB%B6&spm=1000.2123.3001.4430
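Checked the BE node: it was in the Alive state.
Checked the BE node logs be.INFO and be.WARN: nothing turned up.
It later turned out that one node's disk had failed. Next time this happens I will know how to troubleshoot it.
1) Verified broker load of HDFS data into a table using the uniq model. Rows with the same primary key are not overwritten in load order; the replacement is decided by the length of the second field (the longest second field wins, and for equal lengths the row with the newest time wins). If the second field is identical, the length of the third field is compared in the same way.
Resulting data: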
type:LOAD_RUN_FAIL; msg:errCode = 2, detailMessage = all partitions have no load data
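The source table data was null; there was no data.
Cause: most likely the BE died from running out of memory.
Solution: limit each broker load to 1 GB per node, or less
By default the BE task concurrency is max_routine_load_task_num_per * number of BEs
For example, with 3 BE nodes the total concurrency is 5*3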
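The arithmetic above can be written out as a small sketch. The function names are hypothetical, and the second helper assumes every routine load job actually runs at its desired concurrency, which is a simplification.

```python
def total_task_slots(tasks_per_be, be_count):
    """Cluster-wide routine load task slots: per-BE limit times the number of BEs."""
    return tasks_per_be * be_count


def jobs_that_fit(total_slots, desired_concurrency):
    """How many jobs fit if each one uses `desired_concurrency` slots (simplified)."""
    return total_slots // desired_concurrency


slots = total_task_slots(tasks_per_be=5, be_count=3)
print(slots)                    # 15
print(jobs_that_fit(slots, 3))  # 5
```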
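Not enough memory; increase the memory
Error description: the data quality is poor, so Doris cannot parse the data or parsing fails, and the load task is cancelled
Possible causes:
1. A varchar field is too long; delimiter problems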
2. too_many_filtered_rows
Solution
Do not import long text, or truncate long text on import; handle data that contains the delimiter
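The two fixes above (truncate long text, deal with delimiters inside the data) can be applied before the load. A minimal pre-processing sketch; the helper names are hypothetical, and it assumes the varchar length limit is counted in UTF-8 bytes and that replacing the delimiter with a space is acceptable for the data.

```python
def truncate_utf8(text, max_bytes):
    """Cut text so its UTF-8 encoding fits in max_bytes without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops a partial multi-byte character left at the cut point.
    return data[:max_bytes].decode("utf-8", errors="ignore")


def clean_field(text, delimiter, max_bytes, replacement=" "):
    """Replace the column delimiter inside a value, then truncate to the byte limit."""
    return truncate_utf8(text.replace(delimiter, replacement), max_bytes)


print(clean_field("a,b,c", ",", 10))  # a b c
print(truncate_utf8("中文", 4))        # 中
```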
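16. After using broker load to import data into Doris, the memory was not released
Solution:
Try upgrading Doris to 0.13.15 to verify this issue:
Link: https://cloud.baidu.com/doc/PALO/s/Ikivhcwb5
17. Error encountered
The Doris version is the 0.13.11 patch release.
18. The data directory on some BE nodes is very large, while on other BE nodes it is normal.
Initial diagnosis: a cluster load problem, with routine load writing too frequently
Check whether the table is healthy:
Change the routine load parameters, setting the batch interval to 60s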
(
'desired_concurrent_number'='3',
'max_batch_interval' = '60',
'max_batch_rows' = '300000',
'max_batch_size' = '209715200',
'strict_mode' = 'false',
'format' = 'json'
)
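The `max_batch_size` value above is 200 MiB written in bytes. The sketch below derives it and renders the same property list from a dict, so the numbers are easier to adjust; the helper names are hypothetical and the output is only the parenthesised fragment shown above, not a complete statement.

```python
def mib(n):
    """Convert mebibytes to bytes."""
    return n * 1024 * 1024


def render_properties(props):
    """Render a dict as the parenthesised 'key' = 'value' list used above."""
    body = ",\n".join(f"'{key}' = '{value}'" for key, value in props.items())
    return "(\n" + body + "\n)"


props = {
    "desired_concurrent_number": 3,
    "max_batch_interval": 60,      # seconds
    "max_batch_rows": 300000,
    "max_batch_size": mib(200),    # 209715200 bytes
    "strict_mode": "false",
    "format": "json",
}
print(render_properties(props))
```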
20. After deploying Doris 0.14.7 with 3 FEs on the intranet and writing data, one FE node went down. The log:
2021-08-27 09:09:25,172 ERROR (heartbeat mgr|19) [BDBJEJournal.write():166] catch an exception when writing to database. sleep and retry. journal id 1526718
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160910 VLSN: 31,775,195, initiated at: 09:09:22. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]
Current feeds:
192.168.7.7_9010_1625192915300: feederVLSN=31,775,198 replicaTxnEndVLSN=31,775,193
192.168.7.4_9010_1625132697001: feederVLSN=31,775,198 replicaTxnEndVLSN=31,775,191
at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
at org.apache.doris.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:159) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logEdit(EditLog.java:849) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logHeartbeat(EditLog.java:1265) [palo-fe.jar:3.4.0]
at org.apache.doris.system.HeartbeatMgr.runAfterCatalogReady(HeartbeatMgr.java:154) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]
2021-08-27 09:09:27,884 WARN (Thread-49|192) [BDBJEMetricHandler.write():117] write metric data into bdb error, key:192.168.7.7:8030_query_err_rate_1630026555000
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160912 VLSN: 31,775,198, initiated at: 09:09:23. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]
Current feeds:
192.168.7.7_9010_1625192915300: feederVLSN=31,775,199 replicaTxnEndVLSN=31,775,196
192.168.7.4_9010_1625132697001: feederVLSN=31,775,199 replicaTxnEndVLSN=31,775,191
at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
at org.apache.doris.metric.collector.BDBJEMetricHandler.write(BDBJEMetricHandler.java:115) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.BDBJEMetricHandler.writeDouble(BDBJEMetricHandler.java:109) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.parseFeMetricJsonAndWriteMetric(MetricCollector.java:217) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.writeMetric(MetricCollector.java:105) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.lambda$init$0(MetricCollector.java:77) ~[palo-fe.jar:3.4.0]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
2021-08-27 09:09:33,338 WARN (Thread-49|192) [BDBJEMetricHandler.write():117] write metric data into bdb error, key:192.168.7.7:8030_quantile0.75_1630026555000
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160913 VLSN: 31,775,200, initiated at: 09:09:27. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]
Current feeds:
192.168.7.7_9010_1625192915300: feederVLSN=31,775,202 replicaTxnEndVLSN=31,775,198
192.168.7.4_9010_1625132697001: feederVLSN=31,775,202 replicaTxnEndVLSN=31,775,196
at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
at org.apache.doris.metric.collector.BDBJEMetricHandler.write(BDBJEMetricHandler.java:115) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.BDBJEMetricHandler.writeDouble(BDBJEMetricHandler.java:109) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.parseFeMetricJsonAndWriteMetric(MetricCollector.java:247) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.writeMetric(MetricCollector.java:105) ~[palo-fe.jar:3.4.0]
at org.apache.doris.metric.collector.MetricCollector.lambda$init$0(MetricCollector.java:77) ~[palo-fe.jar:3.4.0]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
2021-08-27 09:09:37,283 ERROR (heartbeat mgr|19) [BDBJEJournal.write():166] catch an exception when writing to database. sleep and retry. journal id 1526718
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160914 VLSN: 31,775,202, initiated at: 09:09:30. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]
Current feeds:
192.168.7.7_9010_1625192915300: feederVLSN=31,775,205 replicaTxnEndVLSN=31,775,200
192.168.7.4_9010_1625132697001: feederVLSN=31,775,205 replicaTxnEndVLSN=31,775,196
at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:226) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:772) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.commit(Txn.java:625) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.txn.Txn.operationEnd(Txn.java:1803) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1506) ~[je-7.3.7.jar:7.3.7]
at com.sleepycat.je.Database.put(Database.java:1556) ~[je-7.3.7.jar:7.3.7]
at org.apache.doris.journal.bdbje.BDBJEJournal.write(BDBJEJournal.java:159) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logEdit(EditLog.java:849) [palo-fe.jar:3.4.0]
at org.apache.doris.persist.EditLog.logHeartbeat(EditLog.java:1265) [palo-fe.jar:3.4.0]
at org.apache.doris.system.HeartbeatMgr.runAfterCatalogReady(HeartbeatMgr.java:154) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) [palo-fe.jar:3.4.0]
at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [palo-fe.jar:3.4.0]
2021-08-27 09:09:40,305 WARN (Thread-49|192) [BDBJEMetricHandler.write():117] write metric data into bdb error, key:192.168.7.7:8030_quantile0.95_1630026555000
com.sleepycat.je.rep.InsufficientAcksException: (JE 7.3.7) Transaction: -16160916 VLSN: 31,775,205, initiated at: 09:09:33. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 2000ms. FeederState=192.168.7.5_9010_1625132780567(2)[MASTER]
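As shown in the figure below:
First guess: the heartbeat timeout may be set too short, since no parameters were tuned when testing this version.
Second guess: the FE failed to write when syncing metadata to the replicas, and the retries failed as well.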
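To test the second guess, the "Current feeds" lines in the log can be turned into a per-replica gap. This is a rough sketch: treating `feederVLSN - replicaTxnEndVLSN` as a lag indicator is my reading of the field names, not something confirmed by the BDB JE documentation, and the function name is hypothetical.

```python
import re

# Matches lines such as "<node>: feederVLSN=31,775,198 replicaTxnEndVLSN=31,775,193".
FEED_LINE = re.compile(
    r"^(?P<node>\S+): feederVLSN=(?P<feeder>[\d,]+) replicaTxnEndVLSN=(?P<replica>[\d,]+)"
)


def replica_lag(lines):
    """Map each replica node to feederVLSN minus replicaTxnEndVLSN."""
    lag = {}
    for line in lines:
        match = FEED_LINE.match(line.strip())
        if match:
            feeder = int(match.group("feeder").replace(",", ""))
            replica = int(match.group("replica").replace(",", ""))
            lag[match.group("node")] = feeder - replica
    return lag


# Values copied from the first exception in the log above.
sample = [
    "Current feeds:",
    "192.168.7.7_9010_1625192915300: feederVLSN=31,775,198 replicaTxnEndVLSN=31,775,193",
    "192.168.7.4_9010_1625132697001: feederVLSN=31,775,198 replicaTxnEndVLSN=31,775,191",
]
print(replica_lag(sample))
```

A gap that keeps growing for one node across successive exceptions would point at that replica (or the network path to it) rather than at the heartbeat setting.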
It took 3 restarts before the node came back up: