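The problems listed below are all ones I ran into while using Logstash to synchronize data between different databases. I am writing them down for reference. Several of them were maddening in hindsight once solved, especially the second.
1.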
[FATAL][logstash.runner ] An unexpected error occurred! {:error=>#, :backtrace=>["/export/servers/logstash-6.2.2/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:187:in `buffer_flush'", "/export/servers/logstash-6.2.2/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:112:in `block in buffer_initialize'", "org/jruby/RubyKernel.java:1292:in `loop'", "/export/servers/logstash-6.2.2/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:110:in `block in buffer_initialize'"]}
[INFO ][logstash.pipeline ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#"}
[INFO ][logstash.agent ] Pipelines running {:count=>1, :pipelines=>["main"]}
[ERROR][org.logstash.Logstash ] java.lang.IllegalStateException: org.jruby.exceptions.RaiseException: (SystemExit) exit
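This problem is subtle. Testing the Logstash configuration file with -t reports no problems, yet the error above appears once Logstash actually runs. After repeated testing I found that the cause was still the configuration file; Logstash's own check simply could not detect it. So if you hit this error, don't panic: go through your configuration file carefully.
2.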
[WARN ][logstash.outputs.webhdfs ] Failed to flush outgoing items {:outgoing_count=>1, :exception=>"LogStash::Error", :backtrace=>["org/logstash/ext/JrubyEventExtLibrary.java:202:in `sprintf'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.4/lib/logstash/outputs/webhdfs.rb:194:in `flush'", "org/jruby/RubyArray.java:2409:in `collect'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.4/lib/logstash/outputs/webhdfs.rb:189:in `flush'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/buffer.rb:219:in `buffer_flush'", "org/jruby/RubyHash.java:1342:in `each'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/buffer.rb:216:in `buffer_flush'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/buffer.rb:193:in `buffer_flush'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/stud-0.0.23/lib/stud/buffer.rb:159:in `buffer_receive'", "/export/servers/logstash-5.5.3/vendor/bundle/jruby/1.9/gems/logstash-output-webhdfs-3.0.4/lib/logstash/outputs/webhdfs.rb:182:in `receive'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "org/jruby/RubyArray.java:1613:in `each'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/output_delegator_strategies/legacy.rb:22:in `multi_receive'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/output_delegator.rb:47:in `multi_receive'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/pipeline.rb:420:in `output_batch'", "org/jruby/RubyHash.java:1342:in `each'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/pipeline.rb:419:in `output_batch'", "/export/servers/logstash-5.5.3/logstash-core/lib/logstash/pipeline.rb:365:in `worker_loop'", 
"/export/servers/logstash-5.5.3/logstash-core/lib/logstash/pipeline.rb:330:in `start_workers'"]}
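I spent a long time on this one without figuring it out, and only solved it with help from Yazhou and Brother Tao. I don't think I lacked the skill; the cause was just truly unexpected.
The cause was my output path. Here is the example the official documentation gives for path: e.g.: /user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log. I used the same pattern in my configuration file, only replacing log with TXT. It worked perfectly in the first tests, so when the error appeared it never occurred to me that the path could be at fault. Replacing it with /hive/logs/output.txt finally fixed the problem. Later I also tried dt=%{+YYYY-MM-dd}/%{+HH}; it passed several tests, but the error above came back shortly before going live. To be safe, I decided not to use this format at all. I have reported the issue to the official project and will follow its resolution.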
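The workaround described above can be sketched as a webhdfs output block. This is only a minimal sketch: the host and user values are placeholders assumed for illustration, and only the two path values come from the article.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder NameNode host (assumption)
    port => 50070                    # the plugin's default WebHDFS port
    user => "logstash"               # placeholder HDFS user (assumption)

    # Static path that the author finally settled on.
    path => "/hive/logs/output.txt"

    # Date-based pattern that failed intermittently for the author:
    # path => "/user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.txt"
  }
}
```

With a static path every event is appended to the same HDFS file, so the date partitioning that the documented pattern provides is given up in exchange for stability.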
3.
[WARN ][logstash.outputs.webhdfs ] webhdfs write caused an exception:
{"RemoteException":{"exception":"AlreadyBeingCreatedException",
"javaClassName":"org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException",
"message":"Failed to APPEND_FILE /output for DFSClient_NONMAPREDUCE_-688998419_40 on 10.66.90.167
because this file lease is currently owned by DFSClient_NONMAPREDUCE_-380528477_38 on 10.66.90.167\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2932)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2683)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2982)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2950)\n\tat
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:654)\n\tat
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)\n\tat
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)\n\tat
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)\n\tat
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1727)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)\n"}}.
Maybe you should increase retry_interval or reduce number of workers. Retrying...
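This one is simple. In my output configuration I set:
retry_interval => 30 # seconds to wait before retrying the write to HDFS
If the problem above is not resolved properly, it eventually leads to the following error: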
[ERROR][logstash.outputs.webhdfs ] Max write retries reached. Events will be discarded. Exception: {"RemoteException":{"exception":"IOException","javaClassName":"java.io.IOException","message":"append: lastBlock=blk_1073768614_56374 of src=/output is not sufficiently replicated yet.\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2690)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2982)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2950)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:654)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:421)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1727)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2045)\n"}}
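In the end, the requests to write data to HDFS fail.
The interval I had configured at first was very short, so the warning appeared whenever a lot of data was being written. It did not affect normal use, but it was unpleasant to look at, so I increased the setting.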
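For context, the retry setting might sit in the output block roughly as follows. This is a sketch under assumptions: the host, user, and path values are placeholders, and retry_times is shown with a value I chose for illustration rather than one taken from the article.

```
output {
  webhdfs {
    host => "namenode.example.com"   # placeholder NameNode host (assumption)
    user => "logstash"               # placeholder HDFS user (assumption)
    path => "/hive/logs/output.txt"  # placeholder; the logs above show writes to /output

    retry_interval => 30   # seconds between write retries (value from the article)
    retry_times    => 5    # illustrative value; once retries run out, events are discarded
  }
}
```

The warning itself also suggests reducing the number of workers, which is another knob to try if a longer retry_interval alone does not make the lease conflicts go away.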
I will keep updating this article. If you run into interesting problems during development, you are welcome to discuss them with me.