Problems using the Logstash webhdfs output

Wednesday, April 25, 2018

10:11

Symptom

Logstash is set up with the webhdfs output plugin, but after configuration it fails to write to HDFS. The log reports:

[2018-04-25T00:00:26,915][WARN ][logstash.outputs.webhdfs ] Failed to flush outgoing items {:outgoing_count=>1, :exception=>"WebHDFS::ServerError", :backtrace=>["/opt/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:351:in `request'", "/opt/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:270:in `operate_requests'", "/opt/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/webhdfs-0.8.0/lib/webhdfs/client_v1.rb:73:in `create'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:228:in `write_data'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:211:in `block in flush'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:199:in `flush'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:219:in `block in buffer_flush'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:216:in `buffer_flush'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/stud-0.0.23/lib/stud/buffer.rb:159:in `buffer_receive'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-output-webhdfs-3.0.6/lib/logstash/outputs/webhdfs.rb:182:in `receive'", "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `block in multi_receive'", "org/jruby/RubyArray.java:1734:in `each'", "/opt/logstash/logstash-core/lib/logstash/outputs/base.rb:92:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/output_delegator_strategies/legacy.rb:22:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/output_delegator.rb:49:in `multi_receive'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:477:in `block in output_batch'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:476:in `output_batch'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:428:in `worker_loop'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:386:in `block in start_workers'"]}

Analysis

Checking the configuration

Since this is the error being raised, something is clearly going wrong while accessing WebHDFS, so the first step is to check the configuration.

The configuration is as follows:

input {
  beats {
    port => "5044"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  webhdfs {
    host => "x.x.x.x"
    port => 9870
    path => "/weblog/iis/%{@source_host}/%{+YYYY-MM-dd}/iislog-%{@source_host}-%{YYYYMMddHH}.log"
    user => "root"
    retry_times => 100
  }
}

WebHDFS::ServerError

I had never used Logstash before, but knew it was simple and easy to use, so I searched for the error keywords directly and found the post "logstash-output-webhdfs Failed to flush outgoing items", which says:

It seems you should set the user option of logstash-output-webhdfs to the HDFS supergroup user, which is the user you use to start HDFS. For example, if you use root to run the start-dfs.sh script, then the user option should be root.
In addition, you should edit /etc/hosts and add the HDFS cluster node list.

This points to two common problem areas (a quick independent check follows the list):

  1. the HDFS access account;
  2. hostname resolution for the HDFS cluster nodes;
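
Before touching Logstash, the same write can be reproduced with plain curl against the WebHDFS REST API. A minimal sketch, assuming the NameNode answers on x.x.x.x:9870 and /tmp/webhdfs-test.log is a throwaway path:

# Step 1: ask the NameNode to create a file. It answers with an
# HTTP 307 redirect whose Location header points at a DataNode,
# usually by hostname.
curl -i -X PUT "http://x.x.x.x:9870/webhdfs/v1/tmp/webhdfs-test.log?op=CREATE&user.name=root"

# Step 2: send the actual data to the Location URL from step 1.
# If that DataNode hostname does not resolve on this machine,
# this step fails, which is the same failure Logstash runs into.
curl -i -X PUT -T /etc/hostname "<Location URL from step 1>"

If step 2 already fails with an unknown-host error, the problem is name resolution rather than the Logstash configuration itself.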

Solution

HDFS access account

This is easy to confirm: the account used on HDFS is indeed root.
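
For the record, a sketch of how to double-check this on the NameNode host (assuming shell access there):

# The OS user that owns the NameNode process is the HDFS superuser:
ps -ef | grep -i '[n]amenode'

# HDFS's own view of who owns the target directory:
hdfs dfs -ls /weblog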

HDFS hostname resolution

Looking at /etc/hosts, only the NameNode had an entry.

After a moment's thought: Logstash presumably resolves the cluster nodes by hostname, and what it receives back from the NameNode should also be hostnames. That is why the answer says to add the node list.
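
To see exactly which hostnames the NameNode hands out, the live DataNode list can be dumped from any Hadoop node (a sketch, assuming the hdfs CLI is on the PATH and the user may run dfsadmin):

# Each live DataNode is reported with a "Hostname:" line; these are
# the names the Logstash host must be able to resolve.
hdfs dfsadmin -report | grep -i 'hostname'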

Adding hosts entries

Put the hostname/IP mappings for all Hadoop nodes into /etc/hosts.
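
For example (hypothetical addresses and hostnames; substitute the real cluster nodes):

# /etc/hosts on the Logstash host: map every Hadoop node, not just the NameNode
10.0.0.10  namenode
10.0.0.11  datanode1
10.0.0.12  datanode2
10.0.0.13  datanode3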

Updating the configuration

Then update the Logstash configuration:

input {
  beats {
    port => "5044"
  }
}
output {
  stdout {
    codec => rubydebug
  }
  webhdfs {
    host => "namenode"
    port => 9870
    path => "/weblog/iis/%{+YYYY-MM-dd}/%{@source_host}/iislog-%{+HH}.log"
    user => "root"
    retry_times => 100
  }
}
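
After editing, the pipeline definition can be syntax-checked before restarting (the config path below is a hypothetical example):

# Validate the pipeline definition without actually starting Logstash
/opt/logstash/bin/logstash -f /opt/logstash/config/beats-webhdfs.conf --config.test_and_exit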

Verifying the result

Check HDFS: if the corresponding directories and files are being created, everything is working.
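
A quick way to check from the command line (path pattern taken from the config above):

# A growing iislog-*.log file under today's directory means writes succeed
hdfs dfs -ls -R /weblog/iis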

Remaining issues

A few problems actually remain:

  1. In the example on the official site, the date is prefixed with dt=; it is unclear what that is for.
  2. %{@source_host} does not get interpolated.
  3. %{+HH} does not follow UTC+0800 (Logstash formats dates from the event's UTC @timestamp).

References

  1. logstash-output-webhdfs Failed to flush outgoing items
