Fixing the Flume SyslogTCP log length limit problem

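We had been collecting logs over syslogUDP. Because the data is network traffic logs, we had not worried much about losing some records. An incidental reconciliation of record counts then showed that the loss over syslogUDP was far too severe. Investigation showed that when rsyslog sends UDP messages, any body longer than 1472 bytes (excluding headers) is truncated, so the truncated records fail the JSON validation performed by the Flume interceptor during collection.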
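The 1472-byte figure matches the largest UDP payload that fits in a single unfragmented packet on a standard link. A minimal sketch of that arithmetic, assuming an Ethernet MTU of 1500 bytes and an IPv4 header without options (the class and method names are made up for illustration):

```java
// Illustrative only: assumes IPv4 without options; names are hypothetical.
final class UdpPayloadBudget {
    static final int IPV4_HEADER_BYTES = 20; // minimum IPv4 header
    static final int UDP_HEADER_BYTES = 8;   // fixed UDP header

    // Largest UDP body that fits in one packet for the given MTU.
    static int maxUnfragmentedPayload(int mtuBytes) {
        return mtuBytes - IPV4_HEADER_BYTES - UDP_HEADER_BYTES;
    }

    public static void main(String[] args) {
        // 1500 - 20 - 8 = 1472
        System.out.println(maxUnfragmentedPayload(1500));
    }
}
```

If the link MTU in your environment differs, the threshold at which UDP bodies get cut off would shift accordingly.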
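Solution: trade some performance for reliability and switch to syslogTCP.
After switching to syslogTCP, we tested with longer messages: tcpdump captured the data, but nothing arrived in Kafka. Suspecting a Flume collection error, we checked the Flume log and found the following: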

2021-12-22 10:55:07,672 WARN org.apache.flume.source.SyslogUtils: Event size larger than specified event size: 2500. You should consider increasing your event size.
2021-12-22 10:55:07,673 WARN org.apache.flume.source.SyslogUtils: Event created from Invalid Syslog data.

The log line exceeded 2500 bytes, and Flume reported it as invalid data.
Look at the corresponding source, org.apache.flume.source.SyslogUtils:

public static final Integer MIN_SIZE = 10;
public static final Integer DEFAULT_SIZE = 2500;

These set the default minimum size to 10 and the default maximum size to 2500.
Next, find the check that reports the size overrun:

if (isBadEvent) {
  logger.warn("Event created from Invalid Syslog data.");
  headers.put(EVENT_STATUS, SyslogStatus.INVALID.getSyslogStatus());
} else if (isIncompleteEvent) {
  logger.warn("Event size larger than specified event size: {}. You should " +
      "consider increasing your event size.", maxSize);
  headers.put(EVENT_STATUS, SyslogStatus.INCOMPLETE.getSyslogStatus());
}

When the event size exceeds maxSize, the warnings above are logged.
Next, see how maxSize is set:
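To make the size check concrete, here is a simplified model of how a line-oriented reader could flag an oversized event. It is a sketch under assumptions, not Flume's actual parsing code: the class, enum, and method names are hypothetical, and it only assumes that events are newline-terminated and that the buffer is cut off once it reaches maxSize.

```java
import java.io.ByteArrayOutputStream;

// Simplified model, NOT the real SyslogUtils implementation; all names are hypothetical.
final class EventSizeSketch {
    enum Status { COMPLETE, INCOMPLETE, PENDING }

    static final class Result {
        final Status status;
        final int length;
        Result(Status status, int length) {
            this.status = status;
            this.length = length;
        }
    }

    // Reads bytes until a newline or until the buffer reaches maxSize.
    static Result readOne(byte[] input, int maxSize) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(maxSize);
        for (byte b : input) {
            if (b == '\n') {
                return new Result(Status.COMPLETE, baos.size());
            }
            baos.write(b);
            if (baos.size() == maxSize) {
                // Limit hit before the terminator: the event is cut off here.
                return new Result(Status.INCOMPLETE, baos.size());
            }
        }
        // Input ended without a terminator; more bytes are expected.
        return new Result(Status.PENDING, baos.size());
    }
}
```

In this model, a 3000-byte line read with a limit of 2500 comes back as INCOMPLETE with 2500 bytes, while the same line with a limit of 5000 comes back COMPLETE. That mirrors the behavior observed in the log above.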

private Integer maxSize;

maxSize is a private field.

public SyslogUtils(Integer eventSize, Set<String> keepFields, boolean isUdp, Clock clock) {
  this.isUdp = isUdp;
  this.clock = clock;
  isBadEvent = false;
  isIncompleteEvent = false;
  maxSize = (eventSize < MIN_SIZE) ? MIN_SIZE : eventSize;
  baos = new ByteArrayOutputStream(eventSize);
  this.keepFields = keepFields;
  initHeaderFormats();
}

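The constructor compares eventSize, which is passed in as a parameter, with MIN_SIZE and uses the larger of the two as maxSize.
There is also a setEventSize method, which is referenced by the SyslogTcpSource and SyslogUdpSource classes: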

public void setEventSize(Integer eventSize) {
  this.maxSize = eventSize;
}
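The path from a configured value to the effective limit can be sketched as follows. This is a stand-in, not Flume's own configuration code: it assumes the property key is eventSize, that a missing key falls back to DEFAULT_SIZE, and it applies the same lower-bound clamp as the constructor shown earlier. The class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical stand-in for the configuration step; not the real Flume classes.
final class EventSizeConfigSketch {
    static final int MIN_SIZE = 10;       // same value as SyslogUtils.MIN_SIZE
    static final int DEFAULT_SIZE = 2500; // same value as SyslogUtils.DEFAULT_SIZE

    // Resolves the effective maximum event size from source properties.
    static int resolveMaxSize(Map<String, String> props) {
        String raw = props.get("eventSize");
        int eventSize = (raw == null) ? DEFAULT_SIZE : Integer.parseInt(raw.trim());
        // Mirror the constructor's clamp: never go below MIN_SIZE.
        return (eventSize < MIN_SIZE) ? MIN_SIZE : eventSize;
    }
}
```

With no eventSize key the sketch yields 2500, which is why messages longer than that were cut off until the parameter was set explicitly.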

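This suggests that eventSize is a parameter that can be set in the configuration file, which the syslogTCP source parameter list in the official Flume documentation confirms.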
Set the parameter:

a1.sources.r1.eventSize = 5000

Restart Flume, and the problem is solved.
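For context, a fuller source definition might look like the fragment below. It is only a sketch: the agent and component names (a1, r1, c1), the host and port values, and the memory channel are placeholders rather than the setup used in this post, and the sink is omitted.

```properties
# Placeholder names and addresses; adjust to your own agent.
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = syslogtcp
a1.sources.r1.host = 0.0.0.0
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
# Maximum size of a single event line in bytes; 2500 when omitted.
a1.sources.r1.eventSize = 5000

a1.channels.c1.type = memory
```

Pick an eventSize comfortably above the longest message you expect, since anything longer is still cut off and flagged as incomplete.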
