Looking at the business requirements of the logging platform itself, we should at least be able to obtain the following log information through the platform:
Component log collection differs depending on how the platform components are deployed. Platform components are currently started in two main ways:
Note:
My k8s cluster is currently deployed with kargo (now called kubespray): etcd and kubelet run as containers started via systemd (i.e. the reliability of these two components is managed by systemd), while the other native k8s components run as static pods on k8s (i.e. their reliability is guaranteed by k8s itself).
"Applications" here all refer to containers running in the k8s cluster, but the collection method differs depending on how an application emits its logs. The platform currently supports two kinds of log collection:
As described in Section 2, there are three main collection methods:
fluentd needs a few additional plugins installed:
Note:
Walkthrough of the fluent.conf in the configmap:
<filter container.**>
  @type record_transformer
  enable_ruby
  <record>
    # pull the podID_namespaceName_appName-containerID part out of the tag
    logname ${tag.split('.')[4]}
  </record>
</filter>
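Assuming the container-log source is an in_tail on /var/log/containers/*.log with `tag container.*` (the `<source>` block is not shown in this excerpt), the dot-joined file path puts the log file name at index 4 of the tag, which is exactly what the `logname` expression extracts. A quick Ruby check with a made-up file name:

```ruby
# Hypothetical tag produced by tailing /var/log/containers/*.log with "tag container.*";
# the file name becomes the fifth dot-separated field of the tag.
tag = "container.var.log.containers.nginx-demo-1234567890-abcde_default_nginx-0123abcd.log"

logname = tag.split('.')[4]  # same expression as logname ${tag.split('.')[4]}
puts logname                 # => nginx-demo-1234567890-abcde_default_nginx-0123abcd
```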
# split the logs read from containers into application logs and component logs
<match container.**>
  @type copy
  <store>
    @type grep
    input_key logname
    exclude ^kube
    remove_tag_prefix container
    add_tag_prefix application
  </store>
  <store>
    @type grep
    regexp1 logname ^kube
    remove_tag_prefix container
    add_tag_prefix component
  </store>
</match>
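The routing that the two grep stores implement can be sketched in plain Ruby: a logname beginning with `kube` is treated as a platform component, everything else as an application (the sample lognames are invented):

```ruby
# Same predicate as the grep stores above:
#   exclude ^kube          -> re-tagged application.*
#   regexp1 logname ^kube  -> re-tagged component.*
def route(logname)
  logname =~ /^kube/ ? "component" : "application"
end

puts route("kube-apiserver-master1_kube-system_kube-apiserver-0a1b2c")  # => component
puts route("nginx-demo-1234567890-abcde_default_nginx-0123abcd")        # => application
```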
<filter application.**>
  @type record_transformer
  # host #{Socket.gethostname} below is Ruby syntax
  enable_ruby
  <record>
    app ${(record['logname'].split('_')[0].split('-')[-2] =~ /\d{10}/)==0 ? record['logname'].split('_')[0].split('-')[0..-3].join("-"):record['logname'].split('_')[0].split('-')[0..-2].join("-")}
    namespace ${record['logname'].split('_')[1]}
    podname ${record['logname'].split('_')[0]}
    message ${record['log']}
    logtime ${record['time']}
    cluster cluster-1
    host #{Socket.gethostname}
  </record>
  remove_keys logname,log,stream,time
</filter>
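The `app` expression is the trickiest line: it strips trailing pod-name suffixes to recover the application name. Rewritten as a standalone Ruby method (the pod names are invented for illustration):

```ruby
# Sketch of the `app` record expression, evaluated outside fluentd.
# A logname looks like <podname>_<namespace>_<container>-<containerID>.
# Pods created through an RC/Deployment have podnames like
# <app>-<10-digit-number>-<suffix>; other pods look like <app>-<suffix>.
def app_name(logname)
  parts = logname.split('_')[0].split('-')
  if (parts[-2] =~ /\d{10}/) == 0
    parts[0..-3].join('-')   # drop the 10-digit segment and the pod suffix
  else
    parts[0..-2].join('-')   # drop only the pod suffix
  end
end

puts app_name("nginx-demo-1234567890-abcde_default_nginx-0123abcd")  # => nginx-demo
puts app_name("busybox-xyz12_default_busybox-89ab01cd")              # => busybox
```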
<filter component.**>
  @type record_transformer
  enable_ruby
  <record>
    message ${record['log']}
    logtime ${record['time']}
    cluster cluster-1
    component_name ${record['logname'].split('-')[0]+'-'+record['logname'].split('-')[1]}
    host #{Socket.gethostname}
  </record>
  remove_keys logname,log,time,stream
</filter>
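The `component_name` expression keeps only the first two dash-separated pieces of the logname, which for the static-pod components yields names like kube-apiserver. In Ruby, with a made-up logname:

```ruby
# Component lognames begin with e.g. kube-apiserver-<node>_kube-system_... (example invented)
logname = "kube-apiserver-master1_kube-system_kube-apiserver-0a1b2c"

# Same expression as ${record['logname'].split('-')[0]+'-'+record['logname'].split('-')[1]}
component_name = logname.split('-')[0] + '-' + logname.split('-')[1]
puts component_name  # => kube-apiserver
```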
# extract severity information with our own loglevel plugin
<filter {application,component}.**>
  @type loglevel
</filter>
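The loglevel plugin itself is not listed in this post; its core would be something along these lines. Glog-style lines from kube components start with I/W/E/F followed by the date, while other logs may spell the level out. The mapping and regexes here are my own sketch, not the actual plugin:

```ruby
# Guess at the severity-extraction logic such a plugin would wrap (hypothetical).
GLOG_LEVELS = { 'I' => 'INFO', 'W' => 'WARNING', 'E' => 'ERROR', 'F' => 'FATAL' }

def severity_of(message)
  if message =~ /\A([IWEF])\d{4}\s/
    GLOG_LEVELS[$1]                    # glog header, e.g. "E0920 10:15:30.123456 ..."
  elsif message =~ /\b(DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)\b/i
    $1.upcase.sub(/\AWARN\z/, 'WARNING')  # spelled-out level anywhere in the line
  else
    'INFO'                             # default when nothing matches
  end
end

puts severity_of("E0920 10:15:30.123456 1 server.go:100] boom")  # => ERROR
puts severity_of("2021-09-20 12:00:00 WARN disk almost full")    # => WARNING
```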
<match application.**>
  @type elasticsearch_dynamic
  flush_interval 5s
  hosts elasticsearch.kube-system.svc.cluster.local:9200
  # https://github.com/uken/fluent-plugin-elasticsearch#logstash_format
  logstash_format true
  logstash_prefix k8s-application
  # index in ES per year and month: https://github.com/uken/fluent-plugin-elasticsearch#logstash_dateformat
  logstash_dateformat %Y.%m
  type_name ${record['cluster']}-${record['namespace']}
</match>
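With logstash_format enabled, the plugin writes to an index named `<logstash_prefix>-<date>`, so `%Y.%m` rolls one index per month. For example (the date is chosen arbitrarily):

```ruby
# Index name construction as done with logstash_prefix "k8s-application"
# and logstash_dateformat "%Y.%m".
prefix = "k8s-application"
day = Time.utc(2021, 9, 20)   # arbitrary example date

index = "#{prefix}-#{day.strftime('%Y.%m')}"
puts index  # => k8s-application-2021.09
```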
<match component.**>
  @type elasticsearch_dynamic
  flush_interval 5s
  hosts elasticsearch.kube-system.svc.cluster.local:9200
  logstash_format true
  logstash_prefix k8s-component
  logstash_dateformat %Y.%m
  type_name ${record['cluster']}
</match>
# the etcd plugin we wrote extracts severity information from the journal
<source>
  @type etcd
</source>
<source>
  @type kubelet
</source>
<filter etcd>
  @type record_transformer
  enable_ruby
  <record>
    host #{Socket.gethostname}
    cluster cluster-1
    component_name etcd
  </record>
  remove_keys MESSAGE,_SELINUX_CONTEXT,__CURSOR,__REALTIME_TIMESTAMP,__MONOTONIC_TIMESTAMP,_BOOT_ID,_TRANSPORT,PRIORITY,SYSLOG_FACILITY,_UID,_GID,_CAP_EFFECTIVE,_SYSTEMD_SLICE,_MACHINE_ID,_HOSTNAME,SYSLOG_IDENTIFIER,_PID,_COMM,_EXE,_CMDLINE,_SYSTEMD_CGROUP,_SYSTEMD_UNIT
</filter>
<filter kubelet>
  @type record_transformer
  enable_ruby
  <record>
    host #{Socket.gethostname}
    cluster cluster-1
    component_name kubelet
  </record>
  remove_keys MESSAGE,_SELINUX_CONTEXT,__CURSOR,__REALTIME_TIMESTAMP,__MONOTONIC_TIMESTAMP,_BOOT_ID,_TRANSPORT,PRIORITY,SYSLOG_FACILITY,_UID,_GID,_CAP_EFFECTIVE,_SYSTEMD_SLICE,_MACHINE_ID,_HOSTNAME,SYSLOG_IDENTIFIER,_PID,_COMM,_EXE,_CMDLINE,_SYSTEMD_CGROUP,_SYSTEMD_UNIT
</filter>
<match {etcd,kubelet}>
  @type elasticsearch_dynamic
  flush_interval 5s
  hosts elasticsearch.kube-system.svc.cluster.local:9200
  logstash_format true
  logstash_prefix journal
  logstash_dateformat %Y.%m
  type_name ${record['cluster']}
</match>
<source>
  @type nginx
</source>
<match nginx>
  @type elasticsearch_dynamic
  flush_interval 5s
  hosts elasticsearch.kube-system.svc.cluster.local:9200
  index_name k8s-audit
  type_name ${record['cluster']}-${record['namespace']}
  remove_keys @timestamp,method
</match>
To be continued.