Setting Up the New fluentd-elasticsearch 1.22 Log Collector on Kubernetes

Background

First, the importance of logs to any organization goes without saying, so I will not belabor it.

Second, on choosing a platform for collecting, analyzing, and visualizing logs: there are several good reasons to pick ELK. ELK is a very mature stack whose architecture fits a Kubernetes cluster well, the official Kubernetes documentation itself uses Elasticsearch in its sample, and the Kubernetes binary release downloaded from GitHub already ships the corresponding .yaml manifests. The case for using ELK for log collection is therefore quite strong.
Logs are critical to any infrastructure or backend service. Kubernetes, a project inspired by Google's internal container management system Borg, naturally ships with logging support. In the "Logging Overview", the official documentation outlines the logging options available at several levels in Kubernetes and gives a reference architecture for cluster-level logging:
(Figure: reference architecture for Kubernetes cluster-level logging)

Kubernetes also provides a reference implementation:
– Logging backend: the Elasticsearch stack (including Kibana)
– Logging agent: fluentd
One advantage of cluster-level logging built on the Elasticsearch stack is that it is non-intrusive to the Pods in the cluster: Pods need no cooperating changes at all. The EFK/ELK approach is also relatively mature and stable in the industry.
In this post I will install fluentd for our Kubernetes 1.2.0 cluster (assuming Elasticsearch and Kibana are already deployed).
Kubernetes 1.2.0 Deployment Script

The Kubernetes 1.2.0 cluster was brought up and initialized with kube-up.sh. According to the official K8s documentation on Elasticsearch logging, kubernetes/cluster/ubuntu/config-default.sh contains a set of relevant options:

# kubernetes/cluster/ubuntu/config-default.sh
# Optional: Enable node logging.
ENABLE_NODE_LOGGING=false
LOGGING_DESTINATION=${LOGGING_DESTINATION:-elasticsearch}

# Optional: When set to true, Elasticsearch and Kibana will be setup as part of the cluster bring up.
ENABLE_CLUSTER_LOGGING=false
ELASTICSEARCH_LOGGING_REPLICAS=${ELASTICSEARCH_LOGGING_REPLICAS:-1}

However, even after modifying these options, kube-up.sh did not actually deploy the Elasticsearch logging addon (EFK) for me, so it has to be installed by hand.
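A quick way to confirm that nothing was deployed (a simple check; the logging addons would normally land in the kube-system namespace):

# kubectl get pods --namespace=kube-system | grep -E 'elasticsearch|kibana|fluentd'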

Preparing the Image

The manifests under kubernetes/cluster/addons/fluentd-elasticsearch in the 1.2.0 source tree are fairly old, so we use the manifest files from the latest Kubernetes source instead:

k8s.io/kubernetes/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml

From this yaml we can see that we need the following image:
gcr.io/google_containers/fluentd-elasticsearch:1.22

This image is hosted outside the GFW, so I pulled it through a proxy, re-tagged it, and pushed it to my account on hub.docker.com.

# docker pull gcr.io/google_containers/fluentd-elasticsearch:1.22
# docker tag gcr.io/google_containers/fluentd-elasticsearch:1.22 lidnyun/fluentd-elasticsearch:1.22
# docker push lidnyun/fluentd-elasticsearch:1.22

The image we will actually use in the rest of the installation is:
lidnyun/fluentd-elasticsearch:1.22
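Nodes will pull this image automatically once the DaemonSet is created; if you prefer, you can pre-pull it on each node to verify it is reachable:

# docker pull lidnyun/fluentd-elasticsearch:1.22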

Starting fluentd

fluentd runs on the K8s cluster as a DaemonSet, so Kubernetes guarantees that one fluentd instance is started on every cluster node (note: change the image in the manifest to the re-tagged image above).
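If you do not want to edit the manifest by hand, a one-line substitution works too (a sketch; adjust the original image string if your copy of the manifest pins a different tag):

# sed -i 's|gcr.io/google_containers/fluentd-elasticsearch:1.22|lidnyun/fluentd-elasticsearch:1.22|' fluentd-es-ds.yaml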

# kubectl create -f fluentd-es-ds.yaml
  daemonset "fluentd-es-v1.22" created

Checking the Pods of the DaemonSet with kubectl get pods --namespace=kube-system shows:

NAME                                   READY     STATUS              RESTARTS   AGE
fluentd-elasticsearch-5c2mk            0/1       ContainerCreating   0          10s
fluentd-elasticsearch-84iwz            0/1       CrashLoopBackOff    1          10s
fluentd-elasticsearch-l1rab            0/1       CrashLoopBackOff    1          10s

The fluentd Pods fail to start. The fluentd logs can be inspected in two ways: via /var/log/fluentd.log on the host, or via kubectl:

  1. Via /var/log/fluentd.log on the host
    View the fluentd log with tail -fn100 /var/log/fluentd.log
2017-03-29 08:37:35 +0000 [warn]: process died within 1 second. exit.
2017-03-29 08:38:23 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf"
2017-03-29 08:38:23 +0000 [info]: starting fluentd-0.12.31
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.4.0'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-docker_metadata_filter' version '0.1.3'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.5.0'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-kafka' version '0.4.1'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-kubernetes_metadata_filter' version '0.24.0'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-mongo' version '0.7.16'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.5'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-s3' version '0.8.0'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-td' version '0.10.29'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.2'
2017-03-29 08:38:23 +0000 [info]: gem 'fluent-plugin-webhdfs' version '0.4.2'
2017-03-29 08:38:23 +0000 [info]: gem 'fluentd' version '0.12.31'
2017-03-29 08:38:23 +0000 [info]: adding match pattern="fluent.**" type="null"
2017-03-29 08:38:23 +0000 [info]: adding filter pattern="kubernetes.**" type="kubernetes_metadata"
2017-03-29 08:38:23 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error="Invalid Kubernetes API v1 endpoint https://10.232.0.1:443/api: 401 Unauthorized"
2017-03-29 08:38:23 +0000 [info]: process finished code=256
2017-03-29 08:38:23 +0000 [warn]: process died within 1 second. exit.
  2. Via kubectl
    View the log of a single fluentd container with kubectl logs fluentd-elasticsearch-m0gbq --namespace=kube-system
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/json-1.8.1/lib/json/version.rb:3: warning: already initialized constant JSON::VERSION
/opt/td-agent/embedded/lib/ruby/2.1.0/json/version.rb:3: warning: previous definition of VERSION was here
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/json-1.8.1/lib/json/version.rb:4: warning: already initialized constant JSON::VERSION_ARRAY
/opt/td-agent/embedded/lib/ruby/2.1.0/json/version.rb:4: warning: previous definition of VERSION_ARRAY was here
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/json-1.8.1/lib/json/version.rb:5: warning: already initialized constant JSON::VERSION_MAJOR
/opt/td-agent/embedded/lib/ruby/2.1.0/json/version.rb:5: warning: previous definition of VERSION_MAJOR was here
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/json-1.8.1/lib/json/version.rb:6: warning: already initialized constant JSON::VERSION_MINOR
/opt/td-agent/embedded/lib/ruby/2.1.0/json/version.rb:6: warning: previous definition of VERSION_MINOR was here
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/json-1.8.1/lib/json/version.rb:7: warning: already initialized constant JSON::VERSION_BUILD
/opt/td-agent/embedded/lib/ruby/2.1.0/json/version.rb:7: warning: previous definition of VERSION_BUILD was here
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/json-1.8.1/lib/json/common.rb:99: warning: already initialized constant JSON::NaN
/opt/td-agent/embedded/lib/ruby/2.1.0/json/common.rb:99: warning: previous definition of NaN was here
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/json-1.8.1/lib/json/common.rb:101: warning: already initialized constant JSON::Infinity
/opt/td-agent/embedded/lib/ruby/2.1.0/json/common.rb:101: warning: previous definition of Infinity was here
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/json-1.8.1/lib/json/common.rb:103: warning: already initialized constant JSON::MinusInfinity
/opt/td-agent/embedded/lib/ruby/2.1.0/json/common.rb:103: warning: previous definition of MinusInfinity was here
/opt/td-agent/embedded/lib/ruby/gems/2.1.0/gems/json-1.8.1/lib/json/common.rb:128: warning: already initialized constant JSON::UnparserError
/opt/td-agent/embedded/lib/ruby/2.1.0/json/common.rb:128: warning: previous definition of UnparserError was here

Preparing the Configuration File

The error in the log above shows that fluentd fails when talking to the apiserver's secure port (443): Unauthorized!
Reading cluster/addons/fluentd-elasticsearch/fluentd-es-image/build.sh and td-agent.conf reveals that the fluent-plugin-kubernetes_metadata_filter bundled in the fluentd image queries the API Server for Kubernetes metadata. With no special configuration, fluent-plugin-kubernetes_metadata_filter derives the API Server address from the environment variables Kubernetes injects into every Pod, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT. Since the API Server enforces authentication on its secure port, an unauthenticated request from fluentd is bound to fail.
The GitHub page of the fluent-plugin-kubernetes_metadata_filter project documents additional options it supports, including ca_file, client_cert, and client_key. So we need to modify td-agent.conf from the fluentd image and add a few options to fluent-plugin-kubernetes_metadata_filter, for example:

# td-agent.conf
... ...

<filter kubernetes.**>
  type kubernetes_metadata
  ca_file /srv/kubernetes/ca.crt
  client_cert /srv/kubernetes/kubecfg.crt
  client_key /srv/kubernetes/kubecfg.key
</filter>

<match **>
   type elasticsearch
   log_level info
   include_tag_key true
   hosts 10.45.4.211:9200,10.45.4.36:9200,10.45.4.140:9200
   logstash_format true
   index_name k8s_
   logstash_prefix logstash-
   buffer_chunk_limit 2M
   buffer_queue_limit 32
   flush_interval 5s
   max_retry_wait 30
   disable_retry_limit
</match>

... ...

Notes:
* The <filter> block is the part that needs to be added.
* logstash_prefix is used when creating the index in Kibana; here logstash- is used as an example, as can be seen later in the Kibana index-creation screenshot.
* hosts is the list of Elasticsearch endpoints that receive the logs.
* In this post Elasticsearch is not started by Kubernetes; it runs outside the cluster.
* If Elasticsearch is started as a Kubernetes service, the service name (e.g. elasticsearch-logging) can be used as the host instead of an IP.
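Once these certificate files are present on a node (see the section on preparing the keys below), you can sanity-check that they really authenticate against the API Server, using the endpoint from the error message above (a quick check, assuming curl is available on the node):

# curl --cacert /srv/kubernetes/ca.crt \
       --cert /srv/kubernetes/kubecfg.crt \
       --key /srv/kubernetes/kubecfg.key \
       https://10.232.0.1:443/api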


Don't want to rebuild the image? You can deliver the file with a hostPath volume, or with the powerful ConfigMap resource Kubernetes provides. This post turns the new td-agent.conf into a Kubernetes ConfigMap and mounts it into the fluentd Pod at the right location, replacing the default td-agent.conf baked into the image.
Two things to note:
* Before creating the ConfigMap from td-agent.conf, it is often recommended to delete the comment lines, or the generated ConfigMap content may come out wrong; test this yourself — in my case it worked fine without deleting them.
* The fluentd Pods will be created in kube-system, so the ConfigMap must also be created in the kube-system namespace; otherwise the DaemonSet will not be able to find the corresponding ConfigMap.

# kubectl create configmap td-agent-config --from-file=./td-agent.conf --namespace=kube-system
configmap "td-agent-config" created

The td-agent.conf file is as follows:

<match fluent.**>
  type null
</match>

#es-containers log
<source>
  type tail
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag kubernetes.*
  format json
  read_from_head true
</source>

#kubelet.log
<source>
  type tail
  format multiline
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/upstart/kubelet.log
  pos_file /var/log/upstart/kubelet.log.pos
  tag kubelet
</source>

#kube-proxy log
<source>
  type tail
  format multiline
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/upstart/kube-proxy.log
  pos_file /var/log/upstart/kube-proxy.log.pos
  tag kube-proxy
</source>

#kube-apiserver log
<source>
  type tail
  format multiline
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/upstart/kube-apiserver.log
  pos_file /var/log/upstart/kube-apiserver.log.pos
  tag kube-apiserver
</source>

#kube-controller-manager log
<source>
  type tail
  format multiline
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/upstart/kube-controller-manager.log
  pos_file /var/log/upstart/kube-controller-manager.log.pos
  tag kube-controller-manager
</source>

#docker log
<source>
  type tail
  format none
  # format /^time="(?<time>[^)]*)" level=(?<severity>[^ ]*) msg="(?<message>[^"]*)"( err="(?<error>[^"]*)")?( statusCode=($<status_code>\d+))?/
  # time_format %Y-%m-%dT%H:%M:%S.%NZ
  path /var/log/upstart/docker.log
  pos_file /var/log/upstart/docker.log.pos
  tag docker
</source>

#flanneld log
<source>
  type tail
  format /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)$/
  time_format %m%d %H:%M:%S.%N
  path /var/log/upstart/flanneld.log
  pos_file /var/log/upstart/flanneld.log.pos
  tag flanneld
</source>

#kube-scheduler log
<source>
  type tail
  format multiline
  format_firstline /^\w\d{4}/
  format1 /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(?<source>[^ \]]+)\] (?<message>.*)/
  time_format %m%d %H:%M:%S.%N
  path /var/log/upstart/kube-scheduler.log
  pos_file /var/log/upstart/kube-scheduler.log.pos
  tag kube-scheduler
</source>

#etcd log
<source>
  type tail
  format none
  path /var/log/upstart/etcd.log
  pos_file /var/log/upstart/etcd.log.pos
  tag etcd
</source>

<filter kubernetes.**>
  type kubernetes_metadata
  ca_file /srv/kubernetes/ca.crt
  client_cert /srv/kubernetes/kubecfg.crt
  client_key /srv/kubernetes/kubecfg.key
</filter>

<match **>
   type elasticsearch
   log_level info
   include_tag_key true
   hosts 10.45.4.211:9200,10.45.4.36:9200,10.45.4.140:9200
   logstash_format true
   index_name k8s_
   logstash_prefix dev
   # Set the chunk limit the same as for fluentd-gcp.
   buffer_chunk_limit 2M
   # Cap buffer memory usage to 2MiB/chunk * 32 chunks = 64 MiB
   buffer_queue_limit 32
   flush_interval 5s
   # Never wait longer than 5 minutes between retries.
   max_retry_wait 30
   # Disable the limit on the number of retries (retry forever).
   disable_retry_limit
</match>
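Before turning this file into a ConfigMap, you can ask td-agent to parse it without starting any plugins (an optional check, assuming td-agent is installed wherever you edit the file; this uses fluentd's --dry-run flag):

# td-agent --dry-run -c ./td-agent.conf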

Check the ConfigMap we just created:

 kubectl get configmaps --namespace=kube-system
NAME              DATA      AGE
td-agent-config   1         55s

View the content of the created ConfigMap:

kubectl get configmaps td-agent-config -o yaml --namespace=kube-system

apiVersion: v1
data:
  td-agent.conf: "\n  type null\n\n#es-containers log\n\n
    \ type tail\n  path /var/log/containers/*.log\n  pos_file /var/log/es-containers.log.pos\n
    \ time_format %Y-%m-%dT%H:%M:%S.%NZ\n  tag kubernetes.*\n  format json\n  read_from_head
    true\n\n#kubelet.log\n\n  type tail\n  format multiline\n  format_firstline
    /^\\w\\d{4}/\n  format1 /^(?\\w)(?

The content is a bit hard to read in this form (the output above is truncated).
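For a somewhat more readable view of what was stored, kubectl describe also works:

# kubectl describe configmap td-agent-config --namespace=kube-system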

Preparing the DaemonSet Manifest

fluentd-es-ds.yaml needs a few corresponding changes, mainly two extra mounts: one mounts the td-agent-config ConfigMap created above, and the other mounts the hostPath /srv/kubernetes so that the client certificates are available inside the Pod:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
  labels:
    k8s-app: fluentd-logging
spec:
  template:
    metadata:
      labels:
        daemon: fluentd-logging
    spec:
      containers:
      - name: fluentd-es
        image: lidnyun/fluentd-elasticsearch:1.22
        command:
          - '/bin/sh'
          - '-c'
          - '/usr/sbin/td-agent 2>&1 >> /var/log/fluentd.log'
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: td-agent-config
          mountPath: /etc/td-agent
        - name: tls-files
          mountPath: /srv/kubernetes
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: td-agent-config
        configMap:
          name: td-agent-config
      - name: tls-files
        hostPath:
          path: /srv/kubernetes

The certificate and key files referenced in td-agent.conf, and mounted through the tls-files hostPath in the DaemonSet above, still need to be prepared on each node.

Preparing the Certificates and Keys

These files exist only on the Kubernetes master node, so they need to be copied to every node.

# ls -al /srv/kubernetes/
total 36
drwxr-xr-x 2 root root      4096 Apr 15  2016 .
drwxr-xr-x 4 root root      4096 Mar 27 14:54 ..
-rw-rw---- 1 root kube-cert 1216 Nov 15 10:53 ca.crt
-rw------- 1 root root      4415 Nov 15 10:53 kubecfg.crt
-rw------- 1 root root      1704 Nov 15 10:53 kubecfg.key
-rw-rw---- 1 root kube-cert 4868 Nov 15 10:53 server.cert
-rw-rw---- 1 root kube-cert 1708 Nov 15 10:53 server.key

Copy the files required by the configuration from the master to each node, into the paths listed below (a copy sketch follows the list).
ca_file /srv/kubernetes/ca.crt
client_cert /srv/kubernetes/kubecfg.crt
client_key /srv/kubernetes/kubecfg.key
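A minimal copy sketch, run from the master, assuming root SSH access and hypothetical node names node1/node2/node3:

for n in node1 node2 node3; do
  ssh root@$n "mkdir -p /srv/kubernetes"
  scp /srv/kubernetes/ca.crt /srv/kubernetes/kubecfg.crt /srv/kubernetes/kubecfg.key root@$n:/srv/kubernetes/
done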
Check on a node:

# ls -al /srv/kubernetes/
total 24
drwxr-xr-x 2 root root 4096 Mar 27 17:28 .
drwxr-xr-x 3 root root 4096 Mar 24 15:46 ..
-rw-r----- 1 root root 1216 Mar 27 17:28 ca.crt
-rw------- 1 root root 4415 Mar 27 17:28 kubecfg.crt
-rw------- 1 root root 1704 Mar 27 17:28 kubecfg.key

Next, we recreate the fluentd DaemonSet; the steps are the same as before, so a minimal sketch suffices (see below).
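A minimal recreate sketch, deleting whatever the manifest previously created and re-creating it from the updated file:

# kubectl delete -f fluentd-es-ds.yaml
# kubectl create -f fluentd-es-ds.yaml

This time the Pods all come up successfully: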

kubectl get pods --namespace=kube-system
NAME                                   READY     STATUS             RESTARTS   AGE
fluentd-elasticsearch-1wque            1/1       Running            0          11m
fluentd-elasticsearch-3i6sq            1/1       Running            0          11m
fluentd-elasticsearch-izgpz            1/1       Running            0          11m
fluentd-elasticsearch-ng1z9            1/1       Running            0          11m

And /var/log/fluentd.log now looks healthy:

2017-03-24 07:48:54 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf"
... ...
... ...
... ...
2017-03-29 09:25:30 +0000 [info]: following tail of /var/log/containers/fluentd-elasticsearch-1wque_kube-system_fluentd-es-1b799e6a2073187b5e4a7f9827a768014912e9c57c6e20483e36e6fd9aae0c60.log
2017-03-29 09:25:30 +0000 [info]: following tail of /var/log/containers/fluentd-elasticsearch-yq8rn_kube-system_fluentd-elasticsearch-843b2bceae2a362797b841fd9682551c542de0d75ace8f5100b58e1cc194d348.log
2017-03-29 09:25:30 +0000 [info]: following tail of /var/log/upstart/kubelet.log
2017-03-29 09:25:30 +0000 [info]: following tail of /var/log/upstart/kube-apiserver.log
2017-03-29 09:25:30 +0000 [info]: following tail of /var/log/upstart/kube-controller-manager.log
2017-03-29 09:25:30 +0000 [info]: Connection opened to Elasticsearch cluster => {:host=>"10.45.4.211", :port=>9200, :scheme=>"http"}, {:host=>"10.45.4.36", :port=>9200, :scheme=>"http"}, {:host=>"10.45.4.140", :port=>9200, :scheme=>"http"}
2017-03-29 09:25:31 +0000 [info]: following tail of /var/log/upstart/docker.log
2017-03-29 09:25:31 +0000 [info]: following tail of /var/log/upstart/flanneld.log
2017-03-29 09:25:31 +0000 [info]: following tail of /var/log/upstart/etcd.log
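To confirm that log records are actually reaching Elasticsearch, you can list the indices on one of the ES hosts; new indices with the configured logstash_prefix should appear (a quick check using the standard _cat API):

# curl 'http://10.45.4.211:9200/_cat/indices?v'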

Next, open the Kibana address in a browser to reach its web UI. Note that the Kibana web page also takes a while to load.

Below is the page for creating an index (roughly analogous to a database in MySQL):
(Screenshot: creating an index pattern in Kibana)
Uncheck "Index contains time-based events", then click "Create" to create the index.

After the index is created, the logs aggregated in Elasticsearch can be browsed under Discover:
(Screenshot: aggregated logs shown in Kibana Discover)
Summary

That is the whole process of installing fluentd on a Kubernetes 1.2.0 cluster.
If you install the full EFK stack, Elasticsearch and Kibana run into the same insufficient-permission problem; the configuration changes in this post can serve as a reference for fixing them.
Installing all of this on a Kubernetes 1.5.15 environment set up with kubeadm, you will mostly not run into the issues above.
Also note that the Elasticsearch logging addon mounts an emptyDir volume by default, which is fine for experiments; for production it must be replaced with a Persistent Volume, for example CephRBD.
To deploy the full EFK stack on an older Kubernetes version, see:
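A minimal sketch of that swap in the Elasticsearch controller manifest, assuming a pre-created PersistentVolumeClaim named es-logging-data (the claim name is hypothetical; the volume name follows the addon's es-controller.yaml, adjust if yours differs):

      volumes:
      - name: es-persistent-storage
        # was: emptyDir: {}
        persistentVolumeClaim:
          claimName: es-logging-data   # hypothetical PVC backed by e.g. CephRBD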
http://tonybai.com/2017/03/03/implement-kubernetes-cluster-level-logging-with-fluentd-and-elasticsearch-stack/

Official documentation:
https://kubernetes.io/docs/concepts/cluster-administration/logging/
