Kubernetes EFK in Practice: Fluent Bit & Fluentd

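Preparation

Environment Planning

With the ElasticSearch cluster from the previous article in place, we can move on to log collection. The two collectors most commonly recommended in the industry are LogStash and Fluentd. In this article we use the components of the Fluent ecosystem that the official Kubernetes addons adopt: Fluent Bit and Fluentd.
The components involved are:

Component: Purpose
Fluent Bit: Runs on every host and collects the container logs of that host. (Fluent Bit is the newer project; it uses fewer resources and performs somewhat better than Fluentd, but its stability still needs to improve.)
Fluentd: Two roles. 1) It runs as the central log aggregation server, deployed as a Deployment. 2) On hosts where Fluent Bit cannot run properly, it runs as a DaemonSet, collects the host's logs, and sends them to the central log aggregation server.
ElasticSearch: Receives the logs sent by the central log aggregation server and presents them for analysis through Kibana. Because hardware resources are limited, only about one week of data is kept.
Amazon S3: Receives the logs sent by the central log aggregation server and stores them as compressed archives, which can later be analyzed further with Spark.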
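
The one-week retention noted for ElasticSearch can be planned around the daily index names. The sketch below is illustrative only: it assumes the indices follow the logstash-YYYY.MM.DD naming that logstash_format true produces by default, and the function name expired_indices and the 7-day default are assumptions rather than part of the original setup. It only computes which index names fall outside the window; deleting them is left to whatever tooling you already use.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days=7, prefix="logstash-"):
    """Return the daily indices that are older than the retention window.

    Assumption: index names look like 'logstash-2018.06.11'.
    Names that do not match the pattern are ignored.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a daily index, leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical index list, for illustration only.
names = ["logstash-2018.06.03", "logstash-2018.06.04", "logstash-2018.06.11", ".kibana"]
print(expired_indices(names, today=date(2018, 6, 11)))
```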

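Deployment Architecture

[Image 1: deployment architecture]

This figure only shows how Fluent Bit and Fluentd are integrated for log collection. The deployment architecture of the ES cluster and the overall architecture of the Kubernetes microservice cluster are not detailed here; if you are interested, see my other articles.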

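Central Log Aggregation Server

When there are many service nodes, we recommend using a central aggregation server to collect and forward the logs first. With only a few nodes, each node can send directly to ES or S3.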

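Preparing the Docker Image

Our central log aggregation server has to forward the logs it receives from the collection agents to two destinations, ElasticSearch and Amazon S3, so the Fluentd image needs some extra work.
Since the official Kubernetes repository already provides a ready-made EFK setup, we start from it and adjust it to our own needs.

Fetching the Official Kubernetes Fluentd Manifest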

[centos@master1 efk]$ wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml
--2018-06-08 17:46:35--  https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.72.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.72.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2774 (2.7K) [text/plain]
Saving to: ‘fluentd-es-ds.yaml’

fluentd-es-ds.yaml   100%[===================>]   2.71K  --.-KB/s    in 0s

2018-06-08 17:46:36 (8.56 MB/s) - ‘fluentd-es-ds.yaml’ saved [2774/2774]

[centos@master1 efk]$

Open fluentd-es-ds.yaml to find the Docker image used by the official setup.

......
      - name: fluentd-es
        image: k8s.gcr.io/fluentd-elasticsearch:v2.0.4
        env:
......

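Pull the image locally, tag it for our own repository, and push it to the private registry so that it is easy to fetch later. Reaching k8s.gcr.io may require a proxy on some networks; in that case, configure the proxy for the Docker daemon first.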

[centos@master1 efk]$ docker pull k8s.gcr.io/fluentd-elasticsearch:v2.0.4
Trying to pull repository k8s.gcr.io/fluentd-elasticsearch ... 
v2.0.4: Pulling from k8s.gcr.io/fluentd-elasticsearch
e7bb522d92ff: Pull complete 
92e6b816bc34: Pull complete 
ffb38dbddc64: Pull complete 
4900a3591877: Pull complete 
812a2bf6252f: Pull complete 
f8d5892f0b74: Pull complete 
e6736dda51ce: Pull complete 
Digest: sha256:b8c94527b489fb61d3d81ce5ad7f3ddbb7be71e9620a3a36e2bede2f2e487d73
Status: Downloaded newer image for k8s.gcr.io/fluentd-elasticsearch:v2.0.4
[centos@master1 efk]$ docker tag k8s.gcr.io/fluentd-elasticsearch:v2.0.4 hub.***.***/google_containers/fluentd-elasticsearch:v2.1.0
[centos@master1 efk]$ docker push hub.***.***/google_containers/fluentd-elasticsearch:v2.1.0

Adding Amazon S3 Support

To add S3 support on top of the Kubernetes image, create a Dockerfile with the following content:

FROM hub.***.***/google_containers/fluentd-elasticsearch:v2.1.0
LABEL maintainer="X.J CHEN"
RUN \
    apt-get update -y && apt-get install ruby-dev -y && \
    gem install fluent-plugin-s3 && \
    apt-get clean

Build the image and push it to our private registry.

[centos@master1 efk]$ docker build -t hub.***.***/google_containers/fluentd-s3:v2.1.0 .
[centos@master1 efk]$ docker push hub.***.***/google_containers/fluentd-s3:v2.1.0

The image is now ready.

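Preparing the Server YAML File

Based on the fluentd-es-ds.yaml obtained in the previous subsection, we create our own Fluentd server YAML file, fluentd-server-s3.yaml. The main changes are:

  • Change the ServiceAccount and the related RBAC rules.
  • Switch the image to the one we just built.
  • Remove /var/log and the other host paths that were mounted for reading local logs.
  • Set replicas to 5. Our environment produces a very large volume of logs; choose the number of Pods according to your own situation, with at least 2 recommended.
  • Add a Service for the Fluentd server so that the log collection agents inside the Kubernetes cluster can reach it and upload their logs.

The full content is as follows:
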
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-server
  namespace: kube-system
  labels:
    k8s-app: fluentd-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-server
  labels:
    k8s-app: fluentd-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-server
  labels:
    k8s-app: fluentd-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-server
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-server
  apiGroup: ""

---
apiVersion: v1
kind: Service
metadata:
  name: fluentd-server
  namespace: kube-system
  labels:
    k8s-app: fluentd-server
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "Fluentd"
spec:
  ports:
  - port: 24224
    protocol: TCP
    targetPort: server
  selector:
    k8s-app: fluentd-server
    
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fluentd-server-v2.0.4
  namespace: kube-system
  labels:
    k8s-app: fluentd-server
    version: v2.0.4
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-server
      version: v2.0.4
  replicas: 5
  template:
    metadata:
      labels:
        k8s-app: fluentd-server
        kubernetes.io/cluster-service: "true"
        version: v2.0.4
    spec:
      serviceAccountName: fluentd-server
      containers:
      - name: fluentd-server
        #image: k8s.gcr.io/fluentd-elasticsearch:v2.0.4
        image: hub.***.***/google_containers/fluentd-s3:v2.1.0
        imagePullPolicy: Always
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 1024Mi
          requests:
            cpu: 1000m
            memory: 200Mi
        volumeMounts:
        - name: config-volume
          mountPath: /etc/fluent/config.d
        ports:
        - containerPort: 24224
          name: server
          protocol: TCP
      terminationGracePeriodSeconds: 160
      volumes:
      - name: config-volume
        configMap:
          name: fluentd-server-config-v0.1.4
      imagePullSecrets:
      - name: kube-sec

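Fetch the Fluentd configuration file from the official Kubernetes repository.

[centos@master1 efk]$ wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml

On top of it we create our own configuration file, fluentd-server-s3-configmap.yaml. The main changes are:

  • Remove the log collection sections.
  • Add a forward input section that defines the address and port the server listens on.
  • Adjust the output section.
  • Note that both outputs, ES and S3, use file buffers. Because we removed the host /var/log mount from the Deployment, the buffers live inside the Docker container, and buffered data is lost when the container dies. Adjust this to your own situation (if the data is particularly sensitive, you can even use a StatefulSet with PersistentVolumes).

The full content is as follows:
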
kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-server-config-v0.1.4
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  system.conf: |-
    <system>
      root_dir /tmp/fluentd-buffers/
    </system>

  system.input.conf: |-
    # Listen for incoming data from the agents (forward protocol)
    <source>
      @type forward
      port 24224
      bind 0.0.0.0
    </source>

  output.conf: |-
    # Enriches records with Kubernetes metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>
    # Store Data in Elasticsearch and S3
    <match **>
      @type copy
      <store>
        @id elasticsearch
        @type elasticsearch
        @log_level info
        host elasticsearch
        port 9200
        #include_tag_key true
        #tag_key @log_name
        logstash_format true
        request_timeout    30s
        slow_flush_log_threshold 30s
        <buffer>
          @type file
          path /var/log/fluentd-buffers/server.buffer
          flush_mode interval
          #retry_type exponential_backoff
          flush_thread_count 12 # adjust as needed, keeping the processing capacity of ES in mind
          flush_interval 8s  # adjust as needed
          retry_max_interval 30
          chunk_limit_size 32M # adjust to what ES can actually handle; must stay below 100MB
          #queue_limit_length 64 #8
          total_limit_size 20G
          retry_wait 10s
        </buffer>
      </store>
      <store>
        @id s3
        @type s3
        @log_level info
        #include_tag_key true
        aws_key_id *********  # fill in your own key id and secret key
        aws_sec_key **********
        s3_bucket ******** # fill in your own bucket
        s3_region cn-north-1
        s3_object_key_format "%{path}dt=%{time_slice}_%{index}.%{file_extension}"
        #store_as json
        path hive/
        time_slice_format %Y%m%d/%Y%m%d%H
        <buffer time>
          @type file
          path /var/log/fluentd-buffers/s3.buffer
          timekey 3600 # 1 hour partition
          timekey_wait 10m
          timekey_use_utc true # use utc
          chunk_limit_size 256m
        </buffer>
      </store>
    </match>

Starting the Aggregation Server

Create the server configuration.

[centos@master1 efk]$ kubectl create -f fluentd-server-s3-configmap.yaml

Create the server Deployment.

[centos@master1 efk]$ kubectl create -f fluentd-server-s3.yaml

Check that the server has started:

[centos@master1 efk]$ kubectl get service -n kube-system -o wide | grep fluentd-server
fluentd-server            ClusterIP   10.104.52.1      <none>        24224/TCP        4d        k8s-app=fluentd-server
[centos@master1 efk]$ 
[centos@master1 efk]$ kubectl get pods -n kube-system -o wide | grep fluentd-server
fluentd-server-v2.0.4-855db7cfc5-4wn47   1/1       Running            0          2h        10.244.29.20     minion6
fluentd-server-v2.0.4-855db7cfc5-pfmvd   1/1       Running            0          2h        10.244.3.211     minion17
fluentd-server-v2.0.4-855db7cfc5-rjqxl   1/1       Running            0          2h        10.244.13.47     minion19
fluentd-server-v2.0.4-855db7cfc5-shjfm   1/1       Running            0          2h        10.244.23.141    minion12
fluentd-server-v2.0.4-855db7cfc5-w7m5f   1/1       Running            0          2h        10.244.30.233    minion5
[centos@master1 efk]$ 
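
Beyond checking the Pod status, one way to confirm that the Service really accepts connections is to push a single test event to port 24224. The following is a minimal sketch and not part of the original setup. It relies on Fluentd's forward input also accepting JSON-encoded events of the form [tag, time, record]; the tag smoke.test, the helper names, and the host name fluentd-server.kube-system are illustrative assumptions. It has to run from inside the cluster (or through a port-forward), and a delivered event will flow on to ES and S3 like any other record.

```python
import json
import socket
import time


def build_forward_event(tag, record, timestamp=None):
    """Encode one event as a JSON array [tag, time, record]."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return json.dumps([tag, ts, record]).encode("utf-8")


def send_event(host, port, payload, timeout=3.0):
    """Open a TCP connection, send the payload, and report success."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
        return True
    except OSError:
        return False


# Example call (assumed host name, run inside the cluster):
#   payload = build_forward_event("smoke.test", {"log": "hello from smoke test"})
#   print(send_event("fluentd-server.kube-system", 24224, payload))
```

A True result only shows that the TCP connection and write succeeded; whether the record was indexed still has to be confirmed in Kibana.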

You can also take a look at the logs:

[Image: Fluentd server log output]

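Log Agent

For the log agent we use Fluent Bit, for the reason already given: it performs slightly better than Fluentd and uses fewer resources. Its stability is a concern, however. On some nodes it cannot run properly (sometimes because of logs it cannot parse, sometimes for other reasons; since I have not touched C or C++ for a long time, I can often only wait for an official fix), and on other nodes it may crash after running for a while. For scenarios with strict requirements on log delivery, Fluentd is still the recommended choice.
A common Fluent Bit failure is shown here. It is a crash triggered directly by a JSON parsing problem in a log file. It was reportedly fixed in version 0.13.3, but the problem persists: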

[centos@master1 fluent-bit]$ 
[centos@master1 fluent-bit]$ kubectl logs -f fluent-bit-5kvpl -n kube-system
[2018/06/11 03:11:49] [ info] [engine] started (pid=1)
[2018/06/11 03:11:49] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/06/11 03:11:49] [ info] [filter_kube] local POD info OK
[2018/06/11 03:11:49] [ info] [filter_kube] testing connectivity with API server...
[2018/06/11 03:11:49] [ info] [filter_kube] API server connectivity OK
[2018/06/11 03:11:49] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[engine] caught signal (SIGSEGV)
Fluent-Bit v0.13.2
Copyright (C) Treasure Data

#0  0x7fcc07f6eff1      in  ???() at ???:0
#1  0x55b0b655dede      in  msgpack_sbuffer_write() at lib/msgpack-2.1.3/include/msgpack/sbuffer.h:84
#2  0x55b0b6771ca5      in  msgpack_pack_ext_body() at lib/msgpack-2.1.3/include/msgpack/pack_template.h:890
#3  0x55b0b6771ca5      in  msgpack_pack_object() at lib/msgpack-2.1.3/src/objectc.c:72
#4  0x55b0b655e8c0      in  pack_map_content() at plugins/filter_kubernetes/kubernetes.c:321
#5  0x55b0b655f129      in  cb_kube_filter() at plugins/filter_kubernetes/kubernetes.c:493
#6  0x55b0b64feaea      in  flb_filter_do() at src/flb_filter.c:86
#7  0x55b0b64fc53c      in  flb_input_dbuf_write_end() at include/fluent-bit/flb_input.h:642
#8  0x55b0b64fe09c      in  flb_input_dyntag_append_raw() at src/flb_input.c:894
#9  0x55b0b6522b1d      in  process_content() at plugins/in_tail/tail_file.c:290
#10 0x55b0b6523968      in  flb_tail_file_chunk() at plugins/in_tail/tail_file.c:651
#11 0x55b0b6521357      in  in_tail_collect_static() at plugins/in_tail/tail.c:129
#12 0x55b0b64fe5db      in  flb_input_collector_fd() at src/flb_input.c:995
#13 0x55b0b6505370      in  flb_engine_handle_event() at src/flb_engine.c:296
#14 0x55b0b6505370      in  flb_engine_start() at src/flb_engine.c:515
#15 0x55b0b64a5606      in  main() at src/fluent-bit.c:824
#16 0x7fcc07e662e0      in  ???() at ???:0
#17 0x55b0b64a3a89      in  ???() at ???:0
#18 0xffffffffffffffff  in  ???() at ???:0
[centos@master1 fluent-bit]$ 
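
When a node keeps crashing like this, one way to narrow down the cause is to look for log lines that are not well-formed Docker JSON entries. The sketch below is only a diagnostic aid under stated assumptions: it assumes the json-file log format with a log field (as in {"log": ..., "stream": ..., "time": ...}), the function name find_unparseable_lines is hypothetical, and finding such a line does not prove it is what crashed Fluent Bit.

```python
import json


def find_unparseable_lines(lines):
    """Return (line_number, reason) for lines that are not Docker JSON log entries."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore empty lines
        try:
            entry = json.loads(text)
        except ValueError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(entry, dict) or "log" not in entry:
            problems.append((number, "missing 'log' field"))
    return problems


# Example use on a node (path taken from the tail input configuration):
#   import glob
#   for path in glob.glob("/var/log/containers/*.log"):
#       with open(path, errors="replace") as handle:
#           for number, reason in find_unparseable_lines(handle):
#               print(f"{path}:{number}: {reason}")
```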

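We describe the use of both Fluent Bit and Fluentd in the following sections for reference.

Fluent Bit

Preparing the Fluent Bit YAML Files

We base our own Fluent Bit YAML files on https://github.com/fluent/fluent-bit-kubernetes-logging.

[centos@master1 kube-log]$ git clone https://github.com/fluent/fluent-bit-kubernetes-logging.git

Fluent Bit Configuration File

We keep using fluent-bit-configmap.yaml as the Fluent Bit configuration file, but add a forward output so that Fluent Bit can forward logs to the central log aggregation server.

  • Add the output-fluentd.conf section, which sends output to the Fluentd server.
  • Disable the output to ES (its @INCLUDE line is commented out).
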
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-fluentd.conf
    # @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  5

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Merge_Log           On
        K8S-Logging.Parser  On

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Retry_Limit     False

  output-fluentd.conf: |
    [OUTPUT]
        Name          forward
        Match         *
        Host          ${FLUENTD_SERVER_HOST}
        Port          ${FLUENTD_SERVER_PORT}
        
  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>

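Fluent Bit Pod Configuration

We merge the official Pod-related manifests (fluent-bit-service-account.yaml, fluent-bit-role.yaml, fluent-bit-role-binding.yaml) into a single file, fluent-bit-ds.yaml, to make maintenance easier.

  • Point the Docker image at our private registry. We pull the original image, make no changes to it, then tag and push it to the private registry.
  • Add the environment variables for the Fluentd server.
  • Note the nodeSelector section: every node that should collect logs needs this label. We recommend labeling one node at a time (one or two nodes every few minutes; this can be scripted). When the log volume is large this is strongly recommended, a lesson learned the hard way: labeling in bulk can trigger a data storm that overwhelms the aggregation server and also overwhelms ES, which then rejects Fluentd's connections.
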
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      containers:
      - name: fluent-bit
        image: hub.***.***/google_containers/fluent-bit:0.13.2
        imagePullPolicy: Always
        ports:
          - containerPort: 2020
        env:
        - name: FLUENTD_SERVER_HOST
          value: "fluentd-server"
        - name: FLUENTD_SERVER_PORT
          value: "24224"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      imagePullSecrets:
      - name: kube-sec

Starting the Fluent Bit DaemonSet

Create the configuration required by Fluent Bit.

[centos@master1 fluent-bit]$ ls -al
total 28
drwxrwxr-x 2 centos centos 4096 Jun  7 02:32 .
drwxrwxr-x 7 centos centos 4096 Jun 11 03:03 ..
-rw-rw-r-- 1 centos centos 3562 Jun  7 02:32 fluent-bit-configmap.yaml
-rw-rw-r-- 1 centos centos 2248 Jun  6 12:51 fluent-bit-ds.yaml
-rw-rw-r-- 1 centos centos  273 May 31 13:35 fluent-bit-role-binding.yaml
-rw-rw-r-- 1 centos centos  194 May 31 13:33 fluent-bit-role.yaml
-rw-rw-r-- 1 centos centos   90 May 31 13:35 fluent-bit-service-account.yaml
[centos@master1 fluent-bit]$ 
[centos@master1 fluent-bit]$ kubectl create -f fluent-bit-configmap.yaml

Start the Fluent Bit DaemonSet.

[centos@master1 fluent-bit]$ kubectl create -f fluent-bit-ds.yaml

Check the Pods. At this point none has been started yet:

[centos@master1 fluent-bit]$ kubectl get pods -n kube-system -o wide | grep fluent-bit
[centos@master1 fluent-bit]$

Don't worry: the Pods start once we label the nodes, which we do next. Remember to leave some time between nodes.

[centos@master1 fluent-bit]$ kubectl label node minion1 beta.kubernetes.io/fluentd-ds-ready=true
[centos@master1 fluent-bit]$ kubectl label node minion2 beta.kubernetes.io/fluentd-ds-ready=true
[centos@master1 fluent-bit]$ kubectl label node minion3 beta.kubernetes.io/fluentd-ds-ready=true
[centos@master1 fluent-bit]$ kubectl label node minion4 beta.kubernetes.io/fluentd-ds-ready=true
[centos@master1 fluent-bit]$ kubectl label node minion5 beta.kubernetes.io/fluentd-ds-ready=true
.....
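
The one-node-at-a-time labeling can be scripted. The following is a minimal sketch under stated assumptions, not the procedure used in the original rollout: the function names and the 180-second pause are illustrative, and the script simply shells out to the same kubectl label command with --overwrite added so that re-running it is harmless.

```python
import subprocess
import time

LABEL_KEY = "beta.kubernetes.io/fluentd-ds-ready"


def label_command(node, value="true"):
    """Build the kubectl command that sets the DaemonSet selector label on one node."""
    return ["kubectl", "label", "node", node, f"{LABEL_KEY}={value}", "--overwrite"]


def label_nodes_gradually(nodes, value="true", interval_seconds=180,
                          run=subprocess.check_call, sleep=time.sleep):
    """Label nodes one by one, pausing between nodes to avoid a burst of backlog."""
    for index, node in enumerate(nodes):
        if index > 0:
            sleep(interval_seconds)  # let the aggregation server and ES absorb the backlog
        run(label_command(node, value))


# Example call (node names as used in this article):
#   label_nodes_gradually(["minion1", "minion2", "minion3"])
```

The run and sleep parameters exist only so the loop can be exercised without a cluster; with the defaults it calls kubectl and really waits.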

Check again which Pods have been started:

[centos@master1 fluent-bit]$ kubectl get pods -n kube-system -o wide | grep flu
fluent-bit-2sd9k                         1/1       Running   0          4d        10.244.30.213    minion5
fluent-bit-5jd4w                         1/1       Running   0          4d        10.244.32.12     minion3
fluent-bit-952fn                         1/1       Running   0          4d        10.244.15.14     minion13
fluent-bit-cz2xq                         1/1       Running   0          4d        10.244.29.250    minion6
fluent-bit-fx22k                         1/1       Running   0          4d        10.244.25.235    minion10
fluent-bit-g4fmw                         1/1       Running   0          4d        10.244.23.99     minion12
fluent-bit-gnfxg                         1/1       Running   0          4d        10.244.28.207    minion7
fluent-bit-h9t9l                         1/1       Running   0          4d        10.244.11.91     minion22
fluent-bit-ld9fx                         1/1       Running   0          4d        10.244.3.191     minion17
fluent-bit-pgc2f                         1/1       Running   0          4d        10.244.14.48     minion20
fluent-bit-st2qq                         1/1       Running   0          3d        10.244.16.3      minion11
fluent-bit-tm5hl                         1/1       Running   0          4d        10.244.12.46     minion18
fluent-bit-tt44q                         1/1       Running   0          4d        10.244.21.24     minion14
fluent-bit-vgptk                         1/1       Running   0          4d        10.244.31.9      minion4
fluent-bit-vptft                         1/1       Running   0          4d        10.244.34.93     minion1
fluent-bit-wpwl4                         1/1       Running   0          4d        10.244.13.35     minion19
fluent-bit-xdvbz                         1/1       Running   0          4d        10.244.9.99      minion2
fluent-bit-zrmsj                         1/1       Running   0          4d        10.244.20.33     minion15

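Check Kibana to confirm that logs are flowing:

[Image 2: logs arriving in Kibana]

The logs have reached ES. Next we adjust the fields displayed in Kibana so that they better suit how we read logs.

  • Add Host to the displayed columns.
  • Add Pod Name to the displayed columns.
  • Add Log to the displayed columns.

[Image 3: Kibana with the Host, Pod Name and Log columns]

Next we check the logs host by host:

  • Add a query condition to search the logs of a single node.

[Image 4: Kibana query filtered by node]

The logs of every host are arriving in ES as expected, so next we check S3.
There the logs have been created according to the directory rule we defined.

[Image 5: log directories in the S3 bucket]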
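
The directory rule comes from path, time_slice_format and s3_object_key_format in the S3 output. The sketch below mimics that substitution to show what an object key looks like. It is an approximation written for illustration, not the plugin's own code: the function name render_s3_key is hypothetical, and the gz extension assumes the plugin's default gzip compression (store_as is commented out in the configuration).

```python
from datetime import datetime, timezone


def render_s3_key(event_time, index=0, path="hive/",
                  time_slice_format="%Y%m%d/%Y%m%d%H",
                  key_format="%{path}dt=%{time_slice}_%{index}.%{file_extension}",
                  file_extension="gz"):
    """Expand the object key placeholders for one hourly chunk (UTC, as configured)."""
    time_slice = event_time.astimezone(timezone.utc).strftime(time_slice_format)
    values = {
        "path": path,
        "time_slice": time_slice,
        "index": str(index),
        "file_extension": file_extension,
    }
    key = key_format
    for name, value in values.items():
        key = key.replace("%{" + name + "}", value)
    return key


# An event from 03:11 UTC on 2018-06-11 lands in the 03:00 hourly slice.
print(render_s3_key(datetime(2018, 6, 11, 3, 11, 49, tzinfo=timezone.utc)))
```

The dt=YYYYMMDD prefix gives one directory per day, with one or more objects per hour inside it, which is convenient for date-partitioned processing later on.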

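Opening one of the directories shows that the files exist. They can also be downloaded for closer inspection, which this article does not cover further.

[Image 6: files inside one S3 directory]

The Fluent Bit deployment is now complete. While starting Fluent Bit, some nodes crash and cannot be brought up properly. For those nodes we use Fluentd to collect the logs instead.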

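Fluentd

We follow the Fluentd server section above to prepare the Fluentd agents. They can use the same Docker image as the server.

Preparing the Fluentd YAML Files

Fluentd Configuration File

We create a new file, fluentd-standalone-configmap.yaml, for running Fluentd as a standalone agent.

  • Adjust output.conf.
  • Remove the ES section.
  • Add a forward section.
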
kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-sa-config-v0.1.4
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  system.conf: |-
    <system>
      root_dir /tmp/fluentd-buffers/
    </system>

  containers.input.conf: |-
    # This configuration file for Fluentd / td-agent is used
    # to watch changes to Docker log files. The kubelet creates symlinks that
    # capture the pod name, namespace, container name & Docker container ID
    # to the docker logs for pods in the /var/log/containers directory on the host.
    # If running this fluentd configuration in a Docker container, the /var/log
    # directory should be mounted in the container.
    #
    # These logs are then submitted to Elasticsearch which assumes the
    # installation of the fluent-plugin-elasticsearch & the
    # fluent-plugin-kubernetes_metadata_filter plugins.
    # See https://github.com/uken/fluent-plugin-elasticsearch &
    # https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter for
    # more information about the plugins.
    #
    # Example
    # =======
    # A line in the Docker log file might look like this JSON:
    #
    # {"log":"2014/09/25 21:15:03 Got request with path wombat\n",
    #  "stream":"stderr",
    #   "time":"2014-09-25T21:15:03.499185026Z"}
    #
    # The time_format specification below makes sure we properly
    # parse the time format produced by Docker. This will be
    # submitted to Elasticsearch and should appear like:
    # $ curl 'http://elasticsearch-logging:9200/_search?pretty'
    # ...
    # {
    #      "_index" : "logstash-2014.09.25",
    #      "_type" : "fluentd",
    #      "_id" : "VBrbor2QTuGpsQyTCdfzqA",
    #      "_score" : 1.0,
    #      "_source":{"log":"2014/09/25 22:45:50 Got request with path wombat\n",
    #                 "stream":"stderr","tag":"docker.container.all",
    #                 "@timestamp":"2014-09-25T22:45:50+00:00"}
    #    },
    # ...
    #
    # The Kubernetes fluentd plugin is used to write the Kubernetes metadata to the log
    # record & add labels to the log record if properly configured. This enables users
    # to filter & search logs on any metadata.
    # For example a Docker container's logs might be in the directory:
    #
    #  /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b
    #
    # and in the file:
    #
    #  997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
    #
    # where 997599971ee6... is the Docker ID of the running container.
    # The Kubernetes kubelet makes a symbolic link to this file on the host machine
    # in the /var/log/containers directory which includes the pod name and the Kubernetes
    # container name:
    #
    #    synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
    #    ->
    #    /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
    #
    # The /var/log directory on the host is mapped to the /var/log directory in the container
    # running this instance of Fluentd and we end up collecting the file:
    #
    #   /var/log/containers/synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
    #
    # This results in the tag:
    #
    #  var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
    #
    # The Kubernetes fluentd plugin is used to extract the namespace, pod name & container name
    # which are added to the log message as a kubernetes field object & the Docker container ID
    # is also added under the docker field object.
    # The final tag is:
    #
    #   kubernetes.var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
    #
    # And the final log record look like:
    #
    # {
    #   "log":"2014/09/25 21:15:03 Got request with path wombat\n",
    #   "stream":"stderr",
    #   "time":"2014-09-25T21:15:03.499185026Z",
    #   "kubernetes": {
    #     "namespace": "default",
    #     "pod_name": "synthetic-logger-0.25lps-pod",
    #     "container_name": "synth-lgr"
    #   },
    #   "docker": {
    #     "container_id": "997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b"
    #   }
    # }
    #
    # This makes it easier for users to search for logs by pod name or by
    # the name of the Kubernetes container regardless of how many times the
    # Kubernetes pod has been restarted (resulting in a several Docker container IDs).

    # Json Log Example:
    # {"log":"[info:2016-02-16T16:04:05.930-08:00] Some log text here\n","stream":"stdout","time":"2016-02-17T00:04:05.931087621Z"}
    # CRI Log Example:
    # 2016-02-17T00:04:05.931087621Z stdout F [info:2016-02-16T16:04:05.930-08:00] Some log text here
    <source>
      @id fluentd-containers.log
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/es-containers.log.pos
      time_format %Y-%m-%dT%H:%M:%S.%NZ
      tag raw.kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_format %Y-%m-%dT%H:%M:%S.%NZ
        </pattern>
        <pattern>
          format /^(?<time>
        </pattern>
      </parse>
    </source>

    # Detect exceptions in the log output and forward them as one log entry.
    <match raw.kubernetes.**>
      @id raw.kubernetes
      @type detect_exceptions
      remove_tag_prefix raw
      message log
      stream stream
      multiline_flush_interval 5
      max_bytes 500000
      max_lines 1000
    </match>

  system.input.conf: |-
    # Example:
    # 2015-12-21 23:17:22,066 [salt.state       ][INFO    ] Completed state [net.ipv4.ip_forward] at time 23:17:22.066081
    
      @id minion
      @type tail
      format /^(?

Fluentd Pod YAML

We also create fluentd-standalone.yaml to control how the Fluentd Pods are started.
Note that this YAML uses a nodeSelector as well. Unlike Fluent Bit, the value we use here is
beta.kubernetes.io/fluentd-ds-ready = "fluentd".

apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es-v2.0.4
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    version: v2.0.4
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
      version: v2.0.4
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v2.0.4
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-node-critical
      serviceAccountName: fluentd-es
      containers:
      - name: fluentd-es
        #image: k8s.gcr.io/fluentd-elasticsearch:v2.0.4
        image: hub.***.***/google_containers/fluentd-s3:v2.1.0
        imagePullPolicy: Always
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "fluentd"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-sa-config-v0.1.4
      imagePullSecrets:
      - name: kube-sec
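
Because both DaemonSets select nodes through the same label key with different values, a single label decides whether a node runs Fluent Bit, Fluentd, or neither. The sketch below illustrates that matching rule in a simplified form (exact equality on every selector entry); the helper names are hypothetical, and the DaemonSet names are the ones used in the manifests of this article.

```python
LABEL_KEY = "beta.kubernetes.io/fluentd-ds-ready"

# nodeSelector values taken from fluent-bit-ds.yaml and fluentd-standalone.yaml
SELECTORS = {
    "fluent-bit": {LABEL_KEY: "true"},
    "fluentd-es-v2.0.4": {LABEL_KEY: "fluentd"},
}


def matches_node_selector(node_labels, node_selector):
    """A nodeSelector matches when every key/value pair is present on the node."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def daemonset_for_node(node_labels):
    """Return the name of the DaemonSet that would schedule a Pod on this node, if any."""
    for name, selector in SELECTORS.items():
        if matches_node_selector(node_labels, selector):
            return name
    return None


print(daemonset_for_node({LABEL_KEY: "true"}))     # fluent-bit
print(daemonset_for_node({LABEL_KEY: "fluentd"}))  # fluentd-es-v2.0.4
print(daemonset_for_node({}))                      # None
```

Since a label key holds only one value per node, the two collectors can never end up on the same node by accident.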

Starting the Fluentd DaemonSet

Create the configuration required by Fluentd.

[centos@master1 efk]$ kubectl create -f fluentd-standalone-configmap.yaml 

Start the Fluentd DaemonSet.

[centos@master1 efk]$ kubectl create -f fluentd-standalone.yaml 

As with Fluent Bit, when we check the Pods at this point, no Pod has been started apart from the 5 Fluentd server Pods.

[centos@master1 efk]$ kubectl get pods -n kube-system -o wide | grep fluentd
fluentd-server-v2.0.4-855db7cfc5-4wn47   1/1       Running   0          6h        10.244.29.20     minion6
fluentd-server-v2.0.4-855db7cfc5-pfmvd   1/1       Running   0          6h        10.244.3.211     minion17
fluentd-server-v2.0.4-855db7cfc5-rjqxl   1/1       Running   0          6h        10.244.13.47     minion19
fluentd-server-v2.0.4-855db7cfc5-shjfm   1/1       Running   0          6h        10.244.23.141    minion12
fluentd-server-v2.0.4-855db7cfc5-w7m5f   1/1       Running   0          6h        10.244.30.233    minion5
[centos@master1 efk]$ 

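Next we label the nodes so that the Pods are started; again, leave some time between nodes. The label value must be fluentd to match the nodeSelector of this DaemonSet, and --overwrite replaces the value on nodes that were earlier labeled true for Fluent Bit.

[centos@master1 efk]$ kubectl label node minion8 beta.kubernetes.io/fluentd-ds-ready=fluentd --overwrite
[centos@master1 efk]$ kubectl label node minion21 beta.kubernetes.io/fluentd-ds-ready=fluentd --overwrite
[centos@master1 efk]$ kubectl label node minion9 beta.kubernetes.io/fluentd-ds-ready=fluentd --overwrite
[centos@master1 efk]$ kubectl label node minion16 beta.kubernetes.io/fluentd-ds-ready=fluentd --overwrite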

Check again which Pods have been started:

[centos@master1 efk]$ kubectl get pods -n kube-system -o wide | grep fluentd
fluentd-es-v2.0.4-75gdf                  1/1       Running   1          4d        10.244.27.245    minion8
fluentd-es-v2.0.4-kx5pz                  1/1       Running   0          2h        10.244.10.96     minion21
fluentd-es-v2.0.4-n89xj                  1/1       Running   6          4d        10.244.26.92     minion9
fluentd-es-v2.0.4-zsrln                  1/1       Running   0          6h        10.244.19.67     minion16
fluentd-server-v2.0.4-855db7cfc5-4wn47   1/1       Running   0          6h        10.244.29.20     minion6
fluentd-server-v2.0.4-855db7cfc5-pfmvd   1/1       Running   0          6h        10.244.3.211     minion17
fluentd-server-v2.0.4-855db7cfc5-rjqxl   1/1       Running   0          6h        10.244.13.47     minion19
fluentd-server-v2.0.4-855db7cfc5-shjfm   1/1       Running   0          6h        10.244.23.141    minion12
fluentd-server-v2.0.4-855db7cfc5-w7m5f   1/1       Running   0          6h        10.244.30.233    minion5
[centos@master1 efk]$ 

As with Fluent Bit, go back to Kibana and check that the logs of these nodes are arriving in ES and S3; the steps are the same and are not repeated here.
