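Preparation
Environment Planning
With the ElasticSearch cluster from the previous article in place, we can now set up log collection. The two collectors most widely recommended in the industry are LogStash and Fluentd. In this article we use the components of the Fluent family, which the official Kubernetes addons also use: Fluent Bit and Fluentd.
The components involved are: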
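Component | Purpose |
---|---|
Fluent Bit | Runs on every host and collects the container logs of that host. (Fluent Bit is the newer project; it uses fewer resources and performs somewhat better than Fluentd, but its stability still needs to improve.) |
Fluentd | Two roles: 1. the log aggregation server, deployed as a Deployment; 2. a DaemonSet on the hosts where Fluent Bit cannot run properly, collecting the host's logs and sending them to the aggregation server. |
ElasticSearch | Receives the logs sent by the aggregation server; Kibana is used to analyze and display them. Because hardware is limited, only about one week of data is kept. |
Amazon S3 | Receives the logs sent by the aggregation server and stores them as compressed archives; they can later be analyzed further with Spark. |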
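Deployment Architecture
The diagram only shows how Fluent Bit and Fluentd are integrated for log collection. The deployment architecture of the ES cluster and the overall architecture of the Kubernetes microservice cluster are not detailed in it; if you are interested, see my other articles.
Log Aggregation Server
When there are many service nodes, a central aggregation server is recommended to collect the logs first and relay them. With only a few nodes, each node can send directly to ES or S3.
Preparing the Docker Image
Our aggregation server must forward the logs received from the collection agents to two destinations, ElasticSearch and Amazon S3, so the Fluentd image needs some additional work.
Since the Kubernetes project provides a ready-made EFK addon, we start from it and adjust it to our own needs.
Fetching the Fluentd manifest from the Kubernetes repository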
[centos@master1 efk]$ wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml
--2018-06-08 17:46:35-- https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.72.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.72.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2774 (2.7K) [text/plain]
Saving to: ‘fluentd-es-ds.yaml’
fluentd-es-ds.yaml 100%[===================>] 2.71K --.-KB/s in 0s
2018-06-08 17:46:36 (8.56 MB/s) - ‘fluentd-es-ds.yaml’ saved [2774/2774]
[centos@master1 efk]$
Open fluentd-es-ds.yaml to find the Docker image used upstream.
......
- name: fluentd-es
image: k8s.gcr.io/fluentd-elasticsearch:v2.0.4
env:
......
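Pull the image locally, tag it for the private registry, and push it there so that it is easy to fetch later. From our network, k8s.gcr.io is only reachable through a proxy, so add the proxy to the Docker daemon configuration first. Note that the upstream v2.0.4 image is published under the tag v2.1.0 in the private registry.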
[centos@master1 efk]$ docker pull k8s.gcr.io/fluentd-elasticsearch:v2.0.4
Trying to pull repository k8s.gcr.io/fluentd-elasticsearch ...
v2.0.4: Pulling from k8s.gcr.io/fluentd-elasticsearch
e7bb522d92ff: Pull complete
92e6b816bc34: Pull complete
ffb38dbddc64: Pull complete
4900a3591877: Pull complete
812a2bf6252f: Pull complete
f8d5892f0b74: Pull complete
e6736dda51ce: Pull complete
Digest: sha256:b8c94527b489fb61d3d81ce5ad7f3ddbb7be71e9620a3a36e2bede2f2e487d73
Status: Downloaded newer image for k8s.gcr.io/fluentd-elasticsearch:v2.0.4
[centos@master1 efk]$ docker tag k8s.gcr.io/fluentd-elasticsearch:v2.0.4 hub.***.***/google_containers/fluentd-elasticsearch:v2.1.0
[centos@master1 efk]$ docker push hub.***.***/google_containers/fluentd-elasticsearch:v2.1.0
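The pull, tag, and push steps above are repeated later for the Fluent Bit image, so they can be wrapped in a small helper. This is only a sketch: the function name `mirror_image` and the `DRY_RUN` switch are my own, and the registry host in the usage comment is a placeholder.

```shell
# Mirror one image into a private registry: pull, re-tag, push.
# With DRY_RUN=1 the docker commands are printed instead of executed.
mirror_image() {
    src="$1"   # upstream image reference
    dst="$2"   # reference in the private registry
    for cmd in "docker pull $src" "docker tag $src $dst" "docker push $dst"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1   # stop at the first failing step
        fi
    done
}

# Usage (registry.example.com is a hypothetical host):
#   mirror_image k8s.gcr.io/fluentd-elasticsearch:v2.0.4 \
#                registry.example.com/google_containers/fluentd-elasticsearch:v2.1.0
```

Running it once with `DRY_RUN=1` shows exactly which commands would be executed before anything is pushed.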
Adding Amazon S3 Support
Add S3 support on top of the Kubernetes image. Create a Dockerfile with the following content:
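FROM hub.***.***/google_containers/fluentd-elasticsearch:v2.1.0
# MAINTAINER is deprecated; a label carries the same information
LABEL maintainer="X.J CHEN"
# ruby-dev provides the headers needed to build native gem extensions
RUN apt-get update -y && \
    apt-get install -y ruby-dev && \
    gem install fluent-plugin-s3 && \
    apt-get clean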
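Build the image and push it to our private registry.
[centos@master1 efk]$ docker build -t hub.***.***/google_containers/fluentd-s3:v2.1.0 .
[centos@master1 efk]$ docker push hub.***.***/google_containers/fluentd-s3:v2.1.0
The image is now ready.
Preparing the Server YAML
Based on the fluentd-es-ds.yaml fetched in the previous section, create our Fluentd server manifest, fluentd-server-s3.yaml. The main changes are listed first, followed by the full file:
- Change the ServiceAccount and the related RBAC rules.
- Change the image to the one we just built.
- Remove /var/log and the other host paths that were mounted for reading local logs.
- Set replicas to 5. Our environment produces a very large volume of logs; choose the number of Pods to fit your own load, with at least 2 recommended.
- Add a Service for the Fluentd server so that the log agents in the Kubernetes cluster can reach it and upload their logs.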
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-server
  namespace: kube-system
  labels:
    k8s-app: fluentd-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-server
  labels:
    k8s-app: fluentd-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-server
  labels:
    k8s-app: fluentd-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-server
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-server
  apiGroup: ""
---
apiVersion: v1
kind: Service
metadata:
  name: fluentd-server
  namespace: kube-system
  labels:
    k8s-app: fluentd-server
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "Fluentd"
spec:
  ports:
  - port: 24224
    protocol: TCP
    targetPort: server
  selector:
    k8s-app: fluentd-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fluentd-server-v2.0.4
  namespace: kube-system
  labels:
    k8s-app: fluentd-server
    version: v2.0.4
    kubernetes.io/cluster-service: "true"
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-server
      version: v2.0.4
  replicas: 5
  template:
    metadata:
      labels:
        k8s-app: fluentd-server
        kubernetes.io/cluster-service: "true"
        version: v2.0.4
    spec:
      serviceAccountName: fluentd-server
      containers:
      - name: fluentd-server
        #image: k8s.gcr.io/fluentd-elasticsearch:v2.0.4
        image: hub.***.***/google_containers/fluentd-s3:v2.1.0
        imagePullPolicy: Always
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 1024Mi
          requests:
            cpu: 1000m
            memory: 200Mi
        volumeMounts:
        - name: config-volume
          mountPath: /etc/fluent/config.d
        ports:
        - containerPort: 24224
          name: server
          protocol: TCP
      terminationGracePeriodSeconds: 160
      volumes:
      - name: config-volume
        configMap:
          name: fluentd-server-config-v0.1.4
      imagePullSecrets:
      - name: kube-sec
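Fetch the Fluentd configuration from the Kubernetes repository.
[centos@master1 efk]$ wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml
Starting from it, create our own configuration file, fluentd-server-s3-configmap.yaml. The main changes are listed first, followed by the full file:
- Remove the log collection sections.
- Add a forward input section with the server's host and listening settings.
- Adjust the output section.
- Note that both outputs, ES and S3, use a buffer, and both are file buffers. Because our Deployment no longer mounts the host's /var/log, the buffers live inside the container's filesystem, and any buffered data is lost when the container dies. Adjust this to your own situation (if the data is particularly sensitive, you can even use a StatefulSet with PersistentVolumes).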
kind: ConfigMap
apiVersion: v1
metadata:
  name: fluentd-server-config-v0.1.4
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  system.conf: |-
    root_dir /tmp/fluentd-buffers/
  system.input.conf: |-
    # Listen to incoming data over SSL
  output.conf: |-
    # Enriches records with Kubernetes metadata
    @type kubernetes_metadata
    # Store Data in Elasticsearch and S3
    @type copy
    @id elasticsearch
    @type elasticsearch
    @log_level info
    host elasticsearch
    port 9200
    #include_tag_key true
    #tag_key @log_name
    logstash_format true
    request_timeout 30s
    slow_flush_log_threshold 30s
    @type file
    path /var/log/fluentd-buffers/server.buffer
    flush_mode interval
    #retry_type exponential_backoff
    flush_thread_count 12 # tune as needed, keeping the processing capacity of ES in mind
    flush_interval 8s # tune as needed
    retry_max_interval 30
    chunk_limit_size 32M # tune to what ES can actually handle; must stay below 100MB
    #queue_limit_length 64 #8
    total_limit_size 20G
    retry_wait 10s
    @id s3
    @type s3
    @log_level info
    #include_tag_key true
    aws_key_id ********* # fill in your own key id and secret key
    aws_sec_key **********
    s3_bucket ******** # fill in your own bucket
    s3_region cn-north-1
    s3_object_key_format "%{path}dt=%{time_slice}_%{index}.%{file_extension}"
    #store_as json
    path hive/
    time_slice_format %Y%m%d/%Y%m%d%H
    @type file
    path /var/log/fluentd-buffers/s3.buffer
    timekey 3600 # 1 hour partition
    timekey_wait 10m
    timekey_use_utc true # use utc
    chunk_limit_size 256m
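To make the S3 layout produced by these settings concrete, the sketch below expands `s3_object_key_format` for a given point in time. It is an illustration only and rests on a few assumptions: GNU `date` is available (for `-d @EPOCH`), the plugin's default `store_as` (gzip, extension `gz`) is in effect because `store_as` is commented out above, and `%{index}` starts at 0. The function name `s3_object_key` is mine, not part of any tool.

```shell
# Expand "%{path}dt=%{time_slice}_%{index}.%{file_extension}" by hand for the
# settings used above (path hive/, time_slice_format %Y%m%d/%Y%m%d%H, UTC).
s3_object_key() {
    epoch="$1"        # a Unix time inside the hourly slice
    index="${2:-0}"   # %{index}: incremented when the key already exists
    path="hive/"      # path
    # time_slice_format, evaluated in UTC because timekey_use_utc is true
    time_slice="$(date -u -d "@$epoch" +'%Y%m%d/%Y%m%d%H')"
    printf '%s\n' "${path}dt=${time_slice}_${index}.gz"
}

# Example: a record from 2018-06-11 03:11:49 UTC
#   s3_object_key 1528686709
```

Every hour therefore gets its own object under a per-day `dt=` prefix.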
Starting the Aggregation Server
Create the server configuration.
[centos@master1 efk]$ kubectl create -f fluentd-server-s3-configmap.yaml
Create the server Deployment.
[centos@master1 efk]$ kubectl create -f fluentd-server-s3.yaml
Check that the server has started:
[centos@master1 efk]$ kubectl get service -n kube-system -o wide | grep fluentd-server
fluentd-server ClusterIP 10.104.52.1 <none> 24224/TCP 4d k8s-app=fluentd-server
[centos@master1 efk]$
[centos@master1 efk]$ kubectl get pods -n kube-system -o wide | grep fluentd-server
fluentd-server-v2.0.4-855db7cfc5-4wn47 1/1 Running 0 2h 10.244.29.20 minion6
fluentd-server-v2.0.4-855db7cfc5-pfmvd 1/1 Running 0 2h 10.244.3.211 minion17
fluentd-server-v2.0.4-855db7cfc5-rjqxl 1/1 Running 0 2h 10.244.13.47 minion19
fluentd-server-v2.0.4-855db7cfc5-shjfm 1/1 Running 0 2h 10.244.23.141 minion12
fluentd-server-v2.0.4-855db7cfc5-w7m5f 1/1 Running 0 2h 10.244.30.233 minion5
[centos@master1 efk]$
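With five replicas (or more), scanning the listing by eye gets tedious. As a rough sketch, the check can be reduced to a filter that prints only the Pods that are not fully ready. The function name `pods_not_ready` is my own, and the column layout (NAME READY STATUS ...) is assumed to match the listing above.

```shell
# Print the names of Pods that are not fully ready.
# Reads `kubectl get pods` rows (NAME READY STATUS ...) on stdin.
pods_not_ready() {
    awk 'NF >= 3 && $1 != "NAME" {
        split($2, r, "/")   # READY column, e.g. "1/1"
        if ($3 != "Running" || r[1] != r[2]) print $1
    }'
}

# Usage:
#   kubectl get pods -n kube-system -l k8s-app=fluentd-server | pods_not_ready
```

Empty output means every server Pod is Running with all of its containers ready.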
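You can also check the server logs with kubectl logs on any of the Pods listed above.
Log Agents
For the log agent we use Fluent Bit, for the reason already given: it performs somewhat better than Fluentd and uses fewer resources. Its stability is a concern, though. On some of our nodes it could not run properly (in some cases because a log could not be parsed, in others for different reasons; since I have not worked with C or C++ for a long time, sometimes all I could do was wait for an upstream patch), and on other nodes it crashed after running for a while. Where the requirements on logs are high, Fluentd is still the recommended choice.
A common Fluent Bit failure is shown below: a JSON parsing error on a log file crashes Fluent Bit outright. The problem was reportedly fixed in version 0.13.3, but we still ran into it: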
[centos@master1 fluent-bit]$
[centos@master1 fluent-bit]$ kubectl logs -f fluent-bit-5kvpl -n kube-system
[2018/06/11 03:11:49] [ info] [engine] started (pid=1)
[2018/06/11 03:11:49] [ info] [filter_kube] https=1 host=kubernetes.default.svc.cluster.local port=443
[2018/06/11 03:11:49] [ info] [filter_kube] local POD info OK
[2018/06/11 03:11:49] [ info] [filter_kube] testing connectivity with API server...
[2018/06/11 03:11:49] [ info] [filter_kube] API server connectivity OK
[2018/06/11 03:11:49] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[engine] caught signal (SIGSEGV)
Fluent-Bit v0.13.2
Copyright (C) Treasure Data
#0 0x7fcc07f6eff1 in ???() at ???:0
#1 0x55b0b655dede in msgpack_sbuffer_write() at lib/msgpack-2.1.3/include/msgpack/sbuffer.h:84
#2 0x55b0b6771ca5 in msgpack_pack_ext_body() at lib/msgpack-2.1.3/include/msgpack/pack_template.h:890
#3 0x55b0b6771ca5 in msgpack_pack_object() at lib/msgpack-2.1.3/src/objectc.c:72
#4 0x55b0b655e8c0 in pack_map_content() at plugins/filter_kubernetes/kubernetes.c:321
#5 0x55b0b655f129 in cb_kube_filter() at plugins/filter_kubernetes/kubernetes.c:493
#6 0x55b0b64feaea in flb_filter_do() at src/flb_filter.c:86
#7 0x55b0b64fc53c in flb_input_dbuf_write_end() at include/fluent-bit/flb_input.h:642
#8 0x55b0b64fe09c in flb_input_dyntag_append_raw() at src/flb_input.c:894
#9 0x55b0b6522b1d in process_content() at plugins/in_tail/tail_file.c:290
#10 0x55b0b6523968 in flb_tail_file_chunk() at plugins/in_tail/tail_file.c:651
#11 0x55b0b6521357 in in_tail_collect_static() at plugins/in_tail/tail.c:129
#12 0x55b0b64fe5db in flb_input_collector_fd() at src/flb_input.c:995
#13 0x55b0b6505370 in flb_engine_handle_event() at src/flb_engine.c:296
#14 0x55b0b6505370 in flb_engine_start() at src/flb_engine.c:515
#15 0x55b0b64a5606 in main() at src/fluent-bit.c:824
#16 0x7fcc07e662e0 in ???() at ???:0
#17 0x55b0b64a3a89 in ???() at ???:0
#18 0xffffffffffffffff in ???() at ???:0
[centos@master1 fluent-bit]$
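When many agents are running, it helps to tell this particular crash apart from other failures. A minimal sketch, assuming the crash always leaves the `caught signal (SIGSEGV)` line seen in the trace above; the function name `crashed_with_sigsegv` is hypothetical.

```shell
# Succeeds (exit status 0) if the log text on stdin contains the
# Fluent Bit crash marker from the trace above.
crashed_with_sigsegv() {
    grep -qF 'caught signal (SIGSEGV)'
}

# Usage: inspect the previous (crashed) container of a Pod
#   kubectl logs --previous fluent-bit-5kvpl -n kube-system | crashed_with_sigsegv \
#       && echo "segfault"
```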
Below we describe the use of both Fluent Bit and Fluentd, for reference.
Fluent Bit
Preparing the Fluent Bit YAML Files
We base our own Fluent Bit YAML files on https://github.com/fluent/fluent-bit-kubernetes-logging.
[centos@master1 kube-log]$ git clone https://github.com/fluent/fluent-bit-kubernetes-logging.git
Fluent Bit Configuration
We keep using fluent-bit-configmap.yaml as the Fluent Bit configuration file, but add a forward output so that Fluent Bit can ship its logs to the aggregation server.
- Add an output-fluentd.conf section that sends to the Fluentd server.
- Disable the output to ES by commenting out its @INCLUDE line.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: kube-system
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-fluentd.conf
    # @INCLUDE output-elasticsearch.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  5

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Merge_Log           On
        K8S-Logging.Parser  On

  output-elasticsearch.conf: |
    [OUTPUT]
        Name            es
        Match           *
        Host            ${FLUENT_ELASTICSEARCH_HOST}
        Port            ${FLUENT_ELASTICSEARCH_PORT}
        Logstash_Format On
        Retry_Limit     False

  output-fluentd.conf: |
    [OUTPUT]
        Name   forward
        Match  *
        Host   ${FLUENTD_SERVER_HOST}
        Port   ${FLUENTD_SERVER_PORT}

  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?[^ ]*) [^ ]* (?[^ ]*) \[(?
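Fluent Bit Pod Configuration
We merge the upstream Pod-related manifests (fluent-bit-service-account.yaml, fluent-bit-role.yaml, fluent-bit-role-binding.yaml) into a single file, fluent-bit-ds.yaml, to make maintenance easier.
- Point the Docker image at our private registry. We pull the original image, leave it unchanged, tag it, and push it to the private registry.
- Add environment variables for the Fluentd server.
- Note the nodeSelector section: every node that should collect logs must carry this label. Add the label node by node, one or two nodes at a time with a few minutes in between (a script works too). When the log volume is large this is strongly recommended, a lesson we learned the hard way: labeling in bulk can trigger a data storm that the aggregation server cannot keep up with, and ES may be overwhelmed as well and reject Fluentd's connections.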
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: kube-system
---
# rbac.authorization.k8s.io/v1beta1 was removed in Kubernetes 1.22; newer clusters need rbac.authorization.k8s.io/v1
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
- apiGroups: [""]
  resources:
  - namespaces
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: kube-system
---
# extensions/v1beta1 DaemonSets were removed in Kubernetes 1.16; newer clusters need apps/v1 plus spec.selector
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    k8s-app: fluent-bit-logging
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "2020"
        prometheus.io/path: /api/v1/metrics/prometheus
    spec:
      # Pods are only scheduled on nodes that carry this label
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "true"
      containers:
      - name: fluent-bit
        image: hub.***.***/google_containers/fluent-bit:0.13.2
        imagePullPolicy: Always
        ports:
        - containerPort: 2020
        env:
        # Service name and port of the aggregation server
        - name: FLUENTD_SERVER_HOST
          value: "fluentd-server"
        - name: FLUENTD_SERVER_PORT
          value: "24224"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      imagePullSecrets:
      - name: kube-sec
Starting the Fluent Bit DaemonSet
Create the configuration Fluent Bit needs.
[centos@master1 fluent-bit]$ ls -al
total 28
drwxrwxr-x 2 centos centos 4096 Jun 7 02:32 .
drwxrwxr-x 7 centos centos 4096 Jun 11 03:03 ..
-rw-rw-r-- 1 centos centos 3562 Jun 7 02:32 fluent-bit-configmap.yaml
-rw-rw-r-- 1 centos centos 2248 Jun 6 12:51 fluent-bit-ds.yaml
-rw-rw-r-- 1 centos centos 273 May 31 13:35 fluent-bit-role-binding.yaml
-rw-rw-r-- 1 centos centos 194 May 31 13:33 fluent-bit-role.yaml
-rw-rw-r-- 1 centos centos 90 May 31 13:35 fluent-bit-service-account.yaml
[centos@master1 fluent-bit]$
[centos@master1 fluent-bit]$ kubectl create -f fluent-bit-configmap.yaml
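Start the Fluent Bit DaemonSet.
[centos@master1 fluent-bit]$ kubectl create -f fluent-bit-ds.yaml
Check the Pods. At this point none have been created, because no node carries the label yet:
[centos@master1 fluent-bit]$ kubectl get pods -n kube-system -o wide | grep fluent-bit
[centos@master1 fluent-bit]$
Don't worry. Next we label the nodes, which gets the Pods scheduled. Remember to leave some time between nodes.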
[centos@master1 fluent-bit]$ kubectl label node minion1 beta.kubernetes.io/fluentd-ds-ready=true
[centos@master1 fluent-bit]$ kubectl label node minion2 beta.kubernetes.io/fluentd-ds-ready=true
[centos@master1 fluent-bit]$ kubectl label node minion3 beta.kubernetes.io/fluentd-ds-ready=true
[centos@master1 fluent-bit]$ kubectl label node minion4 beta.kubernetes.io/fluentd-ds-ready=true
[centos@master1 fluent-bit]$ kubectl label node minion5 beta.kubernetes.io/fluentd-ds-ready=true
.....
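The staggered labeling recommended earlier is easy to script. The following is a minimal sketch rather than the script we actually used: the function name `label_nodes_gradually`, the `DRY_RUN` switch, and the 180-second pause in the usage comment are all assumptions to adapt.

```shell
# Label nodes one at a time, pausing after each node so that the aggregation
# server and ES are not hit by the backlog of every node at once.
# With DRY_RUN=1 the kubectl commands are printed instead of executed.
label_nodes_gradually() {
    label="$1"   # e.g. beta.kubernetes.io/fluentd-ds-ready=true
    delay="$2"   # seconds to wait after each node
    shift 2
    for node in "$@"; do
        cmd="kubectl label node $node $label"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1   # stop on the first failure
        fi
        sleep "$delay"
    done
}

# Usage (the 180 s pause is an assumed value):
#   label_nodes_gradually beta.kubernetes.io/fluentd-ds-ready=true 180 minion1 minion2 minion3
```

Stopping on the first failed `kubectl label` keeps a typo in a node name from silently skipping part of the rollout.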
Check the Pods again:
[centos@master1 fluent-bit]$ kubectl get pods -n kube-system -o wide | grep flu
fluent-bit-2sd9k 1/1 Running 0 4d 10.244.30.213 minion5
fluent-bit-5jd4w 1/1 Running 0 4d 10.244.32.12 minion3
fluent-bit-952fn 1/1 Running 0 4d 10.244.15.14 minion13
fluent-bit-cz2xq 1/1 Running 0 4d 10.244.29.250 minion6
fluent-bit-fx22k 1/1 Running 0 4d 10.244.25.235 minion10
fluent-bit-g4fmw 1/1 Running 0 4d 10.244.23.99 minion12
fluent-bit-gnfxg 1/1 Running 0 4d 10.244.28.207 minion7
fluent-bit-h9t9l 1/1 Running 0 4d 10.244.11.91 minion22
fluent-bit-ld9fx 1/1 Running 0 4d 10.244.3.191 minion17
fluent-bit-pgc2f 1/1 Running 0 4d 10.244.14.48 minion20
fluent-bit-st2qq 1/1 Running 0 3d 10.244.16.3 minion11
fluent-bit-tm5hl 1/1 Running 0 4d 10.244.12.46 minion18
fluent-bit-tt44q 1/1 Running 0 4d 10.244.21.24 minion14
fluent-bit-vgptk 1/1 Running 0 4d 10.244.31.9 minion4
fluent-bit-vptft 1/1 Running 0 4d 10.244.34.93 minion1
fluent-bit-wpwl4 1/1 Running 0 4d 10.244.13.35 minion19
fluent-bit-xdvbz 1/1 Running 0 4d 10.244.9.99 minion2
fluent-bit-zrmsj 1/1 Running 0 4d 10.244.20.33 minion15
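Open Kibana to confirm that logs are arriving:
The logs have reached ES. Next we adjust the fields Kibana displays so that the view better suits reading logs.
- Add Host to the displayed columns.
- Add Pod Name to the displayed columns.
- Add Log to the displayed columns.
Then check the logs host by host:
- Add a filter condition to query the logs of a single node.
Each host's logs are arriving in ES as expected, so next we check S3.
The logs have been created under the directory layout we configured.
Opening one of the directories shows that the files exist. They can also be downloaded and inspected locally, which this article does not cover.
This completes the Fluent Bit deployment. While bringing up Fluent Bit, some nodes crash and cannot start properly. For those nodes we use Fluentd to collect the logs instead.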
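To find the nodes that need the Fluentd fallback, compare the list of nodes with the nodes that have a Running Fluent Bit Pod. A rough sketch, assuming the seven-column `-o wide` layout shown in the listings above (NODE in column 7); the function name `nodes_without_agent` is hypothetical.

```shell
# List nodes that have no Running agent Pod.
#   $1    : file holding `kubectl get pods -o wide` rows for the agent
#           (NAME READY STATUS RESTARTS AGE IP NODE)
#   stdin : one node name per line
nodes_without_agent() {
    awk -v pods="$1" '
        BEGIN {
            # Remember every node that hosts a Running agent Pod
            while ((getline line < pods) > 0) {
                split(line, f, " ")
                if (f[3] == "Running") covered[f[7]] = 1
            }
        }
        NF > 0 && !($1 in covered) { print $1 }'
}

# Usage:
#   kubectl get pods -n kube-system -l k8s-app=fluent-bit-logging -o wide \
#       --no-headers > /tmp/fluent-bit-pods.txt
#   kubectl get nodes --no-headers | awk '{print $1}' \
#       | nodes_without_agent /tmp/fluent-bit-pods.txt
```

Nodes whose Fluent Bit Pod is crash-looping are reported as well, since only Running Pods count as coverage.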
Fluentd
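We follow the Fluentd server section above to prepare Fluentd. The Docker image can be the same one the server uses.
Preparing the Fluentd YAML Files
Fluentd Configuration
Create a new file, fluentd-standalone-configmap.yaml, for running Fluentd as a standalone agent.
- Adjust output.conf.
- Remove the ES section.
- Add a forward section.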
kind: ConfigMap
apiVersion: v1
metadata:
name: fluentd-sa-config-v0.1.4
namespace: kube-system
labels:
addonmanager.kubernetes.io/mode: Reconcile
data:
system.conf: |-
root_dir /tmp/fluentd-buffers/
containers.input.conf: |-
# This configuration file for Fluentd / td-agent is used
# to watch changes to Docker log files. The kubelet creates symlinks that
# capture the pod name, namespace, container name & Docker container ID
# to the docker logs for pods in the /var/log/containers directory on the host.
# If running this fluentd configuration in a Docker container, the /var/log
# directory should be mounted in the container.
#
# These logs are then submitted to Elasticsearch which assumes the
# installation of the fluent-plugin-elasticsearch & the
# fluent-plugin-kubernetes_metadata_filter plugins.
# See https://github.com/uken/fluent-plugin-elasticsearch &
# https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter for
# more information about the plugins.
#
# Example
# =======
# A line in the Docker log file might look like this JSON:
#
# {"log":"2014/09/25 21:15:03 Got request with path wombat\n",
# "stream":"stderr",
# "time":"2014-09-25T21:15:03.499185026Z"}
#
# The time_format specification below makes sure we properly
# parse the time format produced by Docker. This will be
# submitted to Elasticsearch and should appear like:
# $ curl 'http://elasticsearch-logging:9200/_search?pretty'
# ...
# {
# "_index" : "logstash-2014.09.25",
# "_type" : "fluentd",
# "_id" : "VBrbor2QTuGpsQyTCdfzqA",
# "_score" : 1.0,
# "_source":{"log":"2014/09/25 22:45:50 Got request with path wombat\n",
# "stream":"stderr","tag":"docker.container.all",
# "@timestamp":"2014-09-25T22:45:50+00:00"}
# },
# ...
#
# The Kubernetes fluentd plugin is used to write the Kubernetes metadata to the log
# record & add labels to the log record if properly configured. This enables users
# to filter & search logs on any metadata.
# For example a Docker container's logs might be in the directory:
#
# /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b
#
# and in the file:
#
# 997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
#
# where 997599971ee6... is the Docker ID of the running container.
# The Kubernetes kubelet makes a symbolic link to this file on the host machine
# in the /var/log/containers directory which includes the pod name and the Kubernetes
# container name:
#
# synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
# ->
# /var/lib/docker/containers/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b/997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b-json.log
#
# The /var/log directory on the host is mapped to the /var/log directory in the container
# running this instance of Fluentd and we end up collecting the file:
#
# /var/log/containers/synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
#
# This results in the tag:
#
# var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
#
# The Kubernetes fluentd plugin is used to extract the namespace, pod name & container name
# which are added to the log message as a kubernetes field object & the Docker container ID
# is also added under the docker field object.
# The final tag is:
#
# kubernetes.var.log.containers.synthetic-logger-0.25lps-pod_default_synth-lgr-997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b.log
#
# And the final log record look like:
#
# {
# "log":"2014/09/25 21:15:03 Got request with path wombat\n",
# "stream":"stderr",
# "time":"2014-09-25T21:15:03.499185026Z",
# "kubernetes": {
# "namespace": "default",
# "pod_name": "synthetic-logger-0.25lps-pod",
# "container_name": "synth-lgr"
# },
# "docker": {
# "container_id": "997599971ee6366d4a5920d25b79286ad45ff37a74494f262e3bc98d909d0a7b"
# }
# }
#
# This makes it easier for users to search for logs by pod name or by
# the name of the Kubernetes container regardless of how many times the
# Kubernetes pod has been restarted (resulting in a several Docker container IDs).
# Json Log Example:
# {"log":"[info:2016-02-16T16:04:05.930-08:00] Some log text here\n","stream":"stdout","time":"2016-02-17T00:04:05.931087621Z"}
# CRI Log Example:
# 2016-02-17T00:04:05.931087621Z stdout F [info:2016-02-16T16:04:05.930-08:00] Some log text here
# Detect exceptions in the log output and forward them as one log entry.
@id raw.kubernetes
@type detect_exceptions
remove_tag_prefix raw
message log
stream stream
multiline_flush_interval 5
max_bytes 500000
max_lines 1000
system.input.conf: |-
# Example:
# 2015-12-21 23:17:22,066 [salt.state ][INFO ] Completed state [net.ipv4.ip_forward] at time 23:17:22.066081
# Example:
# Dec 21 23:17:22 gke-foo-1-1-4b5cbd14-node-4eoj startupscript: Finished running startup script /var/run/google.startup.script
# Examples:
# time="2016-02-04T06:51:03.053580605Z" level=info msg="GET /containers/json"
# time="2016-02-04T07:53:57.505612354Z" level=error msg="HTTP Error" err="No such image: -f" statusCode=404
# TODO(random-liu): Remove this after cri container runtime rolls out.
# Example:
# 2016/02/04 06:52:38 filePurge: successfully removed file /var/etcd/data/member/wal/00000000000006d0-00000000010a23d1.wal
# Multi-line parsing is required for all the kube logs because very large log
# statements, such as those that include entire object bodies, get split into
# multiple lines by glog.
# Example:
# I0204 07:32:30.020537 3368 server.go:1048] POST /stats/container/: (13.972191ms) 200 [[Go-http-client/1.1] 10.244.1.3:40537]
Fluentd Pod YAML
We also create fluentd-standalone.yaml to control how the Fluentd Pods start.
Note that this manifest uses a nodeSelector as well. Unlike Fluent Bit, the label value here is
beta.kubernetes.io/fluentd-ds-ready = "fluentd".
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluentd-es
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  - "pods"
  verbs:
  - "get"
  - "watch"
  - "list"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: fluentd-es
  labels:
    k8s-app: fluentd-es
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
subjects:
- kind: ServiceAccount
  name: fluentd-es
  namespace: kube-system
  apiGroup: ""
roleRef:
  kind: ClusterRole
  name: fluentd-es
  apiGroup: ""
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-es-v2.0.4
  namespace: kube-system
  labels:
    k8s-app: fluentd-es
    version: v2.0.4
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  selector:
    matchLabels:
      k8s-app: fluentd-es
      version: v2.0.4
  template:
    metadata:
      labels:
        k8s-app: fluentd-es
        kubernetes.io/cluster-service: "true"
        version: v2.0.4
      # This annotation ensures that fluentd does not get evicted if the node
      # supports critical pod annotation based priority scheme.
      # Note that this does not guarantee admission on the nodes (#40573).
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-node-critical
      serviceAccountName: fluentd-es
      containers:
      - name: fluentd-es
        #image: k8s.gcr.io/fluentd-elasticsearch:v2.0.4
        image: hub.***.***/google_containers/fluentd-s3:v2.1.0
        imagePullPolicy: Always
        env:
        - name: FLUENTD_ARGS
          value: --no-supervisor -q
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
      nodeSelector:
        beta.kubernetes.io/fluentd-ds-ready: "fluentd"
      terminationGracePeriodSeconds: 30
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-sa-config-v0.1.4
      imagePullSecrets:
      - name: kube-sec
Starting the Fluentd DaemonSet
Create the configuration Fluentd needs.
[centos@master1 efk]$ kubectl create -f fluentd-standalone-configmap.yaml
Start the Fluentd DaemonSet.
[centos@master1 efk]$ kubectl create -f fluentd-standalone.yaml
As with Fluent Bit, checking the Pods now shows that, apart from the 5 Fluentd server Pods, nothing has been started yet.
[centos@master1 efk]$ kubectl get pods -n kube-system -o wide | grep fluentd
fluentd-server-v2.0.4-855db7cfc5-4wn47 1/1 Running 0 6h 10.244.29.20 minion6
fluentd-server-v2.0.4-855db7cfc5-pfmvd 1/1 Running 0 6h 10.244.3.211 minion17
fluentd-server-v2.0.4-855db7cfc5-rjqxl 1/1 Running 0 6h 10.244.13.47 minion19
fluentd-server-v2.0.4-855db7cfc5-shjfm 1/1 Running 0 6h 10.244.23.141 minion12
fluentd-server-v2.0.4-855db7cfc5-w7m5f 1/1 Running 0 6h 10.244.30.233 minion5
[centos@master1 efk]$
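Next we label the nodes so that the Pods get scheduled; again, leave some time between nodes. The value must be fluentd to match the nodeSelector of this DaemonSet, and --overwrite replaces a value of true left over on nodes that were labeled during the Fluent Bit rollout.
[centos@master1 efk]$ kubectl label node minion8 beta.kubernetes.io/fluentd-ds-ready=fluentd --overwrite
[centos@master1 efk]$ kubectl label node minion21 beta.kubernetes.io/fluentd-ds-ready=fluentd --overwrite
[centos@master1 efk]$ kubectl label node minion9 beta.kubernetes.io/fluentd-ds-ready=fluentd --overwrite
[centos@master1 efk]$ kubectl label node minion16 beta.kubernetes.io/fluentd-ds-ready=fluentd --overwrite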
Check the Pods again:
[centos@master1 efk]$ kubectl get pods -n kube-system -o wide | grep fluentd
fluentd-es-v2.0.4-75gdf 1/1 Running 1 4d 10.244.27.245 minion8
fluentd-es-v2.0.4-kx5pz 1/1 Running 0 2h 10.244.10.96 minion21
fluentd-es-v2.0.4-n89xj 1/1 Running 6 4d 10.244.26.92 minion9
fluentd-es-v2.0.4-zsrln 1/1 Running 0 6h 10.244.19.67 minion16
fluentd-server-v2.0.4-855db7cfc5-4wn47 1/1 Running 0 6h 10.244.29.20 minion6
fluentd-server-v2.0.4-855db7cfc5-pfmvd 1/1 Running 0 6h 10.244.3.211 minion17
fluentd-server-v2.0.4-855db7cfc5-rjqxl 1/1 Running 0 6h 10.244.13.47 minion19
fluentd-server-v2.0.4-855db7cfc5-shjfm 1/1 Running 0 6h 10.244.23.141 minion12
fluentd-server-v2.0.4-855db7cfc5-w7m5f 1/1 Running 0 6h 10.244.30.233 minion5
[centos@master1 efk]$
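The RESTARTS column in this listing is worth watching, since an agent that keeps restarting may be dropping logs. As a small sketch, the filter below prints the Pods whose restart count exceeds a threshold. The function name `pods_restarting` is my own, and the column positions assume the seven-column layout of the listing above (newer kubectl versions may print RESTARTS differently).

```shell
# Print "pod restarts node" for Pods restarted more than $1 times.
# Reads `kubectl get pods -o wide` rows on stdin
# (NAME READY STATUS RESTARTS AGE IP NODE).
pods_restarting() {
    awk -v max="$1" '$4 + 0 > max + 0 { print $1, $4, $7 }'
}

# Usage: agents restarted more than 3 times
#   kubectl get pods -n kube-system -o wide | grep fluentd-es | pods_restarting 3
```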
As with Fluent Bit, check in Kibana and S3 that the logs of these nodes are arriving in ES and S3. The steps are the same and are not repeated here.