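Jaeger is a distributed tracing system released by Uber. Like Zipkin and Dapper, it was built for tracing calls between microservices. Its main use is to trace and analyze calls that span multiple services and to visualize the call path, so that problems can be located quickly and accurately.
Following the data flow, the system can be divided into four parts:
jaeger-client:
The Jaeger client implements the OpenTracing API and supports the mainstream programming languages. It is integrated directly into the target application, where it records Spans and sends them to the Jaeger Agent. Calling the Jaeger Client Library from application code to record Spans is usually called instrumentation.
jaeger-agent:
Buffers the Spans sent by the Jaeger Client and forwards them to the Jaeger Collector in batches. Normally one Jaeger Agent is deployed on every machine. The official introduction also stresses that the Jaeger Agent moves service discovery out of the Client. From an architectural point of view, however, the Agent adds little when the deployment runs on Kubernetes or Nomad.
jaeger-collector:
Receives the data sent by the Jaeger Agent and writes it to the storage backend. Cassandra and Elasticsearch are currently supported as backends. Elasticsearch is recommended: it can share one ES cluster with the logging service, and Kibana can be used for additional analysis of the trace data. The storage backend in the architecture diagram is Cassandra with Spark next to it, which indicates that Spark and similar tools can analyze the Spans in the storage backend directly.
jaeger-query & jaeger-ui:
Reads data from the storage backend and presents it in an intuitive form.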
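To keep management in one place, all Jaeger components go into the jaeger namespace. A jaeger-agent service account is created and granted access so that the jaeger-agent on each node can collect and monitor call-chain data for the workloads in the default namespace.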
[root@formal-k8s-01 jaeger]# kubectl create namespace jaeger
[root@formal-k8s-01 jaeger]# cat rabc.yml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: jaeger-agent
  namespace: jaeger
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: jaeger-agent
rules:
- apiGroups: [""]
  resources:
  - services
  - namespaces
  - deployments
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: jaeger-agent
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: jaeger-agent
subjects:
- kind: ServiceAccount
  name: jaeger-agent
  namespace: jaeger
In the test environment, Elasticsearch is deployed as a StatefulSet. In production, jaeger-collector writes to the existing online ELK cluster so that indices can be managed in one place.
wget https://raw.githubusercontent.com/jaegertracing/jaeger-kubernetes/master/production-elasticsearch/elasticsearch.yml
cat elasticsearch.yml
apiVersion: v1
kind: List
items:
- apiVersion: apps/v1beta1
  kind: StatefulSet
  metadata:
    name: elasticsearch
    namespace: jaeger
    labels:
      app: elasticsearch
      app.kubernetes.io/name: elasticsearch
      app.kubernetes.io/component: storage-backend
      app.kubernetes.io/part-of: jaeger
  spec:
    serviceName: elasticsearch
    replicas: 1
    template:
      metadata:
        labels:
          app: elasticsearch
          app.kubernetes.io/name: elasticsearch
          app.kubernetes.io/component: storage-backend
          app.kubernetes.io/part-of: jaeger
      spec:
        containers:
        - name: elasticsearch
          image: docker.elastic.co/elasticsearch/elasticsearch:5.6.0
          imagePullPolicy: Always
          command:
          - bin/elasticsearch
          args:
          - "-Ehttp.host=0.0.0.0"
          # Use the $(PODIP) variable as Elasticsearch's listen address. With the
          # default loopback address, the CronJob that later syncs service
          # dependencies to Elasticsearch runs into problems.
          - "-Etransport.host=$(PODIP)"
          volumeMounts:
          - name: cce-sfs-kt-jaeger-data
            mountPath: /data
          env:
          - name: PODIP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 20
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 9200
            timeoutSeconds: 4
        # Persist Elasticsearch data through a PVC. This is the Huawei Cloud way
        # of mounting a PVC and is site-specific; see the volumeClaimTemplates
        # setting of StatefulSet for the generic approach.
        volumes:
        - name: cce-sfs-kt-jaeger-data
          persistentVolumeClaim:
            claimName: cce-sfs-kt-jaeger-data
- apiVersion: v1
  kind: Service
  metadata:
    name: elasticsearch
    namespace: jaeger
    labels:
      app: elasticsearch
      app.kubernetes.io/name: elasticsearch
      app.kubernetes.io/component: storage-backend
      app.kubernetes.io/part-of: jaeger
  spec:
    clusterIP: None
    selector:
      app.kubernetes.io/name: elasticsearch
      app.kubernetes.io/component: storage-backend
      app.kubernetes.io/part-of: jaeger
    ports:
    - port: 9200
      name: elasticsearch
    - port: 9300
      name: transport
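The comment on the volumes section points to volumeClaimTemplates as the generic alternative to the Huawei Cloud PVC mount. A minimal sketch of that variant follows. The apps/v1 API version, the claim name data, the StorageClass standard, and the 10Gi request are all illustrative assumptions, not part of the original setup.

```yaml
# Sketch only: StatefulSet that lets Kubernetes create one PVC per replica.
apiVersion: apps/v1              # assumption: the cluster serves apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: jaeger
spec:
  serviceName: elasticsearch
  replicas: 1
  selector:                      # required by apps/v1
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:5.6.0
        volumeMounts:
        - name: data             # must match the claim template name below
          mountPath: /data       # same mount path as the manifest above
  volumeClaimTemplates:
  - metadata:
      name: data                 # hypothetical claim name
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: standard # hypothetical StorageClass
      resources:
        requests:
          storage: 10Gi          # hypothetical size
```

With a claim template, each replica gets its own PVC (here data-elasticsearch-0), so no PVC has to be created by hand before the StatefulSet is applied.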
------------------------------------------------------------------
# Configuration for each component:
cat configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: jaeger-configuration
  namespace: jaeger
  labels:
    app: jaeger
    app.kubernetes.io/name: jaeger
data:
  span-storage-type: elasticsearch
  collector: |
    es:
      server-urls: http://192.168.4.17:9200   # Elasticsearch node of the ELK cluster
    collector:
      zipkin:
        http-port: 9411
  query: |
    es:
      server-urls: http://192.168.4.17:9200
  agent: |
    collector:
      host-port: "jaeger-collector:14267"
We plan to schedule the pods of jaeger-query and jaeger-collector onto the node test-jaeger-01. To do so, the node is tainted, and the pods of both components get a nodeAffinity for that node plus a toleration for its taint, which pins them to it.
[root@test-k8s-01 jumpserver]# kubectl taint node test-jaeger-01 type=jaeger:NoSchedule
[root@test-k8s-01 jumpserver]# kubectl describe node test-jaeger-01 | grep Taint
Taints: type=jaeger:NoSchedule
[root@test-k8s-01 jaeger]# cat jaeger-query.yml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jaeger-query
  namespace: jaeger
  labels:
    app: jaeger
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/component: query
spec:
  replicas: 2
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: jaeger
        app.kubernetes.io/name: jaeger
        app.kubernetes.io/component: query
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "16686"
    spec:
      containers:
      - image: jaegertracing/jaeger-query:1.9.0
        name: jaeger-query
        args: ["--config-file=/conf/query.yaml"]
        ports:
        - containerPort: 16686
          protocol: TCP
        readinessProbe:
          httpGet:
            path: "/"
            port: 16687
        volumeMounts:
        - name: jaeger-configuration-volume
          mountPath: /conf
        env:
        - name: SPAN_STORAGE_TYPE
          valueFrom:
            configMapKeyRef:
              name: jaeger-configuration
              key: span-storage-type
      volumes:
      - configMap:
          name: jaeger-configuration
          items:
          - key: query
            path: query.yaml
        name: jaeger-configuration-volume
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - 192.168.*.*
      tolerations:
      - key: "type"
        operator: "Equal"
        value: "jaeger"
        effect: "NoSchedule"
---
# LoadBalancer Service backed by a Huawei Cloud ELB. Later on, nginx
# reverse-proxies it so that all access goes through a single entry point.
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubernetes.io/elb.class: union
    kubernetes.io/elb.id: f7713230-c6fe-465c-ba0a-***********
    kubernetes.io/elb.vpc.id: c289b36d-1321-*********
  name: jaeger-query
  namespace: jaeger
  labels:
    app: jaeger
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/component: query
spec:
  loadBalancerIP: 192.168.*.*
  ports:
  - name: jaeger-query
    port: 80
    protocol: TCP
    targetPort: 16686
  selector:
    app.kubernetes.io/name: jaeger
    app.kubernetes.io/component: query
  type: LoadBalancer
jaeger-agent is deployed as a DaemonSet, so Span data from the application pods is collected per node. Alternatively, the agent can be injected into each application pod as a sidecar.
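For the sidecar alternative, the agent container is added to the application's own pod, so the tracer can report to localhost. The fragment below is a sketch only. The application name myapp, its image, and the apps/v1 API version are hypothetical. The collector address assumes that the jaeger-collector Service lives in the jaeger namespace and that the agent image is the 1.9.0 release used elsewhere in this post.

```yaml
# Sketch only: application pod with a jaeger-agent sidecar.
apiVersion: apps/v1                 # assumption: the cluster serves apps/v1
kind: Deployment
metadata:
  name: myapp                       # hypothetical application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: example/myapp:latest # hypothetical image
      - name: jaeger-agent
        image: jaegertracing/jaeger-agent:1.9.0
        # Forward spans to the collector Service in the jaeger namespace.
        args: ["--collector.host-port=jaeger-collector.jaeger.svc:14267"]
        ports:
        - containerPort: 6831       # compact Thrift over UDP, used by most clients
          protocol: UDP
        - containerPort: 5778       # sampling strategies and configuration over HTTP
          protocol: TCP
```

The sidecar costs one extra container per pod but needs neither hostNetwork nor the node IP. The rest of this walkthrough uses the DaemonSet approach.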
[root@test-k8s-01 jaeger]# cat jaeger-production-template.yml
apiVersion: v1
kind: List
items:
- apiVersion: extensions/v1beta1
  kind: Deployment
  metadata:
    name: jaeger-collector
    namespace: jaeger
    labels:
      app: jaeger
      app.kubernetes.io/name: jaeger
      app.kubernetes.io/component: collector
  spec:
    replicas: 2
    strategy:
      type: Recreate
    template:
      metadata:
        labels:
          app: jaeger
          app.kubernetes.io/name: jaeger
          app.kubernetes.io/component: collector
        annotations:
          prometheus.io/scrape: "true"
          prometheus.io/port: "14268"
      spec:
        tolerations:
        - key: "type"
          operator: "Equal"
          value: "jaeger"
          effect: "NoSchedule"
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/hostname
                  operator: In
                  values:
                  - 192.168.10.156
        containers:
        - image: jaegertracing/jaeger-collector:1.9.0
          name: jaeger-collector
          args: ["--config-file=/conf/collector.yaml"]
          ports:
          - containerPort: 14267
            protocol: TCP
          - containerPort: 14268
            protocol: TCP
          - containerPort: 9411
            protocol: TCP
          readinessProbe:
            httpGet:
              path: "/"
              port: 14269
          volumeMounts:
          - name: jaeger-configuration-volume
            mountPath: /conf
          env:
          - name: SPAN_STORAGE_TYPE
            valueFrom:
              configMapKeyRef:
                name: jaeger-configuration
                key: span-storage-type
        volumes:
        - configMap:
            name: jaeger-configuration
            items:
            - key: collector
              path: collector.yaml
          name: jaeger-configuration-volume
- apiVersion: v1
  kind: Service
  metadata:
    name: jaeger-collector
    namespace: jaeger
    labels:
      app: jaeger
      app.kubernetes.io/name: jaeger
      app.kubernetes.io/component: collector
  spec:
    ports:
    - name: jaeger-collector-tchannel
      port: 14267
      protocol: TCP
      targetPort: 14267
    - name: jaeger-collector-http
      port: 14268
      protocol: TCP
      targetPort: 14268
    - name: jaeger-collector-zipkin
      port: 9411
      protocol: TCP
      targetPort: 9411
    selector:
      app.kubernetes.io/name: jaeger
      app.kubernetes.io/component: collector
    type: ClusterIP
- apiVersion: v1
  kind: Service
  metadata:
    name: zipkin
    namespace: jaeger
    labels:
      app: jaeger
      app.kubernetes.io/name: jaeger
      app.kubernetes.io/component: zipkin
  spec:
    ports:
    - name: jaeger-collector-zipkin
      port: 9411
      protocol: TCP
      targetPort: 9411
    selector:
      app.kubernetes.io/name: jaeger
      app.kubernetes.io/component: collector
    type: ClusterIP
- apiVersion: extensions/v1beta1
  kind: DaemonSet
  metadata:
    name: jaeger-agent
    namespace: jaeger
    labels:
      app: jaeger
      app.kubernetes.io/name: jaeger
      app.kubernetes.io/component: agent
  spec:
    template:
      metadata:
        labels:
          app: jaeger
          app.kubernetes.io/name: jaeger
          app.kubernetes.io/component: agent
        annotations:
          prometheus.io/scrape: "true"
          prometheus.io/port: "5778"
      spec:
        containers:
        - name: jaeger-agent
          image: jaegertracing/jaeger-agent:1.9.0
          args: ["--config-file=/conf/agent.yaml"]
          volumeMounts:
          - name: jaeger-configuration-volume
            mountPath: /conf
          env:
          # jaeger-agent runs as a DaemonSet on the host network, so the node IP
          # can be stored in an environment variable and passed to applications
          # in the following way.
          - name: JAEGER_AGENT_HOST
            valueFrom:
              fieldRef:
                fieldPath: status.hostIP
          ports:
          - containerPort: 5775
            protocol: UDP
          - containerPort: 6831
            protocol: UDP
          - containerPort: 6832
            protocol: UDP
          - containerPort: 5778
            protocol: TCP
        hostNetwork: true
        dnsPolicy: ClusterFirstWithHostNet
        serviceAccountName: jaeger-agent
        volumes:
        - configMap:
            name: jaeger-configuration
            items:
            - key: agent
              path: agent.yaml
          name: jaeger-configuration-volume
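With the agent listening on the host network of every node, an application pod can reach its local agent through the node IP. The sketch below shows how a workload might receive that address. The Deployment name myapp, its image, and the apps/v1 API version are hypothetical, and it assumes that the application's Jaeger client library reads the JAEGER_AGENT_HOST environment variable.

```yaml
# Sketch only: application Deployment that reports to the node-local agent.
apiVersion: apps/v1                 # assumption: the cluster serves apps/v1
kind: Deployment
metadata:
  name: myapp                       # hypothetical application
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: example/myapp:latest # hypothetical image
        env:
        # The downward API resolves status.hostIP to the IP of the node this
        # pod runs on, which is where the DaemonSet agent listens.
        - name: JAEGER_AGENT_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
```

Because the value is resolved per pod at start time, the same manifest works on every node without hard-coding an address.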
Import the dependency relationships between applications into Elasticsearch for persistent storage; otherwise they are kept only in the cache.
apiVersion: v1
kind: List
items:
- apiVersion: batch/v1beta1
  kind: CronJob
  metadata:
    labels:
      run: jaeger-spark-dependencies
    name: jaeger-spark-dependencies
    namespace: jaeger
  spec:
    jobTemplate:
      spec:
        template:
          metadata:
            labels:
              run: jaeger-spark-dependencies
          spec:
            containers:
            - env:
              - name: STORAGE
                value: elasticsearch
              - name: ES_NODES
                # Use this value instead when Elasticsearch runs inside Kubernetes:
                # value: elasticsearch-0.elasticsearch.jaeger.svc.cluster.local:9200
                value: 192.168.*.*:80**
              # Only needed when Elasticsearch requires authentication:
              # - name: ES_USERNAME
              #   value: elastic
              # - name: ES_PASSWORD
              #   value: changeme
              image: jaegertracing/spark-dependencies
              name: jaeger-spark-dependencies
            # Assumed value: the listing omits it, but Job pods must use
            # OnFailure or Never.
            restartPolicy: OnFailure
    schedule: "55 23 * * *"
[root@test-k8s-01 jaeger]# kubectl get pods -n jaeger
NAME                                         READY   STATUS      RESTARTS   AGE
elasticsearch-0                              1/1     Running     0          2d
jaeger-agent-xj7cs                           1/1     Running     0          16h
jaeger-agent-zw5gr                           1/1     Running     0          16h
jaeger-collector-7fb4fff766-m79x7            1/1     Running     0          16h
jaeger-collector-7fb4fff766-t5thx            1/1     Running     0          16h
jaeger-query-888d478b8-fk527                 1/1     Running     0          16h
jaeger-query-888d478b8-ht8xh                 1/1     Running     0          16h
jaeger-spark-dependencies-1574265300-8rmbl   0/1     Completed   0          1d
jaeger-spark-dependencies-1574351700-txpjt   0/1     Completed   0          9h
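jaeger-ui shows a scatter plot of the durations of the traced spans.
PS: As the figure above shows, the line chart built into jaeger-ui does not effectively show how span (call chain) durations are distributed. Because the span data is stored in Elasticsearch, Kibana aggregations can present it effectively.
1. The documents in the jaeger-span-* indices show that duration is the field that holds a span's elapsed time, in microseconds.
2. From Visualize (aggregated display) in the Kibana navigation bar, create a Pie chart.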