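More and more applications are moving to cloud-native architectures built on microservices. Microservice architecture is powerful, but it also brings many challenges, especially how to debug an application and how to monitor the call relationships and state across multiple services. Monitoring a microservice architecture effectively has become key to operating it successfully. In software architecture terms, the goal is to improve the observability of the architecture. This usually involves three things: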
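Collect logs to monitor the running state of the system and of each service.
Collect metrics to monitor the performance of the system and of each service.
Use distributed tracing to follow, in detail, how a service request is processed across the distributed components.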
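Distributed tracing relies on the following core concepts:
Trace: a transaction carried out by cooperating distributed microservices. A trace contains every service request made on behalf of that transaction.
Span: a unit of work within the transaction. A span carries timestamps, logs, and tags. Spans are related to one another either as parent and child (ChildOf) or by a follows-from (FollowsFrom) relationship.
Span Context: the span context is what makes distributed tracing possible. It is passed between the calling services, and its contents include the time of the hand-off from one service to another, the trace ID, the span ID, and any other information that has to travel from an upstream service to a downstream one.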
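To make the three concepts concrete, here is a minimal sketch in plain Python. It is not the OpenTracing API: the names `SpanContext`, `Span`, `new_id`, and `start_span` are assumptions chosen for illustration, and a real tracer records far more than this.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


def new_id() -> str:
    """Hypothetical ID generator; real tracers use their own ID formats."""
    return uuid.uuid4().hex[:16]


@dataclass(frozen=True)
class SpanContext:
    """The part of a span that crosses process boundaries."""
    trace_id: str
    span_id: str


@dataclass
class Span:
    operation_name: str
    context: SpanContext
    parent_id: Optional[str] = None   # None marks the root span of a trace
    start_time: float = field(default_factory=time.time)
    tags: dict = field(default_factory=dict)
    logs: list = field(default_factory=list)


def start_span(operation_name: str,
               child_of: Optional[SpanContext] = None) -> Span:
    """Start a root span, or a child span that joins the parent's trace."""
    if child_of is None:
        ctx = SpanContext(trace_id=new_id(), span_id=new_id())
        return Span(operation_name, ctx)
    ctx = SpanContext(trace_id=child_of.trace_id, span_id=new_id())
    return Span(operation_name, ctx, parent_id=child_of.span_id)


root = start_span("say-hello")
child = start_span("format", child_of=root.context)

# Both spans belong to the same trace; the child points back to its parent.
print(child.context.trace_id == root.context.trace_id)
print(child.parent_id == root.context.span_id)
```

A trace, in this model, is simply the set of spans that share one `trace_id`; the `parent_id` links are what let a UI draw them as a tree.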
2. OpenTracing standard concepts
    t=0            operation name: db_query               t=x

     +-----------------------------------------------------+
     | · · · · · · · · · ·    Span     · · · · · · · · · · |
     +-----------------------------------------------------+

Tags:
- db.instance:"jdbc:mysql://127.0.0.1:3306/customers"
- db.statement: "SELECT * FROM mytable WHERE foo='bar';"

Logs:
- message:"Can't connect to mysql server on '127.0.0.1'(10061)"

SpanContext:
- trace_id:"abc123"
- span_id:"xyz789"
- Baggage Items:
  - special_id:"vsid1738"
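The span in the diagram can also be written down as plain data. The sketch below is illustrative only and does not use the OpenTracing API; `make_child` and the `parse_rows` / `child001` values are hypothetical. It highlights one property of the model: baggage items travel with the SpanContext to descendant spans, while tags and logs stay on the span that recorded them.

```python
import copy

# The span from the diagram, as a plain dictionary.
db_span = {
    "operation_name": "db_query",
    "tags": {
        "db.instance": "jdbc:mysql://127.0.0.1:3306/customers",
        "db.statement": "SELECT * FROM mytable WHERE foo='bar';",
    },
    "logs": [
        {"message": "Can't connect to mysql server on '127.0.0.1'(10061)"},
    ],
    "context": {
        "trace_id": "abc123",
        "span_id": "xyz789",
        "baggage": {"special_id": "vsid1738"},
    },
}


def make_child(parent: dict, operation_name: str, span_id: str) -> dict:
    """Hypothetical helper: derive a child span from a parent span."""
    return {
        "operation_name": operation_name,
        "tags": {},   # tags are local to each span
        "logs": [],   # so are logs
        "context": {
            "trace_id": parent["context"]["trace_id"],   # same trace
            "span_id": span_id,                          # new span
            # Baggage is copied forward to every descendant.
            "baggage": copy.deepcopy(parent["context"]["baggage"]),
        },
    }


child = make_child(db_span, "parse_rows", span_id="child001")
print(child["context"]["baggage"])
print(child["tags"])
```

Because baggage is carried along on every downstream call, it is best kept small.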
Inject pseudocode
span_context = ...
outbound_request = ...
# We'll use the (builtin) HTTP_HEADERS carrier format. We
# start by using an empty map as the carrier prior to the
# call to `tracer.inject`.
carrier = {}
tracer.inject(span_context, opentracing.Format.HTTP_HEADERS, carrier)
# `carrier` now contains (opaque) key:value pairs which we pass
# along over whatever wire protocol we already use.
for key, value in carrier.items():
    outbound_request.headers[key] = escape(value)
Extract pseudocode
inbound_request = ...
# We'll again use the (builtin) HTTP_HEADERS carrier format. Per the
# HTTP_HEADERS documentation, we can use a map that has extraneous data
# in it and let the OpenTracing implementation look for the subset
# of key:value pairs it needs.
#
# As such, we directly use the key:value `inbound_request.headers`
# map as the carrier.
carrier = inbound_request.headers
span_context = tracer.extract(opentracing.Format.HTTP_HEADERS, carrier)
# Continue the trace given span_context. E.g.,
span = tracer.start_span("...", child_of=span_context)
# (If `carrier` held trace data, `span` will now be ready to use.)
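To see what Inject and Extract actually do to the carrier, here is a self-contained round trip using only the standard library. It is a sketch, not any tracer's real wire format: the header names `x-trace-id`, `x-span-id`, and the `x-baggage-` prefix are made up for this example, and each real tracer defines its own.

```python
from urllib.parse import quote, unquote

# Hypothetical header names; each real tracer defines its own wire format.
TRACE_ID_HEADER = "x-trace-id"
SPAN_ID_HEADER = "x-span-id"
BAGGAGE_PREFIX = "x-baggage-"


def inject(span_context: dict, carrier: dict) -> None:
    """Write the span context into an HTTP-header-like carrier."""
    carrier[TRACE_ID_HEADER] = span_context["trace_id"]
    carrier[SPAN_ID_HEADER] = span_context["span_id"]
    for key, value in span_context.get("baggage", {}).items():
        # Percent-encode values so they are safe to send as header values.
        carrier[BAGGAGE_PREFIX + key] = quote(value, safe="")


def extract(carrier: dict):
    """Rebuild the span context, ignoring unrelated headers.

    Returns None when the carrier holds no trace data.
    """
    headers = {k.lower(): v for k, v in carrier.items()}
    if TRACE_ID_HEADER not in headers or SPAN_ID_HEADER not in headers:
        return None
    baggage = {
        k[len(BAGGAGE_PREFIX):]: unquote(v)
        for k, v in headers.items()
        if k.startswith(BAGGAGE_PREFIX)
    }
    return {
        "trace_id": headers[TRACE_ID_HEADER],
        "span_id": headers[SPAN_ID_HEADER],
        "baggage": baggage,
    }


outgoing = {"trace_id": "abc123", "span_id": "xyz789",
            "baggage": {"special_id": "vsid 1738"}}
request_headers = {"Accept": "text/plain"}   # headers the request already has
inject(outgoing, request_headers)

incoming = extract(request_headers)
print(incoming == outgoing)
```

The unrelated `Accept` header is left alone and ignored on extraction, which mirrors the point made in the Extract pseudocode: the carrier may hold extraneous data.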
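OK, that was a lot of concepts. As a programmer you probably ran out of patience a while ago: enough talk, show me the code. No rush, let's look at how tracing is actually used.
We'll use a programmer's favorite, a Python application that prints 'hello world', to show how OpenTracing works.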
import requests
import sys
import time
from lib.tracing import init_tracer
from opentracing.ext import tags
from opentracing.propagation import Format


def say_hello(hello_to):
    # Root span of the trace; spans started inside it become its children.
    with tracer.start_active_span('say-hello') as scope:
        scope.span.set_tag('hello-to', hello_to)
        hello_str = format_string(hello_to)
        print_hello(hello_str)


def format_string(hello_to):
    with tracer.start_active_span('format') as scope:
        hello_str = http_get(8081, 'format', 'helloTo', hello_to)
        scope.span.log_kv({'event': 'string-format', 'value': hello_str})
        return hello_str


def print_hello(hello_str):
    with tracer.start_active_span('println') as scope:
        http_get(8082, 'publish', 'helloStr', hello_str)
        scope.span.log_kv({'event': 'println'})


def http_get(port, path, param, value):
    url = 'http://localhost:%s/%s' % (port, path)

    # Tag the currently active span as the client side of an RPC.
    span = tracer.active_span
    span.set_tag(tags.HTTP_METHOD, 'GET')
    span.set_tag(tags.HTTP_URL, url)
    span.set_tag(tags.SPAN_KIND, tags.SPAN_KIND_RPC_CLIENT)

    # Inject the span context into the HTTP headers of the outgoing request.
    headers = {}
    tracer.inject(span.context, Format.HTTP_HEADERS, headers)

    r = requests.get(url, params={param: value}, headers=headers)
    assert r.status_code == 200
    return r.text


# main
assert len(sys.argv) == 2

tracer = init_tracer('hello-world')

hello_to = sys.argv[1]
say_hello(hello_to)

# yield to IOLoop to flush the spans
time.sleep(2)
tracer.close()
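The client imports `init_tracer` from `lib.tracing`, a module the article does not show. The following is one plausible version, assuming the Jaeger Python client (the `jaeger-client` package) is the tracer implementation. The sampler and reporter settings are assumptions suited to a local demo, not recommendations for production.

```python
import logging


def init_tracer(service):
    """Build a Jaeger tracer for `service` (assumes jaeger-client is installed)."""
    # Imported here so the module still loads where jaeger-client is absent.
    from jaeger_client import Config

    logging.getLogger('').handlers = []
    logging.basicConfig(format='%(message)s', level=logging.DEBUG)

    config = Config(
        config={
            # Sample every trace; fine for a demo, costly in production.
            'sampler': {'type': 'const', 'param': 1},
            # Log each reported span so the demo output shows what is sent.
            'logging': True,
            # Send spans one at a time instead of batching them.
            'reporter_batch_size': 1,
        },
        service_name=service,
    )
    # Also registers the tracer as the global `opentracing.tracer`.
    return config.initialize_tracer()
```

With a module like this in place, each process calls `init_tracer` once with its own service name, as the client does with 'hello-world'.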
Service A code
from flask import Flask
from flask import request
from lib.tracing import init_tracer
from opentracing.ext import tags
from opentracing.propagation import Format

app = Flask(__name__)
tracer = init_tracer('formatter')


@app.route("/format")
def format():
    # Continue the caller's trace using the context carried in the headers.
    span_ctx = tracer.extract(Format.HTTP_HEADERS, request.headers)
    span_tags = {tags.SPAN_KIND: tags.SPAN_KIND_RPC_SERVER}
    with tracer.start_active_span('format', child_of=span_ctx, tags=span_tags):
        hello_to = request.args.get('helloTo')
        return 'Hello, %s!' % hello_to


if __name__ == "__main__":
    app.run(port=8081)
Service B code
from flask import Flask
from flask import request
from lib.tracing import init_tracer
from opentracing.ext import tags
from opentracing.propagation import Format

app = Flask(__name__)
tracer = init_tracer('publisher')


@app.route("/publish")
def publish():
    # Continue the caller's trace using the context carried in the headers.
    span_ctx = tracer.extract(Format.HTTP_HEADERS, request.headers)
    span_tags = {tags.SPAN_KIND: tags.SPAN_KIND_RPC_SERVER}
    with tracer.start_active_span('publish', child_of=span_ctx, tags=span_tags):
        hello_str = request.args.get('helloStr')
        print(hello_str)
        return 'published'


if __name__ == "__main__":
    app.run(port=8082)
Zipkin
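Jaeger Client: collects trace data inside the client application.
Jaeger Agent: communicates with the client and reports the collected trace data to the Jaeger Collector.
Jaeger Collector: writes the data it receives to a database or other storage.
Jaeger Query: serves queries over the trace data.
Jaeger UI: handles user interaction.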
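The division of labor between these components can be sketched as a toy in-memory pipeline. This is not Jaeger's code or API: the classes `Agent`, `Collector`, and `Query`, their methods, and the span fields are all invented for illustration, and a dictionary stands in for the storage backend.

```python
from collections import defaultdict


class Collector:
    """Receives span batches and writes them to storage (a dict here)."""

    def __init__(self):
        self.storage = defaultdict(list)   # trace_id -> list of spans

    def submit(self, batch):
        for span in batch:
            self.storage[span["trace_id"]].append(span)


class Agent:
    """Buffers spans from local clients and forwards them in batches."""

    def __init__(self, collector, batch_size=2):
        self.collector = collector
        self.batch_size = batch_size
        self.buffer = []

    def report(self, span):
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.collector.submit(self.buffer)
            self.buffer = []


class Query:
    """Reads traces back out of storage for a UI to display."""

    def __init__(self, collector):
        self.storage = collector.storage

    def get_trace(self, trace_id):
        spans = self.storage.get(trace_id, [])
        return sorted(spans, key=lambda s: s["start"])


collector = Collector()
agent = Agent(collector)

# The "client" role: application code reporting finished spans.
agent.report({"trace_id": "t1", "operation": "say-hello", "start": 0})
agent.report({"trace_id": "t1", "operation": "format", "start": 1})
agent.report({"trace_id": "t1", "operation": "println", "start": 2})
agent.flush()   # push the span still sitting in the buffer

for span in Query(collector).get_trace("t1"):
    print(span["operation"])
```

Keeping the agent between the client and the collector means the application only hands spans to a local process, while batching and delivery to the backend happen outside the request path.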
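4. Distributed tracing systems: product comparison
Besides the products that support the OpenTracing standard, there are other distributed tracing products as well. Here are some analyses from other bloggers, offered for reference: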
5. Summary
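About the author: Tao Gang is a senior software engineer and architect at Splunk. A graduate of Beijing University of Posts and Telecommunications, he now works in Vancouver on the development of Splunk's machine learning cloud platform. He previously worked at SAP, EMC, Lucent, and other companies, has extensive experience developing enterprise application software, and is familiar with a wide range of development technologies, platforms, and processes. His work has touched on business intelligence, machine learning, data visualization, data collection, and network management.
About EAWorld: original technical articles on microservices, DevOps, data governance, and mobile architecture.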