Envoy Statistics
Access Logging
Distributed Tracing
Logs, metrics, and traces are the three pillars of application observability. The first two largely belong to the traditional "host-centric" model, whereas tracing is "process-centric", or "request-chain-centric".
Distributed Tracing
Envoy statistics fall into three main types, all of which are represented as unsigned integers:
Almost all statistics can be retrieved through the /stats endpoint of the admin interface.
The stats configuration parameters live in the top-level section of the Bootstrap configuration file:
{
...
"stats_sinks": [], # list of stats sinks;
"stats_config": "{...}", # internal stats processing;
"stats_flush_interval": "{...}", # how often stats are flushed to the sinks; for performance reasons, Envoy flushes counters and gauges only periodically, every 5000ms by default;
"stats_flush_on_admin": "..." # flush stats only when a query arrives on the admin interface;
...
}
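For example, nearly all of these stats can be inspected live through the admin interface (assuming it listens on 9901, as in the configurations below):

```
# dump all counters and gauges
curl -s http://localhost:9901/stats
# filter to cluster-related stats only
curl -s 'http://localhost:9901/stats?filter=cluster'
# Prometheus text format, suitable for direct scraping
curl -s http://localhost:9901/stats/prometheus
```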
stats_sinks is an optional configuration: by default Envoy configures no exposure mechanism for its statistics, so storing long-term metric data requires configuring it manually.
stats_sinks:
name: ... # name of the sink to initialize; it must match one of Envoy's built-in sinks: envoy.stat_sinks.dog_statsd, envoy.stat_sinks.graphite_statsd, envoy.stat_sinks.hystrix,
# envoy.stat_sinks.metrics_service, envoy.stat_sinks.statsd and envoy.stat_sinks.wasm; they serve a purpose similar to Prometheus exporters;
typed_config: {...} # sink configuration; each sink type is configured differently, and the parameters below are specific to statsd;
address: {...} # endpoint of the StatsdSink service; alternatively, tcp_cluster_name below can name a cluster of sink servers configured on Envoy;
tcp_cluster_name: ... # name of the StatsdSink cluster, mutually exclusive with address;
prefix: ... # custom prefix for the StatsdSink, optional;
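A StatsdSink writes plain statsd text lines of the form "name:value|type" over UDP or TCP, which statsd_exporter then converts into Prometheus metrics. A minimal sketch of parsing that wire format (the metric names here are illustrative, not captured from a live Envoy):

```shell
# Split statsd lines on ':' and '|': $1 = metric name, $2 = value,
# $3 = type code (c = counter, g = gauge, ms = timer/histogram).
printf '%s\n' \
  'front-envoy.http.ingress_80.downstream_rq_total:1|c' \
  'front-envoy.http.ingress_80.downstream_cx_active:3|g' |
awk -F'[:|]' '{printf "%s type=%s value=%s\n", $1, $3, $2}'
```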
Envoy statistics are identified by canonical string names; the dynamic portions of those strings can be stripped out into tags, which users can configure via tag specifiers:
- config.metrics.v3.StatsConfig
- config.metrics.v3.TagSpecifier
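A minimal stats_config sketch using a custom tag specifier (the tag name and regex below are illustrative, not from the demo configs):

```
stats_config:
  use_all_default_tags: true    # keep Envoy's built-in tag extraction rules
  stats_tags:
  - tag_name: my_listener_port  # illustrative custom tag
    regex: '^listener\.address_(\d+)\.'
```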
stats_sink cluster definition:
...
stats_sinks:
- name: envoy.statsd
typed_config:
"@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
tcp_cluster_name: statsd_exporter
prefix: front-envoy
...
static_resources:
clusters:
- name: statsd_exporter
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: statsd_exporter
endpoints:
- lb_endpoints:
- endpoint:
address: { socket_address: { address: statsd_exporter, port_value: 9125 } }
Prometheus configuration:
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'statsd'
scrape_interval: 5s
static_configs:
- targets: ['statsd_exporter:9102'] # statsd_exporter must be resolvable via DNS or a hosts entry
labels:
group: 'services'
The docker-compose file defines ten services:
version: '3.3'
services:
front-envoy:
image: envoyproxy/envoy-alpine:v1.21-latest
environment:
- ENVOY_UID=0
- ENVOY_GID=0
volumes:
- "./front_envoy/envoy-config.yaml:/etc/envoy/envoy.yaml"
networks:
envoymesh:
ipv4_address: 172.31.70.10
aliases:
- front-envoy
- front
ports:
- 8088:80
- 9901:9901
service_a_envoy:
image: envoyproxy/envoy-alpine:v1.20.0
volumes:
- "./service_a/envoy-config.yaml:/etc/envoy/envoy.yaml"
networks:
envoymesh:
aliases:
- service_a_envoy
- service-a-envoy
ports:
- 8786:8786
- 8788:8788
service_a:
build: service_a/
networks:
envoymesh:
aliases:
- service_a
- service-a
ports:
- 8081:8081
service_b_envoy:
image: envoyproxy/envoy-alpine:v1.20.0
volumes:
- "./service_b/envoy-config.yaml:/etc/envoy/envoy.yaml"
networks:
envoymesh:
aliases:
- service_b_envoy
- service-b-envoy
ports:
- 8789:8789
service_b:
build: service_b/
networks:
envoymesh:
aliases:
- service_b
- service-b
ports:
- 8082:8082
service_c_envoy:
image: envoyproxy/envoy-alpine:v1.20.0
volumes:
- "./service_c/envoy-config.yaml:/etc/envoy/envoy.yaml"
networks:
envoymesh:
aliases:
- service_c_envoy
- service-c-envoy
ports:
- 8790:8790
service_c:
build: service_c/
networks:
envoymesh:
aliases:
- service_c
- service-c
ports:
- 8083:8083
statsd_exporter:
image: prom/statsd-exporter:v0.22.3
networks:
envoymesh:
ipv4_address: 172.31.70.66
aliases:
- statsd_exporter
ports:
- 9125:9125
- 9102:9102
prometheus:
image: prom/prometheus:v2.30.3
volumes:
- "./prometheus/config.yaml:/etc/prometheus.yaml"
networks:
envoymesh:
ipv4_address: 172.31.70.67
aliases:
- prometheus
ports:
- 9090:9090
command: "--config.file=/etc/prometheus.yaml"
grafana:
image: grafana/grafana:8.2.2
volumes:
- "./grafana/grafana.ini:/etc/grafana/grafana.ini"
- "./grafana/datasource.yaml:/etc/grafana/provisioning/datasources/datasource.yaml"
- "./grafana/dashboard.yaml:/etc/grafana/provisioning/dashboards/dashboard.yaml"
- "./grafana/dashboard.json:/etc/grafana/provisioning/dashboards/dashboard.json"
networks:
envoymesh:
ipv4_address: 172.31.70.68
aliases:
- grafana
ports:
- 3000:3000
networks:
envoymesh:
driver: bridge
ipam:
config:
- subnet: 172.31.70.0/24
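Assuming the compose file and the per-service Envoy configs live in the current directory as mapped above, the whole stack can be brought up and checked with:

```
docker-compose up --build -d
docker-compose ps
# Prometheus targets: http://<host>:9090/targets
# Grafana UI:         http://<host>:3000
```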
Via stats_sinks, the front envoy pushes its stats to statsd_exporter with the prefix front-envoy.
All traffic arriving on port 80 is handed to service_a.
node:
id: front-envoy
cluster: mycluster
admin:
profile_path: /tmp/envoy.prof
access_log_path: /tmp/admin_access.log
address:
socket_address:
address: 0.0.0.0
port_value: 9901
layered_runtime:
layers:
- name: admin
admin_layer: {}
stats_sinks:
- name: envoy.statsd
typed_config:
"@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
tcp_cluster_name: statsd_exporter
prefix: front-envoy
static_resources:
listeners:
- name: http_listener
address:
socket_address:
address: 0.0.0.0
port_value: 80
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
use_remote_address: true
add_user_agent: true
stat_prefix: ingress_80
codec_type: AUTO
generate_request_id: true
route_config:
name: local_route
virtual_hosts:
- name: http-route
domains:
- "*"
routes:
- match:
prefix: "/"
route:
cluster: service_a
http_filters:
- name: envoy.filters.http.router
clusters:
- name: statsd_exporter
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: statsd_exporter
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: statsd_exporter
port_value: 9125
- name: service_a
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: service_a
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: service_a_envoy
port_value: 8786
Via stats_sinks, stats are pushed to statsd_exporter with the prefix service-a.
Ingress: incoming traffic is handed to cluster service_a.
Egress: outgoing traffic is directed to service_b and service_c.
node:
id: service-a
cluster: mycluster
admin:
profile_path: /tmp/envoy.prof
access_log_path: /tmp/admin_access.log
address:
socket_address:
address: 0.0.0.0
port_value: 9901
layered_runtime:
layers:
- name: admin
admin_layer: {}
stats_sinks:
- name: envoy.statsd
typed_config:
"@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
tcp_cluster_name: statsd_exporter
prefix: service-a
static_resources:
listeners:
- name: service-a-svc-http-listener
address:
socket_address:
address: 0.0.0.0
port_value: 8786
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: service-a-svc-http-route
virtual_hosts:
- name: service-a-svc-http-route
domains:
- "*"
routes:
- match:
prefix: "/"
route:
cluster: service_a
http_filters:
- name: envoy.filters.http.router
- name: service-b-svc-http-listener
address:
socket_address:
address: 0.0.0.0
port_value: 8788
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: service-b-svc-http-route
virtual_hosts:
- name: service-b-svc-http-route
domains:
- "*"
routes:
- match:
prefix: "/"
route:
cluster: service_b
http_filters:
- name: envoy.filters.http.router
- name: service-c-svc-http-listener
address:
socket_address:
address: 0.0.0.0
port_value: 8791
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: service-c-svc-http-route
virtual_hosts:
- name: service-c-svc-http-route
domains:
- "*"
routes:
- match:
prefix: "/"
route:
cluster: service_c
http_filters:
- name: envoy.filters.http.router
clusters:
- name: statsd_exporter
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: statsd_exporter
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: statsd_exporter
port_value: 9125
- name: service_a
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: service_a
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: service_a
port_value: 8081
- name: service_b
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: service_b
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: service_b_envoy
port_value: 8789
- name: service_c
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: service_c
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: service_c_envoy
port_value: 8790
Via stats_sinks, stats are pushed to statsd_exporter with the prefix service-b.
Ingress: incoming traffic is handed to cluster service_b.
A 503 error is injected with 15% probability.
node:
id: service-b
cluster: mycluster
admin:
profile_path: /tmp/envoy.prof
access_log_path: /tmp/admin_access.log
address:
socket_address:
address: 0.0.0.0
port_value: 9901
layered_runtime:
layers:
- name: admin
admin_layer: {}
stats_sinks:
- name: envoy.statsd
typed_config:
"@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
tcp_cluster_name: statsd_exporter
prefix: service-b
static_resources:
listeners:
- name: service-b-svc-http-listener
address:
socket_address:
address: 0.0.0.0
port_value: 8789
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: service-b-svc-http-route
virtual_hosts:
- name: service-b-svc-http-route
domains:
- "*"
routes:
- match:
prefix: "/"
route:
cluster: service_b
http_filters:
- name: envoy.filters.http.fault
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
max_active_faults: 100
abort:
http_status: 503
percentage:
numerator: 15
denominator: HUNDRED
- name: envoy.filters.http.router
typed_config: {}
clusters:
- name: statsd_exporter
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: statsd_exporter
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: statsd_exporter
port_value: 9125
- name: service_b
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: service_b
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: service_b
port_value: 8082
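With the stack running, the injected fault can be spot-checked from the host. Since the 503 is injected on service_b's sidecar, it may surface through the front envoy either as a non-200 status code or inside the response body, depending on how service_a propagates the failure:

```
# out of 100 calls, roughly 15% of the service_b legs should hit the injected 503
for i in $(seq 100); do
  curl -s -o /dev/null -w '%{http_code}\n' 172.31.70.10
done | sort | uniq -c
```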
Via stats_sinks, stats are pushed to statsd_exporter with the prefix service-c.
Ingress: incoming traffic is handed to cluster service_c.
A 1s delay is injected with 10% probability.
node:
id: service-c
cluster: mycluster
admin:
profile_path: /tmp/envoy.prof
access_log_path: /tmp/admin_access.log
address:
socket_address:
address: 0.0.0.0
port_value: 9901
layered_runtime:
layers:
- name: admin
admin_layer: {}
stats_sinks:
- name: envoy.statsd
typed_config:
"@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
tcp_cluster_name: statsd_exporter
prefix: service-c
static_resources:
listeners:
- name: service-c-svc-http-listener
address:
socket_address:
address: 0.0.0.0
port_value: 8790
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: service-c-svc-http-route
virtual_hosts:
- name: service-c-svc-http-route
domains:
- "*"
routes:
- match:
prefix: "/"
route:
cluster: service_c
http_filters:
- name: envoy.filters.http.fault
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
max_active_faults: 100
delay:
fixed_delay: 1s
percentage:
numerator: 10
denominator: HUNDRED
- name: envoy.filters.http.router
typed_config: {}
clusters:
- name: statsd_exporter
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: statsd_exporter
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: statsd_exporter
port_value: 9125
- name: service_c
connect_timeout: 0.25s
type: strict_dns
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: service_c
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: service_c
port_value: 8083
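Similarly, the delay injected on service_c's sidecar can be observed by timing requests through the front envoy; roughly 10% of responses should take about 1s longer than the rest:

```
for i in $(seq 20); do curl -s -o /dev/null -w '%{time_total}\n' 172.31.70.10; done
```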
Continuously access the front envoy at 172.31.70.10:
The front envoy forwards requests to service_a and pushes its metrics to statsd_exporter.
service_a takes traffic in through ingress and sends traffic bound for service_b and service_c out through egress, and pushes its metrics to statsd_exporter.
service_b answers traffic taken in through ingress, triggering a 503 error with 15% probability, and pushes its metrics to statsd_exporter.
service_c answers traffic taken in through ingress, delaying its response by 1s with 10% probability, and pushes its metrics to statsd_exporter.
# while true; do curl 172.31.70.10; sleep 0.$RANDOM; done
Calling Service B: Hello from service B.
Hello from service A.
Hello from service C.
Calling Service B: Hello from service B.
Hello from service A.
Hello from service C.
...
Prometheus maps the following content to /etc/prometheus.yaml inside its container and scrapes statsd_exporter:9102:
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'statsd'
scrape_interval: 5s
static_configs:
- targets: ['statsd_exporter:9102']
labels:
group: 'services'
Through port 9090 on the host, the metrics collected from the containers can be viewed in the Prometheus console.
When traffic reaches the front envoy, all of it is handed to service_a, so there is no front-envoy-to-service_b or front-envoy-to-service_c data.
15% of the traffic from service_a to service_b gets a 503 error injected, so Grafana shows both 2xx and 5xx responses for that leg.
service_c has no 503 injection, so it shows no 5xx errors; however, while the while-true loop runs, some responses noticeably stall for about 1s.
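As an example, with statsd_exporter's default mapping (statsd dots and dashes become Prometheus underscores), queries along these lines can reproduce such panels; the exact metric names depend on the exporter's mapping configuration and should be confirmed against statsd_exporter:9102/metrics:

```
# per-second request rate from service_a toward service_b (name illustrative)
rate(service_a_cluster_service_b_upstream_rq_total[1m])
# 5xx responses on service_b's ingress (name illustrative)
rate(service_b_http_ingress_http_downstream_rq_5xx[1m])
```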