SkyWalking: A Distributed Call-Chain Tracing Platform

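As microservice architectures become popular, the problems that come with them become more pronounced. A single request may involve many services, and each service may depend on others, so the request path forms a mesh-like call chain. If any node in that chain fails, the stability of the whole chain suffers. It is a reminder that there is no "silver bullet": every architecture has its strengths and weaknesses.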

SkyWalking overview (trace collection and analysis)

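As business logic grows more complex, enterprise applications have moved into the era of distributed services. As the number of modules keeps growing, a single request may be handled jointly by a dozen or even dozens of services, so locating production failures and performance bottlenecks quickly and accurately has become a thorny problem we cannot avoid. Traditional approaches such as log monitoring cannot adequately trace calls or support troubleshooting. Guided by Google's paper "Dapper, a Large-Scale Distributed Systems Tracing Infrastructure", many excellent APM tools have emerged.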
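Distributed tracing systems have developed quickly and come in many varieties, which is very convenient. However, collecting data sometimes requires modifying user code, and the APIs of different systems are incompatible, so switching from one tracing system to another often means substantial changes. The OpenTracing specification was created to solve this API incompatibility. OpenTracing is a lightweight standardization layer that sits between applications/libraries and tracing or log-analysis systems. For details, see the Chinese translation of the OpenTracing documentation.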

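SkyWalking is an APM (application performance monitor) designed especially for microservices, cloud-native and container-based architectures; it is also described as a distributed tracing system. It instruments applications automatically, without any change to the target application's source code, and provides a collector with an efficient streaming module.
In other words, it is an APM system for distributed systems, particularly microservice, cloud-native and containerized (Docker, Kubernetes, Mesos) architectures, and at its core it is a distributed tracing system.
SkyWalking is an open-source project started by Wu Sheng (吴晟), a Chinese developer, and implemented on the OpenTracing model; the code is hosted on Gitee and GitHub.
On December 8, 2017, the Apache Incubator Project Management Committee (ASF IPMC) announced that SkyWalking had been voted into the Apache Incubator unanimously.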

SkyWalking features

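Good performance: for an application instance handling 5,000 TPS with full (100%) sampling, the agent adds only about 10% CPU overhead. See "skywalking agent performance test" for the detailed benchmark.
Agents (probes) for multiple languages.
Both automatic and manual instrumentation. Automatic: see the list of middleware, frameworks and libraries supported by the Java agent. Manual: the OpenTracing API, the @Trace annotation, and writing the traceId into logs.
Agent based: automatic instrumentation needs no code changes and is non-intrusive, so the distributed system is traced and monitored automatically while it runs.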

I. Environment overview

Software                Version      Machines
OS                      CentOS 7.4
JDK                     1.8
Elasticsearch           6.5.2        3
SkyWalking UI           6.0          1
SkyWalking collector    6.0          3

II. Download the software
apache-skywalking:
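Git repository: https://github.com/OpenSkywalking/skywalking-netcore (this is the .NET Core agent; the main project is at https://github.com/apache/skywalking)
Package download page: http://www.apache.org/dyn/closer.cgi/incubator/skywalking/6.0.0-GA/apache-skywalking-apm-incubating-6.0.0-GA.tar.gz
The package also contains the agent: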

$ ls -al apache-skywalking-apm-incubating/agent
drwxrwxr-x 2 1001 1002      271 Mar 20 17:22 activations
drwxrwxr-x 2 1001 1002       26 Mar 20 17:22 config
drwxrwxr-x 2 1001 1002        6 Jan 21 12:01 logs
drwxrwxr-x 2 1001 1002      139 Mar 20 17:22 optional-plugins
drwxrwxr-x 2 1001 1002     4096 Mar 20 17:22 plugins
-rw-rw-r-- 1 1001 1002 17805401 Jan 21 12:01 skywalking-agent.jar

III. Installation and deployment
1. Install an Elasticsearch 6.5.2 cluster; see the separate post "Elasticsearch 6.5.2 cluster installation + head plugin".
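Before pointing the collector at Elasticsearch, it helps to confirm the cluster is healthy. The sketch below assumes the three nodes answer HTTP on port 9200 (as in `clusterNodes` in application.yml below) and uses Elasticsearch's standard `_cluster/health` endpoint. The helper name `es_status` is hypothetical; it just extracts the `status` field (green, yellow or red) with sed, so jq is not required, and it works for both compact and `?pretty` output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an Elasticsearch _cluster/health JSON document on stdin
# and print only the value of its "status" field (green, yellow or red).
es_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage (replace the placeholder address with one of your ES nodes):
#   curl -s 'http://10.100.xx.xx:9200/_cluster/health' | es_status
```

With `indexReplicasNumber: 0` as configured below, a node loss can leave indices unavailable, so "green" on a three-node cluster is the state to aim for before starting the collectors.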
2. Install apache-skywalking (configure the collector); all 3 machines use the same configuration.

$ wget https://archive.apache.org/dist/incubator/skywalking/6.0.0-GA/apache-skywalking-apm-incubating-6.0.0-GA.tar.gz -P /usr/local/src
$ tar xf /usr/local/src/apache-skywalking-apm-incubating-6.0.0-GA.tar.gz -C /usr/local
$ cd /usr/local/apache-skywalking-apm-incubating/config
$ cat application.yml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

cluster:
  #standalone:
  # Please check your ZooKeeper is 3.5+, However, it is also compatible with ZooKeeper 3.4.x. Replace the ZooKeeper 3.5+
  # library the oap-libs folder with your ZooKeeper 3.4.x library.
  zookeeper:
    nameSpace: ${SW_NAMESPACE:"skywalking"}
#    hostPort: ${SW_CLUSTER_ZK_HOST_PORT:172.16.163.60:2181,172.16.163.61:2181,172.16.163.62:2181}
    hostPort: ${SW_CLUSTER_ZK_HOST_PORT:10.100.11.37:2181}
#    #Retry Policy
    baseSleepTimeMs: ${SW_CLUSTER_ZK_SLEEP_TIME:1000} # initial amount of time to wait between retries
    maxRetries: ${SW_CLUSTER_ZK_MAX_RETRIES:3} # max number of times to retry
#  kubernetes:
#    watchTimeoutSeconds: ${SW_CLUSTER_K8S_WATCH_TIMEOUT:60}
#    namespace: ${SW_CLUSTER_K8S_NAMESPACE:default}
#    labelSelector: ${SW_CLUSTER_K8S_LABEL:app=collector,release=skywalking}
#    uidEnvName: ${SW_CLUSTER_K8S_UID:SKYWALKING_COLLECTOR_UID}
#  consul:
#    serviceName: ${SW_SERVICE_NAME:"SkyWalking_OAP_Cluster"}
#     Consul cluster nodes, example: 10.0.0.1:8500,10.0.0.2:8500,10.0.0.3:8500
#    hostPort: ${SW_CLUSTER_CONSUL_HOST_PORT:localhost:8500}
core:
  default:
    restHost: ${SW_CORE_REST_HOST:0.0.0.0}
    restPort: ${SW_CORE_REST_PORT:12800}
    restContextPath: ${SW_CORE_REST_CONTEXT_PATH:/}
    gRPCHost: ${SW_CORE_GRPC_HOST:0.0.0.0}
    gRPCPort: ${SW_CORE_GRPC_PORT:11800}
    downsampling:
    - Hour
    - Day
    - Month
    # Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted.
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
    minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
    hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
    dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month
storage:
  #h2:
    #driver: ${SW_STORAGE_H2_DRIVER:org.h2.jdbcx.JdbcDataSource}
    #url: ${SW_STORAGE_H2_URL:jdbc:h2:mem:skywalking-oap-db}
    #user: ${SW_STORAGE_H2_USER:sa}
  elasticsearch:
    nameSpace: ${SW_NAMESPACE:"skywalking"}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:10.100.xx.xx:9200,10.100.xx.xx:9200,10.100.xx.xx:9200}
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:2000} # Execute the bulk every 2000 requests
    bulkSize: ${SW_STORAGE_ES_BULK_SIZE:20} # flush the bulk every 20mb
    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
#  mysql:
receiver-register:
  default:
receiver-trace:
  default:
    bufferPath: ${SW_RECEIVER_BUFFER_PATH:../trace-buffer/}  # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_RECEIVER_BUFFER_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_RECEIVER_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_RECEIVER_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
    sampleRate: ${SW_TRACE_SAMPLE_RATE:10000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
receiver-jvm:
  default:
#service-mesh:
#  default:
#    bufferPath: ${SW_SERVICE_MESH_BUFFER_PATH:../mesh-buffer/}  # Path to trace buffer files, suggest to use absolute path
#    bufferOffsetMaxFileSize: ${SW_SERVICE_MESH_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
#    bufferDataMaxFileSize: ${SW_SERVICE_MESH_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
#    bufferFileCleanWhenRestart: ${SW_SERVICE_MESH_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
#istio-telemetry:
#  default:
#receiver_zipkin:
#  default:
#    host: ${SW_RECEIVER_ZIPKIN_HOST:0.0.0.0}
#    port: ${SW_RECEIVER_ZIPKIN_PORT:9411}
#    contextPath: ${SW_RECEIVER_ZIPKIN_CONTEXT_PATH:/}
query:
  graphql:
    path: ${SW_QUERY_GRAPHQL_PATH:/graphql}
alarm:
  default:
telemetry:
  none:
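Every value in the application.yml above has the form `${ENV_NAME:default}`. As I understand the collector's behavior, a set environment variable takes precedence, otherwise the text after the first colon is used; that is why `clusterNodes` can carry a whole `host:port` list as its default. The bash sketch below only mimics that rule so you can check which value a node would pick up. The function name `resolve_placeholder` is hypothetical, and treating an empty variable as unset is my assumption.

```bash
#!/usr/bin/env bash
# Sketch only: mimic how a "${NAME:default}" placeholder from application.yml is resolved.
# Assumption: a set, non-empty environment variable NAME wins; otherwise the default is used.
resolve_placeholder() {
  local expr="$1"
  local inner="${expr:2:${#expr}-3}"   # drop the leading "${" and the trailing "}"
  local name="${inner%%:*}"            # everything before the first colon
  local default=""
  if [[ "$inner" == *:* ]]; then
    default="${inner#*:}"              # everything after the first colon; may contain more colons
  fi
  local value="${!name:-}"             # indirect expansion: value of the variable named $name
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
  else
    printf '%s\n' "$default"
  fi
}

# Usage: override a setting for this shell instead of editing the file, then check it.
#   export SW_STORAGE_ES_CLUSTER_NODES=10.100.xx.xx:9200,10.100.xx.xx:9200,10.100.xx.xx:9200
#   resolve_placeholder '${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}'
```

The same idea suggests that, on each of the 3 collector nodes, exporting the `SW_*` variables before running `bin/startup.sh` should be enough to change these settings without touching application.yml.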

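# Start
# Starting SkyWalking involves two parts: the SkyWalking collector and the SkyWalking UI
$ ../bin/startup.sh   # starts both the UI and the collector

Open localhost:8080 in a browser (8080 is the UI's default port).
The username and password are both admin.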
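If the UI does not load, a quick first check is whether each node is listening. The ports here come from application.yml above (gRPC 11800 for agents, REST 12800 for the UI's queries) plus the UI's 8080. This sketch relies on bash's built-in `/dev/tcp` redirection and coreutils `timeout`; the function names `port_open` and `check_node` are hypothetical.

```bash
#!/usr/bin/env bash
# Return 0 if a TCP connection to HOST:PORT succeeds within 2 seconds, non-zero otherwise.
# Relies on bash's /dev/tcp pseudo-device, so it must run under bash, not plain sh.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print "HOST:PORT open" or "HOST:PORT closed" for each given port.
# Without explicit ports, checks 11800 (collector gRPC), 12800 (collector REST) and 8080 (UI).
check_node() {
  local host="$1"
  shift
  local ports=("$@")
  [ "${#ports[@]}" -gt 0 ] || ports=(11800 12800 8080)
  local port
  for port in "${ports[@]}"; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} closed"
    fi
  done
}

# Usage (placeholder addresses):
#   for h in 10.100.xx.xx 10.100.xx.xx 10.100.xx.xx; do check_node "$h" 11800 12800; done
```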

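3. Deploy the agent
Take the agent directory out of the distribution so it can be shipped with the application:

cd /usr/local/apache-skywalking-apm-incubating
tar zcf  agent.tar.gz agent

Copy agent.tar.gz to the application host and extract it there.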
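One detail worth keeping in mind: skywalking-agent.jar loads its `plugins/` and `config/` directories relative to its own location, so the whole agent directory should travel together rather than the jar alone. The sketch below packs the directory and checks the archive before it is copied to an application host. The function names `package_agent` and `verify_agent_package` are hypothetical.

```bash
#!/usr/bin/env bash
# Pack SW_HOME/agent into OUT, keeping the "agent/" prefix so the layout survives extraction.
package_agent() {
  local sw_home="$1" out="$2"
  tar -czf "$out" -C "$sw_home" agent
}

# Succeed only if the archive contains the jar, its config file and a plugins directory.
verify_agent_package() {
  local archive="$1" listing
  listing=$(tar -tzf "$archive") || return 1
  grep -qx 'agent/skywalking-agent.jar' <<<"$listing" &&
    grep -qx 'agent/config/agent.config' <<<"$listing" &&
    grep -q '^agent/plugins/' <<<"$listing"
}

# Usage ("app-host" is a placeholder):
#   package_agent /usr/local/apache-skywalking-apm-incubating agent.tar.gz
#   verify_agent_package agent.tar.gz && scp agent.tar.gz app-host:/usr/local/
```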
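Then deploy the application. There are two ways to attach the agent.
Option 1: the application runs as a JAR; add the agent on the command line:
java -javaagent:/path/to/skywalking-agent.jar -jar your_name.jar
Option 2: the application runs on Tomcat; configure the agent in Tomcat: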

## Linux
CATALINA_OPTS="-javaagent:/usr/local/tomcat-pof/bin/skywalking/Agent/skywalking-agent.jar -DSW_AGENT_NAMESPACE=default-namespace -DSW_AGENT_COLLECTOR_BACKEND_SERVICES=10.100.xx.xx:11800,10.100.xx.xx:11800,10.100.xx.xx:11800 -DSW_AGENT_NAME=xxx_www_pof"; export CATALINA_OPTS
Note: 11800 is the collector's gRPC port (gRPCPort in application.yml).
## Windows
set "CATALINA_OPTS=... -javaagent:E:\apache-tomcat-8.5.20\skywalking-agent\skywalking-agent.jar"
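The value of `-DSW_AGENT_COLLECTOR_BACKEND_SERVICES` has to reach the JVM as a single token: comma-separated, with no spaces or line breaks, because catalina.sh in effect splits CATALINA_OPTS on whitespace. A small helper that joins the collector addresses makes that hard to get wrong; `build_agent_opts` is a hypothetical name, and the `-D` keys are the ones used in the Linux example above.

```bash
#!/usr/bin/env bash
# Build the agent part of CATALINA_OPTS / JAVA_OPTS.
# Arguments: JAR SERVICE_NAME BACKEND...; the backends are joined with "," and no spaces.
build_agent_opts() {
  local jar="$1" service="$2"
  shift 2
  local backends
  backends=$(IFS=,; printf '%s' "$*")   # "$*" joins the remaining arguments with the first char of IFS
  printf '%s' "-javaagent:${jar} -DSW_AGENT_NAME=${service} -DSW_AGENT_COLLECTOR_BACKEND_SERVICES=${backends}"
}

# Usage:
#   CATALINA_OPTS="$(build_agent_opts /usr/local/tomcat-pof/bin/skywalking/Agent/skywalking-agent.jar \
#       xxx_www_pof 10.100.xx.xx:11800 10.100.xx.xx:11800 10.100.xx.xx:11800)"
#   export CATALINA_OPTS
```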

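SkyWalking advanced features
Plugins are all kept in the plugins directory. A new plugin only needs to be placed in that directory before startup to take effect automatically; removing it disables it.
Besides the /config/agent.config file, settings can also be supplied through environment variables and VM parameters (-D).
For VM parameters, the key is skywalking. + the key in agent.config, e.g. -Dskywalking.agent.service_name=xxx_www_pof.
Priority: VM parameters (-Dskywalking.*) > environment variables (read through the ${SW_*} placeholders in agent.config) > values written directly in /config/agent.config.
Logs are written to files by default, in the agent's logs directory.
In other words, the parameters passed on the command line above can also be written into config/agent.config.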
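Optional plugins follow the same rule as regular plugins: they ship in the agent's `optional-plugins/` directory (visible in the listing earlier) and only take effect once copied into `plugins/` before the application starts. The sketch below does that copy by file-name pattern; the function name and the plugin name used in the test are made up for illustration.

```bash
#!/usr/bin/env bash
# Copy every jar in AGENT_DIR/optional-plugins matching PATTERN into AGENT_DIR/plugins.
# The plugin is picked up on the next application start; delete it from plugins/ to disable it.
enable_optional_plugin() {
  local agent_dir="$1" pattern="$2"
  local matches=("$agent_dir"/optional-plugins/$pattern)   # pattern left unquoted so the glob expands
  if [ ! -e "${matches[0]}" ]; then
    echo "no optional plugin matches: $pattern" >&2
    return 1
  fi
  cp -- "${matches[@]}" "$agent_dir/plugins/"
}

# Usage (hypothetical plugin name):
#   enable_optional_plugin /usr/local/tomcat-pof/bin/skywalking/Agent 'example-plugin-*.jar'
```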
The agent.config used in this setup looks like this:

cat Agent/config/agent.config 
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# The agent namespace
agent.namespace=${SW_AGENT_NAMESPACE:default-namespace}

# The service name in UI
agent.service_name=${SW_AGENT_NAME:xxx_www_pef}

# The number of sampled traces per 3 seconds
# Negative number means sample traces as many as possible, most likely 100%
# agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:-1}

# Authentication active is based on backend setting, see application.yml for more details.
# agent.authentication = ${SW_AGENT_AUTHENTICATION:xxxx}

# The max amount of spans in a single segment.
# Through this config item, skywalking keep your application memory cost estimated.
# agent.span_limit_per_segment=${SW_AGENT_SPAN_LIMIT:300}

# Ignore the segments if their operation names start with these suffix.
# agent.ignore_suffix=${SW_AGENT_IGNORE_SUFFIX:.jpg,.jpeg,.js,.css,.png,.bmp,.gif,.ico,.mp3,.mp4,.html,.svg}

# If true, skywalking agent will save all instrumented classes files in `/debugging` folder.
# Skywalking team may ask for these files in order to resolve compatible problem.
# agent.is_open_debugging_class = ${SW_AGENT_OPEN_DEBUG:true}

# Backend service addresses.
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:10.100.xx.xx:11800,10.100.xx.xx:11800,10.100.xx.xx:11800}

# Logging level
logging.level=${SW_LOGGING_LEVEL:DEBUG}
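To put settings into agent.config without hand-editing, the sketch below rewrites a single `key=value` line, uncommenting it if necessary and appending it if it is missing. Caveat: writing a literal value replaces the `${SW_...:default}` placeholder, so the matching environment variable no longer affects that key (a `-Dskywalking.*` VM parameter still would). The function name `set_agent_config` is hypothetical, and it assumes the simple `key=value` line format shown above.

```bash
#!/usr/bin/env bash
# Set KEY=VALUE in an agent.config-style file.
# - A line whose text (after dropping a leading "#" and all blanks) starts with "KEY="
#   is replaced by "KEY=VALUE"; only the first such line is replaced.
# - If no such line exists, "KEY=VALUE" is appended.
# Note: awk -v interprets backslash escapes in VALUE, so avoid backslashes in values.
set_agent_config() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    { line = $0; sub(/^#[ \t]*/, "", line); gsub(/[ \t]/, "", line) }
    !done && index(line, k "=") == 1 { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps the original file's permissions
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage:
#   set_agent_config Agent/config/agent.config agent.service_name xxx_www_pof
#   set_agent_config Agent/config/agent.config logging.level INFO
```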

If the settings are written in config/agent.config, the agent option for the application only needs the jar path:

vi bin/catalina.sh
## add the following
CATALINA_OPTS="-javaagent:/usr/local/tomcat-htmall/bin/skywalking/Agent/skywalking-agent.jar" ;export CATALINA_OPTS
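As an alternative to editing catalina.sh, Tomcat's startup script also sources bin/setenv.sh (from CATALINA_BASE, falling back to CATALINA_HOME) on every start, which keeps the agent setting out of a file that Tomcat upgrades replace. The sketch below writes that file idempotently; `write_tomcat_setenv` is a hypothetical name, and the jar path in the usage line is the one used above.

```bash
#!/usr/bin/env bash
# Append the agent to CATALINA_OPTS in CATALINA_BASE/bin/setenv.sh, at most once.
write_tomcat_setenv() {
  local catalina_base="$1" agent_jar="$2"
  local setenv="$catalina_base/bin/setenv.sh"
  if [ -f "$setenv" ] && grep -qF -- "-javaagent:$agent_jar" "$setenv"; then
    return 0   # already configured; do not add a second -javaagent
  fi
  # $CATALINA_OPTS stays literal here; it is expanded when catalina.sh sources setenv.sh.
  printf 'CATALINA_OPTS="$CATALINA_OPTS -javaagent:%s"\nexport CATALINA_OPTS\n' "$agent_jar" >> "$setenv"
}

# Usage:
#   write_tomcat_setenv /usr/local/tomcat-htmall /usr/local/tomcat-htmall/bin/skywalking/Agent/skywalking-agent.jar
```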

Restart the application, then check the UI.
