SkyWalking

一、介绍

SkyWalking开源项目由吴晟2015年创建,同年10月在GitHub上作为个人项目开源。SkyWalking项目的核心目标,是针对微服务、Cloud Native、容器化架构,提供应用性能监控和分布式调用链追踪能力。 目前已加入Apache孵化器。目前支持链路追踪和监控应用组件如下,基本涵盖主流框架和容器,如国产PRC Dubbo和motan等,国际化的spring boot,spring cloud。主要功能特性:


20190104090621_48918.png

二、SkyWalking的主要部分及其功能

frame.jpeg

SW总体可以分为四部分:

  • Skywalking Agent:使用Javaagent做字节码植入,无侵入式的收集,并通过HTTP或者gRPC方式发送数据到Skywalking Collector。

  • Skywalking Collector :链路数据收集器,对agent传过来的数据进行整合分析处理并落入相关的数据存储中。

  • Storage:Skywalking的存储,时间更迭,支持以ElasticSearch、Mysql、TiDB、H2、作为存储介质进行数据存储。

  • UI :Web可视化平台,用来展示落地的数据。

2.1 Skywalking Agent配置

通过了解配置,可以对一个组件功能有一个大致的了解。让我们一起看一下skywalking的相关配置。

解压开skywalking的压缩包,在agent/config文件夹中可以看到agent的配置文件。

从skywalking支持环境变量配置加载,在启动的时候优先读取环境变量中的相关配置。

# The agent namespace
# agent.namespace=${SW_AGENT_NAMESPACE:default-namespace}

# The service name in UI
agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}

# The number of sampled traces per 3 seconds
# Negative number means sample traces as many as possible, most likely 100%
# agent.sample_n_per_3_secs=${SW_AGENT_SAMPLE:-1}

# Authentication active is based on backend setting, see application.yml for more details.
# agent.authentication = ${SW_AGENT_AUTHENTICATION:xxxx}

# The max amount of spans in a single segment.
# Through this config item, SkyWalking keep your application memory cost estimated.
# agent.span_limit_per_segment=${SW_AGENT_SPAN_LIMIT:300}

# Ignore the segments if their operation names end with these suffix.
# agent.ignore_suffix=${SW_AGENT_IGNORE_SUFFIX:.jpg,.jpeg,.js,.css,.png,.bmp,.gif,.ico,.mp3,.mp4,.html,.svg}

# If true, SkyWalking agent will save all instrumented classes files in `/debugging` folder.
# SkyWalking team may ask for these files in order to resolve compatible problem.
# agent.is_open_debugging_class = ${SW_AGENT_OPEN_DEBUG:true}

# The operationName max length
# agent.operation_name_threshold=${SW_AGENT_OPERATION_NAME_THRESHOLD:500}

# Backend service addresses.
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}

# Logging file_name
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log}

# Logging level
logging.level=${SW_LOGGING_LEVEL:DEBUG}

agent.namespace: 跨进程链路中的header,不同的namespace会导致跨进程的链路中断
agent.service_name:一个服务(项目)的唯一标识,这个字段决定了在sw的UI上的关于service的展示名称
agent.sample_n_per_3_secs: 客户端采样率,默认是-1代表全采样
agent.authentication: 与collector进行通信的安全认证,需要同collector中配置相同
agent.ignore_suffix: 忽略特定请求后缀的trace
collecttor.backend_service: agent需要同collector进行数据传输的IP和端口
logging.level: agent记录日志级别

skywalking agent使用javaagent无侵入式的配合collector实现对分布式系统的追踪和相关数据的上下文传递。

2.2 Skywalking Collector关键配置

Collector支持集群部署,zookeeper、kubernetes(如果你的应用是部署在容器中的)、consul(GO语言开发的服务发现工具)是sw可选的集群管理工具,结合大家具体的部署方式进行选择。详细配置大家可以去Skywalking官网下载介质包进行了解。

2.2.1 Collector端口设置

打开config/application.yml

core:
  default:
    # Mixed: Receive agent data, Level 1 aggregate, Level 2 aggregate
    # Receiver: Receive agent data, Level 1 aggregate
    # Aggregator: Level 2 aggregate
    role: ${SW_CORE_ROLE:Mixed} # Mixed/Receiver/Aggregator
    restHost: ${SW_CORE_REST_HOST:0.0.0.0}
    restPort: ${SW_CORE_REST_PORT:12800}
    restContextPath: ${SW_CORE_REST_CONTEXT_PATH:/}
    gRPCHost: ${SW_CORE_GRPC_HOST:0.0.0.0}
    gRPCPort: ${SW_CORE_GRPC_PORT:11800}
    downsampling:
      - Hour
      - Day
      - Month
    # Set a timeout on metrics data. After the timeout has expired, the metrics data will automatically be deleted.
    enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} # Turn it off then automatically metrics data delete will be close.
    dataKeeperExecutePeriod: ${SW_CORE_DATA_KEEPER_EXECUTE_PERIOD:5} # How often the data keeper executor runs periodically, unit is minute
    recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
    minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
    hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
    dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
    monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month
    # Cache metric data for 1 minute to reduce database queries, and if the OAP cluster changes within that minute,
    # the metrics may not be accurate within that minute.
    enableDatabaseSession: ${SW_CORE_ENABLE_DATABASE_SESSION:true}
    topNReportPeriod: ${SW_CORE_TOPN_REPORT_PERIOD:10} # top_n record worker report cycle, unit is minute
storage:

downsampling: 采样汇总统计维度,会分别按照分钟、【小时、天、月】(可选)来统计各项指标数据。
通过设置TTL相关配置项可以对数据进行自动清理。
collector提供了gRPC和HTTP两种通信方式。
UI使用rest http通信,agent在大多数场景下使用grpc方式通信,在语言不支持的情况下会使用http通信。
关于绑定IP和端口需要注意的一点是,通过绑定IP,agent和collector必须配置对应ip才可以正常通信。

2.2 .2 Collector存储配置

在application.yml中配置的storage模块配置中选择要使用的数据库类型,并填写相关的配置信息。
我这里选择默认的h2数据库

storage:
#  elasticsearch:
#    nameSpace: ${SW_NAMESPACE:""}
#    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
#    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
#    trustStorePath: ${SW_SW_STORAGE_ES_SSL_JKS_PATH:"../es_keystore.jks"}
#    trustStorePass: ${SW_SW_STORAGE_ES_SSL_JKS_PASS:""}
#    user: ${SW_ES_USER:""}
#    password: ${SW_ES_PASSWORD:""}
#    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
#    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
#    # Those data TTL settings will override the same settings in core module.
#    recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
#    otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
#    monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
#    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
#    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:1000} # Execute the bulk every 1000 requests
#    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
#    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
#    resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
#    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
#    segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
#  elasticsearch7:
#    nameSpace: ${SW_NAMESPACE:""}
#    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:localhost:9200}
#    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
#    trustStorePath: ${SW_SW_STORAGE_ES_SSL_JKS_PATH:"../es_keystore.jks"}
#    trustStorePass: ${SW_SW_STORAGE_ES_SSL_JKS_PASS:""}
#    user: ${SW_ES_USER:""}
#    password: ${SW_ES_PASSWORD:""}
#    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:2}
#    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:0}
#    # Those data TTL settings will override the same settings in core module.
#    recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
#    otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
#    monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
#    # Batch process setting, refer to https://www.elastic.co/guide/en/elasticsearch/client/java-api/5.5/java-docs-bulk-processor.html
#    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:1000} # Execute the bulk every 1000 requests
#    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
#    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
#    resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
#    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
#    segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
  h2:
    driver: ${SW_STORAGE_H2_DRIVER:org.h2.jdbcx.JdbcDataSource}
    url: ${SW_STORAGE_H2_URL:jdbc:h2:mem:skywalking-oap-db}
    user: ${SW_STORAGE_H2_USER:sa}
    metadataQueryMaxSize: ${SW_STORAGE_H2_QUERY_MAX_SIZE:5000}
#  mysql:
#    properties:
#      jdbcUrl: ${SW_JDBC_URL:"jdbc:mysql://localhost:3306/swtest"}
#      dataSource.user: ${SW_DATA_SOURCE_USER:root}
#      dataSource.password: ${SW_DATA_SOURCE_PASSWORD:root@1234}
#      dataSource.cachePrepStmts: ${SW_DATA_SOURCE_CACHE_PREP_STMTS:true}
#      dataSource.prepStmtCacheSize: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_SIZE:250}
#      dataSource.prepStmtCacheSqlLimit: ${SW_DATA_SOURCE_PREP_STMT_CACHE_SQL_LIMIT:2048}
#      dataSource.useServerPrepStmts: ${SW_DATA_SOURCE_USE_SERVER_PREP_STMTS:true}
#    metadataQueryMaxSize: ${SW_STORAGE_MYSQL_QUERY_MAX_SIZE:5000}

2.2.3 Collector Receiver

Receiver是Skywalking在6.x提出的新的概念,负责从被监控的系统中接受指标数据。用户完全可以参照OpenTracing规范来上传自定义的监控数据。Skywalking官方提供了service-mesh、istio、zipkin的相关能力。

receiver-register:
  default:
receiver-trace:
  default:
    bufferPath: ${SW_RECEIVER_BUFFER_PATH:../trace-buffer/}  # Path to trace buffer files, suggest to use absolute path
    bufferOffsetMaxFileSize: ${SW_RECEIVER_BUFFER_OFFSET_MAX_FILE_SIZE:100} # Unit is MB
    bufferDataMaxFileSize: ${SW_RECEIVER_BUFFER_DATA_MAX_FILE_SIZE:500} # Unit is MB
    bufferFileCleanWhenRestart: ${SW_RECEIVER_BUFFER_FILE_CLEAN_WHEN_RESTART:false}
    sampleRate: ${SW_TRACE_SAMPLE_RATE:10000} # The sample rate precision is 1/10000. 10000 means 100% sample in default.
    slowDBAccessThreshold: ${SW_SLOW_DB_THRESHOLD:default:200,mongodb:100} # The slow database access thresholds. Unit ms.
receiver-jvm:
  default:

现在Skywalking支持服务端采样,配置项为sampleRate,比例采样,如果配置为5000则采样率就是50%。

三、业务调用链路监控

3.1 网络拓扑

调用链路监控可以从两个角度去看待。我们先从整体上来认识一下我们所监控的系统。

通过给服务添加探针并产生实际的调用之后,我们可以通过Skywalking的前端UI查看服务之间的调用关系。我们简单模拟一次服务之间的调用。我本地搭建了网关、用户、认证3个服务,服务之间简单的通过Feign Client 来模拟远程调用。

image.png

从图中可以看到:
有3个服务节点:pscp-user& pscp-uaa & pscp-zuul-gateway
有一个数据库节点:localhost【mysql】
一个系统的拓扑图让我们清晰的认识到系统之间的应用的依赖关系以及当前状态下的业务流转流程。
点击节点,可以看到更详细的信息


image.png

3.2 全局监测

image.png

3.3 全链路性能追踪

image.png

四、安装配置

  • 下载
    地址:http://skywalking.apache.org/zh/downloads/
  • 解压
    下载解压后目录如下


    image.png
  • 启动
    不需要修改配置文件,在bin目录下执行startup.bat或startup.sh即可启动服务:
    执行startup.bat之后会启动如下两个服务:
    (1)Skywalking-Collector:追踪信息收集器,通过 gRPC/Http 收集客户端的采集信息 ,Http默认端口 12800,gRPC默认端口 11800。
    (2)Skywalking-Webapp:管理平台页面 默认端口 8080,登录信息 admin/admin

五、Agent 使用示例

Skywalking 采用 Java 探针技术,对客户端应用程序没有任何代码侵入,使用起来简单方便,当然其具体实现就是需要针对不同的框架及服务提供探针插件。
使用命令:

  • 如果是eclipse直接加入以下信息
-javaagent:F:/code/pscp-platform/pscp-tool/apache-skywalking-apm-bin-es7/agent/skywalking-agent.jar -Dskywalking.agent.service_name=pscp-user

image.png
  • 如果是java -jar启动,将上面的内容放到java -jar之间

  • Tomcat 7 启动 修改tomcat/bin/catalina.sh,在首行加入如下信息
    CATALINA_OPTS="$CATALINA_OPTS -javaagent:/path/to/skywalking-agent/skywalking-agent.jar"; export CATALINA_OPTS

  • Tomcat 8 修改tomcat/bin/catalina.sh,在首行加入如下信息
    set "CATALINA_OPTS=... -javaagent:E:\apache-tomcat-8.5.20\skywalking-agent\skywalking-agent.jar"

你可能感兴趣的:(SkyWalking)