全栈程序猿

SpringCloud微服务实战——搭建企业级开发框架（三十八）：搭建ELK日志采集与分析系统

一套好的日志分析系统可以详细记录系统的运行情况，方便我们定位分析系统性能瓶颈、查找定位系统问题。上一篇说明了日志的多种业务场景以及日志记录的实现方式，那么日志记录下来，相关人员就需要对日志数据进行处理与分析，基于E(ElasticSearch)L(Logstash)K(Kibana)组合的日志分析系统可以说是目前各家公司普遍的首选方案。

Elasticsearch: 分布式、RESTful 风格的搜索和数据分析引擎，可快速存储、搜索、分析海量的数据。在ELK中用于存储所有日志数据。
Logstash：开源的数据采集引擎，具有实时管道传输功能。Logstash 能够将来自单独数据源的数据动态集中到一起，对这些数据加以标准化并传输到您所选的地方。在ELK中用于将采集到的日志数据进行处理、转换然后存储到Elasticsearch。
Kibana：免费且开放的用户界面，能够让您对 Elasticsearch 数据进行可视化，并让您在 Elastic Stack 中进行导航。您可以进行各种操作，从跟踪查询负载，到理解请求如何流经您的整个应用，都能轻松完成。在ELK中用于通过界面展示存储在Elasticsearch中的日志数据。

作为微服务集群，必须要考虑当微服务访问量暴增时的高并发场景，此时系统的日志数据同样是爆发式增长，我们需要通过消息队列做流量削峰处理，Logstash官方提供Redis、Kafka、RabbitMQ等输入插件。Redis虽然可以用作消息队列，但其各项功能显示不如单一实现的消息队列，所以通常情况下并不使用它的消息队列功能；Kafka的性能要优于RabbitMQ，通常在日志采集，数据采集时使用较多，所以这里我们采用Kafka实现消息队列功能。
ELK日志分析系统中，数据传输、数据保存、数据展示、流量削峰功能都有了，还少一个组件，就是日志数据的采集，虽然log4j2可以将日志数据发送到Kafka，甚至可以将日志直接输入到Logstash，但是基于系统设计解耦的考虑，业务系统运行不会影响到日志分析系统，同时日志分析系统也不会影响到业务系统，所以，业务只需将日志记录下来，然后由日志分析系统去采集分析即可，Filebeat是ELK日志系统中常用的日志采集器，它是 Elastic Stack 的一部分，因此能够与 Logstash、Elasticsearch 和 Kibana 无缝协作。

Kafka：高吞吐量的分布式发布订阅消息队列，主要应用于大数据的实时处理。
Filebeat: 轻量型日志采集器。在 Kubernetes、Docker 或云端部署中部署 Filebeat，即可获得所有的日志流：信息十分完整，包括日志流的 pod、容器、节点、VM、主机以及自动关联时用到的其他元数据。此外，Beats Autodiscover 功能可检测到新容器，并使用恰当的 Filebeat 模块对这些容器进行自适应监测。

软件下载：

因经常遇到在内网搭建环境的问题，所以这里习惯使用下载软件包的方式进行安装，虽没有使用Yum、Docker等安装方便，但是可以对软件目录、配置信息等有更深的了解，在后续采用Yum、Docker等方式安装时，也能清楚安装了哪些东西，安装配置的文件是怎样的，即使出现问题，也可以快速的定位解决。

Elastic Stack全家桶下载主页： https://www.elastic.co/cn/downloads/

我们选择如下版本：

Elasticsearch8.0.0，下载地址：https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.0.0-linux-x86_64.tar.gz
Logstash8.0.0，下载地址：https://artifacts.elastic.co/downloads/logstash/logstash-8.0.0-linux-x86_64.tar.gz
Kibana8.0.0，下载地址：https://artifacts.elastic.co/downloads/kibana/kibana-8.0.0-linux-x86_64.tar.gz
Filebeat8.0.0，下载地址：https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.0.0-linux-x86_64.tar.gz

Kafka下载：

Kafka3.1.0，下载地址：https://dlcdn.apache.org/kafka/3.1.0/kafka_2.13-3.1.0.tgz

安装配置：

安装前先准备好三台CentOS7服务器用于集群安装，这是IP地址为：172.16.20.220、172.16.20.221、172.16.20.222，然后将上面下载的软件包上传至三台服务器的/usr/local目录。因服务器资源有限，这里所有的软件都安装在这三台集群服务器上，在实际生产环境中，请根据业务需求设计规划进行安装。
在集群搭建时，如果能够编写shell安装脚本就会很方便，如果不能编写，就需要在每台服务器上执行安装命令，多数ssh客户端提供了多会话同时输入的功能，这里一些通用安装命令可以选择启用该功能。

一、安装Elasticsearch集群

1、Elasticsearch是使用Java语言开发的，所以需要在环境上安装jdk并配置环境变量。

下载jdk软件包安装，https://www.oracle.com/java/technologies/downloads/#java8

新建/usr/local/java目录

mkdir /usr/local/java

将下载的jdk软件包jdk-8u64-linux-x64.tar.gz上传到/usr/local/java目录，然后解压

tar -zxvf jdk-8u77-linux-x64.tar.gz

配置环境变量/etc/profile

vi /etc/profile

在底部添加以下内容

JAVA_HOME=/usr/local/java/jdk1.8.0_64
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=$JAVA_HOME/jre/lib/ext:$JAVA_HOME/lib/tools.jar
export PATH JAVA_HOME CLASSPATH

使环境变量生效

source /etc/profile

另外一种十分快捷的方式，如果不是内网环境，可以直接使用命令行安装，这里安装的是免费版本的openjdk

yum install java-1.8.0-openjdk* -y

2、安装配置Elasticsearch

进入/usr/local目录，解压Elasticsearch安装包，请确保执行命令前已将环境准备时的Elasticsearch安装包上传至该目录。

tar -zxvf elasticsearch-8.0.0-linux-x86_64.tar.gz

重命名文件夹

mv elasticsearch-8.0.0 elasticsearch

elasticsearch不能使用root用户运行，这里创建运行elasticsearch的用户组和用户

# 创建用户组
groupadd elasticsearch
# 创建用户并添加至用户组
useradd elasticsearch -g elasticsearch
# 更改elasticsearch密码，设置一个自己需要的密码，这里设置为和用户名一样：El12345678
passwd elasticsearch

新建elasticsearch数据和日志存放目录，并给elasticsearch用户赋权限

mkdir -p /data/elasticsearch/data
mkdir -p /data/elasticsearch/log
chown -R elasticsearch:elasticsearch /data/elasticsearch/*
chown -R elasticsearch:elasticsearch /usr/local/elasticsearch/*

elasticsearch默认启用了x-pack，集群通信需要进行安全认证，所以这里需要用到SSL证书。注意：这里生成证书的命令只在一台服务器上执行，执行之后copy到另外两台服务器的相同目录下。

# 提示输入密码时，直接回车
./elasticsearch-certutil ca -out /usr/local/elasticsearch/config/elastic-stack-ca.p12

# 提示输入密码时，直接回车
./elasticsearch-certutil cert --ca /usr/local/elasticsearch/config/elastic-stack-ca.p12 -out /usr/local/elasticsearch/config/elastic-certificates.p12 -pass ""
# 如果使用root用户生成的证书，记得给elasticsearch用户赋权限
chown -R elasticsearch:elasticsearch /usr/local/elasticsearch/config/elastic-certificates.p12

设置密码，这里在出现输入密码时，所有的都是输入的123456

./elasticsearch-setup-passwords interactive

Enter password for [elastic]: 
Reenter password for [elastic]: 
Enter password for [apm_system]: 
Reenter password for [apm_system]: 
Enter password for [kibana_system]: 
Reenter password for [kibana_system]: 
Enter password for [logstash_system]: 
Reenter password for [logstash_system]: 
Enter password for [beats_system]: 
Reenter password for [beats_system]: 
Enter password for [remote_monitoring_user]: 
Reenter password for [remote_monitoring_user]: 
Changed password for user [apm_system]
Changed password for user [kibana_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]

修改elasticsearch配置文件

vi /usr/local/elasticsearch/config/elasticsearch.yml

# 修改配置
# 集群名称
cluster.name: log-elasticsearch
# 节点名称
node.name: node-1
# 数据存放路径
path.data: /data/elasticsearch/data
 # 日志存放路径
path.logs: /data/elasticsearch/log
# 当前节点IP
network.host: 192.168.60.201
 # 对外端口
http.port: 9200
# 集群ip
discovery.seed_hosts: ["172.16.20.220", "172.16.20.221", "172.16.20.222"]
# 初始主节点
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]
# 新增配置
# 集群端口
transport.tcp.port: 9300
transport.tcp.compress: true

http.cors.enabled: true
http.cors.allow-origin: "*" 
http.cors.allow-methods: OPTIONS, HEAD, GET, POST, PUT, DELETE
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, X-User"

xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: elastic-certificates.p12

配置Elasticsearch的JVM参数

vi /usr/local/elasticsearch/config/jvm.options

-Xms1g
-Xmx1g

修改Linux默认资源限制数

vi /etc/security/limits.conf

# 在最后加入，修改完成后，重启系统生效。
*                soft    nofile          131072
*                hard   nofile          131072

vi /etc/sysctl.conf
# 将值vm.max_map_count值修改为655360
vm.max_map_count=655360
# 使配置生效
sysctl -p

切换用户启动服务

su elasticsearch
cd /usr/local/elasticsearch/bin
# 控制台启动命令，可以看到具体报错信息
./elasticsearch

访问我们的服务器地址和端口,可以看到，服务已启动：
http://172.16.20.220:9200/
http://172.16.20.221:9200/
http://172.16.20.222:9200/
正常运行没有问题后，Ctrl+c关闭服务，然后使用后台启动命令

./elasticsearch -d

备注：后续可通过此命令停止elasticsearch运行

# 查看进程id
ps -ef | grep elastic
# 关闭进程
kill -9 1376(进程id)

3、安装ElasticSearch界面管理插件elasticsearch-head，只需要在一台服务器上安装即可，这里我们安装到172.16.20.220服务器上

配置nodejs环境
下载地址: (https://nodejs.org/dist/v16.14.0/node-v16.14.0-linux-x64.tar.xz)[https://nodejs.org/dist/v16.14.0/node-v16.14.0-linux-x64.tar.xz]，将node-v16.14.0-linux-x64.tar.xz上传到服务器172.16.20.220的/usr/local目录

# 解压
tar -xvJf node-v16.14.0-linux-x64.tar.xz
# 重命名
mv node-v16.14.0-linux-x64 nodejs
# 配置环境变量
vi /etc/profile
# 新增以下内容
export NODE_HOME=/usr/local/nodejs
PATH=$JAVA_HOME/bin:$NODE_HOME/bin:/usr/local/mysql/bin:/usr/local/subversion/bin:$PATH
export PATH JAVA_HOME NODE_HOME JENKINS_HOME CLASSPATH
# 使配置生效
source /etc/profile
# 测试是否配置成功
node -v

配置elasticsearch-head
项目开源地址：https://github.com/mobz/elasticsearch-head
zip包下载地址：https://github.com/mobz/elasticsearch-head/archive/master.zip
下载后上传至172.16.20.220的/usr/local目录，然后进行解压安装

# 解压
unzip elasticsearch-head-master.zip
# 重命名
mv elasticsearch-head-master elasticsearch-head
# 进入到elasticsearch-head目录
cd elasticsearch-head
#切换软件源，可以提升安装速度
npm config set registry https://registry.npm.taobao.org
# 执行安装命令
npm install -g [email protected]
npm install [email protected] --ignore-scripts
npm install
# 启动命令
npm run start

浏览器访问http://172.16.20.220:9100/?auth_user=elastic&auth_password=123456 ，需要加上我们上面设置的用户名密码，就可以看到我们的Elasticsearch集群状态了。

二、安装Kafka集群

环境准备：

新建kafka的日志目录和zookeeper数据目录，因为这两项默认放在tmp目录，而tmp目录中内容会随重启而丢失,所以我们自定义以下目录:

 mkdir /data/zookeeper
 mkdir /data/zookeeper/data
 mkdir /data/zookeeper/logs

 mkdir /data/kafka
 mkdir /data/kafka/data
 mkdir /data/kafka/logs

zookeeper.properties配置

vi /usr/local/kafka/config/zookeeper.properties

修改如下：

# 修改为自定义的zookeeper数据目录
dataDir=/data/zookeeper/data

# 修改为自定义的zookeeper日志目录
dataLogDir=/data/zookeeper/logs

# 端口
clientPort=2181

# 注释掉
#maxClientCnxns=0

# 设置连接参数，添加如下配置
# 为zk的基本时间单元，毫秒
tickTime=2000
# Leader-Follower初始通信时限 tickTime*10
initLimit=10
# Leader-Follower同步通信时限 tickTime*5
syncLimit=5

# 设置broker Id的服务地址，本机ip一定要用0.0.0.0代替
server.1=0.0.0.0:2888:3888
server.2=172.16.20.221:2888:3888
server.3=172.16.20.222:2888:3888

在各台服务器的zookeeper数据目录/data/zookeeper/data添加myid文件，写入服务broker.id属性值

在data文件夹中新建myid文件，myid文件的内容为1（一句话创建：echo 1 > myid）

cd /data/zookeeper/data

vi myid

#添加内容：1 其他两台主机分别配置 2和3
1

kafka配置，进入config目录下，修改server.properties文件

vi /usr/local/kafka/config/server.properties

# 每台服务器的broker.id都不能相同
broker.id=1
# 是否可以删除topic
delete.topic.enable=true
# topic 在当前broker上的分片个数，与broker保持一致
num.partitions=3
# 每个主机地址不一样：
listeners=PLAINTEXT://172.16.20.220:9092
advertised.listeners=PLAINTEXT://172.16.20.220:9092
# 具体一些参数
log.dirs=/data/kafka/kafka-logs
# 设置zookeeper集群地址与端口如下：
zookeeper.connect=172.16.20.220:2181,172.16.20.221:2181,172.16.20.222:2181

Kafka启动

kafka启动时先启动zookeeper，再启动kafka；关闭时相反，先关闭kafka，再关闭zookeeper。
1、zookeeper启动命令

./zookeeper-server-start.sh ../config/zookeeper.properties &

后台运行启动命令：

nohup ./zookeeper-server-start.sh ../config/zookeeper.properties >/data/zookeeper/logs/zookeeper.log 2>1 &

或者

./zookeeper-server-start.sh -daemon ../config/zookeeper.properties &

查看集群状态：

./zookeeper-server-start.sh status ../config/zookeeper.properties

2、kafka启动命令

./kafka-server-start.sh ../config/server.properties &

后台运行启动命令：

nohup bin/kafka-server-start.sh ../config/server.properties >/data/kafka/logs/kafka.log 2>1 &

或者

 ./kafka-server-start.sh -daemon ../config/server.properties &

3、创建topic，最新版本已经不需要使用zookeeper参数创建。

./kafka-topics.sh --create --replication-factor 2 --partitions 1 --topic test --bootstrap-server 172.16.20.220:9092

参数解释:
复制两份
　　--replication-factor 2
创建1个分区
　　--partitions 1
topic 名称
　　--topic test

4、查看已经存在的topic（三台设备都执行时可以看到）

./kafka-topics.sh --list --bootstrap-server 172.16.20.220:9092

5、启动生产者：

./kafka-console-producer.sh --broker-list 172.16.20.220:9092 --topic test

6、启动消费者：

./kafka-console-consumer.sh --bootstrap-server 172.16.20.221:9092 --topic test
./kafka-console-consumer.sh --bootstrap-server 172.16.20.222:9092 --topic test

添加参数 --from-beginning 从开始位置消费，不是从最新消息

./kafka-console-consumer.sh --bootstrap-server 172.16.20.221 --topic test --from-beginning

7、测试：在生产者输入test，可以在消费者的两台服务器上看到同样的字符test，说明Kafka服务器集群已搭建成功。

三、安装配置Logstash

Logstash没有提供集群安装方式，相互之间并没有交互，但是我们可以配置同属一个Kafka消费者组，来实现统一消息只消费一次的功能。

解压安装包

tar -zxvf logstash-8.0.0-linux-x86_64.tar.gz
mv logstash-8.0.0 logstash

配置kafka主题和组

cd logstash
# 新建配置文件
vi logstash-kafka.conf
# 新增以下内容
input {
  kafka {
    codec => "json"
    group_id => "logstash"
    client_id => "logstash-api"
    topics_pattern => "api_log"
    type => "api"
    bootstrap_servers => "172.16.20.220:9092,172.16.20.221:9092,172.16.20.222:9092"
    auto_offset_reset => "latest"
  }
  kafka {
    codec => "json"
    group_id => "logstash"
    client_id => "logstash-operation"
    topics_pattern => "operation_log"
    type => "operation"
    bootstrap_servers => "172.16.20.220:9092,172.16.20.221:9092,172.16.20.222:9092"
    auto_offset_reset => "latest"
  }
  kafka {
    codec => "json"
    group_id => "logstash"
    client_id => "logstash-debugger"
    topics_pattern => "debugger_log"
    type => "debugger"
    bootstrap_servers => "172.16.20.220:9092,172.16.20.221:9092,172.16.20.222:9092"
    auto_offset_reset => "latest"
  }
  kafka {
    codec => "json"
    group_id => "logstash"
    client_id => "logstash-nginx"
    topics_pattern => "nginx_log"
    type => "nginx"
    bootstrap_servers => "172.16.20.220:9092,172.16.20.221:9092,172.16.20.222:9092"
    auto_offset_reset => "latest"
  }
}
output {
 if [type] == "api"{
  elasticsearch {
    hosts => ["172.16.20.220:9200","172.16.20.221:9200","172.16.20.222:9200"]
    index => "logstash_api-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "123456"
  }
 }
 if [type] == "operation"{
  elasticsearch {
    hosts => ["172.16.20.220:9200","172.16.20.221:9200","172.16.20.222:9200"]
    index => "logstash_operation-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "123456"
  }
 }
 if [type] == "debugger"{
  elasticsearch {
    hosts => ["172.16.20.220:9200","172.16.20.221:9200","172.16.20.222:9200"]
    index => "logstash_operation-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "123456"
  }
 }
 if [type] == "nginx"{
  elasticsearch {
    hosts => ["172.16.20.220:9200","172.16.20.221:9200","172.16.20.222:9200"]
    index => "logstash_operation-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "123456"
  }
 }
}

启动logstash

# 切换到bin目录
cd /usr/local/logstash/bin
# 启动命令
nohup ./logstash -f ../config/logstash-kafka.conf &
#查看启动日志
tail -f nohup.out

四、安装配置Kibana

解压安装文件

tar -zxvf kibana-8.0.0-linux-x86_64.tar.gz

mv kibana-8.0.0 kibana

修改配置文件

cd /usr/local/kibana/config
vi kibana.yml
# 修改以下内容
server.port: 5601
server.host: "172.16.20.220"
elasticsearch.hosts: ["http://172.16.20.220:9200","http://172.16.20.221:9200","http://172.16.20.222:9200"]
elasticsearch.username: "kibana_system"
elasticsearch.password: "123456"

启动服务

cd /usr/local/kibana/bin
# 默认不允许使用root运行，可以添加 --allow-root 参数使用root用户运行，也可以跟Elasticsearch一样新增一个用户组用户
nohup ./kibana --allow-root &

访问http://172.16.20.220:5601/，并使用elastic / 123456登录。

五、安装Filebeat

Filebeat用于安装在业务软件运行服务器，收集业务产生的日志，并推送到我们配置的Kafka、Redis、RabbitMQ等消息中间件，或者直接保存到Elasticsearch，下面来讲解如何安装配置：

1、进入到/usr/local目录，执行解压命令

tar -zxvf filebeat-8.0.0-linux-x86_64.tar.gz

mv filebeat-8.0.0-linux-x86_64 filebeat

2、编辑配置filebeat.yml
配置文件中默认是输出到elasticsearch，这里我们改为kafka，同文件目录下的filebeat.reference.yml文件是所有配置的实例，可以直接将kafka的配置复制到filebeat.yml

配置采集开关和采集路径：

# filestream is an input for collecting log messages from files.
- type: filestream

  # Change to true to enable this input configuration.
  # enable改为true
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  # 修改微服务日志的实际路径
  paths:
    - /data/gitegg/log/gitegg-service-system/*.log
    - /data/gitegg/log/gitegg-service-base/*.log
    - /data/gitegg/log/gitegg-service-oauth/*.log
    - /data/gitegg/log/gitegg-service-gateway/*.log
    - /data/gitegg/log/gitegg-service-extension/*.log
    - /data/gitegg/log/gitegg-service-bigdata/*.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  #fields:
  #  level: debug
  #  review: 1

Elasticsearch 模板配置

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 3
  index.number_of_replicas: 1
  #index.codec: best_compression
  #_source.enabled: false

# 允许自动生成index模板
setup.template.enabled: true
# # 生成index模板时字段配置文件
setup.template.fields: fields.yml
# # 如果存在模块则覆盖
setup.template.overwrite: true
# # 生成index模板的名称
setup.template.name: "api_log" 
# # 生成index模板匹配的index格式 
setup.template.pattern: "api-*" 
#索引生命周期管理ilm功能默认开启，开启的情况下索引名称只能为filebeat-*， 通过setup.ilm.enabled: false进行关闭；
setup.ilm.pattern: "{now/d}"
setup.ilm.enabled: false

开启仪表盘并配置使用Kibana仪表盘：

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
setup.dashboards.enabled: true

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:
# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "172.16.20.220:5601"

  # Kibana Space ID
  # ID of the Kibana Space into which the dashboards should be loaded. By default,
  # the Default Space will be used.
  #space.id:

配置输出到Kafka，完整的filebeat.yml如下

###################### Filebeat Configuration Example #########################

# This file is an example configuration file highlighting only the most common
# options. The filebeat.reference.yml file from the same directory contains all the
# supported options with more comments. You can use it as a reference.
#
# You can find the full configuration reference here:
# https://www.elastic.co/guide/en/beats/filebeat/index.html

# For more available modules and options, please see the filebeat.reference.yml sample
# configuration file.

# ============================== Filebeat inputs ===============================

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

# filestream is an input for collecting log messages from files.
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /data/gitegg/log/*/*operation.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
    topic: operation_log
  #  level: debug
  #  review: 1
# filestream is an input for collecting log messages from files.
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /data/gitegg/log/*/*api.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
    topic: api_log
  #  level: debug
  #  review: 1
# filestream is an input for collecting log messages from files.
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /data/gitegg/log/*/*debug.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
    topic: debugger_log
  #  level: debug
  #  review: 1
# filestream is an input for collecting log messages from files.
- type: filestream

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /usr/local/nginx/logs/access.log
    #- c:\programdata\elasticsearch\logs\*

  # Exclude lines. A list of regular expressions to match. It drops the lines that are
  # matching any regular expression from the list.
  #exclude_lines: ['^DBG']

  # Include lines. A list of regular expressions to match. It exports the lines that are
  # matching any regular expression from the list.
  #include_lines: ['^ERR', '^WARN']

  # Exclude files. A list of regular expressions to match. Filebeat drops the files that
  # are matching any regular expression from the list. By default, no files are dropped.
  #prospector.scanner.exclude_files: ['.gz$']

  # Optional additional fields. These fields can be freely picked
  # to add additional information to the crawled log files for filtering
  fields:
    topic: nginx_log
  #  level: debug
  #  review: 1

# ============================== Filebeat modules ==============================

filebeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml

  # Set to true to enable config reloading
  reload.enabled: false

  # Period on which files under path should be checked for changes
  #reload.period: 10s

# ======================= Elasticsearch template setting =======================

setup.template.settings:
  index.number_of_shards: 3
  index.number_of_replicas: 1
  #index.codec: best_compression
  #_source.enabled: false

# 允许自动生成index模板
setup.template.enabled: true
# # 生成index模板时字段配置文件
setup.template.fields: fields.yml
# # 如果存在模块则覆盖
setup.template.overwrite: true
# # 生成index模板的名称
setup.template.name: "gitegg_log" 
# # 生成index模板匹配的index格式 
setup.template.pattern: "filebeat-*" 
#索引生命周期管理ilm功能默认开启，开启的情况下索引名称只能为filebeat-*， 通过setup.ilm.enabled: false进行关闭；
setup.ilm.pattern: "{now/d}"
setup.ilm.enabled: false

# ================================== General ===================================

# The name of the shipper that publishes the network data. It can be used to group
# all the transactions sent by a single shipper in the web interface.
#name:

# The tags of the shipper are included in their own field with each
# transaction published.
#tags: ["service-X", "web-tier"]

# Optional fields that you can specify to add additional information to the
# output.
#fields:
#  env: staging 

# ================================= Dashboards =================================
# These settings control loading the sample dashboards to the Kibana index. Loading
# the dashboards is disabled by default and can be enabled either by setting the
# options here or by using the `setup` command.
setup.dashboards.enabled: true

# The URL from where to download the dashboards archive. By default this URL
# has a value which is computed based on the Beat name and version. For released
# versions, this URL points to the dashboard archive on the artifacts.elastic.co
# website.
#setup.dashboards.url:

# =================================== Kibana ===================================

# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
setup.kibana:

  # Kibana Host
  # Scheme and port can be left out and will be set to the default (http and 5601)
  # In case you specify and additional path, the scheme is required: http://localhost:5601/path
  # IPv6 addresses should always be defined as: https://[2001:db8::1]:5601
  host: "172.16.20.220:5601"

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  username: "elastic"
  password: "123456"

  # Optional HTTP path
  #path: ""

  # Optional Kibana space ID.
  #space.id: ""

  # Custom HTTP headers to add to each request
  #headers:
  #  X-My-Header: Contents of the header

  # Use SSL settings for HTTPS.
  #ssl.enabled: true

# =============================== Elastic Cloud ================================

# These settings simplify using Filebeat with the Elastic Cloud (https://cloud.elastic.co/).

# The cloud.id setting overwrites the `output.elasticsearch.hosts` and
# `setup.kibana.host` options.
# You can find the `cloud.id` in the Elastic Cloud web UI.
#cloud.id:

# The cloud.auth setting overwrites the `output.elasticsearch.username` and
# `output.elasticsearch.password` settings. The format is `:`.
#cloud.auth:

# ================================== Outputs ===================================

# Configure what output to use when sending the data collected by the beat.

# ---------------------------- Elasticsearch Output ----------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  hosts: ["localhost:9200"]

  # Protocol - either `http` (default) or `https`.
  #protocol: "https"

  # Authentication credentials - either API key or username/password.
  #api_key: "id:api_key"
  #username: "elastic"
  #password: "changeme"

# ------------------------------ Logstash Output -------------------------------
#output.logstash:
  # The Logstash hosts
  #hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"
# -------------------------------- Kafka Output --------------------------------
output.kafka:
  # Boolean flag to enable or disable the output module.
  enabled: true

  # The list of Kafka broker addresses from which to fetch the cluster metadata.
  # The cluster metadata contain the actual Kafka brokers events are published
  # to.
  hosts: ["172.16.20.220:9092","172.16.20.221:9092","172.16.20.222:9092"]

  # The Kafka topic used for produced events. The setting can be a format string
  # using any event field. To set the topic from document type use `%{[type]}`.
  topic: '%{[fields.topic]}'

  # The Kafka event key setting. Use format string to create a unique event key.
  # By default no event key will be generated.
  #key: ''

  # The Kafka event partitioning strategy. Default hashing strategy is `hash`
  # using the `output.kafka.key` setting or randomly distributes events if
  # `output.kafka.key` is not configured.
  partition.hash:
    # If enabled, events will only be published to partitions with reachable
    # leaders. Default is false.
    reachable_only: true

    # Configure alternative event field names used to compute the hash value.
    # If empty `output.kafka.key` setting will be used.
    # Default value is empty list.
    #hash: []

  # Authentication details. Password is required if username is set.
  #username: ''
  #password: ''

  # SASL authentication mechanism used. Can be one of PLAIN, SCRAM-SHA-256 or SCRAM-SHA-512.
  # Defaults to PLAIN when `username` and `password` are configured.
  #sasl.mechanism: ''

  # Kafka version Filebeat is assumed to run against. Defaults to the "1.0.0".
  #version: '1.0.0'

  # Configure JSON encoding
  #codec.json:
    # Pretty-print JSON event
    #pretty: false

    # Configure escaping HTML symbols in strings.
    #escape_html: false

  # Metadata update configuration. Metadata contains leader information
  # used to decide which broker to use when publishing.
  #metadata:
    # Max metadata request retry attempts when cluster is in middle of leader
    # election. Defaults to 3 retries.
    #retry.max: 3

    # Wait time between retries during leader elections. Default is 250ms.
    #retry.backoff: 250ms

    # Refresh metadata interval. Defaults to every 10 minutes.
    #refresh_frequency: 10m

    # Strategy for fetching the topics metadata from the broker. Default is false.
    #full: false

  # The number of concurrent load-balanced Kafka output workers.
  #worker: 1

  # The number of times to retry publishing an event after a publishing failure.
  # After the specified number of retries, events are typically dropped.
  # Some Beats, such as Filebeat, ignore the max_retries setting and retry until
  # all events are published.  Set max_retries to a value less than 0 to retry
  # until all events are published. The default is 3.
  #max_retries: 3

  # The number of seconds to wait before trying to republish to Kafka
  # after a network error. After waiting backoff.init seconds, the Beat
  # tries to republish. If the attempt fails, the backoff timer is increased
  # exponentially up to backoff.max. After a successful publish, the backoff
  # timer is reset. The default is 1s.
  #backoff.init: 1s

  # The maximum number of seconds to wait before attempting to republish to
  # Kafka after a network error. The default is 60s.
  #backoff.max: 60s

  # The maximum number of events to bulk in a single Kafka request. The default
  # is 2048.
  #bulk_max_size: 2048

  # Duration to wait before sending bulk Kafka request. 0 is no delay. The default
  # is 0.
  #bulk_flush_frequency: 0s

  # The number of seconds to wait for responses from the Kafka brokers before
  # timing out. The default is 30s.
  #timeout: 30s

  # The maximum duration a broker will wait for number of required ACKs. The
  # default is 10s.
  #broker_timeout: 10s

  # The number of messages buffered for each Kafka broker. The default is 256.
  #channel_buffer_size: 256

  # The keep-alive period for an active network connection. If 0s, keep-alives
  # are disabled. The default is 0 seconds.
  #keep_alive: 0

  # Sets the output compression codec. Must be one of none, snappy and gzip. The
  # default is gzip.
  compression: gzip

  # Set the compression level. Currently only gzip provides a compression level
  # between 0 and 9. The default value is chosen by the compression algorithm.
  #compression_level: 4

  # The maximum permitted size of JSON-encoded messages. Bigger messages will be
  # dropped. The default value is 1000000 (bytes). This value should be equal to
  # or less than the broker's message.max.bytes.
  max_message_bytes: 1000000

  # The ACK reliability level required from broker. 0=no response, 1=wait for
  # local commit, -1=wait for all replicas to commit. The default is 1.  Note:
  # If set to 0, no ACKs are returned by Kafka. Messages might be lost silently
  # on error.
  required_acks: 1

  # The configurable ClientID used for logging, debugging, and auditing
  # purposes.  The default is "beats".
  #client_id: beats

  # Use SSL settings for HTTPS.
  #ssl.enabled: true

  # Controls the verification of certificates. Valid values are:
  # * full, which verifies that the provided certificate is signed by a trusted
  # authority (CA) and also verifies that the server's hostname (or IP address)
  # matches the names identified within the certificate.
  # * strict, which verifies that the provided certificate is signed by a trusted
  # authority (CA) and also verifies that the server's hostname (or IP address)
  # matches the names identified within the certificate. If the Subject Alternative
  # Name is empty, it returns an error.
  # * certificate, which verifies that the provided certificate is signed by a
  # trusted authority (CA), but does not perform any hostname verification.
  #  * none, which performs no verification of the server's certificate. This
  # mode disables many of the security benefits of SSL/TLS and should only be used
  # after very careful consideration. It is primarily intended as a temporary
  # diagnostic mechanism when attempting to resolve TLS errors; its use in
  # production environments is strongly discouraged.
  # The default value is full.
  #ssl.verification_mode: full

  # List of supported/valid TLS versions. By default all TLS versions from 1.1
  # up to 1.3 are enabled.
  #ssl.supported_protocols: [TLSv1.1, TLSv1.2, TLSv1.3]

  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client certificate key
  #ssl.key: "/etc/pki/client/cert.key"

  # Optional passphrase for decrypting the certificate key.
  #ssl.key_passphrase: ''

  # Configure cipher suites to be used for SSL connections
  #ssl.cipher_suites: []

  # Configure curve types for ECDHE-based cipher suites
  #ssl.curve_types: []

  # Configure what types of renegotiation are supported. Valid options are
  # never, once, and freely. Default is never.
  #ssl.renegotiation: never

  # Configure a pin that can be used to do extra validation of the verified certificate chain,
  # this allow you to ensure that a specific certificate is used to validate the chain of trust.
  #
  # The pin is a base64 encoded string of the SHA-256 fingerprint.
  #ssl.ca_sha256: ""

  # A root CA HEX encoded fingerprint. During the SSL handshake if the
  # fingerprint matches the root CA certificate, it will be added to
  # the provided list of root CAs (`certificate_authorities`), if the
  # list is empty or not defined, the matching certificate will be the
  # only one in the list. Then the normal SSL validation happens.
  #ssl.ca_trusted_fingerprint: ""

  # Enable Kerberos support. Kerberos is automatically enabled if any Kerberos setting is set.
  #kerberos.enabled: true

  # Authentication type to use with Kerberos. Available options: keytab, password.
  #kerberos.auth_type: password

  # Path to the keytab file. It is used when auth_type is set to keytab.
  #kerberos.keytab: /etc/security/keytabs/kafka.keytab

  # Path to the Kerberos configuration.
  #kerberos.config_path: /etc/krb5.conf

  # The service name. Service principal name is contructed from
  # service_name/hostname@realm.
  #kerberos.service_name: kafka

  # Name of the Kerberos user.
  #kerberos.username: elastic

  # Password of the Kerberos user. It is used when auth_type is set to password.
  #kerberos.password: changeme

  # Kerberos realm.
  #kerberos.realm: ELASTIC

  # Enables Kerberos FAST authentication. This may
  # conflict with certain Active Directory configurations.
  #kerberos.enable_krb5_fast: false
# ================================= Processors =================================
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

# ================================== Logging ===================================

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#logging.level: debug

# At debug level, you can selectively enable logging only for some components.
# To enable all selectors use ["*"]. Examples of other selectors are "beat",
# "publisher", "service".
#logging.selectors: ["*"]

# ============================= X-Pack Monitoring ==============================
# Filebeat can export internal metrics to a central Elasticsearch monitoring
# cluster.  This requires xpack monitoring to be enabled in Elasticsearch.  The
# reporting is disabled by default.

# Set to true to enable the monitoring reporter.
#monitoring.enabled: false

# Sets the UUID of the Elasticsearch cluster under which monitoring data for this
# Filebeat instance will appear in the Stack Monitoring UI. If output.elasticsearch
# is enabled, the UUID is derived from the Elasticsearch cluster referenced by output.elasticsearch.
#monitoring.cluster_uuid:

# Uncomment to send the metrics to Elasticsearch. Most settings from the
# Elasticsearch output are accepted here as well.
# Note that the settings should point to your Elasticsearch *monitoring* cluster.
# Any setting that is not set is automatically inherited from the Elasticsearch
# output configuration, so if you have the Elasticsearch output configured such
# that it is pointing to your Elasticsearch monitoring cluster, you can simply
# uncomment the following line.
#monitoring.elasticsearch:

# ============================== Instrumentation ===============================

# Instrumentation support for the filebeat.
#instrumentation:
    # Set to true to enable instrumentation of filebeat.
    #enabled: false

    # Environment in which filebeat is running on (eg: staging, production, etc.)
    #environment: ""

    # APM Server hosts to report instrumentation results to.
    #hosts:
    #  - http://localhost:8200

    # API Key for the APM Server(s).
    # If api_key is set then secret_token will be ignored.
    #api_key:

    # Secret token for the APM Server(s).
    #secret_token:


# ================================= Migration ==================================

# This allows to enable 6.7 migration aliases
#migration.6_to_7.enabled: true

执行filebeat启动命令

 ./filebeat -e -c filebeat.yml

后台启动命令

nohup ./filebeat -e -c filebeat.yml >/dev/null 2>&1 &

停止命令

ps -ef |grep filebeat
kill -9 进程号

六、测试配置是否正确

1、测试filebeat是否能够采集log文件并发送到Kafka

在kafka服务器开启消费者，监听api_log主题和operation_log主题

./kafka-console-consumer.sh --bootstrap-server 172.16.20.221:9092 --topic api_log

./kafka-console-consumer.sh --bootstrap-server 172.16.20.222:9092 --topic operation_log

手动写入日志文件，按照filebeat配置的采集目录写入

echo "api log1111" > /data/gitegg/log/gitegg-service-system/api.log

echo "operation log1111" > /data/gitegg/log/gitegg-service-system/operation.log

观察消费者是消费到日志推送内容

2、测试logstash是消费Kafka的日志主题，并将日志内容存入Elasticsearch

手动写入日志文件

echo "api log8888888888888888888888" > /data/gitegg/log/gitegg-service-system/api.log
echo "operation loggggggggggggggggggg" > /data/gitegg/log/gitegg-service-system/operation.log

打开Elasticsearch Head界面 http://172.16.20.220:9100/?auth_user=elastic&auth_password=123456 ，查询Elasticsearch是否有数据。

自动新增的两个index，规则是logstash中配置的

数据浏览页可以看到Elasticsearch中存储的日志数据内容，说明我们的配置已经生效。

七、配置Kibana用于日志统计和展示

依次点击左侧菜单Management -> Kibana -> Data Views -> Create data view , 输入logstash_* ,选择@timestamp，再点击Create data view按钮，完成创建。
点击日志分析查询菜单Analytics -> Discover，选择logstash_* 进行日志查询

GitEgg-Cloud是一款基于SpringCloud整合搭建的企业级微服务应用开发框架，开源项目地址:

Gitee: https://gitee.com/wmz1930/GitEgg
GitHub: https://github.com/wmz1930/GitEgg

欢迎感兴趣的小伙伴Star支持一下。

你可能感兴趣的:(Maven,SpringCloud,微服务,elk,分布式)

React Query 优化数据获取与缓存策略大富大贵7 程序员知识储备1 程序员知识储备2 程序员知识储备3 vim 编辑器 linux 算法机器学习
引言随着前端应用规模与复杂度的不断提升，如何高效地获取、缓存以及同步服务端数据，成为提升用户体验和系统性能的关键课题。ReactQuery（现更名为TanStackQuery）凭借其轻量、灵活、可扩展的设计，已成为React社区管理服务端状态的事实标准库。本文将深入探讨ReactQuery在数据获取与缓存策略上的原理与实践，结合HTTP缓存理论、分布式系统一致性以及响应式编程等多学科知识，呈现一套
微服务网站开发学习路线与RuoYi-Cloud实战指南你喜欢喝可乐吗？ ruoyi-cloud microservices java web 微服务学习运维
微服务网站开发学习路线与RuoYi-Cloud实战指南微服务架构已成为现代网站开发的主流选择，它通过将大型应用拆分为小型自治服务，实现了系统的高内聚、低耦合、独立部署和扩展。掌握微服务开发技能需要系统性学习，从基础概念到技术栈再到实战应用。本文将为您提供从零开始学习微服务的完整路线图，并结合RuoYi-Cloud开源框架进行详细举例，帮助您快速上手微服务网站开发。一、微服务基础概念与架构特点微服务
Vert.x逆袭指南：像外卖小哥一样高效的异步编程哲学 —— 每秒处理百万消息的轻量级响应式引擎 zhysunny Java类库 java 后端
目录一、核心装备：Vert.x工具箱全景1.1灵魂组件：EventLoop（永不堵车的快递站）二、基础订单处理：Future与Promise模式2.1基础异步操作流程2.2并行订单冲刺三、全栈式快餐车：Vert.xWeb实战3.1打造高并发HTTP服务器3.2异步数据库连接池四、连锁加盟模式：Vert.x集群4.1构建分布式披萨联盟五、响应式编程的味觉革命：四大核心优势5.1性能对比实验（单节点）
5.k8s：helm包管理器，prometheus监控，elk，k8s可视化鹏哥哥啊Aaaa 运维 kubernetes 容器云原生
目录一、Helm包管理器1.什么是Helm2.安装Helm（3）Helm常用命令（4）目录结构（5）使用Helm完成redis主从搭建二、Prometheus集群监控1.监控方案2.Prometheus监控k8s三、ELK日志搜集1.elk流程2.配置elk（1）配置es（2）配置logstash（3）配置filebeat，kibana3.kibana使用和日志检索四、k8s可视化管理1.Dash
微服务架构核心技术我是廖志伟 Java场景面试宝典 Service Governance Microservices Distributed Systems
我是廖志伟，一名Java开发工程师、《Java项目实战——深入理解大型互联网企业通用技术》（基础篇）、（进阶篇）、（架构篇）清华大学出版社签约作家、Java领域优质创作者、CSDN博客专家、阿里云专家博主、51CTO专家博主、产品软文专业写手、技术文章评审老师、技术类问卷调查设计师、幕后大佬社区创始人、开源项目贡献者。拥有多年一线研发和团队管理经验，研究过主流框架的底层源码(Spring、Spri
互联网大厂java求职者面试我是廖志伟 Java场景面试宝典 java 八股文面试求职 Java
我是廖志伟，一名Java开发工程师，清华大学出版社签约作家、Java领域优质创作者、CSDN博客专家、阿里云专家博主、51CTO专家博主、产品软文专业写手、技术文章评审老师、技术类问卷调查设计师、幕后大佬社区创始人、开源项目贡献者。拥有多年一线研发和团队管理经验，研究过主流框架的底层源码(Spring、SpringBoot、SpringMVC、SpringCloud、Mybatis、Dubbo、Z
python分布式爬虫打造搜索引擎--------scrapy实现 weixin_30515513 爬虫 python 开发工具
http://www.cnblogs.com/jinxiao-pu/p/6706319.html最近在网上学习一门关于scrapy爬虫的课程，觉得还不错，以下是目录还在更新中，我觉得有必要好好的做下笔记，研究研究。第1章课程介绍1-1python分布式爬虫打造搜索引擎简介07:23第2章windows下搭建开发环境2-1pycharm的安装和简单使用10:272-2mysql和navicat的安装
从0到1：Maven下载安装与配置全攻略
目录一、Maven是什么二、为什么需要Maven2.1依赖管理难题2.2构建过程的复杂性三、下载Maven3.1下载前准备3.2下载步骤四、安装Maven4.1解压安装包4.2移动Maven文件夹（可选，Linux适用）五、配置Maven5.1配置环境变量5.2验证安装5.3配置本地仓库5.4配置远程仓库（可选）六、总结一、Maven是什么Maven是一个跨平台的项目管理工具，主要服务于基于Jav
Java大厂面试实录：从Spring Boot到AI微服务架构的深度技术拷问
第一轮提问面试官：小曾，今天我们主要考察Java后端开发能力，从基础开始。场景：假设你要设计一个电商平台的订单系统，订单量峰值达到每秒1000笔。你会选择哪些技术栈？为什么？场景：订单系统需要高可用，数据库选择MySQL，你会如何优化数据库连接池？场景：订单支付后需要通知库存系统减库存，你会选择哪种消息队列？如何保证消息可靠性？小曾：（搓手）嗯…订单系统，我会用SpringBoot，数据库用MyS
Java全栈面试实录：从Spring Boot到AI大模型，互联网大厂求职者的技术洗礼
**第一轮提问面试官：小曾，先谈谈你在SpringBoot项目中的缓存实践。小曾：我常用Redis，通过@Cacheable注解实现方法缓存，配置了Redis集群模式。面试官：很好！在电商秒杀场景，如果缓存击穿怎么办？小曾：可以用布隆过滤器或互斥锁解决，但具体实现得看业务...面试官：你提到SpringCloud，能说说服务注册选Consul还是Eureka？小曾：Eureka简单，Consul更
Java大厂面试实录：从电商场景到AIGC的深度技术拷问 remCoding Java场景面试宝典 Java面试 Spring Boot Kafka AI 大厂面试微服务
第一轮提问：电商场景与微服务基础面试官：小曾，请描述一个典型的电商秒杀场景，你会如何设计系统架构？涉及哪些关键技术？小曾：秒杀嘛，主要是高并发，我一般会用SpringBoot搭后端，数据库用Redis做缓存，消息队列用Kafka异步处理订单。具体技术细节……呃，好像没细想。面试官（微笑）：“不错，Redis和Kafka选得对。那如果用户请求量超10万/QPS，你会如何扩容？SpringCloud的
Java大厂面试实录：从Spring Boot到AI微服务架构的深度拷问 remCoding Java场景面试宝典 Java面试 Spring Boot Jakarta EE AI微服务 Kafka Spring Cloud AI面试
第一轮提问：电商场景下的高并发架构面试官：小曾，我们公司电商业务面临“双十一”秒杀场景，需要支持百万级并发，你会如何设计系统架构？请结合SpringCloud和消息队列谈谈方案。小曾：（搓手）额……我会用SpringCloudAlibaba，搞个Nacos做服务注册，网关用Zuul，然后订单服务用SpringBoot+Redis缓存，秒杀请求走消息队列，比如Kafka吧，异步处理，降低峰值压力……
Java大厂面试实录：从Spring Boot到AI微服务架构的层层递进 remCoding Java场景面试宝典 Java Spring Boot Spring Cloud AI Kafka Redis Microservices
场景：互联网大厂Java后端面试面试官（严肃）：请简单介绍下你参与过的项目，主要使用哪些技术栈？小曾（自信）：我参与过电商平台的订单系统，用了SpringBoot+SpringCloudAlibaba，数据库是MySQL+Redis缓存，消息队列用Kafka处理异步任务。面试官（点头）：不错，能具体说说订单系统如何应对高并发场景的吗？小曾：我们用了HikariCP优化数据库连接池，Redis集群做
Java大厂面试实录：从Spring Boot到AI微服务架构的全栈技术深度解析 remCoding Java场景面试宝典 Java Spring Boot Spring Cloud AI Kafka Redis Spring Security
场景：互联网大厂Java后端开发面试面试官（严肃）：请先自我介绍，并谈谈你熟悉的技术栈。小曾（略紧张）：我是小曾，毕业于XX大学，擅长Java后端开发，熟悉SpringBoot、SpringCloud、MySQL、Redis等技术。面试官：很好，我们来看第一个场景。假设你要设计一个高并发的电商秒杀系统，你会如何选择技术栈？小曾：秒杀系统对性能要求高，我会用SpringBoot快速搭建，数据库用My
Java大厂面试实录：从Spring Boot到AI微服务架构的深度技术挑战 remCoding Java场景面试宝典 Java Spring Boot Spring Cloud AI Kafka Redis Docker
场景：互联网大厂Java后端开发面试面试官（严肃）：小曾，请简单介绍下你过往的项目经验，特别是你在微服务架构中解决过哪些技术难题？小曾（自信）：我之前参与过电商平台的订单系统重构，将单体应用拆分为SpringCloud微服务架构。我们使用了SpringCloudGateway做网关路由，服务间通过Kafka异步通信，并引入Redis缓存热点数据。面试官：很好，能具体说说你们如何解决订单超卖问题的吗
Java大厂面试实录：从Spring Boot到AI微服务架构的深度技术拷问 remCoding Java场景面试宝典 Java面试 Spring Boot Jakarta EE AI微服务 Kafka Redis Spring AI
场景：互联网大厂Java后端面试面试官（严肃）：小曾，请先简单介绍下你过往的项目经验，侧重于高并发场景下的架构设计。小曾（自信）：我之前做过一个电商秒杀系统，用了SpringBoot和Redis，高峰期支撑了百万QPS。主要靠Redis缓存热点数据，数据库用了分库分表。面试官（点头）：不错，能具体说说缓存雪崩和热点key的解决方案吗？小曾（挠头）：呃...缓存雪崩用了熔断器，热点key的话...好
告别内存焦虑！用Dask打开Python大数据并行计算的“任意门“ 小张在编程 python 大数据开发语言
引言当你在Jupyter里用Pandas读取20GB的CSV文件，看到内存占用率从10%飙升到90%，最后弹出"MemoryError"时；当你想对亿级数据做分组聚合，却发现单线程计算要等上半小时——这些场景是不是像极了用小推车搬运万吨货物？Python生态中，Dask库就像一台"并行计算推土机"，能把大数据拆分成小块并行处理，让你的普通电脑也能拥有分布式计算的能力。本文将从原理到实战，带你掌握这
网络爬虫-07 YEGE学AI算法 Python-网络爬虫
网络爬虫-07）**Spider06回顾****scrapy框架****完成scrapy项目完整流程****我们必须记住****爬虫项目启动方式****数据持久化存储****Spider07笔记****分布式爬虫****scrapy_redis详解****腾讯招聘分布式改写****机器视觉与tesseract****补充-滑块缺口验证码案例****豆瓣网登录****Fiddler抓包工具****移
【Python爬虫(26)】Python爬虫进阶：数据清洗与预处理的魔法秘籍奔跑吧邓邓子 Python爬虫 python 爬虫开发语言数据清洗预处理
【Python爬虫】专栏简介：本专栏是Python爬虫领域的集大成之作，共100章节。从Python基础语法、爬虫入门知识讲起，深入探讨反爬虫、多线程、分布式等进阶技术。以大量实例为支撑，覆盖网页、图片、音频等各类数据爬取，还涉及数据处理与分析。无论是新手小白还是进阶开发者，都能从中汲取知识，助力掌握爬虫核心技能，开拓技术视野。目录一、数据清洗的重要性二、数据清洗的常见任务2.1去除噪声数据2.2
Proto文件从入门到精通——现代分布式系统通信的基石（含实战案例）筏.k gRPC c++rpc 服务器
gRPC核心技术详解：Proto文件从入门到精通——现代分布式系统通信的基石（含实战案例）更新时间：2025年7月18日️标签：gRPC|ProtocolBuffers|Proto文件|微服务|分布式系统|RPC通信|接口定义文章目录前言一、基础概念：Proto文件究竟是什么？1.什么是Proto文件？2.传统通信vsProto通信二、语法详解：Proto文件的构成要素1.基本语法结构2.数据类型
关于Spring RestTemplate
一、概述RestTemplate是SpringFramework提供的一个同步HTTP客户端工具，用于简化与RESTfulAPI的交互。它封装了底层HTTP通信细节，提供了统一的API来发送各种HTTP请求（GET、POST、PUT、DELETE等），并自动处理响应数据的序列化和反序列化。二、依赖配置如果使用Maven项目，需要在pom.xml中添加以下依赖：xml org.springfram
【橘子分布式】Thrift RPC(编程篇) 当年明日分布式分布式 rpc 网络协议
一、简介之前我们研究了一下thrift的一些知识，我们知道他是一个rpc框架，他作为rpc自然是提供了客户端到服务端的访问以及两端数据传输的消息序列化，消息的协议解析和传输，所以我们今天就来了解一下他是如何实现这些功能，并且如何在实际代码中使用。我们需要搭建环境。1.安装Thrift作用：把IDL语言描述的接口内容，生成对应编程语言的代码，简化开发。我们已经介绍了在mac如何使用brew安装了。2
分布式弹性故障处理框架——Polly(1)
1前言之服务雪崩在我们实施微服务之后，服务间的调用变得异常频繁，多个服务之前可能存在互相依赖的关系，当某个服务出现故障或者是因为服务间的网络出现故障，导致服务调用的失败，进而影响到某个业务服务处理失败，服务依赖的故障可能导致级联崩溃，如一个微服务不可用拖垮整个系统。【服务雪崩】服务雪崩通常遵循“从局部故障到全局崩溃”的递进路径，可拆解为以下步骤：初始故障某个基础服务（如数据库、缓存、第三方API）
插板式系统的“生命线“：EtherCAT分布式供电该如何实现？ ZLG 致远电子 iot
在ZIO系列插板式模组系统中，EtherCAT分布式供电如同设备的血液循环网络，其供电稳定性直接决定系统可靠性。本文将从电流计算到电源扩展，为您讲解EtherCAT分布式供电该如何实现。ZIO系列插板式模组的电源介绍ZIO系列插板式I/O模块是ZLG开发的可灵活设计的远程I/O扩展模块。该系列产品由耦合器、数字I/O、电机驱动、模拟量、电源等功能模块组成。ZIO系列可以通过定制化的底板集成各类接口
GPU网络运维一行代码通万物网络运维 GPU
一、GPU网络架构与核心技术GPU集群网络需适配分布式训练中“多节点数据同步”（如all-reduce、broadcast）的高频、大流量需求，主流技术方案及特点如下：网络技术核心优势适用场景运维重点InfiniBand低延迟（~1us）、高带宽（400Gb/s）、原生RDMA支持超大规模集群（≥1000节点）、千亿参数模型训练子网管理、固件兼容性、链路健康RoCE（RDMAoverConverg
云原生环境中Consul的动态服务发现实践 AI云原生与云计算技术学院 AI云原生与云计算云原生 consul 服务发现 ai
云原生环境中Consul的动态服务发现实践关键词：云原生,服务发现,Consul,微服务,动态注册,健康检查,Raft算法摘要：本文深入探讨云原生环境下Consul在动态服务发现中的核心原理与实践方法。通过剖析Consul的架构设计、核心算法和关键机制，结合具体代码案例演示服务注册、发现和健康检查的全流程。详细阐述在Kubernetes、Docker等云原生技术栈中的集成方案，分析实际应用场景中的
达梦分布式集群DPC_DPC线程深度解析_yxy yxy___ 达梦分布式集群分布式线程 DPC
达梦分布式集群DPC_DPC线程深度解析1.DPC专用线程体系1.1DPC线程池分类1.1.1底层公共线程池1.1.2上层专用线程池1.2线程管理模式1.2.1生产者-消费者模式1.2.2领导者跟随者模式2.DPC线程相关视图2.1THREADS2.2DPC_STASK_THRD2.3关键列解释3.DPC线程管理监控3.1sql卡顿，找出关键线程分析3.2完整sql执行示例1.DPC专用线程体系文
Redis面试精讲 Day 3：Redis持久化机制详解在未来等你 Redis面试专栏 Redis 面试题持久化 RDB AOF 数据库缓存
【Redis面试精讲Day3】Redis持久化机制详解文章标签Redis,面试题,持久化,RDB,AOF,数据库,缓存,后端开发,分布式系统文章简述本文是"Redis面试精讲"系列第3天内容，深入解析Redis持久化机制这一面试高频考点。文章从基础概念出发，详细剖析RDB和AOF两种持久化方式的实现原理、触发机制和优缺点对比，提供多语言客户端操作示例和性能测试数据。针对"如何选择持久化策略"、"A
Hadoop与云原生集成：弹性扩缩容与OSS存储分离架构深度解析
Hadoop与云原生集成的必要性Hadoop在大数据领域的基石地位作为大数据处理领域的奠基性技术，Hadoop自2006年诞生以来已形成包含HDFS、YARN、MapReduce三大核心组件的完整生态体系。根据CSDN技术社区的分析报告，全球超过75%的《财富》500强企业仍在使用Hadoop处理EB级数据，其分布式文件系统HDFS通过数据分片（默认128MB块大小）和三副本存储机制，成功解决了P
分布式系统中优化ELK日志采集性能 Alex艾力的IT数字空间 elk 微服务中间件架构 ux 安全性测试可用性测试
架构设计、组件调优、资源分配等多维度入手一、架构优化：分布式与解耦设计分层采集与缓冲Filebeat轻量级采集：在每台服务器部署Filebeat替代Logstash作为日志收集器，降低资源占用（CPU/内存减少70%以上）。引入缓冲队列：通过Redis或Kafka作为日志缓冲池，缓解Logstash或Elasticsearch的突发流量压力，避免数据丢失（如Logstash异常时Redis暂存数据
mongodb3.03开启认证 21jhf mongodb
下载了最新mongodb3.03版本，当使用--auth 参数命令行开启mongodb用户认证时遇到很多问题，现总结如下：（百度上搜到的基本都是老版本的，看到db.addUser的就是，请忽略） Windows下我做了一个bat文件，用来启动mongodb，命令行如下： mongod --dbpath db\data --port 27017 --directoryperdb --logp
【Spark103】Task not serializable bit1129 Serializable
Task not serializable是Spark开发过程最令人头疼的问题之一，这里记录下出现这个问题的两个实例，一个是自己遇到的，另一个是stackoverflow上看到。等有时间了再仔细探究出现Task not serialiazable的各种原因以及出现问题后如何快速定位问题的所在，至少目前阶段碰到此类问题，没有什么章法 1. package spark.exampl
你所熟知的 LRU(最近最少使用) dalan_123 java
关于LRU这个名词在很多地方或听说，或使用，接下来看下lru缓存回收的实现 1、大体的想法 a、查询出最近最晚使用的项 b、给最近的使用的项做标记通过使用链表就可以完成这两个操作，关于最近最少使用的项只需要返回链表的尾部；标记最近使用的项，只需要将该项移除并放置到头部，那么难点就出现你如何能够快速在链表定位对应的该项？这时候多
Javascript 跨域周凡杨 JavaScript jsonp 跨域 cross-domain
linux下安装apache服务器 g21121 apache
安装apache 下载windows版本apache，下载地址：http://httpd.apache.org/download.cgi 1.windows下安装apache Windows下安装apache比较简单，注意选择路径和端口即可，这里就不再赘述了。 2.linux下安装apache：下载之后上传到linux的相关目录，这里指定为/home/apach
FineReport的JS编辑框和URL地址栏语法简介老A不折腾 finereport web报表报表软件语法总结
JS编辑框： 1.FineReport的js。作为一款BS产品，browser端的JavaScript是必不可少的。 FineReport中的js是已经调用了finereport.js的。大家知道，预览报表时，报表servlet会将cpt模板转为html，在这个html的head头部中会引入FineReport的js，这个finereport.js中包含了许多内置的fun
根据STATUS信息对MySQL进行优化墙头上一根草 status
mysql 查看当前正在执行的操作，即正在执行的sql语句的方法为: show processlist 命令 mysql> show global status;可以列出MySQL服务器运行各种状态值，我个人较喜欢的用法是show status like '查询值%';一、慢查询mysql> show variab
我的spring学习笔记7-Spring的Bean配置文件给Bean定义别名 aijuans Spring 3
本文介绍如何给Spring的Bean配置文件的Bean定义别名？原始的 <bean id="business" class="onlyfun.caterpillar.device.Business"> <property name="writer"> <ref b
高性能mysql 之性能剖析 annan211 性能 mysql mysql 性能剖析剖析
1 定义性能优化 mysql服务器性能，此处定义为响应时间。在解释性能优化之前，先来消除一个误解，很多人认为，性能优化就是降低cpu的利用率或者减少对资源的使用。这是一个陷阱。资源时用来消耗并用来工作的，所以有时候消耗更多的资源能够加快查询速度，保持cpu忙绿，这是必要的。很多时候发现编译进了新版本的InnoDB之后，cpu利用率上升的很厉害，这并不
主外键和索引唯一性约束百合不是茶索引唯一性约束主外键约束联机删除
目标;第一步;创建两张表用户表和文章表第二步;发表文章 1,建表; ---用户表 BlogUsers --userID唯一的 --userName --pwd --sex create
线程的调度 bijian1013 java 多线程 thread 线程的调度 java多线程
1. Java提供一个线程调度程序来监控程序中启动后进入可运行状态的所有线程。线程调度程序按照线程的优先级决定应调度哪些线程来执行。 2. 多数线程的调度是抢占式的（即我想中断程序运行就中断，不需要和将被中断的程序协商） a)
查看日志常用命令 bijian1013 linux 命令 unix
一.日志查找方法，可以用通配符查某台主机上的所有服务器grep "关键字" /wls/applogs/custom-*/error.log 二.查看日志常用命令1.grep '关键字' error.log：在error.log中搜索'关键字'2.grep -C10 '关键字' error.log：显示关键字前后10行记录3.grep '关键字' error.l
【持久化框架MyBatis3一】MyBatis版HelloWorld bit1129 helloworld
MyBatis这个系列的文章，主要参考《Java Persistence with MyBatis 3》。样例数据本文以MySQL数据库为例，建立一个STUDENTS表，插入两条数据，然后进行单表的增删改查 CREATE TABLE STUDENTS ( stud_id int(11) NOT NULL AUTO_INCREMENT,
【Hadoop十五】Hadoop Counter bit1129 hadoop
1. 只有Map任务的Map Reduce Job File System Counters FILE: Number of bytes read=3629530 FILE: Number of bytes written=98312 FILE: Number of read operations=0 FILE: Number of lar
解决Tomcat数据连接池无法释放 ronin47 tomcat 连接池　优化
近段时间，公司的检测中心报表系统(SMC)的开发人员时不时找到我，说用户老是出现无法登录的情况。前些日子因为手头上有Jboss集群的测试工作，发现用户不能登录时，都是在Tomcat中将这个项目Reload一下就好了，不过只是治标而已，因为大概几个小时之后又会再次出现无法登录的情况。今天上午，开发人员小毛又找到我，要我协助将这个问题根治一下，拖太久用户难保不投诉。简单分析了一
java-75-二叉树两结点的最低共同父结点 bylijinnan java
import java.util.LinkedList; import java.util.List; import ljn.help.*; public class BTreeLowestParentOfTwoNodes { public static void main(String[] args) { /* * node data is stored in
行业垂直搜索引擎网页抓取项目 carlwu Lucene Nutch Heritrix Solr
公司有一个搜索引擎项目，希望各路高人有空来帮忙指导，谢谢！这是详细需求：（1）通过提供的网站地址(大概100-200个网站)，网页抓取程序能不断抓取网页和其它类型的文件（如Excel、PDF、Word、ppt及zip类型），并且程序能够根据事先提供的规则，过滤掉不相干的下载内容。（2）程序能够搜索这些抓取的内容，并能对这些抓取文件按照油田名进行分类，然后放到服务器不同的目录中。
[通讯与服务]在总带宽资源没有大幅增加之前,不适宜大幅度降低资费 comsci 资源
降低通讯服务资费，就意味着有更多的用户进入，就意味着通讯服务提供商要接待和服务更多的用户，在总体运维成本没有由于技术升级而大幅下降的情况下，这种降低资费的行为将导致每个用户的平均带宽不断下降，而享受到的服务质量也在下降，这对用户和服务商都是不利的。。。。。。。。 &nbs
Java时区转换及时间格式 Cwind java
本文介绍Java API 中 Date, Calendar, TimeZone和DateFormat的使用，以及不同时区时间相互转化的方法和原理。问题描述：向处于不同时区的服务器发请求时需要考虑时区转换的问题。譬如，服务器位于东八区（北京时间，GMT+8:00），而身处东四区的用户想要查询当天的销售记录。则需把东四区的“今天”这个时间范围转换为服务器所在时区的时间范围。
readonly,只读，不可用 dashuaifu js jsp disable readOnly readOnly
readOnly 和 readonly 不同，在做js开发时一定要注意函数大小写和jsp黄线的警告！！！我就经历过这么一件事：使用readOnly在某些浏览器或同一浏览器不同版本有的可以实现“只读”功能，有的就不行，而且函数readOnly有黄线警告！！！就这样被折磨了不短时间！！！（期间使用过disable函数，但是发现disable函数之后后台接收不到前台的的数据！！！）
LABjs、RequireJS、SeaJS 介绍 dcj3sjt126com js Web
LABjs 的核心是 LAB（Loading and Blocking）：Loading 指异步并行加载，Blocking 是指同步等待执行。LABjs 通过优雅的语法（script 和 wait）实现了这两大特性，核心价值是性能优化。LABjs 是一个文件加载器。RequireJS 和 SeaJS 则是模块加载器，倡导的是一种模块化开发理念，核心价值是让 JavaScript 的模块化开发变得更
[应用结构]入口脚本 dcj3sjt126com PHP yii2
入口脚本入口脚本是应用启动流程中的第一环，一个应用（不管是网页应用还是控制台应用）只有一个入口脚本。终端用户的请求通过入口脚本实例化应用并将将请求转发到应用。 Web 应用的入口脚本必须放在终端用户能够访问的目录下，通常命名为 index.php，也可以使用 Web 服务器能定位到的其他名称。控制台应用的入口脚本一般在应用根目录下命名为 yii（后缀为.php），该文
haoop shell命令 eksliang hadoop hadoop shell
cat chgrp chmod chown copyFromLocal copyToLocal cp du dus expunge get getmerge ls lsr mkdir movefromLocal mv put rm rmr setrep stat tail test text
MultiStateView不同的状态下显示不同的界面 gundumw100 android
只要将指定的view放在该控件里面，可以该view在不同的状态下显示不同的界面，这对ListView很有用，比如加载界面，空白界面，错误界面。而且这些见面由你指定布局，非常灵活。 PS：ListView虽然可以设置一个EmptyView，但使用起来不方便，不灵活，有点累赘。 <com.kennyc.view.MultiStateView xmlns:android=&qu
jQuery实现页面内锚点平滑跳转 ini JavaScript html jquery html5 css
平时我们做导航滚动到内容都是通过锚点来做，刷的一下就直接跳到内容了，没有一丝的滚动效果，而且 url 链接最后会有“小尾巴”，就像#keleyi，今天我就介绍一款 jquery 做的滚动的特效，既可以设置滚动速度，又可以在 url 链接上没有“小尾巴”。效果体验：http://keleyi.com/keleyi/phtml/jqtexiao/37.htmHTML文件代码： &
kafka offset迁移 kane_xie kafka
在早前的kafka版本中（0.8.0），offset是被存储在zookeeper中的。到当前版本（0.8.2）为止，kafka同时支持offset存储在zookeeper和offset manager（broker）中。从官方的说明来看，未来offset的zookeeper存储将会被弃用。因此现有的基于kafka的项目如果今后计划保持更新的话，可以考虑在合适
android > 搭建 cordova 环境 mft8899 android
1 , 安装 node.js http://nodejs.org node -v 查看版本 2, 安装 npm 可以先从 https://github.com/isaacs/npm/tags 下载源码解压到
java封装的比较器，比较是否全相同，获取不同字段名字 qifeifei
非常实用的java比较器，贴上代码： import java.util.HashSet; import java.util.List; import java.util.Set; import net.sf.json.JSONArray; import net.sf.json.JSONObject; import net.sf.json.JsonConfig; i
记录一些函数用法 .Aky. 位运算 PHP 数据库函数 IP
高手们照旧忽略。想弄个全天朝IP段数据库，找了个今天最新更新的国内所有运营商IP段，copy到文件，用文件函数，字符串函数把玩下。分割出startIp和endIp这样格式写入.txt文件，直接用phpmyadmin导入.csv文件的形式导入。（生命在于折腾，也许你们觉得我傻X，直接下载人家弄好的导入不就可以，做自己的菜鸟，让别人去说吧）当然用到了ip2long()函数把字符串转为整型数
sublime text 3 rust wudixiaotie Sublime Text
1.sublime text 3 => install package => Rust 2.cd ~/.config/sublime-text-3/Packages 3.mkdir rust 4.git clone https://github.com/sp0/rust-style 5.cd rust-style 6.cargo build --release 7.ctrl