PS: the IPs and code used in this article have been sanitized.
For background, please see this article: https://editor.csdn.net/md/?articleId=122735525
Kafka broker addresses:
kafka_broker_list = {
    { host = "192.168.1.2", port = 9092 },
    { host = "192.168.1.3", port = 9092 },
    { host = "192.168.1.1", port = 9092 }
}
Producer script (using lua-resty-kafka):

local producer = require "resty.kafka.producer"

local kafka_broker_list = {
    { host = "192.168.1.2", port = 9092 },
    { host = "192.168.1.3", port = 9092 },
    { host = "192.168.1.1", port = 9092 }
}
local kafka_topic_app = "topic_app"

-- sync producer; refresh_interval is how often broker metadata is refreshed (in ms)
local p = producer:new(kafka_broker_list, { producer_type = "sync", refresh_interval = 10000 })
-- body is the request payload to be reported (read from the request elsewhere)
local offset, err = p:send(kafka_topic_app, nil, body)
if err then
    ngx.log(ngx.ERR, "failed to send to kafka: ", err)
end
I packaged my service as a Docker image, using debian10 as the base image.
The test was carried out with Docker installed on a server.
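The article does not show the Dockerfile; a hypothetical build command for the image tag used below might look like this:

```shell
# Hypothetical: build the service image from the (unshown) Dockerfile
# in the current directory, tagged as used in the docker run command below.
docker build -t nginx-kafka:0.0.1 .
```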
docker run -d -it -p 0.0.0.0:9000:8080 --name nginx-kafka --privileged=true nginx-kafka:0.0.1 /bin/bash
Notes:
-p 0.0.0.0:9000:8080
    Exposes port 9000 on the ECS host (10.11.12.13); traffic is forwarded to port 8080 inside the container.
--privileged=true
    Grants extended privileges; without it, even the root user inside the container cannot operate iptables.

Install iptables inside the container:
apt-get install iptables -y

Then test connectivity to the three brokers:
telnet 192.168.1.1 9092
telnet 192.168.1.2 9092
telnet 192.168.1.3 9092
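As an alternative to telnet, the same connectivity check can be scripted with bash's built-in /dev/tcp pseudo-device (a sketch; broker IPs as above):

```shell
# Check TCP reachability of each broker without needing telnet installed.
# Each attempt is capped at 2 seconds via the coreutils timeout command.
for ip in 192.168.1.1 192.168.1.2 192.168.1.3; do
  if timeout 2 bash -c "echo > /dev/tcp/$ip/9092" 2>/dev/null; then
    echo "$ip:9092 reachable"
  else
    echo "$ip:9092 unreachable"
  fi
done
```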
Block the first broker:
iptables -A OUTPUT -d 192.168.1.1 -j DROP
telnet 192.168.1.1 9092 now hangs, trying to connect indefinitely, yet nginx's error.log shows no output at this point.
Calling the reporting endpoint with Postman shows that sending the Kafka message times out.
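For reference, the Postman call can be reproduced with curl against the mapped host port; the endpoint path and request body here are placeholders, since the original does not show them:

```shell
# Hypothetical reporting call; /report and the JSON body are assumptions.
# 10.11.12.13:9000 is the ECS host port mapped to the container's 8080.
curl -i -X POST http://10.11.12.13:9000/report \
     -H 'Content-Type: application/json' \
     -d '{"event":"test"}'
```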
Block the second broker:
iptables -A OUTPUT -d 192.168.1.2 -j DROP
telnet 192.168.1.2 9092 can no longer connect.
A Postman test of the reporting endpoint at this point gives the same result as in section 3.5.
Block the third broker:
iptables -A OUTPUT -d 192.168.1.3 -j DROP
The log now contains connect errors for 192.168.1.1:9092, 192.168.1.2:9092 and 192.168.1.3:9092, as well as fetch_metadata errors:
2023/03/22 14:57:56 [error] 51#0: *1450 lua tcp socket connect timed out, when connecting to 192.168.1.1:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 14:57:56 [error] 51#0: *1450 [lua] client.lua:151: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 14:57:56 [error] 51#0: *1452 lua tcp socket connect timed out, when connecting to 192.168.1.1:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 14:57:56 [error] 51#0: *1452 [lua] client.lua:151: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 14:57:58 [error] 51#0: *1466 lua tcp socket connect timed out, when connecting to 192.168.1.2:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 14:58:00 [error] 51#0: *1469 lua tcp socket connect timed out, when connecting to 192.168.1.2:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 14:58:01 [error] 51#0: *1466 lua tcp socket connect timed out, when connecting to 192.168.1.3:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 14:58:03 [error] 51#0: *1469 lua tcp socket connect timed out, when connecting to 192.168.1.3:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 14:58:04 [error] 51#0: *1466 lua tcp socket connect timed out, when connecting to 192.168.1.1:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 14:58:04 [error] 51#0: *1466 [lua] client.lua:151: _fetch_metadata(): all brokers failed in fetch topic metadata, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
A Postman test of the reporting endpoint at this point again gives the same result as in section 3.5.
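At this point all three DROP rules are in place; they can be listed as follows (requires root, hence --privileged):

```shell
# List the OUTPUT chain with rule numbers to confirm the three DROP rules
# added above; -n skips reverse DNS lookups.
iptables -L OUTPUT -n --line-numbers
```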
Unblock the first broker:
iptables -t filter -D OUTPUT -d 192.168.1.1 -j DROP
Test connectivity:
2023/03/22 15:02:22 [error] 51#0: *1912 lua tcp socket connect timed out, when connecting to 192.168.1.2:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 15:02:23 [error] 51#0: *1915 lua tcp socket connect timed out, when connecting to 192.168.1.2:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 15:02:23 [error] 51#0: *1917 lua tcp socket connect timed out, when connecting to 192.168.1.2:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 15:02:25 [error] 51#0: *1912 lua tcp socket connect timed out, when connecting to 192.168.1.3:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 15:02:26 [error] 51#0: *1915 lua tcp socket connect timed out, when connecting to 192.168.1.3:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
2023/03/22 15:02:26 [error] 51#0: *1917 lua tcp socket connect timed out, when connecting to 192.168.1.3:9092, context: ngx.timer, client: 192.168.1.4, server: 0.0.0.0:8080
Miraculously, reporting works again.
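To restore full connectivity after the test, the two remaining rules can be deleted in the same way:

```shell
# Remove the remaining DROP rules for the other two brokers.
iptables -t filter -D OUTPUT -d 192.168.1.2 -j DROP
iptables -t filter -D OUTPUT -d 192.168.1.3 -j DROP
```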
While 192.168.1.1 was blocked and the other two IPs were left open, tracking events could not be sent to Kafka; no failover took place.
Once the block on 192.168.1.1 was lifted, tracking events were delivered to Kafka correctly again, so we can conclude that Kafka delivery recovers as soon as this IP becomes reachable again.
By extension, if a particular IP stays unreachable, request messages cannot be delivered to Kafka correctly; once that IP recovers, delivery resumes.
-------------------------->20230327 update<-------------------------------
After finishing this article, I gathered more material and consulted a Kafka expert: the test above is flawed. Kafka uses a single-leader, multi-follower architecture. As long as the leader node itself is healthy, the cluster metadata keeps reporting the leader as healthy; a network block on the client side does not mean the leader is down. Only when the leader actually fails does ZooKeeper elect a new leader, and once the client receives that updated metadata it fails over automatically.
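One way to observe which broker currently leads each partition is the standard Kafka CLI (assuming the tools are available on a host that can reach the cluster; topic name taken from the test above):

```shell
# Describe the topic: the "Leader:" field shows the broker id that currently
# leads each partition, as recorded in the cluster metadata.
kafka-topics.sh --describe --topic topic_app \
  --bootstrap-server 192.168.1.2:9092
```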
I opened an issue describing the test above on the author's GitHub repository, and was happy to receive a patient answer from the author:
In summary: when the network path to the leader goes from blocked back to open, the client can reconnect to the leader and resume sending messages.