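One weekend, the business teams repeatedly reported that order node records were being lost!
The system receives the order node records reported by the various business parties. In my design, every report request is put into an MQ, and a separate thread pool consumes that MQ on a schedule, pulling 10 messages at a time and running the downstream business logic on them.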
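I: Problem reported by customer service:
Record loss was severe, far above the previous loss rate.
II: Log analysis and observed symptoms:
Every record reported by unipay that contained node 1101 was lost. When such a record sat in the middle of a batch of 10 messages fetched in one pull, that record and every record after it in the batch were lost.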
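A symptom like this usually points at a loop that stops at the first failure. The sketch below is only an illustration of that pattern, assuming an exception raised for one message ends processing of the whole batch; `BatchAbortDemo`, `processBatch` and the node codes other than 1101 are hypothetical names, not taken from the real consumer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the real consumer; all names are illustrative.
class BatchAbortDemo {

    /** Processes messages in order and returns the ones that were persisted. */
    static List<String> processBatch(List<String> batch, String failingNode) {
        List<String> persisted = new ArrayList<>();
        try {
            for (String node : batch) {
                if (node.equals(failingNode)) {
                    // Stands in for whatever fails while handling this message.
                    throw new IllegalStateException("processing failed for node " + node);
                }
                persisted.add(node);
            }
        } catch (IllegalStateException e) {
            // The try block wraps the whole loop, so the batch is abandoned here:
            // the failing message and everything after it are never persisted.
        }
        return persisted;
    }
}
```

With a batch of `1001, 1002, 1101, 1003, 1004` and `1101` as the failing node, only `1001` and `1002` are persisted, which matches the "it and everything after it" pattern seen in the logs.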
Checking the node management page showed that node 1101 existed twice.
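III: Locating the problem:
I found the logged request payload of a unipay report containing node 1101, for example: {"code":"10902101517016","codeType":"01","currentNode":"1101","optTime":1544154263000,"platformCode":"01","remark":"","signKey":"ZFV3ME5xxxxxxxxxxEUwT1RFNU1EWxxxxxxxDQxNxxx","source":"unipay","uploadMessage":"1101","uploadTime":1544154263000}
Reporting the identical payload in the test environment persisted successfully.
Replaying the same 1101 payload in production still failed to persist.
I suspected that the production and test code differed, but comparing the code ruled that out.
At a loss, I went back to the source and read the code.
The logs showed that 1101 records were fetched from the MQ but never persisted. MQ processing writes synchronously, in order, to two places: the hzw_bts_bill table and the bill_detail_xxx table.
No data reached hzw_bts_bill at all, so I suspected the logic that inserts into hzw_bts_bill.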
// Look up an existing bill by code and code type.
HzwBtsBill hzwBtsBill = new HzwBtsBill();
hzwBtsBill.setCode(billJson.getString("code"));
hzwBtsBill.setCodeType(billJson.getString("codeType"));
HzwBtsBill hzwBtsBill2 = this.queryHzwBtsBill(hzwBtsBill);
// Previous node lookup, commented out in favor of the call below.
// HzwBtsNodeCfg hbn = DBCfg.NODE_CFG.get(billJson.get("currentNode"));
HzwBtsNodeCfg hbn = getHzwBtsNodeCfg(billJson.getString("currentNode"));
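Tracing further leads to the method that fetches the node.
getHzwBtsNodeCfg() looks up the current node's information in redis; on a miss, it queries the database and stores the result in redis.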
public
    // Try the redis hash first.
    @SuppressWarnings("unchecked")
    T t = (T) CacheHelper.getHashFieldValue(cacheKey, field, clazz);
    // Cache miss: fall back to the facade (the DB query) and cache a non-null result.
    if (t == null && facade != null) {
        t = facade.excute();
        if (t != null) {
            CacheHelper.setHashVaueBykey(cacheKey, field, t);
        }
    }
    return t;
}
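Then I remembered that last week, while working on a node idempotency requirement, I had changed the node configuration here. Because the node information we store in redis never expires, I had asked the platform team to refresh the node key in redis. That cleared all node information, so every node had to be loaded from the database again.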
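However, the method above that queries the node from the database returns a single object, so with two 1101 nodes in the database the lookup for 1101 fails with an error.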
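The interaction between the cache and the duplicate node can be reproduced in miniature. This is a sketch under stated assumptions, not the project's code: a `HashMap` stands in for the redis hash, a list of rows stands in for the node table, and `selectOneNode` imitates a single-result query that throws when more than one row matches. `NodeCacheDemo` and all of its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NodeCacheDemo {
    // In-memory map standing in for the redis hash of node configs.
    static final Map<String, String> cache = new HashMap<>();
    // In-memory list standing in for the node table; each row is {nodeCode, nodeName}.
    static final List<String[]> nodeTable = new ArrayList<>();

    /** Single-result query: returns null for no match, throws for more than one. */
    static String selectOneNode(String nodeCode) {
        List<String> matches = new ArrayList<>();
        for (String[] row : nodeTable) {
            if (row[0].equals(nodeCode)) {
                matches.add(row[1]);
            }
        }
        if (matches.size() > 1) {
            throw new IllegalStateException(
                "expected one row for node " + nodeCode + " but found " + matches.size());
        }
        return matches.isEmpty() ? null : matches.get(0);
    }

    /** Cache first; on a miss, load from the table and cache a non-null result. */
    static String getNode(String nodeCode) {
        String value = cache.get(nodeCode);
        if (value == null) {
            value = selectOneNode(nodeCode);
            if (value != null) {
                cache.put(nodeCode, value);
            }
        }
        return value;
    }
}
```

While the cache still holds an entry for 1101, `getNode("1101")` never touches the table and the duplicate row stays hidden. Once the cache is cleared, the same call reaches `selectOneNode` and throws, while nodes with a single row keep working.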
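Verifying the hypothesis:
Before redis was refreshed on December 7, node 1101 was in redis, no DB lookup was needed, and 1101 records persisted successfully: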
mysql> select * from hzw_bts_interface_result where service_type ='unipay' and create_time < '2018-12-07' order by create_time desc limit 1\G
*************************** 1. row ***************************
ID: 200617787
deal_code: 10422101497772
type: 2
service_type: unipay
result: 1
url: /ifs/upload.do
param: {"code":"10422101497772","codeType":"01","uploadMessage":"1101","optTime":1544105838,"uploadTime":1544105838,"currentNode":"1101","source":"unipay","platformCode":"01","signKey":"ZFVxxxx13TnpFPxxxxx","remark":""}
request_type: 2
description: InterfaceFilter拦截入库
fail_reason: {"data":1,"msg":"成功","code":"1"}
yn: 1
create_user: system
create_time: 2018-12-06 22:17:19
update_user: system
update_time: 2018-12-06 22:17:19
1 row in set (1.21 sec)
mysql> select * from hzw_bts_bill_detail_292 where code ='10422101497772'\G
*************************** 1. row ***************************
id: 336684005968289792
code: 10422101497772
code_type: 01
relative_code: NULL
relative_code_type: NULL
upload_message: 1101
opt_time: 2018-12-06 22:17:18
upload_time: 2018-12-06 22:17:18
current_node: 1101
is_app: 1
is_ts: 0
is_complete: 0
source: unipay
platform_code: 01
relative_message: NULL
yn: 1
create_user: NULL
create_time: 2018-12-06 22:17:19
update_user: NULL
update_time: NULL
image_urls: NULL
1 row in set (0.02 sec)
After December 7, node 1101 was no longer in redis, the DB lookup raised an error, and 1101 records could not be persisted:
mysql> select * from hzw_bts_interface_result where service_type ='unipay' and create_time > '2018-12-07 11:00:00' limit 1\G
*************************** 1. row ***************************
ID: 200760850
deal_code: 10507101511124
type: 2
service_type: unipay
result: 1
url: /ifs/upload.do
param: {"code":"10507101511124","codeType":"01","uploadMessage":"1101","optTime":1544151599,"uploadTime":1544151599,"currentNode":"1101","source":"unipay","platformCode":"01","signKey":"xxxx13TnpFxxx","remark":""}
request_type: 2
description: InterfaceFilter拦截入库
fail_reason: {"data":1,"msg":"成功","code":"1"}
yn: 1
create_user: system
create_time: 2018-12-07 11:00:01
update_user: system
update_time: 2018-12-07 11:00:01
1 row in set (0.06 sec)
mysql> select * from hzw_bts_bill_detail_916 where code ='10507101511124'\G
Empty set (0.03 sec)
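IV: Solution:
Delete the extra 1101 node in the production environment.
Wrap the code that fetches the node from the DB in a try/catch.
When adding nodes for newly onboarded business parties in the future, remember to delete any duplicate nodes.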
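The try/catch item could look roughly like the following. It is a minimal sketch assuming the lookup failure surfaces as a `RuntimeException`; `GuardedLookupDemo`, `loadQuietly` and the `skipped` list are hypothetical names introduced for illustration and do not appear in the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class GuardedLookupDemo {

    /** Runs the DB lookup; a failure is logged and reported as null. */
    static <T> T loadQuietly(Function<String, T> loader, String nodeCode) {
        try {
            return loader.apply(nodeCode);
        } catch (RuntimeException e) {
            System.err.println("node lookup failed for " + nodeCode + ": " + e.getMessage());
            return null;
        }
    }

    /** Processes every message; a failed lookup skips only that message. */
    static List<String> processBatch(List<String> batch,
                                     Function<String, String> nodeLoader,
                                     List<String> skipped) {
        List<String> persisted = new ArrayList<>();
        for (String nodeCode : batch) {
            String cfg = loadQuietly(nodeLoader, nodeCode);
            if (cfg == null) {
                // Keep the message for a later replay instead of dropping the batch.
                skipped.add(nodeCode);
                continue;
            }
            persisted.add(nodeCode);
        }
        return persisted;
    }
}
```

The guard only limits the damage to the one message whose lookup fails. That message is still not persisted, so collecting it for a replay and removing the duplicate node remain necessary.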