之前内网部分服务不稳定,请求偶尔会挂掉,需要频繁重启。本着实事求是的思想,决定排查并解决这个顽疾。
通过登录机器查找日志发现具体的错误信息,“JavaScript heap out of memory”,可以看出是由于内存不足导致的服务异常。
通过执行top
指令,发现该进程([TSW/worker/0]
) CPU占有率和 memery 占比一直居高不下。
可以发现一些问题:
猜测: 可能是
kubeapi
服务,读取数据库或者业务自身写法错误引起的内存泄漏(例如:闭包、mysql连接未释放等)。
通过分析代码逻辑+日志追踪,真相逐渐清晰:
从这里就基本可以排除,数据库连接超时导致的问题。
那么,异常问题应该就是在返回的数据中。
通过查看接口的调用名,findAll
全查询,那么很有可能是因为数据量返回过大,导致内存以及CPU消耗过大。这里也符合一开始的错误定位。
await models.M.ConfBat.findAll({
where: {
jsonid,
type
}
})
为了验证这一个猜想,我们首先加一个分页逻辑,减小返回的数据量。(PS:此处操作数据库使用的是 sequelie API)
格式如下:
await models.M.ConfBat.findAll({
offset: offset || 0, // 偏移量
limit: limit || 20, // 分页数
where: {
jsonid,
type
}
})
为了验证猜想,我们继续重放原来的请求:
可以看到,数据成功返回。
通过上面的问题定位分析,我们可以看到异常的原因是由数据库返回的数据过大导致的。但是,深入思考一下,为什么会出现内存溢出呢?
Node.js 做密集型运算,或者所操作的数组、对象本身较大时,容易出现内存溢出的问题,这是由于 Node.js 的运行环境依赖 V8 引擎导致的。如果经常有较大数据量运算等操作,需要对 Node.js 运行环境限制有充分的了解。
2.1 密集型计算
for (var i = 0; i < 10000000; i++) {
((i) => {
var user = {};
user.id = i;
user.name = 'KUBE真苦逼';
setTimeout(()=>console.log(i,user),0)
})(i)
}
CPU和内存占用情况
2.2 海量数据量操作
let users = []
for (let i =0; i< 10000;i++) {
let user = {}
for(let j =0; j< 10000; j++) {
user[j] = `KUBE${j}`
users.push(user[j])
}
}
CPU和内存占用情况
错误堆栈信息
上述两种操作,都会出现下面的错误信息:
<--- Last few GCs --->
[3126:0x2ca9be0] 34735 ms: Mark-sweep 1280.6 (1331.5) -> 1280.6 (1300.5) MB, 1763.9 / 0.0 ms (average mu = 0.223, current mu = 0.000) last resort GC in old space requested
<--- JS stacktrace --->
NodeJS 程序出现内存溢出问题,可以分为三个方面:
4.1 使用 async/await
对于第一种异常情况,可以通过 async/await
将异步操作改为同步操作。防止异步事件堆积过多,操作内存溢出。
4.2 使用 Buffer 系统内存
对于第二种异常情况,将数据保存到Buffer中,而不是转换成字符串等JS对象的形式,这样也可以避免V8内存占用过多。
<--- Last few GCs --->
[8072:0x3d72dc0] 140630 ms: Mark-sweep 1102.9 (1259.7) -> 1102.8 (1227.2) MB, 48.0 / 0.0 ms (average mu = 0.942, current mu = 0.000) last resort GC in old space requested
[8072:0x3d72dc0] 140677 ms: Mark-sweep 1102.8 (1227.2) -> 1102.8 (1226.7) MB, 47.1 / 0.0 ms (average mu = 0.881, current mu = 0.000) last resort GC in old space requested
<— JS stacktrace —>
==== JS stack trace =========================================
0: ExitFrame [pc: 0x258199dd4eab]
Security context: 0x15dc384aede9
1: indexOf [0x15dc384b40d9](this=0x15c00320a309
2: /* anonymous */ [0x1add5005a291] [/usr/local/services/TSW-1.0/bin/tsw/plug.js:38] [bytecode=0x136dff410d9 offset=36](this=0x15dc384bc881 )
3: arguments adaptor frame: 1->0
4: response [0x30db838c6c19] [/…
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: 0x92b820 node::Abort() [TSW/worker/1]
2: 0x92b86c [TSW/worker/1]
3: 0xb36dfe v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [TSW/worker/1]
4: 0xb37031 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [TSW/worker/1]
5: 0xf45c02 [TSW/worker/1]
6: 0xf5627e v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [TSW/worker/1]
7: 0xf235b1 v8::internal::Factory::NewRawTwoByteString(int, v8::internal::PretenureFlag) [TSW/worker/1]
8: 0x1070d82 v8::internal::String::SlowFlatten(v8::internal::Handlev8::internal::ConsString, v8::internal::PretenureFlag) [TSW/worker/1]
9: 0xb33e9c v8::internal::String::Flatten(v8::internal::Handlev8::internal::String, v8::internal::PretenureFlag) [TSW/worker/1]
10: 0x1071f9c v8::internal::String::IndexOf(v8::internal::Isolate*, v8::internal::Handlev8::internal::String, v8::internal::Handlev8::internal::String, int) [TSW/worker/1]
11: 0x1224c27 v8::internal::Runtime_StringIndexOfUnchecked(int, v8::internal::Object**, v8::internal::Isolate*) [TSW/worker/1]
12: 0x258199dd4eab
2020-09-02 11:19:13.465 [INFO] [11732 cpu 0] [null] [master.js:438] worker1 pid=8072 disconnect event fired.
2020-09-02 11:19:13.465 [INFO] [11732 cpu 0] [null] [master.js:242] restart new worker instead of worker1 pid=8072
2020-09-02 11:19:13.468 [INFO] [11732 cpu 0] [null] [master.js:399] worker1 fork success! pid:19001, cpu: 1
2020-09-02 11:19:13.468 [INFO] [11732 cpu 0] [null] [cpu.js:121] taskset -cp 19001
2020-09-02 11:19:13.470 [INFO] [11732 cpu 0] [null] [master.js:451] worker1 pid=8072 exit event fired.
2020-09-02 11:19:13.472 [INFO] [11732 cpu 0] [null] [cpu.js:153]
pid 19001’s current affinity list: 0-7
2020-09-02 11:19:13.472 [INFO] [11732 cpu 0] [null] [cpu.js:160] taskset -cp 1 19001
2020-09-02 11:19:13.476 [INFO] [11732 cpu 0] [null] [cpu.js:174]
pid 19001’s current affinity list: 0-7
pid 19001’s new affinity list: 1
2020-09-02 11:19:14.162 [INFO] [19001 cpu 0] [null] [master.js:154] start worker…
2020-09-02 11:19:14.164 [INFO] [19001 cpu 0] [null] [master.js:155] config from: /data/release/tsw.config.js
2020-09-02 11:19:14.211 [INFO] [19001 cpuX 0] [null] [http.proxy.js:76] pid:19001 createServer ok
2020-09-02 11:19:14.311 [INFO] [19001 cpuX 0] [null] [http.proxy.js:97] receive message, cmd: listen
2020-09-02 11:19:14.312 [INFO] [19001 cpu1 0] [null] [http.proxy.js:249] cpu: 1, beforeStartup…
2020-09-02 11:19:14.312 [INFO] [19001 cpu1 0] [null] [http.proxy.js:255] cpu: 1, listen…
2020-09-02 11:19:14.315 [INFO] [19001 cpu1 0] [null] [http.proxy.js:263] cpu: 1, listen http ok 0.0.0.0:8081
2020-09-02 11:19:14.316 [INFO] [19001 cpu1 0] [null] [http.proxy.js:318] cpu: 1, listen http ok 100.113.190.39:12802
2020-09-02 11:19:14.316 [INFO] [19001 cpu1 0] [null] [http.proxy.js:271] start heartbeat
2020-09-02 11:19:14.316 [INFO] [19001 cpu1 0] [null] [http.proxy.js:286] switch to uid: user_00
2020-09-02 11:19:14.317 [INFO] [19001 cpu1 0] [null] [http.proxy.js:293] cpu: 1, afterStartup…
2020-09-02 11:19:19.906 [INFO] [11732 cpu 0] [null] [admin.actions.js:89] typeof config: object
check config ok
2020-09-02 11:19:19.906 [INFO] [11732 cpu 0] [null] [master.js:460] reload event fired.
2020-09-02 11:19:19.906 [INFO] [11732 cpu 0] [null] [master.js:480] cpu0 reload after 0ms
2020-09-02 11:19:19.906 [INFO] [11732 cpu 0] [null] [master.js:480] cpu1 reload after 1000ms
2020-09-02 11:19:19.906 [INFO] [11732 cpu 0] [null] [master.js:242] restart new worker instead of worker0 pid=11499
2020-09-02 11:19:19.909 [INFO] [11732 cpu 0] [null] [master.js:399] worker0 fork success! pid:19200, cpu: 0
2020-09-02 11:19:19.909 [INFO] [11732 cpu 0] [null] [cpu.js:121] taskset -cp 19200
2020-09-02 11:19:19.913 [INFO] [11732 cpu 0] [null] [cpu.js:153]
pid 19200’s current affinity list: 0-7
2020-09-02 11:19:19.913 [INFO] [11732 cpu 0] [null] [cpu.js:160] taskset -cp 0 19200
2020-09-02 11:19:19.917 [INFO] [11732 cpu 0] [null] [cpu.js:174]
pid 19200’s current affinity list: 0-7
pid 19200’s new affinity list: 0
2020-09-02 11:19:20.610 [INFO] [19200 cpu 0] [null] [master.js:154] start worker…
2020-09-02 11:19:20.612 [INFO] [19200 cpu 0] [null] [master.js:155] config from: /data/release/tsw.config.js
2020-09-02 11:19:20.659 [INFO] [19200 cpuX 0] [null] [http.proxy.js:76] pid:19200 createServer ok
2020-09-02 11:19:20.770 [INFO] [19200 cpuX 0] [null] [http.proxy.js:97] receive message, cmd: listen
2020-09-02 11:19:20.770 [INFO] [19200 cpu0 0] [null] [http.proxy.js:249] cpu: 0, beforeStartup…
2020-09-02 11:19:20.770 [INFO] [19200 cpu0 0] [null] [http.proxy.js:255] cpu: 0, listen…
2020-09-02 11:19:20.773 [INFO] [19200 cpu0 0] [null] [http.proxy.js:263] cpu: 0, listen http ok 0.0.0.0:8081
2020-09-02 11:19:20.774 [INFO] [19200 cpu0 0] [null] [http.proxy.js:318] cpu: 0, listen http ok 100.113.190.39:12801
2020-09-02 11:19:20.774 [INFO] [19200 cpu0 0] [null] [http.proxy.js:271] start heartbeat
2020-09-02 11:19:20.775 [INFO] [19200 cpu0 0] [null] [http.proxy.js:286] switch to uid: user_00
2020-09-02 11:19:20.775 [INFO] [19200 cpu0 0] [null] [http.proxy.js:293] cpu: 0, afterStartup…
2020-09-02 11:19:20.906 [INFO] [11732 cpu 0] [null] [master.js:242] restart new worker instead of worker1 pid=19001
2020-09-02 11:19:20.909 [INFO] [11732 cpu 0] [null] [master.js:399] worker1 fork success! pid:19258, cpu: 1
2020-09-02 11:19:20.909 [INFO] [11732 cpu 0] [null] [cpu.js:121] taskset -cp 19258
2020-09-02 11:19:20.910 [INFO] [19001 cpu1 0] [null] [index.js:35] report on disconnect event…
2020-09-02 11:19:20.911 [INFO] [11732 cpu 0] [null] [master.js:438] worker1 pid=19001 disconnect event fired.
2020-09-02 11:19:20.912 [INFO] [19001 cpu1 0] [null] [is-test.js:54] save h5test on disconnect event…
2020-09-02 11:19:20.913 [INFO] [11732 cpu 0] [null] [cpu.js:153]
pid 19258’s current affinity list: 0-7
2020-09-02 11:19:20.913 [INFO] [11732 cpu 0] [null] [cpu.js:160] taskset -cp 1 19258
2020-09-02 11:19:20.916 [INFO] [11732 cpu 0] [null] [cpu.js:174]
pid 19258’s current affinity list: 0-7
pid 19258’s new affinity list: 1
2020-09-02 11:19:21.610 [INFO] [19258 cpu 0] [null] [master.js:154] start worker…
2020-09-02 11:19:21.611 [INFO] [19258 cpu 0] [null] [master.js:155] config from: /data/release/tsw.config.js
2020-09-02 11:19:21.660 [INFO] [19258 cpuX 0] [null] [http.proxy.js:76] pid:19258 createServer ok
2020-09-02 11:19:21.774 [INFO] [19258 cpuX 0] [null] [http.proxy.js:97] receive message, cmd: listen
2020-09-02 11:19:21.774 [INFO] [19258 cpu1 0] [null] [http.proxy.js:249] cpu: 1, beforeStartup…
2020-09-02 11:19:21.775 [INFO] [19258 cpu1 0] [null] [http.proxy.js:255] cpu: 1, listen…
2020-09-02 11:19:21.777 [INFO] [19258 cpu1 0] [null] [http.proxy.js:263] cpu: 1, listen http ok 0.0.0.0:8081
2020-09-02 11:19:21.778 [INFO] [19258 cpu1 0] [null] [http.proxy.js:318] cpu: 1, listen http ok 100.113.190.39:12802
2020-09-02 11:19:21.778 [INFO] [19258 cpu1 0] [null] [http.proxy.js:271] start heartbeat
2020-09-02 11:19:21.779 [INFO] [19258 cpu1 0] [null] [http.proxy.js:286] switch to uid: user_00
2020-09-02 11:19:21.780 [INFO] [19258 cpu1 0] [null] [http.proxy.js:293] cpu: 1, afterStartup…
2020-09-02 11:19:23.470 [INFO] [11732 cpu 0] [null] [master.js:216] worker/1 8072 has killed
2020-09-02 11:19:29.910 [INFO] [11732 cpu 0] [null] [master.js:216] worker/0 11499 has killed
2020-09-02 11:19:29.910 [INFO] [11732 cpu 0] [null] [master.js:438] worker0 pid=11499 disconnect event fired.
2020-09-02 11:19:29.933 [INFO] [11732 cpu 0] [null] [master.js:451] worker0 pid=11499 exit event fired.
2020-09-02 11:19:30.909 [INFO] [11732 cpu 0] [null] [master.js:216] worker/1 19001 has killed
2020-09-02 11:19:30.913 [INFO] [11732 cpu 0] [null] [master.js:451] worker1 pid=19001 exit event fired.