Troubleshooting Excessive Cache Usage on Linux

Symptoms:

  • OS: CentOS Linux release 7.3.1611 (Core)
  • Memory: 16 GB
[root@clxcld-gateway-prod ~]# free -g
              total        used        free      shared  buff/cache   available
Mem:             15           0           0           0          13          14
Swap:             3           0           3

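  Two Java processes run on the host, one with -Xmx3G and the other with -Xmx4G. Yet free reports almost no used memory: nearly all of it sits in buff/cache, and restarting the Java processes does not change that.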
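free only shows totals. The following is a minimal sketch that splits the buff/cache column into its parts. It assumes procps-ng's definition of buff/cache, which is Buffers + Cached + SReclaimable from /proc/meminfo. The sketch also prints Shmem. Pages on a tmpfs such as /run are counted in Cached and in free's shared column, but they are not ordinary file cache, and drop_caches cannot release them. The function names meminfo_kb and cache_breakdown are illustrative, not part of any tool.

```bash
#!/usr/bin/env bash
# Break free(1)'s "buff/cache" column into its /proc/meminfo parts (values in kB).
# Assumption: buff/cache = Buffers + Cached + SReclaimable (procps-ng definition).
# Shmem (tmpfs such as /run, shared memory) is included in Cached but is not
# clean file cache, so drop_caches cannot free it.

# Print one /proc/meminfo field, e.g. `meminfo_kb Cached`
meminfo_kb() {
  local key=$1 file=${2:-/proc/meminfo}
  # Exact match on the first column, so "SwapCached:" does not match "Cached:"
  awk -v k="${key}:" '$1 == k { print $2; exit }' "$file"
}

cache_breakdown() {
  local file=${1:-/proc/meminfo}
  local buffers cached sreclaim shmem avail
  buffers=$(meminfo_kb Buffers "$file")
  cached=$(meminfo_kb Cached "$file")
  sreclaim=$(meminfo_kb SReclaimable "$file")
  shmem=$(meminfo_kb Shmem "$file")
  avail=$(meminfo_kb MemAvailable "$file")
  echo "buff_cache_kB=$(( buffers + cached + sreclaim ))"
  echo "shmem_kB=${shmem}"
  echo "available_kB=${avail}"
}

# Example: cache_breakdown        # reads the live /proc/meminfo
```

Run cache_breakdown with no argument to read the live /proc/meminfo. Pass a file path to inspect a saved copy instead.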

Check the open network connections with lsof -i:

[root@clxcld-gateway-prod log]# lsof -i
COMMAND     PID    USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
systemd       1    root   43u  IPv6     13662      0t0  TCP *:sunrpc (LISTEN)
systemd       1    root   44u  IPv4     13663      0t0  TCP *:sunrpc (LISTEN)
chronyd     667  chrony    1u  IPv4     14114      0t0  UDP localhost:323 
chronyd     667  chrony    2u  IPv6     14115      0t0  UDP localhost:323 
avahi-dae   712   avahi   12u  IPv4     15942      0t0  UDP *:mdns 
avahi-dae   712   avahi   13u  IPv4     15943      0t0  UDP *:56794 
xinetd     1066    root    5u  IPv6     19825      0t0  TCP *:nrpe (LISTEN)
xinetd     1066    root    6u  IPv6     19826      0t0  TCP *:nsca (LISTEN)
sshd       1084    root    3u  IPv4     18988      0t0  TCP *:mxxrlogin (LISTEN)
sshd       1084    root    4u  IPv6     18997      0t0  TCP *:mxxrlogin (LISTEN)
rpc.statd  1133 rpcuser    5u  IPv4     20812      0t0  UDP localhost:885 
rpc.statd  1133 rpcuser    8u  IPv4     21033      0t0  UDP *:41165 
rpc.statd  1133 rpcuser    9u  IPv4     21037      0t0  TCP *:59161 (LISTEN)
rpc.statd  1133 rpcuser   10u  IPv6     21041      0t0  UDP *:32879 
rpc.statd  1133 rpcuser   11u  IPv6     21045      0t0  TCP *:39979 (LISTEN)
rpcbind    1138     rpc    4u  IPv6     13662      0t0  TCP *:sunrpc (LISTEN)
rpcbind    1138     rpc    5u  IPv4     13663      0t0  TCP *:sunrpc (LISTEN)
rpcbind    1138     rpc    8u  IPv4     20895      0t0  UDP *:sunrpc 
rpcbind    1138     rpc    9u  IPv4     20896      0t0  UDP *:iclcnet_svinfo 
rpcbind    1138     rpc   10u  IPv6     20897      0t0  UDP *:sunrpc 
rpcbind    1138     rpc   11u  IPv6     20898      0t0  UDP *:iclcnet_svinfo 
master     1250    root   13u  IPv4     20394      0t0  TCP localhost:smtp (LISTEN)
master     1250    root   14u  IPv6     20395      0t0  TCP localhost:smtp (LISTEN)
sshd      23166    root    3u  IPv4 175197624      0t0  TCP clxcld-gateway-prod:mxxrlogin->172.23.46.21:45974 (ESTABLISHED)
java      24608    root  138u  IPv6 175242571      0t0  TCP *:40673 (LISTEN)
java      24608    root  140u  IPv6 175242124      0t0  TCP clxcld-gateway-prod:47240->10.13.248.15:mysql (ESTABLISHED)
java      24608    root  141u  IPv6 175242127      0t0  TCP clxcld-gateway-prod:47246->10.13.248.15:mysql (ESTABLISHED)
java      24608    root  144u  IPv6 175242130      0t0  TCP *:pcsync-https (LISTEN)
java      24608    root  149u  IPv6 175252852      0t0  TCP clxcld-gateway-prod:51920->10.13.248.15:mysql (ESTABLISHED)
java      24610    root  108u  IPv6 175242117      0t0  TCP *:41423 (LISTEN)
java      24610    root  112u  IPv6 175242684      0t0  TCP *:warehouse (LISTEN)
java      24610    root  113u  IPv6 175242691      0t0  TCP clxcld-gateway-prod:47248->10.13.248.15:mysql (ESTABLISHED)
java      24610    root  114u  IPv6 175243437      0t0  TCP clxcld-gateway-prod:47258->10.13.248.15:mysql (ESTABLISHED)
java      24610    root  118u  IPv6 175242785      0t0  TCP clxcld-gateway-prod:47260->10.13.248.15:mysql (ESTABLISHED)
ssh       24748    root    3u  IPv4 175243834      0t0  TCP clxcld-gateway-prod:33706->clxcld-gateway-prod:mxxrlogin (ESTABLISHED)
sshd      24750    root    3u  IPv4 175242791      0t0  TCP clxcld-gateway-prod:mxxrlogin->clxcld-gateway-prod:33706 (ESTABLISHED)
python    24761    root    4u  IPv4 175242899      0t0  TCP clxcld-gateway-prod:46418->analytics-prod.cpkzgarrnsp3.us-west-2.redshift.amazonaws.com:5439 (ESTABLISHED)
zabbix_ag 25594  zabbix    4u  IPv4 175270400      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25594  zabbix    5u  IPv6 175270401      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25595  zabbix    4u  IPv4 175270400      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25595  zabbix    5u  IPv6 175270401      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25596  zabbix    4u  IPv4 175270400      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25596  zabbix    5u  IPv6 175270401      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25597  zabbix    4u  IPv4 175270400      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25597  zabbix    5u  IPv6 175270401      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25598  zabbix    4u  IPv4 175270400      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25598  zabbix    5u  IPv6 175270401      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25599  zabbix    4u  IPv4 175270400      0t0  TCP *:zabbix-agent (LISTEN)
zabbix_ag 25599  zabbix    5u  IPv6 175270401      0t0  TCP *:zabbix-agent (LISTEN)

 

Investigation:

Following https://www.cnblogs.com/zh94/p/11922714.html, download the hcache tool:

GitHub: https://github.com/silenceshell/hcache
Direct download: wget https://silenceshell-1255345740.cos.ap-shanghai.myqcloud.com/hcache
chmod 755 hcache
mv hcache /usr/local/bin

Use hcache --top 10 to list the 10 files with the most pages in the page cache:

hcache --top 10
+-------------------------------------------------------------------------------------------------------------------------------------+----------------+------------+-----------+---------+
| Name                                                                                                                                | Size (bytes)   | Pages      | Cached    | Percent |
|-------------------------------------------------------------------------------------------------------------------------------------+----------------+------------+-----------+---------|
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006ab097-000597ad5568adc2.journal | 58720256       | 14336      | 12246     | 085.421 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-0000000000734380-000599df09bbc4a3.journal | 58720256       | 14336      | 12245     | 085.414 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006d23a4-0005984dfc97fc32.journal | 58720256       | 14336      | 12242     | 085.393 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-0000000000747d1e-00059a2f7faf1d70.journal | 58720256       | 14336      | 12242     | 085.393 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-000000000075b6bb-00059a801bbaf6ca.journal | 58720256       | 14336      | 12242     | 085.393 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-0000000000697714-0005975cf155fc13.journal | 58720256       | 14336      | 12241     | 085.386 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-000000000070d02d-0005993e39fdff2a.journal | 58720256       | 14336      | 12239     | 085.372 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006bea28-000597fda900f06e.journal | 58720256       | 14336      | 12239     | 085.372 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006e5d54-0005989e300b3aa6.journal | 58720256       | 14336      | 12239     | 085.372 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000007209f2-0005998ebb1d505b.journal | 58720256       | 14336      | 12239     | 085.372 |
+-------------------------------------------------------------------------------------------------------------------------------------+----------------+------------+-----------+---------+
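The Pages and Percent columns can be checked by hand. This small sketch assumes 4 KiB pages, the usual x86_64 value; getconf PAGESIZE confirms it. Under that assumption, Pages is the file size rounded up to whole pages, and Percent is Cached / Pages × 100. The helper names page_count and cached_percent are hypothetical.

```bash
#!/usr/bin/env bash
# Recompute the Pages and Percent columns of the table above.
# Assumption: 4 KiB pages (check with `getconf PAGESIZE`).

# Pages = file size rounded up to whole pages
page_count() {
  local size=$1 page=${2:-4096}
  echo $(( (size + page - 1) / page ))
}

# Percent = resident (Cached) pages / total pages * 100, formatted like the table
cached_percent() {
  awk -v c="$1" -v p="$2" 'BEGIN { printf "%07.3f\n", c / p * 100 }'
}

# First row of the table: a 58720256-byte (56 MiB) journal file, 12246 pages cached
page_count 58720256          # prints 14336
cached_percent 12246 14336   # prints 085.421
```

Read this way, 12246 cached pages mean that roughly 48 MiB of each 56 MiB journal file is resident in memory.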

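The top entries are all systemd-journald journal files under /run/log/journal, and each is about 85% resident in memory. List that directory: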

[root@clxcld-gateway-prod d14e699e8bbc43228324a169b0f855fe]# ls -lath *
-rw-r-----+ 1 root systemd-journal 8.0M Jan  2 06:43 system.journal
-rw-r-----+ 1 root systemd-journal  56M Jan  2 03:26 system@6cfacedb39904c2499acffe16d0fd88a-000000000076f05c-00059ad09a381e0a.journal
-rw-r-----+ 1 root systemd-journal  56M Dec 29 05:00 system@6cfacedb39904c2499acffe16d0fd88a-000000000075b6bb-00059a801bbaf6ca.journal
-rw-r-----+ 1 root systemd-journal  56M Dec 25 04:56 system@6cfacedb39904c2499acffe16d0fd88a-0000000000747d1e-00059a2f7faf1d70.journal
-rw-r-----+ 1 root systemd-journal  56M Dec 21 04:47 system@6cfacedb39904c2499acffe16d0fd88a-0000000000734380-000599df09bbc4a3.journal
-rw-r-----+ 1 root systemd-journal  56M Dec 17 04:48 system@6cfacedb39904c2499acffe16d0fd88a-00000000007209f2-0005998ebb1d505b.journal
-rw-r-----+ 1 root systemd-journal  56M Dec 13 04:59 system@6cfacedb39904c2499acffe16d0fd88a-000000000070d02d-0005993e39fdff2a.journal
-rw-r-----+ 1 root systemd-journal  56M Dec  9 04:57 system@6cfacedb39904c2499acffe16d0fd88a-00000000006f9690-000598edd5ef0719.journal
-rw-r-----+ 1 root systemd-journal  56M Dec  5 05:02 system@6cfacedb39904c2499acffe16d0fd88a-00000000006e5d54-0005989e300b3aa6.journal
-rw-r-----+ 1 root systemd-journal  56M Dec  1 06:01 system@6cfacedb39904c2499acffe16d0fd88a-00000000006d23a4-0005984dfc97fc32.journal
-rw-r-----+ 1 root systemd-journal  56M Nov 27 06:20 system@6cfacedb39904c2499acffe16d0fd88a-00000000006bea28-000597fda900f06e.journal
-rw-r-----+ 1 root systemd-journal  56M Nov 23 06:30 system@6cfacedb39904c2499acffe16d0fd88a-00000000006ab097-000597ad5568adc2.journal
-rw-r-----+ 1 root systemd-journal  56M Nov 19 06:40 system@6cfacedb39904c2499acffe16d0fd88a-0000000000697714-0005975cf155fc13.journal
-rw-r-----+ 1 root systemd-journal  56M Nov 15 06:45 system@6cfacedb39904c2499acffe16d0fd88a-0000000000683d77-0005970c8a737b7b.journal
-rw-r-----+ 1 root systemd-journal  56M Nov 11 06:50 system@6cfacedb39904c2499acffe16d0fd88a-00000000006703c5-000596bbef4870ae.journal
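Because /run is a tmpfs, the journal files listed above occupy RAM rather than disk. journalctl --disk-usage reports the combined size of archived and active journal files. The sketch below computes the same kind of sum directly with find and awk, so it can also be pointed at /var/log/journal for comparison. journal_total_mib is a hypothetical helper name. It sums apparent file sizes, not the exact number of resident pages.

```bash
#!/usr/bin/env bash
# Sum the apparent sizes of all *.journal files under a directory, in MiB.
# Under /run/log/journal (tmpfs) this total is held in memory, not on disk.
journal_total_mib() {
  local dir=${1:-/run/log/journal}
  find "$dir" -type f -name '*.journal' -printf '%s\n' |
    awk '{ total += $1 } END { printf "%.1f\n", total / 1048576 }'
}

# Example: journal_total_mib                    # volatile journal under /run
#          journal_total_mib /var/log/journal   # persistent journal, if any
```

For the listing above (14 archived files of 56M plus the 8M system.journal), the total is about 792 MiB.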

Following https://blog.steamedfish.org/post/systemd-journald/, delete archived journal files older than 10 days to free that memory:

journalctl --vacuum-time=10d
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006703c5-000596bbef4870ae.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-0000000000683d77-0005970c8a737b7b.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-0000000000697714-0005975cf155fc13.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006ab097-000597ad5568adc2.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006bea28-000597fda900f06e.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006d23a4-0005984dfc97fc32.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006e5d54-0005989e300b3aa6.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000006f9690-000598edd5ef0719.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-000000000070d02d-0005993e39fdff2a.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-00000000007209f2-0005998ebb1d505b.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-0000000000734380-000599df09bbc4a3.journal (56.0M).
Deleted archived journal /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-0000000000747d1e-00059a2f7faf1d70.journal (56.0M).
Vacuuming done, freed 672.0M of archived journals on disk.
[root@clxcld-gateway-prod d14e699e8bbc43228324a169b0f855fe]# ls
system@6cfacedb39904c2499acffe16d0fd88a-000000000075b6bb-00059a801bbaf6ca.journal  system.journal
system@6cfacedb39904c2499acffe16d0fd88a-000000000076f05c-00059ad09a381e0a.journal

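Run hcache --top 10 again. Only the two remaining archived journal files, plus the active system.journal, still appear among the top entries: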

[root@clxcld-gateway-prod d14e699e8bbc43228324a169b0f855fe]# hcache --top 10
+-------------------------------------------------------------------------------------------------------------------------------------+----------------+------------+-----------+---------+
| Name                                                                                                                                | Size (bytes)   | Pages      | Cached    | Percent |
|-------------------------------------------------------------------------------------------------------------------------------------+----------------+------------+-----------+---------|
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-000000000075b6bb-00059a801bbaf6ca.journal | 58720256       | 14336      | 12242     | 085.393 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system@6cfacedb39904c2499acffe16d0fd88a-000000000076f05c-00059ad09a381e0a.journal | 58720256       | 14336      | 12226     | 085.282 |
| /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/lib/rt.jar                                                         | 72964441       | 17814      | 10463     | 058.735 |
| /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/lib/amd64/server/libjvm.so                                         | 13942784       | 3404       | 3216      | 094.477 |
| /run/log/journal/d14e699e8bbc43228324a169b0f855fe/system.journal                                                                    | 8388608        | 2048       | 1311      | 064.014 |
| /usr/lib64/dri/swrast_dri.so                                                                                                        | 9597216        | 2344       | 1143      | 048.763 |
| /usr/lib64/libmozjs-24.so                                                                                                           | 5987032        | 1462       | 1076      | 073.598 |
| /usr/lib64/libgtk-3.so.0.1400.13                                                                                                    | 7116800        | 1738       | 1024      | 058.918 |
| /usr/lib/locale/locale-archive                                                                                                      | 106070960      | 25897      | 1024      | 003.954 |
| /usr/lib64/gnome-shell/libgnome-shell.so                                                                                            | 2671456        | 653        | 653       | 100.000 |
+-------------------------------------------------------------------------------------------------------------------------------------+----------------+------------+-----------+---------+

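journald's Storage= option defaults to auto. If the directory /var/log/journal exists, logs are stored persistently on disk there. Otherwise they go to /run/log/journal, which is a tmpfs and is therefore held in memory. So create /var/log/journal and restart journald: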

mkdir -p /var/log/journal
systemctl restart systemd-journald.service
[root@clxcld-gateway-prod journal]# hcache --top 10
+---------------------------------------------------------------------------------------------+----------------+------------+-----------+---------+
| Name                                                                                        | Size (bytes)   | Pages      | Cached    | Percent |
|---------------------------------------------------------------------------------------------+----------------+------------+-----------+---------|
| /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/lib/rt.jar                 | 72964441       | 17814      | 10463     | 058.735 |
| /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/lib/amd64/server/libjvm.so | 13942784       | 3404       | 3216      | 094.477 |
| /var/log/journal/d14e699e8bbc43228324a169b0f855fe/system.journal                            | 8388608        | 2048       | 2048      | 100.000 |
| /usr/lib64/dri/swrast_dri.so                                                                | 9597216        | 2344       | 1143      | 048.763 |
| /usr/lib64/libmozjs-24.so                                                                   | 5987032        | 1462       | 1076      | 073.598 |
| /usr/lib64/libgtk-3.so.0.1400.13                                                            | 7116800        | 1738       | 1024      | 058.918 |
| /usr/lib/locale/locale-archive                                                              | 106070960      | 25897      | 1024      | 003.954 |
| /usr/lib64/gnome-shell/libgnome-shell.so                                                    | 2671456        | 653        | 653       | 100.000 |
| /root/azkaban-3.33.0/azkaban-exec-server-3.33.0/lib/hadoop-common-2.6.1.jar                 | 3318727        | 811        | 652       | 080.395 |
| /root/azkaban-3.33.0/azkaban-web-server-3.33.0/lib/hadoop-common-2.6.1.jar                  | 3318727        | 811        | 652       | 080.395 |
+---------------------------------------------------------------------------------------------+----------------+------------+-----------+---------+
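The storage rule and the fix can be kept as a small script. This is a sketch that assumes journald.conf leaves Storage= at its default of auto. journal_storage_mode reports which location journald would use under that rule, and enable_persistent_journal repeats the steps from the text. The systemd-journald documentation suggests running systemd-tmpfiles after creating /var/log/journal, so that the directory gets its intended group and ACLs. journalctl --flush asks journald to move runtime entries from /run/log/journal to /var/log/journal. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Storage=auto rule: persistent if /var/log/journal exists, otherwise
# volatile under /run/log/journal (tmpfs, i.e. RAM).
# Assumption: Storage= is not set explicitly in /etc/systemd/journald.conf.
journal_storage_mode() {
  local persistent_dir=${1:-/var/log/journal}
  if [ -d "$persistent_dir" ]; then
    echo persistent
  else
    echo volatile
  fi
}

# Switch journald to persistent storage (run as root); defined here, not called.
enable_persistent_journal() {
  mkdir -p /var/log/journal
  systemd-tmpfiles --create --prefix /var/log/journal   # apply tmpfiles.d ownership/ACLs
  systemctl restart systemd-journald.service
  journalctl --flush   # move runtime entries from /run/log/journal to /var/log/journal
}

# Example (as root): journal_storage_mode; enable_persistent_journal
```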

 

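Checking free again showed that the page cache had still not been released. The kernel keeps clean cache until the memory is actually needed (see https://blog.csdn.net/liuxiao723846/article/details/72628847). Drop it manually: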

[root@clxcld-gateway-prod ~]# echo 1 > /proc/sys/vm/drop_caches
[root@clxcld-gateway-prod ~]#free -g
              total        used        free      shared  buff/cache   available
Mem:             15           1          14           0           0          14
Swap:             3           0           3

This forces the kernel to reclaim the page cache: buff/cache drops from 13G to 0 and free rises from 0 to 14G.
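The kernel drops only clean objects, so it is common to run sync first. The sketch below wraps the step from the text with that sync and a check on the value written. Writing 1 drops the page cache, 2 drops reclaimable slab objects such as dentries and inodes, and 3 drops both. drop_page_cache is a hypothetical name, and writing the file requires root.

```bash
#!/usr/bin/env bash
# Drop clean caches as in the text, with a sync first so dirty pages become droppable.
# /proc/sys/vm/drop_caches: 1 = page cache, 2 = dentries and inodes, 3 = both.
drop_page_cache() {
  local mode=${1:-1}
  case "$mode" in
    1|2|3) ;;
    *) echo "drop_page_cache: mode must be 1, 2 or 3" >&2; return 2 ;;
  esac
  sync
  echo "$mode" > /proc/sys/vm/drop_caches   # requires root
}

# Example (as root): free -g; drop_page_cache 1; free -g
```

The freed memory fills up again as files are read, so this is a one-off diagnostic step.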

 

References

  • https://blog.csdn.net/liuxiao723846/article/details/72628847 Investigating Linux memory usage: cached
  • https://www.jianshu.com/p/8b3fba13fcad A systemd guide
  • https://www.jianshu.com/p/3320bc84f227 Systemd
  • https://www.ibm.com/developerworks/cn/linux/1407_liuming_init3/ Systemd
  • http://www.jinbuguo.com/systemd/systemd-journald.service.html systemd-journald.service (Chinese translation of the manual)
  • https://wiki.archlinux.org/index.php/Systemd/Journal_(%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87) systemd/Journal (Simplified Chinese)
  • https://github.com/silenceshell/hcache hcache
  • https://www.cnblogs.com/zh94/p/11922714.html Finding which processes use the most buffer/cache on Linux (hcache, lsof)
