VM中某块盘无法使用

现象

VM中有一块盘,进入mount点后,ls显示错误:

[root@es10 data2]# ls
ls: cannot open directory .: Input/output error

跟踪分析

  • 查看磁盘
[root@es10 ~]# lsblk | grep vdc
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vdc    253:32   0    2T  0 disk 
└─vdc1 253:33   0    2T  0 part /data2
  • 查看mount情况
[root@es10 ~]# mount | grep vdc
/dev/vdc1 on /data2 type xfs (rw,relatime,attr2,inode64,noquota)
  • 查看磁盘使用情况
[root@es10 data2]# df -h | grep vdc
Filesystem      Size  Used Avail Use% Mounted on
/dev/vdc1       2.0T  281G  1.7T  15% /data2
  • 查看内存使用情况
[root@es10 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        322M         18G         16M         12G         30G
Swap:            0B          0B          0B
  • 查看系统log
[root@es10 ~]# tail /var/log/messages
Dec  4 10:02:37 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned.
Dec  4 10:02:39 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned.
Dec  4 10:02:39 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned.
  • 查看mount点被哪些进程使用
[root@es10 ~]# lsof | grep "/data2"
lsof: WARNING: can't stat() xfs file system /data2
      Output information may be incomplete.

[root@es10 ~]# lsof | grep "/data"
lsof: WARNING: can't stat() xfs file system /data2
      Output information may be incomplete.
bash      32147        root  cwd       DIR             253,17        26        128 /data1

解决

在网上搜索了一下,有人也遇到过,重新mount可以解决:

[root@es10 ~]# umount /data2
[root@es10 ~]# mount | grep vdc
[root@es10 ~]# mount /dev/vdc1 /data2

[root@es10 ~]# tail /var/log/messages
Dec  4 10:02:37 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned.
Dec  4 10:02:39 es10 kernel: XFS (vdc1): Unmounting Filesystem
Dec  4 10:02:57 es10 kernel: XFS (vdc1): Mounting V4 Filesystem
Dec  4 10:02:57 es10 kernel: XFS (vdc1): Starting recovery (logdev: internal)
Dec  4 10:02:58 es10 kernel: XFS (vdc1): Ending recovery (logdev: internal)
Dec  4 10:05:38 es10 systemd: Started Session 592 of user root.
Dec  4 10:05:38 es10 systemd-logind: New session 592 of user root.
Dec  4 10:05:38 es10 systemd: Starting Session 592 of user root.

深度分析

解释

Linux默认会使用可用内存的40%来作为文件系统cache,当这个阈值达到时,文件系统会下刷这些cache到磁盘。而这个flush的过程,会有一个时间限制,操作系统默认给了120s的限制,如果底层磁盘不是很快,部分cache依然占用,没有flush到disk,系统内存会慢慢被消耗完。

再度分析系统Log:

Dec  2 05:27:32 es10 kernel: INFO: task java:10432 blocked for more than 120 seconds.
Dec  2 05:27:32 es10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  2 05:27:32 es10 kernel: java            D ffff8808115a4500     0 10432      1 0x00000180
Dec  2 05:27:32 es10 kernel: ffff8803fcf9fe40 0000000000000082 ffff8808115a4500 ffff8803fcf9ffd8
Dec  2 05:27:32 es10 kernel: ffff8803fcf9ffd8 ffff8803fcf9ffd8 ffff8808115a4500 000000720007bf45
Dec  2 05:27:32 es10 kernel: ffff88042944c528 ffff8803fcf9fef4 ffff88080fc2e918 ffff8808115a4500
Dec  2 05:27:32 es10 kernel: Call Trace:
Dec  2 05:27:32 es10 kernel: [] schedule+0x29/0x70
Dec  2 05:27:32 es10 kernel: [] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
Dec  2 05:27:32 es10 kernel: [] ? wake_up_state+0x20/0x20
Dec  2 05:27:32 es10 kernel: [] xfs_file_fsync+0x1b0/0x200 [xfs]
Dec  2 05:27:32 es10 kernel: [] do_fsync+0x65/0xa0
Dec  2 05:27:32 es10 kernel: [] SyS_fdatasync+0x13/0x20
Dec  2 05:27:32 es10 kernel: [] tracesys+0xdd/0xe2
Dec  2 05:27:32 es10 kernel: INFO: task java:10433 blocked for more than 120 seconds.
Dec  2 05:27:32 es10 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  2 05:27:32 es10 kernel: java            D ffff88081176e780     0 10433      1 0x00000180
Dec  2 05:27:32 es10 kernel: ffff88049fb03e40 0000000000000082 ffff88081176e780 ffff88049fb03fd8
Dec  2 05:27:32 es10 kernel: ffff88049fb03fd8 ffff88049fb03fd8 ffff88081176e780 000000720007be05
Dec  2 05:27:32 es10 kernel: ffff88042944c528 ffff88049fb03ef4 ffff88080fc2e818 ffff88081176e780
Dec  2 05:27:32 es10 kernel: Call Trace:
Dec  2 05:27:32 es10 kernel: [] schedule+0x29/0x70
Dec  2 05:27:32 es10 kernel: [] _xfs_log_force_lsn+0x184/0x2f0 [xfs]
Dec  2 05:27:32 es10 kernel: [] ? wake_up_state+0x20/0x20
Dec  2 05:27:32 es10 kernel: [] xfs_file_fsync+0x1b0/0x200 [xfs]
Dec  2 05:27:32 es10 kernel: [] do_fsync+0x65/0xa0
Dec  2 05:27:32 es10 kernel: [] SyS_fdatasync+0x13/0x20
Dec  2 05:27:32 es10 kernel: [] tracesys+0xdd/0xe2

Dec  2 05:33:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660325
Dec  2 05:33:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07b9a5 ("xlog_iodone") error 5 numblks 64
Dec  2 05:33:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa01c3b9e
Dec  2 05:33:17 es10 kernel: XFS (vdc1): Log I/O Error Detected.  Shutting down filesystem
Dec  2 05:33:17 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned.
Dec  2 05:33:17 es10 kernel: XFS (vdc1): Please umount the filesystem and rectify the problem(s)
Dec  2 05:33:47 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned.
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115248608
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115249616
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115250624
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115251632
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115252640
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115253648
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115254656
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115255664
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115256672
Dec  2 05:33:54 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2115256800
Dec  2 05:34:17 es10 kernel: XFS (vdc1): xfs_log_force: error -5 returned.
Dec  2 05:38:17 es10 kernel: blk_update_request: 8 callbacks suppressed
Dec  2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660389
Dec  2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660453
Dec  2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660517
Dec  2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660581
Dec  2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660645
Dec  2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660709
Dec  2 05:38:17 es10 kernel: blk_update_request: I/O error, dev vdc, sector 2097660773
Dec  2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07b9e5 ("xlog_iodone") error 5 numblks 64
Dec  2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa01c3b9e
Dec  2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07ba25 ("xlog_iodone") error 5 numblks 64
Dec  2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa01c3b9e
Dec  2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07ba65 ("xlog_iodone") error 5 numblks 64
Dec  2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa01c3b9e
Dec  2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07baa5 ("xlog_iodone") error 5 numblks 64
Dec  2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa01c3b9e
Dec  2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07bae5 ("xlog_iodone") error 5 numblks 64
Dec  2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa01c3b9e
Dec  2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07bb25 ("xlog_iodone") error 5 numblks 64
Dec  2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa01c3b9e
Dec  2 05:38:17 es10 kernel: XFS (vdc1): metadata I/O error: block 0x7d07bb65 ("xlog_iodone") error 5 numblks 64
Dec  2 05:38:17 es10 kernel: XFS (vdc1): xfs_do_force_shutdown(0x2) called from line 1180 of file fs/xfs/xfs_log.c.  Return address = 0xffffffffa01c3b9e

从上面的日志可以看到:

主要问题还是 kernel: INFO: task java:10432 blocked for more than 120 seconds.这种问题,解决方法就是降低vm.dirty_ratiovm.dirty_background_ratio这两个参数。

echo "vm.dirty_ratio = 10" >> /etc/sysctl.conf
echo "vm.dirty_background_ratio = 1" >> /etc/sysctl.conf'
不重启生效:
sysctl -p

链接

http://blog.51cto.com/7938217/1679885

https://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/

https://bugzilla.redhat.com/show_bug.cgi?id=1211084

你可能感兴趣的:(分布式存储,ceph)