The last time "No space left on device" appeared, I assumed it was caused by running out of inodes, but it was actually an OCFS2 bug.
Reference: Metalink ID 1232702.1, https://support.oracle.com
java.lang.IllegalStateException: java.lang.IllegalStateException: /pic/claimDbpic/dubang/2010/1/0507/705072010370702000189/06010/Thumb70507201037070200018906010_3.JPG (No space left on device)
Last login: Thu Oct 21 10:18:12 2010 from 10.0.1.26
[jboss@ca-be00-ser05 ~]$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/sda1 33587200 260074 33327126 1% /
tmpfs 3084467 1 3084466 1% /dev/shm
/dev/mapper/pic 32768160 32440546 327614 100% /pic
/dev/mapper/app 3276960 619879 2657081 19% /app
/dev/mapper/mpath5 503316480 6076467 497240013 2% /picNL
[jboss@ca-be00-ser05 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 125G 11G 108G 9% /
tmpfs 12G 0 12G 0% /dev/shm
/dev/mapper/pic 1001G 991G 10G 100% /pic
/dev/mapper/app 101G 19G 82G 19% /app
/dev/mapper/mpath5 15T 186G 15T 2% /picNL
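
Both views above report /pic at 100%, which is why inode exhaustion looked like the obvious cause. A minimal sketch for flagging such mount points from either view is shown here; the helper name flag_full and the default threshold of 95 are assumptions for illustration, and it expects the single-line-per-filesystem layout that `df -P` produces.

```shell
# flag_full (hypothetical helper): read `df -P` or `df -P -i` output on stdin
# and print every mount point whose usage column is at or above a threshold.
# Assumes mount points contain no spaces; a "-" usage value counts as 0.
flag_full() {
    threshold="${1:-95}"   # assumed default, adjust as needed
    awk -v t="$threshold" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%/, "", pct)         # "100%" -> "100"
            if (pct + 0 >= t + 0) print $6
        }'
}

# Usage (run manually):
#   df -P -i | flag_full 95    # inode view
#   df -P    | flag_full 95    # block view
```

A mount point that shows up in both views, as /pic does here, does not by itself tell you which resource is actually exhausted, so the kernel log still needs to be checked.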
[root@ca-be00-ser05 ~]# more /var/log/messages
Oct 21 07:41:40 ca-be00-ser05 kernel: (ocfs2_wq,6482,1):ocfs2_delete_inode:999 ERROR: status = -2
Oct 21 08:10:05 ca-be00-ser05 ntpd[6983]: time reset -0.572586 s
Oct 21 08:14:21 ca-be00-ser05 ntpd[6983]: synchronized to LOCAL(0), stratum 8
Oct 21 08:15:26 ca-be00-ser05 ntpd[6983]: synchronized to 10.1.2.101, stratum 2
Oct 21 08:22:58 ca-be00-ser05 ntpd[6983]: synchronized to LOCAL(0), stratum 8
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_orphan_del:1841 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_remove_inode:628 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_wipe_inode:754 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_delete_inode:999 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_orphan_del:1841 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_remove_inode:628 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_wipe_inode:754 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_delete_inode:999 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_orphan_del:1841 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_remove_inode:628 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_wipe_inode:754 ERROR: status = -2
Oct 21 08:23:21 ca-be00-ser05 kernel: (ocfs2_wq,6482,11):ocfs2_delete_inode:999 ERROR: status = -2
Oct 21 08:29:07 ca-be00-ser05 ntpd[6983]: synchronized to 10.1.2.101, stratum 11
Oct 21 08:56:54 ca-be00-ser05 ntpd[6983]: time reset +0.174235 s
Oct 21 09:00:14 ca-be00-ser05 ntpd[6983]: synchronized to LOCAL(0), stratum 8
Oct 21 09:01:21 ca-be00-ser05 ntpd[6983]: synchronized to 10.1.2.101, stratum 2
Oct 21 09:13:45 ca-be00-ser05 kernel: (ocfs2_wq,6482,7):ocfs2_orphan_del:1841 ERROR: status = -2
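
The OCFS2 kernel errors are interleaved with unrelated ntpd messages, so it helps to condense them. The sketch below counts the errors per function and status code; the helper name ocfs2_error_summary is an assumption, and it relies on the syslog line layout shown above (the sixth whitespace-separated field holding "(thread,pid,cpu):function:line"). A status of -2 is a negated errno, and errno 2 on Linux is ENOENT ("No such file or directory").

```shell
# ocfs2_error_summary (hypothetical helper): read syslog lines on stdin and
# print "function status count" for each OCFS2 kernel error, sorted by name.
ocfs2_error_summary() {
    awk '
        $5 == "kernel:" && $6 ~ /^\(ocfs2/ && $7 == "ERROR:" {
            split($6, f, ":")          # f[2] is the reporting function
            count[f[2] " " $NF]++      # $NF is the status value, e.g. -2
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}

# Usage (run manually, typically as root):
#   ocfs2_error_summary < /var/log/messages
```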
1) Permanent solution:
- Upgrade to OCFS2 1.6, back up the files on the volume, reformat it with the 1.6 mkfs.ocfs2, and restore the files.
- Prior to 1.6, all clusters in a cluster group required contiguous bits in the global bitmap. With 1.6, a group is broken down into subgroups, each requiring a much smaller number of contiguous bits.
2) Temporary solutions:
a) Upgrade to OCFS2 1.4, back up the files on the volume, reformat it with the 1.4 mkfs.ocfs2, and restore the files.
- This reserves 0.3% of the volume to allow for metadata expansion, but the problem could still recur.
b) Check for "spare slots" and remove one or more of them
- When a volume is created, the "-N" parameter of "mkfs.ocfs2" specifies the maximum number of nodes that can concurrently mount the volume. This reserves a "slot" for each node.
If many more node slots were created than are required, there will be unused space that can be reclaimed.
To do this:
i) unmount the volume on all nodes
ii) run "tunefs.ocfs2 -N <n>" where <n> is the reduced number of node slots
iii) remount the volume and rerun the operation to expand the file
- Note: do not perform this proactively, otherwise the freed space might get used by other objects before it is needed.
c) Identify one or more large files that can be moved.
i) move it/them off the volume, and check the global bitmap for sufficient contiguous bits.
ii) move it/them back to the volume.
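
The slot-reduction steps in 2b can be sketched as a small dry-run helper that only prints the commands for review, since they take a shared volume offline. The helper name plan_slot_reduction and the slot count 4 in the usage line are assumptions for illustration; /dev/mapper/pic and /pic are the device and mount point from the df output above, and the mount line assumes the volume is mounted by device and mount point rather than through a cluster-specific wrapper.

```shell
# plan_slot_reduction (hypothetical helper): print, but do not run, the
# commands for reducing the number of node slots on an OCFS2 volume.
plan_slot_reduction() {
    device="$1"
    mountpoint="$2"
    slots="$3"
    # Reject anything that is not a positive integer.
    case "$slots" in
        ''|*[!0-9]*|0) echo "slots must be a positive integer" >&2; return 1 ;;
    esac
    echo "umount $mountpoint    # repeat on every node in the cluster"
    echo "tunefs.ocfs2 -N $slots $device"
    echo "mount $device $mountpoint"
}

# Usage: review the printed plan, then run the commands by hand.
#   plan_slot_reduction /dev/mapper/pic /pic 4
```

Printing instead of executing is a deliberate choice: the volume must be unmounted on all nodes first, which a script running on a single host cannot verify.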