Practical XFS Filesystem Operations

XFS is more than a filesystem: it ships with a suite of utilities for inspecting and tuning its state and parameters. This article summarizes practical experience with them, in the hope that it is useful. The related administration tools are: xfs_admin, xfs_check, xfs_db, xfs_freeze, xfs_growfs, xfs_io, xfs_mdrestore, xfs_mkfile, xfs_quota, xfs_rtcp, xfs_bmap, xfs_copy, xfs_estimate, xfs_fsr, xfs_info, xfs_logprint, xfs_metadump, xfs_ncheck, xfs_repair.

Overview of common XFS commands

xfs_admin: adjust various parameters of an XFS filesystem

xfs_copy: copy the contents of an XFS filesystem to one or more targets in parallel

xfs_db: debug or examine an XFS filesystem (e.g. view fragmentation)

xfs_check: check the integrity of an XFS filesystem (on modern xfsprogs, superseded by xfs_repair -n)

xfs_bmap: print the block mapping of a file

xfs_repair: attempt to repair a damaged XFS filesystem

xfs_fsr: defragment files on an XFS filesystem

xfs_quota: manage disk quotas on an XFS filesystem

xfs_metadump: copy the metadata of an XFS filesystem into a file

xfs_mdrestore: restore metadata from a file back onto an XFS filesystem

xfs_growfs: resize an XFS filesystem (grow only)

xfs_freeze: suspend (-f) and resume (-u) access to an XFS filesystem

xfs_logprint: print the log of an XFS filesystem

xfs_mkfile: create a preallocated file on an XFS filesystem (not a filesystem itself)

xfs_info: display detailed filesystem information

xfs_ncheck: generate pathnames from i-numbers for XFS

xfs_rtcp: copy files to the XFS realtime subvolume

xfs_io: debug the XFS I/O path


A minimal Debian install does not include XFS support; it must be installed separately: apt-get install xfsprogs

I. Labeling a partition
Before assigning a label, it is best to unmount the filesystem first. For example, to label a newly added disk (sdb1) as 'media':
Usage: xfs_admin [-efjlpuV] [-c 0|1] [-L label] [-U uuid] device
xfs_admin -L media /dev/sdb1

This resets the existing label (even if there was none) to 'media'. To view the current label:
xfs_admin -l dev_name
# xfs_admin -l /dev/mapper/vg0-lv_home 
label = "home"

II. Growing/shrinking an existing volume on LVM
Growing is done with 'xfs_growfs'. For a filesystem created on an LVM volume, once you have added space to the underlying logical volume, you use this command to tell the filesystem its new size. It is as simple as growing an ext4 partition, and the filesystem stays mounted throughout. For example, with /dev/mapper/vg0-lv_home mounted at /home, just operate on '/home' directly:
 xfs_growfs /home
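The full sequence including the LVM step can be sketched as follows. The volume name vg0/lv_home and the +10G amount are assumptions for illustration; the commands are printed rather than executed, since running them requires root and a real volume group with free space:

```shell
#!/bin/sh
# Grow an XFS filesystem that lives on an LVM logical volume.
# Printed, not executed: needs root and a real VG with free extents.
LV=/dev/vg0/lv_home
MNT=/home
STEPS="lvextend -L +10G $LV
xfs_growfs $MNT"
# lvextend:   add 10 GiB to the logical volume
# xfs_growfs: grow the (still mounted) filesystem into the new space;
#             note that it takes the mount point, not the device
echo "$STEPS"
```

By default xfs_growfs grows the filesystem to fill the whole underlying device, so no size argument is needed.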
 
As for shrinking: unfortunately, neither XFS nor JFS supports shrinking a filesystem.

For adjusting the LVM volumes themselves, see this document: Getting started with LVM


Viewing XFS filesystem information

[root@localhost ~]# xfs_info /dev/sda3
meta-data=/dev/sda3              isize=256    agcount=32, agsize=228641200 blks
         =                       sectsz=4096  attr=2, projid32bit=0
data     =                       bsize=4096   blocks=7316518400, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Creating an XFS filesystem (formatting)

mkfs.xfs -b size=2k -n size=4k /dev/sda3

mkfs.xfs -b size=2k /dev/sdxn

Specify the 'su' (stripe unit) and 'sw' (stripe width) parameters:
# mkfs.xfs -d su=64k,sw=4 /dev/sda3

Commonly used mkfs.xfs parameters:

-b size=  Logical block size. The default block size is 4096 bytes (4 KB).

-n size=  Directory block size. It must be at least as large as the filesystem's logical block size.

-l  Every XFS filesystem has a log. The log requires dedicated disk space; this space is not shown by df and cannot be accessed by file name. The log can be external or internal: external means a separate device, internal means a dedicated region within the filesystem itself. For an internal log, the size is set with -l size=. The default log size grows with the size of the filesystem, up to a maximum default of 128 MB, reached on a filesystem of about 1 TB.

For filesystems with very high transaction activity, a large log is recommended. You should nevertheless avoid making the log too large, because a large log increases the time needed to mount the filesystem after a crash.
 
-d  Parameters for the data section, for example how allocation should be laid out on a RAID device.

For a RAID device, the default stripe unit is 0, indicating that the feature is disabled. You should configure the stripe unit and width sizes of RAID devices in order to avoid unexpected performance anomalies caused by the filesystem doing non-optimal I/O operations to the RAID unit. For example, if a block write is not aligned on a RAID stripe unit boundary and is not a full stripe unit, the RAID will be forced to do a read/modify/write cycle to write the data. This can have a significant performance impact. By setting the stripe unit size properly, XFS will avoid unaligned accesses.

Calculating alignment when used with a RAID controller

Start with your RAID stripe size. Let's use 64 KB, a common default. In this case 64K = 2^16 = 65536 bytes.

Get your sector size from fdisk; in this case, 512 bytes.

Calculate how many sectors fit in a RAID stripe: 65536 / 512 = 128 sectors per stripe.

Get the start sector of the partition in question (here, a MySQL data partition) from fdisk: 27344896.

Check whether the partition's start falls on a stripe boundary by dividing its start sector by the sectors per stripe: 27344896 / 128 = 213632. This is a whole number, so we are good. If there had been a remainder, the partition would not start on a RAID stripe boundary.

XFS requires a little massaging (or a lot). For a standard server it is fairly simple; we need to know two things:

RAID stripe size
Number of unique, utilized disks in the RAID. This turns out to be the same as in the size formulas above:
 RAID 1+0: a set of mirrored drives, so the number here is num drives / 2.
 RAID 5: striped drives plus one full drive of parity, so the number here is num drives - 1.

In our case, it is a RAID 1+0, 64k stripe, with 8 drives. Since each of those drives has a mirror, there are really 4 sets of unique drives striped over the top. Using these numbers, we set the 'su' and 'sw' options in mkfs.xfs with those two values respectively.
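The arithmetic above can be sketched as a small script, using the example values from this section (64 KB stripe, 512-byte sectors, start sector 27344896, RAID 1+0 over 8 drives):

```shell
#!/bin/sh
# Alignment and su/sw arithmetic for the example values in the text.
STRIPE=65536        # RAID stripe size: 64 KB
SECTOR=512          # sector size reported by fdisk
START=27344896      # partition start sector from fdisk
DRIVES=8            # RAID 1+0 array with 8 drives

# Sectors per RAID stripe: 65536 / 512 = 128
PER_STRIPE=$((STRIPE / SECTOR))
echo "sectors per stripe: $PER_STRIPE"

# The partition is aligned if its start sector is a multiple of that.
if [ $((START % PER_STRIPE)) -eq 0 ]; then
    echo "partition start is stripe-aligned"
else
    echo "partition start is NOT stripe-aligned"
fi

# For RAID 1+0, the number of unique data drives is total / 2.
SW=$((DRIVES / 2))
echo "use: mkfs.xfs -d su=64k,sw=$SW <device>"
```

For a RAID 5 array the last step would instead be DRIVES - 1, as noted above.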

Specify the block size and internal log size:
# mkfs.xfs -b size=1k -l size=10m /dev/sdc1

Use a logical volume as an external log device:
# mkfs.xfs -l logdev=/dev/sdh,size=65536b /dev/sdc1

Specify the directory block size:
# mkfs.xfs -b size=2k -n size=4k /dev/sdc1

Mount options that can improve performance

Common ones are noatime, nodiratime and nobarrier (note that the nobarrier option has been removed from recent kernels). For example, as an /etc/fstab entry (the device name here is only a placeholder):

/dev/sdxn /home/pgsql xfs nobarrier,noatime,nodiratime 0 0

III. "Out of space" on a large XFS partition
A large partition in use suddenly reported that it was out of space. Logging in to check showed:
[root@freeoa ~]# df -hT
Filesystem    Type    Size  Used Avail Use% Mounted on
/dev/sdb1      xfs     9T   6T  2.4T  88% /oabackup

[root@freeoa ~]# df -hi
Filesystem  Inodes   IUsed   IFree IUse% Mounted on
/dev/sdb1   9.3G    3.4M    9.3G    1% /backup

As you can see, there is still plenty of headroom, both in physical space and in inodes. So why does the system report that the disk is full?

The XFS FAQ describes this situation as follows:

Q: What is the inode64 mount option for?

By default, with 32bit inodes, XFS places inodes only in the first 1TB of a disk. If you have a disk with 100TB, all inodes will be stuck in the first TB. This can lead to strange things like "disk full" when you still have plenty space free, but there's no more place in the first TB to create a new inode. Also, performance sucks.
To come around this, use the inode64 mount options for filesystems >1TB. Inodes will then be placed in the location where their data is, minimizing disk seeks.
Beware that some old programs might have problems reading 64bit inodes, especially over NFS. Your editor used inode64 for over a year with recent (openSUSE 11.1 and higher) distributions using NFS and Samba without any corruptions, so that might be a recent enough distro.

XFS stores inodes within the first 1 TB of the disk. Once that region is completely used, no new inodes can be created, and you get "disk full" errors even though plenty of free space remains.

The fix is to mount with the inode64 option:
mount -o remount -o noatime,nodiratime,inode64,nobarrier /dev/sdb1 /oabackup
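To make the option persistent across reboots, it can also go into /etc/fstab. A sketch, using the device and mount point from this example:

```
/dev/sdb1  /oabackup  xfs  noatime,nodiratime,inode64,nobarrier  0  0
```

Beware, as the FAQ notes, that some old programs (especially over NFS) may have problems with 64-bit inode numbers.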


IV. Filesystem maintenance

1. Defragmentation
View a file's block mapping: xfs_bmap -v file.tar.bz2
View filesystem fragmentation: xfs_db -c frag -r /dev/sda1
Defragment: xfs_fsr /dev/sda1

2. Filesystem consistency checks
xfs_repair -n /dev/cciss/cpd0p
xfs_repair -n (no-modify mode)
xfs_check

Unlike fsck, neither xfs_check nor xfs_repair is invoked automatically at boot. Use these commands only when you suspect a problem with the filesystem.

3. Repairing the filesystem
Repairing an inconsistent filesystem:
Run without the -n option, xfs_repair checks XFS consistency and, where problems are detected, repairs them as far as possible. The filesystem being checked and repaired must be unmounted. The check proceeds in roughly 7 phases, and repair messages fall into roughly 5 categories.

If a previous xfs_repair run put files and directories into lost+found and you have not moved them out, the next xfs_repair run temporarily disconnects those inodes and reconnects them again before it finishes. Because of these disconnected inodes in lost+found, you will see output like the following:
Phase 1 - find and verify superblock...
Phase 2 - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        ...
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - clearing existing "lost+found" inode
        - deleting existing "lost+found" entry
        - check for inodes claiming duplicate blocks...
        - agno = 0
imap claims in-use inode 242000 is free, correcting imap
        - agno = 1
        - agno = 2
        ...
Phase 5 - rebuild AG headers and trees...
        - reset superblock counters...
Phase 6 - check inode connectivity...
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 242000, moving to lost+found
Phase 7 - verify and correct link counts...
done

In this example, inode 242000 was an inode that was moved to lost+found during a previous xfs_repair run. This run of xfs_repair found that the filesystem is consistent. If the lost+found directory had been empty, in phase 4 only the messages about clearing and deleting the lost+found directory would have appeared. The imap claims and disconnected inode messages appear (one pair of messages per inode) if there are inodes in the lost+found directory.

4. Assigning a new UUID to an XFS filesystem

View the UUIDs of all partitions (not just XFS ones):
blkid

Get the UUID of an existing filesystem:
# xfs_admin -u /dev/sdb
UUID = 88b6b5ee-1125-4a52-ae8a-48bf3ced1c18

Set a new UUID on an existing filesystem, typically when replacing an old disk with a new one without wanting to change the mount configuration:
# xfs_admin -U 893e121c-582e-4851-bad7-cf46f01167b3 /dev/sde
Clearing log and setting UUID
writing all SBs
new UUID = 893e121c-582e-4851-bad7-cf46f01167b3

Clear the filesystem's UUID:
# xfs_admin -U nil /dev/sdb
Clearing log and setting UUID
writing all SBs
new UUID = 00000000-0000-0000-0000-000000000000

Assign a randomly generated UUID:
# xfs_admin -U generate /dev/sdb
writing all SBs
new UUID = c1b9d5a2-6789-11ab-9101-0020afc76f16

Alternatively, generate a UUID first with a dedicated tool and then write it:
uuidgen
893e121c-582e-4851-bad7-cf46f01167b3

For ext filesystems, the equivalent is tune2fs:
tune2fs /dev/sdc1 -U f0acce91-a416-1234-abcd-43f3ed3768f9

5. Errors that must be cleaned up before mounting

Mounting the disk fails with "mount: Structure needs cleaning".
Repair the affected partition with "xfs_repair -L /dev/sdb1".
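-L zeroes the log and can discard the most recent metadata updates, so it should be the last resort, not the first step. A hedged sketch of the escalation order (the device name is the one from this example; the commands are printed rather than executed, since they need root and a real, unmounted device):

```shell
#!/bin/sh
# Repair escalation for an XFS filesystem that fails to mount with
# "Structure needs cleaning". Printed, not executed.
DEV=/dev/sdb1
STEPS="umount $DEV
xfs_repair -n $DEV
xfs_repair $DEV
xfs_repair -L $DEV"
# umount:        the filesystem must not be mounted during repair
# -n:            no-modify mode, only reports problems
# (no flag):     normal repair, replays the log first
# -L:            zeroes the log; last resort, may lose recent metadata
echo "$STEPS"
```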
 
