Checking for Bad Blocks and Testing Disk Read/Write Performance on Ubuntu

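I ran into a disk write problem: a server kept timing out and the cause was not obvious. I suspected either the network or the disks. By elimination I narrowed it down to one machine, and the next step was to work out whether the disk or the NIC was at fault. The NIC can be tested with iperf; this post covers only the disk checks.
The tool used is fio, which measures disk read/write performance. Only sequential write is tested here.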

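  1. Install fio first (run as root, or prefix the command with sudo):
 $ apt-get install fio 
# When the [Y/n] prompt appears, enter y. The installation output looks like this: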
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  formencode-i18n libbabeltrace-ctf1 libbabeltrace1 libcephfs2 libcurl3 libgoogle-perftools4 libjs-jquery libjs-sphinxdoc libjs-underscore libleveldb1v5 libpython2.7
  libradosstriper1 librgw2 libsnappy1v5 libtcmalloc-minimal4 libunwind8 python-cephfs python-cffi-backend python-cherrypy3 python-cryptography python-dnspython python-enum34
  python-flask python-formencode python-idna python-ipaddress python-itsdangerous python-jinja2 python-logutils python-mako python-markupsafe python-openssl python-paste
  python-pastedeploy python-pastedeploy-tpl python-pecan python-prettytable python-pyasn1 python-rados python-rbd python-repoze.lru python-requests python-rgw python-routes
  python-simplegeneric python-singledispatch python-tempita python-urllib3 python-waitress python-webob python-webtest python-werkzeug
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  librdmacm1
Suggested packages:
  gnuplot gfio
The following NEW packages will be installed:
  fio librdmacm1
0 upgraded, 2 newly installed, 0 to remove and 194 not upgraded.
Need to get 417 kB of archives.
After this operation, 1,730 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://cn.archive.ubuntu.com/ubuntu xenial/main amd64 librdmacm1 amd64 1.0.21-1 [49.1 kB]
Get:2 http://cn.archive.ubuntu.com/ubuntu xenial/universe amd64 fio amd64 2.2.10-1ubuntu1 [368 kB]
Fetched 417 kB in 2s (180 kB/s)
Selecting previously unselected package librdmacm1.
(Reading database ... 92532 files and directories currently installed.)
Preparing to unpack .../librdmacm1_1.0.21-1_amd64.deb ...
Unpacking librdmacm1 (1.0.21-1) ...
Selecting previously unselected package fio.
Preparing to unpack .../fio_2.2.10-1ubuntu1_amd64.deb ...
Unpacking fio (2.2.10-1ubuntu1) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up librdmacm1 (1.0.21-1) ...
Setting up fio (2.2.10-1ubuntu1) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
  2. Run the test:
fio -filename=/mnt/data/testd -direct=1 -thread -rw=write -bs=50k -size=20G -numjobs=20 -runtime=600 -group_reporting -name=sqe_100read_50k

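filename: the location to test. Pay attention to the path, because different directories may sit on different disks and so measure different disks. Both files on a filesystem and raw devices are supported. Writing to a raw device overwrites whatever is on it and can destroy the filesystem, so do that only with great care. This example writes to a file in a directory on a mounted filesystem and caused no problems.
directory: places the test files under the given directory, which must already exist.
direct=1: bypasses the operating system's buffer cache during the test, so the result reflects the real performance of the disk.
rw: the test mode. write is sequential write, read is sequential read, randread is random read, randwrite is random write, randrw is mixed random read/write, and rw is mixed sequential read/write.
bs: the block size of a single I/O.
bsrange=512-2048: like bs, but specifies a range of block sizes.
size=20g: the test file is 20 GB, written here with 50k per I/O.
numjobs=20: the test runs with 20 threads.
runtime=600: the test runs for 600 seconds. If omitted, the test runs until the whole 20 GB file has been written in 50k blocks.
ioengine=psync: use the psync I/O engine (the command above does not set it, so the output shows ioengine=sync). To use the libaio engine, the libaio development package is needed: yum install libaio-devel on RPM-based systems; on Ubuntu the corresponding package is libaio-dev.
rwmixwrite=30: in mixed read/write modes, writes make up 30% of the I/O.
group_reporting: affects how results are displayed; the statistics of all jobs are summarized into one report.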
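
The parameters above can be combined for the other rw modes as well. The following is a minimal sketch, not part of the original test: fio_cmd is a hypothetical helper that only prints a command line, reusing the values from this post (50k blocks, 20 GB, 20 threads, 600 seconds). The job name pattern and the choice to add rwmixwrite=30 for the mixed modes are assumptions made for illustration.

```shell
# Hypothetical helper: print (do not run) the fio command from this post
# for a chosen rw mode. The body runs in a subshell so variables stay local.
# Usage: fio_cmd MODE [TARGET_FILE]
fio_cmd() (
    mode="$1"
    target="${2:-/mnt/data/testd}"   # same test file as used in this post
    mix=""
    case "$mode" in
        write|read|randread|randwrite) ;;
        rw|randrw) mix=" -rwmixwrite=30" ;;   # assumed 30% writes in mixed modes
        *) echo "unknown rw mode: $mode" >&2; exit 1 ;;
    esac
    # Job name "<mode>_50k" is an illustrative choice, not from the original.
    echo "fio -filename=$target -direct=1 -thread -rw=$mode$mix -bs=50k -size=20G -numjobs=20 -runtime=600 -group_reporting -name=${mode}_50k"
)
```

For example, `fio_cmd randwrite` prints the random-write variant, which can be reviewed and then pasted into the terminal. Printing instead of executing keeps the target path visible before anything is written to disk.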

  3. Test results:

sqe_100read_50k: (g=0): rw=write, bs=50K-50K/50K-50K/50K-50K, ioengine=sync, iodepth=1
...
fio-2.2.10
Starting 20 threads
sqe_100read_50k: Laying out IO file(s) (1 file(s) / 20480MB)
Jobs: 20 (f=20): [W(20)] [100.0% done] [0KB/32200KB/0KB /s] [0/644/0 iops] [eta 00m:00s]
sqe_100read_50k: (groupid=0, jobs=20): err= 0: pid=240789: Wed Oct 16 20:41:21 2019
  write: io=23914MB, bw=40811KB/s, iops=816, runt=600035msec
    clat (usec): min=536, max=7925.8K, avg=24498.66, stdev=233209.22
     lat (usec): min=537, max=7925.8K, avg=24499.90, stdev=233209.24
    clat percentiles (usec):
     |  1.00th=[  596],  5.00th=[  620], 10.00th=[  636], 20.00th=[  660],
     | 30.00th=[  676], 40.00th=[  684], 50.00th=[  700], 60.00th=[  724],
     | 70.00th=[  788], 80.00th=[ 1896], 90.00th=[ 1992], 95.00th=[ 2224],
     | 99.00th=[970752], 99.50th=[1843200], 99.90th=[3489792], 99.95th=[4079616],
     | 99.99th=[5013504]
    bw (KB  /s): min=    7, max=54990, per=8.29%, avg=3382.11, stdev=8328.51
    lat (usec) : 750=67.24%, 1000=4.47%
    lat (msec) : 2=19.97%, 4=5.68%, 10=0.26%, 20=0.41%, 50=0.46%
    lat (msec) : 100=0.11%, 250=0.04%, 500=0.07%, 750=0.11%, 1000=0.26%
    lat (msec) : 2000=0.47%, >=2000=0.43%
  cpu          : usr=0.04%, sys=0.30%, ctx=979646, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=489759/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=23914MB, aggrb=40810KB/s, minb=40810KB/s, maxb=40810KB/s, mint=600035msec, maxt=600035msec

Disk stats (read/write):
  sda: ios=0/489803, merge=0/36, ticks=0/581616, in_queue=581452, util=96.56%

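The key figure is IOPS. The progress line, [0KB/32200KB/0KB /s] [0/644/0 iops], shows the current throughput and IOPS while the job is running and gives a rough idea of performance; the write: summary line reports the average over the whole run (iops=816 here).
io=amount of I/O performed, in MB
bw=average I/O bandwidth
iops=IOPS
runt=thread run time
slat=submission latency
clat=completion latency
lat=total response time
bw=bandwidth (the per-thread statistics on the "bw (KB /s)" line)
cpu=CPU utilization
IO depths=distribution of I/O queue depths
IO submit=number of I/Os submitted in a single submit call
IO complete=Like the above submit number, but for completions instead.
IO issued=The number of read/write requests issued, and how many of them were short.
IO latencies=distribution of I/O completion latencies
io=total amount of I/O performed by the group (on the "Run status" line)
aggrb=aggregate bandwidth of the group
minb=minimum average bandwidth
maxb=maximum average bandwidth
mint=shortest run time of the threads in the group
maxt=longest run time of the threads in the group
ios=total number of I/Os performed by all groups
merge=total number of I/O merges
ticks=Number of ticks we kept the disk busy.
in_queue=total time spent in the queue
util=disk utilization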
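
To check the summary figure from a script instead of reading it by eye, the iops value can be pulled out of saved fio output. This is a minimal sketch that assumes the fio 2.2.10 output format shown above (a line such as "write: io=..., bw=..., iops=816, runt=..."); newer fio releases print the summary differently, so the pattern would need adjusting. fio_iops and iops_ok are hypothetical helper names.

```shell
# Hypothetical helpers for fio 2.x style output, read from standard input.

# Print "<direction> <iops>" for each read:/write: summary line.
fio_iops() {
    sed -nE 's/^ *(read|write) *: io=.*iops=([0-9]+),.*/\1 \2/p'
}

# Succeed only if at least one summary line was found and every
# reported iops value is at or above the threshold given as $1.
iops_ok() {
    fio_iops | awk -v min="$1" '
        { n++ }
        $2 < min { bad = 1 }
        END { exit (n == 0 || bad) ? 1 : 0 }'
}
```

With the output saved to a file (result.txt is an assumed name), `fio_iops < result.txt` prints the extracted values, and `iops_ok 100 < result.txt || echo "IOPS below threshold"` flags a slow disk.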

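  4. For an ordinary disk, a result below roughly 100 IOPS with 4K blocks indicates a problem, and the disk should be checked for bad blocks or other faults. The tool used here is badblocks: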
$ badblocks -b 122880  -v /dev/sdb > badsectors.txt
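
badblocks writes the numbers of the bad blocks it finds to standard output, one per line, so badsectors.txt is empty when the scan finds nothing. A small sketch for summarizing that file follows; count_bad_blocks is a hypothetical helper name, and it assumes the file contains only the redirected block list.

```shell
# Hypothetical helper: count the bad block numbers recorded in a
# badblocks output file (one decimal block number per line).
# Usage: count_bad_blocks FILE
count_bad_blocks() {
    awk '/^[0-9]+$/ { n++ } END { print n + 0 }' "$1"
}
```

After the scan finishes, `count_bad_blocks badsectors.txt` prints 0 for a clean disk and the number of listed blocks otherwise.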

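If the following error appears, the number of blocks no longer fits in a 32-bit value, and the -b block size has to be raised according to the size of the disk being checked: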

badblocks: Value too large for defined data type invalid end block (85948465152): must be 32-bit value

For example:
A 6T disk needs a block size of 4096, so scaling proportionally, a 180T disk needs (4096/6) * 180 = 122880.
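
The proportional rule above can be written as a small function. This is only a sketch of the rule of thumb from this post (4096 for a 6T disk, scaled linearly), not an official badblocks formula; badblocks_bs is a hypothetical helper name. The multiplication is done before the division because shell arithmetic is integer-only, and 4096/6 would otherwise be truncated.

```shell
# Hypothetical helper: block size for "badblocks -b" following this
# post's rule of thumb, 4096 per 6T of disk, scaled proportionally.
# Usage: badblocks_bs SIZE_IN_TB   (integer)
badblocks_bs() {
    if [ "$1" -le 6 ]; then
        # A smaller disk has fewer blocks, so 4096 is still enough.
        echo 4096
    else
        # Multiply first: integer division of 4096/6 would lose precision.
        echo $(( 4096 * $1 / 6 ))
    fi
}
```

For the 180T disk in this example, `badblocks_bs 180` yields the 122880 used in the command above, so the call could be written as `badblocks -b "$(badblocks_bs 180)" -v /dev/sdb > badsectors.txt`.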
