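The affected host belongs to a XenServer 6.5 pool at work. One day a VM suddenly became unreachable, so I went to XenServer to restart it, but even a forced reboot failed. I then logged in to the host and checked its disk space: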

[root@VIP-XS-08 cron.d]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              20G  20G   0  100% /
none                  7.8G  2.0M  7.8G   1% /dev/shm

The host's root filesystem was full. OK, time to free some space. But running the commands below turned up something odd:

[root@VIP-XS-08 /]# cd /
[root@VIP-XS-08 /]# du -sh *
5.7M    bin
24M     boot
2.1M    cli-rt
3.3M    dev
7.4M    etc
28K     EULA
4.0K    home
118M    lib
20M     lib64
16K     lost+found
4.0K    media
4.0K    mnt
554M    opt
du: cannot read directory `proc/7020': No such file or directory
du: cannot read directory `proc/7021': No such file or directory
0       proc
12K     Read_Me_First.html
102M    root
24M     sbin
4.0K    selinux
4.0K    srv
0       sys
1.6M    tftpboot
68K     tmp
542M    usr
2.6G    var
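
The listing above can also be checked against df numerically. Here is a minimal sketch, assuming GNU df and du as shipped on the host; the function name unaccounted_kib is made up for illustration. It prints how many KiB df counts as used that du cannot find under the mount point:

```shell
#!/bin/bash
# Print the KiB that df reports as used but du cannot see under a mount point.
# A large positive number hints at space held by deleted-but-open files.
unaccounted_kib() {
    local mount="${1:-/}" used visible
    # -P forces one line per filesystem, -k reports KiB; column 3 is "Used".
    used=$(df -Pk "$mount" | awk 'NR==2 {print $3}')
    # -x stays on this filesystem, so /proc, /sys and other mounts are skipped.
    visible=$(du -sxk "$mount" 2>/dev/null | awk '{print $1}')
    echo $(( used - visible ))
}
```

Calling `unaccounted_kib /` as root on the host would give the gap for the root filesystem. Small differences are normal (filesystem metadata, reserved blocks); a gap of many gigabytes is not.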

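So the directories add up to only about 4G, nowhere near 20G. Where did the rest go? The likely cause is files that were deleted while a process still had them open, so their space was never released. The following command lists files that are deleted but still in use: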

[root@VIP-XS-08 cron.d]#  ls -l /proc/[0-9]*/fd/* |grep delete 
ls: /proc/29018/fd/255: No such file or directory
ls: /proc/29018/fd/3: No such file or directory
l-wx------ 1 root   root   64 Nov 14 13:14 /proc/22020/fd/2 -> /tmp/stunnelbd3855.log (deleted)
l-wx------ 1 root   root   64 Nov 14 13:27 /proc/24758/fd/2 -> /tmp/stunnel1bc930.log (deleted)
lrwx------ 1 root   root   64 Nov 14 11:03 /proc/4555/fd/6 -> /tmp/tmpfLfGwGG (deleted)
lrwx------ 1 root   root   64 Nov 14 11:03 /proc/4556/fd/6 -> /tmp/tmpfLfGwGG (deleted)
l-wx------ 1 root   root   64 Nov 14 11:03 /proc/4587/fd/5 -> /var/run/openvswitch/ovs-xapi-sync.pid.tmp4587 (deleted)
l-wx------ 1 root   root   64 Nov 14 11:03 /proc/4587/fd/12 ->  /var/log/blktap/tapdisk.2345.log (deleted)

After checking them one by one, the most likely culprit was /var/log/blktap/tapdisk.2345.log (deleted).
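
Instead of checking candidates by hand, the deleted-but-open files can be ranked by size. The following is only a sketch: the function name list_deleted_fds and its optional argument are my own, and it assumes a Linux /proc plus GNU stat and readlink.

```shell
#!/bin/bash
# List file descriptors that still point at deleted files, largest first.
# Output columns: size in bytes, PID of the holder, link target.
# The optional argument replaces /proc, which is only useful for testing.
list_deleted_fds() {
    local proc_root="${1:-/proc}" link target size pid
    for link in "$proc_root"/[0-9]*/fd/*; do
        # Processes can exit while we iterate; skip anything unreadable.
        target=$(readlink "$link" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)") ;;
            *) continue ;;
        esac
        # stat -L follows the /proc link to the inode that is still open.
        size=$(stat -L -c %s "$link" 2>/dev/null) || size=0
        pid=${link#"$proc_root"/}
        pid=${pid%%/*}
        printf '%s\t%s\t%s\n' "$size" "$pid" "$target"
    done | sort -rn
}
```

Run as root, `list_deleted_fds | head` should put the file that is eating the disk at the top, together with the PID taken from the /proc path, which is the process actually holding it.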

The name tapdisk.2345.log indicates the log file of the tapdisk process with PID 2345. It mainly records what tapdisk observes while serving the disk image, with entries like these:

Aug 21 17:55:06: [17:55:06.597] tapdisk_vbd_check_progress: vhd:/dev/VG_XenStorage-39d05ede-4cd6-6dd0-4263-f8dbe2949580/VHD-2e957900-09c5-4e8d-9ba1-c9e17f78f519: watchdog timeout: pending requests idle for 60 seconds
Aug 21 17:55:06: [17:55:06.597] tapdisk_vbd_check_progress: vhd:/dev/VG_XenStorage-39d05ede-4cd6-6dd0-4263-f8dbe2949580/VHD-2e957900-09c5-4e8d-9ba1-c9e17f78f519: watchdog timeout: pending requests idle for 60 seconds
Aug 21 17:55:06: [17:55:06.921] tapdisk_vbd_check_progress: vhd:/dev/VG_XenStorage-39d05ede-4cd6-6dd0-4263-f8dbe2949580/VHD-2e957900-09c5-4e8d-9ba1-c9e17f78f519: watchdog timeout: pending requests idle for 60 seconds
Aug 21 17:55:06: [17:55:06.925] tapdisk_vbd_check_progress: vhd:/dev/VG_XenStorage-39d05ede-4cd6-6dd0-4263-f8dbe2949580/VHD-2e957900-09c5-4e8d-9ba1-c9e17f78f519: watchdog timeout: pending requests idle for 60 seconds
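
To see which disk is generating this flood, the entries can be counted per VHD. This is a rough sketch that assumes every line has the same layout as the samples above (the sixth whitespace-separated field is the `vhd:<path>:` token); the function name is hypothetical.

```shell
#!/bin/bash
# Count "watchdog timeout" entries per VHD in tapdisk log lines.
# Reads the files given as arguments, or standard input if none are given.
count_watchdog_timeouts() {
    awk '/watchdog timeout/ {
        vhd = $6            # e.g. "vhd:/dev/VG_XenStorage-.../VHD-...:"
        sub(/:$/, "", vhd)  # drop the trailing colon
        n[vhd]++
    }
    END { for (v in n) printf "%d\t%s\n", n[v], v }' "$@"
}
```

For a log that is still on disk, `count_watchdog_timeouts /var/log/blktap/tapdisk.*.log` would print one line per VHD with its count.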


So why would a hung Xen VM lead to the original symptoms: the VM cannot be restarted, the host disk is full, and the log file has been deleted?

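The answer: once the VM hung, the tapdisk process for that VM on the host kept writing log entries until the disk filled up, and with no free space on the host the VM could not be restarted either. As for the deletion, when a log grows past the size that triggers rotation it gets rotated, and if the rotated copies then exceed the configured maximum number to keep, the oldest file is deleted. Because tapdisk still had the file open, deleting it removed only the directory entry; the blocks stayed allocated, which is why du could not see them while df still counted them.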

[root@VIP-XS-08 /]# rpm -vV  elasticsyslog
........  c /etc/cron.d/logrotate.cron
........  c /etc/logrotate-xenserver.conf
........    /etc/sysconfig/syslog.elastic
........    /etc/sysconfig/syslog.patch
........    /opt/xensource/bin/delete_old_logs_by_space
........    /opt/xensource/bin/elasticsyslog
........    /opt/xensource/bin/logrotate-xenserver
........    /opt/xensource/bin/rotate_logs_by_size
[root@VIP-XS-08 /]# cat /etc/logrotate.conf
# see "man logrotate" for details
# rotate log files weekly
weekly

# keep 4 weeks worth of backlogs
rotate 4

# create new (empty) log files after rotating old ones
create

# uncomment this if you want your log files compressed
#compress

# RPM packages drop log rotation information into this directory
include /etc/logrotate.d

# no packages own wtmp -- we'll rotate them here
/var/log/wtmp {
    monthly
    minsize 1M
    create 0664 root utmp
    rotate 1
}

/var/log/btmp {
    missingok
    monthly
    minsize 1M
    create 0600 root utmp
    rotate 1
}

# system-specific logs may be also be configured here.
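
The effect described earlier, a deleted log that keeps occupying space while its writer is alive, is easy to reproduce on any Linux box. A minimal sketch follows; the function name and the use of descriptor 9 are arbitrary choices of mine.

```shell
#!/bin/bash
# Show that a file deleted while open stays alive until the descriptor closes.
demo_deleted_but_open() {
    local f
    f=$(mktemp)
    exec 9>"$f"                  # hold a descriptor open, as tapdisk does with its log
    echo "still writing" >&9
    rm -f "$f"                   # the name is gone, the inode is not
    readlink /proc/self/fd/9     # prints "<path> (deleted)" via the inherited fd
    exec 9>&-                    # only now can the kernel free the blocks
}
```

Running `demo_deleted_but_open` prints the temporary path followed by " (deleted)", the same marker seen in the /proc listing above.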


After all that, the fix is simple: make the process holding the deleted file let go of it. See /var/log/blktap/tapdisk.2345.log (deleted) above? The PID is 2345, so kill it:

[root@VIP-XS-08 /]# ps -ef |grep 2345
root     18165 15432  0 14:22 pts/37   00:00:00 grep 2345
root     2345     1  0 Jun01 ?        03:10:55 tapdisk
[root@VIP-XS-08 /]# kill 2345
[root@VIP-XS-08 /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1              20G  4.1G   15G  22% /
none                  7.8G  2.0M  7.8G   1% /dev/shm

The space is back. At this point the host has recovered because it has free disk space again, and the VM that had hung is now shut down.
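
When killing the holder is not acceptable, a commonly used alternative is to truncate the deleted file through its /proc entry. This is not what I did here; the sketch below is an assumption-laden illustration, and the function name is made up. One caveat: if the writer did not open the file in append mode, it keeps writing at its old offset and the file becomes sparse rather than small.

```shell
#!/bin/bash
# Truncate a deleted-but-open file in place via /proc, without killing its holder.
# Usage: truncate_deleted_fd PID FD
truncate_deleted_fd() {
    local link="/proc/$1/fd/$2" target
    target=$(readlink "$link") || return 1
    case "$target" in
        *" (deleted)")
            : > "$link"          # reopening with truncation empties the inode
            ;;
        *)
            echo "refusing: $target is not a deleted file" >&2
            return 1
            ;;
    esac
}
```

With the listing from earlier, the call would take the PID and descriptor number from the /proc path of the offending entry, for example `truncate_deleted_fd 4587 12`.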

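Next, start the VM. If the VM is in a pool, the simplest option is to start it on another host. If it is a standalone VM, or you want to start it on the original host, you first need to start its tapdisk. That requires a number which you should note down before killing the process. I know of no better way than this: run the command below and save the output before the kill, then run it again after the kill and compare the two to find the tapdisk worker process that belonged to the VM.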

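 # List all tapdisk processes
 #ps -ef |grep tap
 # Start the VM's own tapdisk process. Note: the 18 here came from comparing the output of ps -ef |grep tap before and after the kill; it is not a fixed value
 #tapback -d -x 18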

Once the VM's tapdisk process has been started, the VM can be started normally.
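
The before/after comparison can be scripted so nothing has to be remembered. A small sketch, assuming GNU grep and procps ps; the function names and snapshot file paths are placeholders of mine.

```shell
#!/bin/bash
# Save the current tap-related processes (PID and command line) to a file.
snapshot_tap() {
    # The [t] trick keeps grep from matching its own command line.
    ps -eo pid,args | grep '[t]ap' > "$1" || true
}

# Print entries present in the "before" snapshot but missing from "after".
vanished_tap() {
    # -F fixed strings, -x whole-line match, -v invert: lines of $1 not in $2.
    grep -Fxv -f "$2" "$1"
}
```

Typical use would be `snapshot_tap /root/tap.before`, then the kill, then `snapshot_tap /root/tap.after` and `vanished_tap /root/tap.before /root/tap.after` to see which entries disappeared.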


What follows is supplementary material explaining what tapdisk is, for anyone who needs it. It is quoted from the Xen wiki in the original English:

url : https://wiki.xen.org/wiki/Blktap

tapdisk, each tapdisk process in userspace is backed by one or several image files

When xend is started the userspace daemon blktapctrl is started, too. When booting the Guest VM the XenBus is initialized as described in XenSplitDrivers. The request for a new virtual disk is propagated to blktapctrl, which creates a new character device and two named pipes for communication with a newly forked tapdisk process. 

After opening the character device the shared memory is mapped to the fe_ring using the mmap system call. The tapdisk process opens the image file and sends information about the image's size back to blktapctrl, which stores it. After this initialization tapdisk executes a select system call on the two named pipes. On an event it checks if the tap-fd is set and if it is, tries to read a request from the frontend ring. 

The XenBus connection between DomU and Dom0 is used by XenStore to negotiate the backend/frontend connection. After the setup of both backend and frontend a shared ring page and an event channel are negotiated. These are used for any further communication between backend and frontend. I/O requests issued in the Guest VM are handled in the Guest OS and forwarded using these two communication channels.


There is a trade-off between delay and throughput which is controlled by modifying the number of requests until the blktap driver is notified. 

The blktap driver notifies the appropriate blktapctrl or tapdisk process depending on the event type by returning the poll and waking up the tapdisk process respectively. The shared frontend ring works as described in the ring.h. 

tapdisk reads the request from the frontend ring and in case of synchronous I/O reads and immediately returns the request. In case of asynchronous I/O a batch of requests is submitted to Linux AIO subsystem. Both mechanisms read from the image file. In the asynchronous case it is checked using the non-blocking system call io_getevents if the I/O requests were completed. 

The information about completed requests is propagated in the frontend ring. The blktap driver is notified by the tapdisk process with the ioctl system call. 
Using the same XenSplitDevices mechanism the data is returned to the frontend of the Guest VM.