The blktap_device structure is simple:
struct blktap_device {
    spinlock_t lock;
    struct gendisk *gd;
};
Here struct gendisk is the generic disk structure that the kernel's block-device structure, block_device, refers to.
blktap_device_open
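It obtains the disk structure, struct gendisk, from the kernel's generic block_device -> bd_disk, and then obtains the blktap_device from gendisk->private_data.
The block devices under /dev/xen/blktap-2/tapdevXXX are the ones backed by the blktap_device structure.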
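The pointer chain used by the open path can be sketched in user space. This is a minimal sketch, not the driver code: the structures are cut down to the fields mentioned above, and bdev_to_tapdev is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced user-space stand-ins for the kernel structures (illustration only). */
struct gendisk {
    void *private_data;         /* driver-owned pointer, set when the disk is created */
};

struct block_device {
    struct gendisk *bd_disk;    /* the disk this block device belongs to */
};

struct blktap_device {
    struct gendisk *gd;         /* back-pointer to the generic disk */
};

/* Follow block_device -> bd_disk -> private_data, as described for the open path. */
static struct blktap_device *bdev_to_tapdev(const struct block_device *bdev)
{
    assert(bdev != NULL && bdev->bd_disk != NULL);
    return bdev->bd_disk->private_data;
}
```

The lookup only works because the create path stored the tapdev pointer in gd->private_data beforehand.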
blktap_device_release
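Starting from the gendisk passed in, blktap_device_release looks up the blktap_device, block_device, and blktap structures. At the end it sets the BLKTAP_DEVICE_CLOSED bit in the blktap structure's dev_inuse field and calls blktap_ring_kick_user, which does a wake_up on the poll_wait wait queue of blktap->ring.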
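The dev_inuse bookkeeping is plain bit manipulation on an unsigned long. A rough user-space sketch follows; the bit numbers are assumed for illustration (the real values are defined in the blktap header), and the sketch_ helpers are non-atomic stand-ins for the kernel's set_bit() and test_bit().

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical bit numbers; the real ones are defined by the blktap driver. */
enum { BLKTAP_DEVICE = 4, BLKTAP_DEVICE_CLOSED = 5 };

/* Non-atomic stand-in for the kernel's set_bit(). */
static void sketch_set_bit(int nr, unsigned long *addr)
{
    assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
    *addr |= 1UL << nr;
}

/* Non-atomic stand-in for the kernel's test_bit(). */
static int sketch_test_bit(int nr, const unsigned long *addr)
{
    assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
    return (int)((*addr >> nr) & 1UL);
}

/* Mirrors the last step of the release path: mark the device as closed. */
static void sketch_device_release(unsigned long *dev_inuse)
{
    sketch_set_bit(BLKTAP_DEVICE_CLOSED, dev_inuse);
}
```

In the driver the kernel versions are atomic, which matters because the ring side may inspect dev_inuse concurrently.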
blktap_device_getgeo
Fills in and returns a struct hd_geometry, which holds the block device's head, cylinder, and sector information.
blktap_device_create
When ioctl is called on a blktap ring device, blktapXXX, with cmd set to BLKTAP2_IOCTL_CREATE_DEVICE, blktap_device_create is called to create the tapdevXXX device.
if (test_bit(BLKTAP_DEVICE, &tap->dev_inuse))
    return -EEXIST;
if (blktap_device_validate_params(tap, params))
    return -EINVAL;
gd = alloc_disk(1);
if (!gd) {
    err = -ENOMEM;
    goto fail;
}
if (minor < 26) {
    sprintf(gd->disk_name, "td%c", 'a' + minor % 26);
} else if (minor < (26 + 1) * 26) {
    sprintf(gd->disk_name, "td%c%c",
            'a' + minor / 26 - 1, 'a' + minor % 26);
} else {
    const unsigned int m1 = (minor / 26 - 1) / 26 - 1;
    const unsigned int m2 = (minor / 26 - 1) % 26;
    const unsigned int m3 = minor % 26;
    sprintf(gd->disk_name, "td%c%c%c",
            'a' + m1, 'a' + m2, 'a' + m3);
}
gd->major = blktap_device_major;
gd->first_minor = minor;
gd->fops = &blktap_device_file_operations;
gd->private_data = tapdev;
spin_lock_init(&tapdev->lock);
rq = blk_init_queue(blktap_device_do_request, &tapdev->lock);
if (!rq) {
    err = -ENOMEM;
    goto fail;
}
elevator_init(rq, "noop");
gd->queue = rq;
rq->queuedata = tapdev;
tapdev->gd = gd;
blktap_device_configure(tap, params);
add_disk(gd);
if (params->name[0])
    strncpy(tap->name, params->name, sizeof(tap->name)-1);
set_bit(BLKTAP_DEVICE, &tap->dev_inuse);
dev_info(disk_to_dev(gd), "sector-size: %u capacity: %llu\n",
         queue_logical_block_size(rq),
         (unsigned long long)get_capacity(gd));
return 0;
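test_bit checks whether the tap device is already in use; if so, the function fails with -EEXIST. blktap_device_validate_params checks the blktap_params arguments, for example that the sector size is not below 512 or above 4096 and that the disk capacity does not exceed the maximum. alloc_disk then creates a gendisk structure, which is initialized as follows: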
gd->major = blktap_device_major;
gd->first_minor = minor;
gd->fops = &blktap_device_file_operations;
gd->private_data = tapdev;
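The disk_name branch in the listing maps a minor number to a name in the style of sd devices (tda ... tdz, tdaa ... tdzz, tdaaa ...). Here is a user-space sketch of the same arithmetic, using snprintf instead of sprintf; sketch_disk_name is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the "td" + letters name for a minor number, as in the listing above. */
static void sketch_disk_name(unsigned int minor, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (minor < 26) {
        /* One letter: tda .. tdz */
        snprintf(buf, len, "td%c", (int)('a' + minor % 26));
    } else if (minor < (26 + 1) * 26) {
        /* Two letters: tdaa .. tdzz */
        snprintf(buf, len, "td%c%c",
                 (int)('a' + minor / 26 - 1), (int)('a' + minor % 26));
    } else {
        /* Three letters: tdaaa and up */
        const unsigned int m1 = (minor / 26 - 1) / 26 - 1;
        const unsigned int m2 = (minor / 26 - 1) % 26;
        const unsigned int m3 = minor % 26;
        snprintf(buf, len, "td%c%c%c",
                 (int)('a' + m1), (int)('a' + m2), (int)('a' + m3));
    }
}
```

Minor 0 gives tda, minor 25 gives tdz, minor 26 gives tdaa, minor 701 gives tdzz, and minor 702 is the first three-letter name, tdaaa.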
Next, blk_init_queue is called to set up the request queue. The kernel source describes blk_init_queue as follows:
 * Description:
 *    If a block device wishes to use the standard request handling procedures,
 *    which sorts requests and coalesces adjacent requests, then it must
 *    call blk_init_queue().  The function @rfn will be called when there
 *    are requests on the queue that need to be processed.
elevator_init then initializes the I/O scheduler of the request_queue rq (here "noop").
add_disk(struct gendisk *) registers the struct gendisk with the kernel.
blktap_device_configure configures the tapdevXXX device. Its blktap_params argument comes from user space via copy_from_user:
set_capacity: sets the gendisk's disk size to the capacity passed in
blk_queue_logical_block_size: sets the logical block size to the sector_size passed in
blk_queue_max_sectors: max_sectors is at least 8 and at most 1024 sectors. Note that a sector here is the fixed 512-byte unit the block layer assumes
blk_queue_segment_boundary / blk_queue_max_segment_size: the size of each segment is 4 KB
blk_queue_max_phys_segments / blk_queue_max_hw_segments: each request on the request_queue has at most 11 segments; each segment is 4 KB, i.e. 8 sectors
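The limits above imply an upper bound on the size of a single request. The arithmetic can be written out as a small sketch; the constants restate the numbers from the list, and all SKETCH_ and sketch_ names are hypothetical.

```c
#include <assert.h>

#define SKETCH_SECTOR_SIZE   512u   /* fixed sector unit of the block layer */
#define SKETCH_SEGMENT_SIZE  4096u  /* one 4 KB segment */
#define SKETCH_MAX_SEGMENTS  11u    /* segments per request */
#define SKETCH_MIN_SECTORS   8u     /* lower bound on max_sectors */
#define SKETCH_MAX_SECTORS   1024u  /* upper bound on max_sectors */

/* Clamp a requested max_sectors value into the range described above. */
static unsigned int sketch_clamp_max_sectors(unsigned int requested)
{
    if (requested < SKETCH_MIN_SECTORS)
        return SKETCH_MIN_SECTORS;
    if (requested > SKETCH_MAX_SECTORS)
        return SKETCH_MAX_SECTORS;
    return requested;
}

/* Number of 512-byte sectors that fit in one segment. */
static unsigned int sketch_sectors_per_segment(void)
{
    assert(SKETCH_SEGMENT_SIZE % SKETCH_SECTOR_SIZE == 0);
    return SKETCH_SEGMENT_SIZE / SKETCH_SECTOR_SIZE;
}

/* Largest request the segment limits allow, in sectors. */
static unsigned int sketch_max_request_sectors(void)
{
    return SKETCH_MAX_SEGMENTS * sketch_sectors_per_segment();
}
```

With 11 segments of 8 sectors each, a request carries at most 88 sectors, which is 44 KB of data.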
blktap_device_destroy
When ioctl is called on a blktapXXX device with the command BLKTAP2_IOCTL_REMOVE_DEVICE, blktap_device_destroy runs.
blktap_device_destroy calls blk_cleanup_queue, a generic kernel function:
void blk_cleanup_queue(struct request_queue *q)
{
    /*
     * We know we have process context here, so we can be a little
     * cautious and ensure that pending block actions on this device
     * are done before moving on. Going into this function, we should
     * not have processes doing IO to this device.
     */
    blk_sync_queue(q);
    mutex_lock(&q->sysfs_lock);
    queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
    mutex_unlock(&q->sysfs_lock);
    if (q->elevator)
        elevator_exit(q->elevator);
    blk_put_queue(q);
}
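I/O requests on a request_queue are handled asynchronously, so when a tapdevXXX device is closed, the queue's outstanding asynchronous activity has to be cleaned up. This is done by blk_sync_queue.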
/**
 * blk_sync_queue - cancel any pending callbacks on a queue
 * @q: the queue
 *
 * Description:
 *     The block layer may perform asynchronous callback activity
 *     on a queue, such as calling the unplug function after a timeout.
 *     A block device may call blk_sync_queue to ensure that any
 *     such activity is cancelled, thus allowing it to release resources
 *     that the callbacks might use. The caller must already have made sure
 *     that its ->make_request_fn will not re-add plugging prior to calling
 *     this function.
 *
 */
void blk_sync_queue(struct request_queue *q)
{
    del_timer_sync(&q->unplug_timer);
    del_timer_sync(&q->timeout);
    cancel_work_sync(&q->unplug_work);
}
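Judging from the code, blk_sync_queue does not complete or discard the requests themselves. It cancels the callbacks registered earlier (the unplug timer, the timeout timer, and the unplug work), so none of them can fire while the queue is being torn down.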
blktap_device_fail_queue
This function iterates over the request_queue with __blktap_next_queued_rq and calls __blktap_end_queued_rq(rq, -EIO) on every request.
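The drain loop has a simple shape: take the next queued request, end it with -EIO, repeat until the queue is empty. Below is a user-space sketch with a singly linked list standing in for the request_queue; every sketch_ name and structure is hypothetical and unrelated to the kernel's real request type.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy request: status stays 0 while pending and holds a negative errno once ended. */
struct sketch_request {
    struct sketch_request *next;
    int status;
    int done;
};

/* Toy queue: a singly linked list of pending requests. */
struct sketch_queue {
    struct sketch_request *head;
};

/* Dequeue the next pending request, or return NULL if the queue is empty. */
static struct sketch_request *sketch_next_queued_rq(struct sketch_queue *q)
{
    struct sketch_request *rq = q->head;
    if (rq != NULL) {
        q->head = rq->next;
        rq->next = NULL;
    }
    return rq;
}

/* Complete a request with the given error code. */
static void sketch_end_queued_rq(struct sketch_request *rq, int err)
{
    rq->status = err;
    rq->done = 1;
}

/* Fail everything still queued with -EIO; returns how many requests were failed. */
static int sketch_fail_queue(struct sketch_queue *q)
{
    struct sketch_request *rq;
    int failed = 0;

    assert(q != NULL);
    while ((rq = sketch_next_queued_rq(q)) != NULL) {
        sketch_end_queued_rq(rq, -EIO);
        failed++;
    }
    return failed;
}
```

Ending each request with an error lets the submitter see the failure right away rather than wait on a device that is going away.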
Recall that the blktapXXX device provides the following operations:
static struct file_operations blktap_ring_file_operations = {
    .owner   = THIS_MODULE,
    .open    = blktap_ring_open,
    .release = blktap_ring_release,
    .ioctl   = blktap_ring_ioctl,
    .mmap    = blktap_ring_mmap,
    .poll    = blktap_ring_poll,
};
blktap_ring_poll
blktap_ring_poll calls blktap_device_run_queue, which is again a loop: it calls blktap_device_make_request for every request on the request_queue.
blktap_device_make_request first calls blktap_ring_make_request to build a blktap_request structure, then calls blktap_request_get_pages to allocate page frames for the blktap_request, and finally calls blktap_ring_submit_request.
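blktap_device_do_request is the function pointer passed to blk_init_queue when the tapdevXXX block device is initialized; see the kernel block-layer documentation for what this request function does. blktap_device_do_request calls blktap_ring_kick_user, which does a wake_up on blktap_ring->poll_wait. Recall blktap_ring_poll from above: it calls poll_wait(filp, &ring->poll_wait, wait), which registers the polling process on the poll_wait wait queue, where it sleeps until woken. So blktap_ring_kick_user can be seen as waking up the poller, which re-runs blktap_ring_poll and submits the requests on the request_queue.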
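Seen from user space, this kick/poll handshake is the usual poll(2) pattern: the consumer sleeps in poll() on a file descriptor until the other side makes it ready. The sketch below uses a pipe as a stand-in for the blktap ring device, purely to show the pattern; sketch_wait_readable and sketch_kick are hypothetical names, and nothing here talks to the real driver.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait until fd is readable. Returns 1 if ready, 0 on timeout, -1 on error. */
static int sketch_wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return -1;
    return (n > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Stand-in for the kick: make the read end readable so a poller wakes up. */
static int sketch_kick(int wfd)
{
    const char token = 'k';

    assert(wfd >= 0);
    return write(wfd, &token, 1) == 1 ? 0 : -1;
}
```

Before the kick a zero-timeout poll reports nothing ready; after the kick the same poll returns at once. In the same way, a process polling the ring device is woken only when the block layer has queued work.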
blktap_ring_submit_request puts the request on the I/O ring; the next step is presumably for tapdisk2 to process these I/O requests.
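To give a feel for "putting a request on the ring", here is a single-threaded sketch of a fixed-size producer/consumer ring. It is a generic illustration, not the Xen blkif ring layout: the slot contents, ring size, and all sketch_ names are assumptions, and the memory barriers a real shared ring needs are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_RING_SIZE 4u   /* power of two, so indices wrap with a mask */

struct sketch_ring {
    unsigned int req_prod;                  /* next slot the producer fills */
    unsigned int req_cons;                  /* next slot the consumer reads */
    unsigned long slots[SKETCH_RING_SIZE];  /* request ids */
};

/* Producer side: place a request id on the ring, or return -1 if it is full. */
static int sketch_ring_submit(struct sketch_ring *r, unsigned long id)
{
    assert(r != NULL);
    if (r->req_prod - r->req_cons == SKETCH_RING_SIZE)
        return -1;
    r->slots[r->req_prod & (SKETCH_RING_SIZE - 1)] = id;
    r->req_prod++;
    return 0;
}

/* Consumer side: take the oldest request id, or return -1 if the ring is empty. */
static int sketch_ring_consume(struct sketch_ring *r, unsigned long *id)
{
    assert(r != NULL && id != NULL);
    if (r->req_cons == r->req_prod)
        return -1;
    *id = r->slots[r->req_cons & (SKETCH_RING_SIZE - 1)];
    r->req_cons++;
    return 0;
}
```

The producer corresponds to the kernel side submitting requests and the consumer to the user-space process draining them; when the ring is full, the producer has to wait until the consumer frees a slot.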