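Part 1 of this chapter more or less cleared up how the in-memory block device driver we have already written ties into the generic block layer. Now it is time to do something of our own: kick out the IO scheduler and handle bios ourselves.
Kicking out the IO scheduler is easy: just stop using __make_request, the heavyweight function the system installs by default. How? As the blk_init_queue() function in part 1 showed, the system calls blk_queue_make_request(q, __make_request), so we can call the same function to install our own strategy function in place of __make_request. That also means we no longer need blk_init_queue() to initialize the request_queue.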
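To make the hook idea concrete, here is a small userspace sketch. It is not kernel code: toy_queue, toy_bio, toy_set_make_request, toy_submit and my_make_request are made-up names that only mimic how a queue stores a make_request function pointer and calls it for every bio it is given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bio and struct request_queue. */
struct toy_bio {
    unsigned long long sector; /* start sector of the request */
    unsigned int size;         /* request size in bytes */
};

struct toy_queue;
typedef int (toy_make_request_fn)(struct toy_queue *q, struct toy_bio *bio);

struct toy_queue {
    toy_make_request_fn *make_request_fn; /* per-queue bio handler */
    unsigned int handled;                 /* bios seen by our handler */
};

/* Mimics blk_queue_make_request(): install the handler on the queue. */
static void toy_set_make_request(struct toy_queue *q, toy_make_request_fn *mfn)
{
    assert(q != NULL && mfn != NULL);
    q->make_request_fn = mfn;
}

/* Mimics the block layer passing a bio to whichever handler is installed. */
static int toy_submit(struct toy_queue *q, struct toy_bio *bio)
{
    return q->make_request_fn(q, bio);
}

/* Our own handler: it receives the bio directly, with no merging or sorting. */
static int my_make_request(struct toy_queue *q, struct toy_bio *bio)
{
    (void)bio;
    q->handled++;
    return 0;
}
```

After toy_set_make_request(&q, my_make_request), every toy_submit(&q, &bio) lands in my_make_request. The real driver relies on the same indirection, only with the kernel's own types.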
Here is the modified source:
simp_blkdev.c:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/genhd.h>
#include <linux/fs.h>
#include <linux/blkdev.h>
#include <linux/bio.h>

#define SIMP_BLKDEV_DISKNAME "simp_blkdev"
#define SIMP_BLKDEV_DEVICEMAJOR COMPAQ_SMART2_MAJOR
#define SIMP_BLKDEV_BYTES (8*1024*1024)

static DEFINE_SPINLOCK(rq_lock);
unsigned char simp_blkdev_data[SIMP_BLKDEV_BYTES];
static struct gendisk *simp_blkdev_disk;
static struct request_queue *simp_blkdev_queue; /* device's request queue */

struct block_device_operations simp_blkdev_fops = {
    .owner = THIS_MODULE,
};

/* handle a bio directly, bypassing the IO scheduler */
static int simp_blkdev_make_request(struct request_queue *q, struct bio *bio)
{
    struct bio_vec *bvec;
    int i;
    void *dsk_mem;

    /* reject requests that run past the end of the device */
    if ((bio->bi_sector << 9) + bio->bi_size > SIMP_BLKDEV_BYTES) {
        printk(KERN_ERR SIMP_BLKDEV_DISKNAME ":bad request:block=%llu,count=%u\n",
               (unsigned long long)bio->bi_sector, bio->bi_size);
        bio_endio(bio, -EIO);
        return 0;
    }

    /* start address of the request inside the in-memory disk */
    dsk_mem = simp_blkdev_data + (bio->bi_sector << 9);

    bio_for_each_segment(bvec, bio, i) {
        void *iovec_mem;

        switch (bio_rw(bio)) {
        case READ:
        case READA:
            /* disk -> segment memory */
            iovec_mem = kmap(bvec->bv_page) + bvec->bv_offset;
            memcpy(iovec_mem, dsk_mem, bvec->bv_len);
            kunmap(bvec->bv_page);
            break;
        case WRITE:
            /* segment memory -> disk */
            iovec_mem = kmap(bvec->bv_page) + bvec->bv_offset;
            memcpy(dsk_mem, iovec_mem, bvec->bv_len);
            kunmap(bvec->bv_page);
            break;
        default:
            printk(KERN_ERR SIMP_BLKDEV_DISKNAME ": unknown value of bio_rw: %lu\n",
                   bio_rw(bio));
            bio_endio(bio, -EIO);
            return 0;
        }
        /* the next segment continues where this one ended on the disk */
        dsk_mem += bvec->bv_len;
    }

    bio_endio(bio, 0);
    return 0;
}

static int simp_blkdev_init(void)
{
    int ret;

    simp_blkdev_queue = blk_alloc_queue(GFP_KERNEL);
    if (!simp_blkdev_queue) {
        ret = -ENOMEM;
        goto error_alloc_queue;
    }
    blk_queue_make_request(simp_blkdev_queue, simp_blkdev_make_request);

    /* alloc the resource of gendisk */
    simp_blkdev_disk = alloc_disk(1);
    if (!simp_blkdev_disk) {
        ret = -ENOMEM;
        goto error_alloc_disk;
    }

    /* populate the gendisk structure */
    strcpy(simp_blkdev_disk->disk_name, SIMP_BLKDEV_DISKNAME);
    simp_blkdev_disk->major = SIMP_BLKDEV_DEVICEMAJOR;
    simp_blkdev_disk->first_minor = 0;
    simp_blkdev_disk->fops = &simp_blkdev_fops;
    simp_blkdev_disk->queue = simp_blkdev_queue;
    set_capacity(simp_blkdev_disk, SIMP_BLKDEV_BYTES >> 9);
    add_disk(simp_blkdev_disk);

    printk("module simp_blkdev added.\n");
    return 0;

error_alloc_disk:
    /* the queue was allocated before alloc_disk() failed, so release it */
    blk_cleanup_queue(simp_blkdev_queue);
error_alloc_queue:
    /* nothing was allocated yet, so there is nothing to clean up */
    return ret;
}

static void simp_blkdev_exit(void)
{
    del_gendisk(simp_blkdev_disk);
    put_disk(simp_blkdev_disk);
    blk_cleanup_queue(simp_blkdev_queue);
    printk("module simp_blkdev removed.\n");
}

module_init(simp_blkdev_init);
module_exit(simp_blkdev_exit);
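To bypass the IO scheduler and handle bios ourselves, we need to know the following key functions and data structures:
struct request_queue *blk_alloc_queue(gfp_t gfp_mask) // allocates a request_queue and initializes its basic members, such as list heads and locks.
void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn) // the comment in the kernel source explains this function clearly: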
/**
* blk_queue_make_request - define an alternate make_request function for a device
* @q: the request queue for the device to be affected
* @mfn: the alternate make_request function
*
* Description:
* The normal way for &struct bios to be passed to a device
* driver is for them to be collected into requests on a request
* queue, and then to allow the device driver to select requests
* off that queue when it is ready. This works well for many block
* devices. However some block devices (typically virtual devices
* such as md or lvm) do not benefit from the processing on the
* request queue, and are served best by having the requests passed
* directly to them. This can be achieved by providing a function
* to blk_queue_make_request().
*
* Caveat:
* The driver that does this *must* be able to deal appropriately
* with buffers in "highmemory". This can be accomplished by either calling
* __bio_kmap_atomic() to get a temporary kernel mapping, or by calling
* blk_queue_bounce() to create a buffer in normal memory.
**/
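That makes it clear: our block device driver is also a virtual block device, so it gains nothing from IO scheduling and is better served by handling bios directly. The second argument of this function is the bio-handling function we have to write.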
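int (your_make_request)(struct request_queue *q, struct bio *bio) // the main function we have to write; its job is to process the bio. Look up the layout of struct bio yourself. Here we only point out that a bio describes a request for one contiguous range on the block device, and the bio_vecs it contains describe each piece of memory that the request maps to. In essence, the function is a loop that handles each bio_vec of the bio in turn.
bio_for_each_segment(bvl, bio, i) // a macro that makes it easy to iterate over the bio_vecs of a bio.
bio->bi_sector // the starting sector of the bio's request on the block device.
bio->bi_size // the size of the bio's request in bytes (not in sectors).
void bio_endio(struct bio *bio, int error) // completes the bio request.
void *kmap(struct page *page) // returns the virtual address of the page. If the page is in high memory, it is first mapped into the kernel's non-linear mapping area and that address is returned.
void kunmap(struct page *page) // gives the non-linear mapping created by kmap() back to the system.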
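The units of bi_sector and bi_size are easy to mix up, so here is the range check from the top of simp_blkdev_make_request redone as a standalone userspace sketch. The driver adds bi_size directly to a byte offset, so it is treated as a byte count. The names toy_sector_to_bytes and toy_request_in_range are made up for this illustration, and the 8 MiB size is simply copied from SIMP_BLKDEV_BYTES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SECTOR_SHIFT 9                 /* 512-byte sectors, as in the driver */
#define TOY_DISK_BYTES (8u * 1024 * 1024)  /* same size as SIMP_BLKDEV_BYTES */

/* Byte offset on the device for a given start sector. */
static uint64_t toy_sector_to_bytes(uint64_t sector)
{
    return sector << TOY_SECTOR_SHIFT;
}

/* True if [sector, sector + size_bytes) lies entirely inside the disk. */
static bool toy_request_in_range(uint64_t sector, uint32_t size_bytes)
{
    uint64_t disk_sectors = TOY_DISK_BYTES >> TOY_SECTOR_SHIFT;
    uint64_t offset;

    if (sector > disk_sectors)
        return false;              /* also keeps the shift below from overflowing */
    offset = toy_sector_to_bytes(sector);
    assert(offset <= TOY_DISK_BYTES);
    return size_bytes <= TOY_DISK_BYTES - offset;
}
```

With these numbers the disk has 16384 sectors. A 512-byte request at sector 16383 just fits, while a 1024-byte request at the same sector runs past the end and is rejected.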
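With the above in hand, simp_blkdev_make_request is easy to read. Overall, the bio_for_each_segment loop handles each bio_vec of the bio according to whether the request is a read or a write. For each bio_vec, the idea is to work out the memory address the bio_vec describes and the matching address dsk_mem inside our block device, then memcpy between the two. The only details are the two address calculations.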
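The same loop can be tried out in userspace. The sketch below is only a model of the idea, not the driver: toy_vec stands in for struct bio_vec, a plain pointer replaces the kmap(bv_page) + bv_offset calculation, and toy_disk plays the role of simp_blkdev_data. All of these names, and the 4096-byte size, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_RAMDISK_BYTES 4096u

enum toy_dir { TOY_READ, TOY_WRITE };

/* Hypothetical stand-in for struct bio_vec: one chunk of caller memory. */
struct toy_vec {
    unsigned char *mem;  /* plays the role of kmap(bv_page) + bv_offset */
    size_t len;          /* plays the role of bv_len */
};

static unsigned char toy_disk[TOY_RAMDISK_BYTES]; /* the in-memory "disk" */

/*
 * Copy every segment to or from the disk buffer, starting at byte offset
 * `start`. Returns 0 on success, -1 if the request would run past the end.
 */
static int toy_handle_bio(enum toy_dir dir, size_t start,
                          struct toy_vec *vecs, size_t nvecs)
{
    size_t total = 0, i;
    unsigned char *dsk_mem;

    for (i = 0; i < nvecs; i++)
        total += vecs[i].len;
    if (start > TOY_RAMDISK_BYTES || total > TOY_RAMDISK_BYTES - start)
        return -1;

    dsk_mem = toy_disk + start;
    for (i = 0; i < nvecs; i++) {
        assert(vecs[i].mem != NULL);
        if (dir == TOY_READ)
            memcpy(vecs[i].mem, dsk_mem, vecs[i].len); /* disk -> caller */
        else
            memcpy(dsk_mem, vecs[i].mem, vecs[i].len); /* caller -> disk */
        dsk_mem += vecs[i].len; /* segments are contiguous on the device */
    }
    return 0;
}
```

Writing the segments "abc" and "defgh" at offset 512 and reading them back as two 4-byte segments yields "abcd" and "efgh": the memory side may be scattered, but the disk side is one contiguous range.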
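Good. Let's try out the new block device driver:
Initialize the block device:
Mount it:
Read and write:
Because the IO scheduler has been kicked out, the queue directory that used to be in sysfs is gone:
And with that, an in-memory block device driver without an IO scheduler is finally finished. We can breathe a big sigh of relief: at last we have done something of our own.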