Linux block devices, part 2

kernel 3.1.5

block/blk-core.c: manages queues and requests.


1. The EXPORT_SYMBOL macro explained:

#define __EXPORT_SYMBOL(sym, sec)                               \
        extern typeof(sym) sym;                                 \
        __CRC_SYMBOL(sym, sec)                                  \
        static const char __kstrtab_##sym[]                     \
        __attribute__((section("__ksymtab_strings"), aligned(1))) \
        = MODULE_SYMBOL_PREFIX #sym;                            \
        static const struct kernel_symbol __ksymtab_##sym       \
        __used                                                  \
        __attribute__((section("___ksymtab" sec "+" #sym), unused))     \
        = { (unsigned long)&sym, __kstrtab_##sym }


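The name of sym is first stored as a string in __kstrtab_##sym, which lives in the __ksymtab_strings section. Then the address of sym and __kstrtab_##sym are stored in a struct kernel_symbol named __ksymtab_##sym, which lives in the section "___ksymtab" sec "+" #sym.

__used: tells the compiler to emit the decorated object whether or not it finds any reference to it.
unused: applies to functions and variables; it marks them as possibly unused, which keeps the compiler from issuing a warning.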
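As a rough userspace illustration of what the macro generates per symbol, here is a minimal sketch. EXPORT_SYMBOL_SKETCH, struct kernel_symbol_sketch and blk_demo_fn are hypothetical names; the CRC part and the dedicated ELF sections of the real macro are left out, and the address cast in a static initializer relies on GCC/Clang behaviour, as the kernel itself does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct kernel_symbol. */
struct kernel_symbol_sketch {
	uintptr_t value;	/* address of the exported symbol */
	const char *name;	/* points at the name string */
};

/*
 * Simplified export macro: one string holding the symbol name and one
 * struct pairing the symbol address with that string. The real macro
 * additionally places both objects in dedicated sections and emits a CRC.
 */
#define EXPORT_SYMBOL_SKETCH(sym)					\
	static const char __kstrtab_##sym[] = #sym;			\
	static const struct kernel_symbol_sketch __ksymtab_##sym	\
	__attribute__((used))						\
	= { (uintptr_t)&sym, __kstrtab_##sym }

/* Example function to "export" (hypothetical). */
int blk_demo_fn(int x)
{
	return x + 1;
}
EXPORT_SYMBOL_SKETCH(blk_demo_fn);
```

After expansion, __ksymtab_blk_demo_fn.name points at the string "blk_demo_fn" and __ksymtab_blk_demo_fn.value holds the function's address, which is the pairing a symbol lookup needs.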


2. drive_stat_acct: accounts the per-partition I/O statistics.


3. blk_queue_congestion_threshold: computes the queue's congestion thresholds nr_congestion_on and nr_congestion_off.
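The threshold computation can be sketched in plain C as follows. The formulas are written from memory of the 3.1-era blk-core.c and should be checked against the source; struct congestion_sketch is a hypothetical stand-in for the relevant fields of struct request_queue.

```c
/* Hypothetical subset of struct request_queue. */
struct congestion_sketch {
	int nr_requests;	/* nominal queue depth */
	int nr_congestion_on;	/* mark congested at or above this */
	int nr_congestion_off;	/* clear congested below this */
};

void congestion_threshold_sketch(struct congestion_sketch *q)
{
	int nr;

	/* "on" threshold: roughly 7/8 of the queue depth, capped at the depth */
	nr = q->nr_requests - (q->nr_requests / 8) + 1;
	if (nr > q->nr_requests)
		nr = q->nr_requests;
	q->nr_congestion_on = nr;

	/* "off" threshold: somewhat lower, which gives hysteresis */
	nr = q->nr_requests - (q->nr_requests / 8) - (q->nr_requests / 16) - 1;
	if (nr < 1)
		nr = 1;
	q->nr_congestion_off = nr;
}
```

With nr_requests = 128 this yields 113 for the on threshold and 103 for the off threshold; the gap keeps the congested flag from flapping when the count hovers around one value.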


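4. blk_get_backing_dev_info: returns the backing_dev_info of the request_queue that belongs to a block_device.

5. blk_rq_init: initializes a request, filling in part of it from the request_queue.


6. req_bio_endio: per-bio completion handling for a request. It subtracts the completed bytes from bio->bi_size; if bio->bi_size reaches zero and the request is not part of a flush sequence (REQ_FLUSH_SEQ), it calls bio_endio.

bio_endio calls bio->bi_end_io, which performs the actual end-of-I/O work (it differs for disks, filesystems, flush, and so on).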
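The completion logic described above can be sketched in userspace like this. All names ending in _sketch or _SKETCH are hypothetical stand-ins, the flag value is arbitrary, and error and integrity handling from the real function is omitted.

```c
/* Hypothetical, trimmed-down bio. */
struct endio_bio_sketch {
	unsigned int bi_size;		/* bytes still outstanding */
	unsigned long bi_sector;	/* current 512-byte sector */
	void (*bi_end_io)(struct endio_bio_sketch *bio, int error);
	int done;			/* set by the example callback */
};

#define REQ_FLUSH_SEQ_SKETCH 0x1	/* stand-in flag value */

/* Like bio_endio: hand the bio to its owner's completion callback. */
static void bio_endio_sketch(struct endio_bio_sketch *bio, int error)
{
	if (bio->bi_end_io)
		bio->bi_end_io(bio, error);
}

void req_bio_endio_sketch(unsigned int rq_flags, struct endio_bio_sketch *bio,
			  unsigned int nbytes, int error)
{
	if (nbytes > bio->bi_size)	/* never complete more than is left */
		nbytes = bio->bi_size;

	bio->bi_size -= nbytes;
	bio->bi_sector += nbytes >> 9;

	/* a bio that belongs to a flush sequence is finished elsewhere */
	if (bio->bi_size == 0 && !(rq_flags & REQ_FLUSH_SEQ_SKETCH))
		bio_endio_sketch(bio, error);
}

/* Example bi_end_io callback (hypothetical). */
void mark_done_sketch(struct endio_bio_sketch *bio, int error)
{
	bio->done = error ? -1 : 1;
}
```

A bio therefore only reaches its callback once every byte has been accounted for, which is what allows a request to be completed piecewise.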


7. blk_delay_work: gets the request_queue from the struct work_struct and calls __blk_run_queue() to dispatch requests; the actual work is done by q->request_fn. request_fn is set when the queue is allocated and initialized (see item 19).


8. blk_delay_queue: queues the queue's delayed work on the work queue, so that the queue is run after a delay.


9. blk_start_queue: restarts a previously stopped queue. It clears the stopped flag and runs the single device queue, which ends up calling q->request_fn(q).

10. blk_stop_queue: cancels the delayed work and sets the stopped flag.


11. blk_sync_queue: deletes the queue's timeout timer (del_timer_sync), then cancels the delayed work and waits for it to finish (cancel_delayed_work_sync).

12. __blk_run_queue: runs a single device queue.

13. blk_run_queue_async: if the queue is not stopped, cancels the pending delayed work and queues it again with a delay of 0.

14. blk_run_queue: unlike __blk_run_queue, this function takes the request queue's queue_lock itself.

15. blk_put_queue: drops a reference on the queue's kobj.

16. blk_cleanup_queue: first calls blk_sync_queue, then del_timer_sync to delete q->backing_dev_info.laptop_mode_wb_timer, then sets the queue flag QUEUE_FLAG_DEAD, and finally drops a reference on the queue's kobj.

17. blk_init_free_list: initializes the queue's request_list and creates its rq_pool mempool.

18. blk_alloc_queue: calls blk_alloc_queue_node.

blk_alloc_queue_node:

Allocates the queue with kmem_cache_alloc_node; node is the memory node, which matters on NUMA systems.

Initializes the queue's backing_dev_info and calls blk_throtl_init to initialize the queue's throttling state.

Sets up the q->backing_dev_info.laptop_mode_wb_timer timer with laptop_mode_timer_fn as its callback (if the backing_dev_info is dirty, it is written back).

Sets up the q->timeout timer (the queue timer) with blk_rq_timed_out_timer as its callback. That function walks the requests on the queue's timeout list; for a request that has not completed in time it calls blk_rq_timed_out, which, depending on the return value of the queue's rq_timed_out_fn, either calls __blk_complete_request or re-arms the timer.

Initializes q->timeout_list, q->flush_queue[0], q->flush_queue[1] and flush_data_in_flight.

Initializes the delayed work with blk_delay_work as its function.

Initializes the queue's kobj with type blk_queue_ktype:

struct kobj_type blk_queue_ktype = {
        .sysfs_ops      = &queue_sysfs_ops,
        .default_attrs  = default_attrs,
        .release        = blk_release_queue,
};

Initializes sysfs_lock.

Initializes __queue_lock.


19. blk_init_queue: see the comment in the source:

 * blk_init_queue  - prepare a request queue for use with a block device
 * @rfn:  The function to be called to process requests that have been
 *        placed on the queue.
 * @lock: Request queue spin lock
 *
 * Description:
 *    If a block device wishes to use the standard request handling procedures,
 *    which sorts requests and coalesces adjacent requests, then it must
 *    call blk_init_queue().  The function @rfn will be called when there
 *    are requests on the queue that need to be processed.  If the device
 *    supports plugging, then @rfn may not be called immediately when requests
 *    are available on the queue, but may be called at some time later instead.
 *    Plugged queues are generally unplugged when a buffer belonging to one
 *    of the requests on the queue is needed, or due to memory pressure.
 *
 *    @rfn is not required, or even expected, to remove all requests off the
 *    queue, but only as many as it can handle at a time.  If it does leave
 *    requests on the queue, it is responsible for arranging that the requests
 *    get dealt with eventually.
 *
 *    The queue spin lock must be held while manipulating the requests on the
 *    request queue; this lock will be taken also from interrupt context, so irq
 *    disabling is needed for it.
 *
 *    Function returns a pointer to the initialized request queue, or %NULL if
 *    it didn't succeed.
 *
 * Note:
 *    blk_init_queue() must be paired with a blk_cleanup_queue() call
 *    when the block device is deactivated (such as at module unload).

Calls blk_init_queue_node to initialize the queue.


blk_init_queue_node:

First calls blk_alloc_queue_node (described above).

Then calls blk_init_allocated_queue_node.

blk_init_allocated_queue_node:

Sets the request_fn function.

Calls blk_queue_make_request to install __make_request as the queue's make_request function.


20.

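blk_get_queue: takes a reference on the queue's kobj.


21.

blk_free_request: if the request's cmd_flags contain REQ_ELVPRIV, calls elv_put_request, which in turn calls elevator_put_req_fn.

Calls mempool_free to return the request to rq_pool.


22. blk_alloc_request:

Calls mempool_alloc to allocate a request.

blk_rq_init: initializes the request.

If priv != 0, sets REQ_ELVPRIV in cmd_flags (see item 21).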


23:

ioc_batching: returns true if the ioc is a valid batching request.

        return ioc->nr_batch_requests == q->nr_batching ||
                (ioc->nr_batch_requests > 0
                && time_before(jiffies, ioc->last_waited + BLK_BATCH_TIME));


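time_before(a, b): returns true if time a is earlier than time b; the comparison is safe against jiffies wraparound.


24. ioc_set_batching: if the ioc is already a valid batcher, returns; otherwise sets: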

        ioc->nr_batch_requests = q->nr_batching;
        ioc->last_waited = jiffies;

After this the ioc is a valid, fresh batcher.
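Items 23 and 24 can be tried out in userspace with the following sketch. The _sketch names are hypothetical, the current time is passed in as a parameter instead of reading jiffies, and HZ_SKETCH = 1000 is an assumed tick rate (the batch window is HZ/50 ticks).

```c
#include <stdbool.h>

#define HZ_SKETCH 1000				/* assumed tick rate */
#define BLK_BATCH_TIME_SKETCH (HZ_SKETCH / 50)	/* batch window in ticks */

/* Wraparound-safe "a is earlier than b", in the spirit of time_before(). */
static bool time_before_sketch(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical subset of struct io_context. */
struct ioc_sketch {
	int nr_batch_requests;		/* requests left in the batch */
	unsigned long last_waited;	/* tick at which batching started */
};

bool ioc_batching_sketch(const struct ioc_sketch *ioc, int nr_batching,
			 unsigned long now)
{
	/* a full batch always counts; a partial one only inside the window */
	return ioc->nr_batch_requests == nr_batching ||
	       (ioc->nr_batch_requests > 0 &&
		time_before_sketch(now, ioc->last_waited + BLK_BATCH_TIME_SKETCH));
}

void ioc_set_batching_sketch(struct ioc_sketch *ioc, int nr_batching,
			     unsigned long now)
{
	if (ioc_batching_sketch(ioc, nr_batching, now))
		return;			/* already a valid batcher */

	ioc->nr_batch_requests = nr_batching;
	ioc->last_waited = now;
}
```

The subtraction-then-sign test in time_before_sketch is what keeps the comparison correct when the tick counter wraps around.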


25. __freed_request:

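If rl->count[sync] is below the queue's congestion-off threshold, the congested flag is cleared.

Then, if rl->count[sync] + 1 <= q->nr_requests, wakes up the rl wait queue and clears QUEUE_FLAG_SYNCFULL/QUEUE_FLAG_ASYNCFULL in queue_flags.


26. freed_request: called after a request has been released (from get_request and __blk_put_request); updates the full/congestion state and wakes up waiters.

rl->count[sync]--

Calls __freed_request to update the state and wake up waiters.

If rl->starved[sync ^ 1] is set, calls __freed_request for the other direction as well.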


27:

blk_rq_should_init_elevator:

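If bio->bi_rw has neither REQ_FLUSH nor REQ_FUA set, the elevator needs to be initialized for the request.


28: get_request: gets a free request; called by get_request_wait and blk_get_request.

request_list: the list of free requests. It keeps separate counters for sync and async requests; since reads count as sync (see rw_is_sync below), this is roughly one set for reads and one for writes, which explains the sync index.

rw_is_sync: returns true if the request is not a write (REQ_WRITE is clear) or is synchronous (REQ_SYNC is set).

If may_queue is ELV_MQUEUE_NO, the request is refused as starved: if rl->count[is_sync] == 0, rl->starved[is_sync] is set to 1.

If rl->count[is_sync]+1 >= q->nr_requests: when the full flag is not yet set, call ioc_set_batching and mark the queue full; otherwise, if the caller is not a batcher, return a NULL request.


If rl->count[is_sync]+1 >= queue_congestion_on_threshold(q), set the queue's congested flag.


If rl->count[is_sync] >= (3 * q->nr_requests / 2), that is, the number of allocated requests has reached 1.5 times the limit, also return a NULL request. This formula shows that the number of requests can exceed the nominal maximum nr_requests.


Increment the request list's count.

Now that a request is being handed out, the list is no longer starved, so set rl->starved[is_sync] = 0.

For a flush request the elevator is not initialized; otherwise rl->elvpriv++.


Why is queue_lock unlocked first? Because get_request is called with queue_lock held (queue_lock must be held; the checks and updates above need its protection), and it has to be dropped before blk_alloc_request, whose mempool_alloc may sleep. The lock is taken in blk_get_request.

If blk_alloc_request succeeds, queue_lock stays unlocked. If the caller is still within BLK_BATCH_TIME of its batch or has just called ioc_set_batching, ioc->nr_batch_requests is decremented to account for one request of the batch. Then trace_block_getrq is called to record the bio in the trace.

If the allocation fails, queue_lock is retaken, the accounting above is undone, the queue's waiters are woken up if possible, the starved flag is set if rl->count[is_sync] == 0, and a NULL request is returned.

blk_alloc_request: see item 22; gets a request from the mempool and initializes it.

Later, when the request is inserted into the request list, queue_lock must be taken first.


get_request_wait: when get_request fails, puts itself on the wait queue and tries get_request again after being woken up; at wake-up time at least one request should be available.

blk_get_request: called by many drivers. It takes queue_lock and then calls either get_request_wait or get_request, depending on whether gfp_mask allows waiting. If get_request fails, it unlocks queue_lock.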
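The admission checks of get_request (before the actual allocation) can be condensed into the sketch below. It is a simplified userspace model written from memory of the 3.1 source: the elevator's may_queue verdict, locking and the allocation itself are left out, and struct rl_sketch and get_request_admit_sketch are hypothetical names.

```c
/* Hypothetical model of one direction (sync or async) of the request list. */
struct rl_sketch {
	int nr_requests;	/* q->nr_requests */
	int congestion_on;	/* queue_congestion_on_threshold(q) */
	int count;		/* rl->count[is_sync] */
	int starved;		/* rl->starved[is_sync] */
	int full;		/* queue full flag for this direction */
	int congested;		/* queue congested flag for this direction */
};

/* Returns 1 if the caller may allocate a request, 0 if it is refused. */
int get_request_admit_sketch(struct rl_sketch *rl, int *is_batcher)
{
	if (rl->count + 1 >= rl->congestion_on) {
		if (rl->count + 1 >= rl->nr_requests) {
			if (!rl->full) {
				/* first task to fill the queue becomes a batcher */
				*is_batcher = 1;
				rl->full = 1;
			} else if (!*is_batcher) {
				return 0;	/* full, and caller is no batcher */
			}
		}
		rl->congested = 1;
	}

	/* even batchers may only go 50% over the nominal limit */
	if (rl->count >= 3 * rl->nr_requests / 2)
		return 0;

	rl->count++;
	rl->starved = 0;
	return 1;
}
```

With nr_requests = 4, a batcher can push count up to 6 (3 * 4 / 2), while a non-batching task is refused as soon as the queue has been marked full.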


29:

blk_make_request:

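First gets a request, then for each bio creates a bounce buffer if needed and attaches the bio to the request. The request is returned to the caller, which then puts it on the request queue (where the elevator algorithm applies).

bounce buffer:

The Chinese edition of LDD (p. 439, DMA mapping) says that a bounce buffer is created when a driver attempts DMA on an address the peripheral device cannot reach.

It acts as an intermediate buffer. For example, some old devices can only address memory below 16 MB; when the DMA target lies above 16 MB, a buffer is set up inside the 16 MB range the device can reach and the data is copied through it.
On platforms with an IOMMU, addresses can be remapped for the device, so this problem largely goes away.


blk_rq_append_bio:

Adds a bio to a request.

If rq->bio is NULL, calls blk_rq_bio_prep to set up the request from the bio.

Otherwise calls ll_back_merge_fn to check that the bio can be merged at the back of the request's bios.

If the merge is allowed, updates the request's biotail and data length.

The following fragment means:

                rq->biotail->bi_next = bio;  /* the old biotail's next now points at the new bio */
                rq->biotail = bio;           /* the new bio becomes biotail */
In addition, if a bio's page is in high memory (above the queue's bounce pfn), that bvec is not treated as part of the current segment and a bounce buffer is used; see this part of __blk_recalc_rq_segments: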

                        high = page_to_pfn(bv->bv_page) > queue_bounce_pfn(q);
                        if (high || highprv)
                                goto new_segment;
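The biotail bookkeeping in blk_rq_append_bio is a plain tail append on a singly linked list. A minimal userspace sketch, with hypothetical _sketch types and without the ll_back_merge_fn check:

```c
#include <stddef.h>

/* Hypothetical, trimmed-down bio and request. */
struct chain_bio_sketch {
	struct chain_bio_sketch *bi_next;
	unsigned int bi_size;
};

struct request_sketch {
	struct chain_bio_sketch *bio;		/* first bio of the request */
	struct chain_bio_sketch *biotail;	/* last bio of the request */
	unsigned int data_len;			/* total bytes in the request */
};

void rq_append_bio_sketch(struct request_sketch *rq, struct chain_bio_sketch *bio)
{
	bio->bi_next = NULL;

	if (!rq->bio) {
		/* first bio: it is both head and tail */
		rq->bio = rq->biotail = bio;
		rq->data_len = bio->bi_size;
	} else {
		rq->biotail->bi_next = bio;	/* link after the old tail */
		rq->biotail = bio;		/* new bio becomes the tail */
		rq->data_len += bio->bi_size;
	}
}
```

Keeping a tail pointer makes each append O(1), so a request can collect many bios without walking the chain.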

30:

blk_insert_request: calls add_acct_request to add the request to the queue, then __blk_run_queue to start processing it.


31.

part_round_stats: updates the I/O performance statistics.


32:

blk_put_request: releases a request.


33. blk_add_request_payload: adds a payload to a request. From the source comment: "this is a quite horrible hack and nothing but handling of discard requests should ever use it."


34:

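__make_request: called from generic_make_request.

First creates a bounce buffer for the bio (two cases, DMA and ISA; otherwise none is created).

Calls attempt_plug_merge to try to merge the bio into a request on the task's plug list that belongs to the same queue as the one passed in; on success, returns.

Otherwise merges into the queue passed in:

1. If bio->bi_rw & (REQ_FLUSH | REQ_FUA) is false,

calls elv_merge to find a candidate request,

then calls bio_attempt_back_merge or bio_attempt_front_merge to merge the bio into that request.

2. Otherwise (and also when no merge is possible),

calls get_request_wait to obtain a request,

calls init_request_from_bio to initialize the request.

If current->plug exists, adds the request to the tail of the plug list; otherwise adds the request to the queue and calls __blk_run_queue to start it.


bio_attempt_back_merge: merges the bio at the request's biotail.

bio_attempt_front_merge: merges the bio at the front, as the request's first bio.


35:

blk_partition_remap: if the bio targets a partition, remaps it to the whole device, which in practice means adding the partition's start_sect to the bio's sector.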
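The remapping itself is a single addition. A small sketch with hypothetical types (the on_partition field stands in for the bdev != bdev->bd_contains test in the kernel):

```c
/* Hypothetical stand-in for the partition descriptor (struct hd_struct). */
struct part_sketch {
	unsigned long long start_sect;	/* first sector of the partition on the disk */
};

/* Hypothetical, trimmed-down bio. */
struct remap_bio_sketch {
	unsigned long long bi_sector;	/* target sector */
	unsigned int nr_sectors;	/* length in sectors */
	int on_partition;		/* 1 while bi_sector is partition-relative */
};

void partition_remap_sketch(struct remap_bio_sketch *bio, const struct part_sketch *p)
{
	if (bio->nr_sectors && bio->on_partition) {
		bio->bi_sector += p->start_sect;
		bio->on_partition = 0;	/* now relative to the whole disk */
	}
}
```

For a partition that starts at sector 2048, a bio for partition sector 100 ends up at disk sector 2148; a bio already addressed to the whole disk is left alone.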


36:

handle_bad_sector: sets BIO_EOF in bio->bi_flags.


37:

setup_fail_make_request/should_fail_request/fail_make_request_debugfs:

Fault-injection helpers for make_request, used to simulate request failures.


38:

bio_check_eod: checks whether the bio extends beyond the end of the device.
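The end-of-device test is worth spelling out because the obvious form, sector + nr_sectors > maxsector, can overflow. The sketch below follows the overflow-safe comparison as recalled from the 3.1 source; sector_sketch_t and beyond_eod_sketch are hypothetical names, and the device size is passed in rather than read from the inode.

```c
#include <stdbool.h>

typedef unsigned long long sector_sketch_t;	/* stand-in for sector_t */

/* Returns true if [sector, sector + nr_sectors) runs past maxsector. */
bool beyond_eod_sketch(sector_sketch_t sector, unsigned int nr_sectors,
		       sector_sketch_t maxsector)
{
	if (!nr_sectors)
		return false;	/* empty bios are never out of range */
	if (!maxsector)
		return false;	/* device size unknown: do not reject */

	/* written so that no addition can wrap around */
	return maxsector < nr_sectors || maxsector - nr_sectors < sector;
}
```

On a 100-sector device, an 8-sector bio at sector 92 still fits, one at sector 93 does not, and a bio at a huge sector number is rejected instead of wrapping around to a small value.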



39:

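generic_make_request: submits a bio.

If current->bio_list is non-NULL (a submission is already active on this task), appends the new bio to the tail of current->bio_list and returns.

Otherwise, sets current->bio_list = &bio_list_on_stack

and calls __generic_make_request to submit the bio (which calls the queue's make_request_fn), repeating for every bio that was queued on the list in the meantime.


submit_bio does the same as generic_make_request, plus count_vm_events accounting.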
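The bio_list trick turns recursive submissions from stacked drivers into a loop. The following userspace sketch models it; every _sketch name is hypothetical, current_bio_list stands in for current->bio_list, and the counters exist only to make the effect observable.

```c
#include <stddef.h>

struct stack_bio_sketch {
	struct stack_bio_sketch *bi_next;
	int remaps_left;	/* how many stacked layers still resubmit it */
};

struct bio_list_sketch {
	struct stack_bio_sketch *head, *tail;
};

/* Stand-in for current->bio_list; NULL means no submission is active. */
static struct bio_list_sketch *current_bio_list;

/* Instrumentation for the example only. */
int handled_count;
int nesting, max_nesting;

static void bio_list_add_sketch(struct bio_list_sketch *bl, struct stack_bio_sketch *bio)
{
	bio->bi_next = NULL;
	if (bl->tail)
		bl->tail->bi_next = bio;
	else
		bl->head = bio;
	bl->tail = bio;
}

static struct stack_bio_sketch *bio_list_pop_sketch(struct bio_list_sketch *bl)
{
	struct stack_bio_sketch *bio = bl->head;

	if (bio) {
		bl->head = bio->bi_next;
		if (!bl->head)
			bl->tail = NULL;
		bio->bi_next = NULL;
	}
	return bio;
}

void generic_make_request_sketch(struct stack_bio_sketch *bio);

/* Stand-in for the make_request_fn of a stacked driver. */
static void make_request_fn_sketch(struct stack_bio_sketch *bio)
{
	nesting++;
	if (nesting > max_nesting)
		max_nesting = nesting;
	handled_count++;

	if (bio->remaps_left > 0) {
		bio->remaps_left--;
		generic_make_request_sketch(bio);	/* resubmit to the layer below */
	}
	nesting--;
}

void generic_make_request_sketch(struct stack_bio_sketch *bio)
{
	struct bio_list_sketch bio_list_on_stack;

	if (current_bio_list) {
		/* a submission is already active: queue the bio and return */
		bio_list_add_sketch(current_bio_list, bio);
		return;
	}

	bio_list_on_stack.head = bio_list_on_stack.tail = NULL;
	current_bio_list = &bio_list_on_stack;
	do {
		make_request_fn_sketch(bio);
		bio = bio_list_pop_sketch(current_bio_list);
	} while (bio);
	current_bio_list = NULL;	/* deactivate */
}
```

A bio that passes through three stacked layers is handled four times, yet the handler never nests more than one level deep, which bounds stack usage regardless of how many layers are stacked.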



40:

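blk_rq_check_limits: checks whether the request's sectors, bytes, or segments exceed the queue's limits.


41:

blk_insert_cloned_request: inserts a cloned request (judging by the function's parameters a different queue could be chosen, although that does not seem to be done?!).

Looking at dispatch_queued_ios: it first removes the request from the queuelist, then calls blk_insert_cloned_request with the queue the request already points to, and blk_insert_cloned_request later sets the request's q to that same queue. So the queue does not actually change; what changes is that the request is inserted at the tail of the queue (a flush may be handled differently).


42:

blk_rq_err_bytes: returns the number of bytes from the request's current position up to the next failure boundary.


43:

blk_account_io_completion: accounting on completion; updates the transferred-sector statistics.

blk_account_io_done: accounts the duration and related statistics.


44:

blk_peek_request: picks the next request according to the elevator algorithm and starts it if necessary.


45:

blk_dequeue_request: removes a request from the queue.


46:

blk_start_request: dequeues a request (item 45) and starts its timer with blk_add_timer (which adds the request to the queue's timeout list).



47:

blk_fetch_request: calls blk_peek_request to get the top request and starts it with blk_start_request.


48:

blk_update_bidi_request: calls blk_update_request to update the request's data_len, segments, and so on. If the request, or for a bidi request its next_rq (updated with its own byte count), still has bytes pending, returns true; otherwise adds disk randomness and returns false.


49:

blk_unprep_request: called once error handling is complete; makes a request ready again by releasing the resources it holds, such as buffers.


50:

blk_finish_request: blk_delete_timer, blk_unprep_request, end_io, __blk_put_request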



51: blk_end_bidi_request: blk_update_bidi_request, blk_finish_request

__blk_end_bidi_request,

blk_end_request

blk_end_request_all

blk_end_request_cur

blk_end_request_err

__blk_end_request

__blk_end_request_all

__blk_end_request_cur

__blk_end_request_err



52:

blk_rq_bio_prep: sets up the request from a bio.


53:

rq_flush_dcache_pages: flushes the dcache for the request's pages.


54:

blk_rq_unprep_clone: removes the bios of a cloned request.

__blk_rq_prep_clone: copies the src request into the dest request.

blk_rq_prep_clone: clones the src request's bios into the dest request and calls __blk_rq_prep_clone.


55: scheduling queue work

kblockd_schedule_work

kblockd_schedule_delayed_work


56:

blk_start_plug: sets up a plug.


plug_rq_cmp: compares two requests by their queue, so requests for the same queue can be grouped.



57:

queue_unplugged: runs the queue.

flush_plug_callbacks: runs the callback functions on the plug list.


blk_flush_plug_list: flush_plug_callbacks, queue_unplugged

blk_finish_plug: calls blk_flush_plug_list and sets current->plug to NULL.


58:

blk_dev_init: allocates kblockd_workqueue and creates the caches request_cachep (backing the request pool) and blk_requestq_cachep (from which blk_alloc_queue_node allocates queues).

























