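Linux Stories: I Am the Block Layer (10): The Past and Present Lives of a SCSI Command (Part 4)

Of course, the while loop can also end because of the two checks on line 1453. The first is that there is no req left. The other depends on the return value of scsi_dev_queue_ready(): if it returns 0, the break is executed as well, and the loop ends.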

1270 /*

1271 * scsi_dev_queue_ready: if we can send requests to sdev, return 1 else

1272 * return 0.

1273 *

1274 * Called with the queue_lock held.

1275 */

1276 static inline int scsi_dev_queue_ready(struct request_queue *q,

1277 struct scsi_device *sdev)

1278 {

1279 if (sdev->device_busy >= sdev->queue_depth)

1280 return 0;

1281 if (sdev->device_busy == 0 && sdev->device_blocked) {

1282 /*

1283 * unblock after device_blocked iterates to zero

1284 */

1285 if (--sdev->device_blocked == 0) {

1286 SCSI_LOG_MLQUEUE(3,

1287 sdev_printk(KERN_INFO, sdev,

1288 "unblocking device at zero depth\n"));

1289 } else {

1290 blk_plug_device(q);

1291 return 0;

1292 }

1293 }

1294 if (sdev->device_blocked)

1295 return 0;

1296

1297 return 1;

1298 }

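The first thing checked here is device_busy. Strictly speaking it is a counter rather than a flag: a nonzero value means commands are being executed, in other words they have already been handed to the low-level driver. That is why device_busy is incremented on line 1469, before scsi_dispatch_cmd is called.

The other flag is device_blocked. It tells the world that this device cannot accept new commands for now, most likely because it is busy processing some. Normally its value is 0, unless scsi_queue_insert() has been called. A friendly tip: this flag of a SCSI device is exposed through sysfs, so we can look at its value there. Below are the values for two SCSI devices. Both are 0, which should be regarded as the normal state.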
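
To make the decision order in the listing above easier to follow, here is a minimal user-space sketch of the same checks. The struct toy_sdev type and the toy_* helpers are hypothetical names invented for illustration; locking, logging and blk_plug_device() are deliberately left out, so this is a model of the logic and not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few struct scsi_device fields the check reads. */
struct toy_sdev {
	unsigned int device_busy;    /* commands already handed to the low-level driver */
	unsigned int queue_depth;    /* how many commands the device accepts at once */
	unsigned int device_blocked; /* countdown; 0 means "not blocked" */
};

/* Same decision order as scsi_dev_queue_ready(), minus locking and plugging. */
static bool toy_dev_queue_ready(struct toy_sdev *sdev)
{
	if (sdev->device_busy >= sdev->queue_depth)
		return false;           /* the device queue is full */
	if (sdev->device_busy == 0 && sdev->device_blocked) {
		if (--sdev->device_blocked != 0)
			return false;   /* still counting down to zero */
	}
	if (sdev->device_blocked)
		return false;           /* blocked while commands are in flight */
	return true;
}

/* Dispatch side: account for the command before it is sent. */
static bool toy_dispatch(struct toy_sdev *sdev)
{
	if (!toy_dev_queue_ready(sdev))
		return false;
	sdev->device_busy++;
	return true;
}

/* Completion side: the role scsi_device_unbusy() plays in the kernel. */
static void toy_complete(struct toy_sdev *sdev)
{
	sdev->device_busy--;
}
```

With queue_depth set to 2, two calls to toy_dispatch() succeed and the third is refused until toy_complete() has been called once.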

[root@localhost ~]# ls /sys/bus/scsi/devices/

0:0:8:0/ 0:2:0:0/ 1:0:0:0/ 2:0:0:0/

[root@localhost ~]# ls /sys/bus/scsi/devices/2:0:0:0/

block:sdb/ iocounterbits modalias rev subsystem/ bus/ iodone_cnt model scsi_device:2:0:0:0/ timeout delete ioerr_cnt queue_depth scsi_disk:2:0:0:0/ type device_blocked iorequest_cnt queue_type scsi_level uevent driver/ max_sectors rescan state vendor

[root@localhost ~]# cat /sys/bus/scsi/devices/2:0:0:0/device_blocked

0

[root@localhost ~]# cat /sys/bus/scsi/devices/0:0:8:0/device_blocked

0
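
The same attribute can also be read from a program. The helper below is a small sketch; read_int_attr is a hypothetical name, and the path in its comment is simply the one from the shell session above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one integer attribute, for example
 * /sys/bus/scsi/devices/2:0:0:0/device_blocked.
 * Returns 0 on success and -1 if the file is missing or does not hold a number.
 */
static int read_int_attr(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

On a machine that has the device, calling read_int_attr() with the device_blocked path fills in the same number that cat prints.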

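So under normal circumstances scsi_dev_queue_ready() returns 1, just as its comment says. But this "normal state" refers to executing a single command. If several commands are executed, in other words if we submit several requests, device_busy is incremented again and again on line 1469 until it may reach queue_depth. At that point scsi_dev_queue_ready() returns 0, and scsi_request_fn() may finish. After that __generic_unplug_device returns, then blk_execute_rq_nowait() returns, and we are back in blk_execute_rq(), which calls wait_for_completion() and goes to sleep, waiting. By the rules of the game there must be a complete() somewhere to wake it up. So where is that statement? The answer is blk_end_sync_rq.

A reader who goes by the handle 宁失身不失眠 is very curious how I know that. It is a long story. Remember the scsi_done we talked about back in usb-storage? When a command has finished, scsi_done is called. scsi_done comes from drivers/scsi/scsi.c, and this function is obviously our way in. Finding it is like the Kuomintang finding Fu Zhigao, or like Wang Jiazhi finding Mr. Yi: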

608 /**

609 * scsi_done - Enqueue the finished SCSI command into the done queue.

610 * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives

611 * ownership back to SCSI Core -- i.e. the LLDD has finished with it.

612 *

613 * This function is the mid-level's (SCSI Core) interrupt routine, which

614 * regains ownership of the SCSI command (de facto) from a LLDD, and enqueues

615 * the command to the done queue for further processing.

616 *

617 * This is the producer of the done queue who enqueues at the tail.

618 *

619 * This function is interrupt context safe.

620 */

621 static void scsi_done(struct scsi_cmnd *cmd)

622 {

623 /*

624 * We don't have to worry about this one timing out any more.

625 * If we are unable to remove the timer, then the command

626 * has already timed out. In which case, we have no choice but to

627 * let the timeout function run, as we have no idea where in fact

628 * that function could really be. It might be on another processor,

629 * etc, etc.

630 */

631 if (!scsi_delete_timer(cmd))

632 return;

633 __scsi_done(cmd);

634 }
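
The comment in scsi_done() describes a small ownership rule: whoever gets hold of the timer first gets to finish the command. Here is a minimal sketch of that rule, assuming a plain boolean in place of a real kernel timer. All toy_* names are hypothetical, and the sketch is single-threaded, so it shows the rule and not the actual race.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command with a "timer pending" flag instead of a kernel timer. */
struct toy_cmd {
	bool timer_pending; /* true until the timeout handler has started */
	int  completions;   /* how many times the command was finished */
};

/* Modeled on scsi_delete_timer(): true only if the timer was still pending. */
static bool toy_delete_timer(struct toy_cmd *cmd)
{
	bool was_pending = cmd->timer_pending;

	cmd->timer_pending = false;
	return was_pending;
}

/* Normal completion path, shaped like scsi_done(). */
static void toy_done(struct toy_cmd *cmd)
{
	if (!toy_delete_timer(cmd))
		return;         /* the timeout path already owns the command */
	cmd->completions++;     /* stands in for __scsi_done(cmd) */
}

/* Timeout path: once the timer has fired it is no longer pending. */
static void toy_timer_fired(struct toy_cmd *cmd)
{
	cmd->timer_pending = false;
}
```

If toy_timer_fired() runs before toy_done(), the command is not completed by toy_done(); it is left to the timeout path.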

Lurking behind it is __scsi_done, from the same file:

636 /* Private entry to scsi_done() to complete a command when the timer

637 * isn't running --- used by scsi_times_out */

638 void __scsi_done(struct scsi_cmnd *cmd)

639 {

640 struct request *rq = cmd->request;

641

642 /*

643 * Set the serial numbers back to zero

644 */

645 cmd->serial_number = 0;

646

647 atomic_inc(&cmd->device->iodone_cnt);

648 if (cmd->result)

649 atomic_inc(&cmd->device->ioerr_cnt);

650

651 BUG_ON(!rq);

652

653 /*

654 * The uptodate/nbytes values don't matter, as we allow partial

655 * completes and thus will check this in the softirq callback

656 */

657 rq->completion_data = cmd;

658 blk_complete_request(rq);

659 }

We do not care about anything else here, only the final call, blk_complete_request().

3588 /**

3589 * blk_complete_request - end I/O on a request

3590 * @req: the request being processed

3591 *

3592 * Description:

3593 * Ends all I/O on a request. It does not handle partial completions,

3594 * unless the driver actually implements this in its completion callback

3595 * through requeueing. Theh actual completion happens out-of-order,

3596 * through a softirq handler. The user must have registered a completion

3597 * callback through blk_queue_softirq_done().

3598 **/

3599

3600 void blk_complete_request(struct request *req)

3601 {

3602 struct list_head *cpu_list;

3603 unsigned long flags;

3604

3605 BUG_ON(!req->q->softirq_done_fn);

3606

3607 local_irq_save(flags);

3608

3609 cpu_list = &__get_cpu_var(blk_cpu_done);

3610 list_add_tail(&req->donelist, cpu_list);

3611 raise_softirq_irqoff(BLOCK_SOFTIRQ);

3612

3613 local_irq_restore(flags);

3614 }

Ignore the rest and look only at raise_softirq_irqoff(). A long, long time ago there was a function named blk_dev_init(). It is where our story began. In that function we once saw this line:

3720 open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);

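As we said back then, this initializes a softirq, BLOCK_SOFTIRQ, and binds the handler blk_done_softirq to it. We also said that to trigger this softirq one only needs to call raise_softirq_irqoff(). That is exactly what has just been done, which means blk_done_softirq will be called.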
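
The register-then-raise pattern can be sketched in user space as a handler table plus a pending bitmask. Everything prefixed with toy_ or TOY_ is a hypothetical name, and the real kernel mechanism (per-CPU state, ksoftirqd, interrupt exit) is far more involved. The point is only that raising marks the handler as pending and that it runs later.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_BLOCK_SOFTIRQ = 0, TOY_NR_SOFTIRQS = 4 };

typedef void (*toy_softirq_fn)(void);

static toy_softirq_fn toy_handlers[TOY_NR_SOFTIRQS];
static unsigned int toy_pending; /* one bit per softirq number */
static int toy_block_runs;       /* how often the block handler ran */

/* Registration: the role open_softirq() plays in blk_dev_init(). */
static void toy_open_softirq(int nr, toy_softirq_fn fn)
{
	toy_handlers[nr] = fn;
}

/* Raising only marks the softirq pending; the handler does not run here. */
static void toy_raise_softirq(int nr)
{
	toy_pending |= 1u << nr;
}

/* Later, at a safe point, each pending handler runs once. */
static void toy_do_softirq(void)
{
	unsigned int pending = toy_pending;
	int nr;

	toy_pending = 0;
	for (nr = 0; nr < TOY_NR_SOFTIRQS; nr++)
		if ((pending & (1u << nr)) && toy_handlers[nr])
			toy_handlers[nr]();
}

/* Stand-in for blk_done_softirq(). */
static void toy_blk_done(void)
{
	toy_block_runs++;
}
```

Raising the same softirq twice before toy_do_softirq() still runs the handler only once, which is why the real handler has to drain a whole list of finished requests.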

3542 /*

3543 * splice the completion data to a local structure and hand off to

3544 * process_completion_queue() to complete the requests

3545 */

3546 static void blk_done_softirq(struct softirq_action *h)

3547 {

3548 struct list_head *cpu_list, local_list;

3549

3550 local_irq_disable();

3551 cpu_list = &__get_cpu_var(blk_cpu_done);

3552 list_replace_init(cpu_list, &local_list);

3553 local_irq_enable();

3554

3555 while (!list_empty(&local_list)) {

3556 struct request *rq = list_entry(local_list.next, struct request, donelist);

3557

3558 list_del_init(&rq->donelist);

3559 rq->q->softirq_done_fn(rq);

3560 }

3561 }
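
The producer and consumer halves, blk_complete_request() and blk_done_softirq(), boil down to "append at the tail, then detach the whole list and walk it". Below is a minimal sketch using a hand-rolled singly linked list instead of the kernel's list_head. The toy_* names are hypothetical, and the callback is stored per request here only to keep the example short (in the kernel it lives in the request queue).

```c
#include <assert.h>
#include <stddef.h>

struct toy_request;
typedef void (*toy_done_fn)(struct toy_request *rq);

struct toy_request {
	struct toy_request *next;
	toy_done_fn softirq_done_fn; /* per request here; per queue in the kernel */
	int id;
};

struct toy_list {
	struct toy_request *head, *tail;
};

static struct toy_list toy_cpu_done; /* stands in for the per-CPU blk_cpu_done */

/* Producer, as in blk_complete_request(): append at the tail. */
static void toy_complete_request(struct toy_request *rq)
{
	rq->next = NULL;
	if (toy_cpu_done.tail)
		toy_cpu_done.tail->next = rq;
	else
		toy_cpu_done.head = rq;
	toy_cpu_done.tail = rq;
}

/* Consumer, as in blk_done_softirq(): detach everything, then process. */
static void toy_done_softirq(void)
{
	struct toy_request *rq = toy_cpu_done.head;

	/* In the kernel this swap happens with local interrupts disabled. */
	toy_cpu_done.head = toy_cpu_done.tail = NULL;

	while (rq) {
		struct toy_request *next = rq->next;

		rq->next = NULL;
		rq->softirq_done_fn(rq);
		rq = next;
	}
}

static int toy_order[4];
static int toy_seen;

/* Example callback that records the order in which requests complete. */
static void toy_record(struct toy_request *rq)
{
	toy_order[toy_seen++] = rq->id;
}
```

Detaching the list first keeps the window in which producers are locked out short: the callbacks themselves run on the private copy.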

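And what is this softirq_done_fn? Don't say you don't know, because we have in fact covered it. But it does not matter if you have forgotten: a person's greatest trouble is too good a memory, and the forgetful are easily happy. In scsi_alloc_queue we called blk_queue_softirq_done to assign scsi_softirq_done to q->softirq_done_fn, so what gets called here is scsi_softirq_done.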

1376 static void scsi_softirq_done(struct request *rq)

1377 {

1378 struct scsi_cmnd *cmd = rq->completion_data;

1379 unsigned long wait_for = (cmd->allowed + 1) * cmd->timeout_per_command;

1380 int disposition;

1381

1382 INIT_LIST_HEAD(&cmd->eh_entry);

1383

1384 disposition = scsi_decide_disposition(cmd);

1385 if (disposition != SUCCESS &&

1386 time_before(cmd->jiffies_at_alloc + wait_for, jiffies)) {

1387 sdev_printk(KERN_ERR, cmd->device,

1388 "timing out command, waited %lus\n",

1389 wait_for/HZ);

1390 disposition = SUCCESS;

1391 }

1392

1393 scsi_log_completion(cmd, disposition);

1394

1395 switch (disposition) {

1396 case SUCCESS:

1397 scsi_finish_command(cmd);

1398 break;

1399 case NEEDS_RETRY:

1400 scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY);

1401 break;

1402 case ADD_TO_MLQUEUE:

1403 scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);

1404 break;

1405 default:

1406 if (!scsi_eh_scmd_add(cmd, 0))

1407 scsi_finish_command(cmd);

1408 }

1409 }
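
The shape of scsi_softirq_done() is "decide a disposition, override it if the command has run out of time, then dispatch on it". A minimal sketch of that shape follows. The enum values and toy_decide are hypothetical names, time is a plain counter with no jiffies wraparound handling, and the default branch is simplified to "hand over to error handling".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the kernel uses SUCCESS, NEEDS_RETRY, ADD_TO_MLQUEUE, ... */
enum toy_disposition { TOY_SUCCESS, TOY_NEEDS_RETRY, TOY_ADD_TO_MLQUEUE, TOY_FAILED };
enum toy_action { TOY_FINISH, TOY_REQUEUE_RETRY, TOY_REQUEUE_BUSY, TOY_ERROR_HANDLER };

/*
 * now, allocated_at and per_try_timeout share one arbitrary time unit.
 * The time budget is (allowed + 1) * per_try_timeout, as in the listing.
 */
static enum toy_action toy_decide(enum toy_disposition d,
				  unsigned long now, unsigned long allocated_at,
				  unsigned int allowed, unsigned long per_try_timeout)
{
	unsigned long wait_for = (allowed + 1) * per_try_timeout;

	/* A command that is out of time is finished instead of retried. */
	if (d != TOY_SUCCESS && now > allocated_at + wait_for)
		d = TOY_SUCCESS;

	switch (d) {
	case TOY_SUCCESS:        return TOY_FINISH;
	case TOY_NEEDS_RETRY:    return TOY_REQUEUE_RETRY;
	case TOY_ADD_TO_MLQUEUE: return TOY_REQUEUE_BUSY;
	default:                 return TOY_ERROR_HANDLER;
	}
}
```

With allowed = 5 and a per-try timeout of 30 units the budget is 180 units, so a retry requested at time 181 is turned into a plain finish.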

Needless to say, in the SUCCESS case scsi_softirq_done calls scsi_finish_command, which comes from drivers/scsi/scsi.c:

661 /*

662 * Function: scsi_finish_command

663 *

664 * Purpose: Pass command off to upper layer for finishing of I/O

665 * request, waking processes that are waiting on results,

666 * etc.

667 */

668 void scsi_finish_command(struct scsi_cmnd *cmd)

669 {

670 struct scsi_device *sdev = cmd->device;

671 struct Scsi_Host *shost = sdev->host;

672

673 scsi_device_unbusy(sdev);

674

675 /*

676 * Clear the flags which say that the device/host is no longer

677 * capable of accepting new commands. These are set in scsi_queue.c

678 * for both the queue full condition on a device, and for a

679 * host full condition on the host.

680 *

681 * XXX(hch): What about locking?

682 */

683 shost->host_blocked = 0;

684 sdev->device_blocked = 0;

685

686 /*

687 * If we have valid sense information, then some kind of recovery

688 * must have taken place. Make a note of this.

689 */

690 if (SCSI_SENSE_VALID(cmd))

691 cmd->result |= (DRIVER_SENSE << 24);

692

693 SCSI_LOG_MLCOMPLETE(4, sdev_printk(KERN_INFO, sdev,

694 "Notifying upper driver of completion "

695 "(result %x)\n", cmd->result));

696

697 cmd->done(cmd);

698 }

In other words, cmd->done gets called, and so the real worker behind the scenes, scsi_blk_pc_done, gets called. That is because scsi_setup_blk_pc_cmnd() contained this line:

1135 cmd->done = scsi_blk_pc_done;

scsi_blk_pc_done comes from drivers/scsi/scsi_lib.c:

1078 static void scsi_blk_pc_done(struct scsi_cmnd *cmd)

1079 {

1080 BUG_ON(!blk_pc_request(cmd->request));

1081 /*

1082 * This will complete the whole command with uptodate=1 so

1083 * as far as the block layer is concerned the command completed

1084 * successfully. Since this is a REQ_BLOCK_PC command the

1085 * caller should check the request's errors value

1086 */

1087 scsi_io_completion(cmd, cmd->request_bufflen);

1088 }

The scsi_io_completion() it calls also comes from drivers/scsi/scsi_lib.c:

789 /*

790 * Function: scsi_io_completion()

791 *

792 * Purpose: Completion processing for block device I/O requests.

793 *

794 * Arguments: cmd - command that is finished.

795 *

796 * Lock status: Assumed that no lock is held upon entry.

797 *

798 * Returns: Nothing

799 *

800 * Notes: This function is matched in terms of capabilities to

801 * the function that created the scatter-gather list.

802 * In other words, if there are no bounce buffers

803 * (the normal case for most drivers), we don't need

804 * the logic to deal with cleaning up afterwards.

805 *

806 * We must do one of several things here:

807 *

808 * a) Call scsi_end_request. This will finish off the

809 * specified number of sectors. If we are done, the

810 * command block will be released, and the queue

811 * function will be goosed. If we are not done, then

812 * scsi_end_request will directly goose the queue.

813 *

814 * b) We can just use scsi_requeue_command() here. This would

815 * be used if we just wanted to retry, for example.

816 */

817 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)

818 {

819 int result = cmd->result;

820 int this_count = cmd->request_bufflen;

821 request_queue_t *q = cmd->device->request_queue;

822 struct request *req = cmd->request;

823 int clear_errors = 1;

824 struct scsi_sense_hdr sshdr;

825 int sense_valid = 0;

826 int sense_deferred = 0;

827

828 scsi_release_buffers(cmd);

829

830 if (result) {

831 sense_valid = scsi_command_normalize_sense(cmd, &sshdr);

832 if (sense_valid)

833 sense_deferred = scsi_sense_is_deferred(&sshdr);

834 }

835

836 if (blk_pc_request(req)) { /* SG_IO ioctl from block level */

837 req->errors = result;

838 if (result) {

839 clear_errors = 0;

840 if (sense_valid && req->sense) {

841 /*

842 * SG_IO wants current and deferred errors

843 */

844 int len = 8 + cmd->sense_buffer[7];

845

846 if (len > SCSI_SENSE_BUFFERSIZE)

847 len = SCSI_SENSE_BUFFERSIZE;

848 memcpy(req->sense, cmd->sense_buffer, len);

849 req->sense_len = len;

850 }

851 }

852 req->data_len = cmd->resid;

853 }

854

855 /*

856 * Next deal with any sectors which we were able to correctly

857 * handle.

858 */

859 SCSI_LOG_HLCOMPLETE(1, printk("%ld sectors total, "

860 "%d bytes done.\n",

861 req->nr_sectors, good_bytes));

862 SCSI_LOG_HLCOMPLETE(1, printk("use_sg is %d\n", cmd->use_sg));

863

864 if (clear_errors)

865 req->errors = 0;

866

867 /* A number of bytes were successfully read. If there

868 * are leftovers and there is some kind of error

869 * (result != 0), retry the rest.

870 */

871 if (scsi_end_request(cmd, 1, good_bytes, result == 0) == NULL)

872 return;

873

874 /* good_bytes = 0, or (inclusive) there were leftovers and

875 * result = 0, so scsi_end_request couldn't retry.

876 */

877 if (sense_valid && !sense_deferred) {

878 switch (sshdr.sense_key) {

879 case UNIT_ATTENTION:

880 if (cmd->device->removable) {

881 /* Detected disc change. Set a bit

882 * and quietly refuse further access.

883 */

884 cmd->device->changed = 1;

885 scsi_end_request(cmd, 0, this_count, 1);

886 return;

887 } else {

888 /* Must have been a power glitch, or a

889 * bus reset. Could not have been a

890 * media change, so we just retry the

891 * request and see what happens.

892 */

893 scsi_requeue_command(q, cmd);

894 return;

895 }

896 break;

897 case ILLEGAL_REQUEST:

898 /* If we had an ILLEGAL REQUEST returned, then

899 * we may have performed an unsupported

900 * command. The only thing this should be

901 * would be a ten byte read where only a six

902 * byte read was supported. Also, on a system

903 * where READ CAPACITY failed, we may have

904 * read past the end of the disk.

905 */

906 if ((cmd->device->use_10_for_rw &&

907 sshdr.asc == 0x20 && sshdr.ascq == 0x00) &&

908 (cmd->cmnd[0] == READ_10 ||

909 cmd->cmnd[0] == WRITE_10)) {

910 cmd->device->use_10_for_rw = 0;

911 /* This will cause a retry with a

912 * 6-byte command.

913 */

914 scsi_requeue_command(q, cmd);

915 return;

916 } else {

917 scsi_end_request(cmd, 0, this_count, 1);

918 return;

919 }

920 break;

921 case NOT_READY:

922 /* If the device is in the process of becoming

923 * ready, or has a temporary blockage, retry.

924 */

925 if (sshdr.asc == 0x04) {

926 switch (sshdr.ascq) {

927 case 0x01: /* becoming ready */

928 case 0x04: /* format in progress */

929 case 0x05: /* rebuild in progress */

930 case 0x06: /* recalculation in progress */

931 case 0x07: /* operation in progress */

932 case 0x08: /* Long write in progress */

933 case 0x09: /* self test in progress */

934 scsi_requeue_command(q, cmd);

935 return;

936 default:

937 break;

938 }

939 }

940 if (!(req->cmd_flags & REQ_QUIET)) {

941 scmd_printk(KERN_INFO, cmd,

942 "Device not ready: ");

943 scsi_print_sense_hdr("", &sshdr);

944 }

945 scsi_end_request(cmd, 0, this_count, 1);

946 return;

947 case VOLUME_OVERFLOW:

948 if (!(req->cmd_flags & REQ_QUIET)) {

949 scmd_printk(KERN_INFO, cmd,

950 "Volume overflow, CDB: ");

951 __scsi_print_command(cmd->cmnd);

952 scsi_print_sense("", cmd);

953 }

954 /* See SSC3rXX or current. */

955 scsi_end_request(cmd, 0, this_count, 1);

956 return;

957 default:

958 break;

959 }

960 }

961 if (host_byte(result) == DID_RESET) {

962 /* Third party bus reset or reset for error recovery

963 * reasons. Just retry the request and see what

964 * happens.

965 */

966 scsi_requeue_command(q, cmd);

967 return;

968 }

969 if (result) {

970 if (!(req->cmd_flags & REQ_QUIET)) {

971 scsi_print_result(cmd);

972 if (driver_byte(result) & DRIVER_SENSE)

973 scsi_print_sense("", cmd);

974 }

975 }

976 scsi_end_request(cmd, 0, this_count, !result);

977 }
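
Two small pieces of scsi_io_completion() are easy to isolate: how many sense bytes are copied back for an SG_IO request, and which NOT READY sub-codes are treated as transient and retried. The sketch below restates just those two rules. The toy_* names are hypothetical, and the value 96 for the sense buffer size is an assumption, not something shown in the listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SENSE_BUFFERSIZE 96 /* assumed value of SCSI_SENSE_BUFFERSIZE */

/* Sense bytes to copy: 8 header bytes plus the additional length in byte 7. */
static size_t toy_sense_copy_len(const unsigned char *sense_buffer)
{
	size_t len = 8 + (size_t)sense_buffer[7];

	return len > TOY_SENSE_BUFFERSIZE ? TOY_SENSE_BUFFERSIZE : len;
}

/* True for the NOT READY sub-codes that the listing retries (ASC 0x04). */
static bool toy_not_ready_is_transient(unsigned char asc, unsigned char ascq)
{
	if (asc != 0x04)
		return false;
	switch (ascq) {
	case 0x01: case 0x04: case 0x05: case 0x06:
	case 0x07: case 0x08: case 0x09:
		return true;
	default:
		return false;
	}
}
```

The clamp matters because byte 7 comes from the device: without it, a bogus additional length would make memcpy() read past the end of the sense buffer.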

Yet another outrageous function. But I do not want to say anything more. Jump straight to the last line, scsi_end_request(), which comes from drivers/scsi/scsi_lib.c:

632 /*

633 * Function: scsi_end_request()

634 *

635 * Purpose: Post-processing of completed commands (usually invoked at end

636 * of upper level post-processing and scsi_io_completion).

637 *

638 * Arguments: cmd - command that is complete.

639 * uptodate - 1 if I/O indicates success, <= 0 for I/O error.

640 * bytes - number of bytes of completed I/O

641 * requeue - indicates whether we should requeue leftovers.

642 *

643 * Lock status: Assumed that lock is not held upon entry.

644 *

645 * Returns: cmd if requeue required, NULL otherwise.

646 *

647 * Notes: This is called for block device requests in order to

648 * mark some number of sectors as complete.

649 *

650 * We are guaranteeing that the request queue will be goosed

651 * at some point during this call.

652 * Notes: If cmd was requeued, upon return it will be a stale pointer.

653 */

654 static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int uptodate,

655 int bytes, int requeue)

656 {

657 request_queue_t *q = cmd->device->request_queue;

658 struct request *req = cmd->request;

659 unsigned long flags;

660

661 /*

662 * If there are blocks left over at the end, set up the command

663 * to queue the remainder of them.

664 */

665 if (end_that_request_chunk(req, uptodate, bytes)) {

666 int leftover = (req->hard_nr_sectors << 9);

667

668 if (blk_pc_request(req))

669 leftover = req->data_len;

670

671 /* kill remainder if no retrys */

672 if (!uptodate && blk_noretry_request(req))

673 end_that_request_chunk(req, 0, leftover);

674 else {

675 if (requeue) {

676 /*

677 * Bleah. Leftovers again. Stick the

678 * leftovers in the front of the

679 * queue, and goose the queue again.

680 */

681 scsi_requeue_command(q, cmd);

682 cmd = NULL;

683 }

684 return cmd;

685 }

686 }

687

688 add_disk_randomness(req->rq_disk);

689

690 spin_lock_irqsave(q->queue_lock, flags);

691 if (blk_rq_tagged(req))

692 blk_queue_end_tag(q, req);

693 end_that_request_last(req, uptodate);

694 spin_unlock_irqrestore(q->queue_lock, flags);

695

696 /*

697 * This will goose the queue request function at the end, so we don't

698 * need to worry about launching another command.

699 */

700 scsi_next_command(cmd);

701 return NULL;

702 }

What we care about most is the call to end_that_request_last on line 693.

3618 /*

3619 * queue lock must be held

3620 */

3621 void end_that_request_last(struct request *req, int uptodate)

3622 {

3623 struct gendisk *disk = req->rq_disk;

3624 int error;

3625

3626 /*

3627 * extend uptodate bool to allow < 0 value to be direct io error

3628 */

3629 error = 0;

3630 if (end_io_error(uptodate))

3631 error = !uptodate ? -EIO : uptodate;

3632

3633 if (unlikely(laptop_mode) && blk_fs_request(req))

3634 laptop_io_completion();

3635

3636 /*

3637 * Account IO completion. bar_rq isn't accounted as a normal

3638 * IO on queueing nor completion. Accounting the containing

3639 * request is enough.

3640 */

3641 if (disk && blk_fs_request(req) && req != &req->q->bar_rq) {

3642 unsigned long duration = jiffies - req->start_time;

3643 const int rw = rq_data_dir(req);

3644

3645 __disk_stat_inc(disk, ios[rw]);

3646 __disk_stat_add(disk, ticks[rw], duration);

3647 disk_round_stats(disk);

3648 disk->in_flight--;

3649 }

3650 if (req->end_io)

3651 req->end_io(req, error);

3652 else

3653 __blk_put_request(req->q, req);

3654 }
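
Lines 3629 to 3631 turn the overloaded uptodate argument into an error code. Here is a minimal sketch of that mapping, assuming that end_io_error(uptodate) means "uptodate is zero or negative" (the macro's definition is not shown in the listing). toy_uptodate_to_error is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>

/* uptodate > 0: success; 0: generic failure; < 0: a specific negative errno. */
static int toy_uptodate_to_error(int uptodate)
{
	if (uptodate > 0)
		return 0;
	return uptodate == 0 ? -EIO : uptodate;
}
```

So a caller that passes 1 reports success, one that passes 0 reports -EIO, and one that passes a negative errno has that value handed to end_io unchanged.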

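Well, the end_io call on line 3651 is the crucial piece of code. You may have long forgotten that we have seen end_io before, but never mind, I am here. In blk_execute_rq_nowait() there was this line: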

2596 rq->end_io = done;

done is the last (fifth) parameter of that function. When we called it back then, in blk_execute_rq, we wrote:

2636 blk_execute_rq_nowait(q, bd_disk, rq, at_head, blk_end_sync_rq);

In other words, rq->end_io was set to blk_end_sync_rq.

2786 /**

2787 * blk_end_sync_rq - executes a completion event on a request

2788 * @rq: request to complete

2789 * @error: end io status of the request

2790 */

2791 void blk_end_sync_rq(struct request *rq, int error)

2792 {

2793 struct completion *waiting = rq->end_io_data;

2794

2795 rq->end_io_data = NULL;

2796 __blk_put_request(rq->q, rq);

2797

2798 /*

2799 * complete last, if this is a stack request the process (and thus

2800 * the rq pointer) could be invalid right after this complete()

2801 */

2802 complete(waiting);

2803 }

At last we have found our dear, lovely, beloved, deeply loved, most loved complete(). But how can we be sure that this waiting is that wait? Compare the two. Back in blk_execute_rq we had:

2635 rq->end_io_data = &wait;

And right now we have:

2793 struct completion *waiting = rq->end_io_data;

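So we have not got the wrong target. After all, we know full well that you may kiss the wrong person, but you may not lose your temper at the wrong person, and even less may you write code against the wrong one.

At this point blk_execute_rq is woken up and quickly returns, followed by the return of scsi_execute and of scsi_execute_req. At this moment a SCSI command has gone from nothing to something and finally back to nothing. It went through the transformation from SCSI command into request, and through the trial of turning from request back into SCSI command. In the end it fulfilled its mission. For it, life is an illusion, and parting or death is the only ending.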
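
The handshake between the sleeper and the end_io callback can be approximated in user space with a mutex and a condition variable. This is only a sketch of the pattern: struct toy_completion, struct toy_rq and the toy_* functions are hypothetical names, and the kernel's completion is built on a wait queue, not on pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* User-space approximation of the kernel's struct completion. */
struct toy_completion {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	unsigned int    done;
};

static void toy_init_completion(struct toy_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Wake one waiter, or let the next waiter pass straight through. */
static void toy_complete(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Sleep until toy_complete() has been called, then consume one "done". */
static void toy_wait_for_completion(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)
		pthread_cond_wait(&c->cond, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}

/* Only the two request fields that take part in the handshake. */
struct toy_rq {
	void (*end_io)(struct toy_rq *rq, int error);
	void *end_io_data;
};

/* Plays the role of blk_end_sync_rq(): wake whoever is parked on end_io_data. */
static void toy_end_sync_rq(struct toy_rq *rq, int error)
{
	struct toy_completion *waiting = rq->end_io_data;

	(void)error; /* the status is not inspected in this sketch */
	rq->end_io_data = NULL;
	toy_complete(waiting);
}
```

The submitting side stores the address of its own completion in end_io_data and then calls toy_wait_for_completion(). The completion path, typically on another thread, calls rq->end_io(rq, error). Both sides touch the same object, which is exactly the "this waiting is that wait" check made above.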
