Those Linux Things: I Am the Block Layer (5) Is the Condensed Version the Essence? (Part 2)

The second function, register_disk, comes with quite a pedigree. It lives far away in fs/partitions/check.c:

    473 /* Not exported, helper to add_disk(). */

    474 void register_disk(struct gendisk *disk)

    475 {

    476         struct block_device *bdev;

    477         char *s;

    478         int i;

    479         struct hd_struct *p;

    480         int err;

    481

    482         strlcpy(disk->kobj.name,disk->disk_name,KOBJ_NAME_LEN);

    483         /* ewww... some of these buggers have / in name... */

    484         s = strchr(disk->kobj.name, '/');

    485         if (s)

    486                 *s = '!';

    487         if ((err = kobject_add(&disk->kobj)))

    488                 return;

    489         err = disk_sysfs_symlinks(disk);

    490         if (err) {

    491                 kobject_del(&disk->kobj);

    492                 return;

    493         }

    494         disk_sysfs_add_subdirs(disk);

    495

    496         /* No minors to use for partitions */

    497         if (disk->minors == 1)

    498                 goto exit;

    499

    500         /* No such device (e.g., media were just removed) */

    501         if (!get_capacity(disk))

    502                 goto exit;

    503

    504         bdev = bdget_disk(disk, 0);

    505         if (!bdev)

    506                 goto exit;

    507

    508         /* scan partition table, but suppress uevents */

    509         bdev->bd_invalidated = 1;

    510         disk->part_uevent_suppress = 1;

    511         err = blkdev_get(bdev, FMODE_READ, 0);

    512         disk->part_uevent_suppress = 0;

    513         if (err < 0)

    514                 goto exit;

    515         blkdev_put(bdev);

    516

    517 exit:

    518         /* announce disk after possible partitions are already created */

    519         kobject_uevent(&disk->kobj, KOBJ_ADD);

    520

    521         /* announce possible partitions */

    522         for (i = 1; i < disk->minors; i++) {

    523                 p = disk->part[i-1];

    524                 if (!p || !p->nr_sects)

    525                         continue;

    526                 kobject_uevent(&p->kobj, KOBJ_ADD);

    527         }

    528 }

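If you do not understand the unified device model of Linux 2.6, you will probably struggle with this code. Fortunately, we covered kobject in <<I Am Sysfs>>, so here we will not dive into the kobject functions or into the functions sysfs provides; we only touch on them.

First, the purpose of kobject_add on line 487 is straightforward: it creates a subdirectory for this disk in sysfs. Take sdf among the directories below, which was created for my USB stick. If I commented out the line that calls kobject_add, you would not see this sdf directory, guaranteed.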

[root@lfg2 ~]# ls /sys/block/

md0   ram1   ram11  ram13  ram15  ram3  ram5  ram7  ram9  sdb  sdd  sdf ram0  ram10  ram12 ram14  ram2   ram4  ram6  ram8  sda   sdc  sde  sdg

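At this point a netizen who goes by 塞翁失身 raised two questions:

First, why does calling kobject_add produce a subdirectory named "sdf" and not something else? Remember what we did in sd_probe: we carefully computed disk_name, which is a member of struct gendisk. On line 482 disk_name is copied into kobj.name, which is why the kobject we add with kobject_add carries the disk_name we computed back then.

Second, why is the subdirectory created under /sys/block and not somewhere else? Remember when we allocated struct gendisk in alloc_disk_node? The statement kobj_set_kset_s(disk,block_subsys) made the kobject of disk belong to the kobject of block_subsys. That is why, when we add this kobject now, its directory naturally appears under /sys/block.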
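The naming step on lines 482 to 486 of register_disk can be tried out in userspace. What follows is only a minimal sketch: sysfs_disk_name is a hypothetical helper, not a kernel function, and snprintf stands in for the kernel's strlcpy. The name "cciss/c0d0" used in the note after the block is just an example of a disk name that contains a slash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace helper mirroring lines 482-486 of register_disk():
 * copy the disk name into the kobject name buffer, then replace the first
 * '/' with '!', because a sysfs directory name cannot contain a slash.
 */
static void sysfs_disk_name(char *dst, size_t dst_len, const char *disk_name)
{
    char *s;

    assert(dst != NULL && disk_name != NULL && dst_len > 0);

    /* snprintf stands in for strlcpy: bounded and always NUL-terminated */
    snprintf(dst, dst_len, "%s", disk_name);

    /* like the kernel code, only the first '/' is replaced */
    s = strchr(dst, '/');
    if (s)
        *s = '!';
}
```

Calling sysfs_disk_name(buf, sizeof(buf), "cciss/c0d0") leaves "cciss!c0d0" in buf, while a plain name such as "sdf" is copied unchanged.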

Moving on. disk_sysfs_symlinks comes from fs/partitions/check.c. The function is not short, but it is easy to follow.

    429 static int disk_sysfs_symlinks(struct gendisk *disk)

    430 {

    431         struct device *target = get_device(disk->driverfs_dev);

    432         int err;

    433         char *disk_name = NULL;

    434

    435         if (target) {

    436                 disk_name = make_block_name(disk);

    437                 if (!disk_name) {

    438                         err = -ENOMEM;

    439                         goto err_out;

    440                 }

    441

    442                 err = sysfs_create_link(&disk->kobj, &target->kobj, "device");

    443                 if (err)

    444                         goto err_out_disk_name;

    445

    446                 err = sysfs_create_link(&target->kobj, &disk->kobj, disk_name);

    447                 if (err)

    448                         goto err_out_dev_link;

    449         }

    450

    451         err = sysfs_create_link(&disk->kobj, &block_subsys.kobj,

    452                                 "subsystem");

    453         if (err)

    454                 goto err_out_disk_name_lnk;

    455

    456         kfree(disk_name);

    457

    458         return 0;

    459

    460 err_out_disk_name_lnk:

    461         if (target) {

    462                 sysfs_remove_link(&target->kobj, disk_name);

    463 err_out_dev_link:

    464                 sysfs_remove_link(&disk->kobj, "device");

    465 err_out_disk_name:

    466                 kfree(disk_name);

    467 err_out:

    468                 put_device(target);

    469         }

    470         return err;

    471 }

Let us read this function through its actual effect. First, here is what /sys/block/sdf contains for a properly working USB stick:

[root@localhost ~]# ls /sys/block/sdf/

capability  dev  device  holders  queue  range  removable  size  slaves  stat  subsystem  uevent

The sysfs_create_link call on line 442 creates the symlink named device shown here. Where does it point?

[root@localhost ~]# ls -l /sys/block/sdf/device

lrwxrwxrwx 1 root root 0 Dec 13 07:09 /sys/block/sdf/device -> ../../devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/host24/target24:0:0/24:0:0:0

The sysfs_create_link call on line 446 creates a link in the opposite direction, from the device side back to the disk.

[root@localhost ~]# ls /sys/devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/host24/target24:0:0/24:0:0:0/

block:sdf driver ioerr_cnt model rescan scsi_generic:sg7  timeout bus             generic       iorequest_cnt  power        rev                   scsi_level        type delete iocounterbits  max_sectors   queue_depth  scsi_device:24:0:0:0  state       uevent device_blocked  iodone_cnt     modalias      queue_type   scsi_disk:24:0:0:0    subsystem  vendor

Clearly, it is this block:sdf.

[root@localhost ~]# ls -l /sys/devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/host24/target24:0:0/24:0:0:0/block:sdf

lrwxrwxrwx 1 root root 0 Dec 13 21:16 /sys/devices/pci0000:00/0000:00:1d.7/usb4/4-4/4-4:1.0/host24/target24:0:0/24:0:0:0/block:sdf -> ../../../../../../../../../block/sdf

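So each side holds a piece of the other: a file over there links to me, and a file over here links to you.

Then line 451 calls sysfs_create_link once more. This time it obviously creates the symlink /sys/block/sdf/subsystem.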

[root@localhost ~]# ls -l /sys/block/sdf/subsystem

lrwxrwxrwx 1 root root 0 Dec 13 07:09 /sys/block/sdf/subsystem -> ../../block

Once these three links are in place, disk_sysfs_symlinks has done its job. The next function is disk_sysfs_add_subdirs, also from fs/partitions/check.c:

    342 static inline void disk_sysfs_add_subdirs(struct gendisk *disk)

    343 {

    344         struct kobject *k;

    345

    346         k = kobject_get(&disk->kobj);

    347         disk->holder_dir = kobject_add_dir(k, "holders");

    348         disk->slave_dir = kobject_add_dir(k, "slaves");

    349         kobject_put(k);

    350 }

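The intent of this function could not be more obvious; even the scalpers hawking concert tickets outside Hongkou Football Stadium could follow it. It simply creates two subdirectories, holders and slaves.

Line 504, bdget_disk, is an inline function. <<Thinking in C++>> tells us that inline functions are best defined in header files, so this one comes from include/linux/genhd.h: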

    433 static inline struct block_device *bdget_disk(struct gendisk *disk, int index)

    434 {

    435         return bdget(MKDEV(disk->major, disk->first_minor) + index);

    436 }
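The device number arithmetic in bdget_disk can be mimicked in userspace. This is a rough sketch: all sketch_ names are hypothetical, and the 20-bit minor field is an assumption taken from the 2.6 kernel's internal dev_t layout (MINORBITS). The major and minor values in the note after the block are the conventional sd numbering, assumed here only for illustration.

```c
#include <assert.h>

/* Assumption: 20 minor bits, matching MINORBITS in the 2.6 kernel. */
#define SKETCH_MINORBITS 20
#define SKETCH_MINORMASK ((1u << SKETCH_MINORBITS) - 1)

typedef unsigned int sketch_dev_t;

/* Pack a major and a minor number into one device number, like MKDEV(). */
static sketch_dev_t sketch_mkdev(unsigned int major, unsigned int minor)
{
    assert(minor <= SKETCH_MINORMASK);
    return (major << SKETCH_MINORBITS) | minor;
}

static unsigned int sketch_major(sketch_dev_t dev)
{
    return dev >> SKETCH_MINORBITS;
}

static unsigned int sketch_minor(sketch_dev_t dev)
{
    return dev & SKETCH_MINORMASK;
}

/*
 * Same arithmetic as bdget_disk(): index 0 is the whole disk,
 * index N is partition N, as long as N stays below disk->minors.
 */
static sketch_dev_t sketch_disk_devt(unsigned int major,
                                     unsigned int first_minor,
                                     unsigned int index)
{
    return sketch_mkdev(major, first_minor) + index;
}
```

With major 8 and first_minor 80, index 0 gives the device number of the whole disk, and index 3 gives minor 83, the third partition.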

Once again a feint of a call: the real work is in bdget, which comes from fs/block_dev.c:

    554 struct block_device *bdget(dev_t dev)

    555 {

    556         struct block_device *bdev;

    557         struct inode *inode;

    558

    559         inode = iget5_locked(bd_mnt->mnt_sb, hash(dev),

    560                         bdev_test, bdev_set, &dev);

    561

    562         if (!inode)

    563                 return NULL;

    564

    565         bdev = &BDEV_I(inode)->bdev;

    566

    567         if (inode->i_state & I_NEW) {

    568                 bdev->bd_contains = NULL;

    569                 bdev->bd_inode = inode;

    570                 bdev->bd_block_size = (1 << inode->i_blkbits);

    571                 bdev->bd_part_count = 0;

    572                 bdev->bd_invalidated = 0;

    573                 inode->i_mode = S_IFBLK;

    574                 inode->i_rdev = dev;

    575                 inode->i_bdev = bdev;

    576                 inode->i_data.a_ops = &def_blk_aops;

    577                 mapping_set_gfp_mask(&inode->i_data, GFP_USER);

    578                 inode->i_data.backing_dev_info = &default_backing_dev_info;

    579                 spin_lock(&bdev_lock);

    580                 list_add(&bdev->bd_list, &all_bdevs);

    581                 spin_unlock(&bdev_lock);

    582                 unlock_new_inode(inode);

    583         }

    584         return bdev;

    585 }

Troubles never come singly: two monstrous structures jump out at once, struct block_device and struct inode.

include/linux/fs.h defines this structure:

    460 struct block_device {

    461         dev_t                   bd_dev;  /* not a kdev_t - it's a search key */

    462         struct inode *          bd_inode;       /* will die */

    463         int                     bd_openers;

    464         struct mutex            bd_mutex;       /* open/close mutex */

    465         struct semaphore        bd_mount_sem;

    466         struct list_head        bd_inodes;

    467         void *                  bd_holder;

    468         int                     bd_holders;

    469 #ifdef CONFIG_SYSFS

    470         struct list_head        bd_holder_list;

    471 #endif

    472         struct block_device *   bd_contains;

    473         unsigned                bd_block_size;

    474         struct hd_struct *      bd_part;

    475         /* number of times partitions within this device have been opened. */

    476         unsigned                bd_part_count;

    477         int                     bd_invalidated;

    478         struct gendisk *        bd_disk;

    479         struct list_head        bd_list;

    480         struct backing_dev_info *bd_inode_backing_dev_info;

    481         /*

    482          * Private data.  You must have bd_claim'ed the block_device

    483          * to use this.  NOTE:  bd_claim allows an owner to claim

    484          * the same device multiple times, the owner must take special

    485          * care to not mess up bd_private for that case.

    486          */

    487         unsigned long           bd_private;

    488 };

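Clearly, every block device in Linux is represented by one variable of this structure, which is why it is called the block device descriptor. We will not go into inode in detail, but there is an amusing structure here, struct bdev_inode: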

     29 struct bdev_inode {

     30         struct block_device bdev;

     31         struct inode vfs_inode;

     32 };

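Combine two monstrous structures and you get a third.

But a guy whose screen name is 避孕套一直用雕牌 asked me: bdev_inode does not seem to have appeared anywhere, so why talk about it? I would say look at the essence and do not be fooled by the surface; many things in this world are not what they seem. If you do not believe me, look at BDEV_I, an inline function from fs/block_dev.c: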

     34 static inline struct bdev_inode *BDEV_I(struct inode *inode)

     35 {

     36         return container_of(inode, struct bdev_inode, vfs_inode);

     37 }

Clearly, it obtains the corresponding bdev_inode from an inode. So &BDEV_I(inode)->bdev on line 565 is the struct block_device bdev member of the bdev_inode that contains this inode.
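The container_of trick behind BDEV_I can be reproduced with offsetof. The block below is a simplified userspace sketch: the _sketch structures are hypothetical stand-ins with a single field each, and container_of_sketch leaves out the type checking that the kernel macro performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct block_device_sketch { int bd_openers; };
struct inode_sketch { unsigned long i_ino; };

struct bdev_inode_sketch {
    struct block_device_sketch bdev;
    struct inode_sketch vfs_inode;
};

/* Step back from a member's address to the start of the enclosing struct. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Counterpart of BDEV_I(): from the embedded inode to its container. */
static struct bdev_inode_sketch *bdev_i(struct inode_sketch *inode)
{
    assert(inode != NULL);
    return container_of_sketch(inode, struct bdev_inode_sketch, vfs_inode);
}

/* Counterpart of &BDEV_I(inode)->bdev on line 565 of bdget(). */
static struct block_device_sketch *inode_to_bdev(struct inode_sketch *inode)
{
    return &bdev_i(inode)->bdev;
}
```

This only works because the inode really is embedded in a bdev_inode. Passing the address of a standalone inode would yield a pointer into unrelated memory, which is why the kernel allocates these inodes through a dedicated path.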

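But a structure variable is not like a bus that shows up by itself if you just wait; you have to allocate it. That is the job of iget5_locked, which comes from fs/inode.c. We will not dig into it; all I can tell you is that once it has run we have both an inode and a block_device. For an inode allocated for the first time, the i_state member has the I_NEW flag set, so the if block starting at line 567 of bdget() is executed. That block initializes the inode pointed to by inode and the block_device pointed to by bdev, and the function finally returns bdev. It is worth stressing that this is the moment bdev formally enters our story.

Back in register_disk(), we continue downward. The next heavyweight function is blkdev_get, from fs/block_dev.c: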

   1206 static int __blkdev_get(struct block_device *bdev, mode_t mode, unsigned flags,

   1207                         int for_part)

   1208 {

   1209         /*

   1210          * This crockload is due to bad choice of ->open() type.

   1211          * It will go away.

   1212          * For now, block device ->open() routine must _not_

   1213          * examine anything in 'inode' argument except ->i_rdev.

   1214          */

   1215         struct file fake_file = {};

   1216         struct dentry fake_dentry = {};

   1217         fake_file.f_mode = mode;

   1218         fake_file.f_flags = flags;

   1219         fake_file.f_path.dentry = &fake_dentry;

   1220         fake_dentry.d_inode = bdev->bd_inode;

   1221

   1222         return do_open(bdev, &fake_file, for_part);

   1223 }

   1224

   1225 int blkdev_get(struct block_device *bdev, mode_t mode, unsigned flags)

   1226 {

   1227         return __blkdev_get(bdev, mode, flags, 0);

   1228 }

blkdev_get simply calls __blkdev_get, so we pasted both functions together.

Clearly, the one we really need to read is do_open, from the same file.

   1103 /*

   1104  * bd_mutex locking:

   1105  *

   1106  *  mutex_lock(part->bd_mutex)

   1107  *    mutex_lock_nested(whole->bd_mutex, 1)

   1108  */

   1109

   1110 static int do_open(struct block_device *bdev, struct file *file, int for_part)

   1111 {

   1112         struct module *owner = NULL;

   1113         struct gendisk *disk;

   1114         int ret = -ENXIO;

   1115         int part;

   1116

   1117         file->f_mapping = bdev->bd_inode->i_mapping;

   1118         lock_kernel();

   1119         disk = get_gendisk(bdev->bd_dev, &part);

   1120         if (!disk) {

   1121                 unlock_kernel();

   1122                 bdput(bdev);

   1123                 return ret;

   1124         }

   1125         owner = disk->fops->owner;

   1126

   1127         mutex_lock_nested(&bdev->bd_mutex, for_part);

   1128         if (!bdev->bd_openers) {

   1129                 bdev->bd_disk = disk;

   1130                 bdev->bd_contains = bdev;

   1131                 if (!part) {

   1132                         struct backing_dev_info *bdi;

   1133                         if (disk->fops->open) {

   1134                                 ret = disk->fops->open(bdev->bd_inode, file);

   1135                                 if (ret)

   1136                                         goto out_first;

   1137                         }

   1138                         if (!bdev->bd_openers) {

   1139                                 bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);

   1140                                 bdi = blk_get_backing_dev_info(bdev);

   1141                                 if (bdi == NULL)

   1142                                         bdi = &default_backing_dev_info;

   1143                                 bdev->bd_inode->i_data.backing_dev_info = bdi;

   1144                         }

   1145                         if (bdev->bd_invalidated)

   1146                                 rescan_partitions(disk, bdev);

   1147                 } else {

   1148                         struct hd_struct *p;

   1149                         struct block_device *whole;

   1150                         whole = bdget_disk(disk, 0);

   1151                         ret = -ENOMEM;

   1152                         if (!whole)

   1153                                 goto out_first;

   1154                         BUG_ON(for_part);

   1155                         ret = __blkdev_get(whole, file->f_mode, file->f_flags, 1);

   1156                         if (ret)

   1157                                 goto out_first;

   1158                         bdev->bd_contains = whole;

   1159                         p = disk->part[part - 1];

   1160                         bdev->bd_inode->i_data.backing_dev_info =

   1161                            whole->bd_inode->i_data.backing_dev_info;

   1162                         if (!(disk->flags & GENHD_FL_UP) || !p || !p->nr_sects) {

   1163                                 ret = -ENXIO;

   1164                                 goto out_first;

   1165                         }

   1166                         kobject_get(&p->kobj);

   1167                         bdev->bd_part = p;

   1168                         bd_set_size(bdev, (loff_t) p->nr_sects << 9);

   1169                 }

   1170         } else {

   1171                 put_disk(disk);

   1172                 module_put(owner);

   1173                 if (bdev->bd_contains == bdev) {

   1174                         if (bdev->bd_disk->fops->open) {

   1175                                 ret = bdev->bd_disk->fops->open(bdev->bd_inode, file);

   1176                                 if (ret)

   1177                                         goto out;

   1178                         }

   1179                         if (bdev->bd_invalidated)

   1180                                 rescan_partitions(bdev->bd_disk, bdev);

   1181                 }

   1182         }

   1183         bdev->bd_openers++;

   1184         if (for_part)

   1185                 bdev->bd_part_count++;

   1186         mutex_unlock(&bdev->bd_mutex);

   1187         unlock_kernel();

   1188         return 0;

   1189

   1190 out_first:

   1191         bdev->bd_disk = NULL;

   1192         bdev->bd_inode->i_data.backing_dev_info = &default_backing_dev_info;

   1193         if (bdev != bdev->bd_contains)

   1194                 __blkdev_put(bdev->bd_contains, 1);

   1195         bdev->bd_contains = NULL;

   1196         put_disk(disk);

   1197         module_put(owner);

   1198 out:

   1199         mutex_unlock(&bdev->bd_mutex);

   1200         unlock_kernel();

   1201         if (ret)

   1202                 bdput(bdev);

   1203         return ret;

   1204 }

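Good grief. With kernel functions there is no most monstrous, only more monstrous.

At the start bd_openers was initialized to 0, so the if block on line 1128 is executed. bd_openers being 0 means the block device has not been opened yet.

At first no partition information is involved, so we only have the notion of sda and not sda1, sda2, sda3 and so on. The part returned by get_gendisk must therefore be 0, so the if block on line 1131 is executed too. disk->fops->open is clearly sd_open (because in sd_probe we set gd->fops to &sd_fops).

But right now sd_open does not really do any serious work. At most it checks whether the device can be opened: if it can, it returns 0; if it cannot, it reports an error right away.

A few more functions follow that mostly perform assignments. We skip them for now and come back when we need them.

rescan_partitions() on line 1146 is obviously one we must read. Before calling blkdev_get we set bd_invalidated to 1, so this function is certain to run this time. From this moment on, partition information enters our lives. The function comes from fs/partitions/check.c: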

    530 int rescan_partitions(struct gendisk *disk, struct block_device *bdev)

    531 {

    532         struct parsed_partitions *state;

    533         int p, res;

    534

    535         if (bdev->bd_part_count)

    536                 return -EBUSY;

    537         res = invalidate_partition(disk, 0);

    538         if (res)

    539                 return res;

    540         bdev->bd_invalidated = 0;

    541         for (p = 1; p < disk->minors; p++)

    542                 delete_partition(disk, p);

    543         if (disk->fops->revalidate_disk)

    544                 disk->fops->revalidate_disk(disk);

    545         if (!get_capacity(disk) || !(state = check_partition(disk, bdev)))

    546                 return 0;

    547         if (IS_ERR(state))      /* I/O error reading the partition table */

    548                 return -EIO;

    549         for (p = 1; p < state->limit; p++) {

    550                 sector_t size = state->parts[p].size;

    551                 sector_t from = state->parts[p].from;

    552                 if (!size)

    553                         continue;

    554                 if (from + size > get_capacity(disk)) {

    555                         printk(" %s: p%d exceeds device capacity\n",

    556                                 disk->disk_name, p);

    557                 }

    558                 add_partition(disk, p, from, size, state->parts[p].flags);

    559 #ifdef CONFIG_BLK_DEV_MD

    560                 if (state->parts[p].flags & ADDPART_FLAG_RAID)

    561                         md_autodetect_dev(bdev->bd_dev+p);

    562 #endif

    563         }

    564         kfree(state);

    565         return 0;

    566 }

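Even without reading a single line we know what this function is doing: as we said, once it has run we have all the partition information. A partition is represented by struct hd_struct; struct gendisk holds these through its part member, which is a pointer to pointer. At the start the entries point at nothing, in other words no struct hd_struct has been allocated yet. So even before I paste the delete_partition code below, you should know that at this moment it does nothing.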

    352 void delete_partition(struct gendisk *disk, int part)

    353 {

    354         struct hd_struct *p = disk->part[part-1];

    355         if (!p)

    356                 return;

    357         if (!p->nr_sects)

    358                 return;

    359         disk->part[part-1] = NULL;

    360         p->start_sect = 0;

    361         p->nr_sects = 0;

    362         p->ios[0] = p->ios[1] = 0;

    363         p->sectors[0] = p->sectors[1] = 0;

    364         sysfs_remove_link(&p->kobj, "subsystem");

    365         kobject_unregister(p->holder_dir);

    366         kobject_uevent(&p->kobj, KOBJ_REMOVE);

    367         kobject_del(&p->kobj);

    368         kobject_put(&p->kobj);

    369 }

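The revalidate_disk pointer points to sd_revalidate_disk, a function we covered at length when discussing sd. sd_probe already ran it before calling add_disk; here it simply runs once more.

Next, get_capacity(). No function is simpler than this one. It comes from include/linux/genhd.h: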

    254 static inline sector_t get_capacity(struct gendisk *disk)

    255 {

    256         return disk->capacity;

    257 }

check_partition is a bit more complex. It comes from fs/partitions/check.c:

    156 static struct parsed_partitions *

    157 check_partition(struct gendisk *hd, struct block_device *bdev)

    158 {

    159         struct parsed_partitions *state;

    160         int i, res, err;

    161

    162         state = kmalloc(sizeof(struct parsed_partitions), GFP_KERNEL);

    163         if (!state)

    164                 return NULL;

    165

    166         disk_name(hd, 0, state->name);

    167         printk(KERN_INFO " %s:", state->name);

    168         if (isdigit(state->name[strlen(state->name)-1]))

    169                 sprintf(state->name, "p");

    170

    171         state->limit = hd->minors;

    172         i = res = err = 0;

    173         while (!res && check_part[i]) {

    174                 memset(&state->parts, 0, sizeof(state->parts));

    175                 res = check_part[i++](state, bdev);

    176                 if (res < 0) {

    177                         /* We have hit an I/O error which we don't report now.

    178                         * But record it, and let the others do their job.

    179                         */

    180                         err = res;

    181                         res = 0;

    182                 }

    183

    184         }

    185         if (res > 0)

    186                 return state;

    187         if (err)

    188         /* The partition is unrecognized. So report I/O errors if there were any */

    189                 res = err;

    190         if (!res)

    191                 printk(" unknown partition table\n");

    192         else if (warn_no_part)

    193                 printk(" unable to read partition table\n");

    194         kfree(state);

    195         return ERR_PTR(res);

    196 }

First, struct parsed_partitions is defined in the header file fs/partitions/check.h:

      8 enum { MAX_PART = 256 };

      9

     10 struct parsed_partitions {

     11         char name[BDEVNAME_SIZE];

     12         struct {

     13                 sector_t from;

     14                 sector_t size;

     15                 int flags;

     16         } parts[MAX_PART];

     17         int next;

     18         int limit;

     19 };

This structure is what we use to record partition information.

And who is this check_part on line 173? We find it in fs/partitions/check.c:

     43 int warn_no_part = 1; /*This is ugly: should make genhd removable media aware*/

     44

     45 static int (*check_part[])(struct parsed_partitions *, struct block_device *) = {

     46         /*

     47          * Probe partition formats with tables at disk address 0

     48          * that also have an ADFS boot block at 0xdc0.

     49          */

     50 #ifdef CONFIG_ACORN_PARTITION_ICS

     51         adfspart_check_ICS,

     52 #endif

     53 #ifdef CONFIG_ACORN_PARTITION_POWERTEC

     54         adfspart_check_POWERTEC,

     55 #endif

     56 #ifdef CONFIG_ACORN_PARTITION_EESOX

     57         adfspart_check_EESOX,

     58 #endif

     59

     60         /*

     61          * Now move on to formats that only have partition info at

     62          * disk address 0xdc0.  Since these may also have stale

     63          * PC/BIOS partition tables, they need to come before

     64          * the msdos entry.

     65          */

     66 #ifdef CONFIG_ACORN_PARTITION_CUMANA

     67         adfspart_check_CUMANA,

     68 #endif

     69 #ifdef CONFIG_ACORN_PARTITION_ADFS

     70         adfspart_check_ADFS,

     71 #endif

     72

     73 #ifdef CONFIG_EFI_PARTITION

     74         efi_partition,          /* this must come before msdos */

     75 #endif

     76 #ifdef CONFIG_SGI_PARTITION

     77         sgi_partition,

     78 #endif

     79 #ifdef CONFIG_LDM_PARTITION

     80         ldm_partition,          /* this must come before msdos */

     81 #endif

     82 #ifdef CONFIG_MSDOS_PARTITION

     83         msdos_partition,

     84 #endif

     85 #ifdef CONFIG_OSF_PARTITION

     86         osf_partition,

     87 #endif

     88 #ifdef CONFIG_SUN_PARTITION

     89         sun_partition,

     90 #endif

     91 #ifdef CONFIG_AMIGA_PARTITION

     92         amiga_partition,

     93 #endif

     94 #ifdef CONFIG_ATARI_PARTITION

     95         atari_partition,

     96 #endif

     97 #ifdef CONFIG_MAC_PARTITION

     98         mac_partition,

     99 #endif

    100 #ifdef CONFIG_ULTRIX_PARTITION

    101         ultrix_partition,

    102 #endif

    103 #ifdef CONFIG_IBM_PARTITION

    104         ibm_partition,

    105 #endif

    106 #ifdef CONFIG_KARMA_PARTITION

    107         karma_partition,

    108 #endif

    109 #ifdef CONFIG_SYSV68_PARTITION

    110         sysv68_partition,

    111 #endif

    112         NULL

    113 };

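Wow, so many functions defined in one go. If I had to read every one of them, I would never get out alive. Good thing I was once an outstanding Youth League member at Fudan University, or I would have been scared to death.

But things are not that bad. We need not be like certain media that exaggerate every time, so that each year's flood or drought is declared a once-in-a-century event and we start wondering how many centuries we have lived through. The situation at hand is easy to handle: unless you specifically study partition table formats, you do not need to read a single one of these functions. If you really do study partition table formats, then look carefully through the files under fs/partitions; every format is there, so pick the ones you need.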

localhost:/usr/src/linux-2.6.22.1 # ls fs/partitions/

Kconfig   acorn.h  atari.c  check.h  ibm.c    karma.h  mac.c    msdos.h  sgi.c  sun.h     ultrix.c Makefile amiga.c  atari.h  efi.c    ibm.h    ldm.c    mac.h    osf.c    sgi.h  sysv68.c  ultrix.h acorn.c   amiga.h check.c  efi.h    karma.c  ldm.h    msdos.c  osf.h    sun.c  sysv68.h

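Basically, what I want to say is that all those functions have one purpose: finding the partition information. In the end that information is always recorded in the structure that the struct parsed_partitions pointer refers to. We are about to use it, and the meaning of fields such as size and from is self-evident.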
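The probing loop in check_partition boils down to walking a NULL-terminated table of function pointers until one probe recognizes the disk. Here is a simplified userspace sketch of that control flow. Every name in it is hypothetical: the three probe_ functions are dummies standing in for the real parsers such as msdos_partition, and run_probes returns an int where the kernel returns the state pointer or an ERR_PTR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct parsed_partitions. */
struct parsed_sketch { int nparts; };

/* A probe returns >0 if it recognized the table, 0 if not, <0 on I/O error. */
typedef int (*probe_fn)(struct parsed_sketch *state);

/* Dummy probes, standing in for msdos_partition and friends. */
static int probe_none(struct parsed_sketch *state) { (void)state; return 0; }
static int probe_io_error(struct parsed_sketch *state) { (void)state; return -5; }
static int probe_found(struct parsed_sketch *state) { state->nparts = 2; return 1; }

/*
 * Same control flow as lines 172-189 of check_partition(): try each probe
 * in order, remember an I/O error but keep going, stop at the first hit.
 * Returns >0 on a hit, 0 for an unknown table, <0 if only errors occurred.
 */
static int run_probes(probe_fn const *probes, struct parsed_sketch *state)
{
    int i = 0, res = 0, err = 0;

    assert(probes != NULL && state != NULL);

    while (!res && probes[i]) {
        state->nparts = 0;      /* reset, like the memset on line 174 */
        res = probes[i++](state);
        if (res < 0) {
            err = res;          /* record the error, let the others try */
            res = 0;
        }
    }
    if (res > 0)
        return res;
    return err;
}
```

The order of the table matters, as the comments in check_part note: a format whose probe could misread another format's leftovers has to come later, which is why efi_partition and ldm_partition are listed before msdos_partition.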

Then we arrive at add_partition, again from fs/partitions/check.c:

    371 void add_partition(struct gendisk *disk, int part, sector_t start, sector_t len, int flags)

    372 {

    373         struct hd_struct *p;

    374

    375         p = kmalloc(sizeof(*p), GFP_KERNEL);

    376         if (!p)

    377                 return;

    378

    379         memset(p, 0, sizeof(*p));

    380         p->start_sect = start;

    381         p->nr_sects = len;

    382         p->partno = part;

    383         p->policy = disk->policy;

    384

    385         if (isdigit(disk->kobj.name[strlen(disk->kobj.name)-1]))

    386                 snprintf(p->kobj.name,KOBJ_NAME_LEN,"%sp%d",disk->kobj.name,part);

    387         else

    388                 snprintf(p->kobj.name,KOBJ_NAME_LEN,"%s%d",disk->kobj.name,part);

    389         p->kobj.parent = &disk->kobj;

    390         p->kobj.ktype = &ktype_part;

    391         kobject_init(&p->kobj);

    392         kobject_add(&p->kobj);

    393         if (!disk->part_uevent_suppress)

    394                 kobject_uevent(&p->kobj, KOBJ_ADD);

    395         sysfs_create_link(&p->kobj, &block_subsys.kobj, "subsystem");

    396         if (flags & ADDPART_FLAG_WHOLEDISK) {

    397                 static struct attribute addpartattr = {

    398                         .name = "whole_disk",

    399                         .mode = S_IRUSR | S_IRGRP | S_IROTH,

    400                         .owner = THIS_MODULE,

    401                 };

    402

    403                 sysfs_create_file(&p->kobj, &addpartattr);

    404         }

    405         partition_sysfs_add_subdir(p);

    406         disk->part[part-1] = p;

    407 }

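With the earlier experience, these kobject and sysfs functions are now much easier to read.

p->kobj.parent = &disk->kobj on line 389 guarantees that what we create next lands under the directory created a moment ago: sda1, sda2 and so on all sit under the sda directory.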

[root@localhost tedkdb]# ls /sys/block/sda/

capability  device   queue  removable  sda10  sda12  sda14  sda2  sda5  sda7  sda9  slaves  subsystem dev         holders  range  sda1       sda11  sda13  sda15  sda3  sda6  sda8  size  stat    uevent
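The names sda1, sda2 and so on come from lines 385 to 388 of add_partition. That rule can be sketched in userspace as follows; partition_name is a hypothetical helper, and the disk name "cciss!c0d0" in the note after the block is only an example of a name that ends in a digit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring lines 385-388 of add_partition():
 * if the disk name ends in a digit, a 'p' separates it from the
 * partition number, otherwise the number is appended directly.
 */
static void partition_name(char *dst, size_t dst_len,
                           const char *disk, int part)
{
    size_t n;

    assert(dst != NULL && disk != NULL);
    n = strlen(disk);
    assert(n > 0);

    if (isdigit((unsigned char)disk[n - 1]))
        snprintf(dst, dst_len, "%sp%d", disk, part);
    else
        snprintf(dst, dst_len, "%s%d", disk, part);
}
```

For "sda" and partition 1 this produces "sda1". For a disk named "cciss!c0d0" and partition 2 it produces "cciss!c0d0p2", since without the 'p' the result would be indistinguishable from a disk numbered 02.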

The effect of sysfs_create_link on line 395 is equally obvious:

[root@localhost tedkdb]# ls -l /sys/block/sda/sda1/subsystem

lrwxrwxrwx 1 root root 0 Dec 13 03:15 /sys/block/sda/sda1/subsystem -> ../../../block

There is not much to say about partition_sysfs_add_subdir either. It comes from fs/partitions/check.c:

    333 static inline void partition_sysfs_add_subdir(struct hd_struct *p)

    334 {

    335         struct kobject *k;

    336

    337         k = kobject_get(&p->kobj);

    338         p->holder_dir = kobject_add_dir(k, "holders");

    339         kobject_put(k);

    340 }

It adds the holders subdirectory.

[root@localhost tedkdb]# ls /sys/block/sda/sda1/

dev  holders  size  start  stat  subsystem  uevent

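Finally, let us remember one thing this function did: it assigned the members of p and, at the end, made disk->part[part-1] point to p. In other words, from now on the array of struct hd_struct pointers has contents and is no longer empty.

At this point rescan_partitions() is done and we are back in do_open(). On line 1183 the reference count bd_openers is incremented by 1, and if for_part is nonzero its corresponding count, bd_part_count, is incremented too. Then do_open finishes in style and, like dominoes, __blkdev_get and blkdev_get return one after the other. blkdev_put does essentially the opposite of blkdev_get, so we will not read it; just note that it takes back the increments that were just made.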
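The bookkeeping of the two counters can be modeled in a few lines. This is a deliberately stripped-down sketch with hypothetical names: it keeps only the counting from do_open and its counterpart on the put side, plus the bd_part_count test at the top of rescan_partitions, and leaves out locking, the driver's open call, and everything else.

```c
#include <assert.h>

/* Hypothetical model holding just the two counters of struct block_device. */
struct bdev_counts {
    int bd_openers;     /* how many times this device is currently open */
    int bd_part_count;  /* how many of those opens came through a partition */
};

/* Counting done at the end of do_open(), lines 1183-1185. */
static void sketch_get(struct bdev_counts *b, int for_part)
{
    b->bd_openers++;
    if (for_part)
        b->bd_part_count++;
}

/* The reverse bookkeeping, as done on the blkdev_put side. */
static void sketch_put(struct bdev_counts *b, int for_part)
{
    assert(b->bd_openers > 0);
    if (for_part) {
        assert(b->bd_part_count > 0);
        b->bd_part_count--;
    }
    b->bd_openers--;
}

/* rescan_partitions() refuses with -EBUSY while any partition is open. */
static int sketch_can_rescan(const struct bdev_counts *b)
{
    return b->bd_part_count == 0;
}
```

In the register_disk path blkdev_get passes for_part as 0, so only bd_openers goes up to 1 and comes back down to 0 when blkdev_put runs on line 515.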

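Finally, the last function register_disk() calls is kobject_uevent(), which notifies the userspace process udevd that an event has occurred. If your distribution has udev configured correctly (see the /etc/udev/ directory), the effect is that the corresponding device files appear under /dev. For example: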

[root@localhost tedkdb]# ls /dev/sda*

/dev/sda   /dev/sda10  /dev/sda12  /dev/sda14  /dev/sda2  /dev/sda5  /dev/sda7  /dev/sda9 /dev/sda1  /dev/sda11  /dev/sda13  /dev/sda15  /dev/sda3  /dev/sda6  /dev/sda8

As for why, you can read up on udev. It is a userspace program, so we will not say more.
