Writing this turned out to be surprisingly difficult as the article kept on growing too long. I tried to be complete but also concise to make this a useful introduction to how Solaris uses disks and on how device naming works.
So to start at the start: Solaris builds a device tree which is persistent across reboots and even across configuration changes. Once a device is found at a specific position on the system bus, and entry is created for the device instance in the device tree, and an instance number is allocated. Predictably, the first instance of a device is zero, (e.g e1000g0) and subsequent instances of device using the same driver gets allocated instance numbers incrementally (e1000g1, e1000g2, etc). The allocated instance number is registered together with the device driver and path to the physical device in the /etc/path_to_inst file.
This specific feature of Solaris is very important in providing stable, predictable behavior across reboots and hardware changes. For disk controllers this is critical as system bootability depends on it!
With Linux, if the first disk in the system is known as /dev/sda, even if it happens to be on the second controller, or have a target number other than zero on that controller. New disk added on the first controller, or on the same controller but with a lower target number, causes the existing disk to move to /dev/sdb, and the new disk then becomes /dev/sda. This used to break systems, causing them to become non-bootable, and was being a general headache. Some methods of dealing with this exists, using unique disk identifiers and device paths based on /dev/disk/by-path, etc.
If a Solaris system is configured initially with all disks attached to the second controlled, the devices will get names starting with c1. Disks added to the first controller later on will have names starting with c0, and the existing disk device names will remain unaffected. If a new controlled is added to the system, it will get a new instance number, e.g c2, and existing disk device names will remain unaffected.
Solaris however composes disk device names (device aliases) of parts which identifies the controller, the target-id, the LUN-id, and finally the slice or partition on the disk.
I will use some examples to explain this. Looking at this device:
$ ls -lL /dev/dsk/c1t* br-------- 1 root sys 27, 16 Jun 2 16:26 /dev/dsk/c1t0d0p0 |
We notice the following:
1. The entries exist as links under /dev/dsk, pointing to the device node files in the /devices tree. Actually every device has got a second instance under /dev/rdsk. The ones under /dev/dsk are "block" devices, used in a random-access manner, e.g for mounting file systems. The "raw" device links are character devices, used for low-level access functions (such as creating a new file system).
2. The device names all start with c1, indicating controller c1 - so basically all the entries above are on one controller.
3. The next part of the device name is the target-id, indicated by t0. This is determined by the SCSI target-id number set on the device, and not by the order in which disks are discovered. Any new disk added to this controller will have a new unique SCSI target number and so will not affect existing device names.
4. After the target number each disk has got a LUN-id number, in the example d0. This too is determined by the SCSI LUN-id provided by the device. Normal disks on a simple SCSI card all show up as LUN-id 0, but devices like arrays or jbods can present multiple LUNs on a target. (In such devices the target usually indicates the port number on the enclosure)
5. Finally each device identifies a partition or slice on the disk. Devices with names ending with a p# indicates a PC BIOS disk partition (sometimes called an fdisk or primary partition), and names ending with an s# indicates a Solaris slice.
This begs some more explaining. There are five device names ending with p0 through p4. The p0 device, eg c1t0d0p0, indicates the whole disk as seen by the BIOS. The c_t_d_p1 device is the first primary partition, with c_t_d_p2 being the second, etc. These devices represent all four of the allowable primary partitions, and always exists even when the partitions are not used.
In addition there are 16 devices with names ending with s0 though s15. These are Solaris "disk slices", and originate from the way disks are "partitioned" on SPARC systems. Essentially Solaris uses slices much like PCs use partitions - most Solaris disk admin utilities work with disk slices, not with fdisk or BIOS partitions.
The way the "disk" is sliced is stored in the Solaris VTOC, which resides in the first sector of the "disk". In the case of x86 systems, the VTOC exists inside one of the primary partitions, and in fact most disk utilities treats the Solaris partition as the actual disk. Solaris splits up the particular partition into "slices", thus the afore mentioned "disk slices" really refers to slices existing in a partition.
Note that Solaris disk slices are often called disk partitions, so the two can be easily confused - when documentation refers to partitions you need to make sure you understand whether PC BIOS partitions or Solaris Slices are implied. In generally if the documentation applies to SPARC hardware (as well as to x86 hardware), then partitions are Solaris slices (SPARC does not have an equivalent to the PC BIOS partition concept)
Example Disk Layout:
p1 | First primary Partition | |||||||||||||||||||||||||||||||||
p2 | Second primary Partition | |||||||||||||||||||||||||||||||||
p3 |
|
|||||||||||||||||||||||||||||||||
p4 |
|
Note that traditionally slice 2 "overlaps" the whole disk, and is commonly referred to as the backup slice, or slightly less commonly, called the overlap slice.
The ability to have slice numbers from 8 to 15 is x86 specific. By default slice 8 covers the area on the disk where the label, vtoc and boot record is stored. Slice 9 covers the area where the "alternates" data is stored - a two-cylinder area used to record information about relocated/errored sectors.
Another example of disk device entries:
$ ls -lL /dev/dsk/c0* brw-r----- 1 root sys 102, 16 Jul 14 19:45 /dev/dsk/c0d0p0 |
The above example is taken form an x86 system. Note the lack of a target number in the device names. This is particular to ATA hard drives on x86 systems. Besides that it works like normal device names I described above.
Below, comparing the block and raw device entries:
$ ls -l /dev/*dsk/c1t0d0p0 lrwxrwxrwx 1 root root 49 Jun 26 16:22 /dev/dsk/c1t0d0p0 -> ../../devices/pci@0,0/pci-ide@1f,2/ide@1/sd@0,0:q |
These look the same, except that the second one points to the raw device node.
For completeness' sake, some utilities used in managing disks:
|
|||
format | The work-horse, used to perform partitioning (including fdisk partitioning on x86 based systems), analyzing/testing the disk media for defects, tuning advanced SCSI parameters, and generally checking the status and health of disks. | ||
rmformat | Shows information about removable devices, formats media, etc. | ||
prtvtoc | Command-line utility to display information about disk geometry and more importantly, the contents of the VTOC in a human readable format, showing the layout of the Solaris slices on the disk. | ||
fmthard | Write or overwrite a VTOC on a disk. Its input format is compatible with the output produced by prtvtoc, so it is possible to copy the VTOC between two disks by means of a command like this:
This is obviously not meaningful if the second disk do not have enough space. If the disks are of different sizes, you can use something like this:
The above awk command will cause the entry for slice 2 to be committed, and fmthard will then maintain the existing entry or, if none exists, create a default one on the target disk. Also note, as implied above, Solaris slices can (and often do) overlap. Care needs to be taken to not have file systems on slices which overlap other slices. |
||
iostat -En | Show "Error" information about disks, and often very usefull, the firmware revisions and manufacturer's identifier strings. | ||
format -e | This reveals [enhanced|advanced] functionality, such as the cache option on SCSI disks. | ||
format -Mm | Enable debugging output, particularly makes SCSI probe failures non-silent. cfgadm and luxadm also deserves honorable mention here. These commands manage disk enclosures, detaching and attaching, etc. but are also used in managing some aspects of disks. |
||
luxadm -e port | Show list of FC HBAs. luxadm can also for example be used to set the beacon LED on individual disks in FCAL enclosures that support this function. The details are somewhat specific to the relevant enclosure. cfgadm can be used to probe SAN connected subsystems, eg by doing:
|
Hopefully this gives you an idea about how disk device names, controller names, and partitions and slices all relate to one another.