Introduction to iSCSI
Executive Summary
This white paper provides a basic working knowledge of the Internet Small Computer Systems Interface (iSCSI) protocol. iSCSI is a SCSI transport protocol that maps block-oriented storage data over TCP/IP networks. Because the paper focuses primarily on iSCSI, some background in SCSI and storage-area network (SAN) protocols and architectures is recommended. Additional white papers that cover these other areas are referenced at the end of this document.
Information on the iSCSI protocol provided in this document is based on the Internet Engineering Task Force (IETF) IP Storage (IPS) iSCSI draft 10. For additional details, this document can be referenced via the URL provided in the references at the end of the document.
This paper includes a breakdown of the iSCSI protocol and processes, iSCSI security and management considerations, and some basic implementation information. Additionally, terms and acronyms are defined as they pertain to the iSCSI protocol.

iSCSI in Perspective
Basic Concept of iSCSI
Conceptually, iSCSI, TCP, and IP together provide a Layer 3/4 network transport for SCSI, as an alternative to either a parallel SCSI cable or Fibre Channel Protocol (FCP), that is, SCSI over Fibre Channel. The basic idea of iSCSI is to take advantage of the investment in existing IP networks to facilitate and extend SANs. This is accomplished by using the TCP/IP protocol to transport SCSI commands and data between host and SAN nodes.
Traditionally, SANs have required a separate dedicated infrastructure to interconnect hosts and storage systems. The primary means for these interconnections are Fibre Channel networks that provide the SCSI transport. The result is that separate parallel networks have to be built to support IP applications and associated storage. Additionally, Fibre Channel cannot be transported over lower-bandwidth WAN networks in its native form and therefore requires special hardware and handling.
The use of iSCSI over IP networks does not necessarily replace a Fibre Channel network but rather provides a transport for an IP-attached host to reach Fibre Channel-based SANs.
IP network infrastructures provide major advantages for interconnection of servers to block-oriented storage devices.
IP networks are cost-effective, and they provide security, scalability, interoperability, network management, and storage management.
IP network advantages include:
•   IP networks offer the availability of network protocols and middleware for management, security, and quality of service (QoS).
•   Skills developed in the design and management of IP networks can be applied to IP SANs. Trained and experienced IP networking staffs are available to install and operate these networks.
•   IP networks offer economies achieved from using a standard IP infrastructure, products, and service across the organization.
•   iSCSI is compatible with existing IP LAN and WAN infrastructures.
•   Distance is limited by application timeouts, not by the IP network.
Value of iSCSI
By building on existing IP networks, users can connect hosts to storage facilities without additional host adapters, better utilize storage resources, and eliminate the need for separate parallel WAN infrastructures. Because iSCSI uses TCP/IP as its transport for SCSI, information can be passed over existing IP-based host connections typically via Ethernet. Additional value can be realized by being able to better utilize existing storage resources. Because hosts can utilize their existing IP/Ethernet network connection to access storage elements, it is now easier to consolidate storage and, therefore, realize higher utilization. As mentioned previously, SANs have in the past required special provisions for WAN connectivity. Significant cost savings can be realized by utilizing existing WAN connections for hosts to access storage via IP.
Other IP SAN Protocols
It should be noted that there are other proposed drafts for transporting storage traffic over IP networks. These include Fibre Channel over IP (FCIP), Internet Fibre Channel (iFCP), and Internet Storage Name Service (iSNS). Although these protocols are outside the scope of this document, additional information on them can be found in the references.
iSCSI Standards Track
The iSCSI draft is one of several protocols being worked on by the IP Storage (IPS) working group in the IETF. Many industry leaders such as Cisco, IBM, and HP are leading the effort of standardizing the draft. The current release is draft-ietf-ips-iscsi-14 dated July 1, 2002. It is anticipated that the iSCSI draft will be submitted to the Internet Engineering Steering Group (IESG) for consideration as a proposed standard later this year. Numerous vendors are currently developing products to the current draft standard. When setting up prestandard iSCSI solutions, it is necessary to determine which draft a vendor’s product(s) is based on in order to achieve interoperability.
Fundamentals of iSCSI
This section discusses the various layers and processes of iSCSI in order to build an overall functional understanding. Therefore, detailed packet formats and structures are intentionally omitted. This section focuses on iSCSI specifically, and the transport of SCSI over IP. However, a brief discussion of SCSI architecture is provided to aid in the understanding of this document.
SCSI Architecture
SCSI stands for Small Computer Systems Interface. Its roots can be traced back to the Shugart Associates System Interface (SASI), named for Shugart Associates, a disk drive and controller manufacturer. Based on the IBM input/output (I/O) channel, the SASI interface was widely received. It was introduced in 1979, when only an 8-bit parallel interface was available. The basic use of SASI was to allow independent peripheral devices to be connected to small and medium-sized computers.
In 1982, a formal draft of SCSI based on SASI was developed. Additional capabilities were added to make this draft the first generation of the SCSI standard. The new capabilities included peer-to-peer communication, logical units, arbitration, and so on.
The American National Standards Institute (ANSI) approved SCSI-2 in 1994. It is a complete standalone document with the expanded definition of the Common Command Set (CCS) providing a software interface to many peripherals in addition to all disk drives. SCSI-2 defines the differential interface and the 16- and 32-bit-wide data bus, essentially doubling data throughput. It is backward compatible to SCSI-1.
SCSI-3 standards are currently under development. SCSI-3 refers to a collection of standards as a result of breaking SCSI-2 into smaller, hierarchical modules that fit within a general framework called SCSI Architecture Model (SAM) (refer to Figure 1).
Figure 1
SCSI Architecture Model and SCSI-3 Standards
 
SAM defines the concepts, entities, and interactions of SCSI layers. It mandates the initiator and target entities in the client/server model. The interaction between the initiator and target allows the information about the initiator and the target to be exchanged (refer to Figure 2). An example of such information is logical unit number (LUN).
Figure 2
SCSI Initiator and Target Interaction
 
The SAM defines generic requirements and implementation requirements. A service request must include a command descriptor block (CDB), a basic building block for SCSI information exchange. Figure 3 shows the format of a CDB.
Figure 3
SCSI Command Descriptor Block (CDB)
iSCSI Terminology
The iSCSI draft uses the concept of a “network entity,” which represents a device or gateway that is attached to an IP network. This network entity must contain one or more network portals. An iSCSI node contained within a network entity can utilize any of the network portals to access the IP network. The iSCSI node is an iSCSI initiator or target identified by its iSCSI name within a network entity. The name of a SCSI device, as defined by SAM, is the iSCSI name of the node. There is exactly one SCSI device within an iSCSI node.
A network portal is essentially the component within the network entity responsible for implementing the TCP/IP protocol stack. Relative to the initiator, the network portal is identified solely by its IP address. For an iSCSI target, its IP address and its TCP listening port identify the network portal. Refer to Figure 4 for components within iSCSI client and iSCSI server. For iSCSI communications, a connection is established between an initiator network portal and a target network portal. A group of TCP connections between an initiator iSCSI node and a target iSCSI node make up an iSCSI session. This is analogous to but not equal to the SCSI I_T Nexus.
Figure 4
Components within iSCSI Client and iSCSI Server
 
Also defined by the standard draft are portal groups. Because iSCSI supports multiple TCP connections within a session, it is possible that these connections could be across multiple network portals. Therefore, a portal group is a set of network portals that supports an iSCSI session that is made up of multiple connections over different network portals.
Naming and Addressing
The iSCSI protocol enables a methodology for both the naming and addressing of iSCSI initiators and targets. All iSCSI nodes, both initiators and targets, are known by their iSCSI name. This is not to be confused with a host-type name that is resolved into an IP address, nor is it a worldwide node name. Therefore, this name is independent of the node location. There are two iSCSI name formats: the iqn (iSCSI qualified name) format and the IEEE EUI format. An example of an iSCSI name with iqn format is: iqn.1987-05.com.Cisco.00.9f9ccf185aa2508c7a168967ccf96e0c.target1. An iSCSI name is useful because it provides:
•   A method for multiple initiators or targets to share a common IP network address
•   A method for multiple initiators or targets to be accessed via multiple IP addresses
•   A means by which nodes can be known, independent of their IP address and irrespective of the presence of IP address and port mapping on firewalls
The iSCSI protocol does not do any processing of the iSCSI name other than to perform case-sensitive matching operations.
The iSCSI initiator name is the unique worldwide name this initiator is known by. Likewise, the iSCSI target name specifies the unique worldwide name of the target. These names in part can be used to identify the SCSI I_T Nexus as defined by SAM.
Addressing of an iSCSI node conforms to the standards-based IP addressing schema of <domain-name>[:port]. The <domain-name> can take the form of an IPv4 address in dotted-decimal notation, an IPv6 address in colon-separated hexadecimal notation, or a fully qualified domain name.
For iSCSI targets, the port number may also be specified along with the address. If no port is provided, the default port 3260 as assigned by the Internet Assigned Numbers Authority (IANA) is assumed.
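The addressing rules above can be sketched as a small parser. This is a minimal illustration in Python and is not taken from the iSCSI draft: the function name parse_iscsi_address is hypothetical, and the use of square brackets around an IPv6 literal that is followed by a port is an assumption made for the example.

```python
DEFAULT_ISCSI_PORT = 3260  # IANA-assigned default, used when no port is given


def parse_iscsi_address(address):
    """Split 'host[:port]' into (host, port).

    Hypothetical helper for illustration. IPv6 literals are assumed to be
    written in brackets when a port follows, e.g. '[2001:db8::1]:3260'.
    """
    if address.startswith("["):
        # Bracketed IPv6 literal, optionally followed by ':port'
        host, _, rest = address[1:].partition("]")
        port = rest[1:] if rest.startswith(":") else ""
    elif address.count(":") == 1:
        # IPv4 address or domain name with an explicit port
        host, _, port = address.partition(":")
    else:
        # No port, or a bare IPv6 literal whose colons belong to the address
        host, port = address, ""
    return host, int(port) if port else DEFAULT_ISCSI_PORT
```

A target given as 10.0.0.5 would therefore be contacted on port 3260, while storage.example.com:5003 would be contacted on port 5003.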
iSCSI Protocol
The iSCSI protocol is a mapping of the SCSI Remote Procedure Call (reference SAM) model to the TCP/IP protocol. The iSCSI protocol provides its own conceptual layer independent of the SCSI CDB information it carries. In this fashion, SCSI commands are transported by iSCSI requests, and SCSI responses and status are handled by iSCSI responses. Also, iSCSI protocol tasks are carried by this same iSCSI request and response mechanism (refer to Figure 5).
Figure 5
iSCSI Protocol Stack
 
Just as in the SCSI protocol, iSCSI employs the concepts of “initiator,” “target,” and communication messages called protocol data units (PDUs). Likewise, iSCSI transfer direction is defined respective to the initiator. As a means to improve performance, iSCSI allows a “phase collapse” that allows a command or response and its associated data to be sent in a single iSCSI PDU.
iSCSI Session
The highest level of an iSCSI communications path is a session that is formed between an iSCSI initiator and an iSCSI target. There are two types of sessions defined in iSCSI, a normal operational session and a discovery session used by the initiator to discover available targets.
A session is identified by a session ID (SSID), which is made up of initiator (ISID) and target (TSID) components. TCP connections may be added and removed within a session; however, all connections are between the same unique initiator and target iSCSI nodes. Each connection within a session has a unique connection ID (CID). The makeup of the SSID, ISID, TSID, and CID is examined in greater detail later in this document (refer to Figure 6).
Figure 6
iSCSI Sessions with One or Multiple Connections

An iSCSI session is established via the iSCSI login process. This session is used to identify all TCP connections associated with a particular SCSI I_T Nexus. There may be one or more TCP connections within one session.
The login process is started when the initiator establishes a TCP connection to the desired target either via the well-known port or a specified target port. The initiator and target may carry out authentication of each other and negotiate a security protocol. During the login phase, numerous attributes are negotiated between the iSCSI initiator and the iSCSI target.
Upon the successful completion of the login phase, the session enters “full feature phase.” Login and security are further addressed later in this document.
iSCSI Sequencing
The iSCSI protocol deploys several registers to maintain ordering and sequencing of commands, status, and data. Each of these registers is an unsigned 32-bit integer counter. These numbers are communicated between the initiator and target in the appropriate iSCSI PDU fields during command, status, and data exchanges. Additionally, an initiator or target may utilize a NOP-OUT/IN PDU to synchronize sequencing and numbering registers.
Command Numbering
Within an iSCSI session, all commands (initiator-to-target PDUs) are numbered with a command sequence number (CmdSN). CmdSN is used to ensure that every command is delivered in the order it is transmitted, regardless of which TCP connection in one session the command is carried on.
Command sequencing begins with the first login command and is incremented by one for each subsequent command. It is the responsibility of the iSCSI target layer to deliver the commands to the SCSI layer in the order of their CmdSN. The one exception to this is a command marked for immediate delivery. In this case, the CmdSN is not incremented and the iSCSI target passes this command to the SCSI layer as soon as it is detected.
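The ordering rule described above can be illustrated with a small target-side buffer. This is a sketch under stated assumptions rather than the mechanism defined by the draft: the class name CommandSequencer, its attributes, and the use of plain strings as commands are all hypothetical.

```python
class CommandSequencer:
    """Hypothetical target-side buffer that releases commands in CmdSN order."""

    MOD = 2 ** 32  # CmdSN is an unsigned 32-bit counter and wraps around

    def __init__(self, first_cmd_sn):
        self.exp_cmd_sn = first_cmd_sn  # next CmdSN owed to the SCSI layer
        self.held = {}                  # out-of-order commands keyed by CmdSN

    def receive(self, cmd_sn, command, immediate=False):
        """Return the commands that may be passed to the SCSI layer now."""
        if immediate:
            # Immediate commands bypass ordering and do not advance the counter
            return [command]
        self.held[cmd_sn] = command
        ready = []
        # Drain every held command that is contiguous with the expected number
        while self.exp_cmd_sn in self.held:
            ready.append(self.held.pop(self.exp_cmd_sn))
            self.exp_cmd_sn = (self.exp_cmd_sn + 1) % self.MOD
        return ready
```

If command 2 arrives before command 1 (for example, on a different TCP connection), it is held; when command 1 arrives, both are released in order.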
In addition to the CmdSN, the initiator and target maintain an expected command register (ExpCmdSN) and a maximum command register (MaxCmdSN). The target sets the ExpCmdSN to the highest-numbered nonimmediate command CmdSN it can deliver to the SCSI layer plus one. This acknowledges to the initiator the last in-sequence command received by the target. Because commands may be sent over multiple TCP connections, the target may have commands queued with a CmdSN higher than ExpCmdSN. These commands are held in order to prevent out-of-sequence commands from being handed off to the SCSI layer. The MaxCmdSN is used by the initiator to determine if the target has queue space for additional commands to be sent. The queue capacity is derived by MaxCmdSN – ExpCmdSN + 1.
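The queue capacity formula can be worked through in a few lines of Python. The function names are hypothetical, and the modulo step is an assumption about how a wrapped 32-bit counter would be handled; it is not spelled out in this paper.

```python
MOD = 2 ** 32  # sequence registers are unsigned 32-bit counters


def queue_capacity(exp_cmd_sn, max_cmd_sn):
    """Commands the target can still accept: MaxCmdSN - ExpCmdSN + 1.

    The modulo keeps the result correct when MaxCmdSN has wrapped past zero
    while ExpCmdSN has not (an assumption about wraparound handling).
    """
    return (max_cmd_sn - exp_cmd_sn + 1) % MOD


def may_send(cmd_sn, exp_cmd_sn, max_cmd_sn):
    """True if cmd_sn falls inside the window from ExpCmdSN to MaxCmdSN."""
    return (cmd_sn - exp_cmd_sn) % MOD < queue_capacity(exp_cmd_sn, max_cmd_sn)
```

With ExpCmdSN = 10 and MaxCmdSN = 17, the capacity is 17 – 10 + 1 = 8, so commands numbered 10 through 17 may be sent and command 18 must wait.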
Status Numbering
As with command numbering, status responses are sequentially numbered with a status sequence number (StatSN). Likewise, the initiator uses an expected status sequence number (ExpStatSN) register to acknowledge status PDUs received from the target. The initiator initiates recovery actions if the difference between the StatSN and ExpStatSN exceeds a preset value.
Data Sequencing
Data and request-to-transfer (R2T) PDUs are sequenced using the DataSN and R2TSN registers, respectively. Data sequencing is used to ensure the in-order delivery of data within the same command.

For read operations, the DataSN begins at 0 and is incremented by 1 for each subsequent data PDU in that command sequence. In the case of a write, the first unsolicited data PDU or the first data PDU in response to an R2T begins with a DataSN of 0 and increments by 1 for each subsequent data PDU. R2TSN is set to 0 at the initiation of the command and incremented by 1 for each subsequent R2T sent by the target for that command.
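Because DataSN starts at 0 and rises by 1, a receiver can detect gaps by simple comparison. The following sketch shows one way to do so; the function name find_missing_data_sn is hypothetical and the approach is an illustration, not a procedure taken from the draft.

```python
def find_missing_data_sn(received):
    """Return the DataSN values absent from one command's data PDUs.

    `received` lists the DataSN values seen so far for a single command;
    numbering is expected to start at 0 and rise by 1 per PDU.
    """
    if not received:
        return []
    seen = set(received)
    return [sn for sn in range(max(received) + 1) if sn not in seen]
```

For example, if data PDUs 0, 1, and 4 have arrived for a command, PDUs 2 and 3 are reported as missing.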
iSCSI PDUs
The TCP payload of an iSCSI packet contains iSCSI PDUs. All iSCSI PDUs begin with one or more header segments followed by zero or one data segment.
The first segment is the basic header segment (BHS), a fixed-length 48-byte header segment. Additional header segments (AHSs) may follow the BHS. Figure 7 shows the format and content of an iSCSI PDU. All the headers are optional except BHS. The figure shows the iSCSI packet format with iSCSI PDU in the TCP payload.
Figure 7
iSCSI Packet Format
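The segment structure described above can be illustrated by reading the length fields of a BHS. The byte offsets used here are an assumption: they follow the layout later published in RFC 3720 and may differ from the draft revisions cited in this paper. The function names are hypothetical, and optional header and data digests are ignored.

```python
import struct

BHS_LENGTH = 48  # fixed size of the basic header segment


def pad4(n):
    """Round n up to the next multiple of 4 (segments are word-aligned)."""
    return (n + 3) & ~3


def parse_bhs(bhs):
    """Extract selected fields from a 48-byte BHS.

    Offsets follow RFC 3720 and are an assumption for this sketch.
    """
    if len(bhs) != BHS_LENGTH:
        raise ValueError("BHS must be exactly 48 bytes")
    total_ahs_words = bhs[4]                       # AHS length in 4-byte words
    data_length = int.from_bytes(bhs[5:8], "big")  # 24-bit data segment length
    (task_tag,) = struct.unpack_from(">I", bhs, 16)
    return {
        "opcode": bhs[0] & 0x3F,           # low 6 bits of byte 0
        "immediate": bool(bhs[0] & 0x40),  # immediate-delivery bit
        "ahs_length": total_ahs_words * 4,
        "data_length": data_length,
        "task_tag": task_tag,
        # Bytes occupied on the wire by BHS, AHSs, and padded data segment
        "pdu_length": BHS_LENGTH + total_ahs_words * 4 + pad4(data_length),
    }
```

A PDU with 8 bytes of AHS and a 10-byte data segment would occupy 48 + 8 + 12 = 68 bytes, because the data segment is padded to a 4-byte boundary.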
iSCSI PDU types include iSCSI request and response, text request and response, login request and response, and so on.
Figure 8 shows an example of the header format for one of the iSCSI request commands.
Figure 8
iSCSI PDU Command
 
iSCSI Error Handling
A fundamental portion of error recovery is maintaining enough state and data to recover an errant process. This is the case with iSCSI in that the initiator is expected to retain the necessary command and data information to be able to rebuild any outstanding PDU. Likewise, the target is expected to maintain any unacknowledged data-out along with status response information.
Two mechanisms used by iSCSI for error handling are retry and reassignment. An initiator may attempt to “plug” any missing CmdSN by resending the same command or data PDU to the target. The reassignment is used when the TCP connection between the initiator and the target is lost. In this case, the initiator sends a “Task Reassign” task management PDU via a new connection, instructing the target to continue an outstanding command on the new CID.
It is not required for targets to support this feature, which is negotiated at login time.
iSCSI Process
This section explains a complete iSCSI connection setup, exchange, and termination. Many variations can take place in actual iSCSI implementations, such as authentication, encryption, negotiation of various parameters, and different SCSI operations. The intent here is to provide a baseline understanding of the iSCSI phases and process flow.
The login process is discussed in terms of the login phase, which begins with the initial login request. The target may optionally answer with a login partial response, to which the initiator replies with a further login request. This exchange of partial responses and further requests may be repeated as needed for additional parameter negotiations. The phase must end with a login final response from the target, indicating either “login accept” or “login reject.”
Login Initiation
The iSCSI login is used to establish an iSCSI session between an iSCSI initiator and an iSCSI target. The TCP connection has to be marked as belonging to an iSCSI session, and parameters such as security authentication and other operation parameters are exchanged and agreed upon by the iSCSI initiator and the iSCSI target.
This process starts when the initiator opens a TCP connection to the target on the target TCP listening port and assigns a CID. The initiator then sends a login request that includes the protocol version supported by the initiator, an SSID, the CID, and the negotiation phase the initiator is ready to enter into. Optionally, the initial request may contain security parameters or iSCSI operational parameters. Figure 9 illustrates the iSCSI login process.
Figure 9
iSCSI Login Packet Flow
 
As mentioned earlier, the SSID consists of an ISID and the TSID. The ISID is the first six bytes of the SSID, and it consists of a 1-byte type, a 3-byte naming authority, and a 2-byte qualifier field (refer to Table 1). The type byte signifies the format of the naming authority (refer to Table 2). The Naming Authority field of the ISID is the vendor or organization of this iSCSI initiator component. Lastly, the qualifier is a 16-bit unsigned integer that must be unique for this particular initiator and target portal group combination.
The TSID is a 16-bit value determined by the target iSCSI node. When the initiator is establishing a new session with a target, the TSID is set to 0 for the initial login request. This signals the target that a new session is requested. The target generates a TSID and returns it in the login response. If an initiator is attempting to establish an additional TCP connection for an existing session, the initiator uses the same TSID learned from the previous successful login attempt when the session was created. A nonzero value then signals the target that the initiator wishes to add a connection to an established session. When the initiator receives the TSID from the initial login response, the SSID is complete and is used to identify this session for all subsequent login PDUs.
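The field sizes given above for the ISID and TSID can be made concrete by packing them into bytes. This is a sketch only: the function names are hypothetical, the sample values are arbitrary, and network (big-endian) byte order is an assumption.

```python
import struct


def pack_isid(type_byte, naming_authority, qualifier):
    """Build the 6-byte ISID: 1-byte type, 3-byte authority, 2-byte qualifier."""
    if not 0 <= naming_authority < 2 ** 24:
        raise ValueError("naming authority must fit in 3 bytes")
    return (
        struct.pack(">B", type_byte)
        + naming_authority.to_bytes(3, "big")
        + struct.pack(">H", qualifier)
    )


def pack_ssid(isid, tsid=0):
    """Append the 16-bit TSID to the ISID; a TSID of 0 requests a new session."""
    if len(isid) != 6:
        raise ValueError("ISID must be exactly 6 bytes")
    return isid + struct.pack(">H", tsid)
```

An initial login would carry pack_ssid(isid), with the TSID left at 0. A later login that adds a connection to the same session would carry pack_ssid(isid, tsid), using the TSID returned by the target.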
 
Authentication and Parameter Negotiation
If the initiator and target require authentication, it is negotiated prior to exchanging operational parameters. Authentication, which may use any of a variety of methods, is discussed further later in this document. Parameter negotiations may begin with the initial login request sent by the initiator. When authentication (if required) is successfully completed, the initiator can proceed with parameter negotiations. Some operational parameters are passed using a text format. This exchange uses the following format:
Originator sends: <key>=<value>
Responder replies: <key>=<value>|None|Reject|NotUnderstood|Irrelevant
The <value> argument may be numerical, literal, Boolean (yes or no), or a list of comma-separated literal values. If the originator offers a list of <value>s, the responder should reply with the first supported value, or with “Reject” if none is supported. A response of “None” in the case of a literal list is acceptable only if “None” is provided as one of the possible values. It is also possible for vendors to add new <key>s by prefixing them with X- followed by their domain name reversed.
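The list-selection rule can be sketched as a responder function. This is a minimal illustration that covers only comma-separated literal lists; the function name negotiate and the table of supported values are hypothetical, and the vendor key shown in the example is invented.

```python
def negotiate(offer, supported):
    """Answer a 'key=value[,value...]' offer with the responder's reply.

    `supported` maps each key the responder understands to the set of
    values it accepts (a hypothetical structure for this sketch).
    """
    key, _, values = offer.partition("=")
    if key not in supported:
        return f"{key}=NotUnderstood"
    for value in values.split(","):
        if value in supported[key]:
            return f"{key}={value}"  # first offered value that is supported
    return f"{key}=Reject"
```

A responder that accepts only CHAP and None would answer the offer AuthMethod=KRB5,CHAP,None with AuthMethod=CHAP, and would answer an unknown vendor key such as X-com.example.Feature=Yes with NotUnderstood.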
 
Connection
The login process is concluded when the initiator receives a login final response from the target. If the response is “login reject,” then the attempt failed and the initiator should close the TCP connection identified by the CID. With a response of “login accept,” the session then enters the full-feature phase (assuming this is the initial login attempt). Only when full-feature phase is reached can the initiator begin to send SCSI command and data information contained in iSCSI PDUs.
In the case where multiple TCP connections are established (multiple logins) for a given iSCSI session, subsequent data and response PDUs must be sent on the same TCP connection (CID) on which their associated command was sent. This concept is referred to as “connection allegiance.” In the case where the originating CID has failed, connection allegiance may be reestablished by the error recovery procedure outlined earlier. Conversely, multiple commands associated with a SCSI task may be sent over different TCP connections. Also, unrelated SCSI commands, data, and status may be interleaved over the iSCSI session. Each of their respective data and responses, however, must follow the connection allegiance rules.
One of the negotiated operational parameters is whether the target operates in solicited (R2T) mode or in the unsolicited mode for outgoing data transfers (SCSI Write). In unsolicited mode, the initiator may send “immediate data” in the same PDU as the command (phase collapse), or it can be sent in a separate PDU. The maximum amount of data the initiator can send for each of these cases may be negotiated at login. When the initial immediate data has been sent, all subsequent data PDUs must be sent in reply to an R2T response (solicited mode).
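The split between unsolicited and solicited data reduces to simple arithmetic, sketched here. The function name plan_write and the parameter max_immediate are hypothetical stand-ins for whatever limit was negotiated at login.

```python
def plan_write(total_length, max_immediate):
    """Split a write into (immediate bytes, bytes that must wait for an R2T).

    `max_immediate` stands in for the unsolicited-data limit negotiated at
    login (a hypothetical parameter name for this sketch).
    """
    immediate = min(total_length, max_immediate)
    return immediate, total_length - immediate
```

For an 8192-byte write with a 2048-byte limit, 2048 bytes may travel with the command and the remaining 6144 bytes are sent only in reply to R2Ts. A limit of 0 corresponds to fully solicited operation.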
The initiator and target utilize the sequence numbering schema outlined earlier to maintain ordering of command, data, and response exchanges. The initiator may send a SNACK request if it determines an out-of-sync condition. Based on the sequence number registers, a single SNACK covers a missing contiguous set of data.
Logout and Shutdown
The logout process provides for a graceful shutdown mechanism to close an iSCSI connection or session. The initiator is responsible for commencing the logout procedure; however, the target may prompt this by sending an asynchronous iSCSI message indicating an internal error condition. In either case the initiator sends a logout request, after which no further request may be sent. The logout response from the target indicates that cleanup is complete and no further responses will be sent on this connection.
Additionally, the logout response contains recovery information from the target. This includes the length of time the target will hold pending command information for recovery purposes (Time2Retain) and the length of time the initiator should wait before attempting to reestablish the connection (Time2Wait). Finally, connections are shut down by sending TCP FINs.
Security Considerations
In the past, security as it pertains to storage devices and storage networks has not been a major consideration. Either storage devices were directly attached to hosts or they were connected via a separate SAN independent of user-accessible networks.
With iSCSI, as well as other IP-based SAN protocols, storage information is transported over open IP networks and, therefore, is subject to security risks. Knowing this, the IP Storage working group has also developed a draft for securing IP-based storage communications. This work is contained in the “Securing Block Storage Protocols over IP” draft. The iSCSI protocol draft specifies two elements relative to security: authentication and packet protection.
Authentication
With iSCSI, the target may authenticate the initiator and optionally, the initiator may authenticate the target during the login process. This would take place prior to any parameter negotiation or login accept. If authentication is utilized, each connection within the iSCSI session has to be authenticated. The following authentication methods are defined by the iSCSI draft and are negotiated during the login phase via the “AuthMethod” key:
•   KRB5 Kerberos V5
•   SPKM1 Simple public-key generic security service (GSS) application programming interface (API) mechanism
•   SPKM2 Simple public-key GSS API mechanism
•   SRP Secure Remote Password
•   CHAP Challenge Handshake Authentication Protocol
•   None No authentication
Although these mechanisms may prevent unauthorized connections to a target device, they provide no protection for subsequent PDU exchanges.
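As an illustration of one of the methods listed above, the CHAP calculation can be sketched as follows. The response formula comes from the CHAP specification (RFC 1994) rather than from the iSCSI draft, which this paper does not quote on the point. The function names and sample values are hypothetical, and how the challenge and response are carried in login text keys is not shown.

```python
import hashlib
import hmac


def chap_response(identifier, secret, challenge):
    """CHAP response per RFC 1994: MD5 over identifier, secret, challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def chap_verify(identifier, secret, challenge, response):
    """Recompute the expected response and compare in constant time."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)
```

The authenticating side issues a random challenge with a one-byte identifier, the peer returns chap_response(...), and the authenticating side checks it with chap_verify(...). The shared secret itself never crosses the network.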
Packet Protection
Packet protection ensures the integrity, authentication, and confidentiality of communications between iSCSI nodes. For iSCSI connections, IP Security (IPSec) is utilized to provide secure private exchanges at the IP layer. In order to be draft compliant, an iSCSI network element must implement IPSec tunnel mode with the Encapsulating Security Payload (ESP), including anti-replay. Because of the high speeds associated with iSCSI implementations, IPSec sequence number extensions may/should be implemented, depending on speed.
Confidentiality is obtained by encrypting the IPSec tunnel using Triple Data Encryption Standard (3DES) in cipher block chaining (CBC) mode. An iSCSI node must support Internet Key Exchange (IKE) to provide authentication, security association negotiation, and key management. A separate IKE Phase 2 security association protects each TCP connection within an iSCSI session.
Implementation of iSCSI
The Cisco SN 5420 and 5428 storage routers support iSCSI draft 8. They use iSCSI as the transport for block-level storage access. Storage devices/LUNs in the traditional Fibre Channel SAN are presented to the IP network hosts as if they are directly attached. An iSCSI driver or iSCSI network interface card should be installed in the IP hosts in order to support iSCSI operations.
Cisco SN 5420 storage routers function as iSCSI targets whereas IP hosts function as iSCSI initiators. IP hosts contact a Cisco SN 5420 Storage Router to perform iSCSI login via a defined iSCSI target address and port. The storage router either accepts or rejects the login (or multiple logins) based on the information exchanged and parameters negotiated. The Cisco SN 5420 maps iSCSI targets/LUNs to the actual physical targets/LUNs. If an iSCSI login is successful, the operation enters full feature phase and data transport is allowed, hence physical storage targets/LUNs are available to the iSCSI initiator located in the IP network. Figure 10 illustrates the iSCSI target mode (or SCSI router mode) of Cisco SN 5420 implementation.
Figure 10
iSCSI Implementation Example
 
Besides iSCSI target mode, the Cisco SN 5420 also supports transparent mode and iSCSI multi-initiator mode.
Storage consolidation/centralization in a departmental environment and remote backup are main applications for the Cisco SN 5420.
For more information about the Cisco SN 5420 Storage Router, consult the Cisco Web page:
http://www.cisco.com/warp/public/cc/pd/rt/5420/.
Summary
Once standardized, iSCSI may very well become a disruptive technology in the storage industry. By removing the physical constraints of traditional storage networks, iSCSI is, at a minimum, a technology that will significantly impact workgroup and enterprise networks in the near future. As standards-based devices become readily available, network engineers will need to understand iSCSI as a protocol and the implementation requirements associated with it. This document focuses specifically on the iSCSI protocol, not on implementation considerations. Implementation topics that are not addressed in depth include performance, security, and application requirements. These considerations include device capabilities, network capacities, QoS parameters, and application-specific requirements. Additional resources such as white papers, design guides, product certifications, and application notes are available to address these areas.
Additional resources may be found at:
http://www.cisco.com
http://www.ietf.org/html.charters/ips-charter.html
http://www.snia.org/