SCSI over IP
© 2010, 2011, 2012 Dennis Leeuw dleeuw at made-it dot com
License: GPLv2 or later
iSCSI is client-server based. The client is called the initiator and the server is called the target. The initiator sends SCSI commands to the target and the target sends the results back to the initiator. In hardware terms, the initiator corresponds to the SCSI adapter and the target to the SCSI disk. With iSCSI the SCSI bus is replaced by a network and TCP/IP.
The basic concept from a Linux point of view is that iSCSI makes block devices (disks) available across the network. This means the initiator can partition such a device and create a filesystem on it. Because of this the initiator needs exclusive rights on the target: the target can only supply an iSCSI device to a single initiator.
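For example, once an initiator is logged in, the iSCSI disk shows up as an ordinary block device. A minimal sketch, assuming the new disk appeared as /dev/sdb (the device name will differ per system):

```
# the iSCSI disk behaves like any local disk: partition, format, mount
fdisk /dev/sdb          # create a partition table and one partition
mkfs.ext3 /dev/sdb1     # put a filesystem on the new partition
mount /dev/sdb1 /mnt    # and mount it
```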
There are two types of initiators:
Software initiators consist of one or more kernel drivers, a TCP-stack and a network card (NIC).
iSCSI is CPU intensive, especially during high I/O loads. Hardware initiators offload the main CPU of a computer by implementing a SCSI ASIC on a network card, combined with some form of TOE (TCP Offload Engine). The combination is often called an HBA (Host Bus Adapter). The HBA appears to the host as a "normal" SCSI adapter. With an option ROM it is even possible to boot a system from an iSCSI disk.
A TOE handles part or all of the TCP stack processing on the NIC, which frees resources on the main CPU.
Since a host can supply multiple disks to the network, an IP address alone is not sufficient to address a target. Next to the IP address, a port is used to connect to an iSCSI host. The combination of an IP address and a port is called the iSCSI portal; this way you can run multiple portals on a single host. The default port for the iSCSI portal is TCP/3260, and if a system port is needed, TCP/860 should be used.
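As a quick sanity check (not part of the original setup) you can verify that a portal is reachable before configuring anything; the IP address below is just an example:

```
# check that something is listening on the default iSCSI portal port
nc -z 192.0.2.10 3260 && echo "portal reachable"
```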
iSCSI also has a globally unique addressing scheme to refer to a target. Both the initiator and target need to have such an iSCSI address. The address could look like this: iqn.1987-05.com.cisco:01.4ee667decaeb.
Type | Date | Naming Authority | String defined by the naming authority |
---|---|---|---|
iqn | 1987-05 | com.cisco | 01.4ee667decaeb |
iqn | 1992-05 | com.emc | ck2000544005070000-5 |
iqn | 1992-05 | com.example | storage.disk1.sys1.xyz |
The Type field can hold three possible values: iqn (iSCSI Qualified Name), eui (IEEE EUI-64 format) or naa (T11 Network Address Authority 64 or 128-bit identifier).
The Date field holds the year and month (yyyy-mm) of the first full month in which the naming authority's domain name was registered.
The Naming Authority is the reversed domain name of the authority. Note, however, that the actual connection is usually made to an IP address rather than to this domain name, so the domain name in the address does not have to resolve to the target.
The String field can hold any naming convention the authority wants to use. It is legal to leave this field empty.
iSCSI addresses are written in the following format: iqn.1987-05.com.cisco:01.4ee667decaeb. There is a colon (:) between the domain part and the actual device addressing; all other elements are separated by dots (.).
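As an illustration, a small shell sketch that assembles an iqn-style address from these fields; the date and domain used here are made up:

```
DATE="2012-01"          # first full month the domain was registered
DOMAIN="example.com"
# reverse the domain name (example.com -> com.example)
REVDOMAIN=$(echo "$DOMAIN" | awk -F. '{for (i = NF; i > 1; i--) printf "%s.", $i; print $1}')
echo "iqn.${DATE}.${REVDOMAIN}:storage.disk1"   # iqn.2012-01.com.example:storage.disk1
```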
An initiator can ask a target for a list of available devices; this process is called Discovery or auto-discovery and is the easiest way of connecting to an iSCSI target. The alternatives are supplying the portal and target directly, or using iSNS (Internet Storage Name Service). iSNS is like a DNS for storage networks.
To provide an iSCSI disk to the network, the underlying hardware can be any kind of disk; it does not need to be a SCSI disk, or even a real disk for that matter. You could just as easily provide a RAM disk or a disk image to the network.
An iSCSI target is a host running a daemon that answers iSCSI calls coming in over the IP network. The disk(s) provided to the network are plain and simple block devices, so the target has nothing to do with LVM, ext2 or ext3, or anything else related to filesystems.
In this section we will show you how to quickly set up an iSCSI target using the iSCSI Enterprise Target project, also known as iscsitarget. On a Debian based system use:
```
apt-get install iscsitarget
```
The configuration file for the iSCSI target is /etc/ietd.conf. For testing purposes I created a 5 MB disk:
```
dd if=/dev/zero of=iSCSI-disk bs=1 count=5M
parted iSCSI-disk mklabel loop
losetup /dev/loop2 iSCSI-disk
```
I then added this disk to the ietd.conf file as:
```
Target iqn.2001-04.com.example:storage.disk1.sys1.42
    Lun 0 Path=/dev/loop2,Type=fileio,ScsiId=42,ScsiSN=42424242
```
I then started (or restarted, if it was already running) the ietd service. That's it! To admire your work:
```
[prompt]# cat /proc/net/iet/volume
tid:1 name:iqn.2001-04.com.example:storage.disk1.sys1.42
    lun:0 state:0 iotype:fileio iomode:wt path:/dev/loop2
```
and also have a look at:
```
[prompt]# cat /proc/net/iet/session
tid:1 name:iqn.2001-04.com.example:storage.disk1.sys1.42
```
This last one changes when an initiator connects to our drives.
The rest of the per-Target options are well described in the manual pages that come with the iSCSI Enterprise Target. This section is meant to give you a quick and cheap target to use in the next section.
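Before moving on to the initiator you may want to verify that the target daemon is actually listening; a quick check, assuming the default portal port:

```
# ietd should be listening on TCP/3260
netstat -ltn | grep 3260
```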
To install the initiator on a Red Hat based system:
```
yum install iscsi-initiator-utils
```
On Debian based systems:
```
apt-get install open-iscsi
```
Our NAS consists of three parts: the storage system, for which we use a Dell MD3000i; the network, for which we use an Ethernet based iSCSI network; and the head nodes, for which we use CentOS 5 machines running Samba and NFS.
This document assumes that you will be using the new open-iscsi tools, meaning you are using the iscsiadm tool to configure your network.
For our initial setup I will be using a Dell MD3000i, since that is what I have at hand. The MD3000i will be our iSCSI target. Another reason I chose this box for this document is to be able to show how multipathing works. The network interfaces on our head node are configured as follows:
```
eth2: 10.13.251.3   netmask 255.255.255.192  (management)
eth1: 10.13.251.65  netmask 255.255.255.192
eth3: 10.13.251.129 netmask 255.255.255.192
```
You can protect your iSCSI devices against unauthorized access by using a username/password combination. We didn't do that, since our MD3000i can block access to the disks by allowing and denying access on the basis of the iqn address, and since our iSCSI network is a private network, that was good enough for us. One thing you need to be aware of when using iqn addresses on MD3000 systems: if you have not created username/password protection on the target, you can log in to the target but you will not see any disks (no block devices are created), which can be puzzling at moments.
So before doing anything we set an initiator address and make sure the MD3000 allows us access.
```
[prompt] # vim /etc/iscsi/initiatorname.iscsi
```
Our system came with:
```
InitiatorName=iqn.1994-05.com.redhat:5f9695862554
```
And we left it that way, but for readability you could replace the 5f9695862554 part with e.g. the hostname.
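A minimal sketch of that change, assuming you keep the distribution's iqn prefix and only swap the random suffix for the short hostname:

```
# back up the old name first, then replace the suffix with the hostname
cp /etc/iscsi/initiatorname.iscsi /etc/iscsi/initiatorname.iscsi.orig
echo "InitiatorName=iqn.1994-05.com.redhat:$(hostname -s)" > /etc/iscsi/initiatorname.iscsi
```
Remember that the target (the MD3000i in our case) must then allow the new iqn.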
To configure the iSCSI target with username/password security, see the ietd.conf man-page for the IncomingUser and OutgoingUser options.
The open-iscsi project can be configured to use username/password authentication by using:
```
node.session.auth.authmethod = CHAP
node.session.auth.username = username
node.session.auth.password = password
```
in the /etc/iscsi/iscsid.conf file.
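Instead of setting the credentials globally in iscsid.conf, they can also be set per node record with iscsiadm; a sketch, using an example target and portal:

```
TARGET=iqn.2001-04.com.example:storage.disk1.sys1.42   # example target
PORTAL=192.0.2.10:3260                                 # example portal
iscsiadm -m node -T $TARGET -p $PORTAL -o update \
    -n node.session.auth.authmethod -v CHAP
iscsiadm -m node -T $TARGET -p $PORTAL -o update \
    -n node.session.auth.username -v username
iscsiadm -m node -T $TARGET -p $PORTAL -o update \
    -n node.session.auth.password -v password
```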
Create /etc/iscsi/iscsid.conf. In the listing below, the commented-out lines show the default values as supplied by CentOS 5.5 and the line after each comment is the value it was changed to (changes made according to the Dell site and by me); the remaining lines are the untouched defaults:
```
node.startup = automatic
# node.session.timeo.replacement_timeout = 120
node.session.timeo.replacement_timeout = 144
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
# node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_interval = 10
# node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].timeo.noop_out_timeout = 15
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
#node.session.initial_login_retry_max = 8
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
# node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxRecvDataSegmentLength = 65535
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn[0].iscsi.HeaderDigest = None
# This line did not exist:
node.conn[0].iscsi.DataDigest = None
node.session.iscsi.FastAbort = Yes
```
Start iscsi:
```
[prompt] # service iscsi start
iscsid dead but pid file exists
Starting iSCSI daemon:                                     [  OK  ]
                                                           [  OK  ]
Setting up iSCSI targets: iscsiadm: No records found!      [  OK  ]
```
Do a discovery to one of the targets:
```
[prompt] # iscsiadm --mode discovery --type sendtargets -p 10.13.251.66:3260
10.13.251.66:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
[fe80:0000:0000:0000:a6ba:dbff:fe18:e1c7]:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
[fe80:0000:0000:0000:a6ba:dbff:fe18:e1c9]:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
10.13.251.130:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
10.13.251.67:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
[fe80:0000:0000:0000:a6ba:dbff:fe18:e1a9]:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
[fe80:0000:0000:0000:a6ba:dbff:fe18:e1ab]:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
10.13.251.131:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
```
The discovery shows you the 4 MD3000i interfaces:
Portal | Controller | iSCSI address |
---|---|---|
10.13.251.66:3260 | 1 | iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 |
10.13.251.130:3260 | 2 | iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 |
10.13.251.67:3260 | 2 | iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 |
10.13.251.131:3260 | 1 | iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 |
In the discovery output above you see both IPv4 and IPv6 lines. In our case the IPv6 ones are not needed and we do not want to use them. When the system starts, it logs in to all discovered targets that are set to 'automatic'; by setting node.startup to 'manual' for those records this will not happen. Use:
```
iscsiadm -m node -o update -n node.startup -v manual \
    -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 \
    -p [fe80:0000:0000:0000:a6ba:dbff:fe18:e1c7]:3260,2
```
to change the node.startup parameter for this node from automatic to manual. Change this for every discovered IPv6 target, or script it as shown below.
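A sketch that automates this, under the assumption that all unwanted portals are IPv6 link-local addresses (which start with [fe80):

```
# iscsiadm -m node prints "portal,tpgt target" per record;
# set every IPv6 record to manual start-up
iscsiadm -m node | grep '^\[fe80' | while read portal target; do
    iscsiadm -m node -o update -n node.startup -v manual \
        -T "$target" -p "$portal"
done
```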
To get all information about a node use:
```
iscsiadm -m node -o show \
    -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 \
    -p [fe80:0000:0000:0000:a6ba:dbff:fe18:e1c9]:3260,2
```
Get a list of current partitions:
```
[prompt] # cat /proc/partitions
major minor  #blocks  name

   8     0   48234496 sda
   8     1     104391 sda1
   8     2    5245222 sda2
   8     3    2096482 sda3
   8     4          1 sda4
   8     5    5004216 sda5
 253     0    5242880 dm-0
 253     1    5001216 dm-1
```
The easiest way to log in is to use:
```
[prompt] # iscsiadm -m node -l
```
You are now logged in to all portals.
It is better to log in only to the portals that have the automatic parameter set. To do this use:
```
[prompt] # iscsiadm -m node --loginall=automatic
```
To see which portals you are logged in to, use the following command:
```
[prompt] # iscsiadm -m session -P 0
```
The output might look something like this:
```
tcp: [20] 10.13.251.130:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
tcp: [21] 10.13.251.66:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
tcp: [22] 10.13.251.131:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
tcp: [23] 10.13.251.67:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
```
The number between the square brackets is the session id, which you can use as a shorthand in -m session commands. Using -P 3 will also show you the available LUNs.
With a more detailed login command we can specify against which portal we want to log in:
```
[prompt] # iscsiadm --mode node --portal 10.13.251.66:3260 --targetname iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 --login
Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157, portal: 10.13.251.66,3260]
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157, portal: 10.13.251.66,3260]: successful
```
The fact that we are now logged in means we should have gotten extra disks to play with:
```
[prompt] # cat /proc/partitions
major minor  #blocks  name

   8     0   48234496 sda
   8     1     104391 sda1
   8     2    5245222 sda2
   8     3    2096482 sda3
   8     4          1 sda4
   8     5    5004216 sda5
 253     0    5242880 dm-0
 253     1    5001216 dm-1
   8    16 3904946176 sdb
   8    32  524288000 sdc
   8    48  524288000 sdd
   8    64  903897088 sde
   8    80      20480 sdf
```
A very handy tool for viewing SCSI related information is lsscsi. This command provides you with something like this:
```
# lsscsi
[23:0:0:0]   disk    DELL     MD3000i          0735  -
[23:0:0:31]  disk    DELL     Universal Xport  0735  /dev/sdb
[24:0:0:0]   disk    DELL     MD3000i          0735  -
[24:0:0:31]  disk    DELL     Universal Xport  0735  /dev/sdc
[25:0:0:0]   disk    DELL     MD3000i          0735  -
[25:0:0:31]  disk    DELL     Universal Xport  0735  /dev/sdd
[26:0:0:0]   disk    DELL     MD3000i          0735  -
[26:0:0:31]  disk    DELL     Universal Xport  0735  /dev/sde
```
To see the mapping between /dev/sdb and the iqn use:
```
(cd /dev/disk/by-path; ls -l *dell* | awk '{FS=" "; print $9 " " $11}')
```
Watch out: LUN 31 is reserved for the Access share and is not a normal disk.
A unique link is created in /dev/disk/by-path/. The name of the device has the generic naming convention:
```
ip-<ip-number>:<port>-iscsi-<target-identifier>-lun-<number>
```
Example:
```
ip-10.13.251.130:3260-iscsi-iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157-lun-0
```
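A small sketch that prints this mapping for all iSCSI disks, assuming the by-path links shown above exist:

```
# print each by-path link together with the kernel device it points to
for link in /dev/disk/by-path/ip-*-iscsi-*-lun-*; do
    [ -e "$link" ] || continue    # skip if no iSCSI links exist
    printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
done
```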
The biggest problem with iSCSI is the reboot. Since the discovery of devices is done at boot, you never know which target responds first, so the order in which the disks are "seen" by the kernel may differ every time. This means that the disk that was /dev/sdf might now become /dev/sdg, which quickly turns into an enormous problem. To make sure that every disk is addressed by the same name every time, we use the persistent names udev creates; see /dev/disk/by-id/.
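For example, a mount can be pinned to the by-id link instead of an unstable /dev/sdX name; the id below is made up, and the _netdev option makes sure the mount waits for the network:

```
# example /etc/fstab entry using a persistent udev link (hypothetical id)
/dev/disk/by-id/scsi-36a4badb000000000000000000000dead-part1  /data  ext3  _netdev  0 0
```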
Now run:
```
iscsiadm -m node -R
```
to rescan all nodes and discover the newly created disks. Use:
```
cat /proc/partitions
```
to show the new disk.
To log out from an iSCSI target use:
```
[prompt] # iscsiadm -m node --target iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 --logout
Logging out of session [sid: 1, target: iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157, portal: 10.13.251.66,3260]
Logout of [sid: 1, target: iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157, portal: 10.13.251.66,3260]: successful
```
To do that for all targets at once use:
```
iscsiadm -m node -u
```
A simple bash function that will log out all sessions:
```
# Log out of every active iSCSI session, one by one.
function iscsi_logout {
    local IFS=$'\n'
    # each session line looks like:
    # "tcp: [20] 10.13.251.130:3260,2 iqn.1984-05.com.dell:..."
    for line in `iscsiadm -m session -o show`; do
        portal=`echo $line | cut -d' ' -f3`
        target=`echo $line | cut -d' ' -f4`
        iscsiadm -m node -u -T $target -p $portal
    done
}
```
To view the target per interface use:
```
iscsiadm -m iface -P 1
```
To view the interface per target use:
```
iscsiadm -m node -P 1
```
The information of a node that is stored in the local iSCSI database can be obtained through:
```
iscsiadm -m node -o show
```
To obtain statistics about a session use:
```
iscsiadm -m session -r 21 -s
```
where 21 is the session id.
For all session information (but not the statistics) use:
```
iscsiadm -m session -P 3
```
The iSCSI database that is maintained by the iSCSI tools to keep track of your iSCSI network is a set of files. On a Red Hat based system you can find the files in /var/lib/iscsi. Debian uses /etc/iscsi to store this information.
Within the database directory there are at least three sub-directories, ifaces, nodes and send_targets, which contain information about what their names suggest.
These directories are filled with data from the first moment on; the data is stored fully automatically, based on the configuration in /etc/iscsi/iscsid.conf. Changes in the /etc/iscsi/iscsid.conf file have no effect on already gathered information: one has to use iscsiadm to propagate changes to existing targets.
The iscsiadm command has the -o or --op= option to control the database. By default the --op=show operation is used, which lists information. The other operations on the database are new, delete, update and nonpersistent: new adds information to the database, delete removes information, update changes information, and nonpersistent means that data gathered by the command is not added to the database.
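For example, to remove a stale record from the database (using the target and an IPv6 portal from earlier in this document):

```
iscsiadm -m node -o delete \
    -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 \
    -p [fe80:0000:0000:0000:a6ba:dbff:fe18:e1c7]:3260,2
```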
An overview of the data per node can be retrieved through:
```
iscsiadm -m node -o show \
    -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 \
    -p [fe80:0000:0000:0000:a6ba:dbff:fe18:e1c9]:3260,2
```
To change a setting we use the -n and -v options: -n gives the name of the option to change and -v gives the new value:
```
iscsiadm -m node -o update -n node.startup -v manual \
    -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 \
    -p [fe80:0000:0000:0000:a6ba:dbff:fe18:e1c7]:3260,2
```
There is a little problem I encountered on Red Hat 5 based systems in the iscsi init script, which contains the following in its stop function:
```
# If this is a final shutdown/halt, do nothing since
# lvm/dm, md, power path, etc do not always handle this
#if [ "$RUNLEVEL" = "6" -o "$RUNLEVEL" = "0" -o "$RUNLEVEL" = "1" ]; then
#    success
#    return
#fi
```
I commented it out, since the machine hangs during reboot if we don't.
When shutting down the iSCSI system on Debian, it tries to gracefully unmount everything that is related to iSCSI. However, if for whatever reason there is no block device present for iSCSI, the tools try to unmount everything, which results in a read-only /. To prevent this from happening, change the /etc/init.d/umountiscsi.sh script with the following if-statement:
```
for BLOCK_FILE in $SESSION_DIR/target*/*\:*/block/*; do
    # Added to make the script skip a host when no block device is present
    if [ "${BLOCK_FILE}" = "${SESSION_DIR}/target*/*:*/block/*" ]; then
        log_warning_msg "No block devices found for host ${HOST_DIR##*/}"
        continue
    fi
```
The script now prints a warning if no block device entry is present.
If after a reboot or another change your LVM disk seems to be gone, use:
```
# lvscan
  ACTIVE            '/dev/vg_data/fsdisk' [4.77 GB] inherit
  inactive          '/dev/home/part1' [4.50 TB] inherit
  inactive          '/dev/var/part1' [6.40 TB] inherit
  ACTIVE            '/dev/vg_data/fsroot' [5.00 GB] inherit
```
If it reports inactive, like in the above example, use:
```
lvchange -ay home/part1
lvchange -ay var/part1
```
to activate the devices.
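If there are many inactive volumes, a small sketch that activates them all by parsing the lvscan output shown above:

```
# activate every logical volume that lvscan reports as inactive
lvscan | awk '/inactive/ {print $2}' | tr -d "'" | while read lv; do
    lvchange -ay "$lv"
done
```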
If you want to move a target from one host to another, use the following procedure (we assume the use of LVM here; the comments mark on which host each step runs):
```
# on the host that currently uses the volume group
vgcfgbackup -f <VGName>.backup <VGName>
vgchange -an <VGName>
vgexport <VGName>
iscsiadm -m node -u

# on the host the volume group moves to
iscsiadm --mode discovery --type sendtargets -p 10.13.251.66:3260
iscsiadm -m node -l
vgcfgrestore --list -f <VGName>.backup
vgcfgrestore -f <VGName>.backup <VGName>
vgscan
pvscan
vgimport <VGName>
vgchange -ay <VGName>
lvscan
```