iSCSI

SCSI over IP

© 2010, 2011, 2012 Dennis Leeuw dleeuw at made-it dot com
License: GPLv2 or later

Index

    1. How it works
      1. Projects: initiators
      2. Projects: targets
    2. Creating a Target
    3. The initiator
      1. Installing the Initiator
      2. Our setup
      3. Security and the initiator name
      4. The iscsid.conf configuration
      5. Discovering nodes
      6. Log in
      7. Rescanning nodes
      8. Log out
      9. Collecting information
      10. Manipulating collected information
    4. Problems and fixes
      1. Red Hat
      2. Debian
      3. LVM
      4. Moving targets (disks) from one host to another

How it works

iSCSI is client-server based. The client is called the initiator and the server is called the target. The initiator sends SCSI commands to the target and the target sends the results back to the initiator. In a hardware analogy, the initiator corresponds to the SCSI adapter and the target to the SCSI disk. With iSCSI the SCSI bus is replaced by a network and TCP/IP.

The basic concept from a Linux point of view is that iSCSI makes block devices (disks) available across the network. This means you can partition such a device and create a filesystem on it. Because of this the initiator needs to have exclusive rights on the target: the target should supply an iSCSI device to only a single initiator.

There are two types of initiators:

software initiator

Software initiators consist of one or more kernel drivers, a TCP-stack and a network card (NIC).

hardware initiator

Software iSCSI processing is CPU intensive, especially during high I/O loads. Hardware initiators offload the main CPU of a computer by implementing a SCSI ASIC on a network card together with some form of TOE (TCP Offload Engine). The combination is often called an HBA (Host Bus Adapter). The HBA appears to the host as a "normal" SCSI adapter. With an option ROM it is even possible to boot a system from an iSCSI disk.

TOE handles part or all of the TCP-stack processing on the NIC, which frees resources on the main CPU.

Since a host can supply multiple disks to the network, an IP address alone is not sufficient to address a target. In addition to the IP address, a port is used to connect to an iSCSI host. The combination of an IP address and a port is called an iSCSI portal. This way you can run multiple portals on a single host. The default port for an iSCSI portal is TCP/3260; if a system port is needed, port TCP/860 should be used.

iSCSI also has a globally unique addressing scheme to refer to a target. Both the initiator and target need to have such an iSCSI address. The address could look like this: iqn.1987-05.com.cisco:01.4ee667decaeb.

Type  Date     Naming Authority  String defined by the naming authority
iqn   1987-05  com.cisco         01.4ee667decaeb
iqn   1992-05  com.emc           ck2000544005070000-5
iqn   1992-05  com.example       storage.disk1.sys1.xyz

The Type field can hold three possible values: iqn (iSCSI Qualified Name), eui (IEEE EUI-64 format) or naa (T11 Network Address Authority 64 or 128-bit identifier).

The Date field holds, in yyyy-mm format, the first full month during which the naming authority owned its domain name.

The Naming Authority is the reversed domain name of the authority. In the actual setup, however, connections are usually made to an IP address and not to this domain name, so the domain name used here does not have to be reachable; it only serves to make the name globally unique.

The String can hold any naming convention the authority wants to use. It is, by the way, legal to leave this field empty.

iSCSI addresses are written in the following format: iqn.1987-05.com.cisco:01.4ee667decaeb. There is a colon (:) between the domain part and the actual device addressing. All other elements are separated by dots (.).
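As an illustration, an organisation that registered the (placeholder) domain example.org and held it throughout March 2009 could name one of its targets:

iqn.2009-03.org.example:storage.homes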

An initiator can ask a target for a list of available devices; this process is called Discovery or auto-discovery and is the easiest way of connecting to an iSCSI target. The alternatives are to supply the portal and target directly, or to use iSNS (Internet Storage Name Service). iSNS is like a DNS for storage networks.
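With the open-iscsi iscsiadm tool, which is covered in detail later on, the two discovery methods look roughly like this; the addresses are placeholders and the iSNS variant of course requires an iSNS server on the network:

# SendTargets discovery against a known portal
iscsiadm -m discovery -t sendtargets -p 192.0.2.10:3260
# discovery through an iSNS server (the default iSNS port is 3205)
iscsiadm -m discovery -t isns -p 192.0.2.20:3205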

The underlying storage for an iSCSI disk can be any kind of disk; it does not need to be a SCSI disk, or a real disk for that matter. You could just as easily provide RAM disks or disk images to the network.

Projects: initiators

Open iSCSI
The Open-iSCSI project is a high-performance, transport-independent, multi-platform implementation of RFC 3720. The code is enriched with code from the Linux iSCSI project and hosted on the Linux iSCSI site (?)

Projects: targets

IET - iSCSI Enterprise Target
The aim of the project is to develop an open source iSCSI target with professional features that works well in enterprise environments under real workloads, and that is scalable and versatile enough to meet the challenge of future storage needs and developments. This is a fork of the Ardis project.
NetBSD iSCSI
Red Hat seems to supply this iSCSI target.
LIO
LIO (linux-iscsi.org) is a Target implementation and successor of the STGT project in kernel 2.6.38 and up. It supports all prevalent storage fabrics, including iSCSI, Fibre Channel (QLogic), FCoE and InfiniBand (Mellanox SRP).

Creating a Target

An iSCSI target is a host running a daemon that answers iSCSI calls coming in over the IP network. The disk(s) that are provided to the network are plain and simple block devices, so they have nothing to do with LVM, ext2 or ext3, or anything else related to filesystems.

In this section we will show you how to quickly set up an iSCSI target using the iSCSI Enterprise Target project, also known as iscsitarget. On a Debian based system use:

apt-get install iscsitarget

The configuration file for the iSCSI target is /etc/ietd.conf. For testing purposes I created a 5M disk:

dd if=/dev/zero of=iSCSI-disk bs=1 count=5M
parted iSCSI-disk mklabel loop
losetup /dev/loop2 iSCSI-disk
I then added this disk to the ietd.conf file as:
Target iqn.2001-04.com.example:storage.disk1.sys1.42
        Lun 0 Path=/dev/loop2,Type=fileio,ScsiId=42,ScsiSN=42424242
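As a side note, since Type=fileio also accepts a regular file, the image could presumably be exported directly, without the loop device; a sketch of the Lun line within the same Target stanza, where the path to the image file is just an example:

        Lun 0 Path=/var/lib/iscsi-disks/iSCSI-disk,Type=fileio,ScsiId=42,ScsiSN=42424242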
I then started the ietd daemon (or restarted it, if it was already running). That's it! To admire your work one can do:
[prompt]# cat /proc/net/iet/volume 
tid:1 name:iqn.2001-04.com.example:storage.disk1.sys1.42
	lun:0 state:0 iotype:fileio iomode:wt path:/dev/loop2
and also have a look at:
[prompt]# cat /proc/net/iet/session 
tid:1 name:iqn.2001-04.com.example:storage.disk1.sys1.42
This last one changes when an initiator connects to our drives.
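For reference, on a Debian system starting and restarting is normally done through the init script that comes with the iscsitarget package (the script name may differ per distribution or version):

[prompt]# /etc/init.d/iscsitarget restart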

The rest of the options that can be used per Target are well described in the manual pages that come with the iSCSI Enterprise Target. This section is only meant to give you a quick and cheap target to use for the next section.

The initiator

Installing the Initiator

Install on a Red Hat based system:

yum install iscsi-initiator-utils
On Debian based systems:
apt-get install open-iscsi

Our setup

Our NAS consists of three parts: the storage system, for which we use an MD3000i from Dell; the network, for which we use an Ethernet based iSCSI network; and the head nodes, for which we use CentOS 5 machines running Samba and NFS.

This document assumes that you will be using the new open-iscsi tools, meaning you are using the iscsiadm tool to configure your network.

For our initial setup I will be using a Dell MD3000i, since that is what I have at hand. The MD3000i will be our iSCSI target. Another reason I chose this box for this document is to be able to show how multipath works.

eth2: 10.13.251.3 netmask 255.255.255.192 (management)
eth1: 10.13.251.65 netmask 255.255.255.192
eth3: 10.13.251.129 netmask 255.255.255.192

Security and the initiator name

You can protect your iSCSI devices against unauthorized access by using a username/password combination. We didn't do that, since our MD3000i can block access to the disks by allowing and denying access on the basis of the iqn address, and since our iSCSI network is a private network, that was good enough for us. One thing you need to be aware of when using iqn addresses and MD3000 systems: if you have not created username/password protection on the target, you can log in to the target but you will not see any disks (no block devices are created), which can be puzzling at moments.

So before doing anything we set an initiator address and make sure the MD3000 allows us access.

[prompt] # vim /etc/iscsi/initiatorname.iscsi
Our system came with:
InitiatorName=iqn.1994-05.com.redhat:5f9695862554
And we left it that way, but for readability you could replace the 5f9695862554 part with e.g. the hostname.

To configure the iSCSI target (ietd) with username/password security see the ietd.conf man-page for the IncomingUser and OutgoingUser options.
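A sketch of what that could look like in /etc/ietd.conf; the usernames and passwords are of course placeholders:

Target iqn.2001-04.com.example:storage.disk1.sys1.42
        IncomingUser someuser somesecret123
        OutgoingUser otheruser othersecret123
        Lun 0 Path=/dev/loop2,Type=fileio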

The open-iscsi project can be configured to use username/password authentication by using:

node.session.auth.authmethod = CHAP
node.session.auth.username = username
node.session.auth.password = password
in the /etc/iscsi/iscsid.conf file.
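For nodes that are already present in the iSCSI database (see the section on collecting information below) the same settings can be applied per node with iscsiadm; target and portal are placeholders here:

iscsiadm -m node -T <target-iqn> -p <portal> -o update -n node.session.auth.authmethod -v CHAP
iscsiadm -m node -T <target-iqn> -p <portal> -o update -n node.session.auth.username -v username
iscsiadm -m node -T <target-iqn> -p <portal> -o update -n node.session.auth.password -v password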

The iscsid.conf configuration

Create /etc/iscsi/iscsid.conf. The values below were changed partly according to the Dell site and partly by me; where a value was changed, the commented-out line shows the original default as supplied by CentOS 5.5 and the line directly beneath it shows the new value. The remainder are the unchanged defaults:

node.startup = automatic
# node.session.timeo.replacement_timeout = 120
node.session.timeo.replacement_timeout = 144
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
# node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_interval = 10
# node.conn[0].timeo.noop_out_timeout = 5
node.conn[0].timeo.noop_out_timeout = 15
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
#node.session.initial_login_retry_max = 8
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
# node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxRecvDataSegmentLength = 65535
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn[0].iscsi.HeaderDigest = None
# This line did not exist in the default configuration
node.conn[0].iscsi.DataDigest = None
node.session.iscsi.FastAbort = Yes

Start iscsi:

[prompt] # service iscsi start
iscsid dead but pid file exists
Starting iSCSI daemon:                                     [  OK  ]
                                                           [  OK  ]
Setting up iSCSI targets: iscsiadm: No records found!
                                                           [  OK  ]
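To have the initiator services come up after a reboot as well, something like the following is commonly used on a Red Hat based system (the service names are the ones shipped with iscsi-initiator-utils):

chkconfig iscsid on
chkconfig iscsi on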

Discovering nodes

Do a discovery to one of the targets:

[prompt] # iscsiadm --mode discovery --type sendtargets -p 10.13.251.66:3260
10.13.251.66:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
[fe80:0000:0000:0000:a6ba:dbff:fe18:e1c7]:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
[fe80:0000:0000:0000:a6ba:dbff:fe18:e1c9]:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
10.13.251.130:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
10.13.251.67:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
[fe80:0000:0000:0000:a6ba:dbff:fe18:e1a9]:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
[fe80:0000:0000:0000:a6ba:dbff:fe18:e1ab]:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
10.13.251.131:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
The discovery shows you the 4 MD3000i interfaces:
Portal              Controller  iSCSI address
10.13.251.66:3260   1           iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
10.13.251.130:3260  2           iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
10.13.251.67:3260   2           iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
10.13.251.131:3260  1           iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157

The discovery output above contains both IPv4 and IPv6 lines. In our case the IPv6 ones are not needed and we do not want to use them. When the system starts it will try to log in to all discovered targets, because their node.startup is set to 'automatic'; by setting the node start-up to 'manual' this will not happen. Use:

iscsiadm -m node -o update -n node.startup -v manual \
	 -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 \
	 -p [fe80:0000:0000:0000:a6ba:dbff:fe18:e1c7]:3260,2
to change the node.startup parameter for this node from automatic to manual. Do this for every discovered IPv6 target, as sketched below.
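A minimal sketch of a loop that does this for all IPv6 portals at once, assuming the unwanted portals are the link-local addresses starting with [fe80:

iscsiadm -m node | awk '/^\[fe80/ {print $1" "$2}' | \
while read portal target; do
	# set every IPv6 portal record to manual start-up
	iscsiadm -m node -o update -n node.startup -v manual -T $target -p $portal
done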

To get all information about a node use:

iscsiadm -m node -o show  \
	 -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157  \
	 -p [fe80:0000:0000:0000:a6ba:dbff:fe18:e1c9]:3260,2

Log in

Get a list of current partitions:

[prompt] # cat /proc/partitions 
major minor  #blocks  name

   8     0   48234496 sda
   8     1     104391 sda1
   8     2    5245222 sda2
   8     3    2096482 sda3
   8     4          1 sda4
   8     5    5004216 sda5
 253     0    5242880 dm-0
 253     1    5001216 dm-1

The easiest way to log in is to use:

[prompt] # iscsiadm -m node -l
You are now logged in to all portals.

It is better to log in only to the portals that have node.startup set to automatic. To do this use:

[prompt] # iscsiadm -m node --loginall=automatic

To see which portals you are logged in to, you can use the following command:

[prompt] # iscsiadm -m session -P 0
The output might look something like this:
tcp: [20] 10.13.251.130:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
tcp: [21] 10.13.251.66:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
tcp: [22] 10.13.251.131:3260,1 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
tcp: [23] 10.13.251.67:3260,2 iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157
The number between the [] is the session id, which you can use as a shorthand in -m session commands. Using -P 3 will also show you the available LUNs.

With a more detailed login command we can specify against which portal we want to log in:

[prompt] # iscsiadm --mode node --portal 10.13.251.66:3260 --targetname iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 --login

Logging in to [iface: default, target: iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157, portal: 10.13.251.66,3260]
Login to [iface: default, target: iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157, portal: 10.13.251.66,3260]: successful

The fact that we are now logged in means we should have gotten extra disks to play with:

[prompt] # cat /proc/partitions 
major minor  #blocks  name

   8     0   48234496 sda
   8     1     104391 sda1
   8     2    5245222 sda2
   8     3    2096482 sda3
   8     4          1 sda4
   8     5    5004216 sda5
 253     0    5242880 dm-0
 253     1    5001216 dm-1
   8    16 3904946176 sdb
   8    32  524288000 sdc
   8    48  524288000 sdd
   8    64  903897088 sde
   8    80      20480 sdf

A very handy tool for viewing SCSI related information is lsscsi; it provides you with something like this:

# lsscsi
[23:0:0:0]   disk    DELL     MD3000i          0735  -       
[23:0:0:31]  disk    DELL     Universal Xport  0735  /dev/sdb
[24:0:0:0]   disk    DELL     MD3000i          0735  -       
[24:0:0:31]  disk    DELL     Universal Xport  0735  /dev/sdc
[25:0:0:0]   disk    DELL     MD3000i          0735  -       
[25:0:0:31]  disk    DELL     Universal Xport  0735  /dev/sdd
[26:0:0:0]   disk    DELL     MD3000i          0735  -       
[26:0:0:31]  disk    DELL     Universal Xport  0735  /dev/sde

To see the mapping between the /dev/sdb and the iqn use:

(cd /dev/disk/by-path; ls -l *dell* | awk '{FS=" "; print $9 " " $11}')
Watch out... LUN 31 is reserved by the Access share and is not a normal disk.

A unique link is created in /dev/disk/by-path/. The name of the device has the generic naming convention:
ip-<ip-number>:<port>-iscsi-<target-identifier>-lun-<number>
Example:
ip-10.13.251.130:3260-iscsi-iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157-lun-0.

The biggest problem with iSCSI is the reboot. Since the discovery of devices is done at boot, you never know which target responds first, so the order in which the disks are "seen" by the kernel may be different every time. This means that the disk that was /dev/sdf might now become /dev/sdg, which quickly turns into an enormous problem. To make sure that every disk is addressed by the same name every time, we use udev. See: /dev/disk/by-id/
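A minimal sketch of how such a persistent name could then be used in /etc/fstab; the by-id name is hypothetical, and the _netdev option makes sure the mount is only attempted once the network (and thus iSCSI) is up:

# /etc/fstab
/dev/disk/by-id/scsi-36a4badb00018e1a700000000deadbeef-part1  /data  ext3  defaults,_netdev  0 0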

Rescanning nodes

Now run:

iscsiadm -m node -R
to rescan all nodes and discover the newly created disks. Then use:
cat /proc/partitions
to show the new disk.
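Once the new disk is visible it can be used like any other block device. A minimal sketch, assuming the new disk appeared as /dev/sdf and is dedicated to this initiator:

fdisk /dev/sdf		# create a single partition, /dev/sdf1
mkfs.ext3 /dev/sdf1	# ext3 matches the CentOS 5 systems used in this document
mkdir -p /mnt/iscsi
mount /dev/sdf1 /mnt/iscsi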

Log out

To logout from an iSCSI target use:

[prompt] # iscsiadm -m node --target iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 --logout
Logging out of session [sid: 1, target: iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157, portal: 10.13.251.66,3260]
Logout of [sid: 1, target: iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157, portal: 10.13.251.66,3260]: successful

To log out from all targets at once use:

iscsiadm -m node -u

A simple bash function that will logout all sessions:

function iscsi_logout {
	local IFS=$'\n'
	# Each line of "iscsiadm -m session -o show" looks like:
	#   tcp: [21] 10.13.251.66:3260,1 iqn.1984-05.com.dell:powervault...
	for line in `iscsiadm -m session -o show`; do
		portal=`echo $line | cut -d' ' -f3`	# 3rd field: portal (with portal group tag)
		target=`echo $line | cut -d' ' -f4`	# 4th field: target iqn
		iscsiadm -m node -u -T $target -p $portal
	done
}

Collecting information

To view the target per interface use:

iscsiadm -m iface -P 1

To view the interface per target use:

iscsiadm -m node -P 1

The information of a node that is stored in the local iSCSI database can be obtained through:

iscsiadm -m node -o show

To obtain statistics information about a session use:

iscsiadm -m session -r 21 -s
where 21 is the session id.

For all session information (but without the statistics) use:

iscsiadm -m session -P 3

Manipulating collected information

The iSCSI database that is maintained by the iSCSI tools to keep track of your iSCSI network is a set of files. On a Red Hat based system you can find the files in /var/lib/iscsi. Debian uses /etc/iscsi to store this information.

Within the database directory there are at least three sub-directories, ifaces, nodes and send_targets, each containing what its name suggests.

These directories are filled with data from the first moment on; the data is stored fully automatically, based on the configuration in /etc/iscsi/iscsid.conf. Changes in the /etc/iscsi/iscsid.conf file have no effect on information that has already been gathered. One has to use iscsiadm to propagate changes to existing targets.

The iscsiadm command has the -o or --op= option to control the database. By default the --op=show operation is used for listing information. Other operations on the database are: new, delete, update and nonpersistent. new adds information to the database, delete removes information, update changes information and nonpersistent means that data gathered by the command is not added to the database.
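Two sketches of these operations, reusing the target and portals from the examples above (whether -o nonpersistent is accepted in discovery mode depends on your open-iscsi version):

# remove a stale node record for one portal of a target
iscsiadm -m node -o delete \
	 -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 \
	 -p 10.13.251.67:3260

# run a discovery without storing the result in the database
iscsiadm -m discovery -t sendtargets -p 10.13.251.66:3260 -o nonpersistent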

An overview of the data per node can be retrieved through:

iscsiadm -m node -o show  \
	 -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157  \
	 -p [fe80:0000:0000:0000:a6ba:dbff:fe18:e1c9]:3260,2

To change a setting we use the -n and -v options: -n gives the name of the option to change and -v gives the new value:

iscsiadm -m node -o update -n node.startup -v manual \
	 -T iqn.1984-05.com.dell:powervault.md3000i.6a4badb00018e1a7000000004b690157 \
	 -p [fe80:0000:0000:0000:a6ba:dbff:fe18:e1c7]:3260,2

Problems and fixes

Red Hat

There is a little problem I encountered on Red Hat 5 based systems in the iscsi init script, whose stop function contains:

        # If this is a final shutdown/halt, do nothing since
        # lvm/dm, md, power path, etc do not always handle this
        #if [ "$RUNLEVEL" = "6" -o "$RUNLEVEL" = "0" -o "$RUNLEVEL" = "1" ]; then
        #       success
        #       return
        #fi
I commented this check out, since the machine hangs during reboot if I don't.

Debian

When shutting down the iSCSI system on Debian it tries to gracefully umount everything that is related to iSCSI. However, if for whatever reason there is no iSCSI block device present, the tools try to umount everything anyway, which results in a read-only /. To prevent this from happening, extend the loop in the /etc/init.d/umountiscsi.sh script with the following if-statement:

                for BLOCK_FILE in $SESSION_DIR/target*/*\:*/block/*; do
			# Added to make the script skip a host when no block device is present
                        if [ "${BLOCK_FILE}" = "${SESSION_DIR}/target*/*:*/block/*" ]; then
                                log_warning_msg "No block devices found for host ${HOST_DIR##*/}"
                                continue
                        fi
The script now prints a warning and skips the host if no block device entry is present.

LVM

If after a reboot or another change your LVM disk seems to be gone use:

# lvscan 
  ACTIVE            '/dev/vg_data/fsdisk' [4.77 GB] inherit
  inactive          '/dev/home/part1' [4.50 TB] inherit
  inactive          '/dev/var/part1' [6.40 TB] inherit
  ACTIVE            '/dev/vg_data/fsroot' [5.00 GB] inherit
If it reports volumes as inactive, like in the above example, use:
lvchange -ay home/part1
lvchange -ay var/part1
to activate the devices.

Moving targets (disks) from one host to another

If you want to move a target from one host to another, use the following procedure. We assume the use of LVM here; a condensed sketch of the old-host side follows the list.

  1. Unmount the Logical Volumes on the old machine
  2. Make sure you have a copy of your Volume Group information:
    vgcfgbackup -f <VGName>.backup <VGName>
  3. Mark the Volume Group as inactive:
    vgchange -an <VGName>
  4. Export the Volume Group:
    vgexport <VGName>
  5. Logout from the iSCSI target:
    iscsiadm -m node -u
  6. Scan iSCSI targets on the new machine:
    iscsiadm --mode discovery --type sendtargets -p 10.13.251.66:3260
  7. Log in to the target:
    iscsiadm -m node -l
  8. Check the VG backup file:
    vgcfgrestore --list -f <VGName>.backup
  9. Restore the VG you want:
    vgcfgrestore -f <VGName>.backup <VGName>
  10. Scan for the available Volume Groups:
    vgscan
  11. Scan for the available Physical Volumes:
    pvscan
  12. Import the Volume Group information:
    vgimport <VGName>
  13. Activate the Volume Group:
    vgchange -ay <VGName>
  14. Check to see that all Logical Volumes are present:
    lvscan
  15. You can now mount the Logical Volumes.
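A condensed sketch of the steps performed on the old host, assuming the Volume Group is called vgdata and that all of its Logical Volumes are mounted by their /dev/vgdata/* device names:

VG=vgdata
umount /dev/$VG/*		# step 1: unmount every Logical Volume of the VG
vgcfgbackup -f $VG.backup $VG	# step 2: back up the VG metadata
vgchange -an $VG		# step 3: mark the VG inactive
vgexport $VG			# step 4: export the VG
iscsiadm -m node -u		# step 5: log out of the iSCSI target(s)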