Software RAID

device mapper raid

© 2010, 2011, 2012 Dennis Leeuw dleeuw at made-it dot com
License: GPLv2 or later

RAID level	Technology	Min disks	Max failures	Capacity	Benefit	Downside
0	Striping	2	0	N*s	Speed	No redundancy
1	Mirroring	2	N-1	s	Reads at up to double speed, writes at single-disk speed	Capacity of only one disk
3	Byte-level striping, dedicated parity disk	3	1	(N-1)*s	Redundancy costs only one disk (the parity disk)	Array can serve only one read or write at a time
4	Block-level striping, dedicated parity disk	3	1	(N-1)*s	Redundancy costs only one disk; reads and writes can run concurrently	Parity disk limits write speed
5	Block-level striping, parity distributed over all disks	3	1	(N-1)*s	No parity disk to limit speed	Write penalty for parity updates
6	Block-level striping, two parity blocks per stripe distributed over all disks	4	2	(N-2)*s	No parity disk to limit speed	High write penalty

N: the number of disks in the array
s: the size of the smallest disk

The levels above can be combined. RAID 10, for example, is a stripe set across two (or more) mirrored pairs, and RAID 01 is a mirror of two stripe sets. Both have a capacity of (N*s)/2 and are guaranteed to survive the failure of one disk; further failures are survived only as long as one complete copy of the data remains, e.g. when an entire mirror half of a RAID 01 fails.
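
The capacity column of the table can be turned into a small calculation. The sketch below is only an illustration: the function name raid_capacity and its argument order are made up here, and the result is in whatever unit the disk size is given in.

```shell
# usage: raid_capacity LEVEL N S
# LEVEL: 0, 1, 3, 4, 5, 6 or 10 (hypothetical helper, not part of mdadm)
# N: number of disks, S: size of the smallest disk
raid_capacity() {
    level=$1
    n=$2
    s=$3
    case $level in
        0)     echo $(( n * s )) ;;          # striping: all disks count
        1)     echo "$s" ;;                  # mirroring: one disk's worth
        3|4|5) echo $(( (n - 1) * s )) ;;    # one disk's worth of parity
        6)     echo $(( (n - 2) * s )) ;;    # two disks' worth of parity
        10)    echo $(( n * s / 2 )) ;;      # half the raw capacity
        *)     echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}
```

For example, raid_capacity 5 4 1000 prints 3000: four disks of 1000 GB in RAID 5 give 3000 GB of usable space.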

Collecting information

To get an overview of your RAID configuration use:

mdadm -D /dev/md0

This will output something like:

/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Mar 10 11:16:54 2009
     Raid Level : raid5
     Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
  Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Apr 23 10:39:03 2011
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : f23d7914:e24c3d1b:46cbea72:d9f44c76
         Events : 0.146842

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       2       0        0        2      removed
       3       8       64        3      active sync   /dev/sde

This is a RAID 5 array of four 1 TB disks. One disk (/dev/sdd) is broken and has been removed, so the array runs degraded with three of its four devices active.
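
For use in a script, the State line can be pulled out of this output. The following is a minimal sketch that assumes the "State :" layout shown above; the function names md_state and md_is_degraded are made up for this example.

```shell
# Print the value of the "State :" line from "mdadm -D" output read on stdin.
# The device table header also contains the word "State", but without the
# " : " separator, so it is not matched.
md_state() {
    sed -n 's/^ *State : *//p'
}

# Succeed if the array described on stdin is reported as degraded.
md_is_degraded() {
    md_state | grep -q 'degraded'
}
```

It could be used as: mdadm -D /dev/md0 | md_is_degraded && echo "md0 is degraded"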

Striped map

The striped type can be used to stripe data across two or more disks.

A striped map is defined as: start length striped #stripes chunk_size device1 offset1 ... deviceN offsetN

As described in the Introduction, a table line has only one length parameter (a number of sectors). For a striped map it is the size of the whole striped device, and every device contributes length / #stripes sectors, so the smallest disk is the limiting factor. If the disks are of unequal size, the remaining disk space can be used as described in the linear section, so nothing needs to go to waste.

New options in the table for a striped setup are the number of stripes and the chunk size. The chunk size is given in 512-byte sectors, like the other sizes in the table, so 32 means 16 KiB.

# Size of one device in 512-byte sectors; /dev/loop2 is assumed to be at least as large
SIZE1=$(blockdev --getsz /dev/loop1)
# The length is the total over both stripes
echo "0 $((SIZE1 * 2)) striped 2 32 /dev/loop1 0 /dev/loop2 0" |\
	 dmsetup create striped
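
With more devices, or devices of unequal size, the table line can be computed. The sketch below takes the smallest device as the limit and rounds the total length down to a multiple of #stripes times the chunk size, the same rounding the example script in the kernel's striped target documentation uses. The function name striped_table and its DEV:SECTORS argument format are made up for this example.

```shell
# usage: striped_table CHUNK_SECTORS DEV:SECTORS [DEV:SECTORS ...]
# Prints a device-mapper "striped" table line; all sizes are 512-byte sectors.
striped_table() {
    chunk=$1
    shift
    n=$#
    min=
    devs=
    for pair in "$@"; do
        dev=${pair%%:*}     # part before the colon: device path
        size=${pair##*:}    # part after the colon: size in sectors
        if [ -z "$min" ] || [ "$size" -lt "$min" ]; then
            min=$size
        fi
        devs="$devs $dev 0" # every device is used from offset 0
    done
    # Total length: smallest device times the number of stripes,
    # rounded down to a multiple of (chunk size * number of stripes).
    total=$(( min * n ))
    total=$(( total - total % (chunk * n) ))
    echo "0 $total striped $n $chunk$devs"
}
```

The sizes would come from blockdev --getsz, and the result can be piped to dmsetup, e.g. striped_table 32 "/dev/loop1:$(blockdev --getsz /dev/loop1)" "/dev/loop2:$(blockdev --getsz /dev/loop2)" | dmsetup create striped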

Mirror map

A mirror map is defined as: start length mirror log_type #logargs logarg1 ... logargN #devs device1 offset1 ... deviceN offsetN
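
As an illustration of this definition, the sketch below prints a mirror table line. It is a sketch under assumptions not taken from the text above: it uses the in-memory "core" log with a single log argument (the region size in sectors), and the function name mirror_table is made up for this example.

```shell
# usage: mirror_table LENGTH REGION_SIZE DEVICE [DEVICE ...]
# LENGTH and REGION_SIZE are in 512-byte sectors.
# Assumes the "core" log type with one log argument (the region size).
mirror_table() {
    length=$1
    region=$2
    shift 2
    n=$#
    devs=
    for dev in "$@"; do
        devs="$devs $dev 0"   # every device is used from offset 0
    done
    echo "0 $length mirror core 1 $region $n$devs"
}
```

For two loop devices of 204800 sectors each this could be used as: mirror_table 204800 1024 /dev/loop1 /dev/loop2 | dmsetup create mirrored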

Fixing RAID arrays

Once the broken disk has been replaced, add the new device to the software RAID configuration with:

mdadm --manage /dev/md0 --add /dev/sdd

This tells the array that a replacement disk is available, and it starts rebuilding onto /dev/sdd. During the rebuild the array is still vulnerable to the failure of one of the other disks. The current status can be monitored with:

mdadm -D /dev/md0

This will output something like:

/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Mar 10 11:16:54 2009
     Raid Level : raid5
     Array Size : 2930287488 (2794.54 GiB 3000.61 GB)
  Used Dev Size : 976762496 (931.51 GiB 1000.20 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Apr 23 11:19:00 2011
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 1% complete

           UUID : f23d7914:e23c3d1b:46cbea62:d9f44b76
         Events : 0.146916

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       32        1      active sync   /dev/sdc
       4       8       48        2      spare rebuilding   /dev/sdd
       3       8       64        3      active sync   /dev/sde

The new disk (/dev/sdd) has been added as a spare and is being rebuilt; the rebuild is 1% complete.
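
To follow the rebuild from a script, the percentage can be extracted from the "Rebuild Status" line. This is a minimal sketch that assumes the output layout shown above; the function name rebuild_percent is made up for this example.

```shell
# Print the rebuild percentage from "mdadm -D" output read on stdin.
# Prints nothing when there is no "Rebuild Status" line.
rebuild_percent() {
    sed -n 's/^ *Rebuild Status : *\([0-9][0-9]*\)% complete.*$/\1/p'
}
```

A simple polling loop could then look like: while pct=$(mdadm -D /dev/md0 | rebuild_percent); [ -n "$pct" ]; do echo "rebuild at ${pct}%"; sleep 60; done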

Further reading: https://raid.wiki.kernel.org/index.php/RAID_Recovery

Seagate NAS

This machine was not my own; it belonged to a friend of a colleague. So there are no exact specs, just a single disk from a RAID 1 set. The box crashed, so now let's recover the data. The software we need can be installed through apt-get:

mdadm
fuseext2

We hooked up the disk through a USB cradle. It was detected by the kernel and assigned /dev/sdb (see dmesg for the device name on your own system).

Using lsblk we tried to figure out what is what:

root@host:~# lsblk -f /dev/sdb
NAME                 FSTYPE            LABEL             MOUNTPOINT
sdb                                                      
├─sdb1                                                   
├─sdb2               linux_raid_member (none):0          
├─sdb3               ext2                                /media/blaat/73ade21c-3a3e-4f03-984d-910532c7b3c8
├─sdb4               linux_raid_member (none):2          
├─sdb5               linux_raid_member (none):3          
├─sdb6               linux_raid_member (none):4          
├─sdb7               linux_raid_member (none):5          
├─sdb8               linux_raid_member (none):6          
├─sdb9               linux_raid_member (none):7          
└─sdb10              linux_raid_member BA-0010753892FE:8

As you can see, sdb3 was auto-detected and mounted, but the real data is on sdb10. This was deduced from the partition sizes reported by parted -l.

Let's assemble a RAID 1 array from just this one disk:

root@host:~# mdadm --assemble --force /dev/md0 /dev/sdb10

Our new RAID device should show up in mdstat:

root@host:~# cat /proc/mdstat
Personalities : [raid1] 
md0 : active raid1 sdb10[1]
      1949260819 blocks super 1.2 [2/1] [_U]
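
The [2/1] in this output means that the array has two slots of which one is active, so it runs degraded, as expected with a single disk. A minimal sketch that lists degraded arrays from mdstat text, assuming the layout shown above (the function name degraded_arrays is made up for this example):

```shell
# Print the names of arrays in /proc/mdstat-style input (stdin) that have
# fewer active devices than slots, i.e. "[n/m]" with m < n.
degraded_arrays() {
    awk '
        /^md/ { name = $1 }                     # remember the current array
        match($0, /\[[0-9]+\/[0-9]+\]/) {       # find the "[n/m]" field
            split(substr($0, RSTART + 1, RLENGTH - 2), c, "/")
            if (c[2] + 0 < c[1] + 0) print name
        }
    '
}
```

It could be used as: degraded_arrays < /proc/mdstat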

If we try to mount /dev/md0 on /mnt, mount complains that md0 is an LVM2_member. So we have to bring up the LVM volumes first:

root@host:~# pvscan
root@host:~# vgscan
root@host:~# lvscan
root@host:~# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vg8/lv8
  LV Name                lv8
  VG Name                vg8
  LV UUID                mZEj8v-qjpm-kinw-qyJa-8g7T-qi8V-KfiENS
  LV Write Access        read/write
  LV Creation host, time , 
  LV Status              available
  # open                 1
  LV Size                1,82 TiB
  Current LE             475893
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           252:2

That looks like something we could use... but which file system is on this volume?

root@host:~# blkid /dev/vg8/lv8
/dev/vg8/lv8: UUID="9b39ef6d-8553-4d0e-ac82-f0789fca0bbf" TYPE="ext4"

So it is an ext4 file system, and it thus might have a journal on board.
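
In the lvdisplay output above the volume was already "available". Had it been inactive, it would have needed activating before blkid or fsck could read it. The sketch below lists such volumes; it assumes that lvdisplay marks them with an "LV Status" other than "available" (as far as I know it prints "NOT available"), and the function name inactive_lvs is made up for this example.

```shell
# Print the LV Path of every logical volume in "lvdisplay" output (stdin)
# whose "LV Status" is not "available".
inactive_lvs() {
    awk '
        $1 == "LV" && $2 == "Path" { path = $3 }
        $1 == "LV" && $2 == "Status" && $3 != "available" { print path }
    '
}
```

Each listed volume could then be activated with lvchange: lvdisplay | inactive_lvs | while read -r lv; do lvchange -ay "$lv"; done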

Since the disk comes from a crashed box, we first want to recover what can be recovered:

root@host:~# fsck.ext4 /dev/vg8/lv8

Now that everything is fixed (I answered yes to all questions), let's mount it:

root@host:~# mount -t ext4 /dev/vg8/lv8 /mnt/
mount: wrong fs type, bad option, bad superblock on /dev/mapper/vg8-lv8,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

Oops... dmesg reports: EXT4-fs (dm-2): bad block size 65536. Google knows what that means: the file system uses 64k blocks, and the kernel's ext4 driver refuses block sizes larger than the page size (4k here). So FUSE to the rescue:

root@host:~# fuseext2 -o ro -o sync_read /dev/vg8/lv8 /mnt/

Note that we have mounted the file system read-only! Now we can finally access the files again.
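
The failed kernel mount could have been predicted by comparing the file system's block size with the page size of the running kernel. A minimal sketch, assuming the "Block size:" line that dumpe2fs -h prints; the function names ext_block_size and kernel_can_mount are made up for this example.

```shell
# Print the block size in bytes from "dumpe2fs -h" output read on stdin.
ext_block_size() {
    sed -n 's/^Block size: *//p'
}

# usage: kernel_can_mount BLOCK_SIZE PAGE_SIZE   (both in bytes)
# Succeeds if the block size does not exceed the page size.
kernel_can_mount() {
    [ "$1" -le "$2" ]
}
```

It could be used as: bs=$(dumpe2fs -h /dev/vg8/lv8 2>/dev/null | ext_block_size); kernel_can_mount "$bs" "$(getconf PAGESIZE)" || echo "block size $bs is too large, use fuseext2"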

After rescuing the files we need to tear everything down again:

fusermount -u /mnt
lvchange -an /dev/vg8/lv8
mdadm -S /dev/md0
umount /dev/sdb3
udisks --detach /dev/sdb