Saturday, March 30, 2013

RAID configuration with mdadm

In this article, I'll explain how to configure a RAID array with mdadm on a Linux box running Debian squeeze 6.0.5.

There are many considerations to think about before starting:

  • RAID is not a backup system, it doesn't substitute your backup policies, so you'll have to back up your data anyway.
  • There are many RAID configurations and, in addition, you can combine them. Read about RAID, then make a choice.
  • The mdadm utility lets you configure a software RAID, so you won't need a special motherboard, as is the case with hardware or firmware RAIDs.
  • The goal is to create a new virtual device file that, when used, internally uses the device file of each partition belonging to the RAID. This is transparent to the user and to system commands like mkfs, mount, etc.
  • The easiest way to begin is with two or more empty partitions. Configuring RAID on a pre-existing filesystem (let's say the one in /dev/sdb1) means creating the RAID with the rest of the partitions, copying the data from the original filesystem to the new RAID filesystem (for instance /dev/md1) and, finally, adding the original partition (/dev/sdb1) to the array (/dev/md1); see the sketch after this list.
  • Configuring RAID for the root partition requires at least one reboot (while the system is running, you can't unmount the original filesystem and mount the RAID filesystem in its place).
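
As a sketch of that migration path, assume a two-disk RAID 1 where /dev/sdb1 holds the existing filesystem (mounted, say, on /data), /dev/sdc1 is the empty partition and /dev/md1 will be the new array (all of these names are just for illustration):

# create the array degraded, leaving the old partition out for now
mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/sdc1
mkfs.ext3 /dev/md1
mkdir /mnt/newraid
mount /dev/md1 /mnt/newraid
# copy the data from the old filesystem, then swap the mounts
cp -dpRx /data/. /mnt/newraid/
umount /mnt/newraid
umount /data
mount /dev/md1 /data        # update /etc/fstab accordingly
# finally, the old partition joins the mirror and gets synchronized
mdadm --add /dev/md1 /dev/sdb1
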
There are a couple of pre-requisites, too:
  • All partitions must be created beforehand so that their device files are available.
  • Before configuring a RAID for an existing filesystem, back up its data, just in case.
Example 1: configuring RAID 5 on 3 new disks

First of all, check that all 3 devices have a partition you want to use for the RAID 5. In this example, I'll use the first partition of the disks sdb, sdc and sdd.

fdisk -l /dev/sdb /dev/sdc /dev/sdd

Disk /dev/sdb: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x1739a12e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         130     1043201   83  Linux

Disk /dev/sdc: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xcdf40006

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1         130     1043201   83  Linux

Disk /dev/sdd: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x257d9f1f

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1         130     1043201   83  Linux

When possible, use disks with the same part number or model number: the array's performance and usable size are limited by its slowest and smallest member.
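
A quick way to check the model of each disk (assuming SATA/SCSI disks exposed through sysfs; adjust the device names to yours):

cat /sys/block/sd[b-d]/device/model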

Then activate the kernel module for the RAID we want to configure:

modprobe raid5
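
You can check that the corresponding personality has been registered (note that the module may show up as raid456, which provides RAID levels 4, 5 and 6):

lsmod | grep raid
head -1 /proc/mdstat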

And install the mdadm package:

apt-get install mdadm

When installing it, two questions will be asked. The first one asks about the MD (Multi Disk) arrays needed to bring up the root filesystem. Leave it blank, since we're not configuring a RAID for the root filesystem, but for another one.

The second question asks for the rest of MD arrays to be automatically started during the boot process. Answer 'yes'.
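
If you want to change these answers later, you can re-run the package's configuration questions with:

dpkg-reconfigure mdadm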

Now you're ready to create the RAID 5. Use the command below:

mdadm --create /dev/md5 --force --level=5 --raid-disks=3 /dev/sd[b-d]1

mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md5 started.

Where:
  • /dev/md5 is whatever name you want for the new device file.
  • force tells mdadm not to leave one of the devices as a spare (read mdadm's man page for more information on creating a RAID 5).
  • level is the number corresponding to the RAID level you want.
  • raid-disks is the number of disks in the array (it must equal the number of devices listed plus any 'missing' keywords used for those not listed).
  • /dev/sd[b-d]1 are the devices to use (write 'missing' in place of any device you'll add later; see the example after this list).
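
For illustration only (don't run it if you already created the array above), this hypothetical variant would create the same array in a degraded state, with one member to be added later:

mdadm --create /dev/md5 --level=5 --raid-disks=3 missing /dev/sdc1 /dev/sdd1
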
Check that everything's working with:

cat /proc/mdstat

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md5 : active (auto-read-only) raid5 sdd1[2] sdc1[1] sdb1[0]
      2085888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      resync=PENDING
      
unused devices: <none>

Look at the UUU. If there were a problem, you'd see something like UU_ or _UU or U_U, depending on which disk is failing.
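
For more detail on the state of each member, you can also run:

mdadm --detail /dev/md5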

A new /dev/md5 device has now been created and can be used like any other block device in command-line parameters. So, let's create the filesystem and the mount point:

mkfs.ext3 /dev/md5
mkdir /data

Edit the fstab to add the line shown below:

tail -1 /etc/fstab

/dev/md5 /data ext3 defaults 0 0

And mount the new filesystem, which should look like this:

mount /data
df -h /dev/md5

Filesystem            Size  Used Avail Use% Mounted on
/dev/md5              2.0G   36M  1.9G   2% /data

Notice that only 2 GB of the 3 GB are available: in a 3-disk RAID 5, one block in every three holds parity and the other two hold data, so the usable capacity is (3 - 1) × 1 GB = 2 GB.

Finally, check the RAID again (the resync has already finished):

cat /proc/mdstat

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md5 : active raid5 sdd1[2] sdc1[1] sdb1[0]
      2085888 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      
unused devices: <none>
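
Optionally, you may want to record the new array in /etc/mdadm/mdadm.conf so it's always assembled with the same name at boot (a sketch; check the file first, since appending blindly can create duplicate ARRAY lines):

mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u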

Example 2: configuring RAID 1 on system disk

To configure RAID 1 or mirroring on the system disk where the root filesystem is, while it's mounted and alive, you have to follow these general steps:
  1. Prepare the second disk (/dev/sdb) with the same layout and MBR (master boot record) as the first (/dev/sda).
  2. Create a RAID 1 with only the second disk (it will appear as degraded), create the filesystem for the RAID and mount it on a temporary mount point.
  3. Prepare the fstab to mount the RAID filesystem and GRUB to look for the kernel at this filesystem (instead of /dev/sda1 or /dev/sdb1).
  4. Copy all the data from the original root partition (/dev/sda1) to the new filesystem (/dev/md1 for instance).
  5. Once finished, reboot and try the new configuration.
  6. Add the root partition to the RAID 1. The synchronization process will start and may need hours to finish, depending on the size of the partition.

Step 1

Copy the partition layout from disk sda to disk sdb:

sfdisk -d /dev/sda | sfdisk --force /dev/sdb
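
You can verify that both disks now have the same layout by listing their partition tables side by side:

fdisk -l /dev/sda /dev/sdb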

Install mdadm and activate the kernel module raid1 (see the example above for more information). However, in this case, answer 'all' when asked for the MD (Multi Disk) arrays needed to bring up the root filesystem.

Step 2

Create a new RAID device, let's say /dev/md1, with the first real device missing (you'll add it later), as shown below:

mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/sdb1

If you're using GRUB, answer 'yes' when mdadm warns that the metadata at the start of the array may not suit the boot loader.

The array will start in a degraded state. You'll be notified by an email sent to the root account (by default), and you can confirm it by looking at the /proc/mdstat virtual file.

cat /proc/mdstat


Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active (auto-read-only) raid1 sdb1[1]
      3966964 blocks super 1.2 [2/1] [_U]
      
unused devices: <none>

Now create a temporary mount point, make the filesystem and mount it:

mkdir /mnt/temp
mkfs.ext3 /dev/md1
mount /dev/md1 /mnt/temp

Repeat this step for each filesystem you want to configure RAID for, changing the mount point accordingly, then continue.
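
For instance, if you also had a separate /home on /dev/sda6 (a hypothetical partition, purely for illustration), the repetition would look like this:

mdadm --create /dev/md2 --level=1 --raid-disks=2 missing /dev/sdb6
mkfs.ext3 /dev/md2
mkdir /mnt/temp_home
mount /dev/md2 /mnt/temp_home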

To create a RAID 1 for the swap space, check the partition number corresponding to it:

swapon -s

Filename Type Size Used Priority
/dev/sda5   partition 223224 0 -1

Then create the RAID, with a missing device again:

mdadm --create /dev/md/swap --level=1 --raid-disks=2 missing /dev/sdb5

And make the new device a swap space (you'll add it later in the /etc/fstab):

mkswap /dev/md/swap

Step 3

Prepare the /etc/fstab to mount only RAID devices instead of disk partition devices. Yours may look like this:

grep md /etc/fstab

/dev/md1  / ext3 errors=remount-ro 0 1
/dev/md/swap none swap sw 0 0

Now it's time to prepare GRUB so the boot process will look for the kernel in the RAID instead of the disk partition. This will make your system bootable from either disk.

This part depends on the version of GRUB you're using. In my example, it's GRUB 2 (version 1.98).

First, copy the file 40_custom to a file whose name begins with 07, 08 or 09, so the new menu entry will be shown before the rest:

cp /etc/grub.d/40_custom /etc/grub.d/09_raid

Then add the following lines to it:

menuentry "Linux Debian squeeze - RAID" {
  insmod raid
  insmod mdraid
  insmod part_msdos
  insmod ext2
  set root='(md/1)'
  echo "Loading Linux kernel..."
  linux /vmlinuz root=/dev/md1 ro quiet
  echo "Loading initial ramdisk..."
  initrd /initrd.img
}

This entry makes the system go to MD array number 1 (/dev/md/1) for the kernel and the information in /boot.

Second, uncomment the line GRUB_DISABLE_LINUX_UUID=true in the file /etc/default/grub to disable the use of UUIDs.
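
One way to make that edit non-interactively (assuming the line is shipped commented out exactly as #GRUB_DISABLE_LINUX_UUID=true):

sed -i 's/^#GRUB_DISABLE_LINUX_UUID=true/GRUB_DISABLE_LINUX_UUID=true/' /etc/default/grub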

Third, update the file /boot/grub/grub.cfg:

update-grub

And, last but not least, install GRUB on both disks:

grub-install /dev/sda
grub-install /dev/sdb

[Screenshot: GRUB boot menu]

Step 4

Copy all the data from the original filesystem (/dev/sda1 mounted on /) to the new one (/dev/md1 mounted on /mnt/temp):

cp -dpRx / /mnt/temp

This may take several minutes, so stay calm and wait. Once you get the shell prompt back, check that the used space in both filesystems matches:

df -h /dev/sda1 /dev/md1

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             3.8G  628M  3.0G  18% /
/dev/md1              3.8G  628M  3.0G  18% /mnt/temp

Repeat the copy process for each filesystem you are configuring RAID for.

Step 5

Now reboot your system.

The device files in /dev/md are simply symbolic links to the real MD devices in /dev:

ls -l /dev/md

total 0
lrwxrwxrwx 1 root root 6 Mar 25 19:58 1 -> ../md1
lrwxrwxrwx 1 root root 8 Mar 25 19:58 swap -> ../md127

Check that the mounted filesystems are mirrored:

mount | grep md

/dev/md1 on / type ext3 (rw,errors=remount-ro)

Check that the swap space is mirrored:

swapon -s

Filename Type Size Used Priority
/dev/md127  partition 223212 0 -1

Step 6

After rebooting, the disk partitions in /dev/sda are not in use anymore and can be added to the corresponding RAID device.

Before doing so, change the partition type to "Linux raid autodetect" (type fd) or copy back the layout from sdb to sda (as we did in step 1):

sfdisk -d /dev/sdb | sfdisk --force /dev/sda

Let's begin with the root partition: 

mdadm --add /dev/md1 /dev/sda1

Now the swap space:

mdadm --add /dev/md/swap /dev/sda5

Finally, check the RAID status. You may see it synchronizing (this may take a very long time, probably hours):

cat /proc/mdstat


Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid1 sda1[2] sdb1[1]
      3966964 blocks super 1.2 [2/1] [_U]
      [=========>...........]  recovery = 46.4% (1842560/3966964) finish=2.5min speed=13987K/sec
      
md127 : active raid1 sda5[2] sdb5[1]
      223220 blocks super 1.2 [2/1] [_U]
      resync=DELAYED
      
unused devices: <none>
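
If you want to follow the progress without re-running the command, you can have it refresh automatically:

watch -n 5 cat /proc/mdstat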

Now that the MD array on the system disk is detected, the file /boot/grub/grub.cfg is already configured to boot from the RAID (see its 10_linux section), so the entry you added (09_raid) isn't necessary anymore.

You can remove the file /etc/grub.d/09_raid or simply make it not executable:

rm /etc/grub.d/09_raid
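
Or, if you prefer to keep the file around, make it not executable instead:

chmod -x /etc/grub.d/09_raid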

And reconfigure all the GRUB boot process:

update-grub
grub-install /dev/sda
grub-install /dev/sdb

Monitoring the MD arrays

You can stop an MD array for maintenance with:

mdadm --stop /dev/md5

Then, you'll be able to remove the superblock from its member partitions (otherwise, the system will detect the array at startup):

mdadm --zero-superblock /dev/sdb1
mdadm --zero-superblock /dev/sdc1
mdadm --zero-superblock /dev/sdd1

You can list the MD arrays found on the system's disks:

mdadm --examine --scan
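
You can also examine the superblock of a single member partition, for instance:

mdadm --examine /dev/sdb1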

As for the configuration file /etc/mdadm/mdadm.conf, it looks like this by default:

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on Fri, 29 Mar 2013 13:45:51 +0100
# by mkconf 3.1.4-1+8efb9d1+squeeze1

There you can set the e-mail address (MAILADDR) where alerts about the MD arrays are sent, the devices to scan (DEVICE) or the definitions of existing MD arrays (ARRAY lines). Refer to the man page for further information.
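
To check that the mail alerts actually reach that address, mdadm can send a test message for each array it monitors (a one-off check; the monitoring daemon itself is normally started at boot by the Debian init script):

mdadm --monitor --scan --oneshot --test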