Thursday, October 1, 2009

Configuring Software RAID1 on Fedora Core using Disk Druid during system install

RAID1, or mirroring, uses two hard drives and duplicates one drive exactly onto the other. This provides hardware redundancy: if one drive fails, the other can continue to operate independently. Hardware RAID is provided by the controller, which presents a single logical drive to the operating system, so the RAID management is transparent.

Manufacturers of such RAID controllers include Adaptec, LSI (MegaRAID) and 3ware; the latter provides drivers for all operating systems. Be aware of the performance overhead involved with software RAID.

Note: most of the onboard SATA RAID controllers are not real hardware RAID, but just provide an extension to the operating system. A driver must be installed for proper use of such controllers. Also, Dell's PowerEdge 1850 and 1950 have the MegaRAID controller, which requires a driver to work properly under Linux.

If you tried to configure software RAID and did not follow the steps below, there is a good chance that you're not protected at all (did you ever test?). Furthermore, you might have bumped into many problems during the installation, such as the disk not booting after the installation, GRUB error messages during boot, a system that boots only when the primary disk is online but not when only the secondary is, RAID not working as you expect, and many more.

Configuring software RAID during the system install using Disk Druid is not a trivial procedure. This document describes the steps you need to take in order for such a configuration to work.

While writing this guide, I used two 8GB SATA hard drives: primary /dev/sda and secondary /dev/sdb. The BIOS was configured with the onboard SATA RAID disabled, and both drives were controlled directly by the BIOS, so the operating system sees two separate hard drives.
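If you want to confirm that both drives are detected as separate devices before partitioning, a quick check from a shell (for example the installer's second console or the rescue mode shell, if available) might look like the following. The device names are the ones from my setup:

sh-3.00# fdisk -l /dev/sda /dev/sdb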



0. To sum it up...
The following steps should be followed to achieve the goal:

1. Partition and configure RAID using Disk Druid
2. Build the RAID arrays
3. Configure GRUB
4. Test
Additional important steps:

5. Check RAID status and set up RAID monitoring
6. Recover from disk failure (god forbid)


1. Partition and configure RAID using Disk Druid
During the installation of Fedora, you'll be asked whether to partition automatically or to partition manually using Disk Druid. No matter which you choose, you should delete all existing partitions and start with clean drives (this will delete all your existing data, so be warned):



There are 3 partitions you should create: /boot, swap and / (also referred to as root). Our goal is to have both the root and /boot partitions on the RAID1. It is unwise to put the swap on the software RAID, as it would cause unnecessary overhead.
Important: The /boot partition should be the first one on the disk, i.e. start at cylinder 1. In addition, make sure you set "Force to be a primary partition" on each partition you create (unless you know what you're doing). A /boot partition of 100MB should be enough for most configurations.

Let's start by creating the /boot partition. Click on the RAID button and choose "Create a software RAID partition":



For the File System Type choose "Software RAID", select the first drive and set a fixed size of 100MB:



Repeat the same for the second drive, resulting in two software RAID partitions of 100MB, one on each drive. Those partitions are now ready for RAID device and mount point creation:



Click on the RAID button and now choose "Create a RAID device". For the Mount Point choose "/boot", RAID Level should be RAID1, on device md0, as shown in the following figure:
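The screenshot is not reproduced here. For reference only, the equivalent operation done by hand from a shell would be roughly the command below; Disk Druid performs this for you during the install, so you do not need to run it:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1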



Now create the swap partitions. The total swap size should at least match the size of the RAM. Swap should not reside on the software RAID, so all you need to do is click on New and create a swap partition on each hard drive. The result will be two swap partitions, one on each drive:
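Once the installation is complete, you can verify that both swap partitions are active. A quick check (not part of the Disk Druid step itself):

[root@raidtest ~]# swapon -s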



Now, after creating /boot and the swap partitions, allocate the remaining free space as md1 and create the root partition on it. You should now be familiar with the steps. The final result of the partitioning should be similar to the following figure:
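Since the screenshot is not reproduced here, the target layout on each of the two drives is roughly as follows (sizes taken from my 8GB test drives, matching the fdisk output shown later in Step 6):

/dev/sda1, /dev/sdb1   ~100MB   Linux raid autodetect (fd)   -> /dev/md0, mounted on /boot
/dev/sda2, /dev/sdb2   ~1GB     Linux swap (82)
/dev/sda3, /dev/sdb3   rest     Linux raid autodetect (fd)   -> /dev/md1, mounted on /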



Complete the Fedora installation. When the system reboots, it will probably halt ;( prior to loading GRUB. The error message may vary: file system errors, a kernel panic, or GRUB error 17.

Don't be frustrated (yet) as there are some more actions you need to take.



2. Build the RAID arrays
Boot from the first installation CD, but instead of starting the installation, type "linux rescue" at the boot prompt to enter the command-line rescue mode. At the command prompt, set a new root and then build the RAID arrays:

sh-3.00# chroot /mnt/sysimage
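If the rescue environment did not find your installation or did not assemble the RAID arrays automatically, you can usually assemble them by scanning for their superblocks (run this before the chroot if /mnt/sysimage was not mounted). A sketch:

sh-3.00# mdadm --assemble --scan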
RAID status is reported through the file /proc/mdstat. Let's view it and see how our RAID is performing:

[root@raidtest ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb3[1] sda3[0]
7060480 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
If you see similar results then the RAID configuration is correct. The [UU] means that both hard drives are up. But although the RAID is configured, it may not be performing correctly yet, because the partitions on the first drive have not been "hot-added". Run the following commands to hot-add them and rebuild the arrays:

[root@raidtest ~]# mdadm /dev/md0 --add /dev/sda1
[root@raidtest ~]# mdadm /dev/md1 --add /dev/sda3
During the rebuild you can cat /proc/mdstat to check the current progress and status. This process might take some time, depending on the size of the partitions.
Important: Wait until the process is done before you continue to the next step.
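If you prefer a continuously updating view of the rebuild progress, and the watch utility is available in your environment, something like this works (watch simply re-runs the command every few seconds):

[root@raidtest ~]# watch -n 5 cat /proc/mdstat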



3. Configure GRUB
The first drive (/dev/sda on my system) is not yet bootable. In the following steps we complete the GRUB boot loader installation on both drives and make /boot bootable.

Continue working at the rescue-mode command prompt, and load the GRUB shell:

sh-3.00# grub
In the GRUB shell, type the following commands to re-install the boot loader on both drives, so that when (not if - when!) one of the drives fails or crashes, your system will still boot. You might need to substitute the hard drive locations to match your system configuration:

grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)

grub> device (hd1) /dev/sdb
grub> root (hd1,0)
grub> setup (hd1)
Quit GRUB and boot from the hard disk. The system should load. Don't skip the testing stage - make sure everything is REALLY working properly.
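For completeness, the sequence to leave the GRUB shell and the rescue environment typically looks like this (the first exit leaves the chroot, and leaving the rescue shell usually reboots the machine):

grub> quit
sh-3.00# exit
sh-3.00# exit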



4. Test
The best way to test is to physically unplug each drive in turn, and see if the system boots with only the other drive connected (make sure you power down the system before unplugging a drive).
Important: Testing causes your RAID to become degraded. This means that after you reconnect the drive you must hot-add it back to the array using the mdadm /dev/mdx --add /dev/sdxx command.
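If you also want to exercise the degraded path purely in software (in addition to the physical unplug test), mdadm can mark a member as faulty and then remove it. This is only a sketch: adjust the device names to your system, and remember to hot-add the partition back afterwards:

[root@raidtest ~]# mdadm /dev/md0 --fail /dev/sda1
[root@raidtest ~]# mdadm /dev/md0 --remove /dev/sda1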


If the test completed successfully and your system boots from each drive, then you're basically done. Still, I suggest you continue with the next procedures to learn more, in case you ever face a major crisis (touch wood).



5. Check RAID status and set up RAID monitoring
There are several ways to check the current status of your RAID; the best is using the mdadm --detail command. In the following example you can see that the RAID is degraded: only /dev/sdb1 is active, while /dev/sda1 is missing from the array.

[root@raidtest ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.01
Creation Time : Sun Jul 22 08:25:21 2007
Raid Level : raid1
Array Size : 104320 (101.88 MiB 106.82 MB)
Device Size : 104320 (101.88 MiB 106.82 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Aug 1 15:08:24 2007
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

UUID : 08ed38e5:7ffca26e:f5ec53fc:e5d1983e
Events : 0.1423

Number Major Minor RaidDevice State
0 0 0 - removed
1 8 17 1 active sync /dev/sdb1

Another way of checking the RAID is to look at the system log:

[root@raidtest ~]# tail -n 50 /var/log/messages
Or:


[root@raidtest ~]# dmesg
And as always, you can check the content of the /proc/mdstat file:


[root@raidtest ~]# cat /proc/mdstat
Now we'll set up a monitoring daemon that will send an email alert when there is a problem with the RAID:

[root@raidtest ~]# mdadm --monitor --scan --mail=you@domain.com --delay=3600 --daemonise /dev/md0 /dev/md1
To test that emails are working, add a -t argument to the above line, and a test email will be sent. Don't forget to kill the test process you just created. It is recommended to put this line in /etc/rc.local so it will automatically start after the system boots.
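For example, the end of /etc/rc.local could look roughly like this (the email address is of course a placeholder to replace with your own):

# Start mdadm in monitor mode at boot, polling every 3600 seconds and mailing alerts
mdadm --monitor --scan --mail=you@domain.com --delay=3600 --daemonise /dev/md0 /dev/md1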



6. Recover from disk failure
When you encounter a failure in the RAID, the first thing I would suggest is that you DON'T PANIC! You should still be able to access your data and even boot, but the next thing you should do is back up all the data. It happened to me once that, after a disk failure, I accidentally deleted the good disk as well... Luckily I didn't panic and had made a complete backup before taking any other action :)

So, after you've had a cold glass of water and backed up all the data, you need to identify the faulty disk by checking the content of the /proc/mdstat file. In my example below you can see that /dev/sda3 is no longer a member of the RAID, and obviously md1 is degraded:

[root@raidtest ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb3[1]
7060480 blocks [2/1] [_U]

md0 : active raid1 sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
/dev/sda is the SATA hard drive connected to the first SATA controller. I physically removed it from the system and replaced it with a new one. Note that /dev/sda1, which resides on the same hard drive, did not fail, but since I am replacing the faulty drive I will have to rebuild both arrays.
When you plug in a new hard drive, you don't have to worry about matching the size of the old disk exactly - just make sure it is at least as large as the one already installed. The extra free space on the new drive will simply not be a member of the RAID.

After replacing the faulty disk, the partition table has to be re-created using fdisk, based on the exact partition table of the good disk. Here, /dev/sda is a completely new 250GB hard drive.

[root@raidtest ~]# fdisk -l

Disk /dev/sda: 250.0 GB, 250058268160 bytes
255 heads, 63 sectors/track, 30401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

Disk /dev/sdb: 8.0 GB, 8589934592 bytes
255 heads, 63 sectors/track, 1019 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 104391 fd Linux raid autodetect
/dev/sdb2 14 140 1020127+ 82 Linux swap / Solaris
/dev/sdb3 141 1019 7060567+ fd Linux raid autodetect

Disk /dev/md0: 106 MB, 106823680 bytes
2 heads, 4 sectors/track, 26080 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/md1: 7229 MB, 7229931520 bytes
2 heads, 4 sectors/track, 1765120 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md1 doesn't contain a valid partition table

Before you continue - are you sure everything is backed up? If so, load fdisk with the new disk as a parameter. My inputs follow each prompt; you will have to adjust them to match your own system.

[root@raidtest ~]# fdisk /dev/sda

The number of cylinders for this disk is set to 30401.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-30401, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-30401, default 30401): 13

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (14-30401, default 14): 14
Last cylinder or +size or +sizeM or +sizeK (14-30401, default 30401): 140

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (141-30401, default 141): 141
Last cylinder or +size or +sizeM or +sizeK (141-30401, default 30401): 1019

Command (m for help): a
Partition number (1-4): 1

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): t
Partition number (1-4): 2
Hex code (type L to list codes): 82
Changed system type of partition 2 to 82 (Linux swap / Solaris)

Command (m for help): t
Partition number (1-4): 3
Hex code (type L to list codes): fd
Changed system type of partition 3 to fd (Linux raid autodetect)

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.


Here is an explanation of the procedure:

• Create 3 primary partitions using the (n) command - the sizes are based on the info from the good and working drive.
• Set partition #1 as bootable using the (a) command.
• Change the partitions' system IDs using the (t) command - partition #1 and #3 to type fd, and partition #2 to type 82.
• Save the changes to the partition table using the (w) command.
EDIT (2008-01-15): Reinstall the GRUB boot loader on the new (replacement) drive as described in Step 3 above.
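As an alternative to re-creating the partitions by hand, the partition table can usually be copied from the healthy disk in one shot with sfdisk. This is only a sketch: it assumes /dev/sdb is the good drive and /dev/sda is the brand-new one, and it will overwrite /dev/sda's partition table, so double-check the device names first:

[root@raidtest ~]# sfdisk -d /dev/sdb | sfdisk /dev/sda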

Now the new hard drive is ready to participate in the RAID. We just need to hot-add its partitions to the arrays using the mdadm /dev/mdx --add /dev/sdxx command:

[root@raidtest ~]# mdadm /dev/md0 --add /dev/sda1
[root@raidtest ~]# mdadm /dev/md1 --add /dev/sda3
and check the content of the /proc/mdstat file to make sure everything is working properly.
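Once the rebuild has finished, both arrays should report a clean state with all members active again. A quick way to confirm is the same command used in Step 5:

[root@raidtest ~]# mdadm --detail /dev/md0
[root@raidtest ~]# mdadm --detail /dev/md1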


If this page has helped you and you would like to contribute to this web site, please donate. Small amounts like $5 are helpful and will be gratefully accepted. Thank you!
(Source: http://www.ping.co.il/node/1/)
