Replace a Failing Drive in a RAID6 Array Using mdadm
Most people who run some sort of home storage server will probably (read: hopefully) be running some type of RAID array.
It is also likely that at some point one or more of the drives in your array will start to degrade. That could mean read errors, bad sectors, or, worse, complete hardware failure. In that case you will have to replace the faulty drive with a new drive of equal or larger size.
I was experiencing read errors on a new 4 TB Western Digital Red NAS drive, one of six in a RAID6 array on a machine running Ubuntu 13.10, with mdadm acting as the software RAID controller.
Below are the steps I took to replace the failing drive in that mdadm-managed RAID6 array.
Identify the Problem
Running smartctl against the drive in question allowed me to confirm that it was indeed having read errors.
$ sudo smartctl -a /dev/sdg
This produces the following results:
=== START OF INFORMATION SECTION ===
Device Model: WDC WD40EFRX-68WT0N0
LU WWN Device Id: 5 0014ee 2092bd325
Firmware Version: 80.00A80
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sat May 31 13:22:51 2014 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: (55920) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 559) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 199 051 Pre-fail Always - 439
3 Spin_Up_Time 0x0027 188 188 021 Pre-fail Always - 7591
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 61
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2705
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 61
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 58
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 2995
194 Temperature_Celsius 0x0022 117 102 000 Old_age Always - 35
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 11
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 9
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 12
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 2704 266440
# 2 Conveyance offline Completed: read failure 90% 2648 266440
# 3 Extended offline Completed: read failure 90% 2646 266440
# 4 Conveyance offline Completed: read failure 90% 2480 266440
# 5 Extended offline Completed: read failure 90% 2478 266440
# 6 Conveyance offline Completed: read failure 90% 2312 266440
# 7 Extended offline Completed: read failure 90% 2310 266440
# 8 Conveyance offline Completed: read failure 90% 2144 266440
# 9 Extended offline Completed: read failure 90% 2142 266440
#10 Extended offline Completed without error 00% 1985 -
#11 Extended offline Completed without error 00% 1818 -
#12 Extended offline Completed without error 00% 1650 -
#13 Extended offline Completed without error 00% 1482 -
#14 Extended offline Completed without error 00% 1314 -
#15 Extended offline Completed without error 00% 1146 -
#16 Extended offline Completed without error 00% 979 -
#17 Extended offline Completed without error 00% 811 -
#18 Extended offline Completed without error 00% 644 -
#19 Conveyance offline Completed: read failure 90% 468 269312
#20 Extended offline Completed: read failure 90% 466 269312
#21 Short offline Completed: read failure 90% 312 269312
3 of 12 failed self-tests are outdated by newer successful extended offline self-test #10
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
You can see from the self-test log (and from the non-zero Current_Pending_Sector and Offline_Uncorrectable attributes) that there are a few read errors. I figured I would replace the drive now, while still well within the warranty period, and avoid a headache later.
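If you want to reproduce these findings on your own hardware, you can kick off a SMART self-test and then read back the self-test log; the device name here is from my system, so substitute your own:
$ sudo smartctl -t long /dev/sdg
$ sudo smartctl -l selftest /dev/sdg
An extended test on a 4 TB drive takes several hours (smartctl prints an estimate when you start it); a short test (-t short) finishes in a couple of minutes and is often enough to surface a pending read failure.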
The Array
Before you begin, it is a good idea to get a bird's-eye view of what your array looks like.
This can easily be accomplished (if you are already using mdadm as your RAID controller) by running:
$ sudo mdadm --detail /dev/md0
This should return results similar to:
/dev/md0:
Version : 1.2
Creation Time : Sat Feb 8 00:12:06 2014
Raid Level : raid6
Array Size : 15627540480 (14903.58 GiB 16002.60 GB)
Used Dev Size : 3906885120 (3725.90 GiB 4000.65 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Update Time : Sat May 31 13:23:12 2014
State : clean
Active Devices : 6
Working Devices : 6
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : Sol:0 (local to host Sol)
UUID : 5fd6fcc6:d2300ce9:7d7184be:4b5e6da3
Events : 220
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
5 8 97 5 active sync /dev/sdg1
You can see that while the array state is clean and functioning properly, I still chose to replace the drive. Make a note of the drive number and device name.
In this case I will be replacing device 5, /dev/sdg (partition /dev/sdg1), in the array /dev/md0.
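A convenient way to record which physical drive /dev/sdg actually is, before touching anything, is to look up its serial number (the output is of course specific to my hardware):
$ sudo smartctl -i /dev/sdg | grep -i serial
$ ls -l /dev/disk/by-id/ | grep sdg
The /dev/disk/by-id symlinks embed the model and serial number, which matches what is printed on the drive's label, so this makes finding the right drive inside the case much easier later on.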
Failing/Removing the Drive
First, we need to mark the drive as failed within the array. This can be done with:
$ sudo mdadm --manage /dev/md0 --fail /dev/sdg1
This tells mdadm to mark the partition /dev/sdg1 as failed in the array /dev/md0. It will return the following:
mdadm: set /dev/sdg1 faulty in /dev/md0
We can confirm that the drive has been set as faulty with:
$ sudo mdadm --detail /dev/md0
Which returns:
/dev/md0:
Version : 1.2
Creation Time : Sat Feb 8 00:12:06 2014
Raid Level : raid6
Array Size : 15627540480 (14903.58 GiB 16002.60 GB)
Used Dev Size : 3906885120 (3725.90 GiB 4000.65 GB)
Raid Devices : 6
Total Devices : 6
Persistence : Superblock is persistent
Update Time : Sat May 31 13:25:24 2014
State : clean, degraded
Active Devices : 5
Working Devices : 5
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : Sol:0 (local to host Sol)
UUID : 5fd6fcc6:d2300ce9:7d7184be:4b5e6da3
Events : 222
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
4 8 81 4 active sync /dev/sdf1
5 0 0 5 removed
5 8 97 - faulty spare /dev/sdg1
Here we can see that the drive has indeed been marked as faulty.
Next we need to remove the failed drive from within the array. This can be done with:
$ sudo mdadm --manage /dev/md0 --remove /dev/sdg1
Which returns:
mdadm: hot removed /dev/sdg1 from /dev/md0
We can confirm that the drive has been removed from the active array by running:
$ cat /proc/mdstat
Which returns:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdb1[0] sdd1[2] sdf1[4] sde1[3] sdc1[1]
15627540480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUUUU_]
unused devices: <none>
You can see that sdg1 is no longer part of the active array.
Shutdown and Replace
It is now safe for you to shut down the machine and physically replace the drive in question.
Having noted the drive's serial number in the earlier steps (for example via the /dev/disk/by-id lookup above) makes identifying the physical drive simple.
Partition and Add
Now that the new drive is in the system, we can go ahead and boot the machine.
Note: Upon boot you may encounter an error about a degraded RAID array. It will recommend adding boot flags to the kernel boot parameters and drop you to an initramfs prompt, but I found that if you catch the prompt quickly enough, you can type y and hit Enter to force the array to assemble and continue booting.
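If you would rather not have to catch that prompt, Ubuntu's initramfs can be told to assemble degraded arrays automatically. The exact mechanism varies by release, so treat this as a sketch based on what Ubuntu documented around the 13.10 era (the bootdegraded=true kernel parameter and the BOOT_DEGRADED setting):
$ echo "BOOT_DEGRADED=true" | sudo tee /etc/initramfs-tools/conf.d/mdadm
$ sudo update-initramfs -u
Alternatively, appending bootdegraded=true to the kernel command line from the GRUB menu should allow a single degraded boot without making the change permanent.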
At this point, the system should be booted.
We are going to use a utility called sgdisk (part of the gdisk package) to copy the partition table from another drive onto our new drive. gdisk was not installed by default on my system, but it can easily be installed through apt-get:
$ sudo apt-get install gdisk
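Because the replicate step below overwrites the target's partition table, it is worth double-checking which device is which before copying anything. A quick sanity check (device names are from my system; yours may differ):
$ lsblk -o NAME,SIZE,TYPE /dev/sdf /dev/sdg
The brand-new drive should typically show no partition rows beneath it, while the donor drive should show its single RAID partition.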
Using sgdisk, we'll first use the -R flag to replicate the partition schema of another drive within the array onto our new drive.
$ sudo sgdisk -R=/dev/sdg /dev/sdf
It's very important that you put the drives in the correct order: the destination drive is the one passed to -R, and the source drive is the final argument. In the command above we are replicating the partition schema of drive /dev/sdf onto drive /dev/sdg.
You will receive a response like this:
The operation has completed successfully.
Next we need to randomize the new drive’s GUIDs to prevent conflict with any other drives. This can be done with:
$ sudo sgdisk -G /dev/sdg
Which returns:
The operation has completed successfully.
Now we can verify that the partition tables of our two drives are identical.
The donor drive /dev/sdf
$ sudo parted /dev/sdf print
Gives us:
Model: ATA WDC WD40EFRX-68W (scsi)
Disk /dev/sdf: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Number Start End Size File system Name Flags
1 1049kB 4001GB 4001GB raid
The receiving drive /dev/sdg
$ sudo parted /dev/sdg print
Gives us:
Model: ATA WDC WD40EFRX-68W (scsi)
Disk /dev/sdg: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Number Start End Size File system Name Flags
1 1049kB 4001GB 4001GB raid
Both now have identical partition schemas and flags. Great!
Adding the Drive Back to the Array
Now that we have a clean drive that is partitioned correctly, it is time to add it back into our array.
Remember, in my case the array device is /dev/md0, but yours could be different.
To add the drive (note that we add the partition, /dev/sdg1, not the whole disk):
$ sudo mdadm --manage /dev/md0 --add /dev/sdg1
Which returns:
mdadm: added /dev/sdg1
Verify Recovery
Now that the drive has been successfully added to the array, we can verify that the rebuild is in progress.
$ cat /proc/mdstat
Which returns:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdg1[6] sde1[3] sdf1[4] sdc1[1] sdb1[0] sdd1[2]
15627540480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUUUU_]
[>....................] recovery = 0.0% (383360/3906885120) finish=1188.8min speed=54765K/sec
Keep in mind that, depending on the size of your array, the recovery process can take a while; in my case it took nearly 20 hours. You can always check the status of the recovery by re-running the command above.
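If you want to keep an eye on the rebuild without re-running the command by hand, or give it more bandwidth, the following is a reasonable sketch (the sysctl values are only examples; sensible numbers depend on your hardware):
$ watch -n 60 cat /proc/mdstat
$ sudo sysctl -w dev.raid.speed_limit_min=50000
$ sudo sysctl -w dev.raid.speed_limit_max=200000
speed_limit_min and speed_limit_max are expressed in KiB/s per device; raising the minimum forces md to spend more bandwidth on the rebuild at the expense of normal I/O.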