B.4. Faulty objects

As discussed in the previous section, if one of the active objects in a RAID-1 or RAID-4/5 region has a problem, that object will be kicked out and the region will become degraded. A problem can occur with active objects in a variety of ways. For instance, a disk can crash, a disk can be pulled out of the system, a drive cable can be removed, or one or more I/Os can cause errors. Any of these will result in the object being kicked out and the RAID region becoming degraded.

If a disk has completely stopped working or has been removed from the machine, EVMS obviously will no longer recognize that disk, and it will not show up as part of the RAID region when running the EVMS user interfaces. However, if the disk is still available in the machine, EVMS will likely be able to recognize that the disk is assigned to the RAID region, but has been removed from any active service by the kernel. This type of disk is referred to as a faulty object.

B.4.1. Removing faulty objects

Faulty objects are no longer usable by the RAID region, and should be removed. You can remove faulty objects with the "remfaulty" plug-in function for both RAID-1 and RAID-4/5. This operation is very similar to removing spare objects. After the object is removed, it will appear in the Available-Objects list in the EVMS user interfaces.

Faulty objects can be removed while the RAID region is active and in use.

B.4.2. Fixing temporarily failed objects

Sometimes a disk can have a temporary problem that causes the disk to be marked faulty and the RAID region to become degraded. For instance, a drive cable can come loose, causing the MD kernel driver to think the disk has disappeared. If the cable is plugged back in, the disk should be available for normal use. However, the MD kernel driver and the EVMS MD plug-in will continue to indicate that the disk is a faulty object, because the disk might have missed some writes to the RAID region and would therefore be out of sync with the rest of the disks in the region.

To correct this situation, remove the faulty object from the RAID region (as discussed in the previous section). The object will then show up as an Available-Object. Next, add that object back to the RAID region as a spare (as discussed in Section B.3.1). When the changes are saved, the MD kernel driver will activate the spare and sync the data and parity. When the sync is complete, the RAID region will be operating in its original, normal configuration.

This procedure can be accomplished while the RAID region is active and in use.
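
The sequence of states in this procedure can be traced with a small model. The following Python sketch is only an illustration: ToyRegion, its methods, and the disk names sda and sdb are hypothetical and are not part of EVMS or the MD kernel driver. It assumes a two-disk RAID-1 region in which sdb fails temporarily.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    SPARE = "spare"
    FAULTY = "faulty"


class ToyRegion:
    """Toy model of a RAID region (hypothetical; not the EVMS or MD API)."""

    def __init__(self, active, width):
        self.width = width  # active objects in the normal configuration
        self.members = {name: State.ACTIVE for name in active}
        self.available = set()  # stands in for the Available-Objects list

    def is_degraded(self):
        active = sum(1 for s in self.members.values() if s is State.ACTIVE)
        return active < self.width

    def fail(self, name):
        # For example a loose cable: the object is kicked out of active service.
        self.members[name] = State.FAULTY

    def remove_faulty(self, name):
        if self.members.get(name) is not State.FAULTY:
            raise ValueError(f"{name} is not a faulty object")
        del self.members[name]
        self.available.add(name)

    def add_spare(self, name):
        if name not in self.available:
            raise ValueError(f"{name} is not an available object")
        self.available.remove(name)
        self.members[name] = State.SPARE

    def save(self):
        # Model of saving changes: a degraded region activates a spare.
        # The real resync of data and parity takes time; here it is instant.
        for name, state in list(self.members.items()):
            if state is State.SPARE and self.is_degraded():
                self.members[name] = State.ACTIVE


region = ToyRegion(["sda", "sdb"], width=2)
region.fail("sdb")
degraded_after_failure = region.is_degraded()

region.remove_faulty("sdb")  # step 1: remove the faulty object
region.add_spare("sdb")      # step 2: add it back as a spare
region.save()                # step 3: save; the spare is activated
degraded_after_recovery = region.is_degraded()
```

In this model the region is degraded after the failure and returns to its normal configuration once the re-added spare has been activated.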

B.4.3. Marking objects faulty

EVMS provides the ability to manually mark a child of a RAID-1 or RAID-4/5 region as faulty. This has the same effect as if the object had some problem or caused I/O errors. The object will be kicked out from active service in the region, and will then show up as a faulty object in EVMS. It can then be removed from the region as discussed in the previous sections.

There are a variety of reasons why you might want to manually mark an object faulty. One example would be to test failure scenarios to learn how Linux and EVMS deal with hardware failures. Another example would be replacing one of the current active objects with a different object. To do this, you would add the new object as a spare, then mark the current object faulty (causing the new object to be activated and the data to be resynced), and finally remove the faulty object.

EVMS allows you to mark an object faulty in a RAID-1 region if the region has more than one active object. EVMS allows you to mark an object faulty in a RAID-4/5 region if the region has a spare object.
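
The two conditions above can be written as a single check. This Python sketch is only a restatement of the rule as described here; the function name can_mark_faulty and the level strings are hypothetical and do not correspond to any EVMS interface.

```python
def can_mark_faulty(level, active_count, spare_count):
    """Return True if the rule described above would allow marking an object faulty.

    level is a hypothetical label: "raid1", "raid4", or "raid5".
    """
    if level == "raid1":
        # RAID-1: the region must have more than one active object.
        return active_count > 1
    if level in ("raid4", "raid5"):
        # RAID-4/5: the region must have a spare object.
        return spare_count > 0
    raise ValueError(f"unsupported RAID level: {level}")


two_way_mirror = can_mark_faulty("raid1", active_count=2, spare_count=0)
single_mirror = can_mark_faulty("raid1", active_count=1, spare_count=0)
raid5_no_spare = can_mark_faulty("raid5", active_count=3, spare_count=0)
raid5_with_spare = can_mark_faulty("raid5", active_count=3, spare_count=1)
```

Under these assumptions, a two-way mirror and a RAID-5 region with a spare pass the check, while a single-object mirror and a RAID-5 region without a spare do not.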

Use the "markfaulty" plug-in function for both RAID-1 and RAID-4/5. This command can be used while the RAID region is active and in use.