Managing RAID arrays with mdadm

The name mdadm comes from the md (multiple device) arrays that it administers. It is a command line tool for managing software RAID arrays on Linux. This article outlines the basics you need to get started with it.

The following five commands allow you to make use of mdadm’s most basic features:

  1. Create a RAID array:
    # mdadm --create /dev/md/test --homehost=any --metadata=1.0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  2. Assemble (and start) a RAID array:
    # mdadm --assemble /dev/md/test /dev/sda1 /dev/sdb1
  3. Stop a RAID array:
    # mdadm --stop /dev/md/test
  4. Delete a RAID array:
    # mdadm --zero-superblock /dev/sda1 /dev/sdb1
  5. Check the status of all assembled RAID arrays:
    # cat /proc/mdstat

Notes on features

mdadm --create

The create command shown above includes the following four parameters in addition to the create parameter itself and the device names:

  1. --homehost:
    By default, mdadm stores your computer’s name as an attribute of the RAID array. If your computer name does not match the stored name, the array will not automatically assemble. This feature is useful in server clusters that share hard drives, because file system corruption usually occurs if multiple servers attempt to access the same drive at the same time. The name any is reserved and disables the homehost restriction.
  2. --metadata:
    mdadm reserves a small portion of each RAID device to store information about the RAID array itself. The metadata parameter specifies the format and location of that information. The value 1.0 indicates that version-1 formatting should be used and that the metadata should be stored at the end of the device.
  3. --level:
    The level parameter specifies how the data should be distributed among the underlying devices. Level 1 indicates that each device should contain a complete copy of all the data. This level is also known as disk mirroring.
  4. --raid-devices:
    The raid-devices parameter specifies the number of devices that will be used to create the RAID array.

By using level=1 (mirroring) in combination with metadata=1.0 (store the metadata at the end of the device), you create a RAID1 array whose underlying devices appear normal if accessed without the aid of the mdadm driver. This is useful for disaster recovery, because you can access the device even from a system that doesn’t support mdadm arrays. It’s also useful when a program needs read-only access to the underlying device before mdadm is available. For example, the UEFI firmware in a computer may need to read the bootloader from the EFI System Partition (ESP) before mdadm is started.
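
For example, with the array stopped, you could mount one of the members read-only to inspect its contents directly (the file system on /dev/sda1 and the /mnt mount point are assumptions for illustration):

# mdadm --stop /dev/md/test
# mount -o ro /dev/sda1 /mnt

Unmount the device and reassemble the array before using it normally again.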

mdadm --assemble

The assemble command above fails if a member device is missing or corrupt. To force the RAID array to assemble and start when one of its members is missing, use the following command:

# mdadm --assemble --run /dev/md/test /dev/sda1

Other important notes

Avoid writing directly to any devices that underlie an mdadm RAID1 array. Doing so causes the devices to fall out of sync, and mdadm has no way of knowing that they are out of sync. If you access a RAID1 array through a device that’s been modified out-of-band, you can cause file system corruption. If you modify a RAID1 member out-of-band and need to force the array to re-synchronize, delete the mdadm metadata from the device to be overwritten and then re-add it to the array, as demonstrated below:

# mdadm --zero-superblock /dev/sdb1
# mdadm --assemble --run /dev/md/test /dev/sda1
# mdadm /dev/md/test --add /dev/sdb1

These commands completely overwrite the contents of sdb1 with the contents of sda1.
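
While the re-synchronization runs, you can watch its progress with the mdstat file shown earlier or with mdadm’s detail view, for example:

# cat /proc/mdstat
# mdadm --detail /dev/md/test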

To specify which RAID arrays should be activated automatically when your computer starts, create an /etc/mdadm.conf configuration file.
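
One common way to populate the file is to append the output of a scan of the currently assembled arrays (review the result before relying on it):

# mdadm --detail --scan >> /etc/mdadm.conf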

For the most up-to-date and detailed information, check the man pages:

$ man mdadm 
$ man mdadm.conf

The next article in this series will be a step-by-step guide on how to convert an existing single-disk Linux installation to a mirrored-disk installation, one that will continue running even if one of its hard drives suddenly stops working!

14 Comments

  1. Norbert J.

    Great idea to shed some light on this quite complex matter, thanks!

    I have been using MD RAIDs on 2 desktop computers for some years and always wondered what “homehost” is good for. It also took me some time to figure out that a kernel parameter “rd.md.uuid=…” is needed if the root fs resides on an MD RAID and a generic (not host-only) initramfs is used; maybe you want to mention this in a later article.

    • Interesting. I didn’t know that an initramfs compiled with “hostonly=no” will exclude the /etc/mdadm.conf file. I just checked with the current version of dracut (049-26.git20181204), and indeed /etc/mdadm.conf does get excluded. So, your options with a “hostonly=no” initramfs are to specify the UUID on the kernel command line, manually include the /etc/mdadm.conf file when you build the initramfs, or specify the rd.auto kernel command line option to have all RAID arrays automatically assembled. The latter probably makes the most sense if you are really going for a “generic” system.

      I’m adding a note about this problem to the upcoming article right now. Thanks!
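
      For reference, a rough sketch of those three approaches (the UUID is a placeholder and the dracut invocation is illustrative, not taken from the article): add rd.md.uuid=<uuid-of-the-root-array> or rd.auto to the kernel command line, or rebuild the initramfs with the configuration file included:

      # dracut --force --install /etc/mdadm.conf /boot/initramfs-$(uname -r).img $(uname -r)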

  2. Göran Uddeborg

    You hint this will become a series. It would be very much appreciated if you could include one post on debugging. I have tried to create a RAID-1 setup quite similar to what you describe above. It seems to work functionally, but a lot of the time it feels like it goes VERY slowly. When running on only one of the disks, I don’t see the issue. I would expect to see a slight speedup for reads and a slight slowdown for writes, but not as much as this.

    Note: I’m not asking you to help debugging my system! I’m only suggesting debugging as a topic for one of the parts in the series, and just gave the description above to explain what I mean.

  3. Mark

    Thanks for this. The other important notes section is what I will find most useful, since at some point a disk will need to be replaced, so I jotted that down as a starting point for the day I need it.
    I found the article at https://www.tecmint.com/create-raid1-in-linux/ an easy how-to for setting up software raid1 on two additional disks and using mdadm to create the contents needed for mdadm.conf.
    Looking forward to the upcoming article on how to convert a single disk system to mirrored, if it can be done without losing data.

  4. I like to use raid10 (which is different from raid1+0) because you can mirror and stripe with any number of devices, including 3 (a sample create command is sketched below). The stripes are arranged in a pattern that gives striping-like (raid0) performance, but with mirroring. Raid 1+0 also does that, but only with four or more drives.

    I use raid1 when I need to mirror boot partitions.
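
    For example, a three-device raid10 array could be created along these lines (the device names here are purely illustrative):

    # mdadm --create /dev/md/data --level=10 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1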

    • Sure. I’ve heard that the answer to the problem “I need more disk speed” is to “throw more spindles” (i.e., disk drives in RAID arrays) at the problem, even for big companies and datacenters.

      One thing to beware of though is how hard it will be to recover from the situation if your computer were to fail. Which is yet another advantage of software RAID — it is a bit more “portable” than a proprietary hardware RAID controller (and I have seen hardware RAID controllers fail). If you have a more complicated software RAID configuration, it may be difficult to reconstruct in the event that your computer dies. So there are trade-offs.

  5. One huge advantage of software RAID is that you don’t need to match drive sizes or waste space. For example, suppose you have two 1T drives in raid1 (for simplicity). Now you add a third, 2T drive. How do you use the space while maintaining mirroring? Simple:

    o allocate 2 1T partitions on the 2T drive
    o migrate one of the 1T mirror legs to the first 1T partition.
    o create a new RAID array from the 2nd 1T partition and the vacated 1T drive
    o add the new RAID array to your Volume Group
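
    A rough sketch of those steps after partitioning the 2T drive (all device names and the volume group name vg0 are hypothetical):

    # mdadm /dev/md0 --add /dev/sdc1
    # mdadm --grow /dev/md0 --raid-devices=3
    # cat /proc/mdstat        # wait until the resync onto sdc1 completes
    # mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
    # mdadm --grow /dev/md0 --raid-devices=2
    # mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc2 /dev/sdb1
    # pvcreate /dev/md1
    # vgextend vg0 /dev/md1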

    • Indeed. Another thing that I like about it is that it is easy to configure email notifications to be sent in case a drive fails. You can even configure an arbitrary program to be run if, for example, you wanted a text message instead of an email. The possibilities are endless.

      • I’ve also seen some benchmarks indicating that software RAID is faster than hardware RAID for raid0 and raid1 (and presumably raid10, if that were an option on hardware) with comparable controllers. Hardware RAID mainly helps for raid5 and 6, where handling the checksums and read-modify-write cycles is expensive.

        • Makes sense. The OS knows a lot more about the processes that are running and their access patterns than the hard drive or RAID controller ever could. It has a lot more memory with which to hold and reorder read and write operations too.

  6. I think RAID5/6 normally uses XOR for the parity calculations that allow it to reconstruct data when blocks are missing or corrupt. AES (Advanced Encryption Standard) is something a little different. It is used to encrypt data. But, yes, some common algorithms for encryption and checksums (like AES and CRC32, respectively) are hardware accelerated.
