
Greg's Tech blog

Recovering from a Bad Drive in a Greyhole storage pool

Monday 13 of February, 2017

I run an Amahi home server which hosts a number of web apps (including this blog) as well as a large pool of storage for my home.  Amahi uses greyhole (see here and here) to pool disparate disks into a single storage pool. Samba shares are then added to the pool and greyhole handles distributing data across the pool to use up free space in a controlled manner.  Share data can be made redundant by choosing to make 1, 2 or max copies of the data (where max means a copy on every disk).


The benefits over, say, RAID 5 are that 1) disks of different sizes may be used; 2) each disk has its own complete file system which does not depend on disk grouping; 3) each file system is mounted (and can be unmounted) separately or on a different machine.


So right before the holidays, the 3TB disk on my server (paired with a 1TB disk) started to go bad.  Reads were succeeding but took a long time.  Eventually we could no longer watch the video files we store on the server and play through WDTV.  Here is how I went about recovering service and the data (including the mistakes I made).



  • Bought a new 3TB drive, formatted it with ext4, mounted it (using an external drive dock) and added it to the pool as Drive6.

  • Told greyhole the old disk was going away (drive4)
    greyhole --going=/var/hda/files/drives/drive4/gh

    Greyhole will look to copy any data off the drive that is not copied elsewhere in the pool. It has no effect on the data on the `going` disk (nothing is deleted), except that the extra reads could cause further damage to a failing drive. The command ran for several days and, due to disk errors, didn't accomplish much, so I killed the process and took a new tack.

I decided to remove the disk from the pool and attempt an alternate method for recovering the data. 


  • Told greyhole the drive was gone.
    greyhole --gone=/var/hda/files/drives/drive4/gh 
    Greyhole will no longer look for the disk or the data on it.  It has no effect on the data on disk. 


  • Used safecopy to make a drive image of the old disk to a file on the new disk. (If you have not used safecopy, check it out.  It will run different levels of data extraction, can be stopped and restarted using the same command, and will resume where it left off.)
    safecopy --stage1 /dev/sdd1 /var/hda/files/drives/Drive6/d1 -I /dev/null


This took about two weeks to accomplish due to drive errors.  And because I was making a disk image, I eventually ran out of space on the new disk before it completed.
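In hindsight, a quick size check before starting would have caught the space problem. The following is a minimal sketch, not what I actually ran: the helper names `free_bytes` and `image_fits` are made up for illustration, and `SRC` and `DEST_DIR` simply mirror the safecopy command above. The trailing comments show how safecopy's later presets (`--stage2`, `--stage3`) would be invoked; as far as I know they pick up the bad-block lists left by the previous stage in the current directory, so treat that part as an assumption to verify against the man page.

```shell
#!/bin/sh
# Sketch only: SRC and DEST_DIR mirror the safecopy command above.
SRC=/dev/sdd1
DEST_DIR=/var/hda/files/drives/Drive6

# Free space, in bytes, on the filesystem that holds directory $1.
free_bytes() {
    kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    echo $((kb * 1024))
}

# Succeeds when an image of $1 bytes fits into $2 free bytes.
image_fits() {
    [ "$1" -le "$2" ]
}

# Typical use (reading the partition size needs root):
#   size=$(blockdev --getsize64 "$SRC")
#   image_fits "$size" "$(free_bytes "$DEST_DIR")" || echo "image will not fit" >&2
#
# Later, slower passes over the areas stage 1 skipped, run from the
# same directory as stage 1 (assumption: the presets keep their
# bad-block lists there):
#   safecopy --stage2 "$SRC" "$DEST_DIR/d1"
#   safecopy --stage3 "$SRC" "$DEST_DIR/d1"
```

The image is as large as the source partition regardless of how much data is on it, so the comparison has to use the partition size rather than the used space.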



  • Bought a 4TB drive and mounted it using an external dock as drive7; copied the drive image over from Drive6, then deleted it from Drive6.


  • Marked the 1TB drive (drive5) as going and then gone (see the commands above). This moved any good data off the 1TB drive to drive7 but left plenty of room to complete the drive image.




  • Swapped drive5 (1TB) and drive7 (4TB) in the server chassis. Retired the 1TB drive.




  • Mounted the bad 3TB drive in the external dock and resumed the safecopy using:
    safecopy --stage1 /dev/sdd1 /var/hda/files/drives/Drive7/d1 -I /dev/null



  • Mounted the drive image. The base OS for the server is Fedora 23. The drive tool includes a menu item to mount a drive image.  It was pretty simple to mount the image at /run/media/username/someGUID.




  • Used rsync to copy the data from the image to the data share.  I use a service script called mount_shares_locally, as the preferred method for putting data into a greyhole pool is to copy it to the Samba share.  The one caveat here is that greyhole must stage the data while it copies it to the permanent location. That staging area is on the / partition under /var/hda.  I have only about 300GB free on that partition, so I had to monitor the copy and kill the rsync every couple of hours. Fortunately, rsync handles this gracefully (a rerun skips files that were already copied), which is why I chose it over a straight copy.



rsync -av "/run/media/user/5685259e-b425-477b-9055-626364ac095e/gh/Video"  "/mnt/samba/"
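The last two steps (mounting the image and copying in interrupted runs) could also be scripted. This is a rough sketch under several assumptions, not what I actually ran: I mounted the image through the desktop tool and killed rsync by hand. The mount point `/mnt/recovered`, the 50 GB free-space floor, the two-hour slice and the function names are all hypothetical. `RUN` defaults to `echo`, so the mount command is only printed until you set `RUN=sudo`.

```shell
#!/bin/sh
# Sketch only. RUN=echo prints the mount command instead of running it;
# set RUN=sudo to execute it for real.
RUN=${RUN:-echo}

IMAGE=/var/hda/files/drives/Drive7/d1
MOUNTPOINT=/mnt/recovered             # hypothetical mount point
SRC="$MOUNTPOINT/gh/Video"
DEST=/mnt/samba/
STAGING=/var/hda                      # greyhole stages incoming data here
MIN_FREE_KB=$((50 * 1024 * 1024))     # 50 GB floor, an arbitrary choice
SLICE=2h                              # how long each rsync run may last

# Loop-mount a partition image read-only so nothing on it can change.
mount_image_ro() {
    $RUN mount -o loop,ro "$1" "$2"
}

# Free space, in KB, on the filesystem that holds directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Block until directory $1 has at least $2 KB free.
wait_for_space() {
    while [ "$(free_kb "$1")" -lt "$2" ]; do
        sleep 600
    done
}

# Run rsync in time-limited slices, pausing while staging space is low.
copy_in_slices() {
    while :; do
        wait_for_space "$STAGING" "$MIN_FREE_KB"
        timeout "$SLICE" rsync -av "$SRC" "$DEST"
        status=$?
        [ "$status" -eq 0 ] && return 0        # rsync finished the copy
        [ "$status" -eq 124 ] || return "$status"  # 124 = slice timed out
    done
}

# mount_image_ro "$IMAGE" "$MOUNTPOINT" && copy_in_slices
```

The time limit mirrors what I did by hand; it gives greyhole a chance to drain the staging area between runs. One thing the sketch does not handle: a filesystem image taken from a failing disk may need journal recovery, in which case a read-only mount can be refused.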



 


A couple of observations.  First, because of the way I had greyhole shares set up, I had safe copies of the critical data. All my docs, photos and music had a safe second copy. The data on the failed disk was disposable.  I undertook the whole process because I wanted to see if it would work and whatever I recovered would only be a plus.  


Second, getting the data back took some time and a bit of finesse on my part.  But I like how well greyhole performed and how having the independent filesystems gave me the option to recover data on my own time. Finding safecopy simplified this a lot and added a new weapon to my recovery toolkit!