
Greg's Tech blog

Recovering from a Bad Drive in a Greyhole storage pool

Monday 13 of February, 2017

I run an Amahi home server that hosts a number of web apps (including this blog) as well as a large storage pool for my home. Amahi uses greyhole (see here and here) to pool disparate disks into a single storage pool. Samba shares are then added to the pool, and greyhole distributes their data across the disks, using up free space in a controlled manner. Share data can be made redundant by choosing to keep 1, 2 or max copies of it (where max means a copy on every disk).


The benefits over, say, RAID 5 are that 1) disks of different sizes can be used; 2) each disk has its own complete file system; and 3) each file system is mounted (and can be unmounted) separately.


So right before the holidays, the 3TB disk on my server (paired with a 1TB disk) started to go bad. Reads were succeeding but took a long time. Eventually we could no longer watch the video files we store on the server and play through our WDTV. Here is how I went about restoring service and recovering the data.



  • Bought a new 3TB drive, formatted it with ext4, mounted it (using an external drive dock) and added it to the pool as Drive6.
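Outside Amahi's web UI, the same preparation can be sketched in bash. This is a hypothetical helper, not the exact steps used here: the function name, the example partition and the min_free value are assumptions, and the printed config line follows the storage_pool_drive syntax of recent Greyhole versions. Because mkfs.ext4 erases the target, the script only prints the commands unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# Hypothetical helper: format a new partition as ext4, mount it, and print the
# greyhole.conf line that would add it to the storage pool.
# Destructive commands are only printed unless DRY_RUN=0 is set.
set -euo pipefail

# Print a command (dry run, the default) or execute it (DRY_RUN=0).
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

prepare_pool_drive() {
  local partition="$1"   # new disk's partition, e.g. /dev/sde1 (assumed name)
  local mountpoint="$2"  # e.g. /var/hda/files/drives/Drive6
  run mkfs.ext4 "$partition"
  run mkdir -p "$mountpoint"
  run mount "$partition" "$mountpoint"
  run mkdir -p "$mountpoint/gh"  # pool directory, named gh as on the other drives
  # Config line for /etc/greyhole.conf; min_free is an illustrative value.
  printf 'storage_pool_drive = %s/gh, min_free: 10gb\n' "$mountpoint"
}
```

For example, `prepare_pool_drive /dev/sde1 /var/hda/files/drives/Drive6` (the device name is hypothetical) prints the commands and the config line for review. After adding that line to /etc/greyhole.conf, the Greyhole service needs a restart to pick it up, and the mount would also need an /etc/fstab entry to survive a reboot.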

  • Told greyhole the old disk was going away:

greyhole --going=/var/hda/files/drives/drive4/gh



  • This told greyhole to move the data off the drive before removing it. It ran for several days and, due to disk errors, didn't accomplish much, so I killed the process and took a new tack.

  • Told greyhole the drive was gone:

greyhole --gone=/var/hda/files/drives/drive4/gh


  • Ran safecopy to make a drive image of the old disk in a file on the new disk. (If you have not used safecopy, check it out. It runs different levels of data extraction, and it can be stopped and restarted with the same command and will resume where it left off.)


 safecopy --stage1 /dev/sdd1 /var/hda/files/drives/Drive6/d1 -I /dev/null


This took about two weeks due to drive errors, and eventually I ran out of space on the new disk before it completed.
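The safecopy run can be wrapped with a free-space check, since running out of room is what stalled the first attempt. The sketch below is hypothetical (the function names are made up); it uses GNU df and blockdev for the check, and it assumes, per the safecopy README, that the --stage2 and --stage3 presets pick up the bad-block lists (stage1.badblocks, stage2.badblocks) written by the previous stage in the working directory, so all three runs should start from the same directory. Check `safecopy --help` on your version before relying on that. Commands are printed unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# Hypothetical helper: image a failing partition with safecopy's staged presets,
# refusing to start if the target filesystem cannot hold the rest of the image.
# Commands are only printed unless DRY_RUN=0 is set.
set -euo pipefail

# Print a command (dry run, the default) or execute it (DRY_RUN=0).
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

# Succeed only if avail_bytes can hold need_bytes more data.
check_space() {
  local need_bytes="$1" avail_bytes="$2"
  if (( avail_bytes < need_bytes )); then
    echo "not enough space: need ${need_bytes} bytes, have ${avail_bytes}" >&2
    return 1
  fi
}

rescue_partition() {
  local source="$1"  # failing partition, e.g. /dev/sdd1
  local image="$2"   # image file on a healthy drive
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    local size copied=0 avail
    size="$(blockdev --getsize64 "$source")"
    # A partial image from an earlier run already holds part of the data.
    if [[ -f "$image" ]]; then copied="$(stat -c %s "$image")"; fi
    avail="$(df --output=avail -B1 "$(dirname "$image")" | tail -n 1 | tr -d ' ')"
    check_space "$(( size - copied ))" "$avail"
  fi
  # Stage 1 grabs the easy data and skips bad areas; "-I /dev/null" is the
  # form used in the post for a run that resumes where it left off.
  run safecopy --stage1 "$source" "$image" -I /dev/null
  # Stages 2 and 3 revisit bad areas recorded by the previous stage
  # (assumed: stage1.badblocks, stage2.badblocks in the current directory).
  run safecopy --stage2 "$source" "$image"
  run safecopy --stage3 "$source" "$image"
}
```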



  • Bought a 4TB drive and mounted it in the dock (as drive7), then copied the drive image over and deleted it from Drive6.
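Since the original image is deleted, it is worth confirming the copy first. A minimal sketch using only cp, cmp and rm (the function name is hypothetical); note that comparing multi-terabyte files with cmp reads both copies in full and can take hours. Commands are printed unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a drive image to another drive, verify the copy
# byte for byte, and only then delete the original.
set -euo pipefail

# Print a command (dry run, the default) or execute it (DRY_RUN=0).
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

move_image() {
  local src="$1" dest="$2"
  run cp "$src" "$dest" || return 1
  # cmp exits non-zero on any difference, so the original is kept in that case.
  run cmp "$src" "$dest" || return 1
  run rm "$src"
}
```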


  • Marked the 1TB drive (drive5) as going (see the command above) and then as gone. This moved any good data off the 1TB drive to drive7 while still leaving plenty of room to complete the drive image.




  • Swapped drive5 (1TB) and drive7 (4TB) in the server chassis. Retired the 1TB drive.




  • Mounted the bad 3TB drive in the external dock and resumed the safecopy with:



safecopy --stage1 /dev/sdd1 /var/hda/files/drives/Drive7/d1 -I /dev/null



  • Mounted the drive image. The base OS for the server is Fedora 23, and its drive tool includes a menu item to mount a drive image. Mounting the image that way was simple; it showed up under /run/media/username/ in a GUID-named directory.
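From a shell, the image can also be loop-mounted directly, since it holds a single partition. A minimal sketch, assuming the old drive was ext4 like its replacement (the function name and mount point are hypothetical); the `noload` option skips journal replay, which a read-only loop device cannot perform if the filesystem was not cleanly unmounted. Commands are printed unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# Hypothetical helper: loop-mount a partition image read-only from the shell.
set -euo pipefail

# Print a command (dry run, the default) or execute it (DRY_RUN=0).
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    printf '+ %s\n' "$*"
  fi
}

mount_image() {
  local image="$1"       # e.g. /var/hda/files/drives/Drive7/d1
  local mountpoint="$2"  # any empty directory, e.g. /mnt/recovered (assumed)
  run mkdir -p "$mountpoint"
  # Assumes ext4. "ro" protects the image from changes; "noload" skips the
  # journal replay that a dirty filesystem would otherwise need.
  run mount -t ext4 -o loop,ro,noload "$image" "$mountpoint"
}
```

udisksctl offers a route closer to the desktop tool: `udisksctl loop-setup --read-only --file IMAGE` followed by `udisksctl mount -b /dev/loopN` mounts the filesystem under /run/media/$USER.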




  • Used rsync to copy the data from the image to the data share. I use a service script called mount_shares_locally to mount the samba shares on the server itself, because the preferred way to put data into the greyhole pool is to copy it to the samba share. The one caveat is that greyhole must stage the data on its way to its permanent location, and that staging area is on the / partition under /var/hda. I have about 300GB free on that partition, so I had to monitor the copy and kill the rsync every couple of hours. Fortunately, rsync handles this gracefully (rerunning it skips files that were already copied), which is why I chose it over a straight copy.


rsync -av "/run/media/user/5685259e-b425-477b-9055-626364ac095e/gh/Video"  "/mnt/samba/"
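Rather than killing rsync by hand every couple of hours, the stop-and-restart cycle can be automated. The sketch below is hypothetical: the function names, the 20 GiB threshold and the 60-second polling interval are assumptions, and it watches free space on the filesystem holding /var/hda (where greyhole stages incoming data on this server) rather than anything Greyhole itself reports. It stops rsync when space runs low, waits for Greyhole to drain the staging area, and reruns the same rsync, relying on rsync skipping files that already arrived.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rsync into a Greyhole share, pausing whenever free space
# on the staging filesystem drops below a threshold, then resuming.
set -euo pipefail

STAGING_DIR="${STAGING_DIR:-/var/hda}"                 # where Greyhole stages writes
MIN_FREE_KIB="${MIN_FREE_KIB:-$((20 * 1024 * 1024))}"  # 20 GiB (assumed threshold)
POLL_SECONDS="${POLL_SECONDS:-60}"                     # assumed polling interval

# Free space in KiB on the filesystem holding STAGING_DIR (GNU df).
free_kib() {
  df --output=avail -k "$STAGING_DIR" | tail -n 1 | tr -d ' '
}

# Block until Greyhole has moved enough staged data off the staging filesystem.
wait_for_space() {
  while (( $(free_kib) < MIN_FREE_KIB )); do
    sleep "$POLL_SECONDS"
  done
}

copy_with_backoff() {
  local src="$1" dest="$2" pid status killed
  while true; do
    wait_for_space
    rsync -av "$src" "$dest" &
    pid=$!
    killed=0
    # Watch staging space while rsync runs; stop it if space gets low.
    while kill -0 "$pid" 2>/dev/null; do
      if (( $(free_kib) < MIN_FREE_KIB )); then
        kill "$pid" 2>/dev/null || true
        killed=1
        break
      fi
      sleep "$POLL_SECONDS"
    done
    status=0
    wait "$pid" || status=$?
    # Finished, or failed for a reason other than our own kill: return its status.
    if (( killed == 0 )); then
      return "$status"
    fi
    echo "rsync paused: less than ${MIN_FREE_KIB} KiB free under ${STAGING_DIR}" >&2
  done
}
```

Called with the same two arguments as the rsync command above, `copy_with_backoff` loops until rsync completes on its own. Tracking whether the script itself killed rsync keeps a genuine rsync error (a bad path, for instance) from turning into an endless retry loop.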


 


A couple of observations. First, because of the way I had set up the greyhole shares, I had safe copies of the critical data: all my docs, photos and music had a second copy. The data on the failed disk was disposable, so I undertook the whole process to see whether it would work, and whatever I recovered would be a bonus.


It took some time and a bit of finesse on my part to get the data back, but I like how well greyhole performed and how having independent filesystems gave me the option to recover data on my own time. Finding safecopy simplified this a lot and added a new weapon to my recovery toolkit!