Recovering a bricked CentOS 7 instance on Amazon AWS

If you work with AWS long enough, you're bound to do it: you make a configuration change, it gets written to the file system, and you lock yourself out.  Unlike with your own servers, you can't walk up to the machine or open a console through your hypervisor's management interface.  So, what can you do?


This article from Amazon describes a way to attach the volume to a debug machine and mount it there.  In theory it's great, but when I tried it recently I ran into an issue.  If the troubled volume came from a Marketplace image, you can't attach it to a running instance.  You have to stop the debug machine, attach the volume, and start the debug machine again.  And there's the rub ...

When I did that, the once-functional debug machine hung on boot, too.  I believe the cause is that, during boot, it scans the disks and mounts the wrong one as the root volume (/).

Fortunately, the fix is fairly easy.  When you build your debug box, the initial /etc/fstab looks like this:

#
LABEL=/     /           ext4    defaults,noatime  1   1
tmpfs       /dev/shm    tmpfs   defaults        0   0
devpts      /dev/pts    devpts  gid=5,mode=620  0   0
sysfs       /sys        sysfs   defaults        0   0
proc        /proc       proc    defaults        0   0

The problem, I believe, is that two attached volumes now carry the label "/", so the LABEL=/ entry is ambiguous.
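
One way to check for clashing labels, on any machine that does boot with several volumes attached, is to list every filesystem label and print the ones that occur more than once.  This is only a sketch: duplicate_labels is a hypothetical helper name, and the blkid call in the comment has to run as root.

```shell
# Hypothetical helper: read one filesystem label per line on stdin
# and print each label that appears more than once.
duplicate_labels() {
    # Drop empty lines, then let uniq -d report the repeated labels.
    grep -v '^$' | sort | uniq -d
}

# On a live system (as root), feed it the labels of all block devices:
#   blkid -s LABEL -o value | duplicate_labels
```

If this prints "/", more than one attached filesystem claims to be the root volume.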

The way out is to reference the root volume by its block UUID instead of its label.  Before you attach the errant volume, run the following on the debug machine to get the UUID of its root partition.

blkid /dev/sda1

Then, modify /etc/fstab to look like this:

#
UUID=e57ff077-2c96-4a37-bd16-1722932e6126     /           ext4    defaults,noatime  1   1
tmpfs       /dev/shm    tmpfs   defaults        0   0
devpts      /dev/pts    devpts  gid=5,mode=620  0   0
sysfs       /sys        sysfs   defaults        0   0
proc        /proc       proc    defaults        0   0

With this entry, the boot process scans the drives and picks the root volume by its unique UUID rather than by a label that both volumes share.
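
The same edit can be scripted.  The following is a minimal sketch, not a tested recovery tool: fstab_label_to_uuid is a hypothetical helper, it assumes GNU sed (for -i), and it assumes the root entry starts with LABEL=/ exactly as in the file above.

```shell
# Hypothetical helper: rewrite the root entry of an fstab file
# from LABEL=/ to UUID=<uuid>, keeping a backup copy next to it.
# Usage: fstab_label_to_uuid <fstab-path> <uuid>
fstab_label_to_uuid() {
    fstab_path="$1"
    root_uuid="$2"
    # Keep a backup so the original can be restored if something goes wrong.
    cp -p "$fstab_path" "$fstab_path.bak" || return 1
    # Only touch a line that starts with LABEL=/ followed by whitespace;
    # the whitespace character is preserved so the columns stay aligned.
    sed -i "s|^LABEL=/\([[:space:]]\)|UUID=${root_uuid}\1|" "$fstab_path"
}

# Example call on the debug instance (as root), before attaching the second volume:
#   fstab_label_to_uuid /etc/fstab "$(blkid -s UUID -o value /dev/sda1)"
```

Check the result with cat /etc/fstab before you stop the instance; a bad root entry here would brick the debug machine as well.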

Now, stop the debug instance, attach the errant volume to it, and start it again.  You can then mount the errant volume and fix it.  When done, unmount the errant volume, detach it, reattach it to its original instance, and boot.
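
The mount and unmount steps might look like the sketch below.  The helper names are hypothetical, and so are the device name and mount point in the comments; use lsblk to see what the errant volume is actually called on your debug instance.  Everything here has to run as root.

```shell
# Hypothetical helper: mount the errant volume's partition at a rescue directory.
# Usage: mount_rescue <device> <mount-point>
mount_rescue() {
    dev="$1"
    mnt="$2"
    # Refuse to continue unless the argument really is a block device.
    [ -b "$dev" ] || { echo "not a block device: $dev" >&2; return 1; }
    mkdir -p "$mnt" && mount "$dev" "$mnt"
}

# Hypothetical helper: flush pending writes and unmount the rescue directory.
# Usage: unmount_rescue <mount-point>
unmount_rescue() {
    mnt="$1"
    sync && umount "$mnt"
}

# Example session (device name and path are assumptions):
#   lsblk
#   mount_rescue /dev/xvdf1 /mnt/rescue
#   vi /mnt/rescue/etc/...        # undo the change that locked you out
#   unmount_rescue /mnt/rescue
```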

That's it.
