Server admin day a success, again!

More space, and a drive that is not warning of impending doom! Here are some tips and tricks for increasing the disk capacity of a server with minimal downtime, and even letting users know about it!

 

My notes on how to bring down a system, copy a failing drive, and increase the size of the system, mainly because I expect I will have to do this again!

We have a system that has a lot of web traffic, and a lot of applications. It’s an excellent computer from Dex, but because of years of heavy use one of the drives is failing. Every problem is an opportunity, or whatever that saying is, so I took this opportunity to increase the disk size. I replaced the old, failing disk with a larger 2TB Caviar Black disk from WD. I like these; the others we’ve been using have been pretty stable.

However, to accomplish this I needed to bring the system down, copy the old data, extend the size of the partition, and make sure everything boots back up. Here are the steps I took:

First, make sure your backup ran the night before. Have a valid, recent, up-to-date backup so that you don’t lose anything, or if you do, you can get it back. This is important! I use an rsync-based approach to back everything up every night anyway.
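
A quick way to check that last night's backup really finished is to look at the age of something it writes. The sketch below assumes the backup job updates a marker file when it completes; the marker path, the function name, and the 24-hour window are all assumptions for illustration, not part of the rsync setup mentioned above.

```shell
#!/bin/sh
# backup_is_recent MARKER [MAX_AGE_MINUTES]
# Succeeds only if MARKER exists and was modified within the last
# MAX_AGE_MINUTES minutes (default 1440, i.e. 24 hours).
# MARKER is a hypothetical file that your backup job updates when it finishes.
backup_is_recent() {
    marker=$1
    max_age_min=${2:-1440}
    [ -e "$marker" ] || return 1
    # find prints the path only when the modification time is recent enough;
    # -prune stops it from descending if MARKER happens to be a directory.
    [ -n "$(find "$marker" -prune -mmin "-$max_age_min")" ]
}

# Example use (the path is made up):
# backup_is_recent /backups/last-run.stamp || echo "Backup is stale, stop here!"
```

If the check fails, fix the backup first; everything that follows assumes you can get your data back.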

Part 1. Setting up a replacement server.


We need to set up a temporary site that all of our users will be redirected to. This is pretty simple, but it does require a spare machine somewhere on the net that you can use for interim work.

Create a web page that shows that your machines and services are down and tells users when they should be back up. Put that in a directory called something like /down/ on the temporary server.

Edit that server’s Apache config settings in /etc/apache2/sites-available/default and add the following lines:

	RewriteEngine on
	RewriteRule !/down/ /var/www/down/index.html [L]

Note that this will send all of your traffic that does not contain /down/ in its path to /var/www/down/index.html. There is a gotcha here: if you want to include images, CSS, or other files, they need to be in a subdirectory, /var/www/down/down/, so that they don’t get redirected to index.html!
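
To make the gotcha concrete, here is a rough shell imitation of the test that the rule applies to each request path. It only mimics the pattern !/down/ (rewrite when the path does not contain /down/); the function name and the sample paths are made up, and it does not model how Apache maps the result onto the filesystem.

```shell
#!/bin/sh
# would_be_rewritten URL_PATH
# Imitates the pattern "!/down/": succeeds when the path does NOT contain
# "/down/", which is when the rule would serve the warning page instead.
would_be_rewritten() {
    case $1 in
        */down/*) return 1 ;;   # contains /down/: request is left alone
        *)        return 0 ;;   # anything else: replaced by index.html
    esac
}

for path in /index.html /images/logo.png /down/logo.png; do
    if would_be_rewritten "$path"; then
        echo "$path -> warning page"
    else
        echo "$path -> served as requested"
    fi
done
```

So an image requested as /images/logo.png would come back as the HTML warning page, while one whose URL contains /down/ gets through untouched.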

Then copy the settings in /etc/network/interfaces from your old machine to /etc/network/interfaces on your spare machine. I keep those settings there anyway, so that if my server goes down I can put up a warning notice from anywhere I am.

Shut down networking on your main server, and restart networking on the replacement. I do this with service networking stop and service networking restart, respectively. It is a good idea to make sure that your replacement server is up and running before you continue!

Part 2. Copying the data.


You can now reboot the machine that you’re going to work on and bring it up on a rescue disk (it used to be called a rescue CD, but who uses CDs anymore?). The Ubuntu installer contains a fine rescue disk, although it may be a little overkill for this purpose. We only need two commands: tail and dd!

When I do this, I pull out all the drives so that the only thing attached is the rescue USB stick. It also makes identifying the drives easier later on.

In the Ubuntu installer you want to get to the menu that allows you to drop into a shell. Typically I use a random walk approach for this, but I eventually get there. Now we need to find the locations of our source and destination drives.

The source drive is the one that carries your data.

The destination drive is the one that is empty and will have your data. Likely this is the one in the box!

In the shell, use

	tail -f /var/log/syslog

to bring up the system log, and plug in the source drive. The log will show the device name the drive was given; look for something like sdb or sdc (assuming a SCSI or SATA hard drive). Write that down and write a big SOURCE next to it.

Once you are confident that you have that device identified, plug in the destination drive and look for its name. It will probably be one letter after your SOURCE drive (e.g. if your source was sdb, your destination will be sdc, and so on).

It is imperative that you get these two the right way around, or you will eradicate all your data.
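
Since a swapped source and destination cannot be undone, a small pre-flight check before the copy may be worth the minute it takes. This sketch is an illustration rather than part of the procedure above: the function names are hypothetical, it only checks that the two paths differ and that the destination is at least as large as the source, and it assumes blockdev (from util-linux) is available on the rescue system for real devices.

```shell
#!/bin/sh
# size_in_bytes PATH: size of a block device, or of a regular file
# (regular files are handy for rehearsing with disk images).
size_in_bytes() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    else
        wc -c < "$1" | tr -d ' '
    fi
}

# safe_to_clone SOURCE DESTINATION
# Refuses when a path is missing, when both arguments name the same path,
# or when the destination is smaller than the source.
safe_to_clone() {
    src=$1
    dst=$2
    if [ ! -e "$src" ] || [ ! -e "$dst" ]; then
        echo "source or destination does not exist" >&2
        return 1
    fi
    if [ "$src" = "$dst" ]; then
        echo "source and destination are the same: $src" >&2
        return 1
    fi
    src_size=$(size_in_bytes "$src")
    dst_size=$(size_in_bytes "$dst")
    if [ -z "$src_size" ] || [ -z "$dst_size" ]; then
        echo "could not determine device sizes" >&2
        return 1
    fi
    if [ "$dst_size" -lt "$src_size" ]; then
        echo "destination ($dst_size bytes) is smaller than source ($src_size bytes)" >&2
        return 1
    fi
    echo "source: $src ($src_size bytes) -> destination: $dst ($dst_size bytes)"
}

# Example: safe_to_clone /dev/sdb /dev/sdc && echo "looks sane"
```

When the new disk is larger than the old one, as it is here, getting the two the wrong way around trips the size check. With equal-sized disks it would not, so the big SOURCE note still matters.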

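Now we’re going to use dd (see http://www.linuxquestions.org/linux/answers/Applications_GUI_Multimedia/How_To_Do_Eveything_With_DD for a great explanation of everything dd) to copy the source drive to the destination drive.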

My command was:

	dd if=/dev/sdb of=/dev/sdc conv=notrunc,noerror bs=4096

This will take a while (about 4 hours to copy a 1TB hard drive), and will either be a great success or you will lose all your data and work. So either way, sit back and enjoy some coffee.
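
If you would like to rehearse the copy before risking real disks, dd works just as well on ordinary files. The sketch below is an illustration, not the run described above: the image files are throwaway stand-ins created in a temporary directory. It also adds sync to the conv list, which my command did not have; with noerror alone an unreadable block is simply skipped, whereas sync pads it with zeros so that everything after it stays at the same offset.

```shell
#!/bin/sh
# Rehearse the clone on throwaway image files instead of real disks.
workdir=$(mktemp -d)
src="$workdir/source.img"        # stand-in for the failing drive
dst="$workdir/destination.img"   # stand-in for the new drive

# 16 blocks of 4096 bytes of random data play the role of the old disk.
dd if=/dev/urandom of="$src" bs=4096 count=16 2>/dev/null

# Same shape as the real command, plus "sync" (an addition, see the note above).
dd if="$src" of="$dst" bs=4096 conv=notrunc,noerror,sync 2>/dev/null

# cmp -s is silent and succeeds only if the two files are byte-identical.
if cmp -s "$src" "$dst"; then
    clone_status="identical"
else
    clone_status="different"
fi
echo "clone is $clone_status"

rm -rf "$workdir"
```

On real drives of different sizes the comparison would have to be limited to the length of the source, since the new disk has extra space at the end.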

Once the copy is finished, it’s time to put the new hard drive in place, remove the old one, and reboot. This is where you keep fingers and toes crossed.

You should seamlessly boot into your newly repaired system.

Part 3. Extend the size of the hard drive


We’re not quite out of the woods yet; everything can still crash and burn. We’re going to extend the LVM volume to include all the new disk space we just added. Start by editing the partition table for the new disk. After the reboot mine was at /dev/sdc:

	fdisk -u /dev/sdc

Look at the existing partition table with "p", and note the start sector of the partition you are going to change.
Delete that partition with "d".
Create a new one with "n". The key here is to be sure that the new partition you create starts at exactly the same point as the one you deleted. You can then extend it through to the end of the disk.
Write the partition table to disk with "w", and reboot the system.
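
Because the whole trick depends on reusing the exact start sector, it may be worth saving the partition table to a file before deleting anything. sfdisk -d prints the table in a plain-text form that can be kept as a record. The sketch below pulls the start sector out of such a dump; the helper name and the sample dump (a stand-in so the example runs without a real disk) are made up for illustration.

```shell
#!/bin/sh
# partition_start DUMP_FILE PARTITION
# Prints the start sector recorded for PARTITION in a dump produced by
# "sfdisk -d /dev/sdX > DUMP_FILE".
partition_start() {
    sed -n "s|^$2 : start= *\([0-9][0-9]*\),.*|\1|p" "$1"
}

# On the real system you would create the dump with something like:
#   sfdisk -d /dev/sdc > sdc-partition-table.txt
# Here an invented sample stands in for it.
dump=$(mktemp)
cat > "$dump" <<'EOF'
label: dos
device: /dev/sdc
unit: sectors

/dev/sdc1 : start=        2048, size=  1953521664, type=8e
EOF

start_sector=$(partition_start "$dump" /dev/sdc1)
echo "recreate /dev/sdc1 starting at sector $start_sector"
rm -f "$dump"
```

Keep the saved dump somewhere off the disk you are editing, so you can compare the new table against it after writing.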

Getting closer; we just need to resize the LVM volume. Most of this was taken from http://www.linuxquestions.org/questions/fedora-35/lvm-partition-resizing-666683/

Use pvdisplay to list all current volumes:

	pvdisplay

and then pvresize to resize the changed volume to (by default) include all the available space:

	pvresize /dev/sdc1

This will tell you how many volumes changed, and how many stayed the same. You can use pvdisplay again to verify that your changes took place.


Now we need to use vgdisplay to see how much free space there is.

There is a line that says something like this:
	Free PE / Size 238466 / 980 MB

The 238466 number is the important one, as that is our free space counted in physical extents (PE). We use lvdisplay to see which logical volume we want to add it to:

	lvdisplay

Mine is called /dev/robedwards/rootVolume, so now I can add that free space to this volume:

	lvextend -l +238466 /dev/robedwards/rootVolume

The plus here is important: it says add this many more extents, rather than make the volume this many extents in total.
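
The number passed to lvextend -l is a count of physical extents, not megabytes. As a sanity check you can convert it to a size. The sketch below assumes the LVM default extent size of 4 MiB (I have not recorded what this volume group uses; the "PE Size" line of vgdisplay shows the real value), and the function name is made up.

```shell
#!/bin/sh
# extents_to_gib EXTENTS PE_SIZE_MIB
# Converts a count of physical extents to whole GiB (rounded down).
extents_to_gib() {
    echo $(( $1 * $2 / 1024 ))
}

# 238466 free extents at an assumed 4 MiB each:
free_gib=$(extents_to_gib 238466 4)
echo "about $free_gib GiB can be added"
```

If you simply want to hand over everything that is free, lvextend also accepts -l +100%FREE, which avoids copying the number by hand.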

Use vgdisplay again to confirm that we have no more free space.

We’re really close now; we just need to grow the filesystem so that it fills the enlarged volume.

	resize2fs /dev/robedwards/rootVolume

The newest versions of resize2fs (e.g. if you are using ext4) work in online mode – you don’t need to unmount the partition to make this work! It will take an hour or so, but you can access the machine and see what it is doing while this is working.

That’s it. Once resize2fs is done, reboot the system, bring your ethernet back up and take down your temporary warning page. You should be good to go!