Where have all the servers gone?

In the past few months a huge change has occurred with Thayer School servers… now, most of them are virtual. Poof!

While the change isn’t very visible to the community, it is significant for us Systems Administrators (Jared, Jordan, and Matt Dailey).

Virtualization allows us to run multiple “computers” on one physical computer, so we can easily run independent servers for the web site, email, and databases, all on one real server.

The short story is that we now have over a dozen virtual servers in production, which has allowed us to save space, lower our hardware expenses, reduce electrical and cooling usage, ease backups, and improve reliability.

Read on for more details of our move to virtualized servers.

Setup details

Virtualization Software

We briefly looked at User-Mode-Linux, Xen, and more recently KVM, but in the end decided to use VMware Server (not ESX). At the time, VMware had recently released the free Server version. We found its setup and administration to be very straightforward, and VMware’s technology was mature. We decided to move cautiously, first experimenting with the product, then moving a non-critical server. Once it proved stable, we converted more and more of our servers to virtual machines. We’ve now gone through several iterations of server hosts and arrived at a solution that is stable and flexible.

Space

Thanks to our new server room in MacLean, space wasn’t at a huge premium, but it is definitely nice to have our servers consolidated. The reduction in cables alone is huge.

Cost

Server hardware costs can be dramatically reduced. Many servers sit idle all day, which means one physical server can run many virtual servers. We’ve decided to run no more than about 4 virtual machines per physical server. Several less obvious costs are reduced as well: fewer physical servers means less cooling and less electricity consumption.
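To make the consolidation arithmetic concrete, here is a small Python sketch. The cap of 4 virtual machines per host comes from the text above; the figure of 13 virtual machines is only an illustrative stand-in for "over a dozen", and the function name is made up for this example.

```python
import math


def hosts_needed(virtual_machines: int, max_per_host: int = 4) -> int:
    """Return how many physical hosts are needed under a per-host VM cap."""
    if virtual_machines < 0 or max_per_host < 1:
        raise ValueError("need a non-negative VM count and a cap of at least 1")
    # Round up: a partly filled host is still a whole physical machine.
    return math.ceil(virtual_machines / max_per_host)


# 13 is an assumed example count, not a figure from this post.
print(hosts_needed(13))
```

Under these assumptions, 13 virtual machines fit on 4 hosts instead of 13 separate physical servers.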

Backup

The flexibility of backups that virtualization provides is one of the biggest changes from the move. In the past, we’ve used many techniques to do full backups that, in the event a physical server blows up, would allow us to quickly restore the server from scratch onto new hardware. Tools like dd, partimage, and udpcast all work fairly well. However, they require that the machine be shut down, and automation can be tricky.

With the help of LVM and a few other basic utilities, we have successfully implemented a completely automated full server backup system. The backup procedure is as follows:

  • Pause the virtual server
  • Create an LVM snapshot
  • Resume the virtual server (the first three steps require about 3-15 seconds of downtime)
  • Copy the virtual machine files to the backup server
  • tar and gzip the files for safe keeping and space savings
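The five steps above could be sketched in Python roughly as follows. This is an illustration, not the actual backup script: the use of `vmware-cmd ... suspend` and `start` for pause and resume, the snapshot size, the mount point, the archive name, and the choice of rsync and ssh for the copy and compress steps are all assumptions, and it assumes the logical volume holds only this virtual machine's files.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def backup_vm(vm_config: str, vg: str, lv: str,
              backup_host: str, backup_dir: str,
              snap_name: str = "vmsnap", snap_size: str = "2G",
              mount_point: str = "/mnt/vmsnap",
              run: Runner = run_checked) -> None:
    """Snapshot-based full backup; all defaults are hypothetical values."""
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{snap_name}"

    # Step 1: pause the guest so its disk files stop changing (assumed command).
    run(["vmware-cmd", vm_config, "suspend"])
    try:
        # Step 2: take an LVM snapshot of the volume holding the VM files.
        run(["lvcreate", "--snapshot", "--size", snap_size,
             "--name", snap_name, origin])
    finally:
        # Step 3: resume the guest even if the snapshot failed.
        run(["vmware-cmd", vm_config, "start"])

    try:
        run(["mount", "-o", "ro", snapshot, mount_point])
        try:
            # Step 4: copy the frozen files to the backup server.
            run(["rsync", "-a", f"{mount_point}/",
                 f"{backup_host}:{backup_dir}/files/"])
        finally:
            run(["umount", mount_point])
        # Step 5: tar and gzip the copy on the backup server.
        run(["ssh", backup_host, "tar", "-czf",
             f"{backup_dir}/backup.tar.gz", "-C", f"{backup_dir}/files", "."])
    finally:
        # Always drop the snapshot so it cannot fill up and slow the host.
        run(["lvremove", "-f", snapshot])
```

The `try`/`finally` around the snapshot is the important design choice: the guest is resumed whether or not `lvcreate` succeeds, so a failed backup never leaves a server paused. Passing a different `run` callable (for example, one that only prints) gives a dry run.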

For us, 3-15 seconds of downtime during non-peak hours is acceptable, so we create these full backups for all virtual machines every night. Once the files are compressed, the disk usage per server is typically very small: 500 to 1500 MB. We keep several nights’ worth of these backups in case the most recent backup is corrupted.
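Keeping only the last few nights can be handled by a small pruning helper. The sketch below assumes archives are named with an ISO date (for example `web-2007-03-01.tar.gz`) so that sorting by name also sorts by age; that naming scheme, the function name, and the retention count are hypothetical.

```python
from typing import List


def archives_to_delete(names: List[str], keep: int) -> List[str]:
    """Return the archive names to remove, keeping the `keep` newest ones."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Newest first: ISO dates in the name sort chronologically as text.
    newest_first = sorted(names, reverse=True)
    return newest_first[keep:]


# Hypothetical archive names for one server.
nightly = ["web-2007-03-01.tar.gz", "web-2007-03-03.tar.gz", "web-2007-03-02.tar.gz"]
print(archives_to_delete(nightly, keep=2))
```

The function only decides what to delete; doing the actual removal in a separate step makes it easy to check the list before anything is lost.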

On many of our virtual servers, we continue to do rsync backups of important directories from within the OS. Initially, we did this so that we’d have a fallback in case the LVM snapshot method wasn’t robust. However, it is sometimes handy to access just a single backed-up file (e.g. a config file from /etc), so we’ve kept them around. Disk space is cheap, and you can never have too many backups.
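A file-level backup of this kind might be driven by something like the following sketch, which only builds the rsync command lines. The host name, destination path, and directory list are placeholders, and `--relative` is used here as one way to keep each directory's full path under the destination; the setup described in this post may well differ.

```python
from typing import List


def rsync_commands(directories: List[str], backup_host: str,
                   dest_root: str) -> List[List[str]]:
    """Build one rsync command per directory to back up."""
    dest = f"{backup_host}:{dest_root.rstrip('/')}/"
    commands = []
    for directory in directories:
        # Trailing slash: copy the directory's contents, not a nested copy.
        src = directory.rstrip("/") + "/"
        # --relative recreates the source path (e.g. etc/) under dest.
        commands.append(["rsync", "-a", "--relative", src, dest])
    return commands


# Placeholder values for illustration only.
for cmd in rsync_commands(["/etc", "/var/www"], "backuphost", "/backups/web"):
    print(" ".join(cmd))
```

With this layout a single config file can be pulled back from a predictable path on the backup server, such as `/backups/web/etc/`, without unpacking a full archive.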

Reliability

Instead of spending all your server budget on many quasi-fault-tolerant servers, with virtualization you can buy fewer, very fault-tolerant servers. We have a blade chassis with 10 server blades on which we run our virtual servers. The blade chassis has four power supplies, and each blade has two network interfaces (which we’ve bonded for failover) and two mirrored hard drives. The system is connected to different electrical circuits and two different switches. If one blade begins to fail, we can quickly copy the virtual machines to another host (this takes approximately 10 minutes per virtual machine, and could be faster if we were using a SAN). If the entire blade chassis were to melt down, the virtual machines could be restored to any machine running VMware Server. With the release of VMware Fusion for Mac OS X, we could even run our critical servers off our MacBook Pros :)

Testing

Whenever you do a server upgrade, there is a chance something will break. Virtualization offers us a couple of nice safety nets. VMware supports snapshots of the VM at any given point, which can be restored later. We can also restore from an LVM snapshot backup. There is also the option of creating a new development virtual machine, testing the change there, then applying it to the production server. When you are done with the development server, you can simply remove it, all without touching any physical hardware.

The downsides

While virtualization has been very positive overall, there are a couple of downsides to keep in mind.

The biggest issue is that all your virtual machines are dependent on the host server. If your host machine crashes, it means all virtual machines will be down. Early on, we ran into a serious LVM bug in the Ubuntu 6.06 stock kernel. Creating and removing snapshots caused kernel panics on the host. We made the decision to use a minimal Ubuntu 6.10 as the host OS and haven’t had issues since.

Kernel upgrades also require some forethought. VMware can be set up to automatically shut down and start up the virtual machines when the host machine is rebooted. However, the VMware kernel modules need to be rebuilt whenever the kernel version changes. It isn’t complicated, but it does require some planning ahead.
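One way to avoid being surprised after a reboot is to record which kernel the modules were last built for and compare it with the running kernel. The sketch below does only that comparison; the stamp file path and function names are hypothetical, and the rebuild itself (typically a re-run of VMware's configuration script on VMware Server of this era, though that depends on the version) is left to the administrator.

```python
import platform
from pathlib import Path
from typing import Optional

# Hypothetical location for recording the kernel the modules were built for.
STAMP_FILE = Path("/var/lib/vmware-modules.kernel")


def read_stamp(path: Path = STAMP_FILE) -> Optional[str]:
    """Return the recorded kernel release, or None if nothing was recorded."""
    try:
        return path.read_text()
    except FileNotFoundError:
        return None


def needs_rebuild(running_kernel: str, stamped_kernel: Optional[str]) -> bool:
    """True when the modules were built for a different (or unknown) kernel."""
    return stamped_kernel is None or stamped_kernel.strip() != running_kernel


if __name__ == "__main__":
    # platform.release() reports the running kernel release on Linux.
    if needs_rebuild(platform.release(), read_stamp()):
        print("VMware kernel modules need rebuilding for", platform.release())
```

Run from a boot script or cron job, a check like this turns a silent failure to start virtual machines into an explicit reminder.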

Virtual machine performance can be slower than that of real servers. Our servers are rarely under any significant load, so we haven’t run into this yet. However, it is something to keep in mind. File servers and database servers are probably best left as real servers unless they are very lightly used.

Long time passing

From all the buzz in the IT world regarding virtualization, it is clearly here to stay. As management tools continue to improve, I’m sure we’ll be re-evaluating the offerings, especially from the open source community. Until then, we are extremely pleased with our implementation.
