Cheap storage server project – part three

If you are just joining us, read part one and part two of this exciting series.

So after waiting over two years to dive into the cheap storage arena, we finally decided to go with a pretty boring solution. In this case, we hope boring will mean that it is a stable solution and easy to replace once our dream system comes along.

We chose to serve files with CIFS and NFS from a single server running Linux. Here are the details:

Aberdeen 48 bay storage server

We purchased a 48-bay file server (with no drives) from Aberdeen. Aberdeen is small compared to Dell, Sun, HP, IBM, and the rest. However, they are big enough and have been around long enough that we have some confidence they’ll be there to support us throughout the life of the system (which comes with a 5-year warranty). Their prices are very good, they use standard, off-the-shelf components, and you don’t have to buy the server fully populated with disks.

We compared Aberdeen to several other similarly sized server sellers (Pogo Linux, Server Direct, Thinkmate, etc.). Pricing was almost identical, but Aberdeen was the only vendor to include a standard 5-year warranty.

We also looked at the price of building the system ourselves. Using all the same components as the Stirling X888, the price (before shipping) came in $2,274.95 (19%) lower. Putting systems together can be time consuming. We may consider it for the backup system, but for our primary, it is good to know that if anything goes wrong, we can call one company and get help.

We’re going to start out with twelve 2 TB “enterprise” SATA drives in a hardware RAID 6 with one hot spare. That allows us to lose two drives per 12-drive RAID group without losing data, and the hot spare will allow us to start rebuilding immediately.
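As a quick check on what that layout yields, here is a sketch of the capacity arithmetic. It assumes the hot spare is one of the twelve drives, so eleven drives form the RAID 6 array and two drives’ worth of space goes to parity. The function name is ours for illustration; the hardware RAID controller does the real work.

```python
def raid6_usable_tb(drives_in_group: int, drive_tb: float, hot_spares: int = 1) -> float:
    """Usable capacity (TB) of one RAID 6 group.

    Assumes the hot spares are counted among drives_in_group but are not
    array members, so they contribute no capacity until a rebuild.
    """
    members = drives_in_group - hot_spares
    if members < 4:
        raise ValueError("RAID 6 needs at least four member drives")
    # RAID 6 stores two parity blocks (P and Q) per stripe, which costs
    # two drives' worth of capacity and lets the array survive two failures.
    return (members - 2) * drive_tb


if __name__ == "__main__":
    per_group = raid6_usable_tb(12, 2.0)
    print(f"one 12-drive group: {per_group} TB usable")
    print(f"four groups (48 bays): {4 * per_group} TB usable of {48 * 2.0} TB raw")
```

Under those assumptions one group gives 18 TB usable, and four such groups filling all 48 bays would give 72 TB usable out of 96 TB raw.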

The server will be running Ubuntu with an XFS filesystem. We’re a happy Ubuntu shop so this decision was easy. We’d prefer to use Hardy, the latest Long Term Support release, but due to some network driver issues, we’ll use Jaunty. XFS is mature, deals well with large filesystems, supports advanced ACLs, and has project/directory based quotas.
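XFS project quotas work by mapping a numeric project ID to a directory tree in /etc/projects and a name to that ID in /etc/projid, then running xfs_quota in expert mode (-x) against a filesystem mounted with the prjquota option. The sketch below only renders those files and the xfs_quota command lines so they can be reviewed before anything runs as root. The mount point /srv/storage, the project name lab_a, its 500g limit, and the Project helper are all hypothetical.

```python
import shlex
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Project:
    name: str        # hypothetical label written to /etc/projid
    proj_id: int     # numeric XFS project ID
    path: str        # directory tree the quota covers
    hard_limit: str  # block hard limit in xfs_quota syntax, e.g. "500g"


def render_project_files(projects: List[Project]) -> Tuple[str, str]:
    """Return the contents of /etc/projects and /etc/projid."""
    etc_projects = "".join(f"{p.proj_id}:{p.path}\n" for p in projects)
    etc_projid = "".join(f"{p.name}:{p.proj_id}\n" for p in projects)
    return etc_projects, etc_projid


def quota_commands(mount: str, projects: List[Project]) -> List[List[str]]:
    """Build xfs_quota invocations that tag each tree and set its limit."""
    cmds = []
    for p in projects:
        # Tag the directory tree with its project ID so new files inherit it.
        cmds.append(["xfs_quota", "-x", "-c", f"project -s {p.name}", mount])
        # Cap the total blocks the project may use.
        cmds.append(["xfs_quota", "-x", "-c",
                     f"limit -p bhard={p.hard_limit} {p.name}", mount])
    return cmds


if __name__ == "__main__":
    labs = [Project("lab_a", 10, "/srv/storage/lab_a", "500g")]
    projects_txt, projid_txt = render_project_files(labs)
    print(projects_txt + projid_txt, end="")
    for cmd in quota_commands("/srv/storage", labs):
        print(shlex.join(cmd))
```

Because the quota follows a directory tree rather than a user or group, this suits shares owned by a lab or project rather than by one person.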

There will be no snapshots (LVM snapshots cause abysmal performance). However, we will back up all the data (probably via rsync) to a nearly identical system located off site. The RAID should protect us against hard drive failure, and the offsite backup should protect us against accidental deletion and server room disaster (fire, flood).
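One way to make the off-site copy protect against accidental deletion is a new dated directory per run using rsync’s --link-dest option: files unchanged since the last run are hard-linked to the previous copy, so a file deleted on the primary disappears only from the newest copy. A plain mirror with --delete would instead propagate the deletion at the next sync. This is our assumption about how the sync could be done, not a settled plan; the host name, paths, and function are hypothetical.

```python
import shlex
from datetime import date
from typing import List, Optional


def rsync_snapshot_argv(src: str, backup_host: str, backup_root: str,
                        today: date, previous: Optional[date] = None) -> List[str]:
    """Build an rsync command that writes a new dated copy on the backup host."""
    argv = [
        "rsync",
        "-aHAX",          # archive mode plus hard links, ACLs and extended attributes
        "--numeric-ids",  # keep raw UID/GID numbers rather than remapping by name
    ]
    if previous is not None:
        # Files unchanged since the previous copy become hard links to it,
        # so each run only costs the space of what actually changed.
        argv.append(f"--link-dest={backup_root}/{previous.isoformat()}")
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    argv.append(src.rstrip("/") + "/")
    argv.append(f"{backup_host}:{backup_root}/{today.isoformat()}/")
    return argv


if __name__ == "__main__":
    cmd = rsync_snapshot_argv("/srv/storage", "backup.example.edu", "/srv/backup",
                              today=date(2009, 10, 2), previous=date(2009, 10, 1))
    print(shlex.join(cmd))
```

The trade-off is that the backup side holds a series of dated trees rather than one plain mirror, which would need some thought if that machine is ever promoted to primary.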

In the case of a disaster, the hope is that the nearly identical backup system can become the primary server… though we’ve yet to work out the nitty-gritty details.

Expandability – We’d love a system like clustered Samba or AFS, where we can just add additional storage servers as our needs grow at unpredictable rates. Our way around it? Buy a server with lots of empty bays so we can populate it as we need the space. This system has a 96 TB raw capacity today. As drive capacity grows, that will go even higher. The system can also use additional Direct Attached Storage. There is a risk that we’ll have several clients hitting the single server simultaneously and performance will suffer. This is an unknown that we hope to test, and hope not to run into in practice.

Samba/winbind and the NFS kernel server integrated with our Active Directory environment will be the heart of the file serving. We have experience with both, but not at this scale. The hope is that because these are widely deployed, we won’t run into any major issues. At the moment we don’t intend to offer SSH/SFTP or AFP.

High Availability Failover, or the lack thereof

Jordan and Dailey spent a lot of time investigating and testing High Availability techniques (heartbeat, STONITH, DRBD, etc.). We could have chosen to have two head ends, or two fully redundant servers, and if one were to fail we would automatically promote the backup to the primary role. They got things working, but in the end, the failover technology made the system more complicated. Basically, both servers have to share identities (IPs, Kerberos keytabs, etc.). Then you have to set up reliable monitoring of system health, and kill the failed server while promoting the backup. There are a lot of moving parts, and you need to be very sure only one server is in charge. This is one area where NetApp has earned its money.
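The rule that only one server may be in charge comes down to ordering: the standby must not take over the shared identity until the old primary is confirmed dead. Here is a toy sketch of that decision with hypothetical names and a made-up threshold; it is not how heartbeat or any particular STONITH setup actually implements it.

```python
from dataclasses import dataclass


@dataclass
class PrimaryStatus:
    missed_heartbeats: int  # consecutive heartbeats not received from the primary
    fence_confirmed: bool   # True once the fencing device reports the primary is powered off


def next_action(status: PrimaryStatus, heartbeat_limit: int = 3) -> str:
    """Return 'wait', 'fence', or 'promote' for the standby server."""
    if status.missed_heartbeats < heartbeat_limit:
        # A few lost heartbeats may just be network jitter or a busy primary.
        return "wait"
    if not status.fence_confirmed:
        # Never take over the shared IPs or Kerberos keytabs while the primary
        # might still be alive; two servers answering as one is split brain.
        return "fence"
    return "promote"
```

Even this toy version shows where the moving parts come from: the heartbeat threshold, the fencing device, and the takeover of every shared identity each have to work before promotion is safe.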

If our NetApp is down, the productivity of Thayer School faculty, staff, and students takes a serious nosedive, so it is easy to justify an active failover head. With our cheap storage system, however, the hope is that in the event of a server kernel panic our researchers can still be productive with other things while we bring the system back up. Of course, the hope with a simple system is that failures such as kernel panics will be very rare.

Further down the road, hopefully I’ll update this series of posts with our experience actually receiving the system and getting it set up.

Comments 1

  1. Small Business Serve wrote:

    Thanks for this informative post, we’re about to look in to this and will probably follow your lead.

    Posted 17 Oct 2009 at 4:35 pm

Trackbacks & Pingbacks 1

  1. From Computing@Thayer - Cheap storage Server – Part two:: Thayer School of Engineering at Dartmouth on 27 Aug 2009 at 10:42 pm

    [...] Read part three… what we actually chose to go with. (warning, it is pretty anticlimactic.) [...]
