How To Build 100 Terabytes with S$250,000

Author : James Tay
Date : 12th June 2006

  • First Phase : 14TB. Parts list.
    Background

    This article discusses some of my ideas on building a 100TB storage solution for data consolidation. The primary design goal is low cost. The secondary objectives are a reasonable degree of performance and fault tolerance. Since the solution must be low cost, we immediately rule out the major SAN vendors such as HP, EMC, Sun and IBM. For that matter, we want to avoid SANs altogether for cost reasons. At present, SATA disks are the most cost-effective option, so we'll concentrate on these.

    The Issue Of Scalability

    Since our target is not just several TB but 100TB in total, we have to take scalability seriously. Presently, a 500GB SATA disk costs about S$500. Next, we have to figure out how to hook up some 200+ of these drives. Without going through a lengthy comparison of RAID controllers, let us consider the 3ware 16-port SATA RAID card, which retails for about S$1500. Now imagine a server configured with one of these cards and 16x 500GB disks. We create a single RAID-5 volume with one hot spare: 15 disks form the array, one disk's worth of space goes to parity, and the remaining 14 x 500GB present a 7TB volume to the server OS. We call this server a Storage Node. Each storage node exports its 7TB volume as an iSCSI target; in fact, that is its only role. With 15 storage nodes, we would have a total of 105TB of storage available.
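
    The capacity figures above can be checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the original design; the function and constant names are illustrative, and it assumes decimal units (1TB = 1000GB), as the article's numbers imply.

```python
# Capacity arithmetic for the storage-node design described above.
# Assumption: decimal units (1 TB = 1000 GB); all names are illustrative.

DISK_GB = 500          # one SATA disk
DISKS_PER_NODE = 16    # ports on the RAID card
HOT_SPARES = 1         # disks held idle as hot spares
STORAGE_NODES = 15     # number of storage nodes


def raid5_usable_gb(total_disks, hot_spares, disk_gb):
    """Usable size of one RAID-5 array.

    Hot spares hold no data, and one member's worth of space is used
    for parity, so usable space is (members - 1) * disk size.
    """
    members = total_disks - hot_spares
    if members < 3:
        raise ValueError("RAID-5 needs at least three member disks")
    return (members - 1) * disk_gb


node_gb = raid5_usable_gb(DISKS_PER_NODE, HOT_SPARES, DISK_GB)
total_tb = STORAGE_NODES * node_gb / 1000
total_disks = STORAGE_NODES * DISKS_PER_NODE

print(f"per storage node: {node_gb / 1000:g} TB")
print(f"all nodes: {total_tb:g} TB on {total_disks} disks")
```

    With the article's figures this gives 7TB per node and 105TB over 240 disks, which matches the "200+" drives mentioned earlier.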

    A Service Node is a server running one or more iSCSI initiators, which talk to one or more storage nodes. Its role is to perform block level volume management (e.g. concatenation, growing a block device, snapshots, etc). This job is done by Linux's LVM, which uses the storage nodes as individual physical volumes and joins as many as desired to form a volume group. Logical volumes are then created (and later grown or shrunk) within a volume group. The service node then presents these logical volumes to servers/hosts as the final storage device. This can be accomplished in several ways. For example, the service node may put a filesystem on a logical volume and share it via NFS or Samba. It can also turn the logical volume into an iSCSI target, presenting it to external iSCSI initiators. Finally, a service node can (optionally) be connected to a tape drive for backup or restore. While many servers/hosts may obtain storage from a single service node, more service nodes may be purchased as required and organised as per site requirements.
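
    To make the LVM step concrete, here is a hedged sketch that only builds the command lines a service node might run; it does not execute anything. The device paths (/dev/sdb, /dev/sdc), the names vg_storage and lv_data, and the sizes are hypothetical placeholders, and the helper functions are mine, not part of LVM. The commands themselves (pvcreate, vgcreate, lvcreate -L/-n, lvextend -L) are standard LVM2 tools.

```python
import shlex


def lvm_plan(physical_devices, vg_name, lv_name, lv_size):
    """Return command lines that pool iSCSI-backed disks into one volume
    group and carve a single logical volume out of it."""
    if not physical_devices:
        raise ValueError("need at least one physical device")
    # Mark each imported iSCSI disk as an LVM physical volume.
    cmds = [["pvcreate", dev] for dev in physical_devices]
    # Join all physical volumes into one volume group.
    cmds.append(["vgcreate", vg_name, *physical_devices])
    # Create a logical volume of the requested size inside the group.
    cmds.append(["lvcreate", "-L", lv_size, "-n", lv_name, vg_name])
    return [shlex.join(c) for c in cmds]


def lvm_grow(vg_name, lv_name, extra):
    """Return the command line that grows a logical volume by `extra`."""
    return shlex.join(["lvextend", "-L", f"+{extra}", f"/dev/{vg_name}/{lv_name}"])


# Hypothetical example: two storage nodes seen as /dev/sdb and /dev/sdc.
plan = lvm_plan(["/dev/sdb", "/dev/sdc"], "vg_storage", "lv_data", "2T")
for line in plan:
    print(line)
print(lvm_grow("vg_storage", "lv_data", "1T"))
```

    Growing the logical volume does not by itself grow a filesystem that sits on top of it; that is a separate, filesystem-specific step.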

    In order to minimise latency between service nodes and storage nodes, we require a high performance Gigabit Ethernet switch. Since this will be a pure layer 2 switch, we avoid the latency introduced by layer 3 routing. VLANs can also be used to segregate traffic if necessary. The following diagram illustrates the entire architecture.

    Price Breakdown

    Storage node : S$11,480

    Service node : S$4,560

    Total Price : S$249,560
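
    A rough sanity check on these prices, as a sketch only. It assumes the total covers the 15 storage nodes described earlier and that all prices are in S$. The listing does not say how many service nodes or which switch the total includes, so everything beyond the storage nodes is reported as one remainder rather than itemised. The disk and RAID card prices are the approximate figures quoted earlier, so the per-node remainder is approximate too.

```python
# Price arithmetic using only figures quoted in this article (all S$).
# Assumption: the total covers 15 storage nodes providing 105 TB usable.

STORAGE_NODE_PRICE = 11_480
TOTAL_PRICE = 249_560
STORAGE_NODES = 15
USABLE_TB = 105

DISK_PRICE = 500          # approximate price of one 500GB SATA disk
DISKS_PER_NODE = 16
RAID_CARD_PRICE = 1_500   # approximate price of the 16-port RAID card

# Cost of all storage nodes, and what is left for everything else
# (service nodes, switch, and any other items; not itemised here).
storage_subtotal = STORAGE_NODES * STORAGE_NODE_PRICE
remainder = TOTAL_PRICE - storage_subtotal

# Overall cost per usable terabyte.
cost_per_tb = TOTAL_PRICE / USABLE_TB

# Within one storage node: disks plus RAID card, and what is left
# for the server hardware itself.
disks_and_card = DISKS_PER_NODE * DISK_PRICE + RAID_CARD_PRICE
node_remainder = STORAGE_NODE_PRICE - disks_and_card

print(f"storage nodes subtotal: S${storage_subtotal:,}")
print(f"remainder for other items: S${remainder:,}")
print(f"cost per usable TB: about S${cost_per_tb:,.0f}")
print(f"per node, beyond disks and RAID card: about S${node_remainder:,}")
```

    On these assumptions the storage nodes account for S$172,200 of the total, and the whole system works out to roughly S$2,377 per usable TB.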


    Notes

  • Instructions on a minimal configuration for a Linux iSCSI target (iscsitarget-0.4.13).
  • Instructions on a minimal configuration for a Linux iSCSI initiator (open-iscsi-1.0-485).
  • Instructions on a minimal configuration for a Solaris 10 iSCSI initiator (Solaris 10 6/06).
  • Instructions on a minimal iSCSI initiator on FreeBSD (tested on release 6.1).