Friday, July 6, 2012

Cluster Sitsfleisch

Tasty?

My adventure in selecting components and building the 2-node cluster I talked about earlier needs some fleshing out.  In early 2012, I read "How much linux cluster for 6k" and was somewhat inspired, at least by the economy of it.  For aesthetic inspiration, I found Helmer (Swedish for cluster?) and the details of justacluster, including its use as a render farm for Sintel, really interesting.

Some choices I made, which may be a matter of taste:

  • Intel 1GbE NICs on the compute node motherboards, since they seemed better than the RealTek equivalents (anecdotally) for features and consistency
  • Nodes should be thin: this means they do not rely on local bootable hard disks.  The main advantage of this is avoiding "state" problems (inconsistencies in boot state).  It's bad enough that each node has its own BIOS!
  • NFS is good enough to serve software binaries to the compute nodes
  • PXE booting for compute nodes
  • Good central control (issuing commands to all machines)
  • Low-latency networking and very low clock slew (PTP is a requirement), so that events happening in the cluster can be properly ordered and measured for our application.  This is probably a key reason why cloud solutions are not so great: internal jitter hurts a trading platform.
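As an illustration of the "central control" item, here is a minimal sketch of fanning one command out to every node over ssh from the master. It is not the tooling actually used on this cluster; the node names, function names, and the choice of ssh with key-based login are all assumptions made for the example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node names; the post does not say what the nodes are called.
NODES = ["node01", "node02"]


def ssh_command(host, command):
    """Build the argv list for running `command` on `host` over ssh."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # fail instead of prompting for a password
        "-o", "ConnectTimeout=5",   # give up quickly on a dead node
        host,
        command,
    ]


def run_one(host, command):
    """Default runner: execute the command and capture its output."""
    proc = subprocess.run(
        ssh_command(host, command), capture_output=True, text=True
    )
    return proc.returncode, proc.stdout.strip()


def run_on_all(hosts, command, runner=run_one):
    """Run `command` on every host in parallel.

    Returns a dict mapping host -> (return code, stdout).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        results = pool.map(lambda h: runner(h, command), hosts)
        return dict(zip(hosts, results))
```

Calling `run_on_all(NODES, "uptime")` on the master would then collect one result per node. The `runner` argument exists only so the fan-out logic can be exercised without a real network.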

Please bear in mind that prices I quote are from early 2012 and are in Canadian dollars.

For racking I had in mind (as more compute nodes were purchased) a utility shelving rack of the kind you buy on sale and use for light stuff in your garage.  Pricing for something that could hold 15 uATX motherboards would probably run 25$ to 45$ (depending on luck with sales).

Switch choices


Commodity Gigabit switches:


Manuf     Model        Capacity  Ports  Extras           Cost ($)
D-Link    DGS1210-48   96Gbps    48     4 SFP, DHCP      552.00
TRENDnet  TEG-448WS    96Gbps    48     4 SFP expansion  490.00
Netgear   GS748TNA     40Gbps    48     4 SFP, DHCP      464.00
Cisco     SR2024CT-NA  -         24     2 SFP            323.91
D-Link    DGS-1100-24  -         24     -                247.38
HP        1410-24G     48Gbps    24     2 SFP            318.06
Netgear   GS724T       48Gbps    24     2 SFP            145.00

The choice is a trade-off between immediate cost and expandability.
Cables:
  • 7' CAT6 patch cable costs 2.10$.
  • 2' Universal power cable 3.51$.
  • 6 outlet power surge protector 4.97$


Node Choices



Compute power, number of cores, and Watts:

Chip              Specs                                 Cost ($)
Manuf  Model      Cores  GHz  GkBnch  L2/3  TDP  maxGB  Chip    MB      Mem/G  Cool    Pwr
Intel  i7-3930K   6      3.2  21.5    12    130  64     655.25  397.76  9.50   103.74  100
Intel  i7-2600K   4      3.4  12.1    6     95   32     359.34  102.35  4.87   30.66   40
Intel  i5-2500K   4      3.3  9.3     6     95   32     239.88  102.35  4.87   30.66   40
AMD    FX-8120    8      3.1  8.77    8     125  32     233.67  107.00  4.87   30.66   50
AMD    X6-1100T   6      3.3  8.26    6     125  32     194.99  107.00  4.87   30.66   50
AMD    X6-1055T   6      2.8  7.39    6     95   32     149.99  107.00  4.87   30.66   40
Intel  Atom D425  2      1.8  1.29    0.5   10   4      89.99   incl    6.50   incl    25

All motherboards (MB) have gigabit LAN and accept DDR3 RAM (1333-SODIMM).
GkBnch is the Geekbench benchmark score divided by 1000.

I retrieved this data from various online sources and did not measure it myself, but it all seemed to fit my preconceptions: AMD had lower IPC and higher power draw but cost less than Intel (at the time).

Node Summary



How many (fractional) nodes would you get for 6000$ (6K)? How much of everything is there?

Node type  Cores  MMem  CMem  GkB    Cost     Load Pwr  6KNodes  6KCores  6KMMem  6KGkB  6KPwr
i7-3930K   6      32    12    21.5   1567.19  0.228kW   3.8      22.8     121.6   81.7   0.87kW
i7-2600K   4      16    6     12.1   616.71   0.164kW   9.7      38.8     155.2   117.4  1.59kW
i5-2500K   4      16    6     9.3    497.25   0.152kW   10.6     42.4     169.6   98.6   1.61kW
FX-8120    8      16    8     8.77   505.69   0.223kW   12.1     96.8     193.6   106.1  2.70kW
X6-1100T   6      16    6     8.26   467.01   0.229kW   12.8     76.8     204.8   105.7  2.93kW
X6-1055T   6      16    6     7.39   412.01   0.222kW   14.6     87.6     233.6   107.9  3.24kW
Atom D425  2      4     0.5   1.29   147.43   0.036kW   40.7     81.4     162.8   52.5   1.47kW
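The Cost and 6K columns can be rebuilt from the component prices. The sketch below is a reconstruction, not the spreadsheet behind the table. It assumes each node's cost is chip + motherboard + memory + cooler + power supply, plus one patch cable, one power cable, and a one-sixth share of a surge protector. That cabling term is inferred from the numbers rather than stated in the post. The load power figures are copied from the table, not derived.

```python
BUDGET = 6000.0

# Assumed per-node cabling: patch cable + power cable + 1/6 of a surge protector.
CABLES = 2.10 + 3.51 + 4.97 / 6

# name: (cores, Geekbench/1000, chip $, motherboard $, memory $/GB,
#        memory GB, cooler $, power supply $, load kW)
NODES = {
    "i7-3930K": (6, 21.5, 655.25, 397.76, 9.50, 32, 103.74, 100.0, 0.228),
    "i7-2600K": (4, 12.1, 359.34, 102.35, 4.87, 16, 30.66, 40.0, 0.164),
    "X6-1055T": (6, 7.39, 149.99, 107.00, 4.87, 16, 30.66, 40.0, 0.222),
}


def summarize(name, budget=BUDGET):
    """Return the per-node cost and what `budget` dollars would buy."""
    cores, gkb, chip, board, mem_per_gb, mem_gb, cooler, psu, load_kw = NODES[name]
    cost = chip + board + mem_per_gb * mem_gb + cooler + psu + CABLES
    # The table rounds the node count first and multiplies afterwards.
    nodes = round(budget / cost, 1)
    return {
        "cost": round(cost, 2),
        "nodes": nodes,
        "cores": round(nodes * cores, 1),
        "mem_gb": round(nodes * mem_gb, 1),
        "gkb": round(nodes * gkb, 1),
        "kw": round(nodes * load_kw, 2),
    }


for name in NODES:
    print(name, summarize(name))
```

Under these assumptions the i7-2600K node comes out at 616.71$ and 9.7 nodes per 6000$, matching the table row.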

At 14 cents per kWh, the electrical running cost for a 1.0kW cluster would be 1226$ for a year.  Not bad.
Several 15 Amp circuits would be needed for anything with more than 6 nodes, and HVAC needs to be considered wherever it sits.
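The arithmetic behind both statements can be sketched as follows. The yearly cost uses only the 14 cents per kWh rate quoted above. The circuit estimate rests on two assumptions that are not in the post: 120 V household circuits, and loading a breaker to at most 80% of its rating for continuous use.

```python
HOURS_PER_YEAR = 24 * 365   # 8760 hours
RATE_PER_KWH = 0.14         # dollars per kWh, as quoted in the post


def annual_cost(load_kw, rate=RATE_PER_KWH):
    """Dollars per year for a constant load running around the clock."""
    return load_kw * HOURS_PER_YEAR * rate


def nodes_per_circuit(node_kw, amps=15, volts=120, derate=0.8):
    """Whole nodes that fit on one circuit (volts and derate are assumptions)."""
    usable_kw = amps * volts * derate / 1000
    return int(usable_kw // node_kw)


print(round(annual_cost(1.0)))     # yearly cost in dollars for a 1.0 kW cluster
print(nodes_per_circuit(0.229))    # hungriest node in the table (X6-1100T)
print(nodes_per_circuit(0.164))    # i7-2600K node
```

With those assumptions, one 15 Amp circuit carries about six of the hungrier nodes, which is consistent with the six-node rule of thumb.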

Comparison to AWS


OK, the comparison may not be fair in some ways since EC2 is not designed to be a cheap commodity low-latency rig (one wonders what the clock slew is like?).  But people who care about acronyms like EBDA and TCO want to know where the break-even point is at full duty cycle:

Using the numbers I found at AWSTests, I compared a job we do at my work with 400 m1.large instances, each of which provides 7.5G RAM, 4 ECU, and a 3k Geekbench score.  Each of these on-demand instances cost 0.34$ per hour at the time.

From the cluster, a single i7-2600K compute node provides (relative to the m1.large instance) 4 times the CPU (ECU or Geekbench) and twice the RAM, at 0.04$ for each hour of power.  The cost outlay for the node is 685$.  With these assumptions, running at full tilt, the cluster reaches cost parity with AWS after 98 days (but remember: you still own a cluster after running it for 98 days).
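The post does not spell out the formula behind the 98-day figure. Below is a minimal sketch of one plausible break-even calculation, using only the quoted numbers (685$ outlay, 0.34$ per instance-hour, 0.04$ per hour of power). The function name and the `instances_replaced` parameter are illustrative assumptions.

```python
def breakeven_days(outlay, cloud_rate, power_rate, instances_replaced=1):
    """Days of continuous use until owning the node beats renting.

    outlay             -- one-time cost of the node, in dollars
    cloud_rate         -- on-demand price per instance-hour, in dollars
    power_rate         -- electricity cost per node-hour, in dollars
    instances_replaced -- how many cloud instances one node stands in for
    """
    hourly_saving = cloud_rate * instances_replaced - power_rate
    if hourly_saving <= 0:
        raise ValueError("the node never pays for itself at these rates")
    return outlay / hourly_saving / 24


# One node standing in for a single m1.large instance.
print(round(breakeven_days(685, 0.34, 0.04), 1))
# One node credited with the full 4x CPU advantage.
print(round(breakeven_days(685, 0.34, 0.04, instances_replaced=4), 1))
```

Treating the node as a replacement for a single m1.large gives roughly 95 days, in the same range as the figure above. Crediting it with the full 4x CPU advantage gives roughly 22 days. Which reading is closer to the original calculation depends on assumptions the post does not itemize.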

Summary


So with a 600$ master computer (a commodity PC serving PXE-bootable partitions over NFS), the whole shebang (with 2 thin compute nodes) costs less than 3000$ Canadian, assuming you go with the i7-2600K CPUs for the nodes.  This is what happened for us.  I will try to give more (tasty) detail on the software we run on the master and nodes in a future post.  Certainly some poison (trial and error) was encountered in that process.  Stay tuned!

1 comment:

  1. https://nysetechnologies.nyx.com/en/hosted-solutions/community-platform

    Aha, a cloud solution which caters to capital markets and their low latency needs.
