Tasty.
Node.js (a single-threaded event loop) and LVS (Linux Virtual Server, a freely available layer 4 IP routing mechanism that is part of the Linux kernel) seem like a natural fit. The main advantage of node.js is that you can quickly write your server-side code in javascript or coffeescript and get it up and running on the V8 javascript engine that node.js is built on. My main aim here is to try out LVS and see how it does its job on my cluster:
The 4 physical machines on my 1Gbe commodity switch all run Debian Wheezy. Test is a Dell with a dual-core E8400 chip. Master is an i5 Dell, and its Realtek 1Gbe NIC does all the routing work for the VIP (192.168.1.100 in my case). I installed ldirectord onto the cluster simply by installing it on Master (whose binaries are accessible to the nodes via NFS):
# sudo apt-get install ldirectord
It is configured as follows (in /etc/ha.d/ldirectord.cf):
checktimeout = 5
negotiatetimeout = 30
checkinterval = 10
failurecount = 1
autoreload = no
logfile = "local0"
fork = yes
quiescent = no
cleanstop = no
virtual=192.168.1.100:3000
        real=192.168.1.10:3000 gate
        real=192.168.1.11:3000 gate
        service=http
        request="hello"
        receive="Up"
        scheduler=lc
        protocol=tcp
        checktype=negotiate
I have removed the comments from the above file to keep it short. The LVS load-balancing scheduler I chose is lc, which stands for "least connections"; there are about a dozen interesting alternatives.
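The node.js service itself is not shown in this post, but the ldirectord check above requests "hello" and expects "Up" back, so the service has to answer that path. A minimal sketch of such a server, assuming the check maps to GET /hello and that /something (used in the benchmark below) just returns a small static body, might look like this; the rest.sh script referenced in the pacemaker config below presumably launches something along these lines:

// Hypothetical minimal node.js REST service (not the actual code used here).
var http = require('http');

http.createServer(function (req, res) {
  if (req.url === '/hello') {
    // Health-check path polled by ldirectord; the body must contain "Up".
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Up');
  } else {
    // Anything else (e.g. /something in the ab runs below) gets a small
    // static payload.
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('some small response body for load testing');
  }
}).listen(3000);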
Next, I used Pacemaker+Corosync to manage the VIPs:
# crm configure edit
To make our REST service run on the Node1 and Node2 machines (pacemaker is managing the Master, Node1, and Node2 as a single cluster), we add the following lines:
primitive Rest ocf:heartbeat:anything \
        params binfile="/opt/scripts/rest.sh" \
        user="rapt01" logfile="/tmp/rest.out" \
        errlogfile="/tmp/rest.err" \
        op start interval="0" timeout="20" \
        op stop interval="0" timeout="30" \
        op monitor interval="20"
clone Rest-clone Rest \
        meta ordered="false" clone-max="2" \
        target-role="Started"
location Rest-loc Rest-clone -inf: Master
To configure VIPs and ldirectord under pacemaker we add the lines:
primitive RestVip ocf:heartbeat:IPaddr2 \
        op monitor interval="60" timeout="20" \
        params ip="192.168.1.100" lvs_support="true"
primitive RestVip-lo ocf:heartbeat:IPaddr2 \
        op monitor interval="60" timeout="20" \
        params ip="192.168.1.100" nic="lo" cidr_netmask="32"
primitive ldirectord ocf:heartbeat:ldirectord \
        op monitor interval="20" timeout="10"
clone RestVip-lo-clone RestVip-lo \
        meta interleave="true" clone-max="2" target-role="Started"
colocation rest-vip-coloc inf: ldirectord RestVip
colocation rest-vip-lo-coloc -inf: RestVip RestVip-lo-clone
Finally, I enabled forwarding on Master and tweaked the ARP settings on the real servers, which I believe steers clear of the well-known LVS "ARP problem": with direct routing, each real server holds the VIP on its loopback interface, so it must not answer ARP requests for that address on the LAN.
I added IP forwarding to Master's /etc/sysctl.conf and ran sysctl -p:
net.ipv4.ip_forward=1
net.ipv6.conf.all.forwarding=1
And on each of Node1/Node2, a similar edit to /etc/sysctl.conf (followed by a reload) for ARP:
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2
Testing.
Using Apache Bench:
# ab -c 1000 -n 10000 http://192.168.1.100:3000/something
From our Test machine we see output like this:
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 192.168.1.100 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software:
Server Hostname: 192.168.1.100
Server Port: 3000
Document Path: /something
Document Length: 167 bytes
Concurrency Level: 1000
Time taken for tests: 1.648 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Total transferred: 3330000 bytes
HTML transferred: 1670000 bytes
Requests per second: 6068.91 [#/sec] (mean)
Time per request: 164.774 [ms] (mean)
Time per request: 0.165 [ms] (mean, across all concurrent requests)
Transfer rate: 1973.58 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 51 216.3 0 1004
Processing: 1 45 35.2 41 658
Waiting: 0 45 35.2 41 658
Total: 1 96 223.8 43 1270
Percentage of the requests served within a certain time (ms)
50% 43
66% 56
75% 64
80% 69
90% 110
95% 367
98% 1055
99% 1066
100% 1270 (longest request)
Results.
Using Apache Bench with 1000 concurrent connections and 10000 requests, I collected some data on how well LVS performs. My node.js code is a bit disappointing, I think, since it is not really handling what I would call high load, but I will spare node.js judgment until I have had a chance to learn it better, along with the techniques (other than LVS) used to make it scale.
Address / keep-alive? | Requests/sec (mean) | Mean latency (ms) | 95th percentile latency (ms)
VIP / no              | 6068.91             | 164.774           | 367
Node1 / no            | 3393.52             | 294.679           | 1040
VIP / yes             | 8639.80             | 115.743           | 126
Node1 / yes           | 4447.24             | 224.858           | 225
So latency is roughly halved and throughput roughly doubled when we load balance across both Node1 and Node2 (versus hitting Node1 alone). HTTP keep-alive (ab's -k flag) improves throughput as expected. My only remaining uneasiness is that node.js, though easy to code, is not performing as well as I had hoped in terms of either latency or throughput. I may look into something a bit lower level.
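One standard technique for making node.js use more than one core on a single box (not something I used in these tests) is the built-in cluster module, which forks one worker per core and lets them share the listening socket. A rough sketch, assuming a node version that ships the cluster module:

// Hypothetical sketch of scaling a node.js HTTP service across cores with
// the cluster module; not part of the setup benchmarked above.
var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; the workers share the listening socket.
  for (var i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', function (worker) {
    // Replace any worker that dies.
    cluster.fork();
  });
} else {
  http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('Up');
  }).listen(3000);
}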