Thursday, July 19, 2012

LVS balancing clustered REST service

Tasty.


Node.js (a single-threaded event loop) and LVS (Linux Virtual Server, a freely available layer 4 IP routing mechanism that is part of the Linux kernel) seem to be a natural fit.  The main advantage of node.js is that you can quickly write your server-side code in javascript or coffeescript and get it up and running on the V8 javascript engine that node.js embeds.  My main aim here is to try out LVS and see how it does its job on my cluster.
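For context, here is the shape such a service might take. This is only a hypothetical sketch, since my actual code is not shown in this post; the route names and payload are assumptions, with /something matching the benchmark URL later on and /hello answering the ldirectord health check configured below:

 // rest.js (hypothetical) -- a minimal node.js REST service of the sort being balanced here.
 // Route names and the response payload are assumptions, not the post's actual code.
 var http = require('http');
 var os = require('os');

 http.createServer(function (req, res) {
   if (req.url === '/hello') {
     // answers ldirectord's negotiate health check (see ldirectord.cf below)
     res.writeHead(200, { 'Content-Type': 'text/plain' });
     res.end('Up');
   } else if (req.url === '/something') {
     // small JSON payload for the benchmark requests
     res.writeHead(200, { 'Content-Type': 'application/json' });
     res.end(JSON.stringify({ status: 'ok', host: os.hostname() }));
   } else {
     res.writeHead(404);
     res.end();
   }
 }).listen(3000);  // the port ldirectord checks and LVS forwards to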



The 4 physical machines on my 1GbE commodity switch all run Debian Wheezy.  Test is a Dell with an E8400 dual-core chip.  Master is an i5 Dell, and its Realtek 1GbE chip does all the routing work for the VIP (192.168.1.100 in my case).  I installed ldirectord onto the cluster by simply installing it on the Master (whose binaries are accessible to the nodes via NFS):

 # sudo apt-get install ldirectord

It is configured as follows (in /etc/ha.d/ldirectord.cf):

checktimeout = 5
negotiatetimeout = 30
checkinterval = 10
failurecount = 1
autoreload = no
logfile = "local0"
fork = yes
quiescent = no
cleanstop = no
virtual=192.168.1.100:3000
    real=192.168.1.10:3000 gate
    real=192.168.1.11:3000 gate
    service=http
    request="hello"
    receive="Up"
    scheduler=lc
    protocol=tcp
    checktype=negotiate

I have removed the comments from the above file to make it shorter. The load-balancing scheduler chosen was lc, which stands for "least connections"; LVS offers about a dozen interesting alternatives. The gate keyword on each real= line selects direct routing (LVS-DR), which is why the loopback VIP and the ARP tweaks further down are needed.
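As an aside, checktype=negotiate means ldirectord does more than a TCP connect: every checkinterval it fetches the request path from each real server and only keeps that server in the pool while the reply contains the receive string. Roughly the same check, written as a standalone node.js sketch purely for illustration (ldirectord does this internally; the addresses are simply the real servers above):

 // check.js (illustration) -- approximately what the negotiate check above does:
 // fetch request="hello" from each real server and look for receive="Up" in the body.
 var http = require('http');

 ['192.168.1.10', '192.168.1.11'].forEach(function (host) {
   http.get({ host: host, port: 3000, path: '/hello' }, function (res) {
     var body = '';
     res.on('data', function (chunk) { body += chunk; });
     res.on('end', function () {
       console.log(host + (body.indexOf('Up') !== -1 ? ' is up' : ' failed the check'));
     });
   }).on('error', function () {
     console.log(host + ' failed the check');   // treated as a dead real server
   });
 });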

Next, I used Pacemaker+Corosync to manage the VIP, ldirectord, and the REST service itself:

 # crm configure edit

To make our REST service run on the Node1 and Node2 machines (pacemaker is managing the Master, Node1, and Node2 as a single cluster), we add the following lines:

 
 primitive Rest ocf:heartbeat:anything \
        params binfile="/opt/scripts/rest.sh" \
        user="rapt01" logfile="/tmp/rest.out" \
        errlogfile="/tmp/rest.err" \
        op start interval="0" timeout="20" \
        op stop interval="0" timeout="30" \
        op monitor interval="20"
 clone Rest-clone Rest \
        meta ordered="false" clone-max="2" \
        target-role="Started"
 location Rest-loc Rest-clone -inf: Master


To configure the VIP and ldirectord under pacemaker, we add the following lines (RestVip places the VIP on the director, while the RestVip-lo clone places it on each real server's loopback for direct routing):


 primitive RestVip ocf:heartbeat:IPaddr2 \
   op monitor interval="60" timeout="20" \
   params ip="192.168.1.100" lvs_support="true"
 primitive RestVip-lo ocf:heartbeat:IPaddr2 \
   op monitor interval="60" timeout="20" \
   params ip="192.168.1.100" nic="lo" cidr_netmask="32"
 primitive ldirectord ocf:heartbeat:ldirectord \
   op monitor interval="20" timeout="10"
 clone RestVip-lo-clone RestVip-lo \
        meta interleave="true" clone-max="2" target-role="Started"
 colocation rest-vip-coloc inf: ldirectord RestVip
 colocation rest-vip-lo-coloc -inf: RestVip RestVip-lo-clone

Finally, I tweaked the ARP settings to steer clear of the LVS-DR "ARP problem": because the real servers carry the VIP on their loopback interface (RestVip-lo above), they must not answer ARP requests for it, or traffic would bypass the director.

IP forwarding is enabled by adding the following to the Master's /etc/sysctl.conf and running sysctl -p:

  net.ipv4.ip_forward=1
  net.ipv6.conf.all.forwarding=1

And on each of Node1 and Node2, a similar edit to /etc/sysctl.conf (again reloaded with sysctl -p) keeps them from answering ARP for the VIP:

  net.ipv4.conf.all.arp_ignore=1
  net.ipv4.conf.all.arp_announce=2

Testing.


Using Apache Bench:

 # ab -c 1000 -n 10000 http://192.168.1.100:3000/something

Nice output like this is observed when running from the Test machine:
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 192.168.1.100 (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        
Server Hostname:        192.168.1.100
Server Port:            3000

Document Path:          /something
Document Length:        167 bytes

Concurrency Level:      1000
Time taken for tests:   1.648 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      3330000 bytes
HTML transferred:       1670000 bytes
Requests per second:    6068.91 [#/sec] (mean)
Time per request:       164.774 [ms] (mean)
Time per request:       0.165 [ms] (mean, across all concurrent requests)
Transfer rate:          1973.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   51 216.3      0    1004
Processing:     1   45  35.2     41     658
Waiting:        0   45  35.2     41     658
Total:          1   96 223.8     43    1270

Percentage of the requests served within a certain time (ms)
  50%     43
  66%     56
  75%     64
  80%     69
  90%    110
  95%    367
  98%   1055
  99%   1066
 100%   1270 (longest request)

Results.


Using Apache Bench with 1000 concurrent connections and 10,000 requests, I obtained some data indicating how well LVS performs. My node.js code is a bit disappointing, I think, since it is not really handling what I consider high load. I should spare node.js harsher judgment until I have had a chance to learn it better, along with the techniques (other than LVS) used to make it scale.


Address   Keep-alive   Requests/sec (avg)   Mean latency (ms)   95th percentile latency (ms)
VIP       no           6068.91              164.774              367
Node1     no           3393.52              294.679             1040
VIP       yes          8639.80              115.743              126
Node1     yes          4447.24              224.858              225


So latency is roughly halved, and throughput is roughly doubled when we load balance to both Node1 and Node2 (versus just using Node1). The HTTP keep-alive does improve throughput as expected.  My only remaining uneasiness is that node.js, though easy to code, is not performing as well as I had hoped in terms of both latency and throughput.  I may look into something a bit lower level.
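One node.js scaling technique I have not tried yet is its built-in cluster module, which forks one worker per CPU core so that a single real server is not limited to a single event loop. A rough, hypothetical sketch (rest.js refers to the service sketched at the top of the post):

 // clustered.js (hypothetical) -- fork one worker per CPU core; the workers
 // share the listening socket, so port 3000 stays the same for LVS/ldirectord.
 var cluster = require('cluster');
 var os = require('os');

 if (cluster.isMaster) {
   os.cpus().forEach(function () { cluster.fork(); });
   cluster.on('exit', function (worker) {
     console.log('worker ' + worker.process.pid + ' died, forking a replacement');
     cluster.fork();
   });
 } else {
   require('./rest.js');   // each worker runs the REST service sketched earlier
 }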
