Tuesday, July 10, 2012

Nagle's Algorithm

Poison.


Every now and then a little history rears up and teaches us something.  Nagle's algorithm is a legacy part of TCP which was used to economize TCP headers by holding up tiny quantities of packet payload in order to wait a bit to see if more data is following soon (if it isn't then the tiny packet gets sent out after a delay).  This is insidious for an application that distributes real-time market data (prices of things).  Holding up a single price because it is not travelling with a pack of other prices, is bad since it degrades (in a non-uniform way) the market data stream.

I discovered the effects of Nagle's algorithm (and how to disable it with setsockopt and TCP_NODELAY with the berkeley sockets interface) when dealing with our FIX rates service.  A trans-pacific connection showed poor transmission quality at the time, and it was a bit of a mystery why so many of the prices were taking so long to get out.


The graph shows the maximum, average and minimum for a single hour on the trans-pacific link before Nagle's algorithm was disabled on this link.  The distrubing thing is that every 10 second period had a wide spread between the minimum time for a price to transit and the maximum.  This indicated that there was something at play.  A one way trip between the eastern seaboard of the United States to Tokyo is in effect a bit less than 100msec (Akamai keeps stats which indicate they can do that sort of latency).

Cure.


The C code to turn Nagle's algorithm off is pretty simple. One important catch is that it must be done on both sides of the connection (or at least on any side which writes small packets).
  int on = 1;
  int nodelay_ret = setsockopt( fd, IPPROTO_TCP, 
        TCP_NODELAY, (void *)&on, 4);
Its also pretty easy to detect dynamically using rudimentary tracing tools on Linux:
  strace -f -e trace=setsockopt -p `pidof `
Or on Solaris:
  truss -f -tsetsockopt -p `pgrep -x `
While running these traces, you need to program to accept somehow (establish a connection) so that it has a chance to show you how smartly it is setting this esoteric option.


The pay off is huge.  Min and max are very close together once this issue is fixed (and if there are many hops, then the fix needs to get into several places).  It was cathartic in the end to discover that a fairly recent Java library for FIX called QuickFIX had properly disabled Nagle's algorithm, and that our custom ("faster") C code needed fixing.

No comments:

Post a Comment