Tuesday, July 17, 2012

Anti-pattern: Lock and Block

Poison.


Never hold a lock and then block waiting for I/O. Just do that one thing, and you are mostly out of the woods. Even if your "multi-threaded" program ultimately serializes to the equivalent of one thread because of your synchronization choices, doing that one thing (not locking while blocking) should keep your application above-average awesome.

Threads.


So how does locking work?  Why is it done? What is a thread anyway?

So many questions for a neophyte C++ developer to ask.  A thread is an execution context (stack, program counter, thread-local storage) within a process which shares the address space (so global variables, the heap, and the binary's code/text section) with the other threads in that process.  The kernel schedules each runnable thread in the program onto the physical cores that the machine has.  A core?  They told you how many of those you got when you bought your computer -- I know it sounded esoteric at the time.  Each core is capable of moving a thread along through the code, changing the state of the program.  As time goes on, machines get more cores, and software is written with more threads to make use of that added hardware.
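For a tiny, concrete illustration (mine, not from any particular code base), here is roughly what two threads look like with C++11's std::thread; the names and the busy-work are made up:

// Minimal sketch: two threads in one process.  They share the globals and
// the code, but each has its own stack and its own position in that code.
#include <iostream>
#include <thread>

int shared_global = 42;              // visible to every thread in the process

void work(int id)
{
    int local = id * shared_global;  // 'local' lives on this thread's own stack
    std::cout << "thread " << id << " computed " << local << "\n";
}

int main()
{
    std::thread t1(work, 1);         // the kernel schedules each runnable thread
    std::thread t2(work, 2);         // onto whatever core happens to be free
    t1.join();
    t2.join();
    return 0;
}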

As time progresses (downward in the diagram below), two threads spend time on a core (shown in blue), time off the core (shown in green), and time waiting for a blocking system call to return:


Since threads cycle through these phases (serious number crunching, waiting for system calls to return, waiting for mutexes or semaphores to be released by other threads, and simply waiting their turn for time on a core), having more threads than cores makes sense.  Unless all a thread does is number crunching in local memory, it does not spend all of its time on a core.
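To make those phases concrete, here is a hypothetical worker that alternates between crunching (time on a core) and a blocking call (time parked off a core):

// Sketch: a thread that alternates between CPU work (on a core) and a
// blocking call (parked off the core until the kernel wakes it up again).
#include <chrono>
#include <thread>

void worker()
{
    for (int round = 0; round < 10; ++round) {
        volatile long sum = 0;
        for (long i = 0; i < 5000000; ++i) {
            sum += i;                    // on a core: number crunching
        }
        // Off the core: stand-in for a blocking read()/recv()/etc.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
}

int main()
{
    // Say the machine has two cores: four such threads can still keep both
    // cores busy, because each thread spends much of its time blocked.
    std::thread t1(worker), t2(worker), t3(worker), t4(worker);
    t1.join(); t2.join(); t3.join(); t4.join();
    return 0;
}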

Locking.


When threads want to share memory (say a structure like a hash table, a vector, or a queue), the textbook way of making that safe is a lock of some sort (usually a mutex).  Unprotected sharing is unsafe because partial state changes on that memory, when interleaved, can transition the structure into a state that should be unreachable or invalid.  Read and write operations on structures shared by threads therefore take a lock to serialize themselves, which gives each operation uninterrupted access for its duration and keeps things safe:
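A rough sketch of that textbook recipe, guarding a shared queue with a std::mutex (the class and the names are illustrative, not from any particular library):

// Sketch: a queue shared between threads, with a mutex serializing access so
// that no thread ever sees a half-finished push or pop.
#include <mutex>
#include <queue>
#include <thread>

class SharedQueue {
public:
    void push(int value)
    {
        std::lock_guard<std::mutex> guard(mutex_);   // uninterrupted access...
        queue_.push(value);                          // ...for this whole operation
    }

    bool pop(int & value)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        if (queue_.empty()) return false;
        value = queue_.front();
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;       // protects queue_
    std::queue<int> queue_;  // the structure being shared
};

int main()
{
    SharedQueue q;
    std::thread producer([&q] { for (int i = 0; i < 100; ++i) q.push(i); });
    std::thread consumer([&q] {
        int v = 0, seen = 0;
        while (seen < 100) if (q.pop(v)) ++seen;   // spin until all items arrive
    });
    producer.join();
    consumer.join();
    return 0;
}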



Poison.



Serialization costs something.  All this co-ordination has overhead, and it slows the threads down as they hit the brakes more often to wait for access to the shared structures.  If the lock is only held for some very fast number crunching or local memory (cache) access, things aren't so bad: the time in the critical section is so short that the penalty is not onerous.  Things get epically awful, though, when a thread makes blocking system calls while holding those locks:
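In code, the poisonous shape looks roughly like this (a deliberately bad, made-up example; cache, cache_mutex, and slow_fetch are hypothetical names):

// Anti-pattern sketch: a mutex held across a blocking call.  Every thread
// that wants the cache now waits for the I/O, not just for the map update.
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <thread>

std::mutex cache_mutex;                        // protects cache
std::map<std::string, std::string> cache;      // the shared structure

// Stand-in for a blocking system call (disk read, network round trip, ...).
std::string slow_fetch(const std::string & key)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return "value-for-" + key;
}

std::string lookup(const std::string & key)
{
    std::lock_guard<std::mutex> guard(cache_mutex);  // lock acquired...
    auto it = cache.find(key);
    if (it != cache.end()) return it->second;
    std::string value = slow_fetch(key);             // ...and held while blocking: poison
    cache[key] = value;
    return value;
}

int main()
{
    // Two threads both calling lookup(): the second sits on the mutex for the
    // entire 100 ms that the first spends "doing I/O".
    std::thread a([] { lookup("alpha"); });
    std::thread b([] { lookup("beta");  });
    a.join();
    b.join();
    return 0;
}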


This can be quite terrible for the threads waiting for their turn with the lock: their CPU usage collectively drops to essentially zero, and I/O ends up dictating the speed of progress.  This is not what you want.
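The usual hand-written cure is to restructure things so the lock is dropped before the blocking call and re-acquired afterwards; continuing the made-up cache example from above:

// Sketch of the cure: do the blocking work with no lock held, then take the
// lock again only long enough to update the shared structure.
std::string lookup(const std::string & key)
{
    {   // short critical section: check the cache
        std::lock_guard<std::mutex> guard(cache_mutex);
        auto it = cache.find(key);
        if (it != cache.end()) return it->second;
    }   // lock released here

    std::string value = slow_fetch(key);   // blocking I/O with no lock held

    {   // short critical section: publish the result
        std::lock_guard<std::mutex> guard(cache_mutex);
        cache[key] = value;                // another thread may have raced us;
                                           // last writer wins in this sketch
    }
    return value;
}

The cost is a little re-checking and the chance of a duplicate fetch, but no thread ever waits on the mutex for the duration of the I/O.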

Tasty.


In the OFlux run-time, blocking system calls are shimmed via interposition so that the main run-time mutex is not held for the duration of those system calls.  It avoids the "big faux pas" by design:
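I won't reproduce the actual OFlux shim here, but the general interposition trick looks something like the following: build a small shared library that overrides read() (and friends) and have it drop the run-time mutex around the real blocking call.  The mutex name and the build/run details below are illustrative assumptions, not OFlux's real code:

// Sketch of syscall interposition (illustrative only, not OFlux source).
// Build:  g++ -shared -fPIC shim.cpp -ldl -o libshim.so
// Run:    LD_PRELOAD=./libshim.so ./your_app
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              // for RTLD_NEXT
#endif
#include <dlfcn.h>
#include <pthread.h>
#include <unistd.h>

// Hypothetical: the run-time's main mutex, defined elsewhere in the run-time
// and assumed to be held by the calling thread when it enters a system call.
extern pthread_mutex_t runtime_main_mutex;

extern "C" ssize_t read(int fd, void * buf, size_t count)
{
    // Find the real libc read() the first time we are called.
    typedef ssize_t (*read_fn)(int, void *, size_t);
    static read_fn real_read =
        reinterpret_cast<read_fn>(dlsym(RTLD_NEXT, "read"));

    pthread_mutex_unlock(&runtime_main_mutex);   // let other events run...
    ssize_t n = real_read(fd, buf, count);       // ...while this thread blocks
    pthread_mutex_lock(&runtime_main_mutex);     // rejoin the run-time after

    return n;
}

A real shim has more to worry about (threads that do not hold the mutex, other blocking calls like write, accept, and select, preserving errno across the re-lock), but the core move is the same: nobody sits on the run-time lock while the kernel takes its time.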


Other event-based systems do similar tricks to avoid the same problem.  Hopefully this post helps explain what the problem is with blocking while locking, and how life is better (by design) using a run-time which side-steps the issue completely.
