Friday, July 13, 2012

Enumeration in an MT world

Poison.



Learning new programming languages is a healthy thing to do as a programmer.  It keeps your mind vital and your skills spry.  Its also a pretty easy thing to do the more you do it, since metaphors are often copied wholesale from one language to another.  For instance, both C and Java can be credibly described as descendants of Algol 68 an early imperative programming language.  They both do loops and conditionals in much the same way and the semantics of the programs written in those languages is mostly understood operationally.

Although it might be difficult, exposure to other families of programming languages like functional (e.g. OCaml) or declarative (e.g. prolog) can make modern everyday skills easier to grasp (Javascript or gnu-make).  Without good cross-pollination, old metaphors which are no longer appropriate out stay their welcome.  My main example of this is C pointers.  They are an ancient concept (as old as C is), but I feel they should have stayed where they were.

Long ago, C programs were mostly single threaded, and locations in memory were tracked with pointers which were de-referenced via * or -> primitive operators.  These were great ideas in that context.  C structs have locations, and those locations (pointers) are useful handles for referring to them.  Consider the following code to iterate through an array:


  static const char * names[] =
    { "mark"
    , "paul"
    , "kevin"
    , 0 };
  for( const char ** n_ptr = &(names[0])
      ; *n_ptr != 0; ++n_ptr) {
    printf("hi %s\n",*n_ptr);
  }

Its pretty low-level, but you can see how it is that n_ptr is used to traverse the list until the last element a 0 pointer is encountered (and not printed), at which point we stop iterating.  My claim is that this pattern of using a pointer for enumerating elements is not such a great idea in a multi-threaded context.  STL tries to do just that with containers (and thats the problem) in C++.  Its an idea that should not have been copied.

Tasty.


Java did it much better by using Enumeration/Iterator interfaces, which I will explain now using C++.  Suppose you have a container or class that will give you (via a member method) a "list" of T objects:

  template< typename AnyT >
  struct Enumerator {
    virtual bool next(const AnyT * &) = 0;
    virtual ~Enumerator() {}
  };

  // now implement it
  Enumerator<t> * 
  SomeCollection::someMethodGivesList()
  { 
    // build/return Enumerator<t> subclass
    ...
  }

All of this setup gives us a single thing Enumerator<T> * with a simple type which can be drained of its T items easily, and its subclass implementation can be buried inside of SomeCollection.cpp. This has a side benefit of not polluting the compilation and link steps with extra symbols (bloating header files for instance).

  // now use that implementation 
  SomeCollection sc;
  const T * tptr = 0;
  Enumerator<t> * enumptr = sc.someMethodGivesList()
  while(enumptr && enumptr->next(tptr)) {
    ... // now use tptr
  }

I am a huge fan of how functional languages deal with this by using either fold or map functions, and we could emulate this somewhat with C function pointers (implementing a visitor call-back argument).

  void
  SomeCollection::someMethodVisitsList(
     void * fp
   , void (*oneach_cb)(void *, const T *))
  {
    // ok traverse the internal list here
    // calling (*oneach_cb)(fp,tptr) all the way
  }

This is a bit of a "poor man's" solution compared to what a real functional language can do with its parametric polymorphism capabilities. The opportunity to write a traversal function once is not really there, but could be part-way done with templates.

Poison.


The real problem with the STL using pointers as the basis for enumeration is that the container's  iterators are not safely managed:

  for(std::vector::iterator itr = vContainer.begin()
      ; itr != v.end(); ++itr) {
    ...
  }

For starters the end() could be a moving target if another thread updates the vContainer underneath us at an unfortunate time. The job of synchronization was just never considered in with single threaded C and its pointers. Another issue is that iterators from different STL containers are not really interchangable, and passing a pair of iterators as a result to something is a bit painful.  The Enumerator<AnyT> * is easily held in a smart pointer (perhaps a shared pointer) so it hops a couple fences there as well.

Summary.


Of course there are details to make either the Enumerator<AnyT> * solution or the visitor call-back solution work safely for multi-threaded programs -- but narrower interface these idioms have make the job easier.

I know that there is an MT-safe library called TBB which impletements STL-like containers, but it sounds hard to me -- and I am getting tired for forgetting to increment my "pointers".

No comments:

Post a Comment