Poison, Poison, Tasty Fish!: functional-programming

Showing posts with label functional-programming. Show all posts

Friday, August 17, 2012

Memories Shared Over Coffee while RESTing on Node.js

A shared memory library (written in C++) needs to be exposed via REST as a read-only part of an API. I accomplished this with a brew of Node.js, expressjs, Coffeescript, and some careful examination of the C++ add-on documentation. Although I am admittedly pretty new to the javascript side of this code (which is the focus of this post), I found that all of those parts were very well documented on the web. The C++ stuff is old hat for me at least, and I can describe that another time (comment if you are interested).

Shared Memory API

The data that I am drawing out of shared memory is simple fixed sized records for things called instruments. Users of this REST API will simply ask for these records with an HTTP GET request to the proper URI whenever they need the most up to date information. The C++ library that I have written exposes some very simple functions and data structures in its header file to allow this:

int
get_pair_id_from_symbol(const char * symbol);

struct InstrumentInfo {
  char display_name[100];
  int pip_location;
  int extra_precision;
  double pip;
};

InstrumentInfo *
getInstrumentInfo(int pair_id);

Things like the name of the shared memory segment and what to do if it needs to be created and populated are all hidden inside of the library. This is nice, since the application using it just needs to work with the above functions to get its work done.

Instrument Info Add-on

In a file called instrumentaddon.cpp, I define the following (based on the information from the v0.8.2 of Node.js which I am using):

#define BUILDING_NODE_EXTENSION
#include <node.h>
#include "InstrumentsShm.h"

using namespace v8;


Handle<value>
GetInstrumentInfo(const Arguments & args)
{
  HandleScope scope;
  Local&ltfunction> cb = Local<function>::Cast(args[1]);

  Local<string> symbol = Local<string>::Cast(args[0]);
  String::AsciiValue ascii_val(symbol);
  int pair_id = get_pair_id_from_symbol(*ascii_val);
  if(!pair_id) {
     // not calling the cb is insult enough
     //throw Exception(String::NewSymbol("unknown pair"));
     return scope.Close(Undefined());
  }
  InstrumentInfo * ii = getInstrumentInfo(pair_id);
  if(!ii) {
    throw -1;
  }
  const int argc  = 1;
  { // display_name first
    Local<value> argv[argc] =
        { String::New(ii->display_name) };
    cb->Call(Context::GetCurrent()->Global(), argc, argv);
  }
  { // pip_location second
    Local<value> argv[argc] =
        { Number::New(ii->pip_location) };
    cb->Call(Context::GetCurrent()->Global(), argc, argv);
  }
  { // extra_precision third
    Local<value> argv[argc] =
        { Number::New(ii->extra_precision) };
    cb->Call(Context::GetCurrent()->Global(), argc, argv);
  }
  { // pip third
    Local<value> argv[argc] =
        { Number::New(ii->pip) };
    cb->Call(Context::GetCurrent()->Global(), argc, argv);
  }
  return scope.Close(Undefined());
}

extern "C" {
void init(Handle<object> target) 
{
  target->Set(String::NewSymbol("getInstrumentInfo"),
      FunctionTemplate::New(GetInstrumentInfo)->GetFunction());
} 
} // "C"

NODE_MODULE(instrumentaddon, init)

You can see how I am calling on the v8 engine (javascript engine which Node.js uses) to wrap data values and conform to the add-on API (admittedly this seems like black magic -- and it sort of is). My error handling is not very good since I am just trying to prove the concept and get some code to work which can be demonstrated.

To build this C++ into a shared object which the Node.js run-time can use, I do the following (actually I have Gnu make do it for me, rather than the default python builder):

g++ -g -O3 -Wall  -D_XOPEN_SOURCE -fPIC -DPIC -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -DEV_MULTIPLICITY=0 -I/usr/local/include/node -I. -Iinclude -Isrc/common   -c instrumentaddon.cpp -o instrumentaddon.nlo
/usr/local/include/node/uv-private/ev.h:574:1: warning: ‘ev_tstamp ev_now()’ defined but not used [-Wunused-function]
/usr/local/include/node/uv-private/ev.h:583:1: warning: ‘int ev_is_default_loop()’ defined but not used [-Wunused-function]
g++  -lm -lrt -lnsl -lpthread -L. -linstr_shm -Xlinker '-rpath=/opt/x/lib'  -g -O3 -Wall  -shared -o instrumentaddon.node instrumentaddon.nlo

The warnings building instrumentaddon.nlo (I chose this extension to associate the specific required compiler options that are needed for a Node.js add-on) are inoccuous, and can be ignored. The -rpath linker option specifies a search path for the required libinstr_shm.so library (which exports the C++ functions mentioned earlier to access the shared memory segment).

A Little Coffeescript

The Node.js program which uses this add-on to serve the data is quite simple and even looks fairly nice with the help of Coffeescript (which beautifies the necessary syntax) in a file rest.coffee:

express = require 'express'
instrumentaddon = require './instrumentaddon'

getinstrumentinfo = (symbol) ->
  info = []
  instrumentaddon.getInstrumentInfo( symbol, (itm) -> 
    info.push(itm) )
  throw something if info.length == 0
  pip = info[3]
  res =
    "instrument" : symbol
    "display_name" : info[0]
    "pip_location" : info[1]
    "extra_precision" : info[2]
    "pip_str" : "#{pip}"


app.configure( () ->
  app.set 'port', (process.env.PORT || 3000)
  app.set 'views', (__dirname + '/views')
  app.set 'view engine', 'jade'
  app.use (express.favicon())
  app.use (express.logger('dev'))
  app.use (express.bodyParser())
  app.use (express.methodOverride())
  app.use app.router
  app.use (express.static (__dirname + '/public') )
)

app.configure('development', () ->
  app.use(express.errorHandler())
)

###
instrument info route
###

app.get /^\/instruments\/([^_]+)_([^_\/]+)/, (req,res) ->
  res.json ( getinstrumentinfo (req.params[0] + '/' 
   + req.params[1]) )

app.listen 3000

console.log 'Express app started on port 3000'

Like Python, Coffeescript uses indentation levels to demarcate functional blocks and scope. This has the effect of dropping quite a bit of syntax which would otherwise clutter the code. Adding more routes which do other things (more shared memory reads or socket request/response code) is fairly simple since express provides a way of adding more lines that begin with one of app.get, app.put or app.post (followed by a URI regular expression and a handler) to accomplish this.

To convert the rest.coffee source into javascript (rest.js):

coffee --compile --bare --output builds/_Linux_x86_64_production rest.coffee

And run it (assuming that instrumentaddon.node and rest.js endup under /opt/x/node path):

% NODE_ENV=production \
NODE_PATH=/usr/local/lib/node:/usr/local/lib/node_modules \
LD_LIBRARY_PATH=/opt/x/node:/opt/x/lib \
node /opt/x/node/rest.js

The program listens on non-privileged port 3000 to serve the HTTP REST requests for this new service:

% curl "http://localhost:3000/instruments/EUR_USD"
{"instrument":"EUR/USD","display_name":"EUR/USD","pip_location":-4,"extra_precision":0,"pip_str":"0.0001"}

Summary

Integrating a C++ shared memory library with Node.js is pretty straightforward. Shared memory provides an ideal interface since it is non-blocking and we don't need to worry about slowing down the Node.js event loop by calling out to our custom library. The steps to build and run this application on Linux are described above in detail using the Gnu C++ compiler (g++), the Coffeescript compiler (coffee) and Node.js run-time (node). Hopefully this bits are helpful to anyone else who might find themselves attempting something similar. A word of warning though; Node.js and its libraries are undergoing rapid change at the moment, so any description of how to build an add-on is likely to go out-of-date soon!

Monday, August 13, 2012

Secondary School: XML Parsed with OCaml

In my previous post, I described how C++ could transfer XML into some predefined types (basic structs). Changing those types to make the XML reading easier was not an option (in effect that would be cheating) since I assumed that working with pre-existing types is part of the problem. I will do the same thing with my OCaml version of the problem. The types will not be laid out in a way that necessarily makes the XML reading any simpler.

Meet the Types

The OCaml types related to reading the school schema (tastyPS.xml is an example input) are as follows:

type person = { name : string }
type teacher = { super : person; subject : string }
type student = person
type clazz =
  { hour : int
  ; teacher : teacher
  ; students : student list }
type school = { classes : clazz list }
type result = { school : school }

Faking Expat

I did not look for a library which can read XML and provide string handlers for nodes (and their attributes); there may indeed be such a library. Instead I faked out the behaviour of the C++ Expat by just writing a long function which will operate the node handlers in a similar way:

(** test program simulates the read of TastyPS.xml *)

let readxml' nd_start_fun nd_end_fun () =
  let res1 = nd_start_fun "school" [] in
  let res1_1 =
    let res2 = nd_start_fun "class"
       ["day","Monday"; "hour","10"] in
      let res2_1 =
        let res3 = nd_start_fun "teacher"
          ["name","Mr Gauss"; "subject", "math"]
        in nd_end_fun res3 [] in
      let res2_2 =
        let res3 = nd_start_fun "student" ["name", "Jake"]
        in nd_end_fun res3 [] in
      let res2_3 =
        let res3 = nd_start_fun "student" ["name", "Mark"]
        in nd_end_fun res3 []
      in  nd_end_fun res2 [res2_1;res2_2;res2_3] in
  let res1_2 =
    let res2 = nd_start_fun "class"
        ["day","Tuesday";"hour","11"] in
    let res2_1 =
      let res3 = nd_start_fun "teacher"
        ["name","Mr Shakespeare";"subject","english"]
      in  nd_end_fun res3 [] in
    let res2_2 =
      let res3 = nd_start_fun "student" ["name","Christine"]
      in  nd_end_fun res3 [] in
    let res2_3 =
      let res3 = nd_start_fun "student" ["name","Thom"]
      in nd_end_fun res3 []
    in  nd_end_fun res2 [res2_1;res2_2;res2_3]
  in nd_end_fun res1 [res1_1;res1_2]

This code is not really meant to be elegant, it is just a large expression which does the same function calls in the same order as would an Expat library for OCaml (in a way that is similar to Expat).

Some Helper Code

An exception type is useful for when things go wrong:

exception Reader of string

A partial XML type is used to hold the results of partially converting the XML to the pre-defined types above:

type partial_xml = (** type for recursing partial content *)
   | PXStudent of student
   | PXTeacher of teacher
   | PXClass of clazz
   | PXSchool of school

An attribute finding function is useful for traversing the lists of pairs that are thrown at our handler:

let get_attr attr aname =
  try List.assoc aname attr
  with Not_found ->
    raise (Reader ("attribute "^aname^" not found"))

Node Handlers

The start node handler is where all the real work happens, but the end node handler is just used to complete the continuation which takes the converted content (list of partial converted XML) and builds the appropriate type:

let start_fun nname attr =
  let part_class_content content =
    let pcc (tlist,slist) px =
      match px with
        (PXStudent s) -> (tlist,s::slist)
        | (PXTeacher t) -> (t::tlist,slist)
        | _ -> raise 
           (Reader "classes only have teachers & students")
    in  List.fold_left pcc ([],[]) content in
    let part_school_content content =
      let psc px =
        match px with
          (PXClass cls) -> cls
          | _ -> raise 
             (Reader "school can only have classes in it ")
      in  List.map psc content in
    let get_attr = get_attr attr
    in match nname with
      "student" ->
        (fun _ ->
          PXStudent { name = get_attr "name" })
      | "teacher" ->
        (fun _ ->
          PXTeacher { super = { name = get_attr "name" }
                    ; subject = get_attr "subject" })
      | "class" ->
        (fun content ->
           match part_class_content content with
             ([teacher],students) ->
                 PXClass { hour = 
                           (int_of_string (get_attr "hour"))
                         ; teacher = teacher
                         ; students = students }
             | _ -> raise (Reader 
                "each class needs exactly one teacher"))
      | "school" ->
        (fun content ->
          let classes = part_school_content content
          in  PXSchool { classes = classes })
      | _ -> raise (Reader "unrecognized node tag")

The end node handler and the wrapper code are much simpler:

let end_fun sfunres content = sfunres content

let readxml () =
  let px = readxml' start_fun end_fun ()
  in  match px with
    (PXSchool school) -> { school = school }
    | _ -> raise (Reader "top tag is not a school")

let _ = readxml () (* run it *)

I could have made a lookup array to be used with List.assoc (a lookup function from the ocre library which matchs a key to a list of key/value pairs and returns the associated value) to deliver the proper continuation (e.g. part_class_content used for the class tag). This might have been a nicer parallel to the C++ code. I really like having a code structure with separated handlers for each node. I think it makes it easier to understand, and gives a fresh set of eyes (who might want to add a new node type) a nice pattern to follow. Building and running this code (in schoolxml.ml) is a simple matter as well:

$ ocamlc -g schoolxml.ml -o s.x
$ ./s.x

Summary

Rewriting this code in OCaml is (after writing the C++ version) is informative. The OCaml code seems to handle more corner cases and raises exceptions properly (I omitted the throw statements in the C++, but left comments for where they should be added). One very interesting rule is that a class should have exactly one teacher -- not fewer or greater. The C++ code will likely allow the teacher value to be NULL - an error by omission -- but will not allow there to be more than one teacher. The last observation is that, although the syntax of OCaml is much simpler, the overall parsimony of the comparable code is much more satisfying. I guess that is why I like functional programming so much.

Wednesday, July 25, 2012

OCaml: Hash Anything

Tasty

OCaml has a generic marshalling code within it, and it has MD5 Digest code. In this post I will simply compose the two of them to achieve a hash function which works on any object (a handy magical thing to have):

 % ocaml
        Objective Caml version 3.10.2

 # Marshall.to_string;;
 - : 'a -> Marshal.extern_flags list -> string = <fun>
 # Digest.string;;
 - : string -> Digest.t = <fun>
 # Digest.to_hex;;
 - : Digest.t -> string = <fun>

Just entering the component functions you can see the types of each of them when they are entered into ocaml REPL (in fact, ocaml returns symbolname : type = value on each evaluation). Note that ;; is needed to get it to evaluate since it takes multi-line input (OCaml does not require indentation for scoping -- a "feature" that some languages make use of):

 # let hash x =
    let m_x = Marshal.to_string x [] in
    let d_m_x = Digest.string m_x
    in  Digest.to_hex d_m_x;;
 val hash : 'a -> string = <fun>

Now we can use this hash on all kinds of things (with any type):

 # hash [1;2;3];;
 - : string = "64cb37afbe72effe97fb4f089a82f9b2"
 # hash 4.5966;;
 - : string = "5eed266efb8593218f1cb1cacf1f8d89"
 # hash "Ocaml rocks!";;
 - : string = "f078e0f37bce6c8a83112f9461bf7544"

It is almost too simple. Writing a meta-programming monster hash function in C++ is much less enjoyable. Some have criticized the OCaml library and its incompleteness (compared to Java perhaps) as a reason for its less wide adoption. Now that OCaml-java is a project, that excuse has less weight. Hopefully some of the parametric polymorphism available to the OCaml bundled libraries finds its way into that project. As usual, choosing the proper tool for any job is an important step in getting it done.

Friday, July 13, 2012

Enumeration in an MT world

Poison.

Learning new programming languages is a healthy thing to do as a programmer. It keeps your mind vital and your skills spry. Its also a pretty easy thing to do the more you do it, since metaphors are often copied wholesale from one language to another. For instance, both C and Java can be credibly described as descendants of Algol 68 an early imperative programming language. They both do loops and conditionals in much the same way and the semantics of the programs written in those languages is mostly understood operationally.

Although it might be difficult, exposure to other families of programming languages like functional (e.g. OCaml) or declarative (e.g. prolog) can make modern everyday skills easier to grasp (Javascript or gnu-make). Without good cross-pollination, old metaphors which are no longer appropriate out stay their welcome. My main example of this is C pointers. They are an ancient concept (as old as C is), but I feel they should have stayed where they were.

Long ago, C programs were mostly single threaded, and locations in memory were tracked with pointers which were de-referenced via * or -> primitive operators. These were great ideas in that context. C structs have locations, and those locations (pointers) are useful handles for referring to them. Consider the following code to iterate through an array:

  static const char * names[] =
    { "mark"
    , "paul"
    , "kevin"
    , 0 };
  for( const char ** n_ptr = &(names[0])
      ; *n_ptr != 0; ++n_ptr) {
    printf("hi %s\n",*n_ptr);
  }

Its pretty low-level, but you can see how it is that n_ptr is used to traverse the list until the last element a 0 pointer is encountered (and not printed), at which point we stop iterating. My claim is that this pattern of using a pointer for enumerating elements is not such a great idea in a multi-threaded context. STL tries to do just that with containers (and thats the problem) in C++. Its an idea that should not have been copied.

Tasty.

Java did it much better by using Enumeration/Iterator interfaces, which I will explain now using C++. Suppose you have a container or class that will give you (via a member method) a "list" of T objects:

  template< typename AnyT >
  struct Enumerator {
    virtual bool next(const AnyT * &) = 0;
    virtual ~Enumerator() {}
  };

  // now implement it
  Enumerator<t> * 
  SomeCollection::someMethodGivesList()
  { 
    // build/return Enumerator<t> subclass
    ...
  }

All of this setup gives us a single thing Enumerator<T> * with a simple type which can be drained of its T items easily, and its subclass implementation can be buried inside of SomeCollection.cpp. This has a side benefit of not polluting the compilation and link steps with extra symbols (bloating header files for instance).

  // now use that implementation 
  SomeCollection sc;
  const T * tptr = 0;
  Enumerator<t> * enumptr = sc.someMethodGivesList()
  while(enumptr && enumptr->next(tptr)) {
    ... // now use tptr
  }

I am a huge fan of how functional languages deal with this by using either fold or map functions, and we could emulate this somewhat with C function pointers (implementing a visitor call-back argument).

  void
  SomeCollection::someMethodVisitsList(
     void * fp
   , void (*oneach_cb)(void *, const T *))
  {
    // ok traverse the internal list here
    // calling (*oneach_cb)(fp,tptr) all the way
  }

This is a bit of a "poor man's" solution compared to what a real functional language can do with its parametric polymorphism capabilities. The opportunity to write a traversal function once is not really there, but could be part-way done with templates.

Poison.

The real problem with the STL using pointers as the basis for enumeration is that the container's iterators are not safely managed:

  for(std::vector::iterator itr = vContainer.begin()
      ; itr != v.end(); ++itr) {
    ...
  }

For starters the end() could be a moving target if another thread updates the vContainer underneath us at an unfortunate time. The job of synchronization was just never considered in with single threaded C and its pointers. Another issue is that iterators from different STL containers are not really interchangable, and passing a pair of iterators as a result to something is a bit painful. The Enumerator<AnyT> * is easily held in a smart pointer (perhaps a shared pointer) so it hops a couple fences there as well.

Summary.

Of course there are details to make either the Enumerator<AnyT> * solution or the visitor call-back solution work safely for multi-threaded programs -- but narrower interface these idioms have make the job easier.

I know that there is an MT-safe library called TBB which impletements STL-like containers, but it sounds hard to me -- and I am getting tired for forgetting to increment my "pointers".

Thursday, July 5, 2012

Parametric polymorphism pleases...

Tasty

I still write the occasional bit of OCaml for a project I maintain. Its a language which I am very happy to write code in, since I get pretty far without too much work.

Here is an interface snippet for a bit of code that translates an equivalence relation (represented as a list of pairs) into a set of equivalence classes (represented as a list of lists of items).

(** UnionFind
  This breaks an equivalence relation into 
  equivalence classes
  *)

val unionfind : ('a * 'a) list -> 'a list list

An example invocation which happens to use int for 'a:

let input1 = [ 1,2;3,4;5,6;7,8;8,3;6,1 ]

let res = unionfind input1

The res evaluates to [[1;2;6;5];[3;4;8;7]].
My really basic implementation of unionfind is embarrassingly sub-optimal (so I won't post it unless somebody asks). But, in my defense, the simple code that I wrote to do this works well enough so far in the context it is used in. Simple things are easier to write correctly anyway.

Follow @PichoraM
Follow Mark on GitHub