Showing posts with label ocaml. Show all posts
Showing posts with label ocaml. Show all posts

Monday, August 13, 2012

Secondary School: XML Parsed with OCaml

In my previous post, I described how C++ could transfer XML into some predefined types (basic structs).  Changing those types to make the XML reading easier was not an option (in effect that would be cheating) since I assumed that working with pre-existing types is part of the problem.  I will do the same thing with my OCaml version of the problem.  The types will not be laid out in a way that necessarily makes the XML reading any simpler.


Meet the Types


The OCaml types related to reading the school schema (tastyPS.xml is an example input) are as follows:

type person = { name : string }
type teacher = { super : person; subject : string }
type student = person
type clazz =
  { hour : int
  ; teacher : teacher
  ; students : student list }
type school = { classes : clazz list }
type result = { school : school }

Faking Expat


I did not look for a library which can read XML and provide string handlers for nodes (and their attributes); there may indeed be such a library.  Instead I faked out the behaviour of the C++ Expat by just writing a long function which will operate the node handlers in a similar way:


(** test program simulates the read of TastyPS.xml *)

let readxml' nd_start_fun nd_end_fun () =
  let res1 = nd_start_fun "school" [] in
  let res1_1 =
    let res2 = nd_start_fun "class"
       ["day","Monday"; "hour","10"] in
      let res2_1 =
        let res3 = nd_start_fun "teacher"
          ["name","Mr Gauss"; "subject", "math"]
        in nd_end_fun res3 [] in
      let res2_2 =
        let res3 = nd_start_fun "student" ["name", "Jake"]
        in nd_end_fun res3 [] in
      let res2_3 =
        let res3 = nd_start_fun "student" ["name", "Mark"]
        in nd_end_fun res3 []
      in  nd_end_fun res2 [res2_1;res2_2;res2_3] in
  let res1_2 =
    let res2 = nd_start_fun "class"
        ["day","Tuesday";"hour","11"] in
    let res2_1 =
      let res3 = nd_start_fun "teacher"
        ["name","Mr Shakespeare";"subject","english"]
      in  nd_end_fun res3 [] in
    let res2_2 =
      let res3 = nd_start_fun "student" ["name","Christine"]
      in  nd_end_fun res3 [] in
    let res2_3 =
      let res3 = nd_start_fun "student" ["name","Thom"]
      in nd_end_fun res3 []
    in  nd_end_fun res2 [res2_1;res2_2;res2_3]
  in nd_end_fun res1 [res1_1;res1_2]



This code is not really meant to be elegant, it is just a large expression which does the same function calls in the same order as would an Expat library for OCaml (in a way that is similar to Expat).


Some Helper Code



An exception type is useful for when things go wrong:


exception Reader of string

A partial XML type is used to hold the results of partially converting the XML to the pre-defined types above:


type partial_xml = (** type for recursing partial content *)
   | PXStudent of student
   | PXTeacher of teacher
   | PXClass of clazz
   | PXSchool of school

An attribute finding function is useful for traversing the lists of pairs that are thrown at our handler:


let get_attr attr aname =
  try List.assoc aname attr
  with Not_found ->
    raise (Reader ("attribute "^aname^" not found"))


Node Handlers


The start node handler is where all the real work happens, but the end node handler is just used to complete the continuation which takes the converted content (list of partial converted XML) and builds the appropriate type:



let start_fun nname attr =
  let part_class_content content =
    let pcc (tlist,slist) px =
      match px with
        (PXStudent s) -> (tlist,s::slist)
        | (PXTeacher t) -> (t::tlist,slist)
        | _ -> raise 
           (Reader "classes only have teachers & students")
    in  List.fold_left pcc ([],[]) content in
    let part_school_content content =
      let psc px =
        match px with
          (PXClass cls) -> cls
          | _ -> raise 
             (Reader "school can only have classes in it ")
      in  List.map psc content in
    let get_attr = get_attr attr
    in match nname with
      "student" ->
        (fun _ ->
          PXStudent { name = get_attr "name" })
      | "teacher" ->
        (fun _ ->
          PXTeacher { super = { name = get_attr "name" }
                    ; subject = get_attr "subject" })
      | "class" ->
        (fun content ->
           match part_class_content content with
             ([teacher],students) ->
                 PXClass { hour = 
                           (int_of_string (get_attr "hour"))
                         ; teacher = teacher
                         ; students = students }
             | _ -> raise (Reader 
                "each class needs exactly one teacher"))
      | "school" ->
        (fun content ->
          let classes = part_school_content content
          in  PXSchool { classes = classes })
      | _ -> raise (Reader "unrecognized node tag")


The end node handler and the wrapper code are much simpler:


let end_fun sfunres content = sfunres content

let readxml () =
  let px = readxml' start_fun end_fun ()
  in  match px with
    (PXSchool school) -> { school = school }
    | _ -> raise (Reader "top tag is not a school")

let _ = readxml () (* run it *)


I could have made a lookup array to be used with List.assoc (a lookup function from the ocre library which matchs a key to a list of key/value pairs and returns the associated value) to deliver the proper continuation (e.g. part_class_content used for the class tag). This might have been a nicer parallel to the C++ code.  I really like having a code structure with separated handlers for each node.  I think it makes it easier to understand, and gives a fresh set of eyes (who might want to add a new node type) a nice pattern to follow. Building and running this code (in schoolxml.ml) is a simple matter as well:


$ ocamlc -g schoolxml.ml -o s.x
$ ./s.x


Summary


Rewriting this code in OCaml is (after writing the C++ version) is informative.  The OCaml code seems to handle more corner cases and raises exceptions properly (I omitted the throw statements in the C++, but left comments for where they should be added).  One very interesting rule is that a class should have exactly one teacher -- not fewer or greater.  The C++ code will likely allow the teacher value to be NULL - an error by omission -- but will not allow there to be more than one teacher.  The last observation is that, although the syntax of OCaml is much simpler, the overall parsimony of the comparable code is much more satisfying.  I guess that is why I like functional programming so much.

Wednesday, July 25, 2012

OCaml: Hash Anything

Tasty


OCaml has a generic marshalling code within it, and it has MD5 Digest code.  In this post I will simply compose the two of them to achieve a hash function which works on any object (a handy magical thing to have):


 % ocaml
        Objective Caml version 3.10.2

 # Marshall.to_string;;
 - : 'a -> Marshal.extern_flags list -> string = <fun>
 # Digest.string;;
 - : string -> Digest.t = <fun>
 # Digest.to_hex;;
 - : Digest.t -> string = <fun>

Just entering the component functions you can see the types of each of them when they are entered into ocaml REPL (in fact, ocaml returns symbolname : type = value on each evaluation).  Note that ;; is needed to get it to evaluate since it takes multi-line input (OCaml does not require indentation for scoping -- a "feature" that some languages make use of):


 # let hash x =
    let m_x = Marshal.to_string x [] in
    let d_m_x = Digest.string m_x
    in  Digest.to_hex d_m_x;;
 val hash : 'a -> string = <fun>

Now we can use this hash on all kinds of things (with any type):


 # hash [1;2;3];;
 - : string = "64cb37afbe72effe97fb4f089a82f9b2"
 # hash 4.5966;;
 - : string = "5eed266efb8593218f1cb1cacf1f8d89"
 # hash "Ocaml rocks!";;
 - : string = "f078e0f37bce6c8a83112f9461bf7544"

It is almost too simple.  Writing a meta-programming monster hash function in C++ is much less enjoyable.  Some have criticized the OCaml library and its incompleteness (compared to Java perhaps) as a reason for its less wide adoption.  Now that OCaml-java is a project, that excuse has less weight. Hopefully some of the parametric polymorphism available to the OCaml bundled libraries finds its way into that project.  As usual, choosing the proper tool for any job is an important step in getting it done.

Wednesday, July 11, 2012

OFlux

Tasty.



Many years ago, I was looking for a way to help write C++ applications quickly using re-usable components with an event-based metaphor. The problems we were dealing with felt like describing an assembly line which turned requests into responses.  Each stage of the assembly line could be described by pretty basic functions which mutate state in some way.  Some casting around for solutions to this problem led me to Emery Berger et al's Flux project which had a USENIX 2006 paper published describing it.  I really liked how this model abstracted away the writing of thread code.  I think we take this for granted now with event-based programming paradigms now, but it looked like a promising line of attack for us.

For several reasons, we decided to rewrite the Flux tool and add features to it that our application developers needed.  First of all the compiler was redone from scratch using OCaml instead of Java, and then the run-time was redone using C++ which loaded flow information via XML.  These choices made the code a bit more flexible, and -- to be fair -- moving away from expedient graduate student project code had to happen for production use.

Account Cache Example


To consider a small example (below) of an integer indexed account cache (containing C++ Account objects), consider the following.

 exclusive account_lock (int account) => Account *;
 node Start () => (int account);
 node Finish (int account, guard account_lock(account)) => ();
 source Start -> Finish;


Using exclusive guard account_lock as a resource the flow consisting of C++ functions Start() and Finish() operates as follows:


  • Start() generates integers refering  to the keys of the account_lock which are accessed by the Finish() function.
  • Start() may be doing I/O (reading a socket or a file) to obtain account integer keys
  • multiple Finish() events can run at the same time on different accounts (the semantics of the exclusive key word)
  • The account_lock guard resource is accessed for the duration of the Finish() function invocation and de-referenced with the account integer key.

Here is the shell C++ code for Finish which does the initial object population from the heap when the guard de-references to a NULL:
int Finish(const Finish_in * fi
         , Finish_out *
         , Finish_atoms * fa)
{
 if(fa->account_lock() == NULL) {
  fa->account_lock() = new Account(fi->account);
 }
 ...
 return 0;
}

This example demonstrates how the different sub-tasks of producing and consuming work in the program can be decoupled to increase concurrency in a beneficial way.  Often, for the purpose of testing/demonstrating, it is sufficient to write an example source node (which generate events from nothing like Start() does) via a call to sleep() and some formula for where the input comes from.  Like the Flux run-time the issuing of a blocking system call like sleep() causes the run-time to consult its queue of run-able events to see if another thread can either finish (preferred) or initiate an event execution.  This is detected via interposition of the system calls so that the application developer is unaware that it is happening.

New Repo


The OFlux repo is now publicly available on Github and released under AGPLv3.  Please have a look at the src/examples sub-directory for more examples (documentation is coming), if you are interested.

Thursday, July 5, 2012

Parametric polymorphism pleases...

Tasty


I still write the occasional bit of OCaml for a project I maintain.  Its a language which I am very happy to write code in, since I get pretty far without too much work.

Here is an interface snippet for a bit of code that translates an equivalence relation (represented as a list of pairs) into a set of equivalence classes (represented as a list of lists of items).

(** UnionFind
  This breaks an equivalence relation into 
  equivalence classes
  *)

val unionfind : ('a * 'a) list -> 'a list list

An example invocation which happens to use int for 'a:

let input1 = [ 1,2;3,4;5,6;7,8;8,3;6,1 ]

let res = unionfind input1

The res evaluates to [[1;2;6;5];[3;4;8;7]].
My really basic implementation of unionfind is embarrassingly sub-optimal (so I won't post it unless somebody asks). But, in my defense, the simple code that I wrote to do this works well enough so far in the context it is used in. Simple things are easier to write correctly anyway.


Follow Mark on GitHub