Thursday, July 26, 2012

OFlux: Building a Flow

Most of a .flux file content will be node declarations and connecting flow.  There are two types of nodes: abstract nodes which serve as helpful connection points and do not have C++ code associated with them, and concrete nodes which are implemented via a C++ function.  In this post, I will describe the major features of the OFlux language in some detail.  First off is something programming language people call "choice".

Routing with Conditions


Suppose we have a source node Src which blocks on some input and produces an output Foo * foo.   The source node is declared as follows (its input set is empty -- which is necessary for a source node):


 node Src () => (Foo * foo);
 source Src;

As written, the oflux compiler will complain about this input since the flow rooted at Src with only one node in it does not end with a node that has a () output set.  Adding a line which has terminate Src, will silence the complaint (in effect saying "we know what we are doing, don't complain").  What we really want to accomplish is to apply a condition isFooEnough() to every foo that comes out of Src.  This condition will -- in reality -- be implemented within our C++ code using a function with prototype bool isFooEnough(Foo *):


 condition isFooEnough(Foo *) => bool;


Suppose we want to implement separate nodes ConsumeFooEnough and ConsumerFooLacking as successors depending on the outcome of the isFooEnough() test.  Needless to say, it is assumed that isFooEnough has no side-effects on its argument or the global state of the program since multiple calls to such a conditional might occur when the logic gets more complicated.  Once consumed we will dispose of foo with a node called DisposeFoo.  The abstract node ComsumeFoo is used below only to help describe the decision being made:


 node ConsumeFooEnough (Foo * foo) => (Foo *);
 node ConsumeFooLacking (Foo * foo) => (Foo *);
 node DisposeFoo (Foo * foo) => ();
 node abstract ConsumeFoo (Foo * foo) => ...;
 ConsumeFoo: [isFooEnough] = ConsumeFooEnough -> DisposeFoo;
 ConsumeFoo: [*] = ConsumeFooLacking -> DisposeFoo;
 source Src -> ConsumeFoo; /* changing the original */

Much like pattern matching in a language like OCaml or Scala, the syntax for routing is done using the rule which first matches the input.  So if isFooEnough(foo) returns true, then the routing to ConsumeFooEnough happens, and otherwise the path to ConsumeFooLacking is chosen.  This syntax survives from the original Flux language design.  Elipsis (...)  is used on the output declaration of ConsumeFoo to indicate to the compiler that we do not care to specify the output set, and if we did only () would be acceptable since other options would not unify with the output set of DisposeFoo.  Now that a basic flow is described, the application programmer only has to implement the following concrete node functions in their C++ code to create a runnable program: Src, ConsumeFooEnough, ConsumeFooLacking, and DisposeFoo.  If it is not necessary to have DisposeFoo as a separate node, I would recommend that it just be implemented as a function which is called inside of the ConsumeFooEnough/ConsumeFooLacking C++ functions.  The only reason to have a separate node (which implies a separate event and a run-time scheduling of that event -- a non-negligible cost), is if DisposeFoo is interacting with resources in the program (please see my posting on OFlux guards).

Running oflux


To compile the content above in a file called web.flux we issue the command:


 % ./oflux web.flux
OFlux v1.00-6-gc59d671 on web.flux

This causes the following output to be created locally:

  • web.dot: a description of the OFlux flow which can be turned into a pretty picture showing types, conditions and nodes (blue are abstract).  To create a picture with graphviz, run dot -Tpng web.dot -o web.png:

  • web.xml: the XML description of the OFlux flow which will be loaded at run-time to make the program run (edits to this file can cause changes to the flow without having to recompile. For brevity only the Src node is shown as it has the most interesting entry.):


<flow name="web.flux" ofluxversion="v1.00-6-gc59d671">
 <node name="Src" function="Src" source="true" door="false" iserrhandler="false"
 detached="false" external="false" inputunionhash="cbd4d4a285d623ee19470f7d5d68e
1c1a765263c3c9242a8d61b222d49c7e64c" outputunionhash="6e63ce559f96ef9b1f68aa4370
2f1343c9638715464dea7d140e6b4d068d9688">
  <successorlist>
   <successor name="0">
    <case nodetarget="ConsumeFooEnough">
     <condition name="isFooEnough" argno="1" isnegated="false" unionhash="6e63ce
559f96ef9b1f68aa43702f1343c9638715464dea7d140e6b4d068d9688"/>
    </case>
    <case nodetarget="ConsumeFooLacking">
     <condition name="isFooEnough" argno="1" isnegated="true" unionhash="6e63ce5
59f96ef9b1f68aa43702f1343c9638715464dea7d140e6b4d068d9688"/>
    </case>
   </successor>
   <successor name="erste">
    <case nodetarget="Src"/>
   </successor>
  </successorlist>
 </node>
 ... 
</flow>


  • OFluxGenerate.h: A header file declaring the needed node and conditional functions which includes the application's mImpl.h header file (which defines the types used).  This header should be included in the application's .cpp source.  It should not be necessary to understand all of the mechanics inside of the OFluxGenerate.h file.  Using the node declarations inside of web.flux, each node N gives rise to types N_in, N_out and N_atoms which are needed for the C++ prototype  int N(const N_in *, N_out *, N_atoms).
  • OFluxGenerate.cpp: The generated C++ code needed to bind the OFlux program to the run-time.  This is where the static tables used to look-up symbols read from the XML file (web.xml) live.
Building a complete application means OFlux compiling/C++ compiling/linking/running with these outputs.  Errors can crop up at any of those stages, so it is typical to have a tool like Gnu make build the whole project -- even verifying that the XML file is loadable (properly gets its symbols from the OFluxGenerate.cpp code)


Some Limitations and Philosophy


In the above example there are plenty of things for the oflux compiler to check for us.  When a node is connected to another as a successor, the compiler does an asymmetric unification of the output set with the input set.  This means that each input should exist as an output from the predecessor.  Generally this is done using the name of the argument (so matching foo with foo in my example), but if there is a mis-match in naming an attempt is made to unify using types.

If the names of the formal parameters do not match there are consequences in the generated code -- a C union is necessary to give the field two names (this is trouble if any of the argument types is non-POD, and that happens quite a bit with C++ code).  When unification fails, the offending node and its argument is indicated by the oflux compiler (the first error causes the compilation to halt).


OFlux adds another dimension to the task of programming a server.  A new server design will have nodes and flow to go with it.  My consistent impression working with developers who are new to the tool is that many many nodes end up being defined (much like functions are used in C++).  Making a flow very long (lots of nodes from source to any sink), can be quite detrimental to performance if most of the nodes are doing less work than the overhead to execute them.  My best advice is to try an initial flow with the absolute least number of nodes possible, and then investigate refining that program into more nodes in a step-wise manner.  Keep it as simple as possible!


Routing to Many Places


Occasionally concurrency is needed in the flow since an output is consumed by multiple nodes, and its inefficient to have them each process that input (a foo perhaps) sequentially.  The OFlux run-time keeps a node event (which holds its output data) alive using reference counting, so it is possible to keep a node event around long enough to be processed by multiple successor node events.

If we wanted to modify our example above to have DisposeFoo be an abstract which aliases two nodes we want to run at the same time (ReclaimFoo and CommunicateFoo).  This could be done as follows (replacing the C++ function implementation we have for DisposeFoo with implementations for ReclaimFoo and CommunicateFoo:


 node abstract DisposeFoo (Foo *foo) => ();
 node CommunicateFoo(Foo * foo) => ();
 node ReclaimFoo(Foo * foo) => ();
 DisposeFoo = CommunicateFoo & ReclaimFoo;

Using this technique, the completion of a ConsumeFooEnough event will cause two new events to be created: one for CommunicateFoo and another for ReclaimFoo.  If these nodes were detached or had a shimmed system call in them, they could both dispatch at the same time and run concurrently.


Summary



Using the basic composition syntax within OFlux an application developer can describe the top-level flow of events in his program without having to explicitly manage a thread pool themselves.  The language offers three main types of basic composition:


  1. sequential (using ->)
  2. concurrent (using &)
  3. choice using (using :[ ? ]matching)

No comments:

Post a Comment