Thursday, July 19, 2012

Pacemaker + Corosync Play Clue

Tasty.



Clue was a popular board game (and possibly still is) in which suspects, locations and murder weapons were guessed at in order to solve a crime (which meant finding the correct combination of all three).  In the spirit of this board game, I would like to explain how resources (processes, virtual IP addresses, services) are assigned to cluster nodes by a cluster manager (Pacemaker) according to some easily expressible constraints.  The analogy is that cluster nodes are a bit like rooms, and resources are a bit like people or murder weapons.

Pacemaker is a cluster resource manager that grew out of the Linux-HA (high availability) project; it manages resources on a cluster and decides what to do about failures in the cluster (crashed processes and/or failed nodes).  Given a set of constraints (or clues, if you like) it figures out a solution which satisfies those constraints (or comes closest to doing so).  Pacemaker can run on top of two reliable message buses: Corosync and Heartbeat (the latter is older than the former).  I chose to install Corosync as the message bus, since it seemed to be the new thing -- not always a good reason to pick something though.
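
Assuming the packages are installed and corosync is running on each node, a quick sanity check confirms the message bus and the cluster membership before adding any resources; both commands below are standard tools that ship with Corosync and Pacemaker:

 # corosync-cfgtool -s
 # crm_mon -1

The first prints the status of the local Corosync rings, the second shows a one-shot snapshot of cluster membership and resources.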


Mansion Layout


Our cluster has 3 rooms at the moment:
  1. Kitchen (used as a fileserver mostly) 
  2. Bedroom1 (first compute node)
  3. Bedroom2 (second compute node)
In terms that Pacemaker understands, we declare those nodes by editing the configuration using:

 # crm configure edit

And then,

 node Kitchen \
 attributes standby="off"
 node Bedroom1 \
 attributes standby="off"
 node Bedroom2 \
 attributes standby="off"
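
Once the nodes are declared, it is worth double-checking what Pacemaker actually recorded. These are standard crm shell subcommands (nothing specific to this setup) that print the current configuration and run a sanity check over it:

 # crm configure show
 # crm configure verify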

Characters


A simple service that is maintained by running a process called ProfPlum is added to the cluster with the following lines in the Pacemaker configuration:

 primitive AProfPlum ocf:heartbeat:anything \
 params binfile="/opt/clue/bin/ProfPlum" \
        user="clue" logfile="/tmp/prof.out" \
        errlogfile="/tmp/prof.err" \
 op start interval="0" timeout="20" \
 op stop interval="0" timeout="30" \
 op monitor interval="20"

The ocf:heartbeat:anything part refers to a resource agent script capable of running an arbitrary simple process.  There is a large bundle of similar scripts for other cluster resources and commonly used packages (e.g. IP addresses, databases, file systems). As is, we can start and stop this resource using:

 # crm resource [start | stop | restart] AProfPlum
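
To give a flavour of those other resource agents, here is a sketch of a floating IP address declared with the stock IPaddr2 agent. The resource name, address and netmask are made up for illustration; only the agent and its parameters come from the standard bundle:

 primitive AKitchenIP ocf:heartbeat:IPaddr2 \
 params ip="192.168.1.100" cidr_netmask="24" \
 op monitor interval="30"

It can be pinned to a room with a location constraint exactly like the ones shown next.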

Locations


In order to tell Pacemaker that we want to bind this resource to a particular node (in this case Kitchen) we add to the Pacemaker configuration:


 location AProfPlum-loc AProfPlum inf: Kitchen 

We could have said the same thing a little differently by specifying where we don't want AProfPlum to run:

 location AProfPlum-not-bedroom1-loc AProfPlum -inf: Bedroom1
 location AProfPlum-not-bedroom2-loc AProfPlum -inf: Bedroom2

If we create a similar simple process resource called AColonelMustard to manage a program called ColonelMustard, we can keep AColonelMustard and AProfPlum in the same room with this snippet of Pacemaker config:

 colocation APlum-and-Must-coloc inf: AProfPlum AColonelMustard

Using -inf: instead in the colocation line would have specified that they be kept separated instead of together.
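
Spelled out, the "keep them apart" version of that constraint would look like this (the constraint name is of my own choosing):

 colocation APlum-not-Must-coloc -inf: AProfPlum AColonelMustard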

Many Peacocks


Suppose we have a special service called MissPeacock that we want to run on both Bedroom1 and Bedroom2. We can accomplish this with a clone line and a location constraint that keeps the clones out of the Kitchen:

 
 clone AMissPeacock-clone AMissPeacock \
 meta ordered="false" clone-max="2" \
        target-role="Started"
 location AMissPeacock-loc AMissPeacock-clone -inf: Kitchen
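
The clone above wraps a primitive named AMissPeacock which I haven't shown; assuming MissPeacock is just another simple process like ProfPlum, its definition would follow the same pattern (the paths and log files below are placeholders):

 primitive AMissPeacock ocf:heartbeat:anything \
 params binfile="/opt/clue/bin/MissPeacock" \
        user="clue" logfile="/tmp/peacock.out" \
        errlogfile="/tmp/peacock.err" \
 op start interval="0" timeout="20" \
 op stop interval="0" timeout="30" \
 op monitor interval="20"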

House of Cards


To check the status of the cluster we type:

 # crm status

or monitor it continuously:

 # crm_mon

To put a node (e.g. Bedroom1) into standby so that we can do maintenance on it, we request:

 # crm node standby Bedroom1

and bring it back online:

 # crm node online Bedroom1
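
If a single resource (rather than a whole node) has misbehaved and the underlying problem is fixed, we can also clear its failure history so Pacemaker gives it another chance. This is a standard crm subcommand, shown here with our example resource:

 # crm resource cleanup AProfPlum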

Summary


Pacemaker+Corosync provide a declarative way to describe the services and resources that need to be maintained in the cluster.  Using a resource list and constraints, resources are distributed amongst the available nodes.  Cluster nodes and resources are also easily taken down or restored.  There are many tools which can do these tricks on a single box, but on a cluster it is quite something to see.

To learn more about Pacemaker, there are nice documents explaining web server setups, one for SUSE Linux and one from Cluster Labs.
