[Linux-cluster] Re: [ANNOUNCE] Linux Cluster Summit 2005

Sat May 21 04:44:49 UTC 2005

On 5/18/05, Daniel Phillips <phillips at redhat.com> wrote:
> Linux Cluster Summit 2005
> 
> June 20 and 21, Walldorf, Germany (Near Heidelberg)
> 
> Sponsors: Red Hat, SAP AG, Oracle, HP, and Fujitsu-Siemens.
> 
> The goal of the two-day Linux Cluster Summit workshop is to bring
> together the key individuals who can realize a general purpose
> clustering API for Linux, including, kernel components, userspace
> libraries and internal and external interfaces.  Results of this
> workshop will be presented to the Kernel Summit the following month.

Target vision for cluster infrastructure
(thoughts on reading an interview with Andrew Morton in Ziff-Davis eWeek)

April 21, 2005, edited May 20
I was surprised to see that cluster infrastructure is still missing,
yet pleased that the need for it is more widely perceived today
http://www.uwsg.iu.edu/hypermail/linux/kernel/0409.0/0238.html

than it was four years ago, when the linux-cluster mailing list was formed
http://mail.nl.linux.org/linux-cluster/2001-02/msg00000.html

(although there has been nothing but spam in its archive since July 2002).

A quick review of more recent developments indicates that little has changed.

There is no need for standardization across cluster infrastructures at any one
installation, and the sense that the discussion is over "whose version
gets included" rather than "what can we add to make things easier for
everyone, even if doing so will actually hurt the prospects for the clustering
infrastructure I am promoting" still leads to benchmark wars whenever the
subject comes up.  So I gather, from glancing at the discussion on LKML from
last September, that there has been some progress but not much.

Four years ago, I proposed a target vision for linux cluster kernel development
which I believe still makes sense. (And now I know to call it a
"target vision!")

At the time, I had no good answer for why we would bother to implement support
for something that nobody would immediately use, and pie-in-the-sky
descriptions of wide area computing grids seemed silly. (They may still.)

The vision was that a Linux box could be in multiple clusters at
once, or could, if so configured, be a "cluster router" similar to the
file system sharers/retransmitters one can set up to run interference
between disparate network file systems.

Supporting this vision -- a box is in N clusters from M separate
cluster system vendors, at the same time, and these N clusters know nothing
about each other -- is in my opinion a reasonable plan of attack for selecting
features to include, or interfaces that cluster system vendors must
conform to, rather than getting into detailed fights about whose
implementation of feature F belongs in the core distribution.

In the automatic process migration model, it is easy to imagine a wide
cluster where each node might represent a cluster rather than a unit, and would
want to hide the details of the cluster it is representing.  Four
years ago, Mosix allowed pretty wide clusters containing nodes not directly
reachable from each other, but node id numbers needed to be unique across
the whole universe.

In the "galaxy cluster" vision, a cluster can represent itself as a node, to
other nodes participating in the same cluster, without revealing
internal details of the galaxy (because from far enough away, a galaxy
looks, at first, like a single star).

The closest thing to an implementation of this vision, when I last
reviewed what was available, was using Condor to link separate
Mosix clusters.

I remember a few near-consensuses being reached on the linux-cluster
mailing list.

These included:

   Defining a standard interface for requesting and obtaining
   cluster services, and enforcing compliance with it, makes sense.

   Arguing about whose implementation of any particular clustering
   feature is best does not make sense.  (Given free exchange of techniques
   and a standard interface, the in fact better techniques will gradually
   nudge out the in fact inferior ones with no shoving required.)

   A standard cluster configuration interface (CCI) defined as a fs
   of its own makes sense, rather than squatting within /proc.

   The CCI can be mounted anywhere (possibly back within /proc), so
   multiple clusters on the same box will not collide with each other --
   each gets its own CCI, and all syscalls to cluster parts include a
   pointer to a cluster configuration object, of which there can be more
   than one defined.

   The first order of business therefore was to take a survey of
   services provided by clustering infrastructures and work out
   standardizable interfaces to these services.

That's what I remember.  The survey of services may or may not have
been performed formally.  I know that a survey of the cluster services
provided can be done easily -- it is done often, whenever anyone tries to
select what kind of cluster they want to set up.

The role of the Linux kernel maintainer, in the question of supporting
multiple disparate cluster platforms, is NOT to choose one, but is to set
up ground rules under which they can all play nice together.  Just like
the file systems all play nice together currently.

The thought of having two different spindles each with their own file
system is not something that anyone blinks at anymore, but it was once
revolutionary.

Having one computer participating in N clusters, at the same time, may in the
future be a similar non-issue.
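
To make the N-clusters-on-one-box idea concrete, here is a rough sketch in
Python rather than kernel code.  Everything named in it (ClusterConfig, Box,
the mount points, the vendor names) is made up for illustration; the only
points taken from the discussion are that each cluster gets its own CCI mount
point and that every cluster operation names the configuration object it
applies to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical stand-in for a "cluster configuration object"."""
    vendor: str        # which cluster system provides this cluster
    mount_point: str   # where this cluster's CCI is mounted
    node_id: int       # this box's node-id, meaningful only inside this cluster


class Box:
    """One machine that can be a member of several unrelated clusters."""

    def __init__(self):
        self._clusters = {}  # mount_point -> ClusterConfig

    def join(self, config):
        # Two clusters may not share a CCI mount point, so they cannot collide.
        if config.mount_point in self._clusters:
            raise ValueError(f"{config.mount_point} is already in use")
        self._clusters[config.mount_point] = config
        return config

    def my_node_id(self, config):
        # Every cluster operation names the configuration object it applies to.
        return self._clusters[config.mount_point].node_id


box = Box()
a = box.join(ClusterConfig("vendor-a", "/cluster/a", node_id=4))
b = box.join(ClusterConfig("vendor-b", "/proc/cluster/b", node_id=16))
print(box.my_node_id(a), box.my_node_id(b))
```

Neither cluster learns anything about the other; the box simply holds two
independent configuration objects.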

Pertaining to the list of cluster services, here's a quick and small list of the
ones that spring to my mind as being valid for inclusion into the CCI, without
doing too much additional research:

   services (including statistics) that cluster membership provides
to processes on the node should be defined and offered through the CCI

   node identification in each cluster, abstracted into that cluster

   information about other nodes

   extended PID must include cluster-ID as well as node-ID when discussing
   PID extension mechanisms:  if I am process 00F3 (hex) on my own node, I
   might have an extended pid of 000400F3 on a cluster in which I am node 4
   and an extended pid of 001000F3 on a cluster in which I am node 16
   (hex 0010).

   the publish/subscribe model (just read that one today) is very good

   standardize a publish/subscribe RPC interface in terms of operations on
   filesystem entities within the CCI
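
The extended-PID arithmetic in the list above can be checked with a few lines
of Python.  This is a sketch only: the 16-bit node-id and 16-bit local pid
widths are assumptions read off the example, not a proposed kernel ABI, and
the function and cluster names are invented.

```python
# Assumed field widths, taken from the 8-hex-digit example only.
NODE_BITS = 16
PID_BITS = 16


def extend_pid(node_id, local_pid):
    """Pack a per-cluster node-id and a local pid into one extended pid."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= local_pid < (1 << PID_BITS):
        raise ValueError("local_pid out of range")
    return (node_id << PID_BITS) | local_pid


def split_pid(extended):
    """Recover (node_id, local_pid) from an extended pid."""
    return extended >> PID_BITS, extended & ((1 << PID_BITS) - 1)


# The same local process as seen from two clusters (hypothetical names).
# The cluster-ID is what tells us which node-id mapping applies.
my_node_id = {"cluster_a": 4, "cluster_b": 16}
local_pid = 0x00F3
extended = {name: extend_pid(node, local_pid) for name, node in my_node_id.items()}
print({name: f"{value:08X}" for name, value in extended.items()})
```

The extended pid is only meaningful together with the cluster it belongs to,
which is why the cluster-ID has to travel with it.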

Based on discussion on the cap-talk mailing list, I'd like to suggest
that publish/subscribe get implemented in terms of one-off capability
tickets, but that's a perfect example of the kind of implementation
detail I'm saying we do not need to define.  How a particular clustering
system implements remote procedure call is not relevant to mandating a
standard for how clustering systems, should their engineers choose to have
their product comply with a standard, may present available RPCs in the CCI,
and how processes on nodes subscribed to their clusters may call an
available RPC through the CCI.
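
As a toy illustration of that separation, the sketch below models a CCI as an
in-memory tree in Python.  The "rpc/" directory, the CCI class and its
methods, and the "upcase" service are all hypothetical; nothing here is an
existing interface.  Two cluster systems publish the same RPC with different
internals, and the caller cannot tell them apart.

```python
class CCI:
    """In-memory stand-in for one mounted CCI tree (layout is hypothetical)."""

    def __init__(self):
        self._entries = {}  # path such as "rpc/upcase" -> handler callable

    def publish(self, name, handler):
        # The cluster system makes an RPC visible as an entry under rpc/.
        self._entries["rpc/" + name] = handler

    def listdir(self, directory="rpc"):
        # A process discovers available RPCs much as it would list a directory.
        prefix = directory + "/"
        return sorted(p[len(prefix):] for p in self._entries if p.startswith(prefix))

    def call(self, path, request):
        # The caller names only the entry; how the call is carried out is hidden.
        return self._entries[path](request)


def upcase_direct(request):
    return request.upper()


def upcase_bytewise(request):
    # A deliberately different implementation of the same service.
    return bytes(b - 32 if 97 <= b <= 122 else b for b in request)


cci_a, cci_b = CCI(), CCI()
cci_a.publish("upcase", upcase_direct)
cci_b.publish("upcase", upcase_bytewise)

for cci in (cci_a, cci_b):
    print(cci.listdir(), cci.call("rpc/upcase", b"hello"))
```

Only the presentation (an entry under rpc/ and a way to call it) would be
standardized; the handlers stay each vendor's own business.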

The big insight that I hope you take away from this e-mail, if you
haven't had it already (I have been out of touch with the insight level of
LKML for a long time) is that clustering integration into the kernel makes
sense as a standards establishment and enforcement effort, rather than a
technology selection and inclusion effort (at least at first -- once
the CCI exists, cluster providers might all rush to provide their "CCI
driver modules" and then we're back to selection and inclusion) and that
is a different kind of effort.

Although not a new kind.  While writing that paragraph I realized that
the file system interface and the entire module interface, and any other
kind of plug-it-in-and-it-works-the-same interface Linux supports (sound,
video, et cetera) are all standards enforcement problems rather than
technology selection problems.

Not recognizing that clustering is such a problem is what I believe is
holding back cluster infrastructure from appearing in the kernel.

So last September's thread about message passing services, in my
vision, is improper. The question is not "how do we pass messages?"
but "we have nodes that we know by node-id, and we have messages
that we wish to pass to them; how do we provide a mechanism so that,
knowing only those things, node-id and message, an entity that
wishes to pass a message can ask the cluster system to pass the message?"

Given modularization, it will then be possible to drop in and replace systems as
needed or as appropriate.
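
A minimal sketch of what "drop in and replace" could look like, again in
Python and again with invented names (ClusterTransport, QueueTransport,
SharedLogTransport, pass_message): the caller supplies a node-id and a
message, and either backend can sit behind the same call.

```python
from abc import ABC, abstractmethod


class ClusterTransport(ABC):
    """Hypothetical interface a cluster system would implement."""

    @abstractmethod
    def send(self, node_id, message):
        """Deliver message to the node known by node_id."""


class QueueTransport(ClusterTransport):
    """Toy backend: one in-memory list per destination node."""

    def __init__(self):
        self.inbox = {}

    def send(self, node_id, message):
        self.inbox.setdefault(node_id, []).append(message)


class SharedLogTransport(ClusterTransport):
    """Different toy backend: a single shared log of (node_id, message)."""

    def __init__(self):
        self.log = []

    def send(self, node_id, message):
        self.log.append((node_id, message))

    def received_by(self, node_id):
        return [m for n, m in self.log if n == node_id]


def pass_message(transport, node_id, message):
    # The caller knows only a node-id and a message.
    transport.send(node_id, message)


queue, shared = QueueTransport(), SharedLogTransport()
for transport in (queue, shared):
    pass_message(transport, 4, b"ping")
print(queue.inbox[4], shared.received_by(4))
```

Swapping one backend for the other requires no change to pass_message or to
its callers, which is the whole point of standardizing the interface first.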

--
David L Nicol
Director of Research and Development
Mindtrust LLC, Kansas City, Missouri