[Linux-cluster] Clustered DNS service use on the node's resolvers?

Sat Nov 27 07:35:02 UTC 2010

On 11/26/2010 04:50 PM, Colin Simpson wrote:
> Hi
> 
> I am playing with clustering bind. The named service has its own IP
> resource. I have successfully tested a basic caching name server and all
> seems to be working.
> 
> Now is it possible (or wise) to give the nodes of the cluster hosting
> the DNS service the resource IP as their DNS server in their
> resolv.conf?

I guess it all depends on what kind of services are running on the
cluster and how dependent they are on DNS itself.

> 
> This is mainly so that other services can have a highly available DNS
> service. Multiple DNS servers in resolv.conf really is a non-starter as
> it takes forever to give up on the first name server and try the next
> one (then it always keeps trying the first server again on subsequent
> queries).
> 
> My main issue, I think, is that I'm guessing so long as all my cluster
> node names (and any other names) used in cluster.conf are in /etc/hosts
> then the cluster suite itself should be happy? Does it ever need to hit
> DNS? 

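This should work. As long as every node name used in cluster.conf is in
/etc/hosts, the cluster suite itself should be happy.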
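
As a rough sanity check for that setup, something like the following
Python sketch could list the cluster node names that are missing from
/etc/hosts. It assumes the node names appear as the name attribute of
clusternode elements in cluster.conf. The function names are made up for
illustration, and any other names referenced elsewhere in the file are
not checked.

```python
import xml.etree.ElementTree as ET


def cluster_node_names(cluster_conf_xml):
    """Return the name attribute of every <clusternode> element."""
    root = ET.fromstring(cluster_conf_xml)
    return [n.get("name") for n in root.iter("clusternode") if n.get("name")]


def hosts_names(hosts_text):
    """Return every hostname and alias listed in hosts-file text."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop comments, then split into "address name alias ..."
        fields = line.split("#", 1)[0].split()
        names.update(fields[1:])  # the first field is the address
    return names


def missing_from_hosts(cluster_conf_xml, hosts_text):
    """Return cluster node names that the hosts text does not define."""
    known = hosts_names(hosts_text)
    return [n for n in cluster_node_names(cluster_conf_xml) if n not in known]
```

To use it, pass the contents of /etc/cluster/cluster.conf and /etc/hosts
to missing_from_hosts(). An empty list means every node name is covered.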

> 
> But what if services (maybe Samba or httpd) use and/or require DNS? I
> know I could make these dependent on DNS but that seems a bit messy
> (with so many services inside one). Plus I would like DNS to be in a
> different failover domain from some of the services.

Hmm this is a bit of a grey area. Let me explain what I think.

Let's assume you have a service foo (like httpd) that needs to do lots of
DNS queries (to resolve IP <-> hostnames for logs) and, for whatever
reason, your DNS service dies. rgmanager attempts to relocate it, but it
still fails (maybe because of a configuration error from the admin that
rgmanager obviously cannot recover from).

rgmanager will stop trying to recover DNS according to your configured
policies, but then httpd, and any other service dependent on DNS, will
suffer from the lack of it.

The question then comes down to: how quickly would the admin notice that
DNS is down and recover it?

> So I suppose the
> question is, does rgmanager start the service and decide a service is
> alive or dead based on just return status, not timeouts ?

It depends on the resource agent that drives that resource. In most cases
the resource agent behaves like an init script and does a status check.
Nothing stops you from customizing the resource agent to check for
something else. Just make sure to use standard exit codes.
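
To illustrate what "check for something else" could look like for DNS,
here is a minimal Python sketch of a status check that sends one real
query to the service IP instead of only checking that named is running.
Everything in it is an assumption on my side: the function names, the
test name "localhost", the 2 second timeout, and the choice of the LSB
status codes (0 for running, 3 for not running) as the "standard exit
codes". Treat it as a starting point, not as how the stock agent works.

```python
import socket
import struct
import sys

LSB_RUNNING = 0       # LSB "status": service is running
LSB_NOT_RUNNING = 3   # LSB "status": service is not running


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet (type A, class IN) for name."""
    # Header: id, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)


def reply_is_healthy(reply, query_id=0x1234):
    """Accept a matching response with rcode NOERROR (0) or NXDOMAIN (3)."""
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return reply_id == query_id and is_response and rcode in (0, 3)


def status(server_ip, name="localhost", timeout=2.0):
    """Query server_ip over UDP port 53 and map the result to an exit code."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable hosts
            return LSB_NOT_RUNNING
    return LSB_RUNNING if reply_is_healthy(reply) else LSB_NOT_RUNNING


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(status(sys.argv[1]))
```

The script takes the resource IP as its only argument and reports the
result through its exit status, so it could be called from the status
action of a customized agent.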

> Will it get
> upset by a slow starting service?

Still depends on how the resource agent is written and how the resource
works. Some daemons fork into the background right away: the init script
exits cleanly and the daemon fails later. rgmanager will notice that the
daemon has failed only at the next status check.

If the init script (or resource agent) instead waits for the service
to start before exiting, there should be no problems at all.

It will just take longer to start the resources dependent on the slow one.
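
The "wait before exiting" approach can be sketched in Python like this.
It is only an illustration: wait_until_ready, its parameters and the
0/1 return values are hypothetical, and is_ready stands for whatever
check suits the service (for example a test query against named).

```python
import time


def wait_until_ready(is_ready, timeout=30.0, interval=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or timeout seconds pass.

    Returns 0 once the service is ready and 1 if the deadline passes
    first, so the value can be used directly as the start exit code.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return 0
        if clock() >= deadline:
            return 1
        sleep(interval)
```

A start action would launch the daemon, call wait_until_ready() with a
suitable check, and exit with the returned value. That way rgmanager
only sees a successful start once the service can actually answer.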

> And will it start services in parallel
> (not waiting for each to start in turn)? 
> 

I can't remember the option to do it, but it is possible.

Now keep in mind that, by splitting the resources into different failover
domains, you can start them in parallel, but if they depend on each
other for proper runtime functionality, you are introducing a situation
where rgmanager can't help you to start them in the right order.
So make sure it's actually what you are looking for.

Fabio