[Linux-cluster] Cluster won't come up when T1 is down???

isplist at logicore.net isplist at logicore.net
Fri Oct 12 03:26:06 UTC 2007


Apologies to the list for the multiple posts. Just an email server burp: I'm 
building a new server and the same message ended up queued a few times, 
except for the last one asking if I could be seen.

No, my sites aren't down, and no, I have no additional information; this was 
a one-time weirdness. I thought I'd pick some minds. You never know; 
sometimes you ask your associates when something weird happens. It generates 
new questions and, sometimes, answers.

So...

> Please, let's please dispense with the vitriol, and solve your problem, ok?
> Great, then let's get on with it. You can thank me later.

Agreed. As for the problem, there is no solution because it's been too long. 
I was asking a general question, to see if anyone might have come across 
anything similar, not exactly the same perhaps but somehow related. It's not 
as if this weirdness left any logs or anything else I could search for 
something relevant to post, so I asked in general.
 
> name resolution problem happening. Where are the names you use for your
> nodes to find each other stored? I'm guessing in a dns that you cannot
> access when the T1 drops.

The thing is, it's all internal, behind firewalling. There aren't any 
dependencies on anything external for the cluster to come together. The T1 
being down should have had no effect whatsoever, yet it prevented the cluster 
from coming back together. 
 
> Insure that each node has an /etc/host file that has all node names and IP
> addresses in it, if you have not done so already. This will ensure that
> names will be resolved correctly - even if dns is not available.
> Understanding your network topology would also be helpful.

Each node does have a hosts file with all of the other nodes in it. I have 
internal DNS servers as well. The most frustrating aspect is that there 
isn't anything to look for, since it works just fine other than that one 
time. Again, that's why my question was so general. Every now and then, 
something weird happens in a network and you just ask your associates for 
thoughts and ideas; maybe they will think of other questions to ask you, 
which will get you thinking. That was my intent here too. 
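
For anyone wanting to verify the same thing on their own nodes, here is a 
minimal sketch of a check that every expected node name appears in a hosts 
file. The node names and addresses in SAMPLE are hypothetical placeholders, 
not my actual configuration, and the helper names are made up for 
illustration. It only parses hosts-file text; it does not query DNS or test 
connectivity.

```python
import ipaddress


def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no names is not useful
        ip, names = fields[0], fields[1:]
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            continue  # skip lines that do not start with a valid address
        for name in names:
            # first entry wins, mirroring the usual top-down lookup order
            table.setdefault(name.lower(), ip)
    return table


def missing_nodes(hosts_text, expected_nodes):
    """Return the expected node names that have no entry in the hosts text."""
    table = parse_hosts(hosts_text)
    return [name for name in expected_nodes if name.lower() not in table]


# Hypothetical example data (assumed names and addresses).
SAMPLE = """
127.0.0.1    localhost
192.168.1.11 node1.example.internal node1
192.168.1.12 node2.example.internal node2  # second node
"""

if __name__ == "__main__":
    print(missing_nodes(SAMPLE, ["node1", "node2", "node3"]))
```

On a real node you would read /etc/hosts into a string and pass your own 
list of cluster member names instead of SAMPLE.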

> But keep in mind I am simply guessing here - as you have provided me with
> few details. Can you please provide:

That's because there really isn't much more to it. The cluster works just 
fine, always, except for that one time which didn't make any sense. 
 
> * a description of your topology

All servers are NATed behind firewalling. LVS front end to the services. 
Brocade switches for fencing and Fibre Channel networking. 

> * config data for all interfaces

Where do I start? 

> * contents of /etc/hosts

Check. Each node knows about the others, and everything works just fine, 
except for that one time.

> * IP address of dns server

Hosts files and internal DNS servers.

I know this isn't something we'll find an answer to by breaking down my 
network. It's a one-time thing that I thought I'd ask about, to see if anyone 
else had seen anything like it. Everything works perfectly, other than that 
one time. 


---

> results? I'm simply attempting to engage you to help you - selfishly, I'll
> admit, because to be honest, I'm tired of this email hitting my inbox.

It's just email. I think we as humans have to stop finding reasons to be 
pissed off, upset, worked up over such small things. I'm sure you get a lot 
more spam than anything else. That should frustrate you a lot more than my 
email burp.

Hope we're done with the list issues :).

Mike





