[Linux-cluster] Starter Cluster / GFS

Nicolas Ross rossnick-lists at cybercat.ca
Fri Nov 19 02:23:33 UTC 2010

Hi again!

I am beginning to play with my new servers. For starters I got 2 nodes (1U
Intel server platform, with an LSI Logic FC949ES FC card). I am like a child
playing with his new toy at Christmas...

So, now I have a few points and questions. Sorry if it's long.

1. RAID sets

So, I made up a 2-node cluster for the moment. I was able to bring up the
cluster and make a GFS file system, in fact 2. We've made some tests with
different RAID strategies. Our first idea for the GFS was to use five 1 TB
disks in RAID 5. With that I got a 4 TB fs. It has been suggested previously
that this might not be a good idea. Our controller doesn't directly support
RAID 10, which seems to be the consensus for a better setup, so we will be
doing the RAID 0 part in Linux.
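(As far as I know, plain md RAID 0 isn't cluster-aware, so on shared FC
storage the usual way to do the 0 part in Linux is a striped lv through
CLVM. A minimal sketch, with /dev/sdb and /dev/sdc standing in for the two
hardware RAID 1 LUNs, and vg/lv names that are just placeholders:

    pvcreate /dev/sdb /dev/sdc
    vgcreate -cy VolGFS /dev/sdb /dev/sdc            # -cy = clustered vg (clvmd)
    lvcreate -i2 -I64 -l 100%FREE -n LvGFS VolGFS    # -i2 stripes across both PVs
)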

I made 2 RAID 1 sets of 1 TB (2 disks each) on our RAID enclosure and added
them to a single vg. I created an lv on top of that, so I end up with a 2 TB
fs. We don't plan on using striping on the lv (-i2) because of the overhead:
if we ever add more space, we would need to add RAID 1 sets two at a time. So
we plan on making a "starter" GFS with those 2 sets (2 TB total). It's nearly
double the 1.1 TB we have now, so we'll start with that.
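Concretely, the layout I ended up with is along these lines (same
placeholder names as above; "mycluster" stands in for the real cluster
name and has to match cluster.conf):

    lvcreate -l 100%FREE -n LvGFS VolGFS             # linear, no -i2
    mkfs.gfs2 -p lock_dlm -t mycluster:gfs1 -j 2 /dev/VolGFS/LvGFS

-j 2 makes one journal per node, so it will need growing if we add nodes.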

Now, we made some write tests with dd, and judging by the disk activity, all
data was written to the first disk (pair) of the vg, and never to the second
one. I assume that once the first disk is full, it'll start writing to the
2nd one. In the long term I don't believe it'll be a problem, but I'd prefer
if the data were written alternately to both disks without using stripes. Is
that possible? I looked at the --alloc option to vgcreate, but it doesn't
seem to be that.
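For what it's worth, this is how I checked where the extents sit, with the
standard LVM reporting commands:

    pvs -o pv_name,pv_size,pv_free                        # free space per RAID 1 pair
    lvs --segments -o lv_name,seg_start,seg_size,devices  # which PV backs each segment

With a linear lv the first half of the logical extents map to the first PV
and the second half to the second, so a fresh fs fills up the first pair
before touching the second. As far as I can tell, the --alloc policies only
control how extents are packed, not round-robin interleaving across PVs;
striping seems to be the only allocation-time interleave.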

2. Network setup.

All our new servers have 3 NICs, one being dedicated to the management
module. I will be using the first one for a private network that will be
serving my services. In the new setup, real routable IPs will terminate at
the router and will be NATed to the private ones for eventual
load-balancing. I will be using the second NIC, on a different vlan and
subnet, for cluster communications. The management modules will be on that
same vlan. Is this a good setup? Should I be doing something differently?
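To make that concrete, a minimal cluster.conf along the lines of what I'm
describing would look like this (cluster name, node names and fencing are
placeholders; the node names resolve, via /etc/hosts, to addresses on the
cluster vlan, which is what makes cman use that network):

    <?xml version="1.0"?>
    <cluster name="mycluster" config_version="1">
      <cman two_node="1" expected_votes="1"/>
      <clusternodes>
        <clusternode name="node1-priv" nodeid="1">
          <fence><!-- fence method for node1 --></fence>
        </clusternode>
        <clusternode name="node2-priv" nodeid="2">
          <fence><!-- fence method for node2 --></fence>
        </clusternode>
      </clusternodes>
      <fencedevices><!-- e.g. the management modules --></fencedevices>
    </cluster>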

3. Deadlocks

I found a small C program for testing the locks/s that are possible on a
file accessed simultaneously from many nodes (it's ping_pong; some of you
might have used it). One of the parameters of that program is the number of
nodes using the file + 1. On one test, I used 2 instead of 3 on one of the
nodes. The programs on both nodes then seemed stuck, not killable, not even
with -9. So I must assume that they were in some kind of deadlock. dlm_tool
deadlock_check didn't show anything, and I can't make heads or tails of
gfs2_tool lockdump or what to do with its output. I was forced to forcibly
reboot one of the nodes. We most likely won't get into that situation in my
production environment, but I want to know what happened and what to do to
prevent or break that kind of lock.
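For anyone who wants to reproduce it: ping_pong comes from the ctdb/Samba
source tree, and the runs looked roughly like this (the mount point and
lockspace name are placeholders):

    # correct run: 3 = number of nodes (2) + 1, same argument on each node
    ./ping_pong /gfs/test.dat 3

    # the hang: one node accidentally started with 2 instead of 3
    ./ping_pong /gfs/test.dat 2

    # what I looked at while they were stuck
    dlm_tool deadlock_check <lockspace>
    cat /sys/kernel/debug/gfs2/*/glocks    # raw glock dump (needs debugfs mounted)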

Thank you all. 
