From jpalmae at gmail.com Fri May 2 01:16:40 2008 From: jpalmae at gmail.com (Jorge Palma) Date: Thu, 1 May 2008 21:16:40 -0400 Subject: [Linux-cluster] GFS Storage cluster !!!! In-Reply-To: <48189385.8080007@nexatech.com> References: <48119C51.8020904@monster.co.in> <5b65f1b10804300830s46f1038bj3e29e79c9699a133@mail.gmail.com> <48189385.8080007@nexatech.com> Message-ID: <5b65f1b10805011816s74312a4qf5ebcdb17b398e93@mail.gmail.com> I Know.... Thanks!! On Wed, Apr 30, 2008 at 11:43 AM, Jeff Macfarland wrote: > Just an FYI- does not support SCSI PR > > > Jorge Palma wrote: > > you can use ISCSI to simulate a SAN > > > > http://iscsitarget.sourceforge.net/ > > > > Regards > > > > -- > > > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Jorge Palma Escobar Ingeniero de Sistemas Red Hat Linux Certified Engineer Certificate N? 804005089418233 From jas199931 at yahoo.com Fri May 2 01:43:30 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 1 May 2008 18:43:30 -0700 (PDT) Subject: [Linux-cluster] Lock Resources Message-ID: <353354.65091.qm@web32206.mail.mud.yahoo.com> Hi, All: I have downloaded "Programming Locking Applications" written by Christine Caulfield from http://sources.redhat.com/cluster/wiki/HomePage?action=AttachFile&do=view&target=rhdlmbook.pdf I read it through, especially the DLM locking model. It is very informative. Thanks Christine. Now I have some questions about the lock resource and wish to get answers from you. 1. Whether the kernel on each server/node is going to initialize a number of empty lock resources after completely rebooting the cluster? 2. If so, what is the default value of the number of empty lock resources? Is it configurable? 3. Whether the number of lock resources is fixed regardless the load of the server? 4. If not, how the number of lock resources will be expended under a heavy load? 5. The lock manager maintains a cluster-wide directory of the locations of the master copy of all the lock resources within the cluster and evenly divides the content of the directory across all nodes. How can I check the content held by a node (what command or API)? 6. If only one node A is busy while other nodes are idle all the time, does it mean that the node A holds a very big master copy of lock resources and other nodes have nothing? 7. For the above case, what would be the content of the cluster-wide directory? Only one entry as only the node A is really doing IO, or many entries and the number of entries is the same as the number of used lock resources on the node A? If the latter case is true, will the lock manager still divide the content evenly to other nodes? If so, would it costs the node A extra time on finding the location of the lock resources, which is just on itself, by messaging other nodes? If you need more information from me in order to help me, or if you think my questions are not clear, please kindly let me know. Thank you very much in advance and look forward to hearing from you. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Fri May 2 07:35:32 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 02 May 2008 08:35:32 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <353354.65091.qm@web32206.mail.mud.yahoo.com> References: <353354.65091.qm@web32206.mail.mud.yahoo.com> Message-ID: <481AC444.5080709@redhat.com> Ja S wrote: > Hi, All: > > I have downloaded "Programming Locking Applications" > written by Christine Caulfield from > http://sources.redhat.com/cluster/wiki/HomePage?action=AttachFile&do=view&target=rhdlmbook.pdf > > > I read it through, especially the DLM locking model. > It is very informative. Thanks Christine. > > Now I have some questions about the lock resource and > wish to get answers from you. > > 1. Whether the kernel on each server/node is going to > initialize a number of empty lock resources after > completely rebooting the cluster? > > 2. If so, what is the default value of the number of > empty lock resources? Is it configurable? There is no such thing as an "empty" lock resource. Lock resources are allocated from kernel memory as required. That does mean that the number of resources that can be held on a node is limited by the amount of physical memory in the system. I think this addresses 3 & 4. > 3. Whether the number of lock resources is fixed > regardless the load of the server? > > 4. If not, how the number of lock resources will be > expended under a heavy load? > > 5. The lock manager maintains a cluster-wide directory > of the locations of the master copy of all the lock > resources within the cluster and evenly divides the > content of the directory across all nodes. How can I > check the content held by a node (what command or > API)? On RHEL4 (cluster 1) systems the lock directory is viewable in /proc/cluster/dlm_dir. I don't think there is currently any equivalent in RHEL5 (cluster 2) > 6. If only one node A is busy while other nodes are > idle all the time, does it mean that the node A holds > a very big master copy of lock resources and other > nodes have nothing? That's correct. There is no point in mastering locks on a remote node as it will just slow access down for the only node using those locks. > 7. For the above case, what would be the content of > the cluster-wide directory? Only one entry as only the > node A is really doing IO, or many entries and the > number of entries is the same as the number of used > lock resources on the node A? If the latter case is > true, will the lock manager still divide the content > evenly to other nodes? If so, would it costs the node > A extra time on finding the location of the lock > resources, which is just on itself, by messaging > other nodes? You're correct that the lock directory will still be distributed around the cluster in this case and that it causes network traffic. It isn't a lot of network traffic (and there needs to be some way of determining where a resource is mastered; a node does not know, initially, if it is the only node that is using a resource). That lookup only happens the first time a resource is used by a node, once the node knows where the master is, it does not need to look it up again, unless it releases all locks on the resource. 
I hope this helps, -- Chrissie From jas199931 at yahoo.com Fri May 2 12:23:16 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 2 May 2008 05:23:16 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <481AC444.5080709@redhat.com> Message-ID: <364006.18824.qm@web32202.mail.mud.yahoo.com> Hi, Christine: Really appreciate your prompt and kind reply. I have some further questions. > > > > 1. Whether the kernel on each server/node is going > to > > initialize a number of empty lock resources after > > completely rebooting the cluster? > > > > 2. If so, what is the default value of the number > of > > empty lock resources? Is it configurable? > > There is no such thing as an "empty" lock resource. > Lock resources are > allocated from kernel memory as required. That does > mean that the number > of resources that can be held on a node is limited > by the amount of > physical memory in the system. Does it mean the cache allocated for disk IO will be reduced to meet the need of more lock resources? If so, for an extremely busy node, when reducing the cache, the physical disk IO will increase, which in turn increases the processing time (as disk IO is much slower than accessing cache), which then in turn increases the period of holding the lock resources, which in turn makes the kernel grab more memory space that should be used for cache in order to create new lock resources for new requests, and on and on, and eventually ends up to a no-cache situtation at all. Would this case ever happen? > I think this addresses 3 & 4. Yes, your answer does address them. Thank you. However, what will happen if an extremely busy application needs to write more new files thus the kernel needs to allocate more lock resources but the physical memory limit has been reached and all existing lock resources cannot be released? I guess the kernel will simply force the application go into an uninterruptable sleep until some lock resources are released or some memories are freed. Am I right? > > 3. Whether the number of lock resources is fixed > > regardless the load of the server? > > > > 4. If not, how the number of lock resources will > be > > expended under a heavy load? > > > > 5. The lock manager maintains a cluster-wide > directory > > of the locations of the master copy of all the > lock > > resources within the cluster and evenly divides > the > > content of the directory across all nodes. How can > I > > check the content held by a node (what command or > > API)? > > On RHEL4 (cluster 1) systems the lock directory is > viewable in > /proc/cluster/dlm_dir. I don't think there is > currently any equivalent > in RHEL5 (cluster 2) Thanks. Very helpful. From the busiest node A the first several lines of dlm_dir are below. How to interpret them, please? DLM lockspace 'data' 5 2f06768 1 5 114d15 1 5 120b13 1 5 5bd1f04 1 3 6a02f8 2 5 cb7604 1 5 ca187b 1 Also there are many files under /proc/cluster, Could you please direct me to a place where I can find the usages of these files and descriptions of their content? > > 6. If only one node A is busy while other nodes > are > > idle all the time, does it mean that the node A > holds > > a very big master copy of lock resources and other > > nodes have nothing? > > That's correct. There is no point in mastering locks > on a remote node as > it will just slow access down for the only node > using those locks. > > > 7. For the above case, what would be the content > of > > the cluster-wide directory? 
Only one entry as only > the > > node A is really doing IO, or many entries and the > > number of entries is the same as the number of > used > > lock resources on the node A? If the latter case > is > > true, will the lock manager still divide the > content > > evenly to other nodes? If so, would it costs the > node > > A extra time on finding the location of the lock > > resources, which is just on itself, by messaging > > other nodes? > > You're correct that the lock directory will still be > distributed around > the cluster in this case and that it causes network > traffic. It isn't a > lot of network traffic (and there needs to be some > way of determining > where a resource is mastered; a node does not know, > initially, if it is > the only node that is using a resource). > That lookup only happens the first time > a resource is used by a node, once the > node knows where the master is, > it does not need to look it up again, > unless it releases all > locks on the resource. > Oh, I see. Just to further clarify, does it means if the same lock resource is required again by an application on the node A, the node A will go straight to the known node (ie the node B) which holds the master previously, but needs to lookup again if the node B has already released the lock resource? > > > I hope this helps, > Yes, yes, very helpful. Thank you very much indeed. Wish to receive your kind reply again. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Fri May 2 12:41:00 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 02 May 2008 13:41:00 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <364006.18824.qm@web32202.mail.mud.yahoo.com> References: <364006.18824.qm@web32202.mail.mud.yahoo.com> Message-ID: <481B0BDC.1000105@redhat.com> Ja S wrote: > Hi, Christine: > > Really appreciate your prompt and kind reply. > > I have some further questions. > > >>> 1. Whether the kernel on each server/node is going >> to >>> initialize a number of empty lock resources after >>> completely rebooting the cluster? >>> >>> 2. If so, what is the default value of the number >> of >>> empty lock resources? Is it configurable? >> There is no such thing as an "empty" lock resource. >> Lock resources are >> allocated from kernel memory as required. That does >> mean that the number >> of resources that can be held on a node is limited >> by the amount of >> physical memory in the system. > > Does it mean the cache allocated for disk IO will be > reduced to meet the need of more lock resources? > > If so, for an extremely busy node, when reducing the > cache, the physical disk IO will increase, which in > turn increases the processing time (as disk IO is much > slower than accessing cache), which then in turn > increases the period of holding the lock resources, > which in turn makes the kernel grab more memory space > that should be used for cache in order to create new > lock resources for new requests, and on and on, and > eventually ends up to a no-cache situtation at all. > Would this case ever happen? I suppose it could happen, yes. There are tuning values for GFS you can use to make it flush unused locks more frequently but if the locks are needed then they are needed! > >> I think this addresses 3 & 4. > > Yes, your answer does address them. Thank you. 
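(On the tuning values mentioned a little further up -- a hedged aside, since the exact knobs depend on the GFS version: on GFS1 they are exposed through gfs_tool settune, for example

gfs_tool settune /mountpoint demote_secs 100
gfs_tool settune /mountpoint glock_purge 50

where /mountpoint is your GFS mount, demote_secs shortens how long unused glocks are kept before being demoted, and glock_purge -- present only in newer GFS1 code -- sets what percentage of unused glocks to trim on each scan. The values above are only illustrative.)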
> However, what will happen if an extremely busy > application needs to write more new files thus the > kernel needs to allocate more lock resources but the > physical memory limit has been reached and all > existing lock resources cannot be released? I guess > the kernel will simply force the application go into > an uninterruptable sleep until some lock resources are > released or some memories are freed. Am I right? I think so yes. The VMM is not my speciality > > >>> 3. Whether the number of lock resources is fixed >>> regardless the load of the server? >>> >>> 4. If not, how the number of lock resources will >> be >>> expended under a heavy load? >>> >>> 5. The lock manager maintains a cluster-wide >> directory >>> of the locations of the master copy of all the >> lock >>> resources within the cluster and evenly divides >> the >>> content of the directory across all nodes. How can >> I >>> check the content held by a node (what command or >>> API)? >> On RHEL4 (cluster 1) systems the lock directory is >> viewable in >> /proc/cluster/dlm_dir. I don't think there is >> currently any equivalent >> in RHEL5 (cluster 2) > > Thanks. Very helpful. From the busiest node A the > first several lines of dlm_dir are below. How to > interpret them, please? > > DLM lockspace 'data' > 5 2f06768 1 > 5 114d15 1 > 5 120b13 1 > 5 5bd1f04 1 > 3 6a02f8 2 > 5 cb7604 1 > 5 ca187b 1 > The first two numbers are the lock name. Don't ask me what they mean, that's a GFS question! (actually, I think inode numbers might be involved) The last number is the nodeID on which the lock is mastered. > Also there are many files under /proc/cluster, Could > you please direct me to a place where I can find the > usages of these files and descriptions of their > content? They are not well documented. Mainly because they are subject to change and are not a recognised API. Maybe something could be put onto the cluster wiki at some point. >>> 6. If only one node A is busy while other nodes >> are >>> idle all the time, does it mean that the node A >> holds >>> a very big master copy of lock resources and other >>> nodes have nothing? >> That's correct. There is no point in mastering locks >> on a remote node as >> it will just slow access down for the only node >> using those locks. >> >>> 7. For the above case, what would be the content >> of >>> the cluster-wide directory? Only one entry as only >> the >>> node A is really doing IO, or many entries and the >>> number of entries is the same as the number of >> used >>> lock resources on the node A? If the latter case >> is >>> true, will the lock manager still divide the >> content >>> evenly to other nodes? If so, would it costs the >> node >>> A extra time on finding the location of the lock >>> resources, which is just on itself, by messaging >>> other nodes? >> You're correct that the lock directory will still be >> distributed around >> the cluster in this case and that it causes network >> traffic. It isn't a >> lot of network traffic (and there needs to be some >> way of determining >> where a resource is mastered; a node does not know, >> initially, if it is >> the only node that is using a resource). > > > >> That lookup only happens the first time >> a resource is used by a node, once the >> node knows where the master is, >> it does not need to look it up again, >> unless it releases all >> locks on the resource. >> > > Oh, I see. 
Just to further clarify, does it means if > the same lock resource is required again by an > application on the node A, the node A will go straight > to the known node (ie the node B) which holds the > master previously, but needs to lookup again if the > node B has already released the lock resource? Not quite. A resource is mastered on a node for as long as there are locks for it. If node A gets the lock (which is mastered on node B) then it knows always to go do node B until all locks on node A are released. When that happens the local copy of the resource on node A is released including the reference to node B. If all the locks on node B are released (but A still has some) then the resource will stay mastered on node B and nodes that still have locks on that resource will know where to find it without a directory lookup. -- Chrissie From jas199931 at yahoo.com Fri May 2 13:25:38 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 2 May 2008 06:25:38 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <481B0BDC.1000105@redhat.com> Message-ID: <557151.79920.qm@web32204.mail.mud.yahoo.com> --- Christine Caulfield wrote: > > DLM lockspace 'data' > > 5 2f06768 1 > > 5 114d15 1 > > 5 120b13 1 > > 5 5bd1f04 1 > > 3 6a02f8 2 > > 5 cb7604 1 > > 5 ca187b 1 > > > > The first two numbers are the lock name. Don't ask > me what they mean, > that's a GFS question! (actually, I think inode > numbers might be > involved) The last number is the nodeID on which the > lock is mastered. Great, thanks again! > >> That lookup only happens the first time > >> a resource is used by a node, once the > >> node knows where the master is, > >> it does not need to look it up again, > >> unless it releases all > >> locks on the resource. > >> > > > > Oh, I see. Just to further clarify, does it means > if > > the same lock resource is required again by an > > application on the node A, the node A will go > straight > > to the known node (ie the node B) which holds the > > master previously, but needs to lookup again if > the > > node B has already released the lock resource? > > Not quite. A resource is mastered on a node for as > long as there are > locks for it. If node A gets the lock (which is > mastered on node B) then > it knows always to go do node B until all locks on > node A are released. > When that happens the local copy of the resource on > node A is released > including the reference to node B. If all the locks > on node B are > released (but A still has some) then the resource > will stay mastered on > node B and nodes that still have locks on that > resource will know where > to find it without a directory lookup. > Aha, I think I missed another important concept -- a local copy of lock resources. I did not realise the existence of the local copy of lock resources. Which file should I check to figure out how many local copies a node has and what the local copies are? Many thanks again, you have been very helpful. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Fri May 2 13:33:52 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 02 May 2008 14:33:52 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <557151.79920.qm@web32204.mail.mud.yahoo.com> References: <557151.79920.qm@web32204.mail.mud.yahoo.com> Message-ID: <481B1840.8020907@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > > >>> DLM lockspace 'data' >>> 5 2f06768 1 >>> 5 114d15 1 >>> 5 120b13 1 >>> 5 5bd1f04 1 >>> 3 6a02f8 2 >>> 5 cb7604 1 >>> 5 ca187b 1 >>> >> The first two numbers are the lock name. Don't ask >> me what they mean, >> that's a GFS question! (actually, I think inode >> numbers might be >> involved) The last number is the nodeID on which the >> lock is mastered. > > > Great, thanks again! > > >>>> That lookup only happens the first time >>>> a resource is used by a node, once the >>>> node knows where the master is, >>>> it does not need to look it up again, >>>> unless it releases all >>>> locks on the resource. >>>> >>> Oh, I see. Just to further clarify, does it means >> if >>> the same lock resource is required again by an >>> application on the node A, the node A will go >> straight >>> to the known node (ie the node B) which holds the >>> master previously, but needs to lookup again if >> the >>> node B has already released the lock resource? >> Not quite. A resource is mastered on a node for as >> long as there are >> locks for it. If node A gets the lock (which is >> mastered on node B) then >> it knows always to go do node B until all locks on >> node A are released. >> When that happens the local copy of the resource on >> node A is released >> including the reference to node B. If all the locks >> on node B are >> released (but A still has some) then the resource >> will stay mastered on >> node B and nodes that still have locks on that >> resource will know where >> to find it without a directory lookup. >> > > Aha, I think I missed another important concept -- a > local copy of lock resources. I did not realise the > existence of the local copy of lock resources. Which > file should I check to figure out how many local > copies a node has and what the local copies are? All the locks are displayed in /proc/cluster/dlm_locks, that shows you which are local copies and which are masters. -- Chrissie From jas199931 at yahoo.com Fri May 2 13:48:38 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 2 May 2008 06:48:38 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <481B1840.8020907@redhat.com> Message-ID: <555963.87870.qm@web32202.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > > > >>> DLM lockspace 'data' > >>> 5 2f06768 1 > >>> 5 114d15 1 > >>> 5 120b13 1 > >>> 5 5bd1f04 1 > >>> 3 6a02f8 2 > >>> 5 cb7604 1 > >>> 5 ca187b 1 > >>> > >> The first two numbers are the lock name. Don't > ask > >> me what they mean, > >> that's a GFS question! (actually, I think inode > >> numbers might be > >> involved) The last number is the nodeID on which > the > >> lock is mastered. > > > > > > Great, thanks again! > > > > > >>>> That lookup only happens the first time > >>>> a resource is used by a node, once the > >>>> node knows where the master is, > >>>> it does not need to look it up again, > >>>> unless it releases all > >>>> locks on the resource. > >>>> > >>> Oh, I see. 
Just to further clarify, does it > means > >> if > >>> the same lock resource is required again by an > >>> application on the node A, the node A will go > >> straight > >>> to the known node (ie the node B) which holds > the > >>> master previously, but needs to lookup again if > >> the > >>> node B has already released the lock resource? > >> Not quite. A resource is mastered on a node for > as > >> long as there are > >> locks for it. If node A gets the lock (which is > >> mastered on node B) then > >> it knows always to go do node B until all locks > on > >> node A are released. > >> When that happens the local copy of the resource > on > >> node A is released > >> including the reference to node B. If all the > locks > >> on node B are > >> released (but A still has some) then the resource > >> will stay mastered on > >> node B and nodes that still have locks on that > >> resource will know where > >> to find it without a directory lookup. > >> > > > > Aha, I think I missed another important concept -- > a > > local copy of lock resources. I did not realise > the > > existence of the local copy of lock resources. > Which > > file should I check to figure out how many local > > copies a node has and what the local copies are? > > All the locks are displayed in > /proc/cluster/dlm_locks, that shows you > which are local copies and which are masters. Fantastic ! Thank you very much once more. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From oliveiros.cristina at gmail.com Sun May 4 22:33:34 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Sun, 4 May 2008 23:33:34 +0100 Subject: [Linux-cluster] GFS on fedora Message-ID: Howdy List, I would like to install gfs on a two node cluster running both fedora 8. Can anyone please kindly supply me with some links for the procedure? Which packages are needed, where to get them, that sort of things. I've already googled up and down a little but I couldn't find no rigourous information on this, or maybe I am just blind :-) Thanks a lot in advance Best, Oliveiros -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Sun May 4 23:22:44 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 05 May 2008 00:22:44 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: Message-ID: <481E4544.1020301@bobich.net> Oliveiros Cristina wrote: > Howdy List, > I would like to install gfs on a two node cluster running both fedora 8. > > Can anyone please kindly supply me with some links for the procedure? First part of the procedure is to not use FC if you plan for this to be useful. FC7+ comes only with GFS2. There are no GFS1 packages included, and GFS2 isn't stable yet. > Which packages are needed, where to get them, that sort of things. cman openais gfs-utils kmod-gfs rgmanager Can't remember if there may be more. 
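As a rough sketch of pulling those in on a RHEL 5 / CentOS 5 style system (package names can differ between releases, so treat this as an assumption rather than a recipe):

# yum install cman openais gfs-utils kmod-gfs rgmanager

plus lvm2-cluster if you also want clustered LVM (clvmd) on the shared storage. After that the usual order is roughly: write /etc/cluster/cluster.conf, start cman, start clvmd if you use it, then mount the GFS filesystem.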
> I've already googled up and down a little but I couldn't find no > rigourous information on this, or maybe I am just blind :-) This is probably a not a bad place to start: https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS#section-GFS-Documentation Gordan From jas199931 at yahoo.com Sun May 4 23:27:36 2008 From: jas199931 at yahoo.com (Ja S) Date: Sun, 4 May 2008 16:27:36 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <481B1840.8020907@redhat.com> Message-ID: <853958.85045.qm@web32207.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > > > >>> DLM lockspace 'data' > >>> 5 2f06768 1 > >>> 5 114d15 1 > >>> 5 120b13 1 > >>> 5 5bd1f04 1 > >>> 3 6a02f8 2 > >>> 5 cb7604 1 > >>> 5 ca187b 1 > >>> > >> The first two numbers are the lock name. Don't > ask > >> me what they mean, > >> that's a GFS question! (actually, I think inode > >> numbers might be > >> involved) The last number is the nodeID on which > the > >> lock is mastered. > > > > > > Great, thanks again! > > > > > >>>> That lookup only happens the first time > >>>> a resource is used by a node, once the > >>>> node knows where the master is, > >>>> it does not need to look it up again, > >>>> unless it releases all > >>>> locks on the resource. > >>>> > >>> Oh, I see. Just to further clarify, does it > means > >> if > >>> the same lock resource is required again by an > >>> application on the node A, the node A will go > >> straight > >>> to the known node (ie the node B) which holds > the > >>> master previously, but needs to lookup again if > >> the > >>> node B has already released the lock resource? > >> Not quite. A resource is mastered on a node for > as > >> long as there are > >> locks for it. If node A gets the lock (which is > >> mastered on node B) then > >> it knows always to go do node B until all locks > on > >> node A are released. > >> When that happens the local copy of the resource > on > >> node A is released > >> including the reference to node B. If all the > locks > >> on node B are > >> released (but A still has some) then the resource > >> will stay mastered on > >> node B and nodes that still have locks on that > >> resource will know where > >> to find it without a directory lookup. > >> > > > > Aha, I think I missed another important concept -- > a > > local copy of lock resources. I did not realise > the > > existence of the local copy of lock resources. > Which > > file should I check to figure out how many local > > copies a node has and what the local copies are? > > All the locks are displayed in > /proc/cluster/dlm_locks, that shows you > which are local copies and which are masters. A couple of further questions about the master copy of lock resources. The first one: ============= Again, assume: 1) Node A is extremely too busy and handle all requests 2) other nodes are just idle and have never handled any requests According to the documents, Node A will hold all master copies initially. The thing I am not aware of and unclear is whether the lock manager will evenly distribute the master copies on Node A to other nodes when it thinks the number of master copies on Node A is too many? The second one: ============== Assume a master copy of lock resource is on Node A. Now Node B holds a local copy of the lock resource. When the lock queues changed on the local copy on Node B, will the master copy on Node A be updated simultaneously? 
If so, when more than one nodes have the local copy of the same lock resource, how the lock manager to handle the update of the master copy? Using another lock mechanism to prevent the corruption of the master copy? Thanks again in advance. Jas > -- > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From oliveiros.cristina at gmail.com Sun May 4 23:36:04 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Mon, 5 May 2008 00:36:04 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <481E4544.1020301@bobich.net> References: <481E4544.1020301@bobich.net> Message-ID: Hello, Gordan, Thank you for your e-mail. *"First part of the procedure is to not use FC if you plan for this to be useful" *By this you mean that it is not a good idea to install it on FC? Is GFS somewhat RH oriented? I chose FC because I am not familiar with rh and I've read somewhere that gfs would work on fc Thank you for the package names and link Best, Oliveiros 2008/5/5 Gordan Bobic : > Oliveiros Cristina wrote: > > > Howdy List, > > I would like to install gfs on a two node cluster running both fedora 8. > > > > Can anyone please kindly supply me with some links for the procedure? > > > > First part of the procedure is to not use FC if you plan for this to be > useful. FC7+ comes only with GFS2. There are no GFS1 packages included, and > GFS2 isn't stable yet. > > Which packages are needed, where to get them, that sort of things. > > > > cman > openais > gfs-utils > kmod-gfs > rgmanager > > Can't remember if there may be more. > > I've already googled up and down a little but I couldn't find no > > rigourous information on this, or maybe I am just blind :-) > > > > This is probably a not a bad place to start: > > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS#section-GFS-Documentation > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jas199931 at yahoo.com Mon May 5 00:28:29 2008 From: jas199931 at yahoo.com (Ja S) Date: Sun, 4 May 2008 17:28:29 -0700 (PDT) Subject: [Linux-cluster] An odd problem may be related to GFS+DLM Message-ID: <897999.3920.qm@web32203.mail.mud.yahoo.com> Hi, All: We realised a problem and suspected that the problem might be related to GFS and DLM. Therefore, I am sending the email to this group. If you think my problem is irrelevant, please forgive me. ========================= We have a SAN environment, where 5 nodes running RHEL v4u4 and Redhat Cluster Suite connected to EMC AX150SCi iSCSI RAID storage (GFS+DLM, RAID10) We have a subdirectory on the storage and we are sure that no applications on these five nodes know the existence of the subdirectory. In other words, the subdirectory should be free of lock but its parent directories may have locks. The subdirectory holds more than 31700 small files and the total size of these files is about 4.3G. Within these 31700 files, about 1/3 of them are symbolic links pointing to other files at the same subdirectory. 
The subdirectory stat is: File: `abc' Size: 8192 Blocks: 6024 IO Block: 4096 directory Device: fc00h/64512d Inode: 1065226 Links: 2 Access: (0770/drwxrwx---) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2008-05-04 22:53:39.000000000 +0000 Modify: 2008-04-15 03:02:24.000000000 +0000 Change: 2008-04-15 07:11:52.000000000 +0000 Now, when I tried to ls the subdirectory from an idle node, it took ages to output the information. I then timed the ls command, and the results were shocking. # time ls -la > /dev/null real 3m5.249s user 0m0.628s sys 0m5.137s As I said that the node I used to access the subdirectory was completely idle, what could cause the long delay? We asked EMC to check the hardware (including the controller and hard drives) and was reported that there was no problem at all. Therefore, I would like to seek your kind answers to the following questions: Is the problem related to GFS and DLM? I heard GFS is not suitable for many small files. Is that true? Is the delay caused by locks applied to its parent directories? Which direction should I go to figure out what is happening and what is the underlying reason? Thanks for your time and look forward to your reply. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From gordan at bobich.net Mon May 5 01:07:26 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 05 May 2008 02:07:26 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> Message-ID: <481E5DCE.5040206@bobich.net> Oliveiros Cristina wrote: > /"First part of the procedure is to not use FC if you plan for this to > be useful" > > /By this you mean that it is not a good idea to install it on FC? Is GFS > somewhat RH oriented? FC is effectively RedHat alpha. There is no structural or organizational difference between them. The differences are in stability and the amount of testing that goes into things. GFS (and RedHat Cluster Services which GFS is a part of) will run on any distribution, of course - it's just that you may have to build the correct stable packages from source, which seems pointless when you can have something that just works already. It's down to personal preference. > I chose FC because I am not familiar with rh and I've read somewhere > that gfs would work on fc It'll work, but running FC in a production environment is asking for trouble. You might as well run it on Gentoo and custom compile everything from bleeding edge sources, but it isn't going to help you achieve a stable system that has been tested by someone else other than just you. Gordan From oliveiros.cristina at gmail.com Mon May 5 09:46:32 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Mon, 5 May 2008 10:46:32 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <481E5DCE.5040206@bobich.net> References: <481E4544.1020301@bobich.net> <481E5DCE.5040206@bobich.net> Message-ID: Hello again , Gordan. I understand what you explained. But, actually, I don't want to run it on a production environment. It is mainly for testing purposes, it's part of a work for university. And , could you please tell me where can I download the source tree ? I will need to read the code. Thanks you for your help and thoughful considerations. 
All The Best, Oliveiros 2008/5/5 Gordan Bobic : > Oliveiros Cristina wrote: > > > /"First part of the procedure is to not use FC if you plan for this to > > be useful" > > > > /By this you mean that it is not a good idea to install it on FC? Is GFS > > somewhat RH oriented? > > > > FC is effectively RedHat alpha. There is no structural or organizational > difference between them. The differences are in stability and the amount of > testing that goes into things. > > GFS (and RedHat Cluster Services which GFS is a part of) will run on any > distribution, of course - it's just that you may have to build the correct > stable packages from source, which seems pointless when you can have > something that just works already. It's down to personal preference. > > I chose FC because I am not familiar with rh and I've read somewhere that > > gfs would work on fc > > > > It'll work, but running FC in a production environment is asking for > trouble. You might as well run it on Gentoo and custom compile everything > from bleeding edge sources, but it isn't going to help you achieve a stable > system that has been tested by someone else other than just you. > > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sghosh at redhat.com Mon May 5 13:56:25 2008 From: sghosh at redhat.com (Subhendu Ghosh) Date: Mon, 05 May 2008 09:56:25 -0400 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> <481E5DCE.5040206@bobich.net> Message-ID: <481F1209.1080109@redhat.com> Source tree is available at: http://sources.redhat.com/cluster/wiki/ There are at least 4 major branches that are being maintained - roughly equivalent to RHEL 3, 4, 5 and devel -regards Subhendu Oliveiros Cristina wrote: > Hello again , Gordan. > > I understand what you explained. > But, actually, I don't want to run it on a production environment. > It is mainly for testing purposes, it's part of a work for university. > > And , could you please tell me where can I download the source tree ? > I will need to read the code. > > Thanks you for your help and thoughful considerations. > > All The Best, > Oliveiros > > > 2008/5/5 Gordan Bobic >: > > Oliveiros Cristina wrote: > > /"First part of the procedure is to not use FC if you plan for > this to be useful" > > > /By this you mean that it is not a good idea to install it on > FC? Is GFS somewhat RH oriented? > > > FC is effectively RedHat alpha. There is no structural or > organizational difference between them. The differences are in > stability and the amount of testing that goes into things. > > GFS (and RedHat Cluster Services which GFS is a part of) will run on > any distribution, of course - it's just that you may have to build > the correct stable packages from source, which seems pointless when > you can have something that just works already. It's down to > personal preference. > > > I chose FC because I am not familiar with rh and I've read > somewhere that gfs would work on fc > > > It'll work, but running FC in a production environment is asking for > trouble. You might as well run it on Gentoo and custom compile > everything from bleeding edge sources, but it isn't going to help > you achieve a stable system that has been tested by someone else > other than just you. 
> > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Red Hat Summit Boston | June 18-20, 2008 Learn more: http://www.redhat.com/summit -------------- next part -------------- A non-text attachment was scrubbed... Name: sghosh.vcf Type: text/x-vcard Size: 266 bytes Desc: not available URL: From underscore_dot at yahoo.com Mon May 5 18:54:39 2008 From: underscore_dot at yahoo.com (nch) Date: Mon, 5 May 2008 11:54:39 -0700 (PDT) Subject: [Linux-cluster] GFS on fedora Message-ID: <422308.31579.qm@web32401.mail.mud.yahoo.com> see the docs section ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.02.tar.gz cheers ----- Original Message ---- From: Oliveiros Cristina To: Linux-cluster at redhat.com Sent: Monday, May 5, 2008 12:33:34 AM Subject: [Linux-cluster] GFS on fedora Howdy List, I would like to install gfs on a two node cluster running both fedora 8. Can anyone please kindly supply me with some links for the procedure? Which packages are needed, where to get them, that sort of things. I've already googled up and down a little but I couldn't find no rigourous information on this, or maybe I am just blind :-) Thanks a lot in advance Best, Oliveiros ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From raycharles_man at yahoo.com Mon May 5 23:29:44 2008 From: raycharles_man at yahoo.com (Ray Charles) Date: Mon, 5 May 2008 16:29:44 -0700 (PDT) Subject: [Linux-cluster] GFS on fedora In-Reply-To: <422308.31579.qm@web32401.mail.mud.yahoo.com> Message-ID: <526617.75888.qm@web32105.mail.mud.yahoo.com> Hi, I'd like to add a word on choosing F8 for trying gfs. A while back, could still be the case, gfs2-tools were not as complete as they are on Centos-5. Specifically it was the util to grow the file system that was not working. So you may need to consider that if its still not working. -Ray --- nch wrote: > see the docs section > ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.02.tar.gz > > cheers > > > ----- Original Message ---- > From: Oliveiros Cristina > > To: Linux-cluster at redhat.com > Sent: Monday, May 5, 2008 12:33:34 AM > Subject: [Linux-cluster] GFS on fedora > > Howdy List, > I would like to install gfs on a two node cluster > running both fedora 8. > > Can anyone please kindly supply me with some links > for the procedure? > > Which packages are needed, where to get them, that > sort of things. > > I've already googled up and down a little but I > couldn't find no > rigourous information on this, or maybe I am just > blind :-) > > Thanks a lot in advance > > Best, > Oliveiros > > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ> -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From dhongqian at 163.com Tue May 6 05:54:40 2008 From: dhongqian at 163.com (dhongqian) Date: Tue, 6 May 2008 13:54:40 +0800 Subject: [Linux-cluster] Problem: can't write file via gfs2 Message-ID: <200805061354402181105@163.com> I use 4 nodes cluster that all mount the same gfs2 storage. On one node, I use dd write a 512000 byte file while on other node , I use the command 'ls -l' to see , the file only 3584 byte. [root at nd11 mnt]# dd if=/dev/zero of=x count=1000 1000+0 records in 1000+0 records out [root at nd11 mnt]# ll total 501492 -rw-r--r-- 1 root root 512000 May 5 23:55 x -rw-r--r-- 1 root root 3584 May 6 2008 xxx [root at nd13 mnt]# ll total 501492 -rw-r--r-- 1 root root 3584 May 5 2008 x -rw-r--r-- 1 root root 3584 May 6 2008 xxx Thank you very much in advance and look forward to hearing from you. hongqian 2008-05-06 hongqian -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Tue May 6 07:31:05 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 06 May 2008 08:31:05 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <526617.75888.qm@web32105.mail.mud.yahoo.com> References: <526617.75888.qm@web32105.mail.mud.yahoo.com> Message-ID: <1210059065.3413.1.camel@localhost.localdomain> Hi, That bug has been fixed, along with others and the gfs2-utils in F-8 is now the most uptodate code. Unfortunately cman still lags behind, but we are working on that, Steve. On Mon, 2008-05-05 at 16:29 -0700, Ray Charles wrote: > > > Hi, > > I'd like to add a word on choosing F8 for trying gfs. > A while back, could still be the case, gfs2-tools were > not as complete as they are on Centos-5. Specifically > it was the util to grow the file system that was not > working. So you may need to consider that if its still > not working. > > -Ray > > --- nch wrote: > > > see the docs section > > > ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.02.tar.gz > > > > cheers > > > > > > ----- Original Message ---- > > From: Oliveiros Cristina > > > > To: Linux-cluster at redhat.com > > Sent: Monday, May 5, 2008 12:33:34 AM > > Subject: [Linux-cluster] GFS on fedora > > > > Howdy List, > > I would like to install gfs on a two node cluster > > running both fedora 8. > > > > Can anyone please kindly supply me with some links > > for the procedure? > > > > Which packages are needed, where to get them, that > > sort of things. > > > > I've already googled up and down a little but I > > couldn't find no > > rigourous information on this, or maybe I am just > > blind :-) > > > > Thanks a lot in advance > > > > Best, > > Oliveiros > > > > > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. 
> http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ> > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lists at tangent.co.za Tue May 6 08:02:55 2008 From: lists at tangent.co.za (Chris Picton) Date: Tue, 6 May 2008 08:02:55 +0000 (UTC) Subject: [Linux-cluster] GFS vs GFS2 Message-ID: Hi All I am investigating a new cluster installation. Documentation from redhat indicates that GFS2 is not yet production ready. Tests I have run show it is *much* faster that gfs for my workload. Is GFS2 not production-ready due to lack of testing, or due to known bugs? Any advice would be appreciated Chris From underscore_dot at yahoo.com Tue May 6 10:29:00 2008 From: underscore_dot at yahoo.com (nch) Date: Tue, 6 May 2008 03:29:00 -0700 (PDT) Subject: [Linux-cluster] mounting as non root Message-ID: <580317.8530.qm@web32404.mail.mud.yahoo.com> Hi, there. I can mount a gfs2 filesystem (gnbd) as root, but I'm having difficulties to mount it or, at least, giving write access to other users/groups. Any ideas on how to do this? Regards. ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From suuuper at messinalug.org Tue May 6 10:55:25 2008 From: suuuper at messinalug.org (Giovanni Mancuso) Date: Tue, 06 May 2008 12:55:25 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd Message-ID: <4820391D.1070601@messinalug.org> Hi to all, I have a problem with gfs2. If i try to do: watch -n1 'ls -ls /store/' i have ls: /store/new/: Input/output error and in my dmesg i have: GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != LM_ST_UNLOCKED" failed GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = fs/gfs2/glock.c, line = 963 [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] [] gfs2_glock_drop_th+0x83/0xfb [gfs2] [] xmote_bh+0x10a/0x271 [gfs2] [] run_queue+0xd4/0x26e [gfs2] [] glock_work_func+0x24/0x31 [gfs2] [] run_workqueue+0x78/0xb5 [] glock_work_func+0x0/0x31 [gfs2] [] worker_thread+0xd9/0x10d [] default_wake_function+0x0/0xc [] worker_thread+0x0/0x10d [] kthread+0xc0/0xeb [] kthread+0x0/0xeb [] kernel_thread_helper+0x7/0x10 ======================= how can i solve it? P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. From maciej.bogucki at artegence.com Tue May 6 10:57:41 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Tue, 06 May 2008 12:57:41 +0200 Subject: [Linux-cluster] mounting as non root In-Reply-To: <580317.8530.qm@web32404.mail.mud.yahoo.com> References: <580317.8530.qm@web32404.mail.mud.yahoo.com> Message-ID: <482039A5.8070705@artegence.com> nch napisa?(a): > > Hi, there. > I can mount a gfs2 filesystem (gnbd) as root, but I'm having > difficulties to mount it or, at least, giving write access to other > users/groups. > Any ideas on how to do this? It is from "man mount" (iii) Normally, only the superuser can mount file systems. 
However, when fstab contains the user option on a line, anybody can mount the corresponding system. Thus, given a line /dev/cdrom /cd iso9660 ro,user,noauto,unhide any user can mount the iso9660 file system found on his CDROM using the command Best Regards Maciej Bogucki From swhiteho at redhat.com Tue May 6 11:00:44 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 06 May 2008 12:00:44 +0100 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <4820391D.1070601@messinalug.org> References: <4820391D.1070601@messinalug.org> Message-ID: <1210071644.3413.18.camel@localhost.localdomain> Hi, I've not seen that before. What version of GFS2 are you using and are you using lock_nolock or lock_dlm? Steve. On Tue, 2008-05-06 at 12:55 +0200, Giovanni Mancuso wrote: > Hi to all, > I have a problem with gfs2. If i try to do: watch -n1 'ls -ls /store/' > i have ls: /store/new/: Input/output error > and in my dmesg i have: > > GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != > LM_ST_UNLOCKED" failed > GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = > fs/gfs2/glock.c, line = 963 > [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] > [] gfs2_glock_drop_th+0x83/0xfb [gfs2] > [] xmote_bh+0x10a/0x271 [gfs2] > [] run_queue+0xd4/0x26e [gfs2] > [] glock_work_func+0x24/0x31 [gfs2] > [] run_workqueue+0x78/0xb5 > [] glock_work_func+0x0/0x31 [gfs2] > [] worker_thread+0xd9/0x10d > [] default_wake_function+0x0/0xc > [] worker_thread+0x0/0x10d > [] kthread+0xc0/0xeb > [] kthread+0x0/0xeb > [] kernel_thread_helper+0x7/0x10 > ======================= > > how can i solve it? > > P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From suuuper at messinalug.org Tue May 6 11:04:49 2008 From: suuuper at messinalug.org (Giovanni Mancuso) Date: Tue, 06 May 2008 13:04:49 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <1210071644.3413.18.camel@localhost.localdomain> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> Message-ID: <48203B51.7030705@messinalug.org> I use lock_dlm and my version of gfs2 is: GFS2 (built Oct 10 2007 16:34:59) installed Thanks Steven Whitehouse ha scritto: > Hi, > > I've not seen that before. What version of GFS2 are you using and are > you using lock_nolock or lock_dlm? > > Steve. > > On Tue, 2008-05-06 at 12:55 +0200, Giovanni Mancuso wrote: > >> Hi to all, >> I have a problem with gfs2. If i try to do: watch -n1 'ls -ls /store/' >> i have ls: /store/new/: Input/output error >> and in my dmesg i have: >> >> GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != >> LM_ST_UNLOCKED" failed >> GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = >> fs/gfs2/glock.c, line = 963 >> [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] >> [] gfs2_glock_drop_th+0x83/0xfb [gfs2] >> [] xmote_bh+0x10a/0x271 [gfs2] >> [] run_queue+0xd4/0x26e [gfs2] >> [] glock_work_func+0x24/0x31 [gfs2] >> [] run_workqueue+0x78/0xb5 >> [] glock_work_func+0x0/0x31 [gfs2] >> [] worker_thread+0xd9/0x10d >> [] default_wake_function+0x0/0xc >> [] worker_thread+0x0/0x10d >> [] kthread+0xc0/0xeb >> [] kthread+0x0/0xeb >> [] kernel_thread_helper+0x7/0x10 >> ======================= >> >> how can i solve it? >> >> P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Tue May 6 11:06:38 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 06 May 2008 12:06:38 +0100 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <48203B51.7030705@messinalug.org> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> Message-ID: <1210071998.3413.21.camel@localhost.localdomain> Hi, On Tue, 2008-05-06 at 13:04 +0200, Giovanni Mancuso wrote: > I use > lock_dlm > and my version of gfs2 is: > GFS2 (built Oct 10 2007 16:34:59) installed > Built from what exactly? Linus' kernel tree? the -nmw git tree? Some distribution or other? I suspect that you probably need to upgrade to a newer kernel version though given that date. Ideally as recent as possible, Steve. > Thanks > > > Steven Whitehouse ha scritto: > > Hi, > > > > I've not seen that before. What version of GFS2 are you using and are > > you using lock_nolock or lock_dlm? > > > > Steve. > > > > On Tue, 2008-05-06 at 12:55 +0200, Giovanni Mancuso wrote: > > > > > Hi to all, > > > I have a problem with gfs2. If i try to do: watch -n1 'ls -ls /store/' > > > i have ls: /store/new/: Input/output error > > > and in my dmesg i have: > > > > > > GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != > > > LM_ST_UNLOCKED" failed > > > GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = > > > fs/gfs2/glock.c, line = 963 > > > [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] > > > [] gfs2_glock_drop_th+0x83/0xfb [gfs2] > > > [] xmote_bh+0x10a/0x271 [gfs2] > > > [] run_queue+0xd4/0x26e [gfs2] > > > [] glock_work_func+0x24/0x31 [gfs2] > > > [] run_workqueue+0x78/0xb5 > > > [] glock_work_func+0x0/0x31 [gfs2] > > > [] worker_thread+0xd9/0x10d > > > [] default_wake_function+0x0/0xc > > > [] worker_thread+0x0/0x10d > > > [] kthread+0xc0/0xeb > > > [] kthread+0x0/0xeb > > > [] kernel_thread_helper+0x7/0x10 > > > ======================= > > > > > > how can i solve it? > > > > > > P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From underscore_dot at yahoo.com Tue May 6 12:32:39 2008 From: underscore_dot at yahoo.com (nch) Date: Tue, 6 May 2008 05:32:39 -0700 (PDT) Subject: [Linux-cluster] mounting as non root Message-ID: <149194.48263.qm@web32401.mail.mud.yahoo.com> I tried that, unsuccessfully. The relevant line in my fstab is: /dev/gnbd/disk /mnt/shared gfs2 user,noauto 0 0 And this is the error msg when trying "mount /mnt/shared" as a non root user /sbin/mount.gfs2: error mounting /dev/gnbd/disk on /mnt/shared: Operation not permitted Any suggestions? Lots of thanks. 
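One note that may explain the error above (a guess, but consistent with "Operation not permitted"): mount.gfs2 has to talk to the cluster daemons, so the mount itself still ends up needing root even with the user option in fstab. A common workaround is to let root do the mount and delegate just those commands through sudo -- for example, via visudo, with a hypothetical user "nch":

nch ALL = (root) NOPASSWD: /bin/mount /mnt/shared, /bin/umount /mnt/shared

Once the filesystem is mounted, write access for other users/groups is ordinary POSIX ownership on the GFS2 tree, e.g. as root:

# chgrp -R users /mnt/shared
# chmod -R g+rwX /mnt/shared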
----- Original Message ---- From: Maciej Bogucki To: linux clustering Sent: Tuesday, May 6, 2008 12:57:41 PM Subject: Re: [Linux-cluster] mounting as non root nch napisa?(a): > > Hi, there. > I can mount a gfs2 filesystem (gnbd) as root, but I'm having > difficulties to mount it or, at least, giving write access to other > users/groups. > Any ideas on how to do this? It is from "man mount" (iii) Normally, only the superuser can mount file systems. However, when fstab contains the user option on a line, anybody can mount the corresponding system. Thus, given a line /dev/cdrom /cd iso9660 ro,user,noauto,unhide any user can mount the iso9660 file system found on his CDROM using the command Best Regards Maciej Bogucki -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From T.Kumar at alcoa.com Tue May 6 13:40:28 2008 From: T.Kumar at alcoa.com (Kumar, T Santhosh (TCS)) Date: Tue, 6 May 2008 09:40:28 -0400 Subject: [Linux-cluster] lvextend error on Redhat-cluster suit 5.1 Message-ID: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> Linux hostname 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:46 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Linux Server release 5.1 (Tikanga) # lvextend -L +4000M /dev/vgec_rde0_pdb/lvol2 Extending logical volume lvol2 to 63.91 GB Error locking on node xxxxxx: Volume group for uuid not found: CyPYYtsmPYFg2M11glsWM2OSzmcVAbkm05WEDGVxERxVhTjHDIl90yjpjq7urtPl Error locking on node xxxxxx: Volume group for uuid not found: CyPYYtsmPYFg2M11glsWM2OSzmcVAbkm05WEDGVxERxVhTjHDIl90yjpjq7urtPl Failed to suspend lvol2 # vgdisplay -v vgec_rde0_pdb Using volume group(s) on command line Finding volume group "vgec_rde0_pdb" --- Volume group --- VG Name vgec_rde0_pdb System ID Format lvm2 Metadata Areas 4 Metadata Sequence No 9 VG Access read/write VG Status resizable Clustered yes Shared no MAX LV 255 Cur LV 7 Open LV 7 Max PV 150 Cur PV 4 Act PV 4 VG Size 269.62 GB PE Size 32.00 MB Total PE 8628 Alloc PE / Size 6752 / 211.00 GB Free PE / Size 1876 / 58.62 GB VG UUID CyPYYt-smPY-Fg2M-11gl-sWM2-OSzm-cVAbkm # lvdisplay -v /dev/vgec_rde0_pdb/lvol2 Using logical volume(s) on command line --- Logical volume --- LV Name /dev/vgec_rde0_pdb/lvol2 VG Name vgec_rde0_pdb LV UUID 05WEDG-VxER-xVhT-jHDI-l90y-jpjq-7urtPl LV Write Access read/write LV Status available # open 1 LV Size 60.00 GB Current LE 1920 Segments 1 Allocation inherit Read ahead sectors 0 Block device 253:15 Let me know if you have any suggetion From bkyoung at gmail.com Tue May 6 14:02:19 2008 From: bkyoung at gmail.com (Brandon Young) Date: Tue, 6 May 2008 09:02:19 -0500 Subject: [Linux-cluster] lvextend error on Redhat-cluster suit 5.1 In-Reply-To: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> References: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> Message-ID: <824ffea00805060702ycc37813ib4412b3169eac327@mail.gmail.com> 'partprobe' on each cluster node, and try restarting clvmd on each node. Note that you should unmount the filesystem before restarting clvmd ... 
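A minimal sketch of that recovery sequence on one node (the mount point name below is hypothetical, not from this thread; repeat the unmount/partprobe/clvmd steps on every node before retrying the resize):

    umount /mnt/rde0                                # whatever, if anything, mounts lvol2 on this node
    partprobe                                       # re-read partition tables so all nodes see the same PVs
    service clvmd restart                           # pick up the refreshed device list
    lvextend -L +4000M /dev/vgec_rde0_pdb/lvol2     # retry once every node has been refreshed

If the locking error persists, reconciling 'cat /proc/partitions' with 'pvs' on each node (see the follow-up from Jonathan Brassow further down this thread) helps find stray PV metadata on a device the node can still see.
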
On Tue, May 6, 2008 at 8:40 AM, Kumar, T Santhosh (TCS) wrote: > > > Linux hostname 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:46 EST 2008 > x86_64 x86_64 x86_64 GNU/Linux > > Red Hat Enterprise Linux Server release 5.1 (Tikanga) > > # lvextend -L +4000M /dev/vgec_rde0_pdb/lvol2 > Extending logical volume lvol2 to 63.91 GB > Error locking on node xxxxxx: Volume group for uuid not found: > > CyPYYtsmPYFg2M11glsWM2OSzmcVAbkm05WEDGVxERxVhTjHDIl90yjpjq7urtPl > Error locking on node xxxxxx: Volume group for uuid not found: > > CyPYYtsmPYFg2M11glsWM2OSzmcVAbkm05WEDGVxERxVhTjHDIl90yjpjq7urtPl > Failed to suspend lvol2 > > > # vgdisplay -v vgec_rde0_pdb > Using volume group(s) on command line > Finding volume group "vgec_rde0_pdb" > --- Volume group --- > VG Name vgec_rde0_pdb > System ID > Format lvm2 > Metadata Areas 4 > Metadata Sequence No 9 > VG Access read/write > VG Status resizable > Clustered yes > Shared no > MAX LV 255 > Cur LV 7 > Open LV 7 > Max PV 150 > Cur PV 4 > Act PV 4 > VG Size 269.62 GB > PE Size 32.00 MB > Total PE 8628 > Alloc PE / Size 6752 / 211.00 GB > Free PE / Size 1876 / 58.62 GB > VG UUID CyPYYt-smPY-Fg2M-11gl-sWM2-OSzm-cVAbkm > > > # lvdisplay -v /dev/vgec_rde0_pdb/lvol2 > Using logical volume(s) on command line > --- Logical volume --- > LV Name /dev/vgec_rde0_pdb/lvol2 > VG Name vgec_rde0_pdb > LV UUID 05WEDG-VxER-xVhT-jHDI-l90y-jpjq-7urtPl > LV Write Access read/write > LV Status available > # open 1 > LV Size 60.00 GB > Current LE 1920 > Segments 1 > Allocation inherit > Read ahead sectors 0 > Block device 253:15 > > > Let me know if you have any suggetion > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From suuuper at messinalug.org Tue May 6 14:32:44 2008 From: suuuper at messinalug.org (Giovanni Mancuso) Date: Tue, 06 May 2008 16:32:44 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <1210071998.3413.21.camel@localhost.localdomain> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> <1210071998.3413.21.camel@localhost.localdomain> Message-ID: <48206C0C.3000805@messinalug.org> I use kernel: uname -a Linux sophosha1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 i686 athlon i386 GNU/Linux and release: cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.1 (Tikanga) Thanks Steven Whitehouse ha scritto: > Hi, > > On Tue, 2008-05-06 at 13:04 +0200, Giovanni Mancuso wrote: > >> I use >> lock_dlm >> and my version of gfs2 is: >> GFS2 (built Oct 10 2007 16:34:59) installed >> >> > Built from what exactly? Linus' kernel tree? the -nmw git tree? Some > distribution or other? > > I suspect that you probably need to upgrade to a newer kernel version > though given that date. Ideally as recent as possible, > > Steve. > > > >> Thanks >> >> >> Steven Whitehouse ha scritto: >> >>> Hi, >>> >>> I've not seen that before. What version of GFS2 are you using and are >>> you using lock_nolock or lock_dlm? >>> >>> Steve. >>> >>> On Tue, 2008-05-06 at 12:55 +0200, Giovanni Mancuso wrote: >>> >>> >>>> Hi to all, >>>> I have a problem with gfs2. 
If i try to do: watch -n1 'ls -ls /store/' >>>> i have ls: /store/new/: Input/output error >>>> and in my dmesg i have: >>>> >>>> GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != >>>> LM_ST_UNLOCKED" failed >>>> GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = >>>> fs/gfs2/glock.c, line = 963 >>>> [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] >>>> [] gfs2_glock_drop_th+0x83/0xfb [gfs2] >>>> [] xmote_bh+0x10a/0x271 [gfs2] >>>> [] run_queue+0xd4/0x26e [gfs2] >>>> [] glock_work_func+0x24/0x31 [gfs2] >>>> [] run_workqueue+0x78/0xb5 >>>> [] glock_work_func+0x0/0x31 [gfs2] >>>> [] worker_thread+0xd9/0x10d >>>> [] default_wake_function+0x0/0xc >>>> [] worker_thread+0x0/0x10d >>>> [] kthread+0xc0/0xeb >>>> [] kthread+0x0/0xeb >>>> [] kernel_thread_helper+0x7/0x10 >>>> ======================= >>>> >>>> how can i solve it? >>>> >>>> P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbrassow at redhat.com Tue May 6 15:31:21 2008 From: jbrassow at redhat.com (Jonathan Brassow) Date: Tue, 6 May 2008 10:31:21 -0500 Subject: [Linux-cluster] lvextend error on Redhat-cluster suit 5.1 In-Reply-To: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> References: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> Message-ID: <05861C58-13E1-4D9D-BFCF-209A566A659E@redhat.com> On May 6, 2008, at 8:40 AM, Kumar, T Santhosh (TCS) wrote: > > > Linux hostname 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:46 EST 2008 > x86_64 x86_64 x86_64 GNU/Linux > > Red Hat Enterprise Linux Server release 5.1 (Tikanga) > > # lvextend -L +4000M /dev/vgec_rde0_pdb/lvol2 > Extending logical volume lvol2 to 63.91 GB > Error locking on node xxxxxx: Volume group for uuid not found: This type of message usually implies that the machine can see a storage device that is no longer part of a volume group, but has not been wiped (pvremove). This can happen for any number of reasons. The admin may have repartitioned something and forgot to wipe the PVs... a disk failed and came back... new disks were added that had LVM metadata on them... etc. Certainly try the suggestion about 'partprobe' and restarting clvmd... If that works, great. Otherwise, you will have to find the partition with the stray PV metadata on it - perhaps best done by reconciling 'cat /proc/partitions' and 'pvs'. brassow From brian at chpc.utah.edu Tue May 6 18:57:11 2008 From: brian at chpc.utah.edu (Brian D. Haymore) Date: Tue, 06 May 2008 12:57:11 -0600 Subject: [Linux-cluster] Sanity Check Message-ID: <4820AA07.3090303@chpc.utah.edu> I tried to send this yesterday but didn't see it on the list yet so I apologize if this ends up being a duplicate. We have been starting to play with Cluster Suite as part of RHEL over the past week. Our needs, we think, are pretty basic. However we have not found enough information in the docs to help validate our plans as being sane. 
So for that I'm turning to the list in hopes someone can help. What we are trying to do is simply have 3 servers attached to a SAN with common disk storage. By common storage I simply mean the SAN is zoned so that all three servers can see this common storage. We then want to lvm, cluster lvm flag enabled, this storage such that we can create many logical volumes. Each of those LVs would have an ext3 file system on it, implying only one server at a time will mount and use it. Then we can take the N LVs and distribute them out between the three servers in a very fixed fashion. So thus far we see that we need to have CMAN running as well as have lvm2-cluster installed and having run `lvmconf --enable-cluster`. Then from system-config-lvm we created a cluster of our 3 servers. We think that is about all we need to do for this very crude initial setup. This is where we wanted to get some feedback if in fact this is an acceptable, while overly basic, configuration. Could someone offer any feedback here? We do realize that we are ignoring many of the key features of the cluster setup where we could define these LVs and their file systems as resourced as well as the services using them and have cman, rgmanager, etc help build a more robust and polished setup. We are in a time crunch for now and need to get an initial setup going thus the above question, then with time we hope to learn the other parts of the system and then migrate things in a better direction. Thanks for your time. -- Brian D. Haymore University of Utah Center for High Performance Computing 155 South 1452 East RM 405 Salt Lake City, Ut 84112-0190 Phone: (801) 558-1150, Fax: (801) 585-5366 http://www.map.utah.edu/umaplink/0019.html From sdake at redhat.com Tue May 6 19:04:10 2008 From: sdake at redhat.com (Steven Dake) Date: Tue, 06 May 2008 12:04:10 -0700 Subject: [Linux-cluster] Sanity Check In-Reply-To: <4820AA07.3090303@chpc.utah.edu> References: <4820AA07.3090303@chpc.utah.edu> Message-ID: <1210100651.7766.13.camel@balance> On Tue, 2008-05-06 at 12:57 -0600, Brian D. Haymore wrote: > I tried to send this yesterday but didn't see it on the list yet so I > apologize if this ends up being a duplicate. > > > > > We have been starting to play with Cluster Suite as part of RHEL over > the past week. Our needs, we think, are pretty basic. However we have > not found enough information in the docs to help validate our plans as > being sane. So for that I'm turning to the list in hopes someone can help. > > What we are trying to do is simply have 3 servers attached to a SAN with > common disk storage. By common storage I simply mean the SAN is zoned > so that all three servers can see this common storage. We then want to > lvm, cluster lvm flag enabled, this storage such that we can create many > logical volumes. Each of those LVs would have an ext3 file system on > it, implying only one server at a time will mount and use it. Then we > can take the N LVs and distribute them out between the three servers in > a very fixed fashion. > > So thus far we see that we need to have CMAN running as well as have > lvm2-cluster installed and having run `lvmconf --enable-cluster`. Then > from system-config-lvm we created a cluster of our 3 servers. We think > that is about all we need to do for this very crude initial setup. This > is where we wanted to get some feedback if in fact this is an > acceptable, while overly basic, configuration. Could someone offer any > feedback here? 
> > We do realize that we are ignoring many of the key features of the > cluster setup where we could define these LVs and their file systems as > resourced as well as the services using them and have cman, rgmanager, > etc help build a more robust and polished setup. We are in a time > crunch for now and need to get an initial setup going thus the above > question, then with time we hope to learn the other parts of the system > and then migrate things in a better direction. Thanks for your time. > > Your design should work fine although maybe not clearly defined in any documentation as a use case. If you have no need of shared storage for the logical volumes, then there is no need for gfs. You will need to run lvm2 (clvmd) in clustered mode so that each node sees the logical volume changes when any node makes a change. check out the documentation section on http://sources.redhat.com/cluster/wiki you may find some information there that is helpful in your configuration. Regards -steve From lhh at redhat.com Tue May 6 19:06:54 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 06 May 2008 15:06:54 -0400 Subject: [Linux-cluster] Sanity Check In-Reply-To: <4820AA07.3090303@chpc.utah.edu> References: <4820AA07.3090303@chpc.utah.edu> Message-ID: <1210100814.15248.8.camel@localhost.localdomain> On Tue, 2008-05-06 at 12:57 -0600, Brian D. Haymore wrote: > So thus far we see that we need to have CMAN running as well as have > lvm2-cluster installed and having run `lvmconf --enable-cluster`. Then > from system-config-lvm we created a cluster of our 3 servers. We think > that is about all we need to do for this very crude initial setup. This > is where we wanted to get some feedback if in fact this is an > acceptable, while overly basic, configuration. Could someone offer any > feedback here? That looks about right. You also want fencing if you're using clustered LVM to protect the LVM metadata, but I don't know if it's strictly *required* or not, since you're statically assigning VGs to specific nodes. Note that if you just assign static LUNs to each node and manage those LUNs from the SAN management interface, you don't even need lvm2-cluster or CMAN. For example, you can present only certain LUNs to certain computers. > We do realize that we are ignoring many of the key features of the > cluster setup where we could define these LVs and their file systems as > resourced as well as the services using them and have cman, rgmanager, > etc help build a more robust and polished setup. -- Lon From garromo at us.ibm.com Tue May 6 20:37:32 2008 From: garromo at us.ibm.com (Gary Romo) Date: Tue, 6 May 2008 14:37:32 -0600 Subject: [Linux-cluster] How do you verify/test fencing? Message-ID: Is there a command that you can run to test/veryify that fencing is working properly? Or that it is part of the fence if you will? I realize that the primary focus of the fence is to shut off the other server(s). However, when I have a cluster up, how can I determine that all of my nodes are properly fenced? Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Tue May 6 21:35:04 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 06 May 2008 17:35:04 -0400 Subject: [Linux-cluster] How do you verify/test fencing? 
In-Reply-To: References: Message-ID: <1210109704.15248.28.camel@localhost.localdomain> On Tue, 2008-05-06 at 14:37 -0600, Gary Romo wrote: > > Is there a command that you can run to test/veryify that fencing is > working properly? > Or that it is part of the fence if you will? > I realize that the primary focus of the fence is to shut off the other > server(s). > However, when I have a cluster up, how can I determine that all of my > nodes are properly fenced? I'm not sure exactly how to answer the question. Fencing is used to cut a node off; if all nodes are fenced, no one can access shared storage ;) * For testing whether or not fencing works, stop the cluster software on all the nodes and run 'fence_node ' (where nodename is a host you're not working on). * For testing whether or not a node will be fenced as a matter of recovery, try 'cman_tool services'. If that node's ID isn't in the "fence" section, it will not be fenced if it fails. (Note that mounting a GFS file system will fail if the node is not a part of the "fence" service.) Let me know if this answers your question. -- Lon From jas199931 at yahoo.com Tue May 6 21:36:25 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 6 May 2008 14:36:25 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <853958.85045.qm@web32207.mail.mud.yahoo.com> Message-ID: <777828.55819.qm@web32207.mail.mud.yahoo.com> > A couple of further questions about the master copy > of > lock resources. > > The first one: > ============= > > Again, assume: > 1) Node A is extremely too busy and handle all > requests > 2) other nodes are just idle and have never handled > any requests > > According to the documents, Node A will hold all > master copies initially. The thing I am not aware of > and unclear is whether the lock manager will evenly > distribute the master copies on Node A to other > nodes > when it thinks the number of master copies on Node A > is too many? > After reading the source code briefly, it seems that there is a remastering process, which will be called when recovering and rebuilding the lock directory when any node(s) failed. Correct me if I am wrong, please. > > The second one: > ============== > > Assume a master copy of lock resource is on Node A. > Now Node B holds a local copy of the lock resource. > When the lock queues changed on the local copy on > Node > B, will the master copy on Node A be updated > simultaneously? If so, when more than one nodes have > the local copy of the same lock resource, how the > lock > manager to handle the update of the master copy? > Using > another lock mechanism to prevent the corruption of > the master copy? > I have not found the answer so far. I may need to read the source code very carefully. Can anyone kindly provide any hints? Thanks again in advance. Jas > > > > > -- > > > > Chrissie > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From garromo at us.ibm.com Tue May 6 21:37:46 2008 From: garromo at us.ibm.com (Gary Romo) Date: Tue, 6 May 2008 15:37:46 -0600 Subject: [Linux-cluster] fence error messages Message-ID: I am getting these in /var/log/messages May 6 14:46:28 lxomt06e fenced[2849]: fencing node "bogusnode" May 6 14:46:28 lxomt06e fenced[2849]: fence "bogusnode" failed I am basically setting up a single-node cluster, because I don't have the 2nd node yet. So I am using manual fence in order to accomplish this. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Tue May 6 21:47:26 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 06 May 2008 17:47:26 -0400 Subject: [Linux-cluster] fence error messages In-Reply-To: References: Message-ID: <1210110446.15248.37.camel@localhost.localdomain> On Tue, 2008-05-06 at 15:37 -0600, Gary Romo wrote: > > I am getting these in /var/log/messages > > May 6 14:46:28 lxomt06e fenced[2849]: fencing node "bogusnode" > May 6 14:46:28 lxomt06e fenced[2849]: fence "bogusnode" failed > > I am basically setting up a single-node cluster, because I don't have > the 2nd node yet. > So I am using manual fence in order to accomplish this. > > > > > nodename="bogusnode"/> > > do you have a manual_fence in the section? -- Lon From ccaulfie at redhat.com Wed May 7 06:58:04 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 07 May 2008 07:58:04 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <853958.85045.qm@web32207.mail.mud.yahoo.com> References: <853958.85045.qm@web32207.mail.mud.yahoo.com> Message-ID: <482152FC.4070707@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>> --- Christine Caulfield >> wrote: >>> >>>>> DLM lockspace 'data' >>>>> 5 2f06768 1 >>>>> 5 114d15 1 >>>>> 5 120b13 1 >>>>> 5 5bd1f04 1 >>>>> 3 6a02f8 2 >>>>> 5 cb7604 1 >>>>> 5 ca187b 1 >>>>> >>>> The first two numbers are the lock name. Don't >> ask >>>> me what they mean, >>>> that's a GFS question! (actually, I think inode >>>> numbers might be >>>> involved) The last number is the nodeID on which >> the >>>> lock is mastered. >>> >>> Great, thanks again! >>> >>> >>>>>> That lookup only happens the first time >>>>>> a resource is used by a node, once the >>>>>> node knows where the master is, >>>>>> it does not need to look it up again, >>>>>> unless it releases all >>>>>> locks on the resource. >>>>>> >>>>> Oh, I see. Just to further clarify, does it >> means >>>> if >>>>> the same lock resource is required again by an >>>>> application on the node A, the node A will go >>>> straight >>>>> to the known node (ie the node B) which holds >> the >>>>> master previously, but needs to lookup again if >>>> the >>>>> node B has already released the lock resource? >>>> Not quite. A resource is mastered on a node for >> as >>>> long as there are >>>> locks for it. If node A gets the lock (which is >>>> mastered on node B) then >>>> it knows always to go do node B until all locks >> on >>>> node A are released. >>>> When that happens the local copy of the resource >> on >>>> node A is released >>>> including the reference to node B. 
If all the >> locks >>>> on node B are >>>> released (but A still has some) then the resource >>>> will stay mastered on >>>> node B and nodes that still have locks on that >>>> resource will know where >>>> to find it without a directory lookup. >>>> >>> Aha, I think I missed another important concept -- >> a >>> local copy of lock resources. I did not realise >> the >>> existence of the local copy of lock resources. >> Which >>> file should I check to figure out how many local >>> copies a node has and what the local copies are? >> All the locks are displayed in >> /proc/cluster/dlm_locks, that shows you >> which are local copies and which are masters. > > > A couple of further questions about the master copy of > lock resources. > > The first one: > ============= > > Again, assume: > 1) Node A is extremely too busy and handle all > requests > 2) other nodes are just idle and have never handled > any requests > > According to the documents, Node A will hold all > master copies initially. The thing I am not aware of > and unclear is whether the lock manager will evenly > distribute the master copies on Node A to other nodes > when it thinks the number of master copies on Node A > is too many? Locks are only remastered when a node leaves the cluster. In that case all of its nodes will be moved to another node. We do not do dynamic remastering - a resource that is mastered on one node will stay mastered on that node regardless of traffic or load, until all users of the resource have been freed. > The second one: > ============== > > Assume a master copy of lock resource is on Node A. > Now Node B holds a local copy of the lock resource. > When the lock queues changed on the local copy on Node > B, will the master copy on Node A be updated > simultaneously? If so, when more than one nodes have > the local copy of the same lock resource, how the lock > manager to handle the update of the master copy? Using > another lock mechanism to prevent the corruption of > the master copy? > All locking happens on the master node. The local copy is just that, a copy. It is updated when the master confirms what has happened. The local copy is there mainly for rebuilding the resource table when a master leaves the cluster, and to keep a track of locks that exist on the local node. The local copy is NOT complete. it only contains local users of a resource. -- Chrissie From maciej.bogucki at artegence.com Wed May 7 07:24:53 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Wed, 07 May 2008 09:24:53 +0200 Subject: [Linux-cluster] mounting as non root In-Reply-To: <149194.48263.qm@web32401.mail.mud.yahoo.com> References: <149194.48263.qm@web32401.mail.mud.yahoo.com> Message-ID: <48215945.7090704@artegence.com> nch napisa?(a): > > I tried that, unsuccessfully. 
> The relevant line in my fstab is: > /dev/gnbd/disk /mnt/shared gfs2 user,noauto 0 0 > > And this is the error msg when trying "mount /mnt/shared" as a non > root user > /sbin/mount.gfs2: error mounting /dev/gnbd/disk on /mnt/shared: > Operation not permitted Please paste me the output of: "id" and "cat /etc/fstab" Best Regards Maciej Bogucki From maciej.bogucki at artegence.com Wed May 7 07:30:16 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Wed, 07 May 2008 09:30:16 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <48206C0C.3000805@messinalug.org> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> <1210071998.3413.21.camel@localhost.localdomain> <48206C0C.3000805@messinalug.org> Message-ID: <48215A88.2020805@artegence.com> Giovanni Mancuso napisa?(a): > I use kernel: > uname -a > Linux sophosha1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 > i686 athlon i386 GNU/Linux > > and release: > cat /etc/redhat-release > Red Hat Enterprise Linux Server release 5.1 (Tikanga) Hello, You could try to upgrade kernel to the newer one and the rest of the packages. Best Regards Maciej Bogucki From swhiteho at redhat.com Wed May 7 08:33:51 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 07 May 2008 09:33:51 +0100 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <48215A88.2020805@artegence.com> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> <1210071998.3413.21.camel@localhost.localdomain> <48206C0C.3000805@messinalug.org> <48215A88.2020805@artegence.com> Message-ID: <1210149231.3345.1.camel@localhost.localdomain> Hi, On Wed, 2008-05-07 at 09:30 +0200, Maciej Bogucki wrote: > Giovanni Mancuso napisa?(a): > > I use kernel: > > uname -a > > Linux sophosha1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 > > i686 athlon i386 GNU/Linux > > > > and release: > > cat /etc/redhat-release > > Red Hat Enterprise Linux Server release 5.1 (Tikanga) > > Hello, > > You could try to upgrade kernel to the newer one and the rest of the > packages. > > Best Regards > Maciej Bogucki > Yes, thats certainly worth doing, although RHEL 5.1 kernels are not the best testing ground for GFS2. I'd suggest using a Fedora kernel for testing purposes, or at least the latest 5.2 kernel if you really need to use RHEL. Steve. From jas199931 at yahoo.com Wed May 7 10:34:31 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 7 May 2008 03:34:31 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <482152FC.4070707@redhat.com> Message-ID: <424492.84161.qm@web32205.mail.mud.yahoo.com> > > > > A couple of further questions about the master > copy of > > lock resources. > > > > The first one: > > ============= > > > > Again, assume: > > 1) Node A is extremely too busy and handle all > > requests > > 2) other nodes are just idle and have never > handled > > any requests > > > > According to the documents, Node A will hold all > > master copies initially. The thing I am not aware > of > > and unclear is whether the lock manager will > evenly > > distribute the master copies on Node A to other > nodes > > when it thinks the number of master copies on Node > A > > is too many? > > Locks are only remastered when a node leaves the > cluster. In that case > all of its nodes will be moved to another node. 
We > do not do dynamic > remastering - a resource that is mastered on one > node will stay mastered > on that node regardless of traffic or load, until > all users of the > resource have been freed. Thank you very much. > > > The second one: > > ============== > > > > Assume a master copy of lock resource is on Node > A. > > Now Node B holds a local copy of the lock > resource. > > When the lock queues changed on the local copy on > Node > > B, will the master copy on Node A be updated > > simultaneously? If so, when more than one nodes > have > > the local copy of the same lock resource, how the > lock > > manager to handle the update of the master copy? > Using > > another lock mechanism to prevent the corruption > of > > the master copy? > > > > All locking happens on the master node. The local > copy is just that, a > copy. It is updated when the master confirms what > has happened. The > local copy is there mainly for rebuilding the > resource table when a > master leaves the cluster, and to keep a track of > locks that exist on > the local node. The local copy is NOT complete. it > only contains local > users of a resource. > Thanks again for the kind and detailed explanation. I am sorry I have to bother you again as I am having more questions. I analysed /proc/cluster/dlm_dir and dlm_locks and found some strange things. Please see below: >From /proc/cluster/dlm_dir: In lock space [ABC]: This node (node 2) has 445 lock resources in total where --328 master lock resources --117 local copies of lock resources mastered on other nodes. =============================== =============================== >From /proc/cluster/dlm_locks: In lock space [ABC]: There are 1678 lock resouces in use where --1674 lock resources are mastered by this node (node 2) --4 lock resources are mastered by other nodes, within which: ----1 lock resource mastered on node 1 ----1 lock resource mastered on node 3 ----1 lock resource mastered on node 4 ----1 lock resource mastered on node 5 A typical master lock resource in /proc/cluster/dlm_locks is: Resource 000001000de4fd88 (parent 0000000000000000). Name (len=24) " 3 5fafc85" Master Copy LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Granted Queue 1ff5036d NL Remote: 4 000603e8 80d2013f NL Remote: 5 00040214 00240209 NL Remote: 3 0001031d 00080095 NL Remote: 1 00040197 00010304 NL Conversion Queue Waiting Queue After search for local copy in /proc/cluster/dlm_locks, I got: Resource 000001002a273618 (parent 0000000000000000). Name (len=16) "withdraw 3......" Local Copy, Master is node 3 Granted Queue 0004008d PR Master: 0001008c Conversion Queue Waiting Queue -- Resource 000001003fe69b68 (parent 0000000000000000). Name (len=16) "withdraw 5......" Local Copy, Master is node 5 Granted Queue 819402ef PR Master: 00010317 Conversion Queue Waiting Queue -- Resource 000001002a2732e8 (parent 0000000000000000). Name (len=16) "withdraw 1......" Local Copy, Master is node 1 Granted Queue 000401e9 PR Master: 00010074 Conversion Queue Waiting Queue -- Resource 000001004a32e598 (parent 0000000000000000). Name (len=16) "withdraw 4......" Local Copy, Master is node 4 Granted Queue 1f5b0317 PR Master: 00010203 Conversion Queue Waiting Queue These four local copy of lock resources have been staying in /proc/cluster/dlm_locks for several days. Now my questions: 1. In my case, for the same lock space, the number of master lock resources reported by dlm_dir is much SMALLER than that reported in dlm_locks. 
My understanding is that master lock resources listed in dlm_dir must be larger than or at least the same as that reported in dlm_locks. The situation I discovered on the node does not make any sense to me. Am I missing anything? Can you help me to clarify the case? 2. What can cause "withdraw ...." to be the lock resource name? 3. These four local copy of lock resources have not been released for at least serveral days as I knew. How can I find out whether they are in a strange dead situation or are still waiting for the lock manager to release them? How to change the timeout? Thank you very much for your great further help in advance. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From suuuper at messinalug.org Wed May 7 11:07:08 2008 From: suuuper at messinalug.org (Giovanni Mancuso) Date: Wed, 07 May 2008 13:07:08 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <1210149231.3345.1.camel@localhost.localdomain> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> <1210071998.3413.21.camel@localhost.localdomain> <48206C0C.3000805@messinalug.org> <48215A88.2020805@artegence.com> <1210149231.3345.1.camel@localhost.localdomain> Message-ID: <48218D5C.5040604@messinalug.org> Ok, now i try to upgrade my kernel with the latest 5.2 kernel. Thanks Steven Whitehouse ha scritto: > Hi, > > On Wed, 2008-05-07 at 09:30 +0200, Maciej Bogucki wrote: > >> Giovanni Mancuso napisa?(a): >> >>> I use kernel: >>> uname -a >>> Linux sophosha1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 >>> i686 athlon i386 GNU/Linux >>> >>> and release: >>> cat /etc/redhat-release >>> Red Hat Enterprise Linux Server release 5.1 (Tikanga) >>> >> Hello, >> >> You could try to upgrade kernel to the newer one and the rest of the >> packages. >> >> Best Regards >> Maciej Bogucki >> >> > > Yes, thats certainly worth doing, although RHEL 5.1 kernels are not the > best testing ground for GFS2. I'd suggest using a Fedora kernel for > testing purposes, or at least the latest 5.2 kernel if you really need > to use RHEL. > > Steve. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at tangent.co.za Wed May 7 11:19:13 2008 From: lists at tangent.co.za (Chris Picton) Date: Wed, 7 May 2008 11:19:13 +0000 (UTC) Subject: [Linux-cluster] Re: How do you verify/test fencing? References: <1210109704.15248.28.camel@localhost.localdomain> Message-ID: On Tue, 06 May 2008 17:35:04 -0400, Lon Hohberger wrote: > On Tue, 2008-05-06 at 14:37 -0600, Gary Romo wrote: >> >> Is there a command that you can run to test/veryify that fencing is >> working properly? >> Or that it is part of the fence if you will? I realize that the primary >> focus of the fence is to shut off the other server(s). >> However, when I have a cluster up, how can I determine that all of my >> nodes are properly fenced? > > > * For testing whether or not fencing works, stop the cluster software on > all the nodes and run 'fence_node ' (where nodename is a host > you're not working on). > > * For testing whether or not a node will be fenced as a matter of > recovery, try 'cman_tool services'. 
If that node's ID isn't in the > "fence" section, it will not be fenced if it fails. These two step will not test that a node will be fenced automatically if it is malfunctioning. What can be done to a cluster node, to test that it will be automatically fenced if there is a problem. From vimal at monster.co.in Wed May 7 11:41:10 2008 From: vimal at monster.co.in (Vimal Gupta) Date: Wed, 07 May 2008 17:11:10 +0530 Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: References: Message-ID: <48219556.9060901@monster.co.in> Hi, I have the same question.??? Anybody has the answer Please.......??? Chris Picton wrote: > Hi All > > I am investigating a new cluster installation. > > Documentation from redhat indicates that GFS2 is not yet production > ready. Tests I have run show it is *much* faster that gfs for my > workload. > > Is GFS2 not production-ready due to lack of testing, or due to known bugs? > > Any advice would be appreciated > > Chris > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Vimal Gupta Sr. System Administrator Monster.com India Pvt.Ltd. FC - 23, Block - B, 1st Floor, Film City, Sector - 16 A, NOIDA, UP 201 301, INDIA Ph# : +91-120-4024230 Fax: +91-40-66506449 Mobile: +91-9811150360 From oliveiros.cristina at gmail.com Wed May 7 11:44:28 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Wed, 7 May 2008 12:44:28 +0100 Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: <48219556.9060901@monster.co.in> References: <48219556.9060901@monster.co.in> Message-ID: And I have the same question, also. Best, Oliveiros 2008/5/7 Vimal Gupta : > Hi, > > I have the same question.??? > Anybody has the answer Please.......??? > > > Chris Picton wrote: > > > Hi All > > > > I am investigating a new cluster installation. > > > > Documentation from redhat indicates that GFS2 is not yet production > > ready. Tests I have run show it is *much* faster that gfs for my workload. > > > > Is GFS2 not production-ready due to lack of testing, or due to known > > bugs? > > > > Any advice would be appreciated > > > > Chris > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > -- > > Vimal Gupta > Sr. System Administrator > Monster.com India Pvt.Ltd. > FC - 23, Block - B, 1st Floor, Film City, Sector - 16 A, > NOIDA, UP 201 301, INDIA > Ph# : +91-120-4024230 Fax: +91-40-66506449 Mobile: +91-9811150360 > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jb at soe.se Wed May 7 11:44:51 2008 From: jb at soe.se (=?ISO-8859-1?Q?Jonas_Bj=F6rklund?=) Date: Wed, 7 May 2008 13:44:51 +0200 (CEST) Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: <48219556.9060901@monster.co.in> References: <48219556.9060901@monster.co.in> Message-ID: Hello, I would like to know also... /Jonas On Wed, 7 May 2008, Vimal Gupta wrote: > Hi, > > I have the same question.??? > Anybody has the answer Please.......??? > > Chris Picton wrote: >> Hi All >> >> I am investigating a new cluster installation. >> >> Documentation from redhat indicates that GFS2 is not yet production ready. >> Tests I have run show it is *much* faster that gfs for my workload. >> >> Is GFS2 not production-ready due to lack of testing, or due to known bugs? 
>> >> Any advice would be appreciated >> >> Chris >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> > > > -- > > Vimal Gupta > Sr. System Administrator > Monster.com India Pvt.Ltd. > FC - 23, Block - B, 1st Floor, Film City, Sector - 16 A, > NOIDA, UP 201 301, INDIA > Ph# : +91-120-4024230 Fax: +91-40-66506449 Mobile: +91-9811150360 > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From gordan at bobich.net Wed May 7 11:53:50 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 7 May 2008 12:53:50 +0100 (BST) Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: References: <48219556.9060901@monster.co.in> Message-ID: For some reason, I always worry when people whether something that isn't production ready _REALLY_ isn't production ready, or whether the developers are just saying it isn't production ready for fun... IIRC, the plan was that it will be ready by RHEL5.1, but additional critical bugs were discovered, the fixes for which have, to my knowledge, not made it into the distro yet. Gordan On Wed, 7 May 2008, Jonas Bj?rklund wrote: > Hello, > > I would like to know also... > > /Jonas > > On Wed, 7 May 2008, Vimal Gupta wrote: > >> Hi, >> >> I have the same question.??? >> Anybody has the answer Please.......??? >> >> Chris Picton wrote: >>> Hi All >>> >>> I am investigating a new cluster installation. >>> >>> Documentation from redhat indicates that GFS2 is not yet production >>> ready. >>> Tests I have run show it is *much* faster that gfs for my workload. >>> >>> Is GFS2 not production-ready due to lack of testing, or due to known >>> bugs? >>> >>> Any advice would be appreciated >>> >>> Chris >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >>> >> >> >> -- >> >> Vimal Gupta >> Sr. System Administrator >> Monster.com India Pvt.Ltd. >> FC - 23, Block - B, 1st Floor, Film City, Sector - 16 A, >> NOIDA, UP 201 301, INDIA >> Ph# : +91-120-4024230 Fax: +91-40-66506449 Mobile: +91-9811150360 >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From swhiteho at redhat.com Wed May 7 12:09:21 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 07 May 2008 13:09:21 +0100 Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: References: <48219556.9060901@monster.co.in> Message-ID: <1210162161.3345.26.camel@localhost.localdomain> Hi, On Wed, 2008-05-07 at 13:44 +0200, Jonas Bj?rklund wrote: > Hello, > > I would like to know also... > > /Jonas > > On Wed, 7 May 2008, Vimal Gupta wrote: > > > Hi, > > > > I have the same question.??? > > Anybody has the answer Please.......??? > > > > Chris Picton wrote: > >> Hi All > >> > >> I am investigating a new cluster installation. > >> > >> Documentation from redhat indicates that GFS2 is not yet production ready. > >> Tests I have run show it is *much* faster that gfs for my workload. > >> > >> Is GFS2 not production-ready due to lack of testing, or due to known bugs? > >> > >> Any advice would be appreciated > >> > >> Chris > >> The answer is a bit of both. We are getting to the stage where the known bugs are mostly solved or will be very shortly. 
You can see the state of the bug list at any time by going to bugzilla.redhat.com and looking for any bug with gfs2 in the summary line. There are currently approx 70 such bugs, but please bear in mind that a large number of these are asking for new features, and some of them are duplicates of the same bug across different versions of RHEL and/or Fedora. We are currently at a stage where having a large number of people helping us in testing would be very helpful. If you have your own favourite filesystem test, or if you are in a position to run a test application, then we would be very interested in any reports of success/failure. If you do have any problems, then please do: o Check bugzilla to see if someone else has had the same problem o Report them (preferably via bugzilla, as that ensures that they won't get lost somewhere) o Report them as "Fedora, rawhide" if they relate to the upstream kernel (either Linus' tree or my -nmw git tree) and indicate in the comments section which of these kernels you were using o Send patches if you have them, but please don't let that stop you reporting bugs. All reports are useful. We might not be able to always fix each and every report right away, but sometimes patterns emerge via a number of reports which do allow us to home in on a particularly tricky issue. o If you experience a hang, then please include (if possible): - A glock lock dump from all nodes (via debugfs) - A dlm lock dump from all nodes (via debugfs) - A stack trace from all nodes (echo t >/proc/sysrq-trigger) o If you experience an oops, then please make sure that you include all the messages (including those which might have been logged just before the oops itself). The more people we have testing & reporting bugs, the quicker we can approach stability. There is one issue which I'm currently working on relating to a (fairly rare, but nonetheless possible) race. This happens when two threads calling ->readpage() race with each other. The reason that this is problematic is that its the one place left where we are using "try locks" to get around the page lock/glock lock ordering problem and the VFS's AOP_TRUNCATED_PAGE return code is not guaranteed to result in ->readpage() being called again if another ->readpage() has raced with it and brought the page uptodate. As a result "try locks" are the only option, but for long and complicated reasons when a "try lock" is queued it might end up triggering a demotion (if a request is pending from a remote node) which deadlocks due to page lock/glock ordering. The patch I'm working on at the moment, fixes that problem by failing the glock (GLR_TRYFAILED) if a demote is needed and scheduling the glock workqueue to deal with the demotion, thus avoiding the race. The try lock will then be retried at a later date when it can be successful. The bugzilla for this is #432057 if you want to follow my progress. Steve. From swhiteho at redhat.com Wed May 7 12:21:31 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 07 May 2008 13:21:31 +0100 Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: References: <48219556.9060901@monster.co.in> Message-ID: <1210162891.3345.38.camel@localhost.localdomain> Hi, On Wed, 2008-05-07 at 12:53 +0100, gordan at bobich.net wrote: > For some reason, I always worry when people whether something that isn't > production ready _REALLY_ isn't production ready, or whether the > developers are just saying it isn't production ready for fun... 
> > IIRC, the plan was that it will be ready by RHEL5.1, but additional > critical bugs were discovered, the fixes for which have, to my > knowledge, not made it into the distro yet. > > Gordan > This issue is that the rules for updating RHEL are that we can't put in updates to GFS2 in RHEL 5.1 because GFS2 is a demo feature in 5.1 and we don't want to potentially risk adding bugs by fixing unsupported features. I know that it seems to have been a long time but, I hope, understandably, we are cautious of risking other people's important data on the filesystem until we are sure that we've sorted out all the issues and have been through extensive testing. The net result is that there is a delay between the "appears to work ok" stage and the "this is supported" stage and thats more or less inevitable. Fedora (and rawhide in particular) is there to provide the "bleeding edge" code for testing purposes ahead of the RHEL releases. I know that we've been a bit slow in pushing updates (particularly of the gfs2-utils and cman packages) into Fedora in the past. Thats changing and we should be much better at keeping those uptodate in the future. The gfs2-utils package was recently updated and cman is on the list to be done shortly, Steve. From Sayed.Mujtaba at in.unisys.com Wed May 7 12:20:58 2008 From: Sayed.Mujtaba at in.unisys.com (Mujtaba, Sayed Mohammed) Date: Wed, 7 May 2008 17:50:58 +0530 Subject: [Linux-cluster] Red Hat cluster Manager(cman) and rgmanager Message-ID: Hi, I am interested in studying design of Cluster Manager (cman) and rgmanager. Is there any specific document available focusing on development of these? Or let me know where I can get more information about it as only going through available source code in Red Hat site to understand it is not a good idea. Thanks, -Mujtaba -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Wed May 7 12:27:14 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 7 May 2008 13:27:14 +0100 (BST) Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: <1210162891.3345.38.camel@localhost.localdomain> References: <48219556.9060901@monster.co.in> <1210162891.3345.38.camel@localhost.localdomain> Message-ID: On Wed, 7 May 2008, Steven Whitehouse wrote: > On Wed, 2008-05-07 at 12:53 +0100, gordan at bobich.net wrote: >> For some reason, I always worry when people whether something that isn't >> production ready _REALLY_ isn't production ready, or whether the >> developers are just saying it isn't production ready for fun... >> >> IIRC, the plan was that it will be ready by RHEL5.1, but additional >> critical bugs were discovered, the fixes for which have, to my >> knowledge, not made it into the distro yet. >> > This issue is that the rules for updating RHEL are that we can't put in > updates to GFS2 in RHEL 5.1 because GFS2 is a demo feature in 5.1 and we > don't want to potentially risk adding bugs by fixing unsupported > features. I know that it seems to have been a long time but, I hope, > understandably, we are cautious of risking other people's important data > on the filesystem until we are sure that we've sorted out all the issues > and have been through extensive testing. I think you misunderstood - I fully suport the approach you are taking of ensuring that RHEL features are totally stable. Those that want to play with unstable features always have FC available. 
:) Gordan From underscore_dot at yahoo.com Wed May 7 15:03:12 2008 From: underscore_dot at yahoo.com (nch) Date: Wed, 7 May 2008 08:03:12 -0700 (PDT) Subject: [Linux-cluster] GFS vs GFS2 Message-ID: <879831.53021.qm@web32408.mail.mud.yahoo.com> Hi, I think I'll post mine. I'm using a GNBD device formated as GFS2 (min-gfs.txt) to share a Compass/Lucene search engine index between two instances of a web app. If one of the instances creates the index, the other one won't be able to read it, whether the first one is running or not, throwing java.io.IOException: read past EOF. I might have configured sth wrong, but the thing is that if I format the device as GFS, instead of GFS2, then this issue does not occur. Regards ----- Original Message ---- From: Steven Whitehouse To: linux clustering Sent: Wednesday, May 7, 2008 2:09:21 PM Subject: Re: [Linux-cluster] GFS vs GFS2 Hi, On Wed, 2008-05-07 at 13:44 +0200, Jonas Bj?rklund wrote: > Hello, > > I would like to know also... > > /Jonas > > On Wed, 7 May 2008, Vimal Gupta wrote: > > > Hi, > > > > I have the same question.??? > > Anybody has the answer Please.......??? > > > > Chris Picton wrote: > >> Hi All > >> > >> I am investigating a new cluster installation. > >> > >> Documentation from redhat indicates that GFS2 is not yet production ready. > >> Tests I have run show it is *much* faster that gfs for my workload. > >> > >> Is GFS2 not production-ready due to lack of testing, or due to known bugs? > >> > >> Any advice would be appreciated > >> > >> Chris > >> The answer is a bit of both. We are getting to the stage where the known bugs are mostly solved or will be very shortly. You can see the state of the bug list at any time by going to bugzilla.redhat.com and looking for any bug with gfs2 in the summary line. There are currently approx 70 such bugs, but please bear in mind that a large number of these are asking for new features, and some of them are duplicates of the same bug across different versions of RHEL and/or Fedora. We are currently at a stage where having a large number of people helping us in testing would be very helpful. If you have your own favourite filesystem test, or if you are in a position to run a test application, then we would be very interested in any reports of success/failure. If you do have any problems, then please do: o Check bugzilla to see if someone else has had the same problem o Report them (preferably via bugzilla, as that ensures that they won't get lost somewhere) o Report them as "Fedora, rawhide" if they relate to the upstream kernel (either Linus' tree or my -nmw git tree) and indicate in the comments section which of these kernels you were using o Send patches if you have them, but please don't let that stop you reporting bugs. All reports are useful. We might not be able to always fix each and every report right away, but sometimes patterns emerge via a number of reports which do allow us to home in on a particularly tricky issue. o If you experience a hang, then please include (if possible): - A glock lock dump from all nodes (via debugfs) - A dlm lock dump from all nodes (via debugfs) - A stack trace from all nodes (echo t >/proc/sysrq-trigger) o If you experience an oops, then please make sure that you include all the messages (including those which might have been logged just before the oops itself). The more people we have testing & reporting bugs, the quicker we can approach stability. There is one issue which I'm currently working on relating to a (fairly rare, but nonetheless possible) race. 
This happens when two threads calling ->readpage() race with each other. The reason that this is problematic is that its the one place left where we are using "try locks" to get around the page lock/glock lock ordering problem and the VFS's AOP_TRUNCATED_PAGE return code is not guaranteed to result in ->readpage() being called again if another ->readpage() has raced with it and brought the page uptodate. As a result "try locks" are the only option, but for long and complicated reasons when a "try lock" is queued it might end up triggering a demotion (if a request is pending from a remote node) which deadlocks due to page lock/glock ordering. The patch I'm working on at the moment, fixes that problem by failing the glock (GLR_TRYFAILED) if a demote is needed and scheduling the glock workqueue to deal with the demotion, thus avoiding the race. The try lock will then be retried at a later date when it can be successful. The bugzilla for this is #432057 if you want to follow my progress. Steve. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From garromo at us.ibm.com Wed May 7 15:34:50 2008 From: garromo at us.ibm.com (Gary Romo) Date: Wed, 7 May 2008 09:34:50 -0600 Subject: [Linux-cluster] fence error messages In-Reply-To: <1210110446.15248.37.camel@localhost.localdomain> Message-ID: Yes I do. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com Lon Hohberger Sent by: linux-cluster-bounces at redhat.com 05/06/2008 03:47 PM Please respond to linux clustering To linux clustering cc Subject Re: [Linux-cluster] fence error messages On Tue, 2008-05-06 at 15:37 -0600, Gary Romo wrote: > > I am getting these in /var/log/messages > > May 6 14:46:28 lxomt06e fenced[2849]: fencing node "bogusnode" > May 6 14:46:28 lxomt06e fenced[2849]: fence "bogusnode" failed > > I am basically setting up a single-node cluster, because I don't have > the 2nd node yet. > So I am using manual fence in order to accomplish this. > > > > > nodename="bogusnode"/> > > do you have a manual_fence in the section? -- Lon -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfranz at freerun.com Wed May 7 16:23:06 2008 From: jfranz at freerun.com (Jerry Franz) Date: Wed, 07 May 2008 09:23:06 -0700 Subject: [Linux-cluster] Multipathd not reliably picking up GNBD devices on client machines Message-ID: <4821D76A.3010806@freerun.com> I've about run out of ideas. I have assembled a HA stack on Redhat Cluster where a pair of machines running DRBD in Primary/Primary mode over bonded gigabit ethernet interfaces serve six clustered logical volumes with GFS via GNBD to four other machines. All ethernet interfaces are bonded. I've got bonding, GNBD, CLVMD, Multipath and DRBD all happy: Except that multipathd simply refuses to *reliably* pick up the GNBD devices during system boot. 
I'll run '/etc/init.d/multipathd reload' by hand (sometimes it takes more than once) and it will sooner or later pick them up, but I can see no rhyme or reason to it: Sometimes it just works, and sometimes it doesn't. I'll boot a client machine once and everything might work fine. I'll reboot it again, and maybe multipathd won't find the GNBD devices (or maybe it will find _some_ of them). Ideas? -- Benjamin Franz From bkyoung at gmail.com Wed May 7 18:14:05 2008 From: bkyoung at gmail.com (Brandon Young) Date: Wed, 7 May 2008 13:14:05 -0500 Subject: [Linux-cluster] Re: How do you verify/test fencing? In-Reply-To: References: <1210109704.15248.28.camel@localhost.localdomain> Message-ID: <824ffea00805071114x34d6b1f1q304a60e1e52e541b@mail.gmail.com> Unplug the heartbeat cable. On Wed, May 7, 2008 at 6:19 AM, Chris Picton wrote: > On Tue, 06 May 2008 17:35:04 -0400, Lon Hohberger wrote: > > > On Tue, 2008-05-06 at 14:37 -0600, Gary Romo wrote: > >> > >> Is there a command that you can run to test/veryify that fencing is > >> working properly? > >> Or that it is part of the fence if you will? I realize that the primary > >> focus of the fence is to shut off the other server(s). > >> However, when I have a cluster up, how can I determine that all of my > >> nodes are properly fenced? > > > > > > * For testing whether or not fencing works, stop the cluster software on > > all the nodes and run 'fence_node ' (where nodename is a host > > you're not working on). > > > > * For testing whether or not a node will be fenced as a matter of > > recovery, try 'cman_tool services'. If that node's ID isn't in the > > "fence" section, it will not be fenced if it fails. > > These two step will not test that a node will be fenced automatically if > it is malfunctioning. > > What can be done to a cluster node, to test that it will be automatically > fenced if there is a problem. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Wed May 7 18:57:13 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 07 May 2008 14:57:13 -0400 Subject: [Linux-cluster] Red Hat cluster Manager(cman) and rgmanager In-Reply-To: References: Message-ID: <1210186633.23294.0.camel@dhcp-100-19-208.bos.redhat.com> On Wed, 2008-05-07 at 17:50 +0530, Mujtaba, Sayed Mohammed wrote: > Hi, > > I am interested in studying design of Cluster Manager (cman) and > rgmanager. for rgmanager, the README has a lot of the design elements. > > Is there any specific document available focusing on development of > these? > > > > Or let me know where I can get more information about it as only > going through > > available source code in Red Hat site to understand it is not a good > idea. > > > > > > Thanks, > > -Mujtaba > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From fog at t.is Wed May 7 20:57:28 2008 From: fog at t.is (=?iso-8859-1?Q?Finnur_=D6rn_Gu=F0mundsson_-_TM_Software?=) Date: Wed, 7 May 2008 20:57:28 -0000 Subject: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F779020069C8@SKYHQAMX08.klasi.is> Hi, I have a 2 node cluster running RHEL 5.1 x86_64 and fully patched as of today. If i cold-boot the cluster (both nodes) everything comes up smoothly and i can migrate services between nodes etc... 
However when i take one node down i am having difficulties leaving the fence domain. If i kill the fence daemon on the node i am trying to remove gracefully or use cman_tool leave force and reboot it, it comes back up, cman starts and it appears to join the cluster. The CLVMD init script hangs (just sits and hangs) and rgmanager does not start up correctly. Also CLVMD and rgmanager just sit in a zombie state and i have to poweroff or fence the node to get it to reboot.... The cluster never stabilizes itself until i cold boot both nodes. Then it is OK until the next reboot. I have read something about similar cases but did not find any magic solution! ;)

My cluster.conf is attached. There is no firewall running on the machines in question (chkconfig iptables off;).

Various output from the node that is rebooted:

Output from group_tool services:
type   level  name       id        state
fence  0      default    00000000  JOIN_STOP_WAIT [1 2]
dlm    1      rgmanager  00000000  JOIN_STOP_WAIT [1 2]

Output from group_tool fenced:
1210193027 our_nodeid 1 our_name node-16
1210193027 listen 4 member 5 groupd 7
1210193029 client 3: join default
1210193029 delay post_join 120s post_fail 0s
1210193029 added 2 nodes from ccs
1210193542 client 3: dump

Various output from the other node:

Output from group_tool services:
type   level  name       id        state
fence  0      default    00010002  JOIN_START_WAIT [1 2]
dlm    1      clvmd      00020002  none [2]
dlm    1      rgmanager  00030002  FAIL_ALL_STOPPED [1 2]

Output from group_tool dump fenced:
1210191957 our_nodeid 2 our_name node-17
1210191957 listen 4 member 5 groupd 7
1210191958 client 3: join default
1210191958 delay post_join 120s post_fail 0s
1210191958 added 2 nodes from ccs
1210191958 setid default 65538
1210191958 start default 1 members 2
1210191958 do_recovery stop 0 start 1 finish 0
1210191958 node "node-16" not a cman member, cn 1
1210191958 add first victim node-16
1210191959 node "node-16" not a cman member, cn 1
1210191960 node "node-16" not a cman member, cn 1
1210191961 node "node-16" not a cman member, cn 1
1210191962 node "node-16" not a cman member, cn 1
1210191963 node "node-16" not a cman member, cn 1
1210191964 node "node-16" not a cman member, cn 1
1210191965 node "node-16" not a cman member, cn 1
1210191966 node "node-16" not a cman member, cn 1
1210191967 node "node-16" not a cman member, cn 1
1210191968 node "node-16" not a cman member, cn 1
1210191969 node "node-16" not a cman member, cn 1
1210191970 node "node-16" not a cman member, cn 1
1210191971 node "node-16" not a cman member, cn 1
1210191972 node "node-16" not a cman member, cn 1
1210191973 node "node-16" not a cman member, cn 1
1210191974 reduce victim node-16
1210191974 delay of 16s leaves 0 victims
1210191974 finish default 1
1210191974 stop default
1210191974 start default 2 members 1 2
1210191974 do_recovery stop 1 start 2 finish 1
1210193633 client 3: dump

Thanks in advance.

Kær kveðja / Best Regards,
Finnur Örn Guðmundsson
Network Engineer - Network Operations
fog at t.is

TM Software
Urðarhvarf 6, IS-203 Kópavogur, Iceland
Tel: +354 545 3000 - fax +354 545 3610
www.tm-software.is

This e-mail message and any attachments are confidential and may be privileged. TM Software e-mail disclaimer: www.tm-software.is/disclaimer

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf Type: application/octet-stream Size: 2036 bytes Desc: cluster.conf URL: From garromo at us.ibm.com Wed May 7 22:36:34 2008 From: garromo at us.ibm.com (Gary Romo) Date: Wed, 7 May 2008 16:36:34 -0600 Subject: [Linux-cluster] Re: How do you verify/test fencing? In-Reply-To: <824ffea00805071114x34d6b1f1q304a60e1e52e541b@mail.gmail.com> Message-ID: We are using multicast address. I know how to physically test wether fencing is happening or not, but what commands can you ask the cluster to report on fencing? I don't see any, but wanted to double check with the group. Thanks. Gary Romo "Brandon Young" Sent by: linux-cluster-bounces at redhat.com 05/07/2008 12:14 PM Please respond to linux clustering To "linux clustering" cc Subject Re: [Linux-cluster] Re: How do you verify/test fencing? Unplug the heartbeat cable. On Wed, May 7, 2008 at 6:19 AM, Chris Picton wrote: On Tue, 06 May 2008 17:35:04 -0400, Lon Hohberger wrote: > On Tue, 2008-05-06 at 14:37 -0600, Gary Romo wrote: >> >> Is there a command that you can run to test/veryify that fencing is >> working properly? >> Or that it is part of the fence if you will? I realize that the primary >> focus of the fence is to shut off the other server(s). >> However, when I have a cluster up, how can I determine that all of my >> nodes are properly fenced? > > > * For testing whether or not fencing works, stop the cluster software on > all the nodes and run 'fence_node ' (where nodename is a host > you're not working on). > > * For testing whether or not a node will be fenced as a matter of > recovery, try 'cman_tool services'. If that node's ID isn't in the > "fence" section, it will not be fenced if it fails. These two step will not test that a node will be fenced automatically if it is malfunctioning. What can be done to a cluster node, to test that it will be automatically fenced if there is a problem. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From jas199931 at yahoo.com Wed May 7 23:41:36 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 7 May 2008 16:41:36 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <424492.84161.qm@web32205.mail.mud.yahoo.com> Message-ID: <311342.99837.qm@web32207.mail.mud.yahoo.com> --- Ja S wrote: > > > > > > A couple of further questions about the master > > copy of > > > lock resources. > > > > > > The first one: > > > ============= > > > > > > Again, assume: > > > 1) Node A is extremely too busy and handle all > > > requests > > > 2) other nodes are just idle and have never > > handled > > > any requests > > > > > > According to the documents, Node A will hold all > > > master copies initially. The thing I am not > aware > > of > > > and unclear is whether the lock manager will > > evenly > > > distribute the master copies on Node A to other > > nodes > > > when it thinks the number of master copies on > Node > > A > > > is too many? > > > > Locks are only remastered when a node leaves the > > cluster. In that case > > all of its nodes will be moved to another node. We > > do not do dynamic > > remastering - a resource that is mastered on one > > node will stay mastered > > on that node regardless of traffic or load, until > > all users of the > > resource have been freed. > > > Thank you very much. 
> > > > > > > The second one: > > > ============== > > > > > > Assume a master copy of lock resource is on Node > > A. > > > Now Node B holds a local copy of the lock > > resource. > > > When the lock queues changed on the local copy > on > > Node > > > B, will the master copy on Node A be updated > > > simultaneously? If so, when more than one nodes > > have > > > the local copy of the same lock resource, how > the > > lock > > > manager to handle the update of the master copy? > > Using > > > another lock mechanism to prevent the corruption > > of > > > the master copy? > > > > > > > All locking happens on the master node. The local > > copy is just that, a > > copy. It is updated when the master confirms what > > has happened. The > > local copy is there mainly for rebuilding the > > resource table when a > > master leaves the cluster, and to keep a track of > > locks that exist on > > the local node. The local copy is NOT complete. it > > only contains local > > users of a resource. > > > > Thanks again for the kind and detailed explanation. > > > I am sorry I have to bother you again as I am having > more questions. I analysed /proc/cluster/dlm_dir and > dlm_locks and found some strange things. Please see > below: > > > >From /proc/cluster/dlm_dir: > > In lock space [ABC]: > This node (node 2) has 445 lock resources in total > where > --328 master lock resources > --117 local copies of lock resources mastered on > other nodes. > > =============================== > =============================== > > > >From /proc/cluster/dlm_locks: > > In lock space [ABC]: > There are 1678 lock resouces in use where > --1674 lock resources are mastered by this node > (node > 2) > --4 lock resources are mastered by other nodes, > within which: > ----1 lock resource mastered on node 1 > ----1 lock resource mastered on node 3 > ----1 lock resource mastered on node 4 > ----1 lock resource mastered on node 5 > > A typical master lock resource in > /proc/cluster/dlm_locks is: > Resource 000001000de4fd88 (parent 0000000000000000). > Name (len=24) " 3 5fafc85" > Master Copy > LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00 > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > Granted Queue > 1ff5036d NL Remote: 4 000603e8 > 80d2013f NL Remote: 5 00040214 > 00240209 NL Remote: 3 0001031d > 00080095 NL Remote: 1 00040197 > 00010304 NL > Conversion Queue > Waiting Queue > > > After search for local copy in > /proc/cluster/dlm_locks, I got: > Resource 000001002a273618 (parent 0000000000000000). > Name (len=16) "withdraw 3......" > Local Copy, Master is node 3 > Granted Queue > 0004008d PR Master: 0001008c > Conversion Queue > Waiting Queue > > -- > Resource 000001003fe69b68 (parent 0000000000000000). > Name (len=16) "withdraw 5......" > Local Copy, Master is node 5 > Granted Queue > 819402ef PR Master: 00010317 > Conversion Queue > Waiting Queue > > -- > Resource 000001002a2732e8 (parent 0000000000000000). > Name (len=16) "withdraw 1......" > Local Copy, Master is node 1 > Granted Queue > 000401e9 PR Master: 00010074 > Conversion Queue > Waiting Queue > > -- > Resource 000001004a32e598 (parent 0000000000000000). > Name (len=16) "withdraw 4......" > Local Copy, Master is node 4 > Granted Queue > 1f5b0317 PR Master: 00010203 > Conversion Queue > Waiting Queue > > These four local copy of lock resources have been > staying in /proc/cluster/dlm_locks for several days. > > Now my questions: > 1. 
In my case, for the same lock space, the number > of > master lock resources reported by dlm_dir is much > SMALLER than that reported in dlm_locks. My > understanding is that master lock resources listed > in > dlm_dir must be larger than or at least the same as > that reported in dlm_locks. The situation I > discovered > on the node does not make any sense to me. Am I > missing anything? Can you help me to clarify the > case? I have found the answer. Yes, I did miss something. I need to sum all lock resources mastered by the node on all cluster members. In this case, the total number of lock resources mastered by the node is just 1674, which matches the number reported from dlm_locks. Sorry for asking the question without careful thinking. > 2. What can cause "withdraw ...." to be the lock > resource name? After read the gfs source code, it seems that this is caused by issuing a command like "gfs_tool withdraw ". However, I checked all command histroies on all nodes in the cluster, but did not find any command like this. This question and the next question remain open. Please help. > 3. These four local copy of lock resources have not > been released for at least serveral days as I knew. > How can I find out whether they are in a strange > dead > situation or are still waiting for the lock manager > to release them? How to change the timeout? > Thank you very much for your great further help in advance. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Thu May 8 07:21:31 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Thu, 08 May 2008 08:21:31 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <311342.99837.qm@web32207.mail.mud.yahoo.com> References: <311342.99837.qm@web32207.mail.mud.yahoo.com> Message-ID: <4822A9FB.50804@redhat.com> Ja S wrote: > --- Ja S wrote: > >>>> A couple of further questions about the master >>> copy of >>>> lock resources. >>>> >>>> The first one: >>>> ============= >>>> >>>> Again, assume: >>>> 1) Node A is extremely too busy and handle all >>>> requests >>>> 2) other nodes are just idle and have never >>> handled >>>> any requests >>>> >>>> According to the documents, Node A will hold all >>>> master copies initially. The thing I am not >> aware >>> of >>>> and unclear is whether the lock manager will >>> evenly >>>> distribute the master copies on Node A to other >>> nodes >>>> when it thinks the number of master copies on >> Node >>> A >>>> is too many? >>> Locks are only remastered when a node leaves the >>> cluster. In that case >>> all of its nodes will be moved to another node. We >>> do not do dynamic >>> remastering - a resource that is mastered on one >>> node will stay mastered >>> on that node regardless of traffic or load, until >>> all users of the >>> resource have been freed. >> >> Thank you very much. >> >> >>>> The second one: >>>> ============== >>>> >>>> Assume a master copy of lock resource is on Node >>> A. >>>> Now Node B holds a local copy of the lock >>> resource. >>>> When the lock queues changed on the local copy >> on >>> Node >>>> B, will the master copy on Node A be updated >>>> simultaneously? If so, when more than one nodes >>> have >>>> the local copy of the same lock resource, how >> the >>> lock >>>> manager to handle the update of the master copy? 
>>> Using >>>> another lock mechanism to prevent the corruption >>> of >>>> the master copy? >>>> >>> All locking happens on the master node. The local >>> copy is just that, a >>> copy. It is updated when the master confirms what >>> has happened. The >>> local copy is there mainly for rebuilding the >>> resource table when a >>> master leaves the cluster, and to keep a track of >>> locks that exist on >>> the local node. The local copy is NOT complete. it >>> only contains local >>> users of a resource. >>> >> Thanks again for the kind and detailed explanation. >> >> >> I am sorry I have to bother you again as I am having >> more questions. I analysed /proc/cluster/dlm_dir and >> dlm_locks and found some strange things. Please see >> below: >> >> >> >From /proc/cluster/dlm_dir: >> >> In lock space [ABC]: >> This node (node 2) has 445 lock resources in total >> where >> --328 master lock resources >> --117 local copies of lock resources mastered on >> other nodes. >> >> =============================== >> =============================== >> >> >> >From /proc/cluster/dlm_locks: >> >> In lock space [ABC]: >> There are 1678 lock resouces in use where >> --1674 lock resources are mastered by this node >> (node >> 2) >> --4 lock resources are mastered by other nodes, >> within which: >> ----1 lock resource mastered on node 1 >> ----1 lock resource mastered on node 3 >> ----1 lock resource mastered on node 4 >> ----1 lock resource mastered on node 5 >> >> A typical master lock resource in >> /proc/cluster/dlm_locks is: >> Resource 000001000de4fd88 (parent 0000000000000000). >> Name (len=24) " 3 5fafc85" >> Master Copy >> LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00 >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> Granted Queue >> 1ff5036d NL Remote: 4 000603e8 >> 80d2013f NL Remote: 5 00040214 >> 00240209 NL Remote: 3 0001031d >> 00080095 NL Remote: 1 00040197 >> 00010304 NL >> Conversion Queue >> Waiting Queue >> >> >> After search for local copy in >> /proc/cluster/dlm_locks, I got: >> Resource 000001002a273618 (parent 0000000000000000). >> Name (len=16) "withdraw 3......" >> Local Copy, Master is node 3 >> Granted Queue >> 0004008d PR Master: 0001008c >> Conversion Queue >> Waiting Queue >> >> -- >> Resource 000001003fe69b68 (parent 0000000000000000). >> Name (len=16) "withdraw 5......" >> Local Copy, Master is node 5 >> Granted Queue >> 819402ef PR Master: 00010317 >> Conversion Queue >> Waiting Queue >> >> -- >> Resource 000001002a2732e8 (parent 0000000000000000). >> Name (len=16) "withdraw 1......" >> Local Copy, Master is node 1 >> Granted Queue >> 000401e9 PR Master: 00010074 >> Conversion Queue >> Waiting Queue >> >> -- >> Resource 000001004a32e598 (parent 0000000000000000). >> Name (len=16) "withdraw 4......" >> Local Copy, Master is node 4 >> Granted Queue >> 1f5b0317 PR Master: 00010203 >> Conversion Queue >> Waiting Queue >> >> These four local copy of lock resources have been >> staying in /proc/cluster/dlm_locks for several days. >> >> Now my questions: >> 1. In my case, for the same lock space, the number >> of >> master lock resources reported by dlm_dir is much >> SMALLER than that reported in dlm_locks. My >> understanding is that master lock resources listed >> in >> dlm_dir must be larger than or at least the same as >> that reported in dlm_locks. The situation I >> discovered >> on the node does not make any sense to me. Am I >> missing anything? Can you help me to clarify the >> case? > > I have found the answer. Yes, I did miss something. 
I > need to sum all lock resources mastered by the node on > all cluster members. In this case, the total number of > lock resources mastered by the node is just 1674, > which matches the number reported from dlm_locks. > Sorry for asking the question without careful > thinking. > > >> 2. What can cause "withdraw ...." to be the lock >> resource name? > > After read the gfs source code, it seems that this is > caused by issuing a command like "gfs_tool withdraw > ". However, I checked all command > histroies on all nodes in the cluster, but did not > find any command like this. This question and the next > question remain open. Please help. You might like to ask GFS-specific questions on a new thread. I don't know about GFS and the people who do are probable not reading this one by now ;-) >> 3. These four local copy of lock resources have not >> been released for at least serveral days as I knew. >> How can I find out whether they are in a strange >> dead >> situation or are still waiting for the lock manager >> to release them? How to change the timeout? There is no lock timeout for local copies. If a lock is shown in dlm_locks then either the lock is active somewhere or you have found a bug! Bear in mind that this is a DLM response, GFS does cache locks but don't know the details. -- Chrissie From jas199931 at yahoo.com Thu May 8 08:03:48 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 01:03:48 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <4822A9FB.50804@redhat.com> Message-ID: <143345.94581.qm@web32208.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Ja S wrote: > > > >>>> A couple of further questions about the master > >>> copy of > >>>> lock resources. > >>>> > >>>> The first one: > >>>> ============= > >>>> > >>>> Again, assume: > >>>> 1) Node A is extremely too busy and handle all > >>>> requests > >>>> 2) other nodes are just idle and have never > >>> handled > >>>> any requests > >>>> > >>>> According to the documents, Node A will hold > all > >>>> master copies initially. The thing I am not > >> aware > >>> of > >>>> and unclear is whether the lock manager will > >>> evenly > >>>> distribute the master copies on Node A to other > >>> nodes > >>>> when it thinks the number of master copies on > >> Node > >>> A > >>>> is too many? > >>> Locks are only remastered when a node leaves the > >>> cluster. In that case > >>> all of its nodes will be moved to another node. > We > >>> do not do dynamic > >>> remastering - a resource that is mastered on one > >>> node will stay mastered > >>> on that node regardless of traffic or load, > until > >>> all users of the > >>> resource have been freed. > >> > >> Thank you very much. > >> > >> > >>>> The second one: > >>>> ============== > >>>> > >>>> Assume a master copy of lock resource is on > Node > >>> A. > >>>> Now Node B holds a local copy of the lock > >>> resource. > >>>> When the lock queues changed on the local copy > >> on > >>> Node > >>>> B, will the master copy on Node A be updated > >>>> simultaneously? If so, when more than one nodes > >>> have > >>>> the local copy of the same lock resource, how > >> the > >>> lock > >>>> manager to handle the update of the master > copy? > >>> Using > >>>> another lock mechanism to prevent the > corruption > >>> of > >>>> the master copy? > >>>> > >>> All locking happens on the master node. The > local > >>> copy is just that, a > >>> copy. It is updated when the master confirms > what > >>> has happened. 
The > >>> local copy is there mainly for rebuilding the > >>> resource table when a > >>> master leaves the cluster, and to keep a track > of > >>> locks that exist on > >>> the local node. The local copy is NOT complete. > it > >>> only contains local > >>> users of a resource. > >>> > >> Thanks again for the kind and detailed > explanation. > >> > >> > >> I am sorry I have to bother you again as I am > having > >> more questions. I analysed /proc/cluster/dlm_dir > and > >> dlm_locks and found some strange things. Please > see > >> below: > >> > >> > >> >From /proc/cluster/dlm_dir: > >> > >> In lock space [ABC]: > >> This node (node 2) has 445 lock resources in > total > >> where > >> --328 master lock resources > >> --117 local copies of lock resources mastered > on > >> other nodes. > >> > >> =============================== > >> =============================== > >> > >> > >> >From /proc/cluster/dlm_locks: > >> > >> In lock space [ABC]: > >> There are 1678 lock resouces in use where > >> --1674 lock resources are mastered by this node > >> (node > >> 2) > >> --4 lock resources are mastered by other > nodes, > >> within which: > >> ----1 lock resource mastered on node 1 > >> ----1 lock resource mastered on node 3 > >> ----1 lock resource mastered on node 4 > >> ----1 lock resource mastered on node 5 > >> > >> A typical master lock resource in > >> /proc/cluster/dlm_locks is: > >> Resource 000001000de4fd88 (parent > 0000000000000000). > >> Name (len=24) " 3 5fafc85" > >> Master Copy > >> LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 > 00 > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 > >> Granted Queue > >> 1ff5036d NL Remote: 4 000603e8 > >> 80d2013f NL Remote: 5 00040214 > >> 00240209 NL Remote: 3 0001031d > >> 00080095 NL Remote: 1 00040197 > >> 00010304 NL > >> Conversion Queue > >> Waiting Queue > >> > >> > >> After search for local copy in > >> /proc/cluster/dlm_locks, I got: > >> Resource 000001002a273618 (parent > 0000000000000000). > >> Name (len=16) "withdraw 3......" > >> Local Copy, Master is node 3 > >> Granted Queue > >> 0004008d PR Master: 0001008c > >> Conversion Queue > >> Waiting Queue > >> > >> -- > >> Resource 000001003fe69b68 (parent > 0000000000000000). > >> Name (len=16) "withdraw 5......" > >> Local Copy, Master is node 5 > >> Granted Queue > >> 819402ef PR Master: 00010317 > >> Conversion Queue > >> Waiting Queue > >> > >> -- > >> Resource 000001002a2732e8 (parent > 0000000000000000). > >> Name (len=16) "withdraw 1......" > >> Local Copy, Master is node 1 > >> Granted Queue > >> 000401e9 PR Master: 00010074 > >> Conversion Queue > >> Waiting Queue > >> > >> -- > >> Resource 000001004a32e598 (parent > 0000000000000000). > >> Name (len=16) "withdraw 4......" > >> Local Copy, Master is node 4 > >> Granted Queue > >> 1f5b0317 PR Master: 00010203 > >> Conversion Queue > >> Waiting Queue > >> > >> These four local copy of lock resources have been > >> staying in /proc/cluster/dlm_locks for several > days. > >> > >> Now my questions: > >> 1. In my case, for the same lock space, the > number > >> of > >> master lock resources reported by dlm_dir is much > >> SMALLER than that reported in dlm_locks. My > >> understanding is that master lock resources > listed > >> in > >> dlm_dir must be larger than or at least the same > as > >> that reported in dlm_locks. The situation I > >> discovered > >> on the node does not make any sense to me. Am I > >> missing anything? Can you help me to clarify the > >> case? > > > > I have found the answer. 
Yes, I did miss > something. I > > need to sum all lock resources mastered by the > node on > > all cluster members. In this case, the total > number of > > lock resources mastered by the node is just 1674, > > which matches the number reported from dlm_locks. > > Sorry for asking the question without careful > > thinking. > > > > > >> 2. What can cause "withdraw ...." to be the lock > >> resource name? > > > > After read the gfs source code, it seems that this > is > > caused by issuing a command like "gfs_tool > withdraw > > ". However, I checked all command > > histroies on all nodes in the cluster, but did not > > find any command like this. This question and the > next > > question remain open. Please help. > > > You might like to ask GFS-specific questions on a > new thread. I don't > know about GFS and the people who do are probable > not reading this one > by now ;-) > > > >> 3. These four local copy of lock resources have > not > >> been released for at least serveral days as I > knew. > >> How can I find out whether they are in a strange > >> dead > >> situation or are still waiting for the lock > manager > >> to release them? How to change the timeout? > > There is no lock timeout for local copies. If a lock > is shown in > dlm_locks then either the lock is active somewhere > or you have found a bug! > > Bear in mind that this is a DLM response, GFS does > cache locks but don't > know the details. >

Thank you for the information.

Best,
Jas

____________________________________________________________________________________
Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

From jas199931 at yahoo.com Thu May 8 08:49:05 2008
From: jas199931 at yahoo.com (Ja S)
Date: Thu, 8 May 2008 01:49:05 -0700 (PDT)
Subject: [Linux-cluster] GFS lock cache or bug?
Message-ID: <163785.26143.qm@web32208.mail.mud.yahoo.com>

Hi, All:

A long time ago I ran 'ls -la' just once on a subdirectory, which contains more than 30,000 small files, on a SAN storage, from Node 5, which sits in the cluster but does nothing. In other words, Node 5 is an idle node.

Now when I looked at /proc/cluster/dlm_locks on the node, I realised that there are many PR locks and the number of PR locks is pretty much the same as the number of files in the subdirectory I used to list.

Then I randomly picked some lock resources and converted the second part (hex number) of the name of the lock resources to decimal numbers, which are simply the inode numbers. Then I searched the subdirectory and confirmed that these inode numbers match the files in the subdirectory.

Now, my questions are:

1) how can I find out which unix command requires what kind of locks? Does the ls command really need a PR lock?

2) how long does GFS cache the locks?

3) whether we can configure the caching period?

4) if GFS should not cache the lock for so many days, then does it mean this is a bug?

5) Is there a way to find out which process requires a particular lock? Below is a typical record in dlm_locks on Node 5. Is any piece of information useful for identifying the process?

Resource d95d2ccc (parent 00000000). Name (len=24) " 5 cb5d35"
Local Copy, Master is node 1
Granted Queue
137203da PR Master: 73980279
Conversion Queue
Waiting Queue

6) If I am sure that no processes or applications are accessing the subdirectory, then how can I force GFS to release these PR locks so that DLM can release the corresponding lock resources as well?
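(For what it is worth, question 6 is what the glock trimming tunables discussed later in this thread address; a minimal sketch, assuming the filesystem is mounted at /mnt/abc and that the GFS module is new enough to have glock_purge, which only appears in RHEL 4.6 / RHEL 5.1 and later errata:)

    gfs_tool gettune /mnt/abc | grep -E 'demote_secs|glock_purge'
    gfs_tool settune /mnt/abc demote_secs 60   # demote unused glocks after 60s instead of the default 300
    gfs_tool settune /mnt/abc glock_purge 50   # ask gfs_scand to trim roughly 50% of unused glocks per pass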
Thank you very much for reading the questions and look forward to hearing from you. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From denisb+gmane at gmail.com Thu May 8 11:05:58 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 08 May 2008 13:05:58 +0200 Subject: [Linux-cluster] Re: How do you verify/test fencing? In-Reply-To: References: Message-ID: Gary Romo wrote: > > Is there a command that you can run to test/veryify that fencing is > working properly? > Or that it is part of the fence if you will? > I realize that the primary focus of the fence is to shut off the other > server(s). > However, when I have a cluster up, how can I determine that all of my > nodes are properly fenced? well.. If you would like to check that fencing devices are properly configured via cluster.conf, issue fence_node NODENAME pr. node. with all cluster services running. Of course, you would ideally want to do this one node at a time. This will ensure cluster.conf has proper fencing setup and that the fencing devices actually work. If the node isn't locked out of the cluster when you issue fence_node then there is something wrong.. -- Denis From s.wendy.cheng at gmail.com Thu May 8 13:28:22 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 08 May 2008 09:28:22 -0400 Subject: [Linux-cluster] GFS lock cache or bug? In-Reply-To: <163785.26143.qm@web32208.mail.mud.yahoo.com> References: <163785.26143.qm@web32208.mail.mud.yahoo.com> Message-ID: <4822FFF6.4000309@gmail.com> Ja S wrote: > Hi, All: > I have an old write-up about GFS lock cache issues. Shareroot people had pulled it into their web site: http://open-sharedroot.org/Members/marc/blog/blog-on-gfs/glock-trimming-patch/?searchterm=gfs It should explain some of your confusions. The tunables described in that write-up are formally included into RHEL 5.1 and RHEL 4.6 right now (so no need to ask for private patches). There is a long story about GFS(1)'s "ls -la" problem that one time I did plan to do something about it. Unfortunately I'm having a new job now so the better bet is probably going for GFS2. Will pass some thoughts about GFS1's "ls -la" when I have some spare time next week. -- Wendy > I used to 'ls -la' a subdirecotry, which contains more > than 30,000 small files, on a SAN storage long time > ago just once from Node 5, which sits in the cluster > but does nothing. In other words, Node 5 is an idel > node. > > Now when I looked at /proc/cluster/dlm_locks on the > node, I realised that there are many PR locks and the > number of PR clocks is pretty much the same as the > number of files in the subdirectory I used to list. > > Then I randomly picked up some lock resources and > converted the second part (hex number) of the name of > the lock resources to decimal numbers, which are > simply the inode numbers. Then I searched the > subdirectory and confirmed that these inode numbers > match the files in the subdirectory. > > > Now, my questions are: > > 1) how can I find out which unix command requires what > kind of locks? Does the ls command really need PR > lock? > > 2) how long GFS caches the locks? > > 3) whether we can configure the caching period? > > 4) if GFS should not cache the lock for so many days, > then does it mean this is a bug? > > 5) Is that a way to find out which process requires a > particular lock? 
Below is a typical record in > dlm_locks on Node 5. Is any piece of information > useful for identifing the process? > > Resource d95d2ccc (parent 00000000). Name (len=24) " > 5 cb5d35" > Local Copy, Master is node 1 > Granted Queue > 137203da PR Master: 73980279 > Conversion Queue > Waiting Queue > > > 6) If I am sure that no processes or applications are > accessing the subdirectory, then how I can force GFS > release these PR locks so that DLM can release the > corresponding lock resources as well. > > > Thank you very much for reading the questions and look > forward to hearing from you. > > Jas > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From smeacham at charter.net Thu May 8 13:51:47 2008 From: smeacham at charter.net (smeacham at charter.net) Date: Thu, 8 May 2008 13:51:47 +0000 Subject: [Linux-cluster] GFS lock cache or bug? In-Reply-To: <4822FFF6.4000309@gmail.com> References: <163785.26143.qm@web32208.mail.mud.yahoo.com><4822FFF6.4000309@gmail.com> Message-ID: <642289788-1210254609-cardhu_decombobulator_blackberry.rim.net-2131772402-@bxe151.bisx.prod.on.blackberry> Sent via BlackBerry by AT&T -----Original Message----- From: Wendy Cheng Date: Thu, 08 May 2008 09:28:22 To:linux clustering Subject: Re: [Linux-cluster] GFS lock cache or bug? Ja S wrote: > Hi, All: > I have an old write-up about GFS lock cache issues. Shareroot people had pulled it into their web site: http://open-sharedroot.org/Members/marc/blog/blog-on-gfs/glock-trimming-patch/?searchterm=gfs It should explain some of your confusions. The tunables described in that write-up are formally included into RHEL 5.1 and RHEL 4.6 right now (so no need to ask for private patches). There is a long story about GFS(1)'s "ls -la" problem that one time I did plan to do something about it. Unfortunately I'm having a new job now so the better bet is probably going for GFS2. Will pass some thoughts about GFS1's "ls -la" when I have some spare time next week. -- Wendy > I used to 'ls -la' a subdirecotry, which contains more > than 30,000 small files, on a SAN storage long time > ago just once from Node 5, which sits in the cluster > but does nothing. In other words, Node 5 is an idel > node. > > Now when I looked at /proc/cluster/dlm_locks on the > node, I realised that there are many PR locks and the > number of PR clocks is pretty much the same as the > number of files in the subdirectory I used to list. > > Then I randomly picked up some lock resources and > converted the second part (hex number) of the name of > the lock resources to decimal numbers, which are > simply the inode numbers. Then I searched the > subdirectory and confirmed that these inode numbers > match the files in the subdirectory. > > > Now, my questions are: > > 1) how can I find out which unix command requires what > kind of locks? Does the ls command really need PR > lock? > > 2) how long GFS caches the locks? > > 3) whether we can configure the caching period? > > 4) if GFS should not cache the lock for so many days, > then does it mean this is a bug? > > 5) Is that a way to find out which process requires a > particular lock? Below is a typical record in > dlm_locks on Node 5. Is any piece of information > useful for identifing the process? 
> > Resource d95d2ccc (parent 00000000). Name (len=24) " > 5 cb5d35" > Local Copy, Master is node 1 > Granted Queue > 137203da PR Master: 73980279 > Conversion Queue > Waiting Queue > > > 6) If I am sure that no processes or applications are > accessing the subdirectory, then how I can force GFS > release these PR locks so that DLM can release the > corresponding lock resources as well. > > > Thank you very much for reading the questions and look > forward to hearing from you. > > Jas > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From jas199931 at yahoo.com Thu May 8 14:28:52 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 07:28:52 -0700 (PDT) Subject: [Linux-cluster] GFS lock cache or bug? In-Reply-To: <4822FFF6.4000309@gmail.com> Message-ID: <109539.63539.qm@web32205.mail.mud.yahoo.com> Hi Wendy: Thank you very much for the kind answer. Unfortunately, I am using Red Hat Enterprise Linux WS release 4 (Nahant Update 5) 2.6.9-42.ELsmp. When I ran gfs_tool gettune /mnt/ABC, I got: ilimit1 = 100 ilimit1_tries = 3 ilimit1_min = 1 ilimit2 = 500 ilimit2_tries = 10 ilimit2_min = 3 demote_secs = 300 incore_log_blocks = 1024 jindex_refresh_secs = 60 depend_secs = 60 scand_secs = 5 recoverd_secs = 60 logd_secs = 1 quotad_secs = 5 inoded_secs = 15 quota_simul_sync = 64 quota_warn_period = 10 atime_quantum = 3600 quota_quantum = 60 quota_scale = 1.0000 (1, 1) quota_enforce = 1 quota_account = 1 new_files_jdata = 0 new_files_directio = 0 max_atomic_write = 4194304 max_readahead = 262144 lockdump_size = 131072 stall_secs = 600 complain_secs = 10 reclaim_limit = 5000 entries_per_readdir = 32 prefetch_secs = 10 statfs_slots = 64 max_mhc = 10000 greedy_default = 100 greedy_quantum = 25 greedy_max = 250 rgrp_try_threshold = 100 There is no glock_purge option. I will try to tune demote_secs, but I don't think it will fix 'ls -la' issue. By the way, could you please kindly direct me to a place where I can find detailed explanations of these tunable options? Best, Jas --- Wendy Cheng wrote: > Ja S wrote: > > Hi, All: > > > > I have an old write-up about GFS lock cache issues. > Shareroot people had > pulled it into their web site: > http://open-sharedroot.org/Members/marc/blog/blog-on-gfs/glock-trimming-patch/?searchterm=gfs > > It should explain some of your confusions. The > tunables described in > that write-up are formally included into RHEL 5.1 > and RHEL 4.6 right now > (so no need to ask for private patches). > > There is a long story about GFS(1)'s "ls -la" > problem that one time I > did plan to do something about it. Unfortunately I'm > having a new job > now so the better bet is probably going for GFS2. > > Will pass some thoughts about GFS1's "ls -la" when I > have some spare > time next week. > > -- Wendy > > > I used to 'ls -la' a subdirecotry, which contains > more > > than 30,000 small files, on a SAN storage long > time > > ago just once from Node 5, which sits in the > cluster > > but does nothing. In other words, Node 5 is an > idel > > node. 
> > > > Now when I looked at /proc/cluster/dlm_locks on > the > > node, I realised that there are many PR locks and > the > > number of PR clocks is pretty much the same as the > > number of files in the subdirectory I used to > list. > > > > Then I randomly picked up some lock resources and > > converted the second part (hex number) of the name > of > > the lock resources to decimal numbers, which are > > simply the inode numbers. Then I searched the > > subdirectory and confirmed that these inode > numbers > > match the files in the subdirectory. > > > > > > Now, my questions are: > > > > 1) how can I find out which unix command requires > what > > kind of locks? Does the ls command really need PR > > lock? > > > > 2) how long GFS caches the locks? > > > > 3) whether we can configure the caching period? > > > > 4) if GFS should not cache the lock for so many > days, > > then does it mean this is a bug? > > > > 5) Is that a way to find out which process > requires a > > particular lock? Below is a typical record in > > dlm_locks on Node 5. Is any piece of information > > useful for identifing the process? > > > > Resource d95d2ccc (parent 00000000). Name (len=24) > " > > 5 cb5d35" > > Local Copy, Master is node 1 > > Granted Queue > > 137203da PR Master: 73980279 > > Conversion Queue > > Waiting Queue > > > > > > 6) If I am sure that no processes or applications > are > > accessing the subdirectory, then how I can force > GFS > > release these PR locks so that DLM can release the > > corresponding lock resources as well. > > > > > > Thank you very much for reading the questions and > look > > forward to hearing from you. > > > > Jas > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From s.wendy.cheng at gmail.com Thu May 8 15:05:43 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 08 May 2008 11:05:43 -0400 Subject: [Linux-cluster] GFS lock cache or bug? In-Reply-To: <109539.63539.qm@web32205.mail.mud.yahoo.com> References: <109539.63539.qm@web32205.mail.mud.yahoo.com> Message-ID: <482316C7.2000606@gmail.com> Ja S wrote: > Hi Wendy: > > Thank you very much for the kind answer. > > Unfortunately, I am using Red Hat Enterprise Linux WS > release 4 (Nahant Update 5) 2.6.9-42.ELsmp. > > When I ran gfs_tool gettune /mnt/ABC, I got: > [snip] .. > > > There is no glock_purge option. I will try to tune > demote_secs, but I don't think it will fix 'ls -la' > issue. > No, it will not. Don't waste your time. Will try to explain this more whenever I get a chance (but not right now). > > By the way, could you please kindly direct me to a > place where I can find detailed explanations of these > tunable options? > > > There is one called readme.gfs_tune - in theory, it is in: http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_tune. Just check few minutes ago ... 
my people page seems to have become Bob Peterson's people page but large amount of my old write-ups and unpublished patches still there. So if you type "wcheng", you probably will get "rpeterso" - contents are mostly the same though.

There are also few GFS1/GFS2/NFS patches, as well as the detailed NFS over GFS documents, GFS glock write-ups, etc, in the (people's page) "Patches" and "Project" directories. Feel free to peek and/or try them out (but I suspect they'll disappear soon). On the other hand, if GFS2 is out in time, there is really no point to mess around with GFS1 any more - it is old and outdated anyway.

-- Wendy

From pbruna at it-linux.cl Thu May 8 16:26:10 2008
From: pbruna at it-linux.cl (Patricio A. Bruna)
Date: Thu, 8 May 2008 12:26:10 -0400 (CLT)
Subject: [Linux-cluster] script.sh : status & monitor
In-Reply-To: <1206556464.4684.111.camel@ayanami.boston.devel.redhat.com>
Message-ID: <32156755.38501210263970324.JavaMail.root@lisa.itlinux.cl>

Hi,

What result does Cluster Suite wait for when it executes /etc/init.d/xxx status? I guess it is a value from RETVAL; if so, which one would be OK and which Failed?

Thanks

------------------------------------
Patricio Bruna V.
IT Linux Ltda.
http://www.it-linux.cl
Fono : (+56-2) 333 0578
Móvil : (+56-09) 8827 0342

----- "Lon Hohberger" escribió:

On Wed, 2008-03-26 at 14:01 +0100, Alain Moulle wrote:
> Hi
>
> In script.sh we can see these two lines :
>
>
> Right now, nothing.
> I guess one is the periodic status call on services launched
> by the Cluster Suite but which one ?

status is the one you want to look for in your script (think: What would a SysV init script do?) :)

-- Lon

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rpeterso at redhat.com Thu May 8 16:23:20 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Thu, 08 May 2008 11:23:20 -0500
Subject: [Linux-cluster] GFS lock cache or bug?
In-Reply-To: <482316C7.2000606@gmail.com>
References: <109539.63539.qm@web32205.mail.mud.yahoo.com> <482316C7.2000606@gmail.com>
Message-ID: <1210263800.2764.9.camel@technetium.msp.redhat.com>

Hi,

On Thu, 2008-05-08 at 11:05 -0400, Wendy Cheng wrote:
> Just check few minutes ago ... my people page seems to have become Bob
> Peterson's people page but large amount of my old write-ups and
> unpublished patches still there. So if you type "wcheng", you probably
> will get "rpeterso" - contents are mostly the same though.

I haven't removed anything, so all of Wendy's patches and contents are still there.

> There are also few GFS1/GFS2/NFS patches, as well as the detailed NFS
> over GFS documents, GFS glock write-ups, etc, in the (people's page)
> "Patches" and "Project" directories. Feel free to peek and/or try them
> out (but I suspect they'll disappear soon).

I don't have any plans to make any of Wendy's patches or content disappear. In fact, one of the reasons I wanted it moved under my name was to safeguard it since Wendy left Red Hat. I didn't want some Red Hat administrator to say, "What's this wcheng people page? She doesn't work here anymore; let's delete it all." This way it's safe.

Regards,

Bob Peterson
Red Hat Clustering & GFS

From dist-list at LEXUM.UMontreal.CA Thu May 8 17:09:32 2008
From: dist-list at LEXUM.UMontreal.CA (FM)
Date: Thu, 08 May 2008 13:09:32 -0400
Subject: [Linux-cluster] network best practice for cluster?
Message-ID: <482333CC.20104@lexum.umontreal.ca>

Hello,

We read a lot about GFS tuning, number of nodes, etc. But how about the network infrastructure? Is a separate network/VLAN for dlm the way to go? Do you tune the network stack to speed up dlm?

In my server room, it is very simple: GFS-1, 2 directors behind the firewall (using NAT), and 5 nodes behind them with 2 NICs (using bonding). All requests and dlm network traffic use the same network (and the same bonded card). The network is Gigabit.

I am very curious to know the best practice! The cluster is working great (especially since the introduction of the glock_purge parameter), but there is always room for improvement!

regards,

From lhh at redhat.com Thu May 8 19:44:27 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 08 May 2008 15:44:27 -0400
Subject: [Linux-cluster] network best practice for cluster?
In-Reply-To: <482333CC.20104@lexum.umontreal.ca>
References: <482333CC.20104@lexum.umontreal.ca>
Message-ID: <1210275867.4582.14.camel@ayanami.boston.devel.redhat.com>

On Thu, 2008-05-08 at 13:09 -0400, FM wrote:
> Hello,
>
> We read a lot of gfs tuning, number of nodes, etc. But how about the
> network infrastructure ?
>
> is a separate network/vlan for dlm is the way to go ? Do you tune the
> network stack to speed dlm ?

The cluster (generally) including the DLM and fencing should be on a separate network from other hosts if possible. I don't know that a special network just for DLM traffic is necessary.

-- Lon

From jas199931 at yahoo.com Thu May 8 21:27:00 2008
From: jas199931 at yahoo.com (Ja S)
Date: Thu, 8 May 2008 14:27:00 -0700 (PDT)
Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for?
Message-ID: <620879.5230.qm@web32201.mail.mud.yahoo.com>

Hi, All:

I used to post this question before, but have not received any comments yet. Please allow me to post it again.

I have a subdirectory containing more than 30,000 small files on a SAN storage (GFS1+DLM, RAID10). No user application knows the existence of the subdirectory. In other words, the subdirectory is free of access.

However, it took ages to list the subdirectory on an absolutely idle cluster node. See below:

# time ls -la | wc -l
31767

real    3m5.249s
user    0m0.628s
sys     0m5.137s

About 3 minutes are spent somewhere. Does anyone have any clue what the system was waiting for?

Thanks for your time; I hope to see your valuable comments soon.

Jas

____________________________________________________________________________________
Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

From gordan at bobich.net Thu May 8 21:38:39 2008
From: gordan at bobich.net (Gordan Bobic)
Date: Thu, 08 May 2008 22:38:39 +0100
Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for?
In-Reply-To: <620879.5230.qm@web32201.mail.mud.yahoo.com>
References: <620879.5230.qm@web32201.mail.mud.yahoo.com>
Message-ID: <482372DF.9000706@bobich.net>

30K files?! That'll take a while even on a local file system.

Gordan

Ja S wrote:
> Hi, All:
>
> I used to post this question before, but have not
> received any comments yet. Please allow me post it
> again.
>
> I have a subdirectory containing more than 30,000
> small files on a SAN storage (GFS1+DLM, RAID10). No
> user application knows the existence of the
> subdirectory. In other words, the subdirectory is free
> of accessing.
>
> However, it took ages to list the subdirectory on an
> absolute idle cluster node.
See below: > > # time ls -la | wc -l > 31767 > > real 3m5.249s > user 0m0.628s > sys 0m5.137s > > There are about 3 minutes spent on somewhere. Does > anyone have any clue what the system was waiting for? > > > Thanks for your time and wish to see your valuable > comments soon. > > Jas > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From s.wendy.cheng at gmail.com Thu May 8 21:51:03 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 08 May 2008 17:51:03 -0400 Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <620879.5230.qm@web32201.mail.mud.yahoo.com> References: <620879.5230.qm@web32201.mail.mud.yahoo.com> Message-ID: <482375C7.2060207@gmail.com> Ja S wrote: > Hi, All: > > I used to post this question before, but have not > received any comments yet. Please allow me post it > again. > > I have a subdirectory containing more than 30,000 > small files on a SAN storage (GFS1+DLM, RAID10). No > user application knows the existence of the > subdirectory. In other words, the subdirectory is free > of accessing. > Short answer is to remember "ls" and "ls -la" are very different commands. "ls" is a directory read (that reads from one single file) but "ls -la" needs to get file attributes (file size, modification times, ownership, etc) from *each* of the files from the subject directory. In your case, it needs to read more than 30,000 inodes to get them. The "ls -la" is slower for *any* filesystem but particularly troublesome for a cluster filesystem such as GFS due to: 1. Cluster locking overheads (it needs readlocks from *each* of the files involved). 2. Depending on when and how these files are created. During file creation time and if there are lock contentions, GFS has a tendency to spread the file locations all over the disk. 3. You use iscsi such that dlm lock traffic and file block access are on the same fabric ? If this is true, you will more or less serialize the lock access. Hope above short answer will ease your confusion. -- Wendy > However, it took ages to list the subdirectory on an > absolute idle cluster node. See below: > > # time ls -la | wc -l > 31767 > > real 3m5.249s > user 0m0.628s > sys 0m5.137s > > There are about 3 minutes spent on somewhere. Does > anyone have any clue what the system was waiting for? > > > Thanks for your time and wish to see your valuable > comments soon. > > Jas > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jas199931 at yahoo.com Thu May 8 22:29:41 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 15:29:41 -0700 (PDT) Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <482375C7.2060207@gmail.com> Message-ID: <792305.60086.qm@web32203.mail.mud.yahoo.com> Hi, Wendy: Thanks for your so prompt and kind explanation. It is very helpful. According to your comments, I did another test. 
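(As an aside, one way to see the per-file cost Wendy describes is to count the syscalls that 'ls -la' makes; a rough sketch, assuming strace is installed on the node and the directory is mounted at /mnt/abc:)

    strace -c ls -la /mnt/abc > /dev/null
    # The summary printed on stderr should show roughly one lstat() per directory entry,
    # and on GFS each first-time stat may also mean a DLM lookup on the wire.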
See below:

# stat abc/
  File: `abc/'
  Size: 8192          Blocks: 6024       IO Block: 4096   directory
Device: fc00h/64512d  Inode: 1065226     Links: 2
Access: (0770/drwxrwx---)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2008-05-08 06:18:58.000000000 +0000
Modify: 2008-04-15 03:02:24.000000000 +0000
Change: 2008-04-15 07:11:52.000000000 +0000

# cd abc/
# time ls | wc -l
31764

real    0m44.797s
user    0m0.189s
sys     0m2.276s

The real time in this test is much shorter than the previous one. However, it is still reasonably long. As you said, the 'ls' command only reads the single directory file. In my case, the directory file itself is only 8192 bytes. The time spent on disk IO should be included in 'sys 0m2.276s'. Although DLM needs time to look up the location of the corresponding master lock resource and to process locking, the system should not take about 42 seconds to complete the 'ls' command.

So, what is the hidden issue, or is there a way to identify possible bottlenecks?

Great thanks in advance.

Jas

--- Wendy Cheng wrote: > Ja S wrote: > > Hi, All: > > > > I used to post this question before, but have not > > received any comments yet. Please allow me post it > > again. > > > > I have a subdirectory containing more than 30,000 > > small files on a SAN storage (GFS1+DLM, RAID10). > No > > user application knows the existence of the > > subdirectory. In other words, the subdirectory is > free > > of accessing. > > > Short answer is to remember "ls" and "ls -la" are > very different > commands. "ls" is a directory read (that reads from > one single file) but > "ls -la" needs to get file attributes (file size, > modification times, > ownership, etc) from *each* of the files from the > subject directory. In > your case, it needs to read more than 30,000 inodes > to get them. The "ls > -la" is slower for *any* filesystem but particularly > troublesome for a > cluster filesystem such as GFS due to: > > 1. Cluster locking overheads (it needs readlocks > from *each* of the > files involved). > 2. Depending on when and how these files are > created. During file > creation time and if there are lock contentions, GFS > has a tendency to > spread the file locations all over the disk. > 3. You use iscsi such that dlm lock traffic and file > block access are on > the same fabric ? If this is true, you will more or > less serialize the > lock access. > > Hope above short answer will ease your confusion. > > -- Wendy > > However, it took ages to list the subdirectory on > an > > absolute idle cluster node. See below: > > > > # time ls -la | wc -l > > 31767 > > > > real 3m5.249s > > user 0m0.628s > > sys 0m5.137s > > > > There are about 3 minutes spent on somewhere. Does > > anyone have any clue what the system was waiting > for? > > > > > > Thanks for your time and wish to see your valuable > > comments soon. > > > > Jas > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now.
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From rpeterso at redhat.com Thu May 8 22:29:20 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 08 May 2008 17:29:20 -0500 Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <620879.5230.qm@web32201.mail.mud.yahoo.com> References: <620879.5230.qm@web32201.mail.mud.yahoo.com> Message-ID: <1210285760.2764.37.camel@technetium.msp.redhat.com> On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: > Hi, All: > > I used to post this question before, but have not > received any comments yet. Please allow me post it > again. > > I have a subdirectory containing more than 30,000 > small files on a SAN storage (GFS1+DLM, RAID10). No > user application knows the existence of the > subdirectory. In other words, the subdirectory is free > of accessing. > > However, it took ages to list the subdirectory on an > absolute idle cluster node. See below: > > # time ls -la | wc -l > 31767 > > real 3m5.249s > user 0m0.628s > sys 0m5.137s > > There are about 3 minutes spent on somewhere. Does > anyone have any clue what the system was waiting for? > > > Thanks for your time and wish to see your valuable > comments soon. > > Jas Hi Jas, I believe the answer to your question is in the FAQ: http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow Regards, Bob Peterson Red Hat Clustering & GFS From jas199931 at yahoo.com Thu May 8 22:44:12 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 15:44:12 -0700 (PDT) Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <482372DF.9000706@bobich.net> Message-ID: <598939.9469.qm@web32202.mail.mud.yahoo.com> --- Gordan Bobic wrote: > 30K files?! > That'll take a while even on a local file system. Not really. Last week I made a copy of the directory on the local hard disk (ext3). See the test results for both "ls" and "ls -la" commands: # time ls -la | wc -l 31767 real 0m2.967s user 0m0.627s sys 0m0.689s # time ls | wc -l 31764 real 0m1.508s user 0m0.262s sys 0m0.082s Comparing with the results in my previous email, does it indicate that GFS is not designed for sharing huge number of small files? I heard that GFS is originally designed for sharing small number of larger files. Is that true? If so, could you please kindly suggest a file system which can handle huge number of concurrent requests on many many number of small files? Thanks for your interest. Jas > > Gordan > > Ja S wrote: > > Hi, All: > > > > I used to post this question before, but have not > > received any comments yet. Please allow me post it > > again. > > > > I have a subdirectory containing more than 30,000 > > small files on a SAN storage (GFS1+DLM, RAID10). > No > > user application knows the existence of the > > subdirectory. In other words, the subdirectory is > free > > of accessing. > > > > However, it took ages to list the subdirectory on > an > > absolute idle cluster node. See below: > > > > # time ls -la | wc -l > > 31767 > > > > real 3m5.249s > > user 0m0.628s > > sys 0m5.137s > > > > There are about 3 minutes spent on somewhere. Does > > anyone have any clue what the system was waiting > for? > > > > > > Thanks for your time and wish to see your valuable > > comments soon. > > > > Jas > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. 
> http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From andrew at ntsg.umt.edu Thu May 8 22:52:18 2008 From: andrew at ntsg.umt.edu (Andrew A. Neuschwander) Date: Thu, 8 May 2008 16:52:18 -0600 (MDT) Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <1210285760.2764.37.camel@technetium.msp.redhat.com> References: <620879.5230.qm@web32201.mail.mud.yahoo.com> <1210285760.2764.37.camel@technetium.msp.redhat.com> Message-ID: <40809.10.8.105.69.1210287138.squirrel@secure.ntsg.umt.edu> I've looked at this problem a bit as well. My system is a 4Gb FC SAN with a bonded GigE DLM dedicated network. Stat'ing 30,000 files in 3 minutes on GFS isn't unreasonable considering that it must get and release the gfs locks. In this scenario, you are averaging about 6ms per file stat. When we did our tests, all of our subsystems (FC, Net, CPU, Memory, Disk) were near idle. I think the 6ms is simply the accumulated latency of all the subsystems involved. There is a lot of work happening in that short period of time. -A -- Andrew A. Neuschwander, RHCE Linux Systems/Software Engineer College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 On Thu, May 8, 2008 4:29 pm, Bob Peterson wrote: > On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: >> Hi, All: >> >> I used to post this question before, but have not >> received any comments yet. Please allow me post it >> again. >> >> I have a subdirectory containing more than 30,000 >> small files on a SAN storage (GFS1+DLM, RAID10). No >> user application knows the existence of the >> subdirectory. In other words, the subdirectory is free >> of accessing. >> >> However, it took ages to list the subdirectory on an >> absolute idle cluster node. See below: >> >> # time ls -la | wc -l >> 31767 >> >> real 3m5.249s >> user 0m0.628s >> sys 0m5.137s >> >> There are about 3 minutes spent on somewhere. Does >> anyone have any clue what the system was waiting for? >> >> >> Thanks for your time and wish to see your valuable >> comments soon. >> >> Jas > > Hi Jas, > > I believe the answer to your question is in the FAQ: > > http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow > > Regards, > > Bob Peterson > Red Hat Clustering & GFS > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From jas199931 at yahoo.com Fri May 9 06:37:34 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 23:37:34 -0700 (PDT) Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <40809.10.8.105.69.1210287138.squirrel@secure.ntsg.umt.edu> Message-ID: <388070.61398.qm@web32206.mail.mud.yahoo.com> Hi, Andrew: Thank you very much for the help. Yes, your explanation really makes sense. I buy it. But I would like to discuss it a little bit further. The following message was part of my previous reply to Wendy. Just paste it here for your convenience. 
# stat abc/ File: `abc/' Size: 8192 Blocks: 6024 IO Block: 4096 directory Device: fc00h/64512d Inode: 1065226 Links: 2 Access: (0770/drwxrwx---) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2008-05-08 06:18:58.000000000 +0000 Modify: 2008-04-15 03:02:24.000000000 +0000 Change: 2008-04-15 07:11:52.000000000 +0000 # cd abc/ # time ls | wc -l 31764 real 0m44.797s user 0m0.189s sys 0m2.276s >From the test results, it seems that the system really only used 2.276 seconds to perform the disk IO, read the directory and count the number of files. I am not sure whether I missed anything or not. I really cannot understand how the system took about 42 seconds to process the lock on the single directory. Any further comments? Thanks again in advance, Jas --- "Andrew A. Neuschwander" wrote: > I've looked at this problem a bit as well. My system > is a 4Gb FC SAN with > a bonded GigE DLM dedicated network. Stat'ing 30,000 > files in 3 minutes on > GFS isn't unreasonable considering that it must get > and release the gfs > locks. In this scenario, you are averaging about 6ms > per file stat. When > we did our tests, all of our subsystems (FC, Net, > CPU, Memory, Disk) were > near idle. I think the 6ms is simply the accumulated > latency of all the > subsystems involved. There is a lot of work > happening in that short period > of time. > > -A > -- > Andrew A. Neuschwander, RHCE > Linux Systems/Software Engineer > College of Forestry and Conservation > The University of Montana > http://www.ntsg.umt.edu > andrew at ntsg.umt.edu - 406.243.6310 > > > On Thu, May 8, 2008 4:29 pm, Bob Peterson wrote: > > On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: > >> Hi, All: > >> > >> I used to post this question before, but have not > >> received any comments yet. Please allow me post > it > >> again. > >> > >> I have a subdirectory containing more than 30,000 > >> small files on a SAN storage (GFS1+DLM, RAID10). > No > >> user application knows the existence of the > >> subdirectory. In other words, the subdirectory is > free > >> of accessing. > >> > >> However, it took ages to list the subdirectory on > an > >> absolute idle cluster node. See below: > >> > >> # time ls -la | wc -l > >> 31767 > >> > >> real 3m5.249s > >> user 0m0.628s > >> sys 0m5.137s > >> > >> There are about 3 minutes spent on somewhere. > Does > >> anyone have any clue what the system was waiting > for? > >> > >> > >> Thanks for your time and wish to see your > valuable > >> comments soon. > >> > >> Jas > > > > Hi Jas, > > > > I believe the answer to your question is in the > FAQ: > > > > > http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow > > > > Regards, > > > > Bob Peterson > > Red Hat Clustering & GFS > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From l.dardini at comune.prato.it Fri May 9 07:15:54 2008 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Fri, 9 May 2008 09:15:54 +0200 Subject: R: [Linux-cluster] Why GFS is so slow? What it is waiting for? 
In-Reply-To: <388070.61398.qm@web32206.mail.mud.yahoo.com> References: <40809.10.8.105.69.1210287138.squirrel@secure.ntsg.umt.edu> <388070.61398.qm@web32206.mail.mud.yahoo.com> Message-ID: <6F861500A5092B4C8CD653DE20A4AA0D60641D@exchange3.comune.prato.local> Just remember to disable atime on the GFS volume. If atime is enabled maybe there is the lock contention for the writing of this info if multiple clients "read" the directory. Leandro > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di Ja S > Inviato: venerd? 9 maggio 2008 8.38 > A: linux clustering > Oggetto: Re: [Linux-cluster] Why GFS is so slow? What it is > waiting for? > > Hi, Andrew: > > Thank you very much for the help. Yes, your explanation > really makes sense. I buy it. > > But I would like to discuss it a little bit further. > The following message was part of my previous reply to Wendy. > Just paste it here for your convenience. > > > # stat abc/ > File: `abc/' > Size: 8192 Blocks: 6024 IO Block: > 4096 directory > Device: fc00h/64512d Inode: 1065226 Links: 2 > Access: (0770/drwxrwx---) Uid: ( 0/ root) > Gid: ( 0/ root) > Access: 2008-05-08 06:18:58.000000000 +0000 > Modify: 2008-04-15 03:02:24.000000000 +0000 > Change: 2008-04-15 07:11:52.000000000 +0000 > > # cd abc/ > # time ls | wc -l > 31764 > > real 0m44.797s > user 0m0.189s > sys 0m2.276s > > > >From the test results, it seems that the system really > only used 2.276 seconds to perform the disk IO, read the > directory and count the number of files. > > I am not sure whether I missed anything or not. I really > cannot understand how the system took about 42 seconds to > process the lock on the single directory. > > Any further comments? > > Thanks again in advance, > > Jas > > > --- "Andrew A. Neuschwander" > wrote: > > > I've looked at this problem a bit as well. My system is a > 4Gb FC SAN > > with a bonded GigE DLM dedicated network. Stat'ing 30,000 > files in 3 > > minutes on GFS isn't unreasonable considering that it must get and > > release the gfs locks. In this scenario, you are averaging > about 6ms > > per file stat. When we did our tests, all of our subsystems > (FC, Net, > > CPU, Memory, Disk) were near idle. I think the 6ms is simply the > > accumulated latency of all the subsystems involved. There > is a lot of > > work happening in that short period of time. > > > > -A > > -- > > Andrew A. Neuschwander, RHCE > > Linux Systems/Software Engineer > > College of Forestry and Conservation > > The University of Montana > > http://www.ntsg.umt.edu > > andrew at ntsg.umt.edu - 406.243.6310 > > > > > > On Thu, May 8, 2008 4:29 pm, Bob Peterson wrote: > > > On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: > > >> Hi, All: > > >> > > >> I used to post this question before, but have not received any > > >> comments yet. Please allow me post > > it > > >> again. > > >> > > >> I have a subdirectory containing more than 30,000 small > files on a > > >> SAN storage (GFS1+DLM, RAID10). > > No > > >> user application knows the existence of the > subdirectory. In other > > >> words, the subdirectory is > > free > > >> of accessing. > > >> > > >> However, it took ages to list the subdirectory on > > an > > >> absolute idle cluster node. See below: > > >> > > >> # time ls -la | wc -l > > >> 31767 > > >> > > >> real 3m5.249s > > >> user 0m0.628s > > >> sys 0m5.137s > > >> > > >> There are about 3 minutes spent on somewhere. 
> > Does > > >> anyone have any clue what the system was waiting > > for? > > >> > > >> > > >> Thanks for your time and wish to see your > > valuable > > >> comments soon. > > >> > > >> Jas > > > > > > Hi Jas, > > > > > > I believe the answer to your question is in the > > FAQ: > > > > > > > > > http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow > > > > > > Regards, > > > > > > Bob Peterson > > > Red Hat Clustering & GFS > > > > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > ______________________________________________________________ > ______________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jas199931 at yahoo.com Fri May 9 07:44:55 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 9 May 2008 00:44:55 -0700 (PDT) Subject: R: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <6F861500A5092B4C8CD653DE20A4AA0D60641D@exchange3.comune.prato.local> Message-ID: <291451.6237.qm@web32205.mail.mud.yahoo.com> Hi, Leandro: Thanks for the good reminder. Yes, we did. Any other comments? Best, Jas --- Leandro Dardini wrote: > Just remember to disable atime on the GFS volume. If > atime is enabled maybe there is the lock contention > for the writing of this info if multiple clients > "read" the directory. > > Leandro > > > -----Messaggio originale----- > > Da: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] Per > conto di Ja S > > Inviato: venerd?9 maggio 2008 8.38 > > A: linux clustering > > Oggetto: Re: [Linux-cluster] Why GFS is so slow? > What it is > > waiting for? > > > > Hi, Andrew: > > > > Thank you very much for the help. Yes, your > explanation > > really makes sense. I buy it. > > > > But I would like to discuss it a little bit > further. > > The following message was part of my previous > reply to Wendy. > > Just paste it here for your convenience. > > > > > > # stat abc/ > > File: `abc/' > > Size: 8192 Blocks: 6024 IO > Block: > > 4096 directory > > Device: fc00h/64512d Inode: 1065226 Links: > 2 > > Access: (0770/drwxrwx---) Uid: ( 0/ root) > > Gid: ( 0/ root) > > Access: 2008-05-08 06:18:58.000000000 +0000 > > Modify: 2008-04-15 03:02:24.000000000 +0000 > > Change: 2008-04-15 07:11:52.000000000 +0000 > > > > # cd abc/ > > # time ls | wc -l > > 31764 > > > > real 0m44.797s > > user 0m0.189s > > sys 0m2.276s > > > > > > >From the test results, it seems that the system > really > > only used 2.276 seconds to perform the disk IO, > read the > > directory and count the number of files. > > > > I am not sure whether I missed anything or not. I > really > > cannot understand how the system took about 42 > seconds to > > process the lock on the single directory. > > > > Any further comments? > > > > Thanks again in advance, > > > > Jas > > > > > > --- "Andrew A. Neuschwander" > > wrote: > > > > > I've looked at this problem a bit as well. My > system is a > > 4Gb FC SAN > > > with a bonded GigE DLM dedicated network. 
> Stat'ing 30,000 > > files in 3 > > > minutes on GFS isn't unreasonable considering > that it must get and > > > release the gfs locks. In this scenario, you are > averaging > > about 6ms > > > per file stat. When we did our tests, all of our > subsystems > > (FC, Net, > > > CPU, Memory, Disk) were near idle. I think the > 6ms is simply the > > > accumulated latency of all the subsystems > involved. There > > is a lot of > > > work happening in that short period of time. > > > > > > -A > > > -- > > > Andrew A. Neuschwander, RHCE > > > Linux Systems/Software Engineer > > > College of Forestry and Conservation > > > The University of Montana > > > http://www.ntsg.umt.edu > > > andrew at ntsg.umt.edu - 406.243.6310 > > > > > > > > > On Thu, May 8, 2008 4:29 pm, Bob Peterson wrote: > > > > On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: > > > >> Hi, All: > > > >> > > > >> I used to post this question before, but have > not received any > > > >> comments yet. Please allow me post > > > it > > > >> again. > > > >> > > > >> I have a subdirectory containing more than > 30,000 small > > files on a > > > >> SAN storage (GFS1+DLM, RAID10). > > > No > > > >> user application knows the existence of the > > subdirectory. In other > > > >> words, the subdirectory is > > > free > > > >> of accessing. > > > >> > > > >> However, it took ages to list the > subdirectory on > > > an > > > >> absolute idle cluster node. See below: > > > >> > > > >> # time ls -la | wc -l > > > >> 31767 > > > >> > > > >> real 3m5.249s > > > >> user 0m0.628s > > > >> sys 0m5.137s > > > >> > > > >> There are about 3 minutes spent on somewhere. > > > Does > > > >> anyone have any clue what the system was > waiting > > > for? > > > >> > > > >> > > > >> Thanks for your time and wish to see your > > > valuable > > > >> comments soon. > > > >> > > > >> Jas > > > > > > > > Hi Jas, > > > > > > > > I believe the answer to your question is in > the > > > FAQ: > > > > > > > > > > > > > > http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow > > > > > > > > Regards, > > > > > > > > Bob Peterson > > > > Red Hat Clustering & GFS > > > > > > > > > > > > -- > > > > Linux-cluster mailing list > > > > Linux-cluster at redhat.com > > > > > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > ______________________________________________________________ > > ______________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From vimal at monster.co.in Fri May 9 09:14:12 2008 From: vimal at monster.co.in (Vimal Gupta) Date: Fri, 09 May 2008 14:44:12 +0530 Subject: [Linux-cluster] GFS CLuster with LIDS Message-ID: <482415E4.5000508@monster.co.in> Hi All, I am having CentOs with LIDS running on that system . Can I implement GFS cluster on that node with the lids. 
Anyone have same kind of exp. please share... -- Vimal Gupta Sr. System Administrator Monster.com India Pvt.Ltd. From Klaus.Steinberger at physik.uni-muenchen.de Fri May 9 09:14:01 2008 From: Klaus.Steinberger at physik.uni-muenchen.de (Klaus Steinberger) Date: Fri, 9 May 2008 11:14:01 +0200 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <20080509074522.96CD3618E0A@hormel.redhat.com> References: <20080509074522.96CD3618E0A@hormel.redhat.com> Message-ID: <200805091114.10395.Klaus.Steinberger@physik.uni-muenchen.de> Hi, > However, it took ages to list the subdirectory on an > absolute idle cluster node. See below: > > # time ls -la | wc -l > 31767 > > real 3m5.249s > user 0m0.628s > sys 0m5.137s > > There are about 3 minutes spent on somewhere. Does > anyone have any clue what the system was waiting for? Did you tune glock's? I found that it's very important for performance of GFS. I'm doing the following tunings currently: gfs_tool settune /export/data/etp quota_account 0 gfs_tool settune /export/data/etp glock_purge 50 gfs_tool settune /export/data/etp demote_secs 200 gfs_tool settune /export/data/etp statfs_fast 1 Switch off quota off course only if you don't need it. All this tunings have to be done every time after mounting, so do it in a init.d script running after GFS mount, and of course do it on every node. Here is the link to the glock paper: http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 The glock tuning (glock_purge and demote_secs parameters) definitly solved a problem we had here with the Tivoli Backup Client. Before it was running for days and sometimes even did give up. We observed heavy lock traffic. After changing the glock parameters times for the backup did go down dramatically, we now can run a Incremental Backup on a 4 TByte filesystem in under 4 hours. So give it a try. There is some more tuning, which could be done unfortunately just on creation of filesystem. The default number of Resource Groups is ways too large for nowadays TByte Filesystems. Sincerly, Klaus -- Klaus Steinberger Beschleunigerlaboratorium Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 Garching, Germany FAX: (+49 89)289 14280 EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE URL: http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2002 bytes Desc: not available URL: From jas199931 at yahoo.com Fri May 9 09:25:21 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 9 May 2008 02:25:21 -0700 (PDT) Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <200805091114.10395.Klaus.Steinberger@physik.uni-muenchen.de> Message-ID: <489981.61809.qm@web32204.mail.mud.yahoo.com> Hi, Klaus: Thank you very much for your kind answer. Tunning the parameters sounds really interesting. I should give it a try. By the way, how did you come up with these new parameter values? Did you calculate them based on some measures or simply pick them up and test. Best, Jas --- Klaus Steinberger wrote: > Hi, > > > However, it took ages to list the subdirectory on > an > > absolute idle cluster node. See below: > > > > # time ls -la | wc -l > > 31767 > > > > real 3m5.249s > > user 0m0.628s > > sys 0m5.137s > > > > There are about 3 minutes spent on somewhere. Does > > anyone have any clue what the system was waiting > for? > > Did you tune glock's? 
I found that it's very > important for performance of > GFS. > > I'm doing the following tunings currently: > > gfs_tool settune /export/data/etp quota_account 0 > gfs_tool settune /export/data/etp glock_purge 50 > gfs_tool settune /export/data/etp demote_secs 200 > gfs_tool settune /export/data/etp statfs_fast 1 > > Switch off quota off course only if you don't need > it. All this tunings have > to be done every time after mounting, so do it in a > init.d script running > after GFS mount, and of course do it on every node. > > Here is the link to the glock paper: > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > The glock tuning (glock_purge and demote_secs > parameters) definitly solved a > problem we had here with the Tivoli Backup Client. > Before it was running for > days and sometimes even did give up. We observed > heavy lock traffic. > > After changing the glock parameters times for the > backup did go down > dramatically, we now can run a Incremental Backup on > a 4 TByte filesystem in > under 4 hours. So give it a try. > > There is some more tuning, which could be done > unfortunately just on creation > of filesystem. The default number of Resource Groups > is ways too large for > nowadays TByte Filesystems. > > Sincerly, > Klaus > > > -- > Klaus Steinberger Beschleunigerlaboratorium > Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 > Garching, Germany > FAX: (+49 89)289 14280 EMail: > Klaus.Steinberger at Physik.Uni-Muenchen.DE > URL: > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From martin.fuerstenau at oce.com Fri May 9 10:53:42 2008 From: martin.fuerstenau at oce.com (Martin Fuerstenau) Date: Fri, 09 May 2008 12:53:42 +0200 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <489981.61809.qm@web32204.mail.mud.yahoo.com> References: <489981.61809.qm@web32204.mail.mud.yahoo.com> Message-ID: <1210330422.11974.14.camel@lx002140.ops.de> Hi, I had (nearly) the same problem. A slow gfs. From the beginning. Two weeks ago the cluster crashed every time the load became heavier. What was the reason? A rotten gfs. The gfs uses leafnodes for data an leafnodes for metadata whithin the filesystem. And the problem was in the metadata leafnodes. Have you checked the Filesystem? Unmount it from all nodes and use gfs_fsck on the filesystem. In my case it reported (and repaired) tons of unused leafnoedes and some other errors. First time I started it without the -y (for yes). Well, after one hour ot typing y I killed it and started it with -y. The work was done whithin an hour for 1TB. Now the filesystem is clean and it was like a turboloader and Nitrogen injection for a car. Fast as it was never before. Maybe there is a bug in the mkfs command or so. I will never use a gfs without a filesystem check after creation Martin Fuerstenau Seniro System Engineer Oce Printing Systems, Poing On Fri, 2008-05-09 at 02:25 -0700, Ja S wrote: > Hi, Klaus: > > Thank you very much for your kind answer. > > Tunning the parameters sounds really interesting. I > should give it a try. > > By the way, how did you come up with these new > parameter values? 
Did you calculate them based on > some measures or simply pick them up and test. > > Best, > > Jas > > > --- Klaus Steinberger > wrote: > > > Hi, > > > > > However, it took ages to list the subdirectory on > > an > > > absolute idle cluster node. See below: > > > > > > # time ls -la | wc -l > > > 31767 > > > > > > real 3m5.249s > > > user 0m0.628s > > > sys 0m5.137s > > > > > > There are about 3 minutes spent on somewhere. Does > > > anyone have any clue what the system was waiting > > for? > > > > Did you tune glock's? I found that it's very > > important for performance of > > GFS. > > > > I'm doing the following tunings currently: > > > > gfs_tool settune /export/data/etp quota_account 0 > > gfs_tool settune /export/data/etp glock_purge 50 > > gfs_tool settune /export/data/etp demote_secs 200 > > gfs_tool settune /export/data/etp statfs_fast 1 > > > > Switch off quota off course only if you don't need > > it. All this tunings have > > to be done every time after mounting, so do it in a > > init.d script running > > after GFS mount, and of course do it on every node. > > > > Here is the link to the glock paper: > > > > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > > > The glock tuning (glock_purge and demote_secs > > parameters) definitly solved a > > problem we had here with the Tivoli Backup Client. > > Before it was running for > > days and sometimes even did give up. We observed > > heavy lock traffic. > > > > After changing the glock parameters times for the > > backup did go down > > dramatically, we now can run a Incremental Backup on > > a 4 TByte filesystem in > > under 4 hours. So give it a try. > > > > There is some more tuning, which could be done > > unfortunately just on creation > > of filesystem. The default number of Resource Groups > > is ways too large for > > nowadays TByte Filesystems. > > > > Sincerly, > > Klaus > > > > > > -- > > Klaus Steinberger Beschleunigerlaboratorium > > Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 > > Garching, Germany > > FAX: (+49 89)289 14280 EMail: > > Klaus.Steinberger at Physik.Uni-Muenchen.DE > > URL: > > > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Martin F?rstenau Tel. : (49) 8121-72-4684 Oce Printing Systems Fax : (49) 8121-72-4996 OI-12 E-Mail : martin.fuerstenau at oce.com Siemensallee 2 85586 Poing Germany Visit Oce at drupa! Register online now: This message and attachment(s) are intended solely for use by the addressee and may contain information that is privileged, confidential or otherwise exempt from disclosure under applicable law. If you are not the intended recipient or agent thereof responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by telephone and with a 'reply' message. Thank you for your co-operation. 
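For anyone who wants to repeat this, a minimal sketch of the offline check described above. The device and mount point (/dev/vg_san/lv_gfs and /mnt/gfs) are placeholders, and the filesystem really does have to be unmounted on every node before gfs_fsck is run.

# on every node: stop whatever uses the filesystem, then unmount it
umount /mnt/gfs

# on one node only, run the check against the block device, not the mount point
gfs_fsck -n -v /dev/vg_san/lv_gfs   # report problems without changing anything
gfs_fsck -y -v /dev/vg_san/lv_gfs   # answer yes to every repair prompt

# remount on all nodes once the check comes back clean
mount -t gfs /dev/vg_san/lv_gfs /mnt/gfs

The -y pass is the one described above; the read-only -n pass first is optional but shows how much the repair is going to touch.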
From jas199931 at yahoo.com Fri May 9 11:51:58 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 9 May 2008 04:51:58 -0700 (PDT) Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <1210330422.11974.14.camel@lx002140.ops.de> Message-ID: <293470.66323.qm@web32204.mail.mud.yahoo.com> Hi Martin: Thanks for your reply indeed. --- Martin Fuerstenau wrote: > Hi, > > I had (nearly) the same problem. A slow gfs. From > the beginning. Two > weeks ago the cluster crashed every time the load > became heavier. > > What was the reason? A rotten gfs. The gfs uses > leafnodes for data an > leafnodes for metadata whithin the filesystem. And > the problem was in > the metadata leafnodes. > > Have you checked the Filesystem? Unmount it from all > nodes and use > gfs_fsck on the filesystem. No, not yet. I am afraid I cannot umount the file sytem then do the gfs_fsck since the server downtime is totally forbidden. Is there any other way to reclaim the unused or lost blocks ( I guess leafnodes you mentioned meant to be the disk block, correct me if I am wrong.)? Should "gfs_tool settune /mnt/points inoded_secs 10" work for a heavy loaded node with freqent create and delete file operations? >In my case it reported > (and repaired) tons > of unused leafnoedes and some other errors. First > time I started it > without the -y (for yes). Well, after one hour ot > typing y I killed it > and started it with -y. The work was done whithin an > hour for 1TB. Now > the filesystem is clean and it was like a > turboloader and Nitrogen > injection for a car. Fast as it was never before. Great. Sounds fantastic. However, if the low performance is caused by the "rotten" gfs, will your now cleaned file system be possibly messed up again after a certain period? Do you have a smart way to monitor the status of your file system in order to make a regular downtime schedule and "force" your manager to prove it, :-) ? If you do, I am eager to know. Thanks again and look forward to your next reply. Best, Jas > Maybe there is a bug in the mkfs command or so. I > will never use a gfs > without a filesystem check after creation > > Martin Fuerstenau > Seniro System Engineer > Oce Printing Systems, Poing > > On Fri, 2008-05-09 at 02:25 -0700, Ja S wrote: > > Hi, Klaus: > > > > Thank you very much for your kind answer. > > > > Tunning the parameters sounds really interesting. > I > > should give it a try. > > > > By the way, how did you come up with these new > > parameter values? Did you calculate them based on > > some measures or simply pick them up and test. > > > > Best, > > > > Jas > > > > > > --- Klaus Steinberger > > wrote: > > > > > Hi, > > > > > > > However, it took ages to list the subdirectory > on > > > an > > > > absolute idle cluster node. See below: > > > > > > > > # time ls -la | wc -l > > > > 31767 > > > > > > > > real 3m5.249s > > > > user 0m0.628s > > > > sys 0m5.137s > > > > > > > > There are about 3 minutes spent on somewhere. > Does > > > > anyone have any clue what the system was > waiting > > > for? > > > > > > Did you tune glock's? I found that it's very > > > important for performance of > > > GFS. > > > > > > I'm doing the following tunings currently: > > > > > > gfs_tool settune /export/data/etp quota_account > 0 > > > gfs_tool settune /export/data/etp glock_purge 50 > > > gfs_tool settune /export/data/etp demote_secs > 200 > > > gfs_tool settune /export/data/etp statfs_fast 1 > > > > > > Switch off quota off course only if you don't > need > > > it. 
All this tunings have > > > to be done every time after mounting, so do it > in a > > > init.d script running > > > after GFS mount, and of course do it on every > node. > > > > > > Here is the link to the glock paper: > > > > > > > > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > > > > > The glock tuning (glock_purge and demote_secs > > > parameters) definitly solved a > > > problem we had here with the Tivoli Backup > Client. > > > Before it was running for > > > days and sometimes even did give up. We observed > > > heavy lock traffic. > > > > > > After changing the glock parameters times for > the > > > backup did go down > > > dramatically, we now can run a Incremental > Backup on > > > a 4 TByte filesystem in > > > under 4 hours. So give it a try. > > > > > > There is some more tuning, which could be done > > > unfortunately just on creation > > > of filesystem. The default number of Resource > Groups > > > is ways too large for > > > nowadays TByte Filesystems. > > > > > > Sincerly, > > > Klaus > > > > > > > > > -- > > > Klaus Steinberger > Beschleunigerlaboratorium > > > Phone: (+49 89)289 14287 Am Coulombwall 6, > D-85748 > > > Garching, Germany > > > FAX: (+49 89)289 14280 EMail: > > > Klaus.Steinberger at Physik.Uni-Muenchen.DE > > > URL: > > > > > > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > Martin F?rstenau Tel. : (49) 8121-72-4684 > Oce Printing Systems Fax : (49) 8121-72-4996 > OI-12 E-Mail : > martin.fuerstenau at oce.com > Siemensallee 2 > 85586 Poing > Germany > > > > Visit Oce at drupa! Register online now: > > > This message and attachment(s) are intended solely > for use by the addressee and may contain information > that is privileged, confidential or otherwise exempt > from disclosure under applicable law. > > If you are not the intended recipient or agent > thereof responsible for delivering this message to > the intended recipient, you are hereby notified that > any dissemination, distribution or copying of this > communication is strictly prohibited. > > If you have received this communication in error, > please notify the sender immediately by telephone > and with a 'reply' message. > > Thank you for your co-operation. > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From martin.fuerstenau at oce.com Fri May 9 12:39:40 2008 From: martin.fuerstenau at oce.com (Martin Fuerstenau) Date: Fri, 09 May 2008 14:39:40 +0200 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <293470.66323.qm@web32204.mail.mud.yahoo.com> References: <293470.66323.qm@web32204.mail.mud.yahoo.com> Message-ID: <1210336780.11974.22.camel@lx002140.ops.de> Hi, unfortunaley not. 
According to my informaiotns (which are mainly from this list and from the wiki) for each node of the cluster this structure (journal) is established on the filesystem. If you read the manpage for gfs_fsck you see, that it must be unmounted from all nodes. If you have the problem I had you should plan a maintenance window asap. My problem started as mentioned with a slow gfs from the beginning and lead to clustercrashs after 7 months. All my problems were fixed by the check. Perhaps is the same with your system. Yours - Martin On Fri, 2008-05-09 at 04:51 -0700, Ja S wrote: > Hi Martin: > > Thanks for your reply indeed. > > --- Martin Fuerstenau > wrote: > > > Hi, > > > > I had (nearly) the same problem. A slow gfs. From > > the beginning. Two > > weeks ago the cluster crashed every time the load > > became heavier. > > > > What was the reason? A rotten gfs. The gfs uses > > leafnodes for data an > > leafnodes for metadata whithin the filesystem. And > > the problem was in > > the metadata leafnodes. > > > > Have you checked the Filesystem? Unmount it from all > > nodes and use > > gfs_fsck on the filesystem. > > No, not yet. I am afraid I cannot umount the file > sytem then do the gfs_fsck since the server downtime > is totally forbidden. > > Is there any other way to reclaim the unused or lost > blocks ( I guess leafnodes you mentioned meant to be > the disk block, correct me if I am wrong.)? > > Should "gfs_tool settune /mnt/points inoded_secs 10" > work for a heavy loaded node with freqent create and > delete file operations? > > > >In my case it reported > > (and repaired) tons > > of unused leafnoedes and some other errors. First > > time I started it > > without the -y (for yes). Well, after one hour ot > > typing y I killed it > > and started it with -y. The work was done whithin an > > hour for 1TB. Now > > the filesystem is clean and it was like a > > turboloader and Nitrogen > > injection for a car. Fast as it was never before. > > Great. Sounds fantastic. However, if the low > performance is caused by the "rotten" gfs, will your > now cleaned file system be possibly messed up again > after a certain period? Do you have a smart way to > monitor the status of your file system in order to > make a regular downtime schedule and "force" your > manager to prove it, :-) ? If you do, I am eager to > know. > > Thanks again and look forward to your next reply. > > Best, > > Jas > > > > > > Maybe there is a bug in the mkfs command or so. I > > will never use a gfs > > without a filesystem check after creation > > > > Martin Fuerstenau > > Seniro System Engineer > > Oce Printing Systems, Poing > > > > On Fri, 2008-05-09 at 02:25 -0700, Ja S wrote: > > > Hi, Klaus: > > > > > > Thank you very much for your kind answer. > > > > > > Tunning the parameters sounds really interesting. > > I > > > should give it a try. > > > > > > By the way, how did you come up with these new > > > parameter values? Did you calculate them based on > > > some measures or simply pick them up and test. > > > > > > Best, > > > > > > Jas > > > > > > > > > --- Klaus Steinberger > > > wrote: > > > > > > > Hi, > > > > > > > > > However, it took ages to list the subdirectory > > on > > > > an > > > > > absolute idle cluster node. See below: > > > > > > > > > > # time ls -la | wc -l > > > > > 31767 > > > > > > > > > > real 3m5.249s > > > > > user 0m0.628s > > > > > sys 0m5.137s > > > > > > > > > > There are about 3 minutes spent on somewhere. 
> > Does > > > > > anyone have any clue what the system was > > waiting > > > > for? > > > > > > > > Did you tune glock's? I found that it's very > > > > important for performance of > > > > GFS. > > > > > > > > I'm doing the following tunings currently: > > > > > > > > gfs_tool settune /export/data/etp quota_account > > 0 > > > > gfs_tool settune /export/data/etp glock_purge 50 > > > > gfs_tool settune /export/data/etp demote_secs > > 200 > > > > gfs_tool settune /export/data/etp statfs_fast 1 > > > > > > > > Switch off quota off course only if you don't > > need > > > > it. All this tunings have > > > > to be done every time after mounting, so do it > > in a > > > > init.d script running > > > > after GFS mount, and of course do it on every > > node. > > > > > > > > Here is the link to the glock paper: > > > > > > > > > > > > > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > > > > > > > The glock tuning (glock_purge and demote_secs > > > > parameters) definitly solved a > > > > problem we had here with the Tivoli Backup > > Client. > > > > Before it was running for > > > > days and sometimes even did give up. We observed > > > > heavy lock traffic. > > > > > > > > After changing the glock parameters times for > > the > > > > backup did go down > > > > dramatically, we now can run a Incremental > > Backup on > > > > a 4 TByte filesystem in > > > > under 4 hours. So give it a try. > > > > > > > > There is some more tuning, which could be done > > > > unfortunately just on creation > > > > of filesystem. The default number of Resource > > Groups > > > > is ways too large for > > > > nowadays TByte Filesystems. > > > > > > > > Sincerly, > > > > Klaus > > > > > > > > > > > > -- > > > > Klaus Steinberger > > Beschleunigerlaboratorium > > > > Phone: (+49 89)289 14287 Am Coulombwall 6, > > D-85748 > > > > Garching, Germany > > > > FAX: (+49 89)289 14280 EMail: > > > > Klaus.Steinberger at Physik.Uni-Muenchen.DE > > > > URL: > > > > > > > > > > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > > > > -- > > > > Linux-cluster mailing list > > > > Linux-cluster at redhat.com > > > > > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > Be a better friend, newshound, and > > > know-it-all with Yahoo! Mobile. Try it now. > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > Martin F?rstenau Tel. : (49) 8121-72-4684 > > Oce Printing Systems Fax : (49) 8121-72-4996 > > OI-12 E-Mail : > > martin.fuerstenau at oce.com > > Siemensallee 2 > > 85586 Poing > > Germany > > > > > > > > Visit Oce at drupa! Register online now: > > > > > > This message and attachment(s) are intended solely > > for use by the addressee and may contain information > > that is privileged, confidential or otherwise exempt > > from disclosure under applicable law. > > > > If you are not the intended recipient or agent > > thereof responsible for delivering this message to > > the intended recipient, you are hereby notified that > > any dissemination, distribution or copying of this > > communication is strictly prohibited. > > > > If you have received this communication in error, > > please notify the sender immediately by telephone > > and with a 'reply' message. 
> > > > Thank you for your co-operation. > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Visit Oce at drupa! Register online now: This message and attachment(s) are intended solely for use by the addressee and may contain information that is privileged, confidential or otherwise exempt from disclosure under applicable law. If you are not the intended recipient or agent thereof responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by telephone and with a 'reply' message. Thank you for your co-operation. From jas199931 at yahoo.com Fri May 9 12:59:27 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 9 May 2008 05:59:27 -0700 (PDT) Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <1210336780.11974.22.camel@lx002140.ops.de> Message-ID: <138485.59267.qm@web32208.mail.mud.yahoo.com> Hi, Martin: Another big thanks to you for your kind reply and suggestions. Best, Jas --- Martin Fuerstenau wrote: > Hi, > > unfortunaley not. According to my informaiotns > (which are mainly from > this list and from the wiki) for each node of the > cluster this structure > (journal) is established on the filesystem. If you > read the manpage for > gfs_fsck you see, that it must be unmounted from all > nodes. > > If you have the problem I had you should plan a > maintenance window > asap. > > My problem started as mentioned with a slow gfs from > the beginning and > lead to clustercrashs after 7 months. All my > problems were fixed by the > check. Perhaps is the same with your system. > > Yours - Martin > > On Fri, 2008-05-09 at 04:51 -0700, Ja S wrote: > > Hi Martin: > > > > Thanks for your reply indeed. > > > > --- Martin Fuerstenau > > wrote: > > > > > Hi, > > > > > > I had (nearly) the same problem. A slow gfs. > From > > > the beginning. Two > > > weeks ago the cluster crashed every time the > load > > > became heavier. > > > > > > What was the reason? A rotten gfs. The gfs uses > > > leafnodes for data an > > > leafnodes for metadata whithin the filesystem. > And > > > the problem was in > > > the metadata leafnodes. > > > > > > Have you checked the Filesystem? Unmount it from > all > > > nodes and use > > > gfs_fsck on the filesystem. > > > > No, not yet. I am afraid I cannot umount the file > > sytem then do the gfs_fsck since the server > downtime > > is totally forbidden. > > > > Is there any other way to reclaim the unused or > lost > > blocks ( I guess leafnodes you mentioned meant to > be > > the disk block, correct me if I am wrong.)? > > > > Should "gfs_tool settune /mnt/points inoded_secs > 10" > > work for a heavy loaded node with freqent create > and > > delete file operations? > > > > > > >In my case it reported > > > (and repaired) tons > > > of unused leafnoedes and some other errors. > First > > > time I started it > > > without the -y (for yes). 
Well, after one hour > ot > > > typing y I killed it > > > and started it with -y. The work was done > whithin an > > > hour for 1TB. Now > > > the filesystem is clean and it was like a > > > turboloader and Nitrogen > > > injection for a car. Fast as it was never > before. > > > > Great. Sounds fantastic. However, if the low > > performance is caused by the "rotten" gfs, will > your > > now cleaned file system be possibly messed up > again > > after a certain period? Do you have a smart way to > > monitor the status of your file system in order to > > make a regular downtime schedule and "force" your > > manager to prove it, :-) ? If you do, I am eager > to > > know. > > > > Thanks again and look forward to your next reply. > > > > Best, > > > > Jas > > > > > > > > > > > Maybe there is a bug in the mkfs command or so. > I > > > will never use a gfs > > > without a filesystem check after creation > > > > > > Martin Fuerstenau > > > Seniro System Engineer > > > Oce Printing Systems, Poing > > > > > > On Fri, 2008-05-09 at 02:25 -0700, Ja S wrote: > > > > Hi, Klaus: > > > > > > > > Thank you very much for your kind answer. > > > > > > > > Tunning the parameters sounds really > interesting. > > > I > > > > should give it a try. > > > > > > > > By the way, how did you come up with these new > > > > parameter values? Did you calculate them based > on > > > > some measures or simply pick them up and test. > > > > > > > > Best, > > > > > > > > Jas > > > > > > > > > > > > --- Klaus Steinberger > > > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > > However, it took ages to list the > subdirectory > > > on > > > > > an > > > > > > absolute idle cluster node. See below: > > > > > > > > > > > > # time ls -la | wc -l > > > > > > 31767 > > > > > > > > > > > > real 3m5.249s > > > > > > user 0m0.628s > > > > > > sys 0m5.137s > > > > > > > > > > > > There are about 3 minutes spent on > somewhere. > > > Does > > > > > > anyone have any clue what the system was > > > waiting > > > > > for? > > > > > > > > > > Did you tune glock's? I found that it's > very > > > > > important for performance of > > > > > GFS. > > > > > > > > > > I'm doing the following tunings currently: > > > > > > > > > > gfs_tool settune /export/data/etp > quota_account > > > 0 > > > > > gfs_tool settune /export/data/etp > glock_purge 50 > > > > > gfs_tool settune /export/data/etp > demote_secs > > > 200 > > > > > gfs_tool settune /export/data/etp > statfs_fast 1 > > > > > > > > > > Switch off quota off course only if you > don't > > > need > > > > > it. All this tunings have > > > > > to be done every time after mounting, so do > it > > > in a > > > > > init.d script running > > > > > after GFS mount, and of course do it on > every > > > node. > > > > > > > > > > Here is the link to the glock paper: > > > > > > > > > > > > > > > > > > > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > > > > > > > > > The glock tuning (glock_purge and > demote_secs > > > > > parameters) definitly solved a > > > > > problem we had here with the Tivoli Backup > > > Client. > > > > > Before it was running for > > > > > days and sometimes even did give up. We > observed > > > > > heavy lock traffic. > > > > > > > > > > After changing the glock parameters times > for > > > the > > > > > backup did go down > > > > > dramatically, we now can run a Incremental > > > Backup on > > > > > a 4 TByte filesystem in > > > > > under 4 hours. So give it a try. 
> > > > > > > > > > There is some more tuning, which could be > done > > > > > unfortunately just on creation > > > > > of filesystem. The default number of > Resource > > > Groups > > > > > is ways too large for > > > > > nowadays TByte Filesystems. > > > > > > > > > > Sincerly, > > > > > Klaus > > > > > > > > > > > > > > > -- > > > > > Klaus Steinberger > > > Beschleunigerlaboratorium > > > > > Phone: (+49 89)289 14287 Am Coulombwall 6, > > > D-85748 > > > > > Garching, Germany > > > > > FAX: (+49 89)289 14280 EMail: > > > > > Klaus.Steinberger at Physik.Uni-Muenchen.DE > > > > > URL: > > > > > > > > > > > > > > > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > > > > > -- > > > > > Linux-cluster mailing list > > > > > Linux-cluster at redhat.com > > > > > > > > > > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > Be a better friend, newshound, and > > > > know-it-all with Yahoo! Mobile. Try it now. > > > > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > > > > > -- > > > > Linux-cluster mailing list > > > > Linux-cluster at redhat.com > > > > > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > Martin F?rstenau Tel. : (49) > 8121-72-4684 > > > Oce Printing Systems Fax : (49) > 8121-72-4996 > > > OI-12 E-Mail : > > > martin.fuerstenau at oce.com > > > Siemensallee 2 > > > 85586 Poing > > > Germany > > > > > > > > > > > > Visit Oce at drupa! Register online now: > > > > > > > > > This message and attachment(s) are intended > solely > > > for use by the addressee and may contain > information > > > that is privileged, confidential or otherwise > exempt > > > from disclosure under applicable law. > > > > > > If you are not the intended recipient or agent > > > thereof responsible for delivering this message > to > > > the intended recipient, you are hereby notified > that > > > any dissemination, distribution or copying of > this > > > communication is strictly prohibited. > > > > > > If you have received this communication in > error, > > > please notify the sender immediately by > telephone > > > and with a 'reply' message. > > > > > > Thank you for your co-operation. > > > > > > > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > Visit Oce at drupa! Register online now: > > > This message and attachment(s) are intended solely > for use by the addressee and may contain information > that is privileged, confidential or otherwise exempt > from disclosure under applicable law. > > If you are not the intended recipient or agent > thereof responsible for delivering this message to > the intended recipient, you are hereby notified that > any dissemination, distribution or copying of this > communication is strictly prohibited. > > If you have received this communication in error, > please notify the sender immediately by telephone > and with a 'reply' message. > > Thank you for your co-operation. 
> > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From lists at tangent.co.za Fri May 9 14:34:32 2008 From: lists at tangent.co.za (Chris Picton) Date: Fri, 9 May 2008 14:34:32 +0000 (UTC) Subject: [Linux-cluster] Re: GFS vs GFS2 References: <48219556.9060901@monster.co.in> <1210162161.3345.26.camel@localhost.localdomain> Message-ID: On Wed, 07 May 2008 13:09:21 +0100, Steven Whitehouse wrote: >> >> >> >> Is GFS2 not production-ready due to lack of testing, or due to >> >> known bugs? >> >> >> >> Any advice would be appreciated >> >> >> >> Chris >> >> >> >> > The answer is a bit of both. We are getting to the stage where the known > bugs are mostly solved or will be very shortly. You can see the state of > the bug list at any time by going to bugzilla.redhat.com and looking for > any bug with gfs2 in the summary line. There are currently approx 70 > such bugs, but please bear in mind that a large number of these are > asking for new features, and some of them are duplicates of the same bug > across different versions of RHEL and/or Fedora. > > We are currently at a stage where having a large number of people > helping us in testing would be very helpful. If you have your own > favourite filesystem test, or if you are in a position to run a test > application, then we would be very interested in any reports of > success/failure. Thank you for the update. I assume that if things go according to plan, we wont see a supported gfs2 in 5.2, but probably will in 5.3? I, oddly enough, currently have a situation where running some bonnie++ tests causes machines to hang using gfs, but not gfs2. I will file a bug report when I can. Chris From linux-cluster at merctech.com Fri May 9 14:41:57 2008 From: linux-cluster at merctech.com (linux-cluster at merctech.com) Date: Fri, 09 May 2008 10:41:57 -0400 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: Your message of "Fri, 09 May 2008 11:14:01 +0200." <200805091114.10395.Klaus.Steinberger@physik.uni-muenchen.de> References: <200805091114.10395.Klaus.Steinberger@physik.uni-muenchen.de> <20080509074522.96CD3618E0A@hormel.redhat.com> Message-ID: <23156.1210344117@mirchi> In the message dated: Fri, 09 May 2008 11:14:01 +0200, The pithy ruminations from Klaus Steinberger on <[Linux-cluster] Re: Why GFS is so slow? What it is waiting for?> were: => --===============1371945295== [SNIP!] => => There is some more tuning, which could be done unfortunately just on creati => on => of filesystem. The default number of Resource Groups is ways too large for => => nowadays TByte Filesystems. I would appreciate it greatly if you could expand on this. I'm setting up a cluster that will have several filesystems in the 3~6TB range. This will be GFS1 over LVM2, with SAN (no iSCSI) connections to the servers, if that has any bearing on the tuning suggestions. 
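My understanding is that the knob in question is the resource-group size passed to gfs_mkfs when the filesystem is created, and that it cannot be changed afterwards; something along these lines, where the volume group, LV, cluster name, filesystem label and journal count are all placeholders and 2048 MB is simply an example of a much larger resource group than the default:

# SAN-backed LVM2 logical volume for a roughly 4 TB filesystem
lvcreate -L 4096G -n lv_data vg_san

# -r sets the resource-group size in megabytes at mkfs time; larger resource
# groups mean far fewer of them on a 3-6 TB filesystem than the default gives
gfs_mkfs -p lock_dlm -t mycluster:data -j 4 -r 2048 /dev/vg_san/lv_data

Is that the sort of change you mean, or is there more to it than the -r value?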
> Sincerly, > Klaus > -- > Klaus Steinberger Beschleunigerlaboratorium > Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 Garching, Germany > FAX: (+49 89)289 14280 EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE > URL: http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ From theophanis_kontogiannis at yahoo.gr Fri May 9 15:31:07 2008 From: theophanis_kontogiannis at yahoo.gr (Theophanis Kontogiannis) Date: Fri, 9 May 2008 18:31:07 +0300 Subject: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue In-Reply-To: <3DDA6E3E456E144DA3BB0A62A7F7F779020069C8@SKYHQAMX08.klasi.is> References: <3DDA6E3E456E144DA3BB0A62A7F7F779020069C8@SKYHQAMX08.klasi.is> Message-ID: <009f01c8b1e9$b48f36c0$9f01a8c0@corp.netone.gr> Hi Finnur, The LV is running on top of DRBD? Please provide us with a bit more details. Thank you, Theophanis Kontogiannis. _____ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Finnur Orn Guðmundsson - TM Software Sent: Wednesday, May 07, 2008 11:57 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue Hi, I have a 2 node cluster running RHEL 5.1 x86_64 and fully patched as of today. If i cold-boot the cluster (both nodes) everything comes up smoothly and i can migrate services between nodes etc... However when i take one node down i am having difficulties leaving the fence domain. If i kill the fence daemon on the node i am trying to remove gracefully or use cman_tool leave force and reboot it, it comes back up, cman starts and it appears to join the cluster. The CLVMD init script hangs (just sits and hangs) and rgmanager does not start up correctly. Also CLVMD and rgmanager just sit in a zombie state and i have to poweroff or fence the node to get it to reboot.... The cluster never stabilizes itself until i cold boot both nodes. Then it is OK until the next reboot. I have read something about similar cases but did not find any magic solution! ;) My cluster.conf is attached. There is no firewall running on the machines in question (chkconfig iptables off;).
Various output from the node that is rebooted: Output from group_tool services: type level name id state fence 0 default 00000000 JOIN_STOP_WAIT [1 2] dlm 1 rgmanager 00000000 JOIN_STOP_WAIT [1 2] Output from group_tool fenced: 1210193027 our_nodeid 1 our_name node-16 1210193027 listen 4 member 5 groupd 7 1210193029 client 3: join default 1210193029 delay post_join 120s post_fail 0s 1210193029 added 2 nodes from ccs 1210193542 client 3: dump Various output from the other node: Output from group_tool services: type level name id state fence 0 default 00010002 JOIN_START_WAIT [1 2] dlm 1 clvmd 00020002 none [2] dlm 1 rgmanager 00030002 FAIL_ALL_STOPPED [1 2] Output from group_tool dump fenced: 1210191957 our_nodeid 2 our_name node-17 1210191957 listen 4 member 5 groupd 7 1210191958 client 3: join default 1210191958 delay post_join 120s post_fail 0s 1210191958 added 2 nodes from ccs 1210191958 setid default 65538 1210191958 start default 1 members 2 1210191958 do_recovery stop 0 start 1 finish 0 1210191958 node "node-16" not a cman member, cn 1 1210191958 add first victim node-16 1210191959 node "node-16" not a cman member, cn 1 1210191960 node "node-16" not a cman member, cn 1 1210191961 node "node-16" not a cman member, cn 1 1210191962 node "node-16" not a cman member, cn 1 1210191963 node "node-16" not a cman member, cn 1 1210191964 node "node-16" not a cman member, cn 1 1210191965 node "node-16" not a cman member, cn 1 1210191966 node "node-16" not a cman member, cn 1 1210191967 node "node-16" not a cman member, cn 1 1210191968 node "node-16" not a cman member, cn 1 1210191969 node "node-16" not a cman member, cn 1 1210191970 node "node-16" not a cman member, cn 1 1210191971 node "node-16" not a cman member, cn 1 1210191972 node "node-16" not a cman member, cn 1 1210191973 node "node-16" not a cman member, cn 1 1210191974 reduce victim node-16 1210191974 delay of 16s leaves 0 victims 1210191974 finish default 1 1210191974 stop default 1210191974 start default 2 members 1 2 1210191974 do_recovery stop 1 start 2 finish 1 1210193633 client 3: dump Thanks in advance. Kær kveðja / Best Regards, Finnur Örn Guðmundsson Network Engineer - Network Operations fog at t.is TM Software Urðarhvarf 6, IS-203 Kópavogur, Iceland Tel: +354 545 3000 - fax +354 545 3610 www.tm-software.is This e-mail message and any attachments are confidential and may be privileged. TM Software e-mail disclaimer: www.tm-software.is/disclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From fog at t.is Fri May 9 16:02:00 2008 From: fog at t.is (Finnur Örn Guðmundsson - TM Software) Date: Fri, 9 May 2008 16:02:00 -0000 Subject: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue In-Reply-To: <009f01c8b1e9$b48f36c0$9f01a8c0@corp.netone.gr> References: <3DDA6E3E456E144DA3BB0A62A7F7F779020069C8@SKYHQAMX08.klasi.is> <009f01c8b1e9$b48f36c0$9f01a8c0@corp.netone.gr> Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F77902069695@SKYHQAMX08.klasi.is> Hi, Nop, The shared storage is provided by IBM SVC (SAN Volume Controller) through Qlogic 24xx HBA cards. The switches are Brocade 48000. Devices are created on top of dm-multipath devices. I really think this has something to do with the fence daemon since i am unable to leave the fence domain gracefully on a cold boot of the whole cluster. Thanks, Finnur
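For reference, one commonly suggested way to take a RHEL 5 node out of the cluster cleanly is to stop the stack in strict top-down order before rebooting; this is only a sketch using the stock RHEL 5.1 init scripts, not a confirmed fix for the JOIN_STOP_WAIT state shown above:

    # on the node being taken down, as root
    service rgmanager stop   # stop managed services first
    service gfs stop         # unmount the GFS mounts
    service clvmd stop       # stop clustered LVM
    service cman stop        # leave the fence domain and cman last
    # then verify from the surviving node
    cman_tool nodes
    group_tool services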
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Theophanis Kontogiannis Sent: 9. maí 2008 15:31 To: 'linux clustering' Subject: RE: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue [...]
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Klaus.Steinberger at physik.uni-muenchen.de Sat May 10 07:10:19 2008 From: Klaus.Steinberger at physik.uni-muenchen.de (Klaus Steinberger) Date: Sat, 10 May 2008 09:10:19 +0200 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <20080509160012.D4B2061924F@hormel.redhat.com> References: <20080509160012.D4B2061924F@hormel.redhat.com> Message-ID: <200805100910.20104.Klaus.Steinberger@physik.uni-muenchen.de> > I would appreciate it greatly if you could expand on this. Yep, I used -r 2048 for my new 6 TByte filesystem. Here some information about it: The default for resource group size: -r MegaBytes gfs_mkfs will try to make Resource Groups about this big. The default is 256 MB. From the cluster FAQ: How can I performance-tune GFS or make it any faster? You shouldn't expect GFS to perform as fast as non-clustered file systems because it needs to do inter-node locking and file system coordination. That said, there are some things you can do to improve GFS performance. * Use -r 2048 on gfs_mkfs and mkfs.gfs2 for large file systems. The issue has to do with the size of the GFS resource groups, which is an internal GFS structure for managing the file system data. This is an internal GFS structure, not to be confused with rgmanager's Resource Groups. Some file system slowdown can be blamed on having a large number of RGs. The bigger your file system, the more RGs you need. By default, gfs_mkfs carves your file system into 256MB RGs, but it allows you to specify a preferred RG size. The default, 256MB, is good for average size file systems, but you can increase performance on a bigger file system by using a bigger RG size. For example, my 40TB file system needs 156438 RGs of 256MB each and whenever GFS has to run that linked list, it takes a long time. The same 40TB file system can be created with bigger RGs--2048MB--requiring only 19555 of them. The time savings is dramatic: It took nearly 23 minutes for my system to read in all 156438 RG Structures with 256MB RGs, but only 4 minutes to read in the 19555 RG Structures for my 2048MB RGs. The time to do an operation like df on an empty file system dropped from 24 seconds with 256MB RGs, to under a second with 2048MB RGs. I'm sure that increasing the size of the RGs would help gfs_fsck's performance as well. Future versions of gfs_mkfs and mkfs.gfs2 will dynamically choose an RG size to reduce the RG overhead. Sincerly, Klaus -- Klaus Steinberger Beschleunigerlaboratorium Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 Garching, Germany FAX: (+49 89)289 14280 EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE URL: http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ -------------- next part -------------- A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 2002 bytes Desc: not available URL: From oliveiros.cristina at gmail.com Sat May 10 19:07:27 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Sat, 10 May 2008 20:07:27 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <481E4544.1020301@bobich.net> References: <481E4544.1020301@bobich.net> Message-ID: Hello, Gordan, Are you sure those are the packages? When I try to yum install gfs-utils and kmod-gfs, it says it doesn't know those packages... The other three are installed ok. Help.... Best, Oliveiros 2008/5/5 Gordan Bobic : > Oliveiros Cristina wrote: > >> Howdy List, >> I would like to install gfs on a two node cluster running both fedora 8. >> >> Can anyone please kindly supply me with some links for the procedure? >> > > First part of the procedure is to not use FC if you plan for this to be > useful. FC7+ comes only with GFS2. There are no GFS1 packages included, and > GFS2 isn't stable yet. > > Which packages are needed, where to get them, that sort of things. >> > > cman > openais > gfs-utils > kmod-gfs > rgmanager > > Can't remember if there may be more. > > I've already googled up and down a little but I couldn't find no >> rigourous information on this, or maybe I am just blind :-) >> > > This is probably a not a bad place to start: > > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS#section-GFS-Documentation > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Sat May 10 19:16:55 2008 From: gordan at bobich.net (Gordan Bobic) Date: Sat, 10 May 2008 20:16:55 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> Message-ID: <4825F4A7.9060609@bobich.net> Oliveiros Cristina wrote: > Hello, Gordan, > Are you sure those are the packages? > When I try to yum install gfs-utils and kmod-gfs, it says it doesn't > know those packages... > > The other three are installed ok. That's what the package names are on CentOS / RHEL5. I can't see why they would be different on Fedora, but you can always do: # yum list | grep -i gfs and see what that returns. It is possible that kmod-gfs is actually built into the kernel itself (Fedora have much more frequent complete kernel updates, as stability is not the main requirement), so there is no separate package. If I had to hazard a guess, then gfs-utils isn't there because GFS1 isn't included in Fedora, only GFS2. So try gfs2-utils. The output of "yum list" should make it obvious if this is the case. I said it before and I'll say it again - use FC's GFS2 at your peril. Last time I tried it (~6 months ago), it didn't work at all. Gordan From oliveiros.cristina at gmail.com Sat May 10 19:24:28 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Sat, 10 May 2008 20:24:28 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <4825F4A7.9060609@bobich.net> References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> Message-ID: Hello, Gordan. 
Thank you for your email here's what it says [langolier at bravo ~]$ yum list|grep -i gfs fgfs-Atlas.i386 0.3.1-5.fc8 fedora fgfs-base.noarch 0.9.11-0.1.pre1.fc8 fedora gfs-artemisia-fonts.noarch 20070415-1.fc8 updates gfs-baskerville-fonts.noarch 20070327-3.fc8 updates gfs-bodoni-classic-fonts.noarch 20070415-2.fc8 updates gfs-bodoni-fonts.noarch 20070415-1.fc8 updates gfs-complutum-fonts.noarch 20070413-3.fc8 updates gfs-didot-classic-fonts.noarch 20070415-1.fc8 updates gfs-didot-fonts.noarch 20070616-2.fc8 updates gfs-gazis-fonts.noarch 20070417-2.fc8 updates gfs-neohellenic-fonts.noarch 20070415-1.fc8 updates gfs-olga-fonts.noarch 20060908-1.fc8 updates gfs-porson-fonts.noarch 20060908-3.fc8 updates gfs-solomos-fonts.noarch 20071114-2.fc8 updates gfs-theokritos-fonts.noarch 20070415-2.fc8 updates gfs2-utils.i386 2.03.00-3.fc8 updates I need to use gfs , not gfs2. If it isn't included in fc, the alternative is to build from sources? Best, Oliveiros 2008/5/10 Gordan Bobic : > Oliveiros Cristina wrote: > >> Hello, Gordan, >> Are you sure those are the packages? >> When I try to yum install gfs-utils and kmod-gfs, it says it doesn't know >> those packages... >> >> The other three are installed ok. >> > > That's what the package names are on CentOS / RHEL5. I can't see why they > would be different on Fedora, but you can always do: > > # yum list | grep -i gfs > > and see what that returns. It is possible that kmod-gfs is actually built > into the kernel itself (Fedora have much more frequent complete kernel > updates, as stability is not the main requirement), so there is no separate > package. > > If I had to hazard a guess, then gfs-utils isn't there because GFS1 isn't > included in Fedora, only GFS2. So try gfs2-utils. The output of "yum list" > should make it obvious if this is the case. > > I said it before and I'll say it again - use FC's GFS2 at your peril. Last > time I tried it (~6 months ago), it didn't work at all. > > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Sat May 10 19:31:27 2008 From: gordan at bobich.net (Gordan Bobic) Date: Sat, 10 May 2008 20:31:27 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> Message-ID: <4825F80F.6040101@bobich.net> Oliveiros Cristina wrote: > Hello, Gordan. > Thank you for your email > > here's what it says > > [langolier at bravo ~]$ yum list|grep -i gfs [...] > gfs2-utils.i386 2.03.00-3.fc8 updates > > I need to use gfs , not gfs2. > If it isn't included in fc, the alternative is to build from sources? Personally I'd just use RHEL5/CentOS5, but if you want to go the sources route, good luck. Gordan From oliveiros.cristina at gmail.com Sat May 10 19:39:14 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Sat, 10 May 2008 12:39:14 -0700 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <4825F80F.6040101@bobich.net> References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> <4825F80F.6040101@bobich.net> Message-ID: Hello, Gordan. I didn't make a decision, was just asking. According to what yum list said, There is no way to install it through rpms, is my understanding correct? Oliveiros 2008/5/10 Gordan Bobic : > Oliveiros Cristina wrote: > >> Hello, Gordan. 
>> Thank you for your email >> >> here's what it says >> >> [langolier at bravo ~]$ yum list|grep -i gfs >> > [...] > >> gfs2-utils.i386 2.03.00-3.fc8 updates >> I need to use gfs , not gfs2. >> If it isn't included in fc, the alternative is to build from sources? >> > > Personally I'd just use RHEL5/CentOS5, but if you want to go the sources > route, good luck. > > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Sat May 10 23:12:26 2008 From: gordan at bobich.net (Gordan Bobic) Date: Sun, 11 May 2008 00:12:26 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> <4825F80F.6040101@bobich.net> Message-ID: <48262BDA.6010505@bobich.net> Oliveiros Cristina wrote: > According to what yum list said, > There is no way to install it through rpms, is my understanding correct? Yes, that's the size of it. GFS1 simply doesn't ship with FC6+ Gordan From michael.osullivan at auckland.ac.nz Sun May 11 11:04:49 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Sun, 11 May 2008 23:04:49 +1200 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID Message-ID: <4826D2D1.7010103@auckland.ac.nz> Hi everyone, I have set up a small experimental network with a linux cluster and SAN that I want to have high data availability. There are 2 servers that I have put into a cluster using conga (thank you luci and ricci). There are 2 storage devices, each consisting of a basic server with 2 x 1TB disks. The cluster servers and the storage devices each have 2 NICs and are connected using 2 gigabit ethernet switches. I have created a single striped logical volume on each storage device using the 2 disks (to try and speed up I/O on the volume). These volumes (one on each storage device) are presented to the cluster servers using iSCSI (on the cluster servers) and iSCSI target (on the storage devices). Since there are multiple NICs on the storage devices I have set up two iSCSI portals to each logical volume. I have then used mdadm to ensure the volumes are accessible via multipath. Finally, since I want the storage devices to present the data in a highly available way I have used mdadm to create a software raid-5 across the two multipathed volumes (I realise this is essentially mirroring on the 2 storage devices but I am trying to set this up to be extensible to extra storage devices). My next step is to present the raid array (of the two multipathed volumes - one on each storage device) as a GFS to the cluster servers to ensure that locking of access to the data is handled properly. I have recently read that multipathing is possible within GFS, but raid is not (yet). Since I want the two storage devices in a raid-5 array and I am using iSCSI I'm not sure if I should try and use GFS to do the multipathing. Also, being a linux/storage/clustering newbie I'm not sure if my approach is the best thing to do. I want to make sure that my system has no single point of failure that will make any of the data inaccessible. I'm pretty sure our network design supports this. I assume (if I configure it right) the cluster will ensure services will keep going if one of the cluster servers goes down. 
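To make the layering described above easier to follow, here is a hedged sketch of the mdadm steps it implies; every device name is hypothetical and the iSCSI sessions are assumed to be logged in already, so this is an illustration of the stack rather than a tested recipe:

    # each storage box's iSCSI LUN appears twice, once per portal/NIC
    mdadm --create /dev/md0 --level=multipath --raid-devices=2 /dev/sdb /dev/sdc
    mdadm --create /dev/md1 --level=multipath --raid-devices=2 /dev/sdd /dev/sde
    # RAID-5 across the two multipathed LUNs (with only two members this
    # behaves like a mirror, as the message itself notes)
    mdadm --create /dev/md2 --level=5 --raid-devices=2 /dev/md0 /dev/md1
    cat /proc/mdstat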
Thus the only weak point was the storage devices which I hope I have now strengthened by essentially implementing network raid across iSCSI and then presented as a single GFS. I would really appreciate comments/advice/constructive criticism as I have really been learning much of this as I go. Cheers, Mike From swhiteho at redhat.com Mon May 12 08:50:14 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Mon, 12 May 2008 09:50:14 +0100 Subject: [Linux-cluster] Re: GFS vs GFS2 In-Reply-To: References: <48219556.9060901@monster.co.in> <1210162161.3345.26.camel@localhost.localdomain> Message-ID: <1210582214.3635.493.camel@quoit> Hi, On Fri, 2008-05-09 at 14:34 +0000, Chris Picton wrote: > On Wed, 07 May 2008 13:09:21 +0100, Steven Whitehouse wrote: > >> >> > >> >> Is GFS2 not production-ready due to lack of testing, or due to > >> >> known bugs? > >> >> > >> >> Any advice would be appreciated > >> >> > >> >> Chris > >> >> > >> >> > > The answer is a bit of both. We are getting to the stage where the known > > bugs are mostly solved or will be very shortly. You can see the state of > > the bug list at any time by going to bugzilla.redhat.com and looking for > > any bug with gfs2 in the summary line. There are currently approx 70 > > such bugs, but please bear in mind that a large number of these are > > asking for new features, and some of them are duplicates of the same bug > > across different versions of RHEL and/or Fedora. > > > > We are currently at a stage where having a large number of people > > helping us in testing would be very helpful. If you have your own > > favourite filesystem test, or if you are in a position to run a test > > application, then we would be very interested in any reports of > > success/failure. > > Thank you for the update. > > I assume that if things go according to plan, we wont see a supported > gfs2 in 5.2, but probably will in 5.3? > That is quite likely, yes. > I, oddly enough, currently have a situation where running some bonnie++ > tests causes machines to hang using gfs, but not gfs2. I will file a bug > report when I can. > > > Chris > Ok, all such information is useful. Thanks, Steve. From sanelson at gmail.com Mon May 12 10:14:08 2008 From: sanelson at gmail.com (Stephen Nelson-Smith) Date: Mon, 12 May 2008 11:14:08 +0100 Subject: [Linux-cluster] Oracle Shared-Nothing Message-ID: Hi, I want to implement a shared-nothing active/passive failover cluster for Oracle 10g. RAC is out of budget. I'm looking at drbd + heartbeat or cluster suite. Any experiences? recommendations? gotchas? In particular, any idea whether Oracle would support a non-RAC setup? S. From lhh at redhat.com Mon May 12 15:46:26 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 12 May 2008 11:46:26 -0400 Subject: [Linux-cluster] GFS CLuster with LIDS In-Reply-To: <482415E4.5000508@monster.co.in> References: <482415E4.5000508@monster.co.in> Message-ID: <1210607186.10406.31.camel@ayanami.boston.devel.redhat.com> On Fri, 2008-05-09 at 14:44 +0530, Vimal Gupta wrote: > Hi All, > > I am having CentOs with LIDS running on that system . Can I implement > GFS cluster on that node with the lids. > Anyone have same kind of exp. please share... Could you provide a link to LIDS ? -- Lon From rpeterso at redhat.com Mon May 12 17:06:06 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 12 May 2008 12:06:06 -0500 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? 
In-Reply-To: <200805100910.20104.Klaus.Steinberger@physik.uni-muenchen.de> References: <20080509160012.D4B2061924F@hormel.redhat.com> <200805100910.20104.Klaus.Steinberger@physik.uni-muenchen.de> Message-ID: <1210611966.2738.14.camel@technetium.msp.redhat.com> On Sat, 2008-05-10 at 09:10 +0200, Klaus Steinberger wrote: > How can I performance-tune GFS or make it any faster? (snip) > Use -r 2048 on gfs_mkfs and mkfs.gfs2 for large file systems. Actually, this is a delicate balance. If you have too many resource groups (RGs) then it spends a lot of time searching to find the one it needs, but once it finds the RG, the bitmap search will be fast. If you have fewer RGs, it will spend less time searching for the right one, but the bitmaps for each will be bigger, so it will spend more time searching the bitmap once it has been found. I've written a performance enhancement to the "bitfit" algorithm for GFS2 that increases the speed of bitmap searches, making it more speedy to use fewer RGs with larger bitmaps. That code could be back-ported to GFS, but it hasn't been done yet. Actually, there are a lot of performance improvements done for GFS2 that COULD theoretically be ported back to GFS, if someone took the time. Perhaps I'll open an RFE bugzilla and post any patches I come up with to the cluster-devel mailing list. Regards, Bob Peterson Red Hat Clustering & GFS From lhh at redhat.com Mon May 12 17:12:48 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 12 May 2008 13:12:48 -0400 Subject: [Linux-cluster] Oracle Shared-Nothing In-Reply-To: References: Message-ID: <1210612368.10406.50.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-05-12 at 11:14 +0100, Stephen Nelson-Smith wrote: > Hi, > > I want to implement a shared-nothing active/passive failover cluster > for Oracle 10g. RAC is out of budget. > > I'm looking at drbd + heartbeat or cluster suite. > > Any experiences? recommendations? gotchas? > > In particular, any idea whether Oracle would support a non-RAC setup? They support non-RAC configurations, but I doubt they would support running the database on DRBD. You should call Oracle on this one. Also, I **think** buying Oracle Database 10g Release 2 these days gets you Oracle's failover technology called Cluster Ware - so you might not need heartbeat or rgmanager (Cluster Suite component that provides failover for off-the-shelf apps). Again, call Oracle and ask. They want your money, so surely they will answer your questions ;) If you're going to spend the money for Oracle (and you need failover support), I'd really recommend getting a FC or iSCSI RAID array with dual redundant internal controllers and a remote power switch. There are some good SCSI arrays available at lower price points than FC and often iSCSI solutions, as well (but stay away from JBOD/host-RAID configurations). -- Lon From jas199931 at yahoo.com Mon May 12 23:34:45 2008 From: jas199931 at yahoo.com (Ja S) Date: Mon, 12 May 2008 16:34:45 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? Message-ID: <887833.81823.qm@web32204.mail.mud.yahoo.com> Hi, All: When an application on a cluster node A needs to access a file on a SAN storage, how DLM process the lock request? 
Should DLM firstly determine whether there already exists a lock resource mapped to the file, by doing the following things in the order 1) looking at the master lock resources on the node A, 2) searching the local copies of lock resources on the node A, 3) searching the lock directory on the node A to find out whether a master lock resource assosicated with the file exists on another node, 4) sending messages to other nodes in the cluster for the location of the master lock resource? I ask this question because from some online articles, it seems that DLM will always search the cluster-wide lock directory across the whole cluster first to find the location of the master lock resource. Can anyone kindly confirm the order of processes that DLM does? Many thanks in advance. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From fdinitto at redhat.com Tue May 13 04:52:13 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 13 May 2008 06:52:13 +0200 (CEST) Subject: [Linux-cluster] GFS on fedora In-Reply-To: <48262BDA.6010505@bobich.net> References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> <4825F80F.6040101@bobich.net> <48262BDA.6010505@bobich.net> Message-ID: Hi guys, On Sun, 11 May 2008, Gordan Bobic wrote: > Oliveiros Cristina wrote: >> According to what yum list said, >> There is no way to install it through rpms, is my understanding correct? > > Yes, that's the size of it. GFS1 simply doesn't ship with FC6+ > There are a few reasons why GFS1 is not in Fedora anylonger. The first and most important one: http://fedoraproject.org/wiki/Packaging/Guidelines#head-5d326feb10ebf0624361729239c58719e31b6f93 Fedora does not allow external kernel modules anylonger. GFS1 will never be upstream. In order to run GFS1, a patch to the main kernel is required, and this patch will never be included upstream either. That makes it basically impossible for us to maintain a separate rpm repository to provide GFS1 without duplicating a lot of work in maintain an external kernel. Fabio -- I'm going to make him an offer he can't refuse. From vimal at monster.co.in Tue May 13 05:27:21 2008 From: vimal at monster.co.in (Vimal Gupta) Date: Tue, 13 May 2008 10:57:21 +0530 Subject: [Linux-cluster] GFS CLuster with LIDS In-Reply-To: <1210607186.10406.31.camel@ayanami.boston.devel.redhat.com> References: <482415E4.5000508@monster.co.in> <1210607186.10406.31.camel@ayanami.boston.devel.redhat.com> Message-ID: <482926B9.3000505@monster.co.in> Hi Lon, Sorry For delay, Here is the Link for LIDS... http://www.lids.org/document/build_lids-0.2-1.html Lon Hohberger wrote: > On Fri, 2008-05-09 at 14:44 +0530, Vimal Gupta wrote: > >> Hi All, >> >> I am having CentOs with LIDS running on that system . Can I implement >> GFS cluster on that node with the lids. >> Anyone have same kind of exp. please share... >> > > Could you provide a link to LIDS ? > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Vimal Gupta Sr. System Administrator Monster.com India Pvt.Ltd. 
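As a quick illustration of the point Fabio makes above about GFS1 versus GFS2 availability, the running kernel's module tree can be queried directly; this is only a diagnostic check, not an installation method:

    modinfo gfs2 | head -3                         # GFS2 module, shipped with the kernel
    modinfo gfs  | head -3                         # GFS1; on Fedora 7+ this is expected to fail
    find /lib/modules/$(uname -r) -name 'gfs*.ko'  # list whatever gfs modules are installed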
From sanelson at gmail.com Tue May 13 06:51:46 2008 From: sanelson at gmail.com (Stephen Nelson-Smith) Date: Tue, 13 May 2008 07:51:46 +0100 Subject: [Linux-cluster] Oracle Shared-Nothing In-Reply-To: <1210612368.10406.50.camel@ayanami.boston.devel.redhat.com> References: <1210612368.10406.50.camel@ayanami.boston.devel.redhat.com> Message-ID: Hi... On Mon, May 12, 2008 at 6:12 PM, Lon Hohberger wrote: > > On Mon, 2008-05-12 at 11:14 +0100, Stephen Nelson-Smith wrote: > > Hi, > > > > I want to implement a shared-nothing active/passive failover cluster > > for Oracle 10g. RAC is out of budget. > > > > I'm looking at drbd + heartbeat or cluster suite. > > > > Any experiences? recommendations? gotchas? > > > > In particular, any idea whether Oracle would support a non-RAC setup? > > They support non-RAC configurations, but I doubt they would support > running the database on DRBD. You should call Oracle on this one. I will :) > Also, I **think** buying Oracle Database 10g Release 2 these days gets > you Oracle's failover technology called Cluster Ware - so you might not > need heartbeat or rgmanager (Cluster Suite component that provides > failover for off-the-shelf apps). I have that in mind, yes. > If you're going to spend the money for Oracle (and you need failover > support), I'd really recommend getting a FC or iSCSI RAID array with > dual redundant internal controllers and a remote power switch. The client is dead set against a RAID array, partly on cost (budget v.tight), but also on physical space in the rack - there's only 2U left, and a new rack costs ?1000 pcm. Do I recall mention of cmirror + GNBD as a possible solution to shared-nothing, no-disk-array setups? > -- Lon S. From ccaulfie at redhat.com Tue May 13 07:06:57 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 13 May 2008 08:06:57 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <887833.81823.qm@web32204.mail.mud.yahoo.com> References: <887833.81823.qm@web32204.mail.mud.yahoo.com> Message-ID: <48293E11.5070405@redhat.com> Ja S wrote: > Hi, All: > > > When an application on a cluster node A needs to > access a file on a SAN storage, how DLM process the > lock request? > > Should DLM firstly determine whether there already > exists a lock resource mapped to the file, by doing > the following things in the order 1) looking at the > master lock resources on the node A, 2) searching the > local copies of lock resources on the node A, 3) > searching the lock directory on the node A to find out > whether a master lock resource assosicated with the > file exists on another node, 4) sending messages to > other nodes in the cluster for the location of the > master lock resource? > > I ask this question because from some online articles, > it seems that DLM will always search the cluster-wide > lock directory across the whole cluster first to find > the location of the master lock resource. > > Can anyone kindly confirm the order of processes that > DLM does? > This should be very well documented, as it's common amongst DLM implementations. If a node needs to lock a resource that it doesn't know about then it hashes the name to get a directory node ID, than asks that node for the master node. if there is no master node (the resource is not active) then the requesting node is made master if the node does know the master, (other locks on the resource exist) then it will go straight to that master node. The node then asks the master for the lock. 
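Purely as an illustration of the directory lookup described here (the resource name alone determines which node holds the directory entry), a toy shell sketch follows; the real DLM uses its own in-kernel hash rather than cksum, and the resource name and node count below are invented:

    nodes=4                      # number of cluster nodes (example)
    resname="5 11aa22bc"         # made-up resource name
    hash=$(printf '%s' "$resname" | cksum | cut -d' ' -f1)
    echo "directory node: $(( hash % nodes + 1 ))"
    # every node computes the same answer from the same name,
    # so no cluster-wide search is needed to find the directory entry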
The lock status (granted, waiting) is recorded in the local copy. Chrissie From jas199931 at yahoo.com Tue May 13 08:31:49 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 01:31:49 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <48293E11.5070405@redhat.com> Message-ID: <598349.86681.qm@web32204.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > Hi, All: > > > > > > When an application on a cluster node A needs to > > access a file on a SAN storage, how DLM process > the > > lock request? > > > > Should DLM firstly determine whether there already > > exists a lock resource mapped to the file, by > doing > > the following things in the order 1) looking at > the > > master lock resources on the node A, 2) searching > the > > local copies of lock resources on the node A, 3) > > searching the lock directory on the node A to find > out > > whether a master lock resource assosicated with > the > > file exists on another node, 4) sending messages > to > > other nodes in the cluster for the location of the > > master lock resource? > > > > I ask this question because from some online > articles, > > it seems that DLM will always search the > cluster-wide > > lock directory across the whole cluster first to > find > > the location of the master lock resource. > > > > Can anyone kindly confirm the order of processes > that > > DLM does? > > > > > This should be very well documented, as it's common > amongst DLM > implementations. > I think I may be blind. I have not yet found a document which describes the sequence of processes in a precise way. I tried to read the source code but I gave up due to lack of comments. > If a node needs to lock a resource that it doesn't > know about then it > hashes the name to get a directory node ID, than > asks that node for the > master node. if there is no master node (the > resource is not active) > then the requesting node is made master > > if the node does know the master, (other locks on > the resource exist) > then it will go straight to that master node. Thanks for the description. However, one point is still not clear to me is how a node can conclude whether it __knows__ the lock resource or not? Will the node search 1) the list of master lock resources owned by itself, then 2) the list of local copies of lock resouces stored on itself, then 3) the lock directory on itself, sequentially? or just search 1) and 2) then if it cannot find any, it will get the node ID based on a hash function (possibly the output of the hash function is itself?) who may hold the location of the master lock resource, then ask the node for the master node, and so on? If so, what exact search algorithms are used, the linear search, the binary search, or what else? I would like to understand the processes in an exact and precise way since our system has been heavily loaded. Sometime there are more than 100K lock resouces on a node. Understanding every bit of the details will help us tune the current system. Many thanks for your time and look forward to your kind reply, Regards, Jas > The node then asks the master for the lock. > > The lock status (granted, waiting) is recorded in > the local copy. > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Tue May 13 08:41:27 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 13 May 2008 09:41:27 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <598349.86681.qm@web32204.mail.mud.yahoo.com> References: <598349.86681.qm@web32204.mail.mud.yahoo.com> Message-ID: <48295437.9080500@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>> Hi, All: >>> >>> >>> When an application on a cluster node A needs to >>> access a file on a SAN storage, how DLM process >> the >>> lock request? >>> >>> Should DLM firstly determine whether there already >>> exists a lock resource mapped to the file, by >> doing >>> the following things in the order 1) looking at >> the >>> master lock resources on the node A, 2) searching >> the >>> local copies of lock resources on the node A, 3) >>> searching the lock directory on the node A to find >> out >>> whether a master lock resource assosicated with >> the >>> file exists on another node, 4) sending messages >> to >>> other nodes in the cluster for the location of the >>> master lock resource? >>> >>> I ask this question because from some online >> articles, >>> it seems that DLM will always search the >> cluster-wide >>> lock directory across the whole cluster first to >> find >>> the location of the master lock resource. >>> >>> Can anyone kindly confirm the order of processes >> that >>> DLM does? >>> >> >> This should be very well documented, as it's common >> amongst DLM >> implementations. >> > > I think I may be blind. I have not yet found a > document which describes the sequence of processes in > a precise way. I tried to read the source code but I > gave up due to lack of comments. > > >> If a node needs to lock a resource that it doesn't >> know about then it >> hashes the name to get a directory node ID, than >> asks that node for the >> master node. if there is no master node (the >> resource is not active) >> then the requesting node is made master >> >> if the node does know the master, (other locks on >> the resource exist) >> then it will go straight to that master node. > > > Thanks for the description. > > However, one point is still not clear to me is how a > node can conclude whether it __knows__ the lock > resource or not? A node knows the resource if it has a local copy. It's as simple as that. -- Chrissie From sasmaz at itu.edu.tr Tue May 13 08:43:48 2008 From: sasmaz at itu.edu.tr (aydin sasmaz) Date: Tue, 13 May 2008 11:43:48 +0300 Subject: [Linux-cluster] High availability xen cluster In-Reply-To: <47E103BD.4030704@artegence.com> References: <4eccbcc3e1e1f2b73b7cd81b3bff73b6@mail.van-schelve.de> <47E103BD.4030704@artegence.com> Message-ID: <018401c8b4d5$7a6719b0$6f354d10$@edu.tr> Hi I would like to implement automatic failover. Is there any way to do with using cluster suite and redhat ap 5.1? regards -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Maciej Bogucki Sent: Wednesday, March 19, 2008 2:15 PM To: public at van-schelve.de; linux clustering Subject: Re: [Linux-cluster] High availability xen cluster > There are three disks in each data centre. Currently I only use the disks > in rza. > > At the moment I'm testing with a two node cluster. The virtual machines are > on SAN disks and I can live migrate from one node to the other one. But > what I have to cover is the disaster. 
What happens when the fabric in rza > crashes? My virtual maschines are unavailable. What I'm thinking about is a > hardware based > mirroring between the both fabrics and break up the mirror when the > disaster happens or we need to power off the storage for maintenance. But > my problem is that I see duplicate pv id's in this situation. I cannot > mirror based on lvm because it is too slow. Do You want do implement automatic failover or manual? You need hardware based mirroring - I'm sure that Hitachi support it(but it cost come $ for license). The second choice is DRBD[1] with fe. iSCSI or GNBD, but You have SAN which is better. If You need automatic failover, You have to set device-mapper-multipath with Active/Standby configuration where Active is Your rza and Standby Your secondary rzb[2]. In this case You have to set also synchronous replication both side. [1] - http://www.drbd.org/ [2] - http://storagefoo.blogspot.com/2006/08/linux-native-multipathing-device.html Best Regards Maciej Bogucki -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From sasmaz at itu.edu.tr Tue May 13 08:43:48 2008 From: sasmaz at itu.edu.tr (aydin sasmaz) Date: Tue, 13 May 2008 11:43:48 +0300 Subject: [Linux-cluster] High availability xen cluster In-Reply-To: <47E103BD.4030704@artegence.com> References: <4eccbcc3e1e1f2b73b7cd81b3bff73b6@mail.van-schelve.de> <47E103BD.4030704@artegence.com> Message-ID: <018401c8b4d5$7a6719b0$6f354d10$@edu.tr> Hi I would like to implement automatic failover. Is there any way to do with using cluster suite and redhat ap 5.1? regards -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Maciej Bogucki Sent: Wednesday, March 19, 2008 2:15 PM To: public at van-schelve.de; linux clustering Subject: Re: [Linux-cluster] High availability xen cluster > There are three disks in each data centre. Currently I only use the disks > in rza. > > At the moment I'm testing with a two node cluster. The virtual machines are > on SAN disks and I can live migrate from one node to the other one. But > what I have to cover is the disaster. What happens when the fabric in rza > crashes? My virtual maschines are unavailable. What I'm thinking about is a > hardware based > mirroring between the both fabrics and break up the mirror when the > disaster happens or we need to power off the storage for maintenance. But > my problem is that I see duplicate pv id's in this situation. I cannot > mirror based on lvm because it is too slow. Do You want do implement automatic failover or manual? You need hardware based mirroring - I'm sure that Hitachi support it(but it cost come $ for license). The second choice is DRBD[1] with fe. iSCSI or GNBD, but You have SAN which is better. If You need automatic failover, You have to set device-mapper-multipath with Active/Standby configuration where Active is Your rza and Standby Your secondary rzb[2]. In this case You have to set also synchronous replication both side. [1] - http://www.drbd.org/ [2] - http://storagefoo.blogspot.com/2006/08/linux-native-multipathing-device.html Best Regards Maciej Bogucki -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From jas199931 at yahoo.com Tue May 13 08:49:16 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 01:49:16 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? 
In-Reply-To: <48295437.9080500@redhat.com> Message-ID: <412133.31700.qm@web32203.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > >> Ja S wrote: > >>> Hi, All: > >>> > >>> > >>> When an application on a cluster node A needs to > >>> access a file on a SAN storage, how DLM process > >> the > >>> lock request? > >>> > >>> Should DLM firstly determine whether there > already > >>> exists a lock resource mapped to the file, by > >> doing > >>> the following things in the order 1) looking at > >> the > >>> master lock resources on the node A, 2) > searching > >> the > >>> local copies of lock resources on the node A, 3) > >>> searching the lock directory on the node A to > find > >> out > >>> whether a master lock resource assosicated with > >> the > >>> file exists on another node, 4) sending messages > >> to > >>> other nodes in the cluster for the location of > the > >>> master lock resource? > >>> > >>> I ask this question because from some online > >> articles, > >>> it seems that DLM will always search the > >> cluster-wide > >>> lock directory across the whole cluster first > to > >> find > >>> the location of the master lock resource. > >>> > >>> Can anyone kindly confirm the order of processes > >> that > >>> DLM does? > >>> > >> > >> This should be very well documented, as it's > common > >> amongst DLM > >> implementations. > >> > > > > I think I may be blind. I have not yet found a > > document which describes the sequence of processes > in > > a precise way. I tried to read the source code but > I > > gave up due to lack of comments. > > > > > >> If a node needs to lock a resource that it > doesn't > >> know about then it > >> hashes the name to get a directory node ID, than > >> asks that node for the > >> master node. if there is no master node (the > >> resource is not active) > >> then the requesting node is made master > >> > >> if the node does know the master, (other locks on > >> the resource exist) > >> then it will go straight to that master node. > > > > > > Thanks for the description. > > > > However, one point is still not clear to me is how > a > > node can conclude whether it __knows__ the lock > > resource or not? > > A node knows the resource if it has a local copy. > It's as simple as that. > If the node is a human and has a brain, it can "immediately" recall that it knows the lock resouce. However, for a computer program, it does not "know" anything until it search the target in what it has on hand. Therefore, the point here is the __search__. What should the node search and in which order, and how it searches? If I missed anything, please kindly point out so that I can clarify my question as clear as possible. Thanks again for your time and kind reply. Jas > > > -- > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ccaulfie at redhat.com Tue May 13 09:06:06 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 13 May 2008 10:06:06 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? 
In-Reply-To: <412133.31700.qm@web32203.mail.mud.yahoo.com> References: <412133.31700.qm@web32203.mail.mud.yahoo.com> Message-ID: <482959FE.7040002@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>> --- Christine Caulfield >> wrote: >>>> Ja S wrote: >>>>> Hi, All: >>>>> >>>>> >>>>> When an application on a cluster node A needs to >>>>> access a file on a SAN storage, how DLM process >>>> the >>>>> lock request? >>>>> >>>>> Should DLM firstly determine whether there >> already >>>>> exists a lock resource mapped to the file, by >>>> doing >>>>> the following things in the order 1) looking at >>>> the >>>>> master lock resources on the node A, 2) >> searching >>>> the >>>>> local copies of lock resources on the node A, 3) >>>>> searching the lock directory on the node A to >> find >>>> out >>>>> whether a master lock resource assosicated with >>>> the >>>>> file exists on another node, 4) sending messages >>>> to >>>>> other nodes in the cluster for the location of >> the >>>>> master lock resource? >>>>> >>>>> I ask this question because from some online >>>> articles, >>>>> it seems that DLM will always search the >>>> cluster-wide >>>>> lock directory across the whole cluster first >> to >>>> find >>>>> the location of the master lock resource. >>>>> >>>>> Can anyone kindly confirm the order of processes >>>> that >>>>> DLM does? >>>>> >>>> This should be very well documented, as it's >> common >>>> amongst DLM >>>> implementations. >>>> >>> I think I may be blind. I have not yet found a >>> document which describes the sequence of processes >> in >>> a precise way. I tried to read the source code but >> I >>> gave up due to lack of comments. >>> >>> >>>> If a node needs to lock a resource that it >> doesn't >>>> know about then it >>>> hashes the name to get a directory node ID, than >>>> asks that node for the >>>> master node. if there is no master node (the >>>> resource is not active) >>>> then the requesting node is made master >>>> >>>> if the node does know the master, (other locks on >>>> the resource exist) >>>> then it will go straight to that master node. >>> >>> Thanks for the description. >>> >>> However, one point is still not clear to me is how >> a >>> node can conclude whether it __knows__ the lock >>> resource or not? >> A node knows the resource if it has a local copy. >> It's as simple as that. >> > > If the node is a human and has a brain, it can > "immediately" recall that it knows the lock resouce. > However, for a computer program, it does not "know" > anything until it search the target in what it has on > hand. > > Therefore, the point here is the __search__. What > should the node search and in which order, and how it > searches? > > If I missed anything, please kindly point out so that > I can clarify my question as clear as possible. > > I think you're trying to make this more complicated than it is. As I've said several times now, a node "knows" a resource if there is a local lock on it. That's it! It's not more or less difficult than that, really it isn't! If the node doesn't have a local lock on the resource then it doesn't "know" it and has to ask the directory node where it is mastered. (As I'm sure you already know, locks are known by their lock ID numbers, so there's no "search" involved there either). There is no "search" for a lock around the cluster, that's what the directory node provides. And as I have already said, that is located by hashing the resource name to yield a node ID. 
So, if you like, the "search" you seem to be looking for is simply a hash of the resource name. But it's not really a search, and it's only invoked when the node first encounters a resource. Chrissie From jas199931 at yahoo.com Tue May 13 09:51:48 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 02:51:48 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482959FE.7040002@redhat.com> Message-ID: <394218.92537.qm@web32202.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > >> Ja S wrote: > >>> --- Christine Caulfield > >> wrote: > >>>> Ja S wrote: > >>>>> Hi, All: > >>>>> > >>>>> > >>>>> When an application on a cluster node A needs > to > >>>>> access a file on a SAN storage, how DLM > process > >>>> the > >>>>> lock request? > >>>>> > >>>>> Should DLM firstly determine whether there > >> already > >>>>> exists a lock resource mapped to the file, by > >>>> doing > >>>>> the following things in the order 1) looking > at > >>>> the > >>>>> master lock resources on the node A, 2) > >> searching > >>>> the > >>>>> local copies of lock resources on the node A, > 3) > >>>>> searching the lock directory on the node A to > >> find > >>>> out > >>>>> whether a master lock resource assosicated > with > >>>> the > >>>>> file exists on another node, 4) sending > messages > >>>> to > >>>>> other nodes in the cluster for the location of > >> the > >>>>> master lock resource? > >>>>> > >>>>> I ask this question because from some online > >>>> articles, > >>>>> it seems that DLM will always search the > >>>> cluster-wide > >>>>> lock directory across the whole cluster first > >> to > >>>> find > >>>>> the location of the master lock resource. > >>>>> > >>>>> Can anyone kindly confirm the order of > processes > >>>> that > >>>>> DLM does? > >>>>> > >>>> This should be very well documented, as it's > >> common > >>>> amongst DLM > >>>> implementations. > >>>> > >>> I think I may be blind. I have not yet found a > >>> document which describes the sequence of > processes > >> in > >>> a precise way. I tried to read the source code > but > >> I > >>> gave up due to lack of comments. > >>> > >>> > >>>> If a node needs to lock a resource that it > >> doesn't > >>>> know about then it > >>>> hashes the name to get a directory node ID, > than > >>>> asks that node for the > >>>> master node. if there is no master node (the > >>>> resource is not active) > >>>> then the requesting node is made master > >>>> > >>>> if the node does know the master, (other locks > on > >>>> the resource exist) > >>>> then it will go straight to that master node. > >>> > >>> Thanks for the description. > >>> > >>> However, one point is still not clear to me is > how > >> a > >>> node can conclude whether it __knows__ the lock > >>> resource or not? > >> A node knows the resource if it has a local copy. > >> It's as simple as that. > >> > > > > If the node is a human and has a brain, it can > > "immediately" recall that it knows the lock > resouce. > > However, for a computer program, it does not > "know" > > anything until it search the target in what it has > on > > hand. > > > > Therefore, the point here is the __search__. What > > should the node search and in which order, and how > it > > searches? > > > > If I missed anything, please kindly point out so > that > > I can clarify my question as clear as possible. > > > > > > I think you're trying to make this more complicated > than it is. 
Maybe, :-), Just want to know what exact happened. > As I've > said several times now, a node "knows" a resource if > there is a local > lock on it. That's it! It's not more or less > difficult than that, really > it isn't! At the same time, there could be 30K local locks on a node in our system. How are these local locks stored or mapped, in a hash table, or a big but sparse array? >From the source code, I guess the local locks are stored in a list. Correct me if I am wrong since I really have not yet studied the code very carefully. > If the node doesn't have a local lock on > the resource then it > doesn't "know" it and has to ask the directory node > where it is > mastered. Does it mean even if the node owns the master lock resource but it doesn't have a local lock associated with the master lock resource, it still needs to ask the directory node? > (As I'm sure you already know, locks are > known by their lock > ID numbers, so there's no "search" involved there > either). True. When a request on a file has been issued, the inode number of file (in hex) will be used to make up the name of lock resource (the second number of the name). It is true that the node has the list of lock resources (either local copy or master copy) as long as it has local locks. However, the node can just like a teacher, who has a list of students and the students are known by their names or student IDs. When the teacher want to fill up the final grade for each student, he still needs to look at the form and search for the student name and put the grade beside the name. The search can be done according to the student ID if the form is sorted by the student ID or by the student surname if the form is sorted by the surname. Either way, the teacher still needs to __search__. Same thing should be applied to the node. The node may use a smart way to search the lock resources kept in the list, possibly a hash function (but I doubt there is a very good hash function which can find the location of the target lock resource immediately). Am I still wrong? > > There is no "search" for a lock around the cluster, > that's what the > directory node provides. And as I have already said, > that is located by > hashing the resource name to yield a node ID. Yes, yes, I think I didn't say it clearly. The lock resource is located by hashing the resource name to yield a node ID. But before hashing, the node still needs to perform the search within the list or whatever data strucute that keeps the local locks on itself to find out whether the target lock resource is already in use or "known". Isn't it? I am sorry it seems that I am so stubborn. Thanks for your patient. You are a really good helper. Jas > So, if you like, the "search" you seem to be looking > for is simply a > hash of the resource name. But it's not really a > search, and it's only > invoked when the node first encounters a resource. > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From oliveiros.cristina at gmail.com Tue May 13 10:59:16 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Tue, 13 May 2008 03:59:16 -0700 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482959FE.7040002@redhat.com> References: <412133.31700.qm@web32203.mail.mud.yahoo.com> <482959FE.7040002@redhat.com> Message-ID: Hello, Christine and Ja S. I've been following this thread, because I need, like Ja, a detailed knowledge of the DLM inner workings. 
Your explanations were detailed and clear, Christine, but, just for the sake of having it documented, do you know where I can download a white paper or article telling this whole story? Thanks in advance Best, Oliveiros 2008/5/13 Christine Caulfield : > Ja S wrote: > > --- Christine Caulfield wrote: > > > >> Ja S wrote: > >>> --- Christine Caulfield > >> wrote: > >>>> Ja S wrote: > >>>>> Hi, All: > >>>>> > >>>>> > >>>>> When an application on a cluster node A needs to > >>>>> access a file on a SAN storage, how DLM process > >>>> the > >>>>> lock request? > >>>>> > >>>>> Should DLM firstly determine whether there > >> already > >>>>> exists a lock resource mapped to the file, by > >>>> doing > >>>>> the following things in the order 1) looking at > >>>> the > >>>>> master lock resources on the node A, 2) > >> searching > >>>> the > >>>>> local copies of lock resources on the node A, 3) > >>>>> searching the lock directory on the node A to > >> find > >>>> out > >>>>> whether a master lock resource assosicated with > >>>> the > >>>>> file exists on another node, 4) sending messages > >>>> to > >>>>> other nodes in the cluster for the location of > >> the > >>>>> master lock resource? > >>>>> > >>>>> I ask this question because from some online > >>>> articles, > >>>>> it seems that DLM will always search the > >>>> cluster-wide > >>>>> lock directory across the whole cluster first > >> to > >>>> find > >>>>> the location of the master lock resource. > >>>>> > >>>>> Can anyone kindly confirm the order of processes > >>>> that > >>>>> DLM does? > >>>>> > >>>> This should be very well documented, as it's > >> common > >>>> amongst DLM > >>>> implementations. > >>>> > >>> I think I may be blind. I have not yet found a > >>> document which describes the sequence of processes > >> in > >>> a precise way. I tried to read the source code but > >> I > >>> gave up due to lack of comments. > >>> > >>> > >>>> If a node needs to lock a resource that it > >> doesn't > >>>> know about then it > >>>> hashes the name to get a directory node ID, than > >>>> asks that node for the > >>>> master node. if there is no master node (the > >>>> resource is not active) > >>>> then the requesting node is made master > >>>> > >>>> if the node does know the master, (other locks on > >>>> the resource exist) > >>>> then it will go straight to that master node. > >>> > >>> Thanks for the description. > >>> > >>> However, one point is still not clear to me is how > >> a > >>> node can conclude whether it __knows__ the lock > >>> resource or not? > >> A node knows the resource if it has a local copy. > >> It's as simple as that. > >> > > > > If the node is a human and has a brain, it can > > "immediately" recall that it knows the lock resouce. > > However, for a computer program, it does not "know" > > anything until it search the target in what it has on > > hand. > > > > Therefore, the point here is the __search__. What > > should the node search and in which order, and how it > > searches? > > > > If I missed anything, please kindly point out so that > > I can clarify my question as clear as possible. > > > > > > I think you're trying to make this more complicated than it is. As I've > said several times now, a node "knows" a resource if there is a local > lock on it. That's it! It's not more or less difficult than that, really > it isn't! If the node doesn't have a local lock on the resource then it > doesn't "know" it and has to ask the directory node where it is > mastered. 
(As I'm sure you already know, locks are known by their lock > ID numbers, so there's no "search" involved there either). > > There is no "search" for a lock around the cluster, that's what the > directory node provides. And as I have already said, that is located by > hashing the resource name to yield a node ID. > > So, if you like, the "search" you seem to be looking for is simply a > hash of the resource name. But it's not really a search, and it's only > invoked when the node first encounters a resource. > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.wendy.cheng at gmail.com Tue May 13 19:05:33 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 13 May 2008 14:05:33 -0500 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4826D2D1.7010103@auckland.ac.nz> References: <4826D2D1.7010103@auckland.ac.nz> Message-ID: <4829E67D.2050602@gmail.com> Michael O'Sullivan wrote: > Hi everyone, > > I have set up a small experimental network with a linux cluster and > SAN that I want to have high data availability. There are 2 servers > that I have put into a cluster using conga (thank you luci and ricci). > There are 2 storage devices, each consisting of a basic server with 2 > x 1TB disks. The cluster servers and the storage devices each have 2 > NICs and are connected using 2 gigabit ethernet switches. It is a little bit hard to figure out the exact configuration based on this description (a diagram would help if you can). In general, I don't think GFS tuned well with iscsi, particularly the latency could spike if DLM traffic gets mingled with file data traffic, regardless your network bandwidth. However, I don't have enough data to support the speculation. It is also very application dependent. One key question is what kind of GFS applications you plan to dispatch in this environment ? I see you have a SAN here .. Any reason to choose iscsi over FC ? > > I have created a single striped logical volume on each storage device > using the 2 disks (to try and speed up I/O on the volume). These > volumes (one on each storage device) are presented to the cluster > servers using iSCSI (on the cluster servers) and iSCSI target (on the > storage devices). Since there are multiple NICs on the storage devices > I have set up two iSCSI portals to each logical volume. I have then > used mdadm to ensure the volumes are accessible via multipath. The iscsi target function is carried out by the storage device (firmware) or you use Linux's iscsi target ? > > Finally, since I want the storage devices to present the data in a > highly available way I have used mdadm to create a software raid-5 > across the two multipathed volumes (I realise this is essentially > mirroring on the 2 storage devices but I am trying to set this up to > be extensible to extra storage devices). My next step is to present > the raid array (of the two multipathed volumes - one on each storage > device) as a GFS to the cluster servers to ensure that locking of > access to the data is handled properly. So you're going to have CLVM built on top of software RAID ? .. This looks cumbersome. Again, a diagram could help people understand more. -- Wendy > > I have recently read that multipathing is possible within GFS, but > raid is not (yet). 
Since I want the two storage devices in a raid-5 > array and I am using iSCSI I'm not sure if I should try and use GFS to > do the multipathing. Also, being a linux/storage/clustering newbie I'm > not sure if my approach is the best thing to do. I want to make sure > that my system has no single point of failure that will make any of > the data inaccessible. I'm pretty sure our network design supports > this. I assume (if I configure it right) the cluster will ensure > services will keep going if one of the cluster servers goes down. Thus > the only weak point was the storage devices which I hope I have now > strengthened by essentially implementing network raid across iSCSI and > then presented as a single GFS. > > I would really appreciate comments/advice/constructive criticism as I > have really been learning much of this as I go. > > From rick.ochoa at gmail.com Tue May 13 20:47:55 2008 From: rick.ochoa at gmail.com (rick ochoa) Date: Tue, 13 May 2008 16:47:55 -0400 Subject: [Linux-cluster] GFS, Locking, Read-Only, and high processor loads Message-ID: Hi, I'm setting up a GFS implementation and was wondering what kind of tuning parameters I can set for both read-only and read-write. I work for a company that is migrating to a SAN, implementing GFS as the filesystem. We currently rsync our data from a master server to 5 front-end webservers running Apache and PHP. The rsyncs take an extraordinarily long time as our content (currently >2.5 million small files) grows, and does not scale very well as we add more front-end machines. Our thinking was to put content generated on two inward facing editorial machines on the SAN as read/write, and our web front- ends as read-only. All temporary files and logging would write to local disk. The goal of our initial work was to create this content filesystem, mount the disks, eliminate the rsyncs, and free up our rsync server for use as a slave database server. We used the Luci to configure a node and fencing on a new front-end, and formatted and configured our disk with it. Our deploy plan was to set this machine up, put it behind the load-balancer, and have it operate under normal load for a few days to "burn it in." Once complete, we would begin to migrate the other four front-ends over to the SAN, mounted RO after a reinstall of the OS. This procedure worked without too much issue until we hit the fourth machine in the cluster, where the cpu load went terrifyingly high and we got many "D" state httpd processes. Googling "uninterruptible sleep GFS php" I found references from 2006 about file locking with php and its use of flock() at the start of a session. The disks were remounted as "spectator" in an attempt to limit disk I/O on journals. This seemed to help, but as it was the end of the day seems a false positive. The next day, CPU load was again incredibly high, and after much flailing about we went back to local ext3 disks to buy us some time. I'm reading through this list, which is very informative. I'm attempting to tune our GFS mounts a bit, watching the output of gfs_tool counters on the filesystems, and looking for any anomalies. Here's a more detailed description of our setup: Our hardware configuration consists of a NexSAN SATABoy populated with 8 750GB disks (RAID 5/4.7Tb), and a Brocade Silkworm 3800 for data and fencing. We purchased QLogic single-port, 4Gb HBAs for our servers. 
(more info available on request) The RAID has 4 partitions, 2 are not mounted: local - (not mounted) 500GB, extents 4.0MB, block size 4KB, attributes -wi-ao, dlm lock protocol - mount /usr/local_san (rw) this is a copy of /usr/local, which can be synced to all hosts code - (not mounted) 500GB, extents 4.0MB, block size 4KB, attributes -wi-ao, dlm lock protocol - mount /web/code (rw) this is a copy of /huffpo/web/prod, without the www content and tmp trees tmp - 500GB, extents 4.0MB, block size 4KB, attributes -wi-a-, dlm lock protocol - mount /web/prod/tmp (rw) this is the temporary directory for front-end web code www - 2TB, extents 4MB, block size 4KB, attributes -wi-ao, dlm local protocol - mount /web/prod/www (ro) read-only content directory, 4 hosts, /etc/fstab options at the time were ro read/write on 1 host we have ~2 more TB available, currently not in use After reading the list a bit, I've come up with the following tunings for read-only: gfs_tool settune /web/prod/www/content glock_purge 80 gfs_tool settune /web/prod/www/content quota_account 0 gfs_tool settune /web/prod/www/content demote_secs 60 gfs_tool settune /web/prod/www/content scand_secs 30 /etc/fstab has spectator,noatime,num_glockd=32 as mount options And the read/write host has: gfs_tool settune /web/prod/www/content statfs_fast 1 /etc/fstab has num_glockd=32,noatime as mount options I've noticed using gfs_tool counters /web/prod/www/content usually has sub 80k locks for the read/write host running rsync, and sub 10k locks for the one (and only) read-only host, where previously the number of locks on all hosts numbered ~80k. Can I be a bit more aggressive with locks on read-only filesystems with the current tunings enabled? I'm not sure what the purpose of the locks on read-only filesystems serve in this instance. Is there a better configuration for heavy reads on a GFS filesystem that is read only? vmstat -d gives me for this filesystem: disk- ------------reads------------ ------------writes----------- ----- IO------ [...] sdc 411192 82490 3998862 7402555 607 645 10016 3837 0 695 My big fear is although the systems currently seem to be running without too much incident, as I add nodes back into the cluster the number of locks and system load will again run high. As we transition from using rsync to writing directly onto the SAN, the number of locks on rw hosts should go down because the spendy directory scans should be removed. Are there certain other optimizations I could use to lower the lock counts? From gordan at bobich.net Tue May 13 21:08:38 2008 From: gordan at bobich.net (Gordan Bobic) Date: Tue, 13 May 2008 22:08:38 +0100 Subject: [Linux-cluster] GFS, Locking, Read-Only, and high processor loads In-Reply-To: References: Message-ID: <482A0356.7070503@bobich.net> rick ochoa wrote: > I work for a company that is migrating to a SAN, implementing GFS as the > filesystem. We currently rsync our data from a master server to 5 > front-end webservers running Apache and PHP. The rsyncs take an > extraordinarily long time as our content (currently >2.5 million small > files) grows, and does not scale very well as we add more front-end > machines. Our thinking was to put content generated on two inward facing > editorial machines on the SAN as read/write, and our web front-ends as > read-only. All temporary files and logging would write to local disk. 
> The goal of our initial work was to create this content filesystem, > mount the disks, eliminate the rsyncs, and free up our rsync server for > use as a slave database server. You may have options that don't require a SAN. If you're happy to continue with DAS (i.e. the cost of the SAN doesn't exceed the cost of having separate disks in each machine with the number of machines you foresee using in the near future), you may do well with DRBD instead of a SAN. > We used the Luci to configure a node and fencing on a new front-end, and > formatted and configured our disk with it. Our deploy plan was to set > this machine up, put it behind the load-balancer, and have it operate > under normal load for a few days to "burn it in." Once complete, we > would begin to migrate the other four front-ends over to the SAN, > mounted RO after a reinstall of the OS. > > This procedure worked without too much issue until we hit the fourth > machine in the cluster, where the cpu load went terrifyingly high and we > got many "D" state httpd processes. Googling "uninterruptible sleep GFS > php" I found references from 2006 about file locking with php and its > use of flock() at the start of a session. The disks were remounted as > "spectator" in an attempt to limit disk I/O on journals. This seemed to > help, but as it was the end of the day seems a false positive. The next > day, CPU load was again incredibly high, and after much flailing about > we went back to local ext3 disks to buy us some time. If you have lots of I/O on lots of files in few directories, you may be out of luck. A lot of the overhead of GFS (or any similar FS) is unavoidable: the locking between the nodes has to be synchronised for every file open. Mounting with noatime,nodiratime,noquota may help a bit, but with frequent access to lots of small files you will never see performance anywhere near that of a local disk. There are, however, other options. If DAS is an option for you (and it sounds like it is), look into GlusterFS. Its performance isn't great per se (it may well be worse than GFS) if you use it the intended way, but you can use it as a file replication system. If you point your web directory directly at the file store (if you do this, you must be 100% sure that NOTHING you do to those files will involve any kind of writing, or things can get unpredictable and files can get corrupted), you'll get local disk performance with the advantage of not having to rsync the data. As long as all nodes are connected, the file changes on the master server will get sent out to the replicas. If you need to reboot a node, you'll need to ensure that it's consistent, which is done by forcing a resync: fire off a find that reads the first byte of every file on the mount point. This will force the node to check that its files are up to date against the other nodes. Note that this will cause increased load on all the other nodes while it completes, so use with care. Gordan
From kelsey.hightower at gmail.com Tue May 13 21:26:56 2008 From: kelsey.hightower at gmail.com (Kelsey Hightower) Date: Tue, 13 May 2008 17:26:56 -0400 Subject: [Linux-cluster] Complete cluster.conf Schema Description Message-ID: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> Hello, I have been searching the web for weeks. I am trying to get the complete cluster.conf schema description. I have found a link that describes most of the options but it seems to omit the resources, services, and anything related to configuring failover services.
http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html -------------- next part -------------- An HTML attachment was scrubbed... URL:
From s.wendy.cheng at gmail.com Tue May 13 23:22:05 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 13 May 2008 18:22:05 -0500 Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <792305.60086.qm@web32203.mail.mud.yahoo.com> References: <792305.60086.qm@web32203.mail.mud.yahoo.com> Message-ID: <482A229D.2040108@gmail.com> Ja S wrote: > Hi, Wendy: > > Thanks for your so prompt and kind explanation. It is > very helpful. According to your comments, I did > another test. See below:
>
> # stat abc/
>   File: `abc/'
>   Size: 8192   Blocks: 6024   IO Block: 4096   directory
> Device: fc00h/64512d   Inode: 1065226   Links: 2
> Access: (0770/drwxrwx---)  Uid: ( 0/ root)  Gid: ( 0/ root)
> Access: 2008-05-08 06:18:58.000000000 +0000
> Modify: 2008-04-15 03:02:24.000000000 +0000
> Change: 2008-04-15 07:11:52.000000000 +0000
>
> # cd abc/
> # time ls | wc -l
> 31764
>
> real    0m44.797s
> user    0m0.189s
> sys     0m2.276s
>
> The real time in this test is much shorter than the > previous one. However, it is still reasonable long. As > you said, the 'ls' command only reads the single > directory file. In my case, the directory file itself > is only 8192 bytes. The time spent on disk IO should > be included in 'sys 0m2.276s'. Although DLM needs time > to lookup the location of the corresponding master > lock resource and to process locking, the system > should not take about 42 seconds to complete the 'ls' > command. So, what is the hidden issue or is there a > way to identify possible bottlenecks? > > IIRC, disk IO wait time is excluded from "sys", so you really can't conclude the lion share of your wall (real) time is due to DLM locking. We don't know for sure unless you can provide the relevant profiling data (try to learn how to use OProfile and/or SystemTap to see where exactly your system is waiting at). Latency issues like this is tricky. It would be foolish to conclude anything just by reading the command output without knowing the surrounding configuration and/or run time environment. If small file read latency is important to you, did you turn off storage device's readahead ? Did you try different Linux kernel elevator algorithms ? Did you make sure your other network traffic didn't block DLM traffic ? Be aware latency and bandwidth are two different things. A big and fat network link doesn't automatically imply a quick response time though it may carry more bandwidth. -- Wendy
From jas199931 at yahoo.com Wed May 14 00:56:44 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 17:56:44 -0700 (PDT) Subject: [Linux-cluster] Locks reported by gfs_tool lockdump does not match that presented in dlm_locks. Any reason?? Message-ID: <389733.26852.qm@web32203.mail.mud.yahoo.com> Hi, All: For a given lock space, at the same time, I saved a copy of the output of 'gfs_tool lockdump' as 'gfs_locks' and a copy of dlm_locks. Then I checked the locks presents in the two saved files. I realized that the number of locks in gfs_locks is not the same as the locks presented in dlm_locks.
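For reference, the way such a comparison can be captured is roughly as follows. This is only a sketch: the mount point, the lockspace name and the RHEL4-style /proc/cluster/dlm_locks interface are assumptions that need adjusting for your own cluster.

gfs_tool lockdump /mnt/gfs > gfs_locks      # dump the GFS glocks for the mount
echo "mygfs" > /proc/cluster/dlm_locks      # select the DLM lockspace (cluster 1 style interface, assumed)
cat /proc/cluster/dlm_locks > dlm_locks     # snapshot the DLM lock list for that lockspace
grep -c '^Glock ' gfs_locks                 # count glock entries
grep -c ' Master: ' dlm_locks               # entries whose master is on a remote node
grep -c ' Remote: ' dlm_locks               # entries held on behalf of remote nodes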
For instance,

From dlm_locks:
9980 NL locks, where
--7984 locks are from remote nodes
--0 locks are on remote nodes
--1996 locks are processed on its own master lock resources
0 CR locks, where
--0 locks are from remote nodes
--0 locks are on remote nodes
--0 locks are processed on its own master lock resources
0 CW locks, where
--0 locks are from remote nodes
--0 locks are on remote nodes
--0 locks are processed on its own master lock resources
1173 PR locks, where
--684 locks are from remote nodes
--32 locks are on remote nodes
--457 locks are processed on its own master lock resources
0 PW locks, where
--0 locks are from remote nodes
--0 locks are on remote nodes
--0 locks are processed on its own master lock resources
47 EX locks, where
--46 locks are from remote nodes
--0 locks are on remote nodes
--1 locks are processed on its own master lock resources

In summary, 11200 locks in total, where
-- 8714 locks are from remote nodes (entries with ' Remote: ')
-- 32 locks are on remote nodes (entries with ' Master: ')
-- 2454 locks are processed on its own master lock resources (entries with only lock ID and lock mode)

These locks are all in the granted queue. There is nothing under the conversion and waiting queues.
======================================

From gfs_locks, there are 2932 locks in total (grep '^Glock ' and count the entries). Then for each Glock I got the second number, which is the ID of a lock resource, and searched the ID in dlm_locks. I then split the searched results into two groups as shown below:
--46 locks are associated with local copies of master lock resources on remote nodes
--2886 locks are associated with master lock resources on the node itself
======================================

Now, I tried to find the relationship between the five numbers from two sources but ended up nowhere.
Dlm_locks:
-- 8714 locks are from remote nodes
-- 32 locks are on remote nodes
-- 2454 locks are processed on its own master lock resources
Gfs_locks:
--46 locks are associated with local copies of master lock resources on remote nodes
--2886 locks are associated with master lock resources on the node itself

Can anyone kindly point out the relationships between the number of locks presented in dlm_locks and gfs_locks? Thanks for your time on reading this long question and look forward to your help. Jas
From s.wendy.cheng at gmail.com Wed May 14 01:43:24 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 13 May 2008 20:43:24 -0500 Subject: [Linux-cluster] Locks reported by gfs_tool lockdump does not match that presented in dlm_locks. Any reason?? In-Reply-To: <389733.26852.qm@web32203.mail.mud.yahoo.com> References: <389733.26852.qm@web32203.mail.mud.yahoo.com> Message-ID: <482A43BC.8040807@gmail.com> Ja S wrote: > Hi, All: > > For a given lock space, at the same time, I saved a > copy of the output of 'gfs_tool lockdump' as > 'gfs_locks' and a copy of dlm_locks. > > Then I checked the locks presents in the two saved > files. I realized that the number of locks in > gfs_locks is not the same as the locks presented in > dlm_locks.
> > For instance, > >From dlm_locks: > 9980 NL locks, where > --7984 locks are from remote nodes > --0 locks are on remote nodes > --1996 locks are processed on its own master lock > resources > 0 CR locks, where > --0 locks are from remote nodes > --0 locks are on remote nodes > --0 locks are processed on its own master lock > resources > 0 CW locks, where > --0 locks are from remote nodes > --0 locks are on remote nodes > --0 locks are processed on its own master lock > resources > 1173 PR locks, where > --684 locks are from remote nodes > --32 locks are on remote nodes > --457 locks are processed on its own master lock > resources > 0 PW locks, where > --0 locks are from remote nodes > --0 locks are on remote nodes > --0 locks are processed on its own master lock > resources > 47 EX locks, where > --46 locks are from remote nodes > --0 locks are on remote nodes > --1 locks are processed on its own master lock > resources > > In summary, > 11200 locks in total, where > -- 8714 locks are from remote nodes (entries with ? > Remote: ?) > -- 32 locks are on remote nodes (entries with ? > Master: ?) > -- 2454 locks are processed on its own master lock > resources (entries with only lock ID and lock mode) > > These locks are all in the granted queue. There is > nothing under the conversion and waiting queues. > ====================================== > > >From gfs_locks, there are 2932 locks in total, ( grep > ?^Glock ? and count the entries). Then for each Glock > I got the second number which is the ID of a lock > resource, and searched the ID in dlm_locks. I then > split the searched results into two groups as shown > below: > --46 locks are associated with local copies of master > lock resources on remote nodes > --2886 locks are associated with master lock resources > on the node itself > > > ====================================== > Now, I tried to find the relationship between the five > numbers from two sources but ended up nowhere. > Dlm_locks: > -- 8714 locks are from remote nodes > -- 32 locks are on remote nodes > -- 2454 locks are processed on its own master lock > resources > Gfs_locks: > --46 locks are associated with local copies of master > lock resources on remote nodes > --2886 locks are associated with master lock resources > on the node itself > > Can anyone kindly point out the relationships between > the number of locks presented in dlm_locks and > gfs_locks? > > > Thanks for your time on reading this long question and > look forward to your help. > > I doubt this will help anything from practical point of view.. understanding how to run Oprofile and/or SystemTap will probably help you more on the long run. However, if you want to know .. the following are why they are different: GFS locking is controlled by a subsysgtem called "glock". Glock is designed to run and interact with *different* distributed lock managers; e.g. in RHEL 3, other than DLM, it also works with another lock manager called "GULM". Only active locks has an one-to-one correspondence with the lock entities inside lock manager. If a glock is in UNLOCK state, lock manager may or may not have the subject lock in its record - they are subject to get purged depending on memory and/or resource pressure. The other way around is also true. A lock may exist in lock manager's database but it could have been removed from glock subsystem. Glock itself doesn't know about cluster configuration so it relies on external lock manager to do inter-node communication. 
On the other hand, it carries some other functions such as data flushing to disk when glock is demoted from exclusive (write) to shared (read). -- Wendy From jas199931 at yahoo.com Wed May 14 04:32:04 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 21:32:04 -0700 (PDT) Subject: [Linux-cluster] Locks reported by gfs_tool lockdump does not match that presented in dlm_locks. Any reason?? In-Reply-To: <482A43BC.8040807@gmail.com> Message-ID: <131844.26918.qm@web32208.mail.mud.yahoo.com> --- Wendy Cheng wrote: > Ja S wrote: > > Hi, All: > > > > For a given lock space, at the same time, I saved > a > > copy of the output of ?gfs_tool lockdump?as > > ?gfs_locks?and a copy of dlm_locks. > > > > Then I checked the locks presents in the two saved > > files. I realized that the number of locks in > > gfs_locks is not the same as the locks presented > in > > dlm_locks. > > > > For instance, > > >From dlm_locks: > > 9980 NL locks, where > > --7984 locks are from remote nodes > > --0 locks are on remote nodes > > --1996 locks are processed on its own master lock > > resources > > 0 CR locks, where > > --0 locks are from remote nodes > > --0 locks are on remote nodes > > --0 locks are processed on its own master lock > > resources > > 0 CW locks, where > > --0 locks are from remote nodes > > --0 locks are on remote nodes > > --0 locks are processed on its own master lock > > resources > > 1173 PR locks, where > > --684 locks are from remote nodes > > --32 locks are on remote nodes > > --457 locks are processed on its own master lock > > resources > > 0 PW locks, where > > --0 locks are from remote nodes > > --0 locks are on remote nodes > > --0 locks are processed on its own master lock > > resources > > 47 EX locks, where > > --46 locks are from remote nodes > > --0 locks are on remote nodes > > --1 locks are processed on its own master lock > > resources > > > > In summary, > > 11200 locks in total, where > > -- 8714 locks are from remote nodes (entries with > ?> > Remote: ? > > -- 32 locks are on remote nodes (entries with ?> > Master: ? > > -- 2454 locks are processed on its own master lock > > resources (entries with only lock ID and lock > mode) > > > > These locks are all in the granted queue. There is > > nothing under the conversion and waiting queues. > > ====================================== > > > > >From gfs_locks, there are 2932 locks in total, ( > grep > > ?^Glock ?and count the entries). Then for each > Glock > > I got the second number which is the ID of a lock > > resource, and searched the ID in dlm_locks. I then > > split the searched results into two groups as > shown > > below: > > --46 locks are associated with local copies of > master > > lock resources on remote nodes > > --2886 locks are associated with master lock > resources > > on the node itself > > > > > > ====================================== > > Now, I tried to find the relationship between the > five > > numbers from two sources but ended up nowhere. > > Dlm_locks: > > -- 8714 locks are from remote nodes > > -- 32 locks are on remote nodes > > -- 2454 locks are processed on its own master lock > > resources > > Gfs_locks: > > --46 locks are associated with local copies of > master > > lock resources on remote nodes > > --2886 locks are associated with master lock > resources > > on the node itself > > > > Can anyone kindly point out the relationships > between > > the number of locks presented in dlm_locks and > > gfs_locks? 
> > > > > > Thanks for your time on reading this long question > and > > look forward to your help. > > > > > I doubt this will help anything from practical point > of view.. > understanding how to run Oprofile and/or SystemTap > will probably help > you more on the long run. However, if you want to > know .. the following > are why they are different: > > GFS locking is controlled by a subsysgtem called > "glock". Glock is > designed to run and interact with *different* > distributed lock managers; > e.g. in RHEL 3, other than DLM, it also works with > another lock manager > called "GULM". Only active locks has an one-to-one > correspondence with > the lock entities inside lock manager. If a glock is > in UNLOCK state, > lock manager may or may not have the subject lock in > its record - they > are subject to get purged depending on memory and/or > resource pressure. > The other way around is also true. A lock may exist > in lock manager's > database but it could have been removed from glock > subsystem. Glock > itself doesn't know about cluster configuration so > it relies on external > lock manager to do inter-node communication. On the > other hand, it > carries some other functions such as data flushing > to disk when glock is > demoted from exclusive (write) to shared (read). Thanks for the explanation. It is very helpful. Jas > -- Wendy > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ccaulfie at redhat.com Wed May 14 07:23:18 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 14 May 2008 08:23:18 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <394218.92537.qm@web32202.mail.mud.yahoo.com> References: <394218.92537.qm@web32202.mail.mud.yahoo.com> Message-ID: <482A9366.1080500@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>> --- Christine Caulfield >> wrote: >>>> Ja S wrote: >>>>> --- Christine Caulfield >>>> wrote: >>>>>> Ja S wrote: >>>>>>> Hi, All: >>>>>>> >>>>>>> >>>>>>> When an application on a cluster node A needs >> to >>>>>>> access a file on a SAN storage, how DLM >> process >>>>>> the >>>>>>> lock request? >>>>>>> >>>>>>> Should DLM firstly determine whether there >>>> already >>>>>>> exists a lock resource mapped to the file, by >>>>>> doing >>>>>>> the following things in the order 1) looking >> at >>>>>> the >>>>>>> master lock resources on the node A, 2) >>>> searching >>>>>> the >>>>>>> local copies of lock resources on the node A, >> 3) >>>>>>> searching the lock directory on the node A to >>>> find >>>>>> out >>>>>>> whether a master lock resource assosicated >> with >>>>>> the >>>>>>> file exists on another node, 4) sending >> messages >>>>>> to >>>>>>> other nodes in the cluster for the location of >>>> the >>>>>>> master lock resource? >>>>>>> >>>>>>> I ask this question because from some online >>>>>> articles, >>>>>>> it seems that DLM will always search the >>>>>> cluster-wide >>>>>>> lock directory across the whole cluster first >>>> to >>>>>> find >>>>>>> the location of the master lock resource. >>>>>>> >>>>>>> Can anyone kindly confirm the order of >> processes >>>>>> that >>>>>>> DLM does? >>>>>>> >>>>>> This should be very well documented, as it's >>>> common >>>>>> amongst DLM >>>>>> implementations. >>>>>> >>>>> I think I may be blind. I have not yet found a >>>>> document which describes the sequence of >> processes >>>> in >>>>> a precise way. 
I tried to read the source code >> but >>>> I >>>>> gave up due to lack of comments. >>>>> >>>>> >>>>>> If a node needs to lock a resource that it >>>> doesn't >>>>>> know about then it >>>>>> hashes the name to get a directory node ID, >> than >>>>>> asks that node for the >>>>>> master node. if there is no master node (the >>>>>> resource is not active) >>>>>> then the requesting node is made master >>>>>> >>>>>> if the node does know the master, (other locks >> on >>>>>> the resource exist) >>>>>> then it will go straight to that master node. >>>>> Thanks for the description. >>>>> >>>>> However, one point is still not clear to me is >> how >>>> a >>>>> node can conclude whether it __knows__ the lock >>>>> resource or not? >>>> A node knows the resource if it has a local copy. >>>> It's as simple as that. >>>> >>> If the node is a human and has a brain, it can >>> "immediately" recall that it knows the lock >> resouce. >>> However, for a computer program, it does not >> "know" >>> anything until it search the target in what it has >> on >>> hand. >>> >>> Therefore, the point here is the __search__. What >>> should the node search and in which order, and how >> it >>> searches? >>> >>> If I missed anything, please kindly point out so >> that >>> I can clarify my question as clear as possible. >>> >>> >> I think you're trying to make this more complicated >> than it is. > > > > Maybe, :-), Just want to know what exact happened. > > > >> As I've >> said several times now, a node "knows" a resource if >> there is a local >> lock on it. That's it! It's not more or less >> difficult than that, really >> it isn't! > > At the same time, there could be 30K local locks on a > node in our system. How are these local locks stored > or mapped, in a hash table, or a big but sparse array? >>From the source code, I guess the local locks are > stored in a list. Correct me if I am wrong since I > really have not yet studied the code very carefully. > > >> If the node doesn't have a local lock on >> the resource then it >> doesn't "know" it and has to ask the directory node >> where it is >> mastered. > > Does it mean even if the node owns the master lock > resource but it doesn't have a local lock associated > with the master lock resource, it still needs to ask > the directory node? > > > >> (As I'm sure you already know, locks are >> known by their lock >> ID numbers, so there's no "search" involved there >> either). > > True. When a request on a file has been issued, the > inode number of file (in hex) will be used to make up > the name of lock resource (the second number of the > name). > > It is true that the node has the list of lock > resources (either local copy or master copy) as long > as it has local locks. However, the node can just like > a teacher, who has a list of students and the students > are known by their names or student IDs. When the > teacher want to fill up the final grade for each > student, he still needs to look at the form and search > for the student name and put the grade beside the > name. The search can be done according to the student > ID if the form is sorted by the student ID or by the > student surname if the form is sorted by the surname. > Either way, the teacher still needs to __search__. > Same thing should be applied to the node. The node may > use a smart way to search the lock resources kept in > the list, possibly a hash function (but I doubt there > is a very good hash function which can find the > location of the target lock resource immediately). 
> > Am I still wrong? > >> There is no "search" for a lock around the cluster, >> that's what the >> directory node provides. And as I have already said, >> that is located by >> hashing the resource name to yield a node ID. > > Yes, yes, I think I didn't say it clearly. The lock > resource is located by hashing the resource name to > yield a node ID. But before hashing, the node still > needs to perform the search within the list or > whatever data strucute that keeps the local locks on > itself to find out whether the target lock resource is > already in use or "known". Isn't it? I am sorry it > seems that I am so stubborn. > > Thanks for your patient. You are a really good helper. > > Jas > >> So, if you like, the "search" you seem to be looking >> for is simply a >> hash of the resource name. But it's not really a >> search, and it's only >> invoked when the node first encounters a resource. >> > hash tables, hash tables, hash tables ;-) -- Chrissie From jas199931 at yahoo.com Wed May 14 08:51:08 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 01:51:08 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482A9366.1080500@redhat.com> Message-ID: <560093.12792.qm@web32208.mail.mud.yahoo.com> > >> If the node doesn't have a local lock on > >> the resource then it > >> doesn't "know" it and has to ask the directory > >> node where it is mastered. > > Does it mean even if the node owns the master lock > > resource but it doesn't have a local lock > > associated with the master lock resource, it > > still needs to ask the directory node? > hash tables, hash tables, hash tables ;-) Sure. Now I see what do you mean "knows". Thanks. Could you please kindly answer my last question above? Best, Jas From jakub.suchy at enlogit.cz Wed May 14 08:59:41 2008 From: jakub.suchy at enlogit.cz (Jakub Suchy) Date: Wed, 14 May 2008 10:59:41 +0200 Subject: [Linux-cluster] SeznamFS Message-ID: <20080514085941.GA22634@localhost> Hi, seznam.cz (Czech competitor of Google) has announced it's "SeznamFS" - http://seznamfs.sourceforge.net/ -- cut -- SeznamFS is distributed binlogging filesystem based on FUSE. It works similar to MySQL, it creates a binary log containing all write operations and provides it to slaves as master. Every server has its own server ID and therefore it's possible to use master-master replication (with the same limitations as MySQL master-master replication has) or multimaster round replication. For more information have look at documentation. -- cut -- Seems interesting... Jakub Suchy -- Jakub Such? GSM: +420 - 777 817 949 Enlogit s.r.o, U Cukrovaru 509/4, 400 07 ?st? nad Labem tel.: +420 - 474 745 159, fax: +420 - 474 745 160 e-mail: info at enlogit.cz, web: http://www.enlogit.cz Energy & Logic in IT From ccaulfie at redhat.com Wed May 14 09:04:13 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 14 May 2008 10:04:13 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <560093.12792.qm@web32208.mail.mud.yahoo.com> References: <560093.12792.qm@web32208.mail.mud.yahoo.com> Message-ID: <482AAB0D.2020900@redhat.com> Ja S wrote: >>>> If the node doesn't have a local lock on >>>> the resource then it >>>> doesn't "know" it and has to ask the directory >>>> node where it is mastered. > >>> Does it mean even if the node owns the master lock >>> resource but it doesn't have a local lock >>> associated with the master lock resource, it >>> still needs to ask the directory node? 
> > >> hash tables, hash tables, hash tables ;-) > > Sure. Now I see what do you mean "knows". Thanks. > > Could you please kindly answer my last question above? The answer is "No" ... because it's in the resource hash table. ... see, I told you it was all hash tables ... Chrissie From gordan at bobich.net Wed May 14 09:05:27 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 14 May 2008 10:05:27 +0100 (BST) Subject: [Linux-cluster] SeznamFS In-Reply-To: <20080514085941.GA22634@localhost> References: <20080514085941.GA22634@localhost> Message-ID: Sounds very much like MySQL FS. Is this an update to that project, a reinvention of that wheel, or something entirely different? Gordan On Wed, 14 May 2008, Jakub Suchy wrote: > Hi, > seznam.cz (Czech competitor of Google) has announced it's "SeznamFS" - > http://seznamfs.sourceforge.net/ > > -- cut -- > SeznamFS is distributed binlogging filesystem based on FUSE. It works > similar to MySQL, it creates a binary log containing all write > operations and provides it to slaves as master. Every server has its own > server ID and therefore it's possible to use master-master replication > (with the same limitations as MySQL master-master replication has) or > multimaster round replication. For more information have look at > documentation. > -- cut -- > > Seems interesting... > > Jakub Suchy > > -- > Jakub Such? > GSM: +420 - 777 817 949 > > Enlogit s.r.o, U Cukrovaru 509/4, 400 07 ?st? nad Labem > tel.: +420 - 474 745 159, fax: +420 - 474 745 160 > e-mail: info at enlogit.cz, web: http://www.enlogit.cz > > Energy & Logic in IT > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jas199931 at yahoo.com Wed May 14 09:31:15 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 02:31:15 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482AAB0D.2020900@redhat.com> Message-ID: <951183.12102.qm@web32203.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > >>>> If the node doesn't have a local lock on > >>>> the resource then it > >>>> doesn't "know" it and has to ask the directory > >>>> node where it is mastered. > > > >>> Does it mean even if the node owns the master > lock > >>> resource but it doesn't have a local lock > >>> associated with the master lock resource, it > >>> still needs to ask the directory node? > > > > > >> hash tables, hash tables, hash tables ;-) > > > > Sure. Now I see what do you mean "knows". Thanks. > > > > Could you please kindly answer my last question > above? > > The answer is "No" ... because it's in the resource > hash table. > > ... see, I told you it was all hash tables ... > OK. Let's summarise what I have learned from you. If I am wrong, correct me please. A node has a hash table (HT1) which hold the master lock resources and local copies of master lock resources on remote nodes. It also has another hash table (HT2) which holds the content of the lock directory. When an application on a node A requests a lock on a file, DLM feeds the inode number of the file into a hash function and uses the returned hash value to check whether there is a corresponding lock resource record in the hash table HT1. If the record exists, DLM then processes the lock request on the lock resources (either master or local copy). 
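Put as a toy shell sketch, the check looks roughly like the lines below, with the miss path continuing after it. This is purely illustrative, not the kernel code: HT1 and HT2 are just the names used in this thread, and cksum stands in for whatever hash function the real DLM uses.

#!/bin/bash
# Toy model of the lookup order discussed in this thread -- NOT the real DLM code.
declare -A HT1      # local resource table: resource name -> master|copy
declare -A HT2      # this node's slice of the lock directory: resource name -> master node
NUM_NODES=3

request_lock() {
    local name="$1"
    if [[ -n "${HT1[$name]}" ]]; then
        echo "$name already known locally as ${HT1[$name]} -> handle the lock here"
        return
    fi
    # Not known locally: hash the resource name to pick the directory node,
    # then ask that node who the master is (or become master if nobody is yet).
    local dirnode=$(( $(echo -n "$name" | cksum | cut -d' ' -f1) % NUM_NODES + 1 ))
    echo "$name unknown here -> ask directory node $dirnode for the master"
}

HT1["5 104238"]="master"    # a made-up glock-style resource name
request_lock "5 104238"     # fast path: found in HT1
request_lock "5 999999"     # miss: goes through the directory node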
If not, DLM feeds the inode number into another hash function to obtain a node ID (for example node B) which holds the master node information of the target lock resource. DLM then talks with node B and gets the master node ID (for example node C) from the hash table HT2 on node B. Finally, DLM gets the target lock resource from the hash table HT1 on the node C and processes the lock request. Am I right this time, or still missing something (a third hash table?) ? Best, Jas > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From fdinitto at redhat.com Wed May 14 09:31:21 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Wed, 14 May 2008 11:31:21 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.99.01 (development snapshot) released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 2st release from the master branch: 2.99.01. GFS1 is *known to be broken* in this release, do _NOT_ use! The 2.99.XX releases are _NOT_ meant to be used for production environments.. yet. You have been warned: *this code will have no mercy* for your servers and your data. The master branch is the main development tree that receives all new features, code, clean up and a whole brand new set of bugs, At some point in time this code will become the 3.0 stable release. Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 2.99 releases and more important report problems. In order to build the 2.99.01 release you will need: - - openais 0.83 or higher - - linux kernel (git snapshot or 2.6.26-rc2) from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git (but can run on 2.6.25 in compatibility mode) NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We are still shipping shared libraries but remember that they can change anytime without warning. A bunch of new shared libraries have been added and more will come. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.01.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.99.00): Christine Caulfield (4): [CMAN] Remove external dependancies from config modules [CMAN] Fix localhost checking that I broke last week. 
[CMAN] make qdisk compile on i386 [CMAN] fix cman_tool join -X David Teigland (18): fence: fence_tool list and fenced_domain_nodes() fence_tool: fix list command libdlm: use linux/dlm.h from 2.6.26-rc libdlmcontrol: new lib interface to dlm_controld dlm_controld: fix build problems in previous commit libdlmcontrol: filling out code dlm_controld: filling out code dlm_controld: code for info/debug queries dlm_tool: add libdlmcontrol query commands daemons: mostly daemonization stuff daemons: queries dlm_controld: fix waiting for removed node dlm_controld: options to disable fencing/quorum dependency dlm_controld: dlm_tool query fixes dlm_tool: refine list output dlm_controld: remove unworking re-merge detection dlm_controld/gfs_controld: ignore write(2) return value on plock dev dlm_controld: use started_count to detect remerges Fabio M. Di Nitto (19): [BUILD] Change build system to cope with new libdlmcontrol libdlm: fix libdlmcontrol in Makefile [CMAN] Do not query ccs as it might not be the right config plugin [CCS] Detach dependency on ccsd to run the cluster [CCS] Fix build with gcc-4.3 [CMAN] Set default syslog facility at build time [BUILD] Allow users to set path to init.d [MISC] Fix build errors with Fedora default build options [MISC] Fix more build errors with Fedora default build options [MISC] Fix even more build errors with Fedora default build options [BUILD] Fix install when building from a separate tree [MISC] Fix some gfs2 build warnings [BUILD] Require 2.6.26 kernel to build [GNBD] Update gnbd to work with 2.6.26 [GFS] Make gfs build with 2.6.26 (DO NOT USE!) [RGMANAGER] ^M's are good for DOS, bad for UNIX [BUILD] Move fencelib in /usr/share [MISC] Cast some love to init scripts [CMAN] Fix path to cman_tool Lon Hohberger (2): [cman] Close sockets in error state in gfs_controld / dlmtest2 / groupd test [rgmanager] Fix #441582 - symlinks in mount points causing failures Marc - A. 
Dahlhaus (1): [MISC] Add version string to -V options of dlm_tool and group deamons Marek 'marx' Grac (2): [FENCE] SSH support using stdin options [FENCE] Fix #435154: Support for 24 port APC fencing device ccs/Makefile | 2 +- ccs/ccs_test/Makefile | 2 +- ccs/ccs_test/ccs_test.c | 73 +- ccs/ccs_tool/Makefile | 11 +- ccs/ccs_tool/update.c | 2 +- ccs/ccsais/Makefile | 13 +- ccs/ccsais/config.c | 19 +- ccs/daemon/Makefile | 7 +- ccs/daemon/ccsd.c | 3 +- ccs/lib/Makefile | 40 - ccs/lib/ccs.h | 25 - ccs/lib/libccs.c | 764 -------------- ccs/libccscompat/Makefile | 37 + ccs/libccscompat/libccscompat.c | 764 ++++++++++++++ ccs/libccscompat/libccscompat.h | 29 + ccs/libccsconfdb/Makefile | 56 + ccs/libccsconfdb/ccs.h | 27 + ccs/libccsconfdb/libccs.c | 576 +++++++++++ cman/cman_tool/Makefile | 4 +- cman/cman_tool/cman_tool.h | 4 +- cman/cman_tool/join.c | 21 +- cman/cman_tool/main.c | 18 +- cman/daemon/Makefile | 3 +- cman/daemon/ais.c | 6 +- cman/daemon/cman-preconfig.c | 282 ++++-- cman/daemon/cman.h | 2 +- cman/daemon/nodelist.h | 1 - cman/init.d/cman.in | 25 +- cman/init.d/qdiskd | 19 +- cman/qdisk/daemon_init.c | 13 +- cman/qdisk/main.c | 4 +- cman/qdisk/scandisk.c | 20 +- configure | 43 +- dlm/Makefile | 2 +- dlm/lib/51-dlm.rules | 4 - dlm/lib/Makefile | 84 -- dlm/lib/libaislock.c | 468 --------- dlm/lib/libaislock.h | 190 ---- dlm/lib/libdlm.c | 1541 ---------------------------- dlm/lib/libdlm.h | 296 ------ dlm/lib/libdlm_internal.h | 9 - dlm/libdlm/51-dlm.rules | 4 + dlm/libdlm/Makefile | 84 ++ dlm/libdlm/libaislock.c | 468 +++++++++ dlm/libdlm/libaislock.h | 190 ++++ dlm/libdlm/libdlm.c | 1540 ++++++++++++++++++++++++++++ dlm/libdlm/libdlm.h | 296 ++++++ dlm/libdlm/libdlm_internal.h | 9 + dlm/libdlmcontrol/Makefile | 53 + dlm/libdlmcontrol/libdlmcontrol.h | 108 ++ dlm/libdlmcontrol/main.c | 416 ++++++++ dlm/tests/usertest/Makefile | 2 +- dlm/tests/usertest/dlmtest2.c | 2 +- dlm/tool/Makefile | 11 +- dlm/tool/main.c | 362 ++++++-- fence/agents/apc/fence_apc.py | 34 +- fence/agents/lib/fencing.py.py | 4 +- fence/agents/rackswitch/do_rack.c | 20 +- fence/agents/scsi/scsi_reserve | 26 +- fence/agents/xvm/fence_xvm.c | 4 +- fence/agents/xvm/fence_xvmd.c | 6 +- fence/agents/xvm/xml.c | 2 +- fence/fence_tool/fence_tool.c | 95 ++- fence/fenced/cpg.c | 45 +- fence/fenced/fd.h | 14 +- fence/fenced/fenced.h | 6 +- fence/fenced/group.c | 8 +- fence/fenced/main.c | 103 ++- fence/libfence/agent.c | 9 +- fence/libfenced/libfenced.h | 7 +- fence/libfenced/main.c | 46 +- gfs-kernel/src/gfs/ops_address.c | 2 +- gfs-kernel/src/gfs/ops_super.c | 6 +- gfs-kernel/src/gfs/quota.c | 4 +- gfs2/init.d/gfs2 | 13 +- gfs2/libgfs2/gfs2_log.c | 5 +- gfs2/mkfs/main_mkfs.c | 3 +- gfs2/mount/mtab.c | 5 +- gfs2/tool/sb.c | 3 +- gnbd-kernel/src/gnbd.c | 91 +- gnbd-kernel/src/gnbd.h | 4 +- group/daemon/cman.c | 4 +- group/daemon/cpg.c | 2 +- group/daemon/main.c | 18 +- group/dlm_controld/Makefile | 10 +- group/dlm_controld/action.c | 2 +- group/dlm_controld/config.c | 10 + group/dlm_controld/config.h | 6 + group/dlm_controld/cpg.c | 390 ++++++-- group/dlm_controld/deadlock.c | 10 +- group/dlm_controld/dlm_controld.h | 35 +- group/dlm_controld/dlm_daemon.h | 43 +- group/dlm_controld/group.c | 28 +- group/dlm_controld/main.c | 497 ++++++++-- group/dlm_controld/plock.c | 45 +- group/gfs_controld/lock_dlm.h | 1 - group/gfs_controld/main.c | 41 +- group/gfs_controld/plock.c | 6 +- group/test/clientd.c | 2 +- group/tool/main.c | 14 +- make/defines.mk.input | 3 + make/install.mk | 8 +- make/uninstall.mk | 2 +- 
rgmanager/init.d/rgmanager.in | 15 +- rgmanager/src/clulib/cman.c | 6 +- rgmanager/src/clulib/daemon_init.c | 14 +- rgmanager/src/clulib/msg_cluster.c | 26 +- rgmanager/src/clulib/msgtest.c | 3 +- rgmanager/src/daemons/clurmtabd_lib.c | 2 +- rgmanager/src/daemons/main.c | 3 +- rgmanager/src/resources/ASEHAagent.sh | 1786 ++++++++++++++++---------------- rgmanager/src/resources/clusterfs.sh | 2 +- rgmanager/src/resources/fs.sh | 2 +- rgmanager/src/resources/netfs.sh | 2 +- 114 files changed, 7567 insertions(+), 5090 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSCqxcwgUGcMLQ3qJAQIjKg/5Ae9UJ+cpRoc2szSFhlcLyvoo5plIumjn lQN1v3+yBPO8ZKw75flqGHkMbVF2fv8UMyHSyoaKiNOxRtwomwguM82nd67kbP2a k7C4alAa2HzF4qkbtxCoML4TQfY7ZrzYbnY3CPyXSyzCw/GZnIn/JzoglgkKu+Xn t2DExSo42YDMbE53oQn32iqDZnGbJUEbsB8XD3fH5l/whoGW4cbBAeKgITLuNXDl c+EwxQt2aU3XyRlAeCv3MqKgRlqzB43OBWBx4qcw1VqRR/OYyO90/5XMoroqyA4m IdRAFf9Ex7TdrFnopEt+zjfcCvPW3/nk969cbzWVGs31AqTIlbHKT9F/tf8sl6xm Tm5nD5N+J64Zb7IDKCGOrRarSIydP9bXNDmkYZ4Ak1LAN2eB3w60uR9OLH66ADiS EaF5hbb5bXuaDIrVBYeLtkja1VgorA1RRcZ6QEKBlrvbaBrbJIPmgpwnD6WwMt5H 03EJ2JK8g5vEOL9z5+ylalR/EJw1DrKwyClsabvLQoIdwnP2urush3rWIaCdj3K9 qeVIBEFz/J6PQCPXbNlzth5pgEs58Hhw+F1i8Z/JJUCEUUDIUaqz6FHE7s5U6c6A wlR2VJi50e9GJ0oULnZr/ehwlS4u/WknG4GpVvUtWmztOsFZQQaQc49ohSOTQ9Bf 7XzxkkVLxnI= =RAk5 -----END PGP SIGNATURE----- From ccaulfie at redhat.com Wed May 14 09:39:40 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 14 May 2008 10:39:40 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <951183.12102.qm@web32203.mail.mud.yahoo.com> References: <951183.12102.qm@web32203.mail.mud.yahoo.com> Message-ID: <482AB35C.7090202@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>>>>> If the node doesn't have a local lock on >>>>>> the resource then it >>>>>> doesn't "know" it and has to ask the directory >>>>>> node where it is mastered. >>>>> Does it mean even if the node owns the master >> lock >>>>> resource but it doesn't have a local lock >>>>> associated with the master lock resource, it >>>>> still needs to ask the directory node? >>> >>>> hash tables, hash tables, hash tables ;-) >>> Sure. Now I see what do you mean "knows". Thanks. >>> >>> Could you please kindly answer my last question >> above? >> >> The answer is "No" ... because it's in the resource >> hash table. >> >> ... see, I told you it was all hash tables ... >> > > OK. Let's summarise what I have learned from you. If I > am wrong, correct me please. > > > A node has a hash table (HT1) which hold the master > lock resources and local copies of master lock > resources on remote nodes. It also has another hash > table (HT2) which holds the content of the lock > directory. > > When an application on a node A requests a lock on a > file, DLM feeds the inode number of the file into a > hash function and uses the returned hash value to > check whether there is a corresponding lock resource > record in the hash table HT1. If the record exists, > DLM then processes the lock request on the lock > resources (either master or local copy). > > If not, DLM feeds the inode number into another hash > function to obtain a node ID (for example node B) > which holds the master node information of the target > lock resource. DLM then talks with node B and gets the > master node ID (for example node C) from the hash > table HT2 on node B. 
Finally, DLM gets the target lock > resource from the hash table HT1 on the node C and > processes the lock request. > > Am I right this time, or still missing something (a > third hash table?) ? > No, that's correct. It's missing a lot of detail, but the overview is fair. There's a conflation you've done there that is OK for a simplisitic discussion of GFS but hides an important abstraction. The DLM does not deal in inode numbers, it only deals in resource names. The application that uses the DLM (this includes GFS) decides what the resource names are. GFS uses some system I don't know about but looks like it might include the inode number. clvmd (for example) uses LV UUIDs or VG names for its resource names for instance. These resources are isolated from each other in separate lockspaces. Lockspace is a mandatory parameter to all locking calls. Chrissie From jas199931 at yahoo.com Wed May 14 09:57:26 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 02:57:26 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482AB35C.7090202@redhat.com> Message-ID: <306033.27076.qm@web32207.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > >> Ja S wrote: > >>>>>> If the node doesn't have a local lock on > >>>>>> the resource then it > >>>>>> doesn't "know" it and has to ask the > directory > >>>>>> node where it is mastered. > >>>>> Does it mean even if the node owns the master > >> lock > >>>>> resource but it doesn't have a local lock > >>>>> associated with the master lock resource, it > >>>>> still needs to ask the directory node? > >>> > >>>> hash tables, hash tables, hash tables ;-) > >>> Sure. Now I see what do you mean "knows". > Thanks. > >>> > >>> Could you please kindly answer my last question > >> above? > >> > >> The answer is "No" ... because it's in the > resource > >> hash table. > >> > >> ... see, I told you it was all hash tables ... > >> > > > > OK. Let's summarise what I have learned from you. > If I > > am wrong, correct me please. > > > > > > A node has a hash table (HT1) which hold the > master > > lock resources and local copies of master lock > > resources on remote nodes. It also has another > hash > > table (HT2) which holds the content of the lock > > directory. > > > > When an application on a node A requests a lock on > a > > file, DLM feeds the inode number of the file into > a > > hash function and uses the returned hash value to > > check whether there is a corresponding lock > resource > > record in the hash table HT1. If the record > exists, > > DLM then processes the lock request on the lock > > resources (either master or local copy). > > > > If not, DLM feeds the inode number into another > hash > > function to obtain a node ID (for example node B) > > which holds the master node information of the > target > > lock resource. DLM then talks with node B and gets > the > > master node ID (for example node C) from the hash > > table HT2 on node B. Finally, DLM gets the target > lock > > resource from the hash table HT1 on the node C and > > processes the lock request. > > > > Am I right this time, or still missing something > (a > > third hash table?) ? > > > > No, that's correct. It's missing a lot of detail, > but the overview is fair. > > There's a conflation you've done there that is OK > for a simplisitic > discussion of GFS but hides an important > abstraction. > > The DLM does not deal in inode numbers, it only > deals in resource names. 
> The application that uses the DLM (this includes > GFS) decides what the > resource names are. GFS uses some system I don't > know about but looks > like it might include the inode number. clvmd (for > example) uses LV > UUIDs or VG names for its resource names for > instance. > > These resources are isolated from each other in > separate lockspaces. > Lockspace is a mandatory parameter to all locking > calls. Clear. Great thanks to your detailed explanation. All the best, Jas From jas199931 at yahoo.com Wed May 14 12:23:29 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 05:23:29 -0700 (PDT) Subject: [Linux-cluster] which journaling file system is used in GFS? Message-ID: <314708.36071.qm@web32207.mail.mud.yahoo.com> Hi, All: >From some online articles, in ext3, there are journal, ordered, and writeback three types of journaling file systems. Also in ext3, we can attach the journaling file system to the journal block device located on a different partition. I have not yet found related information for GFS. My questions are: 1. Does GFS also support the three types of journaling file systems? If not, what journaling file system is used in GFS? 2. What command I can use to find out which type of journaling file system is used in the existing GFS file sytem? 3. When updating journal files, does DLM process locks on the journal files as well? 4. Can I attach the journaling file system to the journal block device located on a different LUN for GFS (just like ext3)? Thanks in advance, Jas From mpartio at gmail.com Wed May 14 12:32:34 2008 From: mpartio at gmail.com (Mikko Partio) Date: Wed, 14 May 2008 15:32:34 +0300 Subject: [Linux-cluster] kmod-gfs removed Message-ID: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> Hello the latest kernel patch for RHEL 5.1 wants to remove kmod-gfs -package. What's up with this? Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.wendy.cheng at gmail.com Wed May 14 15:01:16 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 14 May 2008 11:01:16 -0400 Subject: [Linux-cluster] which journaling file system is used in GFS? In-Reply-To: <314708.36071.qm@web32207.mail.mud.yahoo.com> References: <314708.36071.qm@web32207.mail.mud.yahoo.com> Message-ID: <482AFEBC.90707@gmail.com> Ja S wrote: > Hi, All: > > >From some online articles, in ext3, there are journal, > ordered, and writeback three types of journaling file > systems. Also in ext3, we can attach the journaling > file system to the journal block device located on a > different partition. > GFS *is* a journaling filesystem, same as EXT3. All journaling filesystem has journal(s) which is (are) almost an equivalence of database logging. The internal logic of journaling could be different and we call it journaling "mode". > I have not yet found related information for GFS. > > My questions are: > > 1. Does GFS also support the three types of journaling > file systems? If not, what journaling file system is > used in GFS? > So please don't use "journaling file system" to describe journal. Practically, GFS has only one type of journaling (write-back) but it supports data journaling thru "gfs_tool setflag" command (see "man gfs_tool). GFS2 has improved this by moving the "setflag" command into mount command (so it is less confusing) and has been designed to use three journaling modes (write-back, order-write, and data journaling, with order-write as its default). 
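(To illustrate -- this is a sketch from memory, so please check "man
gfs_tool" and the GFS2 mount option documentation for the exact flag
and option names on your release; the paths below are just examples.
On GFS1 the data-journaling flag is set per file or per directory:

  # the file must still be empty when the flag is set
  gfs_tool setflag jdata /mnt/gfs/some/file
  # new files created under this directory inherit the flag
  gfs_tool setflag inherit_jdata /mnt/gfs/some/dir

On GFS2 the mode is chosen at mount time instead, e.g.
"mount -o data=ordered" or "mount -o data=writeback".)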
It (GFS2), however, doesn't allow external journaling devices yet. I understand moving ext3 journal into an external device and/or moving journaling mode from its default (order write) into "write back" can significantly lift its performance. These tricks can *not* be applied to GFS. -- Wendy From lhh at redhat.com Wed May 14 16:29:40 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 14 May 2008 12:29:40 -0400 Subject: [Linux-cluster] Oracle Shared-Nothing In-Reply-To: References: <1210612368.10406.50.camel@ayanami.boston.devel.redhat.com> Message-ID: <1210782580.13237.48.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-13 at 07:51 +0100, Stephen Nelson-Smith wrote: > The client is dead set against a RAID array, partly on cost (budget > v.tight), but also on physical space in the rack - there's only 2U > left, and a new rack costs ?1000 pcm. Someday.... =) However, I can't attest to the stability of Oracle on DRBD. I would try with an evaluation license or developer license of Oracle 10g first before deploying in production. -- Lon From jas199931 at yahoo.com Wed May 14 20:44:49 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 13:44:49 -0700 (PDT) Subject: [Linux-cluster] which journaling file system is used in GFS? In-Reply-To: <482AFEBC.90707@gmail.com> Message-ID: <240117.69840.qm@web32202.mail.mud.yahoo.com> --- Wendy Cheng wrote: > Ja S wrote: > > Hi, All: > > > > >From some online articles, in ext3, there are > journal, > > ordered, and writeback three types of journaling > file > > systems. Also in ext3, we can attach the > journaling > > file system to the journal block device located > on a > > different partition. > > > > GFS *is* a journaling filesystem, same as EXT3. All > journaling > filesystem has journal(s) which is (are) almost an > equivalence of > database logging. The internal logic of journaling > could be different > and we call it journaling "mode". > > I have not yet found related information for GFS. > > > > My questions are: > > > > 1. Does GFS also support the three types of > journaling > > file systems? If not, what journaling file system > is > > used in GFS? > > > So please don't use "journaling file system" to > describe journal. > Practically, GFS has only one type of journaling > (write-back) but it > supports data journaling thru "gfs_tool setflag" > command (see "man > gfs_tool). GFS2 has improved this by moving the > "setflag" command into > mount command (so it is less confusing) and has been > designed to use > three journaling modes (write-back, order-write, and > data journaling, > with order-write as its default). It (GFS2), > however, doesn't allow > external journaling devices yet. > > I understand moving ext3 journal into an external > device and/or moving > journaling mode from its default (order write) into > "write back" can > significantly lift its performance. These tricks can > *not* be applied to > GFS. Thank you very much indeed for the clarification. Jas From lhh at redhat.com Wed May 14 21:52:40 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 14 May 2008 17:52:40 -0400 Subject: [Linux-cluster] Complete cluster.conf Schema Description In-Reply-To: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> References: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> Message-ID: <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-13 at 17:26 -0400, Kelsey Hightower wrote: > Hello, > > > I have been searching the web for weeks. 
I am trying to get the > complete cluster.conf schema description. I have found a link that > describes most of the options but it seems to omit the resources, > services, and anything related to configuring failover services. > > > http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html Try this: http://people.redhat.com/lhh/ra-info-0.1.tar.gz $ sha256sum ra-info-0.1.tar.gz 7c34a082cc88d9f976a544b3e19be56ef8cf73a1a8c395178b40f16e0f16ad5d ra-info-0.1.tar.gz It's a simple XSLT program that translates Resource Agent metadata to HTML and fires up a text web-browser to look at it. There's probably lots of typos in the RA metadata. Feedback is appreciated. Basically; tar -xzvf ra-info-0.1.tar.gz cd ra-info-0.1 ./ra-info /usr/share/cluster/service.sh [or whatever agent you want] I'll generate web pages for all the agents later. This is a start, though... :) -- Lon From lhh at redhat.com Wed May 14 22:00:32 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 14 May 2008 18:00:32 -0400 Subject: [Linux-cluster] Complete cluster.conf Schema Description In-Reply-To: <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> References: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> Message-ID: <1210802432.13237.58.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-05-14 at 17:52 -0400, Lon Hohberger wrote: > On Tue, 2008-05-13 at 17:26 -0400, Kelsey Hightower wrote: > > Hello, > > > > > > I have been searching the web for weeks. I am trying to get the > > complete cluster.conf schema description. I have found a link that > > describes most of the options but it seems to omit the resources, > > services, and anything related to configuring failover services. > > > > > > http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html > > Try this: > > http://people.redhat.com/lhh/ra-info-0.1.tar.gz > > $ sha256sum ra-info-0.1.tar.gz > 7c34a082cc88d9f976a544b3e19be56ef8cf73a1a8c395178b40f16e0f16ad5d > ra-info-0.1.tar.gz Had a bug - try 0.2: http://people.redhat.com/lhh/ra-info-0.2.tar.gz $ sha256sum ra-info-0.2.tar.gz 6d8a40ae8a6a4006406ff07186331f22279e9aa796f0bedbcd592f6d7c62e856 ra-info-0.2.tar.gz Example output looks like this: http://people.redhat.com/lhh/service.sh.html -- Lon From jas199931 at yahoo.com Wed May 14 22:04:04 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 15:04:04 -0700 (PDT) Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <482AFEBC.90707@gmail.com> Message-ID: <647561.52231.qm@web32208.mail.mud.yahoo.com> Hi, All: Just want to get a clarification. When using GFS+DLM, will the locks of journals be managed also by DLM in the same way as that for normal data files? Thanks in advance. Jas From gordan at bobich.net Wed May 14 22:21:56 2008 From: gordan at bobich.net (Gordan Bobic) Date: Wed, 14 May 2008 23:21:56 +0100 Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <647561.52231.qm@web32208.mail.mud.yahoo.com> References: <647561.52231.qm@web32208.mail.mud.yahoo.com> Message-ID: <482B6604.7020608@bobich.net> Ja S wrote: > Hi, All: > > Just want to get a clarification. > > When using GFS+DLM, will the locks of journals be > managed also by DLM in the same way as that for normal > data files? My understanding is that there is no locking on the journals across nodes except when a node gets fenced and it's journal needs to be replayed to ensure data is consistent. 
Each node has it's own journal. Gordan From jas199931 at yahoo.com Wed May 14 22:27:55 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 15:27:55 -0700 (PDT) Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <482B6604.7020608@bobich.net> Message-ID: <217848.54530.qm@web32202.mail.mud.yahoo.com> --- Gordan Bobic wrote: > Ja S wrote: > > Hi, All: > > > > Just want to get a clarification. > > > > When using GFS+DLM, will the locks of journals be > > managed also by DLM in the same way as that for > normal > > data files? > > My understanding is that there is no locking on the > journals across > nodes except when a node gets fenced and it's > journal needs to be > replayed to ensure data is consistent. Each node has > it's own journal. Thanks. Then what this "can't acquire the journal glock:" error is about? Jas > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From gordan at bobich.net Wed May 14 22:32:21 2008 From: gordan at bobich.net (Gordan Bobic) Date: Wed, 14 May 2008 23:32:21 +0100 Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <217848.54530.qm@web32202.mail.mud.yahoo.com> References: <217848.54530.qm@web32202.mail.mud.yahoo.com> Message-ID: <482B6875.8090401@bobich.net> Ja S wrote: > --- Gordan Bobic wrote: > >> Ja S wrote: >>> Hi, All: >>> >>> Just want to get a clarification. >>> >>> When using GFS+DLM, will the locks of journals be >>> managed also by DLM in the same way as that for >> normal >>> data files? >> My understanding is that there is no locking on the >> journals across >> nodes except when a node gets fenced and it's >> journal needs to be >> replayed to ensure data is consistent. Each node has >> it's own journal. > > Thanks. Then what this "can't acquire the journal > glock:" error is about? I think the journals are allocated on a first-come first-served basis to the nodes as they connect to the shared storage. Each node locks it's own journal to ensure that it is marked as "in use". That's why you'll see that message at mount time. But I don't think there is any journal locking going on under normal operation. Gordan From cfeist at redhat.com Wed May 14 22:33:24 2008 From: cfeist at redhat.com (Chris Feist) Date: Wed, 14 May 2008 17:33:24 -0500 Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> References: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> Message-ID: <482B68B4.1010402@redhat.com> Mikko Partio wrote: > Hello > > the latest kernel patch for RHEL 5.1 wants to remove kmod-gfs -package. > What's up with this? How does this happen? When you're doing a 'yum update', just install the kernel? Can you post the output of the commands that try to remove kmod-gfs? 
Thanks, Chris > > Regards > > Mikko > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From kelsey.hightower at gmail.com Wed May 14 23:38:20 2008 From: kelsey.hightower at gmail.com (Kelsey Hightower) Date: Wed, 14 May 2008 19:38:20 -0400 Subject: [Linux-cluster] Complete cluster.conf Schema Description In-Reply-To: <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> References: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> Message-ID: <1242753f0805141638v5b2c42f0q5c3cc0910fa4f6a4@mail.gmail.com> This was what I was looking for, thanks a lot. On Wed, May 14, 2008 at 5:52 PM, Lon Hohberger wrote: > > On Tue, 2008-05-13 at 17:26 -0400, Kelsey Hightower wrote: > > Hello, > > > > > > I have been searching the web for weeks. I am trying to get the > > complete cluster.conf schema description. I have found a link that > > describes most of the options but it seems to omit the resources, > > services, and anything related to configuring failover services. > > > > > > http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html > > Try this: > > http://people.redhat.com/lhh/ra-info-0.1.tar.gz > > $ sha256sum ra-info-0.1.tar.gz > 7c34a082cc88d9f976a544b3e19be56ef8cf73a1a8c395178b40f16e0f16ad5d > ra-info-0.1.tar.gz > > It's a simple XSLT program that translates Resource Agent metadata to > HTML and fires up a text web-browser to look at it. > > There's probably lots of typos in the RA metadata. Feedback is > appreciated. > > Basically; > > tar -xzvf ra-info-0.1.tar.gz > cd ra-info-0.1 > ./ra-info /usr/share/cluster/service.sh [or whatever agent you want] > > I'll generate web pages for all the agents later. This is a start, > though... :) > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jas199931 at yahoo.com Thu May 15 00:27:37 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 17:27:37 -0700 (PDT) Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <482B6875.8090401@bobich.net> Message-ID: <852258.61230.qm@web32208.mail.mud.yahoo.com> --- Gordan Bobic wrote: > Ja S wrote: > > --- Gordan Bobic wrote: > > > >> Ja S wrote: > >>> Hi, All: > >>> > >>> Just want to get a clarification. > >>> > >>> When using GFS+DLM, will the locks of journals > be > >>> managed also by DLM in the same way as that for > >> normal > >>> data files? > >> My understanding is that there is no locking on > the > >> journals across > >> nodes except when a node gets fenced and it's > >> journal needs to be > >> replayed to ensure data is consistent. Each node > has > >> it's own journal. > > > > Thanks. Then what this "can't acquire the journal > > glock:" error is about? > > I think the journals are allocated on a first-come > first-served basis to > the nodes as they connect to the shared storage. > Each node locks it's > own journal to ensure that it is marked as "in use". > That's why you'll > see that message at mount time. But I don't think > there is any journal > locking going on under normal operation. Great thanks again. 
Jas From mpartio at gmail.com Thu May 15 05:43:14 2008 From: mpartio at gmail.com (Mikko Partio) Date: Thu, 15 May 2008 08:43:14 +0300 Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <482B68B4.1010402@redhat.com> References: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> <482B68B4.1010402@redhat.com> Message-ID: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> On Thu, May 15, 2008 at 1:33 AM, Chris Feist wrote: > Mikko Partio wrote: > >> Hello >> >> the latest kernel patch for RHEL 5.1 wants to remove kmod-gfs -package. >> What's up with this? >> > > How does this happen? When you're doing a 'yum update', just install the > kernel? Can you post the output of the commands that try to remove > kmod-gfs? > sh-3.1# uname -a Linux node1 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux sh-3.1# yum check-update kernel.x86_64 2.6.18-53.1.19.el5 updates kernel-devel.x86_64 2.6.18-53.1.19.el5 updates kernel-headers.x86_64 2.6.18-53.1.19.el5 updates sh-3.1# yum update ============================================================================= Package Arch Version Repository Size ============================================================================= Installing: kernel x86_64 2.6.18-53.1.19.el5 updates 15 M kernel-devel x86_64 2.6.18-53.1.19.el5 updates 4.9 M Updating: kernel-headers x86_64 2.6.18-53.1.19.el5 updates 822 k Removing: kernel x86_64 2.6.18-8.1.15.el5 installed 72 M kernel-devel x86_64 2.6.18-8.1.15.el5 installed 15 M Removing for dependencies: kmod-gfs x86_64 0.1.16-6.2.6.18_8.1.15.el5 installed 466 k Transaction Summary ============================================================================= Install 2 Package(s) Update 1 Package(s) Remove 3 Package(s) Total download size: 21 M Is this ok [y/N]: N Exiting on user Command Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alain.Moulle at bull.net Thu May 15 07:22:57 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 15 May 2008 09:22:57 +0200 Subject: [Linux-cluster] Meaning of checkinterval in cluster.conf Message-ID: <482BE4D1.8010609@bull.net> Hi I don't remember the meaning of checkinterval value in service record in cluster.conf with regard to the monitor and status values in script.sh ? Thanks Regards Alain Moull? From cfeist at redhat.com Thu May 15 11:44:28 2008 From: cfeist at redhat.com (Chris Feist) Date: Thu, 15 May 2008 06:44:28 -0500 Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> References: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> <482B68B4.1010402@redhat.com> <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> Message-ID: <482C221C.1020102@redhat.com> Mikko Partio wrote: > On Thu, May 15, 2008 at 1:33 AM, Chris Feist wrote: > >> Mikko Partio wrote: >> >>> Hello >>> >>> the latest kernel patch for RHEL 5.1 wants to remove kmod-gfs -package. >>> What's up with this? >>> >> How does this happen? When you're doing a 'yum update', just install the >> kernel? Can you post the output of the commands that try to remove >> kmod-gfs? 
>> > > sh-3.1# uname -a > Linux node1 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 EST 2008 x86_64 > x86_64 x86_64 GNU/Linux > > sh-3.1# yum check-update > > kernel.x86_64 2.6.18-53.1.19.el5 updates > kernel-devel.x86_64 2.6.18-53.1.19.el5 updates > kernel-headers.x86_64 2.6.18-53.1.19.el5 updates > > sh-3.1# yum update > > ============================================================================= > Package Arch Version Repository Size > ============================================================================= > Installing: > kernel x86_64 2.6.18-53.1.19.el5 updates > 15 M > kernel-devel x86_64 2.6.18-53.1.19.el5 updates > 4.9 M > Updating: > kernel-headers x86_64 2.6.18-53.1.19.el5 updates > 822 k > Removing: > kernel x86_64 2.6.18-8.1.15.el5 installed 72 > M > kernel-devel x86_64 2.6.18-8.1.15.el5 installed 15 > M > Removing for dependencies: > kmod-gfs x86_64 0.1.16-6.2.6.18_8.1.15.el5 You have an old kmod-gfs, you should upgrade to the latest one (which doesn't depend on a specific kernel, but it depends on a specific kABI). From rpeterso at redhat.com Thu May 15 13:23:10 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 15 May 2008 08:23:10 -0500 Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <852258.61230.qm@web32208.mail.mud.yahoo.com> References: <852258.61230.qm@web32208.mail.mud.yahoo.com> Message-ID: <1210857790.21738.8.camel@technetium.msp.redhat.com> On Wed, 2008-05-14 at 17:27 -0700, Ja S wrote: > > >> My understanding is that there is no locking on > > the > > >> journals across > > >> nodes except when a node gets fenced and it's > > >> journal needs to be > > >> replayed to ensure data is consistent. Each node > > has > > >> it's own journal. Hi, The journals in GFS are special files with cluster-wide locks ("glocks"), so inter-node locking still applies. IIRC, all nodes keep a "read" lock on all the journals. However, every node is assigned a primary journal and uses that journal only, under a "write" lock which means there is no lock contention except during recovery situations where a journal has to be replayed. Regards, Bob Peterson Red Hat Clustering & GFS From Alain.Moulle at bull.net Thu May 15 14:21:00 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 15 May 2008 16:21:00 +0200 Subject: [Linux-cluster] CS5 / About qdisk parameters Message-ID: <482C46CC.5050306@bull.net> Hi Lon Thans again, but that's strange because in the man , the recommended values are : intervall="1" tko="10" and so we have a result < 21s which is the default value of heart-beat timer, so not a hair above like you recommened in previous email ... extract of man qddisk : interval="1" This is the frequency of read/write cycles, in seconds. tko="10" This is the number of cycles a node must miss in order to be declared dead. ? So the better values to match with the default heart-beat timeout of 21s should be : interval="2" and tko="11" right ? Thanks Regards Alain Moull? 
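P.S. To be precise, what I would change is just the quorumd line in
cluster.conf, i.e. from the man page defaults to something like
(other attributes left out):

  <quorumd interval="2" tko="11" ... />

which gives 2 x 11 = 22s before a node is declared dead, a hair above
the default 21s heart-beat timeout.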
From lhh at redhat.com Thu May 15 15:07:50 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 15 May 2008 11:07:50 -0400 Subject: [Linux-cluster] CS5 / About qdisk parameters In-Reply-To: <482C46CC.5050306@bull.net> References: <482C46CC.5050306@bull.net> Message-ID: <1210864070.13237.65.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-05-15 at 16:21 +0200, Alain Moulle wrote: > Hi Lon > > Thans again, but that's strange because in the man , the recommended > values are : > intervall="1" tko="10" and so we have a result < 21s which is the > default value of heart-beat timer, so not a hair above like you > recommened in previous email ... > extract of man qddisk : > > interval="1" > This is the frequency of read/write cycles, in seconds. > > tko="10" > This is the number of cycles a node must miss in order to be > declared dead. > > ? > > So the better values to match with the default heart-beat timeout of 21s should > be : > > interval="2" and tko="11" > > right ? Yes, but you don't want to match it. You want qdisk to timeout before CMAN with enough time so that ifthe qdisk master node dies, there is enough time to elect a new master *before* CMAN would normally transition. On RHEL4, the default CMAN timeout is 21 seconds. On RHEL5, it's 5 seconds - which must be tweaked currently using the totem parameter. I intend to make qdiskd automatically detect the CMAN death detection time in the near future and automatically configure itself, because this is something users/administrators just *shouldn't* have to deal with... (Does anyone disagree with that? :) ) Anyway, here's a graphical representation as to why qdiskd needs to time out (long) before CMAN: http://people.redhat.com/lhh/cmanvsqdisk.png -- Lon From lhh at redhat.com Thu May 15 15:09:59 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 15 May 2008 11:09:59 -0400 Subject: [Linux-cluster] CS5 / About qdisk parameters In-Reply-To: <1210864070.13237.65.camel@ayanami.boston.devel.redhat.com> References: <482C46CC.5050306@bull.net> <1210864070.13237.65.camel@ayanami.boston.devel.redhat.com> Message-ID: <1210864199.13237.68.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-05-15 at 11:07 -0400, Lon Hohberger wrote: > Anyway, here's a graphical representation as to why qdiskd needs to time > out (long) before CMAN: > > http://people.redhat.com/lhh/cmanvsqdisk.png Hrm, on a second look, the timing isn't 100% accurate there. However, the reasoning is. -- Lon From lhh at redhat.com Thu May 15 15:13:42 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 15 May 2008 11:13:42 -0400 Subject: [Linux-cluster] Meaning of checkinterval in cluster.conf In-Reply-To: <482BE4D1.8010609@bull.net> References: <482BE4D1.8010609@bull.net> Message-ID: <1210864422.13237.72.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-05-15 at 09:22 +0200, Alain Moulle wrote: > Hi > > I don't remember the meaning of checkinterval value in > service record in cluster.conf with regard to the monitor > and status values in script.sh ? In RHEL3 clumanager, checkinterval was the frequency the entire service was checked. In rgmanager, check times are per-resource at a minimum granularity of 10 seconds. There's no such parameter in script.sh/service.sh/etc. If you change the , it will check more or less frequently. You can do the same thing by adding as a child of the service in the resource tree. (I need to put that on the ResourceTrees page on the wiki). 
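A rough example (from memory -- the service and resource names here
are made up, and the agent metadata / wiki page is the authoritative
reference):

  <service name="myservice">
    <action name="status" interval="30"/>
    <fs ref="myfs"/>
  </service>

would make rgmanager run the status check for that service every 30
seconds instead of the interval from the agent metadata.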
-- Lon From kpodesta at redbrick.dcu.ie Thu May 15 17:34:15 2008 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Thu, 15 May 2008 18:34:15 +0100 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node Message-ID: <20080515173415.GA25881@minerva.redbrick.dcu.ie> Hi folks, I've just added a 3rd node to a live RHEL 3 cluster (RHEL 3 Update 7), which was added successfully. But on the third node when I run clustat, I get the message "No Quorum - Service States Unknown". The other two nodes are running fine and clustat displays all services. A message on one of the other nodes from /var/log/messages gives: cluquorumd[8049]: Dropping connect from (node3-IP): Unauthorized Seems like the other two nodes are rejecting the third's advances towards joining quorum. Is there anything I can do about this? Would appreciate any pointers, I couldn't find an answer in the archives (I notice the question was asked before too). The three nodes are listed as "Active" in clustat on all nodes, but the third obviously just can't join the quorum, despite reboot of the third node. Thanks & regards, Karl -- Karl Podesta Systems Engineer, Securelinx Ltd., Ireland http://www.securelinx.com/ From orkcu at yahoo.com Thu May 15 18:40:31 2008 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Thu, 15 May 2008 11:40:31 -0700 (PDT) Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> Message-ID: <81184.14267.qm@web50601.mail.re2.yahoo.com> --- On Thu, 5/15/08, Mikko Partio wrote: > From: Mikko Partio > Subject: Re: [Linux-cluster] kmod-gfs removed > To: "Chris Feist" > Cc: "linux clustering" > Received: Thursday, May 15, 2008, 1:43 AM > On Thu, May 15, 2008 at 1:33 AM, Chris Feist > wrote: > > > Mikko Partio wrote: > > > >> Hello > >> > >> the latest kernel patch for RHEL 5.1 wants to > remove kmod-gfs -package. > >> What's up with this? > >> > > > > How does this happen? When you're doing a > 'yum update', just install the > > kernel? Can you post the output of the commands that > try to remove > > kmod-gfs? > > > > sh-3.1# uname -a > Linux node1 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 > EST 2008 x86_64 > x86_64 x86_64 GNU/Linux > > sh-3.1# yum check-update > > kernel.x86_64 2.6.18-53.1.19.el5 > updates > kernel-devel.x86_64 2.6.18-53.1.19.el5 > updates > kernel-headers.x86_64 2.6.18-53.1.19.el5 > updates > > sh-3.1# yum update > > ============================================================================= > Package Arch Version > Repository Size > ============================================================================= > Installing: > kernel x86_64 2.6.18-53.1.19.el5 > updates > 15 M > kernel-devel x86_64 2.6.18-53.1.19.el5 > updates > 4.9 M > Updating: > kernel-headers x86_64 2.6.18-53.1.19.el5 > updates > 822 k > Removing: > kernel x86_64 2.6.18-8.1.15.el5 > installed 72 > M > kernel-devel x86_64 2.6.18-8.1.15.el5 > installed 15 > M > Removing for dependencies: > kmod-gfs x86_64 > 0.1.16-6.2.6.18_8.1.15.el5 > installed 466 k yum try to remove kmod-gfs because its depende of the kernel version that its trying to remove, which is not right because you are trying to update a kernel and it should means just install the package without remove any old versions. or do you change the default configuration of yum? cu roger __________________________________________________________________ Looking for the perfect gift? Give the gift of Flickr! 
http://www.flickr.com/gift/ From lhh at redhat.com Thu May 15 21:06:16 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 15 May 2008 17:06:16 -0400 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <20080515173415.GA25881@minerva.redbrick.dcu.ie> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> Message-ID: <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-05-15 at 18:34 +0100, Karl Podesta wrote: > Hi folks, > > I've just added a 3rd node to a live RHEL 3 cluster (RHEL 3 Update 7), > which was added successfully. But on the third node when I run clustat, > I get the message "No Quorum - Service States Unknown". The other two > nodes are running fine and clustat displays all services. A message on > one of the other nodes from /var/log/messages gives: > > cluquorumd[8049]: Dropping connect from (node3-IP): Unauthorized > > Seems like the other two nodes are rejecting the third's advances > towards joining quorum. Is there anything I can do about this? > Would appreciate any pointers, I couldn't find an answer in the archives > (I notice the question was asked before too). The three nodes are listed > as "Active" in clustat on all nodes, but the third obviously just can't > join the quorum, despite reboot of the third node. The md5sum of /etc/cluster.xml is the same for all nodes, right? -- Lon From paul at huffingtonpost.com Thu May 15 21:44:48 2008 From: paul at huffingtonpost.com (Paul Berry) Date: Thu, 15 May 2008 17:44:48 -0400 Subject: [Linux-cluster] GFS in High Traffic ? Message-ID: <38435f290805151444n11ad8168m720be465c2e09f2a@mail.gmail.com> Hey guys - we're struggling with a GFS setup to get our 8 high traffic servers onto a NEXSAN SataBoy so that we can leave our RSYNC process which we've pushed to the extents of its capacity We don't have all that much data, its less then 1TB total. The trick is that these files get requested simultaneously under pretty significant load. And as soon as we get 3 or 4 servers mounted to the SAN we get melt-downs. We also struggled today with one server messing with the journals and taking down the other servers that were looking at the SAN (disaster). The broad question I'd love to hear - is GFS a good solution to get into for a situation like this? I'd love to hear thoughts on this, and suggestions on the right path is this seems like the wrong one Best regards, Pau From Alain.Moulle at bull.net Fri May 16 07:56:14 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Fri, 16 May 2008 09:56:14 +0200 Subject: [Linux-cluster] Re: CS5 / About qdisk parameters Message-ID: <482D3E1E.5030705@bull.net> Hi Lon Sorry Lon, but it is not completely clear again for me ... : when you write that default cman timeout on RHEL5 is 5 seconds, you mean that the heart-beat timeout is 5s ? whereas each hello message is sent every 5s too ? And the totem in cluster.conf to modify it was in my understanding the "deadnode_timer" in the cman record ... what is the "token" you mention ? And finally, my would be to set deadnode_timer="21s" for cman and to keep interval="1" and tko="10" for quorum disk. Just a precision, it on a only two nodes cluster with quorum disk. Thanks to confirm these points. Regards Alain Moull? > Yes, but you don't want to match it. > You want qdisk to timeout before CMAN with enough time so that ifthe > qdisk master node dies, there is enough time to elect a new master > *before* CMAN would normally transition. > On RHEL4, the default CMAN timeout is 21 seconds. 
> On RHEL5, it's 5 seconds - which must be tweaked currently using the > totem parameter. > I intend to make qdiskd automatically detect the CMAN death detection > time in the near future and automatically configure itself, because this > is something users/administrators just *shouldn't* have to deal with... > (Does anyone disagree with that? :) ) > Anyway, here's a graphical representation as to why qdiskd needs to time > out (long) before CMAN: > http://people.redhat.com/lhh/cmanvsqdisk.png > -- Lon From lhh at redhat.com Fri May 16 19:11:33 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 16 May 2008 15:11:33 -0400 Subject: [Linux-cluster] Re: CS5 / About qdisk parameters In-Reply-To: <482D3E1E.5030705@bull.net> References: <482D3E1E.5030705@bull.net> Message-ID: <1210965093.3019.175.camel@localhost.localdomain> On Fri, 2008-05-16 at 09:56 +0200, Alain Moulle wrote: > Hi Lon > > Sorry Lon, but it is not completely clear again for me ... : > > when you write that default cman timeout on RHEL5 is 5 seconds, you > mean that the heart-beat timeout is 5s ? whereas each hello message is > sent every 5s too ? On RHEL5, the parameters are different - but basically, on RHEL5, the *equivalent* of the deadnode_timer is "5" seconds by default. (Specifying other values for it is quite different, however) > And the totem in cluster.conf to modify it was in my understanding the > "deadnode_timer" in the cman record ... what is the "token" you mention ? RHEL4: RHEL5: > And finally, my would be to set deadnode_timer="21s" for cman and to keep > interval="1" and tko="10" for quorum disk. Right. :) Using the defaults on rhel4 should work wonderfully. -- Lon From lpleiman at redhat.com Sat May 17 00:37:28 2008 From: lpleiman at redhat.com (Leo Pleiman) Date: Fri, 16 May 2008 20:37:28 -0400 Subject: [Linux-cluster] GFS in High Traffic ? In-Reply-To: <38435f290805151444n11ad8168m720be465c2e09f2a@mail.gmail.com> References: <38435f290805151444n11ad8168m720be465c2e09f2a@mail.gmail.com> Message-ID: <482E28C8.6000304@redhat.com> Paul, We have similar demands at my customer, and with larger file systems. We have gotten good results by placing the cluster traffic on a dedicated interface. Once the cluster traffic (as defined in the cluster.conf file) was placed on a dedicated interface all our stabilization problems disappeared. If you hardware is interface limited, you can use vlan tagging and place the cluster traffic in a dedicated vlan. It doesn't provide the additional bandwidth but it seems to dramatically help. When we asked the same question, the general answer from the developers was, "it is always a good idea to place cluster traffic on a dedicated interface." As an interesting note, for an oracle RAC installation, the Oracle cluster traffic MUST be on a dedicated interface. Paul Berry wrote: > Hey guys - we're struggling with a GFS setup to get our 8 high traffic > servers onto a NEXSAN SataBoy so that we can leave our RSYNC process > which we've pushed to the extents of its capacity > > We don't have all that much data, its less then 1TB total. The trick > is that these files get requested simultaneously under pretty > significant load. And as soon as we get 3 or 4 servers mounted to the > SAN we get melt-downs. > > We also struggled today with one server messing with the journals and > taking down the other servers that were looking at the SAN (disaster). > > The broad question I'd love to hear - is GFS a good solution to get > into for a situation like this? 
> > I'd love to hear thoughts on this, and suggestions on the right path > is this seems like the wrong one > > Best regards, > Pau > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Leo J Pleiman Senior Consultant, GPS Federal 410-688-3873 -------------- next part -------------- A non-text attachment was scrubbed... Name: lpleiman.vcf Type: text/x-vcard Size: 194 bytes Desc: not available URL: From anujhere at gmail.com Sun May 18 11:56:39 2008 From: anujhere at gmail.com (=?UTF-8?Q?Anuj_Singh_(=E0=A4=85=E0=A4=A8=E0=A5=81=E0=A4=9C)?=) Date: Sun, 18 May 2008 17:26:39 +0530 Subject: [Linux-cluster] cluster make fail on RHEL5 "libdlm.c:324: error: " Message-ID: <3120c9e30805180456t13af5a8bmd55f05933fbc47e@mail.gmail.com> Hi,I have kernel version 2.6.18-8.el5 on rhel5. Downloaded cluster source as follows: 1.git clone git://sources.redhat.com/git/cluster.git 3. cd cluster 2. git checkout -b RHEL5 origin/RHEL5 ./configure --kernel_src=/usr/src/kernels/2.6.18-8.el5-i686/ Now running make command giving me error. gcc -L../../cman/lib -L../../dlm/lib -L//usr/lib/openais -o dlm_controld main.o member_cman.o group.o action.o deadlock.o ../lib/libgroup.a ../../ccs/lib/libccs.a -lcman -ldlm -lcpg -lSaCkpt /usr/bin/ld: cannot find -ldlm collect2: ld returned 1 exit status make[2]: *** [dlm_controld] Error 1 make[2]: Leaving directory `/usr/local/cluster/group/dlm_controld' make[1]: *** [all] Error 2 make[1]: Leaving directory `/usr/local/cluster/group' make: *** [all] Error 2 I did cd into dlm directory and: [root at localhost dlm]# ./configure --kernel_src=/usr/src/kernels/2.6.18-8.el5-i686/ Configuring Makefiles for your system... Completed Makefile configuration Now make command giving me error as follows: [root at localhost dlm]# make make -C lib all make[1]: Entering directory `/usr/local/cluster/dlm/lib' gcc -Wall -g -I. 
-O2 -D_REENTRANT -c -o libdlm.o libdlm.c libdlm.c: In function 'set_version_v5': libdlm.c:324: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:325: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:326: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'set_version_v6': libdlm.c:335: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:336: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:337: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'detect_kernel_version': libdlm.c:443: error: storage size of 'v' isn't known libdlm.c:446: error: invalid application of 'sizeof' to incomplete type 'struct dlm_device_version' libdlm.c:448: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:449: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:450: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:452: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:453: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:454: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:443: warning: unused variable 'v' libdlm.c: In function 'do_dlm_dispatch': libdlm.c:590: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'ls_lock_v6': libdlm.c:835: error: 'struct dlm_lock_params' has no member named 'xid' libdlm.c:837: error: 'struct dlm_lock_params' has no member named 'timeout' libdlm.c: In function 'ls_lock': libdlm.c:897: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'dlm_ls_lockx': libdlm.c:921: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'dlm_ls_unlock': libdlm.c:1073: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'dlm_ls_deadlock_cancel': libdlm.c:1105: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:1121: error: 'DLM_USER_DEADLOCK' undeclared (first use in this function) libdlm.c:1121: error: (Each undeclared identifier is reported only once libdlm.c:1121: error: for each function it appears in.) libdlm.c: In function 'dlm_ls_purge': libdlm.c:1140: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:1151: error: 'DLM_USER_PURGE' undeclared (first use in this function) libdlm.c:1152: error: 'union ' has no member named 'purge' libdlm.c:1153: error: 'union ' has no member named 'purge' libdlm.c: In function 'create_lockspace': libdlm.c:1317: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'release_lockspace': libdlm.c:1423: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'dlm_kernel_version': libdlm.c:1509: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:1510: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:1511: error: invalid use of undefined type 'struct dlm_device_version' make[1]: *** [libdlm.o] Error 1 make[1]: Leaving directory `/usr/local/cluster/dlm/lib' make: *** [all] Error 2 how to resolve this error? Thanks and Regards Anuj -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mpartio at gmail.com Mon May 19 07:26:07 2008 From: mpartio at gmail.com (Mikko Partio) Date: Mon, 19 May 2008 10:26:07 +0300 Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <81184.14267.qm@web50601.mail.re2.yahoo.com> References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> Message-ID: <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> On Thu, May 15, 2008 at 9:40 PM, Roger Pe?a wrote: > yum try to remove kmod-gfs because its depende of the kernel version that > its trying to remove, which is not right because you are trying to update a > kernel and it should means just install the package without remove any old > versions. > or do you change the default configuration of yum? I have only added an extra repo. When I did this upgrade and rebooted, the node could not see gfs-mounts any more (obviously, since the gfs-module was not there). Then I had to remove kmod-gfs -package with yum (lots of errors) and re-install it with yum again. After a reboot everything is working again. Regards MIkko -------------- next part -------------- An HTML attachment was scrubbed... URL: From kpodesta at redbrick.dcu.ie Mon May 19 11:25:39 2008 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Mon, 19 May 2008 12:25:39 +0100 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> Message-ID: <20080519112539.GF16481@minerva.redbrick.dcu.ie> On Thu, May 15, 2008 at 05:06:16PM -0400, Lon Hohberger wrote: > > cluquorumd[8049]: Dropping connect from (node3-IP): Unauthorized > > > > Seems like the other two nodes are rejecting the third's advances > > towards joining quorum. Is there anything I can do about this? > > Would appreciate any pointers, I couldn't find an answer in the archives > > (I notice the question was asked before too). The three nodes are listed > > as "Active" in clustat on all nodes, but the third obviously just can't > > join the quorum, despite reboot of the third node. > > The md5sum of /etc/cluster.xml is the same for all nodes, right? > > -- Lon Indeed the sums/files are the same for all nodes... However it turns out the issue was resolved by rebooting the existing two production nodes! I'm not sure if just restarting clumanager/cluquorumd on the existing nodes would have made the difference, but when I wasn't having any luck getting the third node into the quorum, we scheduled a reboot of the existing two nodes, then when all 3 nodes came back up they had all joined quorum successfully, and services could be listed/migrated on all of the nodes. Fixed. I know there are probably few people using RHEL 3 cluster anymore, but I found this useful to know; that I need to schedule downtime in future if requested to add a 3rd node to a live 2-node cluster... Thanks a lot for the help as per usual, the list is excellent reading! Karl -- Karl Podesta Systems Engineer, Securelinx Ltd., Ireland http://www.securelinx.com/ From fdinitto at redhat.com Mon May 19 11:26:44 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 19 May 2008 13:26:44 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.99.02 (development snapshot) released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 3rd release from the master branch: 2.99.02. 
The 2.99.XX releases are _NOT_ meant to be used for production environments.. yet. You have been warned: *this code will have no mercy* for your servers and your data. The master branch is the main development tree that receives all new features, code, clean up and a whole brand new set of bugs, At some point in time this code will become the 3.0 stable release. Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 2.99 releases and more important report problems. In order to build the 2.99.02 release you will need: - - openais 0.83 or higher - - linux kernel (git snapshot or 2.6.26-rc3) from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git (but can run on 2.6.25 in compatibility mode) NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We are still shipping shared libraries but remember that they can change anytime without warning. A bunch of new shared libraries have been added. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.02.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.99.01): Bob Peterson (1): Replace put_inode with drop_inode Fabio M. Di Nitto (11): [FENCE] Rename bladecenter as it should be .pl -> .py [DLM] Remove unused header file [BUILD] Add --without_kernel_modules configure option [BUILD] Free toplevel config/ dir [CONFIG] Create config/ subsystem [CONFIG] Add missing Makefiles [CCS] Make a bunch of functions static [BUILD] Stop using DEVEL.DATE library soname [GFS] Fix comment [INIT] Do not start services automatically [GFS] Sync with gfs2 init script Jonathan Brassow (1): rgmanager/lvm.sh: HA LVM wasn't working on IA64 Marek 'marx' Grac (3): [FENCE] Fix name of the option in fencing library [FENCE] Fix problem with different menu for admin/user for APC [FENCE] Fix typo in name of the exceptions in fencing agents Makefile | 23 +- ccs/Makefile | 2 +- ccs/ccs_test/Makefile | 44 -- ccs/ccs_test/ccs_test.c | 158 ------- ccs/libccscompat/libccscompat.c | 6 +- ccs/libccsconfdb/Makefile | 56 --- ccs/libccsconfdb/ccs.h | 27 -- ccs/libccsconfdb/libccs.c | 576 ------------------------- ccs/man/Makefile | 1 - ccs/man/ccs_test.8 | 138 ------ cman/init.d/cman.in | 6 +- cman/init.d/qdiskd | 6 +- config/Makefile | 17 + config/copyright.cf | 22 - config/libs/Makefile | 17 + config/libs/libccsconfdb/Makefile | 56 +++ config/libs/libccsconfdb/ccs.h | 27 ++ config/libs/libccsconfdb/libccs.c | 576 +++++++++++++++++++++++++ config/tools/Makefile | 17 + config/tools/ccs_test/Makefile | 44 ++ config/tools/ccs_test/ccs_test.c | 158 +++++++ config/tools/man/Makefile | 17 + config/tools/man/ccs_test.8 | 138 ++++++ configure | 32 +- dlm/include/list.h | 325 -------------- fence/agents/apc/fence_apc.py | 30 ++- fence/agents/bladecenter/fence_bladecenter.pl | 90 ---- fence/agents/bladecenter/fence_bladecenter.py | 90 ++++ fence/agents/drac/fence_drac5.py | 4 +- fence/agents/ilo/fence_ilo.py | 4 +- fence/agents/ipmilan/ipmilan.c | 2 +- fence/agents/lib/fencing.py.py | 2 +- fence/agents/scsi/scsi_reserve | 6 +- fence/agents/wti/fence_wti.py | 4 +- gfs-kernel/src/gfs/ops_super.c | 11 
+- gfs/init.d/gfs | 15 +- gfs2/init.d/gfs2 | 6 +- make/copyright.cf | 22 + make/defines.mk.input | 8 +- make/fencebuild.mk | 2 +- make/official_release_version | 1 + rgmanager/init.d/rgmanager.in | 6 +- rgmanager/src/resources/lvm.sh | 2 +- 43 files changed, 1286 insertions(+), 1508 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSDFkAQgUGcMLQ3qJAQJPeg/8C5BxkynDvsjfgSyjUHlzG/zZe5p4viXH NQtYZk/3nFRBXqvZCYS+gHkdMQvRmJzEHCknLryJZMrZaq5Nj5gn8RERrtFUZ81C 6DWGyqkiqERBsMffR0nkZ//gqkktPx2AaAMFQ5nLd8v6qHvY2SdTwjaV/7ucLiWz sTRC7samneKqj8Et6cgSId2a818xEI6LX9h4fXiwIO2DH7yHK/bvHYhatLYgPvQn 0VQ8XwKkvafUjPEBurkzgh9E4GVvOG35KTS8X/ib6whT0oJFRhkofJG2oCv1sULt lkbGLaUiBL0DW66Z/ypXmK8IBEtgXRjE0DmfoK9xGBJBlolobmLNZ4A/pdTaBeW1 s8Qq763/BeZ5Z6pEtHQzwMcHwQjhg0mGWtmthr9TGfJ/EhsoYnp7DHLKZ89ldItE dEHq94VTZ7QpsKPg7HBSahJEvHUzPM20GSyl7hSmx4Nuno2iftR/IUbCjVEKxPHa 0ePadvsndxuQsyVjRSseLRNHeAW0NvMY82rV9UzEX05Fi6ryT2308DzNi9018LWe baQ+slrg7oWJNcInOwkjNcYMxm6VGPwqTyvrlb/BTZUVhZdium7A3zswx/cPt+qG kV3cfkSNGIz/K9CqjdlE/pQFV6SqR7ILOmg4M717vMzdJcWBehD1QEtGtXxyNkSa xG/QjxC2mZw= =1s3u -----END PGP SIGNATURE----- From lhh at redhat.com Mon May 19 15:00:54 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 19 May 2008 11:00:54 -0400 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <20080519112539.GF16481@minerva.redbrick.dcu.ie> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> <20080519112539.GF16481@minerva.redbrick.dcu.ie> Message-ID: <1211209254.32213.65.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-05-19 at 12:25 +0100, Karl Podesta wrote: > However it turns out the issue was resolved by rebooting the existing > two production nodes! I'm not sure if just restarting clumanager/cluquorumd > on the existing nodes would have made the difference, but when I wasn't > having any luck getting the third node into the quorum, we scheduled a > reboot of the existing two nodes, then when all 3 nodes came back up they > had all joined quorum successfully, and services could be listed/migrated > on all of the nodes. Fixed. > > I know there are probably few people using RHEL 3 cluster anymore, but > I found this useful to know; that I need to schedule downtime in future > if requested to add a 3rd node to a live 2-node cluster... Well, it /should/ just work. Maybe there's something that was missed, like adding an entry explicitly to /etc/hosts. It drops the connection attempt if the message subsystem key doesn't match - which is why I asked about md5sum. Also, it's strange that cluqourumd would not work but clumembd did - they use the same code. Maybe the other daemons on the existing cluster nodes didn't reload correctly (service clumanager reload may have fixed it?). What release of clumanager was it ? 
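If it comes up again, a quick sanity check on each node before
scheduling a reboot would be something like:

  md5sum /etc/cluster.xml
  rpm -q clumanager
  service clumanager reload

Compare the md5sums and package versions across all three nodes; if
they match everywhere, a reload of the daemons (rather than a full
reboot) may be enough to let the new member into the quorum.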
-- Lon From kpodesta at redbrick.dcu.ie Mon May 19 15:30:38 2008 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Mon, 19 May 2008 16:30:38 +0100 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <1211209254.32213.65.camel@ayanami.boston.devel.redhat.com> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> <20080519112539.GF16481@minerva.redbrick.dcu.ie> <1211209254.32213.65.camel@ayanami.boston.devel.redhat.com> Message-ID: <20080519153038.GC28780@minerva.redbrick.dcu.ie> On Mon, May 19, 2008 at 11:00:54AM -0400, Lon Hohberger wrote: > Well, it /should/ just work. Maybe there's something that was missed, > like adding an entry explicitly to /etc/hosts. > > It drops the connection attempt if the message subsystem key doesn't > match - which is why I asked about md5sum. Also, it's strange that > cluqourumd would not work but clumembd did - they use the same code. > > Maybe the other daemons on the existing cluster nodes didn't reload > correctly (service clumanager reload may have fixed it?). > > What release of clumanager was it ? > > -- Lon Well I followed procedure as directly from the manual, and before adding the node I scp'd over /etc/hosts, /etc/passwd, /etc/groups etc., made relevant changes, and made sure disk mounts were accessible and service software could run OK. Then I added the node through the GUI Cluster Config tool on one of the existing nodes, saved, copied /etc/cluster.xml to the new member, and started clumanager. All nodes immediately listed the 3rd node in clustat, it's just that the 3rd node couldn't list services, and instead had the quorum error above. All nodes were RHAS3, the two existing ones had been built with Update 2, but were kept updated, the new node was built with Update 7 to recognise new hardware and was also updated via RHN prior to adding cluster services. It is possible that just restarting clumanager on those 2 nodes may have fixed it, but just in case this would affect running services we scheduled downtime, then just brought all the nodes down and back up again. Version of clumanager is 1.2.28-1, redhat-config-cluster is 1.0.8-1 I did think it odd alright, clumembd was definitely running... Thanks & regards, Karl -- Karl Podesta Systems Engineer, Securelinx Ltd., Ireland http://www.securelinx.com/ From wcyoung at buffalo.edu Mon May 19 20:18:51 2008 From: wcyoung at buffalo.edu (Wes Young) Date: Mon, 19 May 2008 16:18:51 -0400 Subject: [Linux-cluster] Cluster1, RHEL4 gfs_fsck Message-ID: I'm having a little trouble with an older installation of RHEL4, cluster/GFS. One of my cluster nodes crashed the other day, when I brought it back up I got a the error: GFS: Trying to join cluster "lock_dlm", "oss:mydisk" GFS: fsid=oss:mydisk.0: Joined cluster. Now mounting FS... GFS: fsid=oss:mydisk.0: jid=0: Trying to acquire journal lock... GFS: fsid=oss:mydisk.0: jid=0: Looking at journal... attempt to access beyond end of device sdb: rw=0, want=19149432840, limit=858673152 GFS: fsid=oss:mydisk.0: fatal: I/O error I tried to run the gfs_fsck and get a Segmentation fault. So, I upgraded the cluster software (latest RHEL4 tag), compile and get: # gfs_fsck -V GFS fsck DEVEL.1211222576 (built May 19 2008 15:05:16) Copyright (C) Red Hat, Inc. 2004-2005 All rights reserved. [root at sproc cluster]# gfs_fsck -vv /dev/sdb Initializing fsck Initializing lists... (bio.c:140) Writing to 65536 - 16 4096 Initializing special inodes... 
(file.c:45) readi: Offset (400) is >= the file size (400). (super.c:226) 5 journals found. Validating Resource Group index. Level 1 check. Segmentation fault Which is a little further (it didn't do the Level 1 check) than I got last time, but still bails on me. not being a GFS pro here, and a little gfs_tool list.. work, the volume seems to be there, just feels like the server crash damaged some important bits along the way. The data on this drive isn't that critical, just looking to see if i'm missing something dumb, or verification that the partition is hosed (or just not worth trying to really recover the 400 gigs of data at this point). If this should go to the devel list, please let me know. -- Wes Young Network Security Analyst CIT - University at Buffalo ----------------------------------------------- | my OpenID: | http://tinyurl.com/2zu2d3 | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2421 bytes Desc: not available URL: From lhh at redhat.com Mon May 19 20:28:00 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 19 May 2008 16:28:00 -0400 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <20080519153038.GC28780@minerva.redbrick.dcu.ie> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> <20080519112539.GF16481@minerva.redbrick.dcu.ie> <1211209254.32213.65.camel@ayanami.boston.devel.redhat.com> <20080519153038.GC28780@minerva.redbrick.dcu.ie> Message-ID: <1211228880.32213.77.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-05-19 at 16:30 +0100, Karl Podesta wrote: > Version of clumanager is 1.2.28-1, redhat-config-cluster is 1.0.8-1 I suspect you hit this: https://bugzilla.redhat.com/show_bug.cgi?id=172886 -- Lon From michael.osullivan at auckland.ac.nz Mon May 19 21:15:16 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Tue, 20 May 2008 09:15:16 +1200 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID Message-ID: <4831EDE4.5090600@auckland.ac.nz> Thanks for your response Wendy. Please see a diagram of the system at http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen for the fullscreen view) that (I hope) explains the setup. We are not using FC as we are building the SAN with commodity components (the total cost of the system was less than NZ $9000). The SAN is designed to hold files for staff and students in our department, I'm not sure exactly what applications will use the GFS. We are using iscsi-target software although we may upgrade to using firmware in the future. We have used CLVM on top of software RAID, I agree there are many levels to this system, but I couldn't find the necessary is hardware/software to implement this in a simpler way. I am hoping the list may be helpful here. What I wanted to do was the following: Build a SAN from commodity hardware that has no single point of failure and acts like a single file system. The ethernet fabric provide two paths from each server to each storage device (hence two NICs on all the boxes). Each device contains a single logical disk (striped here across two disks for better performance, there is along story behind why we have two disks in each box). 
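Roughly, the stacking on each server looks like this (the device
names, volume names and cluster name below are illustrative, not
copied from our actual config):

  # one multipath md device per storage box (one path per NIC/portal)
  mdadm --create /dev/md0 --level=multipath --raid-devices=2 /dev/sdb /dev/sdc
  mdadm --create /dev/md1 --level=multipath --raid-devices=2 /dev/sdd /dev/sde
  # software RAID-5 across the multipath devices
  mdadm --create /dev/md2 --level=5 --raid-devices=2 /dev/md0 /dev/md1
  # clustered LVM and GFS on top
  pvcreate /dev/md2
  vgcreate -c y vg_san /dev/md2
  lvcreate -l 100%FREE -n lv_gfs vg_san
  gfs_mkfs -p lock_dlm -t ndsg_cluster:gfs_data -j 2 /dev/vg_san/lv_gfs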
These devices (2+) are presented using iSCSI to 2 (or more) servers, but are put together in a RAID-5 configuration so a single failure of a device will not interrupt access to the data. I used iSCSI as we use ethernet for cost reasons. I used mdadm for multipath as I could not find another way to get the servers to see two iSCSI portals as a single device. I then used mdadm and raided the two iSCSI disks together to get the RAID-5 configuration I wanted. Finally I had to create a logical volume for the GFS system so that servers could properly access the network RAID array. I am more than happy to change this to make it more effective as long as: 1) It doesn't cost very much; 2) The no single point of failure property is maintained; 3) The servers see the SAN as a single entity (that way devices can be added and removed with a minimum of fuss). Thanks again for any help/advice/suggestions. I am very new to implementing storage networks, so any help is great. Regards, Mike From JACOB_LIBERMAN at Dell.com Mon May 19 22:05:26 2008 From: JACOB_LIBERMAN at Dell.com (JACOB_LIBERMAN at Dell.com) Date: Mon, 19 May 2008 17:05:26 -0500 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4831EDE4.5090600@auckland.ac.nz> References: <4831EDE4.5090600@auckland.ac.nz> Message-ID: <398B0D66E5559F4696716218E0A3C27665C81E@ausx3mps329.aus.amer.dell.com> Hi Mike, I took a peak at the diagram. Does the blue cylinder represent an Ethernet switch? You may want to add another switch if it's a full redundant mesh topology youre after. Thanks, Jacob > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > Michael O'Sullivan > Sent: Monday, May 19, 2008 4:15 PM > To: linux-cluster at redhat.com > Subject: Re: [Linux-cluster] GFS, iSCSI, multipaths and RAID > > Thanks for your response Wendy. Please see a diagram of the > system at http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or > http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen > for the fullscreen view) that (I hope) explains the setup. We > are not using FC as we are building the SAN with commodity > components (the total cost of the system was less than NZ > $9000). The SAN is designed to hold files for staff and > students in our department, I'm not sure exactly what > applications will use the GFS. We are using iscsi-target > software although we may upgrade to using firmware in the > future. We have used CLVM on top of software RAID, I agree > there are many levels to this system, but I couldn't find the > necessary is hardware/software to implement this in a simpler > way. I am hoping the list may be helpful here. > > What I wanted to do was the following: > > Build a SAN from commodity hardware that has no single point > of failure and acts like a single file system. The ethernet > fabric provide two paths from each server to each storage > device (hence two NICs on all the boxes). Each device > contains a single logical disk (striped here across two disks > for better performance, there is along story behind why we > have two disks in each box). These devices (2+) are presented > using iSCSI to 2 (or more) servers, but are put together in a > RAID-5 configuration so a single failure of a device will not > interrupt access to the data. > > I used iSCSI as we use ethernet for cost reasons. I used > mdadm for multipath as I could not find another way to get > the servers to see two iSCSI portals as a single device. 
I > then used mdadm and raided the two iSCSI disks together to > get the RAID-5 configuration I wanted. Finally I had to > create a logical volume for the GFS system so that servers > could properly access the network RAID array. I am more than > happy to change this to make it more effective as long as: > > 1) It doesn't cost very much; > 2) The no single point of failure property is maintained; > 3) The servers see the SAN as a single entity (that way > devices can be added and removed with a minimum of fuss). > > Thanks again for any help/advice/suggestions. I am very new > to implementing storage networks, so any help is great. > > Regards, Mike > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ross at kallisti.us Mon May 19 23:03:47 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Mon, 19 May 2008 19:03:47 -0400 Subject: [Linux-cluster] New fencing method Message-ID: <20080519230347.GA30667@kallisti.us> Hello everyone, I wrote a new fencing method script that fences by remotely shutting down a switchport. The idea is to fabric fence an iSCSI client by shutting down the port used for iSCSI connectivity. This should work on any Ethernet switch that implements IF-MIB - that's more or less any managed Ethernet switch. It works by setting IF-MIB::ifAdminStatus.ifIndex to down(1) - ie, disable the switchport that the node is plugged into. However, I'm having trouble finding how to integrate my script into the fence_node system. Is there a config file somewhere, or will I need to build a custom version of fence_node? I've attached my script. It uses python and pysnmp v2. Feel free to use it, integrate it, forget about it, etc Thanks in advance for any help! -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 -------------- next part -------------- A non-text attachment was scrubbed... Name: fence_snmp.py Type: text/x-python Size: 4880 bytes Desc: not available URL: From d.skorupa at wasko.pl Tue May 20 06:41:41 2008 From: d.skorupa at wasko.pl (Darek Skorupa) Date: Tue, 20 May 2008 08:41:41 +0200 Subject: [Linux-cluster] New fencing method In-Reply-To: <20080519230347.GA30667@kallisti.us> References: <20080519230347.GA30667@kallisti.us> Message-ID: <483272A5.4060304@wasko.pl> > However, I'm having trouble finding how to integrate my script into > the fence_node system. Is there a config file somewhere, or will I > need to build a custom version of fence_node? > > I think, you should copy fence_snmp script to /sbin folder and if script will exit with '0' status fencing is successful in otherwise is unsuccessful. Am I understand it in good way ?? Darek From jergendutch at gmail.com Tue May 20 12:11:45 2008 From: jergendutch at gmail.com (Jergen Dutch) Date: Tue, 20 May 2008 14:11:45 +0200 Subject: [Linux-cluster] any tricks for per-directory gfs quotas Message-ID: <9db683200805200511k201059bat43098d6fc2d58dbe@mail.gmail.com> Hi, Is there any trick that gives the effect of per-directory quotas without requiring a given user to own the files or be writing to the files? 
Thanks JD From adas at redhat.com Tue May 20 13:22:18 2008 From: adas at redhat.com (Abhijith Das) Date: Tue, 20 May 2008 08:22:18 -0500 Subject: [Linux-cluster] any tricks for per-directory gfs quotas In-Reply-To: <9db683200805200511k201059bat43098d6fc2d58dbe@mail.gmail.com> References: <9db683200805200511k201059bat43098d6fc2d58dbe@mail.gmail.com> Message-ID: <4832D08A.4020302@redhat.com> Jergen Dutch wrote: > Hi, > > Is there any trick that gives the effect of per-directory quotas > without requiring a given user to own the files or be writing to the > files? > > Thanks > JD > You can assign GFS quotas for UIDs or GIDs. In your case, user quotas are clearly out because you can't have users own files. I haven't completely thought through this, but there might be a clumsy way of using group quotas to accomplish what you want... not entirely sure you'd want to do such a thing though :-). Not to mention, this would probably work only with a small number of directories and users and it's going to be difficult to automate it and make it work seamlessly. - 1 group per directory you want to monitor for quota. i.e for each quota-monitored directory 'foo' have a group 'foo-grp' and setup GFS quotas for these groups. - For each such directory 'foo', do 'chgrp foo-grp foo' and 'chmod g+s foo'. (all files and directories in 'foo' created subsequent to this operation will have group 'foo-grp'. You can change the GIDs on the previously existing files of the directory by hand) Oh, and quotas for nested directories probably won't work, not accurately at least. :-) Hey, you asked for tricks :-) and this is what I could come up with. Maybe somebody else would be able come up with a much better idea or throw mine out of the window. Cheers, --Abhi From ross at kallisti.us Tue May 20 14:59:21 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Tue, 20 May 2008 10:59:21 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <483272A5.4060304@wasko.pl> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> Message-ID: <20080520145921.GA5250@kallisti.us> On Tue, May 20, 2008 at 08:41:41AM +0200, Darek Skorupa wrote: > >However, I'm having trouble finding how to integrate my script into > >the fence_node system. Is there a config file somewhere, or will I > >need to build a custom version of fence_node? > > > > > I think, you should copy fence_snmp script to /sbin folder and if script > will exit with '0' status fencing is successful in otherwise is > unsuccessful. > > Am I understand it in good way ?? Yep - I was wondering if there was additional ways to teach the cluster utilities how to setup the parameters. Then I assume I would write something like this in cluster.conf: But how does fence_node know to call these like: fence_snmp -o down -v 2c -a 10.0.0.1 -c sw1comm -i 10020 fence_snmp -o down -v 1 -a 10.0.0.2 -c sw2comm -i 10021 -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. 
Augustine, De Genesi ad Litteram, Book II, xviii, 37 From Alain.Moulle at bull.net Tue May 20 15:22:30 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Tue, 20 May 2008 17:22:30 +0200 Subject: [Linux-cluster] CS5 / heart beat tuning Message-ID: <4832ECB6.40400@bull.net> Hi Lon Something bothers me about the CS5 defaut heart-beat timeout : you wrote that it was now default 5s instead of 21s with CS4. So : what is the new default period for HELLO messages ? because it was also 5s with CS4 ... And a strange thing : I have already tested several times the failover with CS5 without any totem record in cluster.conf (just a remaining deadnode_timer="21" in cman record) and when I stopped one node, the other proceed to fence after 21s ... not 5s .. ???? Thanks Regards Alain Moull? From rpeterso at redhat.com Tue May 20 16:40:14 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 20 May 2008 11:40:14 -0500 Subject: [Linux-cluster] Cluster1, RHEL4 gfs_fsck In-Reply-To: References: Message-ID: <1211301614.10437.24.camel@technetium.msp.redhat.com> On Mon, 2008-05-19 at 16:18 -0400, Wes Young wrote: > I'm having a little trouble with an older installation of RHEL4, > cluster/GFS. > > One of my cluster nodes crashed the other day, when I brought it back > up I got a the error: > > GFS: Trying to join cluster "lock_dlm", "oss:mydisk" > GFS: fsid=oss:mydisk.0: Joined cluster. Now mounting FS... > GFS: fsid=oss:mydisk.0: jid=0: Trying to acquire journal lock... > GFS: fsid=oss:mydisk.0: jid=0: Looking at journal... > attempt to access beyond end of device > sdb: rw=0, want=19149432840, limit=858673152 > GFS: fsid=oss:mydisk.0: fatal: I/O error Hi Wes, Sorry for the long post, but this needs some explanation. >From your email, it sounds like you have corruption in your resource group index file (rindex). You might be the victim of this bug: https://bugzilla.redhat.com/show_bug.cgi?id=436383 If so, there's a fix to gfs_fsck to repair the damage. This is associated with this bug record: https://bugzilla.redhat.com/show_bug.cgi?id=440896 While working on that bug, I discovered some kinds of corruption that confuse the gfs_fsck's rindex repair code. That's described in bug: https://bugzilla.redhat.com/show_bug.cgi?id=442271 I don't think any of these fixes are generally available yet, except in patch form; I think they're scheduled for 4.7. The last one, 442271, is only written against RHEL5 at the moment, so I don't have plans to fix it in RHEL4 yet. So here's what I recommend: First, determine for sure if this is the problem by doing something like this: mount the file system gfs_tool rindex /mnt/gfs | grep "4294967292" (there /mnt/gfs is your mount point) umount the file system If it comes back with "ri_data = 429496729" then that IS the problem, in which case you need to acquire the fixes to the first two bugs listed. You can do this a number of ways: (1) wait until 4.7 comes out, (2) get the patches from the bugzilla and build them from the source tree, (3) grab the RHEL4 branch from the cluster git tree and build from there, because it should include those two fixes. IIRC, I think that the fix to gfs_grow (the original cause of this corruption) has been released as a z-stream fix for 4.6 too, but I don't think we did that for gfs_fsck. If it comes back with no output, then there's a different kind of corruption in your rindex. You could try to build a RHEL4 version of the patch from bug 442271 and see if it fixes your corruption. 
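Collected in one place, the check described above is just the following (a sketch only; /dev/sdb and /mnt/gfs stand in for the real GFS device and mount point):

    mount -t gfs /dev/sdb /mnt/gfs
    gfs_tool rindex /mnt/gfs | grep "4294967292"
    umount /mnt/gfs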
So this at your own risk; we cannot be responsible for your data. I recommend making a full backup before trying anything. Depending on the size of the file system and your amount of free storage, you could dd the entire GFS device to a file you can restore. You could also save off your file system metadata and put it on an ftp server or web server so I can grab it then I'll use it "in the name of 442271" to figure out if the most recent patch in the bz will fix the corruption and if not, I will adjust the 442271 patch so it does. The problem with that is: there is no code in RHEL4 to do this either. I built a RHEL4 version of a tool (gfs2_edit) that can save off your metadata, but I may need to bring it up to date with recent changes first. Either way, this might take some time to resolve. Regards, Bob Peterson Red Hat Clustering & GFS From wcyoung at buffalo.edu Tue May 20 17:18:11 2008 From: wcyoung at buffalo.edu (Wes Young) Date: Tue, 20 May 2008 13:18:11 -0400 Subject: [Linux-cluster] Cluster1, RHEL4 gfs_fsck In-Reply-To: <1211301614.10437.24.camel@technetium.msp.redhat.com> References: <1211301614.10437.24.camel@technetium.msp.redhat.com> Message-ID: On May 20, 2008, at 12:40 PM, Bob Peterson wrote: > On Mon, 2008-05-19 at 16:18 -0400, Wes Young wrote: >> I'm having a little trouble with an older installation of RHEL4, >> cluster/GFS. >> >> One of my cluster nodes crashed the other day, when I brought it back >> up I got a the error: >> >> GFS: Trying to join cluster "lock_dlm", "oss:mydisk" >> GFS: fsid=oss:mydisk.0: Joined cluster. Now mounting FS... >> GFS: fsid=oss:mydisk.0: jid=0: Trying to acquire journal lock... >> GFS: fsid=oss:mydisk.0: jid=0: Looking at journal... >> attempt to access beyond end of device >> sdb: rw=0, want=19149432840, limit=858673152 >> GFS: fsid=oss:mydisk.0: fatal: I/O error > > Hi Wes, > > Sorry for the long post, but this needs some explanation. > >> From your email, it sounds like you have corruption in your > resource group index file (rindex). You might be the victim > of this bug: > https://bugzilla.redhat.com/show_bug.cgi?id=436383 > > If so, there's a fix to gfs_fsck to repair the damage. This is > associated with this bug record: > https://bugzilla.redhat.com/show_bug.cgi?id=440896 > > While working on that bug, I discovered some kinds of > corruption that confuse the gfs_fsck's rindex repair code. > That's described in bug: > https://bugzilla.redhat.com/show_bug.cgi?id=442271 > > I don't think any of these fixes are generally available > yet, except in patch form; I think they're scheduled for > 4.7. The last one, 442271, is only written against RHEL5 > at the moment, so I don't have plans to fix it in RHEL4 yet. > > So here's what I recommend: > > First, determine for sure if this is the problem by doing > something like this: > > mount the file system > gfs_tool rindex /mnt/gfs | grep "4294967292" > (there /mnt/gfs is your mount point) > umount the file system That's the problem though, it won't actually let me mount the "disk" because of this problem. Sounds like my best option is to try and patch the gfs_fsck code in RHE4 and see if it still seg-faults on me... If that doesn't work, i'm guessing a move to RHEL5 would be the next step, but given the actual value of the data, probably not worth it at this point. Thanks for the info. I'll let you know how it goes. 
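For the record, grabbing the RHEL4 branch that Bob mentions would look roughly like this (a sketch; the clone URL is an assumption based on the sources.redhat.com gitweb links elsewhere in this thread):

    # assumed URL -- the gitweb links on this list point at cluster.git on sources.redhat.com
    git clone git://sources.redhat.com/git/cluster.git
    cd cluster
    git checkout -b rhel4 origin/RHEL4   # the RHEL4 branch Bob refers to
    # gfs_fsck lives under the gfs/ subdirectory of this tree; build it
    # following the tree's own configure/README instructions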
-- Wes Young Network Security Analyst CIT - University at Buffalo ----------------------------------------------- | my OpenID: | http://tinyurl.com/2zu2d3 | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2421 bytes Desc: not available URL: From rpeterso at redhat.com Tue May 20 17:23:42 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 20 May 2008 12:23:42 -0500 Subject: [Linux-cluster] Cluster1, RHEL4 gfs_fsck In-Reply-To: References: <1211301614.10437.24.camel@technetium.msp.redhat.com> Message-ID: <1211304222.10437.27.camel@technetium.msp.redhat.com> On Tue, 2008-05-20 at 13:18 -0400, Wes Young wrote: > That's the problem though, it won't actually let me mount the "disk" > because of this problem. Hm. That must be some crazy corruption to not even allow a mount. > Sounds like my best option is to try and patch the gfs_fsck code in > RHE4 and see if it still seg-faults on me... I'd try the second bug's patch first. If that doesn't work, try the third bug's patch. > If that doesn't work, i'm guessing a move to RHEL5 would be the next > step, but given the actual value of the data, probably not worth it at > this point. > > Thanks for the info. I'll let you know how it goes. I wouldn't mind getting a look at the corruption, so let me know if you want to go that route. Saving the metadata does not save any user data, so your confidentiality is protected. I'll see what I can do to get that tool functional again. Regards, Bob Peterson Red Hat Clustering & GFS From barbos at gmail.com Tue May 20 18:19:17 2008 From: barbos at gmail.com (Alex Kompel) Date: Tue, 20 May 2008 11:19:17 -0700 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4831EDE4.5090600@auckland.ac.nz> References: <4831EDE4.5090600@auckland.ac.nz> Message-ID: <3ae027040805201119m3598c736gf9ca60470e7584ee@mail.gmail.com> On Mon, May 19, 2008 at 2:15 PM, Michael O'Sullivan wrote: > Thanks for your response Wendy. Please see a diagram of the system at > http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or > http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen for the > fullscreen view) that (I hope) explains the setup. We are not using FC as we > are building the SAN with commodity components (the total cost of the system > was less than NZ $9000). The SAN is designed to hold files for staff and > students in our department, I'm not sure exactly what applications will use > the GFS. We are using iscsi-target software although we may upgrade to using > firmware in the future. We have used CLVM on top of software RAID, I agree > there are many levels to this system, but I couldn't find the necessary is > hardware/software to implement this in a simpler way. I am hoping the list > may be helpful here. > So what do you want to get out of this configuration? iSCSI SAN, GFS cluster or both? I don't see any reason for 2 additional servers running GFS on top of iSCSI SAN. If you need iSCSI SAN with iscsi-target then there are number of articles on how to set it up. For example: http://www.pcpro.co.uk/realworld/82284/san-on-the-cheap/page1.html Or just google for iscsi-target drdb and heartbeat. If you need GFS then you can run it on the storage servers (there is no need for iSCSI in between). 
If you need both then it can get tricky but you can try splitting your raid arrays in a way that half is used by GFS cluster and half is for DRDB volumes with iSCSI luns on top and RedHat Cluster acting as a heartbeat for failover (provided you can also do regular failover with GFS running on the same cluster - I have never tried it before). -Alex From michael.osullivan at auckland.ac.nz Tue May 20 19:25:47 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Wed, 21 May 2008 07:25:47 +1200 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <20080520160011.78F95619943@hormel.redhat.com> References: <20080520160011.78F95619943@hormel.redhat.com> Message-ID: <483325BB.9080909@auckland.ac.nz> Thanks Jacob, Originally we designed a full core-edge configuration with ethernet switches, but our design algorithms (initially for Fiber Channel) did not account for the tree structure of ethernet when using unmanaged switches (this is relatively straightforward to incorporate, but we had already purchased the network...!). However, we do have two unmanaged switches connecting the servers to the storage devices so there are two paths between all the boxes. Thanks, Mike From lhh at redhat.com Tue May 20 19:35:18 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 20 May 2008 15:35:18 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <483272A5.4060304@wasko.pl> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> Message-ID: <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-20 at 08:41 +0200, Darek Skorupa wrote: > > However, I'm having trouble finding how to integrate my script into > > the fence_node system. Is there a config file somewhere, or will I > > need to build a custom version of fence_node? > > > > > I think, you should copy fence_snmp script to /sbin folder and if script > will exit with '0' status fencing is successful in otherwise is > unsuccessful. > That's step one. The agent as noted doesn't appear to take arguments from STDIN. Try looking here for more information: http://sources.redhat.com/cluster/wiki/FenceAgentAPI -- Lon From jparsons at redhat.com Tue May 20 19:52:25 2008 From: jparsons at redhat.com (James Parsons) Date: Tue, 20 May 2008 15:52:25 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> Message-ID: <48332BF9.1000703@redhat.com> Lon Hohberger wrote: >On Tue, 2008-05-20 at 08:41 +0200, Darek Skorupa wrote: > > >>>However, I'm having trouble finding how to integrate my script into >>>the fence_node system. Is there a config file somewhere, or will I >>>need to build a custom version of fence_node? >>> >>> >>> >>> >>I think, you should copy fence_snmp script to /sbin folder and if script >>will exit with '0' status fencing is successful in otherwise is >>unsuccessful. >> >> >> > >That's step one. > >The agent as noted doesn't appear to take arguments from STDIN. Try >looking here for more information: > > http://sources.redhat.com/cluster/wiki/FenceAgentAPI > > I think a good pattern to use for an agent is fence_rsa, if you know python. It is a nice vanilla agent. 
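For anyone writing a new agent against that page: the convention boils down to reading name=value pairs from stdin and exiting 0 on success. A minimal sketch in Python follows (this is not Ross's script; the parameter names ipaddr/community/port/snmp_version are illustrative assumptions, and it shells out to Net-SNMP's snmpset rather than using pysnmp -- note that IF-MIB defines ifAdminStatus as up(1)/down(2), so "off" means writing 2):

    #!/usr/bin/python
    # Minimal stdin-driven fabric-fence sketch in the FenceAgentAPI style.
    import subprocess
    import sys

    def read_stdin_args():
        # fenced feeds the agent "name=value" lines on stdin; '#' lines are comments
        args = {}
        for line in sys.stdin:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, value = line.split("=", 1)
            args[name.strip()] = value.strip()
        return args

    def set_port(args, updown):
        # IF-MIB::ifAdminStatus: up(1), down(2)
        value = {"on": "1", "off": "2"}[updown]
        cmd = ["snmpset", "-v", args.get("snmp_version", "2c"),
               "-c", args["community"], args["ipaddr"],
               "IF-MIB::ifAdminStatus.%s" % args["port"], "i", value]
        return subprocess.call(cmd)

    if __name__ == "__main__":
        opts = read_stdin_args()
        action = opts.get("option", "off")   # fenced passes the action as "option"
        if action in ("off", "reboot"):      # a fabric fence leaves the port down
            updown = "off"
        else:
            updown = "on"
        if set_port(opts, updown) == 0:
            sys.exit(0)
        sys.exit(1)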
-Jim From lhh at redhat.com Tue May 20 20:31:18 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 20 May 2008 16:31:18 -0400 Subject: [Linux-cluster] CS5 / heart beat tuning In-Reply-To: <4832ECB6.40400@bull.net> References: <4832ECB6.40400@bull.net> Message-ID: <1211315478.771.151.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-20 at 17:22 +0200, Alain Moulle wrote: > Hi Lon > > Something bothers me about the CS5 defaut heart-beat timeout : > you wrote that it was now default 5s instead of 21s with CS4. > So : what is the new default period for HELLO messages ? because > it was also 5s with CS4 ... There are no hello messages in rhel5; check 'man openais.conf' and look at the sections dealing with 'totem'. On RHEL5 with CMAN, the default totem 'token' timeout is 10000 (I thought it was 5000, but looking at the code proved differently on lines 487-492 of cman/daemon/ais.c). I still don't understand why it your configuration would be behaving as if totem's token timeout was increased to 21,000 though... > And a strange thing : I have already tested several times > the failover with CS5 without any totem record in cluster.conf > (just a remaining deadnode_timer="21" in cman record) and when > I stopped one node, the other proceed to fence after 21s ... not 5s .. That's strange. Deadnode_timer is ignored in the RHEL5 branch. -- Lon From ross at kallisti.us Tue May 20 22:13:48 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Tue, 20 May 2008 18:13:48 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> Message-ID: <20080520221348.GD6881@kallisti.us> On Tue, May 20, 2008 at 03:35:18PM -0400, Lon Hohberger wrote: > > On Tue, 2008-05-20 at 08:41 +0200, Darek Skorupa wrote: > > > However, I'm having trouble finding how to integrate my script into > > > the fence_node system. Is there a config file somewhere, or will I > > > need to build a custom version of fence_node? > > > > > > > > I think, you should copy fence_snmp script to /sbin folder and if script > > will exit with '0' status fencing is successful in otherwise is > > unsuccessful. > > > > That's step one. > > The agent as noted doesn't appear to take arguments from STDIN. Try > looking here for more information: > > http://sources.redhat.com/cluster/wiki/FenceAgentAPI Awesome - thanks for the pointer. That makes so much more sense and explains how fence_tool can introspect the options passed to the particular fencer. I've attached an updated version that follows the specifications at the above link. I've got a two node cluster running configured with it, though I haven't done any substantial testing yet. Thanks for the help - feel free to use/distribute/forget about it :) -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 -------------- next part -------------- A non-text attachment was scrubbed... Name: fence_snmp.py Type: text/x-python Size: 6259 bytes Desc: not available URL: From fdinitto at redhat.com Wed May 21 09:35:44 2008 From: fdinitto at redhat.com (Fabio M. 
Di Nitto) Date: Wed, 21 May 2008 11:35:44 +0200 (CEST) Subject: [Linux-cluster] New fencing method In-Reply-To: <20080520221348.GD6881@kallisti.us> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> Message-ID: Hi Ross, On Tue, 20 May 2008, Ross Vandegrift wrote: > I've attached an updated version that follows the specifications at > the above link. I've got a two node cluster running configured with > it, though I haven't done any substantial testing yet. > > Thanks for the help - feel free to use/distribute/forget about it :) As soon as you feel the code is ready i will be very glad to include it in the release. Please make sure to choose an appropriate licence like GPL2 so that we can easily redistribute and add a copyright entry in the file since it's all your work and you also deserve credits for it. Fabio -- I'm going to make him an offer he can't refuse. From lhh at redhat.com Wed May 21 17:24:03 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 21 May 2008 13:24:03 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> Message-ID: <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-05-21 at 11:35 +0200, Fabio M. Di Nitto wrote: > Hi Ross, > > On Tue, 20 May 2008, Ross Vandegrift wrote: > > > I've attached an updated version that follows the specifications at > > the above link. I've got a two node cluster running configured with > > it, though I haven't done any substantial testing yet. > > > > Thanks for the help - feel free to use/distribute/forget about it :) > > As soon as you feel the code is ready i will be very glad to include it in > the release. Please make sure to choose an appropriate licence like GPL2 > so that we can easily redistribute and add a copyright entry in the file > since it's all your work and you also deserve credits for it. I'd recommend calling it something besides fence_snmp in the tree - because other agents also use SNMP. For example: fence_ethernet ? -- Lon From jparsons at redhat.com Wed May 21 17:47:20 2008 From: jparsons at redhat.com (James Parsons) Date: Wed, 21 May 2008 13:47:20 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> Message-ID: <48346028.6030804@redhat.com> Lon Hohberger wrote: >On Wed, 2008-05-21 at 11:35 +0200, Fabio M. Di Nitto wrote: > > >>Hi Ross, >> >>On Tue, 20 May 2008, Ross Vandegrift wrote: >> >> >> >>>I've attached an updated version that follows the specifications at >>>the above link. I've got a two node cluster running configured with >>>it, though I haven't done any substantial testing yet. >>> >>>Thanks for the help - feel free to use/distribute/forget about it :) >>> >>> >>As soon as you feel the code is ready i will be very glad to include it in >>the release. Please make sure to choose an appropriate licence like GPL2 >>so that we can easily redistribute and add a copyright entry in the file >>since it's all your work and you also deserve credits for it. 
>> >> > >I'd recommend calling it something besides fence_snmp in the tree - >because other agents also use SNMP. For example: > > fence_ethernet ? > >-- Lon > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > Could you include some doc on how to use it, please? You can use one of the existing agent man pages as a template. Thanks, -J From ross at kallisti.us Wed May 21 23:35:26 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Wed, 21 May 2008 19:35:26 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <48346028.6030804@redhat.com> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> <48346028.6030804@redhat.com> Message-ID: <20080521233526.GA21955@kallisti.us> On Wed, May 21, 2008 at 01:47:20PM -0400, James Parsons wrote: > Lon Hohberger wrote: > >I'd recommend calling it something besides fence_snmp in the tree - > >because other agents also use SNMP. For example: > > > > fence_ethernet ? > > > Could you include some doc on how to use it, please? You can use one of > the existing agent man pages as a template. Done and done. I settled on fence_ifmib, since there's nothing specific to ethernet about IF-MIB, and it could apply to many different technologies. Diff against today's git is attached. -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 -------------- next part -------------- A non-text attachment was scrubbed... Name: rhcs-fence-ifmib.diff Type: text/x-diff Size: 11340 bytes Desc: not available URL: From michael.osullivan at auckland.ac.nz Thu May 22 02:12:27 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Thu, 22 May 2008 14:12:27 +1200 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <20080521160013.8C3D6619BED@hormel.redhat.com> References: <20080521160013.8C3D6619BED@hormel.redhat.com> Message-ID: <4834D68B.9010309@auckland.ac.nz> Hi Alex, We wanted an iSCSI SAN that has highly available data, hence the need for 2 (or more storage devices) and a reliable storage network (omitted from the diagram). Many of the articles I have read for iSCSI don't address multipathing to the iSCSI devices, in our configuration iSCSI Disk 1 presented as /dev/sdc and /dev/sdd on each server (and iSCSI Disk 2 presented as /dev/sde and /dev/sdf), but it wan't clear how to let the servers know that the two iSCSI portals attached to the same target - thus I used mdadm. Also, I wanted to raid the iSCSI disks to make sure the data stays highly available - thus the second use of mdadm. Now we had a single iSCSI raid array spread over 2 (or more) devices which provides the iSCSI SAN. However, I wanted to make sure the servers did not try to access the same data simultaneously, so I used GFS to ensure correct use of the iSCSI SAN. If I understand correctly it seems like the multipathing and raiding may be possible in Red Hat Cluster Suite GFS without using iSCSI? Or to use iSCSI with some other software to ensure proper locking happens for the iSCSI raid array? 
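Spelled out, the layering Mike describes comes down to something like this on each server (a sketch only -- device, volume and cluster names are invented, and the exact options would need tuning for a real deployment):

    # one multipath md device per iSCSI target (each target shows up twice,
    # once per portal, e.g. as sdc+sdd and sde+sdf)
    mdadm --create /dev/md0 --level=multipath --raid-devices=2 /dev/sdc /dev/sdd
    mdadm --create /dev/md1 --level=multipath --raid-devices=2 /dev/sde /dev/sdf

    # RAID-5 across the multipathed iSCSI disks (with only two members this
    # degenerates to mirroring; three or more devices give real RAID-5)
    mdadm --create /dev/md2 --level=5 --raid-devices=2 /dev/md0 /dev/md1

    # clustered LVM and GFS on top (clvmd running on both servers; note that
    # md itself is not cluster-aware, so assembling the same arrays on more
    # than one server at once needs care)
    pvcreate /dev/md2
    vgcreate -cy sanvg /dev/md2
    lvcreate -l 100%FREE -n gfslv sanvg
    gfs_mkfs -p lock_dlm -t sancluster:gfs01 -j 2 /dev/sanvg/gfslv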
I am reading the link you suggested to see what other people have done, but as always any suggestions, etc are more than welcome. Thanks, Mike From s.wendy.cheng at gmail.com Thu May 22 04:20:01 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 21 May 2008 23:20:01 -0500 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <3ae027040805201119m3598c736gf9ca60470e7584ee@mail.gmail.com> References: <4831EDE4.5090600@auckland.ac.nz> <3ae027040805201119m3598c736gf9ca60470e7584ee@mail.gmail.com> Message-ID: <4834F471.8030107@gmail.com> Alex Kompel wrote: > On Mon, May 19, 2008 at 2:15 PM, Michael O'Sullivan > wrote: > >> Thanks for your response Wendy. Please see a diagram of the system at >> http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or >> http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen for the >> fullscreen view) that (I hope) explains the setup. We are not using FC as we >> are building the SAN with commodity components (the total cost of the system >> was less than NZ $9000). The SAN is designed to hold files for staff and >> students in our department, I'm not sure exactly what applications will use >> the GFS. We are using iscsi-target software although we may upgrade to using >> firmware in the future. We have used CLVM on top of software RAID, I agree >> there are many levels to this system, but I couldn't find the necessary is >> hardware/software to implement this in a simpler way. I am hoping the list >> may be helpful here. >> >> > > So what do you want to get out of this configuration? iSCSI SAN, GFS > cluster or both? I don't see any reason for 2 additional servers > running GFS on top of iSCSI SAN. > There are advantages (for 2 additional storage servers) because serving data traffic over IP network has its own overhead(s). They offload CPU as well as memory consumption(s) away from GFS nodes. If done right, the setup could emulate high end SAN box using commodity hardware to provide low cost solutions. The issue here is how to find the right set of software subcomponents to build this configuration. I personally never use Linux iscsi target or multi-path md devices - so can't comment on their features and/or performance characteristics. I was hoping folks well versed in these Linux modules (software raid, dm multi-path, clvm raid level etc) could provide their comments. Check out linux-lvm and/or dm-devel mailing lists .. you may be able to find good links and/or ideas there, or even start to generate interesting discussions from scratch. So, if this configuration will be used as a research project, I'm certainly interested to read the final report. Let us know what works and which one sucks. If it is for a production system to store critical data, better to do more searches to see what are available in the market (to replace the components grouped inside the "iscsi-raid" box in your diagram - it is too complicated to isolate issues if problems popped up). There should be plenty of them out there (e.g. Netapp has offered iscsi SAN boxes with additional feature set such as failover, data de-duplication, backup, performance monitoring, etc). At the same time, it would be nice to have support group to call if things go wrong. From GFS side, I learned from previous GFS-GNBD experiences that serving data from IP networks have its overhead and it is not as cheap as people would expect. The issue is further complicated by the newer Red Hat cluster infra-structure that also places non-trivial amount of workloads on the TCP/IP stacks. 
So separating these IP traffic(s) (cluster HA, data, and/or GFS node access by applications) should be a priority to make the whole setup works. -- Wendy From s.wendy.cheng at gmail.com Thu May 22 04:34:47 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 21 May 2008 23:34:47 -0500 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4834D68B.9010309@auckland.ac.nz> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> Message-ID: <4834F7E7.7090903@gmail.com> Michael O'Sullivan wrote: > Hi Alex, > > We wanted an iSCSI SAN that has highly available data, hence the need > for 2 (or more storage devices) and a reliable storage network > (omitted from the diagram). Many of the articles I have read for iSCSI > don't address multipathing to the iSCSI devices, in our configuration > iSCSI Disk 1 presented as /dev/sdc and /dev/sdd on each server (and > iSCSI Disk 2 presented as /dev/sde and /dev/sdf), but it wan't clear > how to let the servers know that the two iSCSI portals attached to the > same target - thus I used mdadm. Also, I wanted to raid the iSCSI > disks to make sure the data stays highly available - thus the second > use of mdadm. Now we had a single iSCSI raid array spread over 2 (or > more) devices which provides the iSCSI SAN. However, I wanted to make > sure the servers did not try to access the same data simultaneously, > so I used GFS to ensure correct use of the iSCSI SAN. If I understand > correctly it seems like the multipathing and raiding may be possible > in Red Hat Cluster Suite GFS without using iSCSI? Or to use iSCSI with > some other software to ensure proper locking happens for the iSCSI > raid array? I am reading the link you suggested to see what other > people have done, but as always any suggestions, etc are more than > welcome. > Check out dm-multipath (*not* md-multi-path) to see whether you can make use of it: http://www.redhat.com/docs/manuals/csgfs/browse/4.6/DM_Multipath/MPIO_description.html -- Wendy From Alain.Moulle at bull.net Thu May 22 07:36:01 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 22 May 2008 09:36:01 +0200 Subject: [Linux-cluster] CS5 / heart beat tuning Message-ID: <48352261.9070102@bull.net> Hi Lon OK it seems I miss some big evolutions with CS5 versus CS4 ... Where can I find a short documentation (or all documentation) to understand all evolutions of CS5 , like openais , etc. ? Thanks Regards Alain Moull? On Tue, 2008-05-20 at 17:22 +0200, Alain Moulle wrote: >> Hi Lon >> >> Something bothers me about the CS5 defaut heart-beat timeout : >> you wrote that it was now default 5s instead of 21s with CS4. >> So : what is the new default period for HELLO messages ? because >> it was also 5s with CS4 ... There are no hello messages in rhel5; check 'man openais.conf' and look at the sections dealing with 'totem'. On RHEL5 with CMAN, the default totem 'token' timeout is 10000 (I thought it was 5000, but looking at the code proved differently on lines 487-492 of cman/daemon/ais.c). I still don't understand why it your configuration would be behaving as if totem's token timeout was increased to 21,000 though... >> And a strange thing : I have already tested several times >> the failover with CS5 without any totem record in cluster.conf >> (just a remaining deadnode_timer="21" in cman record) and when >> I stopped one node, the other proceed to fence after 21s ... not 5s .. That's strange. Deadnode_timer is ignored in the RHEL5 branch. 
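For completeness, on CS5 the place to stretch this is the totem element Alain mentions rather than deadnode_timer; a cluster.conf fragment might look like this (a sketch -- token is in milliseconds, and 21000 only mirrors the old CS4 figure):

    <!-- fragment of /etc/cluster/cluster.conf, inside the <cluster> element -->
    <totem token="21000"/>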
-- Lon From ccaulfie at redhat.com Thu May 22 07:44:38 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Thu, 22 May 2008 08:44:38 +0100 Subject: [Linux-cluster] CS5 / heart beat tuning In-Reply-To: <48352261.9070102@bull.net> References: <48352261.9070102@bull.net> Message-ID: <48352466.8040400@redhat.com> Alain Moulle wrote: > Hi Lon > > OK it seems I miss some big evolutions with CS5 versus CS4 ... > Where can I find a short documentation (or all documentation) > to understand all evolutions of CS5 , like openais , etc. ? > http://sources.redhat.com/cluster/wiki/HomePage?action=AttachFile&do=view&target=aiscman.pdf -- Chrissie From fdinitto at redhat.com Thu May 22 08:02:47 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 22 May 2008 10:02:47 +0200 (CEST) Subject: [Linux-cluster] New fencing method In-Reply-To: <20080521233526.GA21955@kallisti.us> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> <48346028.6030804@redhat.com> <20080521233526.GA21955@kallisti.us> Message-ID: Hi Ross, On Wed, 21 May 2008, Ross Vandegrift wrote: > On Wed, May 21, 2008 at 01:47:20PM -0400, James Parsons wrote: >> Lon Hohberger wrote: >>> I'd recommend calling it something besides fence_snmp in the tree - >>> because other agents also use SNMP. For example: >>> >>> fence_ethernet ? >>> >> Could you include some doc on how to use it, please? You can use one of >> the existing agent man pages as a template. > > Done and done. I settled on fence_ifmib, since there's nothing > specific to ethernet about IF-MIB, and it could apply to many > different technologies. > > Diff against today's git is attached. > thank you very much. fence_ifmib is now in git master branch. I did a few changes to plug it in. I will need to review the fence building script to avoid that hack for the COPYRIGHT header generation but it's low priority in my list for now. I made sure to keep the original one in the header. Could you please consider adding a few print to the help menu to add information about build date, release version and your copyright to be consistent with the other fence agents? Thanks Fabio -- I'm going to make him an offer he can't refuse. From jamesbewley at gmail.com Thu May 22 09:25:38 2008 From: jamesbewley at gmail.com (James Bewley) Date: Thu, 22 May 2008 10:25:38 +0100 Subject: [Linux-cluster] Fault tollerant filesystem Message-ID: Hi all, I'm running a cluster and looking for a fault tolerant filesystem. currently have failover via linux-ha and need a shared drive between 4 machines idealy no one machine would need to be master and the removal of any machine would not compromise data integrity. Can anyone suggest a good implementation that will fill my requirements? Regards James -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From a_mdl at mail.ru Thu May 22 10:08:19 2008 From: a_mdl at mail.ru (Denis Medvedev) Date: Thu, 22 May 2008 14:08:19 +0400 Subject: =?koi8-r?Q?Re=3A_[Linux-cluster]_Fault_tollerant_filesystem?= In-Reply-To: References: Message-ID: Hello, you can try www.cleversafe.org they provide iscsi fault-tolerant multi-node storage solution Denis Medvedev -----Original Message----- From: "James Bewley" To: linux-cluster at redhat.com Date: Thu, 22 May 2008 10:25:38 +0100 Subject: [Linux-cluster] Fault tollerant filesystem > > Hi all, > > I'm running a cluster and looking for a fault tolerant filesystem. > > currently have failover via linux-ha and need a shared drive between 4 > machines idealy no one machine would need to be master and the removal of > any machine would not compromise data integrity. > > Can anyone suggest a good implementation that will fill my requirements? > > > Regards > > James > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jamesbewley at gmail.com Thu May 22 10:30:17 2008 From: jamesbewley at gmail.com (James Bewley) Date: Thu, 22 May 2008 11:30:17 +0100 Subject: [Linux-cluster] Fault tollerant filesystem In-Reply-To: References: Message-ID: Yes that would be nice, my budget is very much smaller than that. >From what i've read so far, the best looking solution appears to be DRDB and NFS. With a structure similar to the following (v. bad) asci diagram: ----------------- ----------------- | r/w node | | r/w node | ------------------ ------------------ ^ NFS mount v ---------------------- --------------------- | DRDB node | <- linux HA -> | DRDB node | ---------------------- -------------------- Does GFS have any advantages over NFS, or am i being ignorant to the prupose GFS? James 2008/5/22 Denis Medvedev : > > Hello, > you can try www.cleversafe.org > they provide iscsi fault-tolerant multi-node storage solution > Denis Medvedev > > -----Original Message----- > From: "James Bewley" > To: linux-cluster at redhat.com > Date: Thu, 22 May 2008 10:25:38 +0100 > Subject: [Linux-cluster] Fault tollerant filesystem > > > > > Hi all, > > > > I'm running a cluster and looking for a fault tolerant filesystem. > > > > currently have failover via linux-ha and need a shared drive between 4 > > machines idealy no one machine would need to be master and the removal of > > any machine would not compromise data integrity. > > > > Can anyone suggest a good implementation that will fill my requirements? > > > > > > Regards > > > > James > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Aviation Briefing Ltd. Registered in England and Wales Company No: 3709975 Registered Office: Glen Yeo House, Station Road, Congresbury, North Somerset BS49 5DY From sasmaz at itu.edu.tr Thu May 22 11:25:00 2008 From: sasmaz at itu.edu.tr (aydin sasmaz) Date: Thu, 22 May 2008 14:25:00 +0300 Subject: [Linux-cluster] Fault tollerant filesystem In-Reply-To: References: Message-ID: <023401c8bbfe$81c2e560$8548b020$@edu.tr> Hi It is suitable to use enbd or redhat gnbd solution with gfs2. 
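Going back to the DRBD+NFS layout in James' diagram, the DRBD half of it is a single resource in /etc/drbd.conf along these lines (a sketch only; hostnames, disks and addresses are invented, and linux-HA decides which storage node is Primary and exports it over NFS):

    resource r0 {
      protocol C;
      on store1 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   192.168.1.1:7788;
        meta-disk internal;
      }
      on store2 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   192.168.1.2:7788;
        meta-disk internal;
      }
    }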
cheers -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of James Bewley Sent: Thursday, May 22, 2008 1:30 PM To: Denis Medvedev; linux clustering Subject: Re: [Linux-cluster] Fault tollerant filesystem Yes that would be nice, my budget is very much smaller than that. >From what i've read so far, the best looking solution appears to be DRDB and NFS. With a structure similar to the following (v. bad) asci diagram: ----------------- ----------------- | r/w node | | r/w node | ------------------ ------------------ ^ NFS mount v ---------------------- --------------------- | DRDB node | <- linux HA -> | DRDB node | ---------------------- -------------------- Does GFS have any advantages over NFS, or am i being ignorant to the prupose GFS? James 2008/5/22 Denis Medvedev : > > Hello, > you can try www.cleversafe.org > they provide iscsi fault-tolerant multi-node storage solution > Denis Medvedev > > -----Original Message----- > From: "James Bewley" > To: linux-cluster at redhat.com > Date: Thu, 22 May 2008 10:25:38 +0100 > Subject: [Linux-cluster] Fault tollerant filesystem > > > > > Hi all, > > > > I'm running a cluster and looking for a fault tolerant filesystem. > > > > currently have failover via linux-ha and need a shared drive between 4 > > machines idealy no one machine would need to be master and the removal of > > any machine would not compromise data integrity. > > > > Can anyone suggest a good implementation that will fill my requirements? > > > > > > Regards > > > > James > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Aviation Briefing Ltd. Registered in England and Wales Company No: 3709975 Registered Office: Glen Yeo House, Station Road, Congresbury, North Somerset BS49 5DY -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From Alain.Moulle at bull.net Thu May 22 12:44:02 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 22 May 2008 14:44:02 +0200 Subject: [Linux-cluster] CS5 / cluster_id in cluster.conf ? Message-ID: <48356A92.5040800@bull.net> Hi With CS5, is there always the possibility to set cluster_id in cluster.conf : References: <48356A92.5040800@bull.net> Message-ID: <48356CB1.7040308@redhat.com> Alain Moulle wrote: > Hi > > With CS5, is there always the possibility to set cluster_id > in cluster.conf : likewise with CS4 ? > (where this cluster_id was used instead > of generated from cluster name) > Err, yes. It was you that requested the feature in the first place ! https://bugzilla.redhat.com/show_bug.cgi?id=219588 Chrissie From Alain.Moulle at bull.net Thu May 22 14:43:01 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 22 May 2008 16:43:01 +0200 Subject: [Linux-cluster] CS5 still problem "Node x is undead" Message-ID: <48358675.4050506@bull.net> Hi Lon I've applied the patch (see resulting code below) but the patch does not solve the problem. Is there another patch linked to this problem ? Thanks Regards Alain Moull? 
>> when testing a two-nodes cluster with quorum disk, when >> I poweroff the node1 , node 2 fences well the node 1 and >> failovers the service, but in log of node 2 I have before and after >> the fence success messages many messages like this: >> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >> Apr 24 11:30:08 s_sys at xn3 qdiskd[13740]: Node 2 is undead. http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 Resulting code after patch application in cman/qdisk/main.c : =========================================================== Transition from Online -> Evicted */ if (ni[x].ni_misses > ctx->qc_tko && state_run(ni[x].ni_status.ps_state)) { /* Mark our internal views as dead if nodes miss too many heartbeats... This will cause a master transition if no live master exists. */ if (ni[x].ni_status.ps_state >= S_RUN && ni[x].ni_seen) { clulog(LOG_DEBUG, "Node %d DOWN\n", ni[x].ni_status.ps_nodeid); ni[x].ni_seen = 0; } ni[x].ni_state = S_EVICT; ni[x].ni_status.ps_state = S_EVICT; ni[x].ni_evil_incarnation = ni[x].ni_status.ps_incarnation; /* Write eviction notice if we're the master. */ if (ctx->qc_status == S_MASTER) { clulog(LOG_NOTICE, "Writing eviction notice for node %d\n", ni[x].ni_status.ps_nodeid); qd_write_status(ctx, ni[x].ni_status.ps_nodeid, S_EVICT, NULL, NULL, NULL); if (ctx->qc_flags & RF_ALLOW_KILL) { clulog(LOG_DEBUG, "Telling CMAN to " "kill the node\n"); cman_kill_node(ctx->qc_ch, ni[x].ni_status.ps_nodeid); } } /* Clear our master mask for the node after eviction */ if (mask) clear_bit(mask, (ni[x].ni_status.ps_nodeid-1), sizeof(memb_mask_t)); continue; } From sasmaz at itu.edu.tr Thu May 22 16:02:43 2008 From: sasmaz at itu.edu.tr (aydin sasmaz) Date: Thu, 22 May 2008 19:02:43 +0300 Subject: [Linux-cluster] GNBD CLuster In-Reply-To: References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> <48346028.6030804@redhat.com> <20080521233526.GA21955@kallisti.us> Message-ID: <025c01c8bc25$49ca84c0$dd5f8e40$@edu.tr> Hi All, I would like to share disk volumes to the other nodes in my cluster with using a high available gnbd cluster. In this topology, In addition to cnodes, there are two hpdl380 server connected to msa20 enclosure by scsi cabling. So they are presented with disk volumes. At this point, 1) I wouldn't like to use gfs solution 2) Keep serving disk volumes to cnodes when one of two node gnbd cluster fails. I mean, would like to migrate gnbd service to the failover cluster node 3) when one of the gnbd server fails would like to fence it with a proper method by using HP-ILO fence device. But don't know how to test failing gnbd server. 
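As a rough sketch of the GNBD export side and of exercising point 3 (the export, host and device names below are invented; the fencing itself would be a fence_ilo method configured for each GNBD server in cluster.conf):

    # on the gnbd server: start the server daemon and export a volume
    gnbd_serv
    gnbd_export -d /dev/mapper/msa_lv0 -e msa_lv0

    # on each client node: import everything that server exports;
    # the device then appears as /dev/gnbd/msa_lv0
    gnbd_import -i gnbdserver1

    # to test point 3 from a surviving cluster member, trigger the configured
    # fencing (the iLO method) by hand against the "failed" server:
    fence_node gnbdserver1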
Platforms :Two HP DL380 g3 servers with RHAP5.1 vith virtualization, cluster and cluster storage Two HP MSA20 enclosures Any advice would be appreciated Thanks aydin From jerlyon at gmail.com Thu May 22 17:03:46 2008 From: jerlyon at gmail.com (Jeremy Lyon) Date: Thu, 22 May 2008 11:03:46 -0600 Subject: [Linux-cluster] Cluster starts, but a node won't rejoin after reboot Message-ID: <779919740805221003k5b799927qfc0c11f65e1bf340@mail.gmail.com> Hi, I'm running Cluster 2 on RHEL 5.2 (I saw this behavior on 5.1 and updated just yesterday to see if it fixed it, but no luck) and I'm seeing issues when I reboot a node. I tried increasing the post_join_delay to 60 and the totem token to 25000, but nothing seems to be working. During the boot when the cman init script runs, I see openais messages on the current running node for anywhere between 15 to 30 seconds: May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 0. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 560 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering COMMIT state. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61: May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] previous ring seq 1372 rep 151.117.65.61 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61) May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61) May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:20 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state. May 22 11:52:20 lxomp83k openais[3602]: [CLM ] got nodejoin message 151.117.65.61 May 22 11:52:20 lxomp83k openais[3602]: [CPG ] got joinlist message from node 1 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 9. That repeats until I finally see this... May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 568 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering COMMIT state. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state. 
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61: May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1380 rep 151.117.65.61 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [1] member 151.117.65.62: May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1368 rep 151.117.65.62 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru c high delivered c received flag 1 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.62) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:27 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.62) May 22 11:52:27 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service. May 22 11:52:27 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state. May 22 11:52:27 lxomp83k openais[3602]: [MAIN ] Killing node lxomp84k because it has rejoined the cluster with existing state At this point when the second node comes up, I can login and run service cman stop and service cman start. On that start the node joins the cluster immediately with no issue. [root at lxomp84k ~]# uname -a Linux lxomp84k 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux [root at lxomp84k ~]# rpm -q cman cman-2.0.84-2.el5 Any suggestions?? TIA, Jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: From fog at t.is Thu May 22 17:12:38 2008 From: fog at t.is (=?iso-8859-1?Q?Finnur_=D6rn_Gu=F0mundsson_-_TM_Software?=) Date: Thu, 22 May 2008 17:12:38 -0000 Subject: [Linux-cluster] Cluster starts, but a node won't rejoin after reboot In-Reply-To: <779919740805221003k5b799927qfc0c11f65e1bf340@mail.gmail.com> References: <779919740805221003k5b799927qfc0c11f65e1bf340@mail.gmail.com> Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F779020C6285@SKYHQAMX08.klasi.is> Hi, I'm having the exact same issue on a RHEL 5.2 system, and have a open support case with Redhat. When it will be resolved i can post the details .... Thanks, Finnur From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeremy Lyon Sent: 22. ma? 2008 17:04 To: linux clustering Subject: [Linux-cluster] Cluster starts, but a node won't rejoin after reboot Hi, I'm running Cluster 2 on RHEL 5.2 (I saw this behavior on 5.1 and updated just yesterday to see if it fixed it, but no luck) and I'm seeing issues when I reboot a node. I tried increasing the post_join_delay to 60 and the totem token to 25000, but nothing seems to be working. 
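A quick way to double-check where those two knobs actually ended up, and that both nodes agree on the configuration (nothing below is a recommended value, it only shows where the settings live):

  # post_join_delay is an attribute of <fence_daemon>, token of <totem>
  grep -E 'fence_daemon|totem' /etc/cluster/cluster.conf

  # both nodes should report the same config version
  cman_tool status | grep -i version
  cman_tool services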
During the boot when the cman init script runs, I see openais messages on the current running node for anywhere between 15 to 30 seconds: May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 0. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 560 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering COMMIT state. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61: May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] previous ring seq 1372 rep 151.117.65.61 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.61) May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.61) May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:20 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state. May 22 11:52:20 lxomp83k openais[3602]: [CLM ] got nodejoin message 151.117.65.61 May 22 11:52:20 lxomp83k openais[3602]: [CPG ] got joinlist message from node 1 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 9. That repeats until I finally see this... May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 568 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering COMMIT state. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61: May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1380 rep 151.117.65.61 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [1] member 151.117.65.62: May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1368 rep 151.117.65.62 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru c high delivered c received flag 1 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery. 
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.61) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.61) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.62) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:27 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.62) May 22 11:52:27 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service. May 22 11:52:27 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state. May 22 11:52:27 lxomp83k openais[3602]: [MAIN ] Killing node lxomp84k because it has rejoined the cluster with existing state At this point when the second node comes up, I can login and run service cman stop and service cman start. On that start the node joins the cluster immediately with no issue. [root at lxomp84k ~]# uname -a Linux lxomp84k 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux [root at lxomp84k ~]# rpm -q cman cman-2.0.84-2.el5 Any suggestions?? TIA, Jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ross at kallisti.us Thu May 22 17:52:23 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Thu, 22 May 2008 13:52:23 -0400 Subject: [Linux-cluster] concurrent write performance Message-ID: <20080522175223.GB27548@kallisti.us> Hi everyone, I've been doing some tests with a clustered GFS installation that will evetually host an application that will make heavy use of concurrent writes across nodes. Testing such a scenarios with a script designed to simulate multiple writers shows that add I add writer processes across nodes, performance drops off. This makes some sense to me, as the nodes need to do more complicated neogtiation of locking. Two questions: 1) What is the expected scalability of GFS for many writer nodes as the number of nodes increases? 2) What kinds of things can I do to increase random write performance on GFS? I'm even interested in things that cause some trade-off with read performance. I've got the filesystem mounted on all nodes with noatime,quota=off. My filesystem isn't large enough to benefit from reducing the number of resource groups. It looks like drop_count for the dlm isn't there anymore. I looked at /sys/kernel/config/dlm/cluster - what do the various items in there tune, and which can I try to mess with to help write performance? Finally, I don't see any sign of statfs_slots in the current gfs2_tool gettune output. Is there an equivalent I can muck with? -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. 
Augustine, De Genesi ad Litteram, Book II, xviii, 37 From mpartio at gmail.com Fri May 23 06:04:18 2008 From: mpartio at gmail.com (Mikko Partio) Date: Fri, 23 May 2008 09:04:18 +0300 Subject: [Linux-cluster] Problems with gfs_grow Message-ID: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> Hello all, I tried to expand my gfs filesystem from 1,5TB to 2TB. I added the new 500G disk to volume manager etc, and finally run gfs_grow. The command finished without warnings, but a few seconds after that my cluster crashed with "Kernel Panic - not syncing. Fatal exception". When I got the cluster up again and executed gfs_fsck on the filesystem I get this error: sh-3.1# gfs_fsck -v /dev/xxx-vg/xxx-lv Initializing fsck Initializing lists... Initializing special inodes... Validating Resource Group index. Level 1 check. 5167 resource groups found. (passed) Setting block ranges... Can't seek to last block in file system: 4738147774 Unable to determine the boundaries of the file system. Freeing buffers. What could be the problem? Regards Mikko Info on the system: CentOS 5.1 sh-3.1# rpm -qa |grep gfs gfs2-utils-0.1.38-1.el5 kmod-gfs-0.1.19-7.el5_1.1 gfs-utils-0.1.12-1.el5 sh-3.1# uname -a Linux xxx 2.6.18-53.1.21.el5 #1 SMP Tue May 20 09:35:07 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux -------------- next part -------------- An HTML attachment was scrubbed... URL: From denisb+gmane at gmail.com Fri May 23 07:47:31 2008 From: denisb+gmane at gmail.com (denis) Date: Fri, 23 May 2008 09:47:31 +0200 Subject: [Linux-cluster] Re: kmod-gfs removed In-Reply-To: <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> Message-ID: Mikko Partio wrote: > yum try to remove kmod-gfs because its depende of the kernel version > that its trying to remove, which is not right because you are trying > to update a kernel and it should means just install the package > without remove any old versions. > or do you change the default configuration of yum? > I have only added an extra repo. > > When I did this upgrade and rebooted, the node could not see gfs-mounts > any more (obviously, since the gfs-module was not there). Then I had to > remove kmod-gfs -package with yum (lots of errors) and re-install it > with yum again. After a reboot everything is working again. What is the status of this one? I am seeing the same here (did not perform upgrade yet) : Removing: kernel x86_64 2.6.18-53.1.19.el5 installed kmod-gfs x86_64 0.1.19-7.el5_1.1 installed The only gfs related packages in the upgrade list are : gfs-utils x86_64 0.1.17-1.el5 rhel-x86_64-server-cluster-storage-5 gfs2-utils x86_64 0.1.44-1.el5 rhel-x86_64-server-5 Specifically how should one perform the upgrade with the least amount of hassle? Regards -- Denis From vimal at monster.co.in Fri May 23 08:18:00 2008 From: vimal at monster.co.in (Vimal Gupta) Date: Fri, 23 May 2008 13:48:00 +0530 Subject: [Linux-cluster] kernel: dlm: lockspace ERROR !!! Message-ID: <48367DB8.2090704@monster.co.in> Hi, I made a cluster of two nodes sharing HDD via GNBD. when I was going to mount the exported partition on Node B the Node B got hang. 
And After the hardboot now I am getting these following entries in my /var/log/message of both nodes Node A May 23 13:42:26 mint10 kernel: dlm: lockspace 20001 from 1 type 1 not found May 23 13:42:26 mint10 kernel: dlm: lockspace 30001 from 1 type 1 not found Node B May 23 13:45:59 mint26 kernel: dlm: lockspace 20002 from 2 type 1 not found May 23 13:46:00 mint26 kernel: dlm: lockspace 30002 from 2 type 1 not found And ON node B [clurgmgrd] showing I am not able to kill this pid Please reply ASAP -- Vimal Gupta Sr. System Administrator Monster.com India Pvt.Ltd. From denisb+gmane at gmail.com Fri May 23 08:18:32 2008 From: denisb+gmane at gmail.com (denis) Date: Fri, 23 May 2008 10:18:32 +0200 Subject: [Linux-cluster] Re: kmod-gfs removed In-Reply-To: References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> Message-ID: denis wrote: > What is the status of this one? > I am seeing the same here (did not perform upgrade yet) : > Removing: > kernel x86_64 2.6.18-53.1.19.el5 installed > kmod-gfs x86_64 0.1.19-7.el5_1.1 installed > The only gfs related packages in the upgrade list are : > gfs-utils x86_64 0.1.17-1.el5 rhel-x86_64-server-cluster-storage-5 > gfs2-utils x86_64 0.1.44-1.el5 rhel-x86_64-server-5 Scratch that, these appear to be installed too : kmod-gfs x86_64 0.1.23-5.el5 rhel-x86_64-server-cluster-storage-5 kmod-gfs2 x86_64 1.92-1.1.el5 rhel-x86_64-server-cluster-storage-5 > Specifically how should one perform the upgrade with the least amount of > hassle? So I guess an update should work out fine afterall?! Regards -- Denis From mpartio at gmail.com Fri May 23 08:37:35 2008 From: mpartio at gmail.com (Mikko Partio) Date: Fri, 23 May 2008 11:37:35 +0300 Subject: [Linux-cluster] Re: kmod-gfs removed In-Reply-To: References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> Message-ID: <2ca799770805230137r1ec3ae19r269c955cabe6d74e@mail.gmail.com> On Fri, May 23, 2008 at 11:18 AM, denis > wrote: > denis wrote: > >> What is the status of this one? >> I am seeing the same here (did not perform upgrade yet) : >> Removing: >> kernel x86_64 2.6.18-53.1.19.el5 installed >> kmod-gfs x86_64 0.1.19-7.el5_1.1 installed >> The only gfs related packages in the upgrade list are : >> gfs-utils x86_64 0.1.17-1.el5 rhel-x86_64-server-cluster-storage-5 >> gfs2-utils x86_64 0.1.44-1.el5 rhel-x86_64-server-5 >> > > Scratch that, these appear to be installed too : > kmod-gfs x86_64 0.1.23-5.el5 rhel-x86_64-server-cluster-storage-5 > kmod-gfs2 x86_64 1.92-1.1.el5 rhel-x86_64-server-cluster-storage-5 > > Specifically how should one perform the upgrade with the least amount of >> hassle? >> > > So I guess an update should work out fine afterall?! Is this RHEL/CentOS 5.2? I don't see that kmod-gfs version with 5.1. Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From denisb+gmane at gmail.com Fri May 23 10:44:13 2008 From: denisb+gmane at gmail.com (denis) Date: Fri, 23 May 2008 12:44:13 +0200 Subject: [Linux-cluster] Re: kmod-gfs removed In-Reply-To: <2ca799770805230137r1ec3ae19r269c955cabe6d74e@mail.gmail.com> References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> <2ca799770805230137r1ec3ae19r269c955cabe6d74e@mail.gmail.com> Message-ID: Mikko Partio wrote: >> On Fri, May 23, 2008 at 11:18 AM, denis denisb wrote: >> Scratch that, these appear to be installed too : >> kmod-gfs x86_64 0.1.23-5.el5 rhel-x86_64-server-cluster-storage-5 >> kmod-gfs2 x86_64 1.92-1.1.el5 rhel-x86_64-server-cluster-storage-5 > Is this RHEL/CentOS 5.2? I don't see that kmod-gfs version with 5.1. Yes, this is RHEL5.2. Regards -- Denis From nico at altiva.fr Fri May 23 15:08:02 2008 From: nico at altiva.fr (NM) Date: Fri, 23 May 2008 15:08:02 +0000 (UTC) Subject: [Linux-cluster] Booting node 1 causes it to fence node 2 Message-ID: I have two nodes, each fenceable through a Dell RAC card. When I power cycle one of them, it reboots ... and proceeds to fence the other one! I must be missing something ... (btw should cman be started in init.d automatically? or should it be launched by an operator after having made sure the node was sane?) From rpeterso at redhat.com Fri May 23 17:50:13 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 23 May 2008 12:50:13 -0500 Subject: [Linux-cluster] Problems with gfs_grow In-Reply-To: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> References: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> Message-ID: <1211565013.10437.119.camel@technetium.msp.redhat.com> Hi Mikko, On Fri, 2008-05-23 at 09:04 +0300, Mikko Partio wrote: > Hello all, > > I tried to expand my gfs filesystem from 1,5TB to 2TB. I added the new > 500G disk to volume manager etc, and finally run gfs_grow. The command > finished without warnings, but a few seconds after that my cluster > crashed with "Kernel Panic - not syncing. Fatal exception". When I got > the cluster up again and executed gfs_fsck on the filesystem I get > this error: > > sh-3.1# gfs_fsck -v /dev/xxx-vg/xxx-lv > Initializing fsck > Initializing lists... > Initializing special inodes... > Validating Resource Group index. > Level 1 check. > 5167 resource groups found. > (passed) > Setting block ranges... > Can't seek to last block in file system: 4738147774 > Unable to determine the boundaries of the file system. > Freeing buffers. You've probably hit the gfs_grow bug described in bz #434962 (436383) and the gfs_fsck bug described in 440897 (440896). My apologies if you can't read them; permissions to individual bugzilla records are out of my control. It's not guaranteed to be your problem, but it sounds similar. The fixes are available in the recently released RHEL5.2, although I don't know when they'll hit Centos. The fixes are also available in the latest cluster git tree if you want to compile/install them from source code yourself. 
Documentation for doing this can be found at: http://sources.redhat.com/cluster/wiki/ClusterGit Regards, Bob Peterson Red Hat Clustering & GFS From Klaus.Steinberger at physik.uni-muenchen.de Sat May 24 08:31:33 2008 From: Klaus.Steinberger at physik.uni-muenchen.de (Klaus Steinberger) Date: Sat, 24 May 2008 10:31:33 +0200 Subject: [Linux-cluster] Re: Booting node 1 causes it to fence node 2 (NM) In-Reply-To: <20080523160008.98774619B25@hormel.redhat.com> References: <20080523160008.98774619B25@hormel.redhat.com> Message-ID: <200805241031.36484.Klaus.Steinberger@physik.uni-muenchen.de> Hi, > I have two nodes, each fenceable through a Dell RAC card. When I power > cycle one of them, it reboots ... and proceeds to fence the other one! Do you have the cluster Communication and the RAC card's on the same subnet? There is some hidden hint in the docu that on a two node cluster both cluster communication and fencing devices must be on the same network. I had similar symptoms as long as I tried cluster comm on fencing on different subnet in a two node cluster. > (btw should cman be started in init.d automatically? or should it be It should be started automatically. Sincerly, Klaus -- Klaus Steinberger Beschleunigerlaboratorium Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 Garching, Germany FAX: (+49 89)289 14280 EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE URL: http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2002 bytes Desc: not available URL: From dist-list at LEXUM.UMontreal.CA Sat May 24 13:04:50 2008 From: dist-list at LEXUM.UMontreal.CA (FM) Date: Sat, 24 May 2008 09:04:50 -0400 Subject: [Linux-cluster] kernel panic umounting GFS FS Message-ID: <48381272.1010003@lexum.umontreal.ca> well, today, I try to unmount GFS form one node to update it (for the latest kernel). All nodes had a kernel panic. Here is the stack : May 24 07:23:09 ancona kernel: CMAN: too many transition restarts - will die May 24 07:23:09 ancona kernel: CMAN: we are leaving the cluster. Inconsistent cluster view May 24 07:23:09 ancona kernel: WARNING: dlm_emergency_shutdown May 24 07:23:09 ancona clurgmgrd[4604]: #67: Shutting down uncleanly May 24 07:23:09 ancona kernel: WARNING: dlm_emergency_shutdown May 24 07:23:09 ancona kernel: SM: 00000006 sm_stop: SG still joined May 24 07:23:09 ancona kernel: SM: 01000008 sm_stop: SG still joined May 24 07:23:09 ancona kernel: SM: 02000014 sm_stop: SG still joined May 24 07:23:09 ancona kernel: SM: 0300000a sm_stop: SG still joined May 24 07:23:09 ancona ccsd[3732]: Cluster manager shutdown. Attemping to reconnect... 
May 24 07:23:10 ancona kernel: dlm: dlm_unlock: lkid 947100ed lockspace not found May 24 07:23:10 ancona kernel: nval 91ed0131 fr 8 r 8 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 921a00a3 fr 8 r 8 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 924b0156 fr 7 r 7 2 May 24 07:23:10 ancona kernel: home send einval to 7 May 24 07:23:10 ancona kernel: home (3942) req reply einval 934f0161 fr 1 r 1 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 90a603ad fr 8 r 8 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 92b600d0 fr 4 r 4 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 915b02a7 fr 5 r 5 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 935b0262 fr 5 r 5 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 922d0261 fr 5 r 5 2 May 24 07:23:10 ancona kernel: home send einval to 2 May 24 07:23:10 ancona kernel: home send einval to 8 May 24 07:23:10 ancona kernel: home (3942) req reply einval 92b00008 fr 7 r 7 2 May 24 07:23:10 ancona kernel: home send einval to 7 May 24 07:23:10 ancona kernel: home (3942) req reply einval 92ca0337 fr 1 r 1 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 932d0128 fr 1 r 1 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 9276022a fr 7 r 7 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 94a90311 fr 8 r 8 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 93ec0156 fr 8 r 8 2 May 24 07:23:10 ancona kernel: 3931 pr_start last_stop 0 last_start 6 last_finish 0 May 24 07:23:10 ancona kernel: 3931 pr_start count 7 type 2 event 6 flags 250 May 24 07:23:10 ancona kernel: 3931 claim_jid 4 May 24 07:23:10 ancona kernel: 3931 pr_start 6 done 1 May 24 07:23:10 ancona kernel: 3931 pr_finish flags 5a May 24 07:23:10 ancona kernel: 3916 recovery_done jid 4 msg 309 a May 24 07:23:10 ancona kernel: 3916 recovery_done nodeid 6 flg 18 May 24 07:23:10 ancona kernel: 3930 pr_start last_stop 6 last_start 8 last_finish 6 May 24 07:23:10 ancona kernel: 3930 pr_start count 8 type 2 event 8 flags 21a May 24 07:23:10 ancona kernel: 3930 pr_start 8 done 1 May 24 07:23:10 ancona kernel: 3930 pr_finish flags 1a May 24 07:23:10 ancona kernel: May 24 07:23:10 ancona kernel: lock_dlm: Assertion failed on line 361 of file /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/dlm/lock.c May 24 07:23:10 ancona kernel: lock_dlm: assertion: "!error || (plock && error == -EINPROGRESS)" May 24 07:23:10 ancona kernel: lock_dlm: time = 2227212828 May 24 07:23:10 ancona kernel: home: error=-22 num=5,9641cab lkf=0 flags=84 May 24 07:23:10 ancona kernel: May 24 07:23:10 ancona kernel: ------------[ cut here ]------------ May 24 07:23:10 ancona kernel: kernel BUG at /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/dlm/lock.c:361! 
May 24 07:23:10 ancona kernel: invalid operand: 0000 [#1] May 24 07:23:10 ancona kernel: SMP May 24 07:23:11 ancona kernel: Modules linked in: autofs4 lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc arpt_mangle arptable_filter arp_tables dm_mirror dm_round_robin dm_multipath button battery ac ohci_hcd tg3 bonding(U) floppy sg st ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc cciss sd_mod scsi_mod May 24 07:23:11 ancona kernel: CPU: 1 May 24 07:23:11 ancona kernel: EIP: 0060:[] Not tainted VLI May 24 07:23:11 ancona kernel: EFLAGS: 00010246 (2.6.9-67.0.7.ELsmp) May 24 07:23:11 ancona kernel: EIP is at do_dlm_unlock+0xa9/0xbf [lock_dlm] May 24 07:23:11 ancona kernel: eax: 00000001 ebx: e82b5b80 ecx: f6cdbef0 edx: f8aed2d3 May 24 07:23:11 ancona kernel: esi: ffffffea edi: 00000000 ebp: f8a53000 esp: f6cdbeec May 24 07:23:11 ancona kernel: ds: 007b es: 007b ss: 0068 May 24 07:23:11 ancona kernel: Process gfs_glockd (pid: 3939, threadinfo=f6cdb000 task=f70d19b0) May 24 07:23:11 ancona kernel: Stack: f8aed2d3 f8a53000 00000003 e82b5b80 f8ae88b2 f8b48ede 00000001 ea3e7874 May 24 07:23:11 ancona kernel: ea3e7858 f8b3ed63 f8b75e60 ed9fcac0 ea3e7858 f8b75e60 e6fcf8fc f8b3e257 May 24 07:23:11 ancona kernel: ea3e7858 00000001 ea3e7858 f8b3e30e ea3e7874 00000000 f8b3f5f2 00000000 May 24 07:23:11 ancona kernel: Call Trace: May 24 07:23:11 ancona kernel: [] lm_dlm_unlock+0x14/0x1c [lock_dlm] May 24 07:23:11 ancona kernel: [] gfs_lm_unlock+0x2c/0x42 [gfs] May 24 07:23:11 ancona kernel: [] gfs_glock_drop_th+0xf3/0x12d [gfs] May 24 07:23:11 ancona kernel: [] rq_demote+0x7f/0x98 [gfs] May 24 07:23:11 ancona kernel: [] run_queue+0x5a/0xc1 [gfs] May 24 07:23:11 ancona kernel: [] gfs_glock_dq+0x15f/0x16e [gfs] May 24 07:23:11 ancona kernel: [] gfs_glock_dq_uninit+0x8/0x10 [gfs] May 24 07:23:11 ancona kernel: [] gfs_inode_destroy+0x8e/0xbf [gfs] May 24 07:23:11 ancona kernel: [] gfs_reclaim_glock+0xa2/0x13c [gfs] May 24 07:23:11 ancona kernel: [] gfs_glockd+0x39/0xde [gfs] May 24 07:23:11 ancona kernel: [] default_wake_function+0x0/0xc May 24 07:23:11 ancona kernel: [] ret_from_fork+0x6/0x14 May 24 07:23:11 ancona kernel: [] default_wake_function+0x0/0xc May 24 07:23:11 ancona kernel: [] gfs_glockd+0x0/0xde [gfs] May 24 07:23:11 ancona kernel: [] kernel_thread_helper+0x5/0xb May 24 07:23:11 ancona kernel: Code: 73 34 8b 03 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 ff 70 18 68 ef d3 ae f8 e8 de a2 63 c7 83 c4 34 68 d3 d2 ae f8 e8 d1 a2 63 c7 <0f> 0b 69 01 1b d2 ae f8 68 d5 d2 ae f8 e8 8c 9a 63 c7 5b 5e 5f May 24 07:23:11 ancona kernel: <0>Fatal exception: panic in 5 seconds May 24 07:23:11 ancona kernel: dlm: dlm_lock: no lockspace May 24 07:23:12 ancona kernel: nval 91ed0131 fr 8 r 8 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 921a00a3 fr 8 r 8 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 924b0156 fr 7 r 7 2 May 24 07:23:12 ancona kernel: home send einval to 7 May 24 07:23:12 ancona kernel: home (3942) req reply einval 934f0161 fr 1 r 1 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 90a603ad fr 8 r 8 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 92b600d0 fr 4 r 4 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 915b02a7 fr 5 r 5 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 935b0262 fr 5 r 5 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 922d0261 fr 5 r 5 2 May 24 07:23:12 ancona kernel: home send einval to 2 May 24 07:23:12 ancona kernel: home send einval to 8 May 24 07:23:12 ancona 
kernel: home (3942) req reply einval 92b00008 fr 7 r 7 2 May 24 07:23:12 ancona kernel: home send einval to 7 May 24 07:23:12 ancona kernel: home (3942) req reply einval 92ca0337 fr 1 r 1 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 932d0128 fr 1 r 1 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 9276022a fr 7 r 7 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 94a90311 fr 8 r 8 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 93ec0156 fr 8 r 8 2 May 24 07:23:12 ancona kernel: 3931 pr_start last_stop 0 last_start 6 last_finish 0 May 24 07:23:12 ancona kernel: 3931 pr_start count 7 type 2 event 6 flags 250 May 24 07:23:12 ancona kernel: 3931 claim_jid 4 May 24 07:23:12 ancona kernel: 3931 pr_start 6 done 1 May 24 07:23:12 ancona kernel: 3931 pr_finish flags 5a May 24 07:23:12 ancona kernel: 3916 recovery_done jid 4 msg 309 a May 24 07:23:12 ancona kernel: 3916 recovery_done nodeid 6 flg 18 May 24 07:23:12 ancona kernel: 3930 pr_start last_stop 6 last_start 8 last_finish 6 May 24 07:23:12 ancona kernel: 3930 pr_start count 8 type 2 event 8 flags 21a May 24 07:23:12 ancona kernel: 3930 pr_start 8 done 1 May 24 07:23:12 ancona kernel: 3930 pr_finish flags 1a May 24 07:23:12 ancona kernel: May 24 07:23:12 ancona kernel: lock_dlm: Assertion failed on line 432 of file /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/dlm/lock.c May 24 07:23:13 ancona kernel: lock_dlm: assertion: "!error" May 24 07:23:13 ancona kernel: lock_dlm: time = 2227213341 May 24 07:23:13 ancona kernel: home: num=2,7a2ec26 err=-22 cur=-1 req=3 lkf=10000 May 24 07:23:13 ancona kernel: May 24 07:23:13 ancona kernel: ------------[ cut here ]------------ May 24 07:23:13 ancona kernel: kernel BUG at /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/dlm/lock.c:432! 
May 24 07:23:13 ancona kernel: invalid operand: 0000 [#2] May 24 07:23:13 ancona kernel: SMP May 24 07:23:13 ancona kernel: Modules linked in: autofs4 lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc arpt_mangle arptable_filter arp_tables dm_mirror dm_round_robin dm_multipath button battery ac ohci_hcd tg3 bonding(U) floppy sg st ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc cciss sd_mod scsi_mod May 24 07:23:13 ancona kernel: CPU: 0 May 24 07:23:13 ancona kernel: EIP: 0060:[] Not tainted VLI May 24 07:23:13 ancona kernel: EFLAGS: 00010246 (2.6.9-67.0.7.ELsmp) May 24 07:23:13 ancona kernel: EIP is at do_dlm_lock+0x134/0x14e [lock_dlm] May 24 07:23:13 ancona kernel: eax: 00000001 ebx: ffffffea ecx: ee2f5c34 edx: f8aed2d3 May 24 07:23:13 ancona kernel: esi: f8ae87b7 edi: f7e4cc00 ebp: e5fa9280 esp: ee2f5c30 May 24 07:23:13 ancona kernel: ds: 007b es: 007b ss: 0068 May 24 07:23:13 ancona kernel: Process httpd (pid: 10812, threadinfo=ee2f5000 task=f4f6b830) May 24 07:23:13 ancona kernel: Stack: f8aed2d3 20202020 32202020 20202020 20202020 32613720 36326365 f8b30018 May 24 07:23:13 ancona kernel: 00000246 e5fa9280 00000003 00000000 e5fa9280 f8ae8847 00000003 f8af0c80 May 24 07:23:13 ancona kernel: f8a53000 f8b48e9a 00000008 00000001 e78e546c e78e5450 f8a53000 f8b3ea9a May 24 07:23:13 ancona kernel: Call Trace: May 24 07:23:13 ancona kernel: [] gfs_acl_validate_set+0x18/0x8d [gfs] May 24 07:23:13 ancona kernel: [] lm_dlm_lock+0x49/0x52 [lock_dlm] May 24 07:23:13 ancona kernel: [] gfs_lm_lock+0x35/0x4d [gfs] May 24 07:23:13 ancona kernel: [] gfs_glock_xmote_th+0x130/0x172 [gfs] May 24 07:23:13 ancona kernel: [] rq_promote+0xc8/0x147 [gfs] May 24 07:23:13 ancona kernel: [] run_queue+0x91/0xc1 [gfs] May 24 07:23:13 ancona kernel: [] gfs_glock_nq+0xcf/0x116 [gfs] May 24 07:23:13 ancona kernel: [] gfs_glock_nq_init+0x13/0x26 [gfs] May 24 07:23:13 ancona kernel: [] gfs_lookupi+0x321/0x3bf [gfs] May 24 07:23:13 ancona kernel: [] gfs_lookup+0x83/0xfb [gfs] From mpartio at gmail.com Sun May 25 16:40:16 2008 From: mpartio at gmail.com (Mikko Partio) Date: Sun, 25 May 2008 19:40:16 +0300 Subject: [Linux-cluster] Problems with gfs_grow In-Reply-To: <1211565013.10437.119.camel@technetium.msp.redhat.com> References: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> <1211565013.10437.119.camel@technetium.msp.redhat.com> Message-ID: <2ca799770805250940o6d0d5cf9t5b9ed45c2227cda@mail.gmail.com> On Fri, May 23, 2008 at 8:50 PM, Bob Peterson wrote: > Hi Mikko, > > On Fri, 2008-05-23 at 09:04 +0300, Mikko Partio wrote: > > Hello all, > > > > I tried to expand my gfs filesystem from 1,5TB to 2TB. I added the new > > 500G disk to volume manager etc, and finally run gfs_grow. The command > > finished without warnings, but a few seconds after that my cluster > > crashed with "Kernel Panic - not syncing. Fatal exception". When I got > > the cluster up again and executed gfs_fsck on the filesystem I get > > this error: > > > > sh-3.1# gfs_fsck -v /dev/xxx-vg/xxx-lv > > Initializing fsck > > Initializing lists... > > Initializing special inodes... > > Validating Resource Group index. > > Level 1 check. > > 5167 resource groups found. > > (passed) > > Setting block ranges... > > Can't seek to last block in file system: 4738147774 > > Unable to determine the boundaries of the file system. > > Freeing buffers. > > You've probably hit the gfs_grow bug described in bz #434962 (436383) > and the gfs_fsck bug described in 440897 (440896). 
My apologies if > you can't read them; permissions to individual bugzilla records are > out of my control. It's not guaranteed to be your problem, but it > sounds similar. > > The fixes are available in the recently released RHEL5.2, although > I don't know when they'll hit Centos. The fixes are also available > in the latest cluster git tree if you want to compile/install them > from source code yourself. Documentation for doing this can > be found at: http://sources.redhat.com/cluster/wiki/ClusterGit > Hi Bob and thanks for you reply. So, what I should do is to upgrade to 5.2 and then run gfs_fsck on the filesystem? Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpartio at gmail.com Mon May 26 05:24:20 2008 From: mpartio at gmail.com (Mikko Partio) Date: Mon, 26 May 2008 08:24:20 +0300 Subject: [Linux-cluster] Problems with gfs_grow In-Reply-To: <2ca799770805250940o6d0d5cf9t5b9ed45c2227cda@mail.gmail.com> References: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> <1211565013.10437.119.camel@technetium.msp.redhat.com> <2ca799770805250940o6d0d5cf9t5b9ed45c2227cda@mail.gmail.com> Message-ID: <2ca799770805252224i7019fdfatd064f1ab158ad791@mail.gmail.com> On Sun, May 25, 2008 at 7:40 PM, Mikko Partio wrote: > The fixes are available in the recently released RHEL5.2, although >> I don't know when they'll hit Centos. The fixes are also available >> in the latest cluster git tree if you want to compile/install them >> from source code yourself. Documentation for doing this can >> be found at: http://sources.redhat.com/cluster/wiki/ClusterGit >> > > Hi Bob and thanks for you reply. > > So, what I should do is to upgrade to 5.2 and then run gfs_fsck on the > filesystem? > > Seeing that CentOS 5.2 is not released yet, I decided to take the git way. I have never used it before so I'm not sure if I'm doing everything correctly, but it seems that a compiled version from RHEL52 branch does not fix the issue (details below). Would the HEAD version of gfs_fsck do any better? Regards Mikko sh-3.1$ ../git checkout my52 Already on "my52" sh-3.1$ cd gfs sh-3.1$ ./configure Configuring Makefiles for your system... 
Completed Makefile configuration sh-3.1$ cd gfs_fsck sh-3.1$ make gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 main.c -o main.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 initialize.c -o initialize.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass1.c -o pass1.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass1b.c -o pass1b.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass1c.c -o pass1c.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass2.c -o pass2.o pass2.c: In function 'build_rooti': pass2.c:533: warning: pointer targets in initialization differ in signedness pass2.c:540: warning: pointer targets in initialization differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass3.c -o pass3.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass4.c -o pass4.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass5.c -o pass5.o pass5.c: In function 'check_block_status': pass5.c:188: warning: pointer targets in assignment differ in signedness pass5.c:190: warning: pointer targets in assignment differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 block_list.c -o block_list.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 super.c -o super.o super.c: In function 'gfs_rgindex_calculate': super.c:1023: warning: pointer targets in passing argument 2 of 'hexdump' differ in signedness super.c: In function 'ri_update': super.c:1098: warning: pointer targets in passing argument 3 of 'gfs_rgindex_calculate' differ in signedness super.c:1107: warning: pointer targets in passing argument 3 of 'gfs_rgindex_rebuild' differ in signedness super.c: In function 'gfs_rgindex_calculate': super.c:899: warning: 'length' may be used uninitialized in this function super.c:899: warning: 'addr' may be used uninitialized in this function super.c: In function 'gfs_rgindex_rebuild': super.c:683: warning: 'end_block' may be used uninitialized in this function gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 bio.c -o bio.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 ondisk.c -o ondisk.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 file.c -o file.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 rgrp.c -o rgrp.o rgrp.c: In function 'fs_rgrp_recount': rgrp.c:329: warning: pointer targets in passing argument 1 of 
'fs_bitcount' differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_bits.c -o fs_bits.o fs_bits.c: In function 'fs_get_bitmap': fs_bits.c:297: warning: pointer targets in assignment differ in signedness fs_bits.c: In function 'fs_set_bitmap': fs_bits.c:354: warning: pointer targets in passing argument 1 of 'fs_setbit' differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 util.c -o util.o util.c: In function 'next_rg_meta': util.c:173: warning: pointer targets in passing argument 1 of 'fs_bitfit' differ in signedness util.c: In function 'next_rg_meta_free': util.c:226: warning: pointer targets in passing argument 1 of 'fs_bitfit' differ in signedness util.c:229: warning: pointer targets in passing argument 1 of 'fs_bitfit' differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_bmap.c -o fs_bmap.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_inode.c -o fs_inode.o fs_inode.c: In function 'fs_mkdir': fs_inode.c:519: warning: pointer targets in assignment differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_dir.c -o fs_dir.o fs_dir.c: In function 'leaf_search': fs_dir.c:298: warning: pointer targets in passing argument 1 of 'gfs_dir_hash' differ in signedness fs_dir.c: In function 'linked_leaf_search': fs_dir.c:385: warning: pointer targets in passing argument 1 of 'gfs_dir_hash' differ in signedness fs_dir.c: In function 'dir_e_add': fs_dir.c:1259: warning: pointer targets in passing argument 1 of 'gfs_dir_hash' differ in signedness fs_dir.c: In function 'dir_l_add': fs_dir.c:1456: warning: pointer targets in passing argument 1 of 'gfs_dir_hash' differ in signedness fs_dir.c: In function 'fs_dir_search': fs_dir.c:467: warning: 'bh' may be used uninitialized in this function gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_recovery.c -o fs_recovery.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 log.c -o log.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 hash.c -o hash.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 inode_hash.c -o inode_hash.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 bitmap.c -o bitmap.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 lost_n_found.c -o lost_n_found.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 inode.c -o inode.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 link.c -o link.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM 
-DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 metawalk.c -o metawalk.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 eattr.c -o eattr.o gcc -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 main.o initialize.o pass1.o pass1b.o pass1c.o pass2.o pass3.o pass4.o pass5.o block_list.o super.o bio.o ondisk.o file.o rgrp.o fs_bits.o util.o fs_bmap.o fs_inode.o fs_dir.o fs_recovery.o log.o hash.o inode_hash.o bitmap.o lost_n_found.o inode.o link.o metawalk.o eattr.o -o gfs_fsck sh-3.1$ ./gfs_fsck -V GFS fsck DEVEL.1211779210 (built May 26 2008 08:20:46) Copyright (C) Red Hat, Inc. 2004-2005 All rights reserved. sh-3.1$ sudo ./gfs_fsck -v /dev/xxx-vg/xxx-lv Password: Initializing fsck Initializing lists... Initializing special inodes... Validating Resource Group index. Level 1 check. 5167 resource groups found. (passed) Setting block ranges... Can't seek to last block in file system: 4738147774 Unable to determine the boundaries of the file system. Freeing buffers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpartio at gmail.com Mon May 26 07:31:23 2008 From: mpartio at gmail.com (Mikko Partio) Date: Mon, 26 May 2008 10:31:23 +0300 Subject: [Linux-cluster] Problems with gfs_grow In-Reply-To: <2ca799770805252224i7019fdfatd064f1ab158ad791@mail.gmail.com> References: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> <1211565013.10437.119.camel@technetium.msp.redhat.com> <2ca799770805250940o6d0d5cf9t5b9ed45c2227cda@mail.gmail.com> <2ca799770805252224i7019fdfatd064f1ab158ad791@mail.gmail.com> Message-ID: <2ca799770805260031j6122421dhb38ceefead2739d9@mail.gmail.com> On Mon, May 26, 2008 at 8:24 AM, Mikko Partio wrote: > > > On Sun, May 25, 2008 at 7:40 PM, Mikko Partio wrote: > >> The fixes are available in the recently released RHEL5.2, although >>> I don't know when they'll hit Centos. The fixes are also available >>> in the latest cluster git tree if you want to compile/install them >>> from source code yourself. Documentation for doing this can >>> be found at: http://sources.redhat.com/cluster/wiki/ClusterGit >>> >> >> Hi Bob and thanks for you reply. >> >> So, what I should do is to upgrade to 5.2 and then run gfs_fsck on the >> filesystem? >> >> > Seeing that CentOS 5.2 is not released yet, I decided to take the git way. > I have never used it before so I'm not sure if I'm doing everything > correctly, but it seems that a compiled version from RHEL52 branch does not > fix the issue (details below). Would the HEAD version of gfs_fsck do any > better? > Sorry to continue this monologue, but I got the issue resolved. I compiled another version of gfs_fsck ("master" in git) and it immediately found rindex errors from the filesystem. Now everything *seems* to be ok. Thanks for your help! Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... URL: From shajie_ahmed at yahoo.com Mon May 26 08:20:12 2008 From: shajie_ahmed at yahoo.com (shajie ahmed) Date: Mon, 26 May 2008 01:20:12 -0700 (PDT) Subject: [Linux-cluster] Regarding RHEL 4 cluster failover Message-ID: <325862.30726.qm@web37402.mail.mud.yahoo.com> Hi , I am using cluster suite in RHEL 4 for a two -node cluster , using ILO for fencing. I have configured and running my services on it. I have two questions to ask -- Q1 :: How can I set the maximum number or restarts for a service ? 
If a service has failed on one node and cluster is trying to restart it and for any reason if the service does not starts on that node. How can I set the cluster to relocate the service to other node after a fixed number or restarts?? Q2. When power cable gets faulty?? If for any reason one node goes out of power supply then the services running on it are not relocated to the other node and even the other node is unaware of the failure of that node .....what can I do for this?? Please suggest. Regards Syed Shajie Ahmed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alain.Moulle at bull.net Mon May 26 08:33:11 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 26 May 2008 10:33:11 +0200 Subject: [Linux-cluster] CS5 still problem "Node x is undead" (contd.) Message-ID: <483A75C7.60003@bull.net> Hi As told before, the patch : http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 does not solve the problem for my configuration ... Just an idea/question : could this problem be also linked to the defaut value of token ? Or has it nothing to do with it ? Because currently, I have this problem with a Quorum disk configured and no token record in cluster.conf, so token is at its default value ... ??? Thanks Regards Alain Moull? > Hi Lon > I've applied the patch (see resulting code below) but the patch > does not solve the problem. > Is there another patch linked to this problem ? > Thanks > Regards > Alain Moull? < >>>> when testing a two-nodes cluster with quorum disk, when >>>> I poweroff the node1 , node 2 fences well the node 1 and >>>> failovers the service, but in log of node 2 I have before and after >>>> the fence success messages many messages like this: >>>> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >>>> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >>>> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >>>> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >>>> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >>>> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >>>> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >>>> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >>>> Apr 24 11:30:08 s_sys at xn3 qdiskd[13740]: Node 2 is undead. http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 Resulting code after patch application in cman/qdisk/main.c : =========================================================== Transition from Online -> Evicted */ if (ni[x].ni_misses > ctx->qc_tko && state_run(ni[x].ni_status.ps_state)) { /* Mark our internal views as dead if nodes miss too many heartbeats... This will cause a master transition if no live master exists. */ if (ni[x].ni_status.ps_state >= S_RUN && ni[x].ni_seen) { clulog(LOG_DEBUG, "Node %d DOWN\n", ni[x].ni_status.ps_nodeid); ni[x].ni_seen = 0; } ni[x].ni_state = S_EVICT; ni[x].ni_status.ps_state = S_EVICT; ni[x].ni_evil_incarnation = ni[x].ni_status.ps_incarnation; /* Write eviction notice if we're the master. 
*/ if (ctx->qc_status == S_MASTER) { clulog(LOG_NOTICE, "Writing eviction notice for node %d\n", ni[x].ni_status.ps_nodeid); qd_write_status(ctx, ni[x].ni_status.ps_nodeid, S_EVICT, NULL, NULL, NULL); if (ctx->qc_flags & RF_ALLOW_KILL) { clulog(LOG_DEBUG, "Telling CMAN to " "kill the node\n"); cman_kill_node(ctx->qc_ch, ni[x].ni_status.ps_nodeid); } } /* Clear our master mask for the node after eviction */ if (mask) clear_bit(mask, (ni[x].ni_status.ps_nodeid-1), sizeof(memb_mask_t)); continue; } ------------------------------ From pmshehzad at yahoo.com Mon May 26 13:45:24 2008 From: pmshehzad at yahoo.com (Mshehzad Pankhawala) Date: Mon, 26 May 2008 06:45:24 -0700 (PDT) Subject: [Linux-cluster] How to Configuring DRBD on already Mounted Disk Message-ID: <184977.74826.qm@web45814.mail.sp1.yahoo.com> Hi Sir, We are intermediate for DRBD   DRBD setup carefully and it is ok. I want to setup server (Main Server - serving mail service) serving the postfix mail service  which is using my /home directory and mounted on /dev/sda5. I want this /dev/sda5 (/home) to be used in the DRBD, and evenif it is mounted on /home. But when I start drbd service #service drbd start It start with diskless mode. Starting DRBD resources:    [ d0 Failure: (114) Lower device is already claimed. This usually means it is mounted. cmd /sbin/drbdsetup /dev/drbd0 disk /dev/sda5 /dev/sda5 internal --set-defaults --create-device  failed - continuing! s0 n0 ]. ........... cmd /sbin/drbdsetup /dev/drbd0 disk /dev/sda5 internal --set-defaults --create-device --on-io-error=detach failed - continuing!   My kernel version 2.6.18 Is it possible to use my mounted disk (/dev/sda5) evenif it is mounted. If Yes. Then please tell me how? Thanks. MohammedShehzad Pankhawala Check out the all-new face of Yahoo! India. Go to http://in..yahoo.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.kovacs at gmail.com Mon May 26 23:56:18 2008 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Tue, 27 May 2008 00:56:18 +0100 Subject: [Linux-cluster] Cluster NFS operation... Message-ID: <7d6e8da40805261656q4d0113fdhc5cbea3b8aabb512@mail.gmail.com> After setting up my 5 node RHEL5.2 cluster I began some testing of the NFS failover capabilities. My config is simple... gfs2 /home filesystem gfs2 /apps filesystem gfs2 /projects filesystem I've tried both managed IP's and managed NFS services for failover. Both seem to have problems handling "failback" in my case. The NFS services consist of the following model. service name IP Address GFS FS NFS Export NFS Client The three services are spread across three of the 5 nodes. Reading the man page for clurmtabd reveals that it is supposed to maintain the client states and "merge" rmtab entries etc to prevent stale filehandles etc. The clients are RHEl 4.6 using automounted nfs. The clients are requesting nfs ver 3, and tcp, with the hard and intr flags. THings seem to work fine for an initial failover, but when I try to failback, things hang I am planning on using this cluster to replace an aging alpha cluster running Tru64/TruCluster. So I guess my questions are.. 1. Is this a known issue? 2. Is there a document other than the nfscookbook from R.P. or at least a version thats been updated in the last year (if somethings changed that is) 3. How, when simply floating IP addresses as outlined in the cookbook, does the rmtab get managed since no NFS service is configured? Any help in understanding these issues would be appreciated. 
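For 3., a few commands that at least show what state the server and the clients think they are in while a service moves around (the floating address below is a placeholder):

  # from a client, against the floating IP of the NFS service
  showmount -e 192.168.1.10
  rpcinfo -p 192.168.1.10

  # on whichever node currently owns the service
  clustat
  exportfs -v
  cat /var/lib/nfs/rmtab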
-C From yamato at redhat.com Tue May 27 06:18:24 2008 From: yamato at redhat.com (Masatake YAMATO) Date: Tue, 27 May 2008 15:18:24 +0900 (JST) Subject: [Linux-cluster] [PATCH] checking NULL pointer in device_write of dlm-control Message-ID: <20080527.151824.252812822.yamato@redhat.com> Hi, (This list is good place to submit a patch? If not, please let me know where I should do.) I found a way to let linux dereference NULL pointer in gfs2-2.6-nmw/fs/dlm/user.c. If `device_write' method is called via "dlm-control", file->private_data is NULL. (See ctl_device_open() in user.c. ) Through proc->flags is read: if ((kbuf->cmd == DLM_USER_LOCK || kbuf->cmd == DLM_USER_UNLOCK) && test_bit(DLM_PROC_FLAGS_CLOSING, &proc->flags)) return -EINVAL; It causes following message on my Fedora 9 PC: BUG: unable to handle kernel NULL pointer dereference at 00000004 IP: [] :dlm:device_write+0xa5/0x432 *pde = 7f11b067 Oops: 0000 [#2] SMP Modules linked in: ... Pid: 26899, comm: a.out Tainted: G D (2.6.25-14.fc9.i686 #1) EIP: 0060:[] EFLAGS: 00210297 CPU: 1 EIP is at device_write+0xa5/0x432 [dlm] EAX: f66ad200 EBX: f66ad280 ECX: 00000000 EDX: 00000006 ESI: 00000064 EDI: 00000000 EBP: c8a45f70 ESP: c8a45f44 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process a.out (pid: 26899, ti=c8a45000 task=c8a72000 task.ti=c8a45000) Stack: bfe90b34 f66ad280 00000000 c8a45f58 c04cc41c c8a45f70 c0482bd5 00000001 def7a0c0 f8f5553a 00000064 c8a45f90 c04832bb c8a45f9c bfe90b34 c04817e7 def7a0c0 fffffff7 080483a0 c8a45fb0 c04833f8 c8a45f9c 00000000 00000000 Call Trace: [] ? security_file_permission+0xf/0x11 [] ? rw_verify_area+0x76/0x97 [] ? device_write+0x0/0x432 [dlm] [] ? vfs_write+0x8a/0x12e [] ? do_sys_open+0xab/0xb5 [] ? sys_write+0x3b/0x60 [] ? syscall_call+0x7/0xb [] ? acpi_pci_root_add+0x22f/0x2a0 ======================= Code: EIP: [] device_write+0xa5/0x432 [dlm] SS:ESP 0068:c8a45f44 ---[ end trace 74c3a9c3bd1a789d ]--- [yamato at localhost dlm-crash]$ Here is a patch. Signed-off-by: Masatake YAMATO diff --git a/fs/dlm/user.c b/fs/dlm/user.c index ebbcf38..1aa76b3 100644 --- a/fs/dlm/user.c +++ b/fs/dlm/user.c @@ -538,7 +538,7 @@ static ssize_t device_write(struct file *file, const char __user *buf, /* do we really need this? can a write happen after a close? */ if ((kbuf->cmd == DLM_USER_LOCK || kbuf->cmd == DLM_USER_UNLOCK) && - test_bit(DLM_PROC_FLAGS_CLOSING, &proc->flags)) + (proc && test_bit(DLM_PROC_FLAGS_CLOSING, &proc->flags))) return -EINVAL; sigfillset(&allsigs); From Alain.Moulle at bull.net Tue May 27 13:30:57 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Tue, 27 May 2008 15:30:57 +0200 Subject: [Linux-cluster] CS5 / IP ressource with bonding ? Message-ID: <483C0D11.7070900@bull.net> Hi Is it possible to manage IP ressources linked to a bounded interface ? Or is there any known problem with that ? Thanks Regards Alain Moull? From nico at altiva.fr Tue May 27 13:55:36 2008 From: nico at altiva.fr (NM) Date: Tue, 27 May 2008 13:55:36 +0000 (UTC) Subject: [Linux-cluster] Re: Booting node 1 causes it to fence node 2 (NM) References: <20080523160008.98774619B25@hormel.redhat.com> <200805241031.36484.Klaus.Steinberger@physik.uni-muenchen.de> Message-ID: On Sat, 24 May 2008 10:31:33 +0200, Klaus Steinberger wrote: > Do you have the cluster Communication and the RAC card's on the same > subnet? There is some hidden hint in the docu that on a two node cluster > both cluster communication and fencing devices must be on the same > network. 
I had similar symptoms as long as I tried cluster comm on > fencing on different subnet in a two node cluster. I eventually solved this by upping the "post join delay" from 3s (default) to 20s (recommended in the doc). Why is the default value so low compared to the recommended value? This is weird. >> (btw should cman be started in init.d automatically? or should it be > It should be started automatically. Thanks. From teemu.m2 at luukku.com Tue May 27 17:05:43 2008 From: teemu.m2 at luukku.com (m.. mm..) Date: Tue, 27 May 2008 20:05:43 +0300 (EEST) Subject: [Linux-cluster] cluster name not in cluster.conf?? Message-ID: <1211907943851.teemu.m2.58786.h27Hz02gMOQpp4y6ciu3cw@luukku.com> Hi, How can I fix this problem: "cluster name not in cluster.conf"? I get this message when I start cman. I don't understand it, because I have the cluster name right in my cluster.conf. Does somebody have a fix for this? The problem came with the Red Hat 5.1 version... I think I have 2 nodes here. From rpeterso at redhat.com Tue May 27 18:41:38 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 27 May 2008 13:41:38 -0500 Subject: [Linux-cluster] [PATCH] checking NULL pointer in device_write of dlm-control In-Reply-To: <20080527.151824.252812822.yamato@redhat.com> References: <20080527.151824.252812822.yamato@redhat.com> Message-ID: <1211913698.10437.137.camel@technetium.msp.redhat.com> On Tue, 2008-05-27 at 15:18 +0900, Masatake YAMATO wrote: > Hi, > > (This list is good place to submit a patch? > If not, please let me know where I should do.) Hi Mr. Masatake, The proper place to submit patches to the cluster code is the public cluster-devel mailing list. Please see: https://www.redhat.com/mailman/listinfo/cluster-devel Although a lot of us read both mailing lists. Regards, Bob Peterson Red Hat Clustering & GFS From rpeterso at redhat.com Tue May 27 18:43:32 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 27 May 2008 13:43:32 -0500 Subject: [Linux-cluster] cluster name not in cluster.conf?? In-Reply-To: <1211907943851.teemu.m2.58786.h27Hz02gMOQpp4y6ciu3cw@luukku.com> References: <1211907943851.teemu.m2.58786.h27Hz02gMOQpp4y6ciu3cw@luukku.com> Message-ID: <1211913812.10437.138.camel@technetium.msp.redhat.com> Hi, On Tue, 2008-05-27 at 20:05 +0300, m.. mm.. wrote: > Hi, > > How can I fix this problem: "cluster name not in cluster.conf"? > > I get this message when I start cman. I don't understand it, because I have the cluster name right in my cluster.conf. > > Does somebody have a fix for this? The problem came with the Red Hat 5.1 version... I think I have 2 nodes here. Please post your cluster.conf file here (removing any passwords first, of course). Regards, Bob Peterson Red Hat Clustering & GFS From d.degroot at griffith.edu.au Wed May 28 01:57:06 2008 From: d.degroot at griffith.edu.au (Darrin De Groot) Date: Wed, 28 May 2008 12:57:06 +1100 Subject: [Linux-cluster] multipathed quorum disk Message-ID: Hi, I am running a 4 node cluster with a multipathed quorum disk, configured to use the path /dev/dm-1. The problem that I am having is that if I lose one path to the disk (am testing by pulling one fibre), the node is almost always fenced (one node, once, managed to stay up, out of more than 10 attempts). Is there some timeout that needs changing to give qdiskd the time to realise that a path is down?
I have tried an interval of 3 seconds with at TKO of 10, with no success, and a token timeout set at 45000ms: output of mkqdisk -L: [root at host3 ~]# mkqdisk -L mkqdisk v0.5.1 /dev/sdc1: Magic: eb7a62c2 Label: cms_qdisk Created: Mon May 26 14:24:29 2008 Host: host3 /dev/sdd1: Magic: eb7a62c2 Label: cms_qdisk Created: Mon May 26 14:24:29 2008 Host: host3 /dev/dm-1: Magic: eb7a62c2 Label: cms_qdisk Created: Mon May 26 14:24:29 2008 Host: host3 When the node subsequently boots, with only one path, everything works just fine, so it can obviously use both paths. Is anyone able to offer any advice on why this is happening (and how to stop it)? Regards, Darrin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Wed May 28 15:09:28 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 28 May 2008 11:09:28 -0400 Subject: [Linux-cluster] CS5 still problem "Node x is undead" (contd.) In-Reply-To: <483A75C7.60003@bull.net> References: <483A75C7.60003@bull.net> Message-ID: <1211987368.3174.178.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-05-26 at 10:33 +0200, Alain Moulle wrote: > Hi > > As told before, the patch : > http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 > does not solve the problem for my configuration ... > > Just an idea/question : could this problem be also linked > to the defaut value of token ? Or has it nothing to do with it ? > Because currently, I have this problem with a Quorum disk > configured and no token record in cluster.conf, so token > is at its default value ... It could be - try setting it to 21000: (you can put the tag right below the tag). -- Lon From lhh at redhat.com Wed May 28 15:12:44 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 28 May 2008 11:12:44 -0400 Subject: [Linux-cluster] CS5 / IP ressource with bonding ? In-Reply-To: <483C0D11.7070900@bull.net> References: <483C0D11.7070900@bull.net> Message-ID: <1211987564.3174.183.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-27 at 15:30 +0200, Alain Moulle wrote: > Hi > > Is it possible to manage IP ressources linked to a bounded interface ? > > Or is there any known problem with that ? What do you mean - manage the only IP on an interface? Currently, no - you need to have an interface with an IP in the same subnet mask bound to it in order for the IP resource agent to select the appropriate interface. There was a patch floating around some time ago on the mailing list which allowed the specification of something like: ethernet_device="eth0" However, the IP resource agent does not perform routing tasks, and it's really not its job to do so, which is why the patch was rejected. -- Lon From lhh at redhat.com Wed May 28 15:45:40 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 28 May 2008 11:45:40 -0400 Subject: [Linux-cluster] multipathed quorum disk In-Reply-To: References: Message-ID: <1211989540.3174.216.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-05-28 at 12:57 +1100, Darrin De Groot wrote: > > Hi, > > I am running a 4 node cluster with a multipathed quorum disk, > configured to use the path /dev/dm-1. The problem that I am having is > that if I lose one path to the disk (am testing by pulling one fibre), > the node is almost always fenced (one node, once, managed to stay up, > out of more than 10 attempts). Is there some timeout that needs > changing to give qdiskd the time to realise that a path is down? 
I > have tried an interval of 3 seconds with at TKO of 10, with no > success, and a token timeout set at 45000ms: > > token_retransmits_before_loss_const="20"/> > tko="10" votes="3"/> > As a general rule, you want qdiskd's timeout to exceed the path failover time with some time for the I/Os to get out after a path failover completes. As a general rule of thumb, totem's token timeout needs to approximately double the qdisk timeout. E.g.: [Note: Obviously, I think qdiskd should algorithmically determine fairly optimial timings based on the totem token timeout in the future. ] -- Lon From barbos at gmail.com Wed May 28 21:18:39 2008 From: barbos at gmail.com (Alex Kompel) Date: Wed, 28 May 2008 14:18:39 -0700 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4834D68B.9010309@auckland.ac.nz> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> Message-ID: <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> On Wed, May 21, 2008 at 7:12 PM, Michael O'Sullivan wrote: > Hi Alex, > > We wanted an iSCSI SAN that has highly available data, hence the need for 2 > (or more storage devices) and a reliable storage network (omitted from the > diagram). Many of the articles I have read for iSCSI don't address > multipathing to the iSCSI devices, in our configuration iSCSI Disk 1 > presented as /dev/sdc and /dev/sdd on each server (and iSCSI Disk 2 > presented as /dev/sde and /dev/sdf), but it wan't clear how to let the > servers know that the two iSCSI portals attached to the same target - thus I > used mdadm. Also, I wanted to raid the iSCSI disks to make sure the data > stays highly available - thus the second use of mdadm. Now we had a single > iSCSI raid array spread over 2 (or more) devices which provides the iSCSI > SAN. However, I wanted to make sure the servers did not try to access the > same data simultaneously, so I used GFS to ensure correct use of the iSCSI > SAN. If I understand correctly it seems like the multipathing and raiding > may be possible in Red Hat Cluster Suite GFS without using iSCSI? Or to use > iSCSI with some other software to ensure proper locking happens for the > iSCSI raid array? I am reading the link you suggested to see what other > people have done, but as always any suggestions, etc are more than welcome. > I would not use multipath I/O with iSCSI unless you have specific reasons for doing so. iSCSI is only as highly-available as you network infrastructure allows it to be. If you have a full failover within the network then you don't need multipath. That simplifies configuration a lot. Provided your network core is fully redundant (both link and routing layers), you can connect 2 NICs on each server to separate switches and bond them (google for "channel bonding"). Once you have redundant network connection you can use the setup from the article I posted earlier. This will give you iSCSI endpoint failover. -Alex From lhh at redhat.com Wed May 28 21:35:20 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 28 May 2008 17:35:20 -0400 Subject: [Linux-cluster] Cluster NFS operation... In-Reply-To: <7d6e8da40805261656q4d0113fdhc5cbea3b8aabb512@mail.gmail.com> References: <7d6e8da40805261656q4d0113fdhc5cbea3b8aabb512@mail.gmail.com> Message-ID: <1212010520.3174.233.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-27 at 00:56 +0100, Corey Kovacs wrote: > service name > IP Address > GFS FS > NFS Export > NFS Client > > The three services are spread across three of the 5 nodes. 
Reading the > man page for clurmtabd reveals that it is supposed to > maintain the client states and "merge" rmtab entries etc to prevent > stale filehandles etc. > > The clients are RHEl 4.6 using automounted nfs. The clients are > requesting nfs ver 3, and tcp, with the hard and intr flags. > THings seem to work fine for an initial failover, but when I try to > failback, things hang * On 2.6 kernels including RHEL4, clurmtabd isn't used. * TCP takes 0-15 minutes to fail over or failback depending on the I/O pattern: https://bugzilla.redhat.com/show_bug.cgi?id=369991 This doesn't happen with UDP. > 2. Is there a document other than the nfscookbook from R.P. or at > least a version thats been updated in the last year (if somethings > changed that is) The general procedures haven't changed. -- Lon From ross at kallisti.us Wed May 28 21:37:13 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Wed, 28 May 2008 17:37:13 -0400 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> Message-ID: <20080528213713.GB18367@kallisti.us> On Wed, May 28, 2008 at 02:18:39PM -0700, Alex Kompel wrote: > I would not use multipath I/O with iSCSI unless you have specific > reasons for doing so. iSCSI is only as highly-available as you network > infrastructure allows it to be. If you have a full failover within the > network then you don't need multipath. That simplifies configuration a > lot. Provided your network core is fully redundant (both link and > routing layers), you can connect 2 NICs on each server to separate > switches and bond them (google for "channel bonding"). Once you have > redundant network connection you can use the setup from the article I > posted earlier. This will give you iSCSI endpoint failover. This depends on a lot of things. In all of the iSCSI storage systems I'm familiar with, the same target is provided redundantly via different portal IPs. This provides failover in the case of an iscsi controller failing on the storage system. The network can be as redundant as you like, but without multipath, you won't survive a portal failure. If you bond between two different switches, you'll only be able to do failover between the NICs. If you use multipath, you can round-robin between them to provide a greater bandwidth overhead. I'd suggest using multipath. Check the open-iscsi documentation and mailing list archives for tips on tuning the timing for those pieces. -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. 
Augustine, De Genesi ad Litteram, Book II, xviii, 37 From barbos at gmail.com Wed May 28 23:16:55 2008 From: barbos at gmail.com (Alex Kompel) Date: Wed, 28 May 2008 16:16:55 -0700 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <20080528213713.GB18367@kallisti.us> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> <20080528213713.GB18367@kallisti.us> Message-ID: <3ae027040805281616l1d9856f3o1f06ac2f7de51e6f@mail.gmail.com> On Wed, May 28, 2008 at 2:37 PM, Ross Vandegrift wrote: > On Wed, May 28, 2008 at 02:18:39PM -0700, Alex Kompel wrote: >> I would not use multipath I/O with iSCSI unless you have specific >> reasons for doing so. iSCSI is only as highly-available as you network >> infrastructure allows it to be. If you have a full failover within the >> network then you don't need multipath. That simplifies configuration a >> lot. Provided your network core is fully redundant (both link and >> routing layers), you can connect 2 NICs on each server to separate >> switches and bond them (google for "channel bonding"). Once you have >> redundant network connection you can use the setup from the article I >> posted earlier. This will give you iSCSI endpoint failover. > > This depends on a lot of things. In all of the iSCSI storage systems > I'm familiar with, the same target is provided redundantly via > different portal IPs. This provides failover in the case of an iscsi > controller failing on the storage system. The network can be as > redundant as you like, but without multipath, you won't survive a > portal failure. In this case the portal failure is handled by host failover mechanisms (heartbeat, RedHat cluster, etc) and connection failure is handled by the network layer. Sometimes you have to use multipath (for example, if there is no way to do transparent failover on storage controllers) but it adds extra complexity on the initiator side so if there is a way to avoid it why not do it? > If you bond between two different switches, you'll only be able to do > failover between the NICs. If you use multipath, you can round-robin > between them to provide a greater bandwidth overhead. Same goes for bonding: link aggregation with active-active bonding. -Alex From fdinitto at redhat.com Thu May 29 04:19:06 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 29 May 2008 06:19:06 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.03.03 released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its vibrant community are proud to announce the 5th release from the STABLE2 branch: 2.03.03. The STABLE2 branch collects, on a daily base, all bug fixes and the bare minimal changes required to run the cluster on top of the most recent Linux kernel (2.6.25) and rock solid openais (0.80.3 or higher). The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.03.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. 
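For anyone building the release by hand, the usual sequence is roughly the following. This is only a sketch: configure options (for example the new --without_kernel_modules switch mentioned in the changelog) are listed by ./configure --help, and building the kernel pieces needs a matching 2.6.25 source tree.

  wget ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.03.tar.gz
  tar xzf cluster-2.03.03.tar.gz
  cd cluster-2.03.03
  ./configure
  make
  make install
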
Happy clustering, Fabio Under the hood (from 2.03.02): Bob Peterson (1): bz 446085: Back-port faster bitfit algorithm from gfs2 for better David Teigland (1): gfs_controld: ignore write(2) return value on plock dev Fabio M. Di Nitto (16): [RGMANAGER] ^M's are good for DOS, bad for UNIX [FENCE] Rename bladecenter as it should be .pl -> .py [CCS] Make a bunch of functions static [BUILD] Stop using DEVEL.DATE library soname [BUILD] Set soname to 2.3 [BUILD] Move fencelib in /usr/share [BUILD] Allow users to set path to init.d [BUILD] Add --without_kernel_modules configure option [GFS] Sync with gfs2 init script [MISC] Cast some love to init scripts [CMAN] Fix path to cman_tool [INIT] Do not start services automatically [MISC] Update copyright [BUILD] Fix sparc #ifdef according to the new gcc tables [BUILD] Fix rg_test linking [BUILD] Fix install permissions Jonathan Brassow (1): ger/lvm.sh: HA LVM wasn't working on IA64 Lon Hohberger (2): [cman] Fix infinite loop in several daemons [rgmanager] Fix #441582 - symlinks in mount points causing failures Marc - A. Dahlhaus (1): [MISC] Add version string to -V options of dlm_tool and group deamons Marek 'marx' Grac (8): [FENCE] SSH support using stdin options [FENCE] Fix #435154: Support for 24 port APC fencing device [FENCE] Fix name of the option in fencing library [FENCE] Fix problem with different menu for admin/user for APC [FENCE] Fix typo in name of the exceptions in fencing agents [FENCE] Fix #248609: SSH support in Bladecenter fencing (ssh) [FENCE] Fix #446995: Parse error: Unknown option 'switch=3' [FENCE] Fix #447378 - fence_apc unable to connect via ssh to APC 7900 ccs/lib/libccs.c | 6 +- cman/init.d/cman.in | 27 +- cman/init.d/qdiskd | 19 +- cman/qdisk/disk_util.c | 2 +- configure | 35 +- dlm/tool/main.c | 4 +- fence/agents/apc/fence_apc.py | 112 ++- fence/agents/bladecenter/fence_bladecenter.pl | 90 -- fence/agents/bladecenter/fence_bladecenter.py | 90 ++ fence/agents/drac/fence_drac5.py | 4 +- fence/agents/ilo/fence_ilo.py | 4 +- fence/agents/lib/fencing.py.py | 55 +- fence/agents/scsi/scsi_reserve | 28 +- fence/agents/wti/fence_wti.py | 4 +- gfs-kernel/src/gfs/bits.c | 85 +- gfs-kernel/src/gfs/bits.h | 3 +- gfs-kernel/src/gfs/rgrp.c | 3 +- gfs/init.d/gfs | 15 +- gfs2/init.d/gfs2 | 15 +- group/daemon/main.c | 3 +- group/dlm_controld/main.c | 5 +- group/gfs_controld/main.c | 5 +- group/gfs_controld/plock.c | 6 +- group/test/clientd.c | 2 +- group/tool/main.c | 4 +- make/defines.mk.input | 5 +- make/install.mk | 6 +- make/official_release_version | 1 + make/uninstall.mk | 2 +- rgmanager/include/platform.h | 2 +- rgmanager/init.d/rgmanager.in | 15 +- rgmanager/src/clulib/vft.c | 2 +- rgmanager/src/daemons/Makefile | 2 +- rgmanager/src/resources/ASEHAagent.sh | 1786 ++++++++++++------------ rgmanager/src/resources/Makefile | 17 +- rgmanager/src/resources/clusterfs.sh | 2 +- rgmanager/src/resources/fs.sh | 2 +- rgmanager/src/resources/lvm.sh | 2 +- rgmanager/src/resources/netfs.sh | 2 +- scripts/fenceparse | 2 +- 40 files changed, 1359 insertions(+), 1115 deletions(-) - -- I'm going to make him an offer he can't refuse. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSD4u1wgUGcMLQ3qJAQLGgA//QZ0cHFlKMmBKiREVS/7XYfTM1CSMXGeq g5rVXbh8lqobDgfqHSs10Q8BwxkP6XCPodYv3z5ws6uKvnGGhV+8ceDhTdxJUYBE 16VMLGC1pHT/cRHiYeukAfCvt3fXXV6Q114OGZYJSYGCMfXpjPBXMyqi4xTbDoUn wYB4vUTjx++j7WsaW9uKVT5ORRmj+Xg6ubbCDchjZitiAp8Fwfx1Lz5RlRm7XhSo 7z/3GzMIN2oPz1g15aGqLq6/SYBmM4iAX9KzC1xTslyxcw5/2+5UlFGF5JcXXidd QiPoRv1hDbk8xwoFBSgMmkUERldO3RSTQTBhN2SEBeaAP7E9hDgh3a6ZMq74jvab sWZ9LUBDS8rDNDjB0ak+BNUZy1loRQWj57ASL+jMXADy3QtL9vWNxyhsZFLRMpJ7 aUuzJWA3mFR1MyqOS/Zxy1Dea6A6LQETafwWMnbAk6M+h5SbCOfjl2Ti/7bvlG9E pthE8F0LygzorLnp+68jerYjSqKMwWjbM4etsOo/iV66utc27Udmwbf0VX5nBo8N ZnxoNDF/VtXtrccljEBnHdnhpru+wsUPLL6B+3nx7Gv6Ats3axOmuOYlQpW/vJpG FGCwXcP4+qLVmz4v7Hi2qshDLsbHUSXNtH2Mlzl9EMhk3OW1U4g4Vk2Tml2MUCp1 9ad09m2XsSc= =APS2 -----END PGP SIGNATURE----- From sunhux at gmail.com Thu May 29 05:43:58 2008 From: sunhux at gmail.com (sunhux G) Date: Thu, 29 May 2008 13:43:58 +0800 Subject: [Linux-cluster] what's IBM "Remote Supervisor Adapter II", "power fencing device" & clustering Message-ID: <60f08e700805282243y3c15e8f4l3211376babdec865@mail.gmail.com> Hi, What's the purpose of "Remote Supervisor Adapter II" & can it be used to configure a Redhat cluster? What's "power fencing Device"? Can we set up a cluster between 2 RHES (ver 5.1 AP) just by using one additional network port on each server or do we need 2 network ports per server? Thought heartbeat link requires one only or usually the general practice is to use 2? Without getting additional network port, can we just RSA II port? Any specific information for setting up Linux clustering for IBM hardware (x3850 M2 & x3950 M2) is appreciated Tks U -------------- next part -------------- An HTML attachment was scrubbed... URL: From denisb+gmane at gmail.com Thu May 29 07:18:28 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 29 May 2008 09:18:28 +0200 Subject: [Linux-cluster] Updating issues Message-ID: While doing a yum update on one RHCS node I got this: Transaction Check Error: file /etc/depmod.d/gfs2.conf from install of kmod-gfs2-1.92-1.1.el5 conflicts with file from package kmod-gfs2-1.52-1.16.el5 This node does not use GFS2 (and I already unmounted any GFS1 volumes anyway), so I removed the package, and the update transaction test then passes without errors. The kmod-gfs2 package should probably have been removed in the transaction too? Is this something I should report via bugzilla? Regards -- Denis From denisb+gmane at gmail.com Thu May 29 08:20:18 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 29 May 2008 10:20:18 +0200 Subject: [Linux-cluster] LVM hanging during mkinitrd checks Message-ID: During the update to RHEL5.2 I had another problem. The mkinitrd process would hang indefinitely while scanning my block devices with "lvm.static lvs --ignorelockingfailure --noheadings -o vg_name blockdev" stracing the lvs process showed no signs of life. It hung on random function calls (I checked several hung lvs processes during the update). I simply killed the lvs processes when they hung. A manual check of lvs did hang and never returned output so the issue was system wide, not specific to the update run. Could this be because I unmounted my shared GFS volume prior to the update? I cannot really see why that should be problematic, but lvs on the other (non upgraded node) worked fine. After finishing the update, lvs works fine and returns the expected : " No volume groups found" What is the best practise for updates of GFS components? 
Should I keep my volumes mounted during updates or unmount them? Regards -- Denis From denisb+gmane at gmail.com Thu May 29 08:25:02 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 29 May 2008 10:25:02 +0200 Subject: [Linux-cluster] cman_tool returns Flags: Dirty Message-ID: Hi list, Sorry for the noise, but I thought posting my results with the update to RHEL5.2 would be of interest to a few of you. The node that has been updated to RHEL5.2 seems to operate very nicely in the cluster so far. After a boot it did rejoin the cluster, and got back its affinity assigned service. Quorum disk operations resumed after a bit too. The only warning sign I can see is this output from cman_tool status : # cman_tool status Version: 6.1.0 Config Version: 39 Cluster Name: cluster_clustername Cluster Id: 19444 Cluster Member: Yes Cluster Generation: 776 Membership state: Cluster-Member Nodes: 2 Expected votes: 3 Total votes: 3 Quorum: 2 Active subsystems: 10 Flags: Dirty Ports Bound: 0 11 177 Node name: nodename.customername.com Node ID: 1 What does "Flags: Dirty" mean? Is it anything to worry about? Google was unhelpful. Regards -- Denis From magawake at gmail.com Thu May 29 10:41:31 2008 From: magawake at gmail.com (Mag Gam) Date: Thu, 29 May 2008 06:41:31 -0400 Subject: [Linux-cluster] GFS implementation Message-ID: <1cbd6f830805290341s20e2cccdp1e1befdb3c568dc5@mail.gmail.com> Hello: I am planning to implement GFS for my university as a summer project. I have 10 servers each with SAN disks attached. I will be reading and writing many files for professor's research projects. Each file can be anywhere from 1k to 120GB (fluid dynamic research images). The 10 servers will be using NIC bonding (1GB/network). So, would GFS be ideal for this? I have been reading a lot about it and it seems like a perfect solution. Any thoughts? TIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From maciej.bogucki at artegence.com Thu May 29 10:55:29 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Thu, 29 May 2008 12:55:29 +0200 Subject: [Linux-cluster] GFS implementation In-Reply-To: <1cbd6f830805290341s20e2cccdp1e1befdb3c568dc5@mail.gmail.com> References: <1cbd6f830805290341s20e2cccdp1e1befdb3c568dc5@mail.gmail.com> Message-ID: <483E8BA1.9080608@artegence.com> Mag Gam wrote: > Hello: > > I am planning to implement GFS for my university as a summer project. > I have 10 servers each with SAN disks attached. I will be reading and > writing many files for professor's research projects. Each file can be > anywhere from 1k to 120GB (fluid dynamic research images). The 10 > servers will be using NIC bonding (1GB/network). So, would GFS be > ideal for this? I have been reading a lot about it and it seems like a > perfect solution. > > Any thoughts? Please remember about fencing. Best Regards Maciej Bogucki From denisb+gmane at gmail.com Thu May 29 11:13:21 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 29 May 2008 13:13:21 +0200 Subject: [Linux-cluster] Re: cman_tool returns Flags: Dirty In-Reply-To: References: Message-ID: denis wrote: > What does "Flags: Dirty" mean? Is it anything to worry about? > Google was unhelpful. Google was sort of helpful after all; http://www.redhat.com/archives/cluster-devel/2007-September/msg00091.html NODE_FLAGS_DIRTY - This node has internal state and must not join a cluster that also has state. What does this actually imply? Anything to care about? How would this node "recover" from being dirty? 
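For reference, the cluster services whose state the flag reflects can be listed directly on the node in question; a minimal check (commands only, output omitted, names differ per cluster):

  cman_tool status | grep Flags
  cman_tool services
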
Regards -- Denis From deka.lipika at gmail.com Thu May 29 14:35:17 2008 From: deka.lipika at gmail.com (Lipika Deka) Date: Thu, 29 May 2008 15:35:17 +0100 Subject: [Linux-cluster] Intra file sharing in GFS Message-ID: <43097e740805290735o78802497tbb2a3a38ead14680@mail.gmail.com> Hi, Does DLM take care of intra file sharing in GFS and if so how? Thanks, Cheers -------------- next part -------------- An HTML attachment was scrubbed... URL: From kees at tweakers.net Thu May 29 14:37:02 2008 From: kees at tweakers.net (Kees Hoekzema) Date: Thu, 29 May 2008 16:37:02 +0200 Subject: [Linux-cluster] GFS in a small cluster Message-ID: <001701c8c199$757b48b0$6071da10$@net> Hello List, Recently we have been looking at replacing our NFS server with a SAN in our (relatively small) webserver cluster. We decided to go with the Dell MD3000i, an iSCSI SAN. Right now I have it for testing purposes and I'm trying to set up a simple cluster to get more experience with it. At the moment we do not run Redhat, but Debian; so although this is probably the wrong mailing list for me, I could not find any other place where problems like this are discussed. The cluster, if it goes into production, will have to serve 'dynamic' files to the webservers, these include images, videos and generic downloads. So what will happen on the SAN is many reads, and relatively very few writes, at the moment the read-write proportions on the NFS server are around 99% reads vs 1% writes, the only writes that occur are users uploading a new image, or one server creating some graphs. Not only the webservers will use this SAN, but also the database servers will use it to read some files from it. I have been looking at different filesystems to run on this SAN the suit my needs, and GFS is one of those, but I have a few problems and questions. - Is locking really needed? There is no chance one webserver will try to write to a file that is being written to by another file. - How about fencing? I'd rather have a corrupt filesystem than a corrupt database, how silly that may sound, but I do not want the webservers be able to switch off the (infinite more important) database servers, and all servers can easily work without any problem without the share, they will still serve most of the content, just not the user-uploaded images / videos / downloads. Is GFS the right FS for me or do I need to look to other (cluster aware) filesystems? >From the FAQ: http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_whatgood What I really need is a filesystem that is cluster-aware, aka that it knows and reacts to the fact that other systems than himself are able to write to the disk, and as said, ext3 does not know that; mount it on both systems and they do see the original data, but as soon as one changes something the other won't pick it up. Anyway, I tried gfs with the lock_nolock protocol, but I might as well use ext3 than. With any other protocol, the mount will just hang with: Trying to join cluster "lock_dlm", "tweakers:webdata" dlm: Using TCP for communications dlm: connect from non cluster node BUG: soft lockup - CPU#2 stuck for 11s! [dlm_send:3566] Pid: 3566, comm: dlm_send Not tainted (2.6.24-1-686 #1) EIP: 0060:[] EFLAGS: 00000202 CPU: 2 EIP is at _spin_unlock_irqrestore+0xa/0x13 The other FS we looked at was OCFS2, but although it is a lot easier to set up, and it works without any problems, it does have a limit of 32k directories in one directory, something which we easily surpass on our current shares (over 50k directories in one dir). 
Anyway, is there a method to have gfs mounted without locking, but still be cluster-aware (aka; the fs can be updated by other servers) and without fencing? -kees From lp at xbe.ch Thu May 29 15:22:30 2008 From: lp at xbe.ch (Lorenz Pfiffner) Date: Thu, 29 May 2008 17:22:30 +0200 Subject: [Linux-cluster] apache resource problem in RHCS 5.1 Message-ID: <483ECA36.7070007@xbe.ch> Hello everybody I have the following test setup: - RHEL 5.1 Cluster Suite with rgmanager-2.0.31-1 and cman-2.0.73-1 - Two VMware machines on an ESX 3.5 U1, so no fence device (it's only a test) - 4 IP resources defined - GFS over DRBD, doesn't matter, because it doesn't even work on a local disk Now I would like to have an "Apache Resource" which i can select in the luci interface. I assume it's using the /usr/share/cluster/apache.sh script. If I try to start it, the error message looks like this: May 28 16:18:15 testsrv clurgmgrd: [18475]: Starting Service apache:test_httpd > Failed May 28 16:18:15 testsrv clurgmgrd[18475]: start on apache "test_httpd" returned 1 (generic error) May 28 16:18:15 testsrv clurgmgrd[18475]: #68: Failed to start service:test_proxy_http; return value: 1 May 28 16:18:15 testsrv clurgmgrd[18475]: Stopping service service:test_proxy_http May 28 16:18:16 testsrv clurgmgrd: [18475]: Checking Existence Of File /var/run/cluster/apache/apache:test_httpd.pid [apache:test_httpd] > Failed - File Doesn't Exist May 28 16:18:16 testsrv clurgmgrd: [18475]: Stopping Service apache:test_httpd > Failed May 28 16:18:16 testsrv clurgmgrd[18475]: stop on apache "test_httpd" returned 1 (generic error) May 28 16:18:16 testsrv clurgmgrd[18475]: #71: Relocating failed service service:test_proxy_http I've another cluster in which I had to alter the default init.d/httpd script to be able to run multiple apache instances (not vhosts) on one server. But there I have the Apache Service configured with a "Script Resource". Is this supposed to work of is it a feature in development? I don't see something like "Apache Resource" in the current documentation. Kind Regards Lorenz From sghosh at redhat.com Thu May 29 15:42:22 2008 From: sghosh at redhat.com (Subhendu Ghosh) Date: Thu, 29 May 2008 11:42:22 -0400 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <3ae027040805281616l1d9856f3o1f06ac2f7de51e6f@mail.gmail.com> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> <20080528213713.GB18367@kallisti.us> <3ae027040805281616l1d9856f3o1f06ac2f7de51e6f@mail.gmail.com> Message-ID: <483ECEDE.30208@redhat.com> >> If you bond between two different switches, you'll only be able to do >> failover between the NICs. If you use multipath, you can round-robin >> between them to provide a greater bandwidth overhead. > > Same goes for bonding: link aggregation with active-active bonding. > active-active bonding across two network switches is just bad. Spanning Tree does not like it. 
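To make that concrete: when the two NICs are cabled to two different switches, the commonly used safe choice is active-backup bonding rather than any active-active/802.3ad variant. A RHEL-style sketch, with interface names and addresses invented:

  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=active-backup miimon=100

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=192.168.20.11
  NETMASK=255.255.255.0

  # /etc/sysconfig/network-scripts/ifcfg-eth0  (eth1 is analogous)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none
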
-Subhendu From jerlyon at gmail.com Thu May 29 18:36:40 2008 From: jerlyon at gmail.com (Jeremy Lyon) Date: Thu, 29 May 2008 12:36:40 -0600 Subject: [Linux-cluster] Cluster starts, but a node won't rejoin after reboot In-Reply-To: <3DDA6E3E456E144DA3BB0A62A7F7F779020C6285@SKYHQAMX08.klasi.is> References: <779919740805221003k5b799927qfc0c11f65e1bf340@mail.gmail.com> <3DDA6E3E456E144DA3BB0A62A7F7F779020C6285@SKYHQAMX08.klasi.is> Message-ID: <779919740805291136i166b37ado2d2d4b21112cbbfe@mail.gmail.com> > I'm having the exact same issue on a RHEL 5.2 system, and have a open > support case with Redhat. When it will be resolved i can post the details > .... > Any word on this? I think I may get my own case going. Do you know if a bugzilla got assigned to this? Thanks! Jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccaulfie at redhat.com Fri May 30 07:29:51 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 30 May 2008 08:29:51 +0100 Subject: [Linux-cluster] Re: cman_tool returns Flags: Dirty In-Reply-To: References: Message-ID: <483FACEF.2080509@redhat.com> denis wrote: > denis wrote: >> What does "Flags: Dirty" mean? Is it anything to worry about? >> Google was unhelpful. > > Google was sort of helpful after all; > > http://www.redhat.com/archives/cluster-devel/2007-September/msg00091.html > > NODE_FLAGS_DIRTY - This node has internal state and must not join > a cluster that also has state. > > > What does this actually imply? Anything to care about? How would this > node "recover" from being dirty? > It's a perfectly normal state. in fact it's expected if you are running services. It simply means that the cluster has some services running that have state of their own that cannot be recovered without a full restart. I would be more worried if you did NOT see this in cman_tool status. It's NOT a warning. don't worry about it :) -- Chrissie From maciej.bogucki at artegence.com Fri May 30 07:38:16 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Fri, 30 May 2008 09:38:16 +0200 Subject: [Linux-cluster] GFS in a small cluster Message-ID: <483FAEE8.9050706@artegence.com> > The cluster, if it goes into production, will have to serve 'dynamic' files > to the webservers, these include images, videos and generic downloads. So > what will happen on the SAN is many reads, and relatively very few writes, > at the moment the read-write proportions on the NFS server are around 99% > reads vs 1% writes, the only writes that occur are users uploading a new > image, or one server creating some graphs. No problem to GFS. > Not only the webservers will use this SAN, but also the database servers > will use it to read some files from it. I have been looking at different > filesystems to run on this SAN the suit my needs, and GFS is one of those, > but I have a few problems and questions. Create two LUN on the array, one for database and the second for static files with two GFS fs on the top of it. > - Is locking really needed? There is no chance one webserver will try to write to a file that is being written to by another file. Yes, you need locking, if You have more than one serwer in the cluster. > - How about fencing? 
I'd rather have a corrupt filesystem than a corrupt > database, how silly that may sound, but I do not want the webservers be able > to switch off the (infinite more important) database servers, and all > servers can easily work without any problem without the share, they will > still serve most of the content, just not the user-uploaded images / videos >/ downloads. Configure one ore more fencing method for the cluster and sleep well ;) > Is GFS the right FS for me or do I need to look to other (cluster aware) > filesystems? Yes, but when You properly configure it(fe. configure/test fencing). > The other FS we looked at was OCFS2, but although it is a lot easier to set > up, and it works without any problems, it does have a limit of 32k > directories in one directory, something which we easily surpass on our > current shares (over 50k directories in one dir). OCFS2 is similar to GFS, and it is for Oracle RAC environment. I suggest to use GFS, because it is more popular than OCFS2. > Anyway, is there a method to have gfs mounted without locking, > but still be > cluster-aware (aka; the fs can be updated by other servers) and without > fencing? Yes, but only on one node. Manual fencing is needed for production environment! Best Regards Maciej Bogucki From michael.osullivan at auckland.ac.nz Fri May 30 09:16:23 2008 From: michael.osullivan at auckland.ac.nz (michael.osullivan at auckland.ac.nz) Date: Fri, 30 May 2008 21:16:23 +1200 (NZST) Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID Message-ID: <50463.222.152.69.120.1212138983.squirrel@mail.esc.auckland.ac.nz> Hi everyone, We chose not to bond the NICs because we'd heard this does not scale the bandwidth linearly. To keep performance of the network high we wanted to allow the load to be spread across multiple links and multipath seemed the best way. The iSCSI setup suggested by the article http://www.pcpro.co.uk/realworld/82284/san-on-the-cheap/page1.html uses one storage device as the primary storage and the second one as the secondary storage. The iSCSI target is presented via the first device and will failover to the second device. This allows for failure of either of the devices, but does not allow the storage load to be shared amongst the devices. By having the setup as described in http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen for the fullscreen view) with multipath we provide two distinct paths between each server and each storage device, both of which can be used to send/receive data. By creating a RAID-5 array out of the iSCSI disks I hope I have allowed both of them to share the storage load. Our setup is intended to provide diverse protection for the storage system via: 1) RAID for the storage devices; 2) multipathing over the network - we've had dm multipath recommended instead of mdadm - any comments?; 3) a cluster for the servers using GFS to allow locking of the storage system; but also allows all the components to share the load instead of using a primary/secondary type setup (which largely "wastes" the scondary resources). We are going to use IOMeter to test our setup and see how it performs. We will then run the same tests with different parts of the network disabled and see what happens. As usual any comments/suggestions/criticisms are welcome. 
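On the dm-multipath versus mdadm point above: if dm-multipath is used for the two-paths-per-target part, a fragment along these lines spreads I/O across both portals of a target. The WWID and alias below are made up, and the RAID-5 across the two storage devices would then sit on top of the resulting /dev/mapper devices:

  # /etc/multipath.conf (fragment)
  defaults {
          user_friendly_names yes
  }
  multipaths {
          multipath {
                  wwid                 360a98000123456789abcdef000000a1
                  alias                iscsi_disk1
                  path_grouping_policy multibus
                  path_selector        "round-robin 0"
                  rr_min_io            100
          }
  }
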
Thanks for all the discussion, it has been very useful and enlightening, Mike From deka.lipika at gmail.com Fri May 30 10:12:47 2008 From: deka.lipika at gmail.com (Lipika Deka) Date: Fri, 30 May 2008 11:12:47 +0100 Subject: [Linux-cluster] Granularity of a lock Message-ID: <43097e740805300312l66756578qdc627eb53334e728@mail.gmail.com> Hello List, Would anyone tell me what is the granularity of a lock in GFS using DLM and is locking part of a file possible.Is there something similar to byte range locks of GPFS in GFS? Thanks in advance. -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Fri May 30 10:17:51 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 30 May 2008 11:17:51 +0100 Subject: [Linux-cluster] Granularity of a lock In-Reply-To: <43097e740805300312l66756578qdc627eb53334e728@mail.gmail.com> References: <43097e740805300312l66756578qdc627eb53334e728@mail.gmail.com> Message-ID: <1212142671.3474.31.camel@quoit> Hi, On Fri, 2008-05-30 at 11:12 +0100, Lipika Deka wrote: > Hello List, > Would anyone tell me what is the granularity of a lock in GFS > using DLM and is locking part of a file possible.Is there something > similar to byte range locks of GPFS in GFS? > > Thanks in advance. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster I'm not sure of exactly what GPFS does in this regard, but the locking in GFS (and GFS2) is one lock per inode I'm afraid. Its a RW-lock though so that read accesses at least can be done across the whole cluster at once. With GFS2 that includes read accesses to rw-mapped mmaped regions of files, for GFS that requires an exclusive lock I'm afraid, Steve. From T.Kumar at alcoa.com Fri May 30 17:25:07 2008 From: T.Kumar at alcoa.com (Kumar, T Santhosh (TCS)) Date: Fri, 30 May 2008 13:25:07 -0400 Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. Message-ID: <0C3FC6B507AF684199E57BFCA3EAB5532565630D@NOANDC-MXU11.NOA.Alcoa.com> I am planning to upgrade to lvm2-2.02.32-4.el5.x86_64.rpm along with the other three dependencies listed below. lvm2-cluster-2.02.32-4.el5.x86_64.rpm device-mapper-event-1.02.24-1.el5.x86_64.rpm device-mapper-1.02.24-1.el5.x86_64.rpm I prefer to do this as I realise the below. lvm2-2.02.32-4.el5.x86_64.rpm is an updated package which resolves the "clvmd -R did not work as expected". Do any one know of any problems which might come with upgrading the lvm2, device mapper packages. From orkcu at yahoo.com Fri May 30 18:14:41 2008 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Fri, 30 May 2008 11:14:41 -0700 (PDT) Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. In-Reply-To: <0C3FC6B507AF684199E57BFCA3EAB5532565630D@NOANDC-MXU11.NOA.Alcoa.com> Message-ID: <767810.9219.qm@web50605.mail.re2.yahoo.com> --- On Fri, 5/30/08, Kumar, T Santhosh (TCS) wrote: > From: Kumar, T Santhosh (TCS) > Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. > To: linux-cluster at redhat.com > Received: Friday, May 30, 2008, 1:25 PM > I am planning to upgrade to lvm2-2.02.32-4.el5.x86_64.rpm > along with > the other three dependencies listed below. > > lvm2-cluster-2.02.32-4.el5.x86_64.rpm > device-mapper-event-1.02.24-1.el5.x86_64.rpm > device-mapper-1.02.24-1.el5.x86_64.rpm > > I prefer to do this as I realise the below. 
> lvm2-2.02.32-4.el5.x86_64.rpm is an updated package which > resolves the > "clvmd -R did not work as expected". > > Do any one know of any problems which might come with > upgrading the > lvm2, device mapper packages. I suggest you take a look in bugzilla. I don't have a Linux server at hand right now to check, so I don't know which RHEL release you are referring to, but we hit some clvmd problems when we updated from RHEL 4.5 to RHEL 4.6 + updates. There is also a bug, fixed for 5.2 but I am not sure about 4.6, that I think you should look into; it was discussed on this list a few days ago (subject: "LVM manager" or something similar). cu roger
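One low-risk sanity check before committing to the upgrade is an rpm dry run against the exact packages listed above; run it on one node first. The --test option only checks dependencies and conflicts, it installs nothing:

  rpm -q lvm2 lvm2-cluster device-mapper device-mapper-event
  rpm -Uvh --test lvm2-2.02.32-4.el5.x86_64.rpm \
      lvm2-cluster-2.02.32-4.el5.x86_64.rpm \
      device-mapper-1.02.24-1.el5.x86_64.rpm \
      device-mapper-event-1.02.24-1.el5.x86_64.rpm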