From jpalmae at gmail.com Fri May 2 01:16:40 2008 From: jpalmae at gmail.com (Jorge Palma) Date: Thu, 1 May 2008 21:16:40 -0400 Subject: [Linux-cluster] GFS Storage cluster !!!! In-Reply-To: <48189385.8080007@nexatech.com> References: <48119C51.8020904@monster.co.in> <5b65f1b10804300830s46f1038bj3e29e79c9699a133@mail.gmail.com> <48189385.8080007@nexatech.com> Message-ID: <5b65f1b10805011816s74312a4qf5ebcdb17b398e93@mail.gmail.com> I Know.... Thanks!! On Wed, Apr 30, 2008 at 11:43 AM, Jeff Macfarland wrote: > Just an FYI- does not support SCSI PR > > > Jorge Palma wrote: > > you can use ISCSI to simulate a SAN > > > > http://iscsitarget.sourceforge.net/ > > > > Regards > > > > -- > > > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Jorge Palma Escobar Ingeniero de Sistemas Red Hat Linux Certified Engineer Certificate N? 804005089418233 From jas199931 at yahoo.com Fri May 2 01:43:30 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 1 May 2008 18:43:30 -0700 (PDT) Subject: [Linux-cluster] Lock Resources Message-ID: <353354.65091.qm@web32206.mail.mud.yahoo.com> Hi, All: I have downloaded "Programming Locking Applications" written by Christine Caulfield from http://sources.redhat.com/cluster/wiki/HomePage?action=AttachFile&do=view&target=rhdlmbook.pdf I read it through, especially the DLM locking model. It is very informative. Thanks Christine. Now I have some questions about the lock resource and wish to get answers from you. 1. Whether the kernel on each server/node is going to initialize a number of empty lock resources after completely rebooting the cluster? 2. If so, what is the default value of the number of empty lock resources? Is it configurable? 3. Whether the number of lock resources is fixed regardless the load of the server? 4. If not, how the number of lock resources will be expended under a heavy load? 5. The lock manager maintains a cluster-wide directory of the locations of the master copy of all the lock resources within the cluster and evenly divides the content of the directory across all nodes. How can I check the content held by a node (what command or API)? 6. If only one node A is busy while other nodes are idle all the time, does it mean that the node A holds a very big master copy of lock resources and other nodes have nothing? 7. For the above case, what would be the content of the cluster-wide directory? Only one entry as only the node A is really doing IO, or many entries and the number of entries is the same as the number of used lock resources on the node A? If the latter case is true, will the lock manager still divide the content evenly to other nodes? If so, would it costs the node A extra time on finding the location of the lock resources, which is just on itself, by messaging other nodes? If you need more information from me in order to help me, or if you think my questions are not clear, please kindly let me know. Thank you very much in advance and look forward to hearing from you. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Fri May 2 07:35:32 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 02 May 2008 08:35:32 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <353354.65091.qm@web32206.mail.mud.yahoo.com> References: <353354.65091.qm@web32206.mail.mud.yahoo.com> Message-ID: <481AC444.5080709@redhat.com> Ja S wrote: > Hi, All: > > I have downloaded "Programming Locking Applications" > written by Christine Caulfield from > http://sources.redhat.com/cluster/wiki/HomePage?action=AttachFile&do=view&target=rhdlmbook.pdf > > > I read it through, especially the DLM locking model. > It is very informative. Thanks Christine. > > Now I have some questions about the lock resource and > wish to get answers from you. > > 1. Whether the kernel on each server/node is going to > initialize a number of empty lock resources after > completely rebooting the cluster? > > 2. If so, what is the default value of the number of > empty lock resources? Is it configurable? There is no such thing as an "empty" lock resource. Lock resources are allocated from kernel memory as required. That does mean that the number of resources that can be held on a node is limited by the amount of physical memory in the system. I think this addresses 3 & 4. > 3. Whether the number of lock resources is fixed > regardless the load of the server? > > 4. If not, how the number of lock resources will be > expended under a heavy load? > > 5. The lock manager maintains a cluster-wide directory > of the locations of the master copy of all the lock > resources within the cluster and evenly divides the > content of the directory across all nodes. How can I > check the content held by a node (what command or > API)? On RHEL4 (cluster 1) systems the lock directory is viewable in /proc/cluster/dlm_dir. I don't think there is currently any equivalent in RHEL5 (cluster 2) > 6. If only one node A is busy while other nodes are > idle all the time, does it mean that the node A holds > a very big master copy of lock resources and other > nodes have nothing? That's correct. There is no point in mastering locks on a remote node as it will just slow access down for the only node using those locks. > 7. For the above case, what would be the content of > the cluster-wide directory? Only one entry as only the > node A is really doing IO, or many entries and the > number of entries is the same as the number of used > lock resources on the node A? If the latter case is > true, will the lock manager still divide the content > evenly to other nodes? If so, would it costs the node > A extra time on finding the location of the lock > resources, which is just on itself, by messaging > other nodes? You're correct that the lock directory will still be distributed around the cluster in this case and that it causes network traffic. It isn't a lot of network traffic (and there needs to be some way of determining where a resource is mastered; a node does not know, initially, if it is the only node that is using a resource). That lookup only happens the first time a resource is used by a node, once the node knows where the master is, it does not need to look it up again, unless it releases all locks on the resource. 
I hope this helps, -- Chrissie From jas199931 at yahoo.com Fri May 2 12:23:16 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 2 May 2008 05:23:16 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <481AC444.5080709@redhat.com> Message-ID: <364006.18824.qm@web32202.mail.mud.yahoo.com> Hi, Christine: Really appreciate your prompt and kind reply. I have some further questions. > > > > 1. Whether the kernel on each server/node is going > to > > initialize a number of empty lock resources after > > completely rebooting the cluster? > > > > 2. If so, what is the default value of the number > of > > empty lock resources? Is it configurable? > > There is no such thing as an "empty" lock resource. > Lock resources are > allocated from kernel memory as required. That does > mean that the number > of resources that can be held on a node is limited > by the amount of > physical memory in the system. Does it mean the cache allocated for disk IO will be reduced to meet the need of more lock resources? If so, for an extremely busy node, when reducing the cache, the physical disk IO will increase, which in turn increases the processing time (as disk IO is much slower than accessing cache), which then in turn increases the period of holding the lock resources, which in turn makes the kernel grab more memory space that should be used for cache in order to create new lock resources for new requests, and on and on, and eventually ends up to a no-cache situtation at all. Would this case ever happen? > I think this addresses 3 & 4. Yes, your answer does address them. Thank you. However, what will happen if an extremely busy application needs to write more new files thus the kernel needs to allocate more lock resources but the physical memory limit has been reached and all existing lock resources cannot be released? I guess the kernel will simply force the application go into an uninterruptable sleep until some lock resources are released or some memories are freed. Am I right? > > 3. Whether the number of lock resources is fixed > > regardless the load of the server? > > > > 4. If not, how the number of lock resources will > be > > expended under a heavy load? > > > > 5. The lock manager maintains a cluster-wide > directory > > of the locations of the master copy of all the > lock > > resources within the cluster and evenly divides > the > > content of the directory across all nodes. How can > I > > check the content held by a node (what command or > > API)? > > On RHEL4 (cluster 1) systems the lock directory is > viewable in > /proc/cluster/dlm_dir. I don't think there is > currently any equivalent > in RHEL5 (cluster 2) Thanks. Very helpful. From the busiest node A the first several lines of dlm_dir are below. How to interpret them, please? DLM lockspace 'data' 5 2f06768 1 5 114d15 1 5 120b13 1 5 5bd1f04 1 3 6a02f8 2 5 cb7604 1 5 ca187b 1 Also there are many files under /proc/cluster, Could you please direct me to a place where I can find the usages of these files and descriptions of their content? > > 6. If only one node A is busy while other nodes > are > > idle all the time, does it mean that the node A > holds > > a very big master copy of lock resources and other > > nodes have nothing? > > That's correct. There is no point in mastering locks > on a remote node as > it will just slow access down for the only node > using those locks. > > > 7. For the above case, what would be the content > of > > the cluster-wide directory? 
Only one entry as only > the > > node A is really doing IO, or many entries and the > > number of entries is the same as the number of > used > > lock resources on the node A? If the latter case > is > > true, will the lock manager still divide the > content > > evenly to other nodes? If so, would it costs the > node > > A extra time on finding the location of the lock > > resources, which is just on itself, by messaging > > other nodes? > > You're correct that the lock directory will still be > distributed around > the cluster in this case and that it causes network > traffic. It isn't a > lot of network traffic (and there needs to be some > way of determining > where a resource is mastered; a node does not know, > initially, if it is > the only node that is using a resource). > That lookup only happens the first time > a resource is used by a node, once the > node knows where the master is, > it does not need to look it up again, > unless it releases all > locks on the resource. > Oh, I see. Just to further clarify, does it means if the same lock resource is required again by an application on the node A, the node A will go straight to the known node (ie the node B) which holds the master previously, but needs to lookup again if the node B has already released the lock resource? > > > I hope this helps, > Yes, yes, very helpful. Thank you very much indeed. Wish to receive your kind reply again. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Fri May 2 12:41:00 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 02 May 2008 13:41:00 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <364006.18824.qm@web32202.mail.mud.yahoo.com> References: <364006.18824.qm@web32202.mail.mud.yahoo.com> Message-ID: <481B0BDC.1000105@redhat.com> Ja S wrote: > Hi, Christine: > > Really appreciate your prompt and kind reply. > > I have some further questions. > > >>> 1. Whether the kernel on each server/node is going >> to >>> initialize a number of empty lock resources after >>> completely rebooting the cluster? >>> >>> 2. If so, what is the default value of the number >> of >>> empty lock resources? Is it configurable? >> There is no such thing as an "empty" lock resource. >> Lock resources are >> allocated from kernel memory as required. That does >> mean that the number >> of resources that can be held on a node is limited >> by the amount of >> physical memory in the system. > > Does it mean the cache allocated for disk IO will be > reduced to meet the need of more lock resources? > > If so, for an extremely busy node, when reducing the > cache, the physical disk IO will increase, which in > turn increases the processing time (as disk IO is much > slower than accessing cache), which then in turn > increases the period of holding the lock resources, > which in turn makes the kernel grab more memory space > that should be used for cache in order to create new > lock resources for new requests, and on and on, and > eventually ends up to a no-cache situtation at all. > Would this case ever happen? I suppose it could happen, yes. There are tuning values for GFS you can use to make it flush unused locks more frequently but if the locks are needed then they are needed! > >> I think this addresses 3 & 4. > > Yes, your answer does address them. Thank you. 
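(On the tuning values mentioned a little further up -- a hedged aside, since the exact knobs depend on the GFS version: on GFS1 they are exposed through gfs_tool settune, for example

gfs_tool settune /mountpoint demote_secs 100
gfs_tool settune /mountpoint glock_purge 50

where /mountpoint is your GFS mount, demote_secs shortens how long unused glocks are kept before being demoted, and glock_purge -- present only in newer GFS1 code -- sets what percentage of unused glocks to trim on each scan. The values above are only illustrative.)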
> However, what will happen if an extremely busy > application needs to write more new files thus the > kernel needs to allocate more lock resources but the > physical memory limit has been reached and all > existing lock resources cannot be released? I guess > the kernel will simply force the application go into > an uninterruptable sleep until some lock resources are > released or some memories are freed. Am I right? I think so yes. The VMM is not my speciality > > >>> 3. Whether the number of lock resources is fixed >>> regardless the load of the server? >>> >>> 4. If not, how the number of lock resources will >> be >>> expended under a heavy load? >>> >>> 5. The lock manager maintains a cluster-wide >> directory >>> of the locations of the master copy of all the >> lock >>> resources within the cluster and evenly divides >> the >>> content of the directory across all nodes. How can >> I >>> check the content held by a node (what command or >>> API)? >> On RHEL4 (cluster 1) systems the lock directory is >> viewable in >> /proc/cluster/dlm_dir. I don't think there is >> currently any equivalent >> in RHEL5 (cluster 2) > > Thanks. Very helpful. From the busiest node A the > first several lines of dlm_dir are below. How to > interpret them, please? > > DLM lockspace 'data' > 5 2f06768 1 > 5 114d15 1 > 5 120b13 1 > 5 5bd1f04 1 > 3 6a02f8 2 > 5 cb7604 1 > 5 ca187b 1 > The first two numbers are the lock name. Don't ask me what they mean, that's a GFS question! (actually, I think inode numbers might be involved) The last number is the nodeID on which the lock is mastered. > Also there are many files under /proc/cluster, Could > you please direct me to a place where I can find the > usages of these files and descriptions of their > content? They are not well documented. Mainly because they are subject to change and are not a recognised API. Maybe something could be put onto the cluster wiki at some point. >>> 6. If only one node A is busy while other nodes >> are >>> idle all the time, does it mean that the node A >> holds >>> a very big master copy of lock resources and other >>> nodes have nothing? >> That's correct. There is no point in mastering locks >> on a remote node as >> it will just slow access down for the only node >> using those locks. >> >>> 7. For the above case, what would be the content >> of >>> the cluster-wide directory? Only one entry as only >> the >>> node A is really doing IO, or many entries and the >>> number of entries is the same as the number of >> used >>> lock resources on the node A? If the latter case >> is >>> true, will the lock manager still divide the >> content >>> evenly to other nodes? If so, would it costs the >> node >>> A extra time on finding the location of the lock >>> resources, which is just on itself, by messaging >>> other nodes? >> You're correct that the lock directory will still be >> distributed around >> the cluster in this case and that it causes network >> traffic. It isn't a >> lot of network traffic (and there needs to be some >> way of determining >> where a resource is mastered; a node does not know, >> initially, if it is >> the only node that is using a resource). > > > >> That lookup only happens the first time >> a resource is used by a node, once the >> node knows where the master is, >> it does not need to look it up again, >> unless it releases all >> locks on the resource. >> > > Oh, I see. 
Just to further clarify, does it means if > the same lock resource is required again by an > application on the node A, the node A will go straight > to the known node (ie the node B) which holds the > master previously, but needs to lookup again if the > node B has already released the lock resource? Not quite. A resource is mastered on a node for as long as there are locks for it. If node A gets the lock (which is mastered on node B) then it knows always to go do node B until all locks on node A are released. When that happens the local copy of the resource on node A is released including the reference to node B. If all the locks on node B are released (but A still has some) then the resource will stay mastered on node B and nodes that still have locks on that resource will know where to find it without a directory lookup. -- Chrissie From jas199931 at yahoo.com Fri May 2 13:25:38 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 2 May 2008 06:25:38 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <481B0BDC.1000105@redhat.com> Message-ID: <557151.79920.qm@web32204.mail.mud.yahoo.com> --- Christine Caulfield wrote: > > DLM lockspace 'data' > > 5 2f06768 1 > > 5 114d15 1 > > 5 120b13 1 > > 5 5bd1f04 1 > > 3 6a02f8 2 > > 5 cb7604 1 > > 5 ca187b 1 > > > > The first two numbers are the lock name. Don't ask > me what they mean, > that's a GFS question! (actually, I think inode > numbers might be > involved) The last number is the nodeID on which the > lock is mastered. Great, thanks again! > >> That lookup only happens the first time > >> a resource is used by a node, once the > >> node knows where the master is, > >> it does not need to look it up again, > >> unless it releases all > >> locks on the resource. > >> > > > > Oh, I see. Just to further clarify, does it means > if > > the same lock resource is required again by an > > application on the node A, the node A will go > straight > > to the known node (ie the node B) which holds the > > master previously, but needs to lookup again if > the > > node B has already released the lock resource? > > Not quite. A resource is mastered on a node for as > long as there are > locks for it. If node A gets the lock (which is > mastered on node B) then > it knows always to go do node B until all locks on > node A are released. > When that happens the local copy of the resource on > node A is released > including the reference to node B. If all the locks > on node B are > released (but A still has some) then the resource > will stay mastered on > node B and nodes that still have locks on that > resource will know where > to find it without a directory lookup. > Aha, I think I missed another important concept -- a local copy of lock resources. I did not realise the existence of the local copy of lock resources. Which file should I check to figure out how many local copies a node has and what the local copies are? Many thanks again, you have been very helpful. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Fri May 2 13:33:52 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 02 May 2008 14:33:52 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <557151.79920.qm@web32204.mail.mud.yahoo.com> References: <557151.79920.qm@web32204.mail.mud.yahoo.com> Message-ID: <481B1840.8020907@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > > >>> DLM lockspace 'data' >>> 5 2f06768 1 >>> 5 114d15 1 >>> 5 120b13 1 >>> 5 5bd1f04 1 >>> 3 6a02f8 2 >>> 5 cb7604 1 >>> 5 ca187b 1 >>> >> The first two numbers are the lock name. Don't ask >> me what they mean, >> that's a GFS question! (actually, I think inode >> numbers might be >> involved) The last number is the nodeID on which the >> lock is mastered. > > > Great, thanks again! > > >>>> That lookup only happens the first time >>>> a resource is used by a node, once the >>>> node knows where the master is, >>>> it does not need to look it up again, >>>> unless it releases all >>>> locks on the resource. >>>> >>> Oh, I see. Just to further clarify, does it means >> if >>> the same lock resource is required again by an >>> application on the node A, the node A will go >> straight >>> to the known node (ie the node B) which holds the >>> master previously, but needs to lookup again if >> the >>> node B has already released the lock resource? >> Not quite. A resource is mastered on a node for as >> long as there are >> locks for it. If node A gets the lock (which is >> mastered on node B) then >> it knows always to go do node B until all locks on >> node A are released. >> When that happens the local copy of the resource on >> node A is released >> including the reference to node B. If all the locks >> on node B are >> released (but A still has some) then the resource >> will stay mastered on >> node B and nodes that still have locks on that >> resource will know where >> to find it without a directory lookup. >> > > Aha, I think I missed another important concept -- a > local copy of lock resources. I did not realise the > existence of the local copy of lock resources. Which > file should I check to figure out how many local > copies a node has and what the local copies are? All the locks are displayed in /proc/cluster/dlm_locks, that shows you which are local copies and which are masters. -- Chrissie From jas199931 at yahoo.com Fri May 2 13:48:38 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 2 May 2008 06:48:38 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <481B1840.8020907@redhat.com> Message-ID: <555963.87870.qm@web32202.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > > > >>> DLM lockspace 'data' > >>> 5 2f06768 1 > >>> 5 114d15 1 > >>> 5 120b13 1 > >>> 5 5bd1f04 1 > >>> 3 6a02f8 2 > >>> 5 cb7604 1 > >>> 5 ca187b 1 > >>> > >> The first two numbers are the lock name. Don't > ask > >> me what they mean, > >> that's a GFS question! (actually, I think inode > >> numbers might be > >> involved) The last number is the nodeID on which > the > >> lock is mastered. > > > > > > Great, thanks again! > > > > > >>>> That lookup only happens the first time > >>>> a resource is used by a node, once the > >>>> node knows where the master is, > >>>> it does not need to look it up again, > >>>> unless it releases all > >>>> locks on the resource. > >>>> > >>> Oh, I see. 
Just to further clarify, does it > means > >> if > >>> the same lock resource is required again by an > >>> application on the node A, the node A will go > >> straight > >>> to the known node (ie the node B) which holds > the > >>> master previously, but needs to lookup again if > >> the > >>> node B has already released the lock resource? > >> Not quite. A resource is mastered on a node for > as > >> long as there are > >> locks for it. If node A gets the lock (which is > >> mastered on node B) then > >> it knows always to go do node B until all locks > on > >> node A are released. > >> When that happens the local copy of the resource > on > >> node A is released > >> including the reference to node B. If all the > locks > >> on node B are > >> released (but A still has some) then the resource > >> will stay mastered on > >> node B and nodes that still have locks on that > >> resource will know where > >> to find it without a directory lookup. > >> > > > > Aha, I think I missed another important concept -- > a > > local copy of lock resources. I did not realise > the > > existence of the local copy of lock resources. > Which > > file should I check to figure out how many local > > copies a node has and what the local copies are? > > All the locks are displayed in > /proc/cluster/dlm_locks, that shows you > which are local copies and which are masters. Fantastic ! Thank you very much once more. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From oliveiros.cristina at gmail.com Sun May 4 22:33:34 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Sun, 4 May 2008 23:33:34 +0100 Subject: [Linux-cluster] GFS on fedora Message-ID: Howdy List, I would like to install gfs on a two node cluster running both fedora 8. Can anyone please kindly supply me with some links for the procedure? Which packages are needed, where to get them, that sort of things. I've already googled up and down a little but I couldn't find no rigourous information on this, or maybe I am just blind :-) Thanks a lot in advance Best, Oliveiros -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Sun May 4 23:22:44 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 05 May 2008 00:22:44 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: Message-ID: <481E4544.1020301@bobich.net> Oliveiros Cristina wrote: > Howdy List, > I would like to install gfs on a two node cluster running both fedora 8. > > Can anyone please kindly supply me with some links for the procedure? First part of the procedure is to not use FC if you plan for this to be useful. FC7+ comes only with GFS2. There are no GFS1 packages included, and GFS2 isn't stable yet. > Which packages are needed, where to get them, that sort of things. cman openais gfs-utils kmod-gfs rgmanager Can't remember if there may be more. 
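As a rough sketch of pulling those in on a RHEL 5 / CentOS 5 style system (package names can differ between releases, so treat this as an assumption rather than a recipe):

# yum install cman openais gfs-utils kmod-gfs rgmanager

plus lvm2-cluster if you also want clustered LVM (clvmd) on the shared storage. After that the usual order is roughly: write /etc/cluster/cluster.conf, start cman, start clvmd if you use it, then mount the GFS filesystem.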
> I've already googled up and down a little but I couldn't find no > rigourous information on this, or maybe I am just blind :-) This is probably a not a bad place to start: https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS#section-GFS-Documentation Gordan From jas199931 at yahoo.com Sun May 4 23:27:36 2008 From: jas199931 at yahoo.com (Ja S) Date: Sun, 4 May 2008 16:27:36 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <481B1840.8020907@redhat.com> Message-ID: <853958.85045.qm@web32207.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > > > >>> DLM lockspace 'data' > >>> 5 2f06768 1 > >>> 5 114d15 1 > >>> 5 120b13 1 > >>> 5 5bd1f04 1 > >>> 3 6a02f8 2 > >>> 5 cb7604 1 > >>> 5 ca187b 1 > >>> > >> The first two numbers are the lock name. Don't > ask > >> me what they mean, > >> that's a GFS question! (actually, I think inode > >> numbers might be > >> involved) The last number is the nodeID on which > the > >> lock is mastered. > > > > > > Great, thanks again! > > > > > >>>> That lookup only happens the first time > >>>> a resource is used by a node, once the > >>>> node knows where the master is, > >>>> it does not need to look it up again, > >>>> unless it releases all > >>>> locks on the resource. > >>>> > >>> Oh, I see. Just to further clarify, does it > means > >> if > >>> the same lock resource is required again by an > >>> application on the node A, the node A will go > >> straight > >>> to the known node (ie the node B) which holds > the > >>> master previously, but needs to lookup again if > >> the > >>> node B has already released the lock resource? > >> Not quite. A resource is mastered on a node for > as > >> long as there are > >> locks for it. If node A gets the lock (which is > >> mastered on node B) then > >> it knows always to go do node B until all locks > on > >> node A are released. > >> When that happens the local copy of the resource > on > >> node A is released > >> including the reference to node B. If all the > locks > >> on node B are > >> released (but A still has some) then the resource > >> will stay mastered on > >> node B and nodes that still have locks on that > >> resource will know where > >> to find it without a directory lookup. > >> > > > > Aha, I think I missed another important concept -- > a > > local copy of lock resources. I did not realise > the > > existence of the local copy of lock resources. > Which > > file should I check to figure out how many local > > copies a node has and what the local copies are? > > All the locks are displayed in > /proc/cluster/dlm_locks, that shows you > which are local copies and which are masters. A couple of further questions about the master copy of lock resources. The first one: ============= Again, assume: 1) Node A is extremely too busy and handle all requests 2) other nodes are just idle and have never handled any requests According to the documents, Node A will hold all master copies initially. The thing I am not aware of and unclear is whether the lock manager will evenly distribute the master copies on Node A to other nodes when it thinks the number of master copies on Node A is too many? The second one: ============== Assume a master copy of lock resource is on Node A. Now Node B holds a local copy of the lock resource. When the lock queues changed on the local copy on Node B, will the master copy on Node A be updated simultaneously? 
If so, when more than one nodes have the local copy of the same lock resource, how the lock manager to handle the update of the master copy? Using another lock mechanism to prevent the corruption of the master copy? Thanks again in advance. Jas > -- > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From oliveiros.cristina at gmail.com Sun May 4 23:36:04 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Mon, 5 May 2008 00:36:04 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <481E4544.1020301@bobich.net> References: <481E4544.1020301@bobich.net> Message-ID: Hello, Gordan, Thank you for your e-mail. *"First part of the procedure is to not use FC if you plan for this to be useful" *By this you mean that it is not a good idea to install it on FC? Is GFS somewhat RH oriented? I chose FC because I am not familiar with rh and I've read somewhere that gfs would work on fc Thank you for the package names and link Best, Oliveiros 2008/5/5 Gordan Bobic : > Oliveiros Cristina wrote: > > > Howdy List, > > I would like to install gfs on a two node cluster running both fedora 8. > > > > Can anyone please kindly supply me with some links for the procedure? > > > > First part of the procedure is to not use FC if you plan for this to be > useful. FC7+ comes only with GFS2. There are no GFS1 packages included, and > GFS2 isn't stable yet. > > Which packages are needed, where to get them, that sort of things. > > > > cman > openais > gfs-utils > kmod-gfs > rgmanager > > Can't remember if there may be more. > > I've already googled up and down a little but I couldn't find no > > rigourous information on this, or maybe I am just blind :-) > > > > This is probably a not a bad place to start: > > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS#section-GFS-Documentation > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jas199931 at yahoo.com Mon May 5 00:28:29 2008 From: jas199931 at yahoo.com (Ja S) Date: Sun, 4 May 2008 17:28:29 -0700 (PDT) Subject: [Linux-cluster] An odd problem may be related to GFS+DLM Message-ID: <897999.3920.qm@web32203.mail.mud.yahoo.com> Hi, All: We realised a problem and suspected that the problem might be related to GFS and DLM. Therefore, I am sending the email to this group. If you think my problem is irrelevant, please forgive me. ========================= We have a SAN environment, where 5 nodes running RHEL v4u4 and Redhat Cluster Suite connected to EMC AX150SCi iSCSI RAID storage (GFS+DLM, RAID10) We have a subdirectory on the storage and we are sure that no applications on these five nodes know the existence of the subdirectory. In other words, the subdirectory should be free of lock but its parent directories may have locks. The subdirectory holds more than 31700 small files and the total size of these files is about 4.3G. Within these 31700 files, about 1/3 of them are symbolic links pointing to other files at the same subdirectory. 
The subdirectory stat is: File: `abc' Size: 8192 Blocks: 6024 IO Block: 4096 directory Device: fc00h/64512d Inode: 1065226 Links: 2 Access: (0770/drwxrwx---) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2008-05-04 22:53:39.000000000 +0000 Modify: 2008-04-15 03:02:24.000000000 +0000 Change: 2008-04-15 07:11:52.000000000 +0000 Now, when I tried to ls the subdirectory from an idle node, it took ages to output the information. I then timed the ls command, and the results were shocking. # time ls -la > /dev/null real 3m5.249s user 0m0.628s sys 0m5.137s As I said that the node I used to access the subdirectory was completely idle, what could cause the long delay? We asked EMC to check the hardware (including the controller and hard drives) and was reported that there was no problem at all. Therefore, I would like to seek your kind answers to the following questions: Is the problem related to GFS and DLM? I heard GFS is not suitable for many small files. Is that true? Is the delay caused by locks applied to its parent directories? Which direction should I go to figure out what is happening and what is the underlying reason? Thanks for your time and look forward to your reply. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From gordan at bobich.net Mon May 5 01:07:26 2008 From: gordan at bobich.net (Gordan Bobic) Date: Mon, 05 May 2008 02:07:26 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> Message-ID: <481E5DCE.5040206@bobich.net> Oliveiros Cristina wrote: > /"First part of the procedure is to not use FC if you plan for this to > be useful" > > /By this you mean that it is not a good idea to install it on FC? Is GFS > somewhat RH oriented? FC is effectively RedHat alpha. There is no structural or organizational difference between them. The differences are in stability and the amount of testing that goes into things. GFS (and RedHat Cluster Services which GFS is a part of) will run on any distribution, of course - it's just that you may have to build the correct stable packages from source, which seems pointless when you can have something that just works already. It's down to personal preference. > I chose FC because I am not familiar with rh and I've read somewhere > that gfs would work on fc It'll work, but running FC in a production environment is asking for trouble. You might as well run it on Gentoo and custom compile everything from bleeding edge sources, but it isn't going to help you achieve a stable system that has been tested by someone else other than just you. Gordan From oliveiros.cristina at gmail.com Mon May 5 09:46:32 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Mon, 5 May 2008 10:46:32 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <481E5DCE.5040206@bobich.net> References: <481E4544.1020301@bobich.net> <481E5DCE.5040206@bobich.net> Message-ID: Hello again , Gordan. I understand what you explained. But, actually, I don't want to run it on a production environment. It is mainly for testing purposes, it's part of a work for university. And , could you please tell me where can I download the source tree ? I will need to read the code. Thanks you for your help and thoughful considerations. 
All The Best, Oliveiros 2008/5/5 Gordan Bobic : > Oliveiros Cristina wrote: > > > /"First part of the procedure is to not use FC if you plan for this to > > be useful" > > > > /By this you mean that it is not a good idea to install it on FC? Is GFS > > somewhat RH oriented? > > > > FC is effectively RedHat alpha. There is no structural or organizational > difference between them. The differences are in stability and the amount of > testing that goes into things. > > GFS (and RedHat Cluster Services which GFS is a part of) will run on any > distribution, of course - it's just that you may have to build the correct > stable packages from source, which seems pointless when you can have > something that just works already. It's down to personal preference. > > I chose FC because I am not familiar with rh and I've read somewhere that > > gfs would work on fc > > > > It'll work, but running FC in a production environment is asking for > trouble. You might as well run it on Gentoo and custom compile everything > from bleeding edge sources, but it isn't going to help you achieve a stable > system that has been tested by someone else other than just you. > > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sghosh at redhat.com Mon May 5 13:56:25 2008 From: sghosh at redhat.com (Subhendu Ghosh) Date: Mon, 05 May 2008 09:56:25 -0400 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> <481E5DCE.5040206@bobich.net> Message-ID: <481F1209.1080109@redhat.com> Source tree is available at: http://sources.redhat.com/cluster/wiki/ There are at least 4 major branches that are being maintained - roughly equivalent to RHEL 3, 4, 5 and devel -regards Subhendu Oliveiros Cristina wrote: > Hello again , Gordan. > > I understand what you explained. > But, actually, I don't want to run it on a production environment. > It is mainly for testing purposes, it's part of a work for university. > > And , could you please tell me where can I download the source tree ? > I will need to read the code. > > Thanks you for your help and thoughful considerations. > > All The Best, > Oliveiros > > > 2008/5/5 Gordan Bobic >: > > Oliveiros Cristina wrote: > > /"First part of the procedure is to not use FC if you plan for > this to be useful" > > > /By this you mean that it is not a good idea to install it on > FC? Is GFS somewhat RH oriented? > > > FC is effectively RedHat alpha. There is no structural or > organizational difference between them. The differences are in > stability and the amount of testing that goes into things. > > GFS (and RedHat Cluster Services which GFS is a part of) will run on > any distribution, of course - it's just that you may have to build > the correct stable packages from source, which seems pointless when > you can have something that just works already. It's down to > personal preference. > > > I chose FC because I am not familiar with rh and I've read > somewhere that gfs would work on fc > > > It'll work, but running FC in a production environment is asking for > trouble. You might as well run it on Gentoo and custom compile > everything from bleeding edge sources, but it isn't going to help > you achieve a stable system that has been tested by someone else > other than just you. 
> > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Red Hat Summit Boston | June 18-20, 2008 Learn more: http://www.redhat.com/summit -------------- next part -------------- A non-text attachment was scrubbed... Name: sghosh.vcf Type: text/x-vcard Size: 266 bytes Desc: not available URL: From underscore_dot at yahoo.com Mon May 5 18:54:39 2008 From: underscore_dot at yahoo.com (nch) Date: Mon, 5 May 2008 11:54:39 -0700 (PDT) Subject: [Linux-cluster] GFS on fedora Message-ID: <422308.31579.qm@web32401.mail.mud.yahoo.com> see the docs section ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.02.tar.gz cheers ----- Original Message ---- From: Oliveiros Cristina To: Linux-cluster at redhat.com Sent: Monday, May 5, 2008 12:33:34 AM Subject: [Linux-cluster] GFS on fedora Howdy List, I would like to install gfs on a two node cluster running both fedora 8. Can anyone please kindly supply me with some links for the procedure? Which packages are needed, where to get them, that sort of things. I've already googled up and down a little but I couldn't find no rigourous information on this, or maybe I am just blind :-) Thanks a lot in advance Best, Oliveiros ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From raycharles_man at yahoo.com Mon May 5 23:29:44 2008 From: raycharles_man at yahoo.com (Ray Charles) Date: Mon, 5 May 2008 16:29:44 -0700 (PDT) Subject: [Linux-cluster] GFS on fedora In-Reply-To: <422308.31579.qm@web32401.mail.mud.yahoo.com> Message-ID: <526617.75888.qm@web32105.mail.mud.yahoo.com> Hi, I'd like to add a word on choosing F8 for trying gfs. A while back, could still be the case, gfs2-tools were not as complete as they are on Centos-5. Specifically it was the util to grow the file system that was not working. So you may need to consider that if its still not working. -Ray --- nch wrote: > see the docs section > ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.02.tar.gz > > cheers > > > ----- Original Message ---- > From: Oliveiros Cristina > > To: Linux-cluster at redhat.com > Sent: Monday, May 5, 2008 12:33:34 AM > Subject: [Linux-cluster] GFS on fedora > > Howdy List, > I would like to install gfs on a two node cluster > running both fedora 8. > > Can anyone please kindly supply me with some links > for the procedure? > > Which packages are needed, where to get them, that > sort of things. > > I've already googled up and down a little but I > couldn't find no > rigourous information on this, or maybe I am just > blind :-) > > Thanks a lot in advance > > Best, > Oliveiros > > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ> -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From dhongqian at 163.com Tue May 6 05:54:40 2008 From: dhongqian at 163.com (dhongqian) Date: Tue, 6 May 2008 13:54:40 +0800 Subject: [Linux-cluster] Problem: can't write file via gfs2 Message-ID: <200805061354402181105@163.com> I use 4 nodes cluster that all mount the same gfs2 storage. On one node, I use dd write a 512000 byte file while on other node , I use the command 'ls -l' to see , the file only 3584 byte. [root at nd11 mnt]# dd if=/dev/zero of=x count=1000 1000+0 records in 1000+0 records out [root at nd11 mnt]# ll total 501492 -rw-r--r-- 1 root root 512000 May 5 23:55 x -rw-r--r-- 1 root root 3584 May 6 2008 xxx [root at nd13 mnt]# ll total 501492 -rw-r--r-- 1 root root 3584 May 5 2008 x -rw-r--r-- 1 root root 3584 May 6 2008 xxx Thank you very much in advance and look forward to hearing from you. hongqian 2008-05-06 hongqian -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Tue May 6 07:31:05 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 06 May 2008 08:31:05 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <526617.75888.qm@web32105.mail.mud.yahoo.com> References: <526617.75888.qm@web32105.mail.mud.yahoo.com> Message-ID: <1210059065.3413.1.camel@localhost.localdomain> Hi, That bug has been fixed, along with others and the gfs2-utils in F-8 is now the most uptodate code. Unfortunately cman still lags behind, but we are working on that, Steve. On Mon, 2008-05-05 at 16:29 -0700, Ray Charles wrote: > > > Hi, > > I'd like to add a word on choosing F8 for trying gfs. > A while back, could still be the case, gfs2-tools were > not as complete as they are on Centos-5. Specifically > it was the util to grow the file system that was not > working. So you may need to consider that if its still > not working. > > -Ray > > --- nch wrote: > > > see the docs section > > > ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.02.tar.gz > > > > cheers > > > > > > ----- Original Message ---- > > From: Oliveiros Cristina > > > > To: Linux-cluster at redhat.com > > Sent: Monday, May 5, 2008 12:33:34 AM > > Subject: [Linux-cluster] GFS on fedora > > > > Howdy List, > > I would like to install gfs on a two node cluster > > running both fedora 8. > > > > Can anyone please kindly supply me with some links > > for the procedure? > > > > Which packages are needed, where to get them, that > > sort of things. > > > > I've already googled up and down a little but I > > couldn't find no > > rigourous information on this, or maybe I am just > > blind :-) > > > > Thanks a lot in advance > > > > Best, > > Oliveiros > > > > > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. 
> http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ> > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lists at tangent.co.za Tue May 6 08:02:55 2008 From: lists at tangent.co.za (Chris Picton) Date: Tue, 6 May 2008 08:02:55 +0000 (UTC) Subject: [Linux-cluster] GFS vs GFS2 Message-ID: Hi All I am investigating a new cluster installation. Documentation from redhat indicates that GFS2 is not yet production ready. Tests I have run show it is *much* faster that gfs for my workload. Is GFS2 not production-ready due to lack of testing, or due to known bugs? Any advice would be appreciated Chris From underscore_dot at yahoo.com Tue May 6 10:29:00 2008 From: underscore_dot at yahoo.com (nch) Date: Tue, 6 May 2008 03:29:00 -0700 (PDT) Subject: [Linux-cluster] mounting as non root Message-ID: <580317.8530.qm@web32404.mail.mud.yahoo.com> Hi, there. I can mount a gfs2 filesystem (gnbd) as root, but I'm having difficulties to mount it or, at least, giving write access to other users/groups. Any ideas on how to do this? Regards. ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From suuuper at messinalug.org Tue May 6 10:55:25 2008 From: suuuper at messinalug.org (Giovanni Mancuso) Date: Tue, 06 May 2008 12:55:25 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd Message-ID: <4820391D.1070601@messinalug.org> Hi to all, I have a problem with gfs2. If i try to do: watch -n1 'ls -ls /store/' i have ls: /store/new/: Input/output error and in my dmesg i have: GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != LM_ST_UNLOCKED" failed GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = fs/gfs2/glock.c, line = 963 [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] [] gfs2_glock_drop_th+0x83/0xfb [gfs2] [] xmote_bh+0x10a/0x271 [gfs2] [] run_queue+0xd4/0x26e [gfs2] [] glock_work_func+0x24/0x31 [gfs2] [] run_workqueue+0x78/0xb5 [] glock_work_func+0x0/0x31 [gfs2] [] worker_thread+0xd9/0x10d [] default_wake_function+0x0/0xc [] worker_thread+0x0/0x10d [] kthread+0xc0/0xeb [] kthread+0x0/0xeb [] kernel_thread_helper+0x7/0x10 ======================= how can i solve it? P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. From maciej.bogucki at artegence.com Tue May 6 10:57:41 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Tue, 06 May 2008 12:57:41 +0200 Subject: [Linux-cluster] mounting as non root In-Reply-To: <580317.8530.qm@web32404.mail.mud.yahoo.com> References: <580317.8530.qm@web32404.mail.mud.yahoo.com> Message-ID: <482039A5.8070705@artegence.com> nch napisa?(a): > > Hi, there. > I can mount a gfs2 filesystem (gnbd) as root, but I'm having > difficulties to mount it or, at least, giving write access to other > users/groups. > Any ideas on how to do this? It is from "man mount" (iii) Normally, only the superuser can mount file systems. 
However, when fstab contains the user option on a line, anybody can mount the corresponding system. Thus, given a line /dev/cdrom /cd iso9660 ro,user,noauto,unhide any user can mount the iso9660 file system found on his CDROM using the command Best Regards Maciej Bogucki From swhiteho at redhat.com Tue May 6 11:00:44 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 06 May 2008 12:00:44 +0100 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <4820391D.1070601@messinalug.org> References: <4820391D.1070601@messinalug.org> Message-ID: <1210071644.3413.18.camel@localhost.localdomain> Hi, I've not seen that before. What version of GFS2 are you using and are you using lock_nolock or lock_dlm? Steve. On Tue, 2008-05-06 at 12:55 +0200, Giovanni Mancuso wrote: > Hi to all, > I have a problem with gfs2. If i try to do: watch -n1 'ls -ls /store/' > i have ls: /store/new/: Input/output error > and in my dmesg i have: > > GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != > LM_ST_UNLOCKED" failed > GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = > fs/gfs2/glock.c, line = 963 > [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] > [] gfs2_glock_drop_th+0x83/0xfb [gfs2] > [] xmote_bh+0x10a/0x271 [gfs2] > [] run_queue+0xd4/0x26e [gfs2] > [] glock_work_func+0x24/0x31 [gfs2] > [] run_workqueue+0x78/0xb5 > [] glock_work_func+0x0/0x31 [gfs2] > [] worker_thread+0xd9/0x10d > [] default_wake_function+0x0/0xc > [] worker_thread+0x0/0x10d > [] kthread+0xc0/0xeb > [] kthread+0x0/0xeb > [] kernel_thread_helper+0x7/0x10 > ======================= > > how can i solve it? > > P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From suuuper at messinalug.org Tue May 6 11:04:49 2008 From: suuuper at messinalug.org (Giovanni Mancuso) Date: Tue, 06 May 2008 13:04:49 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <1210071644.3413.18.camel@localhost.localdomain> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> Message-ID: <48203B51.7030705@messinalug.org> I use lock_dlm and my version of gfs2 is: GFS2 (built Oct 10 2007 16:34:59) installed Thanks Steven Whitehouse ha scritto: > Hi, > > I've not seen that before. What version of GFS2 are you using and are > you using lock_nolock or lock_dlm? > > Steve. > > On Tue, 2008-05-06 at 12:55 +0200, Giovanni Mancuso wrote: > >> Hi to all, >> I have a problem with gfs2. If i try to do: watch -n1 'ls -ls /store/' >> i have ls: /store/new/: Input/output error >> and in my dmesg i have: >> >> GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != >> LM_ST_UNLOCKED" failed >> GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = >> fs/gfs2/glock.c, line = 963 >> [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] >> [] gfs2_glock_drop_th+0x83/0xfb [gfs2] >> [] xmote_bh+0x10a/0x271 [gfs2] >> [] run_queue+0xd4/0x26e [gfs2] >> [] glock_work_func+0x24/0x31 [gfs2] >> [] run_workqueue+0x78/0xb5 >> [] glock_work_func+0x0/0x31 [gfs2] >> [] worker_thread+0xd9/0x10d >> [] default_wake_function+0x0/0xc >> [] worker_thread+0x0/0x10d >> [] kthread+0xc0/0xeb >> [] kthread+0x0/0xeb >> [] kernel_thread_helper+0x7/0x10 >> ======================= >> >> how can i solve it? >> >> P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. 
>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Tue May 6 11:06:38 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 06 May 2008 12:06:38 +0100 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <48203B51.7030705@messinalug.org> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> Message-ID: <1210071998.3413.21.camel@localhost.localdomain> Hi, On Tue, 2008-05-06 at 13:04 +0200, Giovanni Mancuso wrote: > I use > lock_dlm > and my version of gfs2 is: > GFS2 (built Oct 10 2007 16:34:59) installed > Built from what exactly? Linus' kernel tree? the -nmw git tree? Some distribution or other? I suspect that you probably need to upgrade to a newer kernel version though given that date. Ideally as recent as possible, Steve. > Thanks > > > Steven Whitehouse ha scritto: > > Hi, > > > > I've not seen that before. What version of GFS2 are you using and are > > you using lock_nolock or lock_dlm? > > > > Steve. > > > > On Tue, 2008-05-06 at 12:55 +0200, Giovanni Mancuso wrote: > > > > > Hi to all, > > > I have a problem with gfs2. If i try to do: watch -n1 'ls -ls /store/' > > > i have ls: /store/new/: Input/output error > > > and in my dmesg i have: > > > > > > GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != > > > LM_ST_UNLOCKED" failed > > > GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = > > > fs/gfs2/glock.c, line = 963 > > > [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] > > > [] gfs2_glock_drop_th+0x83/0xfb [gfs2] > > > [] xmote_bh+0x10a/0x271 [gfs2] > > > [] run_queue+0xd4/0x26e [gfs2] > > > [] glock_work_func+0x24/0x31 [gfs2] > > > [] run_workqueue+0x78/0xb5 > > > [] glock_work_func+0x0/0x31 [gfs2] > > > [] worker_thread+0xd9/0x10d > > > [] default_wake_function+0x0/0xc > > > [] worker_thread+0x0/0x10d > > > [] kthread+0xc0/0xeb > > > [] kthread+0x0/0xeb > > > [] kernel_thread_helper+0x7/0x10 > > > ======================= > > > > > > how can i solve it? > > > > > > P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From underscore_dot at yahoo.com Tue May 6 12:32:39 2008 From: underscore_dot at yahoo.com (nch) Date: Tue, 6 May 2008 05:32:39 -0700 (PDT) Subject: [Linux-cluster] mounting as non root Message-ID: <149194.48263.qm@web32401.mail.mud.yahoo.com> I tried that, unsuccessfully. The relevant line in my fstab is: /dev/gnbd/disk /mnt/shared gfs2 user,noauto 0 0 And this is the error msg when trying "mount /mnt/shared" as a non root user /sbin/mount.gfs2: error mounting /dev/gnbd/disk on /mnt/shared: Operation not permitted Any suggestions? Lots of thanks. 
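One note that may explain the error above (a guess, but consistent with "Operation not permitted"): mount.gfs2 has to talk to the cluster daemons, so the mount itself still ends up needing root even with the user option in fstab. A common workaround is to let root do the mount and delegate just those commands through sudo -- for example, via visudo, with a hypothetical user "nch":

nch ALL = (root) NOPASSWD: /bin/mount /mnt/shared, /bin/umount /mnt/shared

Once the filesystem is mounted, write access for other users/groups is ordinary POSIX ownership on the GFS2 tree, e.g. as root:

# chgrp -R users /mnt/shared
# chmod -R g+rwX /mnt/shared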
----- Original Message ---- From: Maciej Bogucki To: linux clustering Sent: Tuesday, May 6, 2008 12:57:41 PM Subject: Re: [Linux-cluster] mounting as non root nch napisa?(a): > > Hi, there. > I can mount a gfs2 filesystem (gnbd) as root, but I'm having > difficulties to mount it or, at least, giving write access to other > users/groups. > Any ideas on how to do this? It is from "man mount" (iii) Normally, only the superuser can mount file systems. However, when fstab contains the user option on a line, anybody can mount the corresponding system. Thus, given a line /dev/cdrom /cd iso9660 ro,user,noauto,unhide any user can mount the iso9660 file system found on his CDROM using the command Best Regards Maciej Bogucki -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From T.Kumar at alcoa.com Tue May 6 13:40:28 2008 From: T.Kumar at alcoa.com (Kumar, T Santhosh (TCS)) Date: Tue, 6 May 2008 09:40:28 -0400 Subject: [Linux-cluster] lvextend error on Redhat-cluster suit 5.1 Message-ID: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> Linux hostname 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:46 EST 2008 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Linux Server release 5.1 (Tikanga) # lvextend -L +4000M /dev/vgec_rde0_pdb/lvol2 Extending logical volume lvol2 to 63.91 GB Error locking on node xxxxxx: Volume group for uuid not found: CyPYYtsmPYFg2M11glsWM2OSzmcVAbkm05WEDGVxERxVhTjHDIl90yjpjq7urtPl Error locking on node xxxxxx: Volume group for uuid not found: CyPYYtsmPYFg2M11glsWM2OSzmcVAbkm05WEDGVxERxVhTjHDIl90yjpjq7urtPl Failed to suspend lvol2 # vgdisplay -v vgec_rde0_pdb Using volume group(s) on command line Finding volume group "vgec_rde0_pdb" --- Volume group --- VG Name vgec_rde0_pdb System ID Format lvm2 Metadata Areas 4 Metadata Sequence No 9 VG Access read/write VG Status resizable Clustered yes Shared no MAX LV 255 Cur LV 7 Open LV 7 Max PV 150 Cur PV 4 Act PV 4 VG Size 269.62 GB PE Size 32.00 MB Total PE 8628 Alloc PE / Size 6752 / 211.00 GB Free PE / Size 1876 / 58.62 GB VG UUID CyPYYt-smPY-Fg2M-11gl-sWM2-OSzm-cVAbkm # lvdisplay -v /dev/vgec_rde0_pdb/lvol2 Using logical volume(s) on command line --- Logical volume --- LV Name /dev/vgec_rde0_pdb/lvol2 VG Name vgec_rde0_pdb LV UUID 05WEDG-VxER-xVhT-jHDI-l90y-jpjq-7urtPl LV Write Access read/write LV Status available # open 1 LV Size 60.00 GB Current LE 1920 Segments 1 Allocation inherit Read ahead sectors 0 Block device 253:15 Let me know if you have any suggetion From bkyoung at gmail.com Tue May 6 14:02:19 2008 From: bkyoung at gmail.com (Brandon Young) Date: Tue, 6 May 2008 09:02:19 -0500 Subject: [Linux-cluster] lvextend error on Redhat-cluster suit 5.1 In-Reply-To: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> References: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> Message-ID: <824ffea00805060702ycc37813ib4412b3169eac327@mail.gmail.com> 'partprobe' on each cluster node, and try restarting clvmd on each node. Note that you should unmount the filesystem before restarting clvmd ... 
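A minimal sketch of that recovery sequence on one node (the mount point name below is hypothetical, not from this thread; repeat the unmount/partprobe/clvmd steps on every node before retrying the resize):

    umount /mnt/rde0                                # whatever, if anything, mounts lvol2 on this node
    partprobe                                       # re-read partition tables so all nodes see the same PVs
    service clvmd restart                           # pick up the refreshed device list
    lvextend -L +4000M /dev/vgec_rde0_pdb/lvol2     # retry once every node has been refreshed

If the locking error persists, reconciling 'cat /proc/partitions' with 'pvs' on each node (see the follow-up from Jonathan Brassow further down this thread) helps find stray PV metadata on a device the node can still see.
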
On Tue, May 6, 2008 at 8:40 AM, Kumar, T Santhosh (TCS) wrote: > > > Linux hostname 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:46 EST 2008 > x86_64 x86_64 x86_64 GNU/Linux > > Red Hat Enterprise Linux Server release 5.1 (Tikanga) > > # lvextend -L +4000M /dev/vgec_rde0_pdb/lvol2 > Extending logical volume lvol2 to 63.91 GB > Error locking on node xxxxxx: Volume group for uuid not found: > > CyPYYtsmPYFg2M11glsWM2OSzmcVAbkm05WEDGVxERxVhTjHDIl90yjpjq7urtPl > Error locking on node xxxxxx: Volume group for uuid not found: > > CyPYYtsmPYFg2M11glsWM2OSzmcVAbkm05WEDGVxERxVhTjHDIl90yjpjq7urtPl > Failed to suspend lvol2 > > > # vgdisplay -v vgec_rde0_pdb > Using volume group(s) on command line > Finding volume group "vgec_rde0_pdb" > --- Volume group --- > VG Name vgec_rde0_pdb > System ID > Format lvm2 > Metadata Areas 4 > Metadata Sequence No 9 > VG Access read/write > VG Status resizable > Clustered yes > Shared no > MAX LV 255 > Cur LV 7 > Open LV 7 > Max PV 150 > Cur PV 4 > Act PV 4 > VG Size 269.62 GB > PE Size 32.00 MB > Total PE 8628 > Alloc PE / Size 6752 / 211.00 GB > Free PE / Size 1876 / 58.62 GB > VG UUID CyPYYt-smPY-Fg2M-11gl-sWM2-OSzm-cVAbkm > > > # lvdisplay -v /dev/vgec_rde0_pdb/lvol2 > Using logical volume(s) on command line > --- Logical volume --- > LV Name /dev/vgec_rde0_pdb/lvol2 > VG Name vgec_rde0_pdb > LV UUID 05WEDG-VxER-xVhT-jHDI-l90y-jpjq-7urtPl > LV Write Access read/write > LV Status available > # open 1 > LV Size 60.00 GB > Current LE 1920 > Segments 1 > Allocation inherit > Read ahead sectors 0 > Block device 253:15 > > > Let me know if you have any suggetion > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From suuuper at messinalug.org Tue May 6 14:32:44 2008 From: suuuper at messinalug.org (Giovanni Mancuso) Date: Tue, 06 May 2008 16:32:44 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <1210071998.3413.21.camel@localhost.localdomain> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> <1210071998.3413.21.camel@localhost.localdomain> Message-ID: <48206C0C.3000805@messinalug.org> I use kernel: uname -a Linux sophosha1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 i686 athlon i386 GNU/Linux and release: cat /etc/redhat-release Red Hat Enterprise Linux Server release 5.1 (Tikanga) Thanks Steven Whitehouse ha scritto: > Hi, > > On Tue, 2008-05-06 at 13:04 +0200, Giovanni Mancuso wrote: > >> I use >> lock_dlm >> and my version of gfs2 is: >> GFS2 (built Oct 10 2007 16:34:59) installed >> >> > Built from what exactly? Linus' kernel tree? the -nmw git tree? Some > distribution or other? > > I suspect that you probably need to upgrade to a newer kernel version > though given that date. Ideally as recent as possible, > > Steve. > > > >> Thanks >> >> >> Steven Whitehouse ha scritto: >> >>> Hi, >>> >>> I've not seen that before. What version of GFS2 are you using and are >>> you using lock_nolock or lock_dlm? >>> >>> Steve. >>> >>> On Tue, 2008-05-06 at 12:55 +0200, Giovanni Mancuso wrote: >>> >>> >>>> Hi to all, >>>> I have a problem with gfs2. 
If i try to do: watch -n1 'ls -ls /store/' >>>> i have ls: /store/new/: Input/output error >>>> and in my dmesg i have: >>>> >>>> GFS2: fsid=sophosha:gfs00.1: warning: assertion "gl->gl_state != >>>> LM_ST_UNLOCKED" failed >>>> GFS2: fsid=sophosha:gfs00.1: function = gfs2_glock_drop_th, file = >>>> fs/gfs2/glock.c, line = 963 >>>> [] gfs2_assert_warn_i+0x7e/0x108 [gfs2] >>>> [] gfs2_glock_drop_th+0x83/0xfb [gfs2] >>>> [] xmote_bh+0x10a/0x271 [gfs2] >>>> [] run_queue+0xd4/0x26e [gfs2] >>>> [] glock_work_func+0x24/0x31 [gfs2] >>>> [] run_workqueue+0x78/0xb5 >>>> [] glock_work_func+0x0/0x31 [gfs2] >>>> [] worker_thread+0xd9/0x10d >>>> [] default_wake_function+0x0/0xc >>>> [] worker_thread+0x0/0x10d >>>> [] kthread+0xc0/0xeb >>>> [] kthread+0x0/0xeb >>>> [] kernel_thread_helper+0x7/0x10 >>>> ======================= >>>> >>>> how can i solve it? >>>> >>>> P.S /store/ is my gfs2 filesystem replicaded to another machine with drbd. >>>> >>>> -- >>>> Linux-cluster mailing list >>>> Linux-cluster at redhat.com >>>> https://www.redhat.com/mailman/listinfo/linux-cluster >>>> >>>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbrassow at redhat.com Tue May 6 15:31:21 2008 From: jbrassow at redhat.com (Jonathan Brassow) Date: Tue, 6 May 2008 10:31:21 -0500 Subject: [Linux-cluster] lvextend error on Redhat-cluster suit 5.1 In-Reply-To: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> References: <0C3FC6B507AF684199E57BFCA3EAB55324BA4958@NOANDC-MXU11.NOA.Alcoa.com> Message-ID: <05861C58-13E1-4D9D-BFCF-209A566A659E@redhat.com> On May 6, 2008, at 8:40 AM, Kumar, T Santhosh (TCS) wrote: > > > Linux hostname 2.6.18-53.1.14.el5 #1 SMP Tue Feb 19 07:18:46 EST 2008 > x86_64 x86_64 x86_64 GNU/Linux > > Red Hat Enterprise Linux Server release 5.1 (Tikanga) > > # lvextend -L +4000M /dev/vgec_rde0_pdb/lvol2 > Extending logical volume lvol2 to 63.91 GB > Error locking on node xxxxxx: Volume group for uuid not found: This type of message usually implies that the machine can see a storage device that is no longer part of a volume group, but has not been wiped (pvremove). This can happen for any number of reasons. The admin may have repartitioned something and forgot to wipe the PVs... a disk failed and came back... new disks were added that had LVM metadata on them... etc. Certainly try the suggestion about 'partprobe' and restarting clvmd... If that works, great. Otherwise, you will have to find the partition with the stray PV metadata on it - perhaps best done by reconciling 'cat /proc/partitions' and 'pvs'. brassow From brian at chpc.utah.edu Tue May 6 18:57:11 2008 From: brian at chpc.utah.edu (Brian D. Haymore) Date: Tue, 06 May 2008 12:57:11 -0600 Subject: [Linux-cluster] Sanity Check Message-ID: <4820AA07.3090303@chpc.utah.edu> I tried to send this yesterday but didn't see it on the list yet so I apologize if this ends up being a duplicate. We have been starting to play with Cluster Suite as part of RHEL over the past week. Our needs, we think, are pretty basic. However we have not found enough information in the docs to help validate our plans as being sane. 
So for that I'm turning to the list in hopes someone can help. What we are trying to do is simply have 3 servers attached to a SAN with common disk storage. By common storage I simply mean the SAN is zoned so that all three servers can see this common storage. We then want to lvm, cluster lvm flag enabled, this storage such that we can create many logical volumes. Each of those LVs would have an ext3 file system on it, implying only one server at a time will mount and use it. Then we can take the N LVs and distribute them out between the three servers in a very fixed fashion. So thus far we see that we need to have CMAN running as well as have lvm2-cluster installed and having run `lvmconf --enable-cluster`. Then from system-config-lvm we created a cluster of our 3 servers. We think that is about all we need to do for this very crude initial setup. This is where we wanted to get some feedback if in fact this is an acceptable, while overly basic, configuration. Could someone offer any feedback here? We do realize that we are ignoring many of the key features of the cluster setup where we could define these LVs and their file systems as resourced as well as the services using them and have cman, rgmanager, etc help build a more robust and polished setup. We are in a time crunch for now and need to get an initial setup going thus the above question, then with time we hope to learn the other parts of the system and then migrate things in a better direction. Thanks for your time. -- Brian D. Haymore University of Utah Center for High Performance Computing 155 South 1452 East RM 405 Salt Lake City, Ut 84112-0190 Phone: (801) 558-1150, Fax: (801) 585-5366 http://www.map.utah.edu/umaplink/0019.html From sdake at redhat.com Tue May 6 19:04:10 2008 From: sdake at redhat.com (Steven Dake) Date: Tue, 06 May 2008 12:04:10 -0700 Subject: [Linux-cluster] Sanity Check In-Reply-To: <4820AA07.3090303@chpc.utah.edu> References: <4820AA07.3090303@chpc.utah.edu> Message-ID: <1210100651.7766.13.camel@balance> On Tue, 2008-05-06 at 12:57 -0600, Brian D. Haymore wrote: > I tried to send this yesterday but didn't see it on the list yet so I > apologize if this ends up being a duplicate. > > > > > We have been starting to play with Cluster Suite as part of RHEL over > the past week. Our needs, we think, are pretty basic. However we have > not found enough information in the docs to help validate our plans as > being sane. So for that I'm turning to the list in hopes someone can help. > > What we are trying to do is simply have 3 servers attached to a SAN with > common disk storage. By common storage I simply mean the SAN is zoned > so that all three servers can see this common storage. We then want to > lvm, cluster lvm flag enabled, this storage such that we can create many > logical volumes. Each of those LVs would have an ext3 file system on > it, implying only one server at a time will mount and use it. Then we > can take the N LVs and distribute them out between the three servers in > a very fixed fashion. > > So thus far we see that we need to have CMAN running as well as have > lvm2-cluster installed and having run `lvmconf --enable-cluster`. Then > from system-config-lvm we created a cluster of our 3 servers. We think > that is about all we need to do for this very crude initial setup. This > is where we wanted to get some feedback if in fact this is an > acceptable, while overly basic, configuration. Could someone offer any > feedback here? 
> > We do realize that we are ignoring many of the key features of the > cluster setup where we could define these LVs and their file systems as > resourced as well as the services using them and have cman, rgmanager, > etc help build a more robust and polished setup. We are in a time > crunch for now and need to get an initial setup going thus the above > question, then with time we hope to learn the other parts of the system > and then migrate things in a better direction. Thanks for your time. > > Your design should work fine although maybe not clearly defined in any documentation as a use case. If you have no need of shared storage for the logical volumes, then there is no need for gfs. You will need to run lvm2 (clvmd) in clustered mode so that each node sees the logical volume changes when any node makes a change. check out the documentation section on http://sources.redhat.com/cluster/wiki you may find some information there that is helpful in your configuration. Regards -steve From lhh at redhat.com Tue May 6 19:06:54 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 06 May 2008 15:06:54 -0400 Subject: [Linux-cluster] Sanity Check In-Reply-To: <4820AA07.3090303@chpc.utah.edu> References: <4820AA07.3090303@chpc.utah.edu> Message-ID: <1210100814.15248.8.camel@localhost.localdomain> On Tue, 2008-05-06 at 12:57 -0600, Brian D. Haymore wrote: > So thus far we see that we need to have CMAN running as well as have > lvm2-cluster installed and having run `lvmconf --enable-cluster`. Then > from system-config-lvm we created a cluster of our 3 servers. We think > that is about all we need to do for this very crude initial setup. This > is where we wanted to get some feedback if in fact this is an > acceptable, while overly basic, configuration. Could someone offer any > feedback here? That looks about right. You also want fencing if you're using clustered LVM to protect the LVM metadata, but I don't know if it's strictly *required* or not, since you're statically assigning VGs to specific nodes. Note that if you just assign static LUNs to each node and manage those LUNs from the SAN management interface, you don't even need lvm2-cluster or CMAN. For example, you can present only certain LUNs to certain computers. > We do realize that we are ignoring many of the key features of the > cluster setup where we could define these LVs and their file systems as > resourced as well as the services using them and have cman, rgmanager, > etc help build a more robust and polished setup. -- Lon From garromo at us.ibm.com Tue May 6 20:37:32 2008 From: garromo at us.ibm.com (Gary Romo) Date: Tue, 6 May 2008 14:37:32 -0600 Subject: [Linux-cluster] How do you verify/test fencing? Message-ID: Is there a command that you can run to test/veryify that fencing is working properly? Or that it is part of the fence if you will? I realize that the primary focus of the fence is to shut off the other server(s). However, when I have a cluster up, how can I determine that all of my nodes are properly fenced? Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Tue May 6 21:35:04 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 06 May 2008 17:35:04 -0400 Subject: [Linux-cluster] How do you verify/test fencing? 
In-Reply-To: References: Message-ID: <1210109704.15248.28.camel@localhost.localdomain> On Tue, 2008-05-06 at 14:37 -0600, Gary Romo wrote: > > Is there a command that you can run to test/veryify that fencing is > working properly? > Or that it is part of the fence if you will? > I realize that the primary focus of the fence is to shut off the other > server(s). > However, when I have a cluster up, how can I determine that all of my > nodes are properly fenced? I'm not sure exactly how to answer the question. Fencing is used to cut a node off; if all nodes are fenced, no one can access shared storage ;) * For testing whether or not fencing works, stop the cluster software on all the nodes and run 'fence_node ' (where nodename is a host you're not working on). * For testing whether or not a node will be fenced as a matter of recovery, try 'cman_tool services'. If that node's ID isn't in the "fence" section, it will not be fenced if it fails. (Note that mounting a GFS file system will fail if the node is not a part of the "fence" service.) Let me know if this answers your question. -- Lon From jas199931 at yahoo.com Tue May 6 21:36:25 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 6 May 2008 14:36:25 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <853958.85045.qm@web32207.mail.mud.yahoo.com> Message-ID: <777828.55819.qm@web32207.mail.mud.yahoo.com> > A couple of further questions about the master copy > of > lock resources. > > The first one: > ============= > > Again, assume: > 1) Node A is extremely too busy and handle all > requests > 2) other nodes are just idle and have never handled > any requests > > According to the documents, Node A will hold all > master copies initially. The thing I am not aware of > and unclear is whether the lock manager will evenly > distribute the master copies on Node A to other > nodes > when it thinks the number of master copies on Node A > is too many? > After reading the source code briefly, it seems that there is a remastering process, which will be called when recovering and rebuilding the lock directory when any node(s) failed. Correct me if I am wrong, please. > > The second one: > ============== > > Assume a master copy of lock resource is on Node A. > Now Node B holds a local copy of the lock resource. > When the lock queues changed on the local copy on > Node > B, will the master copy on Node A be updated > simultaneously? If so, when more than one nodes have > the local copy of the same lock resource, how the > lock > manager to handle the update of the master copy? > Using > another lock mechanism to prevent the corruption of > the master copy? > I have not found the answer so far. I may need to read the source code very carefully. Can anyone kindly provide any hints? Thanks again in advance. Jas > > > > > -- > > > > Chrissie > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From garromo at us.ibm.com Tue May 6 21:37:46 2008 From: garromo at us.ibm.com (Gary Romo) Date: Tue, 6 May 2008 15:37:46 -0600 Subject: [Linux-cluster] fence error messages Message-ID: I am getting these in /var/log/messages May 6 14:46:28 lxomt06e fenced[2849]: fencing node "bogusnode" May 6 14:46:28 lxomt06e fenced[2849]: fence "bogusnode" failed I am basically setting up a single-node cluster, because I don't have the 2nd node yet. So I am using manual fence in order to accomplish this. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Tue May 6 21:47:26 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 06 May 2008 17:47:26 -0400 Subject: [Linux-cluster] fence error messages In-Reply-To: References: Message-ID: <1210110446.15248.37.camel@localhost.localdomain> On Tue, 2008-05-06 at 15:37 -0600, Gary Romo wrote: > > I am getting these in /var/log/messages > > May 6 14:46:28 lxomt06e fenced[2849]: fencing node "bogusnode" > May 6 14:46:28 lxomt06e fenced[2849]: fence "bogusnode" failed > > I am basically setting up a single-node cluster, because I don't have > the 2nd node yet. > So I am using manual fence in order to accomplish this. > > > > > nodename="bogusnode"/> > > do you have a manual_fence in the section? -- Lon From ccaulfie at redhat.com Wed May 7 06:58:04 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 07 May 2008 07:58:04 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <853958.85045.qm@web32207.mail.mud.yahoo.com> References: <853958.85045.qm@web32207.mail.mud.yahoo.com> Message-ID: <482152FC.4070707@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>> --- Christine Caulfield >> wrote: >>> >>>>> DLM lockspace 'data' >>>>> 5 2f06768 1 >>>>> 5 114d15 1 >>>>> 5 120b13 1 >>>>> 5 5bd1f04 1 >>>>> 3 6a02f8 2 >>>>> 5 cb7604 1 >>>>> 5 ca187b 1 >>>>> >>>> The first two numbers are the lock name. Don't >> ask >>>> me what they mean, >>>> that's a GFS question! (actually, I think inode >>>> numbers might be >>>> involved) The last number is the nodeID on which >> the >>>> lock is mastered. >>> >>> Great, thanks again! >>> >>> >>>>>> That lookup only happens the first time >>>>>> a resource is used by a node, once the >>>>>> node knows where the master is, >>>>>> it does not need to look it up again, >>>>>> unless it releases all >>>>>> locks on the resource. >>>>>> >>>>> Oh, I see. Just to further clarify, does it >> means >>>> if >>>>> the same lock resource is required again by an >>>>> application on the node A, the node A will go >>>> straight >>>>> to the known node (ie the node B) which holds >> the >>>>> master previously, but needs to lookup again if >>>> the >>>>> node B has already released the lock resource? >>>> Not quite. A resource is mastered on a node for >> as >>>> long as there are >>>> locks for it. If node A gets the lock (which is >>>> mastered on node B) then >>>> it knows always to go do node B until all locks >> on >>>> node A are released. >>>> When that happens the local copy of the resource >> on >>>> node A is released >>>> including the reference to node B. 
If all the >> locks >>>> on node B are >>>> released (but A still has some) then the resource >>>> will stay mastered on >>>> node B and nodes that still have locks on that >>>> resource will know where >>>> to find it without a directory lookup. >>>> >>> Aha, I think I missed another important concept -- >> a >>> local copy of lock resources. I did not realise >> the >>> existence of the local copy of lock resources. >> Which >>> file should I check to figure out how many local >>> copies a node has and what the local copies are? >> All the locks are displayed in >> /proc/cluster/dlm_locks, that shows you >> which are local copies and which are masters. > > > A couple of further questions about the master copy of > lock resources. > > The first one: > ============= > > Again, assume: > 1) Node A is extremely too busy and handle all > requests > 2) other nodes are just idle and have never handled > any requests > > According to the documents, Node A will hold all > master copies initially. The thing I am not aware of > and unclear is whether the lock manager will evenly > distribute the master copies on Node A to other nodes > when it thinks the number of master copies on Node A > is too many? Locks are only remastered when a node leaves the cluster. In that case all of its nodes will be moved to another node. We do not do dynamic remastering - a resource that is mastered on one node will stay mastered on that node regardless of traffic or load, until all users of the resource have been freed. > The second one: > ============== > > Assume a master copy of lock resource is on Node A. > Now Node B holds a local copy of the lock resource. > When the lock queues changed on the local copy on Node > B, will the master copy on Node A be updated > simultaneously? If so, when more than one nodes have > the local copy of the same lock resource, how the lock > manager to handle the update of the master copy? Using > another lock mechanism to prevent the corruption of > the master copy? > All locking happens on the master node. The local copy is just that, a copy. It is updated when the master confirms what has happened. The local copy is there mainly for rebuilding the resource table when a master leaves the cluster, and to keep a track of locks that exist on the local node. The local copy is NOT complete. it only contains local users of a resource. -- Chrissie From maciej.bogucki at artegence.com Wed May 7 07:24:53 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Wed, 07 May 2008 09:24:53 +0200 Subject: [Linux-cluster] mounting as non root In-Reply-To: <149194.48263.qm@web32401.mail.mud.yahoo.com> References: <149194.48263.qm@web32401.mail.mud.yahoo.com> Message-ID: <48215945.7090704@artegence.com> nch napisa?(a): > > I tried that, unsuccessfully. 
> The relevant line in my fstab is: > /dev/gnbd/disk /mnt/shared gfs2 user,noauto 0 0 > > And this is the error msg when trying "mount /mnt/shared" as a non > root user > /sbin/mount.gfs2: error mounting /dev/gnbd/disk on /mnt/shared: > Operation not permitted Please paste me the output of: "id" and "cat /etc/fstab" Best Regards Maciej Bogucki From maciej.bogucki at artegence.com Wed May 7 07:30:16 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Wed, 07 May 2008 09:30:16 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <48206C0C.3000805@messinalug.org> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> <1210071998.3413.21.camel@localhost.localdomain> <48206C0C.3000805@messinalug.org> Message-ID: <48215A88.2020805@artegence.com> Giovanni Mancuso napisa?(a): > I use kernel: > uname -a > Linux sophosha1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 > i686 athlon i386 GNU/Linux > > and release: > cat /etc/redhat-release > Red Hat Enterprise Linux Server release 5.1 (Tikanga) Hello, You could try to upgrade kernel to the newer one and the rest of the packages. Best Regards Maciej Bogucki From swhiteho at redhat.com Wed May 7 08:33:51 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 07 May 2008 09:33:51 +0100 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <48215A88.2020805@artegence.com> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> <1210071998.3413.21.camel@localhost.localdomain> <48206C0C.3000805@messinalug.org> <48215A88.2020805@artegence.com> Message-ID: <1210149231.3345.1.camel@localhost.localdomain> Hi, On Wed, 2008-05-07 at 09:30 +0200, Maciej Bogucki wrote: > Giovanni Mancuso napisa?(a): > > I use kernel: > > uname -a > > Linux sophosha1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 > > i686 athlon i386 GNU/Linux > > > > and release: > > cat /etc/redhat-release > > Red Hat Enterprise Linux Server release 5.1 (Tikanga) > > Hello, > > You could try to upgrade kernel to the newer one and the rest of the > packages. > > Best Regards > Maciej Bogucki > Yes, thats certainly worth doing, although RHEL 5.1 kernels are not the best testing ground for GFS2. I'd suggest using a Fedora kernel for testing purposes, or at least the latest 5.2 kernel if you really need to use RHEL. Steve. From jas199931 at yahoo.com Wed May 7 10:34:31 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 7 May 2008 03:34:31 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <482152FC.4070707@redhat.com> Message-ID: <424492.84161.qm@web32205.mail.mud.yahoo.com> > > > > A couple of further questions about the master > copy of > > lock resources. > > > > The first one: > > ============= > > > > Again, assume: > > 1) Node A is extremely too busy and handle all > > requests > > 2) other nodes are just idle and have never > handled > > any requests > > > > According to the documents, Node A will hold all > > master copies initially. The thing I am not aware > of > > and unclear is whether the lock manager will > evenly > > distribute the master copies on Node A to other > nodes > > when it thinks the number of master copies on Node > A > > is too many? > > Locks are only remastered when a node leaves the > cluster. In that case > all of its nodes will be moved to another node. 
We > do not do dynamic > remastering - a resource that is mastered on one > node will stay mastered > on that node regardless of traffic or load, until > all users of the > resource have been freed. Thank you very much. > > > The second one: > > ============== > > > > Assume a master copy of lock resource is on Node > A. > > Now Node B holds a local copy of the lock > resource. > > When the lock queues changed on the local copy on > Node > > B, will the master copy on Node A be updated > > simultaneously? If so, when more than one nodes > have > > the local copy of the same lock resource, how the > lock > > manager to handle the update of the master copy? > Using > > another lock mechanism to prevent the corruption > of > > the master copy? > > > > All locking happens on the master node. The local > copy is just that, a > copy. It is updated when the master confirms what > has happened. The > local copy is there mainly for rebuilding the > resource table when a > master leaves the cluster, and to keep a track of > locks that exist on > the local node. The local copy is NOT complete. it > only contains local > users of a resource. > Thanks again for the kind and detailed explanation. I am sorry I have to bother you again as I am having more questions. I analysed /proc/cluster/dlm_dir and dlm_locks and found some strange things. Please see below: >From /proc/cluster/dlm_dir: In lock space [ABC]: This node (node 2) has 445 lock resources in total where --328 master lock resources --117 local copies of lock resources mastered on other nodes. =============================== =============================== >From /proc/cluster/dlm_locks: In lock space [ABC]: There are 1678 lock resouces in use where --1674 lock resources are mastered by this node (node 2) --4 lock resources are mastered by other nodes, within which: ----1 lock resource mastered on node 1 ----1 lock resource mastered on node 3 ----1 lock resource mastered on node 4 ----1 lock resource mastered on node 5 A typical master lock resource in /proc/cluster/dlm_locks is: Resource 000001000de4fd88 (parent 0000000000000000). Name (len=24) " 3 5fafc85" Master Copy LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 Granted Queue 1ff5036d NL Remote: 4 000603e8 80d2013f NL Remote: 5 00040214 00240209 NL Remote: 3 0001031d 00080095 NL Remote: 1 00040197 00010304 NL Conversion Queue Waiting Queue After search for local copy in /proc/cluster/dlm_locks, I got: Resource 000001002a273618 (parent 0000000000000000). Name (len=16) "withdraw 3......" Local Copy, Master is node 3 Granted Queue 0004008d PR Master: 0001008c Conversion Queue Waiting Queue -- Resource 000001003fe69b68 (parent 0000000000000000). Name (len=16) "withdraw 5......" Local Copy, Master is node 5 Granted Queue 819402ef PR Master: 00010317 Conversion Queue Waiting Queue -- Resource 000001002a2732e8 (parent 0000000000000000). Name (len=16) "withdraw 1......" Local Copy, Master is node 1 Granted Queue 000401e9 PR Master: 00010074 Conversion Queue Waiting Queue -- Resource 000001004a32e598 (parent 0000000000000000). Name (len=16) "withdraw 4......" Local Copy, Master is node 4 Granted Queue 1f5b0317 PR Master: 00010203 Conversion Queue Waiting Queue These four local copy of lock resources have been staying in /proc/cluster/dlm_locks for several days. Now my questions: 1. In my case, for the same lock space, the number of master lock resources reported by dlm_dir is much SMALLER than that reported in dlm_locks. 
My understanding is that master lock resources listed in dlm_dir must be larger than or at least the same as that reported in dlm_locks. The situation I discovered on the node does not make any sense to me. Am I missing anything? Can you help me to clarify the case? 2. What can cause "withdraw ...." to be the lock resource name? 3. These four local copy of lock resources have not been released for at least serveral days as I knew. How can I find out whether they are in a strange dead situation or are still waiting for the lock manager to release them? How to change the timeout? Thank you very much for your great further help in advance. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From suuuper at messinalug.org Wed May 7 11:07:08 2008 From: suuuper at messinalug.org (Giovanni Mancuso) Date: Wed, 07 May 2008 13:07:08 +0200 Subject: [Linux-cluster] Problem gfs2 and drbd In-Reply-To: <1210149231.3345.1.camel@localhost.localdomain> References: <4820391D.1070601@messinalug.org> <1210071644.3413.18.camel@localhost.localdomain> <48203B51.7030705@messinalug.org> <1210071998.3413.21.camel@localhost.localdomain> <48206C0C.3000805@messinalug.org> <48215A88.2020805@artegence.com> <1210149231.3345.1.camel@localhost.localdomain> Message-ID: <48218D5C.5040604@messinalug.org> Ok, now i try to upgrade my kernel with the latest 5.2 kernel. Thanks Steven Whitehouse ha scritto: > Hi, > > On Wed, 2008-05-07 at 09:30 +0200, Maciej Bogucki wrote: > >> Giovanni Mancuso napisa?(a): >> >>> I use kernel: >>> uname -a >>> Linux sophosha1 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:02 EDT 2007 >>> i686 athlon i386 GNU/Linux >>> >>> and release: >>> cat /etc/redhat-release >>> Red Hat Enterprise Linux Server release 5.1 (Tikanga) >>> >> Hello, >> >> You could try to upgrade kernel to the newer one and the rest of the >> packages. >> >> Best Regards >> Maciej Bogucki >> >> > > Yes, thats certainly worth doing, although RHEL 5.1 kernels are not the > best testing ground for GFS2. I'd suggest using a Fedora kernel for > testing purposes, or at least the latest 5.2 kernel if you really need > to use RHEL. > > Steve. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at tangent.co.za Wed May 7 11:19:13 2008 From: lists at tangent.co.za (Chris Picton) Date: Wed, 7 May 2008 11:19:13 +0000 (UTC) Subject: [Linux-cluster] Re: How do you verify/test fencing? References: <1210109704.15248.28.camel@localhost.localdomain> Message-ID: On Tue, 06 May 2008 17:35:04 -0400, Lon Hohberger wrote: > On Tue, 2008-05-06 at 14:37 -0600, Gary Romo wrote: >> >> Is there a command that you can run to test/veryify that fencing is >> working properly? >> Or that it is part of the fence if you will? I realize that the primary >> focus of the fence is to shut off the other server(s). >> However, when I have a cluster up, how can I determine that all of my >> nodes are properly fenced? > > > * For testing whether or not fencing works, stop the cluster software on > all the nodes and run 'fence_node ' (where nodename is a host > you're not working on). > > * For testing whether or not a node will be fenced as a matter of > recovery, try 'cman_tool services'. 
If that node's ID isn't in the > "fence" section, it will not be fenced if it fails. These two step will not test that a node will be fenced automatically if it is malfunctioning. What can be done to a cluster node, to test that it will be automatically fenced if there is a problem. From vimal at monster.co.in Wed May 7 11:41:10 2008 From: vimal at monster.co.in (Vimal Gupta) Date: Wed, 07 May 2008 17:11:10 +0530 Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: References: Message-ID: <48219556.9060901@monster.co.in> Hi, I have the same question.??? Anybody has the answer Please.......??? Chris Picton wrote: > Hi All > > I am investigating a new cluster installation. > > Documentation from redhat indicates that GFS2 is not yet production > ready. Tests I have run show it is *much* faster that gfs for my > workload. > > Is GFS2 not production-ready due to lack of testing, or due to known bugs? > > Any advice would be appreciated > > Chris > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Vimal Gupta Sr. System Administrator Monster.com India Pvt.Ltd. FC - 23, Block - B, 1st Floor, Film City, Sector - 16 A, NOIDA, UP 201 301, INDIA Ph# : +91-120-4024230 Fax: +91-40-66506449 Mobile: +91-9811150360 From oliveiros.cristina at gmail.com Wed May 7 11:44:28 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Wed, 7 May 2008 12:44:28 +0100 Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: <48219556.9060901@monster.co.in> References: <48219556.9060901@monster.co.in> Message-ID: And I have the same question, also. Best, Oliveiros 2008/5/7 Vimal Gupta : > Hi, > > I have the same question.??? > Anybody has the answer Please.......??? > > > Chris Picton wrote: > > > Hi All > > > > I am investigating a new cluster installation. > > > > Documentation from redhat indicates that GFS2 is not yet production > > ready. Tests I have run show it is *much* faster that gfs for my workload. > > > > Is GFS2 not production-ready due to lack of testing, or due to known > > bugs? > > > > Any advice would be appreciated > > > > Chris > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > -- > > Vimal Gupta > Sr. System Administrator > Monster.com India Pvt.Ltd. > FC - 23, Block - B, 1st Floor, Film City, Sector - 16 A, > NOIDA, UP 201 301, INDIA > Ph# : +91-120-4024230 Fax: +91-40-66506449 Mobile: +91-9811150360 > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jb at soe.se Wed May 7 11:44:51 2008 From: jb at soe.se (=?ISO-8859-1?Q?Jonas_Bj=F6rklund?=) Date: Wed, 7 May 2008 13:44:51 +0200 (CEST) Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: <48219556.9060901@monster.co.in> References: <48219556.9060901@monster.co.in> Message-ID: Hello, I would like to know also... /Jonas On Wed, 7 May 2008, Vimal Gupta wrote: > Hi, > > I have the same question.??? > Anybody has the answer Please.......??? > > Chris Picton wrote: >> Hi All >> >> I am investigating a new cluster installation. >> >> Documentation from redhat indicates that GFS2 is not yet production ready. >> Tests I have run show it is *much* faster that gfs for my workload. >> >> Is GFS2 not production-ready due to lack of testing, or due to known bugs? 
>> >> Any advice would be appreciated >> >> Chris >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> >> > > > -- > > Vimal Gupta > Sr. System Administrator > Monster.com India Pvt.Ltd. > FC - 23, Block - B, 1st Floor, Film City, Sector - 16 A, > NOIDA, UP 201 301, INDIA > Ph# : +91-120-4024230 Fax: +91-40-66506449 Mobile: +91-9811150360 > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From gordan at bobich.net Wed May 7 11:53:50 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 7 May 2008 12:53:50 +0100 (BST) Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: References: <48219556.9060901@monster.co.in> Message-ID: For some reason, I always worry when people whether something that isn't production ready _REALLY_ isn't production ready, or whether the developers are just saying it isn't production ready for fun... IIRC, the plan was that it will be ready by RHEL5.1, but additional critical bugs were discovered, the fixes for which have, to my knowledge, not made it into the distro yet. Gordan On Wed, 7 May 2008, Jonas Bj?rklund wrote: > Hello, > > I would like to know also... > > /Jonas > > On Wed, 7 May 2008, Vimal Gupta wrote: > >> Hi, >> >> I have the same question.??? >> Anybody has the answer Please.......??? >> >> Chris Picton wrote: >>> Hi All >>> >>> I am investigating a new cluster installation. >>> >>> Documentation from redhat indicates that GFS2 is not yet production >>> ready. >>> Tests I have run show it is *much* faster that gfs for my workload. >>> >>> Is GFS2 not production-ready due to lack of testing, or due to known >>> bugs? >>> >>> Any advice would be appreciated >>> >>> Chris >>> >>> -- >>> Linux-cluster mailing list >>> Linux-cluster at redhat.com >>> https://www.redhat.com/mailman/listinfo/linux-cluster >>> >>> >>> >> >> >> -- >> >> Vimal Gupta >> Sr. System Administrator >> Monster.com India Pvt.Ltd. >> FC - 23, Block - B, 1st Floor, Film City, Sector - 16 A, >> NOIDA, UP 201 301, INDIA >> Ph# : +91-120-4024230 Fax: +91-40-66506449 Mobile: +91-9811150360 >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster >> >> > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From swhiteho at redhat.com Wed May 7 12:09:21 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 07 May 2008 13:09:21 +0100 Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: References: <48219556.9060901@monster.co.in> Message-ID: <1210162161.3345.26.camel@localhost.localdomain> Hi, On Wed, 2008-05-07 at 13:44 +0200, Jonas Bj?rklund wrote: > Hello, > > I would like to know also... > > /Jonas > > On Wed, 7 May 2008, Vimal Gupta wrote: > > > Hi, > > > > I have the same question.??? > > Anybody has the answer Please.......??? > > > > Chris Picton wrote: > >> Hi All > >> > >> I am investigating a new cluster installation. > >> > >> Documentation from redhat indicates that GFS2 is not yet production ready. > >> Tests I have run show it is *much* faster that gfs for my workload. > >> > >> Is GFS2 not production-ready due to lack of testing, or due to known bugs? > >> > >> Any advice would be appreciated > >> > >> Chris > >> The answer is a bit of both. We are getting to the stage where the known bugs are mostly solved or will be very shortly. 
You can see the state of the bug list at any time by going to bugzilla.redhat.com and looking for any bug with gfs2 in the summary line. There are currently approx 70 such bugs, but please bear in mind that a large number of these are asking for new features, and some of them are duplicates of the same bug across different versions of RHEL and/or Fedora. We are currently at a stage where having a large number of people helping us in testing would be very helpful. If you have your own favourite filesystem test, or if you are in a position to run a test application, then we would be very interested in any reports of success/failure. If you do have any problems, then please do: o Check bugzilla to see if someone else has had the same problem o Report them (preferably via bugzilla, as that ensures that they won't get lost somewhere) o Report them as "Fedora, rawhide" if they relate to the upstream kernel (either Linus' tree or my -nmw git tree) and indicate in the comments section which of these kernels you were using o Send patches if you have them, but please don't let that stop you reporting bugs. All reports are useful. We might not be able to always fix each and every report right away, but sometimes patterns emerge via a number of reports which do allow us to home in on a particularly tricky issue. o If you experience a hang, then please include (if possible): - A glock lock dump from all nodes (via debugfs) - A dlm lock dump from all nodes (via debugfs) - A stack trace from all nodes (echo t >/proc/sysrq-trigger) o If you experience an oops, then please make sure that you include all the messages (including those which might have been logged just before the oops itself). The more people we have testing & reporting bugs, the quicker we can approach stability. There is one issue which I'm currently working on relating to a (fairly rare, but nonetheless possible) race. This happens when two threads calling ->readpage() race with each other. The reason that this is problematic is that its the one place left where we are using "try locks" to get around the page lock/glock lock ordering problem and the VFS's AOP_TRUNCATED_PAGE return code is not guaranteed to result in ->readpage() being called again if another ->readpage() has raced with it and brought the page uptodate. As a result "try locks" are the only option, but for long and complicated reasons when a "try lock" is queued it might end up triggering a demotion (if a request is pending from a remote node) which deadlocks due to page lock/glock ordering. The patch I'm working on at the moment, fixes that problem by failing the glock (GLR_TRYFAILED) if a demote is needed and scheduling the glock workqueue to deal with the demotion, thus avoiding the race. The try lock will then be retried at a later date when it can be successful. The bugzilla for this is #432057 if you want to follow my progress. Steve. From swhiteho at redhat.com Wed May 7 12:21:31 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Wed, 07 May 2008 13:21:31 +0100 Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: References: <48219556.9060901@monster.co.in> Message-ID: <1210162891.3345.38.camel@localhost.localdomain> Hi, On Wed, 2008-05-07 at 12:53 +0100, gordan at bobich.net wrote: > For some reason, I always worry when people whether something that isn't > production ready _REALLY_ isn't production ready, or whether the > developers are just saying it isn't production ready for fun... 
> > IIRC, the plan was that it will be ready by RHEL5.1, but additional > critical bugs were discovered, the fixes for which have, to my > knowledge, not made it into the distro yet. > > Gordan > This issue is that the rules for updating RHEL are that we can't put in updates to GFS2 in RHEL 5.1 because GFS2 is a demo feature in 5.1 and we don't want to potentially risk adding bugs by fixing unsupported features. I know that it seems to have been a long time but, I hope, understandably, we are cautious of risking other people's important data on the filesystem until we are sure that we've sorted out all the issues and have been through extensive testing. The net result is that there is a delay between the "appears to work ok" stage and the "this is supported" stage and thats more or less inevitable. Fedora (and rawhide in particular) is there to provide the "bleeding edge" code for testing purposes ahead of the RHEL releases. I know that we've been a bit slow in pushing updates (particularly of the gfs2-utils and cman packages) into Fedora in the past. Thats changing and we should be much better at keeping those uptodate in the future. The gfs2-utils package was recently updated and cman is on the list to be done shortly, Steve. From Sayed.Mujtaba at in.unisys.com Wed May 7 12:20:58 2008 From: Sayed.Mujtaba at in.unisys.com (Mujtaba, Sayed Mohammed) Date: Wed, 7 May 2008 17:50:58 +0530 Subject: [Linux-cluster] Red Hat cluster Manager(cman) and rgmanager Message-ID: Hi, I am interested in studying design of Cluster Manager (cman) and rgmanager. Is there any specific document available focusing on development of these? Or let me know where I can get more information about it as only going through available source code in Red Hat site to understand it is not a good idea. Thanks, -Mujtaba -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Wed May 7 12:27:14 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 7 May 2008 13:27:14 +0100 (BST) Subject: [Linux-cluster] GFS vs GFS2 In-Reply-To: <1210162891.3345.38.camel@localhost.localdomain> References: <48219556.9060901@monster.co.in> <1210162891.3345.38.camel@localhost.localdomain> Message-ID: On Wed, 7 May 2008, Steven Whitehouse wrote: > On Wed, 2008-05-07 at 12:53 +0100, gordan at bobich.net wrote: >> For some reason, I always worry when people whether something that isn't >> production ready _REALLY_ isn't production ready, or whether the >> developers are just saying it isn't production ready for fun... >> >> IIRC, the plan was that it will be ready by RHEL5.1, but additional >> critical bugs were discovered, the fixes for which have, to my >> knowledge, not made it into the distro yet. >> > This issue is that the rules for updating RHEL are that we can't put in > updates to GFS2 in RHEL 5.1 because GFS2 is a demo feature in 5.1 and we > don't want to potentially risk adding bugs by fixing unsupported > features. I know that it seems to have been a long time but, I hope, > understandably, we are cautious of risking other people's important data > on the filesystem until we are sure that we've sorted out all the issues > and have been through extensive testing. I think you misunderstood - I fully suport the approach you are taking of ensuring that RHEL features are totally stable. Those that want to play with unstable features always have FC available. 
:) Gordan From underscore_dot at yahoo.com Wed May 7 15:03:12 2008 From: underscore_dot at yahoo.com (nch) Date: Wed, 7 May 2008 08:03:12 -0700 (PDT) Subject: [Linux-cluster] GFS vs GFS2 Message-ID: <879831.53021.qm@web32408.mail.mud.yahoo.com> Hi, I think I'll post mine. I'm using a GNBD device formated as GFS2 (min-gfs.txt) to share a Compass/Lucene search engine index between two instances of a web app. If one of the instances creates the index, the other one won't be able to read it, whether the first one is running or not, throwing java.io.IOException: read past EOF. I might have configured sth wrong, but the thing is that if I format the device as GFS, instead of GFS2, then this issue does not occur. Regards ----- Original Message ---- From: Steven Whitehouse To: linux clustering Sent: Wednesday, May 7, 2008 2:09:21 PM Subject: Re: [Linux-cluster] GFS vs GFS2 Hi, On Wed, 2008-05-07 at 13:44 +0200, Jonas Bj?rklund wrote: > Hello, > > I would like to know also... > > /Jonas > > On Wed, 7 May 2008, Vimal Gupta wrote: > > > Hi, > > > > I have the same question.??? > > Anybody has the answer Please.......??? > > > > Chris Picton wrote: > >> Hi All > >> > >> I am investigating a new cluster installation. > >> > >> Documentation from redhat indicates that GFS2 is not yet production ready. > >> Tests I have run show it is *much* faster that gfs for my workload. > >> > >> Is GFS2 not production-ready due to lack of testing, or due to known bugs? > >> > >> Any advice would be appreciated > >> > >> Chris > >> The answer is a bit of both. We are getting to the stage where the known bugs are mostly solved or will be very shortly. You can see the state of the bug list at any time by going to bugzilla.redhat.com and looking for any bug with gfs2 in the summary line. There are currently approx 70 such bugs, but please bear in mind that a large number of these are asking for new features, and some of them are duplicates of the same bug across different versions of RHEL and/or Fedora. We are currently at a stage where having a large number of people helping us in testing would be very helpful. If you have your own favourite filesystem test, or if you are in a position to run a test application, then we would be very interested in any reports of success/failure. If you do have any problems, then please do: o Check bugzilla to see if someone else has had the same problem o Report them (preferably via bugzilla, as that ensures that they won't get lost somewhere) o Report them as "Fedora, rawhide" if they relate to the upstream kernel (either Linus' tree or my -nmw git tree) and indicate in the comments section which of these kernels you were using o Send patches if you have them, but please don't let that stop you reporting bugs. All reports are useful. We might not be able to always fix each and every report right away, but sometimes patterns emerge via a number of reports which do allow us to home in on a particularly tricky issue. o If you experience a hang, then please include (if possible): - A glock lock dump from all nodes (via debugfs) - A dlm lock dump from all nodes (via debugfs) - A stack trace from all nodes (echo t >/proc/sysrq-trigger) o If you experience an oops, then please make sure that you include all the messages (including those which might have been logged just before the oops itself). The more people we have testing & reporting bugs, the quicker we can approach stability. There is one issue which I'm currently working on relating to a (fairly rare, but nonetheless possible) race. 
This happens when two threads calling ->readpage() race with each other. The reason that this is problematic is that its the one place left where we are using "try locks" to get around the page lock/glock lock ordering problem and the VFS's AOP_TRUNCATED_PAGE return code is not guaranteed to result in ->readpage() being called again if another ->readpage() has raced with it and brought the page uptodate. As a result "try locks" are the only option, but for long and complicated reasons when a "try lock" is queued it might end up triggering a demotion (if a request is pending from a remote node) which deadlocks due to page lock/glock ordering. The patch I'm working on at the moment, fixes that problem by failing the glock (GLR_TRYFAILED) if a demote is needed and scheduling the glock workqueue to deal with the demotion, thus avoiding the race. The try lock will then be retried at a later date when it can be successful. The bugzilla for this is #432057 if you want to follow my progress. Steve. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From garromo at us.ibm.com Wed May 7 15:34:50 2008 From: garromo at us.ibm.com (Gary Romo) Date: Wed, 7 May 2008 09:34:50 -0600 Subject: [Linux-cluster] fence error messages In-Reply-To: <1210110446.15248.37.camel@localhost.localdomain> Message-ID: Yes I do. Gary Romo IBM Global Technology Services 303.458.4415 Email: garromo at us.ibm.com Pager:1.877.552.9264 Text message: gromo at skytel.com Lon Hohberger Sent by: linux-cluster-bounces at redhat.com 05/06/2008 03:47 PM Please respond to linux clustering To linux clustering cc Subject Re: [Linux-cluster] fence error messages On Tue, 2008-05-06 at 15:37 -0600, Gary Romo wrote: > > I am getting these in /var/log/messages > > May 6 14:46:28 lxomt06e fenced[2849]: fencing node "bogusnode" > May 6 14:46:28 lxomt06e fenced[2849]: fence "bogusnode" failed > > I am basically setting up a single-node cluster, because I don't have > the 2nd node yet. > So I am using manual fence in order to accomplish this. > > > > > nodename="bogusnode"/> > > do you have a manual_fence in the section? -- Lon -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From jfranz at freerun.com Wed May 7 16:23:06 2008 From: jfranz at freerun.com (Jerry Franz) Date: Wed, 07 May 2008 09:23:06 -0700 Subject: [Linux-cluster] Multipathd not reliably picking up GNBD devices on client machines Message-ID: <4821D76A.3010806@freerun.com> I've about run out of ideas. I have assembled a HA stack on Redhat Cluster where a pair of machines running DRBD in Primary/Primary mode over bonded gigabit ethernet interfaces serve six clustered logical volumes with GFS via GNBD to four other machines. All ethernet interfaces are bonded. I've got bonding, GNBD, CLVMD, Multipath and DRBD all happy: Except that multipathd simply refuses to *reliably* pick up the GNBD devices during system boot. 
I'll run '/etc/init.d/multipathd reload' by hand (sometimes it takes more than once) and it will sooner or later pick them up, but I can see no rhyme or reason to it: Sometimes it just works, and sometimes it doesn't. I'll boot a client machine once and everything might work fine. I'll reboot it again, and maybe multipathd won't find the GNBD devices (or maybe it will find _some_ of them). Ideas? -- Benjamin Franz From bkyoung at gmail.com Wed May 7 18:14:05 2008 From: bkyoung at gmail.com (Brandon Young) Date: Wed, 7 May 2008 13:14:05 -0500 Subject: [Linux-cluster] Re: How do you verify/test fencing? In-Reply-To: References: <1210109704.15248.28.camel@localhost.localdomain> Message-ID: <824ffea00805071114x34d6b1f1q304a60e1e52e541b@mail.gmail.com> Unplug the heartbeat cable. On Wed, May 7, 2008 at 6:19 AM, Chris Picton wrote: > On Tue, 06 May 2008 17:35:04 -0400, Lon Hohberger wrote: > > > On Tue, 2008-05-06 at 14:37 -0600, Gary Romo wrote: > >> > >> Is there a command that you can run to test/veryify that fencing is > >> working properly? > >> Or that it is part of the fence if you will? I realize that the primary > >> focus of the fence is to shut off the other server(s). > >> However, when I have a cluster up, how can I determine that all of my > >> nodes are properly fenced? > > > > > > * For testing whether or not fencing works, stop the cluster software on > > all the nodes and run 'fence_node ' (where nodename is a host > > you're not working on). > > > > * For testing whether or not a node will be fenced as a matter of > > recovery, try 'cman_tool services'. If that node's ID isn't in the > > "fence" section, it will not be fenced if it fails. > > These two step will not test that a node will be fenced automatically if > it is malfunctioning. > > What can be done to a cluster node, to test that it will be automatically > fenced if there is a problem. > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Wed May 7 18:57:13 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 07 May 2008 14:57:13 -0400 Subject: [Linux-cluster] Red Hat cluster Manager(cman) and rgmanager In-Reply-To: References: Message-ID: <1210186633.23294.0.camel@dhcp-100-19-208.bos.redhat.com> On Wed, 2008-05-07 at 17:50 +0530, Mujtaba, Sayed Mohammed wrote: > Hi, > > I am interested in studying design of Cluster Manager (cman) and > rgmanager. for rgmanager, the README has a lot of the design elements. > > Is there any specific document available focusing on development of > these? > > > > Or let me know where I can get more information about it as only > going through > > available source code in Red Hat site to understand it is not a good > idea. > > > > > > Thanks, > > -Mujtaba > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From fog at t.is Wed May 7 20:57:28 2008 From: fog at t.is (=?iso-8859-1?Q?Finnur_=D6rn_Gu=F0mundsson_-_TM_Software?=) Date: Wed, 7 May 2008 20:57:28 -0000 Subject: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F779020069C8@SKYHQAMX08.klasi.is> Hi, I have a 2 node cluster running RHEL 5.1 x86_64 and fully patched as of today. If i cold-boot the cluster (both nodes) everything comes up smoothly and i can migrate services between nodes etc... 
However when i take one node down i am having difficulties leaving the fence domain. If i kill the fence daemon on the node i am trying to remove gracefully or use cman_tool leave force and reboot it, it comes back up, cman starts and it appears to join the cluster. The CLVMD init script hangs (just sits and hangs) and rgmanager does not start up correctly. Also CLVMD and rgmanager just sit in a zombie state and i have to poweroff or fence the node to get it to reboot.... The cluster never stabilizes itself until i cold boot both nodes. Then it is OK until the next reboot. I have read something about similar cases but did not find any magic solution! ;)

My cluster.conf is attached. There is no firewall running on the machines in question (chkconfig iptables off;).

Various output from the node that is rebooted:

Output from group_tool services:
type   level  name       id        state
fence  0      default    00000000  JOIN_STOP_WAIT [1 2]
dlm    1      rgmanager  00000000  JOIN_STOP_WAIT [1 2]

Output from group_tool fenced:
1210193027 our_nodeid 1 our_name node-16
1210193027 listen 4 member 5 groupd 7
1210193029 client 3: join default
1210193029 delay post_join 120s post_fail 0s
1210193029 added 2 nodes from ccs
1210193542 client 3: dump

Various output from the other node:

Output from group_tool services:
type   level  name       id        state
fence  0      default    00010002  JOIN_START_WAIT [1 2]
dlm    1      clvmd      00020002  none [2]
dlm    1      rgmanager  00030002  FAIL_ALL_STOPPED [1 2]

Output from group_tool dump fenced:
1210191957 our_nodeid 2 our_name node-17
1210191957 listen 4 member 5 groupd 7
1210191958 client 3: join default
1210191958 delay post_join 120s post_fail 0s
1210191958 added 2 nodes from ccs
1210191958 setid default 65538
1210191958 start default 1 members 2
1210191958 do_recovery stop 0 start 1 finish 0
1210191958 node "node-16" not a cman member, cn 1
1210191958 add first victim node-16
1210191959 node "node-16" not a cman member, cn 1
1210191960 node "node-16" not a cman member, cn 1
1210191961 node "node-16" not a cman member, cn 1
1210191962 node "node-16" not a cman member, cn 1
1210191963 node "node-16" not a cman member, cn 1
1210191964 node "node-16" not a cman member, cn 1
1210191965 node "node-16" not a cman member, cn 1
1210191966 node "node-16" not a cman member, cn 1
1210191967 node "node-16" not a cman member, cn 1
1210191968 node "node-16" not a cman member, cn 1
1210191969 node "node-16" not a cman member, cn 1
1210191970 node "node-16" not a cman member, cn 1
1210191971 node "node-16" not a cman member, cn 1
1210191972 node "node-16" not a cman member, cn 1
1210191973 node "node-16" not a cman member, cn 1
1210191974 reduce victim node-16
1210191974 delay of 16s leaves 0 victims
1210191974 finish default 1
1210191974 stop default
1210191974 start default 2 members 1 2
1210191974 do_recovery stop 1 start 2 finish 1
1210193633 client 3: dump

Thanks in advance.

Kær kveðja / Best Regards,
Finnur Örn Guðmundsson
Network Engineer - Network Operations
fog at t.is

TM Software
Urðarhvarf 6, IS-203 Kópavogur, Iceland
Tel: +354 545 3000 - fax +354 545 3610
www.tm-software.is

This e-mail message and any attachments are confidential and may be privileged. TM Software e-mail disclaimer: www.tm-software.is/disclaimer

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf Type: application/octet-stream Size: 2036 bytes Desc: cluster.conf URL: From garromo at us.ibm.com Wed May 7 22:36:34 2008 From: garromo at us.ibm.com (Gary Romo) Date: Wed, 7 May 2008 16:36:34 -0600 Subject: [Linux-cluster] Re: How do you verify/test fencing? In-Reply-To: <824ffea00805071114x34d6b1f1q304a60e1e52e541b@mail.gmail.com> Message-ID: We are using multicast address. I know how to physically test wether fencing is happening or not, but what commands can you ask the cluster to report on fencing? I don't see any, but wanted to double check with the group. Thanks. Gary Romo "Brandon Young" Sent by: linux-cluster-bounces at redhat.com 05/07/2008 12:14 PM Please respond to linux clustering To "linux clustering" cc Subject Re: [Linux-cluster] Re: How do you verify/test fencing? Unplug the heartbeat cable. On Wed, May 7, 2008 at 6:19 AM, Chris Picton wrote: On Tue, 06 May 2008 17:35:04 -0400, Lon Hohberger wrote: > On Tue, 2008-05-06 at 14:37 -0600, Gary Romo wrote: >> >> Is there a command that you can run to test/veryify that fencing is >> working properly? >> Or that it is part of the fence if you will? I realize that the primary >> focus of the fence is to shut off the other server(s). >> However, when I have a cluster up, how can I determine that all of my >> nodes are properly fenced? > > > * For testing whether or not fencing works, stop the cluster software on > all the nodes and run 'fence_node ' (where nodename is a host > you're not working on). > > * For testing whether or not a node will be fenced as a matter of > recovery, try 'cman_tool services'. If that node's ID isn't in the > "fence" section, it will not be fenced if it fails. These two step will not test that a node will be fenced automatically if it is malfunctioning. What can be done to a cluster node, to test that it will be automatically fenced if there is a problem. -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From jas199931 at yahoo.com Wed May 7 23:41:36 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 7 May 2008 16:41:36 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <424492.84161.qm@web32205.mail.mud.yahoo.com> Message-ID: <311342.99837.qm@web32207.mail.mud.yahoo.com> --- Ja S wrote: > > > > > > A couple of further questions about the master > > copy of > > > lock resources. > > > > > > The first one: > > > ============= > > > > > > Again, assume: > > > 1) Node A is extremely too busy and handle all > > > requests > > > 2) other nodes are just idle and have never > > handled > > > any requests > > > > > > According to the documents, Node A will hold all > > > master copies initially. The thing I am not > aware > > of > > > and unclear is whether the lock manager will > > evenly > > > distribute the master copies on Node A to other > > nodes > > > when it thinks the number of master copies on > Node > > A > > > is too many? > > > > Locks are only remastered when a node leaves the > > cluster. In that case > > all of its nodes will be moved to another node. We > > do not do dynamic > > remastering - a resource that is mastered on one > > node will stay mastered > > on that node regardless of traffic or load, until > > all users of the > > resource have been freed. > > > Thank you very much. 
> > > > > > > The second one: > > > ============== > > > > > > Assume a master copy of lock resource is on Node > > A. > > > Now Node B holds a local copy of the lock > > resource. > > > When the lock queues changed on the local copy > on > > Node > > > B, will the master copy on Node A be updated > > > simultaneously? If so, when more than one nodes > > have > > > the local copy of the same lock resource, how > the > > lock > > > manager to handle the update of the master copy? > > Using > > > another lock mechanism to prevent the corruption > > of > > > the master copy? > > > > > > > All locking happens on the master node. The local > > copy is just that, a > > copy. It is updated when the master confirms what > > has happened. The > > local copy is there mainly for rebuilding the > > resource table when a > > master leaves the cluster, and to keep a track of > > locks that exist on > > the local node. The local copy is NOT complete. it > > only contains local > > users of a resource. > > > > Thanks again for the kind and detailed explanation. > > > I am sorry I have to bother you again as I am having > more questions. I analysed /proc/cluster/dlm_dir and > dlm_locks and found some strange things. Please see > below: > > > >From /proc/cluster/dlm_dir: > > In lock space [ABC]: > This node (node 2) has 445 lock resources in total > where > --328 master lock resources > --117 local copies of lock resources mastered on > other nodes. > > =============================== > =============================== > > > >From /proc/cluster/dlm_locks: > > In lock space [ABC]: > There are 1678 lock resouces in use where > --1674 lock resources are mastered by this node > (node > 2) > --4 lock resources are mastered by other nodes, > within which: > ----1 lock resource mastered on node 1 > ----1 lock resource mastered on node 3 > ----1 lock resource mastered on node 4 > ----1 lock resource mastered on node 5 > > A typical master lock resource in > /proc/cluster/dlm_locks is: > Resource 000001000de4fd88 (parent 0000000000000000). > Name (len=24) " 3 5fafc85" > Master Copy > LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00 > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > Granted Queue > 1ff5036d NL Remote: 4 000603e8 > 80d2013f NL Remote: 5 00040214 > 00240209 NL Remote: 3 0001031d > 00080095 NL Remote: 1 00040197 > 00010304 NL > Conversion Queue > Waiting Queue > > > After search for local copy in > /proc/cluster/dlm_locks, I got: > Resource 000001002a273618 (parent 0000000000000000). > Name (len=16) "withdraw 3......" > Local Copy, Master is node 3 > Granted Queue > 0004008d PR Master: 0001008c > Conversion Queue > Waiting Queue > > -- > Resource 000001003fe69b68 (parent 0000000000000000). > Name (len=16) "withdraw 5......" > Local Copy, Master is node 5 > Granted Queue > 819402ef PR Master: 00010317 > Conversion Queue > Waiting Queue > > -- > Resource 000001002a2732e8 (parent 0000000000000000). > Name (len=16) "withdraw 1......" > Local Copy, Master is node 1 > Granted Queue > 000401e9 PR Master: 00010074 > Conversion Queue > Waiting Queue > > -- > Resource 000001004a32e598 (parent 0000000000000000). > Name (len=16) "withdraw 4......" > Local Copy, Master is node 4 > Granted Queue > 1f5b0317 PR Master: 00010203 > Conversion Queue > Waiting Queue > > These four local copy of lock resources have been > staying in /proc/cluster/dlm_locks for several days. > > Now my questions: > 1. 
In my case, for the same lock space, the number > of > master lock resources reported by dlm_dir is much > SMALLER than that reported in dlm_locks. My > understanding is that master lock resources listed > in > dlm_dir must be larger than or at least the same as > that reported in dlm_locks. The situation I > discovered > on the node does not make any sense to me. Am I > missing anything? Can you help me to clarify the > case? I have found the answer. Yes, I did miss something. I need to sum all lock resources mastered by the node on all cluster members. In this case, the total number of lock resources mastered by the node is just 1674, which matches the number reported from dlm_locks. Sorry for asking the question without careful thinking. > 2. What can cause "withdraw ...." to be the lock > resource name? After read the gfs source code, it seems that this is caused by issuing a command like "gfs_tool withdraw ". However, I checked all command histroies on all nodes in the cluster, but did not find any command like this. This question and the next question remain open. Please help. > 3. These four local copy of lock resources have not > been released for at least serveral days as I knew. > How can I find out whether they are in a strange > dead > situation or are still waiting for the lock manager > to release them? How to change the timeout? > Thank you very much for your great further help in advance. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Thu May 8 07:21:31 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Thu, 08 May 2008 08:21:31 +0100 Subject: [Linux-cluster] Lock Resources In-Reply-To: <311342.99837.qm@web32207.mail.mud.yahoo.com> References: <311342.99837.qm@web32207.mail.mud.yahoo.com> Message-ID: <4822A9FB.50804@redhat.com> Ja S wrote: > --- Ja S wrote: > >>>> A couple of further questions about the master >>> copy of >>>> lock resources. >>>> >>>> The first one: >>>> ============= >>>> >>>> Again, assume: >>>> 1) Node A is extremely too busy and handle all >>>> requests >>>> 2) other nodes are just idle and have never >>> handled >>>> any requests >>>> >>>> According to the documents, Node A will hold all >>>> master copies initially. The thing I am not >> aware >>> of >>>> and unclear is whether the lock manager will >>> evenly >>>> distribute the master copies on Node A to other >>> nodes >>>> when it thinks the number of master copies on >> Node >>> A >>>> is too many? >>> Locks are only remastered when a node leaves the >>> cluster. In that case >>> all of its nodes will be moved to another node. We >>> do not do dynamic >>> remastering - a resource that is mastered on one >>> node will stay mastered >>> on that node regardless of traffic or load, until >>> all users of the >>> resource have been freed. >> >> Thank you very much. >> >> >>>> The second one: >>>> ============== >>>> >>>> Assume a master copy of lock resource is on Node >>> A. >>>> Now Node B holds a local copy of the lock >>> resource. >>>> When the lock queues changed on the local copy >> on >>> Node >>>> B, will the master copy on Node A be updated >>>> simultaneously? If so, when more than one nodes >>> have >>>> the local copy of the same lock resource, how >> the >>> lock >>>> manager to handle the update of the master copy? 
>>> Using >>>> another lock mechanism to prevent the corruption >>> of >>>> the master copy? >>>> >>> All locking happens on the master node. The local >>> copy is just that, a >>> copy. It is updated when the master confirms what >>> has happened. The >>> local copy is there mainly for rebuilding the >>> resource table when a >>> master leaves the cluster, and to keep a track of >>> locks that exist on >>> the local node. The local copy is NOT complete. it >>> only contains local >>> users of a resource. >>> >> Thanks again for the kind and detailed explanation. >> >> >> I am sorry I have to bother you again as I am having >> more questions. I analysed /proc/cluster/dlm_dir and >> dlm_locks and found some strange things. Please see >> below: >> >> >> >From /proc/cluster/dlm_dir: >> >> In lock space [ABC]: >> This node (node 2) has 445 lock resources in total >> where >> --328 master lock resources >> --117 local copies of lock resources mastered on >> other nodes. >> >> =============================== >> =============================== >> >> >> >From /proc/cluster/dlm_locks: >> >> In lock space [ABC]: >> There are 1678 lock resouces in use where >> --1674 lock resources are mastered by this node >> (node >> 2) >> --4 lock resources are mastered by other nodes, >> within which: >> ----1 lock resource mastered on node 1 >> ----1 lock resource mastered on node 3 >> ----1 lock resource mastered on node 4 >> ----1 lock resource mastered on node 5 >> >> A typical master lock resource in >> /proc/cluster/dlm_locks is: >> Resource 000001000de4fd88 (parent 0000000000000000). >> Name (len=24) " 3 5fafc85" >> Master Copy >> LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00 >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >> Granted Queue >> 1ff5036d NL Remote: 4 000603e8 >> 80d2013f NL Remote: 5 00040214 >> 00240209 NL Remote: 3 0001031d >> 00080095 NL Remote: 1 00040197 >> 00010304 NL >> Conversion Queue >> Waiting Queue >> >> >> After search for local copy in >> /proc/cluster/dlm_locks, I got: >> Resource 000001002a273618 (parent 0000000000000000). >> Name (len=16) "withdraw 3......" >> Local Copy, Master is node 3 >> Granted Queue >> 0004008d PR Master: 0001008c >> Conversion Queue >> Waiting Queue >> >> -- >> Resource 000001003fe69b68 (parent 0000000000000000). >> Name (len=16) "withdraw 5......" >> Local Copy, Master is node 5 >> Granted Queue >> 819402ef PR Master: 00010317 >> Conversion Queue >> Waiting Queue >> >> -- >> Resource 000001002a2732e8 (parent 0000000000000000). >> Name (len=16) "withdraw 1......" >> Local Copy, Master is node 1 >> Granted Queue >> 000401e9 PR Master: 00010074 >> Conversion Queue >> Waiting Queue >> >> -- >> Resource 000001004a32e598 (parent 0000000000000000). >> Name (len=16) "withdraw 4......" >> Local Copy, Master is node 4 >> Granted Queue >> 1f5b0317 PR Master: 00010203 >> Conversion Queue >> Waiting Queue >> >> These four local copy of lock resources have been >> staying in /proc/cluster/dlm_locks for several days. >> >> Now my questions: >> 1. In my case, for the same lock space, the number >> of >> master lock resources reported by dlm_dir is much >> SMALLER than that reported in dlm_locks. My >> understanding is that master lock resources listed >> in >> dlm_dir must be larger than or at least the same as >> that reported in dlm_locks. The situation I >> discovered >> on the node does not make any sense to me. Am I >> missing anything? Can you help me to clarify the >> case? > > I have found the answer. Yes, I did miss something. 
I > need to sum all lock resources mastered by the node on > all cluster members. In this case, the total number of > lock resources mastered by the node is just 1674, > which matches the number reported from dlm_locks. > Sorry for asking the question without careful > thinking. > > >> 2. What can cause "withdraw ...." to be the lock >> resource name? > > After read the gfs source code, it seems that this is > caused by issuing a command like "gfs_tool withdraw > ". However, I checked all command > histroies on all nodes in the cluster, but did not > find any command like this. This question and the next > question remain open. Please help. You might like to ask GFS-specific questions on a new thread. I don't know about GFS and the people who do are probable not reading this one by now ;-) >> 3. These four local copy of lock resources have not >> been released for at least serveral days as I knew. >> How can I find out whether they are in a strange >> dead >> situation or are still waiting for the lock manager >> to release them? How to change the timeout? There is no lock timeout for local copies. If a lock is shown in dlm_locks then either the lock is active somewhere or you have found a bug! Bear in mind that this is a DLM response, GFS does cache locks but don't know the details. -- Chrissie From jas199931 at yahoo.com Thu May 8 08:03:48 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 01:03:48 -0700 (PDT) Subject: [Linux-cluster] Lock Resources In-Reply-To: <4822A9FB.50804@redhat.com> Message-ID: <143345.94581.qm@web32208.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Ja S wrote: > > > >>>> A couple of further questions about the master > >>> copy of > >>>> lock resources. > >>>> > >>>> The first one: > >>>> ============= > >>>> > >>>> Again, assume: > >>>> 1) Node A is extremely too busy and handle all > >>>> requests > >>>> 2) other nodes are just idle and have never > >>> handled > >>>> any requests > >>>> > >>>> According to the documents, Node A will hold > all > >>>> master copies initially. The thing I am not > >> aware > >>> of > >>>> and unclear is whether the lock manager will > >>> evenly > >>>> distribute the master copies on Node A to other > >>> nodes > >>>> when it thinks the number of master copies on > >> Node > >>> A > >>>> is too many? > >>> Locks are only remastered when a node leaves the > >>> cluster. In that case > >>> all of its nodes will be moved to another node. > We > >>> do not do dynamic > >>> remastering - a resource that is mastered on one > >>> node will stay mastered > >>> on that node regardless of traffic or load, > until > >>> all users of the > >>> resource have been freed. > >> > >> Thank you very much. > >> > >> > >>>> The second one: > >>>> ============== > >>>> > >>>> Assume a master copy of lock resource is on > Node > >>> A. > >>>> Now Node B holds a local copy of the lock > >>> resource. > >>>> When the lock queues changed on the local copy > >> on > >>> Node > >>>> B, will the master copy on Node A be updated > >>>> simultaneously? If so, when more than one nodes > >>> have > >>>> the local copy of the same lock resource, how > >> the > >>> lock > >>>> manager to handle the update of the master > copy? > >>> Using > >>>> another lock mechanism to prevent the > corruption > >>> of > >>>> the master copy? > >>>> > >>> All locking happens on the master node. The > local > >>> copy is just that, a > >>> copy. It is updated when the master confirms > what > >>> has happened. 
The > >>> local copy is there mainly for rebuilding the > >>> resource table when a > >>> master leaves the cluster, and to keep a track > of > >>> locks that exist on > >>> the local node. The local copy is NOT complete. > it > >>> only contains local > >>> users of a resource. > >>> > >> Thanks again for the kind and detailed > explanation. > >> > >> > >> I am sorry I have to bother you again as I am > having > >> more questions. I analysed /proc/cluster/dlm_dir > and > >> dlm_locks and found some strange things. Please > see > >> below: > >> > >> > >> >From /proc/cluster/dlm_dir: > >> > >> In lock space [ABC]: > >> This node (node 2) has 445 lock resources in > total > >> where > >> --328 master lock resources > >> --117 local copies of lock resources mastered > on > >> other nodes. > >> > >> =============================== > >> =============================== > >> > >> > >> >From /proc/cluster/dlm_locks: > >> > >> In lock space [ABC]: > >> There are 1678 lock resouces in use where > >> --1674 lock resources are mastered by this node > >> (node > >> 2) > >> --4 lock resources are mastered by other > nodes, > >> within which: > >> ----1 lock resource mastered on node 1 > >> ----1 lock resource mastered on node 3 > >> ----1 lock resource mastered on node 4 > >> ----1 lock resource mastered on node 5 > >> > >> A typical master lock resource in > >> /proc/cluster/dlm_locks is: > >> Resource 000001000de4fd88 (parent > 0000000000000000). > >> Name (len=24) " 3 5fafc85" > >> Master Copy > >> LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 > 00 > >> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 > >> Granted Queue > >> 1ff5036d NL Remote: 4 000603e8 > >> 80d2013f NL Remote: 5 00040214 > >> 00240209 NL Remote: 3 0001031d > >> 00080095 NL Remote: 1 00040197 > >> 00010304 NL > >> Conversion Queue > >> Waiting Queue > >> > >> > >> After search for local copy in > >> /proc/cluster/dlm_locks, I got: > >> Resource 000001002a273618 (parent > 0000000000000000). > >> Name (len=16) "withdraw 3......" > >> Local Copy, Master is node 3 > >> Granted Queue > >> 0004008d PR Master: 0001008c > >> Conversion Queue > >> Waiting Queue > >> > >> -- > >> Resource 000001003fe69b68 (parent > 0000000000000000). > >> Name (len=16) "withdraw 5......" > >> Local Copy, Master is node 5 > >> Granted Queue > >> 819402ef PR Master: 00010317 > >> Conversion Queue > >> Waiting Queue > >> > >> -- > >> Resource 000001002a2732e8 (parent > 0000000000000000). > >> Name (len=16) "withdraw 1......" > >> Local Copy, Master is node 1 > >> Granted Queue > >> 000401e9 PR Master: 00010074 > >> Conversion Queue > >> Waiting Queue > >> > >> -- > >> Resource 000001004a32e598 (parent > 0000000000000000). > >> Name (len=16) "withdraw 4......" > >> Local Copy, Master is node 4 > >> Granted Queue > >> 1f5b0317 PR Master: 00010203 > >> Conversion Queue > >> Waiting Queue > >> > >> These four local copy of lock resources have been > >> staying in /proc/cluster/dlm_locks for several > days. > >> > >> Now my questions: > >> 1. In my case, for the same lock space, the > number > >> of > >> master lock resources reported by dlm_dir is much > >> SMALLER than that reported in dlm_locks. My > >> understanding is that master lock resources > listed > >> in > >> dlm_dir must be larger than or at least the same > as > >> that reported in dlm_locks. The situation I > >> discovered > >> on the node does not make any sense to me. Am I > >> missing anything? Can you help me to clarify the > >> case? > > > > I have found the answer. 
Yes, I did miss > something. I > > need to sum all lock resources mastered by the > node on > > all cluster members. In this case, the total > number of > > lock resources mastered by the node is just 1674, > > which matches the number reported from dlm_locks. > > Sorry for asking the question without careful > > thinking. > > > > > >> 2. What can cause "withdraw ...." to be the lock > >> resource name? > > > > After read the gfs source code, it seems that this > is > > caused by issuing a command like "gfs_tool > withdraw > > ". However, I checked all command > > histroies on all nodes in the cluster, but did not > > find any command like this. This question and the > next > > question remain open. Please help. > > > You might like to ask GFS-specific questions on a > new thread. I don't > know about GFS and the people who do are probable > not reading this one > by now ;-) > > > >> 3. These four local copy of lock resources have > not > >> been released for at least serveral days as I > knew. > >> How can I find out whether they are in a strange > >> dead > >> situation or are still waiting for the lock > manager > >> to release them? How to change the timeout? > > There is no lock timeout for local copies. If a lock > is shown in > dlm_locks then either the lock is active somewhere > or you have found a bug! > > Bear in mind that this is a DLM response, GFS does > cache locks but don't > know the details. >

Thank you for the information.

Best,
Jas

____________________________________________________________________________________
Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

From jas199931 at yahoo.com Thu May 8 08:49:05 2008
From: jas199931 at yahoo.com (Ja S)
Date: Thu, 8 May 2008 01:49:05 -0700 (PDT)
Subject: [Linux-cluster] GFS lock cache or bug?
Message-ID: <163785.26143.qm@web32208.mail.mud.yahoo.com>

Hi, All:

A long time ago I ran 'ls -la' just once on a subdirectory, which contains more than 30,000 small files, on a SAN storage, from Node 5, which sits in the cluster but does nothing. In other words, Node 5 is an idle node.

Now when I looked at /proc/cluster/dlm_locks on the node, I realised that there are many PR locks and the number of PR locks is pretty much the same as the number of files in the subdirectory I used to list.

Then I randomly picked some lock resources and converted the second part (hex number) of the name of the lock resources to decimal numbers, which are simply the inode numbers. Then I searched the subdirectory and confirmed that these inode numbers match the files in the subdirectory.

Now, my questions are:

1) how can I find out which unix command requires what kind of locks? Does the ls command really need a PR lock?

2) how long does GFS cache the locks?

3) whether we can configure the caching period?

4) if GFS should not cache the lock for so many days, then does it mean this is a bug?

5) Is there a way to find out which process requires a particular lock? Below is a typical record in dlm_locks on Node 5. Is any piece of information useful for identifying the process?

Resource d95d2ccc (parent 00000000). Name (len=24) " 5 cb5d35"
Local Copy, Master is node 1
Granted Queue
137203da PR Master: 73980279
Conversion Queue
Waiting Queue

6) If I am sure that no processes or applications are accessing the subdirectory, then how can I force GFS to release these PR locks so that DLM can release the corresponding lock resources as well?
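(For what it is worth, question 6 is what the glock trimming tunables discussed later in this thread address; a minimal sketch, assuming the filesystem is mounted at /mnt/abc and that the GFS module is new enough to have glock_purge, which only appears in RHEL 4.6 / RHEL 5.1 and later errata:)

    gfs_tool gettune /mnt/abc | grep -E 'demote_secs|glock_purge'
    gfs_tool settune /mnt/abc demote_secs 60   # demote unused glocks after 60s instead of the default 300
    gfs_tool settune /mnt/abc glock_purge 50   # ask gfs_scand to trim roughly 50% of unused glocks per pass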
Thank you very much for reading the questions and look forward to hearing from you. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From denisb+gmane at gmail.com Thu May 8 11:05:58 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 08 May 2008 13:05:58 +0200 Subject: [Linux-cluster] Re: How do you verify/test fencing? In-Reply-To: References: Message-ID: Gary Romo wrote: > > Is there a command that you can run to test/veryify that fencing is > working properly? > Or that it is part of the fence if you will? > I realize that the primary focus of the fence is to shut off the other > server(s). > However, when I have a cluster up, how can I determine that all of my > nodes are properly fenced? well.. If you would like to check that fencing devices are properly configured via cluster.conf, issue fence_node NODENAME pr. node. with all cluster services running. Of course, you would ideally want to do this one node at a time. This will ensure cluster.conf has proper fencing setup and that the fencing devices actually work. If the node isn't locked out of the cluster when you issue fence_node then there is something wrong.. -- Denis From s.wendy.cheng at gmail.com Thu May 8 13:28:22 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 08 May 2008 09:28:22 -0400 Subject: [Linux-cluster] GFS lock cache or bug? In-Reply-To: <163785.26143.qm@web32208.mail.mud.yahoo.com> References: <163785.26143.qm@web32208.mail.mud.yahoo.com> Message-ID: <4822FFF6.4000309@gmail.com> Ja S wrote: > Hi, All: > I have an old write-up about GFS lock cache issues. Shareroot people had pulled it into their web site: http://open-sharedroot.org/Members/marc/blog/blog-on-gfs/glock-trimming-patch/?searchterm=gfs It should explain some of your confusions. The tunables described in that write-up are formally included into RHEL 5.1 and RHEL 4.6 right now (so no need to ask for private patches). There is a long story about GFS(1)'s "ls -la" problem that one time I did plan to do something about it. Unfortunately I'm having a new job now so the better bet is probably going for GFS2. Will pass some thoughts about GFS1's "ls -la" when I have some spare time next week. -- Wendy > I used to 'ls -la' a subdirecotry, which contains more > than 30,000 small files, on a SAN storage long time > ago just once from Node 5, which sits in the cluster > but does nothing. In other words, Node 5 is an idel > node. > > Now when I looked at /proc/cluster/dlm_locks on the > node, I realised that there are many PR locks and the > number of PR clocks is pretty much the same as the > number of files in the subdirectory I used to list. > > Then I randomly picked up some lock resources and > converted the second part (hex number) of the name of > the lock resources to decimal numbers, which are > simply the inode numbers. Then I searched the > subdirectory and confirmed that these inode numbers > match the files in the subdirectory. > > > Now, my questions are: > > 1) how can I find out which unix command requires what > kind of locks? Does the ls command really need PR > lock? > > 2) how long GFS caches the locks? > > 3) whether we can configure the caching period? > > 4) if GFS should not cache the lock for so many days, > then does it mean this is a bug? > > 5) Is that a way to find out which process requires a > particular lock? 
Below is a typical record in > dlm_locks on Node 5. Is any piece of information > useful for identifing the process? > > Resource d95d2ccc (parent 00000000). Name (len=24) " > 5 cb5d35" > Local Copy, Master is node 1 > Granted Queue > 137203da PR Master: 73980279 > Conversion Queue > Waiting Queue > > > 6) If I am sure that no processes or applications are > accessing the subdirectory, then how I can force GFS > release these PR locks so that DLM can release the > corresponding lock resources as well. > > > Thank you very much for reading the questions and look > forward to hearing from you. > > Jas > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From smeacham at charter.net Thu May 8 13:51:47 2008 From: smeacham at charter.net (smeacham at charter.net) Date: Thu, 8 May 2008 13:51:47 +0000 Subject: [Linux-cluster] GFS lock cache or bug? In-Reply-To: <4822FFF6.4000309@gmail.com> References: <163785.26143.qm@web32208.mail.mud.yahoo.com><4822FFF6.4000309@gmail.com> Message-ID: <642289788-1210254609-cardhu_decombobulator_blackberry.rim.net-2131772402-@bxe151.bisx.prod.on.blackberry> Sent via BlackBerry by AT&T -----Original Message----- From: Wendy Cheng Date: Thu, 08 May 2008 09:28:22 To:linux clustering Subject: Re: [Linux-cluster] GFS lock cache or bug? Ja S wrote: > Hi, All: > I have an old write-up about GFS lock cache issues. Shareroot people had pulled it into their web site: http://open-sharedroot.org/Members/marc/blog/blog-on-gfs/glock-trimming-patch/?searchterm=gfs It should explain some of your confusions. The tunables described in that write-up are formally included into RHEL 5.1 and RHEL 4.6 right now (so no need to ask for private patches). There is a long story about GFS(1)'s "ls -la" problem that one time I did plan to do something about it. Unfortunately I'm having a new job now so the better bet is probably going for GFS2. Will pass some thoughts about GFS1's "ls -la" when I have some spare time next week. -- Wendy > I used to 'ls -la' a subdirecotry, which contains more > than 30,000 small files, on a SAN storage long time > ago just once from Node 5, which sits in the cluster > but does nothing. In other words, Node 5 is an idel > node. > > Now when I looked at /proc/cluster/dlm_locks on the > node, I realised that there are many PR locks and the > number of PR clocks is pretty much the same as the > number of files in the subdirectory I used to list. > > Then I randomly picked up some lock resources and > converted the second part (hex number) of the name of > the lock resources to decimal numbers, which are > simply the inode numbers. Then I searched the > subdirectory and confirmed that these inode numbers > match the files in the subdirectory. > > > Now, my questions are: > > 1) how can I find out which unix command requires what > kind of locks? Does the ls command really need PR > lock? > > 2) how long GFS caches the locks? > > 3) whether we can configure the caching period? > > 4) if GFS should not cache the lock for so many days, > then does it mean this is a bug? > > 5) Is that a way to find out which process requires a > particular lock? Below is a typical record in > dlm_locks on Node 5. Is any piece of information > useful for identifing the process? 
> > Resource d95d2ccc (parent 00000000). Name (len=24) " > 5 cb5d35" > Local Copy, Master is node 1 > Granted Queue > 137203da PR Master: 73980279 > Conversion Queue > Waiting Queue > > > 6) If I am sure that no processes or applications are > accessing the subdirectory, then how I can force GFS > release these PR locks so that DLM can release the > corresponding lock resources as well. > > > Thank you very much for reading the questions and look > forward to hearing from you. > > Jas > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From jas199931 at yahoo.com Thu May 8 14:28:52 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 07:28:52 -0700 (PDT) Subject: [Linux-cluster] GFS lock cache or bug? In-Reply-To: <4822FFF6.4000309@gmail.com> Message-ID: <109539.63539.qm@web32205.mail.mud.yahoo.com> Hi Wendy: Thank you very much for the kind answer. Unfortunately, I am using Red Hat Enterprise Linux WS release 4 (Nahant Update 5) 2.6.9-42.ELsmp. When I ran gfs_tool gettune /mnt/ABC, I got: ilimit1 = 100 ilimit1_tries = 3 ilimit1_min = 1 ilimit2 = 500 ilimit2_tries = 10 ilimit2_min = 3 demote_secs = 300 incore_log_blocks = 1024 jindex_refresh_secs = 60 depend_secs = 60 scand_secs = 5 recoverd_secs = 60 logd_secs = 1 quotad_secs = 5 inoded_secs = 15 quota_simul_sync = 64 quota_warn_period = 10 atime_quantum = 3600 quota_quantum = 60 quota_scale = 1.0000 (1, 1) quota_enforce = 1 quota_account = 1 new_files_jdata = 0 new_files_directio = 0 max_atomic_write = 4194304 max_readahead = 262144 lockdump_size = 131072 stall_secs = 600 complain_secs = 10 reclaim_limit = 5000 entries_per_readdir = 32 prefetch_secs = 10 statfs_slots = 64 max_mhc = 10000 greedy_default = 100 greedy_quantum = 25 greedy_max = 250 rgrp_try_threshold = 100 There is no glock_purge option. I will try to tune demote_secs, but I don't think it will fix 'ls -la' issue. By the way, could you please kindly direct me to a place where I can find detailed explanations of these tunable options? Best, Jas --- Wendy Cheng wrote: > Ja S wrote: > > Hi, All: > > > > I have an old write-up about GFS lock cache issues. > Shareroot people had > pulled it into their web site: > http://open-sharedroot.org/Members/marc/blog/blog-on-gfs/glock-trimming-patch/?searchterm=gfs > > It should explain some of your confusions. The > tunables described in > that write-up are formally included into RHEL 5.1 > and RHEL 4.6 right now > (so no need to ask for private patches). > > There is a long story about GFS(1)'s "ls -la" > problem that one time I > did plan to do something about it. Unfortunately I'm > having a new job > now so the better bet is probably going for GFS2. > > Will pass some thoughts about GFS1's "ls -la" when I > have some spare > time next week. > > -- Wendy > > > I used to 'ls -la' a subdirecotry, which contains > more > > than 30,000 small files, on a SAN storage long > time > > ago just once from Node 5, which sits in the > cluster > > but does nothing. In other words, Node 5 is an > idel > > node. 
> > > > Now when I looked at /proc/cluster/dlm_locks on > the > > node, I realised that there are many PR locks and > the > > number of PR clocks is pretty much the same as the > > number of files in the subdirectory I used to > list. > > > > Then I randomly picked up some lock resources and > > converted the second part (hex number) of the name > of > > the lock resources to decimal numbers, which are > > simply the inode numbers. Then I searched the > > subdirectory and confirmed that these inode > numbers > > match the files in the subdirectory. > > > > > > Now, my questions are: > > > > 1) how can I find out which unix command requires > what > > kind of locks? Does the ls command really need PR > > lock? > > > > 2) how long GFS caches the locks? > > > > 3) whether we can configure the caching period? > > > > 4) if GFS should not cache the lock for so many > days, > > then does it mean this is a bug? > > > > 5) Is that a way to find out which process > requires a > > particular lock? Below is a typical record in > > dlm_locks on Node 5. Is any piece of information > > useful for identifing the process? > > > > Resource d95d2ccc (parent 00000000). Name (len=24) > " > > 5 cb5d35" > > Local Copy, Master is node 1 > > Granted Queue > > 137203da PR Master: 73980279 > > Conversion Queue > > Waiting Queue > > > > > > 6) If I am sure that no processes or applications > are > > accessing the subdirectory, then how I can force > GFS > > release these PR locks so that DLM can release the > > corresponding lock resources as well. > > > > > > Thank you very much for reading the questions and > look > > forward to hearing from you. > > > > Jas > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From s.wendy.cheng at gmail.com Thu May 8 15:05:43 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 08 May 2008 11:05:43 -0400 Subject: [Linux-cluster] GFS lock cache or bug? In-Reply-To: <109539.63539.qm@web32205.mail.mud.yahoo.com> References: <109539.63539.qm@web32205.mail.mud.yahoo.com> Message-ID: <482316C7.2000606@gmail.com> Ja S wrote: > Hi Wendy: > > Thank you very much for the kind answer. > > Unfortunately, I am using Red Hat Enterprise Linux WS > release 4 (Nahant Update 5) 2.6.9-42.ELsmp. > > When I ran gfs_tool gettune /mnt/ABC, I got: > [snip] .. > > > There is no glock_purge option. I will try to tune > demote_secs, but I don't think it will fix 'ls -la' > issue. > No, it will not. Don't waste your time. Will try to explain this more whenever I get a chance (but not right now). > > By the way, could you please kindly direct me to a > place where I can find detailed explanations of these > tunable options? > > > There is one called readme.gfs_tune - in theory, it is in: http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_tune. Just check few minutes ago ... 
my people page seems to have become Bob Peterson's people page but large amount of my old write-ups and unpublished patches still there. So if you type "wcheng", you probably will get "rpeterso" - contents are mostly the same though.

There are also few GFS1/GFS2/NFS patches, as well as the detailed NFS over GFS documents, GFS glock write-ups, etc, in the (people's page) "Patches" and "Project" directories. Feel free to peek and/or try them out (but I suspect they'll disappear soon). On the other hand, if GFS2 is out in time, there is really no point to mess around with GFS1 any more - it is old and outdated anyway.

-- Wendy

From pbruna at it-linux.cl Thu May 8 16:26:10 2008
From: pbruna at it-linux.cl (Patricio A. Bruna)
Date: Thu, 8 May 2008 12:26:10 -0400 (CLT)
Subject: [Linux-cluster] script.sh : status & monitor
In-Reply-To: <1206556464.4684.111.camel@ayanami.boston.devel.redhat.com>
Message-ID: <32156755.38501210263970324.JavaMail.root@lisa.itlinux.cl>

Hi,

What result does Cluster Suite wait for when it executes /etc/init.d/xxx status? I guess it is a value from RETVAL; if so, which one would be OK and which Failed?

Thanks

------------------------------------
Patricio Bruna V.
IT Linux Ltda.
http://www.it-linux.cl
Fono : (+56-2) 333 0578
Móvil : (+56-09) 8827 0342

----- "Lon Hohberger" escribió:

On Wed, 2008-03-26 at 14:01 +0100, Alain Moulle wrote:
> Hi
>
> In script.sh we can see these two lines :
>
>
> Right now, nothing.
> I guess one is the periodic status call on services launched
> by the Cluster Suite but which one ?

status is the one you want to look for in your script (think: What would a SysV init script do?) :)

-- Lon

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From rpeterso at redhat.com Thu May 8 16:23:20 2008
From: rpeterso at redhat.com (Bob Peterson)
Date: Thu, 08 May 2008 11:23:20 -0500
Subject: [Linux-cluster] GFS lock cache or bug?
In-Reply-To: <482316C7.2000606@gmail.com>
References: <109539.63539.qm@web32205.mail.mud.yahoo.com> <482316C7.2000606@gmail.com>
Message-ID: <1210263800.2764.9.camel@technetium.msp.redhat.com>

Hi,

On Thu, 2008-05-08 at 11:05 -0400, Wendy Cheng wrote:
> Just check few minutes ago ... my people page seems to have become Bob
> Peterson's people page but large amount of my old write-ups and
> unpublished patches still there. So if you type "wcheng", you probably
> will get "rpeterso" - contents are mostly the same though.

I haven't removed anything, so all of Wendy's patches and contents are still there.

> There are also few GFS1/GFS2/NFS patches, as well as the detailed NFS
> over GFS documents, GFS glock write-ups, etc, in the (people's page)
> "Patches" and "Project" directories. Feel free to peek and/or try them
> out (but I suspect they'll disappear soon).

I don't have any plans to make any of Wendy's patches or content disappear. In fact, one of the reasons I wanted it moved under my name was to safeguard it since Wendy left Red Hat. I didn't want some Red Hat administrator to say, "What's this wcheng people page? She doesn't work here anymore; let's delete it all." This way it's safe.

Regards,

Bob Peterson
Red Hat Clustering & GFS

From dist-list at LEXUM.UMontreal.CA Thu May 8 17:09:32 2008
From: dist-list at LEXUM.UMontreal.CA (FM)
Date: Thu, 08 May 2008 13:09:32 -0400
Subject: [Linux-cluster] network best practice for cluster?
Message-ID: <482333CC.20104@lexum.umontreal.ca>

Hello,

We read a lot about GFS tuning, number of nodes, etc. But how about the network infrastructure? Is a separate network/VLAN for dlm the way to go? Do you tune the network stack to speed up dlm?

In my server room, it is very simple: GFS-1, 2 directors behind the firewall (using NAT), and 5 nodes behind them with 2 NICs (using bonding). All requests and dlm network traffic use the same network (and the same bonded card). The network is Gigabit.

I am very curious to know the best practice! The cluster is working great (especially since the introduction of the glock_purge parameter), but there is always room for improvement!

regards,

From lhh at redhat.com Thu May 8 19:44:27 2008
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 08 May 2008 15:44:27 -0400
Subject: [Linux-cluster] network best practice for cluster?
In-Reply-To: <482333CC.20104@lexum.umontreal.ca>
References: <482333CC.20104@lexum.umontreal.ca>
Message-ID: <1210275867.4582.14.camel@ayanami.boston.devel.redhat.com>

On Thu, 2008-05-08 at 13:09 -0400, FM wrote:
> Hello,
>
> We read a lot of gfs tuning, number of nodes, etc. But how about the
> network infrastructure ?
>
> is a separate network/vlan for dlm is the way to go ? Do you tune the
> network stack to speed dlm ?

The cluster (generally) including the DLM and fencing should be on a separate network from other hosts if possible. I don't know that a special network just for DLM traffic is necessary.

-- Lon

From jas199931 at yahoo.com Thu May 8 21:27:00 2008
From: jas199931 at yahoo.com (Ja S)
Date: Thu, 8 May 2008 14:27:00 -0700 (PDT)
Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for?
Message-ID: <620879.5230.qm@web32201.mail.mud.yahoo.com>

Hi, All:

I used to post this question before, but have not received any comments yet. Please allow me to post it again.

I have a subdirectory containing more than 30,000 small files on a SAN storage (GFS1+DLM, RAID10). No user application knows the existence of the subdirectory. In other words, the subdirectory is free of access.

However, it took ages to list the subdirectory on an absolutely idle cluster node. See below:

# time ls -la | wc -l
31767

real    3m5.249s
user    0m0.628s
sys     0m5.137s

About 3 minutes are spent somewhere. Does anyone have any clue what the system was waiting for?

Thanks for your time; I hope to see your valuable comments soon.

Jas

____________________________________________________________________________________
Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

From gordan at bobich.net Thu May 8 21:38:39 2008
From: gordan at bobich.net (Gordan Bobic)
Date: Thu, 08 May 2008 22:38:39 +0100
Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for?
In-Reply-To: <620879.5230.qm@web32201.mail.mud.yahoo.com>
References: <620879.5230.qm@web32201.mail.mud.yahoo.com>
Message-ID: <482372DF.9000706@bobich.net>

30K files?! That'll take a while even on a local file system.

Gordan

Ja S wrote:
> Hi, All:
>
> I used to post this question before, but have not
> received any comments yet. Please allow me post it
> again.
>
> I have a subdirectory containing more than 30,000
> small files on a SAN storage (GFS1+DLM, RAID10). No
> user application knows the existence of the
> subdirectory. In other words, the subdirectory is free
> of accessing.
>
> However, it took ages to list the subdirectory on an
> absolute idle cluster node.
See below: > > # time ls -la | wc -l > 31767 > > real 3m5.249s > user 0m0.628s > sys 0m5.137s > > There are about 3 minutes spent on somewhere. Does > anyone have any clue what the system was waiting for? > > > Thanks for your time and wish to see your valuable > comments soon. > > Jas > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From s.wendy.cheng at gmail.com Thu May 8 21:51:03 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Thu, 08 May 2008 17:51:03 -0400 Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <620879.5230.qm@web32201.mail.mud.yahoo.com> References: <620879.5230.qm@web32201.mail.mud.yahoo.com> Message-ID: <482375C7.2060207@gmail.com> Ja S wrote: > Hi, All: > > I used to post this question before, but have not > received any comments yet. Please allow me post it > again. > > I have a subdirectory containing more than 30,000 > small files on a SAN storage (GFS1+DLM, RAID10). No > user application knows the existence of the > subdirectory. In other words, the subdirectory is free > of accessing. > Short answer is to remember "ls" and "ls -la" are very different commands. "ls" is a directory read (that reads from one single file) but "ls -la" needs to get file attributes (file size, modification times, ownership, etc) from *each* of the files from the subject directory. In your case, it needs to read more than 30,000 inodes to get them. The "ls -la" is slower for *any* filesystem but particularly troublesome for a cluster filesystem such as GFS due to: 1. Cluster locking overheads (it needs readlocks from *each* of the files involved). 2. Depending on when and how these files are created. During file creation time and if there are lock contentions, GFS has a tendency to spread the file locations all over the disk. 3. You use iscsi such that dlm lock traffic and file block access are on the same fabric ? If this is true, you will more or less serialize the lock access. Hope above short answer will ease your confusion. -- Wendy > However, it took ages to list the subdirectory on an > absolute idle cluster node. See below: > > # time ls -la | wc -l > 31767 > > real 3m5.249s > user 0m0.628s > sys 0m5.137s > > There are about 3 minutes spent on somewhere. Does > anyone have any clue what the system was waiting for? > > > Thanks for your time and wish to see your valuable > comments soon. > > Jas > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jas199931 at yahoo.com Thu May 8 22:29:41 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 15:29:41 -0700 (PDT) Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <482375C7.2060207@gmail.com> Message-ID: <792305.60086.qm@web32203.mail.mud.yahoo.com> Hi, Wendy: Thanks for your so prompt and kind explanation. It is very helpful. According to your comments, I did another test. 
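(As an aside, one way to see the per-file cost Wendy describes is to count the syscalls that 'ls -la' makes; a rough sketch, assuming strace is installed on the node and the directory is mounted at /mnt/abc:)

    strace -c ls -la /mnt/abc > /dev/null
    # The summary printed on stderr should show roughly one lstat() per directory entry,
    # and on GFS each first-time stat may also mean a DLM lookup on the wire.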
See below:

# stat abc/
  File: `abc/'
  Size: 8192          Blocks: 6024       IO Block: 4096   directory
Device: fc00h/64512d  Inode: 1065226     Links: 2
Access: (0770/drwxrwx---)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2008-05-08 06:18:58.000000000 +0000
Modify: 2008-04-15 03:02:24.000000000 +0000
Change: 2008-04-15 07:11:52.000000000 +0000

# cd abc/
# time ls | wc -l
31764

real    0m44.797s
user    0m0.189s
sys     0m2.276s

The real time in this test is much shorter than the previous one. However, it is still reasonably long. As you said, the 'ls' command only reads the single directory file. In my case, the directory file itself is only 8192 bytes. The time spent on disk IO should be included in 'sys 0m2.276s'. Although DLM needs time to look up the location of the corresponding master lock resource and to process locking, the system should not take about 42 seconds to complete the 'ls' command.

So, what is the hidden issue, or is there a way to identify possible bottlenecks?

Great thanks in advance.

Jas

--- Wendy Cheng wrote: > Ja S wrote: > > Hi, All: > > > > I used to post this question before, but have not > > received any comments yet. Please allow me post it > > again. > > > > I have a subdirectory containing more than 30,000 > > small files on a SAN storage (GFS1+DLM, RAID10). > No > > user application knows the existence of the > > subdirectory. In other words, the subdirectory is > free > > of accessing. > > > Short answer is to remember "ls" and "ls -la" are > very different > commands. "ls" is a directory read (that reads from > one single file) but > "ls -la" needs to get file attributes (file size, > modification times, > ownership, etc) from *each* of the files from the > subject directory. In > your case, it needs to read more than 30,000 inodes > to get them. The "ls > -la" is slower for *any* filesystem but particularly > troublesome for a > cluster filesystem such as GFS due to: > > 1. Cluster locking overheads (it needs readlocks > from *each* of the > files involved). > 2. Depending on when and how these files are > created. During file > creation time and if there are lock contentions, GFS > has a tendency to > spread the file locations all over the disk. > 3. You use iscsi such that dlm lock traffic and file > block access are on > the same fabric ? If this is true, you will more or > less serialize the > lock access. > > Hope above short answer will ease your confusion. > > -- Wendy > > However, it took ages to list the subdirectory on > an > > absolute idle cluster node. See below: > > > > # time ls -la | wc -l > > 31767 > > > > real 3m5.249s > > user 0m0.628s > > sys 0m5.137s > > > > There are about 3 minutes spent on somewhere. Does > > anyone have any clue what the system was waiting > for? > > > > > > Thanks for your time and wish to see your valuable > > comments soon. > > > > Jas > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now.
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From rpeterso at redhat.com Thu May 8 22:29:20 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 08 May 2008 17:29:20 -0500 Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <620879.5230.qm@web32201.mail.mud.yahoo.com> References: <620879.5230.qm@web32201.mail.mud.yahoo.com> Message-ID: <1210285760.2764.37.camel@technetium.msp.redhat.com> On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: > Hi, All: > > I used to post this question before, but have not > received any comments yet. Please allow me post it > again. > > I have a subdirectory containing more than 30,000 > small files on a SAN storage (GFS1+DLM, RAID10). No > user application knows the existence of the > subdirectory. In other words, the subdirectory is free > of accessing. > > However, it took ages to list the subdirectory on an > absolute idle cluster node. See below: > > # time ls -la | wc -l > 31767 > > real 3m5.249s > user 0m0.628s > sys 0m5.137s > > There are about 3 minutes spent on somewhere. Does > anyone have any clue what the system was waiting for? > > > Thanks for your time and wish to see your valuable > comments soon. > > Jas Hi Jas, I believe the answer to your question is in the FAQ: http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow Regards, Bob Peterson Red Hat Clustering & GFS From jas199931 at yahoo.com Thu May 8 22:44:12 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 15:44:12 -0700 (PDT) Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <482372DF.9000706@bobich.net> Message-ID: <598939.9469.qm@web32202.mail.mud.yahoo.com> --- Gordan Bobic wrote: > 30K files?! > That'll take a while even on a local file system. Not really. Last week I made a copy of the directory on the local hard disk (ext3). See the test results for both "ls" and "ls -la" commands: # time ls -la | wc -l 31767 real 0m2.967s user 0m0.627s sys 0m0.689s # time ls | wc -l 31764 real 0m1.508s user 0m0.262s sys 0m0.082s Comparing with the results in my previous email, does it indicate that GFS is not designed for sharing huge number of small files? I heard that GFS is originally designed for sharing small number of larger files. Is that true? If so, could you please kindly suggest a file system which can handle huge number of concurrent requests on many many number of small files? Thanks for your interest. Jas > > Gordan > > Ja S wrote: > > Hi, All: > > > > I used to post this question before, but have not > > received any comments yet. Please allow me post it > > again. > > > > I have a subdirectory containing more than 30,000 > > small files on a SAN storage (GFS1+DLM, RAID10). > No > > user application knows the existence of the > > subdirectory. In other words, the subdirectory is > free > > of accessing. > > > > However, it took ages to list the subdirectory on > an > > absolute idle cluster node. See below: > > > > # time ls -la | wc -l > > 31767 > > > > real 3m5.249s > > user 0m0.628s > > sys 0m5.137s > > > > There are about 3 minutes spent on somewhere. Does > > anyone have any clue what the system was waiting > for? > > > > > > Thanks for your time and wish to see your valuable > > comments soon. > > > > Jas > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. 
> http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From andrew at ntsg.umt.edu Thu May 8 22:52:18 2008 From: andrew at ntsg.umt.edu (Andrew A. Neuschwander) Date: Thu, 8 May 2008 16:52:18 -0600 (MDT) Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <1210285760.2764.37.camel@technetium.msp.redhat.com> References: <620879.5230.qm@web32201.mail.mud.yahoo.com> <1210285760.2764.37.camel@technetium.msp.redhat.com> Message-ID: <40809.10.8.105.69.1210287138.squirrel@secure.ntsg.umt.edu> I've looked at this problem a bit as well. My system is a 4Gb FC SAN with a bonded GigE DLM dedicated network. Stat'ing 30,000 files in 3 minutes on GFS isn't unreasonable considering that it must get and release the gfs locks. In this scenario, you are averaging about 6ms per file stat. When we did our tests, all of our subsystems (FC, Net, CPU, Memory, Disk) were near idle. I think the 6ms is simply the accumulated latency of all the subsystems involved. There is a lot of work happening in that short period of time. -A -- Andrew A. Neuschwander, RHCE Linux Systems/Software Engineer College of Forestry and Conservation The University of Montana http://www.ntsg.umt.edu andrew at ntsg.umt.edu - 406.243.6310 On Thu, May 8, 2008 4:29 pm, Bob Peterson wrote: > On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: >> Hi, All: >> >> I used to post this question before, but have not >> received any comments yet. Please allow me post it >> again. >> >> I have a subdirectory containing more than 30,000 >> small files on a SAN storage (GFS1+DLM, RAID10). No >> user application knows the existence of the >> subdirectory. In other words, the subdirectory is free >> of accessing. >> >> However, it took ages to list the subdirectory on an >> absolute idle cluster node. See below: >> >> # time ls -la | wc -l >> 31767 >> >> real 3m5.249s >> user 0m0.628s >> sys 0m5.137s >> >> There are about 3 minutes spent on somewhere. Does >> anyone have any clue what the system was waiting for? >> >> >> Thanks for your time and wish to see your valuable >> comments soon. >> >> Jas > > Hi Jas, > > I believe the answer to your question is in the FAQ: > > http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow > > Regards, > > Bob Peterson > Red Hat Clustering & GFS > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > From jas199931 at yahoo.com Fri May 9 06:37:34 2008 From: jas199931 at yahoo.com (Ja S) Date: Thu, 8 May 2008 23:37:34 -0700 (PDT) Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <40809.10.8.105.69.1210287138.squirrel@secure.ntsg.umt.edu> Message-ID: <388070.61398.qm@web32206.mail.mud.yahoo.com> Hi, Andrew: Thank you very much for the help. Yes, your explanation really makes sense. I buy it. But I would like to discuss it a little bit further. The following message was part of my previous reply to Wendy. Just paste it here for your convenience. 
# stat abc/ File: `abc/' Size: 8192 Blocks: 6024 IO Block: 4096 directory Device: fc00h/64512d Inode: 1065226 Links: 2 Access: (0770/drwxrwx---) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2008-05-08 06:18:58.000000000 +0000 Modify: 2008-04-15 03:02:24.000000000 +0000 Change: 2008-04-15 07:11:52.000000000 +0000 # cd abc/ # time ls | wc -l 31764 real 0m44.797s user 0m0.189s sys 0m2.276s >From the test results, it seems that the system really only used 2.276 seconds to perform the disk IO, read the directory and count the number of files. I am not sure whether I missed anything or not. I really cannot understand how the system took about 42 seconds to process the lock on the single directory. Any further comments? Thanks again in advance, Jas --- "Andrew A. Neuschwander" wrote: > I've looked at this problem a bit as well. My system > is a 4Gb FC SAN with > a bonded GigE DLM dedicated network. Stat'ing 30,000 > files in 3 minutes on > GFS isn't unreasonable considering that it must get > and release the gfs > locks. In this scenario, you are averaging about 6ms > per file stat. When > we did our tests, all of our subsystems (FC, Net, > CPU, Memory, Disk) were > near idle. I think the 6ms is simply the accumulated > latency of all the > subsystems involved. There is a lot of work > happening in that short period > of time. > > -A > -- > Andrew A. Neuschwander, RHCE > Linux Systems/Software Engineer > College of Forestry and Conservation > The University of Montana > http://www.ntsg.umt.edu > andrew at ntsg.umt.edu - 406.243.6310 > > > On Thu, May 8, 2008 4:29 pm, Bob Peterson wrote: > > On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: > >> Hi, All: > >> > >> I used to post this question before, but have not > >> received any comments yet. Please allow me post > it > >> again. > >> > >> I have a subdirectory containing more than 30,000 > >> small files on a SAN storage (GFS1+DLM, RAID10). > No > >> user application knows the existence of the > >> subdirectory. In other words, the subdirectory is > free > >> of accessing. > >> > >> However, it took ages to list the subdirectory on > an > >> absolute idle cluster node. See below: > >> > >> # time ls -la | wc -l > >> 31767 > >> > >> real 3m5.249s > >> user 0m0.628s > >> sys 0m5.137s > >> > >> There are about 3 minutes spent on somewhere. > Does > >> anyone have any clue what the system was waiting > for? > >> > >> > >> Thanks for your time and wish to see your > valuable > >> comments soon. > >> > >> Jas > > > > Hi Jas, > > > > I believe the answer to your question is in the > FAQ: > > > > > http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow > > > > Regards, > > > > Bob Peterson > > Red Hat Clustering & GFS > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From l.dardini at comune.prato.it Fri May 9 07:15:54 2008 From: l.dardini at comune.prato.it (Leandro Dardini) Date: Fri, 9 May 2008 09:15:54 +0200 Subject: R: [Linux-cluster] Why GFS is so slow? What it is waiting for? 
In-Reply-To: <388070.61398.qm@web32206.mail.mud.yahoo.com> References: <40809.10.8.105.69.1210287138.squirrel@secure.ntsg.umt.edu> <388070.61398.qm@web32206.mail.mud.yahoo.com> Message-ID: <6F861500A5092B4C8CD653DE20A4AA0D60641D@exchange3.comune.prato.local> Just remember to disable atime on the GFS volume. If atime is enabled maybe there is the lock contention for the writing of this info if multiple clients "read" the directory. Leandro > -----Messaggio originale----- > Da: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] Per conto di Ja S > Inviato: venerd? 9 maggio 2008 8.38 > A: linux clustering > Oggetto: Re: [Linux-cluster] Why GFS is so slow? What it is > waiting for? > > Hi, Andrew: > > Thank you very much for the help. Yes, your explanation > really makes sense. I buy it. > > But I would like to discuss it a little bit further. > The following message was part of my previous reply to Wendy. > Just paste it here for your convenience. > > > # stat abc/ > File: `abc/' > Size: 8192 Blocks: 6024 IO Block: > 4096 directory > Device: fc00h/64512d Inode: 1065226 Links: 2 > Access: (0770/drwxrwx---) Uid: ( 0/ root) > Gid: ( 0/ root) > Access: 2008-05-08 06:18:58.000000000 +0000 > Modify: 2008-04-15 03:02:24.000000000 +0000 > Change: 2008-04-15 07:11:52.000000000 +0000 > > # cd abc/ > # time ls | wc -l > 31764 > > real 0m44.797s > user 0m0.189s > sys 0m2.276s > > > >From the test results, it seems that the system really > only used 2.276 seconds to perform the disk IO, read the > directory and count the number of files. > > I am not sure whether I missed anything or not. I really > cannot understand how the system took about 42 seconds to > process the lock on the single directory. > > Any further comments? > > Thanks again in advance, > > Jas > > > --- "Andrew A. Neuschwander" > wrote: > > > I've looked at this problem a bit as well. My system is a > 4Gb FC SAN > > with a bonded GigE DLM dedicated network. Stat'ing 30,000 > files in 3 > > minutes on GFS isn't unreasonable considering that it must get and > > release the gfs locks. In this scenario, you are averaging > about 6ms > > per file stat. When we did our tests, all of our subsystems > (FC, Net, > > CPU, Memory, Disk) were near idle. I think the 6ms is simply the > > accumulated latency of all the subsystems involved. There > is a lot of > > work happening in that short period of time. > > > > -A > > -- > > Andrew A. Neuschwander, RHCE > > Linux Systems/Software Engineer > > College of Forestry and Conservation > > The University of Montana > > http://www.ntsg.umt.edu > > andrew at ntsg.umt.edu - 406.243.6310 > > > > > > On Thu, May 8, 2008 4:29 pm, Bob Peterson wrote: > > > On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: > > >> Hi, All: > > >> > > >> I used to post this question before, but have not received any > > >> comments yet. Please allow me post > > it > > >> again. > > >> > > >> I have a subdirectory containing more than 30,000 small > files on a > > >> SAN storage (GFS1+DLM, RAID10). > > No > > >> user application knows the existence of the > subdirectory. In other > > >> words, the subdirectory is > > free > > >> of accessing. > > >> > > >> However, it took ages to list the subdirectory on > > an > > >> absolute idle cluster node. See below: > > >> > > >> # time ls -la | wc -l > > >> 31767 > > >> > > >> real 3m5.249s > > >> user 0m0.628s > > >> sys 0m5.137s > > >> > > >> There are about 3 minutes spent on somewhere. 
> > Does > > >> anyone have any clue what the system was waiting > > for? > > >> > > >> > > >> Thanks for your time and wish to see your > > valuable > > >> comments soon. > > >> > > >> Jas > > > > > > Hi Jas, > > > > > > I believe the answer to your question is in the > > FAQ: > > > > > > > > > http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow > > > > > > Regards, > > > > > > Bob Peterson > > > Red Hat Clustering & GFS > > > > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > ______________________________________________________________ > ______________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jas199931 at yahoo.com Fri May 9 07:44:55 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 9 May 2008 00:44:55 -0700 (PDT) Subject: R: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <6F861500A5092B4C8CD653DE20A4AA0D60641D@exchange3.comune.prato.local> Message-ID: <291451.6237.qm@web32205.mail.mud.yahoo.com> Hi, Leandro: Thanks for the good reminder. Yes, we did. Any other comments? Best, Jas --- Leandro Dardini wrote: > Just remember to disable atime on the GFS volume. If > atime is enabled maybe there is the lock contention > for the writing of this info if multiple clients > "read" the directory. > > Leandro > > > -----Messaggio originale----- > > Da: linux-cluster-bounces at redhat.com > > [mailto:linux-cluster-bounces at redhat.com] Per > conto di Ja S > > Inviato: venerd?9 maggio 2008 8.38 > > A: linux clustering > > Oggetto: Re: [Linux-cluster] Why GFS is so slow? > What it is > > waiting for? > > > > Hi, Andrew: > > > > Thank you very much for the help. Yes, your > explanation > > really makes sense. I buy it. > > > > But I would like to discuss it a little bit > further. > > The following message was part of my previous > reply to Wendy. > > Just paste it here for your convenience. > > > > > > # stat abc/ > > File: `abc/' > > Size: 8192 Blocks: 6024 IO > Block: > > 4096 directory > > Device: fc00h/64512d Inode: 1065226 Links: > 2 > > Access: (0770/drwxrwx---) Uid: ( 0/ root) > > Gid: ( 0/ root) > > Access: 2008-05-08 06:18:58.000000000 +0000 > > Modify: 2008-04-15 03:02:24.000000000 +0000 > > Change: 2008-04-15 07:11:52.000000000 +0000 > > > > # cd abc/ > > # time ls | wc -l > > 31764 > > > > real 0m44.797s > > user 0m0.189s > > sys 0m2.276s > > > > > > >From the test results, it seems that the system > really > > only used 2.276 seconds to perform the disk IO, > read the > > directory and count the number of files. > > > > I am not sure whether I missed anything or not. I > really > > cannot understand how the system took about 42 > seconds to > > process the lock on the single directory. > > > > Any further comments? > > > > Thanks again in advance, > > > > Jas > > > > > > --- "Andrew A. Neuschwander" > > wrote: > > > > > I've looked at this problem a bit as well. My > system is a > > 4Gb FC SAN > > > with a bonded GigE DLM dedicated network. 
> Stat'ing 30,000 > > files in 3 > > > minutes on GFS isn't unreasonable considering > that it must get and > > > release the gfs locks. In this scenario, you are > averaging > > about 6ms > > > per file stat. When we did our tests, all of our > subsystems > > (FC, Net, > > > CPU, Memory, Disk) were near idle. I think the > 6ms is simply the > > > accumulated latency of all the subsystems > involved. There > > is a lot of > > > work happening in that short period of time. > > > > > > -A > > > -- > > > Andrew A. Neuschwander, RHCE > > > Linux Systems/Software Engineer > > > College of Forestry and Conservation > > > The University of Montana > > > http://www.ntsg.umt.edu > > > andrew at ntsg.umt.edu - 406.243.6310 > > > > > > > > > On Thu, May 8, 2008 4:29 pm, Bob Peterson wrote: > > > > On Thu, 2008-05-08 at 14:27 -0700, Ja S wrote: > > > >> Hi, All: > > > >> > > > >> I used to post this question before, but have > not received any > > > >> comments yet. Please allow me post > > > it > > > >> again. > > > >> > > > >> I have a subdirectory containing more than > 30,000 small > > files on a > > > >> SAN storage (GFS1+DLM, RAID10). > > > No > > > >> user application knows the existence of the > > subdirectory. In other > > > >> words, the subdirectory is > > > free > > > >> of accessing. > > > >> > > > >> However, it took ages to list the > subdirectory on > > > an > > > >> absolute idle cluster node. See below: > > > >> > > > >> # time ls -la | wc -l > > > >> 31767 > > > >> > > > >> real 3m5.249s > > > >> user 0m0.628s > > > >> sys 0m5.137s > > > >> > > > >> There are about 3 minutes spent on somewhere. > > > Does > > > >> anyone have any clue what the system was > waiting > > > for? > > > >> > > > >> > > > >> Thanks for your time and wish to see your > > > valuable > > > >> comments soon. > > > >> > > > >> Jas > > > > > > > > Hi Jas, > > > > > > > > I believe the answer to your question is in > the > > > FAQ: > > > > > > > > > > > > > > http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_slow > > > > > > > > Regards, > > > > > > > > Bob Peterson > > > > Red Hat Clustering & GFS > > > > > > > > > > > > -- > > > > Linux-cluster mailing list > > > > Linux-cluster at redhat.com > > > > > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > ______________________________________________________________ > > ______________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From vimal at monster.co.in Fri May 9 09:14:12 2008 From: vimal at monster.co.in (Vimal Gupta) Date: Fri, 09 May 2008 14:44:12 +0530 Subject: [Linux-cluster] GFS CLuster with LIDS Message-ID: <482415E4.5000508@monster.co.in> Hi All, I am having CentOs with LIDS running on that system . Can I implement GFS cluster on that node with the lids. 
Anyone have same kind of exp. please share... -- Vimal Gupta Sr. System Administrator Monster.com India Pvt.Ltd. From Klaus.Steinberger at physik.uni-muenchen.de Fri May 9 09:14:01 2008 From: Klaus.Steinberger at physik.uni-muenchen.de (Klaus Steinberger) Date: Fri, 9 May 2008 11:14:01 +0200 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <20080509074522.96CD3618E0A@hormel.redhat.com> References: <20080509074522.96CD3618E0A@hormel.redhat.com> Message-ID: <200805091114.10395.Klaus.Steinberger@physik.uni-muenchen.de> Hi, > However, it took ages to list the subdirectory on an > absolute idle cluster node. See below: > > # time ls -la | wc -l > 31767 > > real 3m5.249s > user 0m0.628s > sys 0m5.137s > > There are about 3 minutes spent on somewhere. Does > anyone have any clue what the system was waiting for? Did you tune glock's? I found that it's very important for performance of GFS. I'm doing the following tunings currently: gfs_tool settune /export/data/etp quota_account 0 gfs_tool settune /export/data/etp glock_purge 50 gfs_tool settune /export/data/etp demote_secs 200 gfs_tool settune /export/data/etp statfs_fast 1 Switch off quota off course only if you don't need it. All this tunings have to be done every time after mounting, so do it in a init.d script running after GFS mount, and of course do it on every node. Here is the link to the glock paper: http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 The glock tuning (glock_purge and demote_secs parameters) definitly solved a problem we had here with the Tivoli Backup Client. Before it was running for days and sometimes even did give up. We observed heavy lock traffic. After changing the glock parameters times for the backup did go down dramatically, we now can run a Incremental Backup on a 4 TByte filesystem in under 4 hours. So give it a try. There is some more tuning, which could be done unfortunately just on creation of filesystem. The default number of Resource Groups is ways too large for nowadays TByte Filesystems. Sincerly, Klaus -- Klaus Steinberger Beschleunigerlaboratorium Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 Garching, Germany FAX: (+49 89)289 14280 EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE URL: http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2002 bytes Desc: not available URL: From jas199931 at yahoo.com Fri May 9 09:25:21 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 9 May 2008 02:25:21 -0700 (PDT) Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <200805091114.10395.Klaus.Steinberger@physik.uni-muenchen.de> Message-ID: <489981.61809.qm@web32204.mail.mud.yahoo.com> Hi, Klaus: Thank you very much for your kind answer. Tunning the parameters sounds really interesting. I should give it a try. By the way, how did you come up with these new parameter values? Did you calculate them based on some measures or simply pick them up and test. Best, Jas --- Klaus Steinberger wrote: > Hi, > > > However, it took ages to list the subdirectory on > an > > absolute idle cluster node. See below: > > > > # time ls -la | wc -l > > 31767 > > > > real 3m5.249s > > user 0m0.628s > > sys 0m5.137s > > > > There are about 3 minutes spent on somewhere. Does > > anyone have any clue what the system was waiting > for? > > Did you tune glock's? 
I found that it's very > important for performance of > GFS. > > I'm doing the following tunings currently: > > gfs_tool settune /export/data/etp quota_account 0 > gfs_tool settune /export/data/etp glock_purge 50 > gfs_tool settune /export/data/etp demote_secs 200 > gfs_tool settune /export/data/etp statfs_fast 1 > > Switch off quota off course only if you don't need > it. All this tunings have > to be done every time after mounting, so do it in a > init.d script running > after GFS mount, and of course do it on every node. > > Here is the link to the glock paper: > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > The glock tuning (glock_purge and demote_secs > parameters) definitly solved a > problem we had here with the Tivoli Backup Client. > Before it was running for > days and sometimes even did give up. We observed > heavy lock traffic. > > After changing the glock parameters times for the > backup did go down > dramatically, we now can run a Incremental Backup on > a 4 TByte filesystem in > under 4 hours. So give it a try. > > There is some more tuning, which could be done > unfortunately just on creation > of filesystem. The default number of Resource Groups > is ways too large for > nowadays TByte Filesystems. > > Sincerly, > Klaus > > > -- > Klaus Steinberger Beschleunigerlaboratorium > Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 > Garching, Germany > FAX: (+49 89)289 14280 EMail: > Klaus.Steinberger at Physik.Uni-Muenchen.DE > URL: > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From martin.fuerstenau at oce.com Fri May 9 10:53:42 2008 From: martin.fuerstenau at oce.com (Martin Fuerstenau) Date: Fri, 09 May 2008 12:53:42 +0200 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <489981.61809.qm@web32204.mail.mud.yahoo.com> References: <489981.61809.qm@web32204.mail.mud.yahoo.com> Message-ID: <1210330422.11974.14.camel@lx002140.ops.de> Hi, I had (nearly) the same problem. A slow gfs. From the beginning. Two weeks ago the cluster crashed every time the load became heavier. What was the reason? A rotten gfs. The gfs uses leafnodes for data an leafnodes for metadata whithin the filesystem. And the problem was in the metadata leafnodes. Have you checked the Filesystem? Unmount it from all nodes and use gfs_fsck on the filesystem. In my case it reported (and repaired) tons of unused leafnoedes and some other errors. First time I started it without the -y (for yes). Well, after one hour ot typing y I killed it and started it with -y. The work was done whithin an hour for 1TB. Now the filesystem is clean and it was like a turboloader and Nitrogen injection for a car. Fast as it was never before. Maybe there is a bug in the mkfs command or so. I will never use a gfs without a filesystem check after creation Martin Fuerstenau Seniro System Engineer Oce Printing Systems, Poing On Fri, 2008-05-09 at 02:25 -0700, Ja S wrote: > Hi, Klaus: > > Thank you very much for your kind answer. > > Tunning the parameters sounds really interesting. I > should give it a try. > > By the way, how did you come up with these new > parameter values? 
Did you calculate them based on > some measures or simply pick them up and test. > > Best, > > Jas > > > --- Klaus Steinberger > wrote: > > > Hi, > > > > > However, it took ages to list the subdirectory on > > an > > > absolute idle cluster node. See below: > > > > > > # time ls -la | wc -l > > > 31767 > > > > > > real 3m5.249s > > > user 0m0.628s > > > sys 0m5.137s > > > > > > There are about 3 minutes spent on somewhere. Does > > > anyone have any clue what the system was waiting > > for? > > > > Did you tune glock's? I found that it's very > > important for performance of > > GFS. > > > > I'm doing the following tunings currently: > > > > gfs_tool settune /export/data/etp quota_account 0 > > gfs_tool settune /export/data/etp glock_purge 50 > > gfs_tool settune /export/data/etp demote_secs 200 > > gfs_tool settune /export/data/etp statfs_fast 1 > > > > Switch off quota off course only if you don't need > > it. All this tunings have > > to be done every time after mounting, so do it in a > > init.d script running > > after GFS mount, and of course do it on every node. > > > > Here is the link to the glock paper: > > > > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > > > The glock tuning (glock_purge and demote_secs > > parameters) definitly solved a > > problem we had here with the Tivoli Backup Client. > > Before it was running for > > days and sometimes even did give up. We observed > > heavy lock traffic. > > > > After changing the glock parameters times for the > > backup did go down > > dramatically, we now can run a Incremental Backup on > > a 4 TByte filesystem in > > under 4 hours. So give it a try. > > > > There is some more tuning, which could be done > > unfortunately just on creation > > of filesystem. The default number of Resource Groups > > is ways too large for > > nowadays TByte Filesystems. > > > > Sincerly, > > Klaus > > > > > > -- > > Klaus Steinberger Beschleunigerlaboratorium > > Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 > > Garching, Germany > > FAX: (+49 89)289 14280 EMail: > > Klaus.Steinberger at Physik.Uni-Muenchen.DE > > URL: > > > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Martin F?rstenau Tel. : (49) 8121-72-4684 Oce Printing Systems Fax : (49) 8121-72-4996 OI-12 E-Mail : martin.fuerstenau at oce.com Siemensallee 2 85586 Poing Germany Visit Oce at drupa! Register online now: This message and attachment(s) are intended solely for use by the addressee and may contain information that is privileged, confidential or otherwise exempt from disclosure under applicable law. If you are not the intended recipient or agent thereof responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by telephone and with a 'reply' message. Thank you for your co-operation. 
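For anyone who wants to repeat this, a minimal sketch of the offline check described above. The device and mount point (/dev/vg_san/lv_gfs and /mnt/gfs) are placeholders, and the filesystem really does have to be unmounted on every node before gfs_fsck is run.

# on every node: stop whatever uses the filesystem, then unmount it
umount /mnt/gfs

# on one node only, run the check against the block device, not the mount point
gfs_fsck -n -v /dev/vg_san/lv_gfs   # report problems without changing anything
gfs_fsck -y -v /dev/vg_san/lv_gfs   # answer yes to every repair prompt

# remount on all nodes once the check comes back clean
mount -t gfs /dev/vg_san/lv_gfs /mnt/gfs

The -y pass is the one described above; the read-only -n pass first is optional but shows how much the repair is going to touch.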
From jas199931 at yahoo.com Fri May 9 11:51:58 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 9 May 2008 04:51:58 -0700 (PDT) Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <1210330422.11974.14.camel@lx002140.ops.de> Message-ID: <293470.66323.qm@web32204.mail.mud.yahoo.com> Hi Martin: Thanks for your reply indeed. --- Martin Fuerstenau wrote: > Hi, > > I had (nearly) the same problem. A slow gfs. From > the beginning. Two > weeks ago the cluster crashed every time the load > became heavier. > > What was the reason? A rotten gfs. The gfs uses > leafnodes for data an > leafnodes for metadata whithin the filesystem. And > the problem was in > the metadata leafnodes. > > Have you checked the Filesystem? Unmount it from all > nodes and use > gfs_fsck on the filesystem. No, not yet. I am afraid I cannot umount the file sytem then do the gfs_fsck since the server downtime is totally forbidden. Is there any other way to reclaim the unused or lost blocks ( I guess leafnodes you mentioned meant to be the disk block, correct me if I am wrong.)? Should "gfs_tool settune /mnt/points inoded_secs 10" work for a heavy loaded node with freqent create and delete file operations? >In my case it reported > (and repaired) tons > of unused leafnoedes and some other errors. First > time I started it > without the -y (for yes). Well, after one hour ot > typing y I killed it > and started it with -y. The work was done whithin an > hour for 1TB. Now > the filesystem is clean and it was like a > turboloader and Nitrogen > injection for a car. Fast as it was never before. Great. Sounds fantastic. However, if the low performance is caused by the "rotten" gfs, will your now cleaned file system be possibly messed up again after a certain period? Do you have a smart way to monitor the status of your file system in order to make a regular downtime schedule and "force" your manager to prove it, :-) ? If you do, I am eager to know. Thanks again and look forward to your next reply. Best, Jas > Maybe there is a bug in the mkfs command or so. I > will never use a gfs > without a filesystem check after creation > > Martin Fuerstenau > Seniro System Engineer > Oce Printing Systems, Poing > > On Fri, 2008-05-09 at 02:25 -0700, Ja S wrote: > > Hi, Klaus: > > > > Thank you very much for your kind answer. > > > > Tunning the parameters sounds really interesting. > I > > should give it a try. > > > > By the way, how did you come up with these new > > parameter values? Did you calculate them based on > > some measures or simply pick them up and test. > > > > Best, > > > > Jas > > > > > > --- Klaus Steinberger > > wrote: > > > > > Hi, > > > > > > > However, it took ages to list the subdirectory > on > > > an > > > > absolute idle cluster node. See below: > > > > > > > > # time ls -la | wc -l > > > > 31767 > > > > > > > > real 3m5.249s > > > > user 0m0.628s > > > > sys 0m5.137s > > > > > > > > There are about 3 minutes spent on somewhere. > Does > > > > anyone have any clue what the system was > waiting > > > for? > > > > > > Did you tune glock's? I found that it's very > > > important for performance of > > > GFS. > > > > > > I'm doing the following tunings currently: > > > > > > gfs_tool settune /export/data/etp quota_account > 0 > > > gfs_tool settune /export/data/etp glock_purge 50 > > > gfs_tool settune /export/data/etp demote_secs > 200 > > > gfs_tool settune /export/data/etp statfs_fast 1 > > > > > > Switch off quota off course only if you don't > need > > > it. 
All this tunings have > > > to be done every time after mounting, so do it > in a > > > init.d script running > > > after GFS mount, and of course do it on every > node. > > > > > > Here is the link to the glock paper: > > > > > > > > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > > > > > The glock tuning (glock_purge and demote_secs > > > parameters) definitly solved a > > > problem we had here with the Tivoli Backup > Client. > > > Before it was running for > > > days and sometimes even did give up. We observed > > > heavy lock traffic. > > > > > > After changing the glock parameters times for > the > > > backup did go down > > > dramatically, we now can run a Incremental > Backup on > > > a 4 TByte filesystem in > > > under 4 hours. So give it a try. > > > > > > There is some more tuning, which could be done > > > unfortunately just on creation > > > of filesystem. The default number of Resource > Groups > > > is ways too large for > > > nowadays TByte Filesystems. > > > > > > Sincerly, > > > Klaus > > > > > > > > > -- > > > Klaus Steinberger > Beschleunigerlaboratorium > > > Phone: (+49 89)289 14287 Am Coulombwall 6, > D-85748 > > > Garching, Germany > > > FAX: (+49 89)289 14280 EMail: > > > Klaus.Steinberger at Physik.Uni-Muenchen.DE > > > URL: > > > > > > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > Martin F?rstenau Tel. : (49) 8121-72-4684 > Oce Printing Systems Fax : (49) 8121-72-4996 > OI-12 E-Mail : > martin.fuerstenau at oce.com > Siemensallee 2 > 85586 Poing > Germany > > > > Visit Oce at drupa! Register online now: > > > This message and attachment(s) are intended solely > for use by the addressee and may contain information > that is privileged, confidential or otherwise exempt > from disclosure under applicable law. > > If you are not the intended recipient or agent > thereof responsible for delivering this message to > the intended recipient, you are hereby notified that > any dissemination, distribution or copying of this > communication is strictly prohibited. > > If you have received this communication in error, > please notify the sender immediately by telephone > and with a 'reply' message. > > Thank you for your co-operation. > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From martin.fuerstenau at oce.com Fri May 9 12:39:40 2008 From: martin.fuerstenau at oce.com (Martin Fuerstenau) Date: Fri, 09 May 2008 14:39:40 +0200 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <293470.66323.qm@web32204.mail.mud.yahoo.com> References: <293470.66323.qm@web32204.mail.mud.yahoo.com> Message-ID: <1210336780.11974.22.camel@lx002140.ops.de> Hi, unfortunaley not. 
According to my informaiotns (which are mainly from this list and from the wiki) for each node of the cluster this structure (journal) is established on the filesystem. If you read the manpage for gfs_fsck you see, that it must be unmounted from all nodes. If you have the problem I had you should plan a maintenance window asap. My problem started as mentioned with a slow gfs from the beginning and lead to clustercrashs after 7 months. All my problems were fixed by the check. Perhaps is the same with your system. Yours - Martin On Fri, 2008-05-09 at 04:51 -0700, Ja S wrote: > Hi Martin: > > Thanks for your reply indeed. > > --- Martin Fuerstenau > wrote: > > > Hi, > > > > I had (nearly) the same problem. A slow gfs. From > > the beginning. Two > > weeks ago the cluster crashed every time the load > > became heavier. > > > > What was the reason? A rotten gfs. The gfs uses > > leafnodes for data an > > leafnodes for metadata whithin the filesystem. And > > the problem was in > > the metadata leafnodes. > > > > Have you checked the Filesystem? Unmount it from all > > nodes and use > > gfs_fsck on the filesystem. > > No, not yet. I am afraid I cannot umount the file > sytem then do the gfs_fsck since the server downtime > is totally forbidden. > > Is there any other way to reclaim the unused or lost > blocks ( I guess leafnodes you mentioned meant to be > the disk block, correct me if I am wrong.)? > > Should "gfs_tool settune /mnt/points inoded_secs 10" > work for a heavy loaded node with freqent create and > delete file operations? > > > >In my case it reported > > (and repaired) tons > > of unused leafnoedes and some other errors. First > > time I started it > > without the -y (for yes). Well, after one hour ot > > typing y I killed it > > and started it with -y. The work was done whithin an > > hour for 1TB. Now > > the filesystem is clean and it was like a > > turboloader and Nitrogen > > injection for a car. Fast as it was never before. > > Great. Sounds fantastic. However, if the low > performance is caused by the "rotten" gfs, will your > now cleaned file system be possibly messed up again > after a certain period? Do you have a smart way to > monitor the status of your file system in order to > make a regular downtime schedule and "force" your > manager to prove it, :-) ? If you do, I am eager to > know. > > Thanks again and look forward to your next reply. > > Best, > > Jas > > > > > > Maybe there is a bug in the mkfs command or so. I > > will never use a gfs > > without a filesystem check after creation > > > > Martin Fuerstenau > > Seniro System Engineer > > Oce Printing Systems, Poing > > > > On Fri, 2008-05-09 at 02:25 -0700, Ja S wrote: > > > Hi, Klaus: > > > > > > Thank you very much for your kind answer. > > > > > > Tunning the parameters sounds really interesting. > > I > > > should give it a try. > > > > > > By the way, how did you come up with these new > > > parameter values? Did you calculate them based on > > > some measures or simply pick them up and test. > > > > > > Best, > > > > > > Jas > > > > > > > > > --- Klaus Steinberger > > > wrote: > > > > > > > Hi, > > > > > > > > > However, it took ages to list the subdirectory > > on > > > > an > > > > > absolute idle cluster node. See below: > > > > > > > > > > # time ls -la | wc -l > > > > > 31767 > > > > > > > > > > real 3m5.249s > > > > > user 0m0.628s > > > > > sys 0m5.137s > > > > > > > > > > There are about 3 minutes spent on somewhere. 
> > Does > > > > > anyone have any clue what the system was > > waiting > > > > for? > > > > > > > > Did you tune glock's? I found that it's very > > > > important for performance of > > > > GFS. > > > > > > > > I'm doing the following tunings currently: > > > > > > > > gfs_tool settune /export/data/etp quota_account > > 0 > > > > gfs_tool settune /export/data/etp glock_purge 50 > > > > gfs_tool settune /export/data/etp demote_secs > > 200 > > > > gfs_tool settune /export/data/etp statfs_fast 1 > > > > > > > > Switch off quota off course only if you don't > > need > > > > it. All this tunings have > > > > to be done every time after mounting, so do it > > in a > > > > init.d script running > > > > after GFS mount, and of course do it on every > > node. > > > > > > > > Here is the link to the glock paper: > > > > > > > > > > > > > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > > > > > > > The glock tuning (glock_purge and demote_secs > > > > parameters) definitly solved a > > > > problem we had here with the Tivoli Backup > > Client. > > > > Before it was running for > > > > days and sometimes even did give up. We observed > > > > heavy lock traffic. > > > > > > > > After changing the glock parameters times for > > the > > > > backup did go down > > > > dramatically, we now can run a Incremental > > Backup on > > > > a 4 TByte filesystem in > > > > under 4 hours. So give it a try. > > > > > > > > There is some more tuning, which could be done > > > > unfortunately just on creation > > > > of filesystem. The default number of Resource > > Groups > > > > is ways too large for > > > > nowadays TByte Filesystems. > > > > > > > > Sincerly, > > > > Klaus > > > > > > > > > > > > -- > > > > Klaus Steinberger > > Beschleunigerlaboratorium > > > > Phone: (+49 89)289 14287 Am Coulombwall 6, > > D-85748 > > > > Garching, Germany > > > > FAX: (+49 89)289 14280 EMail: > > > > Klaus.Steinberger at Physik.Uni-Muenchen.DE > > > > URL: > > > > > > > > > > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > > > > -- > > > > Linux-cluster mailing list > > > > Linux-cluster at redhat.com > > > > > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > Be a better friend, newshound, and > > > know-it-all with Yahoo! Mobile. Try it now. > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > Martin F?rstenau Tel. : (49) 8121-72-4684 > > Oce Printing Systems Fax : (49) 8121-72-4996 > > OI-12 E-Mail : > > martin.fuerstenau at oce.com > > Siemensallee 2 > > 85586 Poing > > Germany > > > > > > > > Visit Oce at drupa! Register online now: > > > > > > This message and attachment(s) are intended solely > > for use by the addressee and may contain information > > that is privileged, confidential or otherwise exempt > > from disclosure under applicable law. > > > > If you are not the intended recipient or agent > > thereof responsible for delivering this message to > > the intended recipient, you are hereby notified that > > any dissemination, distribution or copying of this > > communication is strictly prohibited. > > > > If you have received this communication in error, > > please notify the sender immediately by telephone > > and with a 'reply' message. 
> > > > Thank you for your co-operation. > > > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > ____________________________________________________________________________________ > Be a better friend, newshound, and > know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > Visit Oce at drupa! Register online now: This message and attachment(s) are intended solely for use by the addressee and may contain information that is privileged, confidential or otherwise exempt from disclosure under applicable law. If you are not the intended recipient or agent thereof responsible for delivering this message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by telephone and with a 'reply' message. Thank you for your co-operation. From jas199931 at yahoo.com Fri May 9 12:59:27 2008 From: jas199931 at yahoo.com (Ja S) Date: Fri, 9 May 2008 05:59:27 -0700 (PDT) Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <1210336780.11974.22.camel@lx002140.ops.de> Message-ID: <138485.59267.qm@web32208.mail.mud.yahoo.com> Hi, Martin: Another big thanks to you for your kind reply and suggestions. Best, Jas --- Martin Fuerstenau wrote: > Hi, > > unfortunaley not. According to my informaiotns > (which are mainly from > this list and from the wiki) for each node of the > cluster this structure > (journal) is established on the filesystem. If you > read the manpage for > gfs_fsck you see, that it must be unmounted from all > nodes. > > If you have the problem I had you should plan a > maintenance window > asap. > > My problem started as mentioned with a slow gfs from > the beginning and > lead to clustercrashs after 7 months. All my > problems were fixed by the > check. Perhaps is the same with your system. > > Yours - Martin > > On Fri, 2008-05-09 at 04:51 -0700, Ja S wrote: > > Hi Martin: > > > > Thanks for your reply indeed. > > > > --- Martin Fuerstenau > > wrote: > > > > > Hi, > > > > > > I had (nearly) the same problem. A slow gfs. > From > > > the beginning. Two > > > weeks ago the cluster crashed every time the > load > > > became heavier. > > > > > > What was the reason? A rotten gfs. The gfs uses > > > leafnodes for data an > > > leafnodes for metadata whithin the filesystem. > And > > > the problem was in > > > the metadata leafnodes. > > > > > > Have you checked the Filesystem? Unmount it from > all > > > nodes and use > > > gfs_fsck on the filesystem. > > > > No, not yet. I am afraid I cannot umount the file > > sytem then do the gfs_fsck since the server > downtime > > is totally forbidden. > > > > Is there any other way to reclaim the unused or > lost > > blocks ( I guess leafnodes you mentioned meant to > be > > the disk block, correct me if I am wrong.)? > > > > Should "gfs_tool settune /mnt/points inoded_secs > 10" > > work for a heavy loaded node with freqent create > and > > delete file operations? > > > > > > >In my case it reported > > > (and repaired) tons > > > of unused leafnoedes and some other errors. > First > > > time I started it > > > without the -y (for yes). 
Well, after one hour > ot > > > typing y I killed it > > > and started it with -y. The work was done > whithin an > > > hour for 1TB. Now > > > the filesystem is clean and it was like a > > > turboloader and Nitrogen > > > injection for a car. Fast as it was never > before. > > > > Great. Sounds fantastic. However, if the low > > performance is caused by the "rotten" gfs, will > your > > now cleaned file system be possibly messed up > again > > after a certain period? Do you have a smart way to > > monitor the status of your file system in order to > > make a regular downtime schedule and "force" your > > manager to prove it, :-) ? If you do, I am eager > to > > know. > > > > Thanks again and look forward to your next reply. > > > > Best, > > > > Jas > > > > > > > > > > > Maybe there is a bug in the mkfs command or so. > I > > > will never use a gfs > > > without a filesystem check after creation > > > > > > Martin Fuerstenau > > > Seniro System Engineer > > > Oce Printing Systems, Poing > > > > > > On Fri, 2008-05-09 at 02:25 -0700, Ja S wrote: > > > > Hi, Klaus: > > > > > > > > Thank you very much for your kind answer. > > > > > > > > Tunning the parameters sounds really > interesting. > > > I > > > > should give it a try. > > > > > > > > By the way, how did you come up with these new > > > > parameter values? Did you calculate them based > on > > > > some measures or simply pick them up and test. > > > > > > > > Best, > > > > > > > > Jas > > > > > > > > > > > > --- Klaus Steinberger > > > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > > However, it took ages to list the > subdirectory > > > on > > > > > an > > > > > > absolute idle cluster node. See below: > > > > > > > > > > > > # time ls -la | wc -l > > > > > > 31767 > > > > > > > > > > > > real 3m5.249s > > > > > > user 0m0.628s > > > > > > sys 0m5.137s > > > > > > > > > > > > There are about 3 minutes spent on > somewhere. > > > Does > > > > > > anyone have any clue what the system was > > > waiting > > > > > for? > > > > > > > > > > Did you tune glock's? I found that it's > very > > > > > important for performance of > > > > > GFS. > > > > > > > > > > I'm doing the following tunings currently: > > > > > > > > > > gfs_tool settune /export/data/etp > quota_account > > > 0 > > > > > gfs_tool settune /export/data/etp > glock_purge 50 > > > > > gfs_tool settune /export/data/etp > demote_secs > > > 200 > > > > > gfs_tool settune /export/data/etp > statfs_fast 1 > > > > > > > > > > Switch off quota off course only if you > don't > > > need > > > > > it. All this tunings have > > > > > to be done every time after mounting, so do > it > > > in a > > > > > init.d script running > > > > > after GFS mount, and of course do it on > every > > > node. > > > > > > > > > > Here is the link to the glock paper: > > > > > > > > > > > > > > > > > > > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4 > > > > > > > > > > The glock tuning (glock_purge and > demote_secs > > > > > parameters) definitly solved a > > > > > problem we had here with the Tivoli Backup > > > Client. > > > > > Before it was running for > > > > > days and sometimes even did give up. We > observed > > > > > heavy lock traffic. > > > > > > > > > > After changing the glock parameters times > for > > > the > > > > > backup did go down > > > > > dramatically, we now can run a Incremental > > > Backup on > > > > > a 4 TByte filesystem in > > > > > under 4 hours. So give it a try. 
> > > > > > > > > > There is some more tuning, which could be > done > > > > > unfortunately just on creation > > > > > of filesystem. The default number of > Resource > > > Groups > > > > > is ways too large for > > > > > nowadays TByte Filesystems. > > > > > > > > > > Sincerly, > > > > > Klaus > > > > > > > > > > > > > > > -- > > > > > Klaus Steinberger > > > Beschleunigerlaboratorium > > > > > Phone: (+49 89)289 14287 Am Coulombwall 6, > > > D-85748 > > > > > Garching, Germany > > > > > FAX: (+49 89)289 14280 EMail: > > > > > Klaus.Steinberger at Physik.Uni-Muenchen.DE > > > > > URL: > > > > > > > > > > > > > > > http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ > > > > > > -- > > > > > Linux-cluster mailing list > > > > > Linux-cluster at redhat.com > > > > > > > > > > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > > > > > > > > > > > ____________________________________________________________________________________ > > > > Be a better friend, newshound, and > > > > know-it-all with Yahoo! Mobile. Try it now. > > > > > > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > > > > > -- > > > > Linux-cluster mailing list > > > > Linux-cluster at redhat.com > > > > > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > Martin F?rstenau Tel. : (49) > 8121-72-4684 > > > Oce Printing Systems Fax : (49) > 8121-72-4996 > > > OI-12 E-Mail : > > > martin.fuerstenau at oce.com > > > Siemensallee 2 > > > 85586 Poing > > > Germany > > > > > > > > > > > > Visit Oce at drupa! Register online now: > > > > > > > > > This message and attachment(s) are intended > solely > > > for use by the addressee and may contain > information > > > that is privileged, confidential or otherwise > exempt > > > from disclosure under applicable law. > > > > > > If you are not the intended recipient or agent > > > thereof responsible for delivering this message > to > > > the intended recipient, you are hereby notified > that > > > any dissemination, distribution or copying of > this > > > communication is strictly prohibited. > > > > > > If you have received this communication in > error, > > > please notify the sender immediately by > telephone > > > and with a 'reply' message. > > > > > > Thank you for your co-operation. > > > > > > > > > > > > -- > > > Linux-cluster mailing list > > > Linux-cluster at redhat.com > > > > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > > > > > > > > ____________________________________________________________________________________ > > Be a better friend, newshound, and > > know-it-all with Yahoo! Mobile. Try it now. > http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > Visit Oce at drupa! Register online now: > > > This message and attachment(s) are intended solely > for use by the addressee and may contain information > that is privileged, confidential or otherwise exempt > from disclosure under applicable law. > > If you are not the intended recipient or agent > thereof responsible for delivering this message to > the intended recipient, you are hereby notified that > any dissemination, distribution or copying of this > communication is strictly prohibited. > > If you have received this communication in error, > please notify the sender immediately by telephone > and with a 'reply' message. > > Thank you for your co-operation. 
> > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From lists at tangent.co.za Fri May 9 14:34:32 2008 From: lists at tangent.co.za (Chris Picton) Date: Fri, 9 May 2008 14:34:32 +0000 (UTC) Subject: [Linux-cluster] Re: GFS vs GFS2 References: <48219556.9060901@monster.co.in> <1210162161.3345.26.camel@localhost.localdomain> Message-ID: On Wed, 07 May 2008 13:09:21 +0100, Steven Whitehouse wrote: >> >> >> >> Is GFS2 not production-ready due to lack of testing, or due to >> >> known bugs? >> >> >> >> Any advice would be appreciated >> >> >> >> Chris >> >> >> >> > The answer is a bit of both. We are getting to the stage where the known > bugs are mostly solved or will be very shortly. You can see the state of > the bug list at any time by going to bugzilla.redhat.com and looking for > any bug with gfs2 in the summary line. There are currently approx 70 > such bugs, but please bear in mind that a large number of these are > asking for new features, and some of them are duplicates of the same bug > across different versions of RHEL and/or Fedora. > > We are currently at a stage where having a large number of people > helping us in testing would be very helpful. If you have your own > favourite filesystem test, or if you are in a position to run a test > application, then we would be very interested in any reports of > success/failure. Thank you for the update. I assume that if things go according to plan, we wont see a supported gfs2 in 5.2, but probably will in 5.3? I, oddly enough, currently have a situation where running some bonnie++ tests causes machines to hang using gfs, but not gfs2. I will file a bug report when I can. Chris From linux-cluster at merctech.com Fri May 9 14:41:57 2008 From: linux-cluster at merctech.com (linux-cluster at merctech.com) Date: Fri, 09 May 2008 10:41:57 -0400 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: Your message of "Fri, 09 May 2008 11:14:01 +0200." <200805091114.10395.Klaus.Steinberger@physik.uni-muenchen.de> References: <200805091114.10395.Klaus.Steinberger@physik.uni-muenchen.de> <20080509074522.96CD3618E0A@hormel.redhat.com> Message-ID: <23156.1210344117@mirchi> In the message dated: Fri, 09 May 2008 11:14:01 +0200, The pithy ruminations from Klaus Steinberger on <[Linux-cluster] Re: Why GFS is so slow? What it is waiting for?> were: => --===============1371945295== [SNIP!] => => There is some more tuning, which could be done unfortunately just on creati => on => of filesystem. The default number of Resource Groups is ways too large for => => nowadays TByte Filesystems. I would appreciate it greatly if you could expand on this. I'm setting up a cluster that will have several filesystems in the 3~6TB range. This will be GFS1 over LVM2, with SAN (no iSCSI) connections to the servers, if that has any bearing on the tuning suggestions. 
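My understanding is that the knob in question is the resource-group size passed to gfs_mkfs when the filesystem is created, and that it cannot be changed afterwards; something along these lines, where the volume group, LV, cluster name, filesystem label and journal count are all placeholders and 2048 MB is simply an example of a much larger resource group than the default:

# SAN-backed LVM2 logical volume for a roughly 4 TB filesystem
lvcreate -L 4096G -n lv_data vg_san

# -r sets the resource-group size in megabytes at mkfs time; larger resource
# groups mean far fewer of them on a 3-6 TB filesystem than the default gives
gfs_mkfs -p lock_dlm -t mycluster:data -j 4 -r 2048 /dev/vg_san/lv_data

Is that the sort of change you mean, or is there more to it than the -r value?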
> Sincerly, > Klaus > -- > Klaus Steinberger Beschleunigerlaboratorium > Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 Garching, Germany > FAX: (+49 89)289 14280 EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE > URL: http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ From theophanis_kontogiannis at yahoo.gr Fri May 9 15:31:07 2008 From: theophanis_kontogiannis at yahoo.gr (Theophanis Kontogiannis) Date: Fri, 9 May 2008 18:31:07 +0300 Subject: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue In-Reply-To: <3DDA6E3E456E144DA3BB0A62A7F7F779020069C8@SKYHQAMX08.klasi.is> References: <3DDA6E3E456E144DA3BB0A62A7F7F779020069C8@SKYHQAMX08.klasi.is> Message-ID: <009f01c8b1e9$b48f36c0$9f01a8c0@corp.netone.gr> Hi Finnur, The LV is running on top of DRBD? Please provide us with a bit more details. Thank you, Theophanis Kontogiannis. _____ From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Finnur Orn Guðmundsson - TM Software Sent: Wednesday, May 07, 2008 11:57 PM To: linux-cluster at redhat.com Subject: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue Hi, I have a 2 node cluster running RHEL 5.1 x86_64 and fully patched as of today. If i cold-boot the cluster (both nodes) everything comes up smoothly and i can migrate services between nodes etc... However when i take one node down i am having difficulties leaving the fence domain. If i kill the fence daemon on the node i am trying to remove gracefully or use cman_tool leave force and reboot it, it comes back up, cman starts and it appears to join the cluster. The CLVMD init script hangs (just sits and hangs) and rgmanager does not start up correctly. Also CLVMD and rgmanager just sit in a zombie state and i have to poweroff or fence the node to get it to reboot.... The cluster never stabilizes itself until i cold boot both nodes. Then it is OK until the next reboot. I have read something about similar cases but did not find any magic solution! ;) My cluster.conf is attached. There is no firewall running on the machines in question (chkconfig iptables off;).
Various output from the node that is rebooted: Output from group_tool services: type level name id state fence 0 default 00000000 JOIN_STOP_WAIT [1 2] dlm 1 rgmanager 00000000 JOIN_STOP_WAIT [1 2] Output from group_tool fenced: 1210193027 our_nodeid 1 our_name node-16 1210193027 listen 4 member 5 groupd 7 1210193029 client 3: join default 1210193029 delay post_join 120s post_fail 0s 1210193029 added 2 nodes from ccs 1210193542 client 3: dump Various output from the other node: Output from group_tool services: type level name id state fence 0 default 00010002 JOIN_START_WAIT [1 2] dlm 1 clvmd 00020002 none [2] dlm 1 rgmanager 00030002 FAIL_ALL_STOPPED [1 2] Output from group_tool dump fenced: 1210191957 our_nodeid 2 our_name node-17 1210191957 listen 4 member 5 groupd 7 1210191958 client 3: join default 1210191958 delay post_join 120s post_fail 0s 1210191958 added 2 nodes from ccs 1210191958 setid default 65538 1210191958 start default 1 members 2 1210191958 do_recovery stop 0 start 1 finish 0 1210191958 node "node-16" not a cman member, cn 1 1210191958 add first victim node-16 1210191959 node "node-16" not a cman member, cn 1 1210191960 node "node-16" not a cman member, cn 1 1210191961 node "node-16" not a cman member, cn 1 1210191962 node "node-16" not a cman member, cn 1 1210191963 node "node-16" not a cman member, cn 1 1210191964 node "node-16" not a cman member, cn 1 1210191965 node "node-16" not a cman member, cn 1 1210191966 node "node-16" not a cman member, cn 1 1210191967 node "node-16" not a cman member, cn 1 1210191968 node "node-16" not a cman member, cn 1 1210191969 node "node-16" not a cman member, cn 1 1210191970 node "node-16" not a cman member, cn 1 1210191971 node "node-16" not a cman member, cn 1 1210191972 node "node-16" not a cman member, cn 1 1210191973 node "node-16" not a cman member, cn 1 1210191974 reduce victim node-16 1210191974 delay of 16s leaves 0 victims 1210191974 finish default 1 1210191974 stop default 1210191974 start default 2 members 1 2 1210191974 do_recovery stop 1 start 2 finish 1 1210193633 client 3: dump Thanks in advance. Kær kveðja / Best Regards, Finnur Örn Guðmundsson Network Engineer - Network Operations fog at t.is TM Software Urðarhvarf 6, IS-203 Kópavogur, Iceland Tel: +354 545 3000 - fax +354 545 3610 www.tm-software.is This e-mail message and any attachments are confidential and may be privileged. TM Software e-mail disclaimer: www.tm-software.is/disclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From fog at t.is Fri May 9 16:02:00 2008 From: fog at t.is (Finnur Örn Guðmundsson - TM Software) Date: Fri, 9 May 2008 16:02:00 -0000 Subject: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue In-Reply-To: <009f01c8b1e9$b48f36c0$9f01a8c0@corp.netone.gr> References: <3DDA6E3E456E144DA3BB0A62A7F7F779020069C8@SKYHQAMX08.klasi.is> <009f01c8b1e9$b48f36c0$9f01a8c0@corp.netone.gr> Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F77902069695@SKYHQAMX08.klasi.is> Hi, Nop, The shared storage is provided by IBM SVC (SAN Volume Controller) through Qlogic 24xx HBA cards. The switches are Brocade 48000. Devices are created on top of dm-multipath devices. I really think this has something to do with the fence daemon since i am unable to leave the fence domain gracefully on a cold boot of the whole cluster. Thanks, Finnur
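For reference, one commonly suggested way to take a RHEL 5 node out of the cluster cleanly is to stop the stack in strict top-down order before rebooting; this is only a sketch using the stock RHEL 5.1 init scripts, not a confirmed fix for the JOIN_STOP_WAIT state shown above:

    # on the node being taken down, as root
    service rgmanager stop   # stop managed services first
    service gfs stop         # unmount the GFS mounts
    service clvmd stop       # stop clustered LVM
    service cman stop        # leave the fence domain and cman last
    # then verify from the surviving node
    cman_tool nodes
    group_tool services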
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Theophanis Kontogiannis Sent: 9. maí 2008 15:31 To: 'linux clustering' Subject: RE: [Linux-cluster] RHEL 5.1 (fully patched) - A weird issue [...]
-------------- next part -------------- An HTML attachment was scrubbed... URL: From Klaus.Steinberger at physik.uni-muenchen.de Sat May 10 07:10:19 2008 From: Klaus.Steinberger at physik.uni-muenchen.de (Klaus Steinberger) Date: Sat, 10 May 2008 09:10:19 +0200 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? In-Reply-To: <20080509160012.D4B2061924F@hormel.redhat.com> References: <20080509160012.D4B2061924F@hormel.redhat.com> Message-ID: <200805100910.20104.Klaus.Steinberger@physik.uni-muenchen.de> > I would appreciate it greatly if you could expand on this. Yep, I used -r 2048 for my new 6 TByte filesystem. Here some information about it: The default for resource group size: -r MegaBytes gfs_mkfs will try to make Resource Groups about this big. The default is 256 MB. From the cluster FAQ: How can I performance-tune GFS or make it any faster? You shouldn't expect GFS to perform as fast as non-clustered file systems because it needs to do inter-node locking and file system coordination. That said, there are some things you can do to improve GFS performance. * Use -r 2048 on gfs_mkfs and mkfs.gfs2 for large file systems. The issue has to do with the size of the GFS resource groups, which is an internal GFS structure for managing the file system data. This is an internal GFS structure, not to be confused with rgmanager's Resource Groups. Some file system slowdown can be blamed on having a large number of RGs. The bigger your file system, the more RGs you need. By default, gfs_mkfs carves your file system into 256MB RGs, but it allows you to specify a preferred RG size. The default, 256MB, is good for average size file systems, but you can increase performance on a bigger file system by using a bigger RG size. For example, my 40TB file system needs 156438 RGs of 256MB each and whenever GFS has to run that linked list, it takes a long time. The same 40TB file system can be created with bigger RGs--2048MB--requiring only 19555 of them. The time savings is dramatic: It took nearly 23 minutes for my system to read in all 156438 RG Structures with 256MB RGs, but only 4 minutes to read in the 19555 RG Structures for my 2048MB RGs. The time to do an operation like df on an empty file system dropped from 24 seconds with 256MB RGs, to under a second with 2048MB RGs. I'm sure that increasing the size of the RGs would help gfs_fsck's performance as well. Future versions of gfs_mkfs and mkfs.gfs2 will dynamically choose an RG size to reduce the RG overhead. Sincerly, Klaus -- Klaus Steinberger Beschleunigerlaboratorium Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 Garching, Germany FAX: (+49 89)289 14280 EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE URL: http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ -------------- next part -------------- A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 2002 bytes Desc: not available URL: From oliveiros.cristina at gmail.com Sat May 10 19:07:27 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Sat, 10 May 2008 20:07:27 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <481E4544.1020301@bobich.net> References: <481E4544.1020301@bobich.net> Message-ID: Hello, Gordan, Are you sure those are the packages? When I try to yum install gfs-utils and kmod-gfs, it says it doesn't know those packages... The other three are installed ok. Help.... Best, Oliveiros 2008/5/5 Gordan Bobic : > Oliveiros Cristina wrote: > >> Howdy List, >> I would like to install gfs on a two node cluster running both fedora 8. >> >> Can anyone please kindly supply me with some links for the procedure? >> > > First part of the procedure is to not use FC if you plan for this to be > useful. FC7+ comes only with GFS2. There are no GFS1 packages included, and > GFS2 isn't stable yet. > > Which packages are needed, where to get them, that sort of things. >> > > cman > openais > gfs-utils > kmod-gfs > rgmanager > > Can't remember if there may be more. > > I've already googled up and down a little but I couldn't find no >> rigourous information on this, or maybe I am just blind :-) >> > > This is probably a not a bad place to start: > > https://open.datacore.ch/DCwiki.open/Wiki.jsp?page=GFS#section-GFS-Documentation > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Sat May 10 19:16:55 2008 From: gordan at bobich.net (Gordan Bobic) Date: Sat, 10 May 2008 20:16:55 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> Message-ID: <4825F4A7.9060609@bobich.net> Oliveiros Cristina wrote: > Hello, Gordan, > Are you sure those are the packages? > When I try to yum install gfs-utils and kmod-gfs, it says it doesn't > know those packages... > > The other three are installed ok. That's what the package names are on CentOS / RHEL5. I can't see why they would be different on Fedora, but you can always do: # yum list | grep -i gfs and see what that returns. It is possible that kmod-gfs is actually built into the kernel itself (Fedora have much more frequent complete kernel updates, as stability is not the main requirement), so there is no separate package. If I had to hazard a guess, then gfs-utils isn't there because GFS1 isn't included in Fedora, only GFS2. So try gfs2-utils. The output of "yum list" should make it obvious if this is the case. I said it before and I'll say it again - use FC's GFS2 at your peril. Last time I tried it (~6 months ago), it didn't work at all. Gordan From oliveiros.cristina at gmail.com Sat May 10 19:24:28 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Sat, 10 May 2008 20:24:28 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <4825F4A7.9060609@bobich.net> References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> Message-ID: Hello, Gordan. 
Thank you for your email here's what it says [langolier at bravo ~]$ yum list|grep -i gfs fgfs-Atlas.i386 0.3.1-5.fc8 fedora fgfs-base.noarch 0.9.11-0.1.pre1.fc8 fedora gfs-artemisia-fonts.noarch 20070415-1.fc8 updates gfs-baskerville-fonts.noarch 20070327-3.fc8 updates gfs-bodoni-classic-fonts.noarch 20070415-2.fc8 updates gfs-bodoni-fonts.noarch 20070415-1.fc8 updates gfs-complutum-fonts.noarch 20070413-3.fc8 updates gfs-didot-classic-fonts.noarch 20070415-1.fc8 updates gfs-didot-fonts.noarch 20070616-2.fc8 updates gfs-gazis-fonts.noarch 20070417-2.fc8 updates gfs-neohellenic-fonts.noarch 20070415-1.fc8 updates gfs-olga-fonts.noarch 20060908-1.fc8 updates gfs-porson-fonts.noarch 20060908-3.fc8 updates gfs-solomos-fonts.noarch 20071114-2.fc8 updates gfs-theokritos-fonts.noarch 20070415-2.fc8 updates gfs2-utils.i386 2.03.00-3.fc8 updates I need to use gfs , not gfs2. If it isn't included in fc, the alternative is to build from sources? Best, Oliveiros 2008/5/10 Gordan Bobic : > Oliveiros Cristina wrote: > >> Hello, Gordan, >> Are you sure those are the packages? >> When I try to yum install gfs-utils and kmod-gfs, it says it doesn't know >> those packages... >> >> The other three are installed ok. >> > > That's what the package names are on CentOS / RHEL5. I can't see why they > would be different on Fedora, but you can always do: > > # yum list | grep -i gfs > > and see what that returns. It is possible that kmod-gfs is actually built > into the kernel itself (Fedora have much more frequent complete kernel > updates, as stability is not the main requirement), so there is no separate > package. > > If I had to hazard a guess, then gfs-utils isn't there because GFS1 isn't > included in Fedora, only GFS2. So try gfs2-utils. The output of "yum list" > should make it obvious if this is the case. > > I said it before and I'll say it again - use FC's GFS2 at your peril. Last > time I tried it (~6 months ago), it didn't work at all. > > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Sat May 10 19:31:27 2008 From: gordan at bobich.net (Gordan Bobic) Date: Sat, 10 May 2008 20:31:27 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> Message-ID: <4825F80F.6040101@bobich.net> Oliveiros Cristina wrote: > Hello, Gordan. > Thank you for your email > > here's what it says > > [langolier at bravo ~]$ yum list|grep -i gfs [...] > gfs2-utils.i386 2.03.00-3.fc8 updates > > I need to use gfs , not gfs2. > If it isn't included in fc, the alternative is to build from sources? Personally I'd just use RHEL5/CentOS5, but if you want to go the sources route, good luck. Gordan From oliveiros.cristina at gmail.com Sat May 10 19:39:14 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Sat, 10 May 2008 12:39:14 -0700 Subject: [Linux-cluster] GFS on fedora In-Reply-To: <4825F80F.6040101@bobich.net> References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> <4825F80F.6040101@bobich.net> Message-ID: Hello, Gordan. I didn't make a decision, was just asking. According to what yum list said, There is no way to install it through rpms, is my understanding correct? Oliveiros 2008/5/10 Gordan Bobic : > Oliveiros Cristina wrote: > >> Hello, Gordan. 
>> Thank you for your email >> >> here's what it says >> >> [langolier at bravo ~]$ yum list|grep -i gfs >> > [...] > >> gfs2-utils.i386 2.03.00-3.fc8 updates >> I need to use gfs , not gfs2. >> If it isn't included in fc, the alternative is to build from sources? >> > > Personally I'd just use RHEL5/CentOS5, but if you want to go the sources > route, good luck. > > > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gordan at bobich.net Sat May 10 23:12:26 2008 From: gordan at bobich.net (Gordan Bobic) Date: Sun, 11 May 2008 00:12:26 +0100 Subject: [Linux-cluster] GFS on fedora In-Reply-To: References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> <4825F80F.6040101@bobich.net> Message-ID: <48262BDA.6010505@bobich.net> Oliveiros Cristina wrote: > According to what yum list said, > There is no way to install it through rpms, is my understanding correct? Yes, that's the size of it. GFS1 simply doesn't ship with FC6+ Gordan From michael.osullivan at auckland.ac.nz Sun May 11 11:04:49 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Sun, 11 May 2008 23:04:49 +1200 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID Message-ID: <4826D2D1.7010103@auckland.ac.nz> Hi everyone, I have set up a small experimental network with a linux cluster and SAN that I want to have high data availability. There are 2 servers that I have put into a cluster using conga (thank you luci and ricci). There are 2 storage devices, each consisting of a basic server with 2 x 1TB disks. The cluster servers and the storage devices each have 2 NICs and are connected using 2 gigabit ethernet switches. I have created a single striped logical volume on each storage device using the 2 disks (to try and speed up I/O on the volume). These volumes (one on each storage device) are presented to the cluster servers using iSCSI (on the cluster servers) and iSCSI target (on the storage devices). Since there are multiple NICs on the storage devices I have set up two iSCSI portals to each logical volume. I have then used mdadm to ensure the volumes are accessible via multipath. Finally, since I want the storage devices to present the data in a highly available way I have used mdadm to create a software raid-5 across the two multipathed volumes (I realise this is essentially mirroring on the 2 storage devices but I am trying to set this up to be extensible to extra storage devices). My next step is to present the raid array (of the two multipathed volumes - one on each storage device) as a GFS to the cluster servers to ensure that locking of access to the data is handled properly. I have recently read that multipathing is possible within GFS, but raid is not (yet). Since I want the two storage devices in a raid-5 array and I am using iSCSI I'm not sure if I should try and use GFS to do the multipathing. Also, being a linux/storage/clustering newbie I'm not sure if my approach is the best thing to do. I want to make sure that my system has no single point of failure that will make any of the data inaccessible. I'm pretty sure our network design supports this. I assume (if I configure it right) the cluster will ensure services will keep going if one of the cluster servers goes down. 
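To make the layering described above easier to follow, here is a hedged sketch of the mdadm steps it implies; every device name is hypothetical and the iSCSI sessions are assumed to be logged in already, so this is an illustration of the stack rather than a tested recipe:

    # each storage box's iSCSI LUN appears twice, once per portal/NIC
    mdadm --create /dev/md0 --level=multipath --raid-devices=2 /dev/sdb /dev/sdc
    mdadm --create /dev/md1 --level=multipath --raid-devices=2 /dev/sdd /dev/sde
    # RAID-5 across the two multipathed LUNs (with only two members this
    # behaves like a mirror, as the message itself notes)
    mdadm --create /dev/md2 --level=5 --raid-devices=2 /dev/md0 /dev/md1
    cat /proc/mdstat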
Thus the only weak point was the storage devices which I hope I have now strengthened by essentially implementing network raid across iSCSI and then presented as a single GFS. I would really appreciate comments/advice/constructive criticism as I have really been learning much of this as I go. Cheers, Mike From swhiteho at redhat.com Mon May 12 08:50:14 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Mon, 12 May 2008 09:50:14 +0100 Subject: [Linux-cluster] Re: GFS vs GFS2 In-Reply-To: References: <48219556.9060901@monster.co.in> <1210162161.3345.26.camel@localhost.localdomain> Message-ID: <1210582214.3635.493.camel@quoit> Hi, On Fri, 2008-05-09 at 14:34 +0000, Chris Picton wrote: > On Wed, 07 May 2008 13:09:21 +0100, Steven Whitehouse wrote: > >> >> > >> >> Is GFS2 not production-ready due to lack of testing, or due to > >> >> known bugs? > >> >> > >> >> Any advice would be appreciated > >> >> > >> >> Chris > >> >> > >> >> > > The answer is a bit of both. We are getting to the stage where the known > > bugs are mostly solved or will be very shortly. You can see the state of > > the bug list at any time by going to bugzilla.redhat.com and looking for > > any bug with gfs2 in the summary line. There are currently approx 70 > > such bugs, but please bear in mind that a large number of these are > > asking for new features, and some of them are duplicates of the same bug > > across different versions of RHEL and/or Fedora. > > > > We are currently at a stage where having a large number of people > > helping us in testing would be very helpful. If you have your own > > favourite filesystem test, or if you are in a position to run a test > > application, then we would be very interested in any reports of > > success/failure. > > Thank you for the update. > > I assume that if things go according to plan, we wont see a supported > gfs2 in 5.2, but probably will in 5.3? > That is quite likely, yes. > I, oddly enough, currently have a situation where running some bonnie++ > tests causes machines to hang using gfs, but not gfs2. I will file a bug > report when I can. > > > Chris > Ok, all such information is useful. Thanks, Steve. From sanelson at gmail.com Mon May 12 10:14:08 2008 From: sanelson at gmail.com (Stephen Nelson-Smith) Date: Mon, 12 May 2008 11:14:08 +0100 Subject: [Linux-cluster] Oracle Shared-Nothing Message-ID: Hi, I want to implement a shared-nothing active/passive failover cluster for Oracle 10g. RAC is out of budget. I'm looking at drbd + heartbeat or cluster suite. Any experiences? recommendations? gotchas? In particular, any idea whether Oracle would support a non-RAC setup? S. From lhh at redhat.com Mon May 12 15:46:26 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 12 May 2008 11:46:26 -0400 Subject: [Linux-cluster] GFS CLuster with LIDS In-Reply-To: <482415E4.5000508@monster.co.in> References: <482415E4.5000508@monster.co.in> Message-ID: <1210607186.10406.31.camel@ayanami.boston.devel.redhat.com> On Fri, 2008-05-09 at 14:44 +0530, Vimal Gupta wrote: > Hi All, > > I am having CentOs with LIDS running on that system . Can I implement > GFS cluster on that node with the lids. > Anyone have same kind of exp. please share... Could you provide a link to LIDS ? -- Lon From rpeterso at redhat.com Mon May 12 17:06:06 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 12 May 2008 12:06:06 -0500 Subject: [Linux-cluster] Re: Why GFS is so slow? What it is waiting for? 
In-Reply-To: <200805100910.20104.Klaus.Steinberger@physik.uni-muenchen.de> References: <20080509160012.D4B2061924F@hormel.redhat.com> <200805100910.20104.Klaus.Steinberger@physik.uni-muenchen.de> Message-ID: <1210611966.2738.14.camel@technetium.msp.redhat.com> On Sat, 2008-05-10 at 09:10 +0200, Klaus Steinberger wrote: > How can I performance-tune GFS or make it any faster? (snip) > Use -r 2048 on gfs_mkfs and mkfs.gfs2 for large file systems. Actually, this is a delicate balance. If you have too many resource groups (RGs) then it spends a lot of time searching to find the one it needs, but once it finds the RG, the bitmap search will be fast. If you have fewer RGs, it will spend less time searching for the right one, but the bitmaps for each will be bigger, so it will spend more time searching the bitmap once it has been found. I've written a performance enhancement to the "bitfit" algorithm for GFS2 that increases the speed of bitmap searches, making it more speedy to use fewer RGs with larger bitmaps. That code could be back-ported to GFS, but it hasn't been done yet. Actually, there are a lot of performance improvements done for GFS2 that COULD theoretically be ported back to GFS, if someone took the time. Perhaps I'll open an RFE bugzilla and post any patches I come up with to the cluster-devel mailing list. Regards, Bob Peterson Red Hat Clustering & GFS From lhh at redhat.com Mon May 12 17:12:48 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 12 May 2008 13:12:48 -0400 Subject: [Linux-cluster] Oracle Shared-Nothing In-Reply-To: References: Message-ID: <1210612368.10406.50.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-05-12 at 11:14 +0100, Stephen Nelson-Smith wrote: > Hi, > > I want to implement a shared-nothing active/passive failover cluster > for Oracle 10g. RAC is out of budget. > > I'm looking at drbd + heartbeat or cluster suite. > > Any experiences? recommendations? gotchas? > > In particular, any idea whether Oracle would support a non-RAC setup? They support non-RAC configurations, but I doubt they would support running the database on DRBD. You should call Oracle on this one. Also, I **think** buying Oracle Database 10g Release 2 these days gets you Oracle's failover technology called Cluster Ware - so you might not need heartbeat or rgmanager (Cluster Suite component that provides failover for off-the-shelf apps). Again, call Oracle and ask. They want your money, so surely they will answer your questions ;) If you're going to spend the money for Oracle (and you need failover support), I'd really recommend getting a FC or iSCSI RAID array with dual redundant internal controllers and a remote power switch. There are some good SCSI arrays available at lower price points than FC and often iSCSI solutions, as well (but stay away from JBOD/host-RAID configurations). -- Lon From jas199931 at yahoo.com Mon May 12 23:34:45 2008 From: jas199931 at yahoo.com (Ja S) Date: Mon, 12 May 2008 16:34:45 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? Message-ID: <887833.81823.qm@web32204.mail.mud.yahoo.com> Hi, All: When an application on a cluster node A needs to access a file on a SAN storage, how DLM process the lock request? 
Should DLM firstly determine whether there already exists a lock resource mapped to the file, by doing the following things in the order 1) looking at the master lock resources on the node A, 2) searching the local copies of lock resources on the node A, 3) searching the lock directory on the node A to find out whether a master lock resource assosicated with the file exists on another node, 4) sending messages to other nodes in the cluster for the location of the master lock resource? I ask this question because from some online articles, it seems that DLM will always search the cluster-wide lock directory across the whole cluster first to find the location of the master lock resource. Can anyone kindly confirm the order of processes that DLM does? Many thanks in advance. Jas ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From fdinitto at redhat.com Tue May 13 04:52:13 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Tue, 13 May 2008 06:52:13 +0200 (CEST) Subject: [Linux-cluster] GFS on fedora In-Reply-To: <48262BDA.6010505@bobich.net> References: <481E4544.1020301@bobich.net> <4825F4A7.9060609@bobich.net> <4825F80F.6040101@bobich.net> <48262BDA.6010505@bobich.net> Message-ID: Hi guys, On Sun, 11 May 2008, Gordan Bobic wrote: > Oliveiros Cristina wrote: >> According to what yum list said, >> There is no way to install it through rpms, is my understanding correct? > > Yes, that's the size of it. GFS1 simply doesn't ship with FC6+ > There are a few reasons why GFS1 is not in Fedora anylonger. The first and most important one: http://fedoraproject.org/wiki/Packaging/Guidelines#head-5d326feb10ebf0624361729239c58719e31b6f93 Fedora does not allow external kernel modules anylonger. GFS1 will never be upstream. In order to run GFS1, a patch to the main kernel is required, and this patch will never be included upstream either. That makes it basically impossible for us to maintain a separate rpm repository to provide GFS1 without duplicating a lot of work in maintain an external kernel. Fabio -- I'm going to make him an offer he can't refuse. From vimal at monster.co.in Tue May 13 05:27:21 2008 From: vimal at monster.co.in (Vimal Gupta) Date: Tue, 13 May 2008 10:57:21 +0530 Subject: [Linux-cluster] GFS CLuster with LIDS In-Reply-To: <1210607186.10406.31.camel@ayanami.boston.devel.redhat.com> References: <482415E4.5000508@monster.co.in> <1210607186.10406.31.camel@ayanami.boston.devel.redhat.com> Message-ID: <482926B9.3000505@monster.co.in> Hi Lon, Sorry For delay, Here is the Link for LIDS... http://www.lids.org/document/build_lids-0.2-1.html Lon Hohberger wrote: > On Fri, 2008-05-09 at 14:44 +0530, Vimal Gupta wrote: > >> Hi All, >> >> I am having CentOs with LIDS running on that system . Can I implement >> GFS cluster on that node with the lids. >> Anyone have same kind of exp. please share... >> > > Could you provide a link to LIDS ? > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > -- Vimal Gupta Sr. System Administrator Monster.com India Pvt.Ltd. 
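As a quick illustration of the point Fabio makes above about GFS1 versus GFS2 availability, the running kernel's module tree can be queried directly; this is only a diagnostic check, not an installation method:

    modinfo gfs2 | head -3                         # GFS2 module, shipped with the kernel
    modinfo gfs  | head -3                         # GFS1; on Fedora 7+ this is expected to fail
    find /lib/modules/$(uname -r) -name 'gfs*.ko'  # list whatever gfs modules are installed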
From sanelson at gmail.com Tue May 13 06:51:46 2008 From: sanelson at gmail.com (Stephen Nelson-Smith) Date: Tue, 13 May 2008 07:51:46 +0100 Subject: [Linux-cluster] Oracle Shared-Nothing In-Reply-To: <1210612368.10406.50.camel@ayanami.boston.devel.redhat.com> References: <1210612368.10406.50.camel@ayanami.boston.devel.redhat.com> Message-ID: Hi... On Mon, May 12, 2008 at 6:12 PM, Lon Hohberger wrote: > > On Mon, 2008-05-12 at 11:14 +0100, Stephen Nelson-Smith wrote: > > Hi, > > > > I want to implement a shared-nothing active/passive failover cluster > > for Oracle 10g. RAC is out of budget. > > > > I'm looking at drbd + heartbeat or cluster suite. > > > > Any experiences? recommendations? gotchas? > > > > In particular, any idea whether Oracle would support a non-RAC setup? > > They support non-RAC configurations, but I doubt they would support > running the database on DRBD. You should call Oracle on this one. I will :) > Also, I **think** buying Oracle Database 10g Release 2 these days gets > you Oracle's failover technology called Cluster Ware - so you might not > need heartbeat or rgmanager (Cluster Suite component that provides > failover for off-the-shelf apps). I have that in mind, yes. > If you're going to spend the money for Oracle (and you need failover > support), I'd really recommend getting a FC or iSCSI RAID array with > dual redundant internal controllers and a remote power switch. The client is dead set against a RAID array, partly on cost (budget v.tight), but also on physical space in the rack - there's only 2U left, and a new rack costs ?1000 pcm. Do I recall mention of cmirror + GNBD as a possible solution to shared-nothing, no-disk-array setups? > -- Lon S. From ccaulfie at redhat.com Tue May 13 07:06:57 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 13 May 2008 08:06:57 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <887833.81823.qm@web32204.mail.mud.yahoo.com> References: <887833.81823.qm@web32204.mail.mud.yahoo.com> Message-ID: <48293E11.5070405@redhat.com> Ja S wrote: > Hi, All: > > > When an application on a cluster node A needs to > access a file on a SAN storage, how DLM process the > lock request? > > Should DLM firstly determine whether there already > exists a lock resource mapped to the file, by doing > the following things in the order 1) looking at the > master lock resources on the node A, 2) searching the > local copies of lock resources on the node A, 3) > searching the lock directory on the node A to find out > whether a master lock resource assosicated with the > file exists on another node, 4) sending messages to > other nodes in the cluster for the location of the > master lock resource? > > I ask this question because from some online articles, > it seems that DLM will always search the cluster-wide > lock directory across the whole cluster first to find > the location of the master lock resource. > > Can anyone kindly confirm the order of processes that > DLM does? > This should be very well documented, as it's common amongst DLM implementations. If a node needs to lock a resource that it doesn't know about then it hashes the name to get a directory node ID, than asks that node for the master node. if there is no master node (the resource is not active) then the requesting node is made master if the node does know the master, (other locks on the resource exist) then it will go straight to that master node. The node then asks the master for the lock. 
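Purely as an illustration of the directory lookup described here (the resource name alone determines which node holds the directory entry), a toy shell sketch follows; the real DLM uses its own in-kernel hash rather than cksum, and the resource name and node count below are invented:

    nodes=4                      # number of cluster nodes (example)
    resname="5 11aa22bc"         # made-up resource name
    hash=$(printf '%s' "$resname" | cksum | cut -d' ' -f1)
    echo "directory node: $(( hash % nodes + 1 ))"
    # every node computes the same answer from the same name,
    # so no cluster-wide search is needed to find the directory entry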
The lock status (granted, waiting) is recorded in the local copy. Chrissie From jas199931 at yahoo.com Tue May 13 08:31:49 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 01:31:49 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <48293E11.5070405@redhat.com> Message-ID: <598349.86681.qm@web32204.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > Hi, All: > > > > > > When an application on a cluster node A needs to > > access a file on a SAN storage, how DLM process > the > > lock request? > > > > Should DLM firstly determine whether there already > > exists a lock resource mapped to the file, by > doing > > the following things in the order 1) looking at > the > > master lock resources on the node A, 2) searching > the > > local copies of lock resources on the node A, 3) > > searching the lock directory on the node A to find > out > > whether a master lock resource assosicated with > the > > file exists on another node, 4) sending messages > to > > other nodes in the cluster for the location of the > > master lock resource? > > > > I ask this question because from some online > articles, > > it seems that DLM will always search the > cluster-wide > > lock directory across the whole cluster first to > find > > the location of the master lock resource. > > > > Can anyone kindly confirm the order of processes > that > > DLM does? > > > > > This should be very well documented, as it's common > amongst DLM > implementations. > I think I may be blind. I have not yet found a document which describes the sequence of processes in a precise way. I tried to read the source code but I gave up due to lack of comments. > If a node needs to lock a resource that it doesn't > know about then it > hashes the name to get a directory node ID, than > asks that node for the > master node. if there is no master node (the > resource is not active) > then the requesting node is made master > > if the node does know the master, (other locks on > the resource exist) > then it will go straight to that master node. Thanks for the description. However, one point is still not clear to me is how a node can conclude whether it __knows__ the lock resource or not? Will the node search 1) the list of master lock resources owned by itself, then 2) the list of local copies of lock resouces stored on itself, then 3) the lock directory on itself, sequentially? or just search 1) and 2) then if it cannot find any, it will get the node ID based on a hash function (possibly the output of the hash function is itself?) who may hold the location of the master lock resource, then ask the node for the master node, and so on? If so, what exact search algorithms are used, the linear search, the binary search, or what else? I would like to understand the processes in an exact and precise way since our system has been heavily loaded. Sometime there are more than 100K lock resouces on a node. Understanding every bit of the details will help us tune the current system. Many thanks for your time and look forward to your kind reply, Regards, Jas > The node then asks the master for the lock. > > The lock status (granted, waiting) is recorded in > the local copy. > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > ____________________________________________________________________________________ Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now. 
http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ From ccaulfie at redhat.com Tue May 13 08:41:27 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 13 May 2008 09:41:27 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <598349.86681.qm@web32204.mail.mud.yahoo.com> References: <598349.86681.qm@web32204.mail.mud.yahoo.com> Message-ID: <48295437.9080500@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>> Hi, All: >>> >>> >>> When an application on a cluster node A needs to >>> access a file on a SAN storage, how DLM process >> the >>> lock request? >>> >>> Should DLM firstly determine whether there already >>> exists a lock resource mapped to the file, by >> doing >>> the following things in the order 1) looking at >> the >>> master lock resources on the node A, 2) searching >> the >>> local copies of lock resources on the node A, 3) >>> searching the lock directory on the node A to find >> out >>> whether a master lock resource assosicated with >> the >>> file exists on another node, 4) sending messages >> to >>> other nodes in the cluster for the location of the >>> master lock resource? >>> >>> I ask this question because from some online >> articles, >>> it seems that DLM will always search the >> cluster-wide >>> lock directory across the whole cluster first to >> find >>> the location of the master lock resource. >>> >>> Can anyone kindly confirm the order of processes >> that >>> DLM does? >>> >> >> This should be very well documented, as it's common >> amongst DLM >> implementations. >> > > I think I may be blind. I have not yet found a > document which describes the sequence of processes in > a precise way. I tried to read the source code but I > gave up due to lack of comments. > > >> If a node needs to lock a resource that it doesn't >> know about then it >> hashes the name to get a directory node ID, than >> asks that node for the >> master node. if there is no master node (the >> resource is not active) >> then the requesting node is made master >> >> if the node does know the master, (other locks on >> the resource exist) >> then it will go straight to that master node. > > > Thanks for the description. > > However, one point is still not clear to me is how a > node can conclude whether it __knows__ the lock > resource or not? A node knows the resource if it has a local copy. It's as simple as that. -- Chrissie From sasmaz at itu.edu.tr Tue May 13 08:43:48 2008 From: sasmaz at itu.edu.tr (aydin sasmaz) Date: Tue, 13 May 2008 11:43:48 +0300 Subject: [Linux-cluster] High availability xen cluster In-Reply-To: <47E103BD.4030704@artegence.com> References: <4eccbcc3e1e1f2b73b7cd81b3bff73b6@mail.van-schelve.de> <47E103BD.4030704@artegence.com> Message-ID: <018401c8b4d5$7a6719b0$6f354d10$@edu.tr> Hi I would like to implement automatic failover. Is there any way to do with using cluster suite and redhat ap 5.1? regards -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Maciej Bogucki Sent: Wednesday, March 19, 2008 2:15 PM To: public at van-schelve.de; linux clustering Subject: Re: [Linux-cluster] High availability xen cluster > There are three disks in each data centre. Currently I only use the disks > in rza. > > At the moment I'm testing with a two node cluster. The virtual machines are > on SAN disks and I can live migrate from one node to the other one. But > what I have to cover is the disaster. 
What happens when the fabric in rza > crashes? My virtual maschines are unavailable. What I'm thinking about is a > hardware based > mirroring between the both fabrics and break up the mirror when the > disaster happens or we need to power off the storage for maintenance. But > my problem is that I see duplicate pv id's in this situation. I cannot > mirror based on lvm because it is too slow. Do You want do implement automatic failover or manual? You need hardware based mirroring - I'm sure that Hitachi support it(but it cost come $ for license). The second choice is DRBD[1] with fe. iSCSI or GNBD, but You have SAN which is better. If You need automatic failover, You have to set device-mapper-multipath with Active/Standby configuration where Active is Your rza and Standby Your secondary rzb[2]. In this case You have to set also synchronous replication both side. [1] - http://www.drbd.org/ [2] - http://storagefoo.blogspot.com/2006/08/linux-native-multipathing-device.html Best Regards Maciej Bogucki -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From sasmaz at itu.edu.tr Tue May 13 08:43:48 2008 From: sasmaz at itu.edu.tr (aydin sasmaz) Date: Tue, 13 May 2008 11:43:48 +0300 Subject: [Linux-cluster] High availability xen cluster In-Reply-To: <47E103BD.4030704@artegence.com> References: <4eccbcc3e1e1f2b73b7cd81b3bff73b6@mail.van-schelve.de> <47E103BD.4030704@artegence.com> Message-ID: <018401c8b4d5$7a6719b0$6f354d10$@edu.tr> Hi I would like to implement automatic failover. Is there any way to do with using cluster suite and redhat ap 5.1? regards -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Maciej Bogucki Sent: Wednesday, March 19, 2008 2:15 PM To: public at van-schelve.de; linux clustering Subject: Re: [Linux-cluster] High availability xen cluster > There are three disks in each data centre. Currently I only use the disks > in rza. > > At the moment I'm testing with a two node cluster. The virtual machines are > on SAN disks and I can live migrate from one node to the other one. But > what I have to cover is the disaster. What happens when the fabric in rza > crashes? My virtual maschines are unavailable. What I'm thinking about is a > hardware based > mirroring between the both fabrics and break up the mirror when the > disaster happens or we need to power off the storage for maintenance. But > my problem is that I see duplicate pv id's in this situation. I cannot > mirror based on lvm because it is too slow. Do You want do implement automatic failover or manual? You need hardware based mirroring - I'm sure that Hitachi support it(but it cost come $ for license). The second choice is DRBD[1] with fe. iSCSI or GNBD, but You have SAN which is better. If You need automatic failover, You have to set device-mapper-multipath with Active/Standby configuration where Active is Your rza and Standby Your secondary rzb[2]. In this case You have to set also synchronous replication both side. [1] - http://www.drbd.org/ [2] - http://storagefoo.blogspot.com/2006/08/linux-native-multipathing-device.html Best Regards Maciej Bogucki -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From jas199931 at yahoo.com Tue May 13 08:49:16 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 01:49:16 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? 
In-Reply-To: <48295437.9080500@redhat.com> Message-ID: <412133.31700.qm@web32203.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > >> Ja S wrote: > >>> Hi, All: > >>> > >>> > >>> When an application on a cluster node A needs to > >>> access a file on a SAN storage, how DLM process > >> the > >>> lock request? > >>> > >>> Should DLM firstly determine whether there > already > >>> exists a lock resource mapped to the file, by > >> doing > >>> the following things in the order 1) looking at > >> the > >>> master lock resources on the node A, 2) > searching > >> the > >>> local copies of lock resources on the node A, 3) > >>> searching the lock directory on the node A to > find > >> out > >>> whether a master lock resource assosicated with > >> the > >>> file exists on another node, 4) sending messages > >> to > >>> other nodes in the cluster for the location of > the > >>> master lock resource? > >>> > >>> I ask this question because from some online > >> articles, > >>> it seems that DLM will always search the > >> cluster-wide > >>> lock directory across the whole cluster first > to > >> find > >>> the location of the master lock resource. > >>> > >>> Can anyone kindly confirm the order of processes > >> that > >>> DLM does? > >>> > >> > >> This should be very well documented, as it's > common > >> amongst DLM > >> implementations. > >> > > > > I think I may be blind. I have not yet found a > > document which describes the sequence of processes > in > > a precise way. I tried to read the source code but > I > > gave up due to lack of comments. > > > > > >> If a node needs to lock a resource that it > doesn't > >> know about then it > >> hashes the name to get a directory node ID, than > >> asks that node for the > >> master node. if there is no master node (the > >> resource is not active) > >> then the requesting node is made master > >> > >> if the node does know the master, (other locks on > >> the resource exist) > >> then it will go straight to that master node. > > > > > > Thanks for the description. > > > > However, one point is still not clear to me is how > a > > node can conclude whether it __knows__ the lock > > resource or not? > > A node knows the resource if it has a local copy. > It's as simple as that. > If the node is a human and has a brain, it can "immediately" recall that it knows the lock resouce. However, for a computer program, it does not "know" anything until it search the target in what it has on hand. Therefore, the point here is the __search__. What should the node search and in which order, and how it searches? If I missed anything, please kindly point out so that I can clarify my question as clear as possible. Thanks again for your time and kind reply. Jas > > > -- > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ccaulfie at redhat.com Tue May 13 09:06:06 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Tue, 13 May 2008 10:06:06 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? 
In-Reply-To: <412133.31700.qm@web32203.mail.mud.yahoo.com> References: <412133.31700.qm@web32203.mail.mud.yahoo.com> Message-ID: <482959FE.7040002@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>> --- Christine Caulfield >> wrote: >>>> Ja S wrote: >>>>> Hi, All: >>>>> >>>>> >>>>> When an application on a cluster node A needs to >>>>> access a file on a SAN storage, how DLM process >>>> the >>>>> lock request? >>>>> >>>>> Should DLM firstly determine whether there >> already >>>>> exists a lock resource mapped to the file, by >>>> doing >>>>> the following things in the order 1) looking at >>>> the >>>>> master lock resources on the node A, 2) >> searching >>>> the >>>>> local copies of lock resources on the node A, 3) >>>>> searching the lock directory on the node A to >> find >>>> out >>>>> whether a master lock resource assosicated with >>>> the >>>>> file exists on another node, 4) sending messages >>>> to >>>>> other nodes in the cluster for the location of >> the >>>>> master lock resource? >>>>> >>>>> I ask this question because from some online >>>> articles, >>>>> it seems that DLM will always search the >>>> cluster-wide >>>>> lock directory across the whole cluster first >> to >>>> find >>>>> the location of the master lock resource. >>>>> >>>>> Can anyone kindly confirm the order of processes >>>> that >>>>> DLM does? >>>>> >>>> This should be very well documented, as it's >> common >>>> amongst DLM >>>> implementations. >>>> >>> I think I may be blind. I have not yet found a >>> document which describes the sequence of processes >> in >>> a precise way. I tried to read the source code but >> I >>> gave up due to lack of comments. >>> >>> >>>> If a node needs to lock a resource that it >> doesn't >>>> know about then it >>>> hashes the name to get a directory node ID, than >>>> asks that node for the >>>> master node. if there is no master node (the >>>> resource is not active) >>>> then the requesting node is made master >>>> >>>> if the node does know the master, (other locks on >>>> the resource exist) >>>> then it will go straight to that master node. >>> >>> Thanks for the description. >>> >>> However, one point is still not clear to me is how >> a >>> node can conclude whether it __knows__ the lock >>> resource or not? >> A node knows the resource if it has a local copy. >> It's as simple as that. >> > > If the node is a human and has a brain, it can > "immediately" recall that it knows the lock resouce. > However, for a computer program, it does not "know" > anything until it search the target in what it has on > hand. > > Therefore, the point here is the __search__. What > should the node search and in which order, and how it > searches? > > If I missed anything, please kindly point out so that > I can clarify my question as clear as possible. > > I think you're trying to make this more complicated than it is. As I've said several times now, a node "knows" a resource if there is a local lock on it. That's it! It's not more or less difficult than that, really it isn't! If the node doesn't have a local lock on the resource then it doesn't "know" it and has to ask the directory node where it is mastered. (As I'm sure you already know, locks are known by their lock ID numbers, so there's no "search" involved there either). There is no "search" for a lock around the cluster, that's what the directory node provides. And as I have already said, that is located by hashing the resource name to yield a node ID. 
So, if you like, the "search" you seem to be looking for is simply a hash of the resource name. But it's not really a search, and it's only invoked when the node first encounters a resource. Chrissie From jas199931 at yahoo.com Tue May 13 09:51:48 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 02:51:48 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482959FE.7040002@redhat.com> Message-ID: <394218.92537.qm@web32202.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > >> Ja S wrote: > >>> --- Christine Caulfield > >> wrote: > >>>> Ja S wrote: > >>>>> Hi, All: > >>>>> > >>>>> > >>>>> When an application on a cluster node A needs > to > >>>>> access a file on a SAN storage, how DLM > process > >>>> the > >>>>> lock request? > >>>>> > >>>>> Should DLM firstly determine whether there > >> already > >>>>> exists a lock resource mapped to the file, by > >>>> doing > >>>>> the following things in the order 1) looking > at > >>>> the > >>>>> master lock resources on the node A, 2) > >> searching > >>>> the > >>>>> local copies of lock resources on the node A, > 3) > >>>>> searching the lock directory on the node A to > >> find > >>>> out > >>>>> whether a master lock resource assosicated > with > >>>> the > >>>>> file exists on another node, 4) sending > messages > >>>> to > >>>>> other nodes in the cluster for the location of > >> the > >>>>> master lock resource? > >>>>> > >>>>> I ask this question because from some online > >>>> articles, > >>>>> it seems that DLM will always search the > >>>> cluster-wide > >>>>> lock directory across the whole cluster first > >> to > >>>> find > >>>>> the location of the master lock resource. > >>>>> > >>>>> Can anyone kindly confirm the order of > processes > >>>> that > >>>>> DLM does? > >>>>> > >>>> This should be very well documented, as it's > >> common > >>>> amongst DLM > >>>> implementations. > >>>> > >>> I think I may be blind. I have not yet found a > >>> document which describes the sequence of > processes > >> in > >>> a precise way. I tried to read the source code > but > >> I > >>> gave up due to lack of comments. > >>> > >>> > >>>> If a node needs to lock a resource that it > >> doesn't > >>>> know about then it > >>>> hashes the name to get a directory node ID, > than > >>>> asks that node for the > >>>> master node. if there is no master node (the > >>>> resource is not active) > >>>> then the requesting node is made master > >>>> > >>>> if the node does know the master, (other locks > on > >>>> the resource exist) > >>>> then it will go straight to that master node. > >>> > >>> Thanks for the description. > >>> > >>> However, one point is still not clear to me is > how > >> a > >>> node can conclude whether it __knows__ the lock > >>> resource or not? > >> A node knows the resource if it has a local copy. > >> It's as simple as that. > >> > > > > If the node is a human and has a brain, it can > > "immediately" recall that it knows the lock > resouce. > > However, for a computer program, it does not > "know" > > anything until it search the target in what it has > on > > hand. > > > > Therefore, the point here is the __search__. What > > should the node search and in which order, and how > it > > searches? > > > > If I missed anything, please kindly point out so > that > > I can clarify my question as clear as possible. > > > > > > I think you're trying to make this more complicated > than it is. 
Maybe, :-), Just want to know what exact happened. > As I've > said several times now, a node "knows" a resource if > there is a local > lock on it. That's it! It's not more or less > difficult than that, really > it isn't! At the same time, there could be 30K local locks on a node in our system. How are these local locks stored or mapped, in a hash table, or a big but sparse array? >From the source code, I guess the local locks are stored in a list. Correct me if I am wrong since I really have not yet studied the code very carefully. > If the node doesn't have a local lock on > the resource then it > doesn't "know" it and has to ask the directory node > where it is > mastered. Does it mean even if the node owns the master lock resource but it doesn't have a local lock associated with the master lock resource, it still needs to ask the directory node? > (As I'm sure you already know, locks are > known by their lock > ID numbers, so there's no "search" involved there > either). True. When a request on a file has been issued, the inode number of file (in hex) will be used to make up the name of lock resource (the second number of the name). It is true that the node has the list of lock resources (either local copy or master copy) as long as it has local locks. However, the node can just like a teacher, who has a list of students and the students are known by their names or student IDs. When the teacher want to fill up the final grade for each student, he still needs to look at the form and search for the student name and put the grade beside the name. The search can be done according to the student ID if the form is sorted by the student ID or by the student surname if the form is sorted by the surname. Either way, the teacher still needs to __search__. Same thing should be applied to the node. The node may use a smart way to search the lock resources kept in the list, possibly a hash function (but I doubt there is a very good hash function which can find the location of the target lock resource immediately). Am I still wrong? > > There is no "search" for a lock around the cluster, > that's what the > directory node provides. And as I have already said, > that is located by > hashing the resource name to yield a node ID. Yes, yes, I think I didn't say it clearly. The lock resource is located by hashing the resource name to yield a node ID. But before hashing, the node still needs to perform the search within the list or whatever data strucute that keeps the local locks on itself to find out whether the target lock resource is already in use or "known". Isn't it? I am sorry it seems that I am so stubborn. Thanks for your patient. You are a really good helper. Jas > So, if you like, the "search" you seem to be looking > for is simply a > hash of the resource name. But it's not really a > search, and it's only > invoked when the node first encounters a resource. > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From oliveiros.cristina at gmail.com Tue May 13 10:59:16 2008 From: oliveiros.cristina at gmail.com (Oliveiros Cristina) Date: Tue, 13 May 2008 03:59:16 -0700 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482959FE.7040002@redhat.com> References: <412133.31700.qm@web32203.mail.mud.yahoo.com> <482959FE.7040002@redhat.com> Message-ID: Hello, Christine and Ja S. I've been following this thread, because I need, like Ja, a detailed knowledge of the DLM inner workings. 
Your explanations were detailed and clear, Christine, but, just for the sake of having it documented, do you know where I can download a white paper or article telling this whole story? Thanks in advance Best, Oliveiros 2008/5/13 Christine Caulfield : > Ja S wrote: > > --- Christine Caulfield wrote: > > > >> Ja S wrote: > >>> --- Christine Caulfield > >> wrote: > >>>> Ja S wrote: > >>>>> Hi, All: > >>>>> > >>>>> > >>>>> When an application on a cluster node A needs to > >>>>> access a file on a SAN storage, how DLM process > >>>> the > >>>>> lock request? > >>>>> > >>>>> Should DLM firstly determine whether there > >> already > >>>>> exists a lock resource mapped to the file, by > >>>> doing > >>>>> the following things in the order 1) looking at > >>>> the > >>>>> master lock resources on the node A, 2) > >> searching > >>>> the > >>>>> local copies of lock resources on the node A, 3) > >>>>> searching the lock directory on the node A to > >> find > >>>> out > >>>>> whether a master lock resource assosicated with > >>>> the > >>>>> file exists on another node, 4) sending messages > >>>> to > >>>>> other nodes in the cluster for the location of > >> the > >>>>> master lock resource? > >>>>> > >>>>> I ask this question because from some online > >>>> articles, > >>>>> it seems that DLM will always search the > >>>> cluster-wide > >>>>> lock directory across the whole cluster first > >> to > >>>> find > >>>>> the location of the master lock resource. > >>>>> > >>>>> Can anyone kindly confirm the order of processes > >>>> that > >>>>> DLM does? > >>>>> > >>>> This should be very well documented, as it's > >> common > >>>> amongst DLM > >>>> implementations. > >>>> > >>> I think I may be blind. I have not yet found a > >>> document which describes the sequence of processes > >> in > >>> a precise way. I tried to read the source code but > >> I > >>> gave up due to lack of comments. > >>> > >>> > >>>> If a node needs to lock a resource that it > >> doesn't > >>>> know about then it > >>>> hashes the name to get a directory node ID, than > >>>> asks that node for the > >>>> master node. if there is no master node (the > >>>> resource is not active) > >>>> then the requesting node is made master > >>>> > >>>> if the node does know the master, (other locks on > >>>> the resource exist) > >>>> then it will go straight to that master node. > >>> > >>> Thanks for the description. > >>> > >>> However, one point is still not clear to me is how > >> a > >>> node can conclude whether it __knows__ the lock > >>> resource or not? > >> A node knows the resource if it has a local copy. > >> It's as simple as that. > >> > > > > If the node is a human and has a brain, it can > > "immediately" recall that it knows the lock resouce. > > However, for a computer program, it does not "know" > > anything until it search the target in what it has on > > hand. > > > > Therefore, the point here is the __search__. What > > should the node search and in which order, and how it > > searches? > > > > If I missed anything, please kindly point out so that > > I can clarify my question as clear as possible. > > > > > > I think you're trying to make this more complicated than it is. As I've > said several times now, a node "knows" a resource if there is a local > lock on it. That's it! It's not more or less difficult than that, really > it isn't! If the node doesn't have a local lock on the resource then it > doesn't "know" it and has to ask the directory node where it is > mastered. 
(As I'm sure you already know, locks are known by their lock > ID numbers, so there's no "search" involved there either). > > There is no "search" for a lock around the cluster, that's what the > directory node provides. And as I have already said, that is located by > hashing the resource name to yield a node ID. > > So, if you like, the "search" you seem to be looking for is simply a > hash of the resource name. But it's not really a search, and it's only > invoked when the node first encounters a resource. > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.wendy.cheng at gmail.com Tue May 13 19:05:33 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 13 May 2008 14:05:33 -0500 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4826D2D1.7010103@auckland.ac.nz> References: <4826D2D1.7010103@auckland.ac.nz> Message-ID: <4829E67D.2050602@gmail.com> Michael O'Sullivan wrote: > Hi everyone, > > I have set up a small experimental network with a linux cluster and > SAN that I want to have high data availability. There are 2 servers > that I have put into a cluster using conga (thank you luci and ricci). > There are 2 storage devices, each consisting of a basic server with 2 > x 1TB disks. The cluster servers and the storage devices each have 2 > NICs and are connected using 2 gigabit ethernet switches. It is a little bit hard to figure out the exact configuration based on this description (a diagram would help if you can). In general, I don't think GFS tuned well with iscsi, particularly the latency could spike if DLM traffic gets mingled with file data traffic, regardless your network bandwidth. However, I don't have enough data to support the speculation. It is also very application dependent. One key question is what kind of GFS applications you plan to dispatch in this environment ? I see you have a SAN here .. Any reason to choose iscsi over FC ? > > I have created a single striped logical volume on each storage device > using the 2 disks (to try and speed up I/O on the volume). These > volumes (one on each storage device) are presented to the cluster > servers using iSCSI (on the cluster servers) and iSCSI target (on the > storage devices). Since there are multiple NICs on the storage devices > I have set up two iSCSI portals to each logical volume. I have then > used mdadm to ensure the volumes are accessible via multipath. The iscsi target function is carried out by the storage device (firmware) or you use Linux's iscsi target ? > > Finally, since I want the storage devices to present the data in a > highly available way I have used mdadm to create a software raid-5 > across the two multipathed volumes (I realise this is essentially > mirroring on the 2 storage devices but I am trying to set this up to > be extensible to extra storage devices). My next step is to present > the raid array (of the two multipathed volumes - one on each storage > device) as a GFS to the cluster servers to ensure that locking of > access to the data is handled properly. So you're going to have CLVM built on top of software RAID ? .. This looks cumbersome. Again, a diagram could help people understand more. -- Wendy > > I have recently read that multipathing is possible within GFS, but > raid is not (yet). 
Since I want the two storage devices in a raid-5 > array and I am using iSCSI I'm not sure if I should try and use GFS to > do the multipathing. Also, being a linux/storage/clustering newbie I'm > not sure if my approach is the best thing to do. I want to make sure > that my system has no single point of failure that will make any of > the data inaccessible. I'm pretty sure our network design supports > this. I assume (if I configure it right) the cluster will ensure > services will keep going if one of the cluster servers goes down. Thus > the only weak point was the storage devices which I hope I have now > strengthened by essentially implementing network raid across iSCSI and > then presented as a single GFS. > > I would really appreciate comments/advice/constructive criticism as I > have really been learning much of this as I go. > > From rick.ochoa at gmail.com Tue May 13 20:47:55 2008 From: rick.ochoa at gmail.com (rick ochoa) Date: Tue, 13 May 2008 16:47:55 -0400 Subject: [Linux-cluster] GFS, Locking, Read-Only, and high processor loads Message-ID: Hi, I'm setting up a GFS implementation and was wondering what kind of tuning parameters I can set for both read-only and read-write. I work for a company that is migrating to a SAN, implementing GFS as the filesystem. We currently rsync our data from a master server to 5 front-end webservers running Apache and PHP. The rsyncs take an extraordinarily long time as our content (currently >2.5 million small files) grows, and does not scale very well as we add more front-end machines. Our thinking was to put content generated on two inward facing editorial machines on the SAN as read/write, and our web front- ends as read-only. All temporary files and logging would write to local disk. The goal of our initial work was to create this content filesystem, mount the disks, eliminate the rsyncs, and free up our rsync server for use as a slave database server. We used the Luci to configure a node and fencing on a new front-end, and formatted and configured our disk with it. Our deploy plan was to set this machine up, put it behind the load-balancer, and have it operate under normal load for a few days to "burn it in." Once complete, we would begin to migrate the other four front-ends over to the SAN, mounted RO after a reinstall of the OS. This procedure worked without too much issue until we hit the fourth machine in the cluster, where the cpu load went terrifyingly high and we got many "D" state httpd processes. Googling "uninterruptible sleep GFS php" I found references from 2006 about file locking with php and its use of flock() at the start of a session. The disks were remounted as "spectator" in an attempt to limit disk I/O on journals. This seemed to help, but as it was the end of the day seems a false positive. The next day, CPU load was again incredibly high, and after much flailing about we went back to local ext3 disks to buy us some time. I'm reading through this list, which is very informative. I'm attempting to tune our GFS mounts a bit, watching the output of gfs_tool counters on the filesystems, and looking for any anomalies. Here's a more detailed description of our setup: Our hardware configuration consists of a NexSAN SATABoy populated with 8 750GB disks (RAID 5/4.7Tb), and a Brocade Silkworm 3800 for data and fencing. We purchased QLogic single-port, 4Gb HBAs for our servers. 
(more info available on request) The RAID has 4 partitions, 2 are not mounted: local - (not mounted) 500GB, extents 4.0MB, block size 4KB, attributes -wi-ao, dlm lock protocol - mount /usr/local_san (rw) this is a copy of /usr/local, which can be synced to all hosts code - (not mounted) 500GB, extents 4.0MB, block size 4KB, attributes -wi-ao, dlm lock protocol - mount /web/code (rw) this is a copy of /huffpo/web/prod, without the www content and tmp trees tmp - 500GB, extents 4.0MB, block size 4KB, attributes -wi-a-, dlm lock protocol - mount /web/prod/tmp (rw) this is the temporary directory for front-end web code www - 2TB, extents 4MB, block size 4KB, attributes -wi-ao, dlm local protocol - mount /web/prod/www (ro) read-only content directory, 4 hosts, /etc/fstab options at the time were ro read/write on 1 host we have ~2 more TB available, currently not in use After reading the list a bit, I've come up with the following tunings for read-only: gfs_tool settune /web/prod/www/content glock_purge 80 gfs_tool settune /web/prod/www/content quota_account 0 gfs_tool settune /web/prod/www/content demote_secs 60 gfs_tool settune /web/prod/www/content scand_secs 30 /etc/fstab has spectator,noatime,num_glockd=32 as mount options And the read/write host has: gfs_tool settune /web/prod/www/content statfs_fast 1 /etc/fstab has num_glockd=32,noatime as mount options I've noticed using gfs_tool counters /web/prod/www/content usually has sub 80k locks for the read/write host running rsync, and sub 10k locks for the one (and only) read-only host, where previously the number of locks on all hosts numbered ~80k. Can I be a bit more aggressive with locks on read-only filesystems with the current tunings enabled? I'm not sure what the purpose of the locks on read-only filesystems serve in this instance. Is there a better configuration for heavy reads on a GFS filesystem that is read only? vmstat -d gives me for this filesystem: disk- ------------reads------------ ------------writes----------- ----- IO------ [...] sdc 411192 82490 3998862 7402555 607 645 10016 3837 0 695 My big fear is although the systems currently seem to be running without too much incident, as I add nodes back into the cluster the number of locks and system load will again run high. As we transition from using rsync to writing directly onto the SAN, the number of locks on rw hosts should go down because the spendy directory scans should be removed. Are there certain other optimizations I could use to lower the lock counts? From gordan at bobich.net Tue May 13 21:08:38 2008 From: gordan at bobich.net (Gordan Bobic) Date: Tue, 13 May 2008 22:08:38 +0100 Subject: [Linux-cluster] GFS, Locking, Read-Only, and high processor loads In-Reply-To: References: Message-ID: <482A0356.7070503@bobich.net> rick ochoa wrote: > I work for a company that is migrating to a SAN, implementing GFS as the > filesystem. We currently rsync our data from a master server to 5 > front-end webservers running Apache and PHP. The rsyncs take an > extraordinarily long time as our content (currently >2.5 million small > files) grows, and does not scale very well as we add more front-end > machines. Our thinking was to put content generated on two inward facing > editorial machines on the SAN as read/write, and our web front-ends as > read-only. All temporary files and logging would write to local disk. 
> The goal of our initial work was to create this content filesystem, > mount the disks, eliminate the rsyncs, and free up our rsync server for > use as a slave database server. You may have options that don't require a SAN. If you're happy to continue with DAS (i.e. the cost of the SAN doesn't exceed the cost of having separate disks in each machine with the number of machines you foresee using in the near future), you may do well with DRBD instead of a SAN. > We used the Luci to configure a node and fencing on a new front-end, and > formatted and configured our disk with it. Our deploy plan was to set > this machine up, put it behind the load-balancer, and have it operate > under normal load for a few days to "burn it in." Once complete, we > would begin to migrate the other four front-ends over to the SAN, > mounted RO after a reinstall of the OS. > > This procedure worked without too much issue until we hit the fourth > machine in the cluster, where the cpu load went terrifyingly high and we > got many "D" state httpd processes. Googling "uninterruptible sleep GFS > php" I found references from 2006 about file locking with php and its > use of flock() at the start of a session. The disks were remounted as > "spectator" in an attempt to limit disk I/O on journals. This seemed to > help, but as it was the end of the day seems a false positive. The next > day, CPU load was again incredibly high, and after much flailing about > we went back to local ext3 disks to buy us some time. If you have lots of I/O on lots of files in few directories, you may be out of luck. A lot of the overhead of GFS (or any similar FS) is unavoidable: the locking between the nodes has to be synchronised for every file open. Mounting with noatime,nodiratime,noquota may help a bit, but with frequent access to lots of small files you will never see performance anywhere near that of a local disk. There are, however, other options. If DAS is an option for you (and it sounds like it is), look into GlusterFS. Its performance isn't great per se (it may well be worse than GFS) if you use it the intended way, but you can use it as a file replication system. If you point your web directory directly at the file store (if you do this, you must be 100% sure that NOTHING you do to those files will involve any kind of writing, or things can get unpredictable and files can get corrupted), you'll get local disk performance with the advantage of not having to rsync the data. As long as all nodes are connected, the file changes on the master server will get sent out to the replicas. If you need to reboot a node, you'll need to ensure that it's consistent, which is done by forcing a resync: fire off a find that reads the first byte of every file on the mount point. This will force the node to check that its files are up to date against the other nodes. Note that this will cause increased load on all the other nodes while it completes, so use with care. Gordan
From kelsey.hightower at gmail.com Tue May 13 21:26:56 2008 From: kelsey.hightower at gmail.com (Kelsey Hightower) Date: Tue, 13 May 2008 17:26:56 -0400 Subject: [Linux-cluster] Complete cluster.conf Schema Description Message-ID: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> Hello, I have been searching the web for weeks. I am trying to get the complete cluster.conf schema description. I have found a link that describes most of the options but it seems to omit the resources, services, and anything related to configuring failover services.
http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html -------------- next part -------------- An HTML attachment was scrubbed... URL:
From s.wendy.cheng at gmail.com Tue May 13 23:22:05 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 13 May 2008 18:22:05 -0500 Subject: [Linux-cluster] Why GFS is so slow? What it is waiting for? In-Reply-To: <792305.60086.qm@web32203.mail.mud.yahoo.com> References: <792305.60086.qm@web32203.mail.mud.yahoo.com> Message-ID: <482A229D.2040108@gmail.com> Ja S wrote: > Hi, Wendy: > > Thanks for your so prompt and kind explanation. It is > very helpful. According to your comments, I did > another test. See below:
>
> # stat abc/
>   File: `abc/'
>   Size: 8192   Blocks: 6024   IO Block: 4096   directory
> Device: fc00h/64512d   Inode: 1065226   Links: 2
> Access: (0770/drwxrwx---)  Uid: ( 0/ root)  Gid: ( 0/ root)
> Access: 2008-05-08 06:18:58.000000000 +0000
> Modify: 2008-04-15 03:02:24.000000000 +0000
> Change: 2008-04-15 07:11:52.000000000 +0000
>
> # cd abc/
> # time ls | wc -l
> 31764
>
> real    0m44.797s
> user    0m0.189s
> sys     0m2.276s
>
> The real time in this test is much shorter than the > previous one. However, it is still reasonable long. As > you said, the 'ls' command only reads the single > directory file. In my case, the directory file itself > is only 8192 bytes. The time spent on disk IO should > be included in 'sys 0m2.276s'. Although DLM needs time > to lookup the location of the corresponding master > lock resource and to process locking, the system > should not take about 42 seconds to complete the 'ls' > command. So, what is the hidden issue or is there a > way to identify possible bottlenecks? > > IIRC, disk IO wait time is excluded from "sys", so you really can't conclude the lion share of your wall (real) time is due to DLM locking. We don't know for sure unless you can provide the relevant profiling data (try to learn how to use OProfile and/or SystemTap to see where exactly your system is waiting at). Latency issues like this is tricky. It would be foolish to conclude anything just by reading the command output without knowing the surrounding configuration and/or run time environment. If small file read latency is important to you, did you turn off storage device's readahead ? Did you try different Linux kernel elevator algorithms ? Did you make sure your other network traffic didn't block DLM traffic ? Be aware latency and bandwidth are two different things. A big and fat network link doesn't automatically imply a quick response time though it may carry more bandwidth. -- Wendy
From jas199931 at yahoo.com Wed May 14 00:56:44 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 17:56:44 -0700 (PDT) Subject: [Linux-cluster] Locks reported by gfs_tool lockdump does not match that presented in dlm_locks. Any reason?? Message-ID: <389733.26852.qm@web32203.mail.mud.yahoo.com> Hi, All: For a given lock space, at the same time, I saved a copy of the output of 'gfs_tool lockdump' as 'gfs_locks' and a copy of dlm_locks. Then I checked the locks presents in the two saved files. I realized that the number of locks in gfs_locks is not the same as the locks presented in dlm_locks.
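For reference, the way such a comparison can be captured is roughly as follows. This is only a sketch: the mount point, the lockspace name and the RHEL4-style /proc/cluster/dlm_locks interface are assumptions that need adjusting for your own cluster.

gfs_tool lockdump /mnt/gfs > gfs_locks      # dump the GFS glocks for the mount
echo "mygfs" > /proc/cluster/dlm_locks      # select the DLM lockspace (cluster 1 style interface, assumed)
cat /proc/cluster/dlm_locks > dlm_locks     # snapshot the DLM lock list for that lockspace
grep -c '^Glock ' gfs_locks                 # count glock entries
grep -c ' Master: ' dlm_locks               # entries whose master is on a remote node
grep -c ' Remote: ' dlm_locks               # entries held on behalf of remote nodes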
For instance,

From dlm_locks:
9980 NL locks, where
--7984 locks are from remote nodes
--0 locks are on remote nodes
--1996 locks are processed on its own master lock resources
0 CR locks, where
--0 locks are from remote nodes
--0 locks are on remote nodes
--0 locks are processed on its own master lock resources
0 CW locks, where
--0 locks are from remote nodes
--0 locks are on remote nodes
--0 locks are processed on its own master lock resources
1173 PR locks, where
--684 locks are from remote nodes
--32 locks are on remote nodes
--457 locks are processed on its own master lock resources
0 PW locks, where
--0 locks are from remote nodes
--0 locks are on remote nodes
--0 locks are processed on its own master lock resources
47 EX locks, where
--46 locks are from remote nodes
--0 locks are on remote nodes
--1 locks are processed on its own master lock resources

In summary, 11200 locks in total, where
-- 8714 locks are from remote nodes (entries with ' Remote: ')
-- 32 locks are on remote nodes (entries with ' Master: ')
-- 2454 locks are processed on its own master lock resources (entries with only lock ID and lock mode)

These locks are all in the granted queue. There is nothing under the conversion and waiting queues.
======================================

From gfs_locks, there are 2932 locks in total (grep '^Glock ' and count the entries). Then for each Glock I got the second number, which is the ID of a lock resource, and searched the ID in dlm_locks. I then split the searched results into two groups as shown below:
--46 locks are associated with local copies of master lock resources on remote nodes
--2886 locks are associated with master lock resources on the node itself
======================================

Now, I tried to find the relationship between the five numbers from two sources but ended up nowhere.
Dlm_locks:
-- 8714 locks are from remote nodes
-- 32 locks are on remote nodes
-- 2454 locks are processed on its own master lock resources
Gfs_locks:
--46 locks are associated with local copies of master lock resources on remote nodes
--2886 locks are associated with master lock resources on the node itself

Can anyone kindly point out the relationships between the number of locks presented in dlm_locks and gfs_locks? Thanks for your time on reading this long question and look forward to your help. Jas
From s.wendy.cheng at gmail.com Wed May 14 01:43:24 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Tue, 13 May 2008 20:43:24 -0500 Subject: [Linux-cluster] Locks reported by gfs_tool lockdump does not match that presented in dlm_locks. Any reason?? In-Reply-To: <389733.26852.qm@web32203.mail.mud.yahoo.com> References: <389733.26852.qm@web32203.mail.mud.yahoo.com> Message-ID: <482A43BC.8040807@gmail.com> Ja S wrote: > Hi, All: > > For a given lock space, at the same time, I saved a > copy of the output of 'gfs_tool lockdump' as > 'gfs_locks' and a copy of dlm_locks. > > Then I checked the locks presents in the two saved > files. I realized that the number of locks in > gfs_locks is not the same as the locks presented in > dlm_locks.
> > For instance, > >From dlm_locks: > 9980 NL locks, where > --7984 locks are from remote nodes > --0 locks are on remote nodes > --1996 locks are processed on its own master lock > resources > 0 CR locks, where > --0 locks are from remote nodes > --0 locks are on remote nodes > --0 locks are processed on its own master lock > resources > 0 CW locks, where > --0 locks are from remote nodes > --0 locks are on remote nodes > --0 locks are processed on its own master lock > resources > 1173 PR locks, where > --684 locks are from remote nodes > --32 locks are on remote nodes > --457 locks are processed on its own master lock > resources > 0 PW locks, where > --0 locks are from remote nodes > --0 locks are on remote nodes > --0 locks are processed on its own master lock > resources > 47 EX locks, where > --46 locks are from remote nodes > --0 locks are on remote nodes > --1 locks are processed on its own master lock > resources > > In summary, > 11200 locks in total, where > -- 8714 locks are from remote nodes (entries with ? > Remote: ?) > -- 32 locks are on remote nodes (entries with ? > Master: ?) > -- 2454 locks are processed on its own master lock > resources (entries with only lock ID and lock mode) > > These locks are all in the granted queue. There is > nothing under the conversion and waiting queues. > ====================================== > > >From gfs_locks, there are 2932 locks in total, ( grep > ?^Glock ? and count the entries). Then for each Glock > I got the second number which is the ID of a lock > resource, and searched the ID in dlm_locks. I then > split the searched results into two groups as shown > below: > --46 locks are associated with local copies of master > lock resources on remote nodes > --2886 locks are associated with master lock resources > on the node itself > > > ====================================== > Now, I tried to find the relationship between the five > numbers from two sources but ended up nowhere. > Dlm_locks: > -- 8714 locks are from remote nodes > -- 32 locks are on remote nodes > -- 2454 locks are processed on its own master lock > resources > Gfs_locks: > --46 locks are associated with local copies of master > lock resources on remote nodes > --2886 locks are associated with master lock resources > on the node itself > > Can anyone kindly point out the relationships between > the number of locks presented in dlm_locks and > gfs_locks? > > > Thanks for your time on reading this long question and > look forward to your help. > > I doubt this will help anything from practical point of view.. understanding how to run Oprofile and/or SystemTap will probably help you more on the long run. However, if you want to know .. the following are why they are different: GFS locking is controlled by a subsysgtem called "glock". Glock is designed to run and interact with *different* distributed lock managers; e.g. in RHEL 3, other than DLM, it also works with another lock manager called "GULM". Only active locks has an one-to-one correspondence with the lock entities inside lock manager. If a glock is in UNLOCK state, lock manager may or may not have the subject lock in its record - they are subject to get purged depending on memory and/or resource pressure. The other way around is also true. A lock may exist in lock manager's database but it could have been removed from glock subsystem. Glock itself doesn't know about cluster configuration so it relies on external lock manager to do inter-node communication. 
On the other hand, it carries some other functions such as data flushing to disk when glock is demoted from exclusive (write) to shared (read). -- Wendy From jas199931 at yahoo.com Wed May 14 04:32:04 2008 From: jas199931 at yahoo.com (Ja S) Date: Tue, 13 May 2008 21:32:04 -0700 (PDT) Subject: [Linux-cluster] Locks reported by gfs_tool lockdump does not match that presented in dlm_locks. Any reason?? In-Reply-To: <482A43BC.8040807@gmail.com> Message-ID: <131844.26918.qm@web32208.mail.mud.yahoo.com> --- Wendy Cheng wrote: > Ja S wrote: > > Hi, All: > > > > For a given lock space, at the same time, I saved > a > > copy of the output of ?gfs_tool lockdump?as > > ?gfs_locks?and a copy of dlm_locks. > > > > Then I checked the locks presents in the two saved > > files. I realized that the number of locks in > > gfs_locks is not the same as the locks presented > in > > dlm_locks. > > > > For instance, > > >From dlm_locks: > > 9980 NL locks, where > > --7984 locks are from remote nodes > > --0 locks are on remote nodes > > --1996 locks are processed on its own master lock > > resources > > 0 CR locks, where > > --0 locks are from remote nodes > > --0 locks are on remote nodes > > --0 locks are processed on its own master lock > > resources > > 0 CW locks, where > > --0 locks are from remote nodes > > --0 locks are on remote nodes > > --0 locks are processed on its own master lock > > resources > > 1173 PR locks, where > > --684 locks are from remote nodes > > --32 locks are on remote nodes > > --457 locks are processed on its own master lock > > resources > > 0 PW locks, where > > --0 locks are from remote nodes > > --0 locks are on remote nodes > > --0 locks are processed on its own master lock > > resources > > 47 EX locks, where > > --46 locks are from remote nodes > > --0 locks are on remote nodes > > --1 locks are processed on its own master lock > > resources > > > > In summary, > > 11200 locks in total, where > > -- 8714 locks are from remote nodes (entries with > ?> > Remote: ? > > -- 32 locks are on remote nodes (entries with ?> > Master: ? > > -- 2454 locks are processed on its own master lock > > resources (entries with only lock ID and lock > mode) > > > > These locks are all in the granted queue. There is > > nothing under the conversion and waiting queues. > > ====================================== > > > > >From gfs_locks, there are 2932 locks in total, ( > grep > > ?^Glock ?and count the entries). Then for each > Glock > > I got the second number which is the ID of a lock > > resource, and searched the ID in dlm_locks. I then > > split the searched results into two groups as > shown > > below: > > --46 locks are associated with local copies of > master > > lock resources on remote nodes > > --2886 locks are associated with master lock > resources > > on the node itself > > > > > > ====================================== > > Now, I tried to find the relationship between the > five > > numbers from two sources but ended up nowhere. > > Dlm_locks: > > -- 8714 locks are from remote nodes > > -- 32 locks are on remote nodes > > -- 2454 locks are processed on its own master lock > > resources > > Gfs_locks: > > --46 locks are associated with local copies of > master > > lock resources on remote nodes > > --2886 locks are associated with master lock > resources > > on the node itself > > > > Can anyone kindly point out the relationships > between > > the number of locks presented in dlm_locks and > > gfs_locks? 
> > > > > > Thanks for your time on reading this long question > and > > look forward to your help. > > > > > I doubt this will help anything from practical point > of view.. > understanding how to run Oprofile and/or SystemTap > will probably help > you more on the long run. However, if you want to > know .. the following > are why they are different: > > GFS locking is controlled by a subsysgtem called > "glock". Glock is > designed to run and interact with *different* > distributed lock managers; > e.g. in RHEL 3, other than DLM, it also works with > another lock manager > called "GULM". Only active locks has an one-to-one > correspondence with > the lock entities inside lock manager. If a glock is > in UNLOCK state, > lock manager may or may not have the subject lock in > its record - they > are subject to get purged depending on memory and/or > resource pressure. > The other way around is also true. A lock may exist > in lock manager's > database but it could have been removed from glock > subsystem. Glock > itself doesn't know about cluster configuration so > it relies on external > lock manager to do inter-node communication. On the > other hand, it > carries some other functions such as data flushing > to disk when glock is > demoted from exclusive (write) to shared (read). Thanks for the explanation. It is very helpful. Jas > -- Wendy > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ccaulfie at redhat.com Wed May 14 07:23:18 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 14 May 2008 08:23:18 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <394218.92537.qm@web32202.mail.mud.yahoo.com> References: <394218.92537.qm@web32202.mail.mud.yahoo.com> Message-ID: <482A9366.1080500@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>> --- Christine Caulfield >> wrote: >>>> Ja S wrote: >>>>> --- Christine Caulfield >>>> wrote: >>>>>> Ja S wrote: >>>>>>> Hi, All: >>>>>>> >>>>>>> >>>>>>> When an application on a cluster node A needs >> to >>>>>>> access a file on a SAN storage, how DLM >> process >>>>>> the >>>>>>> lock request? >>>>>>> >>>>>>> Should DLM firstly determine whether there >>>> already >>>>>>> exists a lock resource mapped to the file, by >>>>>> doing >>>>>>> the following things in the order 1) looking >> at >>>>>> the >>>>>>> master lock resources on the node A, 2) >>>> searching >>>>>> the >>>>>>> local copies of lock resources on the node A, >> 3) >>>>>>> searching the lock directory on the node A to >>>> find >>>>>> out >>>>>>> whether a master lock resource assosicated >> with >>>>>> the >>>>>>> file exists on another node, 4) sending >> messages >>>>>> to >>>>>>> other nodes in the cluster for the location of >>>> the >>>>>>> master lock resource? >>>>>>> >>>>>>> I ask this question because from some online >>>>>> articles, >>>>>>> it seems that DLM will always search the >>>>>> cluster-wide >>>>>>> lock directory across the whole cluster first >>>> to >>>>>> find >>>>>>> the location of the master lock resource. >>>>>>> >>>>>>> Can anyone kindly confirm the order of >> processes >>>>>> that >>>>>>> DLM does? >>>>>>> >>>>>> This should be very well documented, as it's >>>> common >>>>>> amongst DLM >>>>>> implementations. >>>>>> >>>>> I think I may be blind. I have not yet found a >>>>> document which describes the sequence of >> processes >>>> in >>>>> a precise way. 
I tried to read the source code >> but >>>> I >>>>> gave up due to lack of comments. >>>>> >>>>> >>>>>> If a node needs to lock a resource that it >>>> doesn't >>>>>> know about then it >>>>>> hashes the name to get a directory node ID, >> than >>>>>> asks that node for the >>>>>> master node. if there is no master node (the >>>>>> resource is not active) >>>>>> then the requesting node is made master >>>>>> >>>>>> if the node does know the master, (other locks >> on >>>>>> the resource exist) >>>>>> then it will go straight to that master node. >>>>> Thanks for the description. >>>>> >>>>> However, one point is still not clear to me is >> how >>>> a >>>>> node can conclude whether it __knows__ the lock >>>>> resource or not? >>>> A node knows the resource if it has a local copy. >>>> It's as simple as that. >>>> >>> If the node is a human and has a brain, it can >>> "immediately" recall that it knows the lock >> resouce. >>> However, for a computer program, it does not >> "know" >>> anything until it search the target in what it has >> on >>> hand. >>> >>> Therefore, the point here is the __search__. What >>> should the node search and in which order, and how >> it >>> searches? >>> >>> If I missed anything, please kindly point out so >> that >>> I can clarify my question as clear as possible. >>> >>> >> I think you're trying to make this more complicated >> than it is. > > > > Maybe, :-), Just want to know what exact happened. > > > >> As I've >> said several times now, a node "knows" a resource if >> there is a local >> lock on it. That's it! It's not more or less >> difficult than that, really >> it isn't! > > At the same time, there could be 30K local locks on a > node in our system. How are these local locks stored > or mapped, in a hash table, or a big but sparse array? >>From the source code, I guess the local locks are > stored in a list. Correct me if I am wrong since I > really have not yet studied the code very carefully. > > >> If the node doesn't have a local lock on >> the resource then it >> doesn't "know" it and has to ask the directory node >> where it is >> mastered. > > Does it mean even if the node owns the master lock > resource but it doesn't have a local lock associated > with the master lock resource, it still needs to ask > the directory node? > > > >> (As I'm sure you already know, locks are >> known by their lock >> ID numbers, so there's no "search" involved there >> either). > > True. When a request on a file has been issued, the > inode number of file (in hex) will be used to make up > the name of lock resource (the second number of the > name). > > It is true that the node has the list of lock > resources (either local copy or master copy) as long > as it has local locks. However, the node can just like > a teacher, who has a list of students and the students > are known by their names or student IDs. When the > teacher want to fill up the final grade for each > student, he still needs to look at the form and search > for the student name and put the grade beside the > name. The search can be done according to the student > ID if the form is sorted by the student ID or by the > student surname if the form is sorted by the surname. > Either way, the teacher still needs to __search__. > Same thing should be applied to the node. The node may > use a smart way to search the lock resources kept in > the list, possibly a hash function (but I doubt there > is a very good hash function which can find the > location of the target lock resource immediately). 
> > Am I still wrong? > >> There is no "search" for a lock around the cluster, >> that's what the >> directory node provides. And as I have already said, >> that is located by >> hashing the resource name to yield a node ID. > > Yes, yes, I think I didn't say it clearly. The lock > resource is located by hashing the resource name to > yield a node ID. But before hashing, the node still > needs to perform the search within the list or > whatever data strucute that keeps the local locks on > itself to find out whether the target lock resource is > already in use or "known". Isn't it? I am sorry it > seems that I am so stubborn. > > Thanks for your patient. You are a really good helper. > > Jas > >> So, if you like, the "search" you seem to be looking >> for is simply a >> hash of the resource name. But it's not really a >> search, and it's only >> invoked when the node first encounters a resource. >> > hash tables, hash tables, hash tables ;-) -- Chrissie From jas199931 at yahoo.com Wed May 14 08:51:08 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 01:51:08 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482A9366.1080500@redhat.com> Message-ID: <560093.12792.qm@web32208.mail.mud.yahoo.com> > >> If the node doesn't have a local lock on > >> the resource then it > >> doesn't "know" it and has to ask the directory > >> node where it is mastered. > > Does it mean even if the node owns the master lock > > resource but it doesn't have a local lock > > associated with the master lock resource, it > > still needs to ask the directory node? > hash tables, hash tables, hash tables ;-) Sure. Now I see what do you mean "knows". Thanks. Could you please kindly answer my last question above? Best, Jas From jakub.suchy at enlogit.cz Wed May 14 08:59:41 2008 From: jakub.suchy at enlogit.cz (Jakub Suchy) Date: Wed, 14 May 2008 10:59:41 +0200 Subject: [Linux-cluster] SeznamFS Message-ID: <20080514085941.GA22634@localhost> Hi, seznam.cz (Czech competitor of Google) has announced it's "SeznamFS" - http://seznamfs.sourceforge.net/ -- cut -- SeznamFS is distributed binlogging filesystem based on FUSE. It works similar to MySQL, it creates a binary log containing all write operations and provides it to slaves as master. Every server has its own server ID and therefore it's possible to use master-master replication (with the same limitations as MySQL master-master replication has) or multimaster round replication. For more information have look at documentation. -- cut -- Seems interesting... Jakub Suchy -- Jakub Such? GSM: +420 - 777 817 949 Enlogit s.r.o, U Cukrovaru 509/4, 400 07 ?st? nad Labem tel.: +420 - 474 745 159, fax: +420 - 474 745 160 e-mail: info at enlogit.cz, web: http://www.enlogit.cz Energy & Logic in IT From ccaulfie at redhat.com Wed May 14 09:04:13 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 14 May 2008 10:04:13 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <560093.12792.qm@web32208.mail.mud.yahoo.com> References: <560093.12792.qm@web32208.mail.mud.yahoo.com> Message-ID: <482AAB0D.2020900@redhat.com> Ja S wrote: >>>> If the node doesn't have a local lock on >>>> the resource then it >>>> doesn't "know" it and has to ask the directory >>>> node where it is mastered. > >>> Does it mean even if the node owns the master lock >>> resource but it doesn't have a local lock >>> associated with the master lock resource, it >>> still needs to ask the directory node? 
> > >> hash tables, hash tables, hash tables ;-) > > Sure. Now I see what do you mean "knows". Thanks. > > Could you please kindly answer my last question above? The answer is "No" ... because it's in the resource hash table. ... see, I told you it was all hash tables ... Chrissie From gordan at bobich.net Wed May 14 09:05:27 2008 From: gordan at bobich.net (gordan at bobich.net) Date: Wed, 14 May 2008 10:05:27 +0100 (BST) Subject: [Linux-cluster] SeznamFS In-Reply-To: <20080514085941.GA22634@localhost> References: <20080514085941.GA22634@localhost> Message-ID: Sounds very much like MySQL FS. Is this an update to that project, a reinvention of that wheel, or something entirely different? Gordan On Wed, 14 May 2008, Jakub Suchy wrote: > Hi, > seznam.cz (Czech competitor of Google) has announced it's "SeznamFS" - > http://seznamfs.sourceforge.net/ > > -- cut -- > SeznamFS is distributed binlogging filesystem based on FUSE. It works > similar to MySQL, it creates a binary log containing all write > operations and provides it to slaves as master. Every server has its own > server ID and therefore it's possible to use master-master replication > (with the same limitations as MySQL master-master replication has) or > multimaster round replication. For more information have look at > documentation. > -- cut -- > > Seems interesting... > > Jakub Suchy > > -- > Jakub Such? > GSM: +420 - 777 817 949 > > Enlogit s.r.o, U Cukrovaru 509/4, 400 07 ?st? nad Labem > tel.: +420 - 474 745 159, fax: +420 - 474 745 160 > e-mail: info at enlogit.cz, web: http://www.enlogit.cz > > Energy & Logic in IT > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jas199931 at yahoo.com Wed May 14 09:31:15 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 02:31:15 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482AAB0D.2020900@redhat.com> Message-ID: <951183.12102.qm@web32203.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > >>>> If the node doesn't have a local lock on > >>>> the resource then it > >>>> doesn't "know" it and has to ask the directory > >>>> node where it is mastered. > > > >>> Does it mean even if the node owns the master > lock > >>> resource but it doesn't have a local lock > >>> associated with the master lock resource, it > >>> still needs to ask the directory node? > > > > > >> hash tables, hash tables, hash tables ;-) > > > > Sure. Now I see what do you mean "knows". Thanks. > > > > Could you please kindly answer my last question > above? > > The answer is "No" ... because it's in the resource > hash table. > > ... see, I told you it was all hash tables ... > OK. Let's summarise what I have learned from you. If I am wrong, correct me please. A node has a hash table (HT1) which hold the master lock resources and local copies of master lock resources on remote nodes. It also has another hash table (HT2) which holds the content of the lock directory. When an application on a node A requests a lock on a file, DLM feeds the inode number of the file into a hash function and uses the returned hash value to check whether there is a corresponding lock resource record in the hash table HT1. If the record exists, DLM then processes the lock request on the lock resources (either master or local copy). 
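Put as a toy shell sketch, the check looks roughly like the lines below, with the miss path continuing after it. This is purely illustrative, not the kernel code: HT1 and HT2 are just the names used in this thread, and cksum stands in for whatever hash function the real DLM uses.

#!/bin/bash
# Toy model of the lookup order discussed in this thread -- NOT the real DLM code.
declare -A HT1      # local resource table: resource name -> master|copy
declare -A HT2      # this node's slice of the lock directory: resource name -> master node
NUM_NODES=3

request_lock() {
    local name="$1"
    if [[ -n "${HT1[$name]}" ]]; then
        echo "$name already known locally as ${HT1[$name]} -> handle the lock here"
        return
    fi
    # Not known locally: hash the resource name to pick the directory node,
    # then ask that node who the master is (or become master if nobody is yet).
    local dirnode=$(( $(echo -n "$name" | cksum | cut -d' ' -f1) % NUM_NODES + 1 ))
    echo "$name unknown here -> ask directory node $dirnode for the master"
}

HT1["5 104238"]="master"    # a made-up glock-style resource name
request_lock "5 104238"     # fast path: found in HT1
request_lock "5 999999"     # miss: goes through the directory node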
If not, DLM feeds the inode number into another hash function to obtain a node ID (for example node B) which holds the master node information of the target lock resource. DLM then talks with node B and gets the master node ID (for example node C) from the hash table HT2 on node B. Finally, DLM gets the target lock resource from the hash table HT1 on the node C and processes the lock request. Am I right this time, or still missing something (a third hash table?) ? Best, Jas > > Chrissie > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From fdinitto at redhat.com Wed May 14 09:31:21 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Wed, 14 May 2008 11:31:21 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.99.01 (development snapshot) released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 2st release from the master branch: 2.99.01. GFS1 is *known to be broken* in this release, do _NOT_ use! The 2.99.XX releases are _NOT_ meant to be used for production environments.. yet. You have been warned: *this code will have no mercy* for your servers and your data. The master branch is the main development tree that receives all new features, code, clean up and a whole brand new set of bugs, At some point in time this code will become the 3.0 stable release. Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 2.99 releases and more important report problems. In order to build the 2.99.01 release you will need: - - openais 0.83 or higher - - linux kernel (git snapshot or 2.6.26-rc2) from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git (but can run on 2.6.25 in compatibility mode) NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We are still shipping shared libraries but remember that they can change anytime without warning. A bunch of new shared libraries have been added and more will come. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.01.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.99.00): Christine Caulfield (4): [CMAN] Remove external dependancies from config modules [CMAN] Fix localhost checking that I broke last week. 
[CMAN] make qdisk compile on i386 [CMAN] fix cman_tool join -X David Teigland (18): fence: fence_tool list and fenced_domain_nodes() fence_tool: fix list command libdlm: use linux/dlm.h from 2.6.26-rc libdlmcontrol: new lib interface to dlm_controld dlm_controld: fix build problems in previous commit libdlmcontrol: filling out code dlm_controld: filling out code dlm_controld: code for info/debug queries dlm_tool: add libdlmcontrol query commands daemons: mostly daemonization stuff daemons: queries dlm_controld: fix waiting for removed node dlm_controld: options to disable fencing/quorum dependency dlm_controld: dlm_tool query fixes dlm_tool: refine list output dlm_controld: remove unworking re-merge detection dlm_controld/gfs_controld: ignore write(2) return value on plock dev dlm_controld: use started_count to detect remerges Fabio M. Di Nitto (19): [BUILD] Change build system to cope with new libdlmcontrol libdlm: fix libdlmcontrol in Makefile [CMAN] Do not query ccs as it might not be the right config plugin [CCS] Detach dependency on ccsd to run the cluster [CCS] Fix build with gcc-4.3 [CMAN] Set default syslog facility at build time [BUILD] Allow users to set path to init.d [MISC] Fix build errors with Fedora default build options [MISC] Fix more build errors with Fedora default build options [MISC] Fix even more build errors with Fedora default build options [BUILD] Fix install when building from a separate tree [MISC] Fix some gfs2 build warnings [BUILD] Require 2.6.26 kernel to build [GNBD] Update gnbd to work with 2.6.26 [GFS] Make gfs build with 2.6.26 (DO NOT USE!) [RGMANAGER] ^M's are good for DOS, bad for UNIX [BUILD] Move fencelib in /usr/share [MISC] Cast some love to init scripts [CMAN] Fix path to cman_tool Lon Hohberger (2): [cman] Close sockets in error state in gfs_controld / dlmtest2 / groupd test [rgmanager] Fix #441582 - symlinks in mount points causing failures Marc - A. 
Dahlhaus (1): [MISC] Add version string to -V options of dlm_tool and group deamons Marek 'marx' Grac (2): [FENCE] SSH support using stdin options [FENCE] Fix #435154: Support for 24 port APC fencing device ccs/Makefile | 2 +- ccs/ccs_test/Makefile | 2 +- ccs/ccs_test/ccs_test.c | 73 +- ccs/ccs_tool/Makefile | 11 +- ccs/ccs_tool/update.c | 2 +- ccs/ccsais/Makefile | 13 +- ccs/ccsais/config.c | 19 +- ccs/daemon/Makefile | 7 +- ccs/daemon/ccsd.c | 3 +- ccs/lib/Makefile | 40 - ccs/lib/ccs.h | 25 - ccs/lib/libccs.c | 764 -------------- ccs/libccscompat/Makefile | 37 + ccs/libccscompat/libccscompat.c | 764 ++++++++++++++ ccs/libccscompat/libccscompat.h | 29 + ccs/libccsconfdb/Makefile | 56 + ccs/libccsconfdb/ccs.h | 27 + ccs/libccsconfdb/libccs.c | 576 +++++++++++ cman/cman_tool/Makefile | 4 +- cman/cman_tool/cman_tool.h | 4 +- cman/cman_tool/join.c | 21 +- cman/cman_tool/main.c | 18 +- cman/daemon/Makefile | 3 +- cman/daemon/ais.c | 6 +- cman/daemon/cman-preconfig.c | 282 ++++-- cman/daemon/cman.h | 2 +- cman/daemon/nodelist.h | 1 - cman/init.d/cman.in | 25 +- cman/init.d/qdiskd | 19 +- cman/qdisk/daemon_init.c | 13 +- cman/qdisk/main.c | 4 +- cman/qdisk/scandisk.c | 20 +- configure | 43 +- dlm/Makefile | 2 +- dlm/lib/51-dlm.rules | 4 - dlm/lib/Makefile | 84 -- dlm/lib/libaislock.c | 468 --------- dlm/lib/libaislock.h | 190 ---- dlm/lib/libdlm.c | 1541 ---------------------------- dlm/lib/libdlm.h | 296 ------ dlm/lib/libdlm_internal.h | 9 - dlm/libdlm/51-dlm.rules | 4 + dlm/libdlm/Makefile | 84 ++ dlm/libdlm/libaislock.c | 468 +++++++++ dlm/libdlm/libaislock.h | 190 ++++ dlm/libdlm/libdlm.c | 1540 ++++++++++++++++++++++++++++ dlm/libdlm/libdlm.h | 296 ++++++ dlm/libdlm/libdlm_internal.h | 9 + dlm/libdlmcontrol/Makefile | 53 + dlm/libdlmcontrol/libdlmcontrol.h | 108 ++ dlm/libdlmcontrol/main.c | 416 ++++++++ dlm/tests/usertest/Makefile | 2 +- dlm/tests/usertest/dlmtest2.c | 2 +- dlm/tool/Makefile | 11 +- dlm/tool/main.c | 362 ++++++-- fence/agents/apc/fence_apc.py | 34 +- fence/agents/lib/fencing.py.py | 4 +- fence/agents/rackswitch/do_rack.c | 20 +- fence/agents/scsi/scsi_reserve | 26 +- fence/agents/xvm/fence_xvm.c | 4 +- fence/agents/xvm/fence_xvmd.c | 6 +- fence/agents/xvm/xml.c | 2 +- fence/fence_tool/fence_tool.c | 95 ++- fence/fenced/cpg.c | 45 +- fence/fenced/fd.h | 14 +- fence/fenced/fenced.h | 6 +- fence/fenced/group.c | 8 +- fence/fenced/main.c | 103 ++- fence/libfence/agent.c | 9 +- fence/libfenced/libfenced.h | 7 +- fence/libfenced/main.c | 46 +- gfs-kernel/src/gfs/ops_address.c | 2 +- gfs-kernel/src/gfs/ops_super.c | 6 +- gfs-kernel/src/gfs/quota.c | 4 +- gfs2/init.d/gfs2 | 13 +- gfs2/libgfs2/gfs2_log.c | 5 +- gfs2/mkfs/main_mkfs.c | 3 +- gfs2/mount/mtab.c | 5 +- gfs2/tool/sb.c | 3 +- gnbd-kernel/src/gnbd.c | 91 +- gnbd-kernel/src/gnbd.h | 4 +- group/daemon/cman.c | 4 +- group/daemon/cpg.c | 2 +- group/daemon/main.c | 18 +- group/dlm_controld/Makefile | 10 +- group/dlm_controld/action.c | 2 +- group/dlm_controld/config.c | 10 + group/dlm_controld/config.h | 6 + group/dlm_controld/cpg.c | 390 ++++++-- group/dlm_controld/deadlock.c | 10 +- group/dlm_controld/dlm_controld.h | 35 +- group/dlm_controld/dlm_daemon.h | 43 +- group/dlm_controld/group.c | 28 +- group/dlm_controld/main.c | 497 ++++++++-- group/dlm_controld/plock.c | 45 +- group/gfs_controld/lock_dlm.h | 1 - group/gfs_controld/main.c | 41 +- group/gfs_controld/plock.c | 6 +- group/test/clientd.c | 2 +- group/tool/main.c | 14 +- make/defines.mk.input | 3 + make/install.mk | 8 +- make/uninstall.mk | 2 +- 
rgmanager/init.d/rgmanager.in | 15 +- rgmanager/src/clulib/cman.c | 6 +- rgmanager/src/clulib/daemon_init.c | 14 +- rgmanager/src/clulib/msg_cluster.c | 26 +- rgmanager/src/clulib/msgtest.c | 3 +- rgmanager/src/daemons/clurmtabd_lib.c | 2 +- rgmanager/src/daemons/main.c | 3 +- rgmanager/src/resources/ASEHAagent.sh | 1786 ++++++++++++++++---------------- rgmanager/src/resources/clusterfs.sh | 2 +- rgmanager/src/resources/fs.sh | 2 +- rgmanager/src/resources/netfs.sh | 2 +- 114 files changed, 7567 insertions(+), 5090 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSCqxcwgUGcMLQ3qJAQIjKg/5Ae9UJ+cpRoc2szSFhlcLyvoo5plIumjn lQN1v3+yBPO8ZKw75flqGHkMbVF2fv8UMyHSyoaKiNOxRtwomwguM82nd67kbP2a k7C4alAa2HzF4qkbtxCoML4TQfY7ZrzYbnY3CPyXSyzCw/GZnIn/JzoglgkKu+Xn t2DExSo42YDMbE53oQn32iqDZnGbJUEbsB8XD3fH5l/whoGW4cbBAeKgITLuNXDl c+EwxQt2aU3XyRlAeCv3MqKgRlqzB43OBWBx4qcw1VqRR/OYyO90/5XMoroqyA4m IdRAFf9Ex7TdrFnopEt+zjfcCvPW3/nk969cbzWVGs31AqTIlbHKT9F/tf8sl6xm Tm5nD5N+J64Zb7IDKCGOrRarSIydP9bXNDmkYZ4Ak1LAN2eB3w60uR9OLH66ADiS EaF5hbb5bXuaDIrVBYeLtkja1VgorA1RRcZ6QEKBlrvbaBrbJIPmgpwnD6WwMt5H 03EJ2JK8g5vEOL9z5+ylalR/EJw1DrKwyClsabvLQoIdwnP2urush3rWIaCdj3K9 qeVIBEFz/J6PQCPXbNlzth5pgEs58Hhw+F1i8Z/JJUCEUUDIUaqz6FHE7s5U6c6A wlR2VJi50e9GJ0oULnZr/ehwlS4u/WknG4GpVvUtWmztOsFZQQaQc49ohSOTQ9Bf 7XzxkkVLxnI= =RAk5 -----END PGP SIGNATURE----- From ccaulfie at redhat.com Wed May 14 09:39:40 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Wed, 14 May 2008 10:39:40 +0100 Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <951183.12102.qm@web32203.mail.mud.yahoo.com> References: <951183.12102.qm@web32203.mail.mud.yahoo.com> Message-ID: <482AB35C.7090202@redhat.com> Ja S wrote: > --- Christine Caulfield wrote: > >> Ja S wrote: >>>>>> If the node doesn't have a local lock on >>>>>> the resource then it >>>>>> doesn't "know" it and has to ask the directory >>>>>> node where it is mastered. >>>>> Does it mean even if the node owns the master >> lock >>>>> resource but it doesn't have a local lock >>>>> associated with the master lock resource, it >>>>> still needs to ask the directory node? >>> >>>> hash tables, hash tables, hash tables ;-) >>> Sure. Now I see what do you mean "knows". Thanks. >>> >>> Could you please kindly answer my last question >> above? >> >> The answer is "No" ... because it's in the resource >> hash table. >> >> ... see, I told you it was all hash tables ... >> > > OK. Let's summarise what I have learned from you. If I > am wrong, correct me please. > > > A node has a hash table (HT1) which hold the master > lock resources and local copies of master lock > resources on remote nodes. It also has another hash > table (HT2) which holds the content of the lock > directory. > > When an application on a node A requests a lock on a > file, DLM feeds the inode number of the file into a > hash function and uses the returned hash value to > check whether there is a corresponding lock resource > record in the hash table HT1. If the record exists, > DLM then processes the lock request on the lock > resources (either master or local copy). > > If not, DLM feeds the inode number into another hash > function to obtain a node ID (for example node B) > which holds the master node information of the target > lock resource. DLM then talks with node B and gets the > master node ID (for example node C) from the hash > table HT2 on node B. 
Finally, DLM gets the target lock > resource from the hash table HT1 on the node C and > processes the lock request. > > Am I right this time, or still missing something (a > third hash table?) ? > No, that's correct. It's missing a lot of detail, but the overview is fair. There's a conflation you've done there that is OK for a simplisitic discussion of GFS but hides an important abstraction. The DLM does not deal in inode numbers, it only deals in resource names. The application that uses the DLM (this includes GFS) decides what the resource names are. GFS uses some system I don't know about but looks like it might include the inode number. clvmd (for example) uses LV UUIDs or VG names for its resource names for instance. These resources are isolated from each other in separate lockspaces. Lockspace is a mandatory parameter to all locking calls. Chrissie From jas199931 at yahoo.com Wed May 14 09:57:26 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 02:57:26 -0700 (PDT) Subject: [Linux-cluster] What is the order of processing a lock request? In-Reply-To: <482AB35C.7090202@redhat.com> Message-ID: <306033.27076.qm@web32207.mail.mud.yahoo.com> --- Christine Caulfield wrote: > Ja S wrote: > > --- Christine Caulfield > wrote: > > > >> Ja S wrote: > >>>>>> If the node doesn't have a local lock on > >>>>>> the resource then it > >>>>>> doesn't "know" it and has to ask the > directory > >>>>>> node where it is mastered. > >>>>> Does it mean even if the node owns the master > >> lock > >>>>> resource but it doesn't have a local lock > >>>>> associated with the master lock resource, it > >>>>> still needs to ask the directory node? > >>> > >>>> hash tables, hash tables, hash tables ;-) > >>> Sure. Now I see what do you mean "knows". > Thanks. > >>> > >>> Could you please kindly answer my last question > >> above? > >> > >> The answer is "No" ... because it's in the > resource > >> hash table. > >> > >> ... see, I told you it was all hash tables ... > >> > > > > OK. Let's summarise what I have learned from you. > If I > > am wrong, correct me please. > > > > > > A node has a hash table (HT1) which hold the > master > > lock resources and local copies of master lock > > resources on remote nodes. It also has another > hash > > table (HT2) which holds the content of the lock > > directory. > > > > When an application on a node A requests a lock on > a > > file, DLM feeds the inode number of the file into > a > > hash function and uses the returned hash value to > > check whether there is a corresponding lock > resource > > record in the hash table HT1. If the record > exists, > > DLM then processes the lock request on the lock > > resources (either master or local copy). > > > > If not, DLM feeds the inode number into another > hash > > function to obtain a node ID (for example node B) > > which holds the master node information of the > target > > lock resource. DLM then talks with node B and gets > the > > master node ID (for example node C) from the hash > > table HT2 on node B. Finally, DLM gets the target > lock > > resource from the hash table HT1 on the node C and > > processes the lock request. > > > > Am I right this time, or still missing something > (a > > third hash table?) ? > > > > No, that's correct. It's missing a lot of detail, > but the overview is fair. > > There's a conflation you've done there that is OK > for a simplisitic > discussion of GFS but hides an important > abstraction. > > The DLM does not deal in inode numbers, it only > deals in resource names. 
> The application that uses the DLM (this includes > GFS) decides what the > resource names are. GFS uses some system I don't > know about but looks > like it might include the inode number. clvmd (for > example) uses LV > UUIDs or VG names for its resource names for > instance. > > These resources are isolated from each other in > separate lockspaces. > Lockspace is a mandatory parameter to all locking > calls. Clear. Great thanks to your detailed explanation. All the best, Jas From jas199931 at yahoo.com Wed May 14 12:23:29 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 05:23:29 -0700 (PDT) Subject: [Linux-cluster] which journaling file system is used in GFS? Message-ID: <314708.36071.qm@web32207.mail.mud.yahoo.com> Hi, All: >From some online articles, in ext3, there are journal, ordered, and writeback three types of journaling file systems. Also in ext3, we can attach the journaling file system to the journal block device located on a different partition. I have not yet found related information for GFS. My questions are: 1. Does GFS also support the three types of journaling file systems? If not, what journaling file system is used in GFS? 2. What command I can use to find out which type of journaling file system is used in the existing GFS file sytem? 3. When updating journal files, does DLM process locks on the journal files as well? 4. Can I attach the journaling file system to the journal block device located on a different LUN for GFS (just like ext3)? Thanks in advance, Jas From mpartio at gmail.com Wed May 14 12:32:34 2008 From: mpartio at gmail.com (Mikko Partio) Date: Wed, 14 May 2008 15:32:34 +0300 Subject: [Linux-cluster] kmod-gfs removed Message-ID: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> Hello the latest kernel patch for RHEL 5.1 wants to remove kmod-gfs -package. What's up with this? Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... URL: From s.wendy.cheng at gmail.com Wed May 14 15:01:16 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 14 May 2008 11:01:16 -0400 Subject: [Linux-cluster] which journaling file system is used in GFS? In-Reply-To: <314708.36071.qm@web32207.mail.mud.yahoo.com> References: <314708.36071.qm@web32207.mail.mud.yahoo.com> Message-ID: <482AFEBC.90707@gmail.com> Ja S wrote: > Hi, All: > > >From some online articles, in ext3, there are journal, > ordered, and writeback three types of journaling file > systems. Also in ext3, we can attach the journaling > file system to the journal block device located on a > different partition. > GFS *is* a journaling filesystem, same as EXT3. All journaling filesystem has journal(s) which is (are) almost an equivalence of database logging. The internal logic of journaling could be different and we call it journaling "mode". > I have not yet found related information for GFS. > > My questions are: > > 1. Does GFS also support the three types of journaling > file systems? If not, what journaling file system is > used in GFS? > So please don't use "journaling file system" to describe journal. Practically, GFS has only one type of journaling (write-back) but it supports data journaling thru "gfs_tool setflag" command (see "man gfs_tool). GFS2 has improved this by moving the "setflag" command into mount command (so it is less confusing) and has been designed to use three journaling modes (write-back, order-write, and data journaling, with order-write as its default). 
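(To illustrate -- this is a sketch from memory, so please check "man
gfs_tool" and the GFS2 mount option documentation for the exact flag
and option names on your release; the paths below are just examples.
On GFS1 the data-journaling flag is set per file or per directory:

  # the file must still be empty when the flag is set
  gfs_tool setflag jdata /mnt/gfs/some/file
  # new files created under this directory inherit the flag
  gfs_tool setflag inherit_jdata /mnt/gfs/some/dir

On GFS2 the mode is chosen at mount time instead, e.g.
"mount -o data=ordered" or "mount -o data=writeback".)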
It (GFS2), however, doesn't allow external journaling devices yet. I understand moving ext3 journal into an external device and/or moving journaling mode from its default (order write) into "write back" can significantly lift its performance. These tricks can *not* be applied to GFS. -- Wendy From lhh at redhat.com Wed May 14 16:29:40 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 14 May 2008 12:29:40 -0400 Subject: [Linux-cluster] Oracle Shared-Nothing In-Reply-To: References: <1210612368.10406.50.camel@ayanami.boston.devel.redhat.com> Message-ID: <1210782580.13237.48.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-13 at 07:51 +0100, Stephen Nelson-Smith wrote: > The client is dead set against a RAID array, partly on cost (budget > v.tight), but also on physical space in the rack - there's only 2U > left, and a new rack costs ?1000 pcm. Someday.... =) However, I can't attest to the stability of Oracle on DRBD. I would try with an evaluation license or developer license of Oracle 10g first before deploying in production. -- Lon From jas199931 at yahoo.com Wed May 14 20:44:49 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 13:44:49 -0700 (PDT) Subject: [Linux-cluster] which journaling file system is used in GFS? In-Reply-To: <482AFEBC.90707@gmail.com> Message-ID: <240117.69840.qm@web32202.mail.mud.yahoo.com> --- Wendy Cheng wrote: > Ja S wrote: > > Hi, All: > > > > >From some online articles, in ext3, there are > journal, > > ordered, and writeback three types of journaling > file > > systems. Also in ext3, we can attach the > journaling > > file system to the journal block device located > on a > > different partition. > > > > GFS *is* a journaling filesystem, same as EXT3. All > journaling > filesystem has journal(s) which is (are) almost an > equivalence of > database logging. The internal logic of journaling > could be different > and we call it journaling "mode". > > I have not yet found related information for GFS. > > > > My questions are: > > > > 1. Does GFS also support the three types of > journaling > > file systems? If not, what journaling file system > is > > used in GFS? > > > So please don't use "journaling file system" to > describe journal. > Practically, GFS has only one type of journaling > (write-back) but it > supports data journaling thru "gfs_tool setflag" > command (see "man > gfs_tool). GFS2 has improved this by moving the > "setflag" command into > mount command (so it is less confusing) and has been > designed to use > three journaling modes (write-back, order-write, and > data journaling, > with order-write as its default). It (GFS2), > however, doesn't allow > external journaling devices yet. > > I understand moving ext3 journal into an external > device and/or moving > journaling mode from its default (order write) into > "write back" can > significantly lift its performance. These tricks can > *not* be applied to > GFS. Thank you very much indeed for the clarification. Jas From lhh at redhat.com Wed May 14 21:52:40 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 14 May 2008 17:52:40 -0400 Subject: [Linux-cluster] Complete cluster.conf Schema Description In-Reply-To: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> References: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> Message-ID: <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-13 at 17:26 -0400, Kelsey Hightower wrote: > Hello, > > > I have been searching the web for weeks. 
I am trying to get the > complete cluster.conf schema description. I have found a link that > describes most of the options but it seems to omit the resources, > services, and anything related to configuring failover services. > > > http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html Try this: http://people.redhat.com/lhh/ra-info-0.1.tar.gz $ sha256sum ra-info-0.1.tar.gz 7c34a082cc88d9f976a544b3e19be56ef8cf73a1a8c395178b40f16e0f16ad5d ra-info-0.1.tar.gz It's a simple XSLT program that translates Resource Agent metadata to HTML and fires up a text web-browser to look at it. There's probably lots of typos in the RA metadata. Feedback is appreciated. Basically; tar -xzvf ra-info-0.1.tar.gz cd ra-info-0.1 ./ra-info /usr/share/cluster/service.sh [or whatever agent you want] I'll generate web pages for all the agents later. This is a start, though... :) -- Lon From lhh at redhat.com Wed May 14 22:00:32 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 14 May 2008 18:00:32 -0400 Subject: [Linux-cluster] Complete cluster.conf Schema Description In-Reply-To: <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> References: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> Message-ID: <1210802432.13237.58.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-05-14 at 17:52 -0400, Lon Hohberger wrote: > On Tue, 2008-05-13 at 17:26 -0400, Kelsey Hightower wrote: > > Hello, > > > > > > I have been searching the web for weeks. I am trying to get the > > complete cluster.conf schema description. I have found a link that > > describes most of the options but it seems to omit the resources, > > services, and anything related to configuring failover services. > > > > > > http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html > > Try this: > > http://people.redhat.com/lhh/ra-info-0.1.tar.gz > > $ sha256sum ra-info-0.1.tar.gz > 7c34a082cc88d9f976a544b3e19be56ef8cf73a1a8c395178b40f16e0f16ad5d > ra-info-0.1.tar.gz Had a bug - try 0.2: http://people.redhat.com/lhh/ra-info-0.2.tar.gz $ sha256sum ra-info-0.2.tar.gz 6d8a40ae8a6a4006406ff07186331f22279e9aa796f0bedbcd592f6d7c62e856 ra-info-0.2.tar.gz Example output looks like this: http://people.redhat.com/lhh/service.sh.html -- Lon From jas199931 at yahoo.com Wed May 14 22:04:04 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 15:04:04 -0700 (PDT) Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <482AFEBC.90707@gmail.com> Message-ID: <647561.52231.qm@web32208.mail.mud.yahoo.com> Hi, All: Just want to get a clarification. When using GFS+DLM, will the locks of journals be managed also by DLM in the same way as that for normal data files? Thanks in advance. Jas From gordan at bobich.net Wed May 14 22:21:56 2008 From: gordan at bobich.net (Gordan Bobic) Date: Wed, 14 May 2008 23:21:56 +0100 Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <647561.52231.qm@web32208.mail.mud.yahoo.com> References: <647561.52231.qm@web32208.mail.mud.yahoo.com> Message-ID: <482B6604.7020608@bobich.net> Ja S wrote: > Hi, All: > > Just want to get a clarification. > > When using GFS+DLM, will the locks of journals be > managed also by DLM in the same way as that for normal > data files? My understanding is that there is no locking on the journals across nodes except when a node gets fenced and it's journal needs to be replayed to ensure data is consistent. 
Each node has it's own journal. Gordan From jas199931 at yahoo.com Wed May 14 22:27:55 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 15:27:55 -0700 (PDT) Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <482B6604.7020608@bobich.net> Message-ID: <217848.54530.qm@web32202.mail.mud.yahoo.com> --- Gordan Bobic wrote: > Ja S wrote: > > Hi, All: > > > > Just want to get a clarification. > > > > When using GFS+DLM, will the locks of journals be > > managed also by DLM in the same way as that for > normal > > data files? > > My understanding is that there is no locking on the > journals across > nodes except when a node gets fenced and it's > journal needs to be > replayed to ensure data is consistent. Each node has > it's own journal. Thanks. Then what this "can't acquire the journal glock:" error is about? Jas > Gordan > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From gordan at bobich.net Wed May 14 22:32:21 2008 From: gordan at bobich.net (Gordan Bobic) Date: Wed, 14 May 2008 23:32:21 +0100 Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <217848.54530.qm@web32202.mail.mud.yahoo.com> References: <217848.54530.qm@web32202.mail.mud.yahoo.com> Message-ID: <482B6875.8090401@bobich.net> Ja S wrote: > --- Gordan Bobic wrote: > >> Ja S wrote: >>> Hi, All: >>> >>> Just want to get a clarification. >>> >>> When using GFS+DLM, will the locks of journals be >>> managed also by DLM in the same way as that for >> normal >>> data files? >> My understanding is that there is no locking on the >> journals across >> nodes except when a node gets fenced and it's >> journal needs to be >> replayed to ensure data is consistent. Each node has >> it's own journal. > > Thanks. Then what this "can't acquire the journal > glock:" error is about? I think the journals are allocated on a first-come first-served basis to the nodes as they connect to the shared storage. Each node locks it's own journal to ensure that it is marked as "in use". That's why you'll see that message at mount time. But I don't think there is any journal locking going on under normal operation. Gordan From cfeist at redhat.com Wed May 14 22:33:24 2008 From: cfeist at redhat.com (Chris Feist) Date: Wed, 14 May 2008 17:33:24 -0500 Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> References: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> Message-ID: <482B68B4.1010402@redhat.com> Mikko Partio wrote: > Hello > > the latest kernel patch for RHEL 5.1 wants to remove kmod-gfs -package. > What's up with this? How does this happen? When you're doing a 'yum update', just install the kernel? Can you post the output of the commands that try to remove kmod-gfs? 
Thanks, Chris > > Regards > > Mikko > > > > ------------------------------------------------------------------------ > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From kelsey.hightower at gmail.com Wed May 14 23:38:20 2008 From: kelsey.hightower at gmail.com (Kelsey Hightower) Date: Wed, 14 May 2008 19:38:20 -0400 Subject: [Linux-cluster] Complete cluster.conf Schema Description In-Reply-To: <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> References: <1242753f0805131426v5f4b2618rde67360903345f9e@mail.gmail.com> <1210801960.13237.56.camel@ayanami.boston.devel.redhat.com> Message-ID: <1242753f0805141638v5b2c42f0q5c3cc0910fa4f6a4@mail.gmail.com> This was what I was looking for, thanks a lot. On Wed, May 14, 2008 at 5:52 PM, Lon Hohberger wrote: > > On Tue, 2008-05-13 at 17:26 -0400, Kelsey Hightower wrote: > > Hello, > > > > > > I have been searching the web for weeks. I am trying to get the > > complete cluster.conf schema description. I have found a link that > > describes most of the options but it seems to omit the resources, > > services, and anything related to configuring failover services. > > > > > > http://sources.redhat.com/cluster/doc/cluster_schema_rhel5.html > > Try this: > > http://people.redhat.com/lhh/ra-info-0.1.tar.gz > > $ sha256sum ra-info-0.1.tar.gz > 7c34a082cc88d9f976a544b3e19be56ef8cf73a1a8c395178b40f16e0f16ad5d > ra-info-0.1.tar.gz > > It's a simple XSLT program that translates Resource Agent metadata to > HTML and fires up a text web-browser to look at it. > > There's probably lots of typos in the RA metadata. Feedback is > appreciated. > > Basically; > > tar -xzvf ra-info-0.1.tar.gz > cd ra-info-0.1 > ./ra-info /usr/share/cluster/service.sh [or whatever agent you want] > > I'll generate web pages for all the agents later. This is a start, > though... :) > > -- Lon > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jas199931 at yahoo.com Thu May 15 00:27:37 2008 From: jas199931 at yahoo.com (Ja S) Date: Wed, 14 May 2008 17:27:37 -0700 (PDT) Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <482B6875.8090401@bobich.net> Message-ID: <852258.61230.qm@web32208.mail.mud.yahoo.com> --- Gordan Bobic wrote: > Ja S wrote: > > --- Gordan Bobic wrote: > > > >> Ja S wrote: > >>> Hi, All: > >>> > >>> Just want to get a clarification. > >>> > >>> When using GFS+DLM, will the locks of journals > be > >>> managed also by DLM in the same way as that for > >> normal > >>> data files? > >> My understanding is that there is no locking on > the > >> journals across > >> nodes except when a node gets fenced and it's > >> journal needs to be > >> replayed to ensure data is consistent. Each node > has > >> it's own journal. > > > > Thanks. Then what this "can't acquire the journal > > glock:" error is about? > > I think the journals are allocated on a first-come > first-served basis to > the nodes as they connect to the shared storage. > Each node locks it's > own journal to ensure that it is marked as "in use". > That's why you'll > see that message at mount time. But I don't think > there is any journal > locking going on under normal operation. Great thanks again. 
Jas From mpartio at gmail.com Thu May 15 05:43:14 2008 From: mpartio at gmail.com (Mikko Partio) Date: Thu, 15 May 2008 08:43:14 +0300 Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <482B68B4.1010402@redhat.com> References: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> <482B68B4.1010402@redhat.com> Message-ID: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> On Thu, May 15, 2008 at 1:33 AM, Chris Feist wrote: > Mikko Partio wrote: > >> Hello >> >> the latest kernel patch for RHEL 5.1 wants to remove kmod-gfs -package. >> What's up with this? >> > > How does this happen? When you're doing a 'yum update', just install the > kernel? Can you post the output of the commands that try to remove > kmod-gfs? > sh-3.1# uname -a Linux node1 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux sh-3.1# yum check-update kernel.x86_64 2.6.18-53.1.19.el5 updates kernel-devel.x86_64 2.6.18-53.1.19.el5 updates kernel-headers.x86_64 2.6.18-53.1.19.el5 updates sh-3.1# yum update ============================================================================= Package Arch Version Repository Size ============================================================================= Installing: kernel x86_64 2.6.18-53.1.19.el5 updates 15 M kernel-devel x86_64 2.6.18-53.1.19.el5 updates 4.9 M Updating: kernel-headers x86_64 2.6.18-53.1.19.el5 updates 822 k Removing: kernel x86_64 2.6.18-8.1.15.el5 installed 72 M kernel-devel x86_64 2.6.18-8.1.15.el5 installed 15 M Removing for dependencies: kmod-gfs x86_64 0.1.16-6.2.6.18_8.1.15.el5 installed 466 k Transaction Summary ============================================================================= Install 2 Package(s) Update 1 Package(s) Remove 3 Package(s) Total download size: 21 M Is this ok [y/N]: N Exiting on user Command Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alain.Moulle at bull.net Thu May 15 07:22:57 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 15 May 2008 09:22:57 +0200 Subject: [Linux-cluster] Meaning of checkinterval in cluster.conf Message-ID: <482BE4D1.8010609@bull.net> Hi I don't remember the meaning of checkinterval value in service record in cluster.conf with regard to the monitor and status values in script.sh ? Thanks Regards Alain Moull? From cfeist at redhat.com Thu May 15 11:44:28 2008 From: cfeist at redhat.com (Chris Feist) Date: Thu, 15 May 2008 06:44:28 -0500 Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> References: <2ca799770805140532p76fd7245tab5900bf1bea3f1f@mail.gmail.com> <482B68B4.1010402@redhat.com> <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> Message-ID: <482C221C.1020102@redhat.com> Mikko Partio wrote: > On Thu, May 15, 2008 at 1:33 AM, Chris Feist wrote: > >> Mikko Partio wrote: >> >>> Hello >>> >>> the latest kernel patch for RHEL 5.1 wants to remove kmod-gfs -package. >>> What's up with this? >>> >> How does this happen? When you're doing a 'yum update', just install the >> kernel? Can you post the output of the commands that try to remove >> kmod-gfs? 
>> > > sh-3.1# uname -a > Linux node1 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 EST 2008 x86_64 > x86_64 x86_64 GNU/Linux > > sh-3.1# yum check-update > > kernel.x86_64 2.6.18-53.1.19.el5 updates > kernel-devel.x86_64 2.6.18-53.1.19.el5 updates > kernel-headers.x86_64 2.6.18-53.1.19.el5 updates > > sh-3.1# yum update > > ============================================================================= > Package Arch Version Repository Size > ============================================================================= > Installing: > kernel x86_64 2.6.18-53.1.19.el5 updates > 15 M > kernel-devel x86_64 2.6.18-53.1.19.el5 updates > 4.9 M > Updating: > kernel-headers x86_64 2.6.18-53.1.19.el5 updates > 822 k > Removing: > kernel x86_64 2.6.18-8.1.15.el5 installed 72 > M > kernel-devel x86_64 2.6.18-8.1.15.el5 installed 15 > M > Removing for dependencies: > kmod-gfs x86_64 0.1.16-6.2.6.18_8.1.15.el5 You have an old kmod-gfs, you should upgrade to the latest one (which doesn't depend on a specific kernel, but it depends on a specific kABI). From rpeterso at redhat.com Thu May 15 13:23:10 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Thu, 15 May 2008 08:23:10 -0500 Subject: [Linux-cluster] When using GFS+DLM, will DLM manage the locks on journals? In-Reply-To: <852258.61230.qm@web32208.mail.mud.yahoo.com> References: <852258.61230.qm@web32208.mail.mud.yahoo.com> Message-ID: <1210857790.21738.8.camel@technetium.msp.redhat.com> On Wed, 2008-05-14 at 17:27 -0700, Ja S wrote: > > >> My understanding is that there is no locking on > > the > > >> journals across > > >> nodes except when a node gets fenced and it's > > >> journal needs to be > > >> replayed to ensure data is consistent. Each node > > has > > >> it's own journal. Hi, The journals in GFS are special files with cluster-wide locks ("glocks"), so inter-node locking still applies. IIRC, all nodes keep a "read" lock on all the journals. However, every node is assigned a primary journal and uses that journal only, under a "write" lock which means there is no lock contention except during recovery situations where a journal has to be replayed. Regards, Bob Peterson Red Hat Clustering & GFS From Alain.Moulle at bull.net Thu May 15 14:21:00 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 15 May 2008 16:21:00 +0200 Subject: [Linux-cluster] CS5 / About qdisk parameters Message-ID: <482C46CC.5050306@bull.net> Hi Lon Thans again, but that's strange because in the man , the recommended values are : intervall="1" tko="10" and so we have a result < 21s which is the default value of heart-beat timer, so not a hair above like you recommened in previous email ... extract of man qddisk : interval="1" This is the frequency of read/write cycles, in seconds. tko="10" This is the number of cycles a node must miss in order to be declared dead. ? So the better values to match with the default heart-beat timeout of 21s should be : interval="2" and tko="11" right ? Thanks Regards Alain Moull? 
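P.S. To be precise, what I would change is just the quorumd line in
cluster.conf, i.e. from the man page defaults to something like
(other attributes left out):

  <quorumd interval="2" tko="11" ... />

which gives 2 x 11 = 22s before a node is declared dead, a hair above
the default 21s heart-beat timeout.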
From lhh at redhat.com Thu May 15 15:07:50 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 15 May 2008 11:07:50 -0400 Subject: [Linux-cluster] CS5 / About qdisk parameters In-Reply-To: <482C46CC.5050306@bull.net> References: <482C46CC.5050306@bull.net> Message-ID: <1210864070.13237.65.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-05-15 at 16:21 +0200, Alain Moulle wrote: > Hi Lon > > Thans again, but that's strange because in the man , the recommended > values are : > intervall="1" tko="10" and so we have a result < 21s which is the > default value of heart-beat timer, so not a hair above like you > recommened in previous email ... > extract of man qddisk : > > interval="1" > This is the frequency of read/write cycles, in seconds. > > tko="10" > This is the number of cycles a node must miss in order to be > declared dead. > > ? > > So the better values to match with the default heart-beat timeout of 21s should > be : > > interval="2" and tko="11" > > right ? Yes, but you don't want to match it. You want qdisk to timeout before CMAN with enough time so that ifthe qdisk master node dies, there is enough time to elect a new master *before* CMAN would normally transition. On RHEL4, the default CMAN timeout is 21 seconds. On RHEL5, it's 5 seconds - which must be tweaked currently using the totem parameter. I intend to make qdiskd automatically detect the CMAN death detection time in the near future and automatically configure itself, because this is something users/administrators just *shouldn't* have to deal with... (Does anyone disagree with that? :) ) Anyway, here's a graphical representation as to why qdiskd needs to time out (long) before CMAN: http://people.redhat.com/lhh/cmanvsqdisk.png -- Lon From lhh at redhat.com Thu May 15 15:09:59 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 15 May 2008 11:09:59 -0400 Subject: [Linux-cluster] CS5 / About qdisk parameters In-Reply-To: <1210864070.13237.65.camel@ayanami.boston.devel.redhat.com> References: <482C46CC.5050306@bull.net> <1210864070.13237.65.camel@ayanami.boston.devel.redhat.com> Message-ID: <1210864199.13237.68.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-05-15 at 11:07 -0400, Lon Hohberger wrote: > Anyway, here's a graphical representation as to why qdiskd needs to time > out (long) before CMAN: > > http://people.redhat.com/lhh/cmanvsqdisk.png Hrm, on a second look, the timing isn't 100% accurate there. However, the reasoning is. -- Lon From lhh at redhat.com Thu May 15 15:13:42 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 15 May 2008 11:13:42 -0400 Subject: [Linux-cluster] Meaning of checkinterval in cluster.conf In-Reply-To: <482BE4D1.8010609@bull.net> References: <482BE4D1.8010609@bull.net> Message-ID: <1210864422.13237.72.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-05-15 at 09:22 +0200, Alain Moulle wrote: > Hi > > I don't remember the meaning of checkinterval value in > service record in cluster.conf with regard to the monitor > and status values in script.sh ? In RHEL3 clumanager, checkinterval was the frequency the entire service was checked. In rgmanager, check times are per-resource at a minimum granularity of 10 seconds. There's no such parameter in script.sh/service.sh/etc. If you change the , it will check more or less frequently. You can do the same thing by adding as a child of the service in the resource tree. (I need to put that on the ResourceTrees page on the wiki). 
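A rough example (from memory -- the service and resource names here
are made up, and the agent metadata / wiki page is the authoritative
reference):

  <service name="myservice">
    <action name="status" interval="30"/>
    <fs ref="myfs"/>
  </service>

would make rgmanager run the status check for that service every 30
seconds instead of the interval from the agent metadata.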
-- Lon From kpodesta at redbrick.dcu.ie Thu May 15 17:34:15 2008 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Thu, 15 May 2008 18:34:15 +0100 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node Message-ID: <20080515173415.GA25881@minerva.redbrick.dcu.ie> Hi folks, I've just added a 3rd node to a live RHEL 3 cluster (RHEL 3 Update 7), which was added successfully. But on the third node when I run clustat, I get the message "No Quorum - Service States Unknown". The other two nodes are running fine and clustat displays all services. A message on one of the other nodes from /var/log/messages gives: cluquorumd[8049]: Dropping connect from (node3-IP): Unauthorized Seems like the other two nodes are rejecting the third's advances towards joining quorum. Is there anything I can do about this? Would appreciate any pointers, I couldn't find an answer in the archives (I notice the question was asked before too). The three nodes are listed as "Active" in clustat on all nodes, but the third obviously just can't join the quorum, despite reboot of the third node. Thanks & regards, Karl -- Karl Podesta Systems Engineer, Securelinx Ltd., Ireland http://www.securelinx.com/ From orkcu at yahoo.com Thu May 15 18:40:31 2008 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Thu, 15 May 2008 11:40:31 -0700 (PDT) Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> Message-ID: <81184.14267.qm@web50601.mail.re2.yahoo.com> --- On Thu, 5/15/08, Mikko Partio wrote: > From: Mikko Partio > Subject: Re: [Linux-cluster] kmod-gfs removed > To: "Chris Feist" > Cc: "linux clustering" > Received: Thursday, May 15, 2008, 1:43 AM > On Thu, May 15, 2008 at 1:33 AM, Chris Feist > wrote: > > > Mikko Partio wrote: > > > >> Hello > >> > >> the latest kernel patch for RHEL 5.1 wants to > remove kmod-gfs -package. > >> What's up with this? > >> > > > > How does this happen? When you're doing a > 'yum update', just install the > > kernel? Can you post the output of the commands that > try to remove > > kmod-gfs? > > > > sh-3.1# uname -a > Linux node1 2.6.18-53.1.14.el5 #1 SMP Wed Mar 5 11:37:38 > EST 2008 x86_64 > x86_64 x86_64 GNU/Linux > > sh-3.1# yum check-update > > kernel.x86_64 2.6.18-53.1.19.el5 > updates > kernel-devel.x86_64 2.6.18-53.1.19.el5 > updates > kernel-headers.x86_64 2.6.18-53.1.19.el5 > updates > > sh-3.1# yum update > > ============================================================================= > Package Arch Version > Repository Size > ============================================================================= > Installing: > kernel x86_64 2.6.18-53.1.19.el5 > updates > 15 M > kernel-devel x86_64 2.6.18-53.1.19.el5 > updates > 4.9 M > Updating: > kernel-headers x86_64 2.6.18-53.1.19.el5 > updates > 822 k > Removing: > kernel x86_64 2.6.18-8.1.15.el5 > installed 72 > M > kernel-devel x86_64 2.6.18-8.1.15.el5 > installed 15 > M > Removing for dependencies: > kmod-gfs x86_64 > 0.1.16-6.2.6.18_8.1.15.el5 > installed 466 k yum try to remove kmod-gfs because its depende of the kernel version that its trying to remove, which is not right because you are trying to update a kernel and it should means just install the package without remove any old versions. or do you change the default configuration of yum? cu roger __________________________________________________________________ Looking for the perfect gift? Give the gift of Flickr! 
http://www.flickr.com/gift/ From lhh at redhat.com Thu May 15 21:06:16 2008 From: lhh at redhat.com (Lon Hohberger) Date: Thu, 15 May 2008 17:06:16 -0400 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <20080515173415.GA25881@minerva.redbrick.dcu.ie> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> Message-ID: <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> On Thu, 2008-05-15 at 18:34 +0100, Karl Podesta wrote: > Hi folks, > > I've just added a 3rd node to a live RHEL 3 cluster (RHEL 3 Update 7), > which was added successfully. But on the third node when I run clustat, > I get the message "No Quorum - Service States Unknown". The other two > nodes are running fine and clustat displays all services. A message on > one of the other nodes from /var/log/messages gives: > > cluquorumd[8049]: Dropping connect from (node3-IP): Unauthorized > > Seems like the other two nodes are rejecting the third's advances > towards joining quorum. Is there anything I can do about this? > Would appreciate any pointers, I couldn't find an answer in the archives > (I notice the question was asked before too). The three nodes are listed > as "Active" in clustat on all nodes, but the third obviously just can't > join the quorum, despite reboot of the third node. The md5sum of /etc/cluster.xml is the same for all nodes, right? -- Lon From paul at huffingtonpost.com Thu May 15 21:44:48 2008 From: paul at huffingtonpost.com (Paul Berry) Date: Thu, 15 May 2008 17:44:48 -0400 Subject: [Linux-cluster] GFS in High Traffic ? Message-ID: <38435f290805151444n11ad8168m720be465c2e09f2a@mail.gmail.com> Hey guys - we're struggling with a GFS setup to get our 8 high traffic servers onto a NEXSAN SataBoy so that we can leave our RSYNC process which we've pushed to the extents of its capacity We don't have all that much data, its less then 1TB total. The trick is that these files get requested simultaneously under pretty significant load. And as soon as we get 3 or 4 servers mounted to the SAN we get melt-downs. We also struggled today with one server messing with the journals and taking down the other servers that were looking at the SAN (disaster). The broad question I'd love to hear - is GFS a good solution to get into for a situation like this? I'd love to hear thoughts on this, and suggestions on the right path is this seems like the wrong one Best regards, Pau From Alain.Moulle at bull.net Fri May 16 07:56:14 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Fri, 16 May 2008 09:56:14 +0200 Subject: [Linux-cluster] Re: CS5 / About qdisk parameters Message-ID: <482D3E1E.5030705@bull.net> Hi Lon Sorry Lon, but it is not completely clear again for me ... : when you write that default cman timeout on RHEL5 is 5 seconds, you mean that the heart-beat timeout is 5s ? whereas each hello message is sent every 5s too ? And the totem in cluster.conf to modify it was in my understanding the "deadnode_timer" in the cman record ... what is the "token" you mention ? And finally, my would be to set deadnode_timer="21s" for cman and to keep interval="1" and tko="10" for quorum disk. Just a precision, it on a only two nodes cluster with quorum disk. Thanks to confirm these points. Regards Alain Moull? > Yes, but you don't want to match it. > You want qdisk to timeout before CMAN with enough time so that ifthe > qdisk master node dies, there is enough time to elect a new master > *before* CMAN would normally transition. > On RHEL4, the default CMAN timeout is 21 seconds. 
> On RHEL5, it's 5 seconds - which must be tweaked currently using the > totem parameter. > I intend to make qdiskd automatically detect the CMAN death detection > time in the near future and automatically configure itself, because this > is something users/administrators just *shouldn't* have to deal with... > (Does anyone disagree with that? :) ) > Anyway, here's a graphical representation as to why qdiskd needs to time > out (long) before CMAN: > http://people.redhat.com/lhh/cmanvsqdisk.png > -- Lon From lhh at redhat.com Fri May 16 19:11:33 2008 From: lhh at redhat.com (Lon Hohberger) Date: Fri, 16 May 2008 15:11:33 -0400 Subject: [Linux-cluster] Re: CS5 / About qdisk parameters In-Reply-To: <482D3E1E.5030705@bull.net> References: <482D3E1E.5030705@bull.net> Message-ID: <1210965093.3019.175.camel@localhost.localdomain> On Fri, 2008-05-16 at 09:56 +0200, Alain Moulle wrote: > Hi Lon > > Sorry Lon, but it is not completely clear again for me ... : > > when you write that default cman timeout on RHEL5 is 5 seconds, you > mean that the heart-beat timeout is 5s ? whereas each hello message is > sent every 5s too ? On RHEL5, the parameters are different - but basically, on RHEL5, the *equivalent* of the deadnode_timer is "5" seconds by default. (Specifying other values for it is quite different, however) > And the totem in cluster.conf to modify it was in my understanding the > "deadnode_timer" in the cman record ... what is the "token" you mention ? RHEL4: RHEL5: > And finally, my would be to set deadnode_timer="21s" for cman and to keep > interval="1" and tko="10" for quorum disk. Right. :) Using the defaults on rhel4 should work wonderfully. -- Lon From lpleiman at redhat.com Sat May 17 00:37:28 2008 From: lpleiman at redhat.com (Leo Pleiman) Date: Fri, 16 May 2008 20:37:28 -0400 Subject: [Linux-cluster] GFS in High Traffic ? In-Reply-To: <38435f290805151444n11ad8168m720be465c2e09f2a@mail.gmail.com> References: <38435f290805151444n11ad8168m720be465c2e09f2a@mail.gmail.com> Message-ID: <482E28C8.6000304@redhat.com> Paul, We have similar demands at my customer, and with larger file systems. We have gotten good results by placing the cluster traffic on a dedicated interface. Once the cluster traffic (as defined in the cluster.conf file) was placed on a dedicated interface all our stabilization problems disappeared. If you hardware is interface limited, you can use vlan tagging and place the cluster traffic in a dedicated vlan. It doesn't provide the additional bandwidth but it seems to dramatically help. When we asked the same question, the general answer from the developers was, "it is always a good idea to place cluster traffic on a dedicated interface." As an interesting note, for an oracle RAC installation, the Oracle cluster traffic MUST be on a dedicated interface. Paul Berry wrote: > Hey guys - we're struggling with a GFS setup to get our 8 high traffic > servers onto a NEXSAN SataBoy so that we can leave our RSYNC process > which we've pushed to the extents of its capacity > > We don't have all that much data, its less then 1TB total. The trick > is that these files get requested simultaneously under pretty > significant load. And as soon as we get 3 or 4 servers mounted to the > SAN we get melt-downs. > > We also struggled today with one server messing with the journals and > taking down the other servers that were looking at the SAN (disaster). > > The broad question I'd love to hear - is GFS a good solution to get > into for a situation like this? 
> > I'd love to hear thoughts on this, and suggestions on the right path > is this seems like the wrong one > > Best regards, > Pau > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Leo J Pleiman Senior Consultant, GPS Federal 410-688-3873 -------------- next part -------------- A non-text attachment was scrubbed... Name: lpleiman.vcf Type: text/x-vcard Size: 194 bytes Desc: not available URL: From anujhere at gmail.com Sun May 18 11:56:39 2008 From: anujhere at gmail.com (=?UTF-8?Q?Anuj_Singh_(=E0=A4=85=E0=A4=A8=E0=A5=81=E0=A4=9C)?=) Date: Sun, 18 May 2008 17:26:39 +0530 Subject: [Linux-cluster] cluster make fail on RHEL5 "libdlm.c:324: error: " Message-ID: <3120c9e30805180456t13af5a8bmd55f05933fbc47e@mail.gmail.com> Hi,I have kernel version 2.6.18-8.el5 on rhel5. Downloaded cluster source as follows: 1.git clone git://sources.redhat.com/git/cluster.git 3. cd cluster 2. git checkout -b RHEL5 origin/RHEL5 ./configure --kernel_src=/usr/src/kernels/2.6.18-8.el5-i686/ Now running make command giving me error. gcc -L../../cman/lib -L../../dlm/lib -L//usr/lib/openais -o dlm_controld main.o member_cman.o group.o action.o deadlock.o ../lib/libgroup.a ../../ccs/lib/libccs.a -lcman -ldlm -lcpg -lSaCkpt /usr/bin/ld: cannot find -ldlm collect2: ld returned 1 exit status make[2]: *** [dlm_controld] Error 1 make[2]: Leaving directory `/usr/local/cluster/group/dlm_controld' make[1]: *** [all] Error 2 make[1]: Leaving directory `/usr/local/cluster/group' make: *** [all] Error 2 I did cd into dlm directory and: [root at localhost dlm]# ./configure --kernel_src=/usr/src/kernels/2.6.18-8.el5-i686/ Configuring Makefiles for your system... Completed Makefile configuration Now make command giving me error as follows: [root at localhost dlm]# make make -C lib all make[1]: Entering directory `/usr/local/cluster/dlm/lib' gcc -Wall -g -I. 
-O2 -D_REENTRANT -c -o libdlm.o libdlm.c libdlm.c: In function 'set_version_v5': libdlm.c:324: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:325: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:326: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'set_version_v6': libdlm.c:335: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:336: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:337: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'detect_kernel_version': libdlm.c:443: error: storage size of 'v' isn't known libdlm.c:446: error: invalid application of 'sizeof' to incomplete type 'struct dlm_device_version' libdlm.c:448: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:449: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:450: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:452: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:453: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:454: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:443: warning: unused variable 'v' libdlm.c: In function 'do_dlm_dispatch': libdlm.c:590: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'ls_lock_v6': libdlm.c:835: error: 'struct dlm_lock_params' has no member named 'xid' libdlm.c:837: error: 'struct dlm_lock_params' has no member named 'timeout' libdlm.c: In function 'ls_lock': libdlm.c:897: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'dlm_ls_lockx': libdlm.c:921: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'dlm_ls_unlock': libdlm.c:1073: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'dlm_ls_deadlock_cancel': libdlm.c:1105: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:1121: error: 'DLM_USER_DEADLOCK' undeclared (first use in this function) libdlm.c:1121: error: (Each undeclared identifier is reported only once libdlm.c:1121: error: for each function it appears in.) libdlm.c: In function 'dlm_ls_purge': libdlm.c:1140: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:1151: error: 'DLM_USER_PURGE' undeclared (first use in this function) libdlm.c:1152: error: 'union ' has no member named 'purge' libdlm.c:1153: error: 'union ' has no member named 'purge' libdlm.c: In function 'create_lockspace': libdlm.c:1317: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'release_lockspace': libdlm.c:1423: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c: In function 'dlm_kernel_version': libdlm.c:1509: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:1510: error: invalid use of undefined type 'struct dlm_device_version' libdlm.c:1511: error: invalid use of undefined type 'struct dlm_device_version' make[1]: *** [libdlm.o] Error 1 make[1]: Leaving directory `/usr/local/cluster/dlm/lib' make: *** [all] Error 2 how to resolve this error? Thanks and Regards Anuj -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mpartio at gmail.com Mon May 19 07:26:07 2008 From: mpartio at gmail.com (Mikko Partio) Date: Mon, 19 May 2008 10:26:07 +0300 Subject: [Linux-cluster] kmod-gfs removed In-Reply-To: <81184.14267.qm@web50601.mail.re2.yahoo.com> References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> Message-ID: <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> On Thu, May 15, 2008 at 9:40 PM, Roger Pe?a wrote: > yum try to remove kmod-gfs because its depende of the kernel version that > its trying to remove, which is not right because you are trying to update a > kernel and it should means just install the package without remove any old > versions. > or do you change the default configuration of yum? I have only added an extra repo. When I did this upgrade and rebooted, the node could not see gfs-mounts any more (obviously, since the gfs-module was not there). Then I had to remove kmod-gfs -package with yum (lots of errors) and re-install it with yum again. After a reboot everything is working again. Regards MIkko -------------- next part -------------- An HTML attachment was scrubbed... URL: From kpodesta at redbrick.dcu.ie Mon May 19 11:25:39 2008 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Mon, 19 May 2008 12:25:39 +0100 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> Message-ID: <20080519112539.GF16481@minerva.redbrick.dcu.ie> On Thu, May 15, 2008 at 05:06:16PM -0400, Lon Hohberger wrote: > > cluquorumd[8049]: Dropping connect from (node3-IP): Unauthorized > > > > Seems like the other two nodes are rejecting the third's advances > > towards joining quorum. Is there anything I can do about this? > > Would appreciate any pointers, I couldn't find an answer in the archives > > (I notice the question was asked before too). The three nodes are listed > > as "Active" in clustat on all nodes, but the third obviously just can't > > join the quorum, despite reboot of the third node. > > The md5sum of /etc/cluster.xml is the same for all nodes, right? > > -- Lon Indeed the sums/files are the same for all nodes... However it turns out the issue was resolved by rebooting the existing two production nodes! I'm not sure if just restarting clumanager/cluquorumd on the existing nodes would have made the difference, but when I wasn't having any luck getting the third node into the quorum, we scheduled a reboot of the existing two nodes, then when all 3 nodes came back up they had all joined quorum successfully, and services could be listed/migrated on all of the nodes. Fixed. I know there are probably few people using RHEL 3 cluster anymore, but I found this useful to know; that I need to schedule downtime in future if requested to add a 3rd node to a live 2-node cluster... Thanks a lot for the help as per usual, the list is excellent reading! Karl -- Karl Podesta Systems Engineer, Securelinx Ltd., Ireland http://www.securelinx.com/ From fdinitto at redhat.com Mon May 19 11:26:44 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Mon, 19 May 2008 13:26:44 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.99.02 (development snapshot) released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its community are proud to announce the 3rd release from the master branch: 2.99.02. 
The 2.99.XX releases are _NOT_ meant to be used for production environments.. yet. You have been warned: *this code will have no mercy* for your servers and your data. The master branch is the main development tree that receives all new features, code, clean up and a whole brand new set of bugs, At some point in time this code will become the 3.0 stable release. Everybody with test equipment and time to spare, is highly encouraged to download, install and test the 2.99 releases and more important report problems. In order to build the 2.99.02 release you will need: - - openais 0.83 or higher - - linux kernel (git snapshot or 2.6.26-rc3) from http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git (but can run on 2.6.25 in compatibility mode) NOTE to packagers: the library API/ABI's are _NOT_ stable (hence 2.9). We are still shipping shared libraries but remember that they can change anytime without warning. A bunch of new shared libraries have been added. The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.99.02.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. Happy clustering, Fabio Under the hood (from 2.99.01): Bob Peterson (1): Replace put_inode with drop_inode Fabio M. Di Nitto (11): [FENCE] Rename bladecenter as it should be .pl -> .py [DLM] Remove unused header file [BUILD] Add --without_kernel_modules configure option [BUILD] Free toplevel config/ dir [CONFIG] Create config/ subsystem [CONFIG] Add missing Makefiles [CCS] Make a bunch of functions static [BUILD] Stop using DEVEL.DATE library soname [GFS] Fix comment [INIT] Do not start services automatically [GFS] Sync with gfs2 init script Jonathan Brassow (1): rgmanager/lvm.sh: HA LVM wasn't working on IA64 Marek 'marx' Grac (3): [FENCE] Fix name of the option in fencing library [FENCE] Fix problem with different menu for admin/user for APC [FENCE] Fix typo in name of the exceptions in fencing agents Makefile | 23 +- ccs/Makefile | 2 +- ccs/ccs_test/Makefile | 44 -- ccs/ccs_test/ccs_test.c | 158 ------- ccs/libccscompat/libccscompat.c | 6 +- ccs/libccsconfdb/Makefile | 56 --- ccs/libccsconfdb/ccs.h | 27 -- ccs/libccsconfdb/libccs.c | 576 ------------------------- ccs/man/Makefile | 1 - ccs/man/ccs_test.8 | 138 ------ cman/init.d/cman.in | 6 +- cman/init.d/qdiskd | 6 +- config/Makefile | 17 + config/copyright.cf | 22 - config/libs/Makefile | 17 + config/libs/libccsconfdb/Makefile | 56 +++ config/libs/libccsconfdb/ccs.h | 27 ++ config/libs/libccsconfdb/libccs.c | 576 +++++++++++++++++++++++++ config/tools/Makefile | 17 + config/tools/ccs_test/Makefile | 44 ++ config/tools/ccs_test/ccs_test.c | 158 +++++++ config/tools/man/Makefile | 17 + config/tools/man/ccs_test.8 | 138 ++++++ configure | 32 +- dlm/include/list.h | 325 -------------- fence/agents/apc/fence_apc.py | 30 ++- fence/agents/bladecenter/fence_bladecenter.pl | 90 ---- fence/agents/bladecenter/fence_bladecenter.py | 90 ++++ fence/agents/drac/fence_drac5.py | 4 +- fence/agents/ilo/fence_ilo.py | 4 +- fence/agents/ipmilan/ipmilan.c | 2 +- fence/agents/lib/fencing.py.py | 2 +- fence/agents/scsi/scsi_reserve | 6 +- fence/agents/wti/fence_wti.py | 4 +- gfs-kernel/src/gfs/ops_super.c | 11 
+- gfs/init.d/gfs | 15 +- gfs2/init.d/gfs2 | 6 +- make/copyright.cf | 22 + make/defines.mk.input | 8 +- make/fencebuild.mk | 2 +- make/official_release_version | 1 + rgmanager/init.d/rgmanager.in | 6 +- rgmanager/src/resources/lvm.sh | 2 +- 43 files changed, 1286 insertions(+), 1508 deletions(-) - -- I'm going to make him an offer he can't refuse. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSDFkAQgUGcMLQ3qJAQJPeg/8C5BxkynDvsjfgSyjUHlzG/zZe5p4viXH NQtYZk/3nFRBXqvZCYS+gHkdMQvRmJzEHCknLryJZMrZaq5Nj5gn8RERrtFUZ81C 6DWGyqkiqERBsMffR0nkZ//gqkktPx2AaAMFQ5nLd8v6qHvY2SdTwjaV/7ucLiWz sTRC7samneKqj8Et6cgSId2a818xEI6LX9h4fXiwIO2DH7yHK/bvHYhatLYgPvQn 0VQ8XwKkvafUjPEBurkzgh9E4GVvOG35KTS8X/ib6whT0oJFRhkofJG2oCv1sULt lkbGLaUiBL0DW66Z/ypXmK8IBEtgXRjE0DmfoK9xGBJBlolobmLNZ4A/pdTaBeW1 s8Qq763/BeZ5Z6pEtHQzwMcHwQjhg0mGWtmthr9TGfJ/EhsoYnp7DHLKZ89ldItE dEHq94VTZ7QpsKPg7HBSahJEvHUzPM20GSyl7hSmx4Nuno2iftR/IUbCjVEKxPHa 0ePadvsndxuQsyVjRSseLRNHeAW0NvMY82rV9UzEX05Fi6ryT2308DzNi9018LWe baQ+slrg7oWJNcInOwkjNcYMxm6VGPwqTyvrlb/BTZUVhZdium7A3zswx/cPt+qG kV3cfkSNGIz/K9CqjdlE/pQFV6SqR7ILOmg4M717vMzdJcWBehD1QEtGtXxyNkSa xG/QjxC2mZw= =1s3u -----END PGP SIGNATURE----- From lhh at redhat.com Mon May 19 15:00:54 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 19 May 2008 11:00:54 -0400 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <20080519112539.GF16481@minerva.redbrick.dcu.ie> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> <20080519112539.GF16481@minerva.redbrick.dcu.ie> Message-ID: <1211209254.32213.65.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-05-19 at 12:25 +0100, Karl Podesta wrote: > However it turns out the issue was resolved by rebooting the existing > two production nodes! I'm not sure if just restarting clumanager/cluquorumd > on the existing nodes would have made the difference, but when I wasn't > having any luck getting the third node into the quorum, we scheduled a > reboot of the existing two nodes, then when all 3 nodes came back up they > had all joined quorum successfully, and services could be listed/migrated > on all of the nodes. Fixed. > > I know there are probably few people using RHEL 3 cluster anymore, but > I found this useful to know; that I need to schedule downtime in future > if requested to add a 3rd node to a live 2-node cluster... Well, it /should/ just work. Maybe there's something that was missed, like adding an entry explicitly to /etc/hosts. It drops the connection attempt if the message subsystem key doesn't match - which is why I asked about md5sum. Also, it's strange that cluqourumd would not work but clumembd did - they use the same code. Maybe the other daemons on the existing cluster nodes didn't reload correctly (service clumanager reload may have fixed it?). What release of clumanager was it ? 
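If it comes up again, a quick sanity check on each node before
scheduling a reboot would be something like:

  md5sum /etc/cluster.xml
  rpm -q clumanager
  service clumanager reload

Compare the md5sums and package versions across all three nodes; if
they match everywhere, a reload of the daemons (rather than a full
reboot) may be enough to let the new member into the quorum.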
-- Lon From kpodesta at redbrick.dcu.ie Mon May 19 15:30:38 2008 From: kpodesta at redbrick.dcu.ie (Karl Podesta) Date: Mon, 19 May 2008 16:30:38 +0100 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <1211209254.32213.65.camel@ayanami.boston.devel.redhat.com> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> <20080519112539.GF16481@minerva.redbrick.dcu.ie> <1211209254.32213.65.camel@ayanami.boston.devel.redhat.com> Message-ID: <20080519153038.GC28780@minerva.redbrick.dcu.ie> On Mon, May 19, 2008 at 11:00:54AM -0400, Lon Hohberger wrote: > Well, it /should/ just work. Maybe there's something that was missed, > like adding an entry explicitly to /etc/hosts. > > It drops the connection attempt if the message subsystem key doesn't > match - which is why I asked about md5sum. Also, it's strange that > cluqourumd would not work but clumembd did - they use the same code. > > Maybe the other daemons on the existing cluster nodes didn't reload > correctly (service clumanager reload may have fixed it?). > > What release of clumanager was it ? > > -- Lon Well I followed procedure as directly from the manual, and before adding the node I scp'd over /etc/hosts, /etc/passwd, /etc/groups etc., made relevant changes, and made sure disk mounts were accessible and service software could run OK. Then I added the node through the GUI Cluster Config tool on one of the existing nodes, saved, copied /etc/cluster.xml to the new member, and started clumanager. All nodes immediately listed the 3rd node in clustat, it's just that the 3rd node couldn't list services, and instead had the quorum error above. All nodes were RHAS3, the two existing ones had been built with Update 2, but were kept updated, the new node was built with Update 7 to recognise new hardware and was also updated via RHN prior to adding cluster services. It is possible that just restarting clumanager on those 2 nodes may have fixed it, but just in case this would affect running services we scheduled downtime, then just brought all the nodes down and back up again. Version of clumanager is 1.2.28-1, redhat-config-cluster is 1.0.8-1 I did think it odd alright, clumembd was definitely running... Thanks & regards, Karl -- Karl Podesta Systems Engineer, Securelinx Ltd., Ireland http://www.securelinx.com/ From wcyoung at buffalo.edu Mon May 19 20:18:51 2008 From: wcyoung at buffalo.edu (Wes Young) Date: Mon, 19 May 2008 16:18:51 -0400 Subject: [Linux-cluster] Cluster1, RHEL4 gfs_fsck Message-ID: I'm having a little trouble with an older installation of RHEL4, cluster/GFS. One of my cluster nodes crashed the other day, when I brought it back up I got a the error: GFS: Trying to join cluster "lock_dlm", "oss:mydisk" GFS: fsid=oss:mydisk.0: Joined cluster. Now mounting FS... GFS: fsid=oss:mydisk.0: jid=0: Trying to acquire journal lock... GFS: fsid=oss:mydisk.0: jid=0: Looking at journal... attempt to access beyond end of device sdb: rw=0, want=19149432840, limit=858673152 GFS: fsid=oss:mydisk.0: fatal: I/O error I tried to run the gfs_fsck and get a Segmentation fault. So, I upgraded the cluster software (latest RHEL4 tag), compile and get: # gfs_fsck -V GFS fsck DEVEL.1211222576 (built May 19 2008 15:05:16) Copyright (C) Red Hat, Inc. 2004-2005 All rights reserved. [root at sproc cluster]# gfs_fsck -vv /dev/sdb Initializing fsck Initializing lists... (bio.c:140) Writing to 65536 - 16 4096 Initializing special inodes... 
(file.c:45) readi: Offset (400) is >= the file size (400). (super.c:226) 5 journals found. Validating Resource Group index. Level 1 check. Segmentation fault Which is a little further (it didn't do the Level 1 check) than I got last time, but still bails on me. not being a GFS pro here, and a little gfs_tool list.. work, the volume seems to be there, just feels like the server crash damaged some important bits along the way. The data on this drive isn't that critical, just looking to see if i'm missing something dumb, or verification that the partition is hosed (or just not worth trying to really recover the 400 gigs of data at this point). If this should go to the devel list, please let me know. -- Wes Young Network Security Analyst CIT - University at Buffalo ----------------------------------------------- | my OpenID: | http://tinyurl.com/2zu2d3 | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2421 bytes Desc: not available URL: From lhh at redhat.com Mon May 19 20:28:00 2008 From: lhh at redhat.com (Lon Hohberger) Date: Mon, 19 May 2008 16:28:00 -0400 Subject: [Linux-cluster] RHEL 3: no quorum for 3rd node In-Reply-To: <20080519153038.GC28780@minerva.redbrick.dcu.ie> References: <20080515173415.GA25881@minerva.redbrick.dcu.ie> <1210885576.32213.38.camel@ayanami.boston.devel.redhat.com> <20080519112539.GF16481@minerva.redbrick.dcu.ie> <1211209254.32213.65.camel@ayanami.boston.devel.redhat.com> <20080519153038.GC28780@minerva.redbrick.dcu.ie> Message-ID: <1211228880.32213.77.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-05-19 at 16:30 +0100, Karl Podesta wrote: > Version of clumanager is 1.2.28-1, redhat-config-cluster is 1.0.8-1 I suspect you hit this: https://bugzilla.redhat.com/show_bug.cgi?id=172886 -- Lon From michael.osullivan at auckland.ac.nz Mon May 19 21:15:16 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Tue, 20 May 2008 09:15:16 +1200 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID Message-ID: <4831EDE4.5090600@auckland.ac.nz> Thanks for your response Wendy. Please see a diagram of the system at http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen for the fullscreen view) that (I hope) explains the setup. We are not using FC as we are building the SAN with commodity components (the total cost of the system was less than NZ $9000). The SAN is designed to hold files for staff and students in our department, I'm not sure exactly what applications will use the GFS. We are using iscsi-target software although we may upgrade to using firmware in the future. We have used CLVM on top of software RAID, I agree there are many levels to this system, but I couldn't find the necessary is hardware/software to implement this in a simpler way. I am hoping the list may be helpful here. What I wanted to do was the following: Build a SAN from commodity hardware that has no single point of failure and acts like a single file system. The ethernet fabric provide two paths from each server to each storage device (hence two NICs on all the boxes). Each device contains a single logical disk (striped here across two disks for better performance, there is along story behind why we have two disks in each box). 
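Roughly, the stacking on each server looks like this (the device
names, volume names and cluster name below are illustrative, not
copied from our actual config):

  # one multipath md device per storage box (one path per NIC/portal)
  mdadm --create /dev/md0 --level=multipath --raid-devices=2 /dev/sdb /dev/sdc
  mdadm --create /dev/md1 --level=multipath --raid-devices=2 /dev/sdd /dev/sde
  # software RAID-5 across the multipath devices
  mdadm --create /dev/md2 --level=5 --raid-devices=2 /dev/md0 /dev/md1
  # clustered LVM and GFS on top
  pvcreate /dev/md2
  vgcreate -c y vg_san /dev/md2
  lvcreate -l 100%FREE -n lv_gfs vg_san
  gfs_mkfs -p lock_dlm -t ndsg_cluster:gfs_data -j 2 /dev/vg_san/lv_gfs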
These devices (2+) are presented using iSCSI to 2 (or more) servers, but are put together in a RAID-5 configuration so a single failure of a device will not interrupt access to the data. I used iSCSI as we use ethernet for cost reasons. I used mdadm for multipath as I could not find another way to get the servers to see two iSCSI portals as a single device. I then used mdadm and raided the two iSCSI disks together to get the RAID-5 configuration I wanted. Finally I had to create a logical volume for the GFS system so that servers could properly access the network RAID array. I am more than happy to change this to make it more effective as long as: 1) It doesn't cost very much; 2) The no single point of failure property is maintained; 3) The servers see the SAN as a single entity (that way devices can be added and removed with a minimum of fuss). Thanks again for any help/advice/suggestions. I am very new to implementing storage networks, so any help is great. Regards, Mike From JACOB_LIBERMAN at Dell.com Mon May 19 22:05:26 2008 From: JACOB_LIBERMAN at Dell.com (JACOB_LIBERMAN at Dell.com) Date: Mon, 19 May 2008 17:05:26 -0500 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4831EDE4.5090600@auckland.ac.nz> References: <4831EDE4.5090600@auckland.ac.nz> Message-ID: <398B0D66E5559F4696716218E0A3C27665C81E@ausx3mps329.aus.amer.dell.com> Hi Mike, I took a peak at the diagram. Does the blue cylinder represent an Ethernet switch? You may want to add another switch if it's a full redundant mesh topology youre after. Thanks, Jacob > -----Original Message----- > From: linux-cluster-bounces at redhat.com > [mailto:linux-cluster-bounces at redhat.com] On Behalf Of > Michael O'Sullivan > Sent: Monday, May 19, 2008 4:15 PM > To: linux-cluster at redhat.com > Subject: Re: [Linux-cluster] GFS, iSCSI, multipaths and RAID > > Thanks for your response Wendy. Please see a diagram of the > system at http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or > http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen > for the fullscreen view) that (I hope) explains the setup. We > are not using FC as we are building the SAN with commodity > components (the total cost of the system was less than NZ > $9000). The SAN is designed to hold files for staff and > students in our department, I'm not sure exactly what > applications will use the GFS. We are using iscsi-target > software although we may upgrade to using firmware in the > future. We have used CLVM on top of software RAID, I agree > there are many levels to this system, but I couldn't find the > necessary is hardware/software to implement this in a simpler > way. I am hoping the list may be helpful here. > > What I wanted to do was the following: > > Build a SAN from commodity hardware that has no single point > of failure and acts like a single file system. The ethernet > fabric provide two paths from each server to each storage > device (hence two NICs on all the boxes). Each device > contains a single logical disk (striped here across two disks > for better performance, there is along story behind why we > have two disks in each box). These devices (2+) are presented > using iSCSI to 2 (or more) servers, but are put together in a > RAID-5 configuration so a single failure of a device will not > interrupt access to the data. > > I used iSCSI as we use ethernet for cost reasons. I used > mdadm for multipath as I could not find another way to get > the servers to see two iSCSI portals as a single device. 
I > then used mdadm and raided the two iSCSI disks together to > get the RAID-5 configuration I wanted. Finally I had to > create a logical volume for the GFS system so that servers > could properly access the network RAID array. I am more than > happy to change this to make it more effective as long as: > > 1) It doesn't cost very much; > 2) The no single point of failure property is maintained; > 3) The servers see the SAN as a single entity (that way > devices can be added and removed with a minimum of fuss). > > Thanks again for any help/advice/suggestions. I am very new > to implementing storage networks, so any help is great. > > Regards, Mike > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From ross at kallisti.us Mon May 19 23:03:47 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Mon, 19 May 2008 19:03:47 -0400 Subject: [Linux-cluster] New fencing method Message-ID: <20080519230347.GA30667@kallisti.us> Hello everyone, I wrote a new fencing method script that fences by remotely shutting down a switchport. The idea is to fabric fence an iSCSI client by shutting down the port used for iSCSI connectivity. This should work on any Ethernet switch that implements IF-MIB - that's more or less any managed Ethernet switch. It works by setting IF-MIB::ifAdminStatus.ifIndex to down(1) - ie, disable the switchport that the node is plugged into. However, I'm having trouble finding how to integrate my script into the fence_node system. Is there a config file somewhere, or will I need to build a custom version of fence_node? I've attached my script. It uses python and pysnmp v2. Feel free to use it, integrate it, forget about it, etc Thanks in advance for any help! -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 -------------- next part -------------- A non-text attachment was scrubbed... Name: fence_snmp.py Type: text/x-python Size: 4880 bytes Desc: not available URL: From d.skorupa at wasko.pl Tue May 20 06:41:41 2008 From: d.skorupa at wasko.pl (Darek Skorupa) Date: Tue, 20 May 2008 08:41:41 +0200 Subject: [Linux-cluster] New fencing method In-Reply-To: <20080519230347.GA30667@kallisti.us> References: <20080519230347.GA30667@kallisti.us> Message-ID: <483272A5.4060304@wasko.pl> > However, I'm having trouble finding how to integrate my script into > the fence_node system. Is there a config file somewhere, or will I > need to build a custom version of fence_node? > > I think, you should copy fence_snmp script to /sbin folder and if script will exit with '0' status fencing is successful in otherwise is unsuccessful. Am I understand it in good way ?? Darek From jergendutch at gmail.com Tue May 20 12:11:45 2008 From: jergendutch at gmail.com (Jergen Dutch) Date: Tue, 20 May 2008 14:11:45 +0200 Subject: [Linux-cluster] any tricks for per-directory gfs quotas Message-ID: <9db683200805200511k201059bat43098d6fc2d58dbe@mail.gmail.com> Hi, Is there any trick that gives the effect of per-directory quotas without requiring a given user to own the files or be writing to the files? 
Thanks JD From adas at redhat.com Tue May 20 13:22:18 2008 From: adas at redhat.com (Abhijith Das) Date: Tue, 20 May 2008 08:22:18 -0500 Subject: [Linux-cluster] any tricks for per-directory gfs quotas In-Reply-To: <9db683200805200511k201059bat43098d6fc2d58dbe@mail.gmail.com> References: <9db683200805200511k201059bat43098d6fc2d58dbe@mail.gmail.com> Message-ID: <4832D08A.4020302@redhat.com> Jergen Dutch wrote: > Hi, > > Is there any trick that gives the effect of per-directory quotas > without requiring a given user to own the files or be writing to the > files? > > Thanks > JD > You can assign GFS quotas for UIDs or GIDs. In your case, user quotas are clearly out because you can't have users own files. I haven't completely thought through this, but there might be a clumsy way of using group quotas to accomplish what you want... not entirely sure you'd want to do such a thing though :-). Not to mention, this would probably work only with a small number of directories and users and it's going to be difficult to automate it and make it work seamlessly. - 1 group per directory you want to monitor for quota. i.e for each quota-monitored directory 'foo' have a group 'foo-grp' and setup GFS quotas for these groups. - For each such directory 'foo', do 'chgrp foo-grp foo' and 'chmod g+s foo'. (all files and directories in 'foo' created subsequent to this operation will have group 'foo-grp'. You can change the GIDs on the previously existing files of the directory by hand) Oh, and quotas for nested directories probably won't work, not accurately at least. :-) Hey, you asked for tricks :-) and this is what I could come up with. Maybe somebody else would be able come up with a much better idea or throw mine out of the window. Cheers, --Abhi From ross at kallisti.us Tue May 20 14:59:21 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Tue, 20 May 2008 10:59:21 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <483272A5.4060304@wasko.pl> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> Message-ID: <20080520145921.GA5250@kallisti.us> On Tue, May 20, 2008 at 08:41:41AM +0200, Darek Skorupa wrote: > >However, I'm having trouble finding how to integrate my script into > >the fence_node system. Is there a config file somewhere, or will I > >need to build a custom version of fence_node? > > > > > I think, you should copy fence_snmp script to /sbin folder and if script > will exit with '0' status fencing is successful in otherwise is > unsuccessful. > > Am I understand it in good way ?? Yep - I was wondering if there was additional ways to teach the cluster utilities how to setup the parameters. Then I assume I would write something like this in cluster.conf: But how does fence_node know to call these like: fence_snmp -o down -v 2c -a 10.0.0.1 -c sw1comm -i 10020 fence_snmp -o down -v 1 -a 10.0.0.2 -c sw2comm -i 10021 -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. 
Augustine, De Genesi ad Litteram, Book II, xviii, 37 From Alain.Moulle at bull.net Tue May 20 15:22:30 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Tue, 20 May 2008 17:22:30 +0200 Subject: [Linux-cluster] CS5 / heart beat tuning Message-ID: <4832ECB6.40400@bull.net> Hi Lon Something bothers me about the CS5 defaut heart-beat timeout : you wrote that it was now default 5s instead of 21s with CS4. So : what is the new default period for HELLO messages ? because it was also 5s with CS4 ... And a strange thing : I have already tested several times the failover with CS5 without any totem record in cluster.conf (just a remaining deadnode_timer="21" in cman record) and when I stopped one node, the other proceed to fence after 21s ... not 5s .. ???? Thanks Regards Alain Moull? From rpeterso at redhat.com Tue May 20 16:40:14 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 20 May 2008 11:40:14 -0500 Subject: [Linux-cluster] Cluster1, RHEL4 gfs_fsck In-Reply-To: References: Message-ID: <1211301614.10437.24.camel@technetium.msp.redhat.com> On Mon, 2008-05-19 at 16:18 -0400, Wes Young wrote: > I'm having a little trouble with an older installation of RHEL4, > cluster/GFS. > > One of my cluster nodes crashed the other day, when I brought it back > up I got a the error: > > GFS: Trying to join cluster "lock_dlm", "oss:mydisk" > GFS: fsid=oss:mydisk.0: Joined cluster. Now mounting FS... > GFS: fsid=oss:mydisk.0: jid=0: Trying to acquire journal lock... > GFS: fsid=oss:mydisk.0: jid=0: Looking at journal... > attempt to access beyond end of device > sdb: rw=0, want=19149432840, limit=858673152 > GFS: fsid=oss:mydisk.0: fatal: I/O error Hi Wes, Sorry for the long post, but this needs some explanation. >From your email, it sounds like you have corruption in your resource group index file (rindex). You might be the victim of this bug: https://bugzilla.redhat.com/show_bug.cgi?id=436383 If so, there's a fix to gfs_fsck to repair the damage. This is associated with this bug record: https://bugzilla.redhat.com/show_bug.cgi?id=440896 While working on that bug, I discovered some kinds of corruption that confuse the gfs_fsck's rindex repair code. That's described in bug: https://bugzilla.redhat.com/show_bug.cgi?id=442271 I don't think any of these fixes are generally available yet, except in patch form; I think they're scheduled for 4.7. The last one, 442271, is only written against RHEL5 at the moment, so I don't have plans to fix it in RHEL4 yet. So here's what I recommend: First, determine for sure if this is the problem by doing something like this: mount the file system gfs_tool rindex /mnt/gfs | grep "4294967292" (there /mnt/gfs is your mount point) umount the file system If it comes back with "ri_data = 429496729" then that IS the problem, in which case you need to acquire the fixes to the first two bugs listed. You can do this a number of ways: (1) wait until 4.7 comes out, (2) get the patches from the bugzilla and build them from the source tree, (3) grab the RHEL4 branch from the cluster git tree and build from there, because it should include those two fixes. IIRC, I think that the fix to gfs_grow (the original cause of this corruption) has been released as a z-stream fix for 4.6 too, but I don't think we did that for gfs_fsck. If it comes back with no output, then there's a different kind of corruption in your rindex. You could try to build a RHEL4 version of the patch from bug 442271 and see if it fixes your corruption. 
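Collected in one place, the check described above is just the following (a sketch only; /dev/sdb and /mnt/gfs stand in for the real GFS device and mount point):

    mount -t gfs /dev/sdb /mnt/gfs
    gfs_tool rindex /mnt/gfs | grep "4294967292"
    umount /mnt/gfs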
So this at your own risk; we cannot be responsible for your data. I recommend making a full backup before trying anything. Depending on the size of the file system and your amount of free storage, you could dd the entire GFS device to a file you can restore. You could also save off your file system metadata and put it on an ftp server or web server so I can grab it then I'll use it "in the name of 442271" to figure out if the most recent patch in the bz will fix the corruption and if not, I will adjust the 442271 patch so it does. The problem with that is: there is no code in RHEL4 to do this either. I built a RHEL4 version of a tool (gfs2_edit) that can save off your metadata, but I may need to bring it up to date with recent changes first. Either way, this might take some time to resolve. Regards, Bob Peterson Red Hat Clustering & GFS From wcyoung at buffalo.edu Tue May 20 17:18:11 2008 From: wcyoung at buffalo.edu (Wes Young) Date: Tue, 20 May 2008 13:18:11 -0400 Subject: [Linux-cluster] Cluster1, RHEL4 gfs_fsck In-Reply-To: <1211301614.10437.24.camel@technetium.msp.redhat.com> References: <1211301614.10437.24.camel@technetium.msp.redhat.com> Message-ID: On May 20, 2008, at 12:40 PM, Bob Peterson wrote: > On Mon, 2008-05-19 at 16:18 -0400, Wes Young wrote: >> I'm having a little trouble with an older installation of RHEL4, >> cluster/GFS. >> >> One of my cluster nodes crashed the other day, when I brought it back >> up I got a the error: >> >> GFS: Trying to join cluster "lock_dlm", "oss:mydisk" >> GFS: fsid=oss:mydisk.0: Joined cluster. Now mounting FS... >> GFS: fsid=oss:mydisk.0: jid=0: Trying to acquire journal lock... >> GFS: fsid=oss:mydisk.0: jid=0: Looking at journal... >> attempt to access beyond end of device >> sdb: rw=0, want=19149432840, limit=858673152 >> GFS: fsid=oss:mydisk.0: fatal: I/O error > > Hi Wes, > > Sorry for the long post, but this needs some explanation. > >> From your email, it sounds like you have corruption in your > resource group index file (rindex). You might be the victim > of this bug: > https://bugzilla.redhat.com/show_bug.cgi?id=436383 > > If so, there's a fix to gfs_fsck to repair the damage. This is > associated with this bug record: > https://bugzilla.redhat.com/show_bug.cgi?id=440896 > > While working on that bug, I discovered some kinds of > corruption that confuse the gfs_fsck's rindex repair code. > That's described in bug: > https://bugzilla.redhat.com/show_bug.cgi?id=442271 > > I don't think any of these fixes are generally available > yet, except in patch form; I think they're scheduled for > 4.7. The last one, 442271, is only written against RHEL5 > at the moment, so I don't have plans to fix it in RHEL4 yet. > > So here's what I recommend: > > First, determine for sure if this is the problem by doing > something like this: > > mount the file system > gfs_tool rindex /mnt/gfs | grep "4294967292" > (there /mnt/gfs is your mount point) > umount the file system That's the problem though, it won't actually let me mount the "disk" because of this problem. Sounds like my best option is to try and patch the gfs_fsck code in RHE4 and see if it still seg-faults on me... If that doesn't work, i'm guessing a move to RHEL5 would be the next step, but given the actual value of the data, probably not worth it at this point. Thanks for the info. I'll let you know how it goes. 
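For the record, grabbing the RHEL4 branch that Bob mentions would look roughly like this (a sketch; the clone URL is an assumption based on the sources.redhat.com gitweb links elsewhere in this thread):

    # assumed URL -- the gitweb links on this list point at cluster.git on sources.redhat.com
    git clone git://sources.redhat.com/git/cluster.git
    cd cluster
    git checkout -b rhel4 origin/RHEL4   # the RHEL4 branch Bob refers to
    # gfs_fsck lives under the gfs/ subdirectory of this tree; build it
    # following the tree's own configure/README instructions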
-- Wes Young Network Security Analyst CIT - University at Buffalo ----------------------------------------------- | my OpenID: | http://tinyurl.com/2zu2d3 | ----------------------------------------------- -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2421 bytes Desc: not available URL: From rpeterso at redhat.com Tue May 20 17:23:42 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 20 May 2008 12:23:42 -0500 Subject: [Linux-cluster] Cluster1, RHEL4 gfs_fsck In-Reply-To: References: <1211301614.10437.24.camel@technetium.msp.redhat.com> Message-ID: <1211304222.10437.27.camel@technetium.msp.redhat.com> On Tue, 2008-05-20 at 13:18 -0400, Wes Young wrote: > That's the problem though, it won't actually let me mount the "disk" > because of this problem. Hm. That must be some crazy corruption to not even allow a mount. > Sounds like my best option is to try and patch the gfs_fsck code in > RHE4 and see if it still seg-faults on me... I'd try the second bug's patch first. If that doesn't work, try the third bug's patch. > If that doesn't work, i'm guessing a move to RHEL5 would be the next > step, but given the actual value of the data, probably not worth it at > this point. > > Thanks for the info. I'll let you know how it goes. I wouldn't mind getting a look at the corruption, so let me know if you want to go that route. Saving the metadata does not save any user data, so your confidentiality is protected. I'll see what I can do to get that tool functional again. Regards, Bob Peterson Red Hat Clustering & GFS From barbos at gmail.com Tue May 20 18:19:17 2008 From: barbos at gmail.com (Alex Kompel) Date: Tue, 20 May 2008 11:19:17 -0700 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4831EDE4.5090600@auckland.ac.nz> References: <4831EDE4.5090600@auckland.ac.nz> Message-ID: <3ae027040805201119m3598c736gf9ca60470e7584ee@mail.gmail.com> On Mon, May 19, 2008 at 2:15 PM, Michael O'Sullivan wrote: > Thanks for your response Wendy. Please see a diagram of the system at > http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or > http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen for the > fullscreen view) that (I hope) explains the setup. We are not using FC as we > are building the SAN with commodity components (the total cost of the system > was less than NZ $9000). The SAN is designed to hold files for staff and > students in our department, I'm not sure exactly what applications will use > the GFS. We are using iscsi-target software although we may upgrade to using > firmware in the future. We have used CLVM on top of software RAID, I agree > there are many levels to this system, but I couldn't find the necessary is > hardware/software to implement this in a simpler way. I am hoping the list > may be helpful here. > So what do you want to get out of this configuration? iSCSI SAN, GFS cluster or both? I don't see any reason for 2 additional servers running GFS on top of iSCSI SAN. If you need iSCSI SAN with iscsi-target then there are number of articles on how to set it up. For example: http://www.pcpro.co.uk/realworld/82284/san-on-the-cheap/page1.html Or just google for iscsi-target drdb and heartbeat. If you need GFS then you can run it on the storage servers (there is no need for iSCSI in between). 
If you need both then it can get tricky but you can try splitting your raid arrays in a way that half is used by GFS cluster and half is for DRDB volumes with iSCSI luns on top and RedHat Cluster acting as a heartbeat for failover (provided you can also do regular failover with GFS running on the same cluster - I have never tried it before). -Alex From michael.osullivan at auckland.ac.nz Tue May 20 19:25:47 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Wed, 21 May 2008 07:25:47 +1200 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <20080520160011.78F95619943@hormel.redhat.com> References: <20080520160011.78F95619943@hormel.redhat.com> Message-ID: <483325BB.9080909@auckland.ac.nz> Thanks Jacob, Originally we designed a full core-edge configuration with ethernet switches, but our design algorithms (initially for Fiber Channel) did not account for the tree structure of ethernet when using unmanaged switches (this is relatively straightforward to incorporate, but we had already purchased the network...!). However, we do have two unmanaged switches connecting the servers to the storage devices so there are two paths between all the boxes. Thanks, Mike From lhh at redhat.com Tue May 20 19:35:18 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 20 May 2008 15:35:18 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <483272A5.4060304@wasko.pl> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> Message-ID: <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-20 at 08:41 +0200, Darek Skorupa wrote: > > However, I'm having trouble finding how to integrate my script into > > the fence_node system. Is there a config file somewhere, or will I > > need to build a custom version of fence_node? > > > > > I think, you should copy fence_snmp script to /sbin folder and if script > will exit with '0' status fencing is successful in otherwise is > unsuccessful. > That's step one. The agent as noted doesn't appear to take arguments from STDIN. Try looking here for more information: http://sources.redhat.com/cluster/wiki/FenceAgentAPI -- Lon From jparsons at redhat.com Tue May 20 19:52:25 2008 From: jparsons at redhat.com (James Parsons) Date: Tue, 20 May 2008 15:52:25 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> Message-ID: <48332BF9.1000703@redhat.com> Lon Hohberger wrote: >On Tue, 2008-05-20 at 08:41 +0200, Darek Skorupa wrote: > > >>>However, I'm having trouble finding how to integrate my script into >>>the fence_node system. Is there a config file somewhere, or will I >>>need to build a custom version of fence_node? >>> >>> >>> >>> >>I think, you should copy fence_snmp script to /sbin folder and if script >>will exit with '0' status fencing is successful in otherwise is >>unsuccessful. >> >> >> > >That's step one. > >The agent as noted doesn't appear to take arguments from STDIN. Try >looking here for more information: > > http://sources.redhat.com/cluster/wiki/FenceAgentAPI > > I think a good pattern to use for an agent is fence_rsa, if you know python. It is a nice vanilla agent. 
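For anyone writing a new agent against that page: the convention boils down to reading name=value pairs from stdin and exiting 0 on success. A minimal sketch in Python follows (this is not Ross's script; the parameter names ipaddr/community/port/snmp_version are illustrative assumptions, and it shells out to Net-SNMP's snmpset rather than using pysnmp -- note that IF-MIB defines ifAdminStatus as up(1)/down(2), so "off" means writing 2):

    #!/usr/bin/python
    # Minimal stdin-driven fabric-fence sketch in the FenceAgentAPI style.
    import subprocess
    import sys

    def read_stdin_args():
        # fenced feeds the agent "name=value" lines on stdin; '#' lines are comments
        args = {}
        for line in sys.stdin:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, value = line.split("=", 1)
            args[name.strip()] = value.strip()
        return args

    def set_port(args, updown):
        # IF-MIB::ifAdminStatus: up(1), down(2)
        value = {"on": "1", "off": "2"}[updown]
        cmd = ["snmpset", "-v", args.get("snmp_version", "2c"),
               "-c", args["community"], args["ipaddr"],
               "IF-MIB::ifAdminStatus.%s" % args["port"], "i", value]
        return subprocess.call(cmd)

    if __name__ == "__main__":
        opts = read_stdin_args()
        action = opts.get("option", "off")   # fenced passes the action as "option"
        if action in ("off", "reboot"):      # a fabric fence leaves the port down
            updown = "off"
        else:
            updown = "on"
        if set_port(opts, updown) == 0:
            sys.exit(0)
        sys.exit(1)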
-Jim From lhh at redhat.com Tue May 20 20:31:18 2008 From: lhh at redhat.com (Lon Hohberger) Date: Tue, 20 May 2008 16:31:18 -0400 Subject: [Linux-cluster] CS5 / heart beat tuning In-Reply-To: <4832ECB6.40400@bull.net> References: <4832ECB6.40400@bull.net> Message-ID: <1211315478.771.151.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-20 at 17:22 +0200, Alain Moulle wrote: > Hi Lon > > Something bothers me about the CS5 defaut heart-beat timeout : > you wrote that it was now default 5s instead of 21s with CS4. > So : what is the new default period for HELLO messages ? because > it was also 5s with CS4 ... There are no hello messages in rhel5; check 'man openais.conf' and look at the sections dealing with 'totem'. On RHEL5 with CMAN, the default totem 'token' timeout is 10000 (I thought it was 5000, but looking at the code proved differently on lines 487-492 of cman/daemon/ais.c). I still don't understand why it your configuration would be behaving as if totem's token timeout was increased to 21,000 though... > And a strange thing : I have already tested several times > the failover with CS5 without any totem record in cluster.conf > (just a remaining deadnode_timer="21" in cman record) and when > I stopped one node, the other proceed to fence after 21s ... not 5s .. That's strange. Deadnode_timer is ignored in the RHEL5 branch. -- Lon From ross at kallisti.us Tue May 20 22:13:48 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Tue, 20 May 2008 18:13:48 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> Message-ID: <20080520221348.GD6881@kallisti.us> On Tue, May 20, 2008 at 03:35:18PM -0400, Lon Hohberger wrote: > > On Tue, 2008-05-20 at 08:41 +0200, Darek Skorupa wrote: > > > However, I'm having trouble finding how to integrate my script into > > > the fence_node system. Is there a config file somewhere, or will I > > > need to build a custom version of fence_node? > > > > > > > > I think, you should copy fence_snmp script to /sbin folder and if script > > will exit with '0' status fencing is successful in otherwise is > > unsuccessful. > > > > That's step one. > > The agent as noted doesn't appear to take arguments from STDIN. Try > looking here for more information: > > http://sources.redhat.com/cluster/wiki/FenceAgentAPI Awesome - thanks for the pointer. That makes so much more sense and explains how fence_tool can introspect the options passed to the particular fencer. I've attached an updated version that follows the specifications at the above link. I've got a two node cluster running configured with it, though I haven't done any substantial testing yet. Thanks for the help - feel free to use/distribute/forget about it :) -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 -------------- next part -------------- A non-text attachment was scrubbed... Name: fence_snmp.py Type: text/x-python Size: 6259 bytes Desc: not available URL: From fdinitto at redhat.com Wed May 21 09:35:44 2008 From: fdinitto at redhat.com (Fabio M. 
Di Nitto) Date: Wed, 21 May 2008 11:35:44 +0200 (CEST) Subject: [Linux-cluster] New fencing method In-Reply-To: <20080520221348.GD6881@kallisti.us> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> Message-ID: Hi Ross, On Tue, 20 May 2008, Ross Vandegrift wrote: > I've attached an updated version that follows the specifications at > the above link. I've got a two node cluster running configured with > it, though I haven't done any substantial testing yet. > > Thanks for the help - feel free to use/distribute/forget about it :) As soon as you feel the code is ready i will be very glad to include it in the release. Please make sure to choose an appropriate licence like GPL2 so that we can easily redistribute and add a copyright entry in the file since it's all your work and you also deserve credits for it. Fabio -- I'm going to make him an offer he can't refuse. From lhh at redhat.com Wed May 21 17:24:03 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 21 May 2008 13:24:03 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> Message-ID: <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-05-21 at 11:35 +0200, Fabio M. Di Nitto wrote: > Hi Ross, > > On Tue, 20 May 2008, Ross Vandegrift wrote: > > > I've attached an updated version that follows the specifications at > > the above link. I've got a two node cluster running configured with > > it, though I haven't done any substantial testing yet. > > > > Thanks for the help - feel free to use/distribute/forget about it :) > > As soon as you feel the code is ready i will be very glad to include it in > the release. Please make sure to choose an appropriate licence like GPL2 > so that we can easily redistribute and add a copyright entry in the file > since it's all your work and you also deserve credits for it. I'd recommend calling it something besides fence_snmp in the tree - because other agents also use SNMP. For example: fence_ethernet ? -- Lon From jparsons at redhat.com Wed May 21 17:47:20 2008 From: jparsons at redhat.com (James Parsons) Date: Wed, 21 May 2008 13:47:20 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> Message-ID: <48346028.6030804@redhat.com> Lon Hohberger wrote: >On Wed, 2008-05-21 at 11:35 +0200, Fabio M. Di Nitto wrote: > > >>Hi Ross, >> >>On Tue, 20 May 2008, Ross Vandegrift wrote: >> >> >> >>>I've attached an updated version that follows the specifications at >>>the above link. I've got a two node cluster running configured with >>>it, though I haven't done any substantial testing yet. >>> >>>Thanks for the help - feel free to use/distribute/forget about it :) >>> >>> >>As soon as you feel the code is ready i will be very glad to include it in >>the release. Please make sure to choose an appropriate licence like GPL2 >>so that we can easily redistribute and add a copyright entry in the file >>since it's all your work and you also deserve credits for it. 
>> >> > >I'd recommend calling it something besides fence_snmp in the tree - >because other agents also use SNMP. For example: > > fence_ethernet ? > >-- Lon > >-- >Linux-cluster mailing list >Linux-cluster at redhat.com >https://www.redhat.com/mailman/listinfo/linux-cluster > > Could you include some doc on how to use it, please? You can use one of the existing agent man pages as a template. Thanks, -J From ross at kallisti.us Wed May 21 23:35:26 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Wed, 21 May 2008 19:35:26 -0400 Subject: [Linux-cluster] New fencing method In-Reply-To: <48346028.6030804@redhat.com> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> <48346028.6030804@redhat.com> Message-ID: <20080521233526.GA21955@kallisti.us> On Wed, May 21, 2008 at 01:47:20PM -0400, James Parsons wrote: > Lon Hohberger wrote: > >I'd recommend calling it something besides fence_snmp in the tree - > >because other agents also use SNMP. For example: > > > > fence_ethernet ? > > > Could you include some doc on how to use it, please? You can use one of > the existing agent man pages as a template. Done and done. I settled on fence_ifmib, since there's nothing specific to ethernet about IF-MIB, and it could apply to many different technologies. Diff against today's git is attached. -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 -------------- next part -------------- A non-text attachment was scrubbed... Name: rhcs-fence-ifmib.diff Type: text/x-diff Size: 11340 bytes Desc: not available URL: From michael.osullivan at auckland.ac.nz Thu May 22 02:12:27 2008 From: michael.osullivan at auckland.ac.nz (Michael O'Sullivan) Date: Thu, 22 May 2008 14:12:27 +1200 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <20080521160013.8C3D6619BED@hormel.redhat.com> References: <20080521160013.8C3D6619BED@hormel.redhat.com> Message-ID: <4834D68B.9010309@auckland.ac.nz> Hi Alex, We wanted an iSCSI SAN that has highly available data, hence the need for 2 (or more storage devices) and a reliable storage network (omitted from the diagram). Many of the articles I have read for iSCSI don't address multipathing to the iSCSI devices, in our configuration iSCSI Disk 1 presented as /dev/sdc and /dev/sdd on each server (and iSCSI Disk 2 presented as /dev/sde and /dev/sdf), but it wan't clear how to let the servers know that the two iSCSI portals attached to the same target - thus I used mdadm. Also, I wanted to raid the iSCSI disks to make sure the data stays highly available - thus the second use of mdadm. Now we had a single iSCSI raid array spread over 2 (or more) devices which provides the iSCSI SAN. However, I wanted to make sure the servers did not try to access the same data simultaneously, so I used GFS to ensure correct use of the iSCSI SAN. If I understand correctly it seems like the multipathing and raiding may be possible in Red Hat Cluster Suite GFS without using iSCSI? Or to use iSCSI with some other software to ensure proper locking happens for the iSCSI raid array? 
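Spelled out, the layering Mike describes comes down to something like this on each server (a sketch only -- device, volume and cluster names are invented, and the exact options would need tuning for a real deployment):

    # one multipath md device per iSCSI target (each target shows up twice,
    # once per portal, e.g. as sdc+sdd and sde+sdf)
    mdadm --create /dev/md0 --level=multipath --raid-devices=2 /dev/sdc /dev/sdd
    mdadm --create /dev/md1 --level=multipath --raid-devices=2 /dev/sde /dev/sdf

    # RAID-5 across the multipathed iSCSI disks (with only two members this
    # degenerates to mirroring; three or more devices give real RAID-5)
    mdadm --create /dev/md2 --level=5 --raid-devices=2 /dev/md0 /dev/md1

    # clustered LVM and GFS on top (clvmd running on both servers; note that
    # md itself is not cluster-aware, so assembling the same arrays on more
    # than one server at once needs care)
    pvcreate /dev/md2
    vgcreate -cy sanvg /dev/md2
    lvcreate -l 100%FREE -n gfslv sanvg
    gfs_mkfs -p lock_dlm -t sancluster:gfs01 -j 2 /dev/sanvg/gfslv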
I am reading the link you suggested to see what other people have done, but as always any suggestions, etc are more than welcome. Thanks, Mike From s.wendy.cheng at gmail.com Thu May 22 04:20:01 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 21 May 2008 23:20:01 -0500 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <3ae027040805201119m3598c736gf9ca60470e7584ee@mail.gmail.com> References: <4831EDE4.5090600@auckland.ac.nz> <3ae027040805201119m3598c736gf9ca60470e7584ee@mail.gmail.com> Message-ID: <4834F471.8030107@gmail.com> Alex Kompel wrote: > On Mon, May 19, 2008 at 2:15 PM, Michael O'Sullivan > wrote: > >> Thanks for your response Wendy. Please see a diagram of the system at >> http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or >> http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen for the >> fullscreen view) that (I hope) explains the setup. We are not using FC as we >> are building the SAN with commodity components (the total cost of the system >> was less than NZ $9000). The SAN is designed to hold files for staff and >> students in our department, I'm not sure exactly what applications will use >> the GFS. We are using iscsi-target software although we may upgrade to using >> firmware in the future. We have used CLVM on top of software RAID, I agree >> there are many levels to this system, but I couldn't find the necessary is >> hardware/software to implement this in a simpler way. I am hoping the list >> may be helpful here. >> >> > > So what do you want to get out of this configuration? iSCSI SAN, GFS > cluster or both? I don't see any reason for 2 additional servers > running GFS on top of iSCSI SAN. > There are advantages (for 2 additional storage servers) because serving data traffic over IP network has its own overhead(s). They offload CPU as well as memory consumption(s) away from GFS nodes. If done right, the setup could emulate high end SAN box using commodity hardware to provide low cost solutions. The issue here is how to find the right set of software subcomponents to build this configuration. I personally never use Linux iscsi target or multi-path md devices - so can't comment on their features and/or performance characteristics. I was hoping folks well versed in these Linux modules (software raid, dm multi-path, clvm raid level etc) could provide their comments. Check out linux-lvm and/or dm-devel mailing lists .. you may be able to find good links and/or ideas there, or even start to generate interesting discussions from scratch. So, if this configuration will be used as a research project, I'm certainly interested to read the final report. Let us know what works and which one sucks. If it is for a production system to store critical data, better to do more searches to see what are available in the market (to replace the components grouped inside the "iscsi-raid" box in your diagram - it is too complicated to isolate issues if problems popped up). There should be plenty of them out there (e.g. Netapp has offered iscsi SAN boxes with additional feature set such as failover, data de-duplication, backup, performance monitoring, etc). At the same time, it would be nice to have support group to call if things go wrong. From GFS side, I learned from previous GFS-GNBD experiences that serving data from IP networks have its overhead and it is not as cheap as people would expect. The issue is further complicated by the newer Red Hat cluster infra-structure that also places non-trivial amount of workloads on the TCP/IP stacks. 
So separating these IP traffic(s) (cluster HA, data, and/or GFS node access by applications) should be a priority to make the whole setup works. -- Wendy From s.wendy.cheng at gmail.com Thu May 22 04:34:47 2008 From: s.wendy.cheng at gmail.com (Wendy Cheng) Date: Wed, 21 May 2008 23:34:47 -0500 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4834D68B.9010309@auckland.ac.nz> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> Message-ID: <4834F7E7.7090903@gmail.com> Michael O'Sullivan wrote: > Hi Alex, > > We wanted an iSCSI SAN that has highly available data, hence the need > for 2 (or more storage devices) and a reliable storage network > (omitted from the diagram). Many of the articles I have read for iSCSI > don't address multipathing to the iSCSI devices, in our configuration > iSCSI Disk 1 presented as /dev/sdc and /dev/sdd on each server (and > iSCSI Disk 2 presented as /dev/sde and /dev/sdf), but it wan't clear > how to let the servers know that the two iSCSI portals attached to the > same target - thus I used mdadm. Also, I wanted to raid the iSCSI > disks to make sure the data stays highly available - thus the second > use of mdadm. Now we had a single iSCSI raid array spread over 2 (or > more) devices which provides the iSCSI SAN. However, I wanted to make > sure the servers did not try to access the same data simultaneously, > so I used GFS to ensure correct use of the iSCSI SAN. If I understand > correctly it seems like the multipathing and raiding may be possible > in Red Hat Cluster Suite GFS without using iSCSI? Or to use iSCSI with > some other software to ensure proper locking happens for the iSCSI > raid array? I am reading the link you suggested to see what other > people have done, but as always any suggestions, etc are more than > welcome. > Check out dm-multipath (*not* md-multi-path) to see whether you can make use of it: http://www.redhat.com/docs/manuals/csgfs/browse/4.6/DM_Multipath/MPIO_description.html -- Wendy From Alain.Moulle at bull.net Thu May 22 07:36:01 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 22 May 2008 09:36:01 +0200 Subject: [Linux-cluster] CS5 / heart beat tuning Message-ID: <48352261.9070102@bull.net> Hi Lon OK it seems I miss some big evolutions with CS5 versus CS4 ... Where can I find a short documentation (or all documentation) to understand all evolutions of CS5 , like openais , etc. ? Thanks Regards Alain Moull? On Tue, 2008-05-20 at 17:22 +0200, Alain Moulle wrote: >> Hi Lon >> >> Something bothers me about the CS5 defaut heart-beat timeout : >> you wrote that it was now default 5s instead of 21s with CS4. >> So : what is the new default period for HELLO messages ? because >> it was also 5s with CS4 ... There are no hello messages in rhel5; check 'man openais.conf' and look at the sections dealing with 'totem'. On RHEL5 with CMAN, the default totem 'token' timeout is 10000 (I thought it was 5000, but looking at the code proved differently on lines 487-492 of cman/daemon/ais.c). I still don't understand why it your configuration would be behaving as if totem's token timeout was increased to 21,000 though... >> And a strange thing : I have already tested several times >> the failover with CS5 without any totem record in cluster.conf >> (just a remaining deadnode_timer="21" in cman record) and when >> I stopped one node, the other proceed to fence after 21s ... not 5s .. That's strange. Deadnode_timer is ignored in the RHEL5 branch. 
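For completeness, on CS5 the place to stretch this is the totem element Alain mentions rather than deadnode_timer; a cluster.conf fragment might look like this (a sketch -- token is in milliseconds, and 21000 only mirrors the old CS4 figure):

    <!-- fragment of /etc/cluster/cluster.conf, inside the <cluster> element -->
    <totem token="21000"/>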
-- Lon From ccaulfie at redhat.com Thu May 22 07:44:38 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Thu, 22 May 2008 08:44:38 +0100 Subject: [Linux-cluster] CS5 / heart beat tuning In-Reply-To: <48352261.9070102@bull.net> References: <48352261.9070102@bull.net> Message-ID: <48352466.8040400@redhat.com> Alain Moulle wrote: > Hi Lon > > OK it seems I miss some big evolutions with CS5 versus CS4 ... > Where can I find a short documentation (or all documentation) > to understand all evolutions of CS5 , like openais , etc. ? > http://sources.redhat.com/cluster/wiki/HomePage?action=AttachFile&do=view&target=aiscman.pdf -- Chrissie From fdinitto at redhat.com Thu May 22 08:02:47 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 22 May 2008 10:02:47 +0200 (CEST) Subject: [Linux-cluster] New fencing method In-Reply-To: <20080521233526.GA21955@kallisti.us> References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> <48346028.6030804@redhat.com> <20080521233526.GA21955@kallisti.us> Message-ID: Hi Ross, On Wed, 21 May 2008, Ross Vandegrift wrote: > On Wed, May 21, 2008 at 01:47:20PM -0400, James Parsons wrote: >> Lon Hohberger wrote: >>> I'd recommend calling it something besides fence_snmp in the tree - >>> because other agents also use SNMP. For example: >>> >>> fence_ethernet ? >>> >> Could you include some doc on how to use it, please? You can use one of >> the existing agent man pages as a template. > > Done and done. I settled on fence_ifmib, since there's nothing > specific to ethernet about IF-MIB, and it could apply to many > different technologies. > > Diff against today's git is attached. > thank you very much. fence_ifmib is now in git master branch. I did a few changes to plug it in. I will need to review the fence building script to avoid that hack for the COPYRIGHT header generation but it's low priority in my list for now. I made sure to keep the original one in the header. Could you please consider adding a few print to the help menu to add information about build date, release version and your copyright to be consistent with the other fence agents? Thanks Fabio -- I'm going to make him an offer he can't refuse. From jamesbewley at gmail.com Thu May 22 09:25:38 2008 From: jamesbewley at gmail.com (James Bewley) Date: Thu, 22 May 2008 10:25:38 +0100 Subject: [Linux-cluster] Fault tollerant filesystem Message-ID: Hi all, I'm running a cluster and looking for a fault tolerant filesystem. currently have failover via linux-ha and need a shared drive between 4 machines idealy no one machine would need to be master and the removal of any machine would not compromise data integrity. Can anyone suggest a good implementation that will fill my requirements? Regards James -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From a_mdl at mail.ru Thu May 22 10:08:19 2008 From: a_mdl at mail.ru (Denis Medvedev) Date: Thu, 22 May 2008 14:08:19 +0400 Subject: =?koi8-r?Q?Re=3A_[Linux-cluster]_Fault_tollerant_filesystem?= In-Reply-To: References: Message-ID: Hello, you can try www.cleversafe.org they provide iscsi fault-tolerant multi-node storage solution Denis Medvedev -----Original Message----- From: "James Bewley" To: linux-cluster at redhat.com Date: Thu, 22 May 2008 10:25:38 +0100 Subject: [Linux-cluster] Fault tollerant filesystem > > Hi all, > > I'm running a cluster and looking for a fault tolerant filesystem. > > currently have failover via linux-ha and need a shared drive between 4 > machines idealy no one machine would need to be master and the removal of > any machine would not compromise data integrity. > > Can anyone suggest a good implementation that will fill my requirements? > > > Regards > > James > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > From jamesbewley at gmail.com Thu May 22 10:30:17 2008 From: jamesbewley at gmail.com (James Bewley) Date: Thu, 22 May 2008 11:30:17 +0100 Subject: [Linux-cluster] Fault tollerant filesystem In-Reply-To: References: Message-ID: Yes that would be nice, my budget is very much smaller than that. >From what i've read so far, the best looking solution appears to be DRDB and NFS. With a structure similar to the following (v. bad) asci diagram: ----------------- ----------------- | r/w node | | r/w node | ------------------ ------------------ ^ NFS mount v ---------------------- --------------------- | DRDB node | <- linux HA -> | DRDB node | ---------------------- -------------------- Does GFS have any advantages over NFS, or am i being ignorant to the prupose GFS? James 2008/5/22 Denis Medvedev : > > Hello, > you can try www.cleversafe.org > they provide iscsi fault-tolerant multi-node storage solution > Denis Medvedev > > -----Original Message----- > From: "James Bewley" > To: linux-cluster at redhat.com > Date: Thu, 22 May 2008 10:25:38 +0100 > Subject: [Linux-cluster] Fault tollerant filesystem > > > > > Hi all, > > > > I'm running a cluster and looking for a fault tolerant filesystem. > > > > currently have failover via linux-ha and need a shared drive between 4 > > machines idealy no one machine would need to be master and the removal of > > any machine would not compromise data integrity. > > > > Can anyone suggest a good implementation that will fill my requirements? > > > > > > Regards > > > > James > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Aviation Briefing Ltd. Registered in England and Wales Company No: 3709975 Registered Office: Glen Yeo House, Station Road, Congresbury, North Somerset BS49 5DY From sasmaz at itu.edu.tr Thu May 22 11:25:00 2008 From: sasmaz at itu.edu.tr (aydin sasmaz) Date: Thu, 22 May 2008 14:25:00 +0300 Subject: [Linux-cluster] Fault tollerant filesystem In-Reply-To: References: Message-ID: <023401c8bbfe$81c2e560$8548b020$@edu.tr> Hi It is suitable to use enbd or redhat gnbd solution with gfs2. 
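Going back to the DRBD+NFS layout in James' diagram, the DRBD half of it is a single resource in /etc/drbd.conf along these lines (a sketch only; hostnames, disks and addresses are invented, and linux-HA decides which storage node is Primary and exports it over NFS):

    resource r0 {
      protocol C;
      on store1 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   192.168.1.1:7788;
        meta-disk internal;
      }
      on store2 {
        device    /dev/drbd0;
        disk      /dev/sda3;
        address   192.168.1.2:7788;
        meta-disk internal;
      }
    }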
cheers -----Original Message----- From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of James Bewley Sent: Thursday, May 22, 2008 1:30 PM To: Denis Medvedev; linux clustering Subject: Re: [Linux-cluster] Fault tollerant filesystem Yes that would be nice, my budget is very much smaller than that. >From what i've read so far, the best looking solution appears to be DRDB and NFS. With a structure similar to the following (v. bad) asci diagram: ----------------- ----------------- | r/w node | | r/w node | ------------------ ------------------ ^ NFS mount v ---------------------- --------------------- | DRDB node | <- linux HA -> | DRDB node | ---------------------- -------------------- Does GFS have any advantages over NFS, or am i being ignorant to the prupose GFS? James 2008/5/22 Denis Medvedev : > > Hello, > you can try www.cleversafe.org > they provide iscsi fault-tolerant multi-node storage solution > Denis Medvedev > > -----Original Message----- > From: "James Bewley" > To: linux-cluster at redhat.com > Date: Thu, 22 May 2008 10:25:38 +0100 > Subject: [Linux-cluster] Fault tollerant filesystem > > > > > Hi all, > > > > I'm running a cluster and looking for a fault tolerant filesystem. > > > > currently have failover via linux-ha and need a shared drive between 4 > > machines idealy no one machine would need to be master and the removal of > > any machine would not compromise data integrity. > > > > Can anyone suggest a good implementation that will fill my requirements? > > > > > > Regards > > > > James > > > > > > -- > > Linux-cluster mailing list > > Linux-cluster at redhat.com > > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Aviation Briefing Ltd. Registered in England and Wales Company No: 3709975 Registered Office: Glen Yeo House, Station Road, Congresbury, North Somerset BS49 5DY -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From Alain.Moulle at bull.net Thu May 22 12:44:02 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 22 May 2008 14:44:02 +0200 Subject: [Linux-cluster] CS5 / cluster_id in cluster.conf ? Message-ID: <48356A92.5040800@bull.net> Hi With CS5, is there always the possibility to set cluster_id in cluster.conf : References: <48356A92.5040800@bull.net> Message-ID: <48356CB1.7040308@redhat.com> Alain Moulle wrote: > Hi > > With CS5, is there always the possibility to set cluster_id > in cluster.conf : likewise with CS4 ? > (where this cluster_id was used instead > of generated from cluster name) > Err, yes. It was you that requested the feature in the first place ! https://bugzilla.redhat.com/show_bug.cgi?id=219588 Chrissie From Alain.Moulle at bull.net Thu May 22 14:43:01 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Thu, 22 May 2008 16:43:01 +0200 Subject: [Linux-cluster] CS5 still problem "Node x is undead" Message-ID: <48358675.4050506@bull.net> Hi Lon I've applied the patch (see resulting code below) but the patch does not solve the problem. Is there another patch linked to this problem ? Thanks Regards Alain Moull? 
>> when testing a two-nodes cluster with quorum disk, when >> I poweroff the node1 , node 2 fences well the node 1 and >> failovers the service, but in log of node 2 I have before and after >> the fence success messages many messages like this: >> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >> Apr 24 11:30:08 s_sys at xn3 qdiskd[13740]: Node 2 is undead. http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 Resulting code after patch application in cman/qdisk/main.c : =========================================================== Transition from Online -> Evicted */ if (ni[x].ni_misses > ctx->qc_tko && state_run(ni[x].ni_status.ps_state)) { /* Mark our internal views as dead if nodes miss too many heartbeats... This will cause a master transition if no live master exists. */ if (ni[x].ni_status.ps_state >= S_RUN && ni[x].ni_seen) { clulog(LOG_DEBUG, "Node %d DOWN\n", ni[x].ni_status.ps_nodeid); ni[x].ni_seen = 0; } ni[x].ni_state = S_EVICT; ni[x].ni_status.ps_state = S_EVICT; ni[x].ni_evil_incarnation = ni[x].ni_status.ps_incarnation; /* Write eviction notice if we're the master. */ if (ctx->qc_status == S_MASTER) { clulog(LOG_NOTICE, "Writing eviction notice for node %d\n", ni[x].ni_status.ps_nodeid); qd_write_status(ctx, ni[x].ni_status.ps_nodeid, S_EVICT, NULL, NULL, NULL); if (ctx->qc_flags & RF_ALLOW_KILL) { clulog(LOG_DEBUG, "Telling CMAN to " "kill the node\n"); cman_kill_node(ctx->qc_ch, ni[x].ni_status.ps_nodeid); } } /* Clear our master mask for the node after eviction */ if (mask) clear_bit(mask, (ni[x].ni_status.ps_nodeid-1), sizeof(memb_mask_t)); continue; } From sasmaz at itu.edu.tr Thu May 22 16:02:43 2008 From: sasmaz at itu.edu.tr (aydin sasmaz) Date: Thu, 22 May 2008 19:02:43 +0300 Subject: [Linux-cluster] GNBD CLuster In-Reply-To: References: <20080519230347.GA30667@kallisti.us> <483272A5.4060304@wasko.pl> <1211312118.771.138.camel@ayanami.boston.devel.redhat.com> <20080520221348.GD6881@kallisti.us> <1211390643.3174.1.camel@ayanami.boston.devel.redhat.com> <48346028.6030804@redhat.com> <20080521233526.GA21955@kallisti.us> Message-ID: <025c01c8bc25$49ca84c0$dd5f8e40$@edu.tr> Hi All, I would like to share disk volumes to the other nodes in my cluster with using a high available gnbd cluster. In this topology, In addition to cnodes, there are two hpdl380 server connected to msa20 enclosure by scsi cabling. So they are presented with disk volumes. At this point, 1) I wouldn't like to use gfs solution 2) Keep serving disk volumes to cnodes when one of two node gnbd cluster fails. I mean, would like to migrate gnbd service to the failover cluster node 3) when one of the gnbd server fails would like to fence it with a proper method by using HP-ILO fence device. But don't know how to test failing gnbd server. 
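As a rough sketch of the GNBD export side and of exercising point 3 (the export, host and device names below are invented; the fencing itself would be a fence_ilo method configured for each GNBD server in cluster.conf):

    # on the gnbd server: start the server daemon and export a volume
    gnbd_serv
    gnbd_export -d /dev/mapper/msa_lv0 -e msa_lv0

    # on each client node: import everything that server exports;
    # the device then appears as /dev/gnbd/msa_lv0
    gnbd_import -i gnbdserver1

    # to test point 3 from a surviving cluster member, trigger the configured
    # fencing (the iLO method) by hand against the "failed" server:
    fence_node gnbdserver1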
Platforms :Two HP DL380 g3 servers with RHAP5.1 vith virtualization, cluster and cluster storage Two HP MSA20 enclosures Any advice would be appreciated Thanks aydin From jerlyon at gmail.com Thu May 22 17:03:46 2008 From: jerlyon at gmail.com (Jeremy Lyon) Date: Thu, 22 May 2008 11:03:46 -0600 Subject: [Linux-cluster] Cluster starts, but a node won't rejoin after reboot Message-ID: <779919740805221003k5b799927qfc0c11f65e1bf340@mail.gmail.com> Hi, I'm running Cluster 2 on RHEL 5.2 (I saw this behavior on 5.1 and updated just yesterday to see if it fixed it, but no luck) and I'm seeing issues when I reboot a node. I tried increasing the post_join_delay to 60 and the totem token to 25000, but nothing seems to be working. During the boot when the cman init script runs, I see openais messages on the current running node for anywhere between 15 to 30 seconds: May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 0. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 560 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering COMMIT state. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61: May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] previous ring seq 1372 rep 151.117.65.61 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61) May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61) May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:20 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state. May 22 11:52:20 lxomp83k openais[3602]: [CLM ] got nodejoin message 151.117.65.61 May 22 11:52:20 lxomp83k openais[3602]: [CPG ] got joinlist message from node 1 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 9. That repeats until I finally see this... May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 568 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering COMMIT state. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state. 
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61: May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1380 rep 151.117.65.61 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [1] member 151.117.65.62: May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1368 rep 151.117.65.62 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru c high delivered c received flag 1 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.61) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.62) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:27 lxomp83k openais[3602]: [CLM ] r(0) ip( 151.117.65.62) May 22 11:52:27 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service. May 22 11:52:27 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state. May 22 11:52:27 lxomp83k openais[3602]: [MAIN ] Killing node lxomp84k because it has rejoined the cluster with existing state At this point when the second node comes up, I can login and run service cman stop and service cman start. On that start the node joins the cluster immediately with no issue. [root at lxomp84k ~]# uname -a Linux lxomp84k 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux [root at lxomp84k ~]# rpm -q cman cman-2.0.84-2.el5 Any suggestions?? TIA, Jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: From fog at t.is Thu May 22 17:12:38 2008 From: fog at t.is (=?iso-8859-1?Q?Finnur_=D6rn_Gu=F0mundsson_-_TM_Software?=) Date: Thu, 22 May 2008 17:12:38 -0000 Subject: [Linux-cluster] Cluster starts, but a node won't rejoin after reboot In-Reply-To: <779919740805221003k5b799927qfc0c11f65e1bf340@mail.gmail.com> References: <779919740805221003k5b799927qfc0c11f65e1bf340@mail.gmail.com> Message-ID: <3DDA6E3E456E144DA3BB0A62A7F7F779020C6285@SKYHQAMX08.klasi.is> Hi, I'm having the exact same issue on a RHEL 5.2 system, and have a open support case with Redhat. When it will be resolved i can post the details .... Thanks, Finnur From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Jeremy Lyon Sent: 22. ma? 2008 17:04 To: linux clustering Subject: [Linux-cluster] Cluster starts, but a node won't rejoin after reboot Hi, I'm running Cluster 2 on RHEL 5.2 (I saw this behavior on 5.1 and updated just yesterday to see if it fixed it, but no luck) and I'm seeing issues when I reboot a node. I tried increasing the post_join_delay to 60 and the totem token to 25000, but nothing seems to be working. 
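A quick way to double-check where those two knobs actually ended up, and that both nodes agree on the configuration (nothing below is a recommended value, it only shows where the settings live):

  # post_join_delay is an attribute of <fence_daemon>, token of <totem>
  grep -E 'fence_daemon|totem' /etc/cluster/cluster.conf

  # both nodes should report the same config version
  cman_tool status | grep -i version
  cman_tool services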
During the boot when the cman init script runs, I see openais messages on the current running node for anywhere between 15 to 30 seconds: May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 0. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 560 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering COMMIT state. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61: May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] previous ring seq 1372 rep 151.117.65.61 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.61) May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:20 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.61) May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:20 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:20 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service. May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state. May 22 11:52:20 lxomp83k openais[3602]: [CLM ] got nodejoin message 151.117.65.61 May 22 11:52:20 lxomp83k openais[3602]: [CPG ] got joinlist message from node 1 May 22 11:52:20 lxomp83k openais[3602]: [TOTEM] entering GATHER state from 9. That repeats until I finally see this... May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Creating commit token because I am the rep. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Saving state aru 89 high seq received 89 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Storing new sequence id for ring 568 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering COMMIT state. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] entering RECOVERY state. May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [0] member 151.117.65.61: May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1380 rep 151.117.65.61 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru 89 high delivered 89 received flag 1 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] position [1] member 151.117.65.62: May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] previous ring seq 1368 rep 151.117.65.62 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] aru c high delivered c received flag 1 May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Did not need to originate any messages in recovery. 
May 22 11:52:26 lxomp83k openais[3602]: [TOTEM] Sending initial ORF token May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.61) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] CLM CONFIGURATION CHANGE May 22 11:52:26 lxomp83k openais[3602]: [CLM ] New Configuration: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.61) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.62) May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Left: May 22 11:52:26 lxomp83k openais[3602]: [CLM ] Members Joined: May 22 11:52:27 lxomp83k openais[3602]: [CLM ] r(0) ip(151.117.65.62) May 22 11:52:27 lxomp83k openais[3602]: [SYNC ] This node is within the primary component and will provide service. May 22 11:52:27 lxomp83k openais[3602]: [TOTEM] entering OPERATIONAL state. May 22 11:52:27 lxomp83k openais[3602]: [MAIN ] Killing node lxomp84k because it has rejoined the cluster with existing state At this point when the second node comes up, I can login and run service cman stop and service cman start. On that start the node joins the cluster immediately with no issue. [root at lxomp84k ~]# uname -a Linux lxomp84k 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux [root at lxomp84k ~]# rpm -q cman cman-2.0.84-2.el5 Any suggestions?? TIA, Jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ross at kallisti.us Thu May 22 17:52:23 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Thu, 22 May 2008 13:52:23 -0400 Subject: [Linux-cluster] concurrent write performance Message-ID: <20080522175223.GB27548@kallisti.us> Hi everyone, I've been doing some tests with a clustered GFS installation that will evetually host an application that will make heavy use of concurrent writes across nodes. Testing such a scenarios with a script designed to simulate multiple writers shows that add I add writer processes across nodes, performance drops off. This makes some sense to me, as the nodes need to do more complicated neogtiation of locking. Two questions: 1) What is the expected scalability of GFS for many writer nodes as the number of nodes increases? 2) What kinds of things can I do to increase random write performance on GFS? I'm even interested in things that cause some trade-off with read performance. I've got the filesystem mounted on all nodes with noatime,quota=off. My filesystem isn't large enough to benefit from reducing the number of resource groups. It looks like drop_count for the dlm isn't there anymore. I looked at /sys/kernel/config/dlm/cluster - what do the various items in there tune, and which can I try to mess with to help write performance? Finally, I don't see any sign of statfs_slots in the current gfs2_tool gettune output. Is there an equivalent I can muck with? -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. 
Augustine, De Genesi ad Litteram, Book II, xviii, 37 From mpartio at gmail.com Fri May 23 06:04:18 2008 From: mpartio at gmail.com (Mikko Partio) Date: Fri, 23 May 2008 09:04:18 +0300 Subject: [Linux-cluster] Problems with gfs_grow Message-ID: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> Hello all, I tried to expand my gfs filesystem from 1,5TB to 2TB. I added the new 500G disk to volume manager etc, and finally run gfs_grow. The command finished without warnings, but a few seconds after that my cluster crashed with "Kernel Panic - not syncing. Fatal exception". When I got the cluster up again and executed gfs_fsck on the filesystem I get this error: sh-3.1# gfs_fsck -v /dev/xxx-vg/xxx-lv Initializing fsck Initializing lists... Initializing special inodes... Validating Resource Group index. Level 1 check. 5167 resource groups found. (passed) Setting block ranges... Can't seek to last block in file system: 4738147774 Unable to determine the boundaries of the file system. Freeing buffers. What could be the problem? Regards Mikko Info on the system: CentOS 5.1 sh-3.1# rpm -qa |grep gfs gfs2-utils-0.1.38-1.el5 kmod-gfs-0.1.19-7.el5_1.1 gfs-utils-0.1.12-1.el5 sh-3.1# uname -a Linux xxx 2.6.18-53.1.21.el5 #1 SMP Tue May 20 09:35:07 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux -------------- next part -------------- An HTML attachment was scrubbed... URL: From denisb+gmane at gmail.com Fri May 23 07:47:31 2008 From: denisb+gmane at gmail.com (denis) Date: Fri, 23 May 2008 09:47:31 +0200 Subject: [Linux-cluster] Re: kmod-gfs removed In-Reply-To: <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> Message-ID: Mikko Partio wrote: > yum try to remove kmod-gfs because its depende of the kernel version > that its trying to remove, which is not right because you are trying > to update a kernel and it should means just install the package > without remove any old versions. > or do you change the default configuration of yum? > I have only added an extra repo. > > When I did this upgrade and rebooted, the node could not see gfs-mounts > any more (obviously, since the gfs-module was not there). Then I had to > remove kmod-gfs -package with yum (lots of errors) and re-install it > with yum again. After a reboot everything is working again. What is the status of this one? I am seeing the same here (did not perform upgrade yet) : Removing: kernel x86_64 2.6.18-53.1.19.el5 installed kmod-gfs x86_64 0.1.19-7.el5_1.1 installed The only gfs related packages in the upgrade list are : gfs-utils x86_64 0.1.17-1.el5 rhel-x86_64-server-cluster-storage-5 gfs2-utils x86_64 0.1.44-1.el5 rhel-x86_64-server-5 Specifically how should one perform the upgrade with the least amount of hassle? Regards -- Denis From vimal at monster.co.in Fri May 23 08:18:00 2008 From: vimal at monster.co.in (Vimal Gupta) Date: Fri, 23 May 2008 13:48:00 +0530 Subject: [Linux-cluster] kernel: dlm: lockspace ERROR !!! Message-ID: <48367DB8.2090704@monster.co.in> Hi, I made a cluster of two nodes sharing HDD via GNBD. when I was going to mount the exported partition on Node B the Node B got hang. 
And After the hardboot now I am getting these following entries in my /var/log/message of both nodes Node A May 23 13:42:26 mint10 kernel: dlm: lockspace 20001 from 1 type 1 not found May 23 13:42:26 mint10 kernel: dlm: lockspace 30001 from 1 type 1 not found Node B May 23 13:45:59 mint26 kernel: dlm: lockspace 20002 from 2 type 1 not found May 23 13:46:00 mint26 kernel: dlm: lockspace 30002 from 2 type 1 not found And ON node B [clurgmgrd] showing I am not able to kill this pid Please reply ASAP -- Vimal Gupta Sr. System Administrator Monster.com India Pvt.Ltd. From denisb+gmane at gmail.com Fri May 23 08:18:32 2008 From: denisb+gmane at gmail.com (denis) Date: Fri, 23 May 2008 10:18:32 +0200 Subject: [Linux-cluster] Re: kmod-gfs removed In-Reply-To: References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> Message-ID: denis wrote: > What is the status of this one? > I am seeing the same here (did not perform upgrade yet) : > Removing: > kernel x86_64 2.6.18-53.1.19.el5 installed > kmod-gfs x86_64 0.1.19-7.el5_1.1 installed > The only gfs related packages in the upgrade list are : > gfs-utils x86_64 0.1.17-1.el5 rhel-x86_64-server-cluster-storage-5 > gfs2-utils x86_64 0.1.44-1.el5 rhel-x86_64-server-5 Scratch that, these appear to be installed too : kmod-gfs x86_64 0.1.23-5.el5 rhel-x86_64-server-cluster-storage-5 kmod-gfs2 x86_64 1.92-1.1.el5 rhel-x86_64-server-cluster-storage-5 > Specifically how should one perform the upgrade with the least amount of > hassle? So I guess an update should work out fine afterall?! Regards -- Denis From mpartio at gmail.com Fri May 23 08:37:35 2008 From: mpartio at gmail.com (Mikko Partio) Date: Fri, 23 May 2008 11:37:35 +0300 Subject: [Linux-cluster] Re: kmod-gfs removed In-Reply-To: References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> Message-ID: <2ca799770805230137r1ec3ae19r269c955cabe6d74e@mail.gmail.com> On Fri, May 23, 2008 at 11:18 AM, denis > wrote: > denis wrote: > >> What is the status of this one? >> I am seeing the same here (did not perform upgrade yet) : >> Removing: >> kernel x86_64 2.6.18-53.1.19.el5 installed >> kmod-gfs x86_64 0.1.19-7.el5_1.1 installed >> The only gfs related packages in the upgrade list are : >> gfs-utils x86_64 0.1.17-1.el5 rhel-x86_64-server-cluster-storage-5 >> gfs2-utils x86_64 0.1.44-1.el5 rhel-x86_64-server-5 >> > > Scratch that, these appear to be installed too : > kmod-gfs x86_64 0.1.23-5.el5 rhel-x86_64-server-cluster-storage-5 > kmod-gfs2 x86_64 1.92-1.1.el5 rhel-x86_64-server-cluster-storage-5 > > Specifically how should one perform the upgrade with the least amount of >> hassle? >> > > So I guess an update should work out fine afterall?! Is this RHEL/CentOS 5.2? I don't see that kmod-gfs version with 5.1. Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From denisb+gmane at gmail.com Fri May 23 10:44:13 2008 From: denisb+gmane at gmail.com (denis) Date: Fri, 23 May 2008 12:44:13 +0200 Subject: [Linux-cluster] Re: kmod-gfs removed In-Reply-To: <2ca799770805230137r1ec3ae19r269c955cabe6d74e@mail.gmail.com> References: <2ca799770805142243l65dc9c32t7d2ecfc1bcefdc48@mail.gmail.com> <81184.14267.qm@web50601.mail.re2.yahoo.com> <2ca799770805190026i5c96098pd31d55eba10ed39f@mail.gmail.com> <2ca799770805230137r1ec3ae19r269c955cabe6d74e@mail.gmail.com> Message-ID: Mikko Partio wrote: >> On Fri, May 23, 2008 at 11:18 AM, denis denisb wrote: >> Scratch that, these appear to be installed too : >> kmod-gfs x86_64 0.1.23-5.el5 rhel-x86_64-server-cluster-storage-5 >> kmod-gfs2 x86_64 1.92-1.1.el5 rhel-x86_64-server-cluster-storage-5 > Is this RHEL/CentOS 5.2? I don't see that kmod-gfs version with 5.1. Yes, this is RHEL5.2. Regards -- Denis From nico at altiva.fr Fri May 23 15:08:02 2008 From: nico at altiva.fr (NM) Date: Fri, 23 May 2008 15:08:02 +0000 (UTC) Subject: [Linux-cluster] Booting node 1 causes it to fence node 2 Message-ID: I have two nodes, each fenceable through a Dell RAC card. When I power cycle one of them, it reboots ... and proceeds to fence the other one! I must be missing something ... (btw should cman be started in init.d automatically? or should it be launched by an operator after having made sure the node was sane?) From rpeterso at redhat.com Fri May 23 17:50:13 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Fri, 23 May 2008 12:50:13 -0500 Subject: [Linux-cluster] Problems with gfs_grow In-Reply-To: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> References: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> Message-ID: <1211565013.10437.119.camel@technetium.msp.redhat.com> Hi Mikko, On Fri, 2008-05-23 at 09:04 +0300, Mikko Partio wrote: > Hello all, > > I tried to expand my gfs filesystem from 1,5TB to 2TB. I added the new > 500G disk to volume manager etc, and finally run gfs_grow. The command > finished without warnings, but a few seconds after that my cluster > crashed with "Kernel Panic - not syncing. Fatal exception". When I got > the cluster up again and executed gfs_fsck on the filesystem I get > this error: > > sh-3.1# gfs_fsck -v /dev/xxx-vg/xxx-lv > Initializing fsck > Initializing lists... > Initializing special inodes... > Validating Resource Group index. > Level 1 check. > 5167 resource groups found. > (passed) > Setting block ranges... > Can't seek to last block in file system: 4738147774 > Unable to determine the boundaries of the file system. > Freeing buffers. You've probably hit the gfs_grow bug described in bz #434962 (436383) and the gfs_fsck bug described in 440897 (440896). My apologies if you can't read them; permissions to individual bugzilla records are out of my control. It's not guaranteed to be your problem, but it sounds similar. The fixes are available in the recently released RHEL5.2, although I don't know when they'll hit Centos. The fixes are also available in the latest cluster git tree if you want to compile/install them from source code yourself. 
Documentation for doing this can be found at: http://sources.redhat.com/cluster/wiki/ClusterGit Regards, Bob Peterson Red Hat Clustering & GFS From Klaus.Steinberger at physik.uni-muenchen.de Sat May 24 08:31:33 2008 From: Klaus.Steinberger at physik.uni-muenchen.de (Klaus Steinberger) Date: Sat, 24 May 2008 10:31:33 +0200 Subject: [Linux-cluster] Re: Booting node 1 causes it to fence node 2 (NM) In-Reply-To: <20080523160008.98774619B25@hormel.redhat.com> References: <20080523160008.98774619B25@hormel.redhat.com> Message-ID: <200805241031.36484.Klaus.Steinberger@physik.uni-muenchen.de> Hi, > I have two nodes, each fenceable through a Dell RAC card. When I power > cycle one of them, it reboots ... and proceeds to fence the other one! Do you have the cluster Communication and the RAC card's on the same subnet? There is some hidden hint in the docu that on a two node cluster both cluster communication and fencing devices must be on the same network. I had similar symptoms as long as I tried cluster comm on fencing on different subnet in a two node cluster. > (btw should cman be started in init.d automatically? or should it be It should be started automatically. Sincerly, Klaus -- Klaus Steinberger Beschleunigerlaboratorium Phone: (+49 89)289 14287 Am Coulombwall 6, D-85748 Garching, Germany FAX: (+49 89)289 14280 EMail: Klaus.Steinberger at Physik.Uni-Muenchen.DE URL: http://www.physik.uni-muenchen.de/~Klaus.Steinberger/ -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 2002 bytes Desc: not available URL: From dist-list at LEXUM.UMontreal.CA Sat May 24 13:04:50 2008 From: dist-list at LEXUM.UMontreal.CA (FM) Date: Sat, 24 May 2008 09:04:50 -0400 Subject: [Linux-cluster] kernel panic umounting GFS FS Message-ID: <48381272.1010003@lexum.umontreal.ca> well, today, I try to unmount GFS form one node to update it (for the latest kernel). All nodes had a kernel panic. Here is the stack : May 24 07:23:09 ancona kernel: CMAN: too many transition restarts - will die May 24 07:23:09 ancona kernel: CMAN: we are leaving the cluster. Inconsistent cluster view May 24 07:23:09 ancona kernel: WARNING: dlm_emergency_shutdown May 24 07:23:09 ancona clurgmgrd[4604]: #67: Shutting down uncleanly May 24 07:23:09 ancona kernel: WARNING: dlm_emergency_shutdown May 24 07:23:09 ancona kernel: SM: 00000006 sm_stop: SG still joined May 24 07:23:09 ancona kernel: SM: 01000008 sm_stop: SG still joined May 24 07:23:09 ancona kernel: SM: 02000014 sm_stop: SG still joined May 24 07:23:09 ancona kernel: SM: 0300000a sm_stop: SG still joined May 24 07:23:09 ancona ccsd[3732]: Cluster manager shutdown. Attemping to reconnect... 
May 24 07:23:10 ancona kernel: dlm: dlm_unlock: lkid 947100ed lockspace not found May 24 07:23:10 ancona kernel: nval 91ed0131 fr 8 r 8 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 921a00a3 fr 8 r 8 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 924b0156 fr 7 r 7 2 May 24 07:23:10 ancona kernel: home send einval to 7 May 24 07:23:10 ancona kernel: home (3942) req reply einval 934f0161 fr 1 r 1 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 90a603ad fr 8 r 8 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 92b600d0 fr 4 r 4 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 915b02a7 fr 5 r 5 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 935b0262 fr 5 r 5 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 922d0261 fr 5 r 5 2 May 24 07:23:10 ancona kernel: home send einval to 2 May 24 07:23:10 ancona kernel: home send einval to 8 May 24 07:23:10 ancona kernel: home (3942) req reply einval 92b00008 fr 7 r 7 2 May 24 07:23:10 ancona kernel: home send einval to 7 May 24 07:23:10 ancona kernel: home (3942) req reply einval 92ca0337 fr 1 r 1 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 932d0128 fr 1 r 1 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 9276022a fr 7 r 7 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 94a90311 fr 8 r 8 2 May 24 07:23:10 ancona kernel: home (3942) req reply einval 93ec0156 fr 8 r 8 2 May 24 07:23:10 ancona kernel: 3931 pr_start last_stop 0 last_start 6 last_finish 0 May 24 07:23:10 ancona kernel: 3931 pr_start count 7 type 2 event 6 flags 250 May 24 07:23:10 ancona kernel: 3931 claim_jid 4 May 24 07:23:10 ancona kernel: 3931 pr_start 6 done 1 May 24 07:23:10 ancona kernel: 3931 pr_finish flags 5a May 24 07:23:10 ancona kernel: 3916 recovery_done jid 4 msg 309 a May 24 07:23:10 ancona kernel: 3916 recovery_done nodeid 6 flg 18 May 24 07:23:10 ancona kernel: 3930 pr_start last_stop 6 last_start 8 last_finish 6 May 24 07:23:10 ancona kernel: 3930 pr_start count 8 type 2 event 8 flags 21a May 24 07:23:10 ancona kernel: 3930 pr_start 8 done 1 May 24 07:23:10 ancona kernel: 3930 pr_finish flags 1a May 24 07:23:10 ancona kernel: May 24 07:23:10 ancona kernel: lock_dlm: Assertion failed on line 361 of file /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/dlm/lock.c May 24 07:23:10 ancona kernel: lock_dlm: assertion: "!error || (plock && error == -EINPROGRESS)" May 24 07:23:10 ancona kernel: lock_dlm: time = 2227212828 May 24 07:23:10 ancona kernel: home: error=-22 num=5,9641cab lkf=0 flags=84 May 24 07:23:10 ancona kernel: May 24 07:23:10 ancona kernel: ------------[ cut here ]------------ May 24 07:23:10 ancona kernel: kernel BUG at /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/dlm/lock.c:361! 
May 24 07:23:10 ancona kernel: invalid operand: 0000 [#1] May 24 07:23:10 ancona kernel: SMP May 24 07:23:11 ancona kernel: Modules linked in: autofs4 lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc arpt_mangle arptable_filter arp_tables dm_mirror dm_round_robin dm_multipath button battery ac ohci_hcd tg3 bonding(U) floppy sg st ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc cciss sd_mod scsi_mod May 24 07:23:11 ancona kernel: CPU: 1 May 24 07:23:11 ancona kernel: EIP: 0060:[] Not tainted VLI May 24 07:23:11 ancona kernel: EFLAGS: 00010246 (2.6.9-67.0.7.ELsmp) May 24 07:23:11 ancona kernel: EIP is at do_dlm_unlock+0xa9/0xbf [lock_dlm] May 24 07:23:11 ancona kernel: eax: 00000001 ebx: e82b5b80 ecx: f6cdbef0 edx: f8aed2d3 May 24 07:23:11 ancona kernel: esi: ffffffea edi: 00000000 ebp: f8a53000 esp: f6cdbeec May 24 07:23:11 ancona kernel: ds: 007b es: 007b ss: 0068 May 24 07:23:11 ancona kernel: Process gfs_glockd (pid: 3939, threadinfo=f6cdb000 task=f70d19b0) May 24 07:23:11 ancona kernel: Stack: f8aed2d3 f8a53000 00000003 e82b5b80 f8ae88b2 f8b48ede 00000001 ea3e7874 May 24 07:23:11 ancona kernel: ea3e7858 f8b3ed63 f8b75e60 ed9fcac0 ea3e7858 f8b75e60 e6fcf8fc f8b3e257 May 24 07:23:11 ancona kernel: ea3e7858 00000001 ea3e7858 f8b3e30e ea3e7874 00000000 f8b3f5f2 00000000 May 24 07:23:11 ancona kernel: Call Trace: May 24 07:23:11 ancona kernel: [] lm_dlm_unlock+0x14/0x1c [lock_dlm] May 24 07:23:11 ancona kernel: [] gfs_lm_unlock+0x2c/0x42 [gfs] May 24 07:23:11 ancona kernel: [] gfs_glock_drop_th+0xf3/0x12d [gfs] May 24 07:23:11 ancona kernel: [] rq_demote+0x7f/0x98 [gfs] May 24 07:23:11 ancona kernel: [] run_queue+0x5a/0xc1 [gfs] May 24 07:23:11 ancona kernel: [] gfs_glock_dq+0x15f/0x16e [gfs] May 24 07:23:11 ancona kernel: [] gfs_glock_dq_uninit+0x8/0x10 [gfs] May 24 07:23:11 ancona kernel: [] gfs_inode_destroy+0x8e/0xbf [gfs] May 24 07:23:11 ancona kernel: [] gfs_reclaim_glock+0xa2/0x13c [gfs] May 24 07:23:11 ancona kernel: [] gfs_glockd+0x39/0xde [gfs] May 24 07:23:11 ancona kernel: [] default_wake_function+0x0/0xc May 24 07:23:11 ancona kernel: [] ret_from_fork+0x6/0x14 May 24 07:23:11 ancona kernel: [] default_wake_function+0x0/0xc May 24 07:23:11 ancona kernel: [] gfs_glockd+0x0/0xde [gfs] May 24 07:23:11 ancona kernel: [] kernel_thread_helper+0x5/0xb May 24 07:23:11 ancona kernel: Code: 73 34 8b 03 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 ff 70 18 68 ef d3 ae f8 e8 de a2 63 c7 83 c4 34 68 d3 d2 ae f8 e8 d1 a2 63 c7 <0f> 0b 69 01 1b d2 ae f8 68 d5 d2 ae f8 e8 8c 9a 63 c7 5b 5e 5f May 24 07:23:11 ancona kernel: <0>Fatal exception: panic in 5 seconds May 24 07:23:11 ancona kernel: dlm: dlm_lock: no lockspace May 24 07:23:12 ancona kernel: nval 91ed0131 fr 8 r 8 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 921a00a3 fr 8 r 8 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 924b0156 fr 7 r 7 2 May 24 07:23:12 ancona kernel: home send einval to 7 May 24 07:23:12 ancona kernel: home (3942) req reply einval 934f0161 fr 1 r 1 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 90a603ad fr 8 r 8 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 92b600d0 fr 4 r 4 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 915b02a7 fr 5 r 5 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 935b0262 fr 5 r 5 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 922d0261 fr 5 r 5 2 May 24 07:23:12 ancona kernel: home send einval to 2 May 24 07:23:12 ancona kernel: home send einval to 8 May 24 07:23:12 ancona 
kernel: home (3942) req reply einval 92b00008 fr 7 r 7 2 May 24 07:23:12 ancona kernel: home send einval to 7 May 24 07:23:12 ancona kernel: home (3942) req reply einval 92ca0337 fr 1 r 1 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 932d0128 fr 1 r 1 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 9276022a fr 7 r 7 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 94a90311 fr 8 r 8 2 May 24 07:23:12 ancona kernel: home (3942) req reply einval 93ec0156 fr 8 r 8 2 May 24 07:23:12 ancona kernel: 3931 pr_start last_stop 0 last_start 6 last_finish 0 May 24 07:23:12 ancona kernel: 3931 pr_start count 7 type 2 event 6 flags 250 May 24 07:23:12 ancona kernel: 3931 claim_jid 4 May 24 07:23:12 ancona kernel: 3931 pr_start 6 done 1 May 24 07:23:12 ancona kernel: 3931 pr_finish flags 5a May 24 07:23:12 ancona kernel: 3916 recovery_done jid 4 msg 309 a May 24 07:23:12 ancona kernel: 3916 recovery_done nodeid 6 flg 18 May 24 07:23:12 ancona kernel: 3930 pr_start last_stop 6 last_start 8 last_finish 6 May 24 07:23:12 ancona kernel: 3930 pr_start count 8 type 2 event 8 flags 21a May 24 07:23:12 ancona kernel: 3930 pr_start 8 done 1 May 24 07:23:12 ancona kernel: 3930 pr_finish flags 1a May 24 07:23:12 ancona kernel: May 24 07:23:12 ancona kernel: lock_dlm: Assertion failed on line 432 of file /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/dlm/lock.c May 24 07:23:13 ancona kernel: lock_dlm: assertion: "!error" May 24 07:23:13 ancona kernel: lock_dlm: time = 2227213341 May 24 07:23:13 ancona kernel: home: num=2,7a2ec26 err=-22 cur=-1 req=3 lkf=10000 May 24 07:23:13 ancona kernel: May 24 07:23:13 ancona kernel: ------------[ cut here ]------------ May 24 07:23:13 ancona kernel: kernel BUG at /builddir/build/BUILD/gfs-kernel-2.6.9-75/smp/src/dlm/lock.c:432! 
May 24 07:23:13 ancona kernel: invalid operand: 0000 [#2] May 24 07:23:13 ancona kernel: SMP May 24 07:23:13 ancona kernel: Modules linked in: autofs4 lock_dlm(U) gfs(U) lock_harness(U) dlm(U) cman(U) md5 ipv6 sunrpc arpt_mangle arptable_filter arp_tables dm_mirror dm_round_robin dm_multipath button battery ac ohci_hcd tg3 bonding(U) floppy sg st ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc cciss sd_mod scsi_mod May 24 07:23:13 ancona kernel: CPU: 0 May 24 07:23:13 ancona kernel: EIP: 0060:[] Not tainted VLI May 24 07:23:13 ancona kernel: EFLAGS: 00010246 (2.6.9-67.0.7.ELsmp) May 24 07:23:13 ancona kernel: EIP is at do_dlm_lock+0x134/0x14e [lock_dlm] May 24 07:23:13 ancona kernel: eax: 00000001 ebx: ffffffea ecx: ee2f5c34 edx: f8aed2d3 May 24 07:23:13 ancona kernel: esi: f8ae87b7 edi: f7e4cc00 ebp: e5fa9280 esp: ee2f5c30 May 24 07:23:13 ancona kernel: ds: 007b es: 007b ss: 0068 May 24 07:23:13 ancona kernel: Process httpd (pid: 10812, threadinfo=ee2f5000 task=f4f6b830) May 24 07:23:13 ancona kernel: Stack: f8aed2d3 20202020 32202020 20202020 20202020 32613720 36326365 f8b30018 May 24 07:23:13 ancona kernel: 00000246 e5fa9280 00000003 00000000 e5fa9280 f8ae8847 00000003 f8af0c80 May 24 07:23:13 ancona kernel: f8a53000 f8b48e9a 00000008 00000001 e78e546c e78e5450 f8a53000 f8b3ea9a May 24 07:23:13 ancona kernel: Call Trace: May 24 07:23:13 ancona kernel: [] gfs_acl_validate_set+0x18/0x8d [gfs] May 24 07:23:13 ancona kernel: [] lm_dlm_lock+0x49/0x52 [lock_dlm] May 24 07:23:13 ancona kernel: [] gfs_lm_lock+0x35/0x4d [gfs] May 24 07:23:13 ancona kernel: [] gfs_glock_xmote_th+0x130/0x172 [gfs] May 24 07:23:13 ancona kernel: [] rq_promote+0xc8/0x147 [gfs] May 24 07:23:13 ancona kernel: [] run_queue+0x91/0xc1 [gfs] May 24 07:23:13 ancona kernel: [] gfs_glock_nq+0xcf/0x116 [gfs] May 24 07:23:13 ancona kernel: [] gfs_glock_nq_init+0x13/0x26 [gfs] May 24 07:23:13 ancona kernel: [] gfs_lookupi+0x321/0x3bf [gfs] May 24 07:23:13 ancona kernel: [] gfs_lookup+0x83/0xfb [gfs] From mpartio at gmail.com Sun May 25 16:40:16 2008 From: mpartio at gmail.com (Mikko Partio) Date: Sun, 25 May 2008 19:40:16 +0300 Subject: [Linux-cluster] Problems with gfs_grow In-Reply-To: <1211565013.10437.119.camel@technetium.msp.redhat.com> References: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> <1211565013.10437.119.camel@technetium.msp.redhat.com> Message-ID: <2ca799770805250940o6d0d5cf9t5b9ed45c2227cda@mail.gmail.com> On Fri, May 23, 2008 at 8:50 PM, Bob Peterson wrote: > Hi Mikko, > > On Fri, 2008-05-23 at 09:04 +0300, Mikko Partio wrote: > > Hello all, > > > > I tried to expand my gfs filesystem from 1,5TB to 2TB. I added the new > > 500G disk to volume manager etc, and finally run gfs_grow. The command > > finished without warnings, but a few seconds after that my cluster > > crashed with "Kernel Panic - not syncing. Fatal exception". When I got > > the cluster up again and executed gfs_fsck on the filesystem I get > > this error: > > > > sh-3.1# gfs_fsck -v /dev/xxx-vg/xxx-lv > > Initializing fsck > > Initializing lists... > > Initializing special inodes... > > Validating Resource Group index. > > Level 1 check. > > 5167 resource groups found. > > (passed) > > Setting block ranges... > > Can't seek to last block in file system: 4738147774 > > Unable to determine the boundaries of the file system. > > Freeing buffers. > > You've probably hit the gfs_grow bug described in bz #434962 (436383) > and the gfs_fsck bug described in 440897 (440896). 
My apologies if > you can't read them; permissions to individual bugzilla records are > out of my control. It's not guaranteed to be your problem, but it > sounds similar. > > The fixes are available in the recently released RHEL5.2, although > I don't know when they'll hit Centos. The fixes are also available > in the latest cluster git tree if you want to compile/install them > from source code yourself. Documentation for doing this can > be found at: http://sources.redhat.com/cluster/wiki/ClusterGit > Hi Bob and thanks for you reply. So, what I should do is to upgrade to 5.2 and then run gfs_fsck on the filesystem? Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpartio at gmail.com Mon May 26 05:24:20 2008 From: mpartio at gmail.com (Mikko Partio) Date: Mon, 26 May 2008 08:24:20 +0300 Subject: [Linux-cluster] Problems with gfs_grow In-Reply-To: <2ca799770805250940o6d0d5cf9t5b9ed45c2227cda@mail.gmail.com> References: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> <1211565013.10437.119.camel@technetium.msp.redhat.com> <2ca799770805250940o6d0d5cf9t5b9ed45c2227cda@mail.gmail.com> Message-ID: <2ca799770805252224i7019fdfatd064f1ab158ad791@mail.gmail.com> On Sun, May 25, 2008 at 7:40 PM, Mikko Partio wrote: > The fixes are available in the recently released RHEL5.2, although >> I don't know when they'll hit Centos. The fixes are also available >> in the latest cluster git tree if you want to compile/install them >> from source code yourself. Documentation for doing this can >> be found at: http://sources.redhat.com/cluster/wiki/ClusterGit >> > > Hi Bob and thanks for you reply. > > So, what I should do is to upgrade to 5.2 and then run gfs_fsck on the > filesystem? > > Seeing that CentOS 5.2 is not released yet, I decided to take the git way. I have never used it before so I'm not sure if I'm doing everything correctly, but it seems that a compiled version from RHEL52 branch does not fix the issue (details below). Would the HEAD version of gfs_fsck do any better? Regards Mikko sh-3.1$ ../git checkout my52 Already on "my52" sh-3.1$ cd gfs sh-3.1$ ./configure Configuring Makefiles for your system... 
Completed Makefile configuration sh-3.1$ cd gfs_fsck sh-3.1$ make gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 main.c -o main.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 initialize.c -o initialize.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass1.c -o pass1.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass1b.c -o pass1b.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass1c.c -o pass1c.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass2.c -o pass2.o pass2.c: In function 'build_rooti': pass2.c:533: warning: pointer targets in initialization differ in signedness pass2.c:540: warning: pointer targets in initialization differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass3.c -o pass3.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass4.c -o pass4.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 pass5.c -o pass5.o pass5.c: In function 'check_block_status': pass5.c:188: warning: pointer targets in assignment differ in signedness pass5.c:190: warning: pointer targets in assignment differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 block_list.c -o block_list.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 super.c -o super.o super.c: In function 'gfs_rgindex_calculate': super.c:1023: warning: pointer targets in passing argument 2 of 'hexdump' differ in signedness super.c: In function 'ri_update': super.c:1098: warning: pointer targets in passing argument 3 of 'gfs_rgindex_calculate' differ in signedness super.c:1107: warning: pointer targets in passing argument 3 of 'gfs_rgindex_rebuild' differ in signedness super.c: In function 'gfs_rgindex_calculate': super.c:899: warning: 'length' may be used uninitialized in this function super.c:899: warning: 'addr' may be used uninitialized in this function super.c: In function 'gfs_rgindex_rebuild': super.c:683: warning: 'end_block' may be used uninitialized in this function gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 bio.c -o bio.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 ondisk.c -o ondisk.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 file.c -o file.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 rgrp.c -o rgrp.o rgrp.c: In function 'fs_rgrp_recount': rgrp.c:329: warning: pointer targets in passing argument 1 of 
'fs_bitcount' differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_bits.c -o fs_bits.o fs_bits.c: In function 'fs_get_bitmap': fs_bits.c:297: warning: pointer targets in assignment differ in signedness fs_bits.c: In function 'fs_set_bitmap': fs_bits.c:354: warning: pointer targets in passing argument 1 of 'fs_setbit' differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 util.c -o util.o util.c: In function 'next_rg_meta': util.c:173: warning: pointer targets in passing argument 1 of 'fs_bitfit' differ in signedness util.c: In function 'next_rg_meta_free': util.c:226: warning: pointer targets in passing argument 1 of 'fs_bitfit' differ in signedness util.c:229: warning: pointer targets in passing argument 1 of 'fs_bitfit' differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_bmap.c -o fs_bmap.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_inode.c -o fs_inode.o fs_inode.c: In function 'fs_mkdir': fs_inode.c:519: warning: pointer targets in assignment differ in signedness gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_dir.c -o fs_dir.o fs_dir.c: In function 'leaf_search': fs_dir.c:298: warning: pointer targets in passing argument 1 of 'gfs_dir_hash' differ in signedness fs_dir.c: In function 'linked_leaf_search': fs_dir.c:385: warning: pointer targets in passing argument 1 of 'gfs_dir_hash' differ in signedness fs_dir.c: In function 'dir_e_add': fs_dir.c:1259: warning: pointer targets in passing argument 1 of 'gfs_dir_hash' differ in signedness fs_dir.c: In function 'dir_l_add': fs_dir.c:1456: warning: pointer targets in passing argument 1 of 'gfs_dir_hash' differ in signedness fs_dir.c: In function 'fs_dir_search': fs_dir.c:467: warning: 'bh' may be used uninitialized in this function gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 fs_recovery.c -o fs_recovery.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 log.c -o log.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 hash.c -o hash.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 inode_hash.c -o inode_hash.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 bitmap.c -o bitmap.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 lost_n_found.c -o lost_n_found.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 inode.c -o inode.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 link.c -o link.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM 
-DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 metawalk.c -o metawalk.o gcc -MMD -c -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 eattr.c -o eattr.o gcc -Wall -I../include -I../config -D_FILE_OFFSET_BITS=64 -DHELPER_PROGRAM -DGFS_RELEASE_NAME=\"DEVEL.1211779210\" -Wall -O2 main.o initialize.o pass1.o pass1b.o pass1c.o pass2.o pass3.o pass4.o pass5.o block_list.o super.o bio.o ondisk.o file.o rgrp.o fs_bits.o util.o fs_bmap.o fs_inode.o fs_dir.o fs_recovery.o log.o hash.o inode_hash.o bitmap.o lost_n_found.o inode.o link.o metawalk.o eattr.o -o gfs_fsck sh-3.1$ ./gfs_fsck -V GFS fsck DEVEL.1211779210 (built May 26 2008 08:20:46) Copyright (C) Red Hat, Inc. 2004-2005 All rights reserved. sh-3.1$ sudo ./gfs_fsck -v /dev/xxx-vg/xxx-lv Password: Initializing fsck Initializing lists... Initializing special inodes... Validating Resource Group index. Level 1 check. 5167 resource groups found. (passed) Setting block ranges... Can't seek to last block in file system: 4738147774 Unable to determine the boundaries of the file system. Freeing buffers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mpartio at gmail.com Mon May 26 07:31:23 2008 From: mpartio at gmail.com (Mikko Partio) Date: Mon, 26 May 2008 10:31:23 +0300 Subject: [Linux-cluster] Problems with gfs_grow In-Reply-To: <2ca799770805252224i7019fdfatd064f1ab158ad791@mail.gmail.com> References: <2ca799770805222304p385a9933t46574dec8f928a4@mail.gmail.com> <1211565013.10437.119.camel@technetium.msp.redhat.com> <2ca799770805250940o6d0d5cf9t5b9ed45c2227cda@mail.gmail.com> <2ca799770805252224i7019fdfatd064f1ab158ad791@mail.gmail.com> Message-ID: <2ca799770805260031j6122421dhb38ceefead2739d9@mail.gmail.com> On Mon, May 26, 2008 at 8:24 AM, Mikko Partio wrote: > > > On Sun, May 25, 2008 at 7:40 PM, Mikko Partio wrote: > >> The fixes are available in the recently released RHEL5.2, although >>> I don't know when they'll hit Centos. The fixes are also available >>> in the latest cluster git tree if you want to compile/install them >>> from source code yourself. Documentation for doing this can >>> be found at: http://sources.redhat.com/cluster/wiki/ClusterGit >>> >> >> Hi Bob and thanks for you reply. >> >> So, what I should do is to upgrade to 5.2 and then run gfs_fsck on the >> filesystem? >> >> > Seeing that CentOS 5.2 is not released yet, I decided to take the git way. > I have never used it before so I'm not sure if I'm doing everything > correctly, but it seems that a compiled version from RHEL52 branch does not > fix the issue (details below). Would the HEAD version of gfs_fsck do any > better? > Sorry to continue this monologue, but I got the issue resolved. I compiled another version of gfs_fsck ("master" in git) and it immediately found rindex errors from the filesystem. Now everything *seems* to be ok. Thanks for your help! Regards Mikko -------------- next part -------------- An HTML attachment was scrubbed... URL: From shajie_ahmed at yahoo.com Mon May 26 08:20:12 2008 From: shajie_ahmed at yahoo.com (shajie ahmed) Date: Mon, 26 May 2008 01:20:12 -0700 (PDT) Subject: [Linux-cluster] Regarding RHEL 4 cluster failover Message-ID: <325862.30726.qm@web37402.mail.mud.yahoo.com> Hi , I am using cluster suite in RHEL 4 for a two -node cluster , using ILO for fencing. I have configured and running my services on it. I have two questions to ask -- Q1 :: How can I set the maximum number or restarts for a service ? 
If a service has failed on one node and cluster is trying to restart it and for any reason if the service does not starts on that node. How can I set the cluster to relocate the service to other node after a fixed number or restarts?? Q2. When power cable gets faulty?? If for any reason one node goes out of power supply then the services running on it are not relocated to the other node and even the other node is unaware of the failure of that node .....what can I do for this?? Please suggest. Regards Syed Shajie Ahmed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Alain.Moulle at bull.net Mon May 26 08:33:11 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Mon, 26 May 2008 10:33:11 +0200 Subject: [Linux-cluster] CS5 still problem "Node x is undead" (contd.) Message-ID: <483A75C7.60003@bull.net> Hi As told before, the patch : http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 does not solve the problem for my configuration ... Just an idea/question : could this problem be also linked to the defaut value of token ? Or has it nothing to do with it ? Because currently, I have this problem with a Quorum disk configured and no token record in cluster.conf, so token is at its default value ... ??? Thanks Regards Alain Moull? > Hi Lon > I've applied the patch (see resulting code below) but the patch > does not solve the problem. > Is there another patch linked to this problem ? > Thanks > Regards > Alain Moull? < >>>> when testing a two-nodes cluster with quorum disk, when >>>> I poweroff the node1 , node 2 fences well the node 1 and >>>> failovers the service, but in log of node 2 I have before and after >>>> the fence success messages many messages like this: >>>> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >>>> Apr 24 11:30:04 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >>>> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >>>> Apr 24 11:30:05 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >>>> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >>>> Apr 24 11:30:06 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >>>> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: Node 2 is undead. >>>> Apr 24 11:30:07 s_sys at xn3 qdiskd[13740]: Writing eviction notice for node 2 >>>> Apr 24 11:30:08 s_sys at xn3 qdiskd[13740]: Node 2 is undead. http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 Resulting code after patch application in cman/qdisk/main.c : =========================================================== Transition from Online -> Evicted */ if (ni[x].ni_misses > ctx->qc_tko && state_run(ni[x].ni_status.ps_state)) { /* Mark our internal views as dead if nodes miss too many heartbeats... This will cause a master transition if no live master exists. */ if (ni[x].ni_status.ps_state >= S_RUN && ni[x].ni_seen) { clulog(LOG_DEBUG, "Node %d DOWN\n", ni[x].ni_status.ps_nodeid); ni[x].ni_seen = 0; } ni[x].ni_state = S_EVICT; ni[x].ni_status.ps_state = S_EVICT; ni[x].ni_evil_incarnation = ni[x].ni_status.ps_incarnation; /* Write eviction notice if we're the master. 
*/ if (ctx->qc_status == S_MASTER) { clulog(LOG_NOTICE, "Writing eviction notice for node %d\n", ni[x].ni_status.ps_nodeid); qd_write_status(ctx, ni[x].ni_status.ps_nodeid, S_EVICT, NULL, NULL, NULL); if (ctx->qc_flags & RF_ALLOW_KILL) { clulog(LOG_DEBUG, "Telling CMAN to " "kill the node\n"); cman_kill_node(ctx->qc_ch, ni[x].ni_status.ps_nodeid); } } /* Clear our master mask for the node after eviction */ if (mask) clear_bit(mask, (ni[x].ni_status.ps_nodeid-1), sizeof(memb_mask_t)); continue; } ------------------------------ From pmshehzad at yahoo.com Mon May 26 13:45:24 2008 From: pmshehzad at yahoo.com (Mshehzad Pankhawala) Date: Mon, 26 May 2008 06:45:24 -0700 (PDT) Subject: [Linux-cluster] How to Configuring DRBD on already Mounted Disk Message-ID: <184977.74826.qm@web45814.mail.sp1.yahoo.com> Hi Sir, We are intermediate for DRBD   DRBD setup carefully and it is ok. I want to setup server (Main Server - serving mail service) serving the postfix mail service  which is using my /home directory and mounted on /dev/sda5. I want this /dev/sda5 (/home) to be used in the DRBD, and evenif it is mounted on /home. But when I start drbd service #service drbd start It start with diskless mode. Starting DRBD resources:    [ d0 Failure: (114) Lower device is already claimed. This usually means it is mounted. cmd /sbin/drbdsetup /dev/drbd0 disk /dev/sda5 /dev/sda5 internal --set-defaults --create-device  failed - continuing! s0 n0 ]. ........... cmd /sbin/drbdsetup /dev/drbd0 disk /dev/sda5 internal --set-defaults --create-device --on-io-error=detach failed - continuing!   My kernel version 2.6.18 Is it possible to use my mounted disk (/dev/sda5) evenif it is mounted. If Yes. Then please tell me how? Thanks. MohammedShehzad Pankhawala Check out the all-new face of Yahoo! India. Go to http://in..yahoo.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From corey.kovacs at gmail.com Mon May 26 23:56:18 2008 From: corey.kovacs at gmail.com (Corey Kovacs) Date: Tue, 27 May 2008 00:56:18 +0100 Subject: [Linux-cluster] Cluster NFS operation... Message-ID: <7d6e8da40805261656q4d0113fdhc5cbea3b8aabb512@mail.gmail.com> After setting up my 5 node RHEL5.2 cluster I began some testing of the NFS failover capabilities. My config is simple... gfs2 /home filesystem gfs2 /apps filesystem gfs2 /projects filesystem I've tried both managed IP's and managed NFS services for failover. Both seem to have problems handling "failback" in my case. The NFS services consist of the following model. service name IP Address GFS FS NFS Export NFS Client The three services are spread across three of the 5 nodes. Reading the man page for clurmtabd reveals that it is supposed to maintain the client states and "merge" rmtab entries etc to prevent stale filehandles etc. The clients are RHEl 4.6 using automounted nfs. The clients are requesting nfs ver 3, and tcp, with the hard and intr flags. THings seem to work fine for an initial failover, but when I try to failback, things hang I am planning on using this cluster to replace an aging alpha cluster running Tru64/TruCluster. So I guess my questions are.. 1. Is this a known issue? 2. Is there a document other than the nfscookbook from R.P. or at least a version thats been updated in the last year (if somethings changed that is) 3. How, when simply floating IP addresses as outlined in the cookbook, does the rmtab get managed since no NFS service is configured? Any help in understanding these issues would be appreciated. 
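For 3., a few commands that at least show what state the server and the clients think they are in while a service moves around (the floating address below is a placeholder):

  # from a client, against the floating IP of the NFS service
  showmount -e 192.168.1.10
  rpcinfo -p 192.168.1.10

  # on whichever node currently owns the service
  clustat
  exportfs -v
  cat /var/lib/nfs/rmtab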
-C From yamato at redhat.com Tue May 27 06:18:24 2008 From: yamato at redhat.com (Masatake YAMATO) Date: Tue, 27 May 2008 15:18:24 +0900 (JST) Subject: [Linux-cluster] [PATCH] checking NULL pointer in device_write of dlm-control Message-ID: <20080527.151824.252812822.yamato@redhat.com> Hi, (This list is good place to submit a patch? If not, please let me know where I should do.) I found a way to let linux dereference NULL pointer in gfs2-2.6-nmw/fs/dlm/user.c. If `device_write' method is called via "dlm-control", file->private_data is NULL. (See ctl_device_open() in user.c. ) Through proc->flags is read: if ((kbuf->cmd == DLM_USER_LOCK || kbuf->cmd == DLM_USER_UNLOCK) && test_bit(DLM_PROC_FLAGS_CLOSING, &proc->flags)) return -EINVAL; It causes following message on my Fedora 9 PC: BUG: unable to handle kernel NULL pointer dereference at 00000004 IP: [] :dlm:device_write+0xa5/0x432 *pde = 7f11b067 Oops: 0000 [#2] SMP Modules linked in: ... Pid: 26899, comm: a.out Tainted: G D (2.6.25-14.fc9.i686 #1) EIP: 0060:[] EFLAGS: 00210297 CPU: 1 EIP is at device_write+0xa5/0x432 [dlm] EAX: f66ad200 EBX: f66ad280 ECX: 00000000 EDX: 00000006 ESI: 00000064 EDI: 00000000 EBP: c8a45f70 ESP: c8a45f44 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process a.out (pid: 26899, ti=c8a45000 task=c8a72000 task.ti=c8a45000) Stack: bfe90b34 f66ad280 00000000 c8a45f58 c04cc41c c8a45f70 c0482bd5 00000001 def7a0c0 f8f5553a 00000064 c8a45f90 c04832bb c8a45f9c bfe90b34 c04817e7 def7a0c0 fffffff7 080483a0 c8a45fb0 c04833f8 c8a45f9c 00000000 00000000 Call Trace: [] ? security_file_permission+0xf/0x11 [] ? rw_verify_area+0x76/0x97 [] ? device_write+0x0/0x432 [dlm] [] ? vfs_write+0x8a/0x12e [] ? do_sys_open+0xab/0xb5 [] ? sys_write+0x3b/0x60 [] ? syscall_call+0x7/0xb [] ? acpi_pci_root_add+0x22f/0x2a0 ======================= Code: EIP: [] device_write+0xa5/0x432 [dlm] SS:ESP 0068:c8a45f44 ---[ end trace 74c3a9c3bd1a789d ]--- [yamato at localhost dlm-crash]$ Here is a patch. Signed-off-by: Masatake YAMATO diff --git a/fs/dlm/user.c b/fs/dlm/user.c index ebbcf38..1aa76b3 100644 --- a/fs/dlm/user.c +++ b/fs/dlm/user.c @@ -538,7 +538,7 @@ static ssize_t device_write(struct file *file, const char __user *buf, /* do we really need this? can a write happen after a close? */ if ((kbuf->cmd == DLM_USER_LOCK || kbuf->cmd == DLM_USER_UNLOCK) && - test_bit(DLM_PROC_FLAGS_CLOSING, &proc->flags)) + (proc && test_bit(DLM_PROC_FLAGS_CLOSING, &proc->flags))) return -EINVAL; sigfillset(&allsigs); From Alain.Moulle at bull.net Tue May 27 13:30:57 2008 From: Alain.Moulle at bull.net (Alain Moulle) Date: Tue, 27 May 2008 15:30:57 +0200 Subject: [Linux-cluster] CS5 / IP ressource with bonding ? Message-ID: <483C0D11.7070900@bull.net> Hi Is it possible to manage IP ressources linked to a bounded interface ? Or is there any known problem with that ? Thanks Regards Alain Moull? From nico at altiva.fr Tue May 27 13:55:36 2008 From: nico at altiva.fr (NM) Date: Tue, 27 May 2008 13:55:36 +0000 (UTC) Subject: [Linux-cluster] Re: Booting node 1 causes it to fence node 2 (NM) References: <20080523160008.98774619B25@hormel.redhat.com> <200805241031.36484.Klaus.Steinberger@physik.uni-muenchen.de> Message-ID: On Sat, 24 May 2008 10:31:33 +0200, Klaus Steinberger wrote: > Do you have the cluster Communication and the RAC card's on the same > subnet? There is some hidden hint in the docu that on a two node cluster > both cluster communication and fencing devices must be on the same > network. 
I had similar symptoms as long as I tried cluster comm on > fencing on different subnet in a two node cluster. I eventually solved this by upping the "post join delay" from 3s (default) to 20s (recommended in the doc). Why is the default value so low compared to the recommended value? This is weird. >> (btw should cman be started in init.d automatically? or should it be > It should be started automatically. Thanks. From teemu.m2 at luukku.com Tue May 27 17:05:43 2008 From: teemu.m2 at luukku.com (m.. mm..) Date: Tue, 27 May 2008 20:05:43 +0300 (EEST) Subject: [Linux-cluster] cluster name not in cluster.conf?? Message-ID: <1211907943851.teemu.m2.58786.h27Hz02gMOQpp4y6ciu3cw@luukku.com> Hi, How can I fix this problem: "cluster name not in cluster.conf"? I get this message when I start cman. I don't understand it, because I have the cluster name right in my cluster.conf. Does somebody have a fix for this? The problem came with the Red Hat 5.1 version... I think I have 2 nodes here. From rpeterso at redhat.com Tue May 27 18:41:38 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 27 May 2008 13:41:38 -0500 Subject: [Linux-cluster] [PATCH] checking NULL pointer in device_write of dlm-control In-Reply-To: <20080527.151824.252812822.yamato@redhat.com> References: <20080527.151824.252812822.yamato@redhat.com> Message-ID: <1211913698.10437.137.camel@technetium.msp.redhat.com> On Tue, 2008-05-27 at 15:18 +0900, Masatake YAMATO wrote: > Hi, > > (This list is good place to submit a patch? > If not, please let me know where I should do.) Hi Mr. Masatake, The proper place to submit patches to the cluster code is the public cluster-devel mailing list. Please see: https://www.redhat.com/mailman/listinfo/cluster-devel Although a lot of us read both mailing lists. Regards, Bob Peterson Red Hat Clustering & GFS From rpeterso at redhat.com Tue May 27 18:43:32 2008 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 27 May 2008 13:43:32 -0500 Subject: [Linux-cluster] cluster name not in cluster.conf?? In-Reply-To: <1211907943851.teemu.m2.58786.h27Hz02gMOQpp4y6ciu3cw@luukku.com> References: <1211907943851.teemu.m2.58786.h27Hz02gMOQpp4y6ciu3cw@luukku.com> Message-ID: <1211913812.10437.138.camel@technetium.msp.redhat.com> Hi, On Tue, 2008-05-27 at 20:05 +0300, m.. mm.. wrote: > Hi, > > How can I fix this problem: "cluster name not in cluster.conf"? > > I get this message when I start cman. I don't understand it, because I have the cluster name right in my cluster.conf. > > Does somebody have a fix for this? The problem came with the Red Hat 5.1 version... I think I have 2 nodes here. Please post your cluster.conf file here (removing any passwords first, of course). Regards, Bob Peterson Red Hat Clustering & GFS From d.degroot at griffith.edu.au Wed May 28 01:57:06 2008 From: d.degroot at griffith.edu.au (Darrin De Groot) Date: Wed, 28 May 2008 12:57:06 +1100 Subject: [Linux-cluster] multipathed quorum disk Message-ID: Hi, I am running a 4 node cluster with a multipathed quorum disk, configured to use the path /dev/dm-1. The problem that I am having is that if I lose one path to the disk (am testing by pulling one fibre), the node is almost always fenced (one node, once, managed to stay up, out of more than 10 attempts). Is there some timeout that needs changing to give qdiskd the time to realise that a path is down?
I have tried an interval of 3 seconds with at TKO of 10, with no success, and a token timeout set at 45000ms: output of mkqdisk -L: [root at host3 ~]# mkqdisk -L mkqdisk v0.5.1 /dev/sdc1: Magic: eb7a62c2 Label: cms_qdisk Created: Mon May 26 14:24:29 2008 Host: host3 /dev/sdd1: Magic: eb7a62c2 Label: cms_qdisk Created: Mon May 26 14:24:29 2008 Host: host3 /dev/dm-1: Magic: eb7a62c2 Label: cms_qdisk Created: Mon May 26 14:24:29 2008 Host: host3 When the node subsequently boots, with only one path, everything works just fine, so it can obviously use both paths. Is anyone able to offer any advice on why this is happening (and how to stop it)? Regards, Darrin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lhh at redhat.com Wed May 28 15:09:28 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 28 May 2008 11:09:28 -0400 Subject: [Linux-cluster] CS5 still problem "Node x is undead" (contd.) In-Reply-To: <483A75C7.60003@bull.net> References: <483A75C7.60003@bull.net> Message-ID: <1211987368.3174.178.camel@ayanami.boston.devel.redhat.com> On Mon, 2008-05-26 at 10:33 +0200, Alain Moulle wrote: > Hi > > As told before, the patch : > http://sources.redhat.com/git/?p=cluster.git;a=commit;h=b2686ffe984c517110b949d604c54a71800b67c9 > does not solve the problem for my configuration ... > > Just an idea/question : could this problem be also linked > to the defaut value of token ? Or has it nothing to do with it ? > Because currently, I have this problem with a Quorum disk > configured and no token record in cluster.conf, so token > is at its default value ... It could be - try setting it to 21000: (you can put the tag right below the tag). -- Lon From lhh at redhat.com Wed May 28 15:12:44 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 28 May 2008 11:12:44 -0400 Subject: [Linux-cluster] CS5 / IP ressource with bonding ? In-Reply-To: <483C0D11.7070900@bull.net> References: <483C0D11.7070900@bull.net> Message-ID: <1211987564.3174.183.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-27 at 15:30 +0200, Alain Moulle wrote: > Hi > > Is it possible to manage IP ressources linked to a bounded interface ? > > Or is there any known problem with that ? What do you mean - manage the only IP on an interface? Currently, no - you need to have an interface with an IP in the same subnet mask bound to it in order for the IP resource agent to select the appropriate interface. There was a patch floating around some time ago on the mailing list which allowed the specification of something like: ethernet_device="eth0" However, the IP resource agent does not perform routing tasks, and it's really not its job to do so, which is why the patch was rejected. -- Lon From lhh at redhat.com Wed May 28 15:45:40 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 28 May 2008 11:45:40 -0400 Subject: [Linux-cluster] multipathed quorum disk In-Reply-To: References: Message-ID: <1211989540.3174.216.camel@ayanami.boston.devel.redhat.com> On Wed, 2008-05-28 at 12:57 +1100, Darrin De Groot wrote: > > Hi, > > I am running a 4 node cluster with a multipathed quorum disk, > configured to use the path /dev/dm-1. The problem that I am having is > that if I lose one path to the disk (am testing by pulling one fibre), > the node is almost always fenced (one node, once, managed to stay up, > out of more than 10 attempts). Is there some timeout that needs > changing to give qdiskd the time to realise that a path is down? 
I > have tried an interval of 3 seconds with at TKO of 10, with no > success, and a token timeout set at 45000ms: > > token_retransmits_before_loss_const="20"/> > tko="10" votes="3"/> > As a general rule, you want qdiskd's timeout to exceed the path failover time with some time for the I/Os to get out after a path failover completes. As a general rule of thumb, totem's token timeout needs to approximately double the qdisk timeout. E.g.: [Note: Obviously, I think qdiskd should algorithmically determine fairly optimial timings based on the totem token timeout in the future. ] -- Lon From barbos at gmail.com Wed May 28 21:18:39 2008 From: barbos at gmail.com (Alex Kompel) Date: Wed, 28 May 2008 14:18:39 -0700 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <4834D68B.9010309@auckland.ac.nz> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> Message-ID: <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> On Wed, May 21, 2008 at 7:12 PM, Michael O'Sullivan wrote: > Hi Alex, > > We wanted an iSCSI SAN that has highly available data, hence the need for 2 > (or more storage devices) and a reliable storage network (omitted from the > diagram). Many of the articles I have read for iSCSI don't address > multipathing to the iSCSI devices, in our configuration iSCSI Disk 1 > presented as /dev/sdc and /dev/sdd on each server (and iSCSI Disk 2 > presented as /dev/sde and /dev/sdf), but it wan't clear how to let the > servers know that the two iSCSI portals attached to the same target - thus I > used mdadm. Also, I wanted to raid the iSCSI disks to make sure the data > stays highly available - thus the second use of mdadm. Now we had a single > iSCSI raid array spread over 2 (or more) devices which provides the iSCSI > SAN. However, I wanted to make sure the servers did not try to access the > same data simultaneously, so I used GFS to ensure correct use of the iSCSI > SAN. If I understand correctly it seems like the multipathing and raiding > may be possible in Red Hat Cluster Suite GFS without using iSCSI? Or to use > iSCSI with some other software to ensure proper locking happens for the > iSCSI raid array? I am reading the link you suggested to see what other > people have done, but as always any suggestions, etc are more than welcome. > I would not use multipath I/O with iSCSI unless you have specific reasons for doing so. iSCSI is only as highly-available as you network infrastructure allows it to be. If you have a full failover within the network then you don't need multipath. That simplifies configuration a lot. Provided your network core is fully redundant (both link and routing layers), you can connect 2 NICs on each server to separate switches and bond them (google for "channel bonding"). Once you have redundant network connection you can use the setup from the article I posted earlier. This will give you iSCSI endpoint failover. -Alex From lhh at redhat.com Wed May 28 21:35:20 2008 From: lhh at redhat.com (Lon Hohberger) Date: Wed, 28 May 2008 17:35:20 -0400 Subject: [Linux-cluster] Cluster NFS operation... In-Reply-To: <7d6e8da40805261656q4d0113fdhc5cbea3b8aabb512@mail.gmail.com> References: <7d6e8da40805261656q4d0113fdhc5cbea3b8aabb512@mail.gmail.com> Message-ID: <1212010520.3174.233.camel@ayanami.boston.devel.redhat.com> On Tue, 2008-05-27 at 00:56 +0100, Corey Kovacs wrote: > service name > IP Address > GFS FS > NFS Export > NFS Client > > The three services are spread across three of the 5 nodes. 
Reading the > man page for clurmtabd reveals that it is supposed to > maintain the client states and "merge" rmtab entries etc to prevent > stale filehandles etc. > > The clients are RHEl 4.6 using automounted nfs. The clients are > requesting nfs ver 3, and tcp, with the hard and intr flags. > THings seem to work fine for an initial failover, but when I try to > failback, things hang * On 2.6 kernels including RHEL4, clurmtabd isn't used. * TCP takes 0-15 minutes to fail over or failback depending on the I/O pattern: https://bugzilla.redhat.com/show_bug.cgi?id=369991 This doesn't happen with UDP. > 2. Is there a document other than the nfscookbook from R.P. or at > least a version thats been updated in the last year (if somethings > changed that is) The general procedures haven't changed. -- Lon From ross at kallisti.us Wed May 28 21:37:13 2008 From: ross at kallisti.us (Ross Vandegrift) Date: Wed, 28 May 2008 17:37:13 -0400 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> Message-ID: <20080528213713.GB18367@kallisti.us> On Wed, May 28, 2008 at 02:18:39PM -0700, Alex Kompel wrote: > I would not use multipath I/O with iSCSI unless you have specific > reasons for doing so. iSCSI is only as highly-available as you network > infrastructure allows it to be. If you have a full failover within the > network then you don't need multipath. That simplifies configuration a > lot. Provided your network core is fully redundant (both link and > routing layers), you can connect 2 NICs on each server to separate > switches and bond them (google for "channel bonding"). Once you have > redundant network connection you can use the setup from the article I > posted earlier. This will give you iSCSI endpoint failover. This depends on a lot of things. In all of the iSCSI storage systems I'm familiar with, the same target is provided redundantly via different portal IPs. This provides failover in the case of an iscsi controller failing on the storage system. The network can be as redundant as you like, but without multipath, you won't survive a portal failure. If you bond between two different switches, you'll only be able to do failover between the NICs. If you use multipath, you can round-robin between them to provide a greater bandwidth overhead. I'd suggest using multipath. Check the open-iscsi documentation and mailing list archives for tips on tuning the timing for those pieces. -- Ross Vandegrift ross at kallisti.us "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. 
Augustine, De Genesi ad Litteram, Book II, xviii, 37 From barbos at gmail.com Wed May 28 23:16:55 2008 From: barbos at gmail.com (Alex Kompel) Date: Wed, 28 May 2008 16:16:55 -0700 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <20080528213713.GB18367@kallisti.us> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> <20080528213713.GB18367@kallisti.us> Message-ID: <3ae027040805281616l1d9856f3o1f06ac2f7de51e6f@mail.gmail.com> On Wed, May 28, 2008 at 2:37 PM, Ross Vandegrift wrote: > On Wed, May 28, 2008 at 02:18:39PM -0700, Alex Kompel wrote: >> I would not use multipath I/O with iSCSI unless you have specific >> reasons for doing so. iSCSI is only as highly-available as you network >> infrastructure allows it to be. If you have a full failover within the >> network then you don't need multipath. That simplifies configuration a >> lot. Provided your network core is fully redundant (both link and >> routing layers), you can connect 2 NICs on each server to separate >> switches and bond them (google for "channel bonding"). Once you have >> redundant network connection you can use the setup from the article I >> posted earlier. This will give you iSCSI endpoint failover. > > This depends on a lot of things. In all of the iSCSI storage systems > I'm familiar with, the same target is provided redundantly via > different portal IPs. This provides failover in the case of an iscsi > controller failing on the storage system. The network can be as > redundant as you like, but without multipath, you won't survive a > portal failure. In this case the portal failure is handled by host failover mechanisms (heartbeat, RedHat cluster, etc) and connection failure is handled by the network layer. Sometimes you have to use multipath (for example, if there is no way to do transparent failover on storage controllers) but it adds extra complexity on the initiator side so if there is a way to avoid it why not do it? > If you bond between two different switches, you'll only be able to do > failover between the NICs. If you use multipath, you can round-robin > between them to provide a greater bandwidth overhead. Same goes for bonding: link aggregation with active-active bonding. -Alex From fdinitto at redhat.com Thu May 29 04:19:06 2008 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Thu, 29 May 2008 06:19:06 +0200 (CEST) Subject: [Linux-cluster] Cluster 2.03.03 released Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The cluster team and its vibrant community are proud to announce the 5th release from the STABLE2 branch: 2.03.03. The STABLE2 branch collects, on a daily base, all bug fixes and the bare minimal changes required to run the cluster on top of the most recent Linux kernel (2.6.25) and rock solid openais (0.80.3 or higher). The new source tarball can be downloaded here: ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.03.tar.gz In order to use GFS1, the Linux kernel requires a minimal patch: ftp://sources.redhat.com/pub/cluster/releases/lockproto-exports.patch To report bugs or issues: https://bugzilla.redhat.com/ Would you like to meet the cluster team or members of its community? Join us on IRC (irc.freenode.net #linux-cluster) and share your experience with other sysadministrators or power users. 
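For anyone building the release by hand, the usual sequence is roughly the following. This is only a sketch: configure options (for example the new --without_kernel_modules switch mentioned in the changelog) are listed by ./configure --help, and building the kernel pieces needs a matching 2.6.25 source tree.

  wget ftp://sources.redhat.com/pub/cluster/releases/cluster-2.03.03.tar.gz
  tar xzf cluster-2.03.03.tar.gz
  cd cluster-2.03.03
  ./configure
  make
  make install
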
Happy clustering, Fabio Under the hood (from 2.03.02): Bob Peterson (1): bz 446085: Back-port faster bitfit algorithm from gfs2 for better David Teigland (1): gfs_controld: ignore write(2) return value on plock dev Fabio M. Di Nitto (16): [RGMANAGER] ^M's are good for DOS, bad for UNIX [FENCE] Rename bladecenter as it should be .pl -> .py [CCS] Make a bunch of functions static [BUILD] Stop using DEVEL.DATE library soname [BUILD] Set soname to 2.3 [BUILD] Move fencelib in /usr/share [BUILD] Allow users to set path to init.d [BUILD] Add --without_kernel_modules configure option [GFS] Sync with gfs2 init script [MISC] Cast some love to init scripts [CMAN] Fix path to cman_tool [INIT] Do not start services automatically [MISC] Update copyright [BUILD] Fix sparc #ifdef according to the new gcc tables [BUILD] Fix rg_test linking [BUILD] Fix install permissions Jonathan Brassow (1): ger/lvm.sh: HA LVM wasn't working on IA64 Lon Hohberger (2): [cman] Fix infinite loop in several daemons [rgmanager] Fix #441582 - symlinks in mount points causing failures Marc - A. Dahlhaus (1): [MISC] Add version string to -V options of dlm_tool and group deamons Marek 'marx' Grac (8): [FENCE] SSH support using stdin options [FENCE] Fix #435154: Support for 24 port APC fencing device [FENCE] Fix name of the option in fencing library [FENCE] Fix problem with different menu for admin/user for APC [FENCE] Fix typo in name of the exceptions in fencing agents [FENCE] Fix #248609: SSH support in Bladecenter fencing (ssh) [FENCE] Fix #446995: Parse error: Unknown option 'switch=3' [FENCE] Fix #447378 - fence_apc unable to connect via ssh to APC 7900 ccs/lib/libccs.c | 6 +- cman/init.d/cman.in | 27 +- cman/init.d/qdiskd | 19 +- cman/qdisk/disk_util.c | 2 +- configure | 35 +- dlm/tool/main.c | 4 +- fence/agents/apc/fence_apc.py | 112 ++- fence/agents/bladecenter/fence_bladecenter.pl | 90 -- fence/agents/bladecenter/fence_bladecenter.py | 90 ++ fence/agents/drac/fence_drac5.py | 4 +- fence/agents/ilo/fence_ilo.py | 4 +- fence/agents/lib/fencing.py.py | 55 +- fence/agents/scsi/scsi_reserve | 28 +- fence/agents/wti/fence_wti.py | 4 +- gfs-kernel/src/gfs/bits.c | 85 +- gfs-kernel/src/gfs/bits.h | 3 +- gfs-kernel/src/gfs/rgrp.c | 3 +- gfs/init.d/gfs | 15 +- gfs2/init.d/gfs2 | 15 +- group/daemon/main.c | 3 +- group/dlm_controld/main.c | 5 +- group/gfs_controld/main.c | 5 +- group/gfs_controld/plock.c | 6 +- group/test/clientd.c | 2 +- group/tool/main.c | 4 +- make/defines.mk.input | 5 +- make/install.mk | 6 +- make/official_release_version | 1 + make/uninstall.mk | 2 +- rgmanager/include/platform.h | 2 +- rgmanager/init.d/rgmanager.in | 15 +- rgmanager/src/clulib/vft.c | 2 +- rgmanager/src/daemons/Makefile | 2 +- rgmanager/src/resources/ASEHAagent.sh | 1786 ++++++++++++------------ rgmanager/src/resources/Makefile | 17 +- rgmanager/src/resources/clusterfs.sh | 2 +- rgmanager/src/resources/fs.sh | 2 +- rgmanager/src/resources/lvm.sh | 2 +- rgmanager/src/resources/netfs.sh | 2 +- scripts/fenceparse | 2 +- 40 files changed, 1359 insertions(+), 1115 deletions(-) - -- I'm going to make him an offer he can't refuse. 
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) iQIVAwUBSD4u1wgUGcMLQ3qJAQLGgA//QZ0cHFlKMmBKiREVS/7XYfTM1CSMXGeq g5rVXbh8lqobDgfqHSs10Q8BwxkP6XCPodYv3z5ws6uKvnGGhV+8ceDhTdxJUYBE 16VMLGC1pHT/cRHiYeukAfCvt3fXXV6Q114OGZYJSYGCMfXpjPBXMyqi4xTbDoUn wYB4vUTjx++j7WsaW9uKVT5ORRmj+Xg6ubbCDchjZitiAp8Fwfx1Lz5RlRm7XhSo 7z/3GzMIN2oPz1g15aGqLq6/SYBmM4iAX9KzC1xTslyxcw5/2+5UlFGF5JcXXidd QiPoRv1hDbk8xwoFBSgMmkUERldO3RSTQTBhN2SEBeaAP7E9hDgh3a6ZMq74jvab sWZ9LUBDS8rDNDjB0ak+BNUZy1loRQWj57ASL+jMXADy3QtL9vWNxyhsZFLRMpJ7 aUuzJWA3mFR1MyqOS/Zxy1Dea6A6LQETafwWMnbAk6M+h5SbCOfjl2Ti/7bvlG9E pthE8F0LygzorLnp+68jerYjSqKMwWjbM4etsOo/iV66utc27Udmwbf0VX5nBo8N ZnxoNDF/VtXtrccljEBnHdnhpru+wsUPLL6B+3nx7Gv6Ats3axOmuOYlQpW/vJpG FGCwXcP4+qLVmz4v7Hi2qshDLsbHUSXNtH2Mlzl9EMhk3OW1U4g4Vk2Tml2MUCp1 9ad09m2XsSc= =APS2 -----END PGP SIGNATURE----- From sunhux at gmail.com Thu May 29 05:43:58 2008 From: sunhux at gmail.com (sunhux G) Date: Thu, 29 May 2008 13:43:58 +0800 Subject: [Linux-cluster] what's IBM "Remote Supervisor Adapter II", "power fencing device" & clustering Message-ID: <60f08e700805282243y3c15e8f4l3211376babdec865@mail.gmail.com> Hi, What's the purpose of "Remote Supervisor Adapter II" & can it be used to configure a Redhat cluster? What's "power fencing Device"? Can we set up a cluster between 2 RHES (ver 5.1 AP) just by using one additional network port on each server or do we need 2 network ports per server? Thought heartbeat link requires one only or usually the general practice is to use 2? Without getting additional network port, can we just RSA II port? Any specific information for setting up Linux clustering for IBM hardware (x3850 M2 & x3950 M2) is appreciated Tks U -------------- next part -------------- An HTML attachment was scrubbed... URL: From denisb+gmane at gmail.com Thu May 29 07:18:28 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 29 May 2008 09:18:28 +0200 Subject: [Linux-cluster] Updating issues Message-ID: While doing a yum update on one RHCS node I got this: Transaction Check Error: file /etc/depmod.d/gfs2.conf from install of kmod-gfs2-1.92-1.1.el5 conflicts with file from package kmod-gfs2-1.52-1.16.el5 This node does not use GFS2 (and I already unmounted any GFS1 volumes anyway), so I removed the package, and the update transaction test then passes without errors. The kmod-gfs2 package should probably have been removed in the transaction too? Is this something I should report via bugzilla? Regards -- Denis From denisb+gmane at gmail.com Thu May 29 08:20:18 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 29 May 2008 10:20:18 +0200 Subject: [Linux-cluster] LVM hanging during mkinitrd checks Message-ID: During the update to RHEL5.2 I had another problem. The mkinitrd process would hang indefinitely while scanning my block devices with "lvm.static lvs --ignorelockingfailure --noheadings -o vg_name blockdev" stracing the lvs process showed no signs of life. It hung on random function calls (I checked several hung lvs processes during the update). I simply killed the lvs processes when they hung. A manual check of lvs did hang and never returned output so the issue was system wide, not specific to the update run. Could this be because I unmounted my shared GFS volume prior to the update? I cannot really see why that should be problematic, but lvs on the other (non upgraded node) worked fine. After finishing the update, lvs works fine and returns the expected : " No volume groups found" What is the best practise for updates of GFS components? 
Should I keep my volumes mounted during updates or unmount them? Regards -- Denis From denisb+gmane at gmail.com Thu May 29 08:25:02 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 29 May 2008 10:25:02 +0200 Subject: [Linux-cluster] cman_tool returns Flags: Dirty Message-ID: Hi list, Sorry for the noise, but I thought posting my results with the update to RHEL5.2 would be of interest to a few of you. The node that has been updated to RHEL5.2 seems to operate very nicely in the cluster so far. After a boot it did rejoin the cluster, and got back its affinity assigned service. Quorum disk operations resumed after a bit too. The only warning sign I can see is this output from cman_tool status : # cman_tool status Version: 6.1.0 Config Version: 39 Cluster Name: cluster_clustername Cluster Id: 19444 Cluster Member: Yes Cluster Generation: 776 Membership state: Cluster-Member Nodes: 2 Expected votes: 3 Total votes: 3 Quorum: 2 Active subsystems: 10 Flags: Dirty Ports Bound: 0 11 177 Node name: nodename.customername.com Node ID: 1 What does "Flags: Dirty" mean? Is it anything to worry about? Google was unhelpful. Regards -- Denis From magawake at gmail.com Thu May 29 10:41:31 2008 From: magawake at gmail.com (Mag Gam) Date: Thu, 29 May 2008 06:41:31 -0400 Subject: [Linux-cluster] GFS implementation Message-ID: <1cbd6f830805290341s20e2cccdp1e1befdb3c568dc5@mail.gmail.com> Hello: I am planning to implement GFS for my university as a summer project. I have 10 servers each with SAN disks attached. I will be reading and writing many files for professor's research projects. Each file can be anywhere from 1k to 120GB (fluid dynamic research images). The 10 servers will be using NIC bonding (1GB/network). So, would GFS be ideal for this? I have been reading a lot about it and it seems like a perfect solution. Any thoughts? TIA -------------- next part -------------- An HTML attachment was scrubbed... URL: From maciej.bogucki at artegence.com Thu May 29 10:55:29 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Thu, 29 May 2008 12:55:29 +0200 Subject: [Linux-cluster] GFS implementation In-Reply-To: <1cbd6f830805290341s20e2cccdp1e1befdb3c568dc5@mail.gmail.com> References: <1cbd6f830805290341s20e2cccdp1e1befdb3c568dc5@mail.gmail.com> Message-ID: <483E8BA1.9080608@artegence.com> Mag Gam wrote: > Hello: > > I am planning to implement GFS for my university as a summer project. > I have 10 servers each with SAN disks attached. I will be reading and > writing many files for professor's research projects. Each file can be > anywhere from 1k to 120GB (fluid dynamic research images). The 10 > servers will be using NIC bonding (1GB/network). So, would GFS be > ideal for this? I have been reading a lot about it and it seems like a > perfect solution. > > Any thoughts? Please remember about fencing. Best Regards Maciej Bogucki From denisb+gmane at gmail.com Thu May 29 11:13:21 2008 From: denisb+gmane at gmail.com (denis) Date: Thu, 29 May 2008 13:13:21 +0200 Subject: [Linux-cluster] Re: cman_tool returns Flags: Dirty In-Reply-To: References: Message-ID: denis wrote: > What does "Flags: Dirty" mean? Is it anything to worry about? > Google was unhelpful. Google was sort of helpful after all; http://www.redhat.com/archives/cluster-devel/2007-September/msg00091.html NODE_FLAGS_DIRTY - This node has internal state and must not join a cluster that also has state. What does this actually imply? Anything to care about? How would this node "recover" from being dirty? 
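For reference, the cluster services whose state the flag reflects can be listed directly on the node in question; a minimal check (commands only, output omitted, names differ per cluster):

  cman_tool status | grep Flags
  cman_tool services
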
Regards -- Denis From deka.lipika at gmail.com Thu May 29 14:35:17 2008 From: deka.lipika at gmail.com (Lipika Deka) Date: Thu, 29 May 2008 15:35:17 +0100 Subject: [Linux-cluster] Intra file sharing in GFS Message-ID: <43097e740805290735o78802497tbb2a3a38ead14680@mail.gmail.com> Hi, Does DLM take care of intra file sharing in GFS and if so how? Thanks, Cheers -------------- next part -------------- An HTML attachment was scrubbed... URL: From kees at tweakers.net Thu May 29 14:37:02 2008 From: kees at tweakers.net (Kees Hoekzema) Date: Thu, 29 May 2008 16:37:02 +0200 Subject: [Linux-cluster] GFS in a small cluster Message-ID: <001701c8c199$757b48b0$6071da10$@net> Hello List, Recently we have been looking at replacing our NFS server with a SAN in our (relatively small) webserver cluster. We decided to go with the Dell MD3000i, an iSCSI SAN. Right now I have it for testing purposes and I'm trying to set up a simple cluster to get more experience with it. At the moment we do not run Redhat, but Debian; so although this is probably the wrong mailing list for me, I could not find any other place where problems like this are discussed. The cluster, if it goes into production, will have to serve 'dynamic' files to the webservers, these include images, videos and generic downloads. So what will happen on the SAN is many reads, and relatively very few writes, at the moment the read-write proportions on the NFS server are around 99% reads vs 1% writes, the only writes that occur are users uploading a new image, or one server creating some graphs. Not only the webservers will use this SAN, but also the database servers will use it to read some files from it. I have been looking at different filesystems to run on this SAN the suit my needs, and GFS is one of those, but I have a few problems and questions. - Is locking really needed? There is no chance one webserver will try to write to a file that is being written to by another file. - How about fencing? I'd rather have a corrupt filesystem than a corrupt database, how silly that may sound, but I do not want the webservers be able to switch off the (infinite more important) database servers, and all servers can easily work without any problem without the share, they will still serve most of the content, just not the user-uploaded images / videos / downloads. Is GFS the right FS for me or do I need to look to other (cluster aware) filesystems? >From the FAQ: http://sources.redhat.com/cluster/wiki/FAQ/GFS#gfs_whatgood What I really need is a filesystem that is cluster-aware, aka that it knows and reacts to the fact that other systems than himself are able to write to the disk, and as said, ext3 does not know that; mount it on both systems and they do see the original data, but as soon as one changes something the other won't pick it up. Anyway, I tried gfs with the lock_nolock protocol, but I might as well use ext3 than. With any other protocol, the mount will just hang with: Trying to join cluster "lock_dlm", "tweakers:webdata" dlm: Using TCP for communications dlm: connect from non cluster node BUG: soft lockup - CPU#2 stuck for 11s! [dlm_send:3566] Pid: 3566, comm: dlm_send Not tainted (2.6.24-1-686 #1) EIP: 0060:[] EFLAGS: 00000202 CPU: 2 EIP is at _spin_unlock_irqrestore+0xa/0x13 The other FS we looked at was OCFS2, but although it is a lot easier to set up, and it works without any problems, it does have a limit of 32k directories in one directory, something which we easily surpass on our current shares (over 50k directories in one dir). 
Anyway, is there a method to have gfs mounted without locking, but still be cluster-aware (aka; the fs can be updated by other servers) and without fencing? -kees From lp at xbe.ch Thu May 29 15:22:30 2008 From: lp at xbe.ch (Lorenz Pfiffner) Date: Thu, 29 May 2008 17:22:30 +0200 Subject: [Linux-cluster] apache resource problem in RHCS 5.1 Message-ID: <483ECA36.7070007@xbe.ch> Hello everybody I have the following test setup: - RHEL 5.1 Cluster Suite with rgmanager-2.0.31-1 and cman-2.0.73-1 - Two VMware machines on an ESX 3.5 U1, so no fence device (it's only a test) - 4 IP resources defined - GFS over DRBD, doesn't matter, because it doesn't even work on a local disk Now I would like to have an "Apache Resource" which i can select in the luci interface. I assume it's using the /usr/share/cluster/apache.sh script. If I try to start it, the error message looks like this: May 28 16:18:15 testsrv clurgmgrd: [18475]: Starting Service apache:test_httpd > Failed May 28 16:18:15 testsrv clurgmgrd[18475]: start on apache "test_httpd" returned 1 (generic error) May 28 16:18:15 testsrv clurgmgrd[18475]: #68: Failed to start service:test_proxy_http; return value: 1 May 28 16:18:15 testsrv clurgmgrd[18475]: Stopping service service:test_proxy_http May 28 16:18:16 testsrv clurgmgrd: [18475]: Checking Existence Of File /var/run/cluster/apache/apache:test_httpd.pid [apache:test_httpd] > Failed - File Doesn't Exist May 28 16:18:16 testsrv clurgmgrd: [18475]: Stopping Service apache:test_httpd > Failed May 28 16:18:16 testsrv clurgmgrd[18475]: stop on apache "test_httpd" returned 1 (generic error) May 28 16:18:16 testsrv clurgmgrd[18475]: #71: Relocating failed service service:test_proxy_http I've another cluster in which I had to alter the default init.d/httpd script to be able to run multiple apache instances (not vhosts) on one server. But there I have the Apache Service configured with a "Script Resource". Is this supposed to work of is it a feature in development? I don't see something like "Apache Resource" in the current documentation. Kind Regards Lorenz From sghosh at redhat.com Thu May 29 15:42:22 2008 From: sghosh at redhat.com (Subhendu Ghosh) Date: Thu, 29 May 2008 11:42:22 -0400 Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID In-Reply-To: <3ae027040805281616l1d9856f3o1f06ac2f7de51e6f@mail.gmail.com> References: <20080521160013.8C3D6619BED@hormel.redhat.com> <4834D68B.9010309@auckland.ac.nz> <3ae027040805281418u7bf389f6q417c5cdaa029b618@mail.gmail.com> <20080528213713.GB18367@kallisti.us> <3ae027040805281616l1d9856f3o1f06ac2f7de51e6f@mail.gmail.com> Message-ID: <483ECEDE.30208@redhat.com> >> If you bond between two different switches, you'll only be able to do >> failover between the NICs. If you use multipath, you can round-robin >> between them to provide a greater bandwidth overhead. > > Same goes for bonding: link aggregation with active-active bonding. > active-active bonding across two network switches is just bad. Spanning Tree does not like it. 
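To make that concrete: when the two NICs are cabled to two different switches, the commonly used safe choice is active-backup bonding rather than any active-active/802.3ad variant. A RHEL-style sketch, with interface names and addresses invented:

  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=active-backup miimon=100

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  ONBOOT=yes
  BOOTPROTO=none
  IPADDR=192.168.20.11
  NETMASK=255.255.255.0

  # /etc/sysconfig/network-scripts/ifcfg-eth0  (eth1 is analogous)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none
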
-Subhendu From jerlyon at gmail.com Thu May 29 18:36:40 2008 From: jerlyon at gmail.com (Jeremy Lyon) Date: Thu, 29 May 2008 12:36:40 -0600 Subject: [Linux-cluster] Cluster starts, but a node won't rejoin after reboot In-Reply-To: <3DDA6E3E456E144DA3BB0A62A7F7F779020C6285@SKYHQAMX08.klasi.is> References: <779919740805221003k5b799927qfc0c11f65e1bf340@mail.gmail.com> <3DDA6E3E456E144DA3BB0A62A7F7F779020C6285@SKYHQAMX08.klasi.is> Message-ID: <779919740805291136i166b37ado2d2d4b21112cbbfe@mail.gmail.com> > I'm having the exact same issue on a RHEL 5.2 system, and have a open > support case with Redhat. When it will be resolved i can post the details > .... > Any word on this? I think I may get my own case going. Do you know if a bugzilla got assigned to this? Thanks! Jeremy -------------- next part -------------- An HTML attachment was scrubbed... URL: From ccaulfie at redhat.com Fri May 30 07:29:51 2008 From: ccaulfie at redhat.com (Christine Caulfield) Date: Fri, 30 May 2008 08:29:51 +0100 Subject: [Linux-cluster] Re: cman_tool returns Flags: Dirty In-Reply-To: References: Message-ID: <483FACEF.2080509@redhat.com> denis wrote: > denis wrote: >> What does "Flags: Dirty" mean? Is it anything to worry about? >> Google was unhelpful. > > Google was sort of helpful after all; > > http://www.redhat.com/archives/cluster-devel/2007-September/msg00091.html > > NODE_FLAGS_DIRTY - This node has internal state and must not join > a cluster that also has state. > > > What does this actually imply? Anything to care about? How would this > node "recover" from being dirty? > It's a perfectly normal state. in fact it's expected if you are running services. It simply means that the cluster has some services running that have state of their own that cannot be recovered without a full restart. I would be more worried if you did NOT see this in cman_tool status. It's NOT a warning. don't worry about it :) -- Chrissie From maciej.bogucki at artegence.com Fri May 30 07:38:16 2008 From: maciej.bogucki at artegence.com (Maciej Bogucki) Date: Fri, 30 May 2008 09:38:16 +0200 Subject: [Linux-cluster] GFS in a small cluster Message-ID: <483FAEE8.9050706@artegence.com> > The cluster, if it goes into production, will have to serve 'dynamic' files > to the webservers, these include images, videos and generic downloads. So > what will happen on the SAN is many reads, and relatively very few writes, > at the moment the read-write proportions on the NFS server are around 99% > reads vs 1% writes, the only writes that occur are users uploading a new > image, or one server creating some graphs. No problem to GFS. > Not only the webservers will use this SAN, but also the database servers > will use it to read some files from it. I have been looking at different > filesystems to run on this SAN the suit my needs, and GFS is one of those, > but I have a few problems and questions. Create two LUN on the array, one for database and the second for static files with two GFS fs on the top of it. > - Is locking really needed? There is no chance one webserver will try to write to a file that is being written to by another file. Yes, you need locking, if You have more than one serwer in the cluster. > - How about fencing? 
I'd rather have a corrupt filesystem than a corrupt > database, how silly that may sound, but I do not want the webservers be able > to switch off the (infinite more important) database servers, and all > servers can easily work without any problem without the share, they will > still serve most of the content, just not the user-uploaded images / videos >/ downloads. Configure one ore more fencing method for the cluster and sleep well ;) > Is GFS the right FS for me or do I need to look to other (cluster aware) > filesystems? Yes, but when You properly configure it(fe. configure/test fencing). > The other FS we looked at was OCFS2, but although it is a lot easier to set > up, and it works without any problems, it does have a limit of 32k > directories in one directory, something which we easily surpass on our > current shares (over 50k directories in one dir). OCFS2 is similar to GFS, and it is for Oracle RAC environment. I suggest to use GFS, because it is more popular than OCFS2. > Anyway, is there a method to have gfs mounted without locking, > but still be > cluster-aware (aka; the fs can be updated by other servers) and without > fencing? Yes, but only on one node. Manual fencing is needed for production environment! Best Regards Maciej Bogucki From michael.osullivan at auckland.ac.nz Fri May 30 09:16:23 2008 From: michael.osullivan at auckland.ac.nz (michael.osullivan at auckland.ac.nz) Date: Fri, 30 May 2008 21:16:23 +1200 (NZST) Subject: [Linux-cluster] GFS, iSCSI, multipaths and RAID Message-ID: <50463.222.152.69.120.1212138983.squirrel@mail.esc.auckland.ac.nz> Hi everyone, We chose not to bond the NICs because we'd heard this does not scale the bandwidth linearly. To keep performance of the network high we wanted to allow the load to be spread across multiple links and multipath seemed the best way. The iSCSI setup suggested by the article http://www.pcpro.co.uk/realworld/82284/san-on-the-cheap/page1.html uses one storage device as the primary storage and the second one as the secondary storage. The iSCSI target is presented via the first device and will failover to the second device. This allows for failure of either of the devices, but does not allow the storage load to be shared amongst the devices. By having the setup as described in http://www.ndsg.net.nz/ndsg_cluster.jpg/view (or http://www.ndsg.net.nz/ndsg_cluster.jpg/image_view_fullscreen for the fullscreen view) with multipath we provide two distinct paths between each server and each storage device, both of which can be used to send/receive data. By creating a RAID-5 array out of the iSCSI disks I hope I have allowed both of them to share the storage load. Our setup is intended to provide diverse protection for the storage system via: 1) RAID for the storage devices; 2) multipathing over the network - we've had dm multipath recommended instead of mdadm - any comments?; 3) a cluster for the servers using GFS to allow locking of the storage system; but also allows all the components to share the load instead of using a primary/secondary type setup (which largely "wastes" the scondary resources). We are going to use IOMeter to test our setup and see how it performs. We will then run the same tests with different parts of the network disabled and see what happens. As usual any comments/suggestions/criticisms are welcome. 
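On the dm-multipath versus mdadm point above: if dm-multipath is used for the two-paths-per-target part, a fragment along these lines spreads I/O across both portals of a target. The WWID and alias below are made up, and the RAID-5 across the two storage devices would then sit on top of the resulting /dev/mapper devices:

  # /etc/multipath.conf (fragment)
  defaults {
          user_friendly_names yes
  }
  multipaths {
          multipath {
                  wwid                 360a98000123456789abcdef000000a1
                  alias                iscsi_disk1
                  path_grouping_policy multibus
                  path_selector        "round-robin 0"
                  rr_min_io            100
          }
  }
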
Thanks for all the discussion, it has been very useful and enlightening, Mike From deka.lipika at gmail.com Fri May 30 10:12:47 2008 From: deka.lipika at gmail.com (Lipika Deka) Date: Fri, 30 May 2008 11:12:47 +0100 Subject: [Linux-cluster] Granularity of a lock Message-ID: <43097e740805300312l66756578qdc627eb53334e728@mail.gmail.com> Hello List, Would anyone tell me what is the granularity of a lock in GFS using DLM and is locking part of a file possible.Is there something similar to byte range locks of GPFS in GFS? Thanks in advance. -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Fri May 30 10:17:51 2008 From: swhiteho at redhat.com (Steven Whitehouse) Date: Fri, 30 May 2008 11:17:51 +0100 Subject: [Linux-cluster] Granularity of a lock In-Reply-To: <43097e740805300312l66756578qdc627eb53334e728@mail.gmail.com> References: <43097e740805300312l66756578qdc627eb53334e728@mail.gmail.com> Message-ID: <1212142671.3474.31.camel@quoit> Hi, On Fri, 2008-05-30 at 11:12 +0100, Lipika Deka wrote: > Hello List, > Would anyone tell me what is the granularity of a lock in GFS > using DLM and is locking part of a file possible.Is there something > similar to byte range locks of GPFS in GFS? > > Thanks in advance. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster I'm not sure of exactly what GPFS does in this regard, but the locking in GFS (and GFS2) is one lock per inode I'm afraid. Its a RW-lock though so that read accesses at least can be done across the whole cluster at once. With GFS2 that includes read accesses to rw-mapped mmaped regions of files, for GFS that requires an exclusive lock I'm afraid, Steve. From T.Kumar at alcoa.com Fri May 30 17:25:07 2008 From: T.Kumar at alcoa.com (Kumar, T Santhosh (TCS)) Date: Fri, 30 May 2008 13:25:07 -0400 Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. Message-ID: <0C3FC6B507AF684199E57BFCA3EAB5532565630D@NOANDC-MXU11.NOA.Alcoa.com> I am planning to upgrade to lvm2-2.02.32-4.el5.x86_64.rpm along with the other three dependencies listed below. lvm2-cluster-2.02.32-4.el5.x86_64.rpm device-mapper-event-1.02.24-1.el5.x86_64.rpm device-mapper-1.02.24-1.el5.x86_64.rpm I prefer to do this as I realise the below. lvm2-2.02.32-4.el5.x86_64.rpm is an updated package which resolves the "clvmd -R did not work as expected". Do any one know of any problems which might come with upgrading the lvm2, device mapper packages. From orkcu at yahoo.com Fri May 30 18:14:41 2008 From: orkcu at yahoo.com (=?iso-8859-1?Q?Roger_Pe=F1a?=) Date: Fri, 30 May 2008 11:14:41 -0700 (PDT) Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. In-Reply-To: <0C3FC6B507AF684199E57BFCA3EAB5532565630D@NOANDC-MXU11.NOA.Alcoa.com> Message-ID: <767810.9219.qm@web50605.mail.re2.yahoo.com> --- On Fri, 5/30/08, Kumar, T Santhosh (TCS) wrote: > From: Kumar, T Santhosh (TCS) > Subject: [Linux-cluster] Upgrading to lvm2-cluster-2.02.32-4.el5.x86_64 - Impact analysis. > To: linux-cluster at redhat.com > Received: Friday, May 30, 2008, 1:25 PM > I am planning to upgrade to lvm2-2.02.32-4.el5.x86_64.rpm > along with > the other three dependencies listed below. > > lvm2-cluster-2.02.32-4.el5.x86_64.rpm > device-mapper-event-1.02.24-1.el5.x86_64.rpm > device-mapper-1.02.24-1.el5.x86_64.rpm > > I prefer to do this as I realise the below. 
> lvm2-2.02.32-4.el5.x86_64.rpm is an updated package which > resolves the > "clvmd -R did not work as expected". > > Do any one know of any problems which might come with > upgrading the > lvm2, device mapper packages. I suggest you take a look in bugzilla. I don't have a Linux server at hand right now to check, so I don't know which RHEL release you are referring to, but we hit some clvmd problems when we updated from RHEL 4.5 to RHEL 4.6 + updates. There is also a bug, fixed for 5.2 but I am not sure about 4.6, that I think you should look into; it was discussed on this list a few days ago (subject: "LVM manager" or something similar). cu roger
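One low-risk sanity check before committing to the upgrade is an rpm dry run against the exact packages listed above; run it on one node first. The --test option only checks dependencies and conflicts, it installs nothing:

  rpm -q lvm2 lvm2-cluster device-mapper device-mapper-event
  rpm -Uvh --test lvm2-2.02.32-4.el5.x86_64.rpm \
      lvm2-cluster-2.02.32-4.el5.x86_64.rpm \
      device-mapper-1.02.24-1.el5.x86_64.rpm \
      device-mapper-event-1.02.24-1.el5.x86_64.rpm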