From mshk_00 at hotmail.com Tue Mar 1 11:25:25 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Tue, 01 Mar 2005 12:25:25 +0100
Subject: [Linux-cluster] problems with XFree86 and red hat enterprise 3.0
Message-ID: 

Hi everybody,

I have a problem with XFree86 and Red Hat Enterprise Linux 3.0 on a server.
I had to compile a 2.4.26 kernel in order to be able to shut the server
down, because the original kernel does not support ACPI (and the BIOS does
not support APM either). With this kernel the system shuts down correctly
via ACPI, but now XFree86 no longer works. Nothing else appears to fail.
We had the same problem with other kernels we compiled, so I think it is a
problem with Red Hat Enterprise; the system seems very fragile and
delicate, but I do not know how to fix it.

Does anyone have any idea? Can someone help me? Has anyone had a similar
problem?

Thanks for your attention, and excuse my poor English.

Maria.

From ialberdi at histor.fr Tue Mar 1 13:28:42 2005
From: ialberdi at histor.fr (Ion Alberdi)
Date: Tue, 01 Mar 2005 14:28:42 +0100
Subject: [Linux-cluster] Configuring rgmanager
Message-ID: <42246E0A.6070000@histor.fr>

Ion Alberdi wrote:

> Results from my tests with two nodes (buba and gump) and the latest CVS
> (update done today):
> I tried to put a basic script in failover on two nodes.
> Initialization:
> .....
> (buba seems not to have any clurgmgrd running, even though I started
> rgmanager...)
>
> I don't know if it's a bug in rgmanager or if I'm doing something wrong,
> but I don't understand why everything worked during the first reboot and
> nothing worked after that...

Apparently this has nothing to do with the reboot, but with the use of
clusvcadm. When I only reboot nodes, without relocating services manually,
it works well: the service switch-over is always done. However, after
relocating the service manually and then rebooting the server running the
service, it seems that magma does not tell the remaining node that the
service must be relocated (membership_update is not called, so nothing is
done...). Is this a bug?

If it is not, I would still find it very useful to be able to relocate a
service manually. For example, if the cluster is composed of a big server
and a small rescue server, then after the big server has been repaired and
has rejoined the cluster, it is convenient to move the service back to it
by hand. Likewise, if the administrator wants to do some maintenance on a
server running a critical service, the service has to be relocated
beforehand.

Best regards
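
Manual relocation is exactly what rgmanager's clusvcadm command is for. A
minimal sketch, assuming a resource group named "hello" and the member
names "buba" and "gump" used in the tests above (names are illustrative;
check clusvcadm(8) in your build for the exact options):

  # move the resource group "hello" onto the member "buba"
  clusvcadm -r hello -m buba

  # disable it before maintenance on the node, re-enable it afterwards
  clusvcadm -d hello
  clusvcadm -e hello

  # show where each resource group is currently running
  clustat
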
From ecashin at coraid.com Tue Mar 1 15:13:12 2005
From: ecashin at coraid.com (Ed L Cashin)
Date: Tue, 01 Mar 2005 10:13:12 -0500
Subject: [Linux-cluster] gfs space usage (was Re: a question about gfs file system)
References: <000901c50ae3$feb22970$69050364@yazanz>
Message-ID: <87k6orz9fb.fsf@coraid.com>

"Yazan Al-Sheyyab" writes:

> hello everybody,
>
> how much space does the gfs file system take from a partition?
>
> I mean: if I have a 500 MB partition and want to format it as a gfs
> file system, how much space will remain for me in this partition after
> formatting it as gfs?

You can try it on a local file. As you can see below, there's not much
left after adding three default-sized journals.

ecashin at kokone tmp$ /sbin/mkfs.gfs -j 3 -p lock_nolock test.gfs
This will destroy any data on test.gfs.
  Are you sure you want to proceed? [y/n] y
Device:            test.gfs
Blocksize:         4096
Filesystem Size:   29652
Journals:          3
Resource Groups:   8
Locking Protocol:  lock_nolock
Lock Table:        
Syncing...
All Done
ecashin at kokone tmp$ su
Password:
root at kokone tmp# losetup /dev/loop0 test.gfs
root at kokone tmp# modprobe gfs
root at kokone tmp# mount /dev/loop0 /mnt/
FC2-i386-disc1       mooch-i386-disc1
FC2-i386-disc2       mooch-i386-disc1-bootdisk
FC2-i386-disc3       mooch-i386-disc2
e15.3                mooch-i386-disc3
ed-gfs               mooch-i386-stage2
file                 netstg2.img
loop0                sah
root at kokone tmp# mount /dev/loop0 /mnt/loop0
root at kokone tmp# ls !$
ls /mnt/loop0
root at kokone tmp# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda3              71G   40G   29G  59% /
tmpfs                 236M     0  236M   0% /dev/shm
/dev/hda1             942M   37M  858M   5% /boot
/dev/loop0            116M   20K  116M   1% /mnt/loop0
root at kokone tmp#

-- 
  Ed L Cashin
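
Most of the missing space goes to the journals: GFS journals default to
128 MB each, which matches the numbers above (three journals take roughly
384 MB, leaving about 116 MB usable out of a file of roughly 500 MB). On a
small device it may help to create only as many journals as nodes will
mount the filesystem, and to make them smaller. A hedged variation on the
experiment above, using mkfs.gfs's -J option for the journal size in
megabytes (check your version's man page for the minimum allowed value):

  # same loopback test, but with 32 MB journals instead of the default
  /sbin/mkfs.gfs -j 3 -J 32 -p lock_nolock test.gfs

With 3 x 32 MB of journal, the remaining space on a ~500 MB device should
be on the order of 400 MB, though the exact figure depends on
resource-group overhead.
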
From jbrassow at redhat.com Tue Mar 1 18:29:36 2005
From: jbrassow at redhat.com (Jonathan E Brassow)
Date: Tue, 1 Mar 2005 12:29:36 -0600
Subject: [Linux-cluster] GFS and iscsi?
In-Reply-To: <20050224170202.GD24154@ultraviolet.org>
References: <20050224164527.6384.qmail@webmail29.rediffmail.com> <20050224170202.GD24154@ultraviolet.org>
Message-ID: <3b239cc3d1d022b56ce2510afd918088@redhat.com>

I've used iSCSI with some degree of success. The problems I had were more
related to the iSCSI driver, though.

Some people also use GNBD.

 brassow

On Feb 24, 2005, at 11:02 AM, Tracy R Reed wrote:

> What SAN/LAN technology are most people using with GFS these days? I
> know it is traditionally used with fibrechannel, but that remains very
> expensive technology. I recently heard from someone contemplating using
> it with iscsi. Is that known to work?
>
> --
> Tracy Reed
> http://ultraviolet.org
> This message is cryptographically signed for your protection.
> Info: http://copilotconsulting.com/sig

From mshk_00 at hotmail.com Wed Mar 2 08:56:51 2005
From: mshk_00 at hotmail.com (maria perez)
Date: Wed, 02 Mar 2005 09:56:51 +0100
Subject: [Linux-cluster] problem with xfree86
Message-ID: 

Hi again,

To continue with my XFree86 problem: I compiled a 2.4.26 kernel on Red Hat
Enterprise 3 (update 2). The compilation itself gives no problems, but now
I lose the X window system. In /var/log/messages I see:

mdmpd failed
modprobe: Can't locate module char-major-10-224
modprobe: Can't locate module char-major-10-134
gdm[3098]: gdm_slave_xioerror_handler: (fatal error of X)

In /var/log/XFree86.0.log I see:

(WW) Open APM failed (/dev/apm_bios) (No such device)

I cannot find any other errors. What is happening?

Thanks,
MARIA.

From lhh at redhat.com Wed Mar 2 09:05:07 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Wed, 02 Mar 2005 04:05:07 -0500
Subject: [Linux-cluster] Configuring rgmanager
In-Reply-To: <42246E0A.6070000@histor.fr>
References: <42246E0A.6070000@histor.fr>
Message-ID: <1109754307.5740.10.camel@ayanami.boston.redhat.com>

On Tue, 2005-03-01 at 14:28 +0100, Ion Alberdi wrote:
> Ion Alberdi wrote:
>
> > Results from my tests with two nodes (buba and gump) and the latest
> > CVS (update done today):
> > I tried to put a basic script in failover on two nodes.
> > Initialization:
> > .....
> > (buba seems not to have any clurgmgrd running, even though I started
> > rgmanager...)
> >
> > I don't know if it's a bug in rgmanager or if I'm doing something
> > wrong, but I don't understand why everything worked during the first
> > reboot and nothing worked after that...
>
> Apparently this has nothing to do with the reboot, but with the use of
> clusvcadm.
> When I only reboot nodes, without relocating services manually, it
> works well: the service switch-over is always done.
> However, after relocating the service manually and then rebooting the
> server running the service, it seems that magma does not tell the
> remaining node that the service must be relocated (membership_update
> is not called, so nothing is done...)
> Is this a bug?

There was a bug where rgmanager was randomly getting stuck in accept(2)
because the listen sockets weren't set O_NONBLOCK (like they should have
been). Once it got there, it stopped picking up cluster events
(including membership changes). This should be fixed in current CVS.

Could this have been the problem?

> If it is not, I would still find it very useful to be able to relocate
> a service manually, for example if the cluster is composed of a big
> server and a small rescue server: after the big server has been
> repaired and has rejoined the cluster, it is convenient to move the
> service back to it by hand.
> Likewise, if the administrator wants to do some maintenance on a
> server running a critical service, the service has to be relocated
> beforehand.

Relocating services is an obvious feature.

-- Lon
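
A quick way to tell whether a running clurgmgrd has hit the accept(2) hang
described above is to attach strace and see which system call it is
sitting in. A rough sketch (clurgmgrd normally runs as more than one
process, so attach to each pid if the first one looks idle):

  # list the resource group manager processes
  pidof clurgmgrd

  # attach to one of them; a process wedged by this bug sits
  # indefinitely in a single accept(...) call
  strace -p $(pidof clurgmgrd | awk '{print $1}')
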
From ialberdi at histor.fr Wed Mar 2 10:01:46 2005
From: ialberdi at histor.fr (Ion Alberdi)
Date: Wed, 02 Mar 2005 11:01:46 +0100
Subject: [Linux-cluster] Configuring rgmanager
In-Reply-To: <1109754307.5740.10.camel@ayanami.boston.redhat.com>
References: <42246E0A.6070000@histor.fr> <1109754307.5740.10.camel@ayanami.boston.redhat.com>
Message-ID: <42258F0A.4020302@histor.fr>

Lon Hohberger wrote:

>On Tue, 2005-03-01 at 14:28 +0100, Ion Alberdi wrote:
>
>>Ion Alberdi wrote:
>>
>>>Results from my tests with two nodes (buba and gump) and the latest
>>>CVS (update done today):
>>>I tried to put a basic script in failover on two nodes.
>>>Initialization:
>>>.....
>>>(buba seems not to have any clurgmgrd running, even though I started
>>>rgmanager...)
>>>
>>>I don't know if it's a bug in rgmanager or if I'm doing something
>>>wrong, but I don't understand why everything worked during the first
>>>reboot and nothing worked after that...
>>>
>>Apparently this has nothing to do with the reboot, but with the use of
>>clusvcadm.
>>When I only reboot nodes, without relocating services manually, it
>>works well: the service switch-over is always done.
>>However, after relocating the service manually and then rebooting the
>>server running the service, it seems that magma does not tell the
>>remaining node that the service must be relocated (membership_update
>>is not called, so nothing is done...)
>>Is this a bug?
>>
>
>There was a bug where rgmanager was randomly getting stuck in accept(2)
>because the listen sockets weren't set O_NONBLOCK (like they should have
>been). Once it got there, it stopped picking up cluster events
>(including membership changes). This should be fixed in current CVS.
>
>Could this have been the problem?
>
Yes, I've just downloaded and installed the latest CVS and it works now.
Thanks!

>>If it is not, I would still find it very useful to be able to relocate
>>a service manually, for example if the cluster is composed of a big
>>server and a small rescue server: after the big server has been
>>repaired and has rejoined the cluster, it is convenient to move the
>>service back to it by hand.
>>Likewise, if the administrator wants to do some maintenance on a
>>server running a critical service, the service has to be relocated
>>beforehand.
>>
>
>Relocating services is an obvious feature.
>
>-- Lon

From jhahm at yahoo.com Wed Mar 2 19:56:00 2005
From: jhahm at yahoo.com (Jiho Hahm)
Date: Wed, 2 Mar 2005 11:56:00 -0800 (PST)
Subject: [Linux-cluster] latest CVS build error in rgmanager
Message-ID: <20050302195601.90087.qmail@web50908.mail.yahoo.com>

Hi, I'm getting this error while compiling rgmanager/src/daemons/groups.c:

groups.c: In function `send_rg_state':
groups.c:537: error: structure has no member named `rs_id'
groups.c:537: error: structure has no member named `rs_id'
make[3]: *** [groups.o] Error 1

It looks like the error was introduced in rgmanager/include/resgroup.h
revision 1.4: the rs_id field was removed from the rg_state_t structure,
but a line was added to the swab_rg_state_t macro that still references
rs_id.

I grabbed the code about 10 minutes ago.

-Jiho

From lhh at redhat.com Thu Mar 3 19:09:59 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 03 Mar 2005 14:09:59 -0500
Subject: [Linux-cluster] latest CVS build error in rgmanager
In-Reply-To: <20050302195601.90087.qmail@web50908.mail.yahoo.com>
References: <20050302195601.90087.qmail@web50908.mail.yahoo.com>
Message-ID: <1109876999.7556.35.camel@ayanami.boston.redhat.com>

On Wed, 2005-03-02 at 11:56 -0800, Jiho Hahm wrote:
> Hi, I'm getting this error while compiling
> rgmanager/src/daemons/groups.c:
>
> groups.c: In function `send_rg_state':
> groups.c:537: error: structure has no member named `rs_id'
> groups.c:537: error: structure has no member named `rs_id'
> make[3]: *** [groups.o] Error 1
>
> It looks like the error was introduced in
> rgmanager/include/resgroup.h revision 1.4: the rs_id field was
> removed from the rg_state_t structure, but a line was added to
> the swab_rg_state_t macro that still references rs_id.
>
> I grabbed the code about 10 minutes ago.

Fixed. Sorry, I thought I had committed this fix earlier.

-- Lon
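
For anyone tracking fixes like these, refreshing an existing checkout and
rebuilding only rgmanager is usually enough. A rough sketch, assuming the
checkout lives in ./cluster and the tree has already been configured once
(the exact configure/make steps vary between snapshots of the tree):

  cd cluster
  cvs update -dP        # pull current revisions, creating/pruning directories
  cd rgmanager
  make
  make install          # as root
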
From lhh at redhat.com Thu Mar 3 19:19:27 2005
From: lhh at redhat.com (Lon Hohberger)
Date: Thu, 03 Mar 2005 14:19:27 -0500
Subject: [Linux-cluster] Triggering failover at Resource Manager level
In-Reply-To: <20050225210925.37778.qmail@web50905.mail.yahoo.com>
References: <20050225210925.37778.qmail@web50905.mail.yahoo.com>
Message-ID: <1109877567.7556.43.camel@ayanami.boston.redhat.com>

On Fri, 2005-02-25 at 13:09 -0800, Jiho Hahm wrote:
> Lon, I logged it as bug 149735. I didn't know if it should
> be marked as bug or enhancement, so I just took the
> defaults (bug).

Yup, that's fine. It should work with current CVS:

restart  -> Normal recovery. Try to restart the resource group locally.
            If it fails, try to start it on each legal target. If no
            other node could start it, stop the resource and wait for
            the next node/service-group transition. [This is the
            default, and should have happened last week, but there was
            a bug preventing this behavior.]

relocate -> Same as "restart", but don't try to restart locally. Do the
            relocate dance.

disable  -> Don't bother trying to fix it. Disable the service.

-- Lon

From bastian at waldi.eu.org Fri Mar 4 09:34:02 2005
From: bastian at waldi.eu.org (Bastian Blank)
Date: Fri, 4 Mar 2005 10:34:02 +0100
Subject: [Linux-cluster] cman_serviced crashed while rebooting two nodes
Message-ID: <20050304093402.GB20484@wavehammer.waldi.eu.org>

Hi folks

While rebooting two nodes, both nodes crashed. Both run 2.6.10 with cman
from 2005-02-06.

One (name: gfs1) died with:

| CMAN: nmembers in HELLO message from 4 does not match our view (got 7, exp 8)
| CMAN: node gfs1 has been removed from the cluster : No response to messages
| CMAN: killed by NODEDOWN message
| CMAN: we are leaving the cluster.
| SM: 01000003 sm_stop: SG still joined

The other with the more serious:

| Unable to handle kernel NULL pointer dereference at virtual address 0000000c
|  printing eip:
| c025f7fd
| *pde = ma 00000000 pa 55555000
|  [] cancel_uevents+0xd1/0x1d0
|  [] schedule+0x2c8/0x4b0
|  [] process_lstart_done+0x9/0x30
|  [] process_message+0xa5/0xc0
|  [] serviced+0x0/0x180
|  [] process_nodechange+0x2e/0x60
|  [] serviced+0x163/0x180
|  [] kthread+0x94/0xa0
|  [] kthread+0x0/0xa0
|  [] kernel_thread_helper+0x5/0x14
| Oops: 0000 [#1]
| CPU:    0
| EIP:    0061:[]    Not tainted VLI
| EFLAGS: 00010246   (2.6.10-xen-gfs-1)
| EIP is at cancel_one_uevent+0x48d/0x570
| eax: 00000000   ebx: c1211cac   ecx: 00000001   edx: 00000000
| esi: 00000000   edi: c1211c60   ebp: c0cd1fc4   esp: c0cd1f5c
| ds: 007b   es: 007b   ss: 0069
| Process cman_serviced (pid: 627, threadinfo=c0cd0000 task=c0cac600)
| Stack: c1211c60 c030f600 0000000b ffffffff 80764db0 00000001 c0390ee8 c1211c60
|        c0390e3c c0390ee8 c025f9b1 c02eae88 c0cd1fa0 c0260689 c0260755 00000000
|        c0390f00 c0390f00 c0390ee8 00000001 c0cd1fc4 c0cd0000 c13bfebc 00000000
| Call Trace:
|  [] cancel_uevents+0xd1/0x1d0
|  [] schedule+0x2c8/0x4b0
|  [] process_lstart_done+0x9/0x30
|  [] process_message+0xa5/0xc0
|  [] serviced+0x0/0x180
|  [] process_nodechange+0x2e/0x60
|  [] serviced+0x163/0x180
|  [] kthread+0x94/0xa0
|  [] kthread+0x0/0xa0
|  [] kernel_thread_helper+0x5/0x14
| Code: 89 f8 8d 54 24 14 c7 44 24 14 00 00 00 00 e8 0b fa ff ff 85 c0 89 c6 0f 85 cb 00 00 00 8b 43 0c e8 89 14 00 00 89 c2 8b 4c 24 14 <8b> 40 0c a8 01 0f 45 f2 85 c9 75 53 85 f6 74 2d 8b 47 14 a8 08
| dlm: test: dlm_dir_rebuild_local failed -1

The nodes no longer respond to Xen shutdown requests.

Bastian

-- 
Lots of people drink from the wrong bottle sometimes.
		-- Edith Keeler, "The City on the Edge of Forever", stardate unknown
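
When an oops like this lands inside the cman module, the symbol+offset
pairs in the trace can be mapped back to source lines if the module was
built with debugging information. A hedged sketch -- the module path below
is an assumption, so point gdb at wherever your build installs cman.ko:

  gdb /lib/modules/$(uname -r)/kernel/cluster/cman.ko
  (gdb) list *(cancel_one_uevent+0x48d)
  (gdb) list *(cancel_uevents+0xd1)

That at least tells the developers which line of cancel_one_uevent()
dereferenced the NULL pointer.
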
From vze8dkpl at verizon.net Wed Mar 2 13:53:55 2005
From: vze8dkpl at verizon.net (vze8dkpl at verizon.net)
Date: Wed, 02 Mar 2005 13:53:55 +0000
Subject: [Linux-cluster] GFS 6.0.2-24 + NFS (ALSO)
Message-ID: <0ICQ009AO99VUQF0@vms040.mailsrvcs.net>

Michael, thanks for the tip. It turns out this was exactly the problem.
I rebuilt clumanager and things are working great.

Corey

> From: Michael Conrad Tadpol Tilstra
> Date: 2005/02/28 Mon PM 02:02:36 GMT
> To: linux clustering
> Subject: Re: [Linux-cluster] GFS 6.0.2-24 + NFS (ALSO)

From sunjw at onewaveinc.com Mon Mar 7 13:56:05 2005
From: sunjw at onewaveinc.com (=?GB2312?B?y++/oc6w?=)
Date: Mon, 7 Mar 2005 21:56:05 +0800
Subject: [Linux-cluster] GFS without fence system?
Message-ID: 

Hi,

If I have a cluster in which only one node performs write operations to
the storage at any given time, can I use GFS without a fence system? Or
can I use the fence system without hardware (such as an FC switch or a
power switch) involved, and have the fence operation carried out
automatically, immediately after the heartbeat detects a bad node?

Conversely, under what conditions would data corruption happen in a GFS
system without fencing? Two nodes writing to the same LV | GFS | file |
block (which one?) at the same time?

Thanks for any reply!

Luckey.

From ialberdi at histor.fr Mon Mar 7 14:55:02 2005
From: ialberdi at histor.fr (Ion Alberdi)
Date: Mon, 07 Mar 2005 15:55:02 +0100
Subject: [Linux-cluster] Clvm over gnbd + rgmanager
Message-ID: <422C6B46.60507@histor.fr>

Hi everybody,

I'm now trying to use the cluster logical volume manager (clvm).

           (/dev/hdb)
debian --------------------------> buba (/dev/gnbd/dd)
   |         --GNBD-->
   |________(/dev/hdb)___________> gump (/dev/gnbd/dd)

I create a logical volume of 1 GB on buba or gump (after launching the
cluster and gnbd):

buba# pvcreate /dev/gnbd/dd
buba# vgcreate vg1 /dev/gnbd/dd
buba# lvcreate -L1024 -n lv1 vg1

and run

# vgchange -a y

on the three nodes; now all three nodes have /dev/vg1/lv1. On one of the
nodes I create an ext3 filesystem:

# mkfs.ext3 -j /dev/vg1/lv1

I then launch rgmanager, which runs a basic script that writes the name of
the node running the script into a file on the ext3 filesystem.
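
The script itself is not shown in the message; a minimal sketch of what
such an rgmanager script resource could look like follows. rgmanager
invokes a script resource like an init script, with start/stop/status
arguments; the mount point /mnt/lv1 and the file name are assumptions for
illustration:

  #!/bin/bash
  # toy rgmanager script resource: record which node is running the service
  MOUNTPOINT=/mnt/lv1            # wherever the fs resource mounts /dev/vg1/lv1
  STATEFILE=$MOUNTPOINT/owner

  case "$1" in
  start)
      hostname > "$STATEFILE" || exit 1
      ;;
  stop)
      rm -f "$STATEFILE"
      ;;
  status)
      # fail if the filesystem is not mounted or the file has vanished
      [ -f "$STATEFILE" ] || exit 1
      ;;
  esac
  exit 0
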
All works well until the syslog on the node running the script shows:

Mar 7 15:19:53 gump clurgmgrd[3978]: status on fs "my fs" returned 1 (generic error)

/* Here the problem starts: I don't know why status (isMounted in
/usr/share/cluster/fs.sh) returns a failure code... */

Mar 7 15:19:53 gump clurgmgrd[3978]: Stopping resource group hello
Mar 7 15:19:55 gump clurgmgrd[3978]: Resource group hello is recovering
Mar 7 15:19:55 gump clurgmgrd[3978]: Recovering failed resource group hello
Mar 7 15:19:55 gump clurgmgrd[3978]: start on fs "my fs" returned 2 (invalid argument(s))

/* Syslog is misleading there, because fs.sh is not OCF compliant: in
fs.sh an exit code of 2 does not mean "invalid argument", it means FAIL. */

Mar 7 15:19:55 gump clurgmgrd[3978]: #68: Failed to start hello; return value: 1
Mar 7 15:19:55 gump clurgmgrd[3978]: Stopping resource group hello
Mar 7 15:19:57 gump clurgmgrd[3978]: Resource group hello is recovering
Mar 7 15:19:57 gump clurgmgrd[3978]: #71: Relocating failed resource group hello

and on the other node:

Mar 7 15:23:14 buba clurgmgrd[5205]: start on script "Hello Script" returned 1 (generic error)
Mar 7 15:23:14 buba clurgmgrd[5205]: #68: Failed to start hello; return value: 1
Mar 7 15:23:14 buba clurgmgrd[5205]: Stopping resource group hello
Mar 7 15:23:16 buba clurgmgrd[5205]: Resource group hello is recovering

Also, at this point the filesystem is mounted on both nodes, which should
normally never happen...

Is this problem a bug in clvm or in the fs.sh script? I tried a simpler
prototype with gnbd only (the script mounts /dev/gnbd/dd on the nodes and
writes there) and everything works well, so I think the problem comes from
clvm.

Here is my cluster.conf:
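
Independently of the cluster.conf details, one way to narrow down why the
fs resource's status check fails is to compare the device the resource was
configured with against what the kernel reports as mounted: with clvm the
volume typically appears in /proc/mounts as /dev/mapper/vg1-lv1, while
/dev/vg1/lv1 is only a symlink to it, and a naive string comparison in an
agent can trip over exactly that. Whether that is the cause here is a
guess; a rough check, assuming the filesystem is mounted on /mnt/lv1:

  # canonicalize the device the resource definition points at
  readlink -f /dev/vg1/lv1

  # see which device the kernel says is mounted there
  grep ' /mnt/lv1 ' /proc/mounts

  # if one side shows /dev/mapper/vg1-lv1 and the other the /dev/vg1/lv1
  # symlink, the agent's mounted-device comparison may be what returns 1
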