From stephen.rankin at stfc.ac.uk Mon Mar 10 18:15:08 2014 From: stephen.rankin at stfc.ac.uk (stephen.rankin at stfc.ac.uk) Date: Mon, 10 Mar 2014 18:15:08 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> Hello, When using gfs2 with quotas on a SAN that is providing storage to two clustered systems running CentOS6.5, one of the systems can crash. This crash appears to be caused when a user tries to add something to a SAN disk when they have exceeded their quota on that disk. Sometimes a stack trace is produced in /var/log/messages which appears to indicate that it was gfs2 that caused the problem. At the same time you get the gfs2 stack trace you also see problems with someone exceeding their quota. The stack trace is below. Has anyone got a solution to this, other than switching off quotas? I have switched off quotas, which appears to have stabilised the system so far, but I do need the quotas on. Your help is appreciated. Stephen Rankin STFC, RAL, ISIS Mar 5 11:40:50 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355 Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_explode_dn(usi660) returned NULL: Success Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_result() failed: Invalid DN syntax Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] lookup of user usi660 failed: Invalid DN syntax Mar 5 11:41:46 chadwick kernel: ------------[ cut here ]------------ Mar 5 11:41:46 chadwick kernel: WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0() (Not tainted) Mar 5 11:41:46 chadwick kernel: Hardware name: PowerEdge R910 Mar 5 11:41:46 chadwick kernel: list_add corruption. 
next->prev should be prev (ffff8820531518d0), but was ffff884d4c4594d0. (next=ffff884d4c4594d0). Mar 5 11:41:46 chadwick kernel: Modules linked in: gfs2 dlm configfs bridge autofs4 des_generic ecb md4 nls_utf8 cifs bnx2fc cnic uio fcoe libfcoe libfc 8021q garp stp llc ipv6 microcode power_meter iTCO_wdt iTCO_vendor_support dcdbas serio_raw ixgbe dca ptp pps_core mdio lpc_ich mfd_core sg ses enclosure i7core_edac edac_core bnx2 ext4 jbd2 mbcache dm_round_robin sr_mod cdrom sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] Mar 5 11:41:46 chadwick kernel: Pid: 74823, comm: vncserver Not tainted 2.6.32-431.3.1.el6.x86_64 #1 Mar 5 11:41:46 chadwick kernel: Call Trace: Mar 5 11:41:46 chadwick kernel: [] ? warn_slowpath_common+0x87/0xc0 Mar 5 11:41:46 chadwick kernel: [] ? warn_slowpath_fmt+0x46/0x50 Mar 5 11:41:46 chadwick kernel: [] ? __list_add+0x6d/0xa0 Mar 5 11:41:46 chadwick kernel: [] ? new_inode+0x72/0xb0 Mar 5 11:41:46 chadwick kernel: [] ? gfs2_create_inode+0x1b5/0x1150 [gfs2] Mar 5 11:41:46 chadwick kernel: [] ? gfs2_glock_nq_init+0x16/0x40 [gfs2] Mar 5 11:41:46 chadwick kernel: [] ? gfs2_mkdir+0x24/0x30 [gfs2] Mar 5 11:41:46 chadwick kernel: [] ? security_inode_mkdir+0x1f/0x30 Mar 5 11:41:46 chadwick kernel: [] ? vfs_mkdir+0xd9/0x140 Mar 5 11:41:46 chadwick kernel: [] ? sys_mkdirat+0xc7/0x1b0 Mar 5 11:41:46 chadwick kernel: [] ? sys_mkdir+0x18/0x20 Mar 5 11:41:46 chadwick kernel: [] ? 
system_call_fastpath+0x16/0x1b Mar 5 11:41:46 chadwick kernel: ---[ end trace e51734a39976a028 ]--- Mar 5 11:41:46 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355 Mar 5 11:41:47 chadwick abrtd: Directory 'oops-2014-03-05-11:41:47-12194-1' creation detected Mar 5 11:41:47 chadwick abrt-dump-oops: Reported 1 kernel oopses to Abrt Mar 5 11:41:47 chadwick abrtd: Can't open file '/var/spool/abrt/oops-2014-03-05-11:41:47-12194-1/uid': No such file or directory Mar 5 11:41:54 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355 -- Scanned by iCritical. -------------- next part -------------- An HTML attachment was scrubbed... URL: From adas at redhat.com Mon Mar 10 19:38:06 2014 From: adas at redhat.com (Abhijith Das) Date: Mon, 10 Mar 2014 15:38:06 -0400 (EDT) Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <382425903.12347329.1394480286571.JavaMail.zimbra@redhat.com> ----- Original Message ----- > From: "stephen rankin" > To: linux-cluster at redhat.com > Sent: Monday, March 10, 2014 1:15:08 PM > Subject: [Linux-cluster] gfs2 and quotas - system crash > > Hello, > > > > When using gfs2 with quotas on a SAN that is providing storage to two > clustered systems running CentOS6.5, one of the systems > can crash. This crash appears to be caused when a user tries > to add something to a SAN disk when they have exceeded their > quota on that disk. Sometimes a stack trace is produced in /var/log/messages > which appears to indicate that it was gfs2 that caused the problem. > At the same time you get the gfs2 stack trace you also see problems > with someone exceeding their quota. > > The stack trace is below. 
> > Has anyone got a solution to this, other than switching of quotas? I have > switched of quotas which appears to have stabilised the system so far, but I > do need the quotas on. > > Your help is appreciated. > Hi Stephen, We have another report of this bug when gfs2 was exported using NFS. https://bugzilla.redhat.com/show_bug.cgi?id=1059808. Are you using NFS in your setup as well? We have not been able to reproduce it to figure out what might be going on. Do you have a set procedure with which you're able to recreate this reliably? If so, it would be of great help. Also, more info about your setup (file sizes, number of files, how many nodes mounting gfs2, what kinds of operations are being run, etc.) would be helpful as well. Cheers! --Abhi > Stephen Rankin > STFC, RAL, ISIS > > Mar 5 11:40:50 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded > for user 101355 > Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_explode_dn(usi660) > returned NULL: Success > Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_result() failed: Invalid > DN syntax > Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] lookup of user usi660 failed: > Invalid DN syntax > Mar 5 11:41:46 chadwick kernel: ------------[ cut here ]------------ > Mar 5 11:41:46 chadwick kernel: WARNING: at lib/list_debug.c:26 > __list_add+0x6d/0xa0() (Not tainted) > Mar 5 11:41:46 chadwick kernel: Hardware name: PowerEdge R910 > Mar 5 11:41:46 chadwick kernel: list_add corruption. next->prev should be > prev (ffff8820531518d0), but was ffff884d4c4594d0. (next=ffff884d4c4594d0). 
> Mar 5 11:41:46 chadwick kernel: Modules linked in: gfs2 dlm configfs bridge > autofs4 des_generic ecb md4 nls_utf8 cifs bnx2fc cnic uio fcoe libfcoe libfc > 8021q garp stp llc ipv6 microcode power_meter iTCO_wdt iTCO_vendor_support > dcdbas serio_raw ixgbe dca ptp pps_core mdio lpc_ich mfd_core sg ses > enclosure i7core_edac edac_core bnx2 ext4 jbd2 mbcache dm_round_robin sr_mod > cdrom sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt pata_acpi > ata_generic ata_piix megaraid_sas dm_multipath dm_mirror dm_region_hash > dm_log dm_mod [last unloaded: speedstep_lib] > Mar 5 11:41:46 chadwick kernel: Pid: 74823, comm: vncserver Not tainted > 2.6.32-431.3.1.el6.x86_64 #1 > Mar 5 11:41:46 chadwick kernel: Call Trace: > Mar 5 11:41:46 chadwick kernel: [] ? > warn_slowpath_common+0x87/0xc0 > Mar 5 11:41:46 chadwick kernel: [] ? > warn_slowpath_fmt+0x46/0x50 > Mar 5 11:41:46 chadwick kernel: [] ? __list_add+0x6d/0xa0 > Mar 5 11:41:46 chadwick kernel: [] ? new_inode+0x72/0xb0 > Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_create_inode+0x1b5/0x1150 [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_glock_nq_init+0x16/0x40 [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? gfs2_mkdir+0x24/0x30 > [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? > security_inode_mkdir+0x1f/0x30 > Mar 5 11:41:46 chadwick kernel: [] ? vfs_mkdir+0xd9/0x140 > Mar 5 11:41:46 chadwick kernel: [] ? > sys_mkdirat+0xc7/0x1b0 > Mar 5 11:41:46 chadwick kernel: [] ? sys_mkdir+0x18/0x20 > Mar 5 11:41:46 chadwick kernel: [] ? 
> system_call_fastpath+0x16/0x1b > Mar 5 11:41:46 chadwick kernel: ---[ end trace e51734a39976a028 ]--- > Mar 5 11:41:46 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded > for user 101355 > Mar 5 11:41:47 chadwick abrtd: Directory 'oops-2014-03-05-11:41:47-12194-1' > creation detected > Mar 5 11:41:47 chadwick abrt-dump-oops: Reported 1 kernel oopses to Abrt > Mar 5 11:41:47 chadwick abrtd: Can't open file > '/var/spool/abrt/oops-2014-03-05-11:41:47-12194-1/uid': No such file or > directory > Mar 5 11:41:54 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded > for user 101355 > > > > > -- > Scanned by iCritical. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From ajb2 at mssl.ucl.ac.uk Mon Mar 10 20:46:57 2014 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Mon, 10 Mar 2014 20:46:57 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <531E24C1.5000005@mssl.ucl.ac.uk> On 10/03/14 18:15, stephen.rankin at stfc.ac.uk wrote: > > Hello, > > When using gfs2 with quotas on a SAN that is providing storage to two > clustered systems running CentOS6.5, one of the systems > can crash. This crash appears to be caused when a user tries > to add something to a SAN disk when they have exceeded their > quota on that disk. Sometimes a stack trace is produced in > /var/log/messages > which appears to indicate that it was gfs2 that caused the problem. > At the same time you get the gfs2 stack trace you also see problems > with someone exceeding their quota. > We have exactly the same problem and an open ticket with RH support. 
They've been trying to finger FS corruption as the cause. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajb2 at mssl.ucl.ac.uk Mon Mar 10 20:48:29 2014 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Mon, 10 Mar 2014 20:48:29 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <531E251D.10003@mssl.ucl.ac.uk> On 10/03/14 18:15, stephen.rankin at stfc.ac.uk wrote: > > Hello, > > When using gfs2 with quotas on a SAN that is providing storage to two > clustered systems running CentOS6.5, > As a matter of interest: how are you exporting the storage, or is this integral to the cluster itself? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen.rankin at stfc.ac.uk Tue Mar 11 09:47:40 2014 From: stephen.rankin at stfc.ac.uk (stephen.rankin at stfc.ac.uk) Date: Tue, 11 Mar 2014 09:47:40 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <531E251D.10003@mssl.ucl.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> <531E251D.10003@mssl.ucl.ac.uk> Message-ID: <4EC8429AA448A54D86E52F450C43247E74223C24@EXCHMBX03.fed.cclrc.ac.uk> The storage is a separate Hitachi SAN connected by 4Gig fibre channel, which itself does not report any problems when the crash happens. With the quota switched off, all is fine. 
From: Alan Brown [mailto:ajb2 at mssl.ucl.ac.uk] Sent: 10 March 2014 20:48 To: linux clustering Subject: Re: [Linux-cluster] gfs2 and quotas - system crash On 10/03/14 18:15, stephen.rankin at stfc.ac.uk wrote: Hello, When using gfs2 with quotas on a SAN that is providing storage to two clustered systems running CentOS6.5, As a matter of interest: how are you exporting the storage, or is this integral to the cluster itself? -- Scanned by iCritical. -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Tue Mar 11 10:01:30 2014 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 11 Mar 2014 10:01:30 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E74223C24@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> <531E251D.10003@mssl.ucl.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223C24@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <1394532090.2747.5.camel@menhir> Hi, On Tue, 2014-03-11 at 09:47 +0000, stephen.rankin at stfc.ac.uk wrote: > The storage is a separate Hitachi SAN connected by 4Gig fibre channel, > which itself does not report any problems when the crash happens. With > the quota switched off, all is fine. > Are you exporting that GFS2 filesystem via NFS, or can you reproduce this without NFS? Also, what kind of workload is involved? Is this a lot of small files, or are they mostly larger ones? Are they being read/written sequentially or randomly? Is there anything unusual going on (e.g. use of splice, ACLs or non-standard mount options, etc.) Steve. 
From stephen.rankin at stfc.ac.uk Tue Mar 11 10:43:24 2014 From: stephen.rankin at stfc.ac.uk (stephen.rankin at stfc.ac.uk) Date: Tue, 11 Mar 2014 10:43:24 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <382425903.12347329.1394480286571.JavaMail.zimbra@redhat.com> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> <382425903.12347329.1394480286571.JavaMail.zimbra@redhat.com> Message-ID: <4EC8429AA448A54D86E52F450C43247E74223C93@EXCHMBX03.fed.cclrc.ac.uk> No, we are not using NFS. Our setup is:
1. Two-node cluster with the two_node option.
2. Hitachi SAN (RAID 6) connected to both nodes via 4Gb fibre channel.
3. One 10TB, two 4TB and one 2TB disk presented to each node, using gfs2 (a separate file system on each disk) with user quotas enabled. Only the two nodes in the cluster mount the drives.
4. A user fills up their quota on the 10TB disk and the system crashes (which appears to be a consistent outcome). The quota was only 10G for the user, so they were not using a vast amount of space.
In total 5TB is currently used on the drive:

Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_chadwick-LogVol00  9.8G  3.4G  6.0G  36% /
tmpfs                             253G   47M  253G   1% /dev/shm
/dev/mapper/mpathap3             1008M  148M  810M  16% /boot
/dev/mapper/vg_chadwick-LogVol06   11T  4.7T  5.8T  45% /home
/dev/mapper/vg_chadwick-LogVol05  9.8G  7.5G  1.8G  82% /opt
/dev/mapper/vg_chadwick-LogVol01  5.0G  140M  4.6G   3% /tmp
/dev/mapper/vg_chadwick-LogVol02  9.8G  8.3G  976M  90% /usr
/dev/mapper/vg_chadwick-LogVol03  5.0G  2.7G  2.1G  57% /var
/dev/mapper/sanvg1-sanlv1         4.0T  2.9T  1.2T  71% /san1
/dev/mapper/sanvg2-sanlv2         4.0T  3.2T  851G  80% /san2
/dev/mapper/sanvg3-sanlv3         2.0T  1.8T  259G  88% /san3
/dev/mapper/sanvg4-lvol0           10T  5.1T  5.0T  51% /san4

Filesystem                           Inodes   IUsed      IFree IUse% Mounted on
/dev/mapper/vg_chadwick-LogVol00     647168   54317     592851    9% /
tmpfs                              66157732      58   66157674    1% /dev/shm
/dev/mapper/mpathap3                  65536      62      65474    1% /boot
/dev/mapper/vg_chadwick-LogVol06  749502464 1002734  748499730    1% /home
/dev/mapper/vg_chadwick-LogVol05     647168  236023     411145   37% /opt
/dev/mapper/vg_chadwick-LogVol01     327680     378     327302    1% /tmp
/dev/mapper/vg_chadwick-LogVol02     647168  318728     328440   50% /usr
/dev/mapper/vg_chadwick-LogVol03     327680    7228     320452    3% /var
/dev/mapper/sanvg1-sanlv1         320266537  140997  320125540    1% /san1
/dev/mapper/sanvg2-sanlv2         223028034   44074  222983960    1% /san2
/dev/mapper/sanvg3-sanlv3          67820453    8357   67812096    1% /san3
/dev/mapper/sanvg4-lvol0         1336002497  392526 1335609971    1% /san4

Thanks, Stephen.

-----Original Message----- From: Abhijith Das [mailto:adas at redhat.com] Sent: 10 March 2014 19:38 To: linux clustering Subject: Re: [Linux-cluster] gfs2 and quotas - system crash ----- Original Message ----- > From: "stephen rankin" > To: linux-cluster at redhat.com > Sent: Monday, March 10, 2014 1:15:08 PM > Subject: [Linux-cluster] gfs2 and quotas - system crash > > Hello, > > > > When using gfs2 with quotas on a SAN that is providing storage to two > clustered systems running CentOS6.5, one of the systems can crash. 
> This crash appears to be caused when a user tries to add something to > a SAN disk when they have exceeded their quota on that disk. Sometimes > a stack trace is produced in /var/log/messages which appears to > indicate that it was gfs2 that caused the problem. > At the same time you get the gfs2 stack trace you also see problems > with someone exceeding their quota. > > The stack trace is below. > > Has anyone got a solution to this, other than switching of quotas? I > have switched of quotas which appears to have stabilised the system so > far, but I do need the quotas on. > > Your help is appreciated. > Hi Stephen, We have another report of this bug when gfs2 was exported using NFS. https://bugzilla.redhat.com/show_bug.cgi?id=1059808. Are you using NFS in your setup as well? We have not able to reproduce it to figure out what might be going on. Do you have a set procedure that you're able to recreate with reliably? If so, it would be of great help. Also, more info about your setup (file sizes, number of files, how many nodes mounting gfs2, what kinds of operations are being run) etc would be helpful as well. Cheers! --Abhi > Stephen Rankin > STFC, RAL, ISIS > > Mar 5 11:40:50 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota > exceeded for user 101355 Mar 5 11:40:50 chadwick nslcd[11420]: > [767df3] ldap_explode_dn(usi660) returned NULL: Success Mar 5 > 11:40:50 chadwick nslcd[11420]: [767df3] ldap_result() failed: Invalid > DN syntax Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] lookup of > user usi660 failed: > Invalid DN syntax > Mar 5 11:41:46 chadwick kernel: ------------[ cut here ]------------ > Mar 5 11:41:46 chadwick kernel: WARNING: at lib/list_debug.c:26 > __list_add+0x6d/0xa0() (Not tainted) > Mar 5 11:41:46 chadwick kernel: Hardware name: PowerEdge R910 Mar 5 > 11:41:46 chadwick kernel: list_add corruption. next->prev should be > prev (ffff8820531518d0), but was ffff884d4c4594d0. (next=ffff884d4c4594d0). 
> Mar 5 11:41:46 chadwick kernel: Modules linked in: gfs2 dlm configfs > bridge > autofs4 des_generic ecb md4 nls_utf8 cifs bnx2fc cnic uio fcoe libfcoe > libfc 8021q garp stp llc ipv6 microcode power_meter iTCO_wdt > iTCO_vendor_support dcdbas serio_raw ixgbe dca ptp pps_core mdio > lpc_ich mfd_core sg ses enclosure i7core_edac edac_core bnx2 ext4 jbd2 > mbcache dm_round_robin sr_mod cdrom sd_mod crc_t10dif qla2xxx > scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix megaraid_sas > dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: > speedstep_lib] Mar 5 11:41:46 chadwick kernel: Pid: 74823, comm: > vncserver Not tainted > 2.6.32-431.3.1.el6.x86_64 #1 > Mar 5 11:41:46 chadwick kernel: Call Trace: > Mar 5 11:41:46 chadwick kernel: [] ? > warn_slowpath_common+0x87/0xc0 > Mar 5 11:41:46 chadwick kernel: [] ? > warn_slowpath_fmt+0x46/0x50 > Mar 5 11:41:46 chadwick kernel: [] ? > __list_add+0x6d/0xa0 Mar 5 11:41:46 chadwick kernel: > [] ? new_inode+0x72/0xb0 Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_create_inode+0x1b5/0x1150 [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_glock_nq_init+0x16/0x40 [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_mkdir+0x24/0x30 [gfs2] Mar 5 11:41:46 chadwick kernel: > [] ? > security_inode_mkdir+0x1f/0x30 > Mar 5 11:41:46 chadwick kernel: [] ? > vfs_mkdir+0xd9/0x140 Mar 5 11:41:46 chadwick kernel: [] ? > sys_mkdirat+0xc7/0x1b0 > Mar 5 11:41:46 chadwick kernel: [] ? > sys_mkdir+0x18/0x20 Mar 5 11:41:46 chadwick kernel: [] ? 
> system_call_fastpath+0x16/0x1b > Mar 5 11:41:46 chadwick kernel: ---[ end trace e51734a39976a028 ]--- > Mar 5 11:41:46 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota > exceeded for user 101355 Mar 5 11:41:47 chadwick abrtd: Directory > 'oops-2014-03-05-11:41:47-12194-1' > creation detected > Mar 5 11:41:47 chadwick abrt-dump-oops: Reported 1 kernel oopses to > Abrt Mar 5 11:41:47 chadwick abrtd: Can't open file > '/var/spool/abrt/oops-2014-03-05-11:41:47-12194-1/uid': No such file > or directory Mar 5 11:41:54 chadwick kernel: GFS2: > fsid=analysis:lvol0.1: quota exceeded for user 101355 > > > > > -- > Scanned by iCritical. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Scanned by iCritical. From eranb at celltick.com Tue Mar 11 12:02:41 2014 From: eranb at celltick.com (Eran Ben Natan) Date: Tue, 11 Mar 2014 12:02:41 +0000 Subject: [Linux-cluster] A Newbie question about HA fail over Message-ID: <705C9B8622696B478640B16B7A1A1B94106EF8@Cobra.celltick.com> Hi, I have just set up a 2 nodes RH cluster with MySQL. I was able to start the service and relocate it to the other node. When I restart the active node, MySQL relocates automatically to the other node, but when I disconnect the active node from the network, it doesn't. Is this behavior normal? How can I set the resource to relocate in this situation? Thanks, Eran Ben-Natan | R&D Infrastructure -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From emi2fast at gmail.com Tue Mar 11 13:43:53 2014 From: emi2fast at gmail.com (emmanuel segura) Date: Tue, 11 Mar 2014 14:43:53 +0100 Subject: [Linux-cluster] A Newbie question about HA fail over In-Reply-To: <705C9B8622696B478640B16B7A1A1B94106EF8@Cobra.celltick.com> References: <705C9B8622696B478640B16B7A1A1B94106EF8@Cobra.celltick.com> Message-ID: Maybe you forgot to show us your cluster.conf; also tell us which cluster network you disconnected. 2014-03-11 13:02 GMT+01:00 Eran Ben Natan : > Hi, > > > > I have just set up a 2 nodes RH cluster with MySQL. I was able to start > the service and relocate it to the other node. > > When I restart the active node, MySQL relocates automatically to the other > node, but when I disconnect the active node from the network, it doesn't. > > Is this behavior normal? How can I set the resource to relocate in this > situation? > > > > Thanks, > > > > *Eran Ben-Natan | R&D Infrastructure* > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From lipson12 at yahoo.com Wed Mar 12 06:31:27 2014 From: lipson12 at yahoo.com (Kaisar Ahmed Khan) Date: Tue, 11 Mar 2014 23:31:27 -0700 (PDT) Subject: [Linux-cluster] iscsi sysmlink create problem In-Reply-To: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> References: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> Message-ID: <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> Dear Experts: The following rule is not working to create a symlink for an iscsi disk. My iscsi device is /dev/sda and I want to link it as /dev/iscsi/vendor_kernel. Please guide me if I have missed anything.
ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", SYMLINK+=iscsi/%E{ID_VENDOR}_%K", MODE="0664" Thanks ?kaisar -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Wed Mar 12 11:43:37 2014 From: emi2fast at gmail.com (emmanuel segura) Date: Wed, 12 Mar 2014 12:43:37 +0100 Subject: [Linux-cluster] iscsi sysmlink create problem In-Reply-To: <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> References: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> Message-ID: but it's a cluster problem? ummm 2014-03-12 7:31 GMT+01:00 Kaisar Ahmed Khan : > > > Dear Experts : > > following rule is not working to create symlink for iscsi disk , my iscsi > device ID /dev/sda and want to link as /dev/iscsi/vendor_kernel > > please guide me if i miss anything . > > ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", > SYMLINK+=iscsi/%E{ID_VENDOR}_%K", MODE="0664" > > Thanks > kaisar > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From ekuric at redhat.com Wed Mar 12 12:01:44 2014 From: ekuric at redhat.com (Elvir Kuric) Date: Wed, 12 Mar 2014 13:01:44 +0100 Subject: [Linux-cluster] iscsi sysmlink create problem In-Reply-To: References: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> Message-ID: <53204CA8.2060300@redhat.com> On 03/12/2014 12:43 PM, emmanuel segura wrote: > but it's a cluster problem? 
ummm > > 2014-03-12 7:31 GMT+01:00 Kaisar Ahmed Khan >: > > > > Dear Experts : > > following rule is not working to create symlink for iscsi disk , > my iscsi device ID /dev/sda and want to link as > /dev/iscsi/vendor_kernel > > please guide me if i miss anything . > > ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", > SYMLINK+=iscsi/%E{ID_VENDOR}_%K", MODE="0664" > > Thanks > kaisar > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > -- > esta es mi vida e me la vivo hasta que dios quiera > > If you share more information with us (OS version), it could help. Also, the outputs below can help in understanding how the system sees the device.
If RHEL 5 (and clones):
#udevinfo -a -p $(udevinfo -q path -n /dev/DEVICE)
If RHEL 6 (and clones):
# udevadm info --query=all -n /dev/DEVICE --attribute-walk
where DEVICE is the device you want to write a udev rule for. Kind regards, -- Elvir Kuric,TSE / Red Hat / GSS EMEA / -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Vallevand at UNISYS.com Wed Mar 12 14:43:41 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Wed, 12 Mar 2014 09:43:41 -0500 Subject: [Linux-cluster] Resource placement rules Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E0C32347@USEA-EXCH8.na.uis.unisys.com> I have resources A, B, C and D. (Or more.) All are using agent X. Is there a way to simply specify that resources A, B, C and D must each run on a different node? I can create a series of negative infinity collocation rules something like:
collocation c1 -inf: A ( B C D )
collocation c2 -inf: B ( A C D )
collocation c3 -inf: C ( A B D )
collocation c4 -inf: D ( A B C )
Is that my choice? Will that have the effect I want? Is there a more concise way to specify it? It would be nice to just set an attribute saying any resource using agent X must run on its own node. Regards. 
Mark K Vallevand Mark.Vallevand at Unisys.com May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas at hastexo.com Thu Mar 13 10:34:33 2014 From: andreas at hastexo.com (Andreas Kurz) Date: Thu, 13 Mar 2014 11:34:33 +0100 Subject: [Linux-cluster] Resource placement rules In-Reply-To: <99C8B2929B39C24493377AC7A121E21FC5E0C32347@USEA-EXCH8.na.uis.unisys.com> References: <99C8B2929B39C24493377AC7A121E21FC5E0C32347@USEA-EXCH8.na.uis.unisys.com> Message-ID: <532189B9.2060108@hastexo.com> On 2014-03-12 15:43, Vallevand, Mark K wrote: > I have resources A, B, C and D. (Or more.) All are using agent X. Is > there a way to simply specify that resources A, B, C and D must each run > on a different node? > > I can create a series of negative infinity collocation rules something like: > > collocation c1 ?inf: A ( B C D ) > > collocation c2 ?inf: B ( A C D ) > > collocation c3 ?inf: C ( A B D ) > > collocation c4 ?inf: D ( A B C ) > > Is that my choice? Will that have the affect I want? Is there a more > concise way to specify it? > > It would be nice to just set an attribute saying any resource using > agent X must run on its own node. All resources use agentX ... what keeps you from using a clone resource? Regards, Andreas > > > > Regards. > Mark K Vallevand Mark.Vallevand at Unisys.com > > > May you live in interesting times, may you come to the attention of > important people and may all your wishes come true. > > THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY > MATERIAL and is thus for use only by the intended recipient. 
If you > received this in error, please contact the sender and delete the e-mail > and its attachments from all computers. > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 287 bytes Desc: OpenPGP digital signature URL: From pine5514 at gmail.com Tue Mar 18 13:38:00 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 17:08:00 +0330 Subject: [Linux-cluster] unformat gfs2 Message-ID: I have accidentally reformatted a GFS cluster. We need to unformat it.. is there any way to recover disk ? I read this post http://web.archiveorange.com/archive/v/TUhSn11xEn9QxXBIZ0k6 it say that I can use gfs2_edit to recover data. I need more details about changing block map to 0xff tnx -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Tue Mar 18 16:10:30 2014 From: lists at alteeve.ca (Digimer) Date: Tue, 18 Mar 2014 12:10:30 -0400 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: Message-ID: <53286FF6.40707@alteeve.ca> On 18/03/14 09:38 AM, Mr.Pine wrote: > I have accidentally reformatted a GFS cluster. > We need to unformat it.. is there any way to recover disk ? > > I read this post > http://web.archiveorange.com/archive/v/TUhSn11xEn9QxXBIZ0k6 > > it say that I can use gfs2_edit to recover data. > I need more details about changing block map to 0xff > > tnx Do you have a support agreement with Red Hat? If so, open a ticket with them. If not, then you can try also asking for help in freenode's #linux-cluster channel. It says "no gfs support", but that's to prevent confusion with tracking open tickets, which won't apply if you don't have official red hat support. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
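Before any recovery experiments on a reformatted device, the usual first step is a block-for-block copy, so that destructive attempts can be repeated against a scratch image. In practice this is a single dd invocation (dd if=/dev/DEVICE of=image bs=1M conv=noerror,sync); the sketch below shows the same idea in Python, with the error handling hedged as a rough approximation rather than an exact equivalent of dd's behaviour:

```python
import os


def image_device(src_path, dst_path, block_size=1 << 20):
    """Block-for-block copy of src_path to dst_path.

    Unreadable blocks are replaced with zeros rather than aborting,
    roughly what `dd conv=noerror,sync` does, so a partially damaged
    device still yields a complete image to experiment on.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            try:
                block = src.read(block_size)
            except IOError:
                # Skip the unreadable region and pad the image with
                # zeros (best-effort; real tools handle this per sector).
                src.seek(block_size, os.SEEK_CUR)
                block = b"\x00" * block_size
            if not block:
                break
            dst.write(block)
```

Tools such as fsck.gfs2 or gfs2_edit can then be pointed at the image file instead of the original device.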
From rpeterso at redhat.com Tue Mar 18 16:38:07 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 12:38:07 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: Message-ID: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> ----- Original Message ----- > I have accidentally reformatted a GFS cluster. > We need to unformat it.. is there any way to recover disk ? > > I read this post > http://web.archiveorange.com/archive/v/TUhSn11xEn9QxXBIZ0k6 > > it say that I can use gfs2_edit to recover data. > I need more details about changing block map to 0xff > > tnx Hi, Sorry to hear about your file system mishap. It's not clear from your post whether you mean GFS or GFS2. Your subject line says GFS2, but your comment said you reformatted it to GFS. So my first questions are: What was it really? and What is it now? Assuming it was, and still is, gfs2, there's another important question: Was the file system _ever_ grown via gfs2_grow since the very first mkfs? If so, the subsequent mkfs would most likely place the resource groups in a different location, so the file system would be damaged beyond repair. The next important question is: did you override any of the mkfs.gfs2 parameters either the first time or the second time, like -b, -J, -j, or -r? Once again, if you specified a different block size (-b) or resource group size, the second mkfs.gfs2 would have placed the resource groups in different locations, once again damaging the original contents beyond repair. The third important question is: Was the device altered in any other way, for example, mkfs.ext4 or mkfs.xfs, which might have changed things? If so, it's probably done irreparable damage. However, if you never ran gfs2_grow, and never overrode -b or -r during either mkfs, the mkfs would likely have placed the resource groups in the exact same locations as it did the first time. 
In that case, you might be able to repair the file system by doing what you describe: Setting all the bits in the bitmaps to 0xff, then letting fsck.gfs2 sort it out. Unfortunately, there is no tool that can do this en-mass. You could manually set the bits to 0xff with gfs2_edit, but depending on the size of the file system, it would take a very long time. If it was my valuable data, and I had no backup, I would first make a block-for-block copy of the entire device so I had a sandbox to run experiments on. Next, I'd write a program that opened the block device, did a block-by-block search for GFS2 dinodes, then twiddle that block's bitmap from 0 to 3. Then I'd run fsck.gfs2 to see how well it can put the pieces back together. That program would have to be a hybrid with pieces pulled from fsck.gfs2 and gfs2_edit. It's no small task, and you'd have to know what you're doing. Unfortunately, there are only a handful of programmers who know enough about this to do it correctly (I'm one of them). All of them work for Red Hat. As much as it sounds like a fun project, it would probably be considered a conflict of interest, unless you somehow hired Red Hat and got my management involved. If I got their blessing, I'd be happy to do this. The trouble is, even with such a program, there are no guarantees that your file system would be back the way it used to be. It would likely be cheaper and better to restore from a backup. Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 17:11:27 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 20:41:27 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 8:08 PM, Bob Peterson wrote: > > ----- Original Message ----- > > I have accidentally reformatted a GFS cluster. > > We need to unformat it.. 
is there any way to recover disk ? > > > > I read this post > > http://web.archiveorange.com/archive/v/TUhSn11xEn9QxXBIZ0k6 > > > > it say that I can use gfs2_edit to recover data. > > I need more details about changing block map to 0xff > > > > tnx > > Hi, > > Sorry to hear about your file system mishap. > > It's not clear from your post whether you mean GFS or GFS2. Your subject > like says GFS2, but your comment said you reformatted it to GFS. > So my first questions are: What was it really? and What is it now? It's GFS2 . > > Assuming it was, and still is, gfs2, there's another important question: > Was the file system _ever_ grown via gfs2_grow since the very first mkfs? > If so, the subsequent mkfs would most likely place the resource groups in > a different location, so the file system would be damaged beyond repair. Without any use of gfs2_grow, > > The next important question is: did you override any of the mkfs.gfs2 > parameters either the first time or the second time, like -b, -J, -j, > or -r? Once again, if you specified a different block size (-b) or > resource group size, the second mkfs.gfs2 would have placed the > resource groups in different locations, once again damaging the original > contents beyond repair. > All options are equal .. > The third important question is: Was the device altered in any other > way, for example, mkfs.ext4 or mkfs.xfs, which might have changed things? > If so, it's probably done irreparable damage No, . > > However, if you never ran gfs2_grow, and never overrode -b or -r during > either mkfs, the mkfs would likely have placed the resource groups in the > exact same locations as it did the first time. In that case, you might > be able to repair the file system by doing what you describe: Setting > all the bits in the bitmaps to 0xff, then letting fsck.gfs2 sort it out. > > Unfortunately, there is no tool that can do this en-mass. 
> You could manually set the bits to 0xff with gfs2_edit, but depending > on the size of the file system, it would take a very long time. > > If it was my valuable data, and I had no backup, I would first > make a block-for-block copy of the entire device so I had a sandbox > to run experiments on. Next, I'd write a program that opened the block > device, did a block-by-block search for GFS2 dinodes, then twiddle that > block's bitmap from 0 to 3. Then I'd run fsck.gfs2 to see how well it can There is many many block types (http://linux.die.net/man/8/gfs2_edit), Do we find 4 (Dinode)? instead of for example 3 (Resource Group Bitmap)? How we could fine bitmaps? > put the pieces back together. That program would have to be a hybrid > with pieces pulled from fsck.gfs2 and gfs2_edit. It's no small task, > and you'd have to know what you're doing. Unfortunately, there are only > a handful of programmers who know enough about this to do it correctly > (I'm one of them). All of them work for Red Hat. As much as it sounds > like a fun project, it would probably be considered a conflict of > interest, unless you somehow hired Red Hat and got my management > involved. If I got their blessing, I'd be happy to do this. The trouble is, > even with such a program, there are no guarantees that your file system > would be back the way it used to be. It would likely be cheaper and > better to restore from a backup. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Tue Mar 18 17:30:44 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 13:30:44 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> Message-ID: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> ----- Original Message ----- > It's GFS2 . 
> Without any use of gfs2_grow, > All options are equal .. > No, > > There is many many block types (http://linux.die.net/man/8/gfs2_edit), > Do we find 4 (Dinode)? instead of for example 3 (Resource Group Bitmap)? > > How we could fine bitmaps? Hi, To use gfs2_edit properly, you should have an understanding of how the gfs2 file system is kept on disk. If you are a Red Hat customer, I have several videos on the Red Hat customer portal on how to use gfs2_edit. The dinode blocks are type 4. There are two kinds of bitmaps: bitmaps associated with rgrps (type 2) and bitmaps that follow rgrps (type 3). The rgrps are indexed by the rindex system file in the master directory, and ri_length tells you how many bitmap blocks follow each rgrp block. In newer versions (RHEL6+) there are little helper functions in gfs2_edit that can tell you the bitmap status, and alter it. For example: gfs2_edit -p blocktype /dev/your/device This command will tell you the block type, for example: # gfs2_edit -p root blocktype /dev/mpathc/scratch 4 (Block 22 is type 4: Dinode) gfs2_edit -p blockalloc /dev/your/device This command will tell you the current bitmap setting. The bitmap setting may be changed to "3" (dinode) with: gfs2_edit -p blockalloc 3 /dev/your/device So you could write a script to do it, but again, you would have to be careful, and work on a copy, never the original. Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 17:43:23 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 21:13:23 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 9:00 PM, Bob Peterson wrote: > ----- Original Message ----- >> It's GFS2 . >> Without any use of gfs2_grow, >> All options are equal .. 
>> No, >> >> There is many many block types (http://linux.die.net/man/8/gfs2_edit), >> Do we find 4 (Dinode)? instead of for example 3 (Resource Group Bitmap)? >> >> How we could fine bitmaps? > > Hi, > > To use gfs2_edit properly, you should have an understanding of how the > gfs2 file system is kept on disk. If you are a Red Hat customer, I have > several videos on the Red Hat customer portal on how to use gfs2_edit. > > The dinode blocks are type 4. There are two kinds of bitmaps: bitmaps > associated with rgrps (type 2) and bitmaps that follow rgrps (type 3). > The rgrps are indexed by the rindex system file in the master directory, > and ri_length tells you how many bitmap blocks follow each rgrp block. > > In newer versions (RHEL6+) there are little helper functions in gfs2_edit > that can tell you the bitmap status, and alter it. For example: > > gfs2_edit -p blocktype /dev/your/device > > This command will tell you the block type, for example: > # gfs2_edit -p root blocktype /dev/mpathc/scratch > 4 (Block 22 is type 4: Dinode) > > gfs2_edit -p blockalloc /dev/your/device > This command will tell you the current bitmap setting. The bitmap setting > may be changed to "3" (dinode) with: > gfs2_edit -p blockalloc 3 /dev/your/device > > So you could write a script to do it, but again, you would have to be > careful, and work on a copy, never the original. > > Regards, > > Bob Peterson > Red Hat File Systems > What is your opinion about this scrip? for ((i = 17; i < 1756377984; ++i)); do gfs2_edit -p $i blockalloc 3 /dev/sdb >/dev/null 2>&1; done Could we change all of block allocations to "3"? 
> -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Tue Mar 18 17:55:48 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 13:55:48 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> Message-ID: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> ----- Original Message ----- > What is your opinion about this scrip? > > for ((i = 17; i < 1756377984; ++i)); do gfs2_edit -p $i blockalloc 3 > /dev/sdb >/dev/null 2>&1; done > > Could we change all of block allocations to "3"? Hi, That would be dangerous. I would hope that the resource groups would be ignored, but I've never done it. I wouldn't be surprised if you had gfs2_edit segfault for some of it. At the very least, you would be turning all your journal blocks to appear like dinodes, as well as all the extended attributes, directory leaf blocks, etc., which will confuse fsck.gfs2. The fsck.gfs2 will do a much better job if you only change the dinode blocks from 0 to 3. You would be much better off writing a loop that first checked the block's current type, with -p blocktype, and only change its bit to 3 if it is type 4 (dinode). Also, the script would take a very long time, because it's going to invoke gfs2_edit a billion and a half times. Writing a program to do this once would be quicker.
Regards, Bob Peterson Red Hat File Systems From rpeterso at redhat.com Tue Mar 18 18:06:40 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 14:06:40 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> Message-ID: <296851692.1537300.1395166000377.JavaMail.zimbra@redhat.com> ----- Original Message ----- > ----- Original Message ----- > > What is your opinion about this scrip? > > > > for ((i = 17; i < 1756377984; ++i)); do gfs2_edit -p $i blockalloc 3 > > /dev/sdb >/dev/null 2>&1; done > > > > Could we change all of block allocations to "3"? > > Hi, > > gfs2_edit segfault for some of it. At the very least, you would be turning > all your journal blocks to appear like dinodes, as well as all the This is worth clarifying: You should be careful with the journals. The journal blocks may look like dinodes, but they should be marked as data blocks. That's because the journal's data can contain dinodes. Will fsck.gfs2 figure it out properly? I don't know. It might, but it might not. Better to be safe with your data. Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 18:11:07 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 21:41:07 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 9:25 PM, Bob Peterson wrote: > ----- Original Message ----- >> What is your opinion about this scrip? 
>> >> for ((i = 17; i < 1756377984; ++i)); do gfs2_edit -p $i blockalloc 3 >> /dev/sdb >/dev/null 2>&1; done >> >> Could we change all of block allocations to "3"? > > Hi, > > That would be dangerous. I would hope that the resource groups would be > ignored, but I've never done it. I wouldn't be surprised if you had > gfs2_edit segfault for some of it. At the very least, you would be turning > all your journal blocks to appear like dinodes, as well as all the > extended attributes, directory leaf blocks, etc., which will confuse > fsck.gfs2. The fsck.gfs2 will do a much better job if you only change > the dinode blocks from 0 to 3. > > You would be much better off writing a loop that first checked the > block's current type, with -p blocktype, and only > change its bit to 3 if it is type 4 (dinode). > > Also, the script would take a very long time, because it's going to > invoke gfs2_edit a billion and a half times. Writing a program to > do this once would be quicker. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Hi, Do you mean program likes this? for ((i = 17; i < 1756377984; ++i)); do ss=$(gfs2_edit -p $i blocktype /dev/sdc | cut -d " " -f 1); if [[ $ss -eq 4 ]]; then gfs2_edit -p $i blockalloc 3 /dev/sdb >/dev/null 2>&1; fi done I'm a C/C++ programmer, if you trust program logic, i would try to implement with C/C++. and would public in reply. Regards. Pine.
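Pine's shell loop above reduces to a small C predicate: per the thread, every GFS2 metadata block begins with the big-endian magic number 0x01161970, followed by a big-endian type field where type 4 means dinode. A minimal sketch of that check, assuming the constants from gfs2_ondisk.h; the device-reading loop, block-size handling, and the journal-block caveat Bob raises are omitted, and `is_gfs2_dinode` is an illustrative name, not part of any GFS2 tool. (Note also that the script as posted reads blocktype from /dev/sdc but writes blockalloc to /dev/sdb; presumably a single device is intended.)

```c
#include <stdint.h>

/* On-disk constants as described in the thread (verify against
 * gfs2_ondisk.h): metadata magic at offset 0, type field at offset 4,
 * both stored big-endian; type 4 = dinode. */
#define GFS2_MAGIC       0x01161970u
#define GFS2_METATYPE_DI 4u

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return 1 if this block buffer starts with a dinode metadata header. */
static int is_gfs2_dinode(const uint8_t *block)
{
    return be32(block) == GFS2_MAGIC &&
           be32(block + 4) == GFS2_METATYPE_DI;
}
```

A scanner would read each filesystem block from the copied device, apply this test, and remember the block numbers whose bitmap entries need changing.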
From rpeterso at redhat.com Tue Mar 18 18:22:19 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 14:22:19 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> Message-ID: <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hi, > > Do you mean program likes this? > > for ((i = 17; i < 1756377984; ++i)); do > ss=$(gfs2_edit -p $i blocktype /dev/sdc | cut -d " " -f 1); > if [[ $ss -eq 4 ]]; then > gfs2_edit -p $i blockalloc 3 /dev/sdb >/dev/null 2>&1; > fi > done > > I'm a C/C++ programmer, if you trust program logic, i would try > to implement with C/C++. and would public in reply. > > Regards. > Pine. Hi, Yes, you can do something like that, but again, do not include the journal's blocks. You can do gfs2_edit -p master /dev/sdc to determine the block of the quota file, which should be past the journals. Then use that value for the starting point of i. For example: # gfs2_edit -p master /dev/mpathc/scratch | grep quota 8/8 [6c1c0fed] 12/33132 (0xc/0x816c): File quota for ((i = 33133; ... Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 18:31:24 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 22:01:24 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 9:52 PM, Bob Peterson wrote: > ----- Original Message ----- >> Hi, >> >> Do you mean program likes this? 
>> >> for ((i = 17; i < 1756377984; ++i)); do >> ss=$(gfs2_edit -p $i blocktype /dev/sdc | cut -d " " -f 1); >> if [[ $ss -eq 4 ]]; then >> gfs2_edit -p $i blockalloc 3 /dev/sdb >/dev/null 2>&1; >> fi >> done >> >> I'm a C/C++ programmer, if you trust program logic, i would try >> to implement with C/C++. and would public in reply. >> >> Regards. >> Pine. > > Hi, > > Yes, you can do something like that, but again, do not include the > journal's blocks. You can do gfs2_edit -p master /dev/sdc to > determine the block of the quota file, which should be past the > journals. Then use that value for the starting point of i. > For example: > # gfs2_edit -p master /dev/mpathc/scratch | grep quota > 8/8 [6c1c0fed] 12/33132 (0xc/0x816c): File quota > for ((i = 33133; ... > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Hi, Output of this command on my system: gfs2_edit -p master /dev/sdb | grep quota 8. (8). 264950 (0x40af6): File quota Do you mean "i" would start from 264950? All blocks before 264950 are journal blocks? Regards Pine. From rpeterso at redhat.com Tue Mar 18 18:36:08 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 14:36:08 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> Message-ID: <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hi, > > Output of this command on my system: > > gfs2_edit -p master /dev/sdb | grep quota > 8. (8). 264950 (0x40af6): File quota > > Do you mean "i" would start from 264950? > All blocks before 264950 are journal blocks? > > Regards > Pine. Yes, exactly. 
You could try that and see how well it works, but again, it might take a very long time to do more than 3 billion invocations of the program. Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 18:42:47 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 22:12:47 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> References: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> Message-ID: Tnx alot, I would run script and send back result. Regards. Pine. On Tue, Mar 18, 2014 at 10:06 PM, Bob Peterson wrote: > ----- Original Message ----- >> Hi, >> >> Output of this command on my system: >> >> gfs2_edit -p master /dev/sdb | grep quota >> 8. (8). 264950 (0x40af6): File quota >> >> Do you mean "i" would start from 264950? >> All blocks before 264950 are journal blocks? >> >> Regards >> Pine. > > Yes, exactly. You could try that and see how well it works, > but again, it might take a very long time to do more than > 3 billion invocations of the program. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Wed Mar 19 01:27:04 2014 From: lists at alteeve.ca (Digimer) Date: Tue, 18 Mar 2014 21:27:04 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' Message-ID: <5328F268.9080605@alteeve.ca> Hi all, I would like to tell rgmanager to give more time for VMs to stop.
I want this: I already use ccs to create the entry: via: ccs -h localhost --activate --sync --password "secret" \ --addvm vm01-win2008 \ --domain="primary_n01" \ path="/shared/definitions/" \ autostart="0" \ exclusive="0" \ recovery="restart" \ max_restarts="2" \ restart_expire_time="600" I'm hoping it's a simple additional switch. :) Thanks! -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From pine5514 at gmail.com Wed Mar 19 06:23:14 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Wed, 19 Mar 2014 10:53:14 +0430 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 11:12 PM, Mr.Pine wrote: > Tnx alot, > > I would run script and send back result. > Regards. > Pine. > > Hi, Scripts is very very slow, so i should write program in c/c++. I need some confidence about data structures and data location on disk. As i reviewed blocks of data: All reserved blocks (GFS2 specific blocks) start by : 0x01161970 Blocktype store location is at Byte # 8, Type of start block of each resource group is: 2 Bitmaps are in block types 2 & 3. In block type 2, bitmap info starts from Byte # 129 In block type 3, bitmap info starts from Byte # 25 Length of RGs are const, 5 in my volume (out put of gfs2_edit -p rindex /dev/..) Is this info right? Logic of my program seams should be like this: (1) Loop in device and temporary store block id of dinode blocks, and also their bitmap locations (2) Change bitmap of blocks to 3 (11) Bob, could you confirm this? Regards Pine. 
From rpeterso at redhat.com Wed Mar 19 12:28:38 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Wed, 19 Mar 2014 08:28:38 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> Message-ID: <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hi, > > Scripts is very very slow, so i should write program in c/c++. > > I need some confidence about data structures and data location on disk. > As i reviewed blocks of data: > > All reserved blocks (GFS2 specific blocks) start by : 0x01161970 > Blocktype store location is at Byte # 8, > Type of start block of each resource group is: 2 > Bitmaps are in block types 2 & 3. > In block type 2, bitmap info starts from Byte # 129 > In block type 3, bitmap info starts from Byte # 25 > Length of RGs are const, 5 in my volume (out put of gfs2_edit -p rindex > /dev/..) > > Is this info right? > > Logic of my program seams should be like this: > > (1) > Loop in device and temporary store block id of dinode blocks, and also > their bitmap locations > > (2) > Change bitmap of blocks to 3 (11) > > Bob, could you confirm this? > > Regards > Pine. Hi Pine, This is correct. The length of RGs is properly determined by the values in the "rindex" system file, but 5 is very common, and is usually constant. (It may change if you used gfs2_grow or gfs2_convert from gfs1). The bitmap is 2 bits per block in the resource group, and it's relative to the start of the particular rgrp. You should probably use the same algorithm in libgfs2 to change the proper bit in the bitmaps. You can get this from the public gfs2-utils git tree. 
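The bit manipulation Bob describes can be sketched in a few lines of C: 2 bits per block, packed 4 blocks per byte starting from the low-order bits, with bitmap data beginning at byte 128 of the rgrp block itself and byte 24 of each following bitmap block (the offsets stated in the thread, 1-indexed there as 129 and 25). This is a sketch of the setbit logic only, assuming it matches libgfs2's helpers; verify the offsets and packing against gfs2_ondisk.h and the gfs2-utils source before using it.

```c
#include <stdint.h>
#include <stddef.h>

#define GFS2_BLKST_DINODE  3u   /* binary 11, "change bitmap to 3" */
#define RGRP_BITMAP_OFFSET 128  /* bitmap data start in the rgrp block */
#define META_BITMAP_OFFSET 24   /* bitmap data start in following bitmap blocks */

/* Set one block's 2-bit allocation state to dinode (3) within a bitmap
 * buffer.  rel_blk is 0-based, relative to the first block this buffer
 * covers; each byte holds 4 entries, low-order bits first. */
static void set_dinode_state(uint8_t *bitmap, uint64_t rel_blk)
{
    size_t   byte  = rel_blk / 4;                 /* 4 entries per byte */
    unsigned shift = (unsigned)(rel_blk % 4) * 2; /* position in the byte */
    bitmap[byte] &= (uint8_t)~(3u << shift);              /* clear old state */
    bitmap[byte] |= (uint8_t)(GFS2_BLKST_DINODE << shift); /* set to 3 */
}
```

A full program would use the rindex entries (rgrp start address and ri_length) to decide which bitmap buffer and relative offset a given absolute block number falls into before calling this.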
Regards, Bob Peterson Red Hat File Systems From mgrac at redhat.com Wed Mar 19 14:37:50 2014 From: mgrac at redhat.com (Marek Grac) Date: Wed, 19 Mar 2014 15:37:50 +0100 Subject: [Linux-cluster] joining the gitfence-agents group In-Reply-To: References: Message-ID: <5329ABBE.70304@redhat.com> On 03/18/2014 12:20 AM, David Smith wrote: > any chance you can sponsor and approve so i can submit the code via git? > > or, if you prefer, I can send you the code modifications. > Hi, sorry for late response, The write access to git repository is still limited to very small group of people and I will be happy to add you there after you become regular contributor. Currently, please send a patch to cluster-devel at redhat.com where code review will be done. After that review, we will add your code into upstream using git-am so you will be preserved as author. m, From ajb2 at mssl.ucl.ac.uk Wed Mar 19 15:16:57 2014 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Wed, 19 Mar 2014 15:16:57 +0000 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: Message-ID: <5329B4E9.9070400@site.mssl.ucl.ac.uk> On 18/03/14 13:38, Mr.Pine wrote: > I have accidentally reformatted a GFS cluster. > We need to unformat it.. is there any way to recover disk ? Backups? From Mark.Vallevand at UNISYS.com Wed Mar 19 19:55:44 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Wed, 19 Mar 2014 14:55:44 -0500 Subject: [Linux-cluster] Resource instance is getting restarted when a node is rebooted Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E11149A7@USEA-EXCH8.na.uis.unisys.com> I'm testing my cluster configuration by rebooting nodes to see what happens. I can't explain what I see in some cases. The setup: I have a cloned resource with its own agent and an IP address resource that is collocated with the cloned resource. The IP address doesn't need to run on all of the nodes running an instance of the cloned resource. It just needs to be on one of the nodes. 
It's not cloned or meant to be load-balanced. I do something like this: crm -F configure < May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Vallevand at UNISYS.com Wed Mar 19 20:52:40 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Wed, 19 Mar 2014 15:52:40 -0500 Subject: [Linux-cluster] Resource instance is getting restarted when a node is rebooted In-Reply-To: <99C8B2929B39C24493377AC7A121E21FC5E11149A7@USEA-EXCH8.na.uis.unisys.com> References: <99C8B2929B39C24493377AC7A121E21FC5E11149A7@USEA-EXCH8.na.uis.unisys.com> Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E1114AF9@USEA-EXCH8.na.uis.unisys.com> Never mind. It's like one of Murphy's Laws, or at least a Murphy's Corollary. As soon as you ask for help and describe a problem in some detail, the answer becomes obvious. It's the order command. Duh. Regards. Mark K Vallevand Mark.Vallevand at Unisys.com May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. 
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Vallevand, Mark K Sent: Wednesday, March 19, 2014 02:56 PM To: linux clustering Subject: [Linux-cluster] Resource instance is getting restarted when a node is rebooted I'm testing my cluster configuration by rebooting nodes to see what happens. I can't explain what I see in some cases. The setup: I have a cloned resource with its own agent and an IP address resource that is collocated with the cloned resource. The IP address doesn't need to run on all of the nodes running an instance of the cloned resource. It just needs to be on one of the nodes. It's not cloned or meant to be load-balanced. I do something like this: crm -F configure < May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cfeist at redhat.com Wed Mar 19 22:31:20 2014 From: cfeist at redhat.com (Chris Feist) Date: Wed, 19 Mar 2014 17:31:20 -0500 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <5328F268.9080605@alteeve.ca> References: <5328F268.9080605@alteeve.ca> Message-ID: <532A1AB8.307@redhat.com> On 03/18/2014 08:27 PM, Digimer wrote: > Hi all, > > I would like to tell rgmanager to give more time for VMs to stop. 
I want this: > > path="/shared/definitions/" exclusive="0" recovery="restart" max_restarts="2" > restart_expire_time="600"> > > > > I already use ccs to create the entry: > > path="/shared/definitions/" exclusive="0" recovery="restart" max_restarts="2" > restart_expire_time="600"/> > > via: > > ccs -h localhost --activate --sync --password "secret" \ > --addvm vm01-win2008 \ > --domain="primary_n01" \ > path="/shared/definitions/" \ > autostart="0" \ > exclusive="0" \ > recovery="restart" \ > max_restarts="2" \ > restart_expire_time="600" > > I'm hoping it's a simple additional switch. :) Unfortunately currently ccs doesn't support setting resource actions. However it's my understanding that rgmanager doesn't check timeouts unless __enforce_timeouts is set to "1". So you shouldn't be seeing a vm resource go to failed if it takes a long time to stop. Are you trying to make the vm resource fail if it takes longer than 10 minutes to stop? > > Thanks! > From lists at alteeve.ca Wed Mar 19 23:45:56 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 19 Mar 2014 19:45:56 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <532A1AB8.307@redhat.com> References: <5328F268.9080605@alteeve.ca> <532A1AB8.307@redhat.com> Message-ID: <532A2C34.3080903@alteeve.ca> On 19/03/14 06:31 PM, Chris Feist wrote: > On 03/18/2014 08:27 PM, Digimer wrote: >> Hi all, >> >> I would like to tell rgmanager to give more time for VMs to stop. 
I >> want this: >> >> > path="/shared/definitions/" exclusive="0" recovery="restart" >> max_restarts="2" >> restart_expire_time="600"> >> >> >> >> I already use ccs to create the entry: >> >> > path="/shared/definitions/" exclusive="0" recovery="restart" >> max_restarts="2" >> restart_expire_time="600"/> >> >> via: >> >> ccs -h localhost --activate --sync --password "secret" \ >> --addvm vm01-win2008 \ >> --domain="primary_n01" \ >> path="/shared/definitions/" \ >> autostart="0" \ >> exclusive="0" \ >> recovery="restart" \ >> max_restarts="2" \ >> restart_expire_time="600" >> >> I'm hoping it's a simple additional switch. :) > > Unfortunately currently ccs doesn't support setting resource actions. > However it's my understanding that rgmanager doesn't check timeouts > unless __enforce_timeouts is set to "1". So you shouldn't be seeing a > vm resource go to failed if it takes a long time to stop. Are you > trying to make the vm resource fail if it takes longer than 10 minutes > to stop? I was afraid you were going to say that. :( The problem is that after calling 'disable' against the VM service, rgmanager waits two minutes. If the service isn't closed in that time, the server is forced off (at least, this was the behaviour when I last tested this). The concern is that, by default, windows installs queue updates to install when the system shuts down. During this time, windows makes it very clear that you should not power off the system during the updates. So if this timer is hit, and the VM is forced off, the guest OS can be damaged. Of course, we can debate the (lack of) wisdom of this behaviour, and I already document this concern (and even warn people to check for updates before stopping the server), it's not sufficient. If a user doesn't read the warning, or simply forgets to check, the consequences can be non-trivial. 
If ccs can't be made to add this attribute, and if the behaviour persists (I will test shortly after sending this reply), then I will have to edit the cluster.conf directly, something I am loath to do if at all avoidable. Cheers -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From esanchezvela.redhatcluster at gmail.com Thu Mar 20 01:13:43 2014 From: esanchezvela.redhatcluster at gmail.com (Enrique Sanchez) Date: Wed, 19 Mar 2014 21:13:43 -0400 Subject: [Linux-cluster] RH Summit '14 In-Reply-To: References: Message-ID: I just found out I am going, want me to send u a text message to meet up? On Thu, Feb 27, 2014 at 12:04 PM, Jeff Stoner wrote: > Anyone else going to Red Hat Summit this year? Wanna meetup for a > beer/coffee/tea/soda? > > -- > *Jeff Stoner * > *Cloud Evangelist* > Dimension Data CBU > Tel +1-703-723-5620 > Mobile +1-703-475-7720 > jeff.stoner at dimensiondata.com > Twitter > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Enrique Sanchez Vela ------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Thu Mar 20 01:26:56 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 19 Mar 2014 21:26:56 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <532A2C34.3080903@alteeve.ca> References: <5328F268.9080605@alteeve.ca> <532A1AB8.307@redhat.com> <532A2C34.3080903@alteeve.ca> Message-ID: <532A43E0.6030603@alteeve.ca> On 19/03/14 07:45 PM, Digimer wrote: > On 19/03/14 06:31 PM, Chris Feist wrote: >> On 03/18/2014 08:27 PM, Digimer wrote: >>> Hi all, >>> >>> I would like to tell rgmanager to give more time for VMs to stop. 
I >>> want this: >>> >>> >> path="/shared/definitions/" exclusive="0" recovery="restart" >>> max_restarts="2" >>> restart_expire_time="600"> >>> >>> >>> >>> I already use ccs to create the entry: >>> >>> >> path="/shared/definitions/" exclusive="0" recovery="restart" >>> max_restarts="2" >>> restart_expire_time="600"/> >>> >>> via: >>> >>> ccs -h localhost --activate --sync --password "secret" \ >>> --addvm vm01-win2008 \ >>> --domain="primary_n01" \ >>> path="/shared/definitions/" \ >>> autostart="0" \ >>> exclusive="0" \ >>> recovery="restart" \ >>> max_restarts="2" \ >>> restart_expire_time="600" >>> >>> I'm hoping it's a simple additional switch. :) >> >> Unfortunately currently ccs doesn't support setting resource actions. >> However it's my understanding that rgmanager doesn't check timeouts >> unless __enforce_timeouts is set to "1". So you shouldn't be seeing a >> vm resource go to failed if it takes a long time to stop. Are you >> trying to make the vm resource fail if it takes longer than 10 minutes >> to stop? > > I was afraid you were going to say that. :( > > The problem is that after calling 'disable' against the VM service, > rgmanager waits two minutes. If the service isn't closed in that time, > the server is forced off (at least, this was the behaviour when I last > tested this). > > The concern is that, by default, windows installs queue updates to > install when the system shuts down. During this time, windows makes it > very clear that you should not power off the system during the updates. > So if this timer is hit, and the VM is forced off, the guest OS can be > damaged. > > Of course, we can debate the (lack of) wisdom of this behaviour, and I > already document this concern (and even warn people to check for updates > before stopping the server), it's not sufficient. If a user doesn't read > the warning, or simply forgets to check, the consequences can be > non-trivial. 
> > If ccs can't be made to add this attribute, and if the behaviour > persists (I will test shortly after sending this reply), then I will > have to edit the cluster.conf directly, something I am loath to do if at > all avoidable. > > Cheers Confirmed; I called disable on a VM with gnome running, so that I could abort the VM's shut down. an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date Wed Mar 19 21:06:29 EDT 2014 Local machine disabling vm:vm01-rhel6...Success Wed Mar 19 21:08:36 EDT 2014 2 minutes and 7 seconds, then rgmanager forced-off the VM. Had this been a Windows guest in the middle of installing updates, it would be highly likely to be screwed now. To confirm, I changed the config to: Then I repeated the test: an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date Wed Mar 19 21:13:18 EDT 2014 Local machine disabling vm:vm01-rhel6...Success Wed Mar 19 21:23:31 EDT 2014 10 minutes and 13 seconds before the cluster killed the server, much less likely to interrupt an in-progress OS update (truth be told, I plan to set 30 minutes). I understand that this blocks other processes, but in an HA environment, I'd strongly argue that safe > speed. digimer -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
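For reference, a sketch of the kind of cluster.conf vm entry being discussed. The attribute values are taken from the ccs command quoted earlier in the thread; the stop action, its 30-minute timeout, and the placement of the __enforce_timeouts attribute are assumptions drawn from the discussion, not a confirmed syntax:

```xml
<vm name="vm01-win2008" domain="primary_n01" autostart="0"
    path="/shared/definitions/" exclusive="0" recovery="restart"
    max_restarts="2" restart_expire_time="600" __enforce_timeouts="1">
  <!-- assumed: give the guest 30 minutes to shut down cleanly -->
  <action name="stop" timeout="30m"/>
</vm>
```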
From morpheus.ibis at gmail.com Thu Mar 20 02:12:08 2014 From: morpheus.ibis at gmail.com (Pavel Herrmann) Date: Thu, 20 Mar 2014 03:12:08 +0100 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <532A43E0.6030603@alteeve.ca> References: <5328F268.9080605@alteeve.ca> <532A2C34.3080903@alteeve.ca> <532A43E0.6030603@alteeve.ca> Message-ID: <1857777.TcdlaGUVy9@bloomfield> Hi On Wednesday 19 of March 2014 21:26:56 Digimer wrote: > On 19/03/14 07:45 PM, Digimer wrote: > > On 19/03/14 06:31 PM, Chris Feist wrote: > >> On 03/18/2014 08:27 PM, Digimer wrote: > >>> Hi all, > >>> > >>> I would like to tell rgmanager to give more time for VMs to stop. I > >>> > >>> want this: > >>> > >>> >>> path="/shared/definitions/" exclusive="0" recovery="restart" > >>> max_restarts="2" > >>> restart_expire_time="600"> > >>> > >>> > >>> > >>> > >>> > >>> I already use ccs to create the entry: > >>> > >>> >>> path="/shared/definitions/" exclusive="0" recovery="restart" > >>> max_restarts="2" > >>> restart_expire_time="600"/> > >>> > >>> via: > >>> > >>> ccs -h localhost --activate --sync --password "secret" \ > >>> > >>> --addvm vm01-win2008 \ > >>> --domain="primary_n01" \ > >>> path="/shared/definitions/" \ > >>> autostart="0" \ > >>> exclusive="0" \ > >>> recovery="restart" \ > >>> max_restarts="2" \ > >>> restart_expire_time="600" > >>> > >>> I'm hoping it's a simple additional switch. :) > >> > >> Unfortunately currently ccs doesn't support setting resource actions. > >> However it's my understanding that rgmanager doesn't check timeouts > >> unless __enforce_timeouts is set to "1". So you shouldn't be seeing a > >> vm resource go to failed if it takes a long time to stop. Are you > >> trying to make the vm resource fail if it takes longer than 10 minutes > >> to stop? > > > > I was afraid you were going to say that. :( > > > > The problem is that after calling 'disable' against the VM service, > > rgmanager waits two minutes. 
If the service isn't closed in that time, > > the server is forced off (at least, this was the behaviour when I last > > tested this). > > > > The concern is that, by default, windows installs queue updates to > > install when the system shuts down. During this time, windows makes it > > very clear that you should not power off the system during the updates. > > So if this timer is hit, and the VM is forced off, the guest OS can be > > damaged. > > > > Of course, we can debate the (lack of) wisdom of this behaviour, and I > > already document this concern (and even warn people to check for updates > > before stopping the server), it's not sufficient. If a user doesn't read > > the warning, or simply forgets to check, the consequences can be > > non-trivial. > > > > If ccs can't be made to add this attribute, and if the behaviour > > persists (I will test shortly after sending this reply), then I will > > have to edit the cluster.conf directly, something I am loath to do if at > > all avoidable. > > > > Cheers > > Confirmed; > > I called disable on a VM with gnome running, so that I could abort the > VM's shut down. > > an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date > Wed Mar 19 21:06:29 EDT 2014 > Local machine disabling vm:vm01-rhel6...Success > Wed Mar 19 21:08:36 EDT 2014 > > 2 minutes and 7 seconds, then rgmanager forced-off the VM. Had this been > a windows guest in the middle of installing updates, it would be highly > likely to be screwed now. Is this really the best way to handle such an event? From what I remember, Windows can (or could, I don't have any 'modern' Windows lying around) be told to shut down without updating. Maybe a wiser approach would be to make the stop event (which I believe is delivered to the guest as pressing the ACPI power button) trigger a shutdown without updates. 
Keep in mind that doing system updates on a timer is dangerous, regardless of the actual time. regards Pavel Herrmann > To confirm, I changed the config to: > > name="vm01-rhel6" path="/shared/definitions/" recovery="restart" > restart_expire_time="600"> > > > > Then I repeated the test: > > an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date > Wed Mar 19 21:13:18 EDT 2014 > Local machine disabling vm:vm01-rhel6...Success > Wed Mar 19 21:23:31 EDT 2014 > > 10 minutes and 13 seconds before the cluster killed the server, much > less likely to interrupt a in-progress OS update (truth be told, I plan > to set 30 minutes. > > I understand that this blocks other processes, but in an HA environment, > I'd strongly argue that safe > speed. > > digimer From lists at alteeve.ca Thu Mar 20 02:35:55 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 19 Mar 2014 22:35:55 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <1857777.TcdlaGUVy9@bloomfield> References: <5328F268.9080605@alteeve.ca> <532A2C34.3080903@alteeve.ca> <532A43E0.6030603@alteeve.ca> <1857777.TcdlaGUVy9@bloomfield> Message-ID: <532A540B.8070104@alteeve.ca> On 19/03/14 10:12 PM, Pavel Herrmann wrote: > Hi > > On Wednesday 19 of March 2014 21:26:56 Digimer wrote: >> On 19/03/14 07:45 PM, Digimer wrote: >>> On 19/03/14 06:31 PM, Chris Feist wrote: >>>> On 03/18/2014 08:27 PM, Digimer wrote: >>>>> Hi all, >>>>> >>>>> I would like to tell rgmanager to give more time for VMs to stop. 
I >>>>> >>>>> want this: >>>>> >>>>> >>>> path="/shared/definitions/" exclusive="0" recovery="restart" >>>>> max_restarts="2" >>>>> restart_expire_time="600"> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> I already use ccs to create the entry: >>>>> >>>>> >>>> path="/shared/definitions/" exclusive="0" recovery="restart" >>>>> max_restarts="2" >>>>> restart_expire_time="600"/> >>>>> >>>>> via: >>>>> >>>>> ccs -h localhost --activate --sync --password "secret" \ >>>>> >>>>> --addvm vm01-win2008 \ >>>>> --domain="primary_n01" \ >>>>> path="/shared/definitions/" \ >>>>> autostart="0" \ >>>>> exclusive="0" \ >>>>> recovery="restart" \ >>>>> max_restarts="2" \ >>>>> restart_expire_time="600" >>>>> >>>>> I'm hoping it's a simple additional switch. :) >>>> >>>> Unfortunately currently ccs doesn't support setting resource actions. >>>> However it's my understanding that rgmanager doesn't check timeouts >>>> unless __enforce_timeouts is set to "1". So you shouldn't be seeing a >>>> vm resource go to failed if it takes a long time to stop. Are you >>>> trying to make the vm resource fail if it takes longer than 10 minutes >>>> to stop? >>> >>> I was afraid you were going to say that. :( >>> >>> The problem is that after calling 'disable' against the VM service, >>> rgmanager waits two minutes. If the service isn't closed in that time, >>> the server is forced off (at least, this was the behaviour when I last >>> tested this). >>> >>> The concern is that, by default, windows installs queue updates to >>> install when the system shuts down. During this time, windows makes it >>> very clear that you should not power off the system during the updates. >>> So if this timer is hit, and the VM is forced off, the guest OS can be >>> damaged. >>> >>> Of course, we can debate the (lack of) wisdom of this behaviour, and I >>> already document this concern (and even warn people to check for updates >>> before stopping the server), it's not sufficient. 
If a user doesn't read >>> the warning, or simply forgets to check, the consequences can be >>> non-trivial. >>> >>> If ccs can't be made to add this attribute, and if the behaviour >>> persists (I will test shortly after sending this reply), then I will >>> have to edit the cluster.conf directly, something I am loath to do if at >>> all avoidable. >>> >>> Cheers >> >> Confirmed; >> >> I called disable on a VM with gnome running, so that I could abort the >> VM's shut down. >> >> an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date >> Wed Mar 19 21:06:29 EDT 2014 >> Local machine disabling vm:vm01-rhel6...Success >> Wed Mar 19 21:08:36 EDT 2014 >> >> 2 minutes and 7 seconds, then rgmanager forced-off the VM. Had this been >> a windows guest in the middle of installing updates, it would be highly >> likely to be screwed now. > > Is this really the best way to handle such an event? > > From what I remember, Windows can (or could, I don't have any 'modern' windows > laying around) be told to shutdown without updating. maybe a wiser approach > would be to make the stop event (which I believe is delivered to the guest as > pressing the ACPI power button) trigger a shutdown without updates. > > keep in mind that doing system updates on timer is dangerous, irrelevant of > the actual time > > regards > Pavel Herrmann This assumes that we can modify how windows behaves. Unless there is a magic ACPI event that windows will reliably interpret as "power off without updating", we can't rely on this. We have clients (and I am sure we aren't the only ones) who install their own OSes without any input from us. As mentioned earlier, we do document the risks, but that's not good enough. We can't force users to read. So we have a choice; Take mitigating steps or let the user shoot themselves in the foot "because they should have known better". As personally satisfying as option #2 might seem, option #1 is the more professional approach, I would _strongly_ argue. 
digimer -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Thu Mar 20 19:31:22 2014 From: lists at alteeve.ca (Digimer) Date: Thu, 20 Mar 2014 15:31:22 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <5328F268.9080605@alteeve.ca> References: <5328F268.9080605@alteeve.ca> Message-ID: <532B420A.5060606@alteeve.ca> On 18/03/14 09:27 PM, Digimer wrote: > Hi all, > > I would like to tell rgmanager to give more time for VMs to stop. I > want this: > > path="/shared/definitions/" exclusive="0" recovery="restart" > max_restarts="2" restart_expire_time="600"> > > > > I already use ccs to create the entry: > > path="/shared/definitions/" exclusive="0" recovery="restart" > max_restarts="2" restart_expire_time="600"/> > > via: > > ccs -h localhost --activate --sync --password "secret" \ > --addvm vm01-win2008 \ > --domain="primary_n01" \ > path="/shared/definitions/" \ > autostart="0" \ > exclusive="0" \ > recovery="restart" \ > max_restarts="2" \ > restart_expire_time="600" > > I'm hoping it's a simple additional switch. :) > > Thanks! As per the request on #linux-cluster, I have opened a rhbz for this: https://bugzilla.redhat.com/show_bug.cgi?id=1079032 -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Thu Mar 20 20:06:14 2014 From: lists at alteeve.ca (Digimer) Date: Thu, 20 Mar 2014 16:06:14 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <532B420A.5060606@alteeve.ca> References: <5328F268.9080605@alteeve.ca> <532B420A.5060606@alteeve.ca> Message-ID: <532B4A36.90703@alteeve.ca> On 20/03/14 03:31 PM, Digimer wrote: > On 18/03/14 09:27 PM, Digimer wrote: >> Hi all, >> >> I would like to tell rgmanager to give more time for VMs to stop. 
I >> want this: >> >> > path="/shared/definitions/" exclusive="0" recovery="restart" >> max_restarts="2" restart_expire_time="600"> >> >> >> >> I already use ccs to create the entry: >> >> > path="/shared/definitions/" exclusive="0" recovery="restart" >> max_restarts="2" restart_expire_time="600"/> >> >> via: >> >> ccs -h localhost --activate --sync --password "secret" \ >> --addvm vm01-win2008 \ >> --domain="primary_n01" \ >> path="/shared/definitions/" \ >> autostart="0" \ >> exclusive="0" \ >> recovery="restart" \ >> max_restarts="2" \ >> restart_expire_time="600" >> >> I'm hoping it's a simple additional switch. :) >> >> Thanks! > > As per the request on #linux-cluster, I have opened a rhbz for this: > > https://bugzilla.redhat.com/show_bug.cgi?id=1079032 Split the rgmanager section out: https://bugzilla.redhat.com/show_bug.cgi?id=1079039 -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From pine5514 at gmail.com Sat Mar 22 20:13:06 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Sun, 23 Mar 2014 00:43:06 +0430 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> References: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> Message-ID: Good news for all : I successfully recovered all of my data (1.5 TB) without even one bit lost! My program took only 1 hour to do all the jobs on my 1.7 TB partition. (I could not wait 100 days for my bash script to finish.) I will publish my source code very soon for public use. Special thanks to Bob for the help. Mr.Pine On Wed, Mar 19, 2014 at 4:58 PM, Bob Peterson wrote: > ----- Original Message ----- >> Hi, >> >> The scripts are very, very slow, so I should write a program in C/C++. 
>> >> I need some confidence about data structures and data location on disk. >> As I reviewed blocks of data: >> >> All reserved blocks (GFS2-specific blocks) start with: 0x01161970 >> Blocktype store location is at Byte # 8, >> Type of start block of each resource group is: 2 >> Bitmaps are in block types 2 & 3. >> In block type 2, bitmap info starts from Byte # 129 >> In block type 3, bitmap info starts from Byte # 25 >> Length of RGs is constant, 5 in my volume (output of gfs2_edit -p rindex >> /dev/..) >> >> Is this info right? >> >> The logic of my program seems like it should be this: >> >> (1) >> Loop over the device and temporarily store block ids of dinode blocks, and also >> their bitmap locations >> >> (2) >> Change bitmap of blocks to 3 (11) >> >> Bob, could you confirm this? >> >> Regards >> Pine. > > Hi Pine, > > This is correct. The length of RGs is properly determined by the values > in the "rindex" system file, but 5 is very common, and is usually constant. > (It may change if you used gfs2_grow or gfs2_convert from gfs1). > The bitmap is 2 bits per block in the resource group, and it's relative > to the start of the particular rgrp. You should probably use the same > algorithm in libgfs2 to change the proper bit in the bitmaps. You can > get this from the public gfs2-utils git tree. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Sun Mar 23 01:34:44 2014 From: lists at alteeve.ca (Digimer) Date: Sat, 22 Mar 2014 21:34:44 -0400 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> Message-ID: <532E3A34.6050107@alteeve.ca> That is very good news! 
Now, about your backups... ;) Look forward to seeing your code! digimer On 22/03/14 04:13 PM, Mr.Pine wrote: > Good news for all : > I successfully recoved all of my data(1.5TB) without even one bit lost! > my program tooks only 1 hour to do all the jobs on my 1.7 TB > partition.(I could not wait 100 days for my bash script to finish). > > I will publish my source code very soon for the public use. > > Special thanks to Bob for the help. > > Mr.Pine > > On Wed, Mar 19, 2014 at 4:58 PM, Bob Peterson wrote: >> ----- Original Message ----- >>> Hi, >>> >>> Scripts is very very slow, so i should write program in c/c++. >>> >>> I need some confidence about data structures and data location on disk. >>> As i reviewed blocks of data: >>> >>> All reserved blocks (GFS2 specific blocks) start by : 0x01161970 >>> Blocktype store location is at Byte # 8, >>> Type of start block of each resource group is: 2 >>> Bitmaps are in block types 2 & 3. >>> In block type 2, bitmap info starts from Byte # 129 >>> In block type 3, bitmap info starts from Byte # 25 >>> Length of RGs are const, 5 in my volume (out put of gfs2_edit -p rindex >>> /dev/..) >>> >>> Is this info right? >>> >>> Logic of my program seams should be like this: >>> >>> (1) >>> Loop in device and temporary store block id of dinode blocks, and also >>> their bitmap locations >>> >>> (2) >>> Change bitmap of blocks to 3 (11) >>> >>> Bob, could you confirm this? >>> >>> Regards >>> Pine. >> >> Hi Pine, >> >> This is correct. The length of RGs is properly determined by the values >> in the "rindex" system file, but 5 is very common, and is usually constant. >> (It may change if you used gfs2_grow or gfs2_convert from gfs1). >> The bitmap is 2 bits per block in the resource group, and it's relative >> to the start of the particular rgrp. You should probably use the same >> algorithm in libgfs2 to change the proper bit in the bitmaps. You can >> get this from the public gfs2-utils git tree. 
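The two-pass recovery scan described in this exchange can be sketched as follows. This is not the poster's actual program (which was written in C); the block size and the dinode type value (4, per gfs2_ondisk.h) are assumptions, and the header is read as two big-endian 32-bit fields (magic, then type), which is consistent with the "byte 8" observation above:

```python
import struct

GFS2_MAGIC = 0x01161970    # magic number starting every GFS2 metadata block
GFS2_METATYPE_DI = 4       # dinode type, assumed per gfs2_ondisk.h
BLOCK_SIZE = 4096          # assumed filesystem block size

def find_dinodes(dev_path, max_blocks):
    """Pass 1: scan the device and record block numbers whose header
    looks like a GFS2 dinode (big-endian magic, then metadata type)."""
    dinodes = []
    with open(dev_path, "rb") as dev:
        for blk in range(max_blocks):
            block = dev.read(BLOCK_SIZE)
            if len(block) < 8:
                break
            magic, mh_type = struct.unpack(">II", block[:8])
            if magic == GFS2_MAGIC and mh_type == GFS2_METATYPE_DI:
                dinodes.append(blk)
    return dinodes

def bitmap_position(rel_block):
    """Pass 2 helper: locate a block's 2-bit entry in its resource
    group's bitmap (2 bits per block, relative to the rgrp start).
    Returns (byte offset within the bitmap data, bit shift in that byte)."""
    return rel_block // 4, (rel_block % 4) * 2
```

Marking a recovered block as "in use, metadata" would then set its 2-bit entry to 3 (binary 11); as Bob notes, the bitmap routines in libgfs2 (from gfs2-utils) are the safer way to do that in practice.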
>> >> Regards, >> Bob Peterson >> Red Hat File Systems >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From rpeterso at redhat.com Mon Mar 24 12:19:58 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 24 Mar 2014 08:19:58 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> Message-ID: <25322364.403317.1395663598176.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Good news for all : > I successfully recoved all of my data(1.5TB) without even one bit lost! > my program tooks only 1 hour to do all the jobs on my 1.7 TB > partition.(I could not wait 100 days for my bash script to finish). > > I will publish my source code very soon for the public use. > > Special thanks to Bob for the help. > > Mr.Pine Hi Mr. Pine, I'm glad I could help. Perhaps when you post your program, we can somehow incorporate it into a "gfs2_edit unformat" tool or something. I assume you ran fsck.gfs2 after the program, right? Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Mon Mar 24 16:19:26 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Mon, 24 Mar 2014 20:49:26 +0430 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <25322364.403317.1395663598176.JavaMail.zimbra@redhat.com> References: <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> <25322364.403317.1395663598176.JavaMail.zimbra@redhat.com> Message-ID: Hi Bob, Very good idea ... It can be useful for many users. Yes, you're right. 
I ran fsck.gfs2 and it took less than an hour to fix the filesystem. Mr.Pine On Mon, Mar 24, 2014 at 4:49 PM, Bob Peterson wrote: > ----- Original Message ----- >> Good news for all : >> I successfully recoved all of my data(1.5TB) without even one bit lost! >> my program tooks only 1 hour to do all the jobs on my 1.7 TB >> partition.(I could not wait 100 days for my bash script to finish). >> >> I will publish my source code very soon for the public use. >> >> Special thanks to Bob for the help. >> >> Mr.Pine > > Hi Mr. Pine, > > I'm glad I could help. Perhaps when you post your program, we can > somehow incorporate it into "gfs2_edit unformat" tool or something. > I assume you ran fsck.gfs2 after the program, right? > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lipson12 at yahoo.com Thu Mar 27 04:12:22 2014 From: lipson12 at yahoo.com (Kaisar Ahmed Khan) Date: Wed, 26 Mar 2014 21:12:22 -0700 (PDT) Subject: [Linux-cluster] iscsi sysmlink create problem In-Reply-To: <53204CA8.2060300@redhat.com> References: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> <53204CA8.2060300@redhat.com> Message-ID: <1395893542.35997.YahooMailNeo@web141202.mail.bf1.yahoo.com> Elvir Kuric, My OS version is RHEL 6.2; I just want to create a symlink for the iscsi disk /dev/sda with udev rules. I have done the same thing in RHEL 5.4. thanks kaisar On Wednesday, March 12, 2014 6:12 PM, Elvir Kuric wrote: On 03/12/2014 12:43 PM, emmanuel segura wrote: but it's a cluster problem? ummm > > > > >2014-03-12 7:31 GMT+01:00 Kaisar Ahmed Khan : > > >> >> >> >>Dear Experts : >> >> >> >>The following rule is not working to create a symlink for the iscsi disk; my iscsi device ID is /dev/sda and I want to link it as /dev/iscsi/vendor_kernel >> >>please guide me if i miss anything . 
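One detail worth noting about the rule quoted below: as it appears here, the SYMLINK assignment is missing its opening quote, which would keep the rule from parsing. Whether that quote was lost in the original rule or in the archive, a well-formed version would read as follows (match keys and substitutions copied from the post; only the quoting is changed):

```
ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", SYMLINK+="iscsi/%E{ID_VENDOR}_%K", MODE="0664"
```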
>> >> >>ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", SYMLINK+=iscsi/%E{ID_VENDOR}_%K", MODE="0664" >> >>Thanks >>kaisar >> >> >>-- >>Linux-cluster mailing list >>Linux-cluster at redhat.com >>https://www.redhat.com/mailman/listinfo/linux-cluster >> > > >-- >esta es mi vida e me la vivo hasta que dios quiera > >if you share more information with us ( os version ) it could help. Also the outputs below can help to understand how the system sees the device: if rhel 5 ( and clones ) #udevinfo -a -p $(udevinfo -q path -n /dev/DEVICE) if rhel 6 ( and clones ) # udevadm info --query=all -n /dev/DEVICE --attribute-walk where DEVICE is the device you want to write the udev rule for. Kind regards, -- Elvir Kuric, TSE / Red Hat / GSS EMEA / -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From eduar47 at gmail.com Fri Mar 28 06:06:44 2014 From: eduar47 at gmail.com (Eduar Arley) Date: Fri, 28 Mar 2014 01:06:44 -0500 Subject: [Linux-cluster] IP address for replies Message-ID: Hello everyone. I have an HA Cluster in CentOS 6, using CMAN + RGManager (the supported and 'official' stack in CentOS 6.5). Everything works OK; however, there is a thing that could give me problems in the future, so I would like to think right now about how to solve it. When an incoming packet comes to my cluster (through a floating IP address), my active node receives it OK; however, it replies from its 'real' IP address, not from the floating IP. As I'm deploying SIP in this cluster, maybe some provider in the future could dislike this IP and reject my calls. I've read that heartbeat has some functionality to fix this, called IPSrcAddr; however I don't see a similar resource in the Conga Web Interface or in Red Hat documentation. On other websites, I've read about a workaround involving IP routing rule tables, but I don't think this is an optimal solution. 
Does anybody know a way to fix this in my scenario? Thanks! Eduar Cardona From christian.masopust at siemens.com Fri Mar 28 07:24:00 2014 From: christian.masopust at siemens.com (Masopust, Christian) Date: Fri, 28 Mar 2014 07:24:00 +0000 Subject: [Linux-cluster] IP address for replies In-Reply-To: References: Message-ID: <7615AD3742034A45A23EFE13BE43F2ED1AC2A811@ATNETS9912TMSX.ww300.siemens.net> Hi Eduar, I have both configurations running on my systems, some with IPsrcaddr and one with iptables. I did the first with iptables only because it was my first cluster and I was not aware of IPsrcaddr and was in a hurry to get the cluster up and running :) Anyway, both solutions work fine on my CentOS 6.x systems. But I don't use any Web interface for configuration. br, christian -----Ursprüngliche Nachricht----- Von: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Im Auftrag von Eduar Arley Gesendet: Freitag, 28. März 2014 07:07 An: linux-cluster at redhat.com Betreff: [Linux-cluster] IP address for replies Hello everyone. I have an HA Cluster in CentOS 6, using CMAN + RGManager (the supported and 'official' stack in CentOS 6.5). Everything works OK; however, there is a thing that could give me problems in the future, so I would like thinking right now how to solve it. When an incoming packet comes to my cluster (through a floating IP address), mi active node receives it OK; however, it replies from his 'real' IP address, not from the floating IP. As i'm deploying SIP in this cluster, maybe some provider in the future could dislike this IP and reject my calls. I've read heartbeat have some functionality to fix this, called IPSrcAddr; however I don't see a similar resource in Conga Web Interface or in Red Hat documentation. In other websites, I've read about a workaround involving IP routing rules tables, but I don't think this is an optimal solution. Anybody knows a way to fix this in my scenario? Thanks! 
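For reference, hedged sketches of the two approaches Christian mentions; the interface name, gateway, and addresses below are placeholders (RFC 5737 documentation addresses), not values from the thread:

```sh
# IPsrcaddr-style: set the preferred source address on the default route,
# so replies leave with the floating IP (placeholder addresses)
ip route change default via 192.0.2.1 dev eth0 src 192.0.2.50

# iptables-style: SNAT outbound traffic to the floating IP
iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to-source 192.0.2.50
```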
Eduar Cardona -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From eduar47 at gmail.com Fri Mar 28 12:40:16 2014 From: eduar47 at gmail.com (Eduar Arley) Date: Fri, 28 Mar 2014 07:40:16 -0500 Subject: [Linux-cluster] IP address for replies In-Reply-To: <7615AD3742034A45A23EFE13BE43F2ED1AC2A811@ATNETS9912TMSX.ww300.siemens.net> References: <7615AD3742034A45A23EFE13BE43F2ED1AC2A811@ATNETS9912TMSX.ww300.siemens.net> Message-ID: 2014-03-28 2:24 GMT-05:00 Masopust, Christian : > Hi Eduar, > > I have both configurations running on my systems, some with IPsrcaddr and > one with iptables. > I did the first with iptables only because it was my first cluster and was > not aware about IPsrcaddr and was in a hurry to get the cluster up and running :) > > Anyway, both solutions work fine on my CentOS 6.x systems. But I don't use any > Web interface for configuration. > > br, > christian > > -----Ursprüngliche Nachricht----- > Von: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Im Auftrag von Eduar Arley > Gesendet: Freitag, 28. März 2014 07:07 > An: linux-cluster at redhat.com > Betreff: [Linux-cluster] IP address for replies > > Hello everyone. > > I have an HA Cluster in CentOS 6, using CMAN + RGManager (the supported and 'official' stack in CentOS 6.5). Everything works OK; however, there is a thing that could give me problems in the future, so I would like thinking right now how to solve it. > > When an incoming packet comes to my cluster (through a floating IP address), mi active node receives it OK; however, it replies from his 'real' IP address, not from the floating IP. As i'm deploying SIP in this cluster, maybe some provider in the future could dislike this IP and reject my calls. > > I've read heartbeat have some functionality to fix this, called IPSrcAddr; however I don't see a similar resource in Conga Web Interface or in Red Hat documentation. 
> On other websites, I've read about a workaround involving IP routing rule tables, but I don't think this is an optimal solution. > > Does anybody know a way to fix this in my scenario? > > Thanks! > > > Eduar Cardona > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Hello Christian, thanks for your advice. Could you please share the relevant sections of your cluster.conf file for IPSrcAddr here? I currently use the Conga web interface, but I can modify the config file directly if needed. Thanks! Eduar Cardona From bergman at merctech.com Fri Mar 28 16:37:17 2014 From: bergman at merctech.com (bergman at merctech.com) Date: Fri, 28 Mar 2014 12:37:17 -0400 Subject: [Linux-cluster] mixing OS versions? Message-ID: <12440.1396024637@localhost> I've got a 3-node cluster under CentOS5. I'd like to add 3 additional nodes, running CentOS6. Are there any known issues, guidelines, or recommendations for having a single RHCS cluster with different OS releases on the nodes? Thanks, Mark From fdinitto at redhat.com Fri Mar 28 19:31:38 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 28 Mar 2014 20:31:38 +0100 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <12440.1396024637@localhost> References: <12440.1396024637@localhost> Message-ID: <5335CE1A.3060509@redhat.com> On 03/28/2014 05:37 PM, bergman at merctech.com wrote: > > > I've got a 3-node cluster under CentOS5. > > I'd like to add 3 additional nodes, running CentOS6. > > Are there any known issues, guidelines, or recommendations for having > a single RHCS cluster with different OS releases on the nodes? Only one answer: don't do it. It's not supported, and it's only asking for trouble.
Fabio From washer at trlp.com Fri Mar 28 20:35:20 2014 From: washer at trlp.com (James Washer) Date: Fri, 28 Mar 2014 13:35:20 -0700 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <5335CE1A.3060509@redhat.com> References: <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> Message-ID: You can get by, for a short time, with a minor revision difference, say 5.7 and 5.8, but mixing 5 and 6 will not work. Period. On Fri, Mar 28, 2014 at 12:31 PM, Fabio M. Di Nitto wrote: > On 03/28/2014 05:37 PM, bergman at merctech.com wrote: > > > > > > I've got a 3-node cluster under CentOS5. > > > > I'd like to add 3 additional nodes, running CentOS6. > > > > Are there any known issues, guidelines, or recommendations for having > > a single RHCS cluster with different OS releases on the nodes? > > Only one answer: don't do it. It's not supported, and it's only asking > for trouble. > > Fabio > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- - jim From ajb2 at mssl.ucl.ac.uk Fri Mar 28 22:07:48 2014 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Fri, 28 Mar 2014 22:07:48 +0000 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <5335CE1A.3060509@redhat.com> References: <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> Message-ID: <5335F2B4.6080605@mssl.ucl.ac.uk> On 28/03/14 19:31, Fabio M. Di Nitto wrote: > > Are there any known issues, guidelines, or recommendations for having > a single RHCS cluster with different OS releases on the nodes? > Only one answer: don't do it. It's not supported, and it's only asking > for trouble. > > Seconded. There are _substantial_ differences between CentOS/RHEL 5 and 6 clustering. You can run one or the other OS, but you can't mix them. The on-disk format isn't affected.
Best path is to set up a cluster in 6, shut down the 5 cluster, attach disks to the 6 cluster and bring it all back up. The 5 boxes can be converted to version 6 afterwards. (I'm going through this at the moment, as I have 2 EL5 clusters and 1 EL6 cluster.) TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - if you enable GFS2 quotas and someone busts his quota, the machine will panic. From bergman at merctech.com Fri Mar 28 22:35:42 2014 From: bergman at merctech.com (bergman at merctech.com) Date: Fri, 28 Mar 2014 18:35:42 -0400 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: Your message of "Fri, 28 Mar 2014 22:07:48 -0000." <5335F2B4.6080605@mssl.ucl.ac.uk> References: <5335F2B4.6080605@mssl.ucl.ac.uk> <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> Message-ID: <6022.1396046142@localhost> In the message dated: Fri, 28 Mar 2014 22:07:48 -0000, the pithy ruminations from Alan Brown were: => On 28/03/14 19:31, Fabio M. Di Nitto wrote: => > => > Are there any known issues, guidelines, or recommendations for having => > a single RHCS cluster with different OS releases on the nodes? => > Only one answer: don't do it. It's not supported, and it's only asking => > for trouble. Thanks for all the warnings... not what I wanted to hear, but it's good to get a clear, consistent message. => > => > => => Seconded. There are _substantial_ differences between CentOS/RHEL 5 and => 6 clustering. => => You can run one or the other OS, but you can't mix them. The on-disk => format isn't affected. For clarification, we're not using RHCS to manage any shared storage. The only 'disk' component is the quorum disk. We're using GPFS as the storage layer. RHCS manages several services, such as: httpd mysql nis pgsql => => Best path is to set up a cluster in 6, shut down the 5 cluster, attach => disks to the 6 cluster and bring it all back up.
The 5 boxes can be => converted to version 6 afterwards. That's what I was expecting, unfortunately. I'll probably do a more gradual approach... bring up a CentOS6 cluster with its own quorum disk, and one by one add services (httpd, nis, etc.) to that, bringing them down on the old cluster. Add in some CNAMEs and coordination with the network group, and it should be relatively transparent to the users. => => (I'm going through this at the moment, as I have 2 EL5 clusters and 1 => EL6 cluster.) => => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - => if you enable GFS2 quotas and someone busts his quota the machine will => panic. That's an example of why I no longer use GFS2. :) Thanks, Mark => => => => => -- => Linux-cluster mailing list => Linux-cluster at redhat.com => https://www.redhat.com/mailman/listinfo/linux-cluster From christian.masopust at siemens.com Sat Mar 29 09:00:04 2014 From: christian.masopust at siemens.com (Masopust, Christian) Date: Sat, 29 Mar 2014 09:00:04 +0000 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <6022.1396046142@localhost> References: <5335F2B4.6080605@mssl.ucl.ac.uk> <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> <6022.1396046142@localhost> Message-ID: <7615AD3742034A45A23EFE13BE43F2ED1AC2AFD5@ATNETS9912TMSX.ww300.siemens.net> > => > => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - > => if you enable GFS2 quotas and someone busts his quota the machine will => panic. > > That's an example of why I no longer use GFS2. :) > > Thanks, > > Mark Hi Mark, what do you use instead of GFS2? br, christian From bergman at merctech.com Sat Mar 29 17:30:41 2014 From: bergman at merctech.com (bergman at merctech.com) Date: Sat, 29 Mar 2014 13:30:41 -0400 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: Your message of "Sat, 29 Mar 2014 09:00:04 -0000."
<7615AD3742034A45A23EFE13BE43F2ED1AC2AFD5@ATNETS9912TMSX.ww300.siemens.net> References: <7615AD3742034A45A23EFE13BE43F2ED1AC2AFD5@ATNETS9912TMSX.ww300.siemens.net> <5335F2B4.6080605@mssl.ucl.ac.uk> <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> <6022.1396046142@localhost> Message-ID: <22237.1396114241@localhost> In the message dated: Sat, 29 Mar 2014 09:00:04 -0000, the pithy ruminations from "Masopust, Christian" were: => > => => > => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - => > => if you enable GFS2 quotas and someone busts his quota the machine will => panic. => > => > That's an example of why I no longer use GFS2. :) => > => > Thanks, => > => > Mark => => Hi Mark, => => what do you use instead of GFS2? GPFS, as I wrote in the message to which you replied: ------------------------------------------- From: bergman at merctech.com To: linux clustering Subject: Re: [Linux-cluster] mixing OS versions? Date: Fri, 28 Mar 2014 18:35:42 -0400 [SNIP!] For clarification, we're not using RHCS to manage any shared storage. The only 'disk' component is the quorum disk. We're using GPFS as the storage layer. ------------------------------------------- => => br, => christian => => -- Mark Bergman From christian.masopust at siemens.com Sat Mar 29 17:50:42 2014 From: christian.masopust at siemens.com (Masopust, Christian) Date: Sat, 29 Mar 2014 17:50:42 +0000 Subject: [Linux-cluster] mixing OS versions?
In-Reply-To: <22237.1396114241@localhost> References: <7615AD3742034A45A23EFE13BE43F2ED1AC2AFD5@ATNETS9912TMSX.ww300.siemens.net> <5335F2B4.6080605@mssl.ucl.ac.uk> <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> <6022.1396046142@localhost> <22237.1396114241@localhost> Message-ID: <7615AD3742034A45A23EFE13BE43F2ED1AC334CE@ATNETS9912TMSX.ww300.siemens.net> > In the message dated: Sat, 29 Mar 2014 09:00:04 -0000, the pithy ruminations from "Masopust, Christian" > were: > => > => > => > => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - => > => if you enable GFS2 quotas > and someone busts his quota the machine will => panic. > => > > => > That's an example of why I no longer use GFS2. :) => > => > Thanks, => > => > Mark => => Hi Mark, => > => what do you use instead of GFS2? > > GPFS, as I wrote in the message to which you replied: > sorry, my fault... I didn't notice it, as GPFS hasn't been on my radar until now :) From swhiteho at redhat.com Sun Mar 30 11:34:26 2014 From: swhiteho at redhat.com (Steven Whitehouse) Date: Sun, 30 Mar 2014 12:34:26 +0100 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <5335F2B4.6080605@mssl.ucl.ac.uk> References: <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> <5335F2B4.6080605@mssl.ucl.ac.uk> Message-ID: <1396179266.2659.30.camel@menhir> Hi, On Fri, 2014-03-28 at 22:07 +0000, Alan Brown wrote: > On 28/03/14 19:31, Fabio M. Di Nitto wrote: > > > > > Are there any known issues, guidelines, or recommendations for having > > a single RHCS cluster with different OS releases on the nodes? > > Only one answer: don't do it. It's not supported, and it's only asking > > for trouble. > > > > > > Seconded. There are _substantial_ differences between CentOS/RHEL 5 > and 6 clustering. > > You can run one or the other OS, but you can't mix them. The on-disk > format isn't affected.
> > Best path is to set up a cluster in 6, shut down the 5 cluster, attach > disks to the 6 cluster and bring it all back up. The 5 boxes can be > converted to version 6 afterwards. > > (I'm going through this at the moment, as I have 2 EL5 clusters and 1 > EL6 cluster.) > > TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time > - if you enable GFS2 quotas and someone busts his quota the machine > will panic. > Well, that is not entirely true. We have done a great deal of investigation into this issue. We do test quotas (among many other things) on each release to ensure that they are working. Our tests have all passed correctly, and to date you have provided the only report of this particular issue via our support team, so it is certainly not something that lots of people are hitting. We do now have a good idea of where the issue is. However, it is clear that simply exceeding quotas is not enough to trigger it; instead, quotas need to be exceeded in a particular way. Abhi is working on a fix, which should be available very shortly now. Returning to the original point, however: it is certainly not recommended to have mixed RHEL or CentOS versions running in the same cluster. It is much better to keep everything the same, even though the GFS2 on-disk format has not changed between the versions. I hope that answers a few questions - let us know if you need more info, Steve. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From hamid.jafarian at pdnsoft.com Sun Mar 30 14:34:51 2014 From: hamid.jafarian at pdnsoft.com (Hamid Jafarian) Date: Sun, 30 Mar 2014 19:04:51 +0430 Subject: [Linux-cluster] GFS2 unformat helper tool Message-ID: <53382B8B.8060204@pdnsoft.com> Hi, We developed a GFS2 volume unformat helper tool. Read about this code at: http://pdnsoft.com/en/web/pdnen/blog/-/blogs/gfs2-unformat-helper-tool-1 Regards -- Hamid Jafarian CEO at PDNSoft Co.
Web site: http://www.pdnsoft.com Blog: http://jafarian.pdnsoft.com From lists at alteeve.ca Sun Mar 30 18:13:40 2014 From: lists at alteeve.ca (Digimer) Date: Sun, 30 Mar 2014 14:13:40 -0400 Subject: [Linux-cluster] GFS2 unformat helper tool In-Reply-To: <53382B8B.8060204@pdnsoft.com> References: <53382B8B.8060204@pdnsoft.com> Message-ID: <53385ED4.7040006@alteeve.ca> On 30/03/14 10:34 AM, Hamid Jafarian wrote: > Hi, > > We developed a GFS2 volume unformat helper tool. > Read about this code at: > http://pdnsoft.com/en/web/pdnen/blog/-/blogs/gfs2-unformat-helper-tool-1 > > Regards Thanks for sharing this! Madi -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From christian.masopust at siemens.com Mon Mar 31 05:52:40 2014 From: christian.masopust at siemens.com (Masopust, Christian) Date: Mon, 31 Mar 2014 05:52:40 +0000 Subject: [Linux-cluster] IP address for replies In-Reply-To: References: <7615AD3742034A45A23EFE13BE43F2ED1AC2A811@ATNETS9912TMSX.ww300.siemens.net> Message-ID: <7615AD3742034A45A23EFE13BE43F2ED1AC34A23@ATNETS9912TMSX.ww300.siemens.net> > > > Hello Christian, thanks for your advice. > > Could you please share the relevant sections of your cluster.conf file for IPSrcAddr here? > I currently use the Conga web interface, but I can modify the config file directly if needed. > > Thanks! > > > Eduar Cardona Hi Eduar, sorry, I'm not allowed to give out any configuration settings, but the IPsrcaddr resource is quite easy to configure: # pcs resource describe IPsrcaddr Resource options for: ocf:heartbeat:IPsrcaddr ipaddress (required): The IP address. cidr_netmask: The netmask for the interface in CIDR format (ie, 24) or in dotted quad notation (255.255.255.0). br, christian
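[Archive editor's sketch] For anyone else landing on this thread: the setup Christian describes might look roughly like the following, in pcs (Pacemaker) syntax rather than the CMAN/rgmanager stack Eduar is running. The IP address, netmask, and resource/group names are placeholders, not anyone's real configuration:

```shell
# Floating (virtual) IP that the cluster moves between nodes.
# 192.0.2.10/24 is a documentation-range placeholder address.
pcs resource create floating_ip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24

# Pin the source address of outgoing packets to the floating IP,
# so replies (e.g. SIP traffic) leave from the same address the
# provider sees inbound.
pcs resource create src_ip ocf:heartbeat:IPsrcaddr \
    ipaddress=192.0.2.10 cidr_netmask=24

# Group the resources so both stay on the same node and the address
# is brought up before IPsrcaddr tries to use it as the source.
pcs resource group add sip_group floating_ip src_ip
```

The iptables variant mentioned earlier in the thread is typically a SNAT rule along the lines of `iptables -t nat -A POSTROUTING -s <node-ip> -j SNAT --to-source <floating-ip>`, wrapped in a cluster-managed script resource so it moves with the floating IP; which approach fits better depends on whether all outgoing traffic, or only some of it, should use the floating source address.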