From stephen.rankin at stfc.ac.uk Mon Mar 10 18:15:08 2014 From: stephen.rankin at stfc.ac.uk (stephen.rankin at stfc.ac.uk) Date: Mon, 10 Mar 2014 18:15:08 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> Hello, When using gfs2 with quotas on a SAN that is providing storage to two clustered systems running CentOS6.5, one of the systems can crash. This crash appears to be caused when a user tries to add something to a SAN disk when they have exceeded their quota on that disk. Sometimes a stack trace is produced in /var/log/messages which appears to indicate that it was gfs2 that caused the problem. At the same time you get the gfs2 stack trace you also see problems with someone exceeding their quota. The stack trace is below. Has anyone got a solution to this, other than switching off quotas? I have switched off quotas, which appears to have stabilised the system so far, but I do need the quotas on. Your help is appreciated. Stephen Rankin STFC, RAL, ISIS Mar 5 11:40:50 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355 Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_explode_dn(usi660) returned NULL: Success Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_result() failed: Invalid DN syntax Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] lookup of user usi660 failed: Invalid DN syntax Mar 5 11:41:46 chadwick kernel: ------------[ cut here ]------------ Mar 5 11:41:46 chadwick kernel: WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0() (Not tainted) Mar 5 11:41:46 chadwick kernel: Hardware name: PowerEdge R910 Mar 5 11:41:46 chadwick kernel: list_add corruption. 
next->prev should be prev (ffff8820531518d0), but was ffff884d4c4594d0. (next=ffff884d4c4594d0). Mar 5 11:41:46 chadwick kernel: Modules linked in: gfs2 dlm configfs bridge autofs4 des_generic ecb md4 nls_utf8 cifs bnx2fc cnic uio fcoe libfcoe libfc 8021q garp stp llc ipv6 microcode power_meter iTCO_wdt iTCO_vendor_support dcdbas serio_raw ixgbe dca ptp pps_core mdio lpc_ich mfd_core sg ses enclosure i7core_edac edac_core bnx2 ext4 jbd2 mbcache dm_round_robin sr_mod cdrom sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib] Mar 5 11:41:46 chadwick kernel: Pid: 74823, comm: vncserver Not tainted 2.6.32-431.3.1.el6.x86_64 #1 Mar 5 11:41:46 chadwick kernel: Call Trace: Mar 5 11:41:46 chadwick kernel: [] ? warn_slowpath_common+0x87/0xc0 Mar 5 11:41:46 chadwick kernel: [] ? warn_slowpath_fmt+0x46/0x50 Mar 5 11:41:46 chadwick kernel: [] ? __list_add+0x6d/0xa0 Mar 5 11:41:46 chadwick kernel: [] ? new_inode+0x72/0xb0 Mar 5 11:41:46 chadwick kernel: [] ? gfs2_create_inode+0x1b5/0x1150 [gfs2] Mar 5 11:41:46 chadwick kernel: [] ? gfs2_glock_nq_init+0x16/0x40 [gfs2] Mar 5 11:41:46 chadwick kernel: [] ? gfs2_mkdir+0x24/0x30 [gfs2] Mar 5 11:41:46 chadwick kernel: [] ? security_inode_mkdir+0x1f/0x30 Mar 5 11:41:46 chadwick kernel: [] ? vfs_mkdir+0xd9/0x140 Mar 5 11:41:46 chadwick kernel: [] ? sys_mkdirat+0xc7/0x1b0 Mar 5 11:41:46 chadwick kernel: [] ? sys_mkdir+0x18/0x20 Mar 5 11:41:46 chadwick kernel: [] ? 
system_call_fastpath+0x16/0x1b Mar 5 11:41:46 chadwick kernel: ---[ end trace e51734a39976a028 ]--- Mar 5 11:41:46 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355 Mar 5 11:41:47 chadwick abrtd: Directory 'oops-2014-03-05-11:41:47-12194-1' creation detected Mar 5 11:41:47 chadwick abrt-dump-oops: Reported 1 kernel oopses to Abrt Mar 5 11:41:47 chadwick abrtd: Can't open file '/var/spool/abrt/oops-2014-03-05-11:41:47-12194-1/uid': No such file or directory Mar 5 11:41:54 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355 -- Scanned by iCritical. -------------- next part -------------- An HTML attachment was scrubbed... URL: From adas at redhat.com Mon Mar 10 19:38:06 2014 From: adas at redhat.com (Abhijith Das) Date: Mon, 10 Mar 2014 15:38:06 -0400 (EDT) Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <382425903.12347329.1394480286571.JavaMail.zimbra@redhat.com> ----- Original Message ----- > From: "stephen rankin" > To: linux-cluster at redhat.com > Sent: Monday, March 10, 2014 1:15:08 PM > Subject: [Linux-cluster] gfs2 and quotas - system crash > > Hello, > > > > When using gfs2 with quotas on a SAN that is providing storage to two > clustered systems running CentOS6.5, one of the systems > can crash. This crash appears to be caused when a user tries > to add something to a SAN disk when they have exceeded their > quota on that disk. Sometimes a stack trace is produced in /var/log/messages > which appears to indicate that it was gfs2 that caused the problem. > At the same time you get the gfs2 stack trace you also see problems > with someone exceeding their quota. > > The stack trace is below. 
> > Has anyone got a solution to this, other than switching of quotas? I have > switched of quotas which appears to have stabilised the system so far, but I > do need the quotas on. > > Your help is appreciated. > Hi Stephen, We have another report of this bug when gfs2 was exported using NFS. https://bugzilla.redhat.com/show_bug.cgi?id=1059808. Are you using NFS in your setup as well? We have not been able to reproduce it to figure out what might be going on. Do you have a set procedure with which you're able to recreate this reliably? If so, it would be of great help. Also, more info about your setup (file sizes, number of files, how many nodes mounting gfs2, what kinds of operations are being run, etc.) would be helpful as well. Cheers! --Abhi > Stephen Rankin > STFC, RAL, ISIS > > Mar 5 11:40:50 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded > for user 101355 > Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_explode_dn(usi660) > returned NULL: Success > Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_result() failed: Invalid > DN syntax > Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] lookup of user usi660 failed: > Invalid DN syntax > Mar 5 11:41:46 chadwick kernel: ------------[ cut here ]------------ > Mar 5 11:41:46 chadwick kernel: WARNING: at lib/list_debug.c:26 > __list_add+0x6d/0xa0() (Not tainted) > Mar 5 11:41:46 chadwick kernel: Hardware name: PowerEdge R910 > Mar 5 11:41:46 chadwick kernel: list_add corruption. next->prev should be > prev (ffff8820531518d0), but was ffff884d4c4594d0. (next=ffff884d4c4594d0). 
> Mar 5 11:41:46 chadwick kernel: Modules linked in: gfs2 dlm configfs bridge > autofs4 des_generic ecb md4 nls_utf8 cifs bnx2fc cnic uio fcoe libfcoe libfc > 8021q garp stp llc ipv6 microcode power_meter iTCO_wdt iTCO_vendor_support > dcdbas serio_raw ixgbe dca ptp pps_core mdio lpc_ich mfd_core sg ses > enclosure i7core_edac edac_core bnx2 ext4 jbd2 mbcache dm_round_robin sr_mod > cdrom sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt pata_acpi > ata_generic ata_piix megaraid_sas dm_multipath dm_mirror dm_region_hash > dm_log dm_mod [last unloaded: speedstep_lib] > Mar 5 11:41:46 chadwick kernel: Pid: 74823, comm: vncserver Not tainted > 2.6.32-431.3.1.el6.x86_64 #1 > Mar 5 11:41:46 chadwick kernel: Call Trace: > Mar 5 11:41:46 chadwick kernel: [] ? > warn_slowpath_common+0x87/0xc0 > Mar 5 11:41:46 chadwick kernel: [] ? > warn_slowpath_fmt+0x46/0x50 > Mar 5 11:41:46 chadwick kernel: [] ? __list_add+0x6d/0xa0 > Mar 5 11:41:46 chadwick kernel: [] ? new_inode+0x72/0xb0 > Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_create_inode+0x1b5/0x1150 [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_glock_nq_init+0x16/0x40 [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? gfs2_mkdir+0x24/0x30 > [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? > security_inode_mkdir+0x1f/0x30 > Mar 5 11:41:46 chadwick kernel: [] ? vfs_mkdir+0xd9/0x140 > Mar 5 11:41:46 chadwick kernel: [] ? > sys_mkdirat+0xc7/0x1b0 > Mar 5 11:41:46 chadwick kernel: [] ? sys_mkdir+0x18/0x20 > Mar 5 11:41:46 chadwick kernel: [] ? 
> system_call_fastpath+0x16/0x1b > Mar 5 11:41:46 chadwick kernel: ---[ end trace e51734a39976a028 ]--- > Mar 5 11:41:46 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded > for user 101355 > Mar 5 11:41:47 chadwick abrtd: Directory 'oops-2014-03-05-11:41:47-12194-1' > creation detected > Mar 5 11:41:47 chadwick abrt-dump-oops: Reported 1 kernel oopses to Abrt > Mar 5 11:41:47 chadwick abrtd: Can't open file > '/var/spool/abrt/oops-2014-03-05-11:41:47-12194-1/uid': No such file or > directory > Mar 5 11:41:54 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded > for user 101355 > > > > > -- > Scanned by iCritical. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From ajb2 at mssl.ucl.ac.uk Mon Mar 10 20:46:57 2014 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Mon, 10 Mar 2014 20:46:57 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <531E24C1.5000005@mssl.ucl.ac.uk> On 10/03/14 18:15, stephen.rankin at stfc.ac.uk wrote: > > Hello, > > When using gfs2 with quotas on a SAN that is providing storage to two > clustered systems running CentOS6.5, one of the systems > can crash. This crash appears to be caused when a user tries > to add something to a SAN disk when they have exceeded their > quota on that disk. Sometimes a stack trace is produced in > /var/log/messages > which appears to indicate that it was gfs2 that caused the problem. > At the same time you get the gfs2 stack trace you also see problems > with someone exceeding their quota. > We have exactly the same problem and an open ticket with RH support. 
They've been trying to finger FS corruption as the cause. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajb2 at mssl.ucl.ac.uk Mon Mar 10 20:48:29 2014 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Mon, 10 Mar 2014 20:48:29 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <531E251D.10003@mssl.ucl.ac.uk> On 10/03/14 18:15, stephen.rankin at stfc.ac.uk wrote: > > Hello, > > When using gfs2 with quotas on a SAN that is providing storage to two > clustered systems running CentOS6.5, > As a matter of interest: how are you exporting the storage, or is this integral to the cluster itself? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen.rankin at stfc.ac.uk Tue Mar 11 09:47:40 2014 From: stephen.rankin at stfc.ac.uk (stephen.rankin at stfc.ac.uk) Date: Tue, 11 Mar 2014 09:47:40 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <531E251D.10003@mssl.ucl.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> <531E251D.10003@mssl.ucl.ac.uk> Message-ID: <4EC8429AA448A54D86E52F450C43247E74223C24@EXCHMBX03.fed.cclrc.ac.uk> The storage is a separate Hitachi SAN connected by 4Gig fibre channel, which itself does not report any problems when the crash happens. With the quota switched off, all is fine. 
From: Alan Brown [mailto:ajb2 at mssl.ucl.ac.uk] Sent: 10 March 2014 20:48 To: linux clustering Subject: Re: [Linux-cluster] gfs2 and quotas - system crash On 10/03/14 18:15, stephen.rankin at stfc.ac.uk wrote: Hello, When using gfs2 with quotas on a SAN that is providing storage to two clustered systems running CentOS6.5, As a matter of interest: how are you exporting the storage, or is this integral to the cluster itself? -- Scanned by iCritical. -------------- next part -------------- An HTML attachment was scrubbed... URL: From swhiteho at redhat.com Tue Mar 11 10:01:30 2014 From: swhiteho at redhat.com (Steven Whitehouse) Date: Tue, 11 Mar 2014 10:01:30 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <4EC8429AA448A54D86E52F450C43247E74223C24@EXCHMBX03.fed.cclrc.ac.uk> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> <531E251D.10003@mssl.ucl.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223C24@EXCHMBX03.fed.cclrc.ac.uk> Message-ID: <1394532090.2747.5.camel@menhir> Hi, On Tue, 2014-03-11 at 09:47 +0000, stephen.rankin at stfc.ac.uk wrote: > The storage is a separate Hitachi SAN connected by 4Gig fibre channel, > which itself does not report any problems when the crash happens. With > the quota switched off, all is fine. > Are you exporting that GFS2 filesystem via NFS, or can you reproduce this without NFS? Also, what kind of workload is involved? Is this a lot of small files, or are they mostly larger ones? Are they being read/written sequentially or randomly? Is there anything unusual going on (e.g. use of splice, ACLs or non-standard mount options, etc.) Steve. 
From stephen.rankin at stfc.ac.uk Tue Mar 11 10:43:24 2014 From: stephen.rankin at stfc.ac.uk (stephen.rankin at stfc.ac.uk) Date: Tue, 11 Mar 2014 10:43:24 +0000 Subject: [Linux-cluster] gfs2 and quotas - system crash In-Reply-To: <382425903.12347329.1394480286571.JavaMail.zimbra@redhat.com> References: <4EC8429AA448A54D86E52F450C43247E7421E76B@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E742239E7@EXCHMBX03.fed.cclrc.ac.uk> <4EC8429AA448A54D86E52F450C43247E74223A18@EXCHMBX03.fed.cclrc.ac.uk> <382425903.12347329.1394480286571.JavaMail.zimbra@redhat.com> Message-ID: <4EC8429AA448A54D86E52F450C43247E74223C93@EXCHMBX03.fed.cclrc.ac.uk> No, we are not using NFS. Our setup is:
1. Two-node cluster with the two_node option.
2. Hitachi SAN (RAID 6) connected to both nodes via 4Gb fibre channel.
3. One 10TB, two 4TB and one 2TB disk presented to each node, using gfs2 (a separate file system on each disk) with user quotas enabled. Only the two nodes in the cluster mount the drives.
4. A user fills up their quota on the 10TB disk and the system crashes (which appears to be a consistent outcome). The quota was only 10G for the user, so they were not using a vast amount of space.
In total 5TB is currently used on the drive:

Filesystem                        Size  Used Avail Use% Mounted on
/dev/mapper/vg_chadwick-LogVol00  9.8G  3.4G  6.0G  36% /
tmpfs                             253G   47M  253G   1% /dev/shm
/dev/mapper/mpathap3             1008M  148M  810M  16% /boot
/dev/mapper/vg_chadwick-LogVol06   11T  4.7T  5.8T  45% /home
/dev/mapper/vg_chadwick-LogVol05  9.8G  7.5G  1.8G  82% /opt
/dev/mapper/vg_chadwick-LogVol01  5.0G  140M  4.6G   3% /tmp
/dev/mapper/vg_chadwick-LogVol02  9.8G  8.3G  976M  90% /usr
/dev/mapper/vg_chadwick-LogVol03  5.0G  2.7G  2.1G  57% /var
/dev/mapper/sanvg1-sanlv1         4.0T  2.9T  1.2T  71% /san1
/dev/mapper/sanvg2-sanlv2         4.0T  3.2T  851G  80% /san2
/dev/mapper/sanvg3-sanlv3         2.0T  1.8T  259G  88% /san3
/dev/mapper/sanvg4-lvol0           10T  5.1T  5.0T  51% /san4

Filesystem                           Inodes   IUsed      IFree IUse% Mounted on
/dev/mapper/vg_chadwick-LogVol00     647168   54317     592851    9% /
tmpfs                              66157732      58   66157674    1% /dev/shm
/dev/mapper/mpathap3                  65536      62      65474    1% /boot
/dev/mapper/vg_chadwick-LogVol06  749502464 1002734  748499730    1% /home
/dev/mapper/vg_chadwick-LogVol05     647168  236023     411145   37% /opt
/dev/mapper/vg_chadwick-LogVol01     327680     378     327302    1% /tmp
/dev/mapper/vg_chadwick-LogVol02     647168  318728     328440   50% /usr
/dev/mapper/vg_chadwick-LogVol03     327680    7228     320452    3% /var
/dev/mapper/sanvg1-sanlv1         320266537  140997  320125540    1% /san1
/dev/mapper/sanvg2-sanlv2         223028034   44074  222983960    1% /san2
/dev/mapper/sanvg3-sanlv3          67820453    8357   67812096    1% /san3
/dev/mapper/sanvg4-lvol0         1336002497  392526 1335609971    1% /san4

Thanks, Stephen.

-----Original Message----- From: Abhijith Das [mailto:adas at redhat.com] Sent: 10 March 2014 19:38 To: linux clustering Subject: Re: [Linux-cluster] gfs2 and quotas - system crash ----- Original Message ----- > From: "stephen rankin" > To: linux-cluster at redhat.com > Sent: Monday, March 10, 2014 1:15:08 PM > Subject: [Linux-cluster] gfs2 and quotas - system crash > > Hello, > > > > When using gfs2 with quotas on a SAN that is providing storage to two > clustered systems running CentOS6.5, one of the systems can crash. 
> This crash appears to be caused when a user tries to add something to > a SAN disk when they have exceeded their quota on that disk. Sometimes > a stack trace is produced in /var/log/messages which appears to > indicate that it was gfs2 that caused the problem. > At the same time you get the gfs2 stack trace you also see problems > with someone exceeding their quota. > > The stack trace is below. > > Has anyone got a solution to this, other than switching of quotas? I > have switched of quotas which appears to have stabilised the system so > far, but I do need the quotas on. > > Your help is appreciated. > Hi Stephen, We have another report of this bug when gfs2 was exported using NFS. https://bugzilla.redhat.com/show_bug.cgi?id=1059808. Are you using NFS in your setup as well? We have not able to reproduce it to figure out what might be going on. Do you have a set procedure that you're able to recreate with reliably? If so, it would be of great help. Also, more info about your setup (file sizes, number of files, how many nodes mounting gfs2, what kinds of operations are being run) etc would be helpful as well. Cheers! --Abhi > Stephen Rankin > STFC, RAL, ISIS > > Mar 5 11:40:50 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota > exceeded for user 101355 Mar 5 11:40:50 chadwick nslcd[11420]: > [767df3] ldap_explode_dn(usi660) returned NULL: Success Mar 5 > 11:40:50 chadwick nslcd[11420]: [767df3] ldap_result() failed: Invalid > DN syntax Mar 5 11:40:50 chadwick nslcd[11420]: [767df3] lookup of > user usi660 failed: > Invalid DN syntax > Mar 5 11:41:46 chadwick kernel: ------------[ cut here ]------------ > Mar 5 11:41:46 chadwick kernel: WARNING: at lib/list_debug.c:26 > __list_add+0x6d/0xa0() (Not tainted) > Mar 5 11:41:46 chadwick kernel: Hardware name: PowerEdge R910 Mar 5 > 11:41:46 chadwick kernel: list_add corruption. next->prev should be > prev (ffff8820531518d0), but was ffff884d4c4594d0. (next=ffff884d4c4594d0). 
> Mar 5 11:41:46 chadwick kernel: Modules linked in: gfs2 dlm configfs > bridge > autofs4 des_generic ecb md4 nls_utf8 cifs bnx2fc cnic uio fcoe libfcoe > libfc 8021q garp stp llc ipv6 microcode power_meter iTCO_wdt > iTCO_vendor_support dcdbas serio_raw ixgbe dca ptp pps_core mdio > lpc_ich mfd_core sg ses enclosure i7core_edac edac_core bnx2 ext4 jbd2 > mbcache dm_round_robin sr_mod cdrom sd_mod crc_t10dif qla2xxx > scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix megaraid_sas > dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: > speedstep_lib] Mar 5 11:41:46 chadwick kernel: Pid: 74823, comm: > vncserver Not tainted > 2.6.32-431.3.1.el6.x86_64 #1 > Mar 5 11:41:46 chadwick kernel: Call Trace: > Mar 5 11:41:46 chadwick kernel: [] ? > warn_slowpath_common+0x87/0xc0 > Mar 5 11:41:46 chadwick kernel: [] ? > warn_slowpath_fmt+0x46/0x50 > Mar 5 11:41:46 chadwick kernel: [] ? > __list_add+0x6d/0xa0 Mar 5 11:41:46 chadwick kernel: > [] ? new_inode+0x72/0xb0 Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_create_inode+0x1b5/0x1150 [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_glock_nq_init+0x16/0x40 [gfs2] > Mar 5 11:41:46 chadwick kernel: [] ? > gfs2_mkdir+0x24/0x30 [gfs2] Mar 5 11:41:46 chadwick kernel: > [] ? > security_inode_mkdir+0x1f/0x30 > Mar 5 11:41:46 chadwick kernel: [] ? > vfs_mkdir+0xd9/0x140 Mar 5 11:41:46 chadwick kernel: [] ? > sys_mkdirat+0xc7/0x1b0 > Mar 5 11:41:46 chadwick kernel: [] ? > sys_mkdir+0x18/0x20 Mar 5 11:41:46 chadwick kernel: [] ? 
> system_call_fastpath+0x16/0x1b > Mar 5 11:41:46 chadwick kernel: ---[ end trace e51734a39976a028 ]--- > Mar 5 11:41:46 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota > exceeded for user 101355 Mar 5 11:41:47 chadwick abrtd: Directory > 'oops-2014-03-05-11:41:47-12194-1' > creation detected > Mar 5 11:41:47 chadwick abrt-dump-oops: Reported 1 kernel oopses to > Abrt Mar 5 11:41:47 chadwick abrtd: Can't open file > '/var/spool/abrt/oops-2014-03-05-11:41:47-12194-1/uid': No such file > or directory Mar 5 11:41:54 chadwick kernel: GFS2: > fsid=analysis:lvol0.1: quota exceeded for user 101355 > > > > > -- > Scanned by iCritical. > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -- Scanned by iCritical. From eranb at celltick.com Tue Mar 11 12:02:41 2014 From: eranb at celltick.com (Eran Ben Natan) Date: Tue, 11 Mar 2014 12:02:41 +0000 Subject: [Linux-cluster] A Newbie question about HA fail over Message-ID: <705C9B8622696B478640B16B7A1A1B94106EF8@Cobra.celltick.com> Hi, I have just set up a 2 nodes RH cluster with MySQL. I was able to start the service and relocate it to the other node. When I restart the active node, MySQL relocates automatically to the other node, but when I disconnect the active node from the network, it doesn't. Is this behavior normal? How can I set the resource to relocate in this situation? Thanks, Eran Ben-Natan | R&D Infrastructure -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From emi2fast at gmail.com Tue Mar 11 13:43:53 2014 From: emi2fast at gmail.com (emmanuel segura) Date: Tue, 11 Mar 2014 14:43:53 +0100 Subject: [Linux-cluster] A Newbie question about HA fail over In-Reply-To: <705C9B8622696B478640B16B7A1A1B94106EF8@Cobra.celltick.com> References: <705C9B8622696B478640B16B7A1A1B94106EF8@Cobra.celltick.com> Message-ID: Maybe you forgot to show us your cluster.conf; also tell us which cluster network you disconnected. 2014-03-11 13:02 GMT+01:00 Eran Ben Natan : > Hi, > > > > I have just set up a 2 nodes RH cluster with MySQL. I was able to start > the service and relocate it to the other node. > > When I restart the active node, MySQL relocates automatically to the other > node, but when I disconnect the active node from the network, it doesn't. > > Is this behavior normal? How can I set the resource to relocate in this > situation? > > > > Thanks, > > > > *Eran Ben-Natan | R&D Infrastructure* > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From lipson12 at yahoo.com Wed Mar 12 06:31:27 2014 From: lipson12 at yahoo.com (Kaisar Ahmed Khan) Date: Tue, 11 Mar 2014 23:31:27 -0700 (PDT) Subject: [Linux-cluster] iscsi sysmlink create problem In-Reply-To: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> References: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> Message-ID: <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> Dear Experts: The following rule is not working to create a symlink for an iscsi disk. My iscsi device is /dev/sda and I want to link it as /dev/iscsi/vendor_kernel. Please guide me if I have missed anything.
ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", SYMLINK+=iscsi/%E{ID_VENDOR}_%K", MODE="0664" Thanks ?kaisar -------------- next part -------------- An HTML attachment was scrubbed... URL: From emi2fast at gmail.com Wed Mar 12 11:43:37 2014 From: emi2fast at gmail.com (emmanuel segura) Date: Wed, 12 Mar 2014 12:43:37 +0100 Subject: [Linux-cluster] iscsi sysmlink create problem In-Reply-To: <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> References: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> Message-ID: but it's a cluster problem? ummm 2014-03-12 7:31 GMT+01:00 Kaisar Ahmed Khan : > > > Dear Experts : > > following rule is not working to create symlink for iscsi disk , my iscsi > device ID /dev/sda and want to link as /dev/iscsi/vendor_kernel > > please guide me if i miss anything . > > ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", > SYMLINK+=iscsi/%E{ID_VENDOR}_%K", MODE="0664" > > Thanks > kaisar > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- esta es mi vida e me la vivo hasta que dios quiera -------------- next part -------------- An HTML attachment was scrubbed... URL: From ekuric at redhat.com Wed Mar 12 12:01:44 2014 From: ekuric at redhat.com (Elvir Kuric) Date: Wed, 12 Mar 2014 13:01:44 +0100 Subject: [Linux-cluster] iscsi sysmlink create problem In-Reply-To: References: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> Message-ID: <53204CA8.2060300@redhat.com> On 03/12/2014 12:43 PM, emmanuel segura wrote: > but it's a cluster problem? 
ummm > > 2014-03-12 7:31 GMT+01:00 Kaisar Ahmed Khan >: > > > > Dear Experts : > > following rule is not working to create symlink for iscsi disk , > my iscsi device ID /dev/sda and want to link as > /dev/iscsi/vendor_kernel > > please guide me if i miss anything . > > ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", > SYMLINK+=iscsi/%E{ID_VENDOR}_%K", MODE="0664" > > Thanks > kaisar > > > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > > > > -- > esta es mi vida e me la vivo hasta que dios quiera > > If you share more information with us (OS version), it could help. Also, the outputs below can help in understanding how the system sees the device.
If RHEL 5 (and clones):
#udevinfo -a -p $(udevinfo -q path -n /dev/DEVICE)
If RHEL 6 (and clones):
# udevadm info --query=all -n /dev/DEVICE --attribute-walk
where DEVICE is the device you want to write a udev rule for. Kind regards, -- Elvir Kuric,TSE / Red Hat / GSS EMEA / -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Vallevand at UNISYS.com Wed Mar 12 14:43:41 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Wed, 12 Mar 2014 09:43:41 -0500 Subject: [Linux-cluster] Resource placement rules Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E0C32347@USEA-EXCH8.na.uis.unisys.com> I have resources A, B, C and D. (Or more.) All are using agent X. Is there a way to simply specify that resources A, B, C and D must each run on a different node? I can create a series of negative infinity collocation rules something like:
collocation c1 -inf: A ( B C D )
collocation c2 -inf: B ( A C D )
collocation c3 -inf: C ( A B D )
collocation c4 -inf: D ( A B C )
Is that my choice? Will that have the effect I want? Is there a more concise way to specify it? It would be nice to just set an attribute saying any resource using agent X must run on its own node. Regards. 
Mark K Vallevand Mark.Vallevand at Unisys.com May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From andreas at hastexo.com Thu Mar 13 10:34:33 2014 From: andreas at hastexo.com (Andreas Kurz) Date: Thu, 13 Mar 2014 11:34:33 +0100 Subject: [Linux-cluster] Resource placement rules In-Reply-To: <99C8B2929B39C24493377AC7A121E21FC5E0C32347@USEA-EXCH8.na.uis.unisys.com> References: <99C8B2929B39C24493377AC7A121E21FC5E0C32347@USEA-EXCH8.na.uis.unisys.com> Message-ID: <532189B9.2060108@hastexo.com> On 2014-03-12 15:43, Vallevand, Mark K wrote: > I have resources A, B, C and D. (Or more.) All are using agent X. Is > there a way to simply specify that resources A, B, C and D must each run > on a different node? > > I can create a series of negative infinity collocation rules something like: > > collocation c1 ?inf: A ( B C D ) > > collocation c2 ?inf: B ( A C D ) > > collocation c3 ?inf: C ( A B D ) > > collocation c4 ?inf: D ( A B C ) > > Is that my choice? Will that have the affect I want? Is there a more > concise way to specify it? > > It would be nice to just set an attribute saying any resource using > agent X must run on its own node. All resources use agentX ... what keeps you from using a clone resource? Regards, Andreas > > > > Regards. > Mark K Vallevand Mark.Vallevand at Unisys.com > > > May you live in interesting times, may you come to the attention of > important people and may all your wishes come true. > > THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY > MATERIAL and is thus for use only by the intended recipient. 
If you > received this in error, please contact the sender and delete the e-mail > and its attachments from all computers. > > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 287 bytes Desc: OpenPGP digital signature URL: From pine5514 at gmail.com Tue Mar 18 13:38:00 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 17:08:00 +0330 Subject: [Linux-cluster] unformat gfs2 Message-ID: I have accidentally reformatted a GFS cluster. We need to unformat it.. is there any way to recover disk ? I read this post http://web.archiveorange.com/archive/v/TUhSn11xEn9QxXBIZ0k6 it say that I can use gfs2_edit to recover data. I need more details about changing block map to 0xff tnx -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Tue Mar 18 16:10:30 2014 From: lists at alteeve.ca (Digimer) Date: Tue, 18 Mar 2014 12:10:30 -0400 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: Message-ID: <53286FF6.40707@alteeve.ca> On 18/03/14 09:38 AM, Mr.Pine wrote: > I have accidentally reformatted a GFS cluster. > We need to unformat it.. is there any way to recover disk ? > > I read this post > http://web.archiveorange.com/archive/v/TUhSn11xEn9QxXBIZ0k6 > > it say that I can use gfs2_edit to recover data. > I need more details about changing block map to 0xff > > tnx Do you have a support agreement with Red Hat? If so, open a ticket with them. If not, then you can try also asking for help in freenode's #linux-cluster channel. It says "no gfs support", but that's to prevent confusion with tracking open tickets, which won't apply if you don't have official red hat support. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
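Before any recovery experiments on a reformatted device, the usual first step is a block-for-block copy, so that destructive attempts can be repeated against a scratch image. In practice this is a single dd invocation (dd if=/dev/DEVICE of=image bs=1M conv=noerror,sync); the sketch below shows the same idea in Python, with the error handling hedged as a rough approximation rather than an exact equivalent of dd's behaviour:

```python
import os


def image_device(src_path, dst_path, block_size=1 << 20):
    """Block-for-block copy of src_path to dst_path.

    Unreadable blocks are replaced with zeros rather than aborting,
    roughly what `dd conv=noerror,sync` does, so a partially damaged
    device still yields a complete image to experiment on.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            try:
                block = src.read(block_size)
            except IOError:
                # Skip the unreadable region and pad the image with
                # zeros (best-effort; real tools handle this per sector).
                src.seek(block_size, os.SEEK_CUR)
                block = b"\x00" * block_size
            if not block:
                break
            dst.write(block)
```

Tools such as fsck.gfs2 or gfs2_edit can then be pointed at the image file instead of the original device.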
From rpeterso at redhat.com Tue Mar 18 16:38:07 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 12:38:07 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: Message-ID: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> ----- Original Message ----- > I have accidentally reformatted a GFS cluster. > We need to unformat it.. is there any way to recover disk ? > > I read this post > http://web.archiveorange.com/archive/v/TUhSn11xEn9QxXBIZ0k6 > > it say that I can use gfs2_edit to recover data. > I need more details about changing block map to 0xff > > tnx Hi, Sorry to hear about your file system mishap. It's not clear from your post whether you mean GFS or GFS2. Your subject line says GFS2, but your comment said you reformatted it to GFS. So my first questions are: What was it really? and What is it now? Assuming it was, and still is, gfs2, there's another important question: Was the file system _ever_ grown via gfs2_grow since the very first mkfs? If so, the subsequent mkfs would most likely place the resource groups in a different location, so the file system would be damaged beyond repair. The next important question is: did you override any of the mkfs.gfs2 parameters either the first time or the second time, like -b, -J, -j, or -r? Once again, if you specified a different block size (-b) or resource group size, the second mkfs.gfs2 would have placed the resource groups in different locations, once again damaging the original contents beyond repair. The third important question is: Was the device altered in any other way, for example, mkfs.ext4 or mkfs.xfs, which might have changed things? If so, it's probably done irreparable damage. However, if you never ran gfs2_grow, and never overrode -b or -r during either mkfs, the mkfs would likely have placed the resource groups in the exact same locations as it did the first time. 
In that case, you might be able to repair the file system by doing what you describe: Setting all the bits in the bitmaps to 0xff, then letting fsck.gfs2 sort it out. Unfortunately, there is no tool that can do this en-mass. You could manually set the bits to 0xff with gfs2_edit, but depending on the size of the file system, it would take a very long time. If it was my valuable data, and I had no backup, I would first make a block-for-block copy of the entire device so I had a sandbox to run experiments on. Next, I'd write a program that opened the block device, did a block-by-block search for GFS2 dinodes, then twiddle that block's bitmap from 0 to 3. Then I'd run fsck.gfs2 to see how well it can put the pieces back together. That program would have to be a hybrid with pieces pulled from fsck.gfs2 and gfs2_edit. It's no small task, and you'd have to know what you're doing. Unfortunately, there are only a handful of programmers who know enough about this to do it correctly (I'm one of them). All of them work for Red Hat. As much as it sounds like a fun project, it would probably be considered a conflict of interest, unless you somehow hired Red Hat and got my management involved. If I got their blessing, I'd be happy to do this. The trouble is, even with such a program, there are no guarantees that your file system would be back the way it used to be. It would likely be cheaper and better to restore from a backup. Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 17:11:27 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 20:41:27 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 8:08 PM, Bob Peterson wrote: > > ----- Original Message ----- > > I have accidentally reformatted a GFS cluster. > > We need to unformat it.. 
is there any way to recover disk ? > > > > I read this post > > http://web.archiveorange.com/archive/v/TUhSn11xEn9QxXBIZ0k6 > > > > it say that I can use gfs2_edit to recover data. > > I need more details about changing block map to 0xff > > > > tnx > > Hi, > > Sorry to hear about your file system mishap. > > It's not clear from your post whether you mean GFS or GFS2. Your subject > like says GFS2, but your comment said you reformatted it to GFS. > So my first questions are: What was it really? and What is it now? It's GFS2 . > > Assuming it was, and still is, gfs2, there's another important question: > Was the file system _ever_ grown via gfs2_grow since the very first mkfs? > If so, the subsequent mkfs would most likely place the resource groups in > a different location, so the file system would be damaged beyond repair. Without any use of gfs2_grow, > > The next important question is: did you override any of the mkfs.gfs2 > parameters either the first time or the second time, like -b, -J, -j, > or -r? Once again, if you specified a different block size (-b) or > resource group size, the second mkfs.gfs2 would have placed the > resource groups in different locations, once again damaging the original > contents beyond repair. > All options are equal .. > The third important question is: Was the device altered in any other > way, for example, mkfs.ext4 or mkfs.xfs, which might have changed things? > If so, it's probably done irreparable damage No, . > > However, if you never ran gfs2_grow, and never overrode -b or -r during > either mkfs, the mkfs would likely have placed the resource groups in the > exact same locations as it did the first time. In that case, you might > be able to repair the file system by doing what you describe: Setting > all the bits in the bitmaps to 0xff, then letting fsck.gfs2 sort it out. > > Unfortunately, there is no tool that can do this en-mass. 
> You could manually set the bits to 0xff with gfs2_edit, but depending > on the size of the file system, it would take a very long time. > > If it was my valuable data, and I had no backup, I would first > make a block-for-block copy of the entire device so I had a sandbox > to run experiments on. Next, I'd write a program that opened the block > device, did a block-by-block search for GFS2 dinodes, then twiddle that > block's bitmap from 0 to 3. Then I'd run fsck.gfs2 to see how well it can There is many many block types (http://linux.die.net/man/8/gfs2_edit), Do we find 4 (Dinode)? instead of for example 3 (Resource Group Bitmap)? How we could fine bitmaps? > put the pieces back together. That program would have to be a hybrid > with pieces pulled from fsck.gfs2 and gfs2_edit. It's no small task, > and you'd have to know what you're doing. Unfortunately, there are only > a handful of programmers who know enough about this to do it correctly > (I'm one of them). All of them work for Red Hat. As much as it sounds > like a fun project, it would probably be considered a conflict of > interest, unless you somehow hired Red Hat and got my management > involved. If I got their blessing, I'd be happy to do this. The trouble is, > even with such a program, there are no guarantees that your file system > would be back the way it used to be. It would likely be cheaper and > better to restore from a backup. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Tue Mar 18 17:30:44 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 13:30:44 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> Message-ID: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> ----- Original Message ----- > It's GFS2 . 
> Without any use of gfs2_grow, > All options are equal .. > No, > > There is many many block types (http://linux.die.net/man/8/gfs2_edit), > Do we find 4 (Dinode)? instead of for example 3 (Resource Group Bitmap)? > > How we could fine bitmaps? Hi, To use gfs2_edit properly, you should have an understanding of how the gfs2 file system is kept on disk. If you are a Red Hat customer, I have several videos on the Red Hat customer portal on how to use gfs2_edit. The dinode blocks are type 4. There are two kinds of bitmaps: bitmaps associated with rgrps (type 2) and bitmaps that follow rgrps (type 3). The rgrps are indexed by the rindex system file in the master directory, and ri_length tells you how many bitmap blocks follow each rgrp block. In newer versions (RHEL6+) there are little helper functions in gfs2_edit that can tell you the bitmap status, and alter it. For example: gfs2_edit -p blocktype /dev/your/device This command will tell you the block type, for example: # gfs2_edit -p root blocktype /dev/mpathc/scratch 4 (Block 22 is type 4: Dinode) gfs2_edit -p blockalloc /dev/your/device This command will tell you the current bitmap setting. The bitmap setting may be changed to "3" (dinode) with: gfs2_edit -p blockalloc 3 /dev/your/device So you could write a script to do it, but again, you would have to be careful, and work on a copy, never the original. Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 17:43:23 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 21:13:23 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 9:00 PM, Bob Peterson wrote: > ----- Original Message ----- >> It's GFS2 . >> Without any use of gfs2_grow, >> All options are equal .. 
>> No, >> >> There is many many block types (http://linux.die.net/man/8/gfs2_edit), >> Do we find 4 (Dinode)? instead of for example 3 (Resource Group Bitmap)? >> >> How we could fine bitmaps? > > Hi, > > To use gfs2_edit properly, you should have an understanding of how the > gfs2 file system is kept on disk. If you are a Red Hat customer, I have > several videos on the Red Hat customer portal on how to use gfs2_edit. > > The dinode blocks are type 4. There are two kinds of bitmaps: bitmaps > associated with rgrps (type 2) and bitmaps that follow rgrps (type 3). > The rgrps are indexed by the rindex system file in the master directory, > and ri_length tells you how many bitmap blocks follow each rgrp block. > > In newer versions (RHEL6+) there are little helper functions in gfs2_edit > that can tell you the bitmap status, and alter it. For example: > > gfs2_edit -p blocktype /dev/your/device > > This command will tell you the block type, for example: > # gfs2_edit -p root blocktype /dev/mpathc/scratch > 4 (Block 22 is type 4: Dinode) > > gfs2_edit -p blockalloc /dev/your/device > This command will tell you the current bitmap setting. The bitmap setting > may be changed to "3" (dinode) with: > gfs2_edit -p blockalloc 3 /dev/your/device > > So you could write a script to do it, but again, you would have to be > careful, and work on a copy, never the original. > > Regards, > > Bob Peterson > Red Hat File Systems > What is your opinion about this scrip? for ((i = 17; i < 1756377984; ++i)); do gfs2_edit -p $i blockalloc 3 /dev/sdb >/dev/null 2>&1; done Could we change all of block allocations to "3"? 
> -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From rpeterso at redhat.com Tue Mar 18 17:55:48 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 13:55:48 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> Message-ID: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> ----- Original Message ----- > What is your opinion about this scrip? > > for ((i = 17; i < 1756377984; ++i)); do gfs2_edit -p $i blockalloc 3 > /dev/sdb >/dev/null 2>&1; done > > Could we change all of block allocations to "3"? Hi, That would be dangerous. I would hope that the resource groups would be ignored, but I've never done it. I wouldn't be surprised if you had gfs2_edit segfault for some of it. At the very least, you would be turning all your journal blocks to appear like dinodes, as well as all the extended attributes, directory leaf blocks, etc., which will confuse fsck.gfs2. The fsck.gfs2 will do a much better job if you only change the dinode blocks from 0 to 3. You would be much better off writing a loop that first checked the block's current type, with -p blocktype, and only change its bit to 3 if it is type 4 (dinode). Also, the script would take a very long time, because it's going to invoke gfs2_edit a billion and a half times. Writing a program to do this once would be quicker.
Regards, Bob Peterson Red Hat File Systems From rpeterso at redhat.com Tue Mar 18 18:06:40 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 14:06:40 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> Message-ID: <296851692.1537300.1395166000377.JavaMail.zimbra@redhat.com> ----- Original Message ----- > ----- Original Message ----- > > What is your opinion about this scrip? > > > > for ((i = 17; i < 1756377984; ++i)); do gfs2_edit -p $i blockalloc 3 > > /dev/sdb >/dev/null 2>&1; done > > > > Could we change all of block allocations to "3"? > > Hi, > > gfs2_edit segfault for some of it. At the very least, you would be turning > all your journal blocks to appear like dinodes, as well as all the This is worth clarifying: You should be careful with the journals. The journal blocks may look like dinodes, but they should be marked as data blocks. That's because the journal's data can contain dinodes. Will fsck.gfs2 figure it out properly? I don't know. It might, but it might not. Better to be safe with your data. Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 18:11:07 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 21:41:07 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 9:25 PM, Bob Peterson wrote: > ----- Original Message ----- >> What is your opinion about this scrip? 
>> >> for ((i = 17; i < 1756377984; ++i)); do gfs2_edit -p $i blockalloc 3 >> /dev/sdb >/dev/null 2>&1; done >> >> Could we change all of block allocations to "3"? > > Hi, > > That would be dangerous. I would hope that the resource groups would be > ignored, but I've never done it. I wouldn't be surprised if you had > gfs2_edit segfault for some of it. At the very least, you would be turning > all your journal blocks to appear like dinodes, as well as all the > extended attributes, directory leaf blocks, etc., which will confuse > fsck.gfs2. The fsck.gfs2 will do a much better job if you only change > the dinode blocks from 0 to 3. > > You would be much better off writing a loop that first checked the > block's current type, with -p blocktype, and only > change its bit to 3 if it is type 4 (dinode). > > Also, the script would take a very long time, because it's going to > invoke gfs2_edit a billion and a half times. Writing a program to > do this once would be quicker. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Hi, Do you mean program likes this? for ((i = 17; i < 1756377984; ++i)); do ss=$(gfs2_edit -p $i blocktype /dev/sdc | cut -d " " -f 1); if [[ $ss -eq 4 ]]; then gfs2_edit -p $i blockalloc 3 /dev/sdb >/dev/null 2>&1; fi done I'm a C/C++ programmer, if you trust program logic, i would try to implement with C/C++. and would public in reply. Regards. Pine.
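Pine's shell loop above reduces to a small C predicate: per the thread, every GFS2 metadata block begins with the big-endian magic number 0x01161970, followed by a big-endian type field where type 4 means dinode. A minimal sketch of that check, assuming the constants from gfs2_ondisk.h; the device-reading loop, block-size handling, and the journal-block caveat Bob raises are omitted, and `is_gfs2_dinode` is an illustrative name, not part of any GFS2 tool. (Note also that the script as posted reads blocktype from /dev/sdc but writes blockalloc to /dev/sdb; presumably a single device is intended.)

```c
#include <stdint.h>

/* On-disk constants as described in the thread (verify against
 * gfs2_ondisk.h): metadata magic at offset 0, type field at offset 4,
 * both stored big-endian; type 4 = dinode. */
#define GFS2_MAGIC       0x01161970u
#define GFS2_METATYPE_DI 4u

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Return 1 if this block buffer starts with a dinode metadata header. */
static int is_gfs2_dinode(const uint8_t *block)
{
    return be32(block) == GFS2_MAGIC &&
           be32(block + 4) == GFS2_METATYPE_DI;
}
```

A scanner would read each filesystem block from the copied device, apply this test, and remember the block numbers whose bitmap entries need changing.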
From rpeterso at redhat.com Tue Mar 18 18:22:19 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 14:22:19 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> Message-ID: <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hi, > > Do you mean program likes this? > > for ((i = 17; i < 1756377984; ++i)); do > ss=$(gfs2_edit -p $i blocktype /dev/sdc | cut -d " " -f 1); > if [[ $ss -eq 4 ]]; then > gfs2_edit -p $i blockalloc 3 /dev/sdb >/dev/null 2>&1; > fi > done > > I'm a C/C++ programmer, if you trust program logic, i would try > to implement with C/C++. and would public in reply. > > Regards. > Pine. Hi, Yes, you can do something like that, but again, do not include the journal's blocks. You can do gfs2_edit -p master /dev/sdc to determine the block of the quota file, which should be past the journals. Then use that value for the starting point of i. For example: # gfs2_edit -p master /dev/mpathc/scratch | grep quota 8/8 [6c1c0fed] 12/33132 (0xc/0x816c): File quota for ((i = 33133; ... Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 18:31:24 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 22:01:24 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> References: <746485932.1415874.1395160687725.JavaMail.zimbra@redhat.com> <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 9:52 PM, Bob Peterson wrote: > ----- Original Message ----- >> Hi, >> >> Do you mean program likes this? 
>> >> for ((i = 17; i < 1756377984; ++i)); do >> ss=$(gfs2_edit -p $i blocktype /dev/sdc | cut -d " " -f 1); >> if [[ $ss -eq 4 ]]; then >> gfs2_edit -p $i blockalloc 3 /dev/sdb >/dev/null 2>&1; >> fi >> done >> >> I'm a C/C++ programmer, if you trust program logic, i would try >> to implement with C/C++. and would public in reply. >> >> Regards. >> Pine. > > Hi, > > Yes, you can do something like that, but again, do not include the > journal's blocks. You can do gfs2_edit -p master /dev/sdc to > determine the block of the quota file, which should be past the > journals. Then use that value for the starting point of i. > For example: > # gfs2_edit -p master /dev/mpathc/scratch | grep quota > 8/8 [6c1c0fed] 12/33132 (0xc/0x816c): File quota > for ((i = 33133; ... > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Hi, Output of this command on my system: gfs2_edit -p master /dev/sdb | grep quota 8. (8). 264950 (0x40af6): File quota Do you mean "i" would start from 264950? All blocks before 264950 are journal blocks? Regards Pine. From rpeterso at redhat.com Tue Mar 18 18:36:08 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Tue, 18 Mar 2014 14:36:08 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> Message-ID: <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hi, > > Output of this command on my system: > > gfs2_edit -p master /dev/sdb | grep quota > 8. (8). 264950 (0x40af6): File quota > > Do you mean "i" would start from 264950? > All blocks before 264950 are journal blocks? > > Regards > Pine. Yes, exactly. 
You could try that and see how well it works, but again, it might take a very long time to do more than 3 billion invocations of the program. Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Tue Mar 18 18:42:47 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Tue, 18 Mar 2014 22:12:47 +0330 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> References: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> Message-ID: Tnx alot, I would run script and send back result. Regards. Pine. On Tue, Mar 18, 2014 at 10:06 PM, Bob Peterson wrote: > ----- Original Message ----- >> Hi, >> >> Output of this command on my system: >> >> gfs2_edit -p master /dev/sdb | grep quota >> 8. (8). 264950 (0x40af6): File quota >> >> Do you mean "i" would start from 264950? >> All blocks before 264950 are journal blocks? >> >> Regards >> Pine. > > Yes, exactly. You could try that and see how well it works, > but again, it might take a very long time to do more than > 3 billion invocations of the program. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Wed Mar 19 01:27:04 2014 From: lists at alteeve.ca (Digimer) Date: Tue, 18 Mar 2014 21:27:04 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' Message-ID: <5328F268.9080605@alteeve.ca> Hi all, I would like to tell rgmanager to give more time for VMs to stop.
I want this: I already use ccs to create the entry: via: ccs -h localhost --activate --sync --password "secret" \ --addvm vm01-win2008 \ --domain="primary_n01" \ path="/shared/definitions/" \ autostart="0" \ exclusive="0" \ recovery="restart" \ max_restarts="2" \ restart_expire_time="600" I'm hoping it's a simple additional switch. :) Thanks! -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From pine5514 at gmail.com Wed Mar 19 06:23:14 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Wed, 19 Mar 2014 10:53:14 +0430 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1619187131.1466288.1395163844339.JavaMail.zimbra@redhat.com> <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> Message-ID: On Tue, Mar 18, 2014 at 11:12 PM, Mr.Pine wrote: > Tnx alot, > > I would run script and send back result. > Regards. > Pine. > > Hi, Scripts is very very slow, so i should write program in c/c++. I need some confidence about data structures and data location on disk. As i reviewed blocks of data: All reserved blocks (GFS2 specific blocks) start by : 0x01161970 Blocktype store location is at Byte # 8, Type of start block of each resource group is: 2 Bitmaps are in block types 2 & 3. In block type 2, bitmap info starts from Byte # 129 In block type 3, bitmap info starts from Byte # 25 Length of RGs are const, 5 in my volume (out put of gfs2_edit -p rindex /dev/..) Is this info right? Logic of my program seams should be like this: (1) Loop in device and temporary store block id of dinode blocks, and also their bitmap locations (2) Change bitmap of blocks to 3 (11) Bob, could you confirm this? Regards Pine. 
From rpeterso at redhat.com Wed Mar 19 12:28:38 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Wed, 19 Mar 2014 08:28:38 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> Message-ID: <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Hi, > > Scripts is very very slow, so i should write program in c/c++. > > I need some confidence about data structures and data location on disk. > As i reviewed blocks of data: > > All reserved blocks (GFS2 specific blocks) start by : 0x01161970 > Blocktype store location is at Byte # 8, > Type of start block of each resource group is: 2 > Bitmaps are in block types 2 & 3. > In block type 2, bitmap info starts from Byte # 129 > In block type 3, bitmap info starts from Byte # 25 > Length of RGs are const, 5 in my volume (out put of gfs2_edit -p rindex > /dev/..) > > Is this info right? > > Logic of my program seams should be like this: > > (1) > Loop in device and temporary store block id of dinode blocks, and also > their bitmap locations > > (2) > Change bitmap of blocks to 3 (11) > > Bob, could you confirm this? > > Regards > Pine. Hi Pine, This is correct. The length of RGs is properly determined by the values in the "rindex" system file, but 5 is very common, and is usually constant. (It may change if you used gfs2_grow or gfs2_convert from gfs1). The bitmap is 2 bits per block in the resource group, and it's relative to the start of the particular rgrp. You should probably use the same algorithm in libgfs2 to change the proper bit in the bitmaps. You can get this from the public gfs2-utils git tree. 
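The bit manipulation Bob describes can be sketched in a few lines of C: 2 bits per block, packed 4 blocks per byte starting from the low-order bits, with bitmap data beginning at byte 128 of the rgrp block itself and byte 24 of each following bitmap block (the offsets stated in the thread, 1-indexed there as 129 and 25). This is a sketch of the setbit logic only, assuming it matches libgfs2's helpers; verify the offsets and packing against gfs2_ondisk.h and the gfs2-utils source before using it.

```c
#include <stdint.h>
#include <stddef.h>

#define GFS2_BLKST_DINODE  3u   /* binary 11, "change bitmap to 3" */
#define RGRP_BITMAP_OFFSET 128  /* bitmap data start in the rgrp block */
#define META_BITMAP_OFFSET 24   /* bitmap data start in following bitmap blocks */

/* Set one block's 2-bit allocation state to dinode (3) within a bitmap
 * buffer.  rel_blk is 0-based, relative to the first block this buffer
 * covers; each byte holds 4 entries, low-order bits first. */
static void set_dinode_state(uint8_t *bitmap, uint64_t rel_blk)
{
    size_t   byte  = rel_blk / 4;                 /* 4 entries per byte */
    unsigned shift = (unsigned)(rel_blk % 4) * 2; /* position in the byte */
    bitmap[byte] &= (uint8_t)~(3u << shift);              /* clear old state */
    bitmap[byte] |= (uint8_t)(GFS2_BLKST_DINODE << shift); /* set to 3 */
}
```

A full program would use the rindex entries (rgrp start address and ri_length) to decide which bitmap buffer and relative offset a given absolute block number falls into before calling this.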
Regards, Bob Peterson Red Hat File Systems From mgrac at redhat.com Wed Mar 19 14:37:50 2014 From: mgrac at redhat.com (Marek Grac) Date: Wed, 19 Mar 2014 15:37:50 +0100 Subject: [Linux-cluster] joining the gitfence-agents group In-Reply-To: References: Message-ID: <5329ABBE.70304@redhat.com> On 03/18/2014 12:20 AM, David Smith wrote: > any chance you can sponsor and approve so i can submit the code via git? > > or, if you prefer, I can send you the code modifications. > Hi, sorry for late response, The write access to git repository is still limited to very small group of people and I will be happy to add you there after you become regular contributor. Currently, please send a patch to cluster-devel at redhat.com where code review will be done. After that review, we will add your code into upstream using git-am so you will be preserved as author. m, From ajb2 at mssl.ucl.ac.uk Wed Mar 19 15:16:57 2014 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Wed, 19 Mar 2014 15:16:57 +0000 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: Message-ID: <5329B4E9.9070400@site.mssl.ucl.ac.uk> On 18/03/14 13:38, Mr.Pine wrote: > I have accidentally reformatted a GFS cluster. > We need to unformat it.. is there any way to recover disk ? Backups? From Mark.Vallevand at UNISYS.com Wed Mar 19 19:55:44 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Wed, 19 Mar 2014 14:55:44 -0500 Subject: [Linux-cluster] Resource instance is getting restarted when a node is rebooted Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E11149A7@USEA-EXCH8.na.uis.unisys.com> I'm testing my cluster configuration by rebooting nodes to see what happens. I can't explain what I see in some cases. The setup: I have a cloned resource with its own agent and an IP address resource that is collocated with the cloned resource. The IP address doesn't need to run on all of the nodes running an instance of the cloned resource. It just needs to be on one of the nodes. 
It's not cloned or meant to be load-balanced. I do something like this: crm -F configure < May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Mark.Vallevand at UNISYS.com Wed Mar 19 20:52:40 2014 From: Mark.Vallevand at UNISYS.com (Vallevand, Mark K) Date: Wed, 19 Mar 2014 15:52:40 -0500 Subject: [Linux-cluster] Resource instance is getting restarted when a node is rebooted In-Reply-To: <99C8B2929B39C24493377AC7A121E21FC5E11149A7@USEA-EXCH8.na.uis.unisys.com> References: <99C8B2929B39C24493377AC7A121E21FC5E11149A7@USEA-EXCH8.na.uis.unisys.com> Message-ID: <99C8B2929B39C24493377AC7A121E21FC5E1114AF9@USEA-EXCH8.na.uis.unisys.com> Never mind. It's like one of Murphy's Laws, or at least a Murphy's Corollary. As soon as you ask for help and describe a problem in some detail, the answer becomes obvious. It's the order command. Duh. Regards. Mark K Vallevand Mark.Vallevand at Unisys.com May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. 
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Vallevand, Mark K Sent: Wednesday, March 19, 2014 02:56 PM To: linux clustering Subject: [Linux-cluster] Resource instance is getting restarted when a node is rebooted I'm testing my cluster configuration by rebooting nodes to see what happens. I can't explain what I see in some cases. The setup: I have a cloned resource with its own agent and an IP address resource that is collocated with the cloned resource. The IP address doesn't need to run on all of the nodes running an instance of the cloned resource. It just needs to be on one of the nodes. It's not cloned or meant to be load-balanced. I do something like this: crm -F configure < May you live in interesting times, may you come to the attention of important people and may all your wishes come true. THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cfeist at redhat.com Wed Mar 19 22:31:20 2014 From: cfeist at redhat.com (Chris Feist) Date: Wed, 19 Mar 2014 17:31:20 -0500 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <5328F268.9080605@alteeve.ca> References: <5328F268.9080605@alteeve.ca> Message-ID: <532A1AB8.307@redhat.com> On 03/18/2014 08:27 PM, Digimer wrote: > Hi all, > > I would like to tell rgmanager to give more time for VMs to stop. 
I want this: > > path="/shared/definitions/" exclusive="0" recovery="restart" max_restarts="2" > restart_expire_time="600"> > > > > I already use ccs to create the entry: > > path="/shared/definitions/" exclusive="0" recovery="restart" max_restarts="2" > restart_expire_time="600"/> > > via: > > ccs -h localhost --activate --sync --password "secret" \ > --addvm vm01-win2008 \ > --domain="primary_n01" \ > path="/shared/definitions/" \ > autostart="0" \ > exclusive="0" \ > recovery="restart" \ > max_restarts="2" \ > restart_expire_time="600" > > I'm hoping it's a simple additional switch. :) Unfortunately currently ccs doesn't support setting resource actions. However it's my understanding that rgmanager doesn't check timeouts unless __enforce_timeouts is set to "1". So you shouldn't be seeing a vm resource go to failed if it takes a long time to stop. Are you trying to make the vm resource fail if it takes longer than 10 minutes to stop? > > Thanks! > From lists at alteeve.ca Wed Mar 19 23:45:56 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 19 Mar 2014 19:45:56 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <532A1AB8.307@redhat.com> References: <5328F268.9080605@alteeve.ca> <532A1AB8.307@redhat.com> Message-ID: <532A2C34.3080903@alteeve.ca> On 19/03/14 06:31 PM, Chris Feist wrote: > On 03/18/2014 08:27 PM, Digimer wrote: >> Hi all, >> >> I would like to tell rgmanager to give more time for VMs to stop. 
I >> want this: >> >> > path="/shared/definitions/" exclusive="0" recovery="restart" >> max_restarts="2" >> restart_expire_time="600"> >> >> >> >> I already use ccs to create the entry: >> >> > path="/shared/definitions/" exclusive="0" recovery="restart" >> max_restarts="2" >> restart_expire_time="600"/> >> >> via: >> >> ccs -h localhost --activate --sync --password "secret" \ >> --addvm vm01-win2008 \ >> --domain="primary_n01" \ >> path="/shared/definitions/" \ >> autostart="0" \ >> exclusive="0" \ >> recovery="restart" \ >> max_restarts="2" \ >> restart_expire_time="600" >> >> I'm hoping it's a simple additional switch. :) > > Unfortunately currently ccs doesn't support setting resource actions. > However it's my understanding that rgmanager doesn't check timeouts > unless __enforce_timeouts is set to "1". So you shouldn't be seeing a > vm resource go to failed if it takes a long time to stop. Are you > trying to make the vm resource fail if it takes longer than 10 minutes > to stop? I was afraid you were going to say that. :( The problem is that after calling 'disable' against the VM service, rgmanager waits two minutes. If the service isn't closed in that time, the server is forced off (at least, this was the behaviour when I last tested this). The concern is that, by default, windows installs queue updates to install when the system shuts down. During this time, windows makes it very clear that you should not power off the system during the updates. So if this timer is hit, and the VM is forced off, the guest OS can be damaged. Of course, we can debate the (lack of) wisdom of this behaviour, and I already document this concern (and even warn people to check for updates before stopping the server), it's not sufficient. If a user doesn't read the warning, or simply forgets to check, the consequences can be non-trivial. 
If ccs can't be made to add this attribute, and if the behaviour persists (I will test shortly after sending this reply), then I will have to edit the cluster.conf directly, something I am loath to do if at all avoidable. Cheers -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From esanchezvela.redhatcluster at gmail.com Thu Mar 20 01:13:43 2014 From: esanchezvela.redhatcluster at gmail.com (Enrique Sanchez) Date: Wed, 19 Mar 2014 21:13:43 -0400 Subject: [Linux-cluster] RH Summit '14 In-Reply-To: References: Message-ID: I just found out I am going, want me to send u a text message to meet up? On Thu, Feb 27, 2014 at 12:04 PM, Jeff Stoner wrote: > Anyone else going to Red Hat Summit this year? Wanna meetup for a > beer/coffee/tea/soda? > > -- > *Jeff Stoner * > *Cloud Evangelist* > Dimension Data CBU > Tel +1-703-723-5620 > Mobile +1-703-475-7720 > jeff.stoner at dimensiondata.com > Twitter > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- Enrique Sanchez Vela ------------------------------------------ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at alteeve.ca Thu Mar 20 01:26:56 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 19 Mar 2014 21:26:56 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <532A2C34.3080903@alteeve.ca> References: <5328F268.9080605@alteeve.ca> <532A1AB8.307@redhat.com> <532A2C34.3080903@alteeve.ca> Message-ID: <532A43E0.6030603@alteeve.ca> On 19/03/14 07:45 PM, Digimer wrote: > On 19/03/14 06:31 PM, Chris Feist wrote: >> On 03/18/2014 08:27 PM, Digimer wrote: >>> Hi all, >>> >>> I would like to tell rgmanager to give more time for VMs to stop. 
I >>> want this: >>> >>> >> path="/shared/definitions/" exclusive="0" recovery="restart" >>> max_restarts="2" >>> restart_expire_time="600"> >>> >>> >>> >>> I already use ccs to create the entry: >>> >>> >> path="/shared/definitions/" exclusive="0" recovery="restart" >>> max_restarts="2" >>> restart_expire_time="600"/> >>> >>> via: >>> >>> ccs -h localhost --activate --sync --password "secret" \ >>> --addvm vm01-win2008 \ >>> --domain="primary_n01" \ >>> path="/shared/definitions/" \ >>> autostart="0" \ >>> exclusive="0" \ >>> recovery="restart" \ >>> max_restarts="2" \ >>> restart_expire_time="600" >>> >>> I'm hoping it's a simple additional switch. :) >> >> Unfortunately currently ccs doesn't support setting resource actions. >> However it's my understanding that rgmanager doesn't check timeouts >> unless __enforce_timeouts is set to "1". So you shouldn't be seeing a >> vm resource go to failed if it takes a long time to stop. Are you >> trying to make the vm resource fail if it takes longer than 10 minutes >> to stop? > > I was afraid you were going to say that. :( > > The problem is that after calling 'disable' against the VM service, > rgmanager waits two minutes. If the service isn't closed in that time, > the server is forced off (at least, this was the behaviour when I last > tested this). > > The concern is that, by default, windows installs queue updates to > install when the system shuts down. During this time, windows makes it > very clear that you should not power off the system during the updates. > So if this timer is hit, and the VM is forced off, the guest OS can be > damaged. > > Of course, we can debate the (lack of) wisdom of this behaviour, and I > already document this concern (and even warn people to check for updates > before stopping the server), it's not sufficient. If a user doesn't read > the warning, or simply forgets to check, the consequences can be > non-trivial. 
> > If ccs can't be made to add this attribute, and if the behaviour > persists (I will test shortly after sending this reply), then I will > have to edit the cluster.conf directly, something I am loath to do if at > all avoidable. > > Cheers Confirmed; I called disable on a VM with gnome running, so that I could abort the VM's shut down. an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date Wed Mar 19 21:06:29 EDT 2014 Local machine disabling vm:vm01-rhel6...Success Wed Mar 19 21:08:36 EDT 2014 2 minutes and 7 seconds, then rgmanager forced-off the VM. Had this been a Windows guest in the middle of installing updates, it would be highly likely to be screwed now. To confirm, I changed the config to: Then I repeated the test: an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date Wed Mar 19 21:13:18 EDT 2014 Local machine disabling vm:vm01-rhel6...Success Wed Mar 19 21:23:31 EDT 2014 10 minutes and 13 seconds before the cluster killed the server, much less likely to interrupt an in-progress OS update (truth be told, I plan to set 30 minutes). I understand that this blocks other processes, but in an HA environment, I'd strongly argue that safe > speed. digimer -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? 
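For reference, a sketch of the kind of cluster.conf vm entry being discussed. The attribute values are taken from the ccs command quoted earlier in the thread; the stop action, its 30-minute timeout, and the placement of the __enforce_timeouts attribute are assumptions drawn from the discussion, not a confirmed syntax:

```xml
<vm name="vm01-win2008" domain="primary_n01" autostart="0"
    path="/shared/definitions/" exclusive="0" recovery="restart"
    max_restarts="2" restart_expire_time="600" __enforce_timeouts="1">
  <!-- assumed: give the guest 30 minutes to shut down cleanly -->
  <action name="stop" timeout="30m"/>
</vm>
```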
From morpheus.ibis at gmail.com Thu Mar 20 02:12:08 2014 From: morpheus.ibis at gmail.com (Pavel Herrmann) Date: Thu, 20 Mar 2014 03:12:08 +0100 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <532A43E0.6030603@alteeve.ca> References: <5328F268.9080605@alteeve.ca> <532A2C34.3080903@alteeve.ca> <532A43E0.6030603@alteeve.ca> Message-ID: <1857777.TcdlaGUVy9@bloomfield> Hi On Wednesday 19 of March 2014 21:26:56 Digimer wrote: > On 19/03/14 07:45 PM, Digimer wrote: > > On 19/03/14 06:31 PM, Chris Feist wrote: > >> On 03/18/2014 08:27 PM, Digimer wrote: > >>> Hi all, > >>> > >>> I would like to tell rgmanager to give more time for VMs to stop. I > >>> > >>> want this: > >>> > >>> >>> path="/shared/definitions/" exclusive="0" recovery="restart" > >>> max_restarts="2" > >>> restart_expire_time="600"> > >>> > >>> > >>> > >>> > >>> > >>> I already use ccs to create the entry: > >>> > >>> >>> path="/shared/definitions/" exclusive="0" recovery="restart" > >>> max_restarts="2" > >>> restart_expire_time="600"/> > >>> > >>> via: > >>> > >>> ccs -h localhost --activate --sync --password "secret" \ > >>> > >>> --addvm vm01-win2008 \ > >>> --domain="primary_n01" \ > >>> path="/shared/definitions/" \ > >>> autostart="0" \ > >>> exclusive="0" \ > >>> recovery="restart" \ > >>> max_restarts="2" \ > >>> restart_expire_time="600" > >>> > >>> I'm hoping it's a simple additional switch. :) > >> > >> Unfortunately currently ccs doesn't support setting resource actions. > >> However it's my understanding that rgmanager doesn't check timeouts > >> unless __enforce_timeouts is set to "1". So you shouldn't be seeing a > >> vm resource go to failed if it takes a long time to stop. Are you > >> trying to make the vm resource fail if it takes longer than 10 minutes > >> to stop? > > > > I was afraid you were going to say that. :( > > > > The problem is that after calling 'disable' against the VM service, > > rgmanager waits two minutes. 
If the service isn't closed in that time, > > the server is forced off (at least, this was the behaviour when I last > > tested this). > > > > The concern is that, by default, windows installs queue updates to > > install when the system shuts down. During this time, windows makes it > > very clear that you should not power off the system during the updates. > > So if this timer is hit, and the VM is forced off, the guest OS can be > > damaged. > > > > Of course, we can debate the (lack of) wisdom of this behaviour, and I > > already document this concern (and even warn people to check for updates > > before stopping the server), it's not sufficient. If a user doesn't read > > the warning, or simply forgets to check, the consequences can be > > non-trivial. > > > > If ccs can't be made to add this attribute, and if the behaviour > > persists (I will test shortly after sending this reply), then I will > > have to edit the cluster.conf directly, something I am loath to do if at > > all avoidable. > > > > Cheers > > Confirmed; > > I called disable on a VM with gnome running, so that I could abort the > VM's shut down. > > an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date > Wed Mar 19 21:06:29 EDT 2014 > Local machine disabling vm:vm01-rhel6...Success > Wed Mar 19 21:08:36 EDT 2014 > > 2 minutes and 7 seconds, then rgmanager forced-off the VM. Had this been > a windows guest in the middle of installing updates, it would be highly > likely to be screwed now. Is this really the best way to handle such an event? From what I remember, Windows can (or could, I don't have any 'modern' Windows lying around) be told to shut down without updating. Maybe a wiser approach would be to make the stop event (which I believe is delivered to the guest as pressing the ACPI power button) trigger a shutdown without updates. 
Keep in mind that doing system updates on a timer is dangerous, regardless of the actual time. regards Pavel Herrmann > To confirm, I changed the config to: > > name="vm01-rhel6" path="/shared/definitions/" recovery="restart" > restart_expire_time="600"> > > > > Then I repeated the test: > > an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date > Wed Mar 19 21:13:18 EDT 2014 > Local machine disabling vm:vm01-rhel6...Success > Wed Mar 19 21:23:31 EDT 2014 > > 10 minutes and 13 seconds before the cluster killed the server, much > less likely to interrupt a in-progress OS update (truth be told, I plan > to set 30 minutes. > > I understand that this blocks other processes, but in an HA environment, > I'd strongly argue that safe > speed. > > digimer From lists at alteeve.ca Thu Mar 20 02:35:55 2014 From: lists at alteeve.ca (Digimer) Date: Wed, 19 Mar 2014 22:35:55 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <1857777.TcdlaGUVy9@bloomfield> References: <5328F268.9080605@alteeve.ca> <532A2C34.3080903@alteeve.ca> <532A43E0.6030603@alteeve.ca> <1857777.TcdlaGUVy9@bloomfield> Message-ID: <532A540B.8070104@alteeve.ca> On 19/03/14 10:12 PM, Pavel Herrmann wrote: > Hi > > On Wednesday 19 of March 2014 21:26:56 Digimer wrote: >> On 19/03/14 07:45 PM, Digimer wrote: >>> On 19/03/14 06:31 PM, Chris Feist wrote: >>>> On 03/18/2014 08:27 PM, Digimer wrote: >>>>> Hi all, >>>>> >>>>> I would like to tell rgmanager to give more time for VMs to stop. 
I >>>>> >>>>> want this: >>>>> >>>>> >>>> path="/shared/definitions/" exclusive="0" recovery="restart" >>>>> max_restarts="2" >>>>> restart_expire_time="600"> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> I already use ccs to create the entry: >>>>> >>>>> >>>> path="/shared/definitions/" exclusive="0" recovery="restart" >>>>> max_restarts="2" >>>>> restart_expire_time="600"/> >>>>> >>>>> via: >>>>> >>>>> ccs -h localhost --activate --sync --password "secret" \ >>>>> >>>>> --addvm vm01-win2008 \ >>>>> --domain="primary_n01" \ >>>>> path="/shared/definitions/" \ >>>>> autostart="0" \ >>>>> exclusive="0" \ >>>>> recovery="restart" \ >>>>> max_restarts="2" \ >>>>> restart_expire_time="600" >>>>> >>>>> I'm hoping it's a simple additional switch. :) >>>> >>>> Unfortunately currently ccs doesn't support setting resource actions. >>>> However it's my understanding that rgmanager doesn't check timeouts >>>> unless __enforce_timeouts is set to "1". So you shouldn't be seeing a >>>> vm resource go to failed if it takes a long time to stop. Are you >>>> trying to make the vm resource fail if it takes longer than 10 minutes >>>> to stop? >>> >>> I was afraid you were going to say that. :( >>> >>> The problem is that after calling 'disable' against the VM service, >>> rgmanager waits two minutes. If the service isn't closed in that time, >>> the server is forced off (at least, this was the behaviour when I last >>> tested this). >>> >>> The concern is that, by default, windows installs queue updates to >>> install when the system shuts down. During this time, windows makes it >>> very clear that you should not power off the system during the updates. >>> So if this timer is hit, and the VM is forced off, the guest OS can be >>> damaged. >>> >>> Of course, we can debate the (lack of) wisdom of this behaviour, and I >>> already document this concern (and even warn people to check for updates >>> before stopping the server), it's not sufficient. 
If a user doesn't read >>> the warning, or simply forgets to check, the consequences can be >>> non-trivial. >>> >>> If ccs can't be made to add this attribute, and if the behaviour >>> persists (I will test shortly after sending this reply), then I will >>> have to edit the cluster.conf directly, something I am loath to do if at >>> all avoidable. >>> >>> Cheers >> >> Confirmed; >> >> I called disable on a VM with gnome running, so that I could abort the >> VM's shut down. >> >> an-c05n01:~# date; clusvcadm -d vm:vm01-rhel6; date >> Wed Mar 19 21:06:29 EDT 2014 >> Local machine disabling vm:vm01-rhel6...Success >> Wed Mar 19 21:08:36 EDT 2014 >> >> 2 minutes and 7 seconds, then rgmanager forced-off the VM. Had this been >> a windows guest in the middle of installing updates, it would be highly >> likely to be screwed now. > > Is this really the best way to handle such an event? > > From what I remember, Windows can (or could, I don't have any 'modern' windows > laying around) be told to shutdown without updating. maybe a wiser approach > would be to make the stop event (which I believe is delivered to the guest as > pressing the ACPI power button) trigger a shutdown without updates. > > keep in mind that doing system updates on timer is dangerous, irrelevant of > the actual time > > regards > Pavel Herrmann This assumes that we can modify how windows behaves. Unless there is a magic ACPI event that windows will reliably interpret as "power off without updating", we can't rely on this. We have clients (and I am sure we aren't the only ones) who install their own OSes without any input from us. As mentioned earlier, we do document the risks, but that's not good enough. We can't force users to read. So we have a choice; Take mitigating steps or let the user shoot themselves in the foot "because they should have known better". As personally satisfying as option #2 might seem, option #1 is the more professional approach, I would _strongly_ argue. 
digimer -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Thu Mar 20 19:31:22 2014 From: lists at alteeve.ca (Digimer) Date: Thu, 20 Mar 2014 15:31:22 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <5328F268.9080605@alteeve.ca> References: <5328F268.9080605@alteeve.ca> Message-ID: <532B420A.5060606@alteeve.ca> On 18/03/14 09:27 PM, Digimer wrote: > Hi all, > > I would like to tell rgmanager to give more time for VMs to stop. I > want this: > > path="/shared/definitions/" exclusive="0" recovery="restart" > max_restarts="2" restart_expire_time="600"> > > > > I already use ccs to create the entry: > > path="/shared/definitions/" exclusive="0" recovery="restart" > max_restarts="2" restart_expire_time="600"/> > > via: > > ccs -h localhost --activate --sync --password "secret" \ > --addvm vm01-win2008 \ > --domain="primary_n01" \ > path="/shared/definitions/" \ > autostart="0" \ > exclusive="0" \ > recovery="restart" \ > max_restarts="2" \ > restart_expire_time="600" > > I'm hoping it's a simple additional switch. :) > > Thanks! As per the request on #linux-cluster, I have opened a rhbz for this: https://bugzilla.redhat.com/show_bug.cgi?id=1079032 -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From lists at alteeve.ca Thu Mar 20 20:06:14 2014 From: lists at alteeve.ca (Digimer) Date: Thu, 20 Mar 2014 16:06:14 -0400 Subject: [Linux-cluster] Adding a stop timeout to a VM service using 'ccs' In-Reply-To: <532B420A.5060606@alteeve.ca> References: <5328F268.9080605@alteeve.ca> <532B420A.5060606@alteeve.ca> Message-ID: <532B4A36.90703@alteeve.ca> On 20/03/14 03:31 PM, Digimer wrote: > On 18/03/14 09:27 PM, Digimer wrote: >> Hi all, >> >> I would like to tell rgmanager to give more time for VMs to stop. 
I >> want this: >> >> > path="/shared/definitions/" exclusive="0" recovery="restart" >> max_restarts="2" restart_expire_time="600"> >> >> >> >> I already use ccs to create the entry: >> >> > path="/shared/definitions/" exclusive="0" recovery="restart" >> max_restarts="2" restart_expire_time="600"/> >> >> via: >> >> ccs -h localhost --activate --sync --password "secret" \ >> --addvm vm01-win2008 \ >> --domain="primary_n01" \ >> path="/shared/definitions/" \ >> autostart="0" \ >> exclusive="0" \ >> recovery="restart" \ >> max_restarts="2" \ >> restart_expire_time="600" >> >> I'm hoping it's a simple additional switch. :) >> >> Thanks! > > As per the request on #linux-cluster, I have opened a rhbz for this: > > https://bugzilla.redhat.com/show_bug.cgi?id=1079032 Split the rgmanager section out: https://bugzilla.redhat.com/show_bug.cgi?id=1079039 -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From pine5514 at gmail.com Sat Mar 22 20:13:06 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Sun, 23 Mar 2014 00:43:06 +0430 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> References: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> Message-ID: Good news for all : I successfully recovered all of my data (1.5 TB) without even one bit lost! My program took only 1 hour to do all the jobs on my 1.7 TB partition. (I could not wait 100 days for my bash script to finish.) I will publish my source code very soon for public use. Special thanks to Bob for the help. Mr.Pine On Wed, Mar 19, 2014 at 4:58 PM, Bob Peterson wrote: > ----- Original Message ----- >> Hi, >> >> The scripts are very, very slow, so I should write a program in C/C++. 
>> >> I need some confidence about data structures and data location on disk. >> As I reviewed blocks of data: >> >> All reserved blocks (GFS2-specific blocks) start with: 0x01161970 >> Blocktype store location is at Byte # 8, >> Type of start block of each resource group is: 2 >> Bitmaps are in block types 2 & 3. >> In block type 2, bitmap info starts from Byte # 129 >> In block type 3, bitmap info starts from Byte # 25 >> Length of RGs is constant, 5 in my volume (output of gfs2_edit -p rindex >> /dev/..) >> >> Is this info right? >> >> The logic of my program seems like it should be this: >> >> (1) >> Loop over the device and temporarily store block ids of dinode blocks, and also >> their bitmap locations >> >> (2) >> Change bitmap of blocks to 3 (11) >> >> Bob, could you confirm this? >> >> Regards >> Pine. > > Hi Pine, > > This is correct. The length of RGs is properly determined by the values > in the "rindex" system file, but 5 is very common, and is usually constant. > (It may change if you used gfs2_grow or gfs2_convert from gfs1). > The bitmap is 2 bits per block in the resource group, and it's relative > to the start of the particular rgrp. You should probably use the same > algorithm in libgfs2 to change the proper bit in the bitmaps. You can > get this from the public gfs2-utils git tree. > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lists at alteeve.ca Sun Mar 23 01:34:44 2014 From: lists at alteeve.ca (Digimer) Date: Sat, 22 Mar 2014 21:34:44 -0400 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1958363147.1528636.1395165348118.JavaMail.zimbra@redhat.com> <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> Message-ID: <532E3A34.6050107@alteeve.ca> That is very good news! 
Now, about your backups... ;) Look forward to seeing your code! digimer On 22/03/14 04:13 PM, Mr.Pine wrote: > Good news for all : > I successfully recoved all of my data(1.5TB) without even one bit lost! > my program tooks only 1 hour to do all the jobs on my 1.7 TB > partition.(I could not wait 100 days for my bash script to finish). > > I will publish my source code very soon for the public use. > > Special thanks to Bob for the help. > > Mr.Pine > > On Wed, Mar 19, 2014 at 4:58 PM, Bob Peterson wrote: >> ----- Original Message ----- >>> Hi, >>> >>> Scripts is very very slow, so i should write program in c/c++. >>> >>> I need some confidence about data structures and data location on disk. >>> As i reviewed blocks of data: >>> >>> All reserved blocks (GFS2 specific blocks) start by : 0x01161970 >>> Blocktype store location is at Byte # 8, >>> Type of start block of each resource group is: 2 >>> Bitmaps are in block types 2 & 3. >>> In block type 2, bitmap info starts from Byte # 129 >>> In block type 3, bitmap info starts from Byte # 25 >>> Length of RGs are const, 5 in my volume (out put of gfs2_edit -p rindex >>> /dev/..) >>> >>> Is this info right? >>> >>> Logic of my program seams should be like this: >>> >>> (1) >>> Loop in device and temporary store block id of dinode blocks, and also >>> their bitmap locations >>> >>> (2) >>> Change bitmap of blocks to 3 (11) >>> >>> Bob, could you confirm this? >>> >>> Regards >>> Pine. >> >> Hi Pine, >> >> This is correct. The length of RGs is properly determined by the values >> in the "rindex" system file, but 5 is very common, and is usually constant. >> (It may change if you used gfs2_grow or gfs2_convert from gfs1). >> The bitmap is 2 bits per block in the resource group, and it's relative >> to the start of the particular rgrp. You should probably use the same >> algorithm in libgfs2 to change the proper bit in the bitmaps. You can >> get this from the public gfs2-utils git tree. 
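The two-pass recovery scan described in this exchange can be sketched as follows. This is not the poster's actual program (which was written in C); the block size and the dinode type value (4, per gfs2_ondisk.h) are assumptions, and the header is read as two big-endian 32-bit fields (magic, then type), which is consistent with the "byte 8" observation above:

```python
import struct

GFS2_MAGIC = 0x01161970    # magic number starting every GFS2 metadata block
GFS2_METATYPE_DI = 4       # dinode type, assumed per gfs2_ondisk.h
BLOCK_SIZE = 4096          # assumed filesystem block size

def find_dinodes(dev_path, max_blocks):
    """Pass 1: scan the device and record block numbers whose header
    looks like a GFS2 dinode (big-endian magic, then metadata type)."""
    dinodes = []
    with open(dev_path, "rb") as dev:
        for blk in range(max_blocks):
            block = dev.read(BLOCK_SIZE)
            if len(block) < 8:
                break
            magic, mh_type = struct.unpack(">II", block[:8])
            if magic == GFS2_MAGIC and mh_type == GFS2_METATYPE_DI:
                dinodes.append(blk)
    return dinodes

def bitmap_position(rel_block):
    """Pass 2 helper: locate a block's 2-bit entry in its resource
    group's bitmap (2 bits per block, relative to the rgrp start).
    Returns (byte offset within the bitmap data, bit shift in that byte)."""
    return rel_block // 4, (rel_block % 4) * 2
```

Marking a recovered block as "in use, metadata" would then set its 2-bit entry to 3 (binary 11); as Bob notes, the bitmap routines in libgfs2 (from gfs2-utils) are the safer way to do that in practice.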
>> >> Regards, >> Bob Peterson >> Red Hat File Systems >> >> -- >> Linux-cluster mailing list >> Linux-cluster at redhat.com >> https://www.redhat.com/mailman/listinfo/linux-cluster > -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From rpeterso at redhat.com Mon Mar 24 12:19:58 2014 From: rpeterso at redhat.com (Bob Peterson) Date: Mon, 24 Mar 2014 08:19:58 -0400 (EDT) Subject: [Linux-cluster] unformat gfs2 In-Reply-To: References: <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> Message-ID: <25322364.403317.1395663598176.JavaMail.zimbra@redhat.com> ----- Original Message ----- > Good news for all : > I successfully recoved all of my data(1.5TB) without even one bit lost! > my program tooks only 1 hour to do all the jobs on my 1.7 TB > partition.(I could not wait 100 days for my bash script to finish). > > I will publish my source code very soon for the public use. > > Special thanks to Bob for the help. > > Mr.Pine Hi Mr. Pine, I'm glad I could help. Perhaps when you post your program, we can somehow incorporate it into a "gfs2_edit unformat" tool or something. I assume you ran fsck.gfs2 after the program, right? Regards, Bob Peterson Red Hat File Systems From pine5514 at gmail.com Mon Mar 24 16:19:26 2014 From: pine5514 at gmail.com (Mr.Pine) Date: Mon, 24 Mar 2014 20:49:26 +0430 Subject: [Linux-cluster] unformat gfs2 In-Reply-To: <25322364.403317.1395663598176.JavaMail.zimbra@redhat.com> References: <1499819462.1547710.1395166939684.JavaMail.zimbra@redhat.com> <105078529.1554190.1395167768555.JavaMail.zimbra@redhat.com> <518413955.1970349.1395232118989.JavaMail.zimbra@redhat.com> <25322364.403317.1395663598176.JavaMail.zimbra@redhat.com> Message-ID: Hi Bob, Very good idea ... It can be useful for many users. Yes, you're right. 
I ran fsck.gfs2 and it took less than an hour to fix the filesystem. Mr.Pine On Mon, Mar 24, 2014 at 4:49 PM, Bob Peterson wrote: > ----- Original Message ----- >> Good news for all : >> I successfully recoved all of my data(1.5TB) without even one bit lost! >> my program tooks only 1 hour to do all the jobs on my 1.7 TB >> partition.(I could not wait 100 days for my bash script to finish). >> >> I will publish my source code very soon for the public use. >> >> Special thanks to Bob for the help. >> >> Mr.Pine > > Hi Mr. Pine, > > I'm glad I could help. Perhaps when you post your program, we can > somehow incorporate it into "gfs2_edit unformat" tool or something. > I assume you ran fsck.gfs2 after the program, right? > > Regards, > > Bob Peterson > Red Hat File Systems > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From lipson12 at yahoo.com Thu Mar 27 04:12:22 2014 From: lipson12 at yahoo.com (Kaisar Ahmed Khan) Date: Wed, 26 Mar 2014 21:12:22 -0700 (PDT) Subject: [Linux-cluster] iscsi sysmlink create problem In-Reply-To: <53204CA8.2060300@redhat.com> References: <1394605577.86561.YahooMailNeo@web141205.mail.bf1.yahoo.com> <1394605887.60258.YahooMailNeo@web141203.mail.bf1.yahoo.com> <53204CA8.2060300@redhat.com> Message-ID: <1395893542.35997.YahooMailNeo@web141202.mail.bf1.yahoo.com> Elvir Kuric, My OS version is RHEL 6.2; I just want to create a symlink for the iscsi disk /dev/sda with udev rules. I have done the same thing in RHEL 5.4. thanks kaisar On Wednesday, March 12, 2014 6:12 PM, Elvir Kuric wrote: On 03/12/2014 12:43 PM, emmanuel segura wrote: but it's a cluster problem? ummm > > > > >2014-03-12 7:31 GMT+01:00 Kaisar Ahmed Khan : > > >> >> >> >>Dear Experts : >> >> >> >>The following rule is not working to create a symlink for the iscsi disk; my iscsi device ID is /dev/sda and I want to link it as /dev/iscsi/vendor_kernel >> >>please guide me if i miss anything . 
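One detail worth noting about the rule quoted below: as it appears here, the SYMLINK assignment is missing its opening quote, which would keep the rule from parsing. Whether that quote was lost in the original rule or in the archive, a well-formed version would read as follows (match keys and substitutions copied from the post; only the quoting is changed):

```
ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", SYMLINK+="iscsi/%E{ID_VENDOR}_%K", MODE="0664"
```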
>> >> >>ACTION=="add", SUBSYSTEM=="block", ENV{ID_MODEL}=="VIRTUAL-DISK", SYMLINK+=iscsi/%E{ID_VENDOR}_%K", MODE="0664" >> >>Thanks >>kaisar >> >> >>-- >>Linux-cluster mailing list >>Linux-cluster at redhat.com >>https://www.redhat.com/mailman/listinfo/linux-cluster >> > > >-- >esta es mi vida e me la vivo hasta que dios quiera > >if you share more information with us ( os version ) it could help. Also the outputs below can help to understand how the system sees the device: if rhel 5 ( and clones ) #udevinfo -a -p $(udevinfo -q path -n /dev/DEVICE) if rhel 6 ( and clones ) # udevadm info --query=all -n /dev/DEVICE --attribute-walk where DEVICE is the device you want to write the udev rule for. Kind regards, -- Elvir Kuric, TSE / Red Hat / GSS EMEA / -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster -------------- next part -------------- An HTML attachment was scrubbed... URL: From eduar47 at gmail.com Fri Mar 28 06:06:44 2014 From: eduar47 at gmail.com (Eduar Arley) Date: Fri, 28 Mar 2014 01:06:44 -0500 Subject: [Linux-cluster] IP address for replies Message-ID: Hello everyone. I have an HA Cluster in CentOS 6, using CMAN + RGManager (the supported and 'official' stack in CentOS 6.5). Everything works OK; however, there is a thing that could give me problems in the future, so I would like to think right now about how to solve it. When an incoming packet comes to my cluster (through a floating IP address), my active node receives it OK; however, it replies from its 'real' IP address, not from the floating IP. As I'm deploying SIP in this cluster, maybe some provider in the future could dislike this IP and reject my calls. I've read that heartbeat has some functionality to fix this, called IPSrcAddr; however I don't see a similar resource in the Conga Web Interface or in Red Hat documentation. On other websites, I've read about a workaround involving IP routing rule tables, but I don't think this is an optimal solution. 
Does anybody know a way to fix this in my scenario? Thanks! Eduar Cardona From christian.masopust at siemens.com Fri Mar 28 07:24:00 2014 From: christian.masopust at siemens.com (Masopust, Christian) Date: Fri, 28 Mar 2014 07:24:00 +0000 Subject: [Linux-cluster] IP address for replies In-Reply-To: References: Message-ID: <7615AD3742034A45A23EFE13BE43F2ED1AC2A811@ATNETS9912TMSX.ww300.siemens.net> Hi Eduar, I have both configurations running on my systems, some with IPsrcaddr and one with iptables. I did the first with iptables only because it was my first cluster and I was not aware of IPsrcaddr and was in a hurry to get the cluster up and running :) Anyway, both solutions work fine on my CentOS 6.x systems. But I don't use any Web interface for configuration. br, christian -----Ursprüngliche Nachricht----- Von: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Im Auftrag von Eduar Arley Gesendet: Freitag, 28. März 2014 07:07 An: linux-cluster at redhat.com Betreff: [Linux-cluster] IP address for replies Hello everyone. I have an HA Cluster in CentOS 6, using CMAN + RGManager (the supported and 'official' stack in CentOS 6.5). Everything works OK; however, there is a thing that could give me problems in the future, so I would like thinking right now how to solve it. When an incoming packet comes to my cluster (through a floating IP address), mi active node receives it OK; however, it replies from his 'real' IP address, not from the floating IP. As i'm deploying SIP in this cluster, maybe some provider in the future could dislike this IP and reject my calls. I've read heartbeat have some functionality to fix this, called IPSrcAddr; however I don't see a similar resource in Conga Web Interface or in Red Hat documentation. In other websites, I've read about a workaround involving IP routing rules tables, but I don't think this is an optimal solution. Anybody knows a way to fix this in my scenario? Thanks! 
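For reference, hedged sketches of the two approaches Christian mentions; the interface name, gateway, and addresses below are placeholders (RFC 5737 documentation addresses), not values from the thread:

```sh
# IPsrcaddr-style: set the preferred source address on the default route,
# so replies leave with the floating IP (placeholder addresses)
ip route change default via 192.0.2.1 dev eth0 src 192.0.2.50

# iptables-style: SNAT outbound traffic to the floating IP
iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to-source 192.0.2.50
```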
Eduar Cardona -- Linux-cluster mailing list Linux-cluster at redhat.com https://www.redhat.com/mailman/listinfo/linux-cluster From eduar47 at gmail.com Fri Mar 28 12:40:16 2014 From: eduar47 at gmail.com (Eduar Arley) Date: Fri, 28 Mar 2014 07:40:16 -0500 Subject: [Linux-cluster] IP address for replies In-Reply-To: <7615AD3742034A45A23EFE13BE43F2ED1AC2A811@ATNETS9912TMSX.ww300.siemens.net> References: <7615AD3742034A45A23EFE13BE43F2ED1AC2A811@ATNETS9912TMSX.ww300.siemens.net> Message-ID: 2014-03-28 2:24 GMT-05:00 Masopust, Christian : > Hi Eduar, > > I have both configurations running on my systems, some with IPsrcaddr and > one with iptables. > I did the first with iptables only because it was my first cluster and was > not aware about IPsrcaddr and was in a hurry to get the cluster up and running :) > > Anyway, both solutions work fine on my CentOS 6.x systems. But I don't use any > Web interface for configuration. > > br, > christian > > -----Ursprüngliche Nachricht----- > Von: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] Im Auftrag von Eduar Arley > Gesendet: Freitag, 28. März 2014 07:07 > An: linux-cluster at redhat.com > Betreff: [Linux-cluster] IP address for replies > > Hello everyone. > > I have an HA Cluster in CentOS 6, using CMAN + RGManager (the supported and 'official' stack in CentOS 6.5). Everything works OK; however, there is a thing that could give me problems in the future, so I would like thinking right now how to solve it. > > When an incoming packet comes to my cluster (through a floating IP address), mi active node receives it OK; however, it replies from his 'real' IP address, not from the floating IP. As i'm deploying SIP in this cluster, maybe some provider in the future could dislike this IP and reject my calls. > > I've read heartbeat have some functionality to fix this, called IPSrcAddr; however I don't see a similar resource in Conga Web Interface or in Red Hat documentation. 
> On other websites, I've read about a workaround involving IP routing rule tables, but I don't think this is an optimal solution. > > Does anybody know a way to fix this in my scenario? > > Thanks! > > > Eduar Cardona > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster Hello Christian, thanks for your advice. Could you please share the relevant sections of your cluster.conf file for IPSrcAddr here? I currently use the Conga web interface, but I can modify the config file directly if needed. Thanks! Eduar Cardona From bergman at merctech.com Fri Mar 28 16:37:17 2014 From: bergman at merctech.com (bergman at merctech.com) Date: Fri, 28 Mar 2014 12:37:17 -0400 Subject: [Linux-cluster] mixing OS versions? Message-ID: <12440.1396024637@localhost> I've got a 3-node cluster under CentOS5. I'd like to add 3 additional nodes, running CentOS6. Are there any known issues, guidelines, or recommendations for having a single RHCS cluster with different OS releases on the nodes? Thanks, Mark From fdinitto at redhat.com Fri Mar 28 19:31:38 2014 From: fdinitto at redhat.com (Fabio M. Di Nitto) Date: Fri, 28 Mar 2014 20:31:38 +0100 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <12440.1396024637@localhost> References: <12440.1396024637@localhost> Message-ID: <5335CE1A.3060509@redhat.com> On 03/28/2014 05:37 PM, bergman at merctech.com wrote: > > > I've got a 3-node cluster under CentOS5. > > I'd like to add 3 additional nodes, running CentOS6. > > Are there any known issues, guidelines, or recommendations for having > a single RHCS cluster with different OS releases on the nodes? Only one answer: don't do it. It's not supported, and it's only asking for trouble.
Fabio From washer at trlp.com Fri Mar 28 20:35:20 2014 From: washer at trlp.com (James Washer) Date: Fri, 28 Mar 2014 13:35:20 -0700 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <5335CE1A.3060509@redhat.com> References: <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> Message-ID: You can get by, for a short time, with a minor revision difference, say 5.7 and 5.8, but mixing 5 and 6 will not work. Period. On Fri, Mar 28, 2014 at 12:31 PM, Fabio M. Di Nitto wrote: > On 03/28/2014 05:37 PM, bergman at merctech.com wrote: > > > > > > I've got a 3-node cluster under CentOS5. > > > > I'd like to add 3 additional nodes, running CentOS6. > > > > Are there any known issues, guidelines, or recommendations for having > > a single RHCS cluster with different OS releases on the nodes? > > Only one answer: don't do it. It's not supported, and it's only asking > for trouble. > > Fabio > > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster > -- - jim From ajb2 at mssl.ucl.ac.uk Fri Mar 28 22:07:48 2014 From: ajb2 at mssl.ucl.ac.uk (Alan Brown) Date: Fri, 28 Mar 2014 22:07:48 +0000 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <5335CE1A.3060509@redhat.com> References: <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> Message-ID: <5335F2B4.6080605@mssl.ucl.ac.uk> On 28/03/14 19:31, Fabio M. Di Nitto wrote: > > Are there any known issues, guidelines, or recommendations for having > a single RHCS cluster with different OS releases on the nodes? > Only one answer: don't do it. It's not supported, and it's only asking > for trouble. > > Seconded. There are _substantial_ differences between CentOS/RHEL 5 and 6 clustering. You can run one or the other OS, but you can't mix them. The on-disk format isn't affected.
Best path is to set up a cluster in 6, shut down the 5 cluster, attach disks to the 6 cluster and bring it all back up. The 5 boxes can be converted to version 6 afterwards. (I'm going through this at the moment, as I have 2 EL5 clusters and 1 EL6 cluster.) TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - if you enable GFS2 quotas and someone busts his quota, the machine will panic. From bergman at merctech.com Fri Mar 28 22:35:42 2014 From: bergman at merctech.com (bergman at merctech.com) Date: Fri, 28 Mar 2014 18:35:42 -0400 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: Your message of "Fri, 28 Mar 2014 22:07:48 -0000." <5335F2B4.6080605@mssl.ucl.ac.uk> References: <5335F2B4.6080605@mssl.ucl.ac.uk> <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> Message-ID: <6022.1396046142@localhost> In the message dated: Fri, 28 Mar 2014 22:07:48 -0000, the pithy ruminations from Alan Brown were: => On 28/03/14 19:31, Fabio M. Di Nitto wrote: => > => > Are there any known issues, guidelines, or recommendations for having => > a single RHCS cluster with different OS releases on the nodes? => > Only one answer: don't do it. It's not supported, and it's only asking => > for trouble. Thanks for all the warnings... not what I wanted to hear, but it's good to get a clear, consistent message. => > => > => => Seconded. There are _substantial_ differences between CentOS/RHEL 5 and => 6 clustering. => => You can run one or the other OS, but you can't mix them. The on-disk => format isn't affected. For clarification, we're not using RHCS to manage any shared storage. The only 'disk' component is the quorum disk. We're using GPFS as the storage layer. RHCS manages several services, such as: httpd mysql nis pgsql => => Best path is to set up a cluster in 6, shut down the 5 cluster, attach => disks to the 6 cluster and bring it all back up.
The 5 boxes can be => converted to version 6 afterwards. That's what I was expecting, unfortunately. I'll probably do a more gradual approach... bring up a CentOS6 cluster with its own quorum disk, and one by one add services (httpd, nis, etc.) to that, bringing them down on the old cluster. Add in some CNAMEs and coordination with the network group, and it should be relatively transparent to the users. => => (I'm going through this at the moment, as I have 2 EL5 clusters and 1 => EL6 cluster.) => => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - => if you enable GFS2 quotas and someone busts his quota the machine will => panic. That's an example of why I no longer use GFS2. :) Thanks, Mark => => => => => -- => Linux-cluster mailing list => Linux-cluster at redhat.com => https://www.redhat.com/mailman/listinfo/linux-cluster From christian.masopust at siemens.com Sat Mar 29 09:00:04 2014 From: christian.masopust at siemens.com (Masopust, Christian) Date: Sat, 29 Mar 2014 09:00:04 +0000 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <6022.1396046142@localhost> References: <5335F2B4.6080605@mssl.ucl.ac.uk> <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> <6022.1396046142@localhost> Message-ID: <7615AD3742034A45A23EFE13BE43F2ED1AC2AFD5@ATNETS9912TMSX.ww300.siemens.net> > => > => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - > => if you enable GFS2 quotas and someone busts his quota the machine will => panic. > > That's an example of why I no longer use GFS2. :) > > Thanks, > > Mark Hi Mark, what do you use instead of GFS2? br, christian From bergman at merctech.com Sat Mar 29 17:30:41 2014 From: bergman at merctech.com (bergman at merctech.com) Date: Sat, 29 Mar 2014 13:30:41 -0400 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: Your message of "Sat, 29 Mar 2014 09:00:04 -0000."
<7615AD3742034A45A23EFE13BE43F2ED1AC2AFD5@ATNETS9912TMSX.ww300.siemens.net> References: <7615AD3742034A45A23EFE13BE43F2ED1AC2AFD5@ATNETS9912TMSX.ww300.siemens.net> <5335F2B4.6080605@mssl.ucl.ac.uk> <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> <6022.1396046142@localhost> Message-ID: <22237.1396114241@localhost> In the message dated: Sat, 29 Mar 2014 09:00:04 -0000, the pithy ruminations from "Masopust, Christian" were: => > => => > => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - => > => if you enable GFS2 quotas and someone busts his quota the machine will => panic. => > => > That's an example of why I no longer use GFS2. :) => > => > Thanks, => > => > Mark => => Hi Mark, => => what do you use instead of GFS2? GPFS, as I wrote in the message to which you replied: ------------------------------------------- From: bergman at merctech.com To: linux clustering Subject: Re: [Linux-cluster] mixing OS versions? Date: Fri, 28 Mar 2014 18:35:42 -0400 [SNIP!] For clarification, we're not using RHCS to manage any shared storage. The only 'disk' component is the quorum disk. We're using GPFS as the storage layer. ------------------------------------------- => => br, => christian => => -- Mark Bergman From christian.masopust at siemens.com Sat Mar 29 17:50:42 2014 From: christian.masopust at siemens.com (Masopust, Christian) Date: Sat, 29 Mar 2014 17:50:42 +0000 Subject: [Linux-cluster] mixing OS versions?
In-Reply-To: <22237.1396114241@localhost> References: <7615AD3742034A45A23EFE13BE43F2ED1AC2AFD5@ATNETS9912TMSX.ww300.siemens.net> <5335F2B4.6080605@mssl.ucl.ac.uk> <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> <6022.1396046142@localhost> <22237.1396114241@localhost> Message-ID: <7615AD3742034A45A23EFE13BE43F2ED1AC334CE@ATNETS9912TMSX.ww300.siemens.net> > In the message dated: Sat, 29 Mar 2014 09:00:04 -0000, the pithy ruminations from "Masopust, Christian" > were: > => > => > => > => TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time - => > => if you enable GFS2 quotas > and someone busts his quota the machine will => panic. > => > > => > That's an example of why I no longer use GFS2. :) => > => > Thanks, => > => > Mark => => Hi Mark, => > => what do you use instead of GFS2? > > GPFS, as I wrote in the message to which you replied: > sorry, my fault... I didn't notice it, as GPFS hasn't been on my radar until now :) From swhiteho at redhat.com Sun Mar 30 11:34:26 2014 From: swhiteho at redhat.com (Steven Whitehouse) Date: Sun, 30 Mar 2014 12:34:26 +0100 Subject: [Linux-cluster] mixing OS versions? In-Reply-To: <5335F2B4.6080605@mssl.ucl.ac.uk> References: <12440.1396024637@localhost> <5335CE1A.3060509@redhat.com> <5335F2B4.6080605@mssl.ucl.ac.uk> Message-ID: <1396179266.2659.30.camel@menhir> Hi, On Fri, 2014-03-28 at 22:07 +0000, Alan Brown wrote: > On 28/03/14 19:31, Fabio M. Di Nitto wrote: > > > > > Are there any known issues, guidelines, or recommendations for having > > a single RHCS cluster with different OS releases on the nodes? > > Only one answer: don't do it. It's not supported, and it's only asking > > for trouble. > > > > > > Seconded. There are _substantial_ differences between CentOS/RHEL 5 > and 6 clustering. > > You can run one or the other OS, but you can't mix them. The on-disk > format isn't affected.
> > Best path is to set up a cluster in 6, shut down the 5 cluster, attach > disks to the 6 cluster and bring it all back up. The 5 boxes can be > converted to version 6 afterwards. > > (I'm going through this at the moment, as I have 2 EL5 clusters and 1 > EL6 cluster.) > > TAKE NOTE: RHEL/CentOS6 clustering is not quite ready for prime-time > - if you enable GFS2 quotas and someone busts his quota the machine > will panic. > Well, that is not entirely true. We have done a great deal of investigation into this issue. We do test quotas (among many other things) on each release to ensure that they are working. Our tests have all passed correctly, and to date you have provided the only report of this particular issue via our support team, so it is certainly not something that lots of people are hitting. We do now have a good idea of where the issue is. However, it is clear that simply exceeding quotas is not enough to trigger it; instead, quotas need to be exceeded in a particular way. Abhi is working on a fix, which should be available very shortly now. Returning to the original point, however: it is certainly not recommended to have mixed RHEL or CentOS versions running in the same cluster. It is much better to keep everything the same, even though the GFS2 on-disk format has not changed between the versions. I hope that answers a few questions - let us know if you need more info, Steve. > -- > Linux-cluster mailing list > Linux-cluster at redhat.com > https://www.redhat.com/mailman/listinfo/linux-cluster From hamid.jafarian at pdnsoft.com Sun Mar 30 14:34:51 2014 From: hamid.jafarian at pdnsoft.com (Hamid Jafarian) Date: Sun, 30 Mar 2014 19:04:51 +0430 Subject: [Linux-cluster] GFS2 unformat helper tool Message-ID: <53382B8B.8060204@pdnsoft.com> Hi, We developed a GFS2 volume unformat helper tool. Read about this code at: http://pdnsoft.com/en/web/pdnen/blog/-/blogs/gfs2-unformat-helper-tool-1 Regards -- Hamid Jafarian CEO at PDNSoft Co.
Web site: http://www.pdnsoft.com Blog: http://jafarian.pdnsoft.com From lists at alteeve.ca Sun Mar 30 18:13:40 2014 From: lists at alteeve.ca (Digimer) Date: Sun, 30 Mar 2014 14:13:40 -0400 Subject: [Linux-cluster] GFS2 unformat helper tool In-Reply-To: <53382B8B.8060204@pdnsoft.com> References: <53382B8B.8060204@pdnsoft.com> Message-ID: <53385ED4.7040006@alteeve.ca> On 30/03/14 10:34 AM, Hamid Jafarian wrote: > Hi, > > We developed a GFS2 volume unformat helper tool. > Read about this code at: > http://pdnsoft.com/en/web/pdnen/blog/-/blogs/gfs2-unformat-helper-tool-1 > > Regards Thanks for sharing this! Madi -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education? From christian.masopust at siemens.com Mon Mar 31 05:52:40 2014 From: christian.masopust at siemens.com (Masopust, Christian) Date: Mon, 31 Mar 2014 05:52:40 +0000 Subject: [Linux-cluster] IP address for replies In-Reply-To: References: <7615AD3742034A45A23EFE13BE43F2ED1AC2A811@ATNETS9912TMSX.ww300.siemens.net> Message-ID: <7615AD3742034A45A23EFE13BE43F2ED1AC34A23@ATNETS9912TMSX.ww300.siemens.net> > > > Hello Christian, thanks for your advice. > > Could you please share the relevant sections of your cluster.conf file for IPSrcAddr here? > I currently use the Conga web interface, but I can modify the config file directly if needed. > > Thanks! > > > Eduar Cardona Hi Eduar, sorry, I'm not allowed to give out any configuration settings, but the IPsrcaddr resource is quite easy to configure: # pcs resource describe IPsrcaddr Resource options for: ocf:heartbeat:IPsrcaddr ipaddress (required): The IP address. cidr_netmask: The netmask for the interface in CIDR format (ie, 24) or in dotted quad notation (255.255.255.0). br, christian
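[Archive editor's sketch] For anyone else landing on this thread: the setup Christian describes might look roughly like the following, in pcs (Pacemaker) syntax rather than the CMAN/rgmanager stack Eduar is running. The IP address, netmask, and resource/group names are placeholders, not anyone's real configuration:

```shell
# Floating (virtual) IP that the cluster moves between nodes.
# 192.0.2.10/24 is a documentation-range placeholder address.
pcs resource create floating_ip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24

# Pin the source address of outgoing packets to the floating IP,
# so replies (e.g. SIP traffic) leave from the same address the
# provider sees inbound.
pcs resource create src_ip ocf:heartbeat:IPsrcaddr \
    ipaddress=192.0.2.10 cidr_netmask=24

# Group the resources so both stay on the same node and the address
# is brought up before IPsrcaddr tries to use it as the source.
pcs resource group add sip_group floating_ip src_ip
```

The iptables variant mentioned earlier in the thread is typically a SNAT rule along the lines of `iptables -t nat -A POSTROUTING -s <node-ip> -j SNAT --to-source <floating-ip>`, wrapped in a cluster-managed script resource so it moves with the floating IP; which approach fits better depends on whether all outgoing traffic, or only some of it, should use the floating source address.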