[Linux-cluster] gfs2 and quotas - system crash

Abhijith Das adas at redhat.com
Mon Mar 10 19:38:06 UTC 2014



----- Original Message -----
> From: "stephen rankin" <stephen.rankin at stfc.ac.uk>
> To: linux-cluster at redhat.com
> Sent: Monday, March 10, 2014 1:15:08 PM
> Subject: [Linux-cluster] gfs2 and quotas - system crash
> 
> Hello,
> 
> 
> 
> When using gfs2 with quotas on a SAN that provides storage to two
> clustered systems running CentOS 6.5, one of the systems
> can crash. The crash appears to happen when a user tries
> to write to a SAN disk after exceeding their
> quota on that disk. Sometimes a stack trace is produced in /var/log/messages
> which appears to indicate that gfs2 caused the problem.
> Around the same time as the gfs2 stack trace, the log also shows
> messages about a user exceeding their quota.
> 
> The stack trace is below.
> 
> Has anyone got a solution to this, other than switching off quotas? I have
> switched off quotas, which appears to have stabilised the system so far, but I
> do need the quotas on.
> 
> Your help is appreciated.
> 

Hi Stephen,

We have another report of this bug, from a setup where gfs2 was exported
using NFS: https://bugzilla.redhat.com/show_bug.cgi?id=1059808. Are you
using NFS in your setup as well? We have not been able to reproduce it to
figure out what might be going on. Do you have a set procedure that
recreates it reliably? If so, it would be of great help.
Also, more info about your setup (file sizes, number of files, how
many nodes are mounting gfs2, what kinds of operations are being run,
etc.) would be helpful as well.
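
For anyone trying to build a reproducer, one possible starting point is
sketched below. It is only a guess based on the trace in the original
report, where the warning fires in the mkdir path (gfs2_mkdir ->
gfs2_create_inode) around the time the log shows "quota exceeded" for
a user. The function name, the directory naming and the iteration count
are made up for illustration, and nothing here is known to trigger the
bug.

```python
import errno
import os


def stress_mkdir(base_dir, iterations=1000, payload=b"x" * 4096):
    """Create directories and small files under base_dir.

    Returns (created, quota_errors). EDQUOT is counted rather than
    treated as fatal, because the case of interest is continued
    mkdir/write activity after the quota has been exceeded.
    Any other error (including EEXIST on a rerun) is re-raised.
    """
    created = 0
    quota_errors = 0
    for i in range(iterations):
        # Hypothetical naming scheme: repro-000000, repro-000001, ...
        path = os.path.join(base_dir, "repro-%06d" % i)
        try:
            os.mkdir(path)
            with open(os.path.join(path, "data"), "wb") as f:
                f.write(payload)
            created += 1
        except OSError as exc:
            if exc.errno == errno.EDQUOT:
                quota_errors += 1
            else:
                raise
    return created, quota_errors
```

The assumed usage is to run it as a user whose quota is already close to
its limit, pointing base_dir at an empty scratch directory on the gfs2
mount (for example stress_mkdir("/path/to/gfs2/scratch", 10000)), ideally
from more than one node at once, while watching /var/log/messages.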

Cheers!
--Abhi

> Stephen Rankin
> STFC, RAL, ISIS
> 
> Mar  5 11:40:50 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355
> Mar  5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_explode_dn(usi660) returned NULL: Success
> Mar  5 11:40:50 chadwick nslcd[11420]: [767df3] ldap_result() failed: Invalid DN syntax
> Mar  5 11:40:50 chadwick nslcd[11420]: [767df3] lookup of user usi660 failed: Invalid DN syntax
> Mar  5 11:41:46 chadwick kernel: ------------[ cut here ]------------
> Mar  5 11:41:46 chadwick kernel: WARNING: at lib/list_debug.c:26 __list_add+0x6d/0xa0() (Not tainted)
> Mar  5 11:41:46 chadwick kernel: Hardware name: PowerEdge R910
> Mar  5 11:41:46 chadwick kernel: list_add corruption. next->prev should be prev (ffff8820531518d0), but was ffff884d4c4594d0. (next=ffff884d4c4594d0).
> Mar  5 11:41:46 chadwick kernel: Modules linked in: gfs2 dlm configfs bridge autofs4 des_generic ecb md4 nls_utf8 cifs bnx2fc cnic uio fcoe libfcoe libfc 8021q garp stp llc ipv6 microcode power_meter iTCO_wdt iTCO_vendor_support dcdbas serio_raw ixgbe dca ptp pps_core mdio lpc_ich mfd_core sg ses enclosure i7core_edac edac_core bnx2 ext4 jbd2 mbcache dm_round_robin sr_mod cdrom sd_mod crc_t10dif qla2xxx scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix megaraid_sas dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
> Mar  5 11:41:46 chadwick kernel: Pid: 74823, comm: vncserver Not tainted 2.6.32-431.3.1.el6.x86_64 #1
> Mar  5 11:41:46 chadwick kernel: Call Trace:
> Mar  5 11:41:46 chadwick kernel: [<ffffffff81071e27>] ? warn_slowpath_common+0x87/0xc0
> Mar  5 11:41:46 chadwick kernel: [<ffffffff81071f16>] ? warn_slowpath_fmt+0x46/0x50
> Mar  5 11:41:46 chadwick kernel: [<ffffffff812944ed>] ? __list_add+0x6d/0xa0
> Mar  5 11:41:46 chadwick kernel: [<ffffffff811a6c02>] ? new_inode+0x72/0xb0
> Mar  5 11:41:46 chadwick kernel: [<ffffffffa03f45d5>] ? gfs2_create_inode+0x1b5/0x1150 [gfs2]
> Mar  5 11:41:46 chadwick kernel: [<ffffffffa03f3986>] ? gfs2_glock_nq_init+0x16/0x40 [gfs2]
> Mar  5 11:41:46 chadwick kernel: [<ffffffffa03ffc74>] ? gfs2_mkdir+0x24/0x30 [gfs2]
> Mar  5 11:41:46 chadwick kernel: [<ffffffff8122766f>] ? security_inode_mkdir+0x1f/0x30
> Mar  5 11:41:46 chadwick kernel: [<ffffffff81198149>] ? vfs_mkdir+0xd9/0x140
> Mar  5 11:41:46 chadwick kernel: [<ffffffff8119ab67>] ? sys_mkdirat+0xc7/0x1b0
> Mar  5 11:41:46 chadwick kernel: [<ffffffff8119ac68>] ? sys_mkdir+0x18/0x20
> Mar  5 11:41:46 chadwick kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
> Mar  5 11:41:46 chadwick kernel: ---[ end trace e51734a39976a028 ]---
> Mar  5 11:41:46 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355
> Mar  5 11:41:47 chadwick abrtd: Directory 'oops-2014-03-05-11:41:47-12194-1' creation detected
> Mar  5 11:41:47 chadwick abrt-dump-oops: Reported 1 kernel oopses to Abrt
> Mar  5 11:41:47 chadwick abrtd: Can't open file '/var/spool/abrt/oops-2014-03-05-11:41:47-12194-1/uid': No such file or directory
> Mar  5 11:41:54 chadwick kernel: GFS2: fsid=analysis:lvol0.1: quota exceeded for user 101355