Recent unexplained quota problems

Mon Sep 17 14:07:14 UTC 2007

>> I'm running a RHEL v3 server, completely up to date...
>>
>> I tried to edit a user's quota (as root) using the command
>> '/usr/sbin/edquota someuser' and I got the error:
>>
>> edquota: Can't open quotafile /home/aquota.user: Read-only file system
>> No filesystems with quota detected.
>>
>> Doing a listing of aquota.user reports:
>>
>> [root@server log]# ll /home/aquota.user
>> -rw-------    1 root     root        15360 Sep  9 04:22 /home/aquota.user
>>
>> /etc/fstab has:
>> LABEL=/home            /home          ext3    defaults,usrquota 1 2
>>
>> A listing of /home shows:
>> drwxr-xr-x  133 root     root         4096 Sep  7 12:49 home
>>
>> If I try to 'touch test' in /home I get:
>> [root@server home]# touch test
>> touch: creating `test': Read-only file system
>>
>> I rebooted the server and everything seems to be okay.  I'm a little 
>> concerned about this though because I can't explain it.
>Chris St. Pierre wrote:
> 
> You should be concerned about this.  The kernel will change a
> filesystem to read-only when it detects an IO error against that FS.
> This can happen for a number of reasons:
> 
>   - Your connection to your SAN dropped;
>   - Your hard drive(s) are dying;
>   - You have significant data corruption;
>   - and on and on...
> 
> Except for the first reason I listed, all of the other reasons I know
> of are Real Bad.
> 
> If you're lucky, you've got some minor data corruption that caused the
> kernel to try to write beyond the end of the drive or something like
> that; you should try running fsck on the filesystem first.  Be warned,
> though, that if you have significant data corruption, fsck may
> completely hose the filesystem, so get as good a backup as you can first.
> 
> You should check /var/log/messages for kernel messages about this.  If
> it happens again, dmesg will also have useful information (at least,
> it will until you reboot).
> 
> If the problem is transient, a simple userspace mount call will fix
> it:
> 
> mount -o remount,rw,usrquota /home
> 
> But that's a gamble.
> 
> Despite what other posters have said, when the kernel changes the
> status of the volume, it does so using kernel-level tools, _not_
> userspace mount calls, so the options shown by the mount(1) command
> will _not_ reflect the read-only status of the drive.  If mount(1)
> shows that the drive is 'ro', then a person or a program, not the
> kernel, has mounted it read-only.
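
To make that distinction concrete, here is a rough sketch of comparing the
two views. It assumes mount(1) gets its information from /etc/mtab (true
where mtab is a regular file, as on older systems) and that the kernel's own
table is readable at /proc/mounts. The function names mount_opts and
is_readonly are made up for illustration.

```shell
#!/bin/sh
# Print the option field recorded for a mount point in a mounts-style table.
# $1 = mount point, $2 = table file (e.g. /proc/mounts or /etc/mtab).
# If the mount point appears more than once, the last entry wins.
mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' "$2"
}

# Succeed if the comma-separated option list contains "ro" as a whole option
# (so "errors=remount-ro" does not count).
is_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Compare the kernel's view of /home with what mtab records.
if [ -r /proc/mounts ] && [ -r /etc/mtab ]; then
    kernel_opts=$(mount_opts /home /proc/mounts)
    mtab_opts=$(mount_opts /home /etc/mtab)
    echo "kernel view: ${kernel_opts:-not mounted}"
    echo "mtab view:   ${mtab_opts:-not listed}"
    if is_readonly "$kernel_opts" && ! is_readonly "$mtab_opts"; then
        echo "kernel has /home read-only, but mtab still says read-write"
    fi
fi
```

If the last message prints, the filesystem was most likely switched to
read-only by the kernel rather than by a mount command.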

The /home partition is just that - a partition on the local hard drive.
I don't think the hard drive is failing, as I haven't seen the usual I/O
error messages in the nightly log files that would indicate it, but I
suppose it's possible.

/var/log/messages shows:
Sep 16 04:22:28 aspartic kernel: EXT3-fs: ide0(3,3): couldn't remount RDWR because of unprocessed orphan inode list.  Please umount/remount instead.

I notice the time is at 4:22.  The nightly cron jobs start at 4am.

Looking at messages.1, I see:
Sep  9 20:53:49 aspartic shutdown: shutting down for system reboot
Sep  9 20:53:49 aspartic init: Switching to runlevel: 6
Sep  9 20:53:50 aspartic su(pam_unix)[19525]: session closed for user root
...
Sep  9 20:53:53 aspartic rpc.mountd: Caught signal 15, un-registering and exiting.
Sep  9 20:53:53 aspartic nfs: rpc.mountd shutdown succeeded
Sep  9 20:53:57 aspartic kernel: nfsd: last server has exited
Sep  9 20:53:57 aspartic kernel: nfsd: unexporting all filesystems
Sep  9 20:53:57 aspartic kernel: EXT3-fs error (device ide0(3,3)) in start_transaction: Readonly filesystem
Sep  9 20:53:57 aspartic kernel: EXT3-fs error (device ide0(3,3)) in ext3_delete_inode: Readonly filesystem
Sep  9 20:53:57 aspartic kernel: EXT3-fs error (device ide0(3,3)) in start_transaction: Readonly filesystem
Sep  9 20:53:57 aspartic kernel: EXT3-fs error (device ide0(3,3)) in ext3_delete_inode: Readonly filesystem
Sep  9 20:53:57 aspartic nfs: nfsd shutdown succeeded
Sep  9 20:53:58 aspartic nfs: rpc.rquotad shutdown succeeded
Sep  9 20:53:58 aspartic nfs: Shutting down NFS services:  succeeded

<After reboot>
Sep  9 20:55:00 aspartic fsck: /home:
Sep  9 20:55:00 aspartic fsck: Clearing orphaned inode 1409343 (uid=556, gid=500, mode=040700, size=4096)
Sep  9 20:55:01 aspartic fsck: /home: Clearing orphaned inode 1409458 (uid=556, gid=500, mode=0100700, size=617)
Sep  9 20:55:01 aspartic fsck: /home: clean, 116383/3850240 files, 5306683/7691118 blocks

It looks like the hard drive is going bad, but it's odd, because I'm
accustomed to seeing I/O error messages in the nightly log when that happens.
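
In case it helps anyone digging through the same logs, here is a minimal
sketch of pulling the EXT3-fs kernel messages out of the syslog files and
counting them per day. It assumes the standard syslog line layout seen in the
excerpts above (month, day, time, host, then "kernel: EXT3-fs"); the function
names ext3_events and ext3_events_per_day are just illustrative.

```shell
#!/bin/sh
# Print EXT3-fs kernel messages from the given syslog files.
# Arguments: one or more log file paths.
ext3_events() {
    grep -h 'kernel: EXT3-fs' "$@"
}

# Count those messages per day ("Mon DD") to see whether they cluster
# around a particular date. Output order across days is not guaranteed.
ext3_events_per_day() {
    ext3_events "$@" |
        awk '{ count[$1 " " $2]++ } END { for (d in count) print d ": " count[d] }'
}

# Check the rotated log first, then the current one, skipping unreadable files.
for f in /var/log/messages.1 /var/log/messages; do
    if [ -r "$f" ]; then
        ext3_events_per_day "$f"
    fi
done
```

Running ext3_events on a file prints the full matching lines, so their
timestamps can be lined up against the cron schedule or any other kernel
messages logged around the same time.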