Recent unexplained quota problems
Ryan Golhar
golharam at umdnj.edu
Mon Sep 17 14:07:14 UTC 2007
>> I'm running a RHEL v3 server, completely up to date...
>>
>> I tried to edit a user's quota (as root) using the command
>> '/usr/sbin/edquota someuser' and I got the error:
>>
>> edquota: Can't open quotafile /home/aquota.user: Read-only file system
>> No filesystems with quota detected.
>>
>> Doing a listing of aquota.user reports:
>>
>> [root@server log]# ll /home/aquota.user
>> -rw------- 1 root root 15360 Sep 9 04:22 /home/aquota.user
>>
>> /etc/fstab has:
>> LABEL=/home /home ext3 defaults,usrquota 1 2
>>
>> A listing of /home shows:
>> drwxr-xr-x 133 root root 4096 Sep 7 12:49 home
>>
>> If I try to 'touch test' in /home I get:
>> [root@server home]# touch test
>> touch: creating `test': Read-only file system
>>
>> I rebooted the server and everything seems to be okay. I'm a little
>> concerned about this though because I can't explain it.
>Chris St. Pierre wrote:
>
> You should be concerned about this. The kernel will change a
> filesystem to read-only when it detects an IO error against that FS.
> This can happen for a number of reasons:
>
> - Your connection to your SAN dropped;
> - Your hard drive(s) are dying;
> - You have significant data corruption;
> - and on and on...
>
> Except for the first reason I listed, all of the other reasons I know
> of are Real Bad.
>
> If you're lucky, you've got some minor data corruption that caused the
> kernel to try to write beyond the end of the drive or something like
> that; you should try running fsck on the filesystem first. Be warned,
> though, that if you have significant data corruption, fsck may
> completely hose the filesystem, so get as good a backup as you can first.
>
> You should check /var/log/messages for kernel messages about this. If
> it happens again, dmesg will also have useful information (at least,
> it will until you reboot).
>
> If the problem is transient, a simple userspace mount call will fix
> it:
>
> mount -o remount,rw,usrquota /home
>
> But that's a gamble.
>
> Despite what other posters have said, when the kernel changes the
> status of the volume, it does so using kernel-level tools, _not_
> userspace mount calls, so the options shown by the mount(1) command
> will _not_ reflect the read-only status of the drive. If mount(1)
> shows that the drive is 'ro', then a person or a program, not the
> kernel, has mounted it read-only.
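The point about mount(1) can be checked directly. Where mount(1) builds its listing from /etc/mtab, the kernel's own view of each mount is normally visible in /proc/mounts instead. A minimal sketch, assuming a POSIX shell and awk; the function names kernel_mount_opts and kernel_mount_state, and the optional second argument (an alternate table file, useful only for testing), are illustrative and do not come from this thread:

```shell
# Print the option string the kernel holds for a mount point.
#   $1: mount point, e.g. /home
#   $2: table to read (hypothetical test hook); defaults to /proc/mounts
kernel_mount_opts() {
    # Fields in /proc/mounts: device, mount point, fs type, options, dump, pass.
    # If the mount point appears more than once, the last entry wins.
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# Print "ro" or "rw" according to the kernel's view of the mount point.
kernel_mount_state() {
    opts=$(kernel_mount_opts "$1" "$2")
    case ",$opts," in
        *,ro,*) echo ro ;;
        *,rw,*) echo rw ;;
        *)      echo unknown; return 1 ;;
    esac
}
```

Running `kernel_mount_state /home` should print ro once the kernel has switched the filesystem to read-only, even if mount(1) still lists it as rw.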
The /home partition is just that: a partition on the local hard drive.
I don't think the hard drive is failing, as I haven't seen the usual I/O
messages in the nightly log files that would indicate it, but I suppose
it's possible.
/var/log/messages shows:
Sep 16 04:22:28 aspartic kernel: EXT3-fs: ide0(3,3): couldn't remount RDWR because of unprocessed orphan inode list. Please umount/remount instead.
I notice the time is at 4:22. The nightly cron jobs start at 4am.
Looking at messages.1, I see:
Sep 9 20:53:49 aspartic shutdown: shutting down for system reboot
Sep 9 20:53:49 aspartic init: Switching to runlevel: 6
Sep 9 20:53:50 aspartic su(pam_unix)[19525]: session closed for user root
...
Sep 9 20:53:53 aspartic rpc.mountd: Caught signal 15, un-registering
and exiting.
Sep 9 20:53:53 aspartic nfs: rpc.mountd shutdown succeeded
Sep 9 20:53:57 aspartic kernel: nfsd: last server has exited
Sep 9 20:53:57 aspartic kernel: nfsd: unexporting all filesystems
Sep 9 20:53:57 aspartic kernel: EXT3-fs error (device ide0(3,3)) in start_transaction: Readonly filesystem
Sep 9 20:53:57 aspartic kernel: EXT3-fs error (device ide0(3,3)) in ext3_delete_inode: Readonly filesystem
Sep 9 20:53:57 aspartic kernel: EXT3-fs error (device ide0(3,3)) in start_transaction: Readonly filesystem
Sep 9 20:53:57 aspartic kernel: EXT3-fs error (device ide0(3,3)) in ext3_delete_inode: Readonly filesystem
Sep 9 20:53:57 aspartic nfs: nfsd shutdown succeeded
Sep 9 20:53:58 aspartic nfs: rpc.rquotad shutdown succeeded
Sep 9 20:53:58 aspartic nfs: Shutting down NFS services: succeeded
<After reboot>
Sep 9 20:55:00 aspartic fsck: /home:
Sep 9 20:55:00 aspartic fsck: Clearing orphaned inode 1409343 (uid=556, gid=500, mode=040700, size=4096)
Sep 9 20:55:01 aspartic fsck: /home: Clearing orphaned inode 1409458 (uid=556, gid=500, mode=0100700, size=617)
Sep 9 20:55:01 aspartic fsck: /home: clean, 116383/3850240 files, 5306683/7691118 blocks
It looks like the hard drive is going bad, but it's odd, because I'm
accustomed to seeing error messages in the nightly log when that happens.
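
The kernel lines quoted above can be pulled out of the rotated syslog files in one pass rather than by paging through each file. A rough sketch, assuming the logs are plain text and that kernel messages carry the "kernel: EXT3-fs" prefix seen in the excerpts; the function name ext3_messages is made up for illustration:

```shell
# Print every kernel EXT3-fs line from the given log files,
# each prefixed with the name of the file it came from.
# With no arguments, fall back to /var/log/messages.
ext3_messages() {
    [ "$#" -eq 0 ] && set -- /var/log/messages
    grep -H 'kernel: EXT3-fs' "$@"
}
```

For example, `ext3_messages /var/log/messages /var/log/messages.1` would list both the Sep 9 and the Sep 16 entries together, which makes it easier to line them up against the cron schedule.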