Problems with RHEL 5 server: NFS related?

Nigel Wade nmw at ion.le.ac.uk
Wed Feb 4 09:33:26 UTC 2009


Kenneth Holter wrote:
> Hello list.
> 
> 
> We experienced some problems with one of our RHEL 5 servers, but are having
> difficulties finding the cause. Before we got the chance to gather enough
> information I had to reboot the server, and we are left with little information
> about the state before the reboot. For what it's worth I'll outline the symptoms
> we saw, just in case someone has experienced a similar thing and knows
> what may have caused these problems.
> 
> The symptoms we saw were these:
> 
>    1. Running "ls" on a particular folder gave us an input/output error. This
>    folder is exported read only as an NFS share, and was mounted on a client.
>    2. Running "/etc/init.d/nfs stop" resulted in an error containing
>    "Shutting down NFS services:  exportfs: could not open /var/lib/nfs/etab for
>    locking" and "rm: cannot remove `/var/lock/subsys/nfs': Read-only file
>    system"
>    3. Both the Red Hat Satellite probe and syslog (and possibly others) had
>    stopped working at approximately the same time.
> 
> First we thought the problems had something to do with NFS because of the
> first two items in the list above. But we don't see why a read only share
> would cause such problems. And the syslog/probe issues don't seem to be
> related to NFS. Furthermore, we don't see any indication of file system or
> hardware problems.
> 
> In short, we're not sure what exactly caused these problems, but a restart
> seems to have done the trick. And since we didn't get to gather much info,
> it's very difficult to get to the bottom of this. But does anyone have an
> idea of what kind of problem might cause the symptoms described above?
> Maybe this is a well known bug of some kind.
> 
> Please ask me for further details if needed.
> 

If a filesystem experiences too many I/O errors then the OS re-mounts it read-only to 
protect the data. It looks like that is what has happened to /var.
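
One quick way to confirm which filesystems have gone read-only is to look at the
mount options in /proc/mounts. What follows is only a minimal sketch in Python, not
something taken from the affected server: the function name read_only_mounts and the
sample lines are made up for illustration, and it assumes the usual /proc/mounts
layout of device, mount point, type and options.

```python
def read_only_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs whose option list contains 'ro'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # Compare whole options so "errors=remount-ro" is not mistaken for "ro".
        if "ro" in options.split(","):
            found.append((mount_point, fs_type))
    return found


# Hypothetical sample, NOT taken from the server discussed in this thread.
SAMPLE = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
/dev/sda3 /var ext3 ro,data=ordered 0 0
proc /proc proc rw 0 0
"""

print(read_only_mounts(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/mounts, for
example read_only_mounts(open("/proc/mounts").read()).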

The underlying problem you have is those I/O errors. I don't see how they could have
anything to do with NFS. On a client, maybe, but on the server those NFS problems are just
a symptom. There is most likely (a) a filesystem error, (b) a device driver error or (c) a
hardware error. I'd put my money on (a) caused by (c). It looks like the problem is on the
disk/controller which contains /var. It may be just that partition which is affected, or it
may be all partitions on that disk, or all disks on that controller. Back up what you can
immediately. Run disk diagnostics on that drive to try to identify the problem.
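
Before running full diagnostics it can help to search the kernel messages for signs
of disk trouble. Since syslog could not write to /var, the output of dmesg may be a
better source than the log files. The sketch below is an assumption-laden example:
the message fragments are ones I would expect from a kernel of that era, but the exact
wording varies by kernel version and driver, and the sample log lines are invented
purely to exercise the code.

```python
import re

# Assumed message fragments; exact wording depends on kernel version and driver.
PATTERNS = [
    r"I/O error",
    r"EXT3-fs error",
    r"Remounting filesystem read-only",
]
SUSPECT = re.compile("|".join(PATTERNS))


def suspect_lines(log_text):
    """Return the lines of log_text that match any of the assumed patterns."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


# Invented sample in the general style of kernel messages, NOT real output
# from the server discussed in this thread.
SAMPLE_LOG = """\
kernel: end_request: I/O error, dev sda, sector 1234567
kernel: EXT3-fs error (device sda3): ext3_find_entry: reading directory #2 offset 0
kernel: Remounting filesystem read-only
kernel: eth0: link up
"""

for line in suspect_lines(SAMPLE_LOG):
    print(line)
```

In practice the text would come from saving the output of dmesg to a file on a
filesystem that is still writable and passing its contents to suspect_lines().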

If there is a problem reading the disk then there's a good chance you'll get I/O errors
when listing directories. If /var is read-only you won't be able to stop NFS, because the
init script can't write to /var/lock and exportfs can't lock /var/lib/nfs/etab. A read-only
/var will also prevent syslog from writing to its logs.
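
The write failures described above can be reproduced directly with a small probe
that tries to create a file in a directory and reports the error name if it cannot.
This is just a sketch; write_probe is a name I have made up, and on a filesystem
that has been remounted read-only I would expect it to report EROFS (or EACCES if it
is run without sufficient permissions).

```python
import errno
import os
import tempfile


def write_probe(directory):
    """Try to create and remove a temp file in directory.

    Returns None on success, otherwise the symbolic errno name (e.g. "EROFS").
    """
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    os.remove(path)  # leave nothing behind on a healthy filesystem
    return None
```

For example, write_probe("/var/lock") run as root on the affected server would be
expected to return "EROFS" while /var is read-only, and None after a clean reboot.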

-- 
Nigel Wade, System Administrator, Space Plasma Physics Group,
             University of Leicester, Leicester, LE1 7RH, UK
E-mail :    nmw at ion.le.ac.uk
Phone :     +44 (0)116 2523548, Fax : +44 (0)116 2523555



