Re: IO lockups and ext3 readonly filecorruption on RHEL4 (pre a U4)

On Tuesday 05 September 2006 07:34 pm, Christian wrote:

> ok, so ext3 will remount the fs to RO. this would happen if a panic()
> occurs? 

These boxes are not panicking.  IO (or O, actually) seems to come to a 
complete stop and the system can't sync.  The journal gets out of sync, ext3 
freaks out and remounts RO, and eventually the system becomes mostly 
unresponsive (as no new processes can be properly started).  Graceful 
rebooting becomes a problem, and on the eventual reboot the unsynced disk is 
very hard to fsck.

> is there anything related in the logs? 

No.  The logs are on the read-only filesystem, so nothing gets written.

> (if /var is RO too, try  
> to setup a loghost).

We may try that, as we already have a shared NetDump server set up.
Can I have syslog write to BOTH the local machine AND a network syslog 
server?  If the local logs are locked, will writing to a remote host still 
work?
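
For reference, the sort of setup I have in mind is below.  This is only a 
sketch, assuming the stock sysklogd syntax in /etc/syslog.conf.  The hostname 
loghost.example.com is a placeholder, and the selector on the local line is 
just an example.

```
# /etc/syslog.conf (sketch; the hostname below is a placeholder)
# keep writing to the local log file as before
*.info;mail.none;authpriv.none;cron.none    /var/log/messages
# and also forward everything to a remote syslog server over UDP
*.*                                         @loghost.example.com
```

As far as I know, the syslogd on the receiving host has to be started with -r 
before it will accept messages from the network.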

> could you be more specific? what does fsck.ext3 say? 

It shows thousands of de-linked files being found.  But I have not witnessed 
this first hand, as I am not in front of the console on these machines.  But 
I'll ask.

> is there something 
> in lost+found? 

I'm assuming yes.

> remember to use latest version of e2fsprogs. have you 
> tried a vanilla kernel yet?

Well, yes.  But since we have not been able to reproduce the problem 
reliably so far, it's hard to tell what works and what doesn't.  If anyone 
who understands the nature of this problem has any suggestions for reliably 
triggering it, then please speak up.

You mentioned some type of forced buffer flush patch last month... any ETA on 

