[Linux-cluster] Httpd Process io blocked

Tue Mar 7 11:35:09 UTC 2006

2006/3/7, Marc Grimme <grimme at atix.de>:
> Hi,
> to debug you could use strace. E.g. executing strace -p 14970 will probably
> show you that the process is waiting for a lock, as the ps output already
> does. My first guess would be that you use Apache with PHP and sessions.

Thanks. But strace doesn't output anything and becomes immune to
Ctrl-C. It needs a SIGKILL to exit, and the traced process is left in
the T state. It seems that strace cannot retrieve the last system call
while the process is in the D state.
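
Since strace cannot attach here, the wait channel reported by ps is
about the only per-process information left. A minimal sketch of how
the D-state processes could be listed together with the kernel function
they sleep in, assuming a procps-style ps (the helper names
filter_dstate and list_dstate are made up for illustration):

```shell
# Keep only tasks in uninterruptible sleep (state starts with "D") from
# "PID STAT WCHAN COMMAND" lines, skipping the header line.
# filter_dstate is a hypothetical helper name, not an existing tool.
filter_dstate() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3, $4 }'
}

# List every D-state task with its wait channel and command name.
# Assumes a procps-style ps that accepts the "wchan:32" width syntax.
list_dstate() {
    ps -eo pid,stat,wchan:32,comm | filter_dstate
}
```

Running list_dstate on each node should show whether every blocked
process sits in glock_wait_internal or whether some wait elsewhere.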

>
> If so, the phplib uses flocks for locking the session IDs. Normally, one
> process locks a session. If another process comes along to get a flock on
> that session, it has to wait until the earlier flock is released. It very
> often happens that the other process gets that flock when the client and
> session are no longer available. Then the flock is held until the Apache
> process times out.
>

I don't think it is session related, because I store session files
outside the GFS mount point (in /tmp) and I run a load balancer keyed
on the source address (so requests from a client always go to the same
server, which keeps its sessions).

However, we are using MySQL query caching (with libraries such as
ADOdb) inside the GFS mount point. Do you think the cache files could
be what is deadlocked?
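
One way to check that guess might be to look at which files under the
GFS mount a blocked process has open. The sketch below only reads the
/proc/<pid>/fd symlinks, so it should not touch the filesystem itself
(a tool that stats the files might hang as well). The function name
open_files_under and the mount point /mnt/gfs are placeholders, not
taken from the actual setup:

```shell
# Print the open files of process $1 whose path lies under directory $2.
# Only the /proc/<pid>/fd symlinks are read; the files are never opened.
# open_files_under is a hypothetical helper name.
open_files_under() {
    pid=$1
    prefix=${2%/}    # drop a trailing slash, if any
    for fd in /proc/"$pid"/fd/*; do
        # Skip descriptors that vanished or that we may not read.
        target=$(readlink "$fd") || continue
        case $target in
            "$prefix"/*) printf '%s\n' "$target" ;;
        esac
    done
}
```

For example, "open_files_under 14970 /mnt/gfs", run as root, would list
the GFS files held open by the first blocked apache process. If the
cache files show up for all four blocked PIDs, that would support the
query-cache theory.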

> We have made a patch for better locking with PHP, which you can find at
> http://www.open-sharedroot.org in the downloads section.
> Hope that helps
> Regards Marc.
>
> On Tuesday 07 March 2006 11:50, Sébastien DIDIER wrote:
> > Hi,
> >
> > I'm running a two-node GFS cluster which hosts web sites. The GFS
> > partition is on an iSCSI device and, for now, I'm using manual fencing.
> >
> > Today, 5 httpd processes on both nodes got stuck in an I/O-blocked
> > state. I suspected GFS filesystem corruption, but I haven't
> > got any output from the kernel. I ran an fsck two days ago after a
> > power failure.
> >
> > Here's the wait state of the processes (the same on the other node).
> >
> > # ps -o pid,tt,user,fname,wchan -C apache
> >   PID TT       USER     COMMAND  WCHAN
> >  4426 ?        root     apache   -
> > 14970 ?        www-data apache   glock_wait_internal
> > 15103 ?        www-data apache   glock_wait_internal
> > 16780 ?        www-data apache   glock_wait_internal
> > 16959 ?        www-data apache   glock_wait_internal
> > 14936 ?        www-data apache   finish_stop
> > 12859 ?        www-data apache   -
> > 13005 ?        www-data apache   -
> > 13311 ?        www-data apache   semtimedop
> > 13390 ?        www-data apache   semtimedop
> >
> > How can I debug this problem further? And how can I recover my
> > httpd processes without a reboot?
> >
> > Many thanks for your help.
> >
> > Regards,
> > Sébastien DIDIER
>
> --
> Gruss / Regards,
>
> Marc Grimme
> Phone: +49-89 121 409-54
> http://www.atix.de/               http://www.open-sharedroot.org/
>
> **
> ATIX - Ges. fuer Informationstechnologie und Consulting mbH
> Einsteinstr. 10 - 85716 Unterschleissheim - Germany
>
>