[Linux-cluster] Services timeout
Jordi Prats
jprats at cesca.es
Thu Sep 13 07:10:12 UTC 2007
Hi,
This is all the data I could collect. bpbkar is a backup process. The
failure happened while it was indexing (which takes several weeks) and
doing a backup at the same time.
best regards,
Sep 12 06:09:44 inf04 clurgmgrd[5964]: <notice> Stopping service
padicat.dades
Sep 12 06:09:45 inf04 clurgmgrd: [5964]: <info> Executing
/etc/init.d/add.nfs.padicat.dades stop
Sep 12 06:09:45 inf04 clurgmgrd: [5964]: <info> Removing IPv4 address
192.168.12.205 from bond0
Sep 12 06:09:54 inf04 clurgmgrd: [5964]: <info> Executing
/etc/init.d/add.nfs.recercat status
Sep 12 06:09:55 inf04 clurgmgrd: [5964]: <info> unmounting
/projectes/padicat/dades
Sep 12 06:09:55 inf04 clurgmgrd: [5964]: <notice> Forcefully unmounting
/projectes/padicat/dades
Sep 12 06:09:56 inf04 clurgmgrd: [5964]: <warning> killing process 13266
(root bpbkar /projectes/padicat/dades)
Sep 12 06:09:56 inf04 clurgmgrd: [5964]: <warning> Dropping node-wide
NFS locks
Sep 12 06:10:04 inf04 clurgmgrd: [5964]: <info> Executing
/etc/init.d/add.nfs.padicat.web status
Sep 12 06:10:04 inf04 clurgmgrd: [5964]: <info> Executing
/etc/init.d/add.nfs.local status
Sep 12 06:10:06 inf04 clurgmgrd: [5964]: <info> unmounting
/projectes/padicat/dades
Sep 12 06:10:06 inf04 clurgmgrd: [5964]: <notice> Forcefully unmounting
/projectes/padicat/dades
Sep 12 06:10:07 inf04 clurgmgrd: [5964]: <info> Sending reclaim
notifications via inf04.cesca.es
Sep 12 06:10:07 inf04 rpc.statd[27045]: Version 1.0.6 Starting
Sep 12 06:10:07 inf04 rpc.statd[27045]: Flags: No-Daemon Notify-Only
Sep 12 06:10:10 inf04 rpc.statd[27045]: Caught signal 15, un-registering
and exiting.
Sep 12 06:10:10 inf04 clurgmgrd: [5964]: <info> Sending reclaim
notifications via nfstdx
Sep 12 06:10:10 inf04 rpc.statd[27067]: Version 1.0.6 Starting
Sep 12 06:10:10 inf04 rpc.statd[27067]: Flags: No-Daemon Notify-Only
Sep 12 06:10:13 inf04 rpc.statd[27067]: Caught signal 15, un-registering
and exiting.
Sep 12 06:10:13 inf04 clurgmgrd: [5964]: <info> Sending reclaim
notifications via nfspadicatweb
Sep 12 06:10:13 inf04 rpc.statd[27089]: Version 1.0.6 Starting
Sep 12 06:10:13 inf04 rpc.statd[27089]: Flags: No-Daemon Notify-Only
Sep 12 06:10:14 inf04 clurgmgrd: [5964]: <info> Executing
/etc/init.d/add.nfs.tdx status
Sep 12 06:10:16 inf04 rpc.statd[27089]: Caught signal 15, un-registering
and exiting.
Sep 12 06:10:16 inf04 clurgmgrd: [5964]: <info> Sending reclaim
notifications via nfslocal
Sep 12 06:10:16 inf04 rpc.statd[27321]: Version 1.0.6 Starting
Sep 12 06:10:16 inf04 rpc.statd[27321]: Flags: No-Daemon Notify-Only
Sep 12 06:10:19 inf04 rpc.statd[27321]: Caught signal 15, un-registering
and exiting.
Sep 12 06:10:19 inf04 clurgmgrd: [5964]: <info> Sending reclaim
notifications via nfsrecercat
Sep 12 06:10:19 inf04 rpc.statd[27343]: Version 1.0.6 Starting
Sep 12 06:10:19 inf04 rpc.statd[27343]: Flags: No-Daemon Notify-Only
Sep 12 06:10:22 inf04 rpc.statd[27343]: Caught signal 15, un-registering
and exiting.
Sep 12 06:10:22 inf04 clurgmgrd: [5964]: <err> 'umount
/projectes/padicat/dades' failed, error=0
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <notice> stop on fs
"PADICAT.dades" returned 2 (invalid argument(s))
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <crit> #12: RG padicat.dades
failed to stop; intervention required
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <notice> Service padicat.dades is
failed
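To pick the important events out of a log like the one above, here is a minimal sketch in Python. It assumes each syslog entry is on a single line (the mail client wrapped the lines above), and the names parse_line, important_events and SEVERITY are made up for this example; they are not part of rgmanager. It only keeps clurgmgrd messages at warning level or higher, which is where the forced unmount, the killed bpbkar process and the failed stop show up.

```python
import re

# Matches both "clurgmgrd[5964]:" and "clurgmgrd: [5964]:", the two
# prefixes that appear in the log above. Other daemons are ignored.
LINE_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"clurgmgrd:?\s*\[(?P<pid>\d+)\]:\s+<(?P<level>\w+)>\s+(?P<msg>.*)$"
)

# Ordering of the level tags seen in the log (assumed, lowest to highest).
SEVERITY = {"debug": 0, "info": 1, "notice": 2, "warning": 3, "err": 4, "crit": 5}


def parse_line(line):
    """Return a dict with time, host, pid, level and msg, or None."""
    m = LINE_RE.match(line.strip())
    return m.groupdict() if m else None


def important_events(lines, min_level="warning"):
    """Keep clurgmgrd entries whose level is at least min_level."""
    threshold = SEVERITY[min_level]
    events = []
    for line in lines:
        rec = parse_line(line)
        if rec and SEVERITY.get(rec["level"], 0) >= threshold:
            events.append((rec["time"], rec["level"], rec["msg"]))
    return events


# A few entries from the log above, joined back into single lines.
sample = [
    "Sep 12 06:09:45 inf04 clurgmgrd: [5964]: <info> Executing /etc/init.d/add.nfs.padicat.dades stop",
    "Sep 12 06:09:56 inf04 clurgmgrd: [5964]: <warning> killing process 13266 (root bpbkar /projectes/padicat/dades)",
    "Sep 12 06:10:07 inf04 rpc.statd[27045]: Version 1.0.6 Starting",
    "Sep 12 06:10:22 inf04 clurgmgrd: [5964]: <err> 'umount /projectes/padicat/dades' failed, error=0",
    "Sep 12 06:10:22 inf04 clurgmgrd[5964]: <crit> #12: RG padicat.dades failed to stop; intervention required",
]

for when, level, msg in important_events(sample):
    print(when, level, msg)
```

Lowering min_level to "notice" would also show the "Forcefully unmounting" entries if they are included in the input.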
Lon Hohberger wrote:
> On Wed, Sep 12, 2007 at 09:14:04AM +0200, Jordi Prats wrote:
>
>> Hi,
>> I have an NFS server running Red Hat Cluster. Sometimes, under heavy load,
>> it sets the service status to failed. There's no fs corruption and no
>> daemon is down. I suspect this is caused by a timeout while it is
>> checking that the fs is mounted. Is there any way to define the check
>> interval or the check timeout?
>>
>
> It shouldn't matter about load - a fail only occurs on fail-to-stop
> cases. Do you have any log messages from the incident?
>
>
--
......................................................................
__
/ / Jordi Prats
C E / S / C A Dept. de Sistemes
/_/ Centre de Supercomputació de Catalunya
Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
T. 93 205 6464 · F. 93 205 6979 · jprats at cesca.es
......................................................................