[Linux-cluster] Services timeout

Jordi Prats jprats at cesca.es
Thu Sep 13 07:10:12 UTC 2007


Hi,
This is all the data I could collect. bpbkar is a backup process. The failure 
happened while it was indexing (which takes several weeks) and running a 
backup at the same time.

best regards,

Sep 12 06:09:44 inf04 clurgmgrd[5964]: <notice> Stopping service 
padicat.dades
Sep 12 06:09:45 inf04 clurgmgrd: [5964]: <info> Executing 
/etc/init.d/add.nfs.padicat.dades stop
Sep 12 06:09:45 inf04 clurgmgrd: [5964]: <info> Removing IPv4 address 
192.168.12.205 from bond0
Sep 12 06:09:54 inf04 clurgmgrd: [5964]: <info> Executing 
/etc/init.d/add.nfs.recercat status
Sep 12 06:09:55 inf04 clurgmgrd: [5964]: <info> unmounting 
/projectes/padicat/dades
Sep 12 06:09:55 inf04 clurgmgrd: [5964]: <notice> Forcefully unmounting 
/projectes/padicat/dades
Sep 12 06:09:56 inf04 clurgmgrd: [5964]: <warning> killing process 13266 
(root bpbkar /projectes/padicat/dades)
Sep 12 06:09:56 inf04 clurgmgrd: [5964]: <warning> Dropping node-wide 
NFS locks
Sep 12 06:10:04 inf04 clurgmgrd: [5964]: <info> Executing 
/etc/init.d/add.nfs.padicat.web status
Sep 12 06:10:04 inf04 clurgmgrd: [5964]: <info> Executing 
/etc/init.d/add.nfs.local status
Sep 12 06:10:06 inf04 clurgmgrd: [5964]: <info> unmounting 
/projectes/padicat/dades
Sep 12 06:10:06 inf04 clurgmgrd: [5964]: <notice> Forcefully unmounting 
/projectes/padicat/dades
Sep 12 06:10:07 inf04 clurgmgrd: [5964]: <info> Sending reclaim 
notifications via inf04.cesca.es
Sep 12 06:10:07 inf04 rpc.statd[27045]: Version 1.0.6 Starting
Sep 12 06:10:07 inf04 rpc.statd[27045]: Flags: No-Daemon Notify-Only
Sep 12 06:10:10 inf04 rpc.statd[27045]: Caught signal 15, un-registering 
and exiting.
Sep 12 06:10:10 inf04 clurgmgrd: [5964]: <info> Sending reclaim 
notifications via nfstdx
Sep 12 06:10:10 inf04 rpc.statd[27067]: Version 1.0.6 Starting
Sep 12 06:10:10 inf04 rpc.statd[27067]: Flags: No-Daemon Notify-Only
Sep 12 06:10:13 inf04 rpc.statd[27067]: Caught signal 15, un-registering 
and exiting.
Sep 12 06:10:13 inf04 clurgmgrd: [5964]: <info> Sending reclaim 
notifications via nfspadicatweb
Sep 12 06:10:13 inf04 rpc.statd[27089]: Version 1.0.6 Starting
Sep 12 06:10:13 inf04 rpc.statd[27089]: Flags: No-Daemon Notify-Only
Sep 12 06:10:14 inf04 clurgmgrd: [5964]: <info> Executing 
/etc/init.d/add.nfs.tdx status
Sep 12 06:10:16 inf04 rpc.statd[27089]: Caught signal 15, un-registering 
and exiting.
Sep 12 06:10:16 inf04 clurgmgrd: [5964]: <info> Sending reclaim 
notifications via nfslocal
Sep 12 06:10:16 inf04 rpc.statd[27321]: Version 1.0.6 Starting
Sep 12 06:10:16 inf04 rpc.statd[27321]: Flags: No-Daemon Notify-Only
Sep 12 06:10:19 inf04 rpc.statd[27321]: Caught signal 15, un-registering 
and exiting.
Sep 12 06:10:19 inf04 clurgmgrd: [5964]: <info> Sending reclaim 
notifications via nfsrecercat
Sep 12 06:10:19 inf04 rpc.statd[27343]: Version 1.0.6 Starting
Sep 12 06:10:19 inf04 rpc.statd[27343]: Flags: No-Daemon Notify-Only
Sep 12 06:10:22 inf04 rpc.statd[27343]: Caught signal 15, un-registering 
and exiting.
Sep 12 06:10:22 inf04 clurgmgrd: [5964]: <err> 'umount 
/projectes/padicat/dades' failed, error=0
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <notice> stop on fs 
"PADICAT.dades" returned 2 (invalid argument(s))
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <crit> #12: RG padicat.dades 
failed to stop; intervention required
Sep 12 06:10:22 inf04 clurgmgrd[5964]: <notice> Service padicat.dades is 
failed
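
To pick the relevant rgmanager entries out of a longer /var/log/messages, a small filter along these lines may help. This is only a sketch: it assumes one log entry per line (as in the syslog file itself, not wrapped as in this mail), and the names ENTRY, SEVERE and severe_entries are made up for illustration. It is not part of rgmanager.

```python
import re

# Matches clurgmgrd entries in both forms seen in the log above:
#   "clurgmgrd[5964]: <crit> ..."  and  "clurgmgrd: [5964]: <info> ..."
# Assumption: each entry sits on a single line, as in the syslog file.
ENTRY = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"clurgmgrd:?\s*\[(?P<pid>\d+)\]:\s+<(?P<level>\w+)>\s+(?P<msg>.*)$"
)

# Severity tags worth a closer look (hypothetical choice for this sketch).
SEVERE = {"warning", "err", "crit"}


def severe_entries(lines):
    """Yield (timestamp, level, message) for warning/err/crit entries."""
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if m and m.group("level") in SEVERE:
            yield m.group("ts"), m.group("level"), m.group("msg")


if __name__ == "__main__":
    sample = [
        "Sep 12 06:09:55 inf04 clurgmgrd: [5964]: <info> unmounting /projectes/padicat/dades",
        "Sep 12 06:09:56 inf04 clurgmgrd: [5964]: <warning> killing process 13266 (root bpbkar /projectes/padicat/dades)",
        "Sep 12 06:10:07 inf04 rpc.statd[27045]: Version 1.0.6 Starting",
        "Sep 12 06:10:22 inf04 clurgmgrd: [5964]: <err> 'umount /projectes/padicat/dades' failed, error=0",
        "Sep 12 06:10:22 inf04 clurgmgrd[5964]: <crit> #12: RG padicat.dades failed to stop; intervention required",
    ]
    for ts, level, msg in severe_entries(sample):
        print(ts, level, msg)
```

Run against the entries above, this leaves only the bpbkar kill, the NFS lock drop, the failed umount and the final "failed to stop" message, which is the sequence that matters here.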



Lon Hohberger wrote:
> On Wed, Sep 12, 2007 at 09:14:04AM +0200, Jordi Prats wrote:
>   
>> Hi,
>> I have an NFS server running Red Hat Cluster. Sometimes, when it is under 
>> heavy load, it sets the service status to failed. There's no fs corruption 
>> and no daemon is down. I suspect this is caused by a timeout while it is 
>> checking that the fs is mounted. Is there any way to define the check 
>> interval or the check timeout?
>>     
>
> It shouldn't matter about load - a fail only occurs on fail-to-stop
> cases.  Do you have any log messages from the incident?
>
>   


-- 
......................................................................
         __
        / /          Jordi Prats
  C E / S / C A      Dept. de Sistemes
      /_/            Centre de Supercomputació de Catalunya

  Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
  T. 93 205 6464 · F.  93 205 6979 · jprats at cesca.es
...................................................................... 



