[Linux-cluster] Problem in clvmd/dlm_recoverd

Curtis Collicutt curtis at athabascau.ca
Fri Nov 14 17:06:51 UTC 2008


Excerpts from Nuno Fernandes's message of Fri Nov 14 03:00:13 -0700 2008:
> Hi,
> 
> we have a cluster of 7 machines attached to a SAN. We use them to provide 
> virtual machines, so we are using clvmd.
> 
> At some point we become unable to use any of the pv/lv/vg tools. They all 
> get stuck. From stracing them I've come to the conclusion that they are 
> waiting for clvmd.
> 
> Has anyone been in this situation?

This happens to me as well every once in a while. I haven't figured it out yet either.

Thanks,
Curtis.

> 
> Thanks for any help,
> Nuno Fernandes
> 
> in host xen1:
> Linux blade01.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 14:13:09 EST 
> 2008 x86_64 x86_64 x86_64 GNU/Linux
> lvm2-cluster-2.02.32-4.el5
> cman-2.0.84-2.el5_2.1
>   PID TTY      STAT   TIME COMMAND
> 20874 ?        D<     0:00  \_ [dlm_recoverd]
> 20854 pts/1    S+     0:00      \_ /bin/sh /sbin/service clvmd start
> 20861 pts/1    S+     0:00          \_ /bin/bash /etc/init.d/clvmd start
> 20931 pts/1    S+     0:00              \_ /usr/sbin/vgscan -d
> 20869 ?        Ssl    0:00 clvmd -T40
> ps ax -o pid,cmd,wchan
> 20874 [dlm_recoverd]              -
> ------------------------------
> Connection to xen1 closed.
> in host xen2:
> Linux blade02.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 WEST 
> 2007 x86_64 x86_64 x86_64 GNU/Linux
> lvm2-cluster-2.02.16-3.el5
> cman-2.0.64-1.0.1.el5
>   PID TTY      STAT   TIME COMMAND
> 22662 ?        D<     0:00  \_ [dlm_recoverd]
> 22613 ?        Ssl    0:02 clvmd -T40
> ps ax -o pid,cmd,wchan
> 22662 [dlm_recoverd]              -
> ------------------------------
> Connection to xen2 closed.
> in host xen3:
> Linux blade03.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 WEST 
> 2007 x86_64 x86_64 x86_64 GNU/Linux
> lvm2-cluster-2.02.16-3.el5
> cman-2.0.64-1.0.1.el5
>   PID TTY      STAT   TIME COMMAND
> 22236 ?        D<     0:00  \_ [dlm_recoverd]
> 22231 ?        Ssl    0:02 clvmd -T40
> ps ax -o pid,cmd,wchan
> Connection to xen3 closed.
> 22236 [dlm_recoverd]              dlm_wait_function
> ------------------------------
> in host xen4:
> Linux blade04.dc.xpto.com 2.6.18-8.1.14.el5xen #1 SMP Thu Oct 4 11:38:56 WEST 
> 2007 x86_64 x86_64 x86_64 GNU/Linux
> lvm2-cluster-2.02.16-3.el5
> cman-2.0.64-1.0.1.el5
>   PID TTY      STAT   TIME COMMAND
> 25097 ?        D<     0:00  \_ [dlm_recoverd]
> 25092 ?        Ssl    0:02 clvmd -T40
> ps ax -o pid,cmd,wchan
> 25097 [dlm_recoverd]              dlm_wait_function
> ------------------------------
> Connection to xen4 closed.
> in host xen5:
> Linux blade05.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 14:13:09 EST 
> 2008 x86_64 x86_64 x86_64 GNU/Linux
> lvm2-cluster-2.02.32-4.el5
> cman-2.0.84-2.el5_2.1
>   PID TTY      STAT   TIME COMMAND
> 22333 ?        D<     0:00  \_ [dlm_recoverd]
> 22328 ?        Ssl    0:02 clvmd -T40
> ps ax -o pid,cmd,wchan
> 22333 [dlm_recoverd]              -
> ------------------------------
> Connection to xen5 closed.
> in host xen6:
> Linux blade06.dc.xpto.com 2.6.18-92.1.17.el5xen #1 SMP Tue Nov 4 14:13:09 EST 
> 2008 x86_64 x86_64 x86_64 GNU/Linux
> lvm2-cluster-2.02.32-4.el5
> cman-2.0.84-2.el5_2.1
>   PID TTY      STAT   TIME COMMAND
> ps ax -o pid,cmd,wchan
> ------------------------------
> Connection to xen6 closed.
> in host xen7:
> Linux blade07.dc.xpto.com 2.6.18-92.1.13.el5xen #1 SMP Wed Sep 24 20:01:15 EDT 
> 2008 x86_64 x86_64 x86_64 GNU/Linux
> lvm2-cluster-2.02.32-4.el5
> cman-2.0.84-2.el5
> cman-2.0.84-2.el5_2.1
>   PID TTY      STAT   TIME COMMAND
> 19793 ?        D<     0:00  \_ [dlm_recoverd]
> 19788 ?        Ssl    0:01 clvmd -T40
> ps ax -o pid,cmd,wchan
> 19793 [dlm_recoverd]              -
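
The per-host listings above pair the process state from `ps` with the
`wchan` column by hand. A minimal sketch of the same check as a single
filter, assuming a procps-style `ps`; the function name `d_state_filter`
and the column layout are illustrative and are not taken from the
original report:

```shell
# Hypothetical helper: keep only rows whose STAT column starts with "D"
# (uninterruptible sleep), which is the state dlm_recoverd shows above.
# Expected input columns: PID STAT WCHAN COMMAND...
d_state_filter() {
    # The header row is dropped as well, because "STAT" does not start with "D".
    awk '$2 ~ /^D/ { print }'
}
```

On a node it could be fed with something like
`ps -eo pid,stat,wchan:32,args | d_state_filter`, so that each stuck
task is printed next to the kernel function it is sleeping in.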
