[Linux-cluster] Re: recovering from "resource groups locked" error?

aberoham at gmail.com aberoham at gmail.com
Fri Jun 16 19:39:59 UTC 2006


Btw, all members run kernel 2.6.9-34.ELsmp with cman-1.0.4-0,
cman-kernel-smp-2.6.9-43.8 and rgmanager-1.9.46-0.

On 6/16/06, aberoham at gmail.com <aberoham at gmail.com> wrote:
>
>
> If clustat reports rgmanager as online, why would any clusvcadm operation
> fail with "Try again (resource groups locked)" ?
>
> Is there any way to recover from that rgmanager failure/error besides
> resetting the entire cluster?
>
> Details --
>
> Yesterday evening a technician connected a Netgear GS748T switch to my
> network. The new switch somehow caused a storm of traffic that in turn
> caused a disruption of network connectivity across the entire LAN, including
> to all of my CS/GFS cluster nodes, for a few minutes until the new switch
> was removed from the network.
>
> This morning when I finally had a chance to investigate I found that all
> of the cluster members that are supposed to be online were online and that
> the cluster was quorate. But rgmanager would not work and services running
> under rgmanager were hung. (The cluster must have become inquorate and
> blocked access to the shared GFS volume while the outage was in progress.
> But some of the services and rgmanager never recovered?)
>
> I first tried resetting the "lead" member. (This is a pool of mirrored
> storage servers where the lead member creates an rsync batch off of a main
> fileserver, and all of the other members then replay that rsync batch, which
> is on a shared filesystem, against their local filesystem mirror of the main
> fileserver.)
>
> No matter what I did rgmanager would not start. cman_tool services would
> report code "S-1,80,4" --
>
> root@gfs05:~
> (0)>cman_tool services
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           1   2 run       -
> [2 1 4 3]
>
> DLM Lock Space:  "clvmd"                             2   3 run       -
> [2 1 4 3]
>
> User:            "usrm::manager"                     0   4 join      S-1,80,4
> []
>
> Other cluster members would report rgmanager as online, yet when I tried
> to operate on member services, the operation would fail with "Try again
> (resource groups locked)".
>
> root@gfs06:~
> (1)>clustat
> Member Status: Quorate
>
>   Member Name                              Status
>   ------ ----                              ------
>   gfs04                                    Online, rgmanager
>   gfs05                                    Online
>   gfs06                                    Online, Local, rgmanager
>   gfs07                                    Online, rgmanager
>   gfs08                                    Offline
>
>   Service Name         Owner (Last)                   State
>   ------- ----         ----- ------                   -----
>   mapsmirror1          gfs05                          started
>   mapsmirror2          gfs06                          started
>   mapsmirror3          gfs07                          started
>   mapsmirror4          gfs04                          started
>   mapsmirror5          (none)                         stopped
> root@gfs06:~
> (0)>clusvcadm -d mapsmirror1
> Member gfs06 disabling mapsmirror1...failed: Try again (resource groups locked)
>
> Eventually I just gave up and power cycled all cluster members at once.
> Everything, including rgmanager, then came back online OK.
>
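
For anyone who wants to spot this condition without reading the output by
eye, here is a minimal Python sketch that scans `cman_tool services` output
for service groups whose State column is not "run". It assumes the column
layout shown in the output quoted above. The function name `stuck_services`
and the rule "anything other than run deserves a look" are assumptions made
for illustration, not documented behaviour; a group can legitimately sit in
"join" for a short time while it starts.

```python
import re

# One service line looks like:
#   User:            "usrm::manager"                     0   4 join
# i.e. <kind>: "<name>" <GID> <LID> <state> [<code>]
SERVICE_LINE = re.compile(
    r'^(?P<kind>[^":]+):\s+"(?P<name>[^"]+)"\s+'
    r'(?P<gid>\d+)\s+(?P<lid>\d+)\s+(?P<state>\S+)'
)

# Sample text copied from the output quoted in this message.
SAMPLE = '''\
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           1   2 run       -
[2 1 4 3]

DLM Lock Space:  "clvmd"                             2   3 run       -
[2 1 4 3]

User:            "usrm::manager"                     0   4 join
S-1,80,4
[]
'''


def stuck_services(text):
    """Return (kind, name, state) for every group whose state is not 'run'.

    Hypothetical helper: the 'not run' rule is an assumption, see above.
    """
    stuck = []
    for line in text.splitlines():
        match = SERVICE_LINE.match(line.strip())
        # Header, member-list and wrapped code lines do not match and are skipped.
        if match and match.group("state") != "run":
            stuck.append(
                (match.group("kind"), match.group("name"), match.group("state"))
            )
    return stuck


if __name__ == "__main__":
    for kind, name, state in stuck_services(SAMPLE):
        print(f"{kind} {name!r} is in state {state!r}")
```

To check a live node, the text could be taken from the real command instead
of SAMPLE, for example by saving `cman_tool services` to a file and passing
its contents to `stuck_services`.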