[Linux-cluster] scsi reservation issue

Christopher Barry Christopher.Barry at qlogic.com
Fri Nov 9 01:05:06 UTC 2007


On Thu, 2007-11-08 at 19:03 -0500, Christopher Barry wrote:
> On Thu, 2007-11-08 at 15:32 -0600, Ryan O'Hara wrote:
> > Christopher Barry wrote:
> > > 
> > > Okay. I had some other issues to deal with, but now I'm back to this,
> > > and let me get you all up to speed on what I have done, and what I do
> > > not understand about all of this.
> > > 
> > > status:
> > > esx-01: contains nodes 1 thru 3
> > > esx-02: contains nodes 4 thru 6
> > > 
> > > esx-01: all 3 cluster nodes can mount gfs.
> > > 
> > > esx-02: none can mount gfs.
> > > esx-02: scsi reservation errors in dmesg
> > > esx-02: mount fails w/ "can't read superblock" 
> > 
> > OK. So it looks like one of the nodes is still holding a reservation on
> > the device. First, we need to determine which node has that reservation.
> > From any node, you should be able to run the following commands:
> > 
> > sg_persist -i -k /dev/sdc
> > sg_persist -i -r /dev/sdc
> > 
> > The first will list all the keys registered with the device. The second
> > will show you which key is holding the reservation. At this point, I
> > would expect that you will only see 1 key registered and that key will
> > also be the reservation holder, but that is just a guess.
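> >
> > Just as an example, the output should look something like this
> > (trimmed, and the key value here is made up):
> >
> > # sg_persist -i -k /dev/sdc
> >   PR generation=0x4, 1 registered reservation key follows:
> >     0x0a000001
> > # sg_persist -i -r /dev/sdc
> >   PR generation=0x4, Reservation follows:
> >     Key=0x0a000001
> >     scope: LU_SCOPE,  type: Write Exclusive, registrants only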
> > 
> > The keys are unique to each node, so we can correlate a key to a 
> > node. The key is just the hex representation of the node's IP address. 
> > You can get this by running gethostip -x <hostname>. By doing this, you 
> > should be able to figure out which node is still holding a reservation.
> > Once you determine this key/node, try running /etc/init.d/scsi_reserve 
> > stop from that node. Once that runs, use the sg_persist commands listed 
> > above to see if the reservation is cleared.
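> >
> > For example, if a node's address were 10.0.0.1 (made-up address and
> > hostname), you would see:
> >
> > # gethostip -x node1
> > 0a000001
> >
> > and 0x0a000001 would be that node's registration key.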
> > 
> > > Oddly, with the gfs filesystem unmounted on all nodes, I can format the
> > > gfs filesystem from the esx-02 box (from node4), and then mount it from
> > > a node on esx-01, but cannot mount it on the node I just formatted it
> > > from!
> > > 
> > > fdisk -l shows /dev/sdc1 on nodes 4 thru 6 just fine.
> > 
> > Hmm. I wonder if there is something goofy happening because the nodes 
> > are running within vmware. I have never tried this, so I have no idea. 
> > Either way, we should be able to clear up the problem.
> > 
> > > # sg_persist -C --out /dev/sdc1
> > > fails to clear out the reservations
> > 
> > Right. I believe this must be run from the node holding the
> > reservation, or at the very least a node that is registered with the
> > device. Also note that SCSI reservations affect the entire LUN, so you
> > can't issue registrations/reservations against a single partition (i.e. sdc1).
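> >
> > From the node that holds the reservation, and against the whole device
> > rather than the partition, something like this should clear it (again
> > using the made-up key from the example above):
> >
> > # sg_persist --out --clear --param-rk=0a000001 /dev/sdc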
> > 
> > > I do not understand these reservations, maybe someone can summarize?
> > 
> > I'll try to be brief. Each node in the cluster can register with a
> > device, so a device may have many registrations. Each node registers
> > using a unique key. Once registered, one of the nodes can issue a
> > reservation. Only one node may hold the reservation, and the reservation
> > is created using that node's key. For our purposes, we use a
> > write-exclusive, registrants-only type of reservation. This means that
> > only nodes that are registered with the device may write to it. As long
> > as that reservation exists, that rule will be enforced.
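> >
> > In sg_persist terms, the register/reserve steps look roughly like this
> > (made-up key again; type 5 is write-exclusive, registrants-only):
> >
> > # sg_persist --out --register --param-sark=0a000001 /dev/sdc
> > # sg_persist --out --reserve --param-rk=0a000001 --prout-type=5 /dev/sdc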
> > 
> > When it comes time to remove registrations, there is one caveat: the node
> > that holds the reservation cannot unregister unless there are no other
> > nodes registered with the device. This is because the reservation
> > holder must also be registered, *and* if the reservation were to go
> > away, the write-exclusive, registrants-only policy would no longer be
> > in effect. So ... what may have happened is that you tried to clear
> > the reservation while other nodes were still registered, which will
> > fail for exactly that reason. Once all the other nodes have
> > unregistered, you should be able to go back and clear the reservation.
> > 
> > Yes, this is a limitation in our product. There is a notion of moving a 
> > reservation (in the case where the reservation holder wants to 
> > unregister), but that is not yet implemented.
> > 
> > > I'm not at the box this sec (vpn-ing in will hork my evolution), but I
> > > will provide any amount of data if either you Ryan, or anyone else has
> > > stuff for me to try.
> > 
> > Please let me know if you have questions or need further assistance
> > clearing that pesky reservation. :)
> > 
> > > Thanks all,
> > > -C
> > >
> > 
> > --
> > Linux-cluster mailing list
> > Linux-cluster at redhat.com
> > https://www.redhat.com/mailman/listinfo/linux-cluster
> 
> 
> 
> Ryan,
> 
> Thank you so much for your replies.
> 
> I tracked down the registration and reservation to the first cluster node
> by converting the hex key back to an IP per your instructions. All nodes
> reported only this one registration.
> 
> On that node, I try:
> # sg_persist -C --out /dev/sdc
> 
> and it returns a failure, citing a scsi reservation conflict.
> 
> I then try on kop-sds-01, the node holding the reservation:
> 
> #/etc/init.d/scsi_reserve stop
>   connect() failed on local socket: Connection refused
>   No volume groups found
> 
> 
> Now, I initially had clvmd running and had volume groups defined, but
> since I'm running on a NetApp that handles all of that, I decided to
> simplify things and removed the LVM pieces early on, after my first
> attempts at troubleshooting this problem. Are these reservations
> somehow stuck looking at an old LVM configuration somewhere?
> 
> Thanks!
> 
> -C


YAAAY! Looks like I might have it. I pulled out the command
from /etc/init.d/scsi_reserve and used that on the node in question, and
I am able to mount on node4. I'm rebooting the whole kaboodle now. I
expect all nodes to mount when it comes up.

Thanks Ryan!

Now that I can mount, how bad will the performance be... ;)


-C



