[Linux-cluster] SCSI reservation conflicts after update
Sajesh Singh
ssingh at amnh.org
Wed Apr 2 16:36:11 UTC 2008
Ryan and everyone else who answered,
Thank you for the info on scsi_reserve. I have disabled the
script and all seems okay. What is a little confusing is that the
script/service was enabled before the upgrade but did not cause any
SCSI reservation conflicts then.
-Sajesh-
Ryan O'Hara wrote:
>
> I went back and investigated why this might happen. Seems that I had
> seen it before but could not recall how this sort of thing happens.
>
> For 4.6, the scsi_reserve script should only be run if you intend to
> use SCSI reservations as a fence mechanism, as you correctly pointed
> out at the end of your message. I believe in 4.6 scsi_reserve was
> incorrectly enabled by default.
>
> The real problem is that the keys used for SCSI reservations are based
> on node ID. For this reason, nodeid must be defined in the
> cluster.conf file for all nodes. Without this, a node's ID can
> change between cluster restarts, etc. The scsi_reserve and
> fence_scsi scripts require consistent node IDs (i.e. IDs that
> do not change).
>
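A minimal sketch of how one might check for nodes without a fixed nodeid, assuming the usual /etc/cluster/cluster.conf location and that each clusternode opening tag sits on a single line. The function name nodes_missing_nodeid is made up for illustration and is not part of the cluster packages.

```shell
#!/bin/sh
# Print every <clusternode ...> line that has no nodeid attribute.
# Assumption: each clusternode opening tag is written on one line.
nodes_missing_nodeid() {
    grep '<clusternode ' "$1" | grep -v 'nodeid='
}

# CONF is a placeholder; override it to point at another copy of the file.
CONF=${CONF:-/etc/cluster/cluster.conf}
if [ -r "$CONF" ]; then
    nodes_missing_nodeid "$CONF" || echo "no clusternode line without nodeid found"
fi
```

Any line this prints is a node whose ID is not fixed in the configuration.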
> So I think the problem we are seeing is that running 'scsi_reserve
> stop' cannot work since that will attempt to remove that node's key
> from the devices. If that key has changed (the node ID changed), it
> will not find a matching registration key on the device and thus fail.
>
> The best bet is to disable scsi_reserve and to clear all scsi
> reservations. As you mentioned, the sg_persist command with the -C
> option should do the trick. I am guessing that the reason that failed
> for you is that you must supply the device name AND the key being used
> for that I_T nexus. You can use sg_persist to list the keys registered
> with a particular device, but since node IDs may have changed you
> might have to guess the key for a particular node (i.e. the node you
> run the sg_persist -C command on). The good news is that once you
> identify the correct key, the command will clear all the keys.
>
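The list-then-clear sequence described above can be sketched roughly as follows. This is a dry run that only prints the sg_persist commands. The device /dev/sdX and the key 0x1 are placeholders, not values from this thread, and the helper names list_keys_cmd and clear_cmd are invented for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints sg_persist commands instead of running them.

# Command that lists the keys currently registered on a device.
list_keys_cmd() {
    echo "sg_persist --in --read-keys --device=$1"
}

# Command that clears all registrations and any reservation on a device.
# The key passed in must be the one registered for the node issuing it.
clear_cmd() {
    echo "sg_persist --out --clear --param-rk=$2 --device=$1"
}

# Placeholders; set DEVICE and KEY in the environment to override.
DEVICE=${DEVICE:-/dev/sdX}
KEY=${KEY:-0x1}

list_keys_cmd "$DEVICE"
clear_cmd "$DEVICE" "$KEY"
```

Run the printed commands by hand once the device and key look right. If the clear fails with a reservation conflict, the key most likely does not match the one registered for that node.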
> Ryan
>
> Sajesh Singh wrote:
>> After updating my GFS cluster to the latest packages (as of 3/28/08)
>> on an Enterprise Linux 4.6 cluster (kernel version
>> 2.6.9-67.0.7.ELsmp) I am receiving SCSI reservation errors whenever
>> the nodes are rebooted. The node is then rebooted again at
>> varying intervals without any intervention. I have tried to disable
>> the scsi_reserve script from startup, but it does not seem to have
>> any effect. I have also tried to use the sg_persist command to clear
>> all reservations with the -C option to no avail. I first noticed
>> something was wrong when the 2nd node of the 2 node cluster was being
>> updated. That was the first sign of the scsi reservation errors on
>> the console.
>>
>> From my understanding persistent SCSI reservations are only needed
>> if I am using the fence_scsi module.
>>
>> I would appreciate any guidance.
>>
>> Regards,
>>
>> Sajesh Singh
>>
>> --
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>