[Linux-cluster] SCSI reservation conflicts after update

Wed Apr 2 16:36:11 UTC 2008

Ryan and everyone else who answered,
       Thank you for the info on scsi_reserve. I have disabled the 
script and everything seems okay. What is a little confusing is that the 
script/service was enabled before the upgrade but did not cause any 
SCSI reservation conflicts.

-Sajesh-


Ryan O'Hara wrote:
>
> I went back and investigated why this might happen. It seems I had 
> seen it before but could not recall how this sort of thing happens.
>
> For 4.6, the scsi_reserve script should only be run if you intend to 
> use SCSI reservations as a fence mechanism, as you correctly pointed 
> out at the end of your message. I believe in 4.6 scsi_reserve was 
> incorrectly enabled by default.
>
> The real problem is that the keys used for SCSI reservations are based 
> on node ID. For this reason, nodeid must be defined in the 
> cluster.conf file for all nodes. Without it, a node's ID can 
> change between cluster restarts. The scsi_reserve and fence_scsi 
> scripts require a consistent nodeid (i.e. one that does not change).
>
> So I think the problem we are seeing is that running 'scsi_reserve 
> stop' cannot work, since it attempts to remove that node's key 
> from the devices. If that key has changed (because the node ID 
> changed), it will not find a matching registration key on the device 
> and will fail.
>
> The best bet is to disable scsi_reserve and clear all SCSI 
> reservations. As you mentioned, the sg_persist command with the -C 
> option should do the trick. I am guessing that it failed for you 
> because you must supply both the device name AND the key being used 
> for that I_T nexus (the initiator/target path from that host to the 
> device). You can use sg_persist to list the keys registered with a 
> particular device, but since node IDs may have changed, you might 
> have to guess the key for a particular node (i.e. the node you run 
> the sg_persist -C command on). The good news is that once you 
> identify the correct key, the clear removes all the keys.
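A minimal Python sketch of the "find the right key, then clear" procedure described above, assuming sg3_utils' sg_persist is installed and the script runs as root on the node whose key you need. It reads the registered keys from a device (`--in --read-keys`) and issues PR OUT CLEAR (`--out --clear --param-rk=KEY`) with each one in turn; only the key registered for this host's I_T nexus should be accepted. The function names, the device path `/dev/sdb`, and the key values are hypothetical, and the parser assumes sg_persist prints each key on its own line as a `0x...` hex value, so check the output of your sg3_utils version first. Only run this once scsi_reserve is disabled on every node and fence_scsi is not in use, because a successful CLEAR drops every registration and the reservation on that device.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# `run` defaults to subprocess.run; it is a parameter only so a fake can be
# substituted when testing without real SCSI devices.
Runner = Callable[..., subprocess.CompletedProcess]


def parse_registered_keys(read_keys_output: str) -> List[str]:
    """Return the distinct keys listed by `sg_persist --in --read-keys`.

    Assumption: each key appears on its own line as a 0x-prefixed hex value.
    Duplicates (e.g. one key registered through several paths) are dropped,
    preserving first-seen order.
    """
    keys: List[str] = []
    for line in read_keys_output.splitlines():
        token = line.strip()
        if token.lower().startswith("0x") and token not in keys:
            keys.append(token)
    return keys


def read_keys(device: str, run: Runner = subprocess.run) -> List[str]:
    """List the reservation keys currently registered on `device`."""
    result = run(
        ["sg_persist", "--in", "--read-keys", device],
        capture_output=True, text=True, check=True,
    )
    return parse_registered_keys(result.stdout)


def try_clear(device: str, candidates: Sequence[str],
              run: Runner = subprocess.run) -> Optional[str]:
    """Issue PR OUT CLEAR with each candidate key until one is accepted.

    The device accepts only the key registered for this host's I_T nexus;
    a wrong key is rejected with a reservation conflict (nonzero exit).
    Returns the accepted key, or None if every candidate was rejected.
    """
    for key in candidates:
        result = run(
            ["sg_persist", "--out", "--clear", f"--param-rk={key}", device],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return key
    return None
```

For example, on the affected node: `try_clear("/dev/sdb", read_keys("/dev/sdb"))`. If it returns None, this node may not have a registration on that device at all, and the clear has to be issued from a node that does.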
>
> Ryan
>
> Sajesh Singh wrote:
>> After updating my GFS cluster to the latest packages (as of 3/28/08) 
>> on an Enterprise Linux 4.6 cluster (kernel version 
>> 2.6.9-67.0.7.ELsmp), I am receiving SCSI reservation errors whenever 
>> the nodes are rebooted. The node is then rebooted again at varying 
>> intervals without any intervention. I have tried to remove the 
>> scsi_reserve script from startup, but it does not seem to have any 
>> effect. I have also tried to use the sg_persist command with the -C 
>> option to clear all reservations, to no avail. I first noticed 
>> something was wrong when the second node of the two-node cluster was 
>> being updated. That was the first sign of the SCSI reservation errors 
>> on the console.
>>
>> From my understanding, persistent SCSI reservations are only needed 
>> if I am using the fence_scsi module.
>>
>> I would appreciate any guidance.
>>
>> Regards,
>>
>> Sajesh Singh
>>
>> -- 
>> Linux-cluster mailing list
>> Linux-cluster at redhat.com
>> https://www.redhat.com/mailman/listinfo/linux-cluster
>