[Linux-cluster] scsi reservation issue

Ryan O'Hara rohara at redhat.com
Fri Nov 2 17:32:05 UTC 2007


Christopher Barry wrote:
> On Wed, 2007-10-31 at 10:44 -0500, Ryan O'Hara wrote:
>> Christopher Barry wrote:
>>> Greetings all,
>>>
>>> I have 2 vmware esx servers, each hitting a NetApp over FS, and each
>>> with 3 RHCS cluster nodes trying to mount a gfs volume.
>>>
>>> All of the nodes (1,2,& 3) on esx-01 can mount the volume fine, but none
>>> of the nodes in the second esx box can mount the gfs volume at all, and
>>> I get the following error in dmesg:
>> Are you intentionally trying to use scsi reservations as a fence method? 
> 
> No. In fact I thought the scsi_reservation service may be *causing* the
> issue, and disabled the service from starting on all nodes. Does this
> have to be on?

No. You only need to run this service if you plan on using scsi 
reservations as a fence method. A scsi reservation restricts access 
to a device such that only registered nodes can access it. If a 
reservation exists and an unregistered node tries to access the device, 
you'll see exactly the errors you are seeing.
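One way to confirm this is to query the device directly with sg_persist. A minimal sketch, run from one of the nodes on the second esx box; it assumes the GFS LUN is /dev/sdc, as in the dmesg output below (adjust if your device differs):

```shell
# List the registration keys currently held on the device.
# A node that can mount should have its key in this list.
sg_persist -i -k /dev/sdc    # --in --read-keys

# Show the active reservation, if any, and its type.
sg_persist -i -r /dev/sdc    # --in --read-reservation
```

If a reservation shows up in the second command and the node's key is missing from the first, that matches the reservation-conflict errors exactly.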

It may be that some reservations were created and never got cleaned up, 
which would cause the problem to continue even after the scsi_reserve 
script was disabled. You can manually run '/etc/init.d/scsi_reserve 
stop' to attempt to clean up any reservations. Note that I am assuming 
that any reservations that might still exist on a device were created by 
the scsi_reserve script. If that is the case, you can see what devices a 
node is registered for by doing a '/etc/init.d/scsi_reserve status'. 
Also note that the scsi_reserve script does *not* have to be started or 
enabled to do these things (ie. you can safely run 'status' or 'stop' 
without first running 'start').
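Put together, the cleanup on each node looks something like this. A sketch only; the chkconfig line is my assumption about how you would keep the service from coming back at boot on a RHEL-style system:

```shell
# Which devices is this node registered for? Safe without 'start'.
/etc/init.d/scsi_reserve status

# Attempt to remove this node's registrations (and reservation,
# subject to the caveat below about the reservation holder).
/etc/init.d/scsi_reserve stop

# Assumption: disable the service at boot so it does not re-register.
chkconfig scsi_reserve off
```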

One caveat... 'scsi_reserve stop' will not unregister a node if it is the 
reservation holder and other nodes are still registered with the device. 
You can also use the sg_persist command directly to clear all registrations 
and reservations. Use the -C option. See the sg_persist man page for a 
better description.
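For the stuck-holder case, the direct sg_persist invocation would look roughly like this. The device and key here are placeholders: substitute your LUN and the actual key reported by the read-keys query.

```shell
DEV=/dev/sdc          # placeholder: the shared GFS LUN

# Find a key that is still registered on the device.
sg_persist -i -k "$DEV"

# Clear ALL registrations and any reservation in one shot.
# -C (--clear) must be issued through a registered key, passed via -K.
sg_persist -o -C -K 0x12345 "$DEV"    # 0x12345 is a placeholder key
```

Note that clear is a blunt instrument: it drops every node's registration, so the remaining nodes would need to re-register (e.g. by restarting scsi_reserve) before fencing via reservations would work again.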

>> It sounds like the nodes on esx-01 are creating reservations, but the 
>> nodes on the second esx box are not registering with the device and 
>> therefore are unable to mount the filesystem. Creation of reservations 
>> and registrations is handled by the scsi_reserve init script, which 
>> should be run at startup on all nodes in the cluster. You can check to 
>> see what devices a node is registered for before you mount the 
>> filesystem by doing /etc/init.d/scsi_reserve status. If your nodes are 
>> not registered with the device and a reservation exists then you won't 
>> be able to mount.
>>
>>> Lock_Harness 2.6.9-72.2 (built Apr 24 2007 12:45:38) installed
>>> GFS 2.6.9-72.2 (built Apr 24 2007 12:45:54) installed
>>> GFS: Trying to join cluster "lock_dlm", "kop-sds:gfs_home"
>>> Lock_DLM (built Apr 24 2007 12:45:40) installed
>>> GFS: fsid=kop-sds:gfs_home.2: Joined cluster. Now mounting FS...
>>> GFS: fsid=kop-sds:gfs_home.2: jid=2: Trying to acquire journal lock...
>>> GFS: fsid=kop-sds:gfs_home.2: jid=2: Looking at journal...
>>> GFS: fsid=kop-sds:gfs_home.2: jid=2: Done
>>> scsi2 (0,0,0) : reservation conflict
>>> SCSI error : <2 0 0 0> return code = 0x18
>>> end_request: I/O error, dev sdc, sector 523720263
>>> scsi2 (0,0,0) : reservation conflict
>>> SCSI error : <2 0 0 0> return code = 0x18
>>> end_request: I/O error, dev sdc, sector 523720271
>>> scsi2 (0,0,0) : reservation conflict
>>> SCSI error : <2 0 0 0> return code = 0x18
>>> end_request: I/O error, dev sdc, sector 523720279
>>> GFS: fsid=kop-sds:gfs_home.2: fatal: I/O error
>>> GFS: fsid=kop-sds:gfs_home.2:   block = 65464979
>>> GFS: fsid=kop-sds:gfs_home.2:   function = gfs_logbh_wait
>>> GFS: fsid=kop-sds:gfs_home.2:   file
>>> = /builddir/build/BUILD/gfs-kernel-2.6.9-72/smp/src/gfs/dio.c, line =
>>> 923
>>> GFS: fsid=kop-sds:gfs_home.2:   time = 1193838678
>>> GFS: fsid=kop-sds:gfs_home.2: about to withdraw from the cluster
>>> GFS: fsid=kop-sds:gfs_home.2: waiting for outstanding I/O
>>> GFS: fsid=kop-sds:gfs_home.2: telling LM to withdraw
>>> lock_dlm: withdraw abandoned memory
>>> GFS: fsid=kop-sds:gfs_home.2: withdrawn
>>> GFS: fsid=kop-sds:gfs_home.2: can't get resource index inode: -5
>>>
>>>
>>> Does anyone have a clue as to where I should start looking?
>>>
>>>
>>> Thanks,
>>> -C
>>>
>>> --
>>> Linux-cluster mailing list
>>> Linux-cluster at redhat.com
>>> https://www.redhat.com/mailman/listinfo/linux-cluster
