[Linux-cluster] GFS Problem

Birger Wathne birger at uib.no
Wed May 18 21:51:05 UTC 2005


>On Tue, May 17, 2005 at 11:39:08PM -0600, Frank L. Setinsek wrote:
>  
>
>>
>>   May 17 21:53:52 compute-0-2.local kernel: mptscsih: ioc0: WARNING - Device (0:0:1) reported QUEUE_FULL!
>>   May 17 21:53:52 compute-0-2.local kernel: SCSI disk error : host 0 channel 0 id 0 lun 1 return code = 440b0000
>>    
>>

I would suspect this is an issue with tagged queueing.

Tagged queueing lets a host tag each I/O request with an identifier, so 
the I/O subsystem can complete the requests in a different order from 
the one in which they were issued. The host queries the device to find 
out how large the queue can be. If you have several hosts, each assuming 
it has the whole queue to itself, together they could easily fill it, 
and the device then answers with QUEUE_FULL...
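To put numbers on this, here is a small Python sketch. It assumes every 
node is willing to keep as many tagged commands in flight as the device's 
full queue depth and that all nodes are busy at the same time. The depth 
of 64 is a made-up example, not a figure for your device, and the 
function names are purely illustrative.

```python
def worst_case_load(nodes: int, per_node_depth: int) -> int:
    """Tagged commands in flight if every node fills its own queue at once."""
    return nodes * per_node_depth


def overflows(device_depth: int, nodes: int, per_node_depth: int) -> bool:
    """True if the shared device queue can be exceeded (QUEUE_FULL is possible)."""
    return worst_case_load(nodes, per_node_depth) > device_depth


# Hypothetical device with a 64-entry tagged queue, shared by 6 nodes,
# each of which believes it may use all 64 slots.
DEVICE_DEPTH = 64
print(worst_case_load(6, DEVICE_DEPTH))  # 384
print(overflows(DEVICE_DEPTH, 6, DEVICE_DEPTH))  # True
```

A single node on its own can never overflow the queue in this model; the 
trouble only starts once several nodes share the same device.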

Read the documentation for your device and find out what its tagged 
queue depth is, and whether it can be configured. Then find out how to 
set the queue depth in your SCSI driver. Some drivers let you set it per 
target in a config file. On each node, set the maximum queue depth for 
the device in the SCSI driver to 1/6 of the device's total queue depth 
(since you have a 6-node cluster).
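The 1/6 split can be sketched in Python as follows. This is only the 
arithmetic, under the assumption that every node gets an equal share; 
how you actually apply the value is driver-specific, as noted above. The 
64-entry device queue is again a hypothetical example.

```python
def per_node_queue_depth(device_depth: int, nodes: int) -> int:
    """Largest equal per-node depth such that all nodes together fit in the device queue."""
    if nodes < 1:
        raise ValueError("need at least one node")
    # Round down: rounding up could let the nodes overflow the device again.
    depth = device_depth // nodes
    if depth < 1:
        raise ValueError("device queue too shallow for this many nodes")
    return depth


DEVICE_DEPTH = 64  # hypothetical; take the real value from the device documentation
depth = per_node_queue_depth(DEVICE_DEPTH, 6)
print(depth)  # 10
print(6 * depth <= DEVICE_DEPTH)  # True
```

Rounding down wastes a few slots (60 of 64 here), but that is the price 
of never exceeding the device queue.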

Of course, the easy test is to disable tagged queueing completely. The 
performance hit can be severe, but it would quickly show whether the 
problem goes away...

Remember that you will have to reconfigure the queue depth on all nodes 
before you can add a new node. If these nodes run something you cannot 
restart often, you may want to set the depth to 1/7 of the total 
instead, so there is room for one more node.
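The 1/7 idea amounts to dividing by the planned cluster size rather than 
the current one. A minimal Python sketch, assuming equal shares and a 
hypothetical 64-entry device queue; `spare_nodes` is an illustrative 
parameter, not a driver setting.

```python
def per_node_depth_with_spare(device_depth: int, nodes: int, spare_nodes: int = 1) -> int:
    """Per-node depth that still fits if `spare_nodes` more nodes join later."""
    planned = nodes + spare_nodes
    # Round down so the planned cluster as a whole never overflows the device.
    depth = device_depth // planned
    if depth < 1:
        raise ValueError("device queue too shallow for the planned cluster size")
    return depth


DEVICE_DEPTH = 64  # hypothetical device queue depth
print(per_node_depth_with_spare(DEVICE_DEPTH, 6))  # 9  (64 // 7)
print(7 * per_node_depth_with_spare(DEVICE_DEPTH, 6) <= DEVICE_DEPTH)  # True
```

With one spare slot reserved, a seventh node can join without touching 
the configuration on the existing six.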

-- 
birger