[Linux-cluster] QLA2xxx tagged queue bug.

Tue Feb 15 20:36:16 UTC 2011

I'm documenting this in case anyone else gets bitten.

(This is supposed to have been fixed since October, but we encountered 
it in the last few days on RHEL 5.6, so either it's not fully fixed or 
the patch has fallen out of the production kernel.)

Over the last 2 years we kept getting GFS and GFS2 filesystems 
mysteriously going dead with "input/output" errors. This has been traced 
to a bug in qla2xxx:

A QUEUE FULL or BUSY from the target results in a generic error being 
passed up to dm-multipath from the qla2xxx driver (instead of the driver 
backing off the queue size and trying again a few milliseconds later).

When dm-multipath receives an error, it marks the path to the target 
"bad" and tries another path. If the queue full condition doesn't clear 
quickly, there is a cascade of path failures, and once every path has 
failed the target itself is marked as bad.

If "queue_if_no_path" isn't explicitly enabled in /etc/multipath.conf, 
that causes the I/O error symptoms described above.
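
To make the sequence above concrete, here is a rough toy model in Python. 
It is only a sketch of the behaviour as I've described it, not 
dm-multipath's actual logic; the function name, its parameters and the 
return strings are all made up for illustration.

```python
def submit_io(num_paths, busy_attempts, queue_if_no_path=False):
    """Toy model of the cascade (hypothetical, for illustration only).

    The target answers QUEUE FULL/BUSY for the first `busy_attempts`
    tries. Each such answer is seen as a generic error, so one more
    path gets marked bad and the I/O is retried on the next path.
    """
    failed_paths = 0
    for attempt in range(num_paths):
        if attempt >= busy_attempts:
            # The target's queue drained before we ran out of paths.
            return "ok", failed_paths
        # Error passed up by the driver: this path is marked bad.
        failed_paths += 1
    # Every path has failed.
    if queue_if_no_path:
        # I/O is held until a path comes back (long stall, no error).
        return "queued", failed_paths
    # The error reaches the filesystem as an input/output error.
    return "io_error", failed_paths


print(submit_io(4, 2))                         # recovers after 2 paths fail
print(submit_io(4, 4))                         # all 4 paths fail
print(submit_io(4, 4, queue_if_no_path=True))  # all fail, I/O is queued
```

The point of the sketch is that a short-lived queue full condition costs 
one path per retry, so the number of paths is effectively the retry 
budget.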

Even if the target's tagged queue recovers before all paths fail, there 
tends to be a big hiccup in GFS(2) operations.

If multipathing's queue_if_no_path is enabled and the OS has to wait for 
the target to return, there will be an even longer glitch.

Currently the only workaround available is to set the qla2xxx tagged 
queue depth to a very low value via module options.

The qla2xxx tagged queue depth is PER LUN, while most target tagged 
queues are PER DEVICE (e.g. a Nexsan Satabeast presenting 6 LUNs has 255 
commands in total, not per LUN). It's pretty easy to end up with more 
requests coming out of the initiators than the targets can handle 
simultaneously.
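
As a back-of-envelope check, the arithmetic can be sketched in Python. 
The 255-command queue and the 6 LUNs come from the Satabeast example 
above; the per-LUN depth of 32 and the 2 initiator nodes are assumptions 
picked purely for illustration, and the function names are hypothetical. 
The sketch also ignores multiple active paths per LUN, which would push 
the worst case higher.

```python
def outstanding_commands(lun_depth, luns, initiators):
    """Worst-case commands in flight if every LUN queue on every
    initiator fills up at the same time."""
    return lun_depth * luns * initiators


def max_lun_depth(target_queue, luns, initiators):
    """Largest per-LUN depth that cannot overrun a per-device target
    queue shared by all LUNs and all initiators (never below 1)."""
    return max(1, target_queue // (luns * initiators))


TARGET_QUEUE = 255   # per-device queue, from the example above
LUNS = 6             # LUNs presented, from the example above
LUN_DEPTH = 32       # ASSUMPTION: illustrative per-LUN depth
INITIATORS = 2       # ASSUMPTION: illustrative cluster size

print(outstanding_commands(LUN_DEPTH, LUNS, INITIATORS))  # 384
print(max_lun_depth(TARGET_QUEUE, LUNS, INITIATORS))      # 21
```

With those assumed numbers the initiators can have 384 commands in 
flight against a 255-command target, and the per-LUN depth would need to 
be 21 or lower to stay inside the target's queue.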