[dm-devel] kernel crash with multipath-tools v0.4.7

Praveen Dhanasekaran - PTI praveend at promise.com
Wed Jun 12 11:03:53 UTC 2013


Hello All,
 
I have a RAID storage box connected to a Linux server through FC cables. The server runs 
kernel 2.6.26, with multipath-tools v0.4.7 (dm-mpath target version 1.0.5) installed 
to handle the SCSI disk devices from the storage box. The multipath device is mounted and 
used for a file system. 
 
When an FC port goes down, the SCSI FC transport layer removes the corresponding 
fc-target and all the SCSI devices attached to that target. When the FC port comes back up, 
the kernel crashes. The details of the crash are below. Is there a patch available to fix 
this crash?
 
Output of multipath -l while the port is down
===============================================
/ > multipath -l
22230000155f989c9 dm-0 ,
[size=70G][features=0][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ #:#:#:# -   #:#   [failed][undef]
 \_ #:#:#:# -   #:#   [failed][undef]
 \_ #:#:#:# -   #:#   [failed][undef]
 \_ #:#:#:# -   #:#   [failed][undef]
22207000155f84ba3 dm-1 ,
[size=70G][features=0][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ #:#:#:# -   #:#   [failed][undef]
 \_ #:#:#:# -   #:#   [failed][undef]
/ >
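
For anyone who wants to track the path state across port flaps, here is a rough sketch that counts failed paths per map from the listing above. It assumes the line layout shown in this report (multipath-tools v0.4.7); other versions may print a different format. The function name failed_paths_per_map and both regular expressions are my own, not part of multipath-tools.

```python
import re

# Sample copied from the listing above. The layout is assumed to be
# specific to multipath-tools v0.4.7 and may differ in other versions.
SAMPLE = r"""
22230000155f989c9 dm-0 ,
[size=70G][features=0][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ #:#:#:# -   #:#   [failed][undef]
 \_ #:#:#:# -   #:#   [failed][undef]
 \_ #:#:#:# -   #:#   [failed][undef]
 \_ #:#:#:# -   #:#   [failed][undef]
22207000155f84ba3 dm-1 ,
[size=70G][features=0][hwhandler=1 alua][rw]
\_ round-robin 0 [prio=0][enabled]
 \_ #:#:#:# -   #:#   [failed][undef]
 \_ #:#:#:# -   #:#   [failed][undef]
""".strip()

# Map header line: "<wwid> dm-<n> ..."
MAP_RE = re.compile(r"^(?P<wwid>\S+)\s+(?P<dm>dm-\d+)\b")
# Path line: indented "\_ <h:c:t:l> - <maj:min> [dm state][checker state]"
PATH_RE = re.compile(
    r"^\s+\\_\s+\S+\s+-\s+\S+\s+\[(?P<dm_state>\w+)\]\[(?P<chk_state>\w+)\]"
)


def failed_paths_per_map(text):
    """Return {dm name: {"total": n, "failed": m}} for each map in text."""
    counts = {}
    current = None
    for line in text.splitlines():
        header = MAP_RE.match(line)
        if header:
            current = header.group("dm")
            counts[current] = {"total": 0, "failed": 0}
            continue
        path = PATH_RE.match(line)
        if path and current is not None:
            counts[current]["total"] += 1
            if path.group("dm_state") == "failed":
                counts[current]["failed"] += 1
    return counts


if __name__ == "__main__":
    for dm, c in failed_paths_per_map(SAMPLE).items():
        print(f"{dm}: {c['failed']} of {c['total']} paths failed")
```

With the listing above, every path of both maps is counted as failed, which matches the state seen while the port is down.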

Stack dump on the Linux server when the port comes back up
=============================================
kobject_add_internal failed for 4:0:3:0 with -EEXIST, don't try to register things with the same name in the same directory.
Pid: 1384, comm: scsi_wq_4 Tainted: P          2.6.26EIGHTSTACK_64_V15_V11_8GB_SANFS_UP #1
Call Trace:
 [<ffffffff80334455>] kobject_add_internal+0x11e/0x15d
 [<ffffffff803348b3>] kobject_add+0x74/0x7c
 [<ffffffff80283df4>] bio_alloc_bioset+0x89/0xd9
 [<ffffffff802846a9>] bio_map_kern+0xae/0x103
 [<ffffffff803280c9>] __freed_request+0x2f/0x7f
 [<ffffffff8032813d>] freed_request+0x24/0x44
 [<ffffffff80334332>] kobject_get+0x12/0x17
 [<ffffffff80393f2f>] get_device+0x17/0x1f
 [<ffffffff8039450a>] device_add+0x82/0x519
 [<ffffffff803c300e>] scsi_sysfs_add_sdev+0x86/0x1d5
 [<ffffffff803c140d>] scsi_probe_and_add_lun+0x77b/0x89f
 [<ffffffff803c1a60>] __scsi_scan_target+0x3d0/0x566
 [<ffffffff80221989>] update_curr+0x54/0x64
 [<ffffffff80221989>] update_curr+0x54/0x64
 [<ffffffff803c208f>] scsi_scan_target+0x96/0xb9
 [<ffffffff803c6064>] fc_scsi_scan_rport+0x70/0x84
 [<ffffffff803c5ff4>] fc_scsi_scan_rport+0x0/0x84
 [<ffffffff80233c95>] run_workqueue+0x74/0xee
 [<ffffffff8023431b>] worker_thread+0xd0/0xdb
 [<ffffffff802367e4>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8023424b>] worker_thread+0x0/0xdb
 [<ffffffff8023669c>] kthread+0x47/0x73
 [<ffffffff802247b3>] schedule_tail+0x18/0x4c
 [<ffffffff8020c5b8>] child_rip+0xa/0x12
 [<ffffffff80236655>] kthread+0x0/0x73
 [<ffffffff8020c5ae>] child_rip+0x0/0x12
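
The -EEXIST message above says that a kobject named 4:0:3:0 already exists in the same parent directory when the rescan tries to add it. The toy model below only illustrates that uniqueness rule. It is not kernel code: ToyDir and rescan_after_port_up are hypothetical names, and the idea that the old device entry was still registered when the rescan ran is an assumption on my part that the logs do not confirm.

```python
import errno


class ToyDir:
    """Toy stand-in for a sysfs directory: child names must be unique."""

    def __init__(self):
        self.children = set()

    def add(self, name):
        # Mirrors the rule in the kernel message: a second registration
        # of the same name in the same directory is refused.
        if name in self.children:
            return -errno.EEXIST
        self.children.add(name)
        return 0

    def remove(self, name):
        self.children.discard(name)


def rescan_after_port_up(removal_finished):
    """Return the result of re-adding 4:0:3:0 after a port flap.

    removal_finished is an assumed knob: True models the old device
    entry being gone before the rescan, False models it still present.
    """
    target = ToyDir()
    target.add("4:0:3:0")         # device registered before the port went down
    if removal_finished:
        target.remove("4:0:3:0")  # port-down cleanup completed
    return target.add("4:0:3:0")  # rescan on port up registers the LUN again


if __name__ == "__main__":
    print(rescan_after_port_up(True))   # old entry gone, add succeeds
    print(rescan_after_port_up(False))  # old entry still there, add is refused
```

In this model the second add fails only when the first entry has not been removed, which is the situation the kernel message describes.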

After the FC port goes down and up once or twice, the Linux server crashes as follows
===============================================================
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffff802a6ef5>] sysfs_find_dirent+0x9/0x2f
PGD 21101067 PUD 210d0067 PMD 0 
Oops: 0000 [1] 
Firmware Version :  01.01.0000.00
Pid: 1385, comm: fc_wq_4 2.6.26EIGHTSTACK_64_V15_V11_8GB_SANFS_UP #1
 [ffffffff802a6ef5] sysfs_find_dirent+0x9/0x2f
RIP: 0010:[ffffffff802a6ef5] RSP: 0018:ffff8100240cdd30  EFLAGS: 00010286
RAX: ffff810020c14c30 RBX: ffffffff805c1d8e RCX: 0000000000000002
RDX: ffff8100329a8278 RSI: ffffffff805c1d8e RDI: 0000000000000000
RBP: ffffffff805c1d8e R08: 00000000000015c1 R09: ffff81003f818150
R10: 0000000000000000 R11: ffff8100240d3f50 R12: 0000000000000000
R13: ffff810022031c08 R14: ffffffff806ac6a0 R15: 00000000016a5018
Call Trace
--------------------
 [ffffffff802a7914] ? sysfs_get_dirent+0x24/0x58
 [ffffffff802a81c2] ? sysfs_remove_group+0x24/0xc0
 [ffffffff80398a51] ? device_pm_remove+0x18/0x49
 [ffffffff803942e3] ? device_del+0x13/0x15b
 [ffffffff80394434] ? device_unregister+0x9/0x12
 [ffffffff803c2e0a] ? __scsi_remove_device+0x2a/0x7a
 [ffffffff803c2e7b] ? scsi_remove_device+0x21/0x2e
 [ffffffff803c2f62] ? __remove_child+0x0/0x1a
 
Thanks,
Praveen