<span style="font-family:arial,sans-serif;font-size:13px">Hello everyone,</span><div style="font-family:arial,sans-serif;font-size:13px"> I've been hitting an intermittent problem where random servers segfault while creating a snapshot with lvm2-2.02.17-7.38.3 on kernel 2.6.16.60-0.93.1-bigsmp (SLES 10 SP4). The messages I get are:</div> <div style="font-family:arial,sans-serif;font-size:13px">###########################################</div><div style="font-family:arial,sans-serif;font-size:13px"><div>Dec 27 07:45:39 chelco-app-01 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000001c</div> <div>Dec 27 07:45:39 chelco-app-01 kernel: printing eip:</div><div>Dec 27 07:45:39 chelco-app-01 kernel: f90ab3a7</div><div>Dec 27 07:45:39 chelco-app-01 kernel: *pde = 3780a001</div><div>Dec 27 07:45:39 chelco-app-01 kernel: Oops: 0000 [#1]</div> <div>Dec 27 07:45:39 chelco-app-01 kernel: SMP </div><div>Dec 27 07:45:39 chelco-app-01 kernel: last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:04:00.1/irq</div><div>Dec 27 07:45:39 chelco-app-01 kernel: Modules linked in: raw dock button battery ac loop dm_snapshot usbhid dm_mod uhci_hcd bnx2x hw_random ehci_hcd qla2xxx hpilo usbcore firmware_class scsi_transport_fc parport_pc lp parport ext3 jbd edd </div> <div>fan thermal processor cciss sd_mod scsi_mod</div><div>Dec 27 07:45:39 chelco-app-01 kernel: CPU: 4</div><div>Dec 27 07:45:39 chelco-app-01 kernel: EIP: 0060:[<f90ab3a7>] Tainted: G X VLI</div><div> Dec 27 07:45:39 chelco-app-01 kernel: EFLAGS: 00210202 (2.6.16.60-0.93.1-bigsmp #1) </div><div>Dec 27 07:45:39 chelco-app-01 kernel: EIP is at __map_bio+0x50/0x11f [dm_mod]</div><div>Dec 27 07:45:39 chelco-app-01 kernel: eax: f90960c4 ebx: 00000000 ecx: f7ff2a60 edx: f7794440</div> <div>Dec 27 07:45:39 chelco-app-01 kernel: esi: f7ff2a58 edi: f90960c4 ebp: f46306c0 esp: f4c15d28</div><div>Dec 27 07:45:39 chelco-app-01 kernel: ds: 007b es: 007b ss: 0068</div><div>Dec 27 07:45:39 
chelco-app-01 kernel: Process lvcreate (pid: 6678, threadinfo=f4c14000 task=f7838680)</div> <div>Dec 27 07:45:39 chelco-app-01 kernel: Stack: <0>f7794340 f7794440 f7794440 03201ff0 00000000 03201ff0 00000000 00000008 </div><div>Dec 27 07:45:39 chelco-app-01 kernel: 00000000 00000000 f90960c4 f7ff2a68 f46306c0 f90abd1b 00000000 00000001 </div> <div>Dec 27 07:45:39 chelco-app-01 kernel: 00000008 f428e2e0 fcdfe010 ffffffff c0113d62 00000000 0000001f f7ff2a58 </div><div>Dec 27 07:45:39 chelco-app-01 kernel: Call Trace:</div><div>Dec 27 07:45:39 chelco-app-01 kernel: [<f90abd1b>] __split_bio+0x182/0x440 [dm_mod]</div> <div>Dec 27 07:45:39 chelco-app-01 kernel: [<c0113d62>] do_flush_tlb_all+0x0/0x5d</div><div>Dec 27 07:45:39 chelco-app-01 kernel: [<f90abff0>] __flush_deferred_io+0x17/0x20 [dm_mod]</div><div>Dec 27 07:45:39 chelco-app-01 kernel: [<f90ac14c>] dm_resume+0x8e/0xf9 [dm_mod]</div> <div>Dec 27 07:45:39 chelco-app-01 kernel: [<f90aedd8>] dev_suspend+0x138/0x157 [dm_mod]</div><div>Dec 27 07:45:39 chelco-app-01 kernel: [<f90af607>] ctl_ioctl+0x220/0x26e [dm_mod]</div><div>Dec 27 07:45:39 chelco-app-01 kernel: [<f90aeca0>] dev_suspend+0x0/0x157 [dm_mod]</div> <div>Dec 27 07:45:39 chelco-app-01 kernel: [<c0179ce8>] do_ioctl+0x48/0x5e</div><div>Dec 27 07:45:39 chelco-app-01 kernel: [<c0179f60>] vfs_ioctl+0x262/0x275</div><div>Dec 27 07:45:39 chelco-app-01 kernel: [<c0179fc7>] sys_ioctl+0x54/0x6d</div> <div>Dec 27 07:45:39 chelco-app-01 kernel: [<c0103dcb>] sysenter_past_esp+0x54/0x79</div><div>Dec 27 07:45:39 chelco-app-01 kernel: Code: b4 0a f9 89 70 40 8b 06 83 c0 0c f0 ff 00 8b 54 24 08 8d 4e 08 8b 02 8b 52 04 89 44 24 0c 89 f8 89 54 24 10 8b 5f 04 8b 54 24 08 <ff> 53 1c 83 f8 00 89 c2 0f 8e 93 00 00 00 8b 54 24 08 8b 42 0c</div> </div><div style="font-family:arial,sans-serif;font-size:13px">#############################################################</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div 
style="font-family:arial,sans-serif;font-size:13px"> The result is that the target volume gets suspended, and the only way to fix it is to reboot and remove the faulty snapshot once the server comes back up.</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px"> The script I wrote to create these snapshots uses all available extents in the Volume Group, which in this case was more than the size of the volume I was trying to snapshot. Thinking this was the problem, I tried creating the snapshot several times with a snapshot size less than or equal to the target volume, and it worked every time. I then tried a size larger than the target to provoke a crash, and it did crash, but not every time. In fact, now I can't get it to segfault at all.</div> <div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">So my question is: does creating a snapshot volume larger than the target volume cause random segfaults, or could there be another problem lurking? If these weren't production machines I would just go with a size smaller than the target, but I really need to know exactly what is causing the segfaults.</div> <div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">Any help would be appreciated.</div><div style="font-family:arial,sans-serif;font-size:13px"><br> </div><div style="font-family:arial,sans-serif;font-size:13px"> -Tyler</div>