<div dir="ltr">Good morning,<div><br></div><div>Thanks for the response and the fun never stops! This system crashed on Saturday morning with the following </div><div><br></div><div><div><4>------------[ cut here ]------------</div>
<2>kernel BUG at include/linux/swapops.h:126!
<4>invalid opcode: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/ksm/run
<4>CPU 7
<4>Modules linked in: iptable_filter ip_tables nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs exportfs vhost_net macvtap macvlan tun kvm_intel kvm raid456 async_raid6_recov async_pq power_meter raid6_pq async_xor dcdbas xor microcode serio_raw async_memcpy async_tx iTCO_wdt iTCO_vendor_support i7core_edac edac_core sg bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix wmi mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
<4>
<4>Pid: 4581, comm: ssh Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Dell Inc. PowerEdge T410/0Y2G6P
<4>RIP: 0010:[<ffffffff8116c501>]  [<ffffffff8116c501>] migration_entry_wait+0x181/0x190
<4>RSP: 0000:ffff8801c1703c88  EFLAGS: 00010246
<4>RAX: ffffea0000000000 RBX: ffffea0003bf6f58 RCX: ffff880236437580
<4>RDX: 00000000001121fd RSI: ffff8801c040e5d8 RDI: 000000002243fa3e
<4>RBP: ffff8801c1703ca8 R08: ffff8801c040e5d8 R09: 0000000000000029
<4>R10: ffff8801d6850200 R11: 00002ad7d96cbf5a R12: ffffea0007bdec18
<4>R13: 0000000236437580 R14: 0000000236437067 R15: 00002ad7d76b0000
<4>FS:  00002ad7dace2880(0000) GS:ffff880028260000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 00002ad7d76b0000 CR3: 00000001bb686000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process ssh (pid: 4581, threadinfo ffff8801c1702000, task ffff880261aa7500)
<4>Stack:
<4> ffff88024b5f22d8 0000000000000000 000000002243fa3e ffff8801c040e5d8
<4><d> ffff8801c1703d88 ffffffff811441b8 0000000000000000 ffff8801c1703d08
<4><d> ffff8801c1703eb8 ffff8801c1703dc8 ffff880328cb48c0 0000000000000040
<4>Call Trace:
<4> [<ffffffff811441b8>] handle_pte_fault+0xb48/0xb50
<4> [<ffffffff81437dbb>] ? sock_aio_write+0x19b/0x1c0
<4> [<ffffffff8112c6d4>] ? __pagevec_free+0x44/0x90
<4> [<ffffffff811443fa>] handle_mm_fault+0x23a/0x310
<4> [<ffffffff810474c9>] __do_page_fault+0x139/0x480
<4> [<ffffffff81194fb2>] ? vfs_ioctl+0x22/0xa0
<4> [<ffffffff811493a0>] ? unmap_region+0x110/0x130
<4> [<ffffffff81195154>] ? do_vfs_ioctl+0x84/0x580
<4> [<ffffffff8151339e>] do_page_fault+0x3e/0xa0
<4> [<ffffffff81510755>] page_fault+0x25/0x30
<4>Code: e8 f5 2f fc ff e9 59 ff ff ff 48 8d 53 08 85 c9 0f 84 44 ff ff ff 8d 71 01 48 63 c1 48 63 f6 f0 0f b1 32 39 c1 74 be 89 c1 eb e3 <0f> 0b eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
<1>RIP  [<ffffffff8116c501>] migration_entry_wait+0x181/0x190
<4> RSP <ffff8801c1703c88>

It rebooted itself, and now I must have some filesystem corruption, as this is being dumped frequently:

XFS (md127): page discard on page ffffea0003c95018, inode 0x849ec442, offset 0.
XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 342 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffffa02986c2

Pid: 1304, comm: xfsalloc/7 Not tainted 2.6.32-358.2.1.el6.x86_64 #1
Call Trace:
 [<ffffffffa02c20cf>] ? xfs_error_report+0x3f/0x50 [xfs]
 [<ffffffffa02986c2>] ? xfs_alloc_ag_vextent_size+0x482/0x630 [xfs]
 [<ffffffffa0296a69>] ? xfs_alloc_lookup_eq+0x19/0x20 [xfs]
 [<ffffffffa0296d16>] ? xfs_alloc_fixup_trees+0x236/0x350 [xfs]
 [<ffffffffa02986c2>] ? xfs_alloc_ag_vextent_size+0x482/0x630 [xfs]
 [<ffffffffa029943d>] ? xfs_alloc_ag_vextent+0xad/0x100 [xfs]
 [<ffffffffa0299e8c>] ? xfs_alloc_vextent+0x2bc/0x610 [xfs]
 [<ffffffffa02a4587>] ? xfs_bmap_btalloc+0x267/0x700 [xfs]
 [<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150
 [<ffffffffa02a4a2e>] ? xfs_bmap_alloc+0xe/0x10 [xfs]
 [<ffffffffa02a4b0a>] ? xfs_bmapi_allocate_worker+0x4a/0x80 [xfs]
 [<ffffffffa02a4ac0>] ? xfs_bmapi_allocate_worker+0x0/0x80 [xfs]
 [<ffffffff81090ae0>] ? worker_thread+0x170/0x2a0
 [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff81090970>] ? worker_thread+0x0/0x2a0
 [<ffffffff81096936>] ? kthread+0x96/0xa0
 [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
 [<ffffffff810968a0>] ? kthread+0x0/0xa0
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
XFS (md127): page discard on page ffffea0003890fa0, inode 0x849ec441, offset 0.
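
Before I trust this filesystem again, I plan to take it offline and do a read-only check first. My understanding (please correct me if I'm off base) is that a dry run looks like this, with -n meaning no-modify so it only reports problems:

# umount /mesonet
# xfs_repair -n /dev/md127

Only if that reports damage would I re-run it without -n to actually repair.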
<div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Apr 15, 2013 at 3:50 AM, Jussi Silvennoinen <span dir="ltr"><<a href="mailto:jussi_rhel6@silvennoinen.net" target="_blank">jussi_rhel6@silvennoinen.net</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
avg-cpu: %user %nice %system %iowait %steal %idle<br>
11.12 0.03 2.70 3.60 0.00 82.56<br>
<br>
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn<br>
md127 134.36 10336.87 11381.45 19674692141 21662893316<br>
</blockquote>
<br></div>
> Do use iostat -x to see more details; it will give a better indication of how busy the disks are.

# iostat -x
Linux 2.6.32-358.2.1.el6.x86_64 (iem21.local)   04/15/2013   _x86_64_   (16 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.33    0.00    3.31    2.24    0.00   84.11

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               3.48  1002.05   22.42   33.26  1162.56  8277.06   169.55     6.52  117.17   2.49  13.86
sdc            3805.96   173.47  292.94   28.83 33747.35  1611.10   109.89     3.47   10.74   0.82  26.46
sde            3814.91   174.53  285.98   29.92 33761.01  1628.96   112.03     5.70   17.97   0.97  30.63
sdb            3813.98   173.45  284.85   28.66 33745.12  1609.93   112.77     4.07   12.94   0.91  28.48
sdd            3805.78   174.18  294.19   29.35 33754.41  1621.14   109.34     3.81   11.73   0.84  27.32
sdf            3813.80   173.68  285.46   29.04 33751.91  1614.36   112.45     4.70   14.91   0.93  29.17
md127             0.00     0.00   21.75   45.85  4949.72  5919.63   160.78     0.00    0.00   0.00   0.00

But I suspect these numbers are inflated, since the array just completed a RAID5 resync.
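
Once I've confirmed the resync is finished, I'll re-sample over short intervals instead of relying on the since-boot averages above, along the lines of:

# cat /proc/mdstat
# iostat -x 10 3

(the first iostat report is still the since-boot summary; the later reports cover each 10-second interval).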

>> I have other similar filesystems on ext4 with similar hardware and
>> millions of small files as well. I don't see such sluggishness with small
>> files and directories there. I guess I picked XFS for this filesystem
>> initially because of its fast fsck times.
>
> Are those other systems also employing software raid? In my experience, swraid is painfully slow with random writes. And your workload in this use case is exactly that.

Some of them are, and some aren't. I have an opportunity to move this workload to a hardware RAID5, so I may just do that and cut my losses :)
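
Before I commit to the move, I may try to quantify the random-write gap between the two boxes with fio, assuming it's installed; the job below is just a rough sketch and the target directory is hypothetical:

# fio --name=randwrite --directory=/mesonet/fiotest \
      --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
      --size=1g --numjobs=4 --runtime=60 --time_based --group_reporting

Running the same job on one of the ext4 boxes should make the comparison concrete.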
<div class="im">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
# grep md127 /proc/mounts <br>
/dev/md127 /mesonet xfs<br>
rw,noatime,attr2,delaylog,<u></u>sunit=1024,swidth=4096,noquota 0 0<br>
</blockquote>
<br></div>
> inode64 is not used; I suspect it would have helped a lot. Enabling it afterwards will not help data that is already on disk, but it will help with new files.

Thanks for the tip; I'll try that out.
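
If I understand the option correctly, that just means adding inode64 to the mount options, e.g. in /etc/fstab:

/dev/md127  /mesonet  xfs  noatime,inode64  0 0

My impression is that on this kernel inode64 cannot be flipped with mount -o remount, so a full umount/mount of /mesonet during a quiet period would be needed.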

daryl