[rhelv6-list] RHEL6.2 XFS brutal performance with lots of files

Pat Riehecky riehecky at fnal.gov
Mon Apr 15 13:39:11 UTC 2013


I've run into some terrible performance when I've had a lot of add/remove 
actions hitting the filesystem in parallel.  The slowdowns were mostly due to 
fragmentation.  Alas, XFS can get some horrid fragmentation.

xfs_db -c frag -r /dev/<node>

should give you the stats on its fragmentation.
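
For example, against Daryl's md device it would be run as below; the numbers 
here are purely illustrative, not from a real system:

    # xfs_db -c frag -r /dev/md127
    actual 605119, ideal 592416, fragmentation factor 2.10%

'actual' is the number of extents currently in use, 'ideal' is the minimum the 
same data could occupy, and the factor is how far apart the two are.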

I can't speak for others, but I've got 'xfs_fsr' linked into /etc/cron.weekly/ 
on my personal systems with large XFS filesystems.
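
Mine is literally just a link to the binary, but a small wrapper script in 
/etc/cron.weekly/ gives you a time limit and a log.  Something like this 
sketch (the path, the two-hour limit and the log file are only my assumptions, 
adjust to taste):

    #!/bin/sh
    # Defragment all mounted XFS filesystems, stopping after at most
    # two hours (-t 7200); run verbosely and capture any output in a log.
    /usr/sbin/xfs_fsr -t 7200 -v >> /var/log/xfs_fsr.log 2>&1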

Pat




On 04/15/2013 07:58 AM, Daryl Herzmann wrote:
> Good morning,
>
> Thanks for the response, and the fun never stops!  This system crashed on 
> Saturday morning with the following:
>
> <4>------------[ cut here ]------------
> <2>kernel BUG at include/linux/swapops.h:126!
> <4>invalid opcode: 0000 [#1] SMP
> <4>last sysfs file: /sys/kernel/mm/ksm/run
> <4>CPU 7
> <4>Modules linked in: iptable_filter ip_tables nfsd nfs lockd fscache 
> auth_rpcgss nfs_acl sunrpc bridge stp llc ip6t_REJECT nf_conntrack_ipv6 
> nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs 
> exportfs vhost_net macvtap macvlan tun kvm_intel kvm raid456 
> async_raid6_recov async_pq power_meter raid6_pq async_xor dcdbas xor 
> microcode serio_raw async_memcpy async_tx iTCO_wdt iTCO_vendor_support 
> i7core_edac edac_core sg bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod 
> crc_t10dif pata_acpi ata_generic ata_piix wmi mpt2sas scsi_transport_sas 
> raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
> <4>
> <4>Pid: 4581, comm: ssh Not tainted 2.6.32-358.2.1.el6.x86_64 #1 Dell Inc. 
> PowerEdge T410/0Y2G6P
> <4>RIP: 0010:[<ffffffff8116c501>]  [<ffffffff8116c501>] 
> migration_entry_wait+0x181/0x190
> <4>RSP: 0000:ffff8801c1703c88  EFLAGS: 00010246
> <4>RAX: ffffea0000000000 RBX: ffffea0003bf6f58 RCX: ffff880236437580
> <4>RDX: 00000000001121fd RSI: ffff8801c040e5d8 RDI: 000000002243fa3e
> <4>RBP: ffff8801c1703ca8 R08: ffff8801c040e5d8 R09: 0000000000000029
> <4>R10: ffff8801d6850200 R11: 00002ad7d96cbf5a R12: ffffea0007bdec18
> <4>R13: 0000000236437580 R14: 0000000236437067 R15: 00002ad7d76b0000
> <4>FS:  00002ad7dace2880(0000) GS:ffff880028260000(0000) knlGS:0000000000000000
> <4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>CR2: 00002ad7d76b0000 CR3: 00000001bb686000 CR4: 00000000000007e0
> <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> <4>Process ssh (pid: 4581, threadinfo ffff8801c1702000, task ffff880261aa7500)
> <4>Stack:
> <4> ffff88024b5f22d8 0000000000000000 000000002243fa3e ffff8801c040e5d8
> <4><d> ffff8801c1703d88 ffffffff811441b8 0000000000000000 ffff8801c1703d08
> <4><d> ffff8801c1703eb8 ffff8801c1703dc8 ffff880328cb48c0 0000000000000040
> <4>Call Trace:
> <4> [<ffffffff811441b8>] handle_pte_fault+0xb48/0xb50
> <4> [<ffffffff81437dbb>] ? sock_aio_write+0x19b/0x1c0
> <4> [<ffffffff8112c6d4>] ? __pagevec_free+0x44/0x90
> <4> [<ffffffff811443fa>] handle_mm_fault+0x23a/0x310
> <4> [<ffffffff810474c9>] __do_page_fault+0x139/0x480
> <4> [<ffffffff81194fb2>] ? vfs_ioctl+0x22/0xa0
> <4> [<ffffffff811493a0>] ? unmap_region+0x110/0x130
> <4> [<ffffffff81195154>] ? do_vfs_ioctl+0x84/0x580
> <4> [<ffffffff8151339e>] do_page_fault+0x3e/0xa0
> <4> [<ffffffff81510755>] page_fault+0x25/0x30
> <4>Code: e8 f5 2f fc ff e9 59 ff ff ff 48 8d 53 08 85 c9 0f 84 44 ff ff ff 
> 8d 71 01 48 63 c1 48 63 f6 f0 0f b1 32 39 c1 74 be 89 c1 eb e3 <0f> 0b eb fe 
> 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
> <1>RIP  [<ffffffff8116c501>] migration_entry_wait+0x181/0x190
> <4> RSP <ffff8801c1703c88>
>
> It rebooted itself, and now I must have some filesystem corruption, as this is 
> being dumped frequently:
>
> XFS (md127): page discard on page ffffea0003c95018, inode 0x849ec442, offset 0.
> XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 342 of file 
> fs/xfs/xfs_alloc.c.  Caller 0xffffffffa02986c2
>
> Pid: 1304, comm: xfsalloc/7 Not tainted 2.6.32-358.2.1.el6.x86_64 #1
> Call Trace:
>  [<ffffffffa02c20cf>] ? xfs_error_report+0x3f/0x50 [xfs]
>  [<ffffffffa02986c2>] ? xfs_alloc_ag_vextent_size+0x482/0x630 [xfs]
>  [<ffffffffa0296a69>] ? xfs_alloc_lookup_eq+0x19/0x20 [xfs]
>  [<ffffffffa0296d16>] ? xfs_alloc_fixup_trees+0x236/0x350 [xfs]
>  [<ffffffffa02986c2>] ? xfs_alloc_ag_vextent_size+0x482/0x630 [xfs]
>  [<ffffffffa029943d>] ? xfs_alloc_ag_vextent+0xad/0x100 [xfs]
>  [<ffffffffa0299e8c>] ? xfs_alloc_vextent+0x2bc/0x610 [xfs]
>  [<ffffffffa02a4587>] ? xfs_bmap_btalloc+0x267/0x700 [xfs]
>  [<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150
>  [<ffffffffa02a4a2e>] ? xfs_bmap_alloc+0xe/0x10 [xfs]
>  [<ffffffffa02a4b0a>] ? xfs_bmapi_allocate_worker+0x4a/0x80 [xfs]
>  [<ffffffffa02a4ac0>] ? xfs_bmapi_allocate_worker+0x0/0x80 [xfs]
>  [<ffffffff81090ae0>] ? worker_thread+0x170/0x2a0
>  [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
>  [<ffffffff81090970>] ? worker_thread+0x0/0x2a0
>  [<ffffffff81096936>] ? kthread+0x96/0xa0
>  [<ffffffff8100c0ca>] ? child_rip+0xa/0x20
>  [<ffffffff810968a0>] ? kthread+0x0/0xa0
>  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
> XFS (md127): page discard on page ffffea0003890fa0, inode 0x849ec441, offset 0.
>
> Anyway, to respond to your questions:
>
>
> On Mon, Apr 15, 2013 at 3:50 AM, Jussi Silvennoinen 
> <jussi_rhel6 at silvennoinen.net> wrote:
>
>         avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>                   11.12    0.03    2.70    3.60    0.00 82.56
>
>         Device:            tps   Blk_read/s   Blk_wrtn/s Blk_read   Blk_wrtn
>         md127           134.36     10336.87     11381.45 19674692141 21662893316
>
>
>     Do use iostat -x to see more details; it will give a better indication of
>     how busy the disks are.
>
>
> # iostat -x
> Linux 2.6.32-358.2.1.el6.x86_64 (iem21.local) 04/15/2013 _x86_64_(16 CPU)
>
> avg-cpu:  %user   %nice %system %iowait  %steal %idle
>           10.33    0.00    3.31    2.24    0.00 84.11
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               3.48  1002.05   22.42   33.26  1162.56  8277.06   169.55     6.52  117.17   2.49  13.86
> sdc            3805.96   173.47  292.94   28.83 33747.35  1611.10   109.89     3.47   10.74   0.82  26.46
> sde            3814.91   174.53  285.98   29.92 33761.01  1628.96   112.03     5.70   17.97   0.97  30.63
> sdb            3813.98   173.45  284.85   28.66 33745.12  1609.93   112.77     4.07   12.94   0.91  28.48
> sdd            3805.78   174.18  294.19   29.35 33754.41  1621.14   109.34     3.81   11.73   0.84  27.32
> sdf            3813.80   173.68  285.46   29.04 33751.91  1614.36   112.45     4.70   14.91   0.93  29.17
> md127             0.00     0.00   21.75   45.85  4949.72  5919.63   160.78     0.00    0.00   0.00   0.00
>
> but I suspect this is inflated, since it just completed a raid5 resync.
>
>         I have other similar filesystems on ext4 with similar hardware and
>         millions of small files as well.  I don't see such sluggishness with
>         small files and directories there.  I guess I picked XFS for this
>         filesystem initially because of its fast fsck times.
>
>
>     Are those other systems also employing software raid? In my experience,
>     swraid is painfully slow with random writes. And your workload in this
>     use case is exactly that.
>
>
>
> Some of them are and some aren't.  I have an opportunity to move this 
> workload to a hardware RAID5, so I may just do that and cut my losses :)
>
>         # grep md127 /proc/mounts
>         /dev/md127 /mesonet xfs
>         rw,noatime,attr2,delaylog,sunit=1024,swidth=4096,noquota 0 0
>
>
>     inode64 is not used; I suspect it would have helped a lot.  Enabling it
>     afterwards will not help data which is already on disk, but it will help
>     with new files.
>
>
> Thanks for the tip, I'll try that out.
>
> daryl
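
On the inode64 tip above, for what it's worth, here is a sketch of how I'd 
apply it (the fstab line is only a guess at what the entry would look like, 
and on this kernel I wouldn't count on a plain remount picking the option up, 
so plan for an unmount/mount cycle):

    # /etc/fstab
    /dev/md127  /mesonet  xfs  noatime,inode64  0 0

    # umount /mesonet && mount /mesonet
    # grep md127 /proc/mounts   (inode64 should now show up in the options)

inode64 lets new inodes be created in any allocation group instead of just the 
low ones, which is why it only helps files written after the change.
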
>
>
>


-- 
Pat Riehecky

Scientific Linux developer
http://www.scientificlinux.org/
