[Linux-cachefs] kslowd issue

Mark Moseley moseleymark at gmail.com
Fri Feb 19 00:24:19 UTC 2010


On Wed, Dec 23, 2009 at 1:41 PM, Mark Moseley <moseleymark at gmail.com> wrote:
> On Wed, Dec 23, 2009 at 5:01 AM, Greg M <gregm at servu.net.au> wrote:
>> Hi David,
>>
>> We are now running 2.6.32 - no kslowd issues at all. However, during peak
>> times (only ~12Mbps of NFS traffic per box) we get this in dmesg:
>>
>>
>> CacheFiles: I/O Error: Unlink failed
>> FS-Cache: Cache cachefiles stopped due to I/O error
>>
>> Then restart:
>>
>> CacheFiles: File cache on sdb1 unregistering
>> FS-Cache: Withdrawing cache "mycache"
>> FS-Cache: Cache "mycache" added (type cachefiles)
>> CacheFiles: File cache on sdb1 registered
>>
>> Peak period again:
>>
>> CacheFiles: I/O Error: Unlink failed
>> FS-Cache: Cache cachefiles stopped due to I/O error
>>
>> Restart:
>>
>> CacheFiles: File cache on sdb1 unregistering
>> FS-Cache: Withdrawing cache "mycache"
>> FS-Cache: Cache "mycache" added (type cachefiles)
>> CacheFiles: File cache on sdb1 registered
>>
>> Peak period again:
>>
>> CacheFiles: I/O Error: Unlink failed
>> FS-Cache: Cache cachefiles stopped due to I/O error
>>
>>
>> And so on.
>>
>> This is happening on all 10 production VMware guests, running Gentoo on an
>> IBM Bladecenter.
>>
>> Linux dnetwww2 2.6.32-gentoo #1 SMP Sun Dec 20 06:54:41 CST 2009 x86_64
>> Intel(R) Xeon(R) CPU X3360 @ 2.83GHz GenuineIntel GNU/Linux
>>
>> Greg
>
>
> I've seen the same thing on Debian Etch and Debian Lenny on 2.6.32 and
> 2.6.32.1, all on pretty heavily utilized servers (all Dell 1950s, not
> virtualized) -- all serving web hosting traffic over NFS, i.e.
> fscache-heavy stuff with *lots* of individual files. Both are using
> cachefilesd-0.9 -- the Etch one I statically compiled; the Lenny one
> is from lenny-backports.
>
> On Etch (different server), it died with this error but without any Oops:
> Dec 14 14:59:16 server kernel: [   52.568006] FS-Cache: Cache
> "CacheFiles" added (type cachefiles)
> Dec 14 14:59:16 server kernel: [   52.568010] CacheFiles: File cache
> on sda4 registered
> Dec 14 15:37:30 server kernel: [ 2347.259571] CacheFiles: I/O Error:
> Unlink failed
> Dec 14 15:37:30 server kernel: [ 2347.259578] FS-Cache: Cache
> cachefiles stopped due to I/O error
>
> On Lenny:
> Dec 15 17:43:09 server kernel: [ 1589.670513] CacheFiles: I/O Error:
> Unlink failed
> Dec 15 17:43:09 server kernel: [ 1589.670518] FS-Cache: Cache
> cachefiles stopped due to I/O error
> Dec 15 17:43:23 server cachefilesd[8944]: Refilling cull table
> Dec 15 17:43:23 server cachefilesd[8944]: Failed to check object's
> in-use state: errno 5 (Input/output error)
> Dec 15 17:43:23 server kernel: [ 1603.311806] CacheFiles: File cache
> on sdb3 unregistering
> Dec 15 17:43:23 server kernel: [ 1603.311810] FS-Cache: Withdrawing
> cache "CacheFiles"
>
>
> Lenny oopses:
>
> There's lots of the below, but caching seems to continue for a while
> afterwards, sometimes up to a couple of hours. In my most recent
> attempts I get the Oops, but sometimes it is hours before the
> "CacheFiles: I/O Error: Unlink failed" knocks out the cache.
>
> Dec 15 17:27:07 server kernel: [  627.073122] ------------[ cut here
> ]------------
> Dec 15 17:27:07 server kernel: [  627.073127] WARNING: at fs/sysfs/dir.c:491 ()
> Dec 15 17:27:07 server kernel: [  627.073130] Hardware name: PowerEdge 1950
> Dec 15 17:27:07 server kernel: [  627.073132] sysfs: cannot create
> duplicate filename '/class/bdi/0:209'
> Dec 15 17:27:07 server kernel: [  627.073135] Modules linked in:
> dm_snapshot dm_mirror dm_region_hash dm_log dm_mod xfs tg3 libphy
> nls_iso8859_1 i2c_i801 i2c_core evdev i5000_edac i5k_amb hwmon button
> dcdbas ide_cd_mod cdrom bnx2 fan [last unloaded: scsi_wait_scan]
> Dec 15 17:27:07 server kernel: [  627.073164] Pid: 8335, comm: httpd
> Not tainted 2.6.32.1-nx #1
> Dec 15 17:27:07 server kernel: [  627.073166] Call Trace:
> Dec 15 17:27:07 server kernel: [  627.073171]  [<0003143a>] ?
> Dec 15 17:27:07 server kernel: [  627.073174]  [<00031446>] ?
> Dec 15 17:27:07 server kernel: [  627.073177]  [<0003148b>] ?
> Dec 15 17:27:07 server kernel: [  627.073180]  [<00104bfd>] ?
> Dec 15 17:27:07 server kernel: [  627.073183]  [<0010505c>] ?
> Dec 15 17:27:07 server kernel: [  627.073185]  [<001050ab>] ?
> Dec 15 17:27:07 server kernel: [  627.073188]  [<002052d3>] ?
> Dec 15 17:27:07 server kernel: [  627.073191]  [<00205385>] ?
> Dec 15 17:27:07 server kernel: [  627.073193]  [<002058ac>] ?
> Dec 15 17:27:07 server kernel: [  627.073196]  [<00276764>] ?
> Dec 15 17:27:07 server kernel: [  627.073199]  [<00205127>] ?
> Dec 15 17:27:07 server kernel: [  627.073201]  [<0027b7c2>] ?
> Dec 15 17:27:07 server kernel: [  627.073204]  [<00276c28>] ?
> Dec 15 17:27:07 server kernel: [  627.073207]  [<0009f0ad>] ?
> Dec 15 17:27:07 server kernel: [  627.073210]  [<0009f197>] ?
> Dec 15 17:27:07 server kernel: [  627.073212]  [<0016ec3e>] ?
> Dec 15 17:27:07 server kernel: [  627.073215]  [<000c2a97>] ?
> Dec 15 17:27:07 server kernel: [  627.073218]  [<00174e66>] ?
> Dec 15 17:27:07 server kernel: [  627.073221]  [<00007e7f>] ?
> Dec 15 17:27:07 server kernel: [  627.073223]  [<001d41ed>] ?
> Dec 15 17:27:07 server kernel: [  627.073227]  [<00007fc2>] ?
> Dec 15 17:27:07 server kernel: [  627.073229]  [<0001651a>] ?
> Dec 15 17:27:07 server kernel: [  627.073232]  [<002c0000>] ?
> Dec 15 17:27:07 server kernel: [  627.073236]  [<000b8e5b>] ?
> Dec 15 17:27:07 server kernel: [  627.073238]  [<0007f6af>] ?
> Dec 15 17:27:07 server kernel: [  627.073241]  [<000cabf7>] ?
> Dec 15 17:27:07 server kernel: [  627.073244]  [<000cafd7>] ?
> Dec 15 17:27:07 server kernel: [  627.073247]  [<000cac99>] ?
> Dec 15 17:27:07 server kernel: [  627.073249]  [<000cafd7>] ?
> Dec 15 17:27:07 server kernel: [  627.073252]  [<000cb2bd>] ?
> Dec 15 17:27:07 server kernel: [  627.073255]  [<000cb396>] ?
> Dec 15 17:27:07 server kernel: [  627.073257]  [<000cbd49>] ?
> Dec 15 17:27:07 server kernel: [  627.073260]  [<0005124e>] ?
> Dec 15 17:27:07 server kernel: [  627.073263]  [<00050f04>] ?
> Dec 15 17:27:07 server kernel: [  627.073265]  [<0005124e>] ?
> Dec 15 17:27:07 server kernel: [  627.073268]  [<000c3f8b>] ?
> Dec 15 17:27:07 server kernel: [  627.073270]  [<0007f698>] ?
> Dec 15 17:27:07 server kernel: [  627.073273]  [<000c3ff8>] ?
> Dec 15 17:27:07 server kernel: [  627.073275]  [<0000466f>] ?
> Dec 15 17:27:07 server kernel: [  627.073280] ---[ end trace
> 927e9ac79397ac32 ]---
> Dec 15 17:27:07 server kernel: [  627.073284] kobject_add_internal
> failed for 0:209 with -EEXIST, don't try to register things with the
> same name in the same directory.
>
> All with variations on "sysfs: cannot create duplicate filename
> '/class/bdi/0:209'", e.g.:
>
> sysfs: cannot create duplicate filename '/class/bdi/0:209'
> sysfs: cannot create duplicate filename '/class/bdi/0:209'
> sysfs: cannot create duplicate filename '/class/bdi/0:198'
> sysfs: cannot create duplicate filename '/class/bdi/0:198'
> sysfs: cannot create duplicate filename '/class/bdi/0:196'
> sysfs: cannot create duplicate filename '/class/bdi/0:196'
>
> I've also seen this in the logs, but caching seems to continue: NFS:
> Cache request denied due to non-unique superblock keys
>
> What sort of debugging info would it be helpful for us to gather? NFS
> caching in the kernel is like a dream come true for me, so I'm happy
> to help in info gathering and trying out various settings, etc.
>

Just to update: this still occurs on 2.6.32.8, on 32-bit Debian Lenny,
with the cachefilesd from Testing. As the kernel timestamp below shows,
it ran (under pretty heavy usage) for about 2.5 hours, and I can verify
that there's quite a bit in the cache (about 640 MB). The only things
logged were:

Feb 18 16:05:55 server kernel: [ 8951.401790] CacheFiles: I/O Error:
Unlink failed
Feb 18 16:05:55 server kernel: [ 8951.401795] FS-Cache: Cache
cachefiles stopped due to I/O error

Is there anything I can do to debug this further? Would the output
generated by turning on /sys/module/fscache/parameters/debug help, and
if so, which flag(s) should I set? Turning them all on generates a tidal
wave of data :)
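
To compare runs, it may help to measure how long the cache survives
between "registered" and "Unlink failed". Below is a minimal sketch,
assuming the syslog lines are unwrapped (one message per line, unlike
the wrapped excerpts in this mail) and that all lines come from a single
boot, since the bracketed kernel timestamps are seconds since boot. The
name cache_lifetimes is just illustrative.

```python
import re

# Kernel uptime stamp plus message, e.g. "[ 2347.259571] CacheFiles: ..."
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)$")

REGISTERED_PREFIX = "CacheFiles: File cache on"  # "... on <dev> registered"
FAILED = "CacheFiles: I/O Error: Unlink failed"


def cache_lifetimes(lines):
    """Return seconds from each cache registration to the next unlink failure."""
    lifetimes = []
    started = None
    for line in lines:
        m = STAMP.search(line)
        if not m:
            continue  # not a timestamped kernel line (e.g. cachefilesd output)
        uptime, msg = float(m.group(1)), m.group(2)
        if msg.startswith(REGISTERED_PREFIX) and msg.endswith(" registered"):
            started = uptime
        elif msg.startswith(FAILED) and started is not None:
            lifetimes.append(uptime - started)
            started = None
    return lifetimes


if __name__ == "__main__":
    # The two relevant lines from the Etch log quoted above, unwrapped.
    sample = [
        "Dec 14 14:59:16 server kernel: [   52.568010] CacheFiles: File cache on sda4 registered",
        "Dec 14 15:37:30 server kernel: [ 2347.259571] CacheFiles: I/O Error: Unlink failed",
    ]
    for secs in cache_lifetimes(sample):
        print("cache lasted %.0f s (%.1f min)" % (secs, secs / 60))
```

A failure with no preceding registration in the input (as in the
2.6.32.8 excerpt above) is skipped rather than guessed at.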
