[Linux-cachefs] Kernel Panic when fscache enabled.

Russell Knighton RussellK at motionpicturesolutions.com
Wed Dec 19 12:30:46 UTC 2012


Hi All,

We would like to use fscache on one of our servers, so we have installed and configured it, and it appears to work. However, we get occasional kernel panics when the cache is in use, though not every time. My suspicion is that high load triggers the panic, but so far I have been unable to reproduce the error reliably. Here is the first trace from the kernel log:


Dec 17 18:12:56 xfers kernel: [255018.803786] ------------[ cut here ]------------
Dec 17 18:12:56 xfers kernel: [255018.812186] kernel BUG at /build/buildd/linux-3.2.0/fs/fscache/operation.c:332!
Dec 17 18:12:56 xfers kernel: [255018.821267] invalid opcode: 0000 [#1] SMP
Dec 17 18:12:56 xfers kernel: [255018.830265] CPU 6
Dec 17 18:12:56 xfers kernel: [255018.830403] Modules linked in: mptctl cachefiles autofs4 bnep rfcomm bluetooth nfsd ext2 ib_iser rdma_cm ib_cm iw_cm vesafb ib_sa ib_mad xfs ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bonding nfs lockd fscache auth_rpcgss nfs_acl sunrpc joydev i7core_edac lp edac_core ioatdma mac_hid parport usbhid hid uas usb_storage mptsas mptscsih mptbase scsi_transport_sas ixgbe igb mdio dca
Dec 17 18:12:56 xfers kernel: [255018.883159]
Dec 17 18:12:56 xfers kernel: [255018.894955] Pid: 20669, comm: kworker/u:2 Not tainted 3.2.0-34-generic #53-Ubuntu Intel Corporation S5520UR/S5520UR
Dec 17 18:12:56 xfers kernel: [255018.907024] RIP: 0010:[<ffffffffa00d1cea>]  [<ffffffffa00d1cea>] fscache_put_operation.part.2+0x18a/0x230 [fscache]
Dec 17 18:12:56 xfers kernel: [255018.919423] RSP: 0018:ffff880511577da0  EFLAGS: 00010286
Dec 17 18:12:56 xfers kernel: [255018.931752] RAX: 00000000ffffffff RBX: ffff88026583b500 RCX: ffffffff81e544c8
Dec 17 18:12:56 xfers kernel: [255018.944872] RDX: 0000000000000000 RSI: ffffffff81e544c8 RDI: ffff88026583b500
Dec 17 18:12:56 xfers kernel: [255018.957472] RBP: ffff880511577dc0 R08: ffff880511576000 R09: 0000000000000000
Dec 17 18:12:56 xfers kernel: [255018.970284] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000103cd3f6c
Dec 17 18:12:56 xfers kernel: [255018.983291] R13: ffff880658a65c00 R14: ffffffff81e544c0 R15: ffffffffa00d2830
Dec 17 18:12:56 xfers kernel: [255018.996757] FS:  0000000000000000(0000) GS:ffff880363c60000(0000) knlGS:0000000000000000
Dec 17 18:12:56 xfers kernel: [255019.010241] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 17 18:12:56 xfers kernel: [255019.023754] CR2: 00007fa618085000 CR3: 0000000001c05000 CR4: 00000000000006e0
Dec 17 18:12:56 xfers kernel: [255019.037765] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 17 18:12:56 xfers kernel: [255019.052550] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 17 18:12:56 xfers kernel: [255019.066795] Process kworker/u:2 (pid: 20669, threadinfo ffff880511576000, task ffff8803787ddc00)
Dec 17 18:12:56 xfers kernel: [255019.081447] Stack:
Dec 17 18:12:56 xfers kernel: [255019.095802]  ffff880511577dd0 ffff88026583b500 0000000103cd3f6c ffff880658a65c00
Dec 17 18:12:56 xfers kernel: [255019.110619]  ffff880511577de0 ffffffffa00d1dbb ffffffff81e544c0 ffff88026583b500
Dec 17 18:12:56 xfers kernel: [255019.125720]  ffff880511577e00 ffffffffa00d2887 ffff88026583b500 ffff88035849a980
Dec 17 18:12:56 xfers kernel: [255019.141485] Call Trace:
Dec 17 18:12:56 xfers kernel: [255019.160461]  [<ffffffffa00d1dbb>] fscache_put_operation+0x2b/0x70 [fscache]
Dec 17 18:12:56 xfers kernel: [255019.180193]  [<ffffffffa00d2887>] fscache_op_work_func+0x57/0x80 [fscache]
Dec 17 18:12:56 xfers kernel: [255019.199069]  [<ffffffff81084c3a>] process_one_work+0x11a/0x480
Dec 17 18:12:56 xfers kernel: [255019.217997]  [<ffffffff810859f4>] worker_thread+0x164/0x370
Dec 17 18:12:56 xfers kernel: [255019.236788]  [<ffffffff81085890>] ? manage_workers.isra.31+0x130/0x130
Dec 17 18:12:56 xfers kernel: [255019.254801]  [<ffffffff8108a27c>] kthread+0x8c/0xa0
Dec 17 18:12:56 xfers kernel: [255019.271180]  [<ffffffff81666534>] kernel_thread_helper+0x4/0x10
Dec 17 18:12:56 xfers kernel: [255019.288077]  [<ffffffff8108a1f0>] ? flush_kthread_worker+0xa0/0xa0
Dec 17 18:12:56 xfers kernel: [255019.304375]  [<ffffffff81666530>] ? gs_change+0x13/0x13
Dec 17 18:12:56 xfers kernel: [255019.321185] Code: 00 f0 41 0f ba 6d 38 02 19 c0 85 c0 0f 85 2c ff ff ff 49 8b 45 30 a8 04 0f 84 20 ff ff ff 4c 89 ef e8 5b ee ff ff e9 13 ff ff ff <0f> 0b 48 c7 c7 bd 69 0d a0 31 c0 e8 41 15 57 e1 48 c7 c7 70 5a
Dec 17 18:12:56 xfers kernel: [255019.356665] RIP  [<ffffffffa00d1cea>] fscache_put_operation.part.2+0x18a/0x230 [fscache]
Dec 17 18:12:56 xfers kernel: [255019.374834]  RSP <ffff880511577da0>
Dec 17 18:12:56 xfers kernel: [255019.428141] BUG: unable to handle kernel
Dec 17 18:12:56 xfers kernel: [255019.428451] ---[ end trace 2a0982c27b6db3df ]---


Following this trace, there are hundreds of others, as the kernel simply cannot recover. Please let me know if you need to see the rest of the output.

Can anyone advise what I can try to diagnose the problem? I understand there are extensive debug options for fscache, so if someone can recommend an appropriate debugging level, I will try to reproduce the crash and capture the output.

Some additional info:


xfers cachefiles # uname -a
Linux xfers 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:48:16 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
xfers cachefiles # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.1 LTS
Release:        12.04
Codename:       precise
xfers cachefiles # cat /etc/cachefilesd.conf | grep -v ^# | grep -v ^$
dir /cache
tag nfs-stores
brun 6%
bcull 3%
bstop 0%
frun 6%
fcull 3%
fstop 0%
xfers cachefiles # df -h | grep /cache
/dev/sdc1                     439G  162G  278G  37% /cache
xfers cachefiles # mount | grep /cache
/dev/sdc1 on /cache type ext4 (rw,noatime,nodiratime,discard)
xfers cachefiles # mount | grep fsc
store.mps.lan:/FFF on /mnt/fff type nfs4 (rw,fsc,hard,intr,rsize=32768,wsize=32768,sloppy,addr=10.0.0.204,clientaddr=172.16.1.10)
xfers cachefiles # cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=11 dat=495 spc=0
Objects: alc=498 nal=0 avl=498 ded=26
ChkAux : non=0 ok=330 upd=0 obs=2
Pages  : mrk=42105945 unc=40370944
Acquire: n=506 nul=0 noc=0 ok=506 nbf=0 oom=0
Lookups: n=498 neg=166 pos=332 crt=166 tmo=0
Updates: n=0 nul=0 run=0
Relinqs: n=6 nul=0 wcr=0 rtr=0
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=2806173 ok=681583 wt=400 nod=62040 nbf=2062550 int=0 oom=0
Retrvls: ops=743623 owt=80 abt=0
Stores : n=7058199 ok=7057613 agn=0 nbf=586 oom=0
Stores : ops=646908 run=7704329 pgs=7057447 rxd=7057613 olm=0
VmScan : nos=40370192 gon=0 bsy=0 can=166
Ops    : pend=80 run=1390531 enq=45871773 can=0 rej=0
Ops    : dfr=1459 rel=1390531 gc=1459
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
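
In case it helps anyone compare these counters between runs, here is a minimal Python sketch that parses the text of /proc/fs/fscache/stats into per-label dictionaries and computes the fraction of retrieval requests counted under nbf. The function names (parse_fscache_stats, enobufs_fraction) are made up for illustration, and reading nbf as "rejected with -ENOBUFS" is my understanding of the FS-Cache documentation rather than something confirmed here. The sample is a subset of the figures above.

```python
import re

# A few lines copied from the /proc/fs/fscache/stats output above.
SAMPLE = """\
FS-Cache statistics
Retrvls: n=2806173 ok=681583 wt=400 nod=62040 nbf=2062550 int=0 oom=0
Retrvls: ops=743623 owt=80 abt=0
Stores : n=7058199 ok=7057613 agn=0 nbf=586 oom=0
Ops    : pend=80 run=1390531 enq=45871773 can=0 rej=0
"""


def parse_fscache_stats(text):
    """Return {label: {counter: value}}; repeated labels are merged."""
    stats = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a label, e.g. the title line
        counters = stats.setdefault(label.strip(), {})
        for key, value in re.findall(r"(\w+)=(\d+)", rest):
            counters[key] = int(value)
    return stats


def enobufs_fraction(stats):
    """Fraction of retrieval requests counted under nbf (assumed -ENOBUFS)."""
    retrievals = stats["Retrvls"]
    if retrievals["n"] == 0:
        return 0.0
    return retrievals["nbf"] / retrievals["n"]


if __name__ == "__main__":
    parsed = parse_fscache_stats(SAMPLE)
    print("retrievals counted under nbf: {:.1%}".format(enobufs_fraction(parsed)))
```

On a live system the same function could be fed the contents of /proc/fs/fscache/stats directly. In the figures above, ok + nod + nbf (681583 + 62040 + 2062550) adds up exactly to n (2806173) for retrievals, so roughly three quarters of the retrieval requests fall under nbf.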




Many thanks in advance for any help.

Kind regards,
--
-- Russell Knighton (Senior Systems Administrator) --
Motion Picture Solutions Ltd
The Warehouse, 7a North End Rd, London, W14 8ST
Main Office: +44 (0) 207 371 2396 (ext. 225), Desk: +44 (0) 207 751 7016, Mobile: +44 (0) 7758 210 744



