[Linux-cachefs] [PATCH 0/1] Debug oops in cachefiles_prepare_write with NFS netfs conversion on reexport server
Dave Wysochanski
dwysocha at redhat.com
Fri Sep 23 20:14:38 UTC 2022
Daire this debug patch is for your setup when you have time to test it.
Also I am not sure if David Howells has a theory as to what is happening
just from the oops and description below. I will also keep looking a
bit more at what might be happening.
This patch should help debug the oops seen by Daire's re-export
server while running tests on some of my NFS fscache netfs conversion
patches. This is only the first patch but should give at least
the next clues as to what is happening.
Add this patch on top of my v8 series just posted on Sep 22:
https://marc.info/?l=linux-nfs&m=166385481714542&w=4
Once built and booted, you should enable ftrace events for netfs,
cachefiles, and fscache. To start with, use the following comprehensive
trace events. Note I'm not using trace-cmd because I think you
mentioned this may not work due to the older userspace on your
machine. If you use the /proc files you can also control the events:
echo 1 > /sys/kernel/debug/tracing/events/cachefiles/enable
echo 1 > /sys/kernel/debug/tracing/events/fscache/enable
echo 1 > /sys/kernel/debug/tracing/events/netfs/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
The above should be the equivalent of:
trace-cmd start -e netfs:* -e fscache:* -e cachefiles:*
You may need to increase buffer_size_kb:
cat /sys/kernel/debug/tracing/buffer_size_kb
1024
echo 16384 > /sys/kernel/debug/tracing/buffer_size_kb
cat /sys/kernel/debug/tracing/buffer_size_kb
16384
Then make sure the following is set:
sysctl kernel.ftrace_dump_on_oops=1
sysctl kernel.panic_on_oops=1
Then reproduce the problem. My current theory is detailed below, and if
it's on the mark, I'd expect to see the ftrace event buffer contain a
line like this:
[ 99.078288] kworker/-140 5..... 27889469us : cachefiles_prepare_write: object == NULL: c=00000004
Then we can take the cookie value and hopefully discern the history of
the crash. From the oops, below is my current theory of what is
happening.
Summary of oops
===============
Theory from oops decode and struct offsets (crash-utility) is that somehow
the cachefiles_object * is going to NULL while a write is going in progress.
If that is true, then look at how a cachefiles_object * value stored inside
fscache_cookie.cache_priv could be NULL. There appears to be only one
possibility, and that is cachefiles_withdraw_cookie() was called despite
the fact that we are trying to write to the object.
Details on how cachefiles_object * could be NULL
================================================
[dwysocha at dwysocha kernel]$ grep -n "cache_priv =" fs/cachefiles/*.[ch]
fs/cachefiles/interface.c:188: cookie->cache_priv = object;
fs/cachefiles/interface.c:373: cookie->cache_priv = NULL;
fs/cachefiles/volume.c:80: vcookie->cache_priv = volume;
fs/cachefiles/volume.c:114: volume->vcookie->cache_priv = NULL;
[dwysocha at dwysocha kernel]$
346 /*
347 * Withdraw caching for a cookie.
348 */
349 static void cachefiles_withdraw_cookie(struct fscache_cookie *cookie)
350 {
351 struct cachefiles_object *object = cookie->cache_priv;
352 struct cachefiles_cache *cache = object->volume->cache;
353 const struct cred *saved_cred;
354
355 _enter("o=%x", object->debug_id);
356 cachefiles_see_object(object, cachefiles_obj_see_withdraw_cookie);
357
358 if (!list_empty(&object->cache_link)) {
359 spin_lock(&cache->object_list_lock);
360 cachefiles_see_object(object, cachefiles_obj_see_withdrawal);
361 list_del_init(&object->cache_link);
362 spin_unlock(&cache->object_list_lock);
363 }
364
365 cachefiles_ondemand_clean_object(object);
366
367 if (object->file) {
368 cachefiles_begin_secure(cache, &saved_cred);
369 cachefiles_clean_up_object(object, cache);
370 cachefiles_end_secure(cache, saved_cred);
371 }
372
373 cookie->cache_priv = NULL;
374 cachefiles_put_object(object, cachefiles_obj_put_detach);
375 }
Since this is not a volume, the above is the only possibility for how
cookie->cache_prive == NULL, aside from memory corruption, which
seems unlikely given the predictable nature of the crashes.
Details on how we conclude cachefiles_object * == NULL
======================================================
First, netfs_cache_resources * cannot be null because the caller
netfs_rreq_do_write_to_cache() does the below to set 'cres' before
passing 'cres' to cachefiles_prepare_write:
struct netfs_cache_resources *cres = &rreq->cache_resources;
crash> struct -o netfs_cache_resources.cache_priv
struct netfs_cache_resources {
[8] void *cache_priv;
}
It's possible that 'netfs_cache_resources.cache_priv' value is NULL.
But that would not explain the offset of 32 into the structure
for the next dereference - this is not where we panic (we panic
on a address of 0000000000000008, not 0000000000000032)
crash> struct -o fscache_cookie.cache_priv
struct fscache_cookie {
[32] void *cache_priv;
}
So clearly the above is not NULL. However, the value of fscache_cookie.cache_priv
is possible to be NULL since the offset of 'volume' member is 8. So if the
cachefiles_object * which is stored inside fscache_cookie.cache_priv == NULL,
this would explain the crash.
crash> struct -o cachefiles_object.volume
struct cachefiles_object {
[8] struct cachefiles_volume *volume;
}
crash>
>From Daire
58710.346376] BUG: kernel NULL pointer dereference, address: 0000000000000008
[58710.353508] #PF: supervisor read access in kernel mode
[58710.358775] #PF: error_code(0x0000) - not-present page
[58710.364041] PGD 0 P4D 0
[58710.366721] Oops: 0000 [#1] PREEMPT SMP NOPTI
[58710.371212] CPU: 12 PID: 9134 Comm: kworker/u129:0 Tainted: G E 6.0.0-2.dneg.x86_64 #1
[58710.380647] Hardware name: Google Google Compute Engine/Google
Compute Engine, BIOS Google 07/22/2022
[58710.389995] Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
[58710.397188] RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
#./scripts/faddr2line --list fs/cachefiles/cachefiles.ko
cachefiles_prepare_write+0x2
cachefiles_prepare_write+0x28/0x90:
cachefiles_prepare_write at
/user_data/rpmbuild/BUILD/kernel-ml-6.0.0/linux-6.0.0-2.dneg.x86_64/fs/cachefiles/io.c:587
582 static int cachefiles_prepare_write(struct netfs_cache_resources *cres,
583 loff_t *_start, size_t *_len, loff_t i_size,
584 bool no_space_allocated_yet)
585 {
586 struct cachefiles_object *object = cachefiles_cres_object(cres);
>587< struct cachefiles_cache *cache = object->volume->cache;
588 const struct cred *saved_cred;
589 int ret;
590
591 if (!cachefiles_cres_file(cres)) {
592 if (!fscache_wait_for_operation(cres, FSCACHE_WANT_WRITE))
#./scripts/faddr2line --list fs/netfs/netfs.ko
netfs_rreq_write_to_cache_work+0x11c
netfs_rreq_write_to_cache_work+0x11c/0x320:
netfs_rreq_do_write_to_cache at
/user_data/rpmbuild/BUILD/kernel-ml-6.0.0/linux-6.0.0-2.dneg.x86_64/fs/netfs/io.c:200
195 list_del_init(&next->rreq_link);
196 netfs_put_subrequest(next, false,
197 netfs_sreq_trace_put_merged);
198 }
199
>200< ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len,
201 rreq->i_size, true);
202 if (ret < 0) {
203 trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write);
204 trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip);
205 continue;
(inlined by) netfs_rreq_write_to_cache_work at
/user_data/rpmbuild/BUILD/kernel-ml-6.0.0/linux-6.0.0-2.dneg.x86_64/fs/netfs/io.c:229
224 static void netfs_rreq_write_to_cache_work(struct work_struct *work)
225 {
226 struct netfs_io_request *rreq =
227 container_of(work, struct netfs_io_request, work);
228
>229< netfs_rreq_do_write_to_cache(rreq);
230 }
231
232 static void netfs_rreq_write_to_cache(struct netfs_io_request *rreq)
233 {
234 rreq->work.func = netfs_rreq_write_to_cache_work;
162 /*
163 * Perform any outstanding writes to the cache. We inherit a ref from the
164 * caller.
165 */
166 static void netfs_rreq_do_write_to_cache(struct netfs_io_request *rreq)
167 {
168 struct netfs_cache_resources *cres = &rreq->cache_resources; <--- THIS CANNOT BE NULL ????
169 struct netfs_io_subrequest *subreq, *next, *p;
170 struct iov_iter iter;
171 int ret;
172
173 trace_netfs_rreq(rreq, netfs_rreq_trace_copy);
174
175 /* We don't want terminating writes trying to wake us up whilst we're
176 * still going through the list.
177 */
178 atomic_inc(&rreq->nr_copy_ops);
179
180 list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) {
181 if (!test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) {
182 list_del_init(&subreq->rreq_link);
183 netfs_put_subrequest(subreq, false,
184 netfs_sreq_trace_put_no_copy);
185 }
186 }
187
188 list_for_each_entry(subreq, &rreq->subrequests, rreq_link) {
189 /* Amalgamate adjacent writes */
190 while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
191 next = list_next_entry(subreq, rreq_link);
192 if (next->start != subreq->start + subreq->len)
193 break;
194 subreq->len += next->len;
195 list_del_init(&next->rreq_link);
196 netfs_put_subrequest(next, false,
197 netfs_sreq_trace_put_merged);
198 }
199
200 ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len, <--- PRESUMABLY WE ARE HERE ???
201 rreq->i_size, true);
[58710.346376] BUG: kernel NULL pointer dereference, address: 0000000000000008
[58710.353508] #PF: supervisor read access in kernel mode
[58710.358775] #PF: error_code(0x0000) - not-present page
[58710.364041] PGD 0 P4D 0
[58710.366721] Oops: 0000 [#1] PREEMPT SMP NOPTI
[58710.371212] CPU: 12 PID: 9134 Comm: kworker/u129:0 Tainted: G
E 6.0.0-2.dneg.x86_64 #1
[58710.380647] Hardware name: Google Google Compute Engine/Google
Compute Engine, BIOS Google 07/22/2022
[58710.389995] Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
[58710.397188] RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
[58710.403767] Code: 00 00 0f 1f 44 00 00 41 57 49 89 d7 41 56 41 55
41 54 49 89 f4 55 44 89 c5 53 48 8b 47 08 48 89 fb 48 83 7f 10 00 4c
8b 70 20 <49> 8b 46 08 4c 8b 28 74 3f 49 8b 7d 50 e8 66 c6 0b cd 48 8b
73 10
[58710.422676] RSP: 0018:ffffb6e12728bde8 EFLAGS: 00010286
[58710.428031] RAX: ffff93649754b970 RBX: ffff9335925e38f0 RCX: 00000000030cb5e8
[58710.435303] RDX: ffff9335123653e0 RSI: ffff9335123653d8 RDI: ffff9335925e38f0
[58710.442567] RBP: 0000000000000001 R08: 0000000000000001 R09: 8080808080808080
[58710.449833] R10: ffff9335900444f4 R11: 000000000000000f R12: ffff9335123653d8
[58710.457109] R13: ffff9335925e38f0 R14: 0000000000000000 R15: ffff9335123653e0
[58710.464374] FS: 0000000000000000(0000) GS:ffff93733f900000(0000)
knlGS:0000000000000000
[58710.472593] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58710.478480] CR2: 0000000000000008 CR3: 0000004055e56003 CR4: 00000000003706e0
[58710.485767] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58710.493037] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[58710.500316] Call Trace:
[58710.502894] <TASK>
[58710.505126] netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
[58710.511201] process_one_work+0x217/0x3e0
[58710.515358] worker_thread+0x4a/0x3b0
[58710.519152] ? process_one_work+0x3e0/0x3e0
[58710.523467] kthread+0xd6/0x100
[58710.526740] ? kthread_complete_and_exit+0x20/0x20
[58710.531659] ret_from_fork+0x1f/0x30
[58710.535370] </TASK>
[58710.537702] Modules linked in: rpcsec_gss_krb5(E) tcp_diag(E)
inet_diag(E) nfsv3(E) nfs(E) cachefiles(E) fscache(E) netfs(E) ext4(E)
mbcache(E) jbd2(E) intel_rapl_msr(E) intel_rapl_common(E) sg(E)
nfit(E) libnvdimm(E) i2c_piix4(E) rapl(E) nfsd(E) auth_rpcgss(E)
sch_fq(E) nfs_acl(E) lockd(E) tcp_bbr(E) grace(E) binfmt_misc(E)
ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E)
crc64_rocksoft_generic(E) crc64_rocksoft(E) crc64(E) 8021q(E) garp(E)
mrp(E) stp(E) crct10dif_pclmul(E) llc(E) crc32_pclmul(E)
crc32c_intel(E) scsi_transport_iscsi(E) ghash_clmulni_intel(E)
serio_raw(E) virtio_scsi(E) gve(E) sunrpc(E) dm_mirror(E)
dm_region_hash(E) dm_log(E) dm_mod(E) fuse(E)
[58710.596756] CR2: 0000000000000008
[58710.600204] ---[ end trace 0000000000000000 ]---
[58710.604952] RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
[58710.611534] Code: 00 00 0f 1f 44 00 00 41 57 49 89 d7 41 56 41 55
41 54 49 89 f4 55 44 89 c5 53 48 8b 47 08 48 89 fb 48 83 7f 10 00 4c
8b 70 20 <49> 8b 46 08 4c 8b 28 74 3f 49 8b 7d 50 e8 66 c6 0b cd 48 8b
73 10
[58710.630429] RSP: 0018:ffffb6e12728bde8 EFLAGS: 00010286
[58710.635799] RAX: ffff93649754b970 RBX: ffff9335925e38f0 RCX: 00000000030cb5e8
[58710.643077] RDX: ffff9335123653e0 RSI: ffff9335123653d8 RDI: ffff9335925e38f0
[58710.650342] RBP: 0000000000000001 R08: 0000000000000001 R09: 8080808080808080
[58710.657607] R10: ffff9335900444f4 R11: 000000000000000f R12: ffff9335123653d8
[58710.664872] R13: ffff9335925e38f0 R14: 0000000000000000 R15: ffff9335123653e0
[58710.672142] FS: 0000000000000000(0000) GS:ffff93733f900000(0000)
knlGS:0000000000000000
[58710.680362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[58710.686238] CR2: 0000000000000008 CR3: 0000004055e56003 CR4: 00000000003706e0
[58710.693503] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[58710.700767] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[58710.708031] Kernel panic - not syncing: Fatal exception
[58710.714064] Kernel Offset: 0xc600000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[58710.724898] ---[ end Kernel panic - not syncing: Fatal exception ]---
I think I reported this one (or similar before) but to be honest, I
have lost track of whether I have only seen it in your branches or I
have seen it in the vanilla kernel too?
Certainly I have seen it many times over the last few months testing
various branches.
Daire
Dave Wysochanski (1):
Debug1 Daire crash in cachefiles_prepare_write
fs/cachefiles/io.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)
--
2.31.1
More information about the Linux-cachefs
mailing list