[Linux-cachefs] Pages still marked with private_2
Milosz Tanski
milosz at adfin.com
Thu Aug 8 16:44:26 UTC 2013
David,
I retried with your fixes and my newer Ceph implementation. I still
see the same issue: a page still marked private_2 when the readahead
cleanup code frees it. I understand what happens, but not why it
happens.
On the plus side I haven't seen any hard crashes yet, though I'm
still putting it through the paces. I'm not sure whether it was my
rework of the fscache code in Ceph or your wait_on_atomic_t fix, but
I'm fine sharing the blame / success here.
[48532035.686695] BUG: Bad page state in process petabucket pfn:3b5ffb
[48532035.686715] page:ffffea000ed7fec0 count:0 mapcount:0 mapping:          (null) index:0x2c
[48532035.686720] page flags: 0x200000000001000(private_2)
[48532035.686724] Modules linked in: ceph libceph cachefiles
auth_rpcgss oid_registry nfsv4 microcode nfs fscache lockd sunrpc
raid10 raid456 async_pq async_xor async_memcpy async_raid6_recov
async_tx raid1 raid0 multipath linear btrfs raid6_pq lzo_compress xor
zlib_deflate libcrc32c
[48532035.686735] CPU: 1 PID: 32420 Comm: petabucket Tainted: G B
3.10.0-virtual #45
[48532035.686736] 0000000000000001 ffff88042bf57a48 ffffffff815523f2
ffff88042bf57a68
[48532035.686738] ffffffff8111def7 ffff880400000001 ffffea000ed7fec0
ffff88042bf57aa8
[48532035.686740] ffffffff8111e49e 0000000000000000 ffffea000ed7fec0
0200000000001000
[48532035.686742] Call Trace:
[48532035.686745] [<ffffffff815523f2>] dump_stack+0x19/0x1b
[48532035.686747] [<ffffffff8111def7>] bad_page+0xc7/0x120
[48532035.686749] [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[48532035.686751] [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[48532035.686753] [<ffffffff81123507>] __put_single_page+0x27/0x30
[48532035.686755] [<ffffffff81123df5>] put_page+0x25/0x40
[48532035.686757] [<ffffffff81123e66>] put_pages_list+0x56/0x70
[48532035.686759] [<ffffffff81122a98>] __do_page_cache_readahead+0x1b8/0x260
[48532035.686762] [<ffffffff81122ea1>] ra_submit+0x21/0x30
[48532035.686835] [<ffffffff81118f64>] filemap_fault+0x254/0x490
[48532035.686838] [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[48532035.686840] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[48532035.686842] [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[48532035.686845] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[48532035.686847] [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[48532035.686849] [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[48532035.686851] [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[48532035.686853] [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
[48532035.686870] [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[48532035.686872] [<ffffffff81003e03>] ? xen_write_msr_safe+0xa3/0xc0
[48532035.686874] [<ffffffff81004ec2>] ? xen_mc_flush+0xb2/0x1c0
[48532035.686876] [<ffffffff8100483d>] ? xen_clts+0x8d/0x190
[48532035.686878] [<ffffffff81556ad6>] ? __schedule+0x3a6/0x820
[48532035.686880] [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[48532035.686882] [<ffffffff81558818>] page_fault+0x28/0x30
- Milosz
On Thu, Jul 25, 2013 at 11:20 AM, David Howells <dhowells at redhat.com> wrote:
> Milosz Tanski <milosz at adfin.com> wrote:
>
>> In my case I'm seeing this in cases when all user space have these
>> opened R/O. Like I wrote this out weeks ago, rebooted... so nobody is
>> using R/W.
>
> I gave Linus a patch to fix wait_on_atomic_t() which he has committed. Can
> you see if that fixed the problem? I'm not sure it will, but it's worth
> checking.
>
> David