[PATCH] audit: grab a reference to context->pwd when it's cached

Mon Oct 8 18:48:16 UTC 2012

On 10/6/12 10:22 PM, Peter Moody wrote:
> On Fri, Oct 5, 2012 at 1:23 PM, Eric Sandeen <sandeen at redhat.com> wrote:
>> On 10/5/12 10:57 AM, Peter Moody wrote:
>>> On Fri, Oct 5, 2012 at 8:18 AM, Peter Moody <pmoody at google.com> wrote:
>>>> On Fri, Oct 5, 2012 at 8:16 AM, Peter Moody <pmoody at google.com> wrote:
>>>>> On Fri, Oct 5, 2012 at 7:26 AM, Jeff Layton <jlayton at redhat.com> wrote:
>>>>>> On Fri, 5 Oct 2012 06:57:59 -0700
>>>>>> Peter Moody <pmoody at google.com> wrote:
>>>>>>
>>>>>>> On Fri, Oct 5, 2012 at 5:55 AM, Jeff Layton <jlayton at redhat.com> wrote:
>>>>>>>> On Thu, 4 Oct 2012 11:48:23 -0700
>>>>>>>> Peter Moody <pmoody at google.com> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Sep 26, 2012 at 6:50 AM, Alexander Viro <aviro at redhat.com> wrote:
>>>>>>>>>> On Tue, Sep 25, 2012 at 10:03:23AM -0700, Peter Moody wrote:
>>>>>>>>>>> Hey folks,
>>>>>>>>>>>
>>>>>>>>>>> following up on old patches, are there any comments on this? Did you
>>>>>>>>>>> get around to finding a better way to fix this bug, Al?
>>>>>>>>>>
>>>>>>>>>> Alas, I've found none ;-/  Looks like we'll have to go with this one,
>>>>>>>>>> at least until somebody comes up with better solution.
>>>>>>>>>
>>>>>>>>> Not surprisingly, this patch doesn't actually fix the issue (or at
>>>>>>>>> least doesn't do it correctly).
>>>>>>>>>
>>>>>>>>> I hadn't noticed that get_fs_pwd() actually calls path_get() on
>>>>>>>>> &context->pwd so the additional path_get() is useless and the
>>>>>>>>> reference doesn't ever actually get freed if audit_putname is called
>>>>>>>>> while we're in a syscall.
>>>>>>>>>
>>>>>>>>> Al, Eric, Jeff; do any of you guys have an understanding of what the
>>>>>>>>> initial bug actually is since this clearly doesn't fix it?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> peter
>>>>>>>>>
>>>>>>>>
>>>>>>>> BTW, I ran this test on one of my KVM guests and it ran just fine. That
>>>>>>>> one is an x86_64 guest running a 3.6.0+ kernel. The root fs on there is
>>>>>>>> ext4 though, not ext3. So perhaps that's a factor?
>>>>>>>>
>>>>>>>> The oops message you posted at least looks like something down in the
>>>>>>>> bowels of ext3 or fs/buffer.c.
>>>>>>>
>>>>>>> Yeah, the only place this actually happens for me on these giant xen
>>>>>>> instances we have (6 cores, 32G ram) and it happens on both ext3 and
>>>>>>> ext4 filesystems and it happens with 100% reliability.
>>>>>>>
>>>>>>> The actual oops is from:
>>>>>>>
>>>>>>> static inline void check_irqs_on(void)
>>>>>>> {
>>>>>>> #ifdef irqs_disabled
>>>>>>>         BUG_ON(irqs_disabled());
>>>>>>> #endif
>>>>>>> }
>>>>>>>
>>>>>>> with the code path looking like:
>>>>>>>
>>>>>>> __find_get_block() -> lookup_bh_lru() -> check_irqs_on() -> BUG()
>>>>>>>
>>>>>>
>>>>>> Do you have a backtrace from a more recent kernel? I wonder if
>>>>>> something in the syscall exit codepath is disabling IRQs here?
>>>>>
>>>>> is 3.6.0-rc1 recent enough or do you want something newer?
>>>>
>>>> nevermind, that doesn't boot. One sec.
>>>
>>>
>>> here's 3.5.0
>>>
>>>  ------------[ cut here ]------------
>>> kernel BUG at fs/buffer.c:1220!
>>> invalid opcode: 0000 [#1] SMP
>>> CPU 0
>>> Pid: 3683, comm: a.out Not tainted 3.5.0 #3
>>> RIP: e030:[<ffffffff816a99f4>]  [<ffffffff816a99f4>]
>>> check_irqs_on.part.8+0x4/0x6
>>> RSP: e02b:ffff8807b156dc28  EFLAGS: 00010046
>>> RAX: ffff8807d0dd0000 RBX: ffff8807a7d6df28 RCX: 0000000005883396
>>> RDX: 0000000000001000 RSI: 0000000005883396 RDI: ffff8807cfc0c000
>>> RBP: ffff8807b156dc28 R08: 0000000000000001 R09: ffff8807a7d6de50
>>> R10: f83a2b0a359bf007 R11: 0000000000000000 R12: ffff8807a7d6de54
>>> R13: ffff8807a7d6de80 R14: ffff8807cfc1f120 R15: 0000000005883396
>>> FS:  00007f97164ec700(0000) GS:ffff8807ffc00000(0063) knlGS:0000000000000000
>>> CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
>>> CR2: 00000000f76ca3b0 CR3: 00000007bbb53000 CR4: 0000000000002660
>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>> Process a.out (pid: 3683, threadinfo ffff8807b156c000, task ffff8807bbae8000)
>>
>> One thing that crossed my mind is that back in the 4k stacks days I
>> think sometimes a blown stack could corrupt threadinfo and we'd get
>> spurious warnings like this.
>>
>> Is there any chance something like that has happened? (any stack
>> depth messages, etc?)
> 
> I do'nt see any stack depth messages in there, no.
> 
>> Maybe a crashdump on the BUG() for further poking around would
>> be in order.
> 
> when I add a painc() there, this is what I get. I'm not seeing
> anything about stack depth, Is there something obvious I'm missing?

Not missing anything really; it's a long shot I suppose anyway.

It's possible that there was some deep stack excursion that corrupted
threadinfo, and then we went down the path below and the first bad news
we received was when we tested whether we were in an interrupt.

Hm, your other email said that you only started getting the in_interrupt()
warnings AFTER the BUG_ON()?  That makes me think I might be right after
all:

> I booted a few kernels with the WARN_ON successively further
> down the stack in audit land and it only started firing after BUG_ON()
> in fs/buffer.c

On the other hand, there is a "canary" at the top of the stack which
would usually get overwritten, and a BUG_ON() or panic() should print
that fact, something like "Thread overran stack, or stack corrupted."

What if you build with config STACK_TRACER on, and:

# mount -t debugfs none /sys/kernel/debug
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# <run your test w/o the panic, get the BUG_ON>
# cat /sys/kernel/debug/tracing/stack_trace

If CONFIG_DEBUG_STACK_USAGE isn't on, that might help too.

I could be way off here but might be worth a test.

-Eric

> Kernel panic - not syncing: irqs disabled
> Pid: 23011, comm: a.out Tainted: G        W    3.2.5-at18-ganetixenu #2
> Call Trace:
>  [<ffffffff81692b16>] panic+0x8c/0x1a3
>  [<ffffffff8100926d>] ? xen_force_evtchn_callback+0xd/0x10
>  [<ffffffff816960d6>] check_irqs_on.part.13+0x25/0x25
>  [<ffffffff81161dc9>] __find_get_block+0x1f9/0x200
>  [<ffffffff813c7a2f>] ? number.isra.2+0x31f/0x350
>  [<ffffffff811c9d85>] ext3_clear_blocks+0x75/0x140
>  [<ffffffff811c9f5c>] ext3_free_data+0x10c/0x150
>  [<ffffffff811da961>] ? ext3_journal_start_sb+0x31/0x60
>  [<ffffffff811ca635>] ext3_truncate+0x4a5/0x600
>  [<ffffffff81232f38>] ? journal_start+0xb8/0x100
>  [<ffffffff811ccf48>] ext3_evict_inode+0x228/0x2a0
>  [<ffffffff8114bfb1>] evict+0xa1/0x1a0
>  [<ffffffff8114cc61>] iput+0x101/0x210
>  [<ffffffff81148040>] d_kill+0xf0/0x130
>  [<ffffffff81148bd2>] dput+0xd2/0x1b0
>  [<ffffffff8113eb85>] path_put+0x15/0x30
>  [<ffffffff8169348f>] audit_free_names+0xbc/0xdb
>  [<ffffffff810ac629>] audit_syscall_exit+0x139/0x1e0
>  [<ffffffff8169fdaa>] sysexit_audit+0x21/0x5f
> 
>> -Eric
>>
>>> Stack:
>>>  ffff8807b156dc98 ffffffff8116a099 ffff8807b59f3000 ffff8807b156dd30
>>>  ffff8807b156dd60 ffff8807b156de78 ffff8807b156dc78 ffffffff816af231
>>>  ffff8807b156dcd8 ffff8807a7d6e538 ffff8807a7d6df28 ffff8807a7d6de54
>>> Call Trace:
>>>  [<ffffffff8116a099>] __find_get_block+0x1f9/0x200
>>>  [<ffffffff816af231>] ? down_read+0x11/0x30
>>>  [<ffffffff811d1405>] ext3_clear_blocks+0x75/0x140
>>>  [<ffffffff811d15dc>] ext3_free_data+0x10c/0x150
>>>  [<ffffffff811e2061>] ? ext3_journal_start_sb+0x31/0x60
>>>  [<ffffffff811d1cb5>] ext3_truncate+0x4a5/0x600
>>>  [<ffffffff8123d5b8>] ? journal_start+0xb8/0x100
>>>  [<ffffffff8106f406>] ? bit_waitqueue+0x16/0xc0
>>>  [<ffffffff811d4598>] ext3_evict_inode+0x248/0x2c0
>>>  [<ffffffff81153b9a>] evict+0xaa/0x1b0
>>>  [<ffffffff81154843>] iput+0x103/0x210
>>>  [<ffffffff8114fc88>] dentry_iput+0x88/0xd0
>>>  [<ffffffff811505ec>] dput+0x12c/0x250
>>>  [<ffffffff81146275>] path_put+0x15/0x30
>>>  [<ffffffff810b2f35>] __audit_syscall_exit+0x2e5/0x460
>>>  [<ffffffff816b30be>] sysexit_audit+0x29/0x5b
>>> Code: 04 00 00 4c 8d 88 c0 02 00 00 31 c0 e8 5f da ff ff 48 85 db 74
>>> 0c 80 43 5c 01 48 89 df e8 d5
>>> 6a aa ff 5b 41 5c 5d c3 55 48 89 e5 <0f> 0b 55 48 89 e5 0f 0b 55 48 89
>>> e5 0f 0b 55 48 89 e5 41 54 53
>>> RIP  [<ffffffff816a99f4>] check_irqs_on.part.8+0x4/0x6
>>>  RSP <ffff8807b156dc28>
>>> ---[ end trace 8d09f8cfbb601c14 ]---
>>>
>>>