Fwd: Reporting some Kernel Panics

Sun Jul 10 20:48:04 UTC 2005

David Woodhouse wrote:

>On Fri, 2005-07-08 at 14:18 -0600, Timothy R. Chavez wrote:
>>These don't look familiar to me, but I think it'd be good to send them
>>out to everyone to take a look... 
>
>HTML-ised, word-wrapped, 'deepSkyBlue' oopses? Can I have some of what
>you lot are smoking? :)
>
>The first panic is similar to something else I've seen but not managed
>to make any progress with. In that case I was told the precise kernel
>and was able to determine that the oops in proc_get_inode was due to an
>invalid ->owner field in try_module_get().
>
>That one was sent to me as an OpenOffice document, but I'll do the world
>a favour and reproduce it here as text...
>
>Oops: 0000 [1] SMP inode=2 dev=fd:00 mode=040755 ouid=0 ogid=0 rdev=00:00
>CPU 7
>Modules linked in: michael_mic parport_pc lp parport netconsole netdump autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac md5 ipv6 ohci_hcd ehci_hcd tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aacraid sd_mod scsi_mod
>Pid: 10704, comm: mount_test Not tainted 2.6.9-11.EL.audit.74smp
>RIP: 0010:[<ffffffff801a6764>] <ffffffff801a6764>{proc_get_inode+199}
>RSP: 0018:0000010068eb9ce8 EFLAGS: 00010282
>RAX: 0000000000000007 RBX: 000001007cce6d70 RCX: 0000000000000000
>RDX: ffffffffa01a5180 RSI: 000001000314b478 RDI: ffffffff80476380
>RBP: 0000010068a1e060 R08: 0000010068eb9ca8 R09: 0000000000000000
>R10: 000001006d81c950 R11: 0000000000000058 R12: 000001007ff05538
>R13: 0000000000000000 R14: 00000000ffffffea R15: 0000010068eb9d98
>FS: 0000002a95583b00(0000) GS:ffffffff804c6f80(0000) knlGS:0000000000000000
>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: ffffffffa01a5180 CR3: 000000007fe9a000 CR4: 00000000000006e0
>Process mount_test (pid: 10704, threadinfo 0000010068eb8000, task 000001006c5be030)
>Stack: 000001007cce6d70 00000100679ed260 000001006963e600 ffffffff801a91da
>       fffffffffffffff4 00000100679ed260 000001006963e600 00000100679ed368
>       0000010068eb9e58 ffffffff80182060
>Call Trace: <ffffffff801a91da>{proc_lookup+246} 
>            <ffffffff80182060>{do_lookup+230}
>            <ffffffff80182c76>{link_path_walk+2508} 
>            <ffffffff801832a7>{path_lookup+451}
>            <ffffffff80183553>{__user_walk+47}
>            <ffffffff8017e037>{vfs_lstat+21} 
>            <ffffffff80154234>{audit_syscall_entry+306}
>            <ffffffff8017e369>{sys_newlstat+17}
>            <ffffffff801141e4>{syscall_trace_enter+161}
>            <ffffffff80110142>{tracesys+113}
>            <ffffffff801101a2>{tracesys+209}
>
>Code: 83 3a 02 74 32 89 c0 48 c1 e0 07 48 8d 04 02 ff 80 00 01 00
>
>RIP <ffffffff801a6764>{proc_get_inode+199} RSP <0000010068eb9ce8>
>CR2: ffffffffa01a5180
>
>In this case, the owner field of the proc directory in question is set
>to 0xffffffffa01a5180, when it _should_ have been a pointer to a valid
>'struct module'. We don't seem to have been given the faulting address
>in the panic you show, but I'm fairly sure it'll be the same thing. Can
>you tell me which /proc file was being accessed when this happened?
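The link between the bad pointer and the fault can be cross-checked mechanically. Below is a minimal sketch, assuming the register lines from the oops above are pasted in as text; the helper names `parse_registers` and `registers_matching_fault` are hypothetical and only for illustration. It reports which general-purpose register holds the same value as CR2, the faulting address.

```python
import re

# Register and CR2 lines copied from the oops quoted above.
OOPS = """\
RAX: 0000000000000007 RBX: 000001007cce6d70 RCX: 0000000000000000
RDX: ffffffffa01a5180 RSI: 000001000314b478 RDI: ffffffff80476380
RBP: 0000010068a1e060 R08: 0000010068eb9ca8 R09: 0000000000000000
R10: 000001006d81c950 R11: 0000000000000058 R12: 000001007ff05538
R13: 0000000000000000 R14: 00000000ffffffea R15: 0000010068eb9d98
CR2: ffffffffa01a5180
"""


def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in text."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def registers_matching_fault(text):
    """Return the sorted names of registers whose value equals CR2."""
    regs = parse_registers(text)
    fault = regs.pop("CR2")  # CR2 holds the address that faulted
    return sorted(name for name, value in regs.items() if value == fault)


print(registers_matching_fault(OOPS))
```

For this oops the only match is RDX, which fits the picture of the code dereferencing the bogus owner pointer directly.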
>
>This is happening _before_ the audit hooks in path_lookup() are reached;
>there shouldn't be anything happening in this particular code path which
>is audit-related. There's probably been some problem _beforehand_.
>
>Jeff, wasn't there a netdump in the x86_64 case above? 
>  
>
There is a netdump in the x86_64 case, and it is still available to be
looked at. I stepped away from the issue because I could not reproduce
it on any "official U2 kernel variant". I was able to reproduce it on
your version of audit.74, both with audit enabled and with audit
disabled, so I assumed that it was an issue in your build tree. I can
revisit that if you would like.

>The second and third oopses are basically the same as each other. I'm
>inclined to suspect that it's the call to fops_get() in dentry_open(),
>which is actually another call to try_module_get(). But in that case
>it's a _different_ pointer to a struct module, not one in a
>proc_dir_entry but one in a struct file_operations. Again, what file is
>being accessed? One in /proc, I'd imagine? 
>  
>
I can write a test that will try to read all files in /proc. It may or
may not provide any data.
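
A rough sketch of what such a test could look like, in Python. This is only an illustration, not the test that was actually run: the function name `read_tree`, the non-blocking open, the per-file read cap, and the decision to skip symlinks (so that reads do not drain pipes or ttys reachable through /proc/<pid>/fd) are all assumptions.

```python
import os


def read_tree(root="/proc", chunk=4096, max_bytes=65536):
    """Open and read every non-symlink file under root.

    Returns (ok, failed): files read without error, and files that
    could not be opened or read.
    """
    ok = failed = 0
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # assumption: do not chase links out of the tree
            try:
                # O_NONBLOCK so files that would otherwise wait for data
                # return an error instead of hanging the test.
                fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
            except OSError:
                failed += 1
                continue
            try:
                total = 0
                while total < max_bytes:  # cap very large files
                    data = os.read(fd, chunk)
                    if not data:
                        break
                    total += len(data)
                ok += 1
            except OSError:
                failed += 1
            finally:
                os.close(fd)
    return ok, failed
```

Calling `read_tree("/proc")` in a loop while the mount test runs would exercise the lookup and open paths on every entry. Whether that is enough to hit the oops is an open question.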

>How easy is it to reproduce these oopsen? Can it be done with audit
>disabled? Can it be done on the base U1 kernel?
>
>  
>