[Crash-utility] [PATCH] add a new command: ipcs

Dave Anderson anderson at redhat.com
Wed Apr 11 15:56:24 UTC 2012



----- Original Message -----
> At 2012/4/11 22:50, Dave Anderson Wrote:
> >
> >
> > ----- Original Message -----
> >> Hello Dave,
> >>
> >> I cannot get all kernels at hand. So I have to ask you about the code.
> >> Please show me.
> >
> > Why not?  Just download the upstream kernels from here:
> >
> >    http://www.kernel.org/pub/linux/kernel/v2.6/
> >
> >>>
> >>> (a) On these kernel versions:
> >>>
> >>>       2.6.9-89.ELxenU
> >>>       2.6.15-1.2054_FC5
> >>>       2.6.16.33-xen
> >>>       2.6.18-1.2714.el5xen
> >>>       2.6.18-36.el5xen
> >>>       2.6.18-58.el5xen
> >>>       2.6.18-152.el5xen
> >>>       2.6.31 uniprocessor kernel
> >>>
> >>>       the command fails immediately with this error:
> >>>
> >>>         ipcs: cannot resolve "hugetlbfs_file_operations"
> >>>
> >>>
> >>> (b) On *all* RHEL5 2.6.18-era kernels, the message queue display
> >>>       always fails like this:
> >>>
> >>>       ------ Message Queues --------
> >>>       KEY        MSQID      UID        PERMS      USED-BYTES   MESSAGES
> >>>       ipcs: invalid structure member offset: kern_ipc_perm_id
> >>>             FILE: ipcs.c  LINE: 899  FUNCTION: get_msg_info()
> >>
> >> I want to see struct msg_queue and struct kern_ipc_perm.
> >
> > Here is the output from a RHEL5 kernel:
> >
> >   crash>  msg_queue
> >   struct msg_queue {
> >       struct kern_ipc_perm q_perm;
> >       int q_id;
> >       time_t q_stime;
> >       time_t q_rtime;
> >       time_t q_ctime;
> >       long unsigned int q_cbytes;
> >       long unsigned int q_qnum;
> >       long unsigned int q_qbytes;
> >       pid_t q_lspid;
> >       pid_t q_lrpid;
> >       struct list_head q_messages;
> >       struct list_head q_receivers;
> >       struct list_head q_senders;
> >   }
> >   SIZE: 160
> >   crash>  kern_ipc_perm
> >   struct kern_ipc_perm {
> >       spinlock_t lock;
> >       int deleted;
> >       key_t key;
> >       uid_t uid;
> >       gid_t gid;
> >       uid_t cuid;
> >       gid_t cgid;
> >       mode_t mode;
> >       long unsigned int seq;
> >       void *security;
> >   }
> >   SIZE: 48
> >   crash>
> >
> > which is the same as the upstream 2.6.18 kernel.
> 
> Ahh, I know the reason now: msg_queue_q_id is not initialized!
> 
> >
> >>>
> >>> (c) On this 2.6.36-0.16.rc3.git0.fc15 Fedora kernel, it shows:
> >>>
> >>>       ------ Shared Memory Segments ------
> >>>       KEY        SHMID      UID        PERMS      BYTES      NATTCH     STATUS
> >>>       ipcs: invalid kernel virtual address: 10  type: "nsproxy.ipc_ns"
> >>
> >> What does struct nsproxy look like? Or is there any symbol referring
> >> to ipc_ns?
> >
> >   crash>  nsproxy
> >   struct nsproxy {
> >       atomic_t count;
> >       struct uts_namespace *uts_ns;
> >       struct ipc_namespace *ipc_ns;
> >       struct mnt_namespace *mnt_ns;
> >       struct pid_namespace *pid_ns;
> >       struct net *net_ns;
> >   }
> >   SIZE: 48
> >   crash>
> >
> > It's the same as upstream 2.6.36, but it's not the offset that's invalid,
> > it's the NULL "nsproxy" address.
> 
> I am surprised that nsproxy is NULL.
> 
> Each user task belongs to a namespace, so current_task.nsproxy should not
> be NULL. I guess the current task may be a kernel thread in your test.
> 
> Thanks
> Wen Congyang

Actually, even kernel threads have a valid task->nsproxy setting.

But checking into this a bit further, it's not a kernel thread,
but an exiting thread.  Note the invocation-time warning that
the active (panic) task is no longer in the PID hash:
 
 $ crash vmcore.2.6.36-0.16.rc3.git0.fc15.x86_64 vmlinux-2.6.36-0.16.rc3.git0.fc15.x86_64.gz
 
 crash 6.0.6rc5
 Copyright (C) 2002-2012  Red Hat, Inc.
 Copyright (C) 2004, 2005, 2006  IBM Corporation
 Copyright (C) 1999-2006  Hewlett-Packard Co
 Copyright (C) 2005, 2006  Fujitsu Limited
 Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
 Copyright (C) 2005  NEC Corporation
 Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
 Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
 This program is free software, covered by the GNU General Public License,
 and you are welcome to change it and/or distribute copies of it under
 certain conditions.  Enter "help copying" to see the conditions.
 This program has absolutely no warranty.  Enter "help warranty" for details.
  
 GNU gdb (GDB) 7.3.1                                                                                      
 Copyright (C) 2011 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
 and "show warranty" for details.
 This GDB was configured as "x86_64-unknown-linux-gnu"...
 
 please wait... (determining panic task)         
 WARNING: active task ffff88001d190000 on cpu 0 not found in PID hash
 
       KERNEL: vmlinux-2.6.36-0.16.rc3.git0.fc15.x86_64.gz
     DUMPFILE: vmcore.2.6.36-0.16.rc3.git0.fc15.x86_64
         CPUS: 1
         DATE: Fri Sep 24 20:46:58 2010
       UPTIME: 00:27:55
 LOAD AVERAGE: 1.53, 1.80, 1.56
        TASKS: 118
     NODENAME: dyna0.home.front
      RELEASE: 2.6.36-0.16.rc3.git0.fc15.x86_64
      VERSION: #1 SMP Fri Sep 3 16:00:27 UTC 2010
      MACHINE: x86_64  (1600 Mhz)
       MEMORY: 510.7 MB
        PANIC: ""
          PID: 7124
      COMMAND: "hardlink"
         TASK: ffff88001d190000  [THREAD_INFO: ffff88001b17a000]
          CPU: 0
        STATE: EXIT_DEAD (PANIC)
 
 crash>

Note that the "ipcs" command uses the current task, whose task_struct
address is ffff88001d190000 in this particular case, so the
task_struct.nsproxy member is located at ffff88001d1905f0, and it
contains NULL:

 crash> task -R nsproxy
 PID: 7124   TASK: ffff88001d190000  CPU: 0   COMMAND: "hardlink"
   nsproxy = 0x0,
 crash>

That NULL pointer results in the error:

 crash> set debug 4
 debug: 4
      text hit rate: 62% (3143 of 5040)
 crash> ipcs
 ------ Shared Memory Segments ------
 KEY        SHMID      UID        PERMS      BYTES      NATTCH     STATUS      
 <readmem: ffff88001d1905f0, KVADDR, "task_struct.nsproxy", 8, (FOE), 7fffaf719e98>
 <read_kdump: addr: ffff88001d1905f0 paddr: 1d1905f0 cnt: 8>
 <readmem: 10, KVADDR, "nsproxy.ipc_ns", 8, (FOE), 7fffaf719e90>
 ipcs: invalid kernel virtual address: 10  type: "nsproxy.ipc_ns"
      text hit rate: 62% (3143 of 5040)
 crash> 
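
To spell out where the bogus address "10" comes from, here is a small
sketch of the arithmetic.  The struct below is only a stand-in for the
64-bit struct nsproxy layout shown earlier (pointers are modelled as
uint64_t and the alignment padding is written out explicitly), and the
names nsproxy_layout, TASK_NSPROXY_OFFSET, nsproxy_field_addr() and
ipc_ns_field_addr() are made up for illustration; none of this is
crash code.  The 0x5f0 offset is simply the difference between the two
addresses in the readmem trace above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the 64-bit struct nsproxy layout (SIZE: 48). */
struct nsproxy_layout {
        int32_t  count;     /* atomic_t */
        int32_t  pad;       /* padding the compiler adds before the pointers */
        uint64_t uts_ns;
        uint64_t ipc_ns;
        uint64_t mnt_ns;
        uint64_t pid_ns;
        uint64_t net_ns;
};

static_assert(sizeof(struct nsproxy_layout) == 48, "matches SIZE: 48");

/* From the trace: ffff88001d1905f0 - ffff88001d190000 (this dump only). */
#define TASK_NSPROXY_OFFSET 0x5f0ULL

/* Address that is read to fetch task_struct.nsproxy. */
uint64_t nsproxy_field_addr(uint64_t task)
{
        return task + TASK_NSPROXY_OFFSET;
}

/* Address that is read to fetch nsproxy.ipc_ns, given the nsproxy value. */
uint64_t ipc_ns_field_addr(uint64_t nsproxy)
{
        return nsproxy + offsetof(struct nsproxy_layout, ipc_ns);
}
```

With a task address of ffff88001d190000 the first helper yields
ffff88001d1905f0, and with the NULL nsproxy read from there the second
yields 0x10, which is the address the second readmem() faults on.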

The "ipcs" code may have to do something similar to what the "mount"
command does here in cmd_mount():

        /* find a context */
        pid = 1;
        while ((namespace_context = pid_to_context(pid)) == NULL)
                pid++;

where namespace_context is used later in get_mount_list():

        } else if (VALID_MEMBER(task_struct_nsproxy)) {
                tc = namespace_context;

                readmem(tc->task + OFFSET(task_struct_nsproxy), KVADDR,
                        &nsproxy, sizeof(void *), "task nsproxy",
                        FAULT_ON_ERROR);
                if (!readmem(nsproxy + OFFSET(nsproxy_mnt_ns), KVADDR,
                        &mnt_ns, sizeof(void *), "nsproxy mnt_ns",
                        RETURN_ON_ERROR|QUIET))
                        error(FATAL, "cannot determine mount list location!\n");
                if (!readmem(mnt_ns + OFFSET(mnt_namespace_root), KVADDR,
                        &root, sizeof(void *), "mnt_namespace root",
                        RETURN_ON_ERROR|QUIET))
                        error(FATAL, "cannot determine mount list location!\n");

Usually pid 1 would suffice, but as I recall, Bob Montgomery ran into
a vmcore where pid 1 wasn't found in the PID hash, so we added this
loop to keep looking until it finds one.
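
For "ipcs", a rough sketch of the same idea might look like the
following.  This is a self-contained mock, not crash code: struct
task_ctx, lookup_pid(), find_namespace_context() and the max_pid bound
are all hypothetical names, with lookup_pid() standing in for
pid_to_context() and the nsproxy field standing in for a readmem() of
task_struct.nsproxy.  Skipping tasks with a NULL nsproxy and giving up
after max_pid are my own assumptions on top of what cmd_mount() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a crash task context. */
struct task_ctx {
        unsigned long pid;
        unsigned long task;      /* task_struct address */
        unsigned long nsproxy;   /* task_struct.nsproxy value, 0 if NULL */
};

/* Mock of a PID lookup: returns NULL if the pid is not in the table. */
const struct task_ctx *lookup_pid(const struct task_ctx *tasks, size_t n,
                                  unsigned long pid)
{
        for (size_t i = 0; i < n; i++)
                if (tasks[i].pid == pid)
                        return &tasks[i];
        return NULL;
}

/*
 * Starting at pid 1, return the first task that both exists and has a
 * non-NULL nsproxy, or NULL if none is found up to max_pid.
 */
const struct task_ctx *find_namespace_context(const struct task_ctx *tasks,
                                              size_t n, unsigned long max_pid)
{
        assert(tasks != NULL || n == 0);

        for (unsigned long pid = 1; pid <= max_pid; pid++) {
                const struct task_ctx *tc = lookup_pid(tasks, n, pid);

                if (tc != NULL && tc->nsproxy != 0)
                        return tc;
        }
        return NULL;
}
```

The caller would then read nsproxy.ipc_ns from the returned task rather
than from the current task, and report an error if NULL comes back.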

Dave



