[Crash-utility] Broken vtop on kernel 2.6.20?

Wed May 30 16:06:51 UTC 2007

Alex Sidorenko wrote:
> Hi Dave,
> 
> when I try to use 'vtop' for process pages on 2.6.20 kernel (Ubuntu/Feisty) on 
> x86 architecture, I get error messages about page table. The easiest way to 
> reproduce is to run 'ps -a' on a live kernel:
> 
> 
> PID: 0      TASK: c03a2440  CPU: 0   COMMAND: "swapper"
> ps: no user stack
> 
> PID: 0      TASK: df838560  CPU: 1   COMMAND: "swapper"
> ps: no user stack
> 
> PID: 1      TASK: df838a90  CPU: 1   COMMAND: "init"
> ps: read error: physical address: 7f2f0000  type: "page table"
> 
> 
> Running crash with -d8:
> 
> PID: 1      TASK: df838a90  CPU: 1   COMMAND: "init"
> <readmem: df838a90, KVADDR, "fill_task_struct", 1328, (ROE|Q), 8d2eac0>
> <readmem: dfb71e40, KVADDR, "fill_mm_struct", 432, (ROE|Q), 8d8bf80>
>   GETBUF(128 -> 1)
>   FREEBUF(1)
>   GETBUF(128 -> 1)
>   FREEBUF(1)
>   GETBUF(128 -> 1)
>   FREEBUF(1)
>   GETBUF(128 -> 1)
>   FREEBUF(1)
> arg_start: bf991ecf arg_end: bf991ee1 (18)
> env_start: bf991ee1 env_end: bf991ff1 (272)
>   GETBUF(291 -> 1)
> <readmem: dfb6f000, KVADDR, "pgd page", 4096, (FOE), 843cf90>
> <readmem: dfb6f000, KVADDR, "pmd page", 4096, (FOE), 843cf90>
> <readmem: 7f2f0000, PHYSADDR, "page table", 4096, (FOE), 843efa0>
> ps: read error: physical address: 7f2f0000  type: "page table"
> 
> The same crash-4.0-4.1 works fine on live 2.6.15 kernel. Did the page table 
> layout change between 2.6.15 and 2.6.20 ?

Alex,

I think the mystery is solved.  I believe that if you do a "ps -a"
on your machine with the 2.6.20 kernel, you will find that *some*
of the user processes do show their arguments and environment data.

Testing this on a 2.6.21-based kernel, I see exactly that: "ps -a"
succeeds for some tasks and fails for others.  For example, taking
just the "mingetty" processes:

   crash> ps -a mingetty
   PID: 3125   TASK: f7e33870  CPU: 2   COMMAND: "mingetty"
   ps: cannot access user stack address: bfe76f4c

   PID: 3126   TASK: f6b08430  CPU: 0   COMMAND: "mingetty"
   ps: cannot access user stack address: bfac8f4c

   PID: 3127   TASK: f66b7730  CPU: 3   COMMAND: "mingetty"
   ps: cannot access user stack address: bfeeff4c

   PID: 3128   TASK: f693d170  CPU: 2   COMMAND: "mingetty"
   ps: cannot access user stack address: bfe45f4c

   PID: 3129   TASK: f6160630  CPU: 3   COMMAND: "mingetty"
   ps: cannot access user stack address: bfb72f4c

   PID: 3137   TASK: f68be0b0  CPU: 2   COMMAND: "mingetty"
   ARG: /sbin/mingetty tty6
   ENV: HOME=/
        TERM=linux
        SELINUX_INIT=YES
        PATH=/bin:/usr/bin:/sbin:/usr/sbin
        RUNLEVEL=3
        PREVLEVEL=N
        CONSOLE=/dev/console
        INIT_VERSION=sysvinit-2.86

   crash>

So the translation logic itself is sound: if user virtual address
translation works for one task, it works for all of them.  The likely
common factor between the 2.6.21-based kernel above and your 2.6.20
Ubuntu kernel is that both use /dev/mem for physical memory access,
and the i386 /dev/mem driver has always had a limitation there.

Here is the top of the read_mem() function in "drivers/char/mem.c":

   static ssize_t read_mem(struct file * file, char __user * buf,
                           size_t count, loff_t *ppos)
   {
           unsigned long p = *ppos;
           ssize_t read, sz;
           char *ptr;

           if (!valid_phys_addr_range(p, count))
                   return -EFAULT;

And since i386 does not have ARCH_HAS_VALID_PHYS_ADDR_RANGE
#define'd, it is bounded by the value of high_memory:

   #ifndef ARCH_HAS_VALID_PHYS_ADDR_RANGE
   static inline int valid_phys_addr_range(unsigned long addr, size_t count)
   {
           if (addr + count > __pa(high_memory))
                   return 0;

           return 1;
   }
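
To make the bound concrete, here is a minimal user-space model of that
check.  It is only a sketch: HIGH_MEMORY_PHYS and the function name are
hypothetical, and the 896MB value assumes a typical i386 box with at
least that much RAM, where __pa(high_memory) works out to 0x38000000.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: physical address corresponding to high_memory on an
 * i386 system with 896MB or more of RAM.  Not a kernel symbol. */
#define HIGH_MEMORY_PHYS 0x38000000ULL

/* User-space model of the default valid_phys_addr_range() quoted
 * above: a read is allowed only if it ends at or below the bound. */
int model_valid_phys_addr_range(uint64_t addr, size_t count)
{
        if (addr + count > HIGH_MEMORY_PHYS)
                return 0;

        return 1;
}
```

Under that assumption, a 4096-byte read at 37fff000 ends exactly at
the bound and passes, while any read starting at 38000000 or at
7f2f0000 (roughly 2034MB) is rejected.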

So if your system has more than 896MB installed (0x38000000), any
memory at or above that address is not accessible through /dev/mem.
My test box has 1GB (0x40000000) installed:

   crash> rd -p 38000000
   rd: read error: physical address: 38000000  type: "32-bit PHYSADDR"
   crash> rd -p 37fff000
   37fff000:  f7ffdcc8
   crash>
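
The same boundary can be probed without crash.  The following is a
rough sketch, not how crash itself reads memory: probe_phys_read() is
a hypothetical helper that opens a memory device and attempts one
pread() at a given physical offset, returning 0 on success or the
errno value on failure.  It needs root to open /dev/mem.

```c
#define _FILE_OFFSET_BITS 64
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to read len bytes at physical offset
 * paddr from the device at path.  Returns 0 on a full read, the
 * errno value on failure, or EIO on a short read. */
int probe_phys_read(const char *path, uint64_t paddr, void *buf, size_t len)
{
        int fd = open(path, O_RDONLY);
        int err = 0;
        ssize_t n;

        if (fd < 0)
                return errno;

        n = pread(fd, buf, len, (off_t)paddr);
        if (n < 0)
                err = errno;            /* save before close() */
        else if ((size_t)n != len)
                err = EIO;              /* short read */

        close(fd);
        return err;
}
```

Given the read_mem() code above, a call such as
probe_phys_read("/dev/mem", 0x38000000, buf, 4) on a kernel like these
would be expected to come back with EFAULT, while the same call at
0x37fff000 should succeed.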

User virtual address pages, and usually their page tables, are
preferentially allocated from highmem.  So if your system has memory
above 896MB, the user page tables will be allocated from highmem if
CONFIG_HIGHPTE is configured:

   struct page *pte_alloc_one(struct mm_struct *mm, unsigned long address)
   {
           struct page *pte;

   #ifdef CONFIG_HIGHPTE
           pte = alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT|__GFP_ZERO,0);
   #else
           pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0);
   #endif
           return pte;
   }

And even if the page table still comes from low memory, the actual user
data pages will always be biased to use __GFP_HIGHMEM.

I should have recognized this by your error message:

   ps: read error: physical address: 7f2f0000  type: "page table"

where physical address 7f2f0000 was way up there...

I forgot about this anomaly because Red Hat RHEL kernels outlaw the use of
/dev/mem for anything above the first 256 pages of physical memory (for security
purposes).  And for that reason, RHEL kernels contain the /dev/crash driver
for live memory access, which is unrestricted.

Dave
