[Crash-utility] Fwd: s390x fixes

Dave Anderson anderson at redhat.com
Tue May 1 19:24:17 UTC 2012



Hi Michael,

With respect to the 3rd "vm -p" bug, I did some cursory debugging, and
here's what I found.

In all cases, the readmem() failure occurs in _kl_pg_table_deref_s390x()
as a result of transitioning from one page of PTEs to the next, because
the pointer to the "next" page of PTEs contains 0x20, which looks to be
_SEGMENT_ENTRY_INV or _REGION_ENTRY_INV? (not sure of the s390x nomenclature...)

So you'll see something like this in the page table that points
to the pages of PTEs:
         
         ...
         c6386e0:  0000000000000020 0000000000000020   ....... ....... 
         c6386f0:  000000001608c800 0000000000000020   ............... 
         c638700:  0000000000000020 0000000000000020   ....... ....... 
         ...

The vaddrs in the page of PTEs pointed to by c6386f0 (at 000000001608c800)
all resolve as expected, but when the virtual address moves on to the next
entry at c6386f8, s390x_vtop() reads the 0x20 and passes it to
_kl_pg_table_deref_s390x().  The user vaddr(s) that resolve to that next
page of PTEs are legitimate, given that they are in the virtual region
defined by the vm_area_struct, but they may well not be mapped.
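
The step from c6386f0 to c6386f8 can be reproduced from the failing
address in the forwarded message below (8df00000).  This is only a minimal
sketch: it assumes the usual s390x layout of 2048 eight-byte segment-table
entries selected by vaddr bits 20-30 and 256 eight-byte PTEs selected by
bits 12-19, and it assumes the table dumped above starts at c638000.  The
helper names are hypothetical and are not taken from s390x.c:

```c
#include <stdint.h>

/* Hypothetical helper: byte offset of the segment-table entry for vaddr.
 * Assumes 2048 entries of 8 bytes, indexed by vaddr bits 20..30. */
static uint64_t segment_entry_offset(uint64_t vaddr)
{
        return ((vaddr >> 20) & 0x7ffULL) * 8;
}

/* Hypothetical helper: byte offset of the PTE within its page of PTEs.
 * Assumes 256 entries of 8 bytes, indexed by vaddr bits 12..19. */
static uint64_t pte_offset(uint64_t vaddr)
{
        return ((vaddr >> 12) & 0xffULL) * 8;
}
```

Under those assumptions, 8deff000 gives segment offset 0x6f0 and PTE offset
0x7f8 (the last PTE in the page at 1608c800), while 8df00000 gives segment
offset 0x6f8 and PTE offset 0, which is the first PTE of the page that the
0x20 entry fails to point to.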

Anyway, it seems that there should be something in s390x_vtop() that
catches the invalid entry prior to calling _kl_pg_table_deref_s390x()
and returns FALSE at that point.

So if I make this kludge:

        ...

        /* Check if this is a large page. */
        if (entry & 0x400ULL) {
                /* Add the 1MB page offset and return the final value. */
                *phys_addr = table + (vaddr & 0xfffffULL);
                return TRUE;
        }

======> if (entry == 0x20) return FALSE;

        /* Get the page table entry */
        entry = _kl_pg_table_deref_s390x(vaddr, entry & ~0x7ffULL);
        if (!entry)
                return FALSE;

        /* Isolate the page origin from the page table entry. */
        paddr = entry & ~0xfffULL;

        /* Add the page offset and return the final value. */
        *phys_addr = paddr + (vaddr & 0xfffULL);

        return TRUE;
}

then everything seems to work OK.  
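
A slightly more general form of the kludge would test the invalid bit
rather than compare the whole entry with 0x20.  This is only a sketch: the
macro name is hypothetical, the value 0x20 is taken from the dump above,
and whether other bits can accompany it in an invalid s390x entry is an
assumption that would need confirming.  It also shows why the unchecked
entry ends up as a read near address 0:

```c
#include <stdint.h>

/* Hypothetical name; 0x20 is the value seen in the table dump. */
#define ENTRY_INVALID_BIT 0x20ULL

/* Non-zero if the entry has the (assumed) invalid bit set. */
static int entry_is_invalid(uint64_t entry)
{
        return (entry & ENTRY_INVALID_BIT) != 0;
}

/* Table origin as computed in s390x_vtop(): entry & ~0x7ffULL. */
static uint64_t page_table_origin(uint64_t entry)
{
        return entry & ~0x7ffULL;
}
```

page_table_origin(0x20) is 0, so without the check the page of PTEs is
taken to start at kernel virtual address 0, which is consistent with the
"kernel virtual address: 0" in the read error.  For the valid entry
1608c800 the origin is unchanged.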

So unless the calculation of the next page of PTEs is incorrect, which 
seems unlikely, it seems that the 0x20 is legitimate, and should be
recognized?  What do you think?  

Dave


----- Original Message -----
> 
> Mistakenly cc'd to "crash-utility-owner at redhat.com" instead of this
> list...
> 
> ----- Forwarded Message -----
> From: "Dave Anderson" <anderson at redhat.com>
> To: "Michael Holzheu" <holzheu at linux.vnet.ibm.com>
> Cc: crash-utility-owner at redhat.com
> Sent: Monday, April 30, 2012 4:53:46 PM
> Subject: s390x fixes
> 
> 
> Hi Michael,
> 
> I've got a couple simple bug fixes for s390x that I want to
> run by you, plus a third one that I don't have a fix for.
> 
> First the easy ones:
> 
> (1) "bt -t" and "bt -T" fail on the active task on a live system:
> 
>   crash> bt -t
>   PID: 34875  TASK: 14342540          CPU: 1   COMMAND: "crash"
>   bt: invalid/stale stack pointer for this task: 0
>   crash> bt -T
>   PID: 34875  TASK: 14342540          CPU: 1   COMMAND: "crash"
>   bt: invalid/stale stack pointer for this task: 0
>   crash>
> 
> That can be fixed by adding a !LIVE() check to s390x_get_stack_frame()
> so that it will use (bt->task + OFFSET(task_struct_thread_ksp)):
> 
>         /* get the stack pointer */
>         if(esp){
> -               if(s390x_has_cpu(bt)){
> +               if (!LIVE() && s390x_has_cpu(bt)) {
>                         ksp = ULONG(lowcore +
>                                 MEMBER_OFFSET("_lowcore", "gpregs_save_area") +
>                                 (15 * S390X_WORD_SIZE));
>                 } else {
>                         readmem(bt->task + OFFSET(task_struct_thread_ksp),
>                                 KVADDR, &ksp, sizeof(void *),
>                                 "thread_struct ksp", FAULT_ON_ERROR);
>                 }
>                 *esp = ksp;
>         } else {
> 
>  
> (2) "vm -p" can show bogus data when a page is not mapped, as in this
>     example:
> 
>   crash> vm -p 1
>   PID: 1      TASK: 17b91120          CPU: 1   COMMAND: "init"
>          MM               PGD          RSS    TOTAL_VM
>       14f48400          14f4c000       344k    3116k
>         VMA              START             END        FLAGS FILE
>       14b88c80          2aab283b000      2aab2862000 8001875 /sbin/init
>   VIRTUAL           PHYSICAL
>   2aab283b000       SWAP: (unknown swap location)  OFFSET: 0
>   2aab283c000       SWAP: (unknown swap location)  OFFSET: 0
>   2aab283d000       SWAP: (unknown swap location)  OFFSET: 0
>   2aab283e000       SWAP: (unknown swap location)  OFFSET: 0
>   2aab283f000       SWAP: (unknown swap location)  OFFSET: 0
>   2aab2840000       SWAP: (unknown swap location)  OFFSET: 0
>   2aab2841000       SWAP: (unknown swap location)  OFFSET: 0
>   ...
>  
> And that's because when a "machdep->uvtop()" operation is done on a user
> page that is not resident, the machine-dependent function should return
> FALSE -- but it should return the PTE value in the paddr pointer field
> so that it can be translated by vm_area_page_dump().  The s390x_uvtop()
> does not return the PTE, so the failed output can vary, because it's using
> an uninitialized "paddr" stack variable.  But this is another easy fix,
> in this case to s390x_vtop():
> 
> /* lookup virtual address in page tables */
> int s390x_vtop(ulong table, ulong vaddr, physaddr_t *phys_addr, int verbose)
> {
>         ulong entry, paddr;
>         int level, len;
> 
> +       *phys_addr = 0;
> 
> 
> (3) Even with fix (2) applied, however, "vm -p" can fail to translate
>     user addresses in another situation.  If you try this, you'll
>     see a number of failures like this:
> 
>   crash> foreach user vm -p | grep PID
>   PID: 1      TASK: 17b91120          CPU: 1   COMMAND: "init"
>   PID: 599    TASK: 14fbc140          CPU: 1   COMMAND: "udevd"
>   PID: 955    TASK: 14343620          CPU: 0   COMMAND: "udevd"
>   PID: 961    TASK: 13f19220          CPU: 1   COMMAND: "udevd"
>   PID: 1246   TASK: 14cc0ab0          CPU: 0   COMMAND: "auditd"
>   PID: 1247   TASK: 14f88240          CPU: 0   COMMAND: "auditd"
>   PID: 1271   TASK: 140a3320          CPU: 0   COMMAND: "rsyslogd"
>   vm: read error: kernel virtual address: 0  type: "entry"
>   PID: 1272   TASK: 14b11520          CPU: 0   COMMAND: "rs:main Q:Reg"
>   vm: read error: kernel virtual address: 0  type: "entry"
>   PID: 1273   TASK: 16a32440          CPU: 1   COMMAND: "rsyslogd"
>   vm: read error: kernel virtual address: 0  type: "entry"
>   PID: 1274   TASK: 14c3cbb0          CPU: 0   COMMAND: "rsyslogd"
>   vm: read error: kernel virtual address: 0  type: "entry"
>   ...
> 
> And if I take a particular case:
> 
>   crash> vm -p
>   PID: 5088   TASK: 14399420          CPU: 1   COMMAND: "mingetty"
>          MM               PGD          RSS    TOTAL_VM
>       14e49c00          147f8000       116k    2180k
>   ... [ cut ] ...
>         VMA              START             END        FLAGS FILE
>       14c49bc0             8dee1000         8df02000 100073
>   VIRTUAL           PHYSICAL
>   8dee1000           ef03000
>   8dee2000          (not mapped)
>   8dee3000          (not mapped)
>   8dee4000          (not mapped)
>   8dee5000          (not mapped)
>   8dee6000          (not mapped)
>   8dee7000          (not mapped)
>   8dee8000          (not mapped)
>   8dee9000          (not mapped)
>   8deea000          (not mapped)
>   8deeb000          (not mapped)
>   8deec000          (not mapped)
>   8deed000          (not mapped)
>   8deee000          (not mapped)
>   8deef000          (not mapped)
>   8def0000          (not mapped)
>   8def1000          (not mapped)
>   8def2000          (not mapped)
>   8def3000          (not mapped)
>   8def4000          (not mapped)
>   8def5000          (not mapped)
>   8def6000          (not mapped)
>   8def7000          (not mapped)
>   8def8000          (not mapped)
>   8def9000          (not mapped)
>   8defa000          (not mapped)
>   8defb000          (not mapped)
>   8defc000          (not mapped)
>   8defd000          (not mapped)
>   8defe000          (not mapped)
>   8deff000          (not mapped)
>   vm: read error: kernel virtual address: 0  type: "entry"
>   crash>
>   
> So in this example, the page that's failing is 8df00000, which is
> located in the VMA's range from 8dee1000 to 8df02000.  But the
> machdep->uvtop() operation fails unexpectedly:
> 
>   crash> vtop -u 8df00000 -u
>   VIRTUAL           PHYSICAL
>   vtop: read error: kernel virtual address: 0  type: "entry"
>   crash>
> 
> And that "entry" readmem() is in s390x.c code that I don't wish
> to screw around with...
> 
> Hoping you can help,
>   Dave
>   
> 
> 



