<div dir="ltr"><div dir="ltr">On Tue, Sep 19, 2023 at 2:23 PM Aditya Gupta <<a href="mailto:adityag@linux.ibm.com">adityag@linux.ibm.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello lijiang,<br> <br> On Mon, Sep 18, 2023 at 07:34:04PM +0800, lijiang wrote:<br> > Hi, Aditya<br> > Thank you for the patch.<br> > <br> > On Mon, Sep 11, 2023 at 8:00 PM <<a href="mailto:crash-utility-request@redhat.com" target="_blank">crash-utility-request@redhat.com</a>> wrote:<br> > <br> > > ...<br> > ><br> > > Currently 'crash-tool' fails on vmcore collected on upstream kernel on<br> > > PowerPC64 with the error:<br> > ><br> > > crash: invalid kernel virtual address: 0 type: "first list entry<br> > ><br> > > Presently the address translation for vmemmap addresses is done using<br> > > the vmemmap_list. But with the below commit in Linux, vmemmap_list can<br> > > be empty, in case of Radix MMU on PowerPC64<br> > ><br> > > 368a0590d954: (powerpc/book3s64/vmemmap: switch radix to use a<br> > > different vmemmap handling function)<br> > ><br> > > In case vmemmap_list is empty, then it's head is NULL, which crash tries<br> > > to access and fails due to accessing NULL.<br> > ><br> > > Instead of depending on 'vmemmap_list' for address translation for<br> > > vmemmap addresses, do a kernel pagetable walk to get the physical<br> > > address associated with given virtual address<br> > ><br> > > Reviewed-by: Hari Bathini <<a href="mailto:hbathini@linux.ibm.com" target="_blank">hbathini@linux.ibm.com</a>><br> > > Signed-off-by: Aditya Gupta <<a href="mailto:adityag@linux.ibm.com" target="_blank">adityag@linux.ibm.com</a>><br> > ><br> > > ---<br> > ><br> > > Testing<br> > > =======<br> > ><br> > > Git tree with patch applied:<br> > > <a href="https://github.com/adi-g15-ibm/crash/tree/bugzilla-203296-list-v1" rel="noreferrer" 
target="_blank">https://github.com/adi-g15-ibm/crash/tree/bugzilla-203296-list-v1</a><br> > ><br> > > This can be tested with '/proc/vmcore' as the vmcore, since makedumpfile<br> > ><br> > <br> > Can you help to describe in detail how to reproduce this issue? Or does<br> > this require any kernel configs to be enabled first? I did not reproduce<br> > the current issue with '/proc/kcore' or vmcore(via cp).<br> > <br> > Test kernel commit: ce9ecca0238b ("Linux 6.6-rc2")<br> > <br> > # ./crash /home/linux/vmlinux<br> <br> Thanks for testing it.<br> <br> This issue occurs only in case of Radix MMU.<br> <br> Overall, these are all the requirements:<br> 1. Upstream linux (master branch) (your commit will also work, ce9ecca0238b)<br> 2. 'CONFIG_PPC_BOOK3S_64' should be 'y' in kernel config (this should be there<br> in default configs)<br></blockquote><div><br></div><div> # grep "CONFIG_PPC_BOOK3S_64" /home/linux/.config</div>CONFIG_PPC_BOOK3S_64=y</div><div class="gmail_quote"><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> 3. Check in dmesg of the crashed kernel, if it prints 'hash-mmu' or<br> 'radix-mmu'. 
It should be 'radix-mmu'.<br> <br></blockquote><div> </div># dmesg|grep mmu<br>[ 0.000000] hash-mmu: Page sizes from device-tree:<br>[ 0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0<br>[ 0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7<br>[ 0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56<br>[ 0.000000] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1<br>[ 0.000000] hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8<br>[ 0.000000] hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0<br>[ 0.000000] hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3<br>[ 0.000000] hash-mmu: Initializing hash mmu with SLB<br>[ 0.000000] mmu_features = 0xfc006e01<br>[ 0.000000] hash-mmu: ppc64_pft_size = 0x1b<br>[ 0.000000] hash-mmu: htab_hash_mask = 0xfffff<br><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> I guess, the system that was crashed might be using 'hash-mmu'.<br> <br> > also fails in absence of 'vmemmap_list' in upstream linux<br> <br> Yes, it will fail in Hash MMU case, as we depend on 'vmemmap_list' in that case,<br> as the virtual to physical address mapping is not available in page table, in<br> case of Hash-MMU.<br> <br> Only in radix MMU case, it will still work, even if 'vmemmap_list' is removed,<br> since we have the mappings in kernel page table, which is used by this patch.<br> <br> Let me know if the issue still doesn't reproduce even after using a system with<br> Radix MMU.<br> <br></blockquote><div> </div><div>Yes, it still does not reproduce on my side.
But it looks like we have the same system with Radix MMU, which is strange.</div><div><br></div><div>Thanks.</div><div>Lianbo</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> Thanks,<br> - Aditya Gupta<br> <br> </blockquote></div></div>