[Crash-utility] help debug number of CPU detect failure

Santosh ysan99 at gmail.com
Thu Mar 5 20:50:40 UTC 2020


On Thu, Mar 5, 2020 at 12:19 PM Santosh <ysan99 at gmail.com> wrote:
>
> On Wed, Mar 4, 2020 at 2:49 PM Dave Anderson <anderson at prospeed.net> wrote:
> >
> > > Hello List,
> > >
> > > I have two ELF coredumps from two different Hyper-V VMs generated by this
> > > tool (https://github.com/Azure/azure-linux-utils/tree/master/vm2core).
> > >
> > > Crash works with one of these coredumps but does not work with the other.
> > >
> > > I've placed the output generated by crash tool here:
> > >
> > > Not ok with crash:
> > > ./crash/crash /usr/lib/debug/boot/vmlinux-4.15.0-88-generic
> > > vm1_numa_4gb_5cpu.coredump --kaslr 600000 -m phys_base=4355784704 -d8
> > >  https://raw.githubusercontent.com/santoshx/temp/master/notok_with_crash.txt
> > >
> > > Ok with crash:
> > >  ./crash/crash /usr/lib/debug/boot/vmlinux-4.15.0-88-generic
> > > vm1_nonuma_4gb_5cpu.coredump --kaslr 3c00000 -m phys_base=2344615936 -d8
> > >  https://raw.githubusercontent.com/santoshx/temp/master/ok_with_crash.txt
> > >
> > >
> > > The problem I see is that in the non-working case crash fails to detect
> > > the correct cpu_possible_mask:
> > >
> > > Relevant part of $ diff ok_with_crash.txt notok_with_crash.txt:
> > >
> > > <   cpu_active_mask: cpus: 0 1 2 3 4
> > > < FREEBUF(0)
> > > < <readmem: ffffffff86039f40, KVADDR, "pv_init_ops", 8, (ROE),
> > > 7ffe01722870>
> > > < <read_kdump: addr: ffffffff86039f40 paddr: 91c39f40 cnt: 8>
> > > < read_netdump: addr: ffffffff86039f40 paddr: 91c39f40 cnt: 8 offset:
> > > 91c3a760
> > > ---
> > >> <readmem: ffffffff826f2b60, KVADDR, "possible", 1024, (ROE),
> > >> 5638a35a2280>
> > >> <read_kdump: addr: ffffffff826f2b60 paddr: 1060f2b60 cnt: 1024>
> > >> read_netdump: addr: ffffffff826f2b60 paddr: 1060f2b60 cnt: 1024 offset:
> > >> fe0f3380
> > >> cpu_possible_mask: cpus: 3 4 5 6 8 13 14 18 20 21 22 26 28 29 30 33 36
> > >> 37 38 48 49 52 53 54 56 59 60 61 62 64 65 68 69 70 72 73 74 75 76 78 82
> > >> 83 85 86 90 91 93 94 96 99 101 102 104 105 108 109 110 114 116 117 118
> > >> 123 124 125 126 128 133 134 138 140 141 142 146 148 149 150 153 156 157
> > >> 158 168 169 172 173 174 176 179 180 181 182 184 185 188 189 190 192 193
> > >> 194 195 196 198 200 202 205 206 211 212 213 214 216 219 221 222 226 228
> > >> 229 230 232 233 234 235 236 238 242 243 245 246 248 251 253 254 256 257
> > >> 260 261 262 266 268 269 270 275 276 277 278 280 285 286 290 292 293 294
> > >> 298 300 301 302 305 308 309 310 320 321 324 325 326 328 331 332 333 334
> > >> 336 337 340 341 342 344 345 346 347 348 350 352 354 357 358 361 362 363
> > >> 365 366 370 372 373 374 376 378 381 382 385 388 389 390 392 393 394 395
> > >> 396 398 402 403 405 406 408 411 413 414 416 417 420 421 422 426 428 429
> > >> 430 435 436 437 438 440 445 446 450 452 453 454 458 460 461 462 465 468
> > >> 469 470 480 481 484 485 486 488 491 492 493 494 496 497 500 501
> > >> 502 504 505 506 507 508 510 514 515 517 518 520 523 525 526 528 529 532
> > >> 533 534 538 540 541 542 547 548 549
> > >
> > > I'm trying to find where the problem is: in the crash tool or in the
> > > tool that generated the ELF coredumps?
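A cpumask is a bitmap, so the CPU list above is simply the set-bit positions of whatever bytes were read. The sketch below (helper names are my own, not crash code) decodes a little-endian bitmap and also runs the conversion in reverse. The first buffer is a hypothetical mask that yields the same "0 1 2 3 4" list as the working case; the second input is the first 19 CPU numbers from the failing log. Those numbers pack back into the ASCII bytes "xattr", which suggests the failing read landed on unrelated string data rather than on a real mask.

```python
def cpus_from_mask(raw):
    """Return the indices of the set bits in a little-endian bitmap."""
    value = int.from_bytes(raw, "little")
    return [bit for bit in range(len(raw) * 8) if (value >> bit) & 1]


def mask_from_cpus(cpus, nbytes):
    """Pack a list of CPU numbers back into a little-endian bitmap."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    return value.to_bytes(nbytes, "little")


# Hypothetical 8-byte mask with bits 0..4 set (a 5-CPU guest).
ok_mask = bytes([0x1F]) + bytes(7)
print(cpus_from_mask(ok_mask))

# First 19 CPU numbers reported in the failing case (covers bytes 0..4).
bad_cpus = [3, 4, 5, 6, 8, 13, 14, 18, 20, 21, 22,
            26, 28, 29, 30, 33, 36, 37, 38]
print(mask_from_cpus(bad_cpus, 5))
```

This is only an inference from the numbers in the log; it does not say which of the inputs (offset, phys_base, or the dump itself) is at fault.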
> >
> > I suspect that it's a problem with either the --kaslr offset and/or
> > the phys_base value that you have used.
>
> Is there a method to find or print the KASLR offset and phys_base on a running Linux system?

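Got it. Both values are recorded in vmcoreinfo_data (see NUMBER(phys_base) and KERNELOFFSET in the output):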

crash> p vmcoreinfo_data+1600
$12 = (unsigned char *) 0xffff90ff7cdc3640
"poison)=22\nNUMBER(PG_head_mask)=32768\nNUMBER(PAGE_BUDDY_MAPCOUNT_VALUE)=-128\nNUMBER(HUGETLB_PAGE_DTOR)=2\nNUMBER(phys_base)=-499122176\nSYMBOL(init_top_pgt)=ffffffffa200a000\nSYMBOL(node_data)=ffffffffa225d780\nLENGTH(node_data)=1024\nKERNELOFFSET=1fc00000\nNUMB"...
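To pull the two values out of that text without reading it by eye, something like the following sketch works. It assumes the vmcoreinfo text is a series of newline-separated KEY=value entries, as the output above suggests, with NUMBER() values in decimal and KERNELOFFSET in hexadecimal. The function name and the excerpt constant are my own; the excerpt is copied from the output above.

```python
# Excerpt of the vmcoreinfo text printed above.
VMCOREINFO_EXCERPT = (
    "NUMBER(HUGETLB_PAGE_DTOR)=2\n"
    "NUMBER(phys_base)=-499122176\n"
    "SYMBOL(init_top_pgt)=ffffffffa200a000\n"
    "KERNELOFFSET=1fc00000\n"
)


def parse_vmcoreinfo(text):
    """Return a dict mapping each vmcoreinfo key to its raw string value."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip fragments that are not KEY=value
            fields[key] = value
    return fields


fields = parse_vmcoreinfo(VMCOREINFO_EXCERPT)
phys_base = int(fields["NUMBER(phys_base)"])    # decimal, may be negative
kaslr_offset = int(fields["KERNELOFFSET"], 16)  # hexadecimal, no 0x prefix

print(f"phys_base    = {phys_base}")
print(f"kaslr offset = {kaslr_offset:#x}")
```

Note that phys_base is negative here, so it has to be parsed as a signed value.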

>
> >
> > It appears that the read of the cpu_possible mask is not using the
> > correct virtual address, or perhaps the wrong physical address, and
> > as a result it is trying to translate bogus data.  In fact, the full
> > output txt file shows that everything it reads is garbage, e.g.,
> > the cpu masks, the utsname data structure, the linux_banner string, etc.
> >
> > Dave
> >
> >
> > --
> > Crash-utility mailing list
> > Crash-utility at redhat.com
> > https://www.redhat.com/mailman/listinfo/crash-utility
> >




