[Crash-utility] help debug number of CPU detect failure

Thu Mar 5 20:19:18 UTC 2020

On Wed, Mar 4, 2020 at 2:49 PM Dave Anderson <anderson at prospeed.net> wrote:
>
> > Hello List,
> >
> > I have two ELF coredumps from two different Hyper-V VMs generated by this
> > tool (https://github.com/Azure/azure-linux-utils/tree/master/vm2core).
> >
> > Crash works with one of these coredumps but does not work with the other.
> >
> > I've placed the output generated by crash tool here:
> >
> > Not ok with crash:
> > ./crash/crash /usr/lib/debug/boot/vmlinux-4.15.0-88-generic
> > vm1_numa_4gb_5cpu.coredump --kaslr 600000 -m phys_base=4355784704 -d8
> >  https://raw.githubusercontent.com/santoshx/temp/master/notok_with_crash.txt
> >
> > Ok with crash:
> >  ./crash/crash /usr/lib/debug/boot/vmlinux-4.15.0-88-generic
> > vm1_nonuma_4gb_5cpu.coredump --kaslr 3c00000 -m phys_base=2344615936 -d8
> >  https://raw.githubusercontent.com/santoshx/temp/master/ok_with_crash.txt
> >
> >
> > The problem I see is that in the non-working case crash fails to detect the
> > correct cpu_possible_mask:
> >
> > Relevant part of $ diff ok_with_crash.txt notok_with_crash.txt:
> >
> > <   cpu_active_mask: cpus: 0 1 2 3 4
> > < FREEBUF(0)
> > < <readmem: ffffffff86039f40, KVADDR, "pv_init_ops", 8, (ROE),
> > 7ffe01722870>
> > < <read_kdump: addr: ffffffff86039f40 paddr: 91c39f40 cnt: 8>
> > < read_netdump: addr: ffffffff86039f40 paddr: 91c39f40 cnt: 8 offset:
> > 91c3a760
> > ---
> >> <readmem: ffffffff826f2b60, KVADDR, "possible", 1024, (ROE),
> >> 5638a35a2280>
> >> <read_kdump: addr: ffffffff826f2b60 paddr: 1060f2b60 cnt: 1024>
> >> read_netdump: addr: ffffffff826f2b60 paddr: 1060f2b60 cnt: 1024 offset:
> >> fe0f3380
> >> cpu_possible_mask: cpus: 3 4 5 6 8 13 14 18 20 21 22 26 28 29 30 33 36
> >> 37 38 48 49 52 53 54 56 59 60 61 62 64 65 68 69 70 72 73 74 75 76 78 82
> >> 83 85 86 90 91 93 94 96 99 101 102 104 105 108 109 110 114 116 117 118
> >> 123 124 125 126 128 133 134 138 140 141 142 146 148 149 150 153 156 157
> >> 158 168 169 172 173 174 176 179 180 181 182 184 185 188 189 190 192 193
> >> 194 195 196 198 200 202 205 206 211 212 213 214 216 219 221 222 226 228
> >> 229 230 232 233 234 235 236 238 242 243 245 246 248 251 253 254 256 257
> >> 260 261 262 266 268 269 270 275 276 277 278 280 285 286 290 292 293 294
> >> 298 300 301 302 305 308 309 310 320 321 324 325 326 328 331 332 333 334
> >> 336 337 340 341 342 344 345 346 347 348 350 352 354 357 358 361 362 363
> >> 365 366 370 372 373 374 376 378 381 382 385 388 389 390 392 393 394 395
> >> 396 398 402 403 405 406 408 411 413 414 416 417 420 421 422 426 428 429
> >> 430 435 436 437 438 440 445 446 450 452 453 454 458 460 461 462 465 468
> >> 469 470 480 481 484 485 486 488 491 492 493 494 496 497 500 501 502 504
> >> 505 506 507 508 510 514 515 517 518 520 523 525 526 528 529 532 533 534
> >> 538 540 541 542 547 548 549
> >
> > I'm trying to find where the problem is: in the crash tool or in the tool
> > that generated the ELF coredumps?
>
> I suspect that it's a problem with either the --kaslr offset and/or
> the phys_base value that you have used.

Is there a way to find or print the KASLR offset and phys_base on a running
Linux system?
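
One possible approach on x86_64, sketched here under assumptions rather than
taken from the crash sources: treat the KASLR offset as the difference between
the run-time address of _text in /proc/kallsyms and its link-time address in
System.map (or in `nm vmlinux` output), and derive phys_base from the physical
start of the "Kernel code" range in /proc/iomem. The helper names below are
hypothetical, and the relation used for phys_base assumes that "Kernel code"
starts at the physical address of _text.

```python
import re

# x86_64 base of the kernel text mapping (__START_KERNEL_map).
START_KERNEL_MAP = 0xFFFFFFFF80000000


def symbol_address(symbol_table: str, name: str) -> int:
    """Look up `name` in System.map / nm / kallsyms style text."""
    for line in symbol_table.splitlines():
        fields = line.split()
        # Format: <hex address> <type> <symbol> [optional module column]
        if len(fields) >= 3 and fields[2] == name:
            return int(fields[0], 16)
    raise KeyError(name)


def kernel_code_start(iomem: str) -> int:
    """Physical start of the 'Kernel code' range in /proc/iomem text."""
    match = re.search(r"([0-9a-fA-F]+)-[0-9a-fA-F]+ : Kernel code", iomem)
    if match is None:
        raise ValueError("no 'Kernel code' entry found")
    return int(match.group(1), 16)


def kaslr_offset(kallsyms: str, system_map: str) -> int:
    """Run-time address of _text minus its link-time address."""
    return symbol_address(kallsyms, "_text") - symbol_address(system_map, "_text")


def phys_base(kallsyms: str, iomem: str) -> int:
    """Assumes paddr(_text) = vaddr(_text) - START_KERNEL_MAP + phys_base."""
    runtime_text = symbol_address(kallsyms, "_text")
    return kernel_code_start(iomem) - (runtime_text - START_KERNEL_MAP)
```

To use it, pass in the contents of /proc/kallsyms, /proc/iomem and the
System.map that matches the running kernel. Both /proc files need to be read
as root; otherwise the addresses are typically shown as zeros.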

>
> It appears that the read of the cpu_possible mask is not using the
> correct virtual address, or perhaps the wrong physical address, and
> as a result it is trying to translate bogus data.  In fact, the full
> output txt file shows that everything that it reads is garbage, e.g.,
> the cpu masks, the utsname data structure, the linux_banner string, etc.
>
> Dave
>
>
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
>
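
For reference, the readmem lines quoted above can be checked by hand. The
sketch below assumes the usual x86_64 kernel-text translation (physical =
virtual - __START_KERNEL_map + phys_base); the function name is made up for
illustration. Both logged physical addresses are reproduced from the phys_base
values given on the command line, so crash is applying the supplied values
consistently. The arithmetic by itself cannot tell whether those values are
correct for the dump.

```python
# x86_64 base of the kernel text mapping (__START_KERNEL_map).
START_KERNEL_MAP = 0xFFFFFFFF80000000


def kernel_text_vtop(vaddr: int, phys_base: int) -> int:
    """Translate a kernel-text virtual address to a physical address."""
    return vaddr - START_KERNEL_MAP + phys_base


# (label, virtual address from the log, phys_base from the command line)
LOGGED_READS = [
    ("ok:    pv_init_ops", 0xFFFFFFFF86039F40, 2344615936),
    ("notok: possible   ", 0xFFFFFFFF826F2B60, 4355784704),
]

for label, vaddr, base in LOGGED_READS:
    print(label, hex(kernel_text_vtop(vaddr, base)))
# ok:    pv_init_ops 0x91c39f40
# notok: possible    0x1060f2b60
```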