[Crash-utility] Unknown osrelease information in vmcore with xen
Dietmar Hahn
dietmar.hahn at ts.fujitsu.com
Mon Dec 14 12:59:28 UTC 2015
Hi Dave,
On Friday, 11 December 2015, 14:35:56, Dave Anderson wrote:
>
> ----- Original Message -----
> >
> >
> > ----- Original Message -----
> > >
> > >
> > > ----- Original Message -----
> > > > Hi,
> > > >
> > > > I have a SuSE SLES11 vmcore with xen and tried to read the osrelease from
> > > > the vmcore with
> > > > # crash --osrelease vmcore
> > > > unknown
> > > >
> > > > The problem is that there are two notes in the vmcore starting with
> > > > "VMCOREINFO":
> > > >
> > > > Elf64_Nhdr:
> > > > n_namesz: 11 ("VMCOREINFO")
> > > > n_descsz: 1384
> > > > n_type: 0 (unused)
> > > > OSRELEASE=3.0.101-63-xen
> > > > ...
> > > > Elf64_Nhdr:
> > > > n_namesz: 15 ("VMCOREINFO_XEN")
> > > > n_descsz: 4068
> > > > n_type: 0 (unused)
> > > > ...
> > > >
> > > > In the function dump_Elf64_Nhdr() I found:
> > > > vmcoreinfo = STRNEQ(buf, "VMCOREINFO");
> > > >
> > > > But because "VMCOREINFO_XEN" is the second one in the file, it wins!
> > > >
> > > > When using
> > > > vmcoreinfo = STREQ(buf, "VMCOREINFO");
> > > > all is fine and I get:
> > > > # crash --osrelease vmcore
> > > > 3.0.101-63-xen
> > > >
> > > > So my question is: why is STRNEQ() used?
> > > > Thanks!
> > >
> > > Hello Dietmar,
> > >
> > > As I recall, I did all of the note name checks that way because the length
> > > of the name string is specified by the note->n_namesz field, and therefore
> > > not necessarily guaranteed to be a NULL-terminated string? In reality,
> > > there probably will be a NULL there, though.
> > >
> > > Anyway, I wasn't even familiar with the existence of a VMCOREINFO_XEN note,
> > > so please feel free to post a patch to address it.
> > >
> > > Dave
> >
> >
> > This should work, right?:
> >
> > --- crash-7.1.3/netdump.c.orig
> > +++ crash-7.1.3/netdump.c
> > @@ -1940,7 +1940,8 @@ dump_Elf32_Nhdr(Elf32_Off offset, int st
> > #endif
> > default:
> > xen_core = STRNEQ(buf, "XEN CORE") || STRNEQ(buf, "Xen");
> > - vmcoreinfo = STRNEQ(buf, "VMCOREINFO");
> > + if (!STRNEQ(buf, "VMCOREINFO_XEN"))
> > + vmcoreinfo = STRNEQ(buf, "VMCOREINFO");
> > eraseinfo = STRNEQ(buf, "ERASEINFO");
> > qemuinfo = STRNEQ(buf, "QEMU");
> > if (xen_core) {
> > @@ -2196,7 +2197,8 @@ dump_Elf64_Nhdr(Elf64_Off offset, int st
> > #endif
> > default:
> > xen_core = STRNEQ(buf, "XEN CORE") || STRNEQ(buf, "Xen");
> > - vmcoreinfo = STRNEQ(buf, "VMCOREINFO");
> > + if (!STRNEQ(buf, "VMCOREINFO_XEN"))
> > + vmcoreinfo = STRNEQ(buf, "VMCOREINFO");
> > eraseinfo = STRNEQ(buf, "ERASEINFO");
> > qemuinfo = STRNEQ(buf, "QEMU");
> > if (xen_core) {
> >
> > Dave
>
> Actually, the patch above will prevent the VMCOREINFO_XEN strings from being
> dumped in readable strings by "help -[nD]" or by "crash -d#". Try the attached
> patch instead, where both VMCOREINFO and VMCOREINFO_XEN string data will be
> dumped appropriately.
>
> I do have a couple old RHEL5 xen dom0 dumpfiles that have VMCOREINFO_XEN notes,
> but they do not have VMCOREINFO notes. It must be relatively newer Xen kernels
> that have both? Anyway, check out the updated patch and let me know.
Thanks, the patch works very well!
One minor nit, though. The n_type annotation has now changed:
Elf64_Nhdr:
n_namesz: 15 ("VMCOREINFO_XEN")
n_descsz: 4068
n_type: 0 (?)
I'm not familiar enough with this to know which is the right one:
"n_type: 0 (?)" or "n_type: 0 (unused)"
If it is the second, this works for 64-bit:
--- a/netdump.c
+++ b/netdump.c
@@ -2231,6 +2231,8 @@ dump_Elf64_Nhdr(Elf64_Off offset, int store)
} else if (qemuinfo) {
pc->flags2 |= QEMU_MEM_DUMP_ELF;
netdump_print("(QEMUCPUState)\n");
+ } else if (vmcoreinfo_xen) {
+ netdump_print("(unused)\n");
} else
netdump_print("(?)\n");
break;
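
For readers following along without the attachment: the updated patch itself is
not reproduced in this message, so the following is only a minimal sketch of the
idea of tracking the two notes separately while still using prefix comparisons.
PREFIX_EQ, note_kind and classify_note() are hypothetical names made up for
illustration; they are not taken from netdump.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a prefix comparison in the spirit of STRNEQ():
 * true if the string a starts with the string b. */
#define PREFIX_EQ(a, b) (strncmp((a), (b), strlen(b)) == 0)

/* Hypothetical classification result, for illustration only. */
enum note_kind { NOTE_OTHER, NOTE_VMCOREINFO, NOTE_VMCOREINFO_XEN };

/* Classify a note name. The longer name is tested first, so that
 * "VMCOREINFO_XEN" is not swallowed by the shorter "VMCOREINFO" prefix. */
enum note_kind classify_note(const char *name)
{
    assert(name != NULL);
    if (PREFIX_EQ(name, "VMCOREINFO_XEN"))
        return NOTE_VMCOREINFO_XEN;
    if (PREFIX_EQ(name, "VMCOREINFO"))
        return NOTE_VMCOREINFO;
    return NOTE_OTHER;
}
```

With a split like this, the osrelease lookup can use only the plain VMCOREINFO
note, while both notes can still have their string data dumped.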
>
> And BTW, checking the ELF specs, the name string should be NULL-terminated, and
> the n_namesz count should include the NULL-byte:
>
> namesz and name
> The first namesz bytes in name contain a null-terminated character
> representation of the entry's owner and originator.
>
> There's an accompanying diagram example that shows the namesz value counting
> the NULL terminator.
>
> However, the reason for STRNEQ() in crash stems from the fact that the
> original "netdump" generated ELF vmcores were incorrectly created such
> that the n_namesz count did not include the NULL, and I believe that the
> descriptor data started immediately after the name string.
OK, I understand.
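
To make the point concrete, here is a small sketch of why an exact comparison
can fail on such a note while a prefix comparison still matches. The buffer
layout and the descriptor bytes are invented for illustration, and
name_matches_exact() and name_matches_prefix() are hypothetical helpers, not
crash functions.

```c
#include <assert.h>
#include <string.h>

/* Name bytes laid out the way an old netdump-style note is described above:
 * n_namesz is 4, there is no terminator after "CORE", and descriptor bytes
 * follow directly. The descriptor bytes (1, 2, 3) are made up, and the
 * final '\0' only exists so that strcmp() stops safely in this demo. */
const char netdump_style_note[] = { 'C', 'O', 'R', 'E', 1, 2, 3, '\0' };

/* Exact comparison: requires a terminator right after the name. */
int name_matches_exact(const char *name, const char *want)
{
    assert(name != NULL && want != NULL);
    return strcmp(name, want) == 0;
}

/* Prefix comparison: looks only at the first strlen(want) bytes. */
int name_matches_prefix(const char *name, const char *want)
{
    assert(name != NULL && want != NULL);
    return strncmp(name, want, strlen(want)) == 0;
}
```

Here name_matches_exact(netdump_style_note, "CORE") is false because the
comparison runs on into the descriptor bytes, whereas name_matches_prefix()
is true. The same prefix behaviour is what lets "VMCOREINFO_XEN" match
"VMCOREINFO".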
>
> For example, here's an old 2.6.9-based netdump generated vmcore, where
> the name is "CORE", and therefore *should* have a n_namesz of 5 if it
> included a NULL terminator:
>
> Elf64_Nhdr:
> n_namesz: 4 ("CORE")
> n_descsz: 336
> n_type: 1 (NT_PRSTATUS)
>
> The correct way would include the NULL-byte as well, as is done with
> kdump generated ELF vmcores:
>
> Elf64_Nhdr:
> n_namesz: 5 ("CORE")
> n_descsz: 336
> n_type: 1 (NT_PRSTATUS)
>
> So for backwards-compatibility, I'd prefer to leave the checks using STRNEQ().
Yes you are right.
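
Just as a thought, and not as a request to change anything: a comparison that
takes n_namesz into account could, in principle, accept both layouts and still
tell the two VMCOREINFO notes apart. This is only a sketch under the assumption
that the name bytes and n_namesz are both at hand; note_name_is() is a
hypothetical helper and not something that exists in crash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the note name equals want, accepting either layout:
 *   namesz == strlen(want)      old netdump style, terminator not counted
 *   namesz == strlen(want) + 1  ELF spec style, terminator counted
 * Any other namesz is treated as a different name. */
int note_name_is(const char *name, size_t namesz, const char *want)
{
    size_t len;

    assert(name != NULL && want != NULL);
    len = strlen(want);

    if (namesz == len)
        return memcmp(name, want, len) == 0;
    if (namesz == len + 1)
        return memcmp(name, want, len) == 0 && name[len] == '\0';
    return 0;
}
```

For example, note_name_is("CORE", 4, "CORE") and note_name_is("CORE", 5, "CORE")
both match, while note_name_is("VMCOREINFO_XEN", 15, "VMCOREINFO") does not.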
Many thanks!
Dietmar.
>
> Thanks,
> Dave
--
Company details: http://ts.fujitsu.com/imprint.html