[Crash-utility] add support for incomplete elf dump file
Dave Anderson
anderson at redhat.com
Mon Oct 27 17:35:15 UTC 2014
----- Original Message -----
> On 10/24/2014 10:05 PM, Dave Anderson wrote:
> > With the patch I that proposed, you can simplify this part of your patch:
> >
> > + unsigned int e_flag = (NULL == nd->elf64) ?
> > + (nd->elf32)->e_flags : (nd->elf64)->e_flags;
> > + int status = e_flag& DUMP_ELF_INCOMPLETE;
> > + if (status&& read_ret>= 0) {
> >
> > to just this:
> > if (is_incomplete_dump()&& (read_ret>= 0)) {
> >
> > But what about compressed kdumps? In the case of DUMP_DH_COMPRESSED_INCOMPLETE
> > dumpfiles, will makedumpfile manipulate the bitmaps and the page_desc.offset values
> > of all "missing" pages to point to the special "pd_zero" page? Will it be
> > impossible to distinguish between legitimate zero-filled pages and
> > "missing" pages?
> >
> > I suppose that if that is true, then the behavior should be the same for
> > both ELF and compressed kdumps, which would be to return zero-filled data.
> >
> > On the other hand...
> >
> > What's bothersome about automatically returning zero-filled data is that,
> > currently, if an ELF vmcore was originally created correctly, but gets truncated
> > *after* the fact, for example, when downloading it from a customer site,
> > check_dumpfile_size() will report the error. But when data from the truncated
> > region gets read, a read error is returned.
> >
> > If zero-filled data is silently returned, I can imagine that there will be all
> > kinds of errors that will occur -- both display/content errors, but it could
> > easily cause SIGSEGV's in the crash utility when it tries to use zero-filled
> > data thinking that it's legitimate. How would the user even know whether it's
> > because the page was truncated or not?
>
> The patch has been adjusted according to your suggestions. The kdump part has been
> fixed. And the warning information is added in both elf part and kdump part.
> Thank you for your guiding.
>
> > The crash utility already has a --zero_excluded flag for returning zero-filled
> > memory if by chance a *legitimately* excluded/filtered page is read. It's
> > hardly ever used, because those pages should never be required. But I
> > wonder whether it could be re-purposed in this case, where by default an error
> > would be returned when reading missing pages, and zero-filled only if it is
> > explicitly requested with the --zero_excluded command line option? Or would
> > that be impossible with DUMP_DH_COMPRESSED_INCOMPLETE compressed kdumps?
>
> I think it's a good idea and is completed in patch. It can also work in kdump part.
> The attachment is the latest patch.
>
> Thanks
> Zhou Wenjian
A couple notes and questions re: your latest patch:
diff --git a/diskdump.c b/diskdump.c
index 1d6f7a5..850b174 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -999,6 +999,17 @@ cache_page(physaddr_t paddr)
if (FLAT_FORMAT()) {
if (!read_flattened_format(dd->dfd, pd.offset, dd->compressed_page, pd.size))
return READ_ERROR;
What about FLAT_FORMAT()? Should the (is_incomplete_dump()/pd.offset==0) test below be
done before the FLAT_FORMAT() check?
+ } else if (is_incomplete_dump() && (0 == pd.offset)) {
Is this how the makedumpfile patch marks pages that have been truncated, i.e., by
creating a page_desc_t that has a pd.offset == 0?
+ /*
+ * if --zero_excluded is specified and incomplete flag is set in dump file,
+ * zero will be used to fill the lost part.
+ */
+ if (!(*diskdump_flags & ZERO_EXCLUDED))
+ return READ_ERROR;
+ error(WARNING, "%s: data may be lost\n"
+ " pfn:%lld\n\n"
+ ,pc->dumpfile,pfn);
When --zero_excluded is entered on the crash command line, then the user knows exactly
what he is doing, and so there should not be any error/warning messages from the read
command itself. A message should only be displayed if debug mode is turned on. For
example, this is how it is done currently, but only if debug is set to 8:
if (CRASHDEBUG(8))
fprintf(fp, "read_diskdump: zero-fill: "
"paddr/pfn: %llx/%lx\n",
(ulonglong)paddr, pfn);
+ memset(dd->compressed_page, 0, dd->block_size);
} else {
if (lseek(dd->dfd, pd.offset, SEEK_SET) == failed)
return SEEK_ERROR;
diff --git a/netdump.c b/netdump.c
index abc85e0..c8f057e 100644
--- a/netdump.c
+++ b/netdump.c
@@ -608,7 +608,21 @@ read_netdump(int fd, void *bufptr, int cnt, ulong addr, physaddr_t paddr)
"offset: %llx\n", (ulonglong)offset);
return SEEK_ERROR;
}
- if (read(nd->ndfd, bufptr, cnt) != cnt) {
+ /*
+ * if --zero_excluded is specified and incomplete flag is set in ELF file,
+ * zero will be used to fill the lost part.
+ */
+ ssize_t read_ret = read(nd->ndfd, bufptr, cnt);
+ if (read_ret != cnt) {
+ if (is_incomplete_dump() && (read_ret >= 0 &&
+ *diskdump_flags & ZERO_EXCLUDED)) {
+ error(WARNING, "%s: data may be lost\n"
+ " offset:%llx\n\n",
+ pc->dumpfile, offset);
Same thing here... if a user enters --zero_excluded on the command line, he should
expect strange behavior as a result. Intermingling a bunch of WARNING messages
like the one above will only make it more confusing.
+ bufptr += read_ret;
+ bzero(bufptr, cnt - read_ret);
+ return cnt;
+ }
if (CRASHDEBUG(8))
fprintf(fp, "read_netdump: READ_ERROR: "
"offset: %llx\n", (ulonglong)offset);
Dave
More information about the Crash-utility
mailing list