[Crash-utility] add support for incomplete elf dump file

Mon Oct 27 17:35:15 UTC 2014

----- Original Message -----
> On 10/24/2014 10:05 PM, Dave Anderson wrote:
> > With the patch I that proposed, you can simplify this part of your patch:
> >
> > +                       unsigned int e_flag = (NULL == nd->elf64) ?
> > +                                                (nd->elf32)->e_flags : (nd->elf64)->e_flags;
> > +                       int status = e_flag&  DUMP_ELF_INCOMPLETE;
> > +                       if (status&&  read_ret>= 0) {
> >
> > to just this:
> >                          if (is_incomplete_dump()&&  (read_ret>= 0)) {
> >
> > But what about compressed kdumps?  In the case of DUMP_DH_COMPRESSED_INCOMPLETE
> > dumpfiles, will makedumpfile manipulate the bitmaps and the page_desc.offset values
> > of all "missing" pages to point to the special "pd_zero" page?  Will it be
> > impossible to distinguish between legitimate zero-filled pages and
> > "missing" pages?
> >
> > I suppose that if that is true, then the behavior should be the same for
> > both ELF and compressed kdumps, which would be to return zero-filled data.
> >
> > On the other hand...
> >
> > What's bothersome about automatically returning zero-filled data is that,
> > currently, if an ELF vmcore was originally created correctly, but gets truncated
> > *after*  the fact, for example, when downloading it from a customer site,
> > check_dumpfile_size() will report the error.  But when data from the truncated
> > region gets read, a read error is returned.
> >
> > If zero-filled data is silently returned, I can imagine that there will be all
> > kinds of errors that will occur -- both display/content errors, but it could
> > easily cause SIGSEGV's in the crash utility when it tries to use zero-filled
> > data thinking that it's legitimate.  How would the user even know whether it's
> > because the page was truncated or not?
> 
> The patch has been adjusted according to your suggestions. The kdump part has been
> fixed. And the warning information is added in both elf part and kdump part.
> Thank you for your guiding.
> 
> > The crash utility already has a --zero_excluded flag for returning zero-filled
> > memory if by chance a *legitimately* excluded/filtered page is read.  It's
> > hardly ever used, because those pages should never be required.  But I
> > wonder whether it could be re-purposed in this case, where by default an error
> > would be returned when reading missing pages, and zero-filled only if it is
> > explicitly requested with the --zero_excluded command line option?  Or would
> > that be impossible with DUMP_DH_COMPRESSED_INCOMPLETE compressed kdumps?
> 
> I think it's a good idea and is completed in patch. It can also work in kdump part.
> The attachment is the latest patch.
> 
> Thanks
> Zhou Wenjian

A couple notes and questions re: your latest patch:

diff --git a/diskdump.c b/diskdump.c
index 1d6f7a5..850b174 100644
--- a/diskdump.c
+++ b/diskdump.c
@@ -999,6 +999,17 @@ cache_page(physaddr_t paddr)
 	if (FLAT_FORMAT()) {
 		if (!read_flattened_format(dd->dfd, pd.offset, dd->compressed_page, pd.size))
 			return READ_ERROR;

What about FLAT_FORMAT()?  Should the (is_incomplete_dump()/pd.offset==0) test below be
done before the FLAT_FORMAT() check? 

+	} else if (is_incomplete_dump() && (0 == pd.offset)) {

Is this how the makedumpfile patch marks pages that have been truncated, i.e., by 
creating a page_desc_t that has a pd.offset == 0?

+		/*
+		 *  if --zero_excluded is specified and incomplete flag is set in dump file,
+		 *  zero will be used to fill the lost part.
+		 */
+		if (!(*diskdump_flags & ZERO_EXCLUDED))
+			return READ_ERROR;
+		error(WARNING, "%s: data may be lost\n"
+			"                      pfn:%lld\n\n"
+			,pc->dumpfile,pfn);

When --zero_excluded is entered on the crash command line, then the user knows exactly
what he is doing, and so there should not be any error/warning messages from the read
command itself.  A message should only be displayed if debug mode is turned on.  For
example, this is how it is done currently, but only if debug is set to 8:

                if (CRASHDEBUG(8))
                        fprintf(fp, "read_diskdump: zero-fill: "
                            "paddr/pfn: %llx/%lx\n",
                                (ulonglong)paddr, pfn);


+		memset(dd->compressed_page, 0, dd->block_size);
 	} else {
 		if (lseek(dd->dfd, pd.offset, SEEK_SET) == failed)
 			return SEEK_ERROR;
diff --git a/netdump.c b/netdump.c
index abc85e0..c8f057e 100644
--- a/netdump.c
+++ b/netdump.c
@@ -608,7 +608,21 @@ read_netdump(int fd, void *bufptr, int cnt, ulong addr, physaddr_t paddr)
 				    "offset: %llx\n", (ulonglong)offset);
 			return SEEK_ERROR;
 		}
-		if (read(nd->ndfd, bufptr, cnt) != cnt) {
+		/*
+ 		 *  if --zero_excluded is specified and incomplete flag is set in ELF file,
+		 *  zero will be used to fill the lost part.
+		 */ 
+		ssize_t read_ret = read(nd->ndfd, bufptr, cnt);
+		if (read_ret != cnt) {
+			if (is_incomplete_dump() && (read_ret >= 0 &&
+					*diskdump_flags & ZERO_EXCLUDED)) {
+				error(WARNING, "%s: data may be lost\n"
+					"                   offset:%llx\n\n",
+					pc->dumpfile, offset);

Same thing here... if a user enters --zero_excluded on the command line, he should
expect strange behavior as a result.  Intermingling a bunch of WARNING messages
like the one above will only make it more confusing.

+				bufptr += read_ret;
+				bzero(bufptr, cnt - read_ret);
+				return cnt;
+			}
 			if (CRASHDEBUG(8))
 				fprintf(fp, "read_netdump: READ_ERROR: "
 				    "offset: %llx\n", (ulonglong)offset);
  
Dave