[Crash-utility] [PATCH] diskdump: fix read_diskdump() returns successfully for illegal 0-size page descriptors
HAGIO KAZUHITO(萩尾 一仁)
k-hagio-ab at nec.com
Fri Sep 8 05:02:24 UTC 2023
On 2023/09/07 19:54, Daisuke Hatayama (Fujitsu) wrote:
> Hagio-san,
>
> I'm now facing the issue on RHEL8 and RHEL9 that crash dump files are
> corrupted although makedumpfile terminated in successful status, in
> the issue which I found the issue this patch tries to fix.
>
> I made a program to check where is corrupted in detail in such crash
> dump file. The following is example:
>
> # ./validatekdump -v vmcore
> disk dump header:
> signature: "KDUMP ^F"
> header_version: 6
> block_size: 4096
> sub_hdr_size: 18
> bitmap_blocks: 1056
> total_ram_blocks: 0
> device_blocks: 0
> written_blocks: 0
>
> kdump sub header:
> phys_base: 0000000172200000
> dump_level: 31
> start_pfn_64: 0
> end_pfn_64: 0
> max_mapnr_64: 17301504
> Total ram pages: 16597666
> Total dumpable pages: 767664
>
> PFN OFFSET SIZE FLAGS PAGE_FLAGS
> 1024 22827136 4096 00000000 0000000000000000 valid
> 1025 22827136 4096 00000000 0000000000000000 valid
> ...strip...
> 1102265 54518942 143 00000000 00000000033cd80d invalid
> 1102266 60779170 42127 00000000 00000000033cd80d invalid
> 1102267 61070498 38287 00000000 00000000033cd80d invalid
>
> In my evaluation, disk dump header, kdump sub header, two bitmaps
> appears consistent. The corrupted part begins in the middle of page
> descriptor table. Page descriptors are corrupted in various patterns,
> for example:
>
> - offset is larger than size of the crash dump file.
> - size is larger than 4096 bytes.
> - size is 0.
> - size is smaller than 4096 bytes but flags doesn't hold the value representing compression.
> - flags doesn't consist of the values defined by DUMP_DH_XXXXX macros.
> - page_flags is not 0 although it is unused.
>
> Corrupted part appears to reach data portion. Some crash dump file
> contains valid 49489 page descriptors whose data portion is compressed
> with LZO according to the flags member, but only 7 of them are
> decompressed successfully.
>
> Unfortunately, information to investigate this issue is just these
> crash dump files. I don't have machines where the issue is
> reproducible. So, while I'm making effort to reproduce the issue, I'm
> checking known similar issues on related components where the root
> cause of this issue possibly lies in, such as makedumpfile,
> filesystem, disk driver, or any hardware issue.
>
> I looked into git log but I didn't find such commits.
>
> Also, I have never seen issues resulting in this kind of data corruption on makedumpfile.
>
> Then, would you tell me if you know some past or current issues on
> makedumpfile that are similar to the above situation?
Hmm, I've tested makedumpfile every week with the latest upstream and
RHEL8/9 kernels on various machines, but I've also never seen such
corruption, IIRC.
Hope you find a reproducer..
Thanks,
Kazu
>
> Thanks.
> HATAYAMA, Daisuke
>
>> ________________________________________
>> From: Crash-utility <crash-utility-bounces at redhat.com> on behalf of HATAYAMA Daisuke <d.hatayama at fujitsu.com>
>> Sent: Thursday, September 7, 2023 18:38
>> To: crash-utility at redhat.com
>> Subject: [Crash-utility] [PATCH] diskdump: fix read_diskdump() returns successfully for illegal 0-size page descriptors
>>
>> read_diskdump() returns successfully for illegal 0-size page
>> descriptors. Page descriptors are illegal if their size member holds 0
>> because makedumpfile never puts 0 there because any data never result
>> in 0 byte by compression. If page descriptors hold 0 in size member,
>> it means the crash dump file is corrupted for some reason.
>>
>> The root cause of this is that sanity check of function cache_page()
>> doesn't focus on such 0-size page descriptors. Then, the 0-size page
>> descriptor is passed to pread(), pread() immediately returns 0
>> successfully because read data is 0 byte, and then read_diskdump()
>> returns successfully.
>>
>> To fix this issue, let the sanity check take into account such 0-size
>> page descriptors and read_diskdump() result in READ_ERROR.
>>
>> Signed-off-by: HATAYAMA Daisuke <d.hatayama at fujitsu.com>
>> ---
>> diskdump.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/diskdump.c b/diskdump.c
>> index 2c284ff..2be7cc7 100644
>> --- a/diskdump.c
>> +++ b/diskdump.c
>> @@ -1210,7 +1210,7 @@ cache_page(physaddr_t paddr)
>> return ret;
>>
>> /* sanity check */
>> - if (pd.size > block_size)
>> + if (pd.size > block_size || !pd.size)
>> return READ_ERROR;
>>
>> /* read page data */
>> --
>> 2.25.1
>>
>> --
>> Crash-utility mailing list
>> Crash-utility at redhat.com
>> https://listman.redhat.com/mailman/listinfo/crash-utility
>> Contribution Guidelines: https://github.com/crash-utility/crash/wiki
More information about the Crash-utility
mailing list