[Crash-utility] [PATCH 0/5] [RFC] Multi-thread support for search cmd

HAGIO KAZUHITO(萩尾 一仁) k-hagio-ab at nec.com
Wed Apr 5 08:35:59 UTC 2023


On 2023/03/25 13:12, Tao Liu wrote:
> The primary part of the patchset introduces multithread support for the search
> cmd to improve its performance. A search operation is mainly made up of 2
> steps: 1) readmem the data into a pagebuf, 2) search for specific values within
> the pagebuf. A typical search workflow is as follows:
> 
> for addr from low to high:
> do
> 	readmem(addr, pagebuf)
> 	search_value(value, pagebuf)
> 	addr += pagesize
> done
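> 
> In C terms the single-threaded loop is roughly the following (PAGESIZE,
> readmem() and search_value() are simplified stand-ins for the real crash
> internals):
> 
> 	static void search_linear(unsigned long low, unsigned long high, unsigned long value)
> 	{
> 		static char pagebuf[PAGESIZE];
> 		unsigned long addr;
> 
> 		for (addr = low; addr < high; addr += PAGESIZE) {
> 			/* step 1: read one page of dump memory into the buffer */
> 			if (!readmem(addr, pagebuf, PAGESIZE))
> 				continue;
> 			/* step 2: scan the buffer for the requested value */
> 			search_value(value, pagebuf, PAGESIZE);
> 		}
> 	}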
> 
> There are 2 points where we can accelerate: 1) readmem doesn't have to wait
> for search_value; while search_value is working, readmem can read the next
> pagebuf at the same time. 2) higher addrs don't have to wait for lower addrs;
> they can be processed at the same time if we carefully arrange the output order.
> 
> For point 1, we introduce zones for the pagebuf, e.g. search_value can work on
> zone 0 while readmem prepares the data for zone 1. For point 2, we run
> search_value in multiple threads, e.g. readmem prepares 100 pages as a batch,
> then 4 search_value threads split it: thread 0 handles pages 1~25, thread 1
> handles pages 26~50, thread 2 handles pages 51~75, and thread 3 handles pages
> 76~100.
> 
> A typical workflow of the multithread search implemented in this patchset is
> as follows, with the thread synchronization omitted for brevity:
> 
> pagebuf[ZONE][BATCH]
> zone_index = buf_index = 0
> create_thread(4, search_value)
> for addr from low to high:
> do
> 	if buf_index < BATCH
> 		readmem(addr, pagebuf[zone_index][buf_index++])
> 		addr += pagesize
> 	else
> 		start_thread(pagebuf[zone_index], 0/4 * BATCH, 1/4 * BATCH)
> 		start_thread(pagebuf[zone_index], 1/4 * BATCH, 2/4 * BATCH)
> 		start_thread(pagebuf[zone_index], 2/4 * BATCH, 3/4 * BATCH)
> 		start_thread(pagebuf[zone_index], 3/4 * BATCH, 4/4 * BATCH)
> 		zone_index = (zone_index + 1) % ZONE
> 		buf_index = 0
> 	fi
> done
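> 
> For reference, here is a minimal self-contained pthread sketch of the batch
> partitioning above (readmem_page(), search_page() and the constants are
> illustrative stand-ins, not the actual crash code; the real patches also keep
> the worker threads alive across batches and double-buffer the pages with
> zones):
> 
> 	#include <pthread.h>
> 
> 	#define BATCH      100		/* pages read per batch */
> 	#define NR_THREADS 4		/* search_value threads */
> 	#define PAGESIZE   4096
> 
> 	static char pages[BATCH][PAGESIZE];	/* one zone of page buffers */
> 	static unsigned long page_addr[BATCH];	/* address each slot was read from */
> 	static unsigned long target;		/* value being searched for */
> 
> 	/* assumed helpers: read one page of dump memory / scan one page buffer */
> 	extern int  readmem_page(unsigned long addr, void *buf);
> 	extern void search_page(const void *buf, unsigned long addr, unsigned long value);
> 
> 	struct worker_arg {
> 		int first, last;	/* half-open page range [first, last) */
> 	};
> 
> 	static void *search_worker(void *p)
> 	{
> 		struct worker_arg *w = p;
> 		int i;
> 
> 		/* each thread scans its own disjoint slice: no locking needed */
> 		for (i = w->first; i < w->last; i++)
> 			search_page(pages[i], page_addr[i], target);
> 		return NULL;
> 	}
> 
> 	static void search_range(unsigned long low, unsigned long high, unsigned long value)
> 	{
> 		pthread_t tids[NR_THREADS];
> 		struct worker_arg args[NR_THREADS];
> 		unsigned long addr;
> 		int n = 0, t;
> 
> 		target = value;
> 
> 		for (addr = low; addr < high; addr += PAGESIZE) {
> 			/* the main thread does all the readmem work, page by page */
> 			page_addr[n] = addr;
> 			if (readmem_page(addr, pages[n]))
> 				n++;
> 
> 			if (n < BATCH && addr + PAGESIZE < high)
> 				continue;	/* keep filling the current batch */
> 
> 			/* batch full or range exhausted: fan the scanning out */
> 			for (t = 0; t < NR_THREADS; t++) {
> 				args[t].first = t * n / NR_THREADS;
> 				args[t].last  = (t + 1) * n / NR_THREADS;
> 				pthread_create(&tids[t], NULL, search_worker, &args[t]);
> 			}
> 			for (t = 0; t < NR_THREADS; t++)
> 				pthread_join(tids[t], NULL);
> 			n = 0;
> 		}
> 	}
> 
> The per-batch pthread_create/pthread_join keeps the sketch short; as the
> pseudocode above shows, the patchset instead creates the threads once up
> front and only hands them a new range per batch.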
> 
> readmem works in the main process and is not multi-threaded, because readmem
> not only reads data from the vmcore and decompresses it, but also walks page
> tables if a virtual address is given. It is hard to reimplement it in a
> thread-safe way, whereas search_value is easier to make thread-safe. By
> carefully choosing the batch size and thread number, we can maximize the
> concurrency.
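> 
> To illustrate why search_value is the easier half to make thread-safe: its
> shared inputs are read-only (the target value and the page data), and one way
> to keep the output lock-free as well (not necessarily what these patches do)
> is to give each thread a private hit list that the main thread prints in
> thread order after joining:
> 
> 	/* per-thread match bookkeeping; realloc() is from <stdlib.h>,
> 	 * error handling omitted for brevity */
> 	struct match {
> 		unsigned long addr;		/* where the value was found */
> 	};
> 
> 	struct worker_result {
> 		struct match *hits;		/* private to one worker thread */
> 		int nr, max;			/* used / allocated entries */
> 	};
> 
> 	static void record_match(struct worker_result *r, unsigned long addr)
> 	{
> 		if (r->nr == r->max) {
> 			r->max = r->max ? r->max * 2 : 64;
> 			r->hits = realloc(r->hits, r->max * sizeof(*r->hits));
> 		}
> 		r->hits[r->nr++].addr = addr;
> 	}
> 
> Since thread N always covers lower pages than thread N+1, printing the
> buffers in thread order preserves the ascending-address output of the
> single-threaded search.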
> 
> The last part of the patchset replaces lseek/read with pread for kcore and
> diskdump vmcores.
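> 
> That change is mechanical; each lseek+read pair becomes a single pread,
> roughly as follows (error returns shown in crash's SEEK_ERROR/READ_ERROR
> style, details simplified):
> 
> 	/* before: two syscalls per read, and the file offset is shared state */
> 	if (lseek(fd, offset, SEEK_SET) == -1)
> 		return SEEK_ERROR;
> 	if (read(fd, buf, cnt) != cnt)
> 		return READ_ERROR;
> 
> 	/* after: one syscall, independent of the fd's current offset */
> 	if (pread(fd, buf, cnt, offset) != cnt)
> 		return READ_ERROR;
> 
> Besides saving a syscall per page, pread does not depend on the file
> descriptor's shared offset.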
> 
> Here is the performance test result chart. Please note that the vmcore and
> kcore were tested separately on 2 different machines. crash-orig is crash
> compiled from clean upstream code, crash-pread is the code with only the
> pread patch applied (patch 5), and crash-multi is the code with only the
> multithread patches applied (patches 1~4).
> 
> ulong search:
> 
>      $ time echo "search abcd" | ./crash-orig vmcore vmlinux > /dev/null
>      $ time echo "search abcd -f 4 -n 4" | ./crash-multi vmcore vmlinux > /dev/null
> 
> 			 45G vmcore				64G kcore
> 		real        user        sys    		real       user       sys
> crash-orig	16m56.595s  15m57.188s  0m56.698s	1m37.982s  0m51.625s  0m46.266s
> crash-pread	16m46.366s  15m55.790s  0m48.894s	1m9.179s   0m36.646s  0m32.368s
> crash-multi	16m26.713s  19m8.722s   1m29.263s	1m27.661s  0m57.789s  0m54.604s
> 
> string search:
> 
>      $ time echo "search -c abcddbca" | ./crash-orig vmcore vmlinux > /dev/null
>      $ time echo "search -c abcddbca -f 4 -n 4" | ./crash-multi vmcore vmlinux > /dev/null
> 
> 			45G vmcore				64G kcore
> 		real        user        sys    		real       user       sys
> crash-orig	33m33.481s  32m38.321s  0m52.771s	8m32.034s  7m50.050s  0m41.478s
> crash-pread	33m25.623s  32m35.019s  0m47.394s	8m4.347s   7m35.352s  0m28.479s
> crash-multi	16m31.016s  38m27.456s  1m11.048s	5m11.725s  7m54.224s  0m44.186s
> 
> Discussion:
> 
> 1) The multithread and pread patches can each improve the performance a
>     bit on their own, so with both patches applied, the performance can be
>     better still.
> 
> 2) Multi-thread search performs much better on tasks where the scanning
>     itself dominates the time, such as string search.

Thank you for the improvement!  Sorry, I've not had time to look at this and 
the cmd_search() code yet due to other tasks.

I think that multithreading is ultimately needed to speed up the search 
processing for huge memory beyond a certain size, so we may well have it 
eventually.  Nice work, but it's complicated, and I'm still not sure whether 
it should be done first.  I would like to see first whether there is room 
for optimization, cleanup, etc.

So if you have any analysis or trial results, please let me know.

Thanks,
Kazu

