[Crash-utility] [PATCH] Fix machdep->HZ calculation for kernel versions > 2.6.0
HAGIO KAZUHITO(萩尾 一仁)
k-hagio-ab at nec.com
Thu Apr 22 09:33:20 UTC 2021
-----Original Message-----
> -----Original Message-----
> > On 2021-01-12 16:24, HAGIO KAZUHITO(萩尾 一仁) wrote:
> > > Hi Bhupesh,
> > >
> > > -----Original Message-----
> > >> We have hard-coded the HZ value for some ARCHs to either 1000 or 100
> > >> (mainly for kernel versions > 2.6.0), which causes 'help -m' to show
> > >> an incorrect hz value for various architectures.
> > >
> > > Good catch, but it seems crash uses (cfq_slice_async * 25) for machdep->hz
> > > if the symbol exists (please see task_init()). RHEL7 has it, but RHEL8 does not.
> > > What do you see on RHEL8 for x86_64 with your patch?
> > >
> >
> > The symbol 'cfq_slice_async' has been removed from upstream kernel:
> > f382fb0bcef4 ("block: remove legacy IO schedulers")
> >
> > And RHEL8 also removed it.
> >
> > > We should search for an alternate way like the current one first.
> > >
> >
> > Currently, there are several ways to get the value of HZ as below:
> >
> > [1] calculate hz via the symbol 'cfq_slice_async'
> > But this symbol has been removed from upstream kernel
>
> According to [0] below, 'cfq_slice_async' cannot be used for the HZ
> calculation on 4.8 and later kernels. I've not found a perfect alternative,
> but how about using 'bfq_timeout' for 4.12 and later, including RHEL8?
e.g. like this:
--- a/task.c
+++ b/task.c
@@ -417,7 +417,16 @@ task_init(void)
 	STRUCT_SIZE_INIT(cputime_t, "cputime_t");
-	if (symbol_exists("cfq_slice_async")) {
+	if (symbol_exists("bfq_timeout")) {
+		uint bfq_timeout;
+		get_symbol_data("bfq_timeout", sizeof(int), &bfq_timeout);
+		if (bfq_timeout) {
+			machdep->hz = bfq_timeout * 8;
+			if (CRASHDEBUG(2))
+				fprintf(fp, "bfq_timeout exists: setting hz to %d\n",
+					machdep->hz);
+		}
+	} else if (symbol_exists("cfq_slice_async")) {
 		uint cfq_slice_async;
 		get_symbol_data("cfq_slice_async", sizeof(int),
Lianbo, could you try this on ppc64le if it looks good?
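
For reference, here is a minimal standalone sketch of the arithmetic behind this hunk. It is not crash code: all four helper names are hypothetical, and the kernel-side definitions are assumed to be bfq_timeout = HZ / 8 (as quoted further down) and cfq_slice_async = HZ / 25 (implied by the existing "* 25" in task_init()). Because the kernel side uses integer division, multiplying back is exact only when HZ is a multiple of the divisor.

```c
#include <assert.h>
#include <stdio.h>

/* Kernel side (assumed): const int bfq_timeout = HZ / 8; integer division. */
int bfq_timeout_for(int hz) { return hz / 8; }

/* Kernel side (assumed from the "* 25" in task_init()): HZ / 25. */
int cfq_slice_async_for(int hz) { return hz / 25; }

/* Debugger side: hypothetical helpers mirroring the multiplications. */
int hz_from_bfq_timeout(int bfq_timeout) { return bfq_timeout * 8; }
int hz_from_cfq_slice_async(int cfq_slice_async) { return cfq_slice_async * 25; }

/* Print the round trip for a few common CONFIG_HZ choices. */
void print_round_trips(void)
{
	static const int common_hz[] = { 100, 250, 300, 1000 };
	size_t i;

	for (i = 0; i < sizeof(common_hz) / sizeof(common_hz[0]); i++) {
		int hz = common_hz[i];

		printf("HZ=%4d  bfq_timeout*8=%4d  cfq_slice_async*25=%4d\n",
		       hz,
		       hz_from_bfq_timeout(bfq_timeout_for(hz)),
		       hz_from_cfq_slice_async(cfq_slice_async_for(hz)));
	}
}
```

Under these assumptions HZ=1000 round-trips exactly (125 * 8), while HZ=100 would come back as 96 and HZ=250 as 248, so it may be worth rounding the result to a known CONFIG_HZ value before storing it.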
btw, I thought 'read_expire' would be better than 'bfq_timeout' because it
was introduced in 2.6.16 and has been unchanged since, but most of the kernels
(vmlinux) that I have do not have a symbol for it. (Some optimization?)
static const int read_expire = HZ / 2; /* max time before a read is submitted. */
RELEASE: 4.18.0-80.el8.x86_64
crash> p read_expire
No symbol "read_expire" in current context.
p: gdb request failed: p read_expire
Thanks,
Kazu
>
> const int bfq_timeout = HZ / 8;
>
> RELEASE: 4.18.0-80.el8.x86_64
>
> crash> p bfq_timeout
> bfq_timeout = $1 = 125
>
> This value has not been changed since its introduction (aee69d78dec0).
> Could recent kernels configured with CONFIG_IOSCHED_BFQ=y be covered with this?
>
> [0] https://listman.redhat.com/archives/crash-utility/2021-April/msg00026.html
>
> Thanks,
> Kazu
>
>
> >
> > [2] hardcode hz with the value 1000 (if kernel version > 2.6.0)
> >
> > [3] get the hz value from vmcore, but that relies on kernel config
> > such as CONFIG_IKCONFIG, etc.
> >
> > [4] Use sysconf(_SC_CLK_TCK) on some arches, not all arches.
> > See the macro definition of HZ in defs.h
> >
> > There seems to be no perfect solution. Any ideas?
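
As an illustration of option [4] above, a minimal sketch of querying sysconf(_SC_CLK_TCK) with a fallback. The helper names host_clk_tck() and hz_or_default() are hypothetical and are not part of crash. Note that sysconf() describes the host on which the tool is running, so the value may not match the kernel captured in the dump.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: clock ticks per second as reported by the host
 * that runs the analysis tool. sysconf() returns -1 on failure.
 */
long host_clk_tck(void)
{
	return sysconf(_SC_CLK_TCK);
}

/* Hypothetical helper: use a caller-supplied default if the query failed. */
int hz_or_default(long clk_tck, int fallback)
{
	return clk_tck > 0 ? (int)clk_tck : fallback;
}
```

A caller could write hz_or_default(host_clk_tck(), 1000) to keep the current hard-coded value as the last resort.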
> >
> >
> > Thanks.
> > Lianbo
> >
> > > Thanks,
> > > Kazu
> > >
> > >>
> > >> I tested this on ppc64le and x86_64 and the hz value reported is 1000,
> > >> whereas the kernel CONFIG_HZ_100 is set to Y. See some logs below:
> > >>
> > >> crash> help -m
> > >> flags: 124000f5
> > >> (KSYMS_START|MACHDEP_BT_TEXT|VM_4_LEVEL|VMEMMAP|VMEMMAP_AWARE|PHYS_ENTRY_L4|SWAP_ENTRY_L4|RADIX_MMU|OPAL_FW)
> > >> kvbase: c000000000000000
> > >> identity_map_base: c000000000000000
> > >> pagesize: 65536
> > >> pageshift: 16
> > >> pagemask: ffffffffffff0000
> > >> pageoffset: ffff
> > >> stacksize: 16384
> > >> hz: 1000
> > >> mhz: 2800
> > >>
> > >> [host at rhel7]$ grep CONFIG_HZ_100= redhat/configs/kernel-3.10.0-ppc64le.config
> > >> CONFIG_HZ_100=y
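
The config check above can also be done programmatically when the config text is available (for example via CONFIG_IKCONFIG). The following is only a sketch: config_hz_from_text() is a hypothetical helper, and it assumes the config carries the numeric "CONFIG_HZ=<n>" line that normally accompanies a choice such as CONFIG_HZ_100=y.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: scan kernel config text for a line of the form
 * "CONFIG_HZ=<n>" and return n, or -1 if no such line is found.
 * Lines such as "CONFIG_HZ_100=y" do not match, because the character
 * following "CONFIG_HZ" there is '_' rather than '='.
 */
int config_hz_from_text(const char *text)
{
	static const char key[] = "CONFIG_HZ=";
	const size_t keylen = sizeof(key) - 1;
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, keylen) == 0) {
			long v = strtol(line + keylen, NULL, 10);

			return v > 0 ? (int)v : -1;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}
```

The helper only matches at the start of a line, so commented-out entries like "# CONFIG_HZ_250 is not set" are ignored.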
> > >>
> > >> Fix the same by using the sysconf(_SC_CLK_TCK) value instead of the
> > >> hardcoded HZ values depending on kernel versions.
> > >>
> > >
>
>
> --
> Crash-utility mailing list
> Crash-utility at redhat.com
> https://listman.redhat.com/mailman/listinfo/crash-utility