[Crash-utility] x86_64: supporting cpu hot remove

Dave Anderson anderson at redhat.com
Mon Sep 8 19:06:34 UTC 2014



----- Original Message -----
> Hello Dave,
> 
> This patchset is used to fix some bug and modify display of some commands.
> When some cpus are removed, the related data is not reasonable to be
> displayed,
> so hide them.
> 
> Please check the patches to see detailed information.
> 
> --
> Regards
> Qiao Nuohan


Hello Qiao,

I am going to NAK this patch-set in its current format for several reasons.  

It is much too large and affects way too many different/disparate
commands, and in many cases I do not believe that the display
of the offlined cpu's data is unreasonable.

I also don't like the idea of making so many changes to the behavior
of so many architecture-neutral commands -- and then restricting it
to x86_64 only.  

Many of the changes reflect the contents of per-cpu data structures
of offlined cpus, but even though the cpu is currently offline, the
data structures still exist.  Why prevent the user from viewing their
contents? 

A few examples of cases that I don't agree should be modified:

(1) "ptov offset:cpuspec": I see no reason to disallow/hide the 
    translation of such an argument.  Why is a problem?
(2) "kmem -o": Again, I *want* to see these values, and see
    no reason to hide them.
(3) "struct" with an "offset:cpuspec" address.  Perhaps the user
    actually wants to see what the structure contained when the
    cpu was offlined.  What possible harm would there be in displaying
    the structure?
(4) "p" with an "symbol:cpuspec" address.  Same as "struct".
(5) "help -m": this is debug output of machine-dependent data, and
    should absolutely not be hidden/restricted.
(6) I don't like the idea of *not* storing the task information of
    the idle tasks in the "refresh_task_table_xxx()" functions.

and so on...

The crash utility is a debugger, and I don't feel that we should
be removing stuff that could possibly be something that a user
actually wants to see or could conceivably be useful.  Only if the
offline data actually causes the command to fail or display bogus
data, then yes, then it should be addressed.

About the only one of your patches that does seem to make sense
would be "help -r"; AFAICT, all of the others are open to discussion
and would at least need further justification.

Also, when your patch "hides" the data, it quietly skips the output
and goes on to the next cpu.  Wouldn't it at least make more sense
to display "CPU x: [OFFLINE]" or something to that effect (depending
upon the the command's output format)?

So let me suggest that you repost the patch -- in a format that 
contains just the following:

(1) the new interface
(2) any commands or command options that actually fail or show
    incorrect/invalid data.  

With respect to (2), please show examples of before-and-after,
showing the bogus output in the "before", and the proposed display
in the "after".  And please indicate something in the "after" display
that shows the cpu number and the fact that it is offline.

Then -- after the new interface is accepted -- feel free to then post 
individual command/command-option patches if you really believe the
offline data should not be displayed, and we can argue the relevancy or
whether it is reasonable to show/hide the data.  But I can't just
make such wholesale changes in one fell swoop as this patchset 
attempts to do.

Thanks,
  Dave




More information about the Crash-utility mailing list