[Crash-utility] x86_64: supporting cpu hot remove

qiaonuohan qiaonuohan at cn.fujitsu.com
Sun Sep 14 12:49:26 UTC 2014


Hello Dave,

On 09/09/2014 10:04 PM, Dave Anderson wrote:
>>> Many of the changes reflect the contents of per-cpu data structures
>>> of offlined cpus, but even though the cpu is currently offline, the
>>> data structures still exist.  Why prevent the user from viewing their
>>> contents?
>>
>> I think just showing online cpu's data is reasonable.
> Why?  Give me an example as to when it is/was a problem?
>
>> What about adding a internal crash variables (used by command set) to
>> hide/show offline cpu's data?
> I suppose that could be done, but again, in my opinion there is no compelling
> reason to do so.  I could be wrong, but aside from maybe "help -r", it seems
> that you are trying to answer a question that nobody's asking.

I agree that showing the data of offline cpus is important in some cases,
such as debugging cpu hot remove. But for those who don't care about the
removed cpus, hiding the offline cpus makes the output clearer. Let me
explain why I think so.

I got a vmcore from a machine that started with 90 cpus, 30 of which were
later physically removed, so the machine was running with 60 cpus. To those
who don't care about the data of the removed cpus, the following is
confusing:

1. The machine has only 60 cpus, but crash shows 90 cpus.
2. When I run the "timer" command, crash shows 90 TVEC_BASES, and some of
them (maybe more than 30) are empty. I have to check which cpus are offline
before I can tell whether an entry is empty because its cpu is offline or
just because no timer was set on that cpu.
3. As for idle tasks, an offline cpu is halted and its idle task does not
run, but crash shows it as running right now.
4. ...

After checking the kernel, I found that when a cpu is set offline, its
processes, timers, interrupts, etc. are migrated to another cpu. So I tried
to hide a cpu's data as soon as the cpu is set offline (logically removed),
rather than only when it is physically removed.
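
To make the idea concrete, here is a minimal sketch of the filtering I have
in mind. It is not the code in the attached patches and does not use crash's
real interfaces: struct cpu_mask, mask_set_online(), cpu_is_online(),
should_hide_cpu() and show_per_cpu_headers() are all hypothetical names, and
the online bitmap is assumed to have been read from the dump already.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_CPUS 128  /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for the online-cpu bitmap read from the dump. */
struct cpu_mask {
    unsigned char bits[MAX_CPUS / 8];
};

/* Mark a cpu as online; out-of-range cpu numbers are ignored. */
void mask_set_online(struct cpu_mask *m, int cpu)
{
    if (cpu >= 0 && cpu < MAX_CPUS)
        m->bits[cpu / 8] |= (unsigned char)(1u << (cpu % 8));
}

/* Return true if the cpu's bit is set in the online bitmap. */
bool cpu_is_online(const struct cpu_mask *m, int cpu)
{
    if (cpu < 0 || cpu >= MAX_CPUS)
        return false;
    return (m->bits[cpu / 8] >> (cpu % 8)) & 1u;
}

/* Hide a cpu only when the user asked for hiding AND the cpu is offline. */
bool should_hide_cpu(const struct cpu_mask *m, int cpu, bool hide_offline)
{
    return hide_offline && !cpu_is_online(m, cpu);
}

/*
 * Print one header line per displayed cpu and return how many were shown.
 * When offline cpus are not hidden they are tagged, so an empty per-cpu
 * entry can be told apart from an entry that belongs to an offline cpu.
 */
int show_per_cpu_headers(const struct cpu_mask *m, int ncpus,
                         bool hide_offline, FILE *out)
{
    int shown = 0;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        if (should_hide_cpu(m, cpu, hide_offline))
            continue;
        fprintf(out, "CPU %d%s\n", cpu,
                cpu_is_online(m, cpu) ? "" : " [OFFLINE]");
        shown++;
    }
    return shown;
}
```

With the 90-cpu vmcore above and cpus 0-59 online, the "hide" setting would
print 60 headers, and the "show" setting would print all 90 with the 30
offline ones tagged.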

The attached patches are what I am trying to implement. If you don't like
the approach, we can keep discussing it.

-- 
Regards
Qiao Nuohan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-add-an-API-to-check-an-offline-cpu.patch
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20140914/5d8fa5c3/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0002-add-a-flag-to-display-hide-data-related-to-offline-c.patch
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20140914/5d8fa5c3/attachment-0001.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0003-x86_64-modify-timer-only-to-display-online-cpus-data.patch
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20140914/5d8fa5c3/attachment-0002.ksh>

