[libvirt] [PATCH] nodeinfo: deal with offline cpus in a node

Eric Blake eblake at redhat.com
Wed Jul 18 23:11:47 UTC 2012


On 07/18/2012 02:58 AM, Peter Krempa wrote:
> On 07/18/12 04:01, Eric Blake wrote:
>> Commit 80533ca forgot to think about offline cpus.  When a node
>> cpu is offline, then its topology/ subdirectory is not present,
>> leading to spurious error messages leaked to the user such as:
>>
>> libvir:  error : cannot open
>> /home/dummy/libvirt/tests/nodeinfodata/linux-nodeinfo-sysfs-test-6/node/node0/cpu7/topology/physical_package_id:
>> No such file or directory
>>
>> Fix that, as well as test it; the test data is gathered from a
>> machine with one NUMA node, hyperthreading, and with 2 of the
>> 8 cpus offline.
>>
>> * src/nodeinfo.c (virNodeParseNode): Don't parse topology of
>> offline cpus.
>> * tests/nodeinfotest.c (mymain): Run new test.
>> * tests/nodeinfodata/linux-nodeinfo-sysfs-test-6*: New data.
>> ---
>>
>> Offline cpus are an annoying corner case :)
> 
> Indeed! Who would ever cripple their machine on purpose :)

In small-scale use, probably no one (and developers tend to have
small-scale setups).  Ergo our problems in detecting these sorts of issues.

But in large enterprisey setups with beefy machines having lots of NUMA
nodes, offlining cpus when the machine is under light load can lead to
noticeable savings on the power bill.  You generally see the best
savings when offlining an entire node; the way I did it, offlining cpu5
and cpu7 (two unpaired threads) on a box that only had one node to begin
with, probably didn't save any power.
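
For anyone who wants to poke at this outside of libvirt, here is a rough
sketch of the guard in Python.  It is not the code from the patch; the
function names are made up for illustration, and it assumes the usual
Linux sysfs layout where each cpuN directory has an 'online' file
containing 0 or 1 (a cpu that cannot be hot-unplugged may have no such
file, which the sketch treats as online).

```python
import os


def cpu_is_online(cpu_dir):
    """Return True if the cpu behind this sysfs directory is online.

    Assumption: a missing 'online' file means the cpu cannot be
    offlined, so it is treated as online.
    """
    try:
        with open(os.path.join(cpu_dir, "online")) as f:
            return f.read().strip() == "1"
    except FileNotFoundError:
        return True


def package_ids(node_dir):
    """Map cpu name -> physical_package_id for the online cpus of a node.

    Offline cpus are skipped before touching topology/, since that
    subdirectory does not exist for them.
    """
    ids = {}
    for name in sorted(os.listdir(node_dir)):
        # Only cpuN entries; skip siblings such as cpumap or cpulist.
        if not (name.startswith("cpu") and name[3:].isdigit()):
            continue
        cpu_dir = os.path.join(node_dir, name)
        if not cpu_is_online(cpu_dir):
            continue
        path = os.path.join(cpu_dir, "topology", "physical_package_id")
        with open(path) as f:
            ids[name] = int(f.read())
    return ids
```

Pointing package_ids() at a node directory such as
/sys/devices/system/node/node0 should list only the online cpus, with no
attempt to open files under a missing topology/ directory.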

When power savings are not the issue, then another common reason for
offlining cpus is to temporarily disable hyperthreading (yes, it's a bit
more abrupt than cpu pinning, but also a lot faster to set up).  Believe
it or not, there are workloads that are actually slower when run in
parallel on a hyperthread pair than when run serially on a single cpu
(that is, hyperthreading is a hardware shortcut; it isn't really two
cpus so much as a way to use one cpu to handle two loads, but it only
works insofar as the two loads don't stomp on each other's cache, and
not all loads meet that property).
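
If you want to try the hyperthreading trick, the kernel exposes each
cpu's sibling threads in topology/thread_siblings_list, in forms like
"0,4" or "0-1".  The following Python sketch (hypothetical helper names,
not anything libvirt provides) picks which cpus to offline so that each
core keeps only its lowest-numbered thread; it assumes you have already
read those files into a dict.

```python
def parse_cpu_list(text):
    """Parse a sysfs cpu list such as '0,4' or '0-3,8' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def secondary_threads(siblings_by_cpu):
    """Return the sorted cpus that are not the first thread of their core.

    siblings_by_cpu maps a cpu number to the contents of that cpu's
    thread_siblings_list file.
    """
    to_offline = set()
    for cpu, text in siblings_by_cpu.items():
        if cpu != min(parse_cpu_list(text)):
            to_offline.add(cpu)
    return sorted(to_offline)
```

Each cpu in the returned list can then be offlined by writing 0 to its
/sys/devices/system/cpu/cpuN/online file (as root), and brought back by
writing 1.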

> 
> ACK, thanks for finding this.

Thanks; pushed.

-- 
Eric Blake   eblake at redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
