[Libvir] Question on acquiring cpuTime in struct _virDomainInfo

Jan Michael jan.michael at cern.ch
Wed May 23 12:35:47 UTC 2007


Hi Daniel,

thanks for confirming that I'm on the right track. But I still
experience problems with a heavily stressed node. Let me first
explain my current node setup:

<xm info>
	release                : 2.6.18-1.2835.slc4xen
	version                : #1 SMP Wed Nov 29 21:05:58 CET 2006
	machine                : i686
	nr_cpus                : 2
	nr_nodes               : 1
	sockets_per_node       : 2
	cores_per_socket       : 1
	threads_per_core       : 1
	cpu_mhz                : 2800
	total_memory           : 2047
	xen_major              : 3
	xen_minor              : 0
	xen_extra              : .3-rc5-1.2835.s
	xen_caps               : xen-3.0-x86_32p
	xen_pagesize           : 4096
</xm info>

<xm vcpu-list>
	Name             ID VCPUs   CPU State   Time(s) CPU Affinity
	Domain-0         0     0     0   r--    9761.7 any cpu
	Domain-0         0     1     1   ---   10571.9 any cpu
	stornode         2     0     1   r--    7287.6 1
	stornode         2     1     1   ---    6473.1 1
	worknode         3     0     0   ---    2139.3 0
	worknode         3     1     0   ---    1368.2 0
	worknode         3     2     0   ---    1223.1 0
	worknode         3     3     0   ---    1349.5 0
</xm vcpu-list>

I'm running a CPU stress tool called cpuburn on every virtual CPU of
every domain. Meanwhile my small sensor calculates the CPU utilisation
of the whole node: I calculate the CPU utilisation for each domain,
one after another, and then sum up the results to get the node value.

Under this load, the following two function calls, which provide me
with the cpuTime of a domain, take about 4 seconds on average:

	    dom_old = virDomainLookupByID(conn_old, listOfDomains[i]);
	    ret = virDomainGetInfo(dom_old, &info_old);

Here are the stats from my latest measurement:

            old cpuTime          new cpuTime

Domain-0:   3s 4294835190ms      3s 4294849513ms
stornode:   5s 580501ms          6s 4294550691ms
worknode:   6s 4294546809ms      5s 582761ms
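The large millisecond figures in this table are all just below 2^32 = 4294967296 (for example, 4294967296 - 4294835190 = 132106), which is what a small negative number looks like when it is printed as a 32-bit unsigned integer. If these columns are a (seconds, sub-second) pair produced by subtracting two struct timeval values field by field, one possible cause is a sub-second part that went negative without borrowing from the seconds part. A minimal sketch of a normalised subtraction, assuming the timestamps come from gettimeofday(); the function names are hypothetical:

```c
#include <stdint.h>
#include <sys/time.h>

/* Normalised difference cur - old: tv_usec always ends up in
 * [0, 1000000), borrowing one second from tv_sec when needed. */
struct timeval timeval_sub(struct timeval cur, struct timeval old)
{
    struct timeval d;

    d.tv_sec = cur.tv_sec - old.tv_sec;
    d.tv_usec = cur.tv_usec - old.tv_usec;
    if (d.tv_usec < 0) {
        d.tv_sec -= 1;
        d.tv_usec += 1000000;
    }
    return d;
}

/* Illustrates the symptom: a small negative value converted to a
 * 32-bit unsigned integer wraps around to just below 2^32. */
uint32_t as_u32(long v)
{
    return (uint32_t)v;
}
```

On glibc and the BSDs, the timersub() macro from <sys/time.h> performs the same normalised subtraction.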

As a result, the computed CPU utilisation for the node is much lower
(around 75%) than the real value (100%).

One workaround would be to add the time measured for those calls to
the used cpuTime. But this in turn can produce values that are too
high, because I don't know at which point in time the value is
written into the structure.
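Instead of adding the full call duration, one option is to bracket each sample: take one timestamp just before and one just after the lookup and virDomainGetInfo() calls. Assuming the hypervisor reads cpuTime at some moment inside that bracket, the true interval between the two reads lies between the shortest and the longest possible gap, which gives lower and upper bounds on the utilisation; the bracket midpoints give a point estimate. This is a sketch under that assumption; the struct and function names are hypothetical, and times are plain seconds stored as doubles.

```c
/* Hypothetical bracketed sample: wall-clock times (seconds) taken just
 * before and just after the libvirt calls, plus the returned cpuTime (ns). */
struct bracketed_sample {
    double before;
    double after;
    unsigned long long cpu_ns;
};

struct util_bounds {
    double low;      /* percent, using the longest possible interval */
    double estimate; /* percent, using the bracket midpoints */
    double high;     /* percent, using the shortest possible interval */
};

/* Returns 0 on success, -1 if the brackets overlap, in which case the
 * shortest possible interval is not positive and no upper bound exists. */
int compute_util_bounds(const struct bracketed_sample *old,
                        const struct bracketed_sample *cur,
                        struct util_bounds *out)
{
    double cpu = (double)(cur->cpu_ns - old->cpu_ns) / 1e9;
    double longest = cur->after - old->before;
    double shortest = cur->before - old->after;
    double mid = (cur->before + cur->after) / 2.0
               - (old->before + old->after) / 2.0;

    if (shortest <= 0.0)
        return -1;
    out->low = cpu / longest * 100.0;
    out->estimate = cpu / mid * 100.0;
    out->high = cpu / shortest * 100.0;
    return 0;
}
```

With calls that take around 4 seconds each, the brackets are wide, so the spread between low and high indicates how much of the error can be explained by call latency alone.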

Nevertheless, xentop always shows me the correct CPU utilisation
of each of my domains, so I conclude that this problem must have
something to do with the libvirt API.

Have you, or has anybody else, experienced similar issues? Do you
know of any solution?

Cheers,

	Jan


On 10.05.2007, at 18:32, Daniel P. Berrange wrote:
> On Thu, May 10, 2007 at 05:41:33PM +0200, Jan Michael wrote:
>> Hi everyone,
>>
>> Using libvirt, I'm trying to calculate the CPU utilization of a node
>> in percent. But sometimes values above 100.0% are calculated,
>> because a domain appears to have spent more time on a CPU than has
>> elapsed in the meantime.
>>
>> A short explanation of how CPU utilization is computed in my
>> case:
>>
>> 	1. - open two connections with
>> 		conn_cur/conn_old = virConnectOpenReadOnly(NULL);
>> 	2. - get current time
>> 		gettimeofday(&time_old, NULL);
>> 	   - get domain by id with
>> 		dom_old = virDomainLookupByID(conn_old, id);
>> 	   - get domain information
>> 		virDomainGetInfo(dom_old, &info_old);
>> 	3. - sleep a second
>> 	
>> 	4. - do the same as in 2., but with _cur
>>
>> 	5. - compute cpu utilization by dividing used cputime by elapsed
>> 	     time and multiplying by 100
>>
>> Am I right to assume that the cpuTime in the _virDomainInfo structure
>> is acquired directly from the hypervisor in virDomainGetInfo(dom_old,
>> &info_old), or is it already present when getting the domain itself?
>> Is there a better, more precise way of doing this?
>
> This is the best approach - the algorithm you summarized is basically
> the same as I use in virt-manager. The reason it sometimes goes above
> 100% is just due to timing / scheduler variations.
>
>    1. get timeofday
>    2. get cputime for domA
>    3. sleep a while
>    4. get timeofday
>    5. get cputime for domA
>
> We're basically looking at the ratio of the CPU time consumed (5-2)
> to the elapsed wall-clock time (4-1). It would be 100% accurate if you
> could guarantee that no time elapsed between steps 1 & 2, or between
> steps 4 & 5, but there's always some latency in there, so occasionally
> you might end up calculating a value that is a tiny bit over 100%. In
> virt-manager I deal with this by simply rounding down to 100 if this
> occurs.
>
> Based on the hypercalls which are available to us, I don't see any
> way to avoid this scenario. Then again, it is not as if we really need
> millisecond precision in calculating CPU usage, so I don't think it's
> worth worrying about too much.
>
>> And another general question:
>> Xen's monitoring utility, xentop, also provides statistics about
>> networking and VBDs. Are there any plans for libvirt to provide
>> these values in the future?
>
> I'd like to see the ability to track network & disk I/O stats.
> No one has so far stepped forward to suggest an API or implementation,
> but I'd welcome anyone interested in taking a look at this area.
>
> Regards,
> Dan.
> -- 
> |=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
> |=-           Perl modules: http://search.cpan.org/~danberr/              -=|
> |=-               Projects: http://freshmeat.net/~danielpb/               -=|
> |=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|