[libvirt] [Qemu-devel] IO accounting overhaul

Benoît Canet benoit.canet at irqsave.net
Fri Aug 29 16:32:20 UTC 2014


On Friday 29 Aug 2014 at 17:04:46 (+0100), Stefan Hajnoczi wrote:
> On Thu, Aug 28, 2014 at 04:38:09PM +0200, Benoît Canet wrote:
> > I collected some items of a cloud provider wishlist regarding I/O accounting.
> > 
> > In a cloud, I/O accounting can serve three purposes: billing, helping the
> > customers, and doing metrology to help the cloud provider seek out hidden costs.
> > 
> > I'll cover the first two topics in this mail because they are the most
> > important business-wise.
> > 
> > 1) preferred place to collect billing I/O accounting data:
> > ----------------------------------------------------------
> > For billing purposes the collected data must be as close as possible to what
> > the customer would see by running iostat in his VM.
> > 
> > The first conclusion we can draw is that collecting the I/O accounting data
> > used for billing in the block device models is the right choice.
> 
> I agree.  When statistics are collected at lower layers it becomes hard
> for the end user to understand numbers that include hidden costs for
> image formats, network protocols, etc.
> 
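To make the "as close as possible to the guest" argument concrete, here is a
rough sketch of where the counters would sit in the stack (assuming the usual
QEMU layering):

    guest driver (what iostat sees)
      -> device model (virtio-blk, IDE, ...)    <- collect billing data here
         -> block layer
            -> format driver (qcow2, ...)       <- hidden costs start here
               -> protocol driver (file, NBD, ...)

Anything accounted below the device model would include format and protocol
overhead that the customer cannot observe from inside the VM.
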
> > 2) what to do with occurrences of rare events:
> > ----------------------------------------------
> > 
> > Another point is that QEMU developers agree that they don't know which policy
> > to apply to some I/O accounting events.
> > Must QEMU discard an invalid write I/O or account it as done?
> > Must QEMU count a failed read I/O as done?
> > 
> > When discussing this with a cloud provider, the following became apparent:
> > these decisions are really specific to each cloud provider and QEMU should not
> > implement them. The right thing to do is to add accounting counters to collect
> > these events.
> > 
> > Moreover, these rare events are precious troubleshooting data, so that's an
> > additional reason not to toss them.
> 
> Sounds good, network interface statistics also include error counters.
> 
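To illustrate the "count, don't decide" idea, here is a minimal sketch of what
a device model's completion path could do (the names are hypothetical, not an
existing QEMU API):

    #include <stdint.h>
    #include <stdbool.h>

    /* hypothetical per-device accounting state */
    typedef struct BlockAcctStats {
        uint64_t invalid_write_ios;
        uint64_t write_ios_error;
        /* ... the other counters listed in section 3) ... */
    } BlockAcctStats;

    /* called on write completion: record the outcome, apply no policy */
    static void acct_write_done(BlockAcctStats *s, int ret, bool valid)
    {
        if (!valid) {
            s->invalid_write_ios++;  /* the guest submitted a bogus request */
        } else if (ret < 0) {
            s->write_ios_error++;    /* the backend failed the request */
        }
        /* whether to bill such requests is left to the provider's tooling */
    }
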
> > 3) list of block I/O accounting metrics wished for billing and helping the customers
> > -------------------------------------------------------------------------------------
> > 
> > Basic I/O accounting data will end up making the customers' bills.
> > Extra I/O accounting information would be a precious help for the cloud provider
> > to implement a monitoring panel like Amazon CloudWatch.
> 
> One thing to be aware of is that counters inside QEMU cannot be trusted.
> If a malicious guest can overwrite memory in QEMU then the counters can
> be manipulated.
> 
> For most purposes this should be okay.  Just be aware that evil guests
> could manipulate their counters if a security hole is found in QEMU.
> 
> > Here is the list of counters and statistics I would like to help implement
> > in QEMU.
> > 
> > This is the most important part of the mail and the one I would like the
> > community to review the most.
> > 
> > Once this list is settled I would proceed to implement the required infrastructure
> > in QEMU before using it in the device models.
> > 
> > /* volume of data transferred by the I/Os */
> > read_bytes
> > write_bytes
> > 
> > /* operation count */
> > read_ios
> > write_ios
> > flush_ios
> > 
> > /* how many invalid I/Os the guest submitted */
> > invalid_read_ios
> > invalid_write_ios
> > invalid_flush_ios
> > 
> > /* how many I/O errors happened */
> > read_ios_error
> > write_ios_error
> > flush_ios_error
> > 
> > /* account the time spent doing I/Os */
> > total_read_time
> > total_write_time
> > total_flush_time
> > 
> > /* since when the volume has been idle */
> > qvolume_iddleness_time
> 
> ?

s/qv/v/

It's the time the volume has spent being idle.
Amazon reports it in its tools.
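
A minimal sketch of how it could be derived, reusing the hypothetical
BlockAcctStats from the earlier sketch with an extra last_io_time_ns field:
the device model stamps every completion and the idle time is computed on
demand instead of being stored.

    /* called on every read/write/flush completion */
    static void acct_io_done(BlockAcctStats *s, int64_t now_ns)
    {
        s->last_io_time_ns = now_ns;
    }

    /* how long the volume has been idle, computed when queried */
    static int64_t volume_idleness_ns(const BlockAcctStats *s, int64_t now_ns)
    {
        return now_ns - s->last_io_time_ns;
    }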

> 
> > 
> > /* the following would compute latencies for slices of 1 second, then toss the
> >  * result and start a new slice. A weighted summation of the instant latencies
> >  * could help to implement this.
> >  */
> > 1s_read_average_latency
> > 1s_write_average_latency
> > 1s_flush_average_latency
> > 
> > /* the previous three numbers could be used to further compute a 1 minute slice value */
> > 1m_read_average_latency
> > 1m_write_average_latency
> > 1m_flush_average_latency
> > 
> > /* the previous three numbers could be used to further compute a 1 hour slice value */
> > 1h_read_average_latency
> > 1h_write_average_latency
> > 1h_flush_average_latency
> > 
> > /* 1 second average number of requests in flight */
> > 1s_read_queue_depth
> > 1s_write_queue_depth
> > 
> > /* 1 minute average number of requests in flight */
> > 1m_read_queue_depth
> > 1m_write_queue_depth
> > 
> > /* 1 hour average number of requests in flight */
> > 1h_read_queue_depth
> > 1h_write_queue_depth
> 
> I think libvirt captures similar data.  At least virt-manager displays
> graphs with similar data (maybe for CPU, memory, or network instead of
> disk).
> 
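The 1 second slices mentioned above could be implemented with two plain
accumulators that are published and reset when the slice expires. A rough
sketch (hypothetical names, not an existing QEMU API):

    #include <stdint.h>

    #define SLICE_LEN_NS 1000000000LL  /* 1 second */

    /* per-device, per-operation-type slice state */
    typedef struct LatencySlice {
        int64_t  slice_start_ns;    /* when the current slice began */
        int64_t  total_latency_ns;  /* sum of request latencies in the slice */
        uint64_t nb_requests;       /* requests completed in the slice */
        int64_t  last_avg_ns;       /* average of the last finished slice */
    } LatencySlice;

    /* called on each request completion with its measured latency */
    static void slice_account(LatencySlice *ls, int64_t now_ns, int64_t lat_ns)
    {
        if (now_ns - ls->slice_start_ns >= SLICE_LEN_NS) {
            /* close the old slice: publish its average, start a new one */
            ls->last_avg_ns = ls->nb_requests
                              ? ls->total_latency_ns / ls->nb_requests : 0;
            ls->slice_start_ns = now_ns;
            ls->total_latency_ns = 0;
            ls->nb_requests = 0;
        }
        ls->total_latency_ns += lat_ns;
        ls->nb_requests++;
    }

The 1 minute and 1 hour values could then be derived by averaging the last 60
or 3600 slice results, or by the weighted summation mentioned above.
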
> > 4) Making this happen
> > -------------------------
> > 
> > Outscale wants to make these I/O stats happen and gave me the go-ahead to do
> > whatever grunt work is required to do so.
> > That said, we could collaborate on some parts of the work.
> 
> Seems like a nice improvement to the query-blockstats available today.
> 
> CCing libvirt for management stack ideas.
> 
> Stefan
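
For reference, a hypothetical query-blockstats reply extended with the new
counters could look like this (rd_bytes and friends already exist today; the
invalid_*, failed_* and idle time fields are illustrative only, not an
agreed-upon QMP schema):

    { "device": "virtio0",
      "stats": { "rd_bytes": 123456789,
                 "wr_bytes": 99999999,
                 "rd_operations": 4214,
                 "wr_operations": 3412,
                 "flush_operations": 51,
                 "invalid_rd_operations": 0,
                 "invalid_wr_operations": 2,
                 "failed_rd_operations": 1,
                 "failed_wr_operations": 0,
                 "idle_time_ns": 120000000 } }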




